Commit graph

Ke Wang
7d5a251c66 sched: EAS: update trg_cpu to backup_cpu if no energy saving for target_cpu
If no energy saving is found for target_cpu in the calculation of energy_diff(),
backup_cpu will be set as the new dst_cpu for the next calculation. At this
point, we also need to update trg_cpu to backup_cpu to make sure the
subsequent calculation of energy_diff() is correct.
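
As a rough standalone illustration of that retry flow (not the kernel code:
the struct, the energy_diff() stub and the CPU numbers below are made up),
updating trg_cpu together with dst_cpu looks like this:

    /* Illustrative sketch only: stand-in types and values, not kernel code. */
    #include <stdio.h>

    struct energy_env {
        int src_cpu;
        int dst_cpu;
        int trg_cpu;    /* CPU whose utilization includes the waking task */
    };

    /* Stub: pretend CPU 3 gives no saving while CPU 2 does. */
    static int energy_diff(struct energy_env *eenv)
    {
        return eenv->dst_cpu == 3 ? 0 : -5;
    }

    int main(void)
    {
        struct energy_env eenv = { .src_cpu = 1, .dst_cpu = 3, .trg_cpu = 3 };
        int backup_cpu = 2;

        if (energy_diff(&eenv) >= 0 && backup_cpu >= 0) {
            /* No saving for target_cpu: retry with the backup CPU and
             * keep trg_cpu in sync, which is the point of this fix. */
            eenv.dst_cpu = backup_cpu;
            eenv.trg_cpu = backup_cpu;
            printf("retry: nrg_diff=%d dst=trg=%d\n",
                   energy_diff(&eenv), eenv.dst_cpu);
        }
        return 0;
    }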

Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
2017-11-02 17:59:53 +00:00
Ke Wang
1cb87c38cb sched: EAS: Fix the condition to distinguish energy before/after
Before commit 5f8b3a757d ("sched/fair: consider task utilization in
group_norm_util()"), eenv->util_delta is used to distinguish energy
before and energy after in sched_group_energy(). After that commit,
eenv->util_delta can not do that any more.

In this commit, use trg_cpu to distinguish energy before/after in
sched_group_energy().

Before applying this commit, cap_before/cap_delta are not correct:
<idle>-0 [001] 147504.608920: sched_energy_diff: pid=7 comm=rcu_preempt
src_cpu=1 dst_cpu=3 usage_delta=7 nrg_before=250 nrg_after=250 nrg_diff=0
cap_before=0 cap_after=528 cap_delta=1056 nrg_delta=0 nrg_payoff=0

After applying this commit, cap_before/cap_delta return to normal:
<idle>-0 [001] 220.494011: sched_energy_diff:    pid=7 comm=rcu_preempt
src_cpu=1 dst_cpu=2 usage_delta=3 nrg_before=248 nrg_after=248 nrg_diff=0
cap_before=528 cap_after=528 cap_delta=0 nrg_delta=0 nrg_payoff=0

Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
2017-11-02 17:58:20 +00:00
Greg Kroah-Hartman
aed4c54ad1 This is the 4.4.96 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAln62hMACgkQONu9yGCS
 aT4KoRAAg9FasYL4oGtTiNglajWWP33eVv5NiwwdFdCfQGOEIzZPQy7ST/I/b1CB
 Kql3D9oA7d8ZFtg05KyQb4csH+9WNjeLBr4G2NGZurEC0c6vbU/64ceADgzA4YKJ
 RJ2iDWEQKq45RVH6BlrDyITu9H20TRgzcsQZ7fiswB3ZJsPTfdyvDlUlN3A4JhXY
 2eTF3CNVPEGLnaT4PY5tuLZpLIZkQkZzw1xr9YCq9YA5aNYi2OywWbyQq27ruX2d
 K238FiaYSN5LeUxU6JE2tk4CxrHJ0pOiw6kBiSgIv3MwDQa5iQypKVQA2tnAXHqL
 rPb4cGAcDSQYzpCu4XimDlLEQhoAX2BceSakdYXoMu66AKewizSnopAljhPHp9uk
 0GO6lSJv0f+NGoCpxOE2FDfMIwiPbLC9LfMDWqpFvPanMfMe156p6D+LL4GfTaus
 x4oZZa61aPwjomobEM4hzZk5bp1AjkiDxKHCBvwpuVTOIFlxlVcuB4RyuY2VsuHN
 4a/tw9iEHkyJYCt3tsePTltgrAws2j7KCWLx+F3LTXWzmZ9//9bFq63V6kIh0a2b
 nPozkt0Xj7iygJwU1G2i5XAMTF5tPH8ELioGiakv0Rkj1ncMSXx1s2dO1uxR06a5
 bx/MFLbo1AyZhE8Tk4LcT/rEHtjhj/24FX6sEq4xNjw/GvAzlp0=
 =ScnL
 -----END PGP SIGNATURE-----

Merge 4.4.96 into android-4.4

Changes in 4.4.96
	workqueue: replace pool->manager_arb mutex with a flag
	ALSA: hda/realtek - Add support for ALC236/ALC3204
	ALSA: hda - fix headset mic problem for Dell machines with alc236
	ceph: unlock dangling spinlock in try_flush_caps()
	usb: xhci: Handle error condition in xhci_stop_device()
	spi: uapi: spidev: add missing ioctl header
	fuse: fix READDIRPLUS skipping an entry
	xen/gntdev: avoid out of bounds access in case of partial gntdev_mmap()
	Input: elan_i2c - add ELAN0611 to the ACPI table
	Input: gtco - fix potential out-of-bound access
	assoc_array: Fix a buggy node-splitting case
	scsi: zfcp: fix erp_action use-before-initialize in REC action trace
	scsi: sg: Re-fix off by one in sg_fill_request_table()
	can: sun4i: fix loopback mode
	can: kvaser_usb: Correct return value in printout
	can: kvaser_usb: Ignore CMD_FLUSH_QUEUE_REPLY messages
	regulator: fan53555: fix I2C device ids
	x86/microcode/intel: Disable late loading on model 79
	ecryptfs: fix dereference of NULL user_key_payload
	Revert "drm: bridge: add DT bindings for TI ths8135"
	Linux 4.4.96

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-11-02 10:24:37 +01:00
Tejun Heo
fce67b31c7 workqueue: replace pool->manager_arb mutex with a flag
commit 692b48258dda7c302e777d7d5f4217244478f1f6 upstream.

Josef reported a HARDIRQ-safe -> HARDIRQ-unsafe lock order detected by
lockdep:

 [ 1270.472259] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
 [ 1270.472783] 4.14.0-rc1-xfstests-12888-g76833e8 #110 Not tainted
 [ 1270.473240] -----------------------------------------------------
 [ 1270.473710] kworker/u5:2/5157 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
 [ 1270.474239]  (&(&lock->wait_lock)->rlock){+.+.}, at: [<ffffffff8da253d2>] __mutex_unlock_slowpath+0xa2/0x280
 [ 1270.474994]
 [ 1270.474994] and this task is already holding:
 [ 1270.475440]  (&pool->lock/1){-.-.}, at: [<ffffffff8d2992f6>] worker_thread+0x366/0x3c0
 [ 1270.476046] which would create a new lock dependency:
 [ 1270.476436]  (&pool->lock/1){-.-.} -> (&(&lock->wait_lock)->rlock){+.+.}
 [ 1270.476949]
 [ 1270.476949] but this new dependency connects a HARDIRQ-irq-safe lock:
 [ 1270.477553]  (&pool->lock/1){-.-.}
 ...
 [ 1270.488900] to a HARDIRQ-irq-unsafe lock:
 [ 1270.489327]  (&(&lock->wait_lock)->rlock){+.+.}
 ...
 [ 1270.494735]  Possible interrupt unsafe locking scenario:
 [ 1270.494735]
 [ 1270.495250]        CPU0                    CPU1
 [ 1270.495600]        ----                    ----
 [ 1270.495947]   lock(&(&lock->wait_lock)->rlock);
 [ 1270.496295]                                local_irq_disable();
 [ 1270.496753]                                lock(&pool->lock/1);
 [ 1270.497205]                                lock(&(&lock->wait_lock)->rlock);
 [ 1270.497744]   <Interrupt>
 [ 1270.497948]     lock(&pool->lock/1);

, which will cause an irq inversion deadlock if the above lock scenario
happens.

The root cause of this safe -> unsafe lock order is the
mutex_unlock(pool->manager_arb) in manage_workers() with pool->lock
held.

Unlocking a mutex while holding an irq spinlock was never safe and this
problem has been around forever, but it never got noticed because the
mutex is usually only trylocked while holding the irq lock, making actual
failures very unlikely, and the lockdep annotation missed the condition
until the recent b9c16a0e1f73 ("locking/mutex: Fix
lockdep_assert_held() fail").

Using a mutex for pool->manager_arb has always been a bit of a stretch.
It is primarily a mechanism to arbitrate managership between workers,
which can easily be done with a pool flag.  The only reason it became
a mutex is that the pool destruction path wants to exclude parallel
managing operations.

This patch replaces the mutex with a new pool flag, POOL_MANAGER_ACTIVE,
and makes the destruction path wait for the current manager on a wait
queue.

v2: Drop unnecessary flag clearing before pool destruction as
    suggested by Boqun.
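
As a loose userspace analogy of that arbitration (pthread mutex/condvar
standing in for pool->lock and the wait queue; all names below are invented
and this is not the workqueue code itself):

    /* Userspace analogy only: a pthread mutex/cond stands in for pool->lock
     * and the kernel wait queue. */
    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    struct pool {
        pthread_mutex_t lock;
        pthread_cond_t manager_done;
        bool manager_active;            /* analogue of POOL_MANAGER_ACTIVE */
    };

    static bool maybe_become_manager(struct pool *p)
    {
        bool claimed = false;

        pthread_mutex_lock(&p->lock);
        if (!p->manager_active) {       /* flag test replaces mutex_trylock() */
            p->manager_active = true;
            claimed = true;
        }
        pthread_mutex_unlock(&p->lock);
        return claimed;
    }

    static void release_manager(struct pool *p)
    {
        pthread_mutex_lock(&p->lock);
        p->manager_active = false;
        pthread_cond_broadcast(&p->manager_done);   /* wake the destroy path */
        pthread_mutex_unlock(&p->lock);
    }

    static void destroy_pool(struct pool *p)
    {
        pthread_mutex_lock(&p->lock);
        while (p->manager_active)       /* wait for the current manager */
            pthread_cond_wait(&p->manager_done, &p->lock);
        pthread_mutex_unlock(&p->lock);
        printf("pool destroyed with no manager running\n");
    }

    int main(void)
    {
        struct pool p = {
            .lock = PTHREAD_MUTEX_INITIALIZER,
            .manager_done = PTHREAD_COND_INITIALIZER,
            .manager_active = false,
        };

        if (maybe_become_manager(&p))
            release_manager(&p);
        destroy_pool(&p);
        return 0;
    }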

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 09:40:48 +01:00
Joonwoo Park
9e293db052 sched: EAS: upmigrate misfit current task
Upmigrate the misfit current task upon scheduler tick using the stopper.

We can kick a random NOHZ idle CPU (not necessarily a big CPU) when a
CPU-bound task is in need of upmigration.  But that is not efficient, as it
requires the following unnecessary wakeups:

  1. Busy little CPU A to kick idle B
  2. B runs idle balancer and enqueue migration/A
  3. B goes idle
  4. A runs migration/A, enqueues busy task on B.
  5. B wakes up again.

This change makes active upmigration more efficient (see the sketch after
the list) by doing:

  1. Busy little CPU A finds target CPU B upon tick.
  2. CPU A enqueues migration/A.
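
A toy standalone sketch of that tick-path check (capacities, idle states and
the stop-work step are stubbed; this is not the actual check_for_migration()):

    /* Illustrative sketch only: all data below is made up. */
    #include <stdbool.h>
    #include <stdio.h>

    #define NR_CPUS 4

    static const int cpu_capacity[NR_CPUS] = { 430, 430, 1024, 1024 };
    static const bool cpu_idle[NR_CPUS]    = { false, false, true, false };

    /* Tick on a little CPU: if the running task no longer fits, pick an idle
     * bigger CPU and queue the migration stopper directly on this CPU. */
    static void check_for_migration(int this_cpu, int task_util)
    {
        if (task_util <= cpu_capacity[this_cpu])
            return;                 /* task still fits: nothing to do */

        for (int cpu = 0; cpu < NR_CPUS; cpu++) {
            if (cpu_idle[cpu] && cpu_capacity[cpu] > cpu_capacity[this_cpu]) {
                /* No intermediate idle-CPU wakeup is needed. */
                printf("enqueue migration/%d to push task to CPU %d\n",
                       this_cpu, cpu);
                return;
            }
        }
    }

    int main(void)
    {
        check_for_migration(0, 600);    /* misfit task on a little CPU */
        return 0;
    }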

Change-Id: Ie865738054ea3296f28e6ba01710635efa7193c0
[joonwoop: The original version had logic to reserve CPU.  The logic is
 omitted in this version.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2017-11-01 15:09:35 -07:00
Prasad Sodagudi
dc626b28ee sched: avoid pushing tasks to an offline CPU
Currently active_load_balance_cpu_stop is run by the cpu stopper and it
pushes running tasks off the busiest CPU onto an idle target CPU. But
there is no check of whether the target cpu is offline before pushing
the tasks.  With the introduction of active migration in the scheduler
tick path (see check_for_migration()) there have been instances of
attempts to migrate tasks to offline CPUs.

Add a check that the target cpu is online in order to prevent
scheduling on offline CPUs.
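
A minimal sketch of the added guard, with cpu_online() stubbed for
illustration (not the kernel implementation):

    #include <stdbool.h>
    #include <stdio.h>

    static const bool cpu_online_mask[4] = { true, true, false, true };

    static bool cpu_online(int cpu)
    {
        return cpu_online_mask[cpu];
    }

    static void active_load_balance(int busiest_cpu, int target_cpu)
    {
        if (!cpu_online(target_cpu)) {      /* the added check */
            printf("target CPU %d is offline, abort push\n", target_cpu);
            return;
        }
        printf("push running task from CPU %d to CPU %d\n",
               busiest_cpu, target_cpu);
    }

    int main(void)
    {
        active_load_balance(0, 2);          /* offline target: no push */
        active_load_balance(0, 3);
        return 0;
    }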

Change-Id: Ib8ac7f8aeabd3ca7365f3eae977075952dab4f21
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2017-11-01 15:09:34 -07:00
Srivatsa Vaddagiri
2da014c0d8 sched: Extend active balance to accept 'push_task' argument
Active balance currently picks one task to migrate from busy cpu to
a chosen cpu (push_cpu). This patch extends active load balance to
recognize a particular task ('push_task') that needs to be migrated to
'push_cpu'. This capability will be leveraged by HMP-aware task
placement in a subsequent patch.

Change-Id: If31320111e6cc7044e617b5c3fd6d8e0c0e16952
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2017-11-01 15:09:34 -07:00
Joel Fernandes
3a353d6cea Revert "sched/core: Warn if ENERGY_AWARE is enabled but data is missing"
This reverts commit a21299785a.

Change-Id: Idb707c80788d8b6d26d400a59d9b14f854cce89f
2017-10-30 21:15:39 +00:00
Joel Fernandes
c2f18159a2 Revert "sched/core: fix have_sched_energy_data build warning"
This reverts commit a899b9085c.

Reverting for now due to suspend warning issues.

Change-Id: I9387297b52552a73d5a253b9e8e4467b366479a5
2017-10-30 21:14:39 +00:00
Greg Kroah-Hartman
ceee5bdd47 This is the 4.4.95 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlny7PcACgkQONu9yGCS
 aT7UAw//Zqfv7J8fAyfeTnVsbBg5GcD7SW7clzILMclb2nCGcS1g6ZFZAOoWRZV3
 eSnYKLBvOxfnF+TunD/aNagtKMIEuPqA/9Srd3uL2sdso1FtMsL8PR07Ml1o3Qf3
 eUeKt/JfchpD5r+1ca4CWqRDvFyuQwP8RMsbXUqjsDEy/5USeGOzPGa9jdQ5JI/O
 zvXlHyWuXryvDWHlM52H67CSdZC2KSUNeybji8EqrJlK2yijJQ8CvN7jrBevSVVy
 2XpMx9RoeAbOAH36VuW5R3/tpUPW70L3Tw8zTo8dfs5w/TEMgddzDd4aWz8PLLqo
 mhD6V3bgkqzkbXEaLvmmht/IMg497K6HcAFzfDu/M8X1wSaNfNmJ8IBKmDTBeuBO
 t5Ha2mqDxiCTrEXq/USEgiBW28PJXKv3C5MhdSlYPWo/QGho0boTRirmbbmYupf/
 T02LwRWo8MQT9l+sX5zFx+/Tw/f5/f1bWSWVJ5ns+lFbQ5smnA2nxvcPLg6LuGEe
 tXZ7R+7v5yFp5quyUPTz6eY8Tau4mswuzm4avob0QLr6ZDQXTt8WbD8kmaQRoDq4
 U0k58ZZbdPnOyf0zxFTURQoPk+MI/EV9tclQcWEsR+AXWVzqA71rSUMCcSl7Gad2
 /vSkgDcSQ1xt0UQ7Bqf7PdNSVLtfH/n/jXJGJXwYW/Q0r/zRTYY=
 =vcfF
 -----END PGP SIGNATURE-----

Merge 4.4.95 into android-4.4

Changes in 4.4.95
	USB: devio: Revert "USB: devio: Don't corrupt user memory"
	USB: core: fix out-of-bounds access bug in usb_get_bos_descriptor()
	USB: serial: metro-usb: add MS7820 device id
	usb: cdc_acm: Add quirk for Elatec TWN3
	usb: quirks: add quirk for WORLDE MINI MIDI keyboard
	usb: hub: Allow reset retry for USB2 devices on connect bounce
	ALSA: usb-audio: Add native DSD support for Pro-Ject Pre Box S2 Digital
	can: gs_usb: fix busy loop if no more TX context is available
	usb: musb: sunxi: Explicitly release USB PHY on exit
	usb: musb: Check for host-mode using is_host_active() on reset interrupt
	can: esd_usb2: Fix can_dlc value for received RTR, frames
	drm/nouveau/bsp/g92: disable by default
	drm/nouveau/mmu: flush tlbs before deleting page tables
	ALSA: seq: Enable 'use' locking in all configurations
	ALSA: hda: Remove superfluous '-' added by printk conversion
	i2c: ismt: Separate I2C block read from SMBus block read
	brcmsmac: make some local variables 'static const' to reduce stack size
	bus: mbus: fix window size calculation for 4GB windows
	clockevents/drivers/cs5535: Improve resilience to spurious interrupts
	rtlwifi: rtl8821ae: Fix connection lost problem
	KEYS: encrypted: fix dereference of NULL user_key_payload
	lib/digsig: fix dereference of NULL user_key_payload
	KEYS: don't let add_key() update an uninstantiated key
	pkcs7: Prevent NULL pointer dereference, since sinfo is not always set.
	parisc: Avoid trashing sr2 and sr3 in LWS code
	parisc: Fix double-word compare and exchange in LWS code on 32-bit kernels
	sched/autogroup: Fix autogroup_move_group() to never skip sched_move_task()
	f2fs crypto: replace some BUG_ON()'s with error checks
	f2fs crypto: add missing locking for keyring_key access
	fscrypt: fix dereference of NULL user_key_payload
	KEYS: Fix race between updating and finding a negative key
	fscrypto: require write access to mount to set encryption policy
	FS-Cache: fix dereference of NULL user_key_payload
	Linux 4.4.95

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-10-30 09:21:18 +01:00
Joel Fernandes
a899b9085c sched/core: fix have_sched_energy_data build warning
have_sched_energy_data is defined only for CONFIG_SMP, so declare it
only with CONFIG_SMP.

Fixes a warning from the Intel build bot:

tree:   https://android.googlesource.com/kernel/msm android-4.4
head:   a21299785a
commit: a21299785a [5/5] sched/core: Warn
if ENERGY_AWARE is enabled but data is missing
config: i386-randconfig-x002-201743 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        git checkout a21299785a
        # save the attached .config to linux build tree
        make ARCH=i386

All warnings (new ones prefixed by >>):

>> kernel//sched/core.c:94:13: warning: 'have_sched_energy_data' used
but never defined
    static bool have_sched_energy_data(void);
                ^~~~~~~~~~~~~~~~~~~~~~

vim +/have_sched_energy_data +94 kernel//sched/core.c

    93
  > 94  static bool have_sched_energy_data(void);
    95

Change-Id: I266b63ece6fb31d2b5b11821a8244e147ba6d3a4
Signed-off-by: Joel Fernandes <joelaf@google.com>
2017-10-27 13:24:40 -07:00
Brendan Jackman
a21299785a sched/core: Warn if ENERGY_AWARE is enabled but data is missing
If the EAS energy model is missing or incomplete, i.e. sd_scs is NULL, then
sched_group_energy will return -EINVAL on the assumption that it raced with a
CPU hotplug event. In that case, energy_diff will return 0 and the energy-aware
wake path will silently fail to trigger any migrations.

This case can be triggered by disabling CONFIG_SCHED_MC on existing platforms,
so that there are no sched_groups with the SD_SHARE_CAP_STATES flag, so that
sd_scs is NULL.

Add checks so that a warning is printed if EAS is ever enabled while the
necessary data is not present.

Change-Id: Id233a510b5ad8b7fcecac0b1d789e730bbfc7c4a
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
2017-10-27 19:13:36 +00:00
Vikram Mulukutla
e79f447a97 sched: walt: Correct WALT window size initialization
It is preferable that WALT window rollover occurs just
before a tick, since the tick is an opportune moment
to record a complete window's statistics, as well as report
those stats to the cpu frequency governor. When CONFIG_HZ
results in a TICK_NSEC that isn't an integral number, this
requirement may be violated. Account for this by reducing
the WALT window size to the nearest multiple of TICK_NSEC.
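
For illustration, the rounding described above with assumed example values
(HZ=300 and a 20ms window; these are not the kernel's actual constants):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        const uint64_t NSEC_PER_SEC = 1000000000ULL;
        const uint64_t tick_nsec = NSEC_PER_SEC / 300;  /* HZ=300: 3333333ns */
        const uint64_t window_ns = 20000000ULL;         /* requested 20ms    */

        /* Round the window down to a whole number of ticks so a window
         * rollover lands just before a tick. */
        uint64_t rounded = (window_ns / tick_nsec) * tick_nsec;

        printf("tick=%llu ns, window %llu -> %llu ns (%llu ticks)\n",
               (unsigned long long)tick_nsec,
               (unsigned long long)window_ns,
               (unsigned long long)rounded,
               (unsigned long long)(rounded / tick_nsec));
        return 0;
    }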

Commit d368c6faa1 ("sched: walt: fix window misalignment
when HZ=300") attempted to do this, but WALT isn't using
MIN_SCHED_RAVG_WINDOW as the window size, so that patch had
no effect.

Also, change the type of 'walt_disabled' to bool and warn
if an invalid window size causes WALT to be disabled.

Change-Id: Ie3dcfc21a3df4408254ca1165a355bbe391ed5c7
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2017-10-27 11:58:48 -07:00
Brendan Jackman
38ddcff85a FROMLIST: sched/fair: Use wake_q length as a hint for wake_wide
(from https://patchwork.kernel.org/patch/9895261/)

This patch adds a parameter to select_task_rq, sibling_count_hint
allowing the caller, where it has this information, to inform the
sched_class the number of tasks that are being woken up as part of
the same event.

The wake_q mechanism is one case where this information is available.

select_task_rq_fair can then use the information to detect that it
needs to widen the search space for task placement in order to avoid
overloading the last-level cache domain's CPUs.
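
A tiny sketch of how such a hint could be consumed (the exact threshold and
the names below are assumptions for illustration, not the posted patch):

    #include <stdbool.h>
    #include <stdio.h>

    /* If a single wakeup event is waking at least as many tasks as the LLC
     * domain has CPUs, don't confine placement to the waker's LLC. */
    static bool want_affine(int sibling_count_hint, int sd_llc_size)
    {
        return sibling_count_hint < sd_llc_size;
    }

    int main(void)
    {
        printf("hint=1, llc=2 -> want_affine=%d\n", want_affine(1, 2));
        printf("hint=4, llc=2 -> want_affine=%d\n", want_affine(4, 2));
        return 0;
    }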

                               * * *

The reason I am investigating this change is the following use case
on ARM big.LITTLE (asymmetrical CPU capacity): 1 task per CPU, which
all repeatedly do X amount of work then
pthread_barrier_wait (i.e. sleep until the last task finishes its X
and hits the barrier). On big.LITTLE, the tasks which get a "big" CPU
finish faster, and then those CPUs pull over the tasks that are still
running:

     v CPU v           ->time->

                    -------------
   0  (big)         11111  /333
                    -------------
   1  (big)         22222   /444|
                    -------------
   2  (LITTLE)      333333/
                    -------------
   3  (LITTLE)      444444/
                    -------------

Now when task 4 hits the barrier (at |) and wakes the others up,
there are 4 tasks with prev_cpu=<big> and 0 tasks with
prev_cpu=<little>. want_affine therefore means that we'll only look
in CPUs 0 and 1 (sd_llc), so tasks will be unnecessarily coscheduled
on the bigs until the next load balance, something like this:

     v CPU v           ->time->

                    ------------------------
   0  (big)         11111  /333  31313\33333
                    ------------------------
   1  (big)         22222   /444|424\4444444
                    ------------------------
   2  (LITTLE)      333333/          \222222
                    ------------------------
   3  (LITTLE)      444444/            \1111
                    ------------------------
                                 ^^^
                           underutilization

So, I'm trying to get want_affine = 0 for these tasks.

I don't _think_ any incarnation of the wakee_flips mechanism can help
us here because which task is waker and which tasks are wakees
generally changes with each iteration.

However pthread_barrier_wait (or more accurately FUTEX_WAKE) has the
nice property that we know exactly how many tasks are being woken, so
we can cheat.

It might be a disadvantage that we "widen" _every_ task that's woken in
an event, while select_idle_sibling would work fine for the first
sd_llc_size - 1 tasks.

IIUC, if wake_affine() behaves correctly this trick wouldn't be
necessary on SMP systems, so it might be best guarded by the presence
of SD_ASYM_CPUCAPACITY?

                               * * *

Final note..

In order to observe "perfect" behaviour for this use case, I also had
to disable the TTWU_QUEUE sched feature. Suppose during the wakeup
above we are working through the work queue and have placed tasks 3
and 2, and are about to place task 1:

     v CPU v           ->time->

                    --------------
   0  (big)         11111  /333  3
                    --------------
   1  (big)         22222   /444|4
                    --------------
   2  (LITTLE)      333333/      2
                    --------------
   3  (LITTLE)      444444/          <- Task 1 should go here
                    --------------

If TTWU_QUEUE is enabled, we will not yet have enqueued task
2 (having instead sent a reschedule IPI) or attached its load to CPU
2. So we are likely to also place task 1 on cpu 2. Disabling
TTWU_QUEUE means that we enqueue task 2 before placing task 1,
solving this issue. TTWU_QUEUE is there to minimise rq lock
contention, and I guess that this contention is less of an issue on
big.LITTLE systems since they have relatively few CPUs, which
suggests the trade-off makes sense here.

Change-Id: I2080302839a263e0841a89efea8589ea53bbda9c
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
2017-10-27 18:50:59 +00:00
Joonwoo Park
43bd960dfe sched: WALT: account cumulative window demand
Energy cost estimation has been a long-lasting challenge for WALT,
because WALT guides CPU frequency based on the CPU utilization of the
previous window.  Consequently it's not possible to know a newly
waking-up task's energy cost until the end of WALT's current window.

WALT already tracks 'Previous Runnable Sum' (prev_runnable_sum)
and 'Cumulative Runnable Average' (cr_avg).  They are designed for
CPU frequency guidance and task placement, but unfortunately neither
is suitable for energy cost estimation.

Using prev_runnable_sum for the energy cost calculation would make us
account CPU and task energy solely based on activity in the previous
window, so, for example, any task that had no activity in the previous
window would be accounted as a 'zero energy cost' task.
Energy estimation with cr_avg is what energy_diff() relies on at present.
However cr_avg can only represent an instantaneous picture of the energy
cost, so, for example, if a CPU was fully occupied for an entire WALT
window and became idle just before the window boundary, and there is then
a wake-up, energy_diff() accounts that CPU as a 'zero energy cost' CPU.

As a result, introduce a new accounting unit, 'Cumulative Window Demand'.
The cumulative window demand tracks all the tasks' demands seen in the
current window, which is neither an instantaneous value nor the actual
execution time.  Because a task's demand represents its estimated scaled
execution time when the task runs a full window, the accumulation of all
the demands represents the predicted CPU load at the end of the window.

Thus we can estimate the CPU's frequency at the end of the current WALT
window with the cumulative window demand.
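
A toy example of that accounting (the numbers and the demand-to-frequency
mapping are made up; this is not the WALT code):

    #include <stdio.h>

    int main(void)
    {
        /* Scaled demands (out of a 1024-capacity window) of the tasks seen
         * on this CPU in the current window, including a fresh wakeup. */
        const int demands[] = { 300, 150, 120 };
        int cum_window_demand = 0;

        for (unsigned int i = 0; i < sizeof(demands) / sizeof(demands[0]); i++)
            cum_window_demand += demands[i];

        /* Predicted frequency at window end, assuming frequency scales
         * linearly with demand up to an assumed 1.8 GHz fmax. */
        const long fmax_khz = 1800000;
        long f_est = fmax_khz * cum_window_demand / 1024;

        printf("cumulative demand=%d/1024 -> est. freq=%ld kHz\n",
               cum_window_demand, f_est);
        return 0;
    }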

The use of prev_runnable_sum for CPU frequency guidance and cr_avg for
task placement has not changed; they continue to be used for those
purposes, while this patch only adds an additional statistic.

Change-Id: I9908c77ead9973a26dea2b36c001c2baf944d4f5
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2017-10-27 18:10:15 +00:00
Leo Yan
effc721b3c sched/fair: remove useless variable in find_best_target
Patch 5680f23f20 ("sched/fair: streamline find_best_target
heuristics") reworked the find_best_target function; as a result the
variable "target_util" is now unused.  So remove it.

Change-Id: I5447062419e5828a49115119984fac6cd37db034
Signed-off-by: Leo Yan <leo.yan@linaro.org>
2017-10-27 18:04:43 +00:00
Blagovest Kolenichev
dbad9b8f72 Merge android-4.4@89074de (v4.4.94) into msm-4.4
* refs/heads/tmp-89074de
  Linux 4.4.94
  Revert "tty: goldfish: Fix a parameter of a call to free_irq"
  cpufreq: CPPC: add ACPI_PROCESSOR dependency
  nfsd/callback: Cleanup callback cred on shutdown
  target/iscsi: Fix unsolicited data seq_end_offset calculation
  uapi: fix linux/mroute6.h userspace compilation errors
  uapi: fix linux/rds.h userspace compilation errors
  ceph: clean up unsafe d_parent accesses in build_dentry_path
  i2c: at91: ensure state is restored after suspending
  net: mvpp2: release reference to txq_cpu[] entry after unmapping
  scsi: scsi_dh_emc: return success in clariion_std_inquiry()
  slub: do not merge cache if slub_debug contains a never-merge flag
  ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
  crypto: xts - Add ECB dependency
  net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new probed PFs
  sparc64: Migrate hvcons irq to panicked cpu
  md/linear: shutup lockdep warnning
  f2fs: do not wait for writeback in write_begin
  Btrfs: send, fix failure to rename top level inode due to name collision
  iio: adc: xilinx: Fix error handling
  netfilter: nf_ct_expect: Change __nf_ct_expect_check() return value.
  net/mlx4_en: fix overflow in mlx4_en_init_timestamp()
  mac80211: fix power saving clients handling in iwlwifi
  mac80211_hwsim: check HWSIM_ATTR_RADIO_NAME length
  irqchip/crossbar: Fix incorrect type of local variables
  watchdog: kempld: fix gcc-4.3 build
  locking/lockdep: Add nest_lock integrity test
  Revert "bsg-lib: don't free job in bsg_prepare_job"
  tipc: use only positive error codes in messages
  net: Set sk_prot_creator when cloning sockets to the right proto
  packet: only test po->has_vnet_hdr once in packet_snd
  packet: in packet_do_bind, test fanout with bind_lock held
  tun: bail out from tun_get_user() if the skb is empty
  l2tp: fix race condition in l2tp_tunnel_delete
  l2tp: Avoid schedule while atomic in exit_net
  vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmit
  isdn/i4l: fetch the ppp_write buffer in one shot
  bpf: one perf event close won't free bpf program attached by another perf event
  packet: hold bind lock when rebinding to fanout hook
  net: emac: Fix napi poll list corruption
  ip6_gre: skb_push ipv6hdr before packing the header in ip6gre_header
  udpv6: Fix the checksum computation when HW checksum does not apply
  bpf/verifier: reject BPF_ALU64|BPF_END
  sctp: potential read out of bounds in sctp_ulpevent_type_enabled()
  MIPS: Fix minimum alignment requirement of IRQ stack
  drm/dp/mst: save vcpi with payloads
  percpu: make this_cpu_generic_read() atomic w.r.t. interrupts
  trace: sched: Fix util_avg_walt in sched_load_avg_cpu trace
  sched/fair: remove erroneous RCU_LOCKDEP_WARN from start_cpu()
  sched: EAS/WALT: finish accounting prior to task_tick
  cpufreq: sched: update capacity request upon tick always
  sched/fair: prevent meaningless active migration
  sched: walt: Leverage existing helper APIs to apply invariance

Conflicts:
	kernel/sched/core.c
	kernel/sched/fair.c
	kernel/sched/sched.h

Change-Id: I0effac90fb6a4db559479bfa2fefa31c41200ce9
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2017-10-27 10:56:12 -07:00
Russ Weight
e3ba92c160 sched/tune: access schedtune_initialized under CGROUP_SCHEDTUNE
schedtune_initialized is protected by CONFIG_CGROUP_SCHEDTUNE, but
is being used without CONFIG_CGROUP_SCHEDTUNE being defined. Add
appropriate ifdefs around the usage of schedtune_initialized to
avoid a compilation error when CONFIG_CGROUP_SCHEDTUNE is not
defined.

Change-Id: Iab79bf053d74db3eeb84c09d71d43b4e39746ed2
Signed-off-by: Russ Weight <russell.h.weight@intel.com>
Signed-off-by: Fei Yang <fei.yang@intel.com>
2017-10-27 13:30:34 +01:00
Patrick Bellasi
3c71cbb896 sched/fair: consider task utilization in group_max_util()
The group_max_util() function is used to compute the maximum utilization
across the CPUs of a certain energy_env configuration.
Its main client is the energy_diff function when it needs to compute the
SG capacity for one of the before/after scheduling candidates.

Currently, the energy_diff function sets util_delta = 0 when it wants to
compute the energy corresponding to the scheduling candidate where the
task runs on the previous CPU. This implies that, for the task waking up
on the previous CPU, we consider only its blocked load tracked by the CPU
RQ. However, in the case of a medium-to-big task waking up on a
long-idle CPU, this blocked load may have already completely decayed.

More generally, the current approach is biased towards under-estimating
the capacity requirements for the "before" scheduling candidate.

This patch fixes this by:
- always using cpu_util_wake() to properly get the utilization of a CPU
  without any (partially decayed) contribution of the waking-up task
- adding the task utilization to the cpu_util_wake value just for the
  target cpu

The "target CPU" is defined by the energy_env to be either the src_cpu or
the dst_cpu, depending on which scheduling candidate we are considering.
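
A standalone sketch of the fixed accounting (cpu_util_wake() is stubbed with
fixed numbers; this only illustrates adding the task's utilization for the
target CPU, it is not the actual patch):

    #include <stdio.h>

    /* Utilization of each CPU with the waking task's (possibly decayed)
     * blocked contribution removed; values are invented. */
    static int cpu_util_wake(int cpu)
    {
        static const int util[] = { 120, 80, 600, 40 };
        return util[cpu];
    }

    static int group_max_util(const int *cpus, int ncpus,
                              int target_cpu, int task_util)
    {
        int max = 0;

        for (int i = 0; i < ncpus; i++) {
            int util = cpu_util_wake(cpus[i]);

            if (cpus[i] == target_cpu)
                util += task_util;      /* count the task where it would run */
            if (util > max)
                max = util;
        }
        return max;
    }

    int main(void)
    {
        int cpus[] = { 0, 1 };

        printf("max util (task on CPU 1) = %d\n",
               group_max_util(cpus, 2, 1, 250));
        return 0;
    }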

Finally, since this update removes the last usage of calc_util_delta(),
that function is now removed as well.

Change-Id: I20ee1bcf40cee6bf6e265fb2d32ef79061ad6ced
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:34 +01:00
Chris Redpath
5f8b3a757d sched/fair: consider task utilization in group_norm_util()
The group_norm_util() function is used to compute the normalized
utilization of a SG given a certain energy_env configuration.
The main client of this function is the energy_diff function when it
comes to compute the SG energy for one of the before/after scheduling
candidates.

Currently, the energy_diff function sets util_delta = 0 when it wants to
compute the energy corresponding to the scheduling candidate where the
task runs on the previous CPU. This implies that, for the task waking up
on the previous CPU, we consider only its blocked load tracked by the CPU
RQ. However, in the case of a medium-to-big task waking up on a
long-idle CPU, this blocked load may have already completely decayed.

More generally, the current approach is biased towards under-estimating
the energy consumption for the "before" scheduling candidate.

This patch fixes this by:
- always using cpu_util_wake() to properly get the utilization of a CPU
  without any (partially decayed) contribution of the waking-up task
- adding the task utilization to the cpu_util_wake value just for the
  target cpu

The "target CPU" is defined by the energy_env to be either the src_cpu
or the dst_cpu, depending on which scheduling candidate we are
considering.

This patch also updates the definition of __cpu_norm_util(), which is
currently called just by the group_norm_util() function. This allows the
code to be simplified by using __cpu_norm_util() just to normalize a
specified utilization with respect to a given capacity.

This update allows any dependency of group_norm_util() on
calc_util_delta() to be completely removed.
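
For illustration, normalizing a given utilization against a given capacity in
the scheduler's 0..1024 fixed-point range could look like the sketch below
(the constants and the clamp are assumptions, not the exact __cpu_norm_util()
implementation):

    #include <stdio.h>

    #define SCHED_CAPACITY_SHIFT 10
    #define SCHED_CAPACITY_SCALE (1UL << SCHED_CAPACITY_SHIFT)     /* 1024 */

    static unsigned long cpu_norm_util(unsigned long util, unsigned long capacity)
    {
        unsigned long norm = (util << SCHED_CAPACITY_SHIFT) / capacity;

        /* Clamp: a saturated CPU is "fully utilized", never more. */
        return norm < SCHED_CAPACITY_SCALE ? norm : SCHED_CAPACITY_SCALE;
    }

    int main(void)
    {
        printf("util=300 cap=600 -> %lu/1024\n", cpu_norm_util(300, 600));
        printf("util=700 cap=600 -> %lu/1024\n", cpu_norm_util(700, 600));
        return 0;
    }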

Change-Id: I3b6ec50ce8decb1521faae660e326ab3319d3c82
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:34 +01:00
Patrick Bellasi
ca42e80446 sched/fair: enforce EAS mode
For non-latency-sensitive tasks the goal is to optimize for energy efficiency.
Thus, we should try our best to avoid moving a task onto a CPU which is then
going to be marked as overutilized.

Let's use the capacity_margin metric to verify whether a candidate target CPU
should be considered, without risking bailing out of EAS mode.

Change-Id: Ib3697106f4073aedf4a6c6ce42bd5d000fa8c007
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Patrick Bellasi
4edc5b0e38 sched/fair: ignore backup CPU when not valid
find_best_target() can sometimes fail to return a valid backup CPU, either
because it cannot find one or simply because it returns prev_cpu as the backup.
In these cases we should skip the energy_diff evaluation for the backup CPU.

Change-Id: I3787dbdfe74122348dd7a7485b88c4679051bd32
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Patrick Bellasi
2aada289d7 sched/fair: trace energy_diff for non boosted tasks
In systems where SchedTune is enabled, we do not report an energy diff for
non-boosted tasks. Let's fix this by always generating an energy_diff event
where, however:
  nrg.delta = 0, since we skip energy normalization
  payoff = nrg.diff, since the payoff is defined just by the energy difference

Change-Id: I9a11ec19b6f56da04147f5ae5b47daf1dd180445
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Brendan Jackman
2f30db8df4 UPSTREAM: sched/fair: Sync task util before slow-path wakeup
We use task_util() in find_idlest_group() via capacity_spare_wake().
This task_util() is updated in wake_cap(). However, wake_cap() is not the
only reason for ending up in find_idlest_group() - we could have been sent
there by wake_wide(). So explicitly sync the task util with prev_cpu
when we are about to head to find_idlest_group().

We could simply do this at the beginning of
select_task_rq_fair() (i.e. irrespective of whether we're heading to
select_idle_sibling() or find_idlest_group() & co), but I didn't want to
slow down the select_idle_sibling() path more than necessary.

Don't do this during fork balancing, we won't need the task_util and
we'd just clobber the last_update_time, which is supposed to be 0.

Change-Id: I935f4bfdfec3e8b914457aac3387ce264d5fd484
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andres Oportus <andresoportus@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: http://lkml.kernel.org/r/20170808095519.10077-1-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked-from: commit ea16f0ea6c3d tip:sched/core)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Brendan Jackman
5a86636649 UPSTREAM: sched/fair: Fix usage of find_idlest_group() when the local group is idlest
find_idlest_group() returns NULL when the local group is idlest. The
caller then continues the find_idlest_group() search at a lower level
of the current CPU's sched_domain hierarchy. find_idlest_group_cpu() is
not consulted and, crucially, @new_cpu is not updated. This means the
search is pointless and we return @prev_cpu from select_task_rq_fair().

This is fixed by initialising @new_cpu to @cpu instead of @prev_cpu.

Change-Id: Ie531f5bb29775952bdc4c148b6e974b2f5f32b7a
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171005114516.18617-6-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked-from: commit 93f50f90247e tip:sched/core)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Brendan Jackman
4116547645 UPSTREAM: sched/fair: Fix usage of find_idlest_group() when no groups are allowed
When 'p' is not allowed on any of the CPUs in the sched_domain, we
currently return NULL from find_idlest_group(), and pointlessly
continue the search on lower sched_domain levels (where 'p' is also not
allowed) before returning prev_cpu regardless (as we have not updated
new_cpu).

Add an explicit check for this case, and add a comment to
find_idlest_group(). Now when find_idlest_group() returns NULL, it always
means that the local group is allowed and idlest.

Change-Id: I5f2648d2f7fb0465677961ecb7473df3d06f0057
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171005114516.18617-5-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked-from: commit 6fee85ccbc76 tip:sched/core)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Brendan Jackman
9c825cf616 BACKPORT: sched/fair: Fix find_idlest_group when local group is not allowed
When the local group is not allowed we do not modify this_*_load from
their initial value of 0. That means that the load checks at the end
of find_idlest_group cause us to incorrectly return NULL. Fixing the
initial values to ULONG_MAX means we will instead return the idlest
remote group in that case.
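
A toy demonstration of the fix (heavily simplified and with invented numbers;
this is not find_idlest_group() itself): starting this_load at ULONG_MAX lets
a local group that was never examined lose the comparison.

    #include <limits.h>
    #include <stdio.h>

    /* Returns the load of the chosen group; 0 stands in for the NULL
     * "stay in the local group" result. */
    static unsigned long pick_idlest(unsigned long remote_load, int local_allowed)
    {
        /* Fixed initialization: ULONG_MAX instead of 0 when the local
         * group is not allowed and its load is never computed. */
        unsigned long this_load = local_allowed ? 100 : ULONG_MAX;

        return remote_load < this_load ? remote_load : 0;
    }

    int main(void)
    {
        printf("local disallowed, remote load 500 -> picked load %lu\n",
               pick_idlest(500, 0));   /* now picks the remote group */
        return 0;
    }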

BACKPORT: Note 4.4 is missing commit 6b94780e45c1 "sched/core: Use
load_avg for selecting idlest group", so we only have to fix
this_load instead of this_runnable_load and this_avg_load.

Change-Id: I41f775b0e7c8f5e675c2780f955bb130a563cba7
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171005114516.18617-4-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked-from: commit 0d10ab952e99 tip:sched/core)
(backport changes described above)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Brendan Jackman
529def2ffe UPSTREAM: sched/fair: Remove unnecessary comparison with -1
Since commit:

  83a0a96a5f ("sched/fair: Leverage the idle state info when choosing the "idlest" cpu")

find_idlest_group_cpu() (formerly find_idlest_cpu) no longer returns -1,
so we can simplify the checking of the return value in find_idlest_cpu().

Change-Id: I98f4b9f178cd93a30408e024e608d36771764c7b
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171005114516.18617-3-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked-from commit e90381eaecf6 in tip:sched/core)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Brendan Jackman
0f743ce745 BACKPORT: sched/fair: Move select_task_rq_fair slow-path into its own function
In preparation for changes that would otherwise require adding a new
level of indentation to the while(sd) loop, create a new function
find_idlest_cpu() which contains this loop, and rename the existing
find_idlest_cpu() to find_idlest_group_cpu().

Code inside the while(sd) loop is unchanged. @new_cpu is added as a
variable in the new function, with the same initial value as the
@new_cpu in select_task_rq_fair().

Change-Id: I9842308cab00dc9cd6c513fc38c609089a1aaaaf
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171005114516.18617-2-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(reworked for eas/cas schedstats added in Android)
(cherry-picked commit 18bd1b4bd53a from tip:sched/core)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Brendan Jackman
795a6867cf UPSTREAM: sched/fair: Force balancing on nohz balance if local group has capacity
The "goto force_balance" here is intended to mitigate the fact that
avg_load calculations can result in bad placement decisions when
priority is asymmetrical.

The original commit that adds it:

  fab476228b ("sched: Force balancing on newidle balance if local group has capacity")

explains:

    Under certain situations, such as a niced down task (i.e. nice =
    -15) in the presence of nr_cpus NICE0 tasks, the niced task lands
    on a sched group and kicks away other tasks because of its large
    weight. This leads to sub-optimal utilization of the
    machine. Even though the sched group has capacity, it does not
    pull tasks because sds.this_load >> sds.max_load, and f_b_g()
    returns NULL.

A similar but inverted issue also affects ARM big.LITTLE (asymmetrical CPU
capacity) systems - consider 8 always-running, same-priority tasks on a
system with 4 "big" and 4 "little" CPUs. Suppose that 5 of them end up on
the "big" CPUs (which will be represented by one sched_group in the DIE
sched_domain) and 3 on the "little" (the other sched_group in DIE), leaving
one CPU unused. Because the "big" group has a higher group_capacity its
avg_load may not present an imbalance that would cause migrating a
task to the idle "little".

The force_balance case here solves the problem but currently only for
CPU_NEWLY_IDLE balances, which in theory might never happen on the
unused CPU. Including CPU_IDLE in the force_balance case means
there's an upper bound on the time before we can attempt to solve the
underutilization: after DIE's sd->balance_interval has passed the
next nohz balance kick will help us out.

Change-Id: I807ba5cba0ef1b8bbec02cbcd4755fd32af10135
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170807163900.25180-1-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked-from: commit 583ffd99d765 tip:sched/core)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Peter Zijlstra
fd4a95dab8 UPSTREAM: sched/core: Add missing update_rq_clock() call in set_user_nice()
Address this rq-clock update bug:

  WARNING: CPU: 30 PID: 195 at ../kernel/sched/sched.h:797 set_next_entity()
  rq->clock_update_flags < RQCF_ACT_SKIP

  Call Trace:
    dump_stack()
    __warn()
    warn_slowpath_fmt()
    set_next_entity()
    ? _raw_spin_lock()
    set_curr_task_fair()
    set_user_nice.part.85()
    set_user_nice()
    create_worker()
    worker_thread()
    kthread()
    ret_from_fork()

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 2fb8d36787affe26f3536c3d8ec094995a48037d)
Change-Id: I53ba056e72820c7fadb3f022e4ee3b821c0de17d
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Peter Zijlstra
bab39eb879 UPSTREAM: sched/core: Add missing update_rq_clock() call for task_hot()
Add the update_rq_clock() call at the top of the callstack instead of
at the bottom where we find it missing; this is to aid a later effort to
minimize the number of update_rq_clock() calls.

  WARNING: CPU: 30 PID: 194 at ../kernel/sched/sched.h:797 assert_clock_updated()
  rq->clock_update_flags < RQCF_ACT_SKIP

  Call Trace:
    dump_stack()
    __warn()
    warn_slowpath_fmt()
    assert_clock_updated.isra.63.part.64()
    can_migrate_task()
    load_balance()
    pick_next_task_fair()
    __schedule()
    schedule()
    worker_thread()
    kthread()

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 3bed5e2166a5e433bf62162f3cd3c5174d335934)
Change-Id: Ief5070dcce486535334dcb739ee16b989ea9df42
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Peter Zijlstra
bea1b621d9 UPSTREAM: sched/core: Add missing update_rq_clock() in detach_task_cfs_rq()
Instead of adding the update_rq_clock() all the way at the bottom of
the callstack, add one at the top; this is to aid a later effort to
minimize update_rq_clock() calls.

  WARNING: CPU: 0 PID: 1 at ../kernel/sched/sched.h:797 detach_task_cfs_rq()
  rq->clock_update_flags < RQCF_ACT_SKIP

  Call Trace:
    dump_stack()
    __warn()
    warn_slowpath_fmt()
    detach_task_cfs_rq()
    switched_from_fair()
    __sched_setscheduler()
    _sched_setscheduler()
    sched_set_stop_task()
    cpu_stop_create()
    __smpboot_create_thread.part.2()
    smpboot_register_percpu_thread_cpumask()
    cpu_stop_init()
    do_one_initcall()
    ? print_cpu_info()
    kernel_init_freeable()
    ? rest_init()
    kernel_init()
    ret_from_fork()

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 80f5c1b84baa8180c3c27b7e227429712cd967b6)
Change-Id: Ibffde077d18eabec4c2984158bd9d6d73bd0fb96
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Peter Zijlstra
4863faf5e4 UPSTREAM: sched/core: Add missing update_rq_clock() in post_init_entity_util_avg()
Address this rq-clock update bug:

  WARNING: CPU: 0 PID: 0 at ../kernel/sched/sched.h:797 post_init_entity_util_avg()
  rq->clock_update_flags < RQCF_ACT_SKIP

  Call Trace:
    __warn()
    post_init_entity_util_avg()
    wake_up_new_task()
    _do_fork()
    kernel_thread()
    rest_init()
    start_kernel()

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 4126bad6717336abe5d666440ae15555563ca53f)
Change-Id: Ibe9a73386896377f96483d195e433259218755a5
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Vincent Guittot
c14c9b6e3e UPSTREAM: sched/core: Fix find_idlest_group() for fork
During fork, the utilization of a task is initialized once the rq has been
selected, because the current utilization level of the rq is used to
set the utilization of the forked task. As the task's utilization is
still 0 at this step of the fork sequence, it doesn't make sense to
look for spare capacity that can fit the task's utilization.
Furthermore, I can see perf regressions for the test:

   hackbench -P -g 1

because the least loaded policy is always bypassed and tasks are not
spread during fork.

With this patch and the fix below, we are back to the same performance as
v4.8. The fix below is only a temporary one used for the test
until a smarter solution is found, because we can't simply remove the
test, which is useful for other benchmarks.

| @@ -5708,13 +5708,6 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
|
|	avg_cost = this_sd->avg_scan_cost;
|
| -	/*
| -	 * Due to large variance we need a large fuzz factor; hackbench in
| -	 * particularly is sensitive here.
| -	 */
| -	if ((avg_idle / 512) < avg_cost)
| -		return -1;
| -
|	time = local_clock();
|
|	for_each_cpu_wrap(cpu, sched_domain_span(sd), target, wrap) {

Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: kernellwp@gmail.com
Cc: umgwanakikbuti@gmail.com
Cc: yuyang.du@intel.comc
Link: http://lkml.kernel.org/r/1481216215-24651-2-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit f519a3f1c6b7a990e5aed37a8f853c6ecfdee945)
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: I86cc2ad81af3467c0b2f82b995111f428248baa4
2017-10-27 13:30:33 +01:00
Peter Zijlstra
97cb74f485 BACKPORT: sched/fair: Fix PELT integrity for new tasks
Vincent and Yuyang found another few scenarios in which entity
tracking goes wobbly.

The scenarios are basically due to the fact that new tasks are not
immediately attached and thereby differ from the normal situation -- a
task is always attached to a cfs_rq load average (such that it
includes its blocked contribution) and is explicitly
detached/attached on migration to another cfs_rq.

Scenario 1: switch to fair class

  p->sched_class = fair_class;
  if (queued)
    enqueue_task(p);
      ...
        enqueue_entity()
	  enqueue_entity_load_avg()
	    migrated = !sa->last_update_time (true)
	    if (migrated)
	      attach_entity_load_avg()
  check_class_changed()
    switched_from() (!fair)
    switched_to()   (fair)
      switched_to_fair()
        attach_entity_load_avg()

If @p is a new task that hasn't been fair before, it will have
!last_update_time and, per the above, end up in
attach_entity_load_avg() _twice_.

Scenario 2: change between cgroups

  sched_move_group(p)
    if (queued)
      dequeue_task()
    task_move_group_fair()
      detach_task_cfs_rq()
        detach_entity_load_avg()
      set_task_rq()
      attach_task_cfs_rq()
        attach_entity_load_avg()
    if (queued)
      enqueue_task();
        ...
          enqueue_entity()
	    enqueue_entity_load_avg()
	      migrated = !sa->last_update_time (true)
	      if (migrated)
	        attach_entity_load_avg()

Similar as with scenario 1, if @p is a new task, it will have
!load_update_time and we'll end up in attach_entity_load_avg()
_twice_.

Furthermore, notice how we do a detach_entity_load_avg() on something
that wasn't attached to begin with.

As stated above; the problem is that the new task isn't yet attached
to the load tracking and thereby violates the invariant assumption.

This patch remedies this by ensuring a new task is indeed properly
attached to the load tracking on creation, through
post_init_entity_util_avg().

Of course, this isn't entirely as straightforward as one might think,
since the task is hashed before we call wake_up_new_task() and thus
can be poked at. We avoid this by adding TASK_NEW and teaching
cpu_cgroup_can_attach() to refuse such tasks.

.:: BACKPORT

Complicated by the fact that much of what was changed by the original
version of this commit was then changed by:

df217913e72e sched/fair: Factorize attach/detach entity <Vincent Guittot>

and then

d31b1a66cbe0 sched/fair: Factorize PELT update <Vincent Guittot>

, which have both already been backported here.

Reported-by: Yuyang Du <yuyang.du@intel.com>
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 7dc603c9028ea5d4354e0e317e8481df99b06d7e)
Change-Id: Ibc59eb52310a62709d49a744bd5a24e8b97c4ae8
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:32 +01:00
Vincent Guittot
138a670d97 BACKPORT: sched/cgroup: Fix cpu_cgroup_fork() handling
A new fair task is detached and attached from/to task_group with:

  cgroup_post_fork()
    ss->fork(child) := cpu_cgroup_fork()
      sched_move_task()
        task_move_group_fair()

Which is wrong, because at this point in fork() the task isn't fully
initialized and it cannot 'move' to another group, because it's not
attached to any group yet.

In fact, cpu_cgroup_fork() needs a small part of sched_move_task(), so we
can just call this small part directly instead of sched_move_task(). And
the task doesn't really migrate because it is not yet attached, so we
need the following sequence:

  do_fork()
    sched_fork()
      __set_task_cpu()

    cgroup_post_fork()
      set_task_rq() # set task group and runqueue

    wake_up_new_task()
      select_task_rq() can select a new cpu
      __set_task_cpu
      post_init_entity_util_avg
        attach_task_cfs_rq()
      activate_task
        enqueue_task

This patch makes that happen.

BACKPORT: Difference from original commit:

- Removed use of DEQUEUE_MOVE (which isn't defined in 4.4) in
  dequeue_task flags
- Replaced "struct rq_flags rf" with "unsigned long flags".

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
[ Added TASK_SET_GROUP to set depth properly. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit ea86cb4b7621e1298a37197005bf0abcc86348d4)
Change-Id: I8126fd923288acf961218431ffd29d6bf6fd8d72
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:32 +01:00
Peter Zijlstra
6b02ab68ec UPSTREAM: sched/fair: Fix and optimize the fork() path
The task_fork_fair() callback already calls __set_task_cpu() and takes
rq->lock.

If we move the sched_class::task_fork callback in sched_fork() under
the existing p->pi_lock, right after its set_task_cpu() call, we can
avoid doing two such calls and omit the IRQ disabling on the rq->lock.

Change to __set_task_cpu() to skip the migration bits; this is a new
task, not a migration. Similarly, make wake_up_new_task() use
__set_task_cpu() for the same reason: the task hasn't actually
migrated as it hasn't ever run.

This cures the problem of calling migrate_task_rq_fair(), which does
remove_entity_from_load_avg() on tasks that have never been added to
the load avg to begin with.

This bug would result in transiently messed up load_avg values, averaged
out after a few dozen milliseconds. This is probably the reason why
this bug was not found for such a long time.

Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit e210bffd39d01b649c94b820c28ff112673266dd)
Change-Id: Icbddbaa6e8c1071859673d8685bc3f38955cf144
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:32 +01:00
Chris Redpath
792510d9b3 BACKPORT: sched/fair: Make it possible to account fair load avg consistently
While set_task_rq_fair() was introduced in mainline by commit ad936d8658fd
("sched/fair: Make it possible to account fair load avg consistently"),
here the function ended up being introduced by the backport of
commit 09a43ace1f98 ("sched/fair: Propagate load during synchronous
attach/detach"). The problem (apart from the confusion introduced by the
backport) is that set_task_rq_fair() is currently not called
at all.

Fix the problem by backporting again commit ad936d8658fd
("sched/fair: Make it possible to account fair load avg consistently").

Original change log:

The current code accounts for the time a task was absent from the fair
class (per ATTACH_AGE_LOAD). However it does not work correctly when a
task got migrated or moved to another cgroup while outside of the fair
class.

This patch tries to address that by aging on migration. We locklessly
read the 'last_update_time' stamp from both the old and new cfs_rq,
age the load up to the old time, and set it to the new time.

These timestamps should in general not be more than 1 tick apart from
one another, so there is a definite bound on things.
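
A minimal model of that aging step, assuming the usual PELT half-life of
roughly 32ms (illustrative only; the real code uses fixed-point
arithmetic and per-rq clocks, not floating point):

	#include <math.h>
	#include <stdio.h>

	struct sched_avg { unsigned long load; unsigned long long last_update_ms; };

	/* decay 'load' over 'delta_ms'; the load halves every 32ms */
	static unsigned long decay(unsigned long load, unsigned long long delta_ms)
	{
		return (unsigned long)(load * pow(0.5, (double)delta_ms / 32.0));
	}

	/* age the task's load up to the old cfs_rq clock, then stamp it
	 * with the new cfs_rq clock so later updates start from there */
	static void move_task(struct sched_avg *sa,
			      unsigned long long old_rq_ms,
			      unsigned long long new_rq_ms)
	{
		sa->load = decay(sa->load, old_rq_ms - sa->last_update_ms);
		sa->last_update_ms = new_rq_ms;
	}

	int main(void)
	{
		struct sched_avg sa = { 512, 968 };

		move_task(&sa, 1000, 1001);	/* rq clocks about one tick apart */
		printf("load=%lu last_update=%llu\n", sa.load, sa.last_update_ms);
		return 0;
	}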

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
[ Changelog, a few edits and !SMP build fix ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445616981-29904-2-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked from ad936d8658fd348338cb7d42c577dac77892b074)
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: I17294ab0ada3901d35895014715fd60952949358
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
2017-10-27 13:30:32 +01:00
Chris Redpath
fac311be26 cpufreq/sched: Consider max cpu capacity when choosing frequencies
When using schedfreq on cpus with a max capacity significantly smaller
than 1024, the tick update uses non-normalised capacities. This leads to
selecting an incorrect OPP, as the frequency is scaled as if the
achievable max capacity were 1024 rather than the max for that particular
cpu or group. A cpu can then get stuck at the lowest OPP, unable to
generate enough utilisation to climb out.

Instead, normalize the capacity to be in the range 0-1024 in the tick
so that when we later select a frequency, we get the correct one.

Comments have also been updated to be clearer about what is needed.
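
The normalisation amounts to something like the following sketch
(illustrative only; the capacity values are made up, and this is not the
governor code):

	#include <stdio.h>

	#define SCHED_CAPACITY_SCALE 1024UL

	/* scale a raw capacity request so the cpu's own max maps to 1024 */
	static unsigned long normalise_cap(unsigned long cap, unsigned long max_cap)
	{
		return cap * SCHED_CAPACITY_SCALE / max_cap;	/* 0..1024 */
	}

	int main(void)
	{
		/* little cpu whose max capacity is 446 out of 1024 */
		printf("%lu\n", normalise_cap(223, 446));	/* -> 512 */
		return 0;
	}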

Change-Id: Id84391c7ac015311002ada21813a353ee13bee60
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:32 +01:00
Oleg Nesterov
0f85c0954b sched/autogroup: Fix autogroup_move_group() to never skip sched_move_task()
commit 18f649ef344127ef6de23a5a4272dbe2fdb73dde upstream.

The PF_EXITING check in task_wants_autogroup() is no longer needed. Remove
it, but see the next patch.

However, the comment is correct in that autogroup_move_group() must always
change task_group() for every thread, so the sysctl_ check is very wrong;
we can race with cgroups, and even sys_setsid() is not safe, because a task
running with task_group() == ag->tg must participate in refcounting:

	int main(void)
	{
		int sctl = open("/proc/sys/kernel/sched_autogroup_enabled", O_WRONLY);

		assert(sctl > 0);
		if (fork()) {
			wait(NULL); // destroy the child's ag/tg
			pause();
		}

		assert(pwrite(sctl, "1\n", 2, 0) == 2);
		assert(setsid() > 0);
		if (fork())
			pause();

		kill(getppid(), SIGKILL);
		sleep(1);

		// The child has gone, the grandchild runs with kref == 1
		assert(pwrite(sctl, "0\n", 2, 0) == 2);
		assert(setsid() > 0);

		// runs with the freed ag/tg
		for (;;)
			sleep(1);

		return 0;
	}

crashes the kernel. It doesn't really need the sleep(1); it doesn't matter
whether autogroup_move_group() actually frees the task_group or whether this
happens later.

Reported-by: Vern Lovejoy <vlovejoy@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hartsjc@redhat.com
Cc: vbendel@redhat.com
Link: http://lkml.kernel.org/r/20161114184609.GA15965@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
 [sumits: submit to 4.4 LTS, post testing on Hikey]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-27 10:23:17 +02:00
Chris Redpath
4f8767d1ca ANDROID: sched/fair: Select correct capacity state for energy_diff
The util returned from group_max_util is not capped at the max util
present in the group, so it can be larger than the capacity stored in
the array. Ensure that when this happens, we always use the last entry
in the array to fetch energy from.
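
A minimal sketch of the intended lookup, with a made-up cap_states-style
table ordered by increasing capacity (illustrative only, not the kernel's
energy model code):

	#include <stdio.h>

	struct capacity_state { unsigned long cap; unsigned long power; };

	/* pick the first state able to serve 'util'; clamp to the last
	 * entry when util exceeds every capacity in the table */
	static int find_cap_idx(const struct capacity_state *cs, int n,
				unsigned long util)
	{
		int idx;

		for (idx = 0; idx < n; idx++)
			if (cs[idx].cap >= util)
				return idx;
		return n - 1;	/* util above the table: use the last entry */
	}

	int main(void)
	{
		static const struct capacity_state cs[] = {
			{ 446, 100 }, { 871, 300 }, { 1024, 500 },
		};

		printf("%d\n", find_cap_idx(cs, 3, 1200));	/* -> 2 */
		return 0;
	}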

Tested with synthetics on Juno board.

Bug: 38159576
Change-Id: I89fb52fb7e68fa3e682e308acc232596672d03f7
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-25 19:28:51 +00:00
Leo Yan
774481506a cpufreq: schedutil: clamp util to CPU maximum capacity
The code computes the CPU util by accumulating the different scheduling
classes, and when the total util value is larger than the CPU capacity
it clamps util to the CPU maximum capacity. This gives a correct util
value with the PELT signal, but with the WALT signal the clamping is
missed.

On the other hand, WALT doesn't accumulate the utilization of the
different classes, but since a boost margin is applied to the WALT
signal the CPU util value can still be larger than the CPU capacity;
so this patch always clamps util to the CPU maximum capacity.
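
The clamp itself is simple; a minimal sketch (illustrative only, values
made up, not the schedutil code):

	#include <stdio.h>

	/* clamp a (possibly boosted) util value to the cpu's max capacity */
	static unsigned long clamp_util(unsigned long util, unsigned long max_cap)
	{
		return util < max_cap ? util : max_cap;
	}

	int main(void)
	{
		unsigned long max_cap = 1024;
		unsigned long walt_util = 900 + 300;	/* signal plus boost margin */

		printf("%lu\n", clamp_util(walt_util, max_cap));	/* -> 1024 */
		return 0;
	}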

Change-Id: I05481ddbf20246bb9be15b6bd21b6ec039015ea8
Signed-off-by: Leo Yan <leo.yan@linaro.org>
2017-10-24 15:49:27 +00:00
Chris Redpath
d34d2c97ae cpufreq/sched: Use cpu max freq rather than policy max
When converting capacity into frequency, we used policy->max to get
the max freq of the cpu. Since this can be changed by userspace policy
or thermal events, we are potentially asking for a lower frequency
than the utilization demands.

Change over to using cpuinfo.max, which is the max freq supported by
that cpu, rather than the currently chosen max. The frequency granted
still honours the policy max.

Tested by setting a userspace policy and observing the relevant vars
in a trace. In this instance, we ask for around 1GHz instead of 620MHz.

freq_new=1013512
unfixed_freq_new=624487
capacity=546
cpuinfo_max=1900800
policy_max=1171200
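
The numbers above are consistent with a straight capacity-to-frequency
scaling against whichever max freq is used; a minimal sketch reproducing
them (illustrative only, not the governor code):

	#include <stdio.h>

	#define SCHED_CAPACITY_SCALE 1024UL

	/* scale a capacity request (0..1024) against a maximum frequency */
	static unsigned long cap_to_freq(unsigned long cap, unsigned long max_freq)
	{
		return cap * max_freq / SCHED_CAPACITY_SCALE;
	}

	int main(void)
	{
		unsigned long cap = 546;

		printf("%lu\n", cap_to_freq(cap, 1900800));	/* -> 1013512 (cpuinfo max) */
		printf("%lu\n", cap_to_freq(cap, 1171200));	/* ->  624487 (policy max)  */
		return 0;
	}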

Change-Id: I8c5694db42243c6fb78bb9be9046b06ac81295e7
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-23 12:30:46 +01:00
Greg Kroah-Hartman
89074de67a This is the 4.4.94 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlnrYxcACgkQONu9yGCS
 aT6VpQ/9GzBA59FGi6ohxZnrUR35+U5Ehuw0IZo4JTUTrlj28QozeV6dBAdgQHLH
 eGcejtzAsD39m7JjmBzkxiBBlCH9nQkq5IaUrJG6q5dYoTCYMLzHwQLgPSbrhbnS
 hCSeHdJ0fevw9tKQELtWlIiG1iOULrWATf4MtpOCHcRmpxxSMRi22yQ4vKD2Vz4y
 TdKb5c8bYjJoEqbtON4wKIiEK1JfyO80E4eZtNK7FXI+XX1WI65pum9/NBiDqB78
 K0sK1t5pSJHvDgMwtOJ7Nxzcwle1cG3xm7NhZhNCfF9OWedCy+ZCc+e48T+TeoF4
 UDHIhEvhOOOf/W3dRBQQj8VElj0zt92I+ivsWxKmheY9JzJdOvq2pTQoPAtLsBMD
 /mChCvMSNEcHTfLYrm6Bjap0e6D10n1oUHX7jgLtq04EcX9Rh2zgYvL9u9QFLjFx
 sAgTp+kmScgj0fi0XgiXJxj8mPc2MpTVmSUjcwAZD+N9Kuafkqbf3ZddZJiGyPfw
 v4ZiAdUAtACdOaIRVPRUcG2fyLfKYqg2bFsif4Z67/0RmNf3C3rJpS9yX+Q36zCo
 f66xbvysN3pRiME0obenrsxBJ0LvIkSVskxV0+5x0UfP5pOdf7jZqqpkr6IFMtLZ
 /o4DYV+Da/qeYZQnmvF0BEvEnnX8GJFIJ+9RSbz9mAWcCWtWxTU=
 =gevA
 -----END PGP SIGNATURE-----

Merge 4.4.94 into android-4.4

Changes in 4.4.94
	percpu: make this_cpu_generic_read() atomic w.r.t. interrupts
	drm/dp/mst: save vcpi with payloads
	MIPS: Fix minimum alignment requirement of IRQ stack
	sctp: potential read out of bounds in sctp_ulpevent_type_enabled()
	bpf/verifier: reject BPF_ALU64|BPF_END
	udpv6: Fix the checksum computation when HW checksum does not apply
	ip6_gre: skb_push ipv6hdr before packing the header in ip6gre_header
	net: emac: Fix napi poll list corruption
	packet: hold bind lock when rebinding to fanout hook
	bpf: one perf event close won't free bpf program attached by another perf event
	isdn/i4l: fetch the ppp_write buffer in one shot
	vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmit
	l2tp: Avoid schedule while atomic in exit_net
	l2tp: fix race condition in l2tp_tunnel_delete
	tun: bail out from tun_get_user() if the skb is empty
	packet: in packet_do_bind, test fanout with bind_lock held
	packet: only test po->has_vnet_hdr once in packet_snd
	net: Set sk_prot_creator when cloning sockets to the right proto
	tipc: use only positive error codes in messages
	Revert "bsg-lib: don't free job in bsg_prepare_job"
	locking/lockdep: Add nest_lock integrity test
	watchdog: kempld: fix gcc-4.3 build
	irqchip/crossbar: Fix incorrect type of local variables
	mac80211_hwsim: check HWSIM_ATTR_RADIO_NAME length
	mac80211: fix power saving clients handling in iwlwifi
	net/mlx4_en: fix overflow in mlx4_en_init_timestamp()
	netfilter: nf_ct_expect: Change __nf_ct_expect_check() return value.
	iio: adc: xilinx: Fix error handling
	Btrfs: send, fix failure to rename top level inode due to name collision
	f2fs: do not wait for writeback in write_begin
	md/linear: shutup lockdep warnning
	sparc64: Migrate hvcons irq to panicked cpu
	net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new probed PFs
	crypto: xts - Add ECB dependency
	ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
	slub: do not merge cache if slub_debug contains a never-merge flag
	scsi: scsi_dh_emc: return success in clariion_std_inquiry()
	net: mvpp2: release reference to txq_cpu[] entry after unmapping
	i2c: at91: ensure state is restored after suspending
	ceph: clean up unsafe d_parent accesses in build_dentry_path
	uapi: fix linux/rds.h userspace compilation errors
	uapi: fix linux/mroute6.h userspace compilation errors
	target/iscsi: Fix unsolicited data seq_end_offset calculation
	nfsd/callback: Cleanup callback cred on shutdown
	cpufreq: CPPC: add ACPI_PROCESSOR dependency
	Revert "tty: goldfish: Fix a parameter of a call to free_irq"
	Linux 4.4.94

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-10-22 08:09:11 +02:00
Peter Zijlstra
28eab3db72 locking/lockdep: Add nest_lock integrity test
[ Upstream commit 7fb4a2cea6b18dab56d609530d077f168169ed6b ]

Boqun reported that hlock->references can overflow. Add a debug test
for that to generate a clear error when this happens.

Without this, lockdep is likely to report a mysterious failure on
unlock.
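
A simplified illustration of such a check on a narrow bit-field counter
(the 12-bit width here is only an assumption for the sketch, not a claim
about the real struct held_lock layout):

	#include <assert.h>
	#include <stdio.h>

	struct held_lock { unsigned int references:12; };

	static int get_lock_again(struct held_lock *hlock)
	{
		if (hlock->references == (1u << 12) - 1) {
			fprintf(stderr, "references would overflow\n");
			return 0;	/* clear error instead of a silent wrap */
		}
		hlock->references++;
		return 1;
	}

	int main(void)
	{
		struct held_lock hl = { .references = (1u << 12) - 1 };

		assert(get_lock_again(&hl) == 0);	/* refused, no wrap to 0 */
		return 0;
	}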

Reported-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21 17:09:03 +02:00
Yonghong Song
1a4f1ecdb2 bpf: one perf event close won't free bpf program attached by another perf event
[ Upstream commit ec9dd352d591f0c90402ec67a317c1ed4fb2e638 ]

This patch fixes a bug exhibited by the following scenario:
  1. fd1 = perf_event_open with attr.config = ID1
  2. attach bpf program prog1 to fd1
  3. fd2 = perf_event_open with attr.config = ID1
     <this will be successful>
  4. user program closes fd2 and prog1 is detached from the tracepoint.
  5. user program with fd1 does not work properly as the tracepoint
     no longer produces any output.

The issue happens at step 4. Multiple perf_event_open calls can succeed,
but there is only one bpf prog pointer in the tp_event. In the current
logic, any fd release for the same tp_event will free tp_event->prog.

The fix is to free tp_event->prog only when the closing fd
corresponds to the one which registered the program.
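
A minimal userspace model of that ownership check (illustrative only;
the struct names and fields below are stand-ins, not the kernel's):

	#include <stdio.h>

	struct bpf_prog { int id; };
	struct tp_event { struct bpf_prog *prog; };	/* shared by all events on a tracepoint */
	struct perf_event {
		struct tp_event *tp;
		struct bpf_prog *prog;	/* set only on the fd that attached it */
	};

	static void event_close(struct perf_event *ev)
	{
		/* only the event that attached the program may detach it */
		if (ev->prog && ev->tp->prog == ev->prog)
			ev->tp->prog = NULL;
	}

	int main(void)
	{
		struct bpf_prog prog1 = { 1 };
		struct tp_event tp = { &prog1 };
		struct perf_event fd1 = { &tp, &prog1 };	/* attached prog1 */
		struct perf_event fd2 = { &tp, NULL };		/* same tracepoint, no prog */

		event_close(&fd2);
		printf("prog attached: %s\n", tp.prog ? "yes" : "no");	/* yes */
		event_close(&fd1);
		printf("prog attached: %s\n", tp.prog ? "yes" : "no");	/* no */
		return 0;
	}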

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21 17:09:02 +02:00
Edward Cree
2ec54b21dd bpf/verifier: reject BPF_ALU64|BPF_END
[ Upstream commit e67b8a685c7c984e834e3181ef4619cd7025a136 ]

Neither ___bpf_prog_run nor the JITs accept it.
Also adds a new test case.
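
A minimal sketch of such an opcode check, with the relevant uapi encoding
values written out inline (illustrative only, not the verifier code):

	#include <stdio.h>

	#define BPF_CLASS(code)	((code) & 0x07)
	#define BPF_OP(code)	((code) & 0xf0)
	#define BPF_ALU64	0x07
	#define BPF_END		0xd0

	/* reject byte-swap (BPF_END) encoded in the 64-bit ALU class */
	static int insn_is_valid(unsigned char code)
	{
		if (BPF_CLASS(code) == BPF_ALU64 && BPF_OP(code) == BPF_END)
			return 0;
		return 1;
	}

	int main(void)
	{
		printf("%d\n", insn_is_valid(BPF_ALU64 | BPF_END));	/* -> 0 */
		return 0;
	}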

Fixes: 17a5267067 ("bpf: verifier (add verifier core)")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21 17:09:02 +02:00
Dietmar Eggemann
724091f67f sched/fair: remove erroneous RCU_LOCKDEP_WARN from start_cpu()
Fixes: https://bugs.linaro.org/show_bug.cgi?id=3075

Change-Id: I62d714fc4b9366a9b2535649aa92d1edc840cf94
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-20 15:33:41 +00:00
Blagovest Kolenichev
f9719b203c Merge android-4.4@d6fbbe5 (v4.4.93) into msm-4.4
* refs/heads/tmp-d6fbbe5
  Linux 4.4.93
  x86/alternatives: Fix alt_max_short macro to really be a max()
  USB: serial: console: fix use-after-free after failed setup
  USB: serial: qcserial: add Dell DW5818, DW5819
  USB: serial: option: add support for TP-Link LTE module
  USB: serial: cp210x: add support for ELV TFD500
  USB: serial: ftdi_sio: add id for Cypress WICED dev board
  fix unbalanced page refcounting in bio_map_user_iov
  direct-io: Prevent NULL pointer access in submit_page_section
  usb: gadget: composite: Fix use-after-free in usb_composite_overwrite_options
  ALSA: line6: Fix leftover URB at error-path during probe
  ALSA: caiaq: Fix stray URB at probe error path
  ALSA: seq: Fix copy_from_user() call inside lock
  ALSA: seq: Fix use-after-free at creating a port
  ALSA: usb-audio: Kill stray URB at exiting
  iommu/amd: Finish TLB flush in amd_iommu_unmap()
  usb: renesas_usbhs: Fix DMAC sequence for receiving zero-length packet
  KVM: nVMX: fix guest CR4 loading when emulating L2 to L1 exit
  crypto: shash - Fix zero-length shash ahash digest crash
  HID: usbhid: fix out-of-bounds bug
  dmaengine: edma: Align the memcpy acnt array size with the transfer
  MIPS: math-emu: Remove pr_err() calls from fpu_emu()
  USB: dummy-hcd: Fix deadlock caused by disconnect detection
  rcu: Allow for page faults in NMI handlers
  iwlwifi: mvm: use IWL_HCMD_NOCOPY for MCAST_FILTER_CMD
  nl80211: Define policy for packet pattern attributes
  CIFS: Reconnect expired SMB sessions
  ext4: in ext4_seek_{hole,data}, return -ENXIO for negative offsets
  brcmfmac: add length check in brcmf_cfg80211_escan_handler()
  ANDROID: HACK: arm64: use -mno-implicit-float instead of -mgeneral-regs-only
  sched: Update task->on_rq when tasks are moving between runqueues
  FROMLIST: f2fs: expose some sectors to user in inline data or dentry case
  crypto: Work around deallocated stack frame reference gcc bug on sparc.
  UPSTREAM: f2fs: fix potential panic during fstrim
  ANDROID: fscrypt: remove unnecessary fscrypto.h
  ANDROID: binder: fix node sched policy calculation
  ANDROID: Kbuild, LLVMLinux: allow overriding clang target triple
  CHROMIUM: arm64: Disable asm-operand-width warning for clang
  CHROMIUM: kbuild: clang: Disable the 'duplicate-decl-specifier' warning
  UPSTREAM: x86/build: Use cc-option to validate stack alignment parameter
  UPSTREAM: x86/build: Fix stack alignment for CLang
  UPSTREAM: efi/libstub/arm64: Set -fpie when building the EFI stub
  BACKPORT: efi/libstub/arm64: Force 'hidden' visibility for section markers
  UPSTREAM: compiler, clang: always inline when CONFIG_OPTIMIZE_INLINING is disabled
  UPSTREAM: x86/boot: #undef memcpy() et al in string.c
  UPSTREAM: crypto: arm64/sha - avoid non-standard inline asm tricks
  UPSTREAM: kbuild: clang: Disable 'address-of-packed-member' warning
  UPSTREAM: x86/build: Specify stack alignment for clang
  UPSTREAM: x86/build: Use __cc-option for boot code compiler options
  BACKPORT: kbuild: Add __cc-option macro
  UPSTREAM: x86/hweight: Don't clobber %rdi
  BACKPORT: x86/hweight: Get rid of the special calling convention
  BACKPORT: x86/mm/kaslr: Use the _ASM_MUL macro for multiplication to work around Clang incompatibility
  UPSTREAM: crypto, x86: aesni - fix token pasting for clang
  UPSTREAM: x86/kbuild: Use cc-option to enable -falign-{jumps/loops}
  UPSTREAM: compiler, clang: properly override 'inline' for clang
  UPSTREAM: compiler, clang: suppress warning for unused static inline functions
  UPSTREAM: Kbuild: provide a __UNIQUE_ID for clang
  UPSTREAM: modules: mark __inittest/__exittest as __maybe_unused
  BACKPORT: kbuild: Add support to generate LLVM assembly files
  UPSTREAM: kbuild: use -Oz instead of -Os when using clang
  BACKPORT: kbuild, LLVMLinux: Add -Werror to cc-option to support clang
  UPSTREAM: kbuild: drop -Wno-unknown-warning-option from clang options
  UPSTREAM: kbuild: fix asm-offset generation to work with clang
  UPSTREAM: kbuild: consolidate redundant sed script ASM offset generation
  UPSTREAM: kbuild: Consolidate header generation from ASM offset information
  UPSTREAM: kbuild: clang: add -no-integrated-as to KBUILD_[AC]FLAGS
  UPSTREAM: kbuild: Add better clang cross build support

Conflicts:
	arch/x86/lib/Makefile
	net/wireless/nl80211.c

Change-Id: I76032e8d1206903bc948b9ed918e7ddee7e746c7
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2017-10-20 06:07:34 -07:00