Commit graph

22582 commits

Morten Rasmussen
1e960320c9 sched: Prevent unnecessary active balance of single task in sched group
Scenarios where the busiest group has just one task and the local group
is idle, on topologies with sched groups containing different numbers of
cpus, manage to dodge all load-balance bailout conditions, resulting in
the nr_balance_failed counter being incremented. This eventually causes a
pointless active migration of the task. This patch prevents this by not
incrementing the counter when the busiest group only has one task.
ASYM_PACKING migrations and migrations due to reduced capacity should
still take place as these are explicitly captured by
need_active_balance().

A better solution would be to not attempt the load-balance in the first
place, but that requires significant changes to the order of bailout
conditions and statistics gathering.
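
A minimal sketch of the intent in load_balance() (kernel/sched/fair.c);
busiest_nr_running() is an illustrative helper standing in for the group
statistics the patch actually consults:

  if (!ld_moved) {
          /*
           * Don't count a failure the busiest group cannot fix anyway:
           * with a single running task, only an active migration could
           * help, and that is rarely what we want here.
           */
          if (busiest_nr_running(&env) > 1)
                  sd->nr_balance_failed++;
          ...
  }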

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
2016-09-14 14:48:50 +05:30
Dietmar Eggemann
11d962803d sched: Enable idle balance to pull single task towards cpu with higher capacity
We do not want to miss out on the ability to pull a single remaining
task from a potential source cpu towards an idle destination cpu. Add an
extra criterion to need_active_balance() to kick off active load balance
if the source cpu is over-utilized and has lower capacity than the
destination cpu.
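
A rough sketch of the added check in need_active_balance()
(kernel/sched/fair.c); cpu_overutilized() is used here as an illustrative
helper and may not match the patch's exact naming:

  static int need_active_balance(struct lb_env *env)
  {
          ...
          /*
           * Sketch of the added criterion: let an idle, higher-capacity
           * destination cpu actively pull the last task off an
           * over-utilized, lower-capacity source cpu.
           */
          if ((env->idle != CPU_NOT_IDLE) &&
              (env->src_rq->cfs.h_nr_running == 1) &&
              cpu_overutilized(env->src_cpu) &&
              (capacity_of(env->src_cpu) < capacity_of(env->dst_cpu)))
                  return 1;
          ...
  }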

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
2016-09-14 14:48:50 +05:30
Morten Rasmussen
d72801bf86 sched: Consider spare cpu capacity at task wake-up
find_idlest_group() selects the wake-up target group purely
based on group load, which leads to suboptimal choices in low-load
scenarios. An idle group with reduced capacity (due to RT tasks or
different cpu type) isn't necessarily a better target than a lightly
loaded group with higher capacity.

The patch adds spare capacity as an additional group selection
parameter. The target group is now selected based on the following
criteria:

1. Return the group containing the cpu with the most spare capacity,
provided such a group exists and its spare capacity is significant.
"Significant" currently means having at least 20% of capacity to spare.

2. Return the group with the lowest load, unless it is the local group,
in which case NULL is returned and the search continues at the next
(lower) level.
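
A rough sketch of the resulting selection order at the end of
find_idlest_group(); the spare-capacity variables are illustrative names
rather than the patch's:

  /* 1) Prefer a group whose cpu has significant spare capacity ... */
  if (most_spare_group && most_spare_is_significant)  /* >= ~20% spare */
          return most_spare_group;

  /*
   * 2) ... otherwise pick the least loaded group, unless the local
   *    group is at least as good: return NULL and continue the search
   *    at the next (lower) level.
   */
  if (!idlest || this_load < min_load)
          return NULL;
  return idlest;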

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
2016-09-14 14:48:50 +05:30
Morten Rasmussen
91b2b63314 sched: Add cpu capacity awareness to wakeup balancing
Wakeup balancing is completely unaware of cpu capacity, cpu utilization
and task utilization. The task is preferably placed on a cpu which is
idle at the instant the wakeup happens. New tasks
(SD_BALANCE_{FORK,EXEC}) are placed on an idle cpu in the idlest group
if such a cpu can be found, otherwise they are placed on the least
loaded one. Existing tasks (SD_BALANCE_WAKE) are placed on the previous
cpu or an idle cpu sharing the same last level cache, unless the
wakee_flips heuristic in wake_wide() decides to fall back to considering
cpus outside SD_LLC. Hence existing tasks are not guaranteed to get a
chance to migrate to a different group at wakeup in case the current one
has reduced cpu capacity (due to RT/IRQ pressure or a different uarch,
e.g. ARM big.LITTLE). They may eventually get pulled by other cpus doing
periodic/idle/nohz_idle balance, but it may take quite a while before
that happens.

This patch adds capacity awareness to find_idlest_{group,queue} (used by
SD_BALANCE_{FORK,EXEC} and SD_BALANCE_WAKE under certain circumstances)
such that groups/cpus that can accommodate the waking task based on task
utilization are preferred. In addition, wakeup of existing tasks
(SD_BALANCE_WAKE) is also sent through find_idlest_{group,queue} if the
task doesn't fit the capacity of the previous cpu, allowing it to escape
(override wake_affine) when necessary instead of relying on
periodic/idle/nohz_idle balance to eventually sort it out.
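
A sketch of the escape hatch for SD_BALANCE_WAKE, loosely following the
description; task_util(), capacity_orig_of() and the capacity_margin
scaling are assumptions of this sketch rather than names guaranteed by
the patch:

  /* Return 1 when the affine wakeup path should be bypassed. */
  static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
  {
          long min_cap, max_cap;

          min_cap = min(capacity_orig_of(prev_cpu), capacity_orig_of(cpu));
          max_cap = cpu_rq(cpu)->rd->max_cpu_capacity;

          /* Symmetric capacities: the usual affine wakeup is fine. */
          if (max_cap - min_cap < max_cap >> 3)
                  return 0;

          /* Task doesn't fit the previous cpu: go via find_idlest_*. */
          return min_cap * 1024 < task_util(p) * capacity_margin;
  }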

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
2016-09-14 14:48:50 +05:30
Dietmar Eggemann
50df3f37c6 sched: Store system-wide maximum cpu capacity in root domain
To be able to compare the capacity of the target cpu with the highest
cpu capacity of the system in the wakeup path, store the system-wide
maximum cpu capacity in the root domain.
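
Sketched roughly, with the update placed wherever cpu capacities are
(re)computed; the exact hook point in the patch may differ:

  struct root_domain {
          ...
          /* System-wide maximum cpu capacity (sketch). */
          unsigned long max_cpu_capacity;
  };

  /* Wherever per-cpu capacity is updated: */
  if (capacity_orig_of(cpu) > rd->max_cpu_capacity)
          rd->max_cpu_capacity = capacity_orig_of(cpu);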

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
2016-09-14 14:48:50 +05:30
Yuyang Du
8e21145595 sched/fair: Fix new task's load avg removed from source CPU in wake_up_new_task()
If a newly created task is selected to go to a different CPU in fork
balance when it wakes up for the first time, its load averages should
not be removed from the source CPU since they were never added to
it in the first place. The same applies to a never-used group entity.

Fix it in remove_entity_load_avg(): when the entity's last_update_time
is 0, simply return. This should precisely identify the case in
question, because in other migrations, the last_update_time is set
to 0 after remove_entity_load_avg().
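
A minimal sketch of the fix as described (kernel/sched/fair.c):

  static void remove_entity_load_avg(struct sched_entity *se)
  {
          struct cfs_rq *cfs_rq = cfs_rq_of(se);

          /*
           * Newly forked tasks (and never-used group entities) have a
           * last_update_time of 0: their load was never attached to
           * this cfs_rq, so there is nothing to remove.
           */
          if (!se->avg.last_update_time)
                  return;
          ...
  }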

Reported-by: Steve Muckle <steve.muckle@linaro.org>
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
[peterz: cfs_rq_last_update_time]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <Juri.Lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: http://lkml.kernel.org/r/20151216233427.GJ28098@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-14 14:45:40 +05:30
Riley Andrews
9400d22ae1 cpuset: Make cpusets restore on hotplug
This deliberately changes the behavior of the per-cpuset
cpus file to not be affected by hotplug. When a cpu is offlined,
it will be removed from the cpuset/cpus file. When a cpu is onlined,
if the cpuset originally requested that cpu as part of the cpuset,
that cpu will be restored to the cpuset. The cpus files still
have to be hierarchical, but the ranges no longer have to be a subset of
the currently online cpus, just of the physically present cpus.

Change-Id: I22cdf33e7d312117bcefba1aeb0125e1ada289a9
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
2016-09-14 14:44:29 +05:30
Guenter Roeck
8b94247342 ANDROID: rcu_sync: Export rcu_sync_lockdep_assert
x86_64:allmodconfig fails to build with the following error.

ERROR: "rcu_sync_lockdep_assert" [kernel/locking/locktorture.ko] undefined!

Introduced by commit 3228c5eb7a ("RFC: FROMLIST: locking/percpu-rwsem:
Optimize readers and reduce global impact"). The applied upstream version
exports the missing symbol, so let's do the same.
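
The fix is essentially a one-line export next to the definition in
kernel/rcu/sync.c (whether the GPL-only export macro is the right one
follows the upstream patch and is assumed here):

  EXPORT_SYMBOL_GPL(rcu_sync_lockdep_assert);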

Change-Id: If4e516715c3415fe8c82090f287174857561550d
Fixes: 3228c5eb7a ("RFC: FROMLIST: locking/percpu-rwsem: Optimize ...")
Signed-off-by: Guenter Roeck <groeck@chromium.org>
2016-09-14 14:26:20 +05:30
Balbir Singh
48bb58c012 RFC: FROMLIST: cgroup: reduce read locked section of cgroup_threadgroup_rwsem during fork
cgroup_threadgroup_rwsem is acquired in read mode during process exit
and fork.  It is also grabbed in write mode during
__cgroups_proc_write().  I've recently run into a scenario with lots
of memory pressure and OOM and I am beginning to see

systemd

 __switch_to+0x1f8/0x350
 __schedule+0x30c/0x990
 schedule+0x48/0xc0
 percpu_down_write+0x114/0x170
 __cgroup_procs_write.isra.12+0xb8/0x3c0
 cgroup_file_write+0x74/0x1a0
 kernfs_fop_write+0x188/0x200
 __vfs_write+0x6c/0xe0
 vfs_write+0xc0/0x230
 SyS_write+0x6c/0x110
 system_call+0x38/0xb4

This thread is waiting on the reader of cgroup_threadgroup_rwsem to
exit.  The reader itself is under memory pressure and has gone into
reclaim after fork. At times the reader also ends up waiting on
oom_lock.

 __switch_to+0x1f8/0x350
 __schedule+0x30c/0x990
 schedule+0x48/0xc0
 jbd2_log_wait_commit+0xd4/0x180
 ext4_evict_inode+0x88/0x5c0
 evict+0xf8/0x2a0
 dispose_list+0x50/0x80
 prune_icache_sb+0x6c/0x90
 super_cache_scan+0x190/0x210
 shrink_slab.part.15+0x22c/0x4c0
 shrink_zone+0x288/0x3c0
 do_try_to_free_pages+0x1dc/0x590
 try_to_free_pages+0xdc/0x260
 __alloc_pages_nodemask+0x72c/0xc90
 alloc_pages_current+0xb4/0x1a0
 page_table_alloc+0xc0/0x170
 __pte_alloc+0x58/0x1f0
 copy_page_range+0x4ec/0x950
 copy_process.isra.5+0x15a0/0x1870
 _do_fork+0xa8/0x4b0
 ppc_clone+0x8/0xc

In the meantime, all processes exiting/forking are blocked, almost
stalling the system.

This patch moves threadgroup_change_begin() from before
cgroup_fork() to just before cgroup_can_fork().  There is no need to
worry about threadgroup changes until the task is actually added to the
threadgroup.  This avoids having to call reclaim with
cgroup_threadgroup_rwsem held.
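
Sketch of the reordering in copy_process() (kernel/fork.c), simplified:

  /* Before: the rwsem was read-locked across the whole copy. */
  threadgroup_change_begin(current);
  cgroup_fork(p);
  ...
  /* copy_mm()/page-table allocation may enter reclaim here */
  ...
  retval = cgroup_can_fork(p, ...);

  /* After (sketch): take the rwsem only when the task is about to be
   * added to the threadgroup. */
  cgroup_fork(p);
  ...
  /* allocations no longer run with cgroup_threadgroup_rwsem held */
  ...
  threadgroup_change_begin(current);
  retval = cgroup_can_fork(p, ...);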

tj: Subject and description edits.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Zefan Li <lizefan@huawei.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Tejun Heo <tj@kernel.org>
[jstultz: Cherry-picked from:
 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git 568ac888215c7f]
Change-Id: Ie8ece84fb613cf6a7b08cea1468473a8df2b9661
Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-09-14 14:26:20 +05:30
Peter Zijlstra
a81c69e149 RFC: FROMLIST: cgroup: avoid synchronize_sched() in __cgroup_procs_write()
The current percpu-rwsem read side is entirely free of serializing insns
at the cost of having a synchronize_sched() in the write path.

The latency of the synchronize_sched() is too high for cgroups. The
commit 1ed1328792 talks about the write path being a fairly cold path
but this is not the case for Android, which moves tasks to the foreground
cgroup and back around binder IPC calls from foreground processes to
background processes, so it is significantly hotter than human-initiated
operations.

Switch cgroup_threadgroup_rwsem into the slow mode for now to avoid the
problem; hopefully it should not be that slow after another commit
80127a39681b ("locking/percpu-rwsem: Optimize readers and reduce global
impact").

We could just add rcu_sync_enter() into cgroup_init() but we do not want
another synchronize_sched() at boot time, so this patch adds the new helper
which doesn't block but currently can only be called before the first use.
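
A sketch of what that amounts to, assuming the helper is named
rcu_sync_enter_start() as in the later upstream version and that the
percpu-rwsem embeds its rcu_sync state in a member called rss:

  /* In cgroup_init(), before cgroup_threadgroup_rwsem is first used:
   * flip it into the readers-take-the-slow-path mode once, without
   * paying a synchronize_sched(). */
  rcu_sync_enter_start(&cgroup_threadgroup_rwsem.rss);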

Cc: Tejun Heo <tj@kernel.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Reported-by: John Stultz <john.stultz@linaro.org>
Reported-by: Dmitry Shmidt <dimitrysh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[jstultz: backported to 4.4]
Change-Id: I34aa9c394d3052779b56976693e96d861bd255f2
Mailing-list-URL: https://lkml.org/lkml/2016/8/11/557
Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-09-14 14:26:20 +05:30
Peter Zijlstra
d4d74af4b8 RFC: FROMLIST: locking/percpu-rwsem: Optimize readers and reduce global impact
Currently the percpu-rwsem switches to (global) atomic ops while a
writer is waiting, which could be quite a while, and this slows down
releasing the readers.

This patch cures this problem by ordering the reader-state vs
reader-count (see the comments in __percpu_down_read() and
percpu_down_write()). This changes a global atomic op into a full
memory barrier, which doesn't have the global cacheline contention.

This also enables using the percpu-rwsem with rcu_sync disabled in order
to bias the implementation differently, reducing the writer latency by
adding some cost to readers.
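
Very roughly, the read-side fast path becomes something like the
following; the field and helper names here (read_count, readers_block,
the slow path) are illustrative:

  static inline void percpu_down_read_sketch(struct percpu_rw_semaphore *sem)
  {
          preempt_disable();
          __this_cpu_inc(*sem->read_count);   /* publish reader-count  */
          smp_mb();                           /* order vs reader-state */
          if (unlikely(READ_ONCE(sem->readers_block)))
                  percpu_down_read_slowpath(sem);   /* writer pending  */
          preempt_enable();
  }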

Mailing-list-URL: https://lkml.org/lkml/2016/8/9/181
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[jstultz: Backported to 4.4]
Change-Id: I8ea04b4dca2ec36f1c2469eccafde1423490572f
Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-09-14 14:26:20 +05:30
Jeff Vander Stoep
1a565f59cb FROMLIST: security,perf: Allow further restriction of perf_event_open
When kernel.perf_event_open is set to 3 (or greater), disallow all
access to performance events by users without CAP_SYS_ADMIN.
Add a Kconfig symbol CONFIG_SECURITY_PERF_EVENTS_RESTRICT that
makes this value the default.

This is based on a similar feature in grsecurity
(CONFIG_GRKERNSEC_PERF_HARDEN).  This version doesn't include making
the variable read-only.  It also allows enabling further restriction
at run-time regardless of whether the default is changed.
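
The gist, as a sketch of the check at the top of the perf_event_open()
syscall (the helper name perf_paranoid_any() is assumed here):

  static inline bool perf_paranoid_any(void)
  {
          return sysctl_perf_event_paranoid > 2;
  }

  /* kernel.perf_event_open >= 3: only CAP_SYS_ADMIN may use perf. */
  if (perf_paranoid_any() && !capable(CAP_SYS_ADMIN))
          return -EACCES;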

https://lkml.org/lkml/2016/1/11/587

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

Git-repo: https://android.googlesource.com/kernel/common.git
Git-commit: 012b0adcf7299f6509d4984cf46ee11e6eaed4e4
[d-cagle@codeaurora.org: Resolve trivial merge conflicts]
Signed-off-by: Dennis Cagle <d-cagle@codeaurora.org>
Bug: 29054680
Change-Id: Iff5bff4fc1042e85866df9faa01bce8d04335ab8
2016-09-13 12:23:33 -07:00
Joonwoo Park
04f14aa884 cpuset: handle race between CPU hotplug and cpuset_hotplug_work
A discrepancy between cpu_online_mask and cpuset's effective_cpus
mask is inevitable during hotplug since cpuset defers updating of
effective_cpus mask using a workqueue, during which time nothing
prevents the system from more hotplug operations.  For that reason
guarantee_online_cpus() walks up the cpuset hierarchy until it finds
an intersection under the assumption that top cpuset's effective_cpus
mask intersects with cpu_online_mask even with such a race occurring.

However a sequence of CPU hotplugs can open a time window, during which
none of the effective CPUs in the top cpuset intersect with
cpu_online_mask.

For example when there are 4 possible CPUs 0-3 and only CPU0 is online:

  ========================  ===========================
   cpu_online_mask           top_cpuset.effective_cpus
  ========================  ===========================
   echo 1 > cpu2/online.
   CPU hotplug notifier woke up hotplug work but not yet scheduled.
      [0,2]                     [0]

   echo 0 > cpu0/online.
   The workqueue is still runnable.
      [2]                       [0]
  ========================  ===========================

  Now there is no intersection between cpu_online_mask and
  top_cpuset.effective_cpus.  Thus invoking sys_sched_setaffinity() at
  this moment can cause following:

   Unable to handle kernel NULL pointer dereference at virtual address 000000d0
   ------------[ cut here ]------------
   Kernel BUG at ffffffc0001389b0 [verbose debug info unavailable]
   Internal error: Oops - BUG: 96000005 [#1] PREEMPT SMP
   Modules linked in:
   CPU: 2 PID: 1420 Comm: taskset Tainted: G        W       4.4.8+ #98
   task: ffffffc06a5c4880 ti: ffffffc06e124000 task.ti: ffffffc06e124000
   PC is at guarantee_online_cpus+0x2c/0x58
   LR is at cpuset_cpus_allowed+0x4c/0x6c
   <snip>
   Process taskset (pid: 1420, stack limit = 0xffffffc06e124020)
   Call trace:
   [<ffffffc0001389b0>] guarantee_online_cpus+0x2c/0x58
   [<ffffffc00013b208>] cpuset_cpus_allowed+0x4c/0x6c
   [<ffffffc0000d61f0>] sched_setaffinity+0xc0/0x1ac
   [<ffffffc0000d6374>] SyS_sched_setaffinity+0x98/0xac
   [<ffffffc000085cb0>] el0_svc_naked+0x24/0x28

The top cpuset's effective_cpus are guaranteed to be identical to
cpu_online_mask eventually.  Hence fall back to cpu_online_mask when
there is no intersection between top cpuset's effective_cpus and
cpu_online_mask.
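
A sketch of the resulting fallback in guarantee_online_cpus()
(kernel/cpuset.c), following the description:

  static void guarantee_online_cpus(struct cpuset *cs, struct cpumask *pmask)
  {
          while (!cpumask_intersects(cs->effective_cpus, cpu_online_mask)) {
                  cs = parent_cs(cs);
                  if (unlikely(!cs)) {
                          /*
                           * Even the top cpuset has no online effective
                           * cpus during the hotplug window described
                           * above: fall back to cpu_online_mask itself.
                           */
                          cpumask_copy(pmask, cpu_online_mask);
                          return;
                  }
          }
          cpumask_and(pmask, cs->effective_cpus, cpu_online_mask);
  }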

CRs-fixed: 1058529
Change-Id: I83ee4619feff2ca7452119c9baecb6ffde755287
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: cgroups@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.17+
Signed-off-by: Tejun Heo <tj@kernel.org>
2016-09-13 10:32:11 -07:00
Runmin Wang
c568eb7aca Merge branch 'tmp-bab1564' into msm-4.4
* tmp-bab1564:
  ANDROID: mmc: Add CONFIG_MMC_SIMULATE_MAX_SPEED
  android: base-cfg: Add CONFIG_INET_DIAG_DESTROY
  cpufreq: interactive: only apply interactive boost when enabled
  cpufreq: interactive: fix policy locking
  ANDROID: dm verity fec: add sysfs attribute fec/corrected
  ANDROID: android: base-cfg: enable CONFIG_DM_VERITY_FEC
  UPSTREAM: dm verity: add ignore_zero_blocks feature
  UPSTREAM: dm verity: add support for forward error correction
  UPSTREAM: dm verity: factor out verity_for_bv_block()
  UPSTREAM: dm verity: factor out structures and functions useful to separate object
  UPSTREAM: dm verity: move dm-verity.c to dm-verity-target.c
  UPSTREAM: dm verity: separate function for parsing opt args
  UPSTREAM: dm verity: clean up duplicate hashing code
  UPSTREAM: dm: don't save and restore bi_private
  mm: Export do_munmap
  sdcardfs: remove unneeded __init and __exit
  sdcardfs: Remove unused code
  fs: Export d_absolute_path
  sdcardfs: remove effectless config option
  inotify: Fix erroneous update of bit count
  fs: sdcardfs: Declare LOOKUP_CASE_INSENSITIVE unconditionally
  trace: cpufreq: fix typo in min/max cpufreq
  sdcardfs: Add support for d_canonical_path
  vfs: add d_canonical_path for stacked filesystem support
  sdcardfs: Bring up to date with Android M permissions:
  Changed type-casting in packagelist management
  Port of sdcardfs to 4.4
  Included sdcardfs source code for kernel 3.0
  ANDROID: usb: gadget: Add support for MTP OS desc
  CHROMIUM: usb: gadget: f_accessory: add .raw_request callback
  CHROMIUM: usb: gadget: audio_source: add .free_func callback
  CHROMIUM: usb: gadget: f_mtp: fix usb_ss_ep_comp_descriptor
  CHROMIUM: usb: gadget: f_mtp: Add SuperSpeed support
  FROMLIST: mmc: block: fix ABI regression of mmc_blk_ioctl
  FROMLIST: mm: ASLR: use get_random_long()
  FROMLIST: drivers: char: random: add get_random_long()
  FROMLIST: pstore-ram: fix NULL reference when used with pdata
  usb: u_ether: Add missing rx_work init
  ANDROID: dm-crypt: run in a WQ_HIGHPRI workqueue
  misc: uid_stat: Include linux/atomic.h instead of asm/atomic.h
  hid-sensor-hub.c: fix wrong do_div() usage
  power: Provide dummy log_suspend_abort_reason() if SUSPEND is disabled
  PM / suspend: Add dependency on RTC_LIB
  drivers: power: use 'current' instead of 'get_current()'
  video: adf: Set ADF_MEMBLOCK to boolean
  video: adf: Fix modular build
  net: ppp: Fix modular build for PPPOLAC and PPPOPNS
  net: pppolac/pppopns: Replace msg.msg_iov with iov_iter_kvec()
  ANDROID: mmc: sdio: Disable retuning in sdio_reset_comm()
  ANDROID: mmc: Move tracepoint creation and export symbols
  ANDROID: kernel/watchdog: fix unused variable warning
  ANDROID: usb: gadget: f_mtp: don't use le16 for u8 field
  ANDROID: lowmemorykiller: fix declaration order warnings
  ANDROID: net: fix 'const' warnings
  net: diag: support v4mapped sockets in inet_diag_find_one_icsk()
  net: tcp: deal with listen sockets properly in tcp_abort.
  tcp: diag: add support for request sockets to tcp_abort()
  net: diag: Support destroying TCP sockets.
  net: diag: Support SOCK_DESTROY for inet sockets.
  net: diag: Add the ability to destroy a socket.
  net: diag: split inet_diag_dump_one_icsk into two
  Revert "mmc: Extend wakelock if bus is dead"
  Revert "mmc: core: Hold a wake lock accross delayed work + mmc rescan"
  ANDROID: mmc: move to a SCHED_FIFO thread

Conflicts:
	drivers/cpufreq/cpufreq_interactive.c
	drivers/misc/uid_stat.c
	drivers/mmc/card/block.c
	drivers/mmc/card/queue.c
	drivers/mmc/card/queue.h
	drivers/mmc/core/core.c
	drivers/mmc/core/sdio.c
	drivers/staging/android/lowmemorykiller.c
	drivers/usb/gadget/function/f_mtp.c
	kernel/watchdog.c

Signed-off-by: Runmin Wang <runminw@codeaurora.org>
Change-Id: Ibb4db11c57395f67dee86211a110c462e6181552
2016-09-12 18:25:49 -07:00
Syed Rameez Mustafa
1389927146 sched: Move data structures under CONFIG_SCHED_HMP
Frequency-demand conversion data structures are only used under
CONFIG_SCHED_HMP. Move them out of sched.h into hmp.c, where they
actually belong after the recent refactor.

Change-Id: I3c3eebca86062f11b80af93ba3716695eb787376
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-09-09 15:35:38 -07:00
Tejun Heo
beb35bfaa5 cgroup: make sure a parent css isn't freed before its children
commit 8bb5ef79bc0f4016ecf79e8dce6096a3c63603e4 upstream.

There are three subsystem callbacks in css shutdown path -
css_offline(), css_released() and css_free().  Except for
css_released(), cgroup core didn't guarantee the order of invocation.
css_offline() or css_free() could be called on a parent css before its
children.  This behavior is unexpected and led to bugs in cpu and
memory controller.

The previous patch updated ordering for css_offline() which fixes the
cpu controller issue.  While there currently isn't a known bug caused
by misordering of css_free() invocations, let's fix it too for
consistency.

css_free() ordering can be trivially fixed by moving putting of the
parent css below css_free() invocation.

Change-Id: I97febdd414ef5cd57490ce2746650dde7fdda28f
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Git-commit: 8bb5ef79bc0f4016ecf79e8dce6096a3c63603e4
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
2016-09-09 15:01:53 -07:00
Syed Rameez Mustafa
591ce8ed84 sched: Further re-factor HMP specific code
The structures being moved around are only used for trace events
defined under CONFIG_SCHED_HMP. Move the code to hmp.c to reflect
this.

Change-Id: Ib959355264405ab779b24948f111a2ca61d367de
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-09-08 15:44:52 -07:00
Wanpeng Li
15abaa07a2 sched/nohz: Fix affine unpinned timers mess
commit 444969223c81c7d0a95136b7b4cfdcfbc96ac5bd upstream.

The following commit:

  9642d18eee ("nohz: Affine unpinned timers to housekeepers")

intended to affine unpinned timers to housekeepers:

  unpinned timers(full dynticks, idle)    =>   nearest busy housekeepers(otherwise, fallback to any housekeepers)
  unpinned timers(full dynticks, busy)    =>   nearest busy housekeepers(otherwise, fallback to any housekeepers)
  unpinned timers(housekeepers, idle)     =>   nearest busy housekeepers(otherwise, fallback to itself)

However, the !idle_cpu(i) && is_housekeeping_cpu(cpu) check modified the
intention to:

  unpinned timers(full dynticks, idle)    =>   any housekeepers(no matter cpu topology)
  unpinned timers(full dynticks, busy)    =>   any housekeepers(no matter cpu topology)
  unpinned timers(housekeepers, idle)     =>   any busy cpus(otherwise, fallback to any housekeepers)

This patch fixes it by checking if there are busy housekeepers nearby,
otherwise falling back to any housekeeper/itself. After the patch:

  unpinned timers(full dynticks, idle)    =>   nearest busy housekeepers(otherwise, fallback to any housekeepers)
  unpinned timers(full dynticks, busy)    =>   nearest busy housekeepers(otherwise, fallback to any housekeepers)
  unpinned timers(housekeepers, idle)     =>   nearest busy housekeepers(otherwise, fallback to itself)

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[ Fixed the changelog. ]
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: 9642d18eee ("nohz: Affine unpinned timers to housekeepers")
Link: http://lkml.kernel.org/r/1462344334-8303-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-07 08:32:41 +02:00
Peter Zijlstra
c3cf68ec55 sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression
commit 173be9a14f7b2e901cf77c18b1aafd4d672e9d9e upstream.

Mike reports:

 Roughly 10% of the time, ltp testcase getrusage04 fails:
 getrusage04    0  TINFO  :  Expected timers granularity is 4000 us
 getrusage04    0  TINFO  :  Using 1 as multiply factor for max [us]time increment (1000+4000us)!
 getrusage04    0  TINFO  :  utime:           0us; stime:         179us
 getrusage04    0  TINFO  :  utime:        3751us; stime:           0us
 getrusage04    1  TFAIL  :  getrusage04.c:133: stime increased > 5000us:

And tracked it down to the case where the task simply doesn't get
_any_ [us]time ticks.

Update the code to assume all rtime is utime when we lack information,
thus ensuring a task that elides the tick gets time accounted.
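
A minimal sketch of that fallback in cputime_adjust()
(kernel/sched/cputime.c):

  /*
   * If the task never got a tick, stime is 0: attribute the whole of
   * rtime to utime so the reported values stay monotonic.
   */
  if (stime == 0) {
          utime = rtime;
          goto update;
  }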

Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Fredrik Markstrom <fredrik.markstrom@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Fixes: 9d7fb04276 ("sched/cputime: Guarantee stime + utime == rtime")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-07 08:32:41 +02:00
Marc Zyngier
6722e24787 genirq/msi: Make sure PCI MSIs are activated early
commit f3b0946d629c8bfbd3e5f038e30cb9c711a35f10 upstream.

Bharat Kumar Gogada reported issues with the generic MSI code, where the
end-point ended up with garbage in its MSI configuration (both for the vector
and the message).

It turns out that the two MSI paths in the kernel are doing slightly different
things:

generic MSI: disable MSI -> allocate MSI -> enable MSI -> setup EP
PCI MSI: disable MSI -> allocate MSI -> setup EP -> enable MSI

And it turns out that end-points are allowed to latch the content of the MSI
configuration registers as soon as MSIs are enabled.  In Bharat's case, the
end-point ends up using whatever was there already, which is not what you
want.

In order to make things converge, we introduce a new MSI domain flag
(MSI_FLAG_ACTIVATE_EARLY) that is unconditionally set for PCI/MSI. When set,
this flag forces the programming of the end-point as soon as the MSIs are
allocated.

A side effect of this is that we have an extra activate in irq_startup(),
but that should be harmless.
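
A sketch of how the flag is set and consumed, per the description
(drivers/pci/msi.c and kernel/irq/msi.c; simplified):

  /* pci_msi_create_irq_domain(): PCI/MSI always wants early activation. */
  info->flags |= MSI_FLAG_ACTIVATE_EARLY;

  /* msi_domain_alloc_irqs(): program the end-point right after alloc. */
  for_each_msi_entry(desc, dev) {
          ...
          if (info->flags & MSI_FLAG_ACTIVATE_EARLY)
                  irq_domain_activate_irq(
                          irq_domain_get_irq_data(domain, desc->irq));
  }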

tglx:

 - Several people reported a VMWare regression with PCI/MSI-X passthrough. It
   turns out that the patch also cures that issue.

 - We need to have a look at the MSI disable interrupt path, where we write
   the msg to all zeros without disabling MSI in the PCI device. Is that
   correct?

Fixes: 52f518a3a7 "x86/MSI: Use hierarchical irqdomains to manage MSI interrupts"
Reported-and-tested-by: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com>
Reported-and-tested-by: Foster Snowhill <forst@forstwoof.ru>
Reported-by: Matthias Prager <linux@matthiasprager.de>
Reported-by: Jason Taylor <jason.taylor@simplivity.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Link: http://lkml.kernel.org/r/1468426713-31431-1-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-07 08:32:38 +02:00
Thomas Gleixner
fd59f98be0 genirq/msi: Remove unused MSI_FLAG_IDENTITY_MAP
commit b6140914fd079e43ea75a53429b47128584f033a upstream.

No user and we definitely don't want to grow one.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: linux-block@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Cc: linux-nvme@lists.infradead.org
Cc: axboe@fb.com
Cc: agordeev@redhat.com
Link: http://lkml.kernel.org/r/1467621574-8277-2-git-send-email-hch@lst.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-07 08:32:38 +02:00
Linux Build Service Account
ca667c3ef5 Merge "mutex: Add a delay into the SPIN_ON_OWNER wait loop." 2016-09-02 13:52:24 -07:00
Alex Shi
b56111f481 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android
Conflicts:
	arch/arm/Kconfig
2016-08-30 10:27:13 +08:00
Balbir Singh
35f2961082 RFC: FROMLIST: cgroup: reduce read locked section of cgroup_threadgroup_rwsem during fork
cgroup_threadgroup_rwsem is acquired in read mode during process exit
and fork.  It is also grabbed in write mode during
__cgroups_proc_write().  I've recently run into a scenario with lots
of memory pressure and OOM and I am beginning to see

systemd

 __switch_to+0x1f8/0x350
 __schedule+0x30c/0x990
 schedule+0x48/0xc0
 percpu_down_write+0x114/0x170
 __cgroup_procs_write.isra.12+0xb8/0x3c0
 cgroup_file_write+0x74/0x1a0
 kernfs_fop_write+0x188/0x200
 __vfs_write+0x6c/0xe0
 vfs_write+0xc0/0x230
 SyS_write+0x6c/0x110
 system_call+0x38/0xb4

This thread is waiting on the reader of cgroup_threadgroup_rwsem to
exit.  The reader itself is under memory pressure and has gone into
reclaim after fork. At times the reader also ends up waiting on
oom_lock.

 __switch_to+0x1f8/0x350
 __schedule+0x30c/0x990
 schedule+0x48/0xc0
 jbd2_log_wait_commit+0xd4/0x180
 ext4_evict_inode+0x88/0x5c0
 evict+0xf8/0x2a0
 dispose_list+0x50/0x80
 prune_icache_sb+0x6c/0x90
 super_cache_scan+0x190/0x210
 shrink_slab.part.15+0x22c/0x4c0
 shrink_zone+0x288/0x3c0
 do_try_to_free_pages+0x1dc/0x590
 try_to_free_pages+0xdc/0x260
 __alloc_pages_nodemask+0x72c/0xc90
 alloc_pages_current+0xb4/0x1a0
 page_table_alloc+0xc0/0x170
 __pte_alloc+0x58/0x1f0
 copy_page_range+0x4ec/0x950
 copy_process.isra.5+0x15a0/0x1870
 _do_fork+0xa8/0x4b0
 ppc_clone+0x8/0xc

In the meantime, all processes exiting/forking are blocked, almost
stalling the system.

This patch moves threadgroup_change_begin() from before
cgroup_fork() to just before cgroup_can_fork().  There is no need to
worry about threadgroup changes until the task is actually added to the
threadgroup.  This avoids having to call reclaim with
cgroup_threadgroup_rwsem held.

tj: Subject and description edits.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Zefan Li <lizefan@huawei.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Tejun Heo <tj@kernel.org>
[jstultz: Cherry-picked from:
 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git 568ac888215c7f]
Change-Id: Ie8ece84fb613cf6a7b08cea1468473a8df2b9661
Signed-off-by: John Stultz <john.stultz@linaro.org>
Git-commit: e91f1799ff
Git-repo: https://android.googlesource.com/kernel/common/+/android-4.4
Signed-off-by: Omprakash Dhyade <odhyade@codeaurora.org>
2016-08-29 14:19:13 -07:00
Peter Zijlstra
00eaad05be RFC: FROMLIST: cgroup: avoid synchronize_sched() in __cgroup_procs_write()
The current percpu-rwsem read side is entirely free of serializing insns
at the cost of having a synchronize_sched() in the write path.

The latency of the synchronize_sched() is too high for cgroups. The
commit 1ed1328792 talks about the write path being a fairly cold path
but this is not the case for Android, which moves tasks to the foreground
cgroup and back around binder IPC calls from foreground processes to
background processes, so it is significantly hotter than human-initiated
operations.

Switch cgroup_threadgroup_rwsem into the slow mode for now to avoid the
problem; hopefully it should not be that slow after another commit
80127a39681b ("locking/percpu-rwsem: Optimize readers and reduce global
impact").

We could just add rcu_sync_enter() into cgroup_init() but we do not want
another synchronize_sched() at boot time, so this patch adds the new helper
which doesn't block but currently can only be called before the first use.

Cc: Tejun Heo <tj@kernel.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Reported-by: John Stultz <john.stultz@linaro.org>
Reported-by: Dmitry Shmidt <dimitrysh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[jstultz: backported to 4.4]
Change-Id: I34aa9c394d3052779b56976693e96d861bd255f2
Mailing-list-URL: https://lkml.org/lkml/2016/8/11/557
Signed-off-by: John Stultz <john.stultz@linaro.org>
Git-commit: 0c3240a1ef
Git-repo: https://android.googlesource.com/kernel/common/+/android-4.4
Signed-off-by: Omprakash Dhyade <odhyade@codeaurora.org>
2016-08-29 14:18:07 -07:00
Peter Zijlstra
9fe73e0c81 RFC: FROMLIST: locking/percpu-rwsem: Optimize readers and reduce global impact
Currently the percpu-rwsem switches to (global) atomic ops while a
writer is waiting, which could be quite a while, and this slows down
releasing the readers.

This patch cures this problem by ordering the reader-state vs
reader-count (see the comments in __percpu_down_read() and
percpu_down_write()). This changes a global atomic op into a full
memory barrier, which doesn't have the global cacheline contention.

This also enables using the percpu-rwsem with rcu_sync disabled in order
to bias the implementation differently, reducing the writer latency by
adding some cost to readers.

Mailing-list-URL: https://lkml.org/lkml/2016/8/9/181
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[jstultz: Backported to 4.4]
Change-Id: I8ea04b4dca2ec36f1c2469eccafde1423490572f
Signed-off-by: John Stultz <john.stultz@linaro.org>
Git-commit: 3228c5eb7a
Git-repo: https://android.googlesource.com/kernel/common/+/android-4.4
Signed-off-by: Omprakash Dhyade <odhyade@codeaurora.org>
2016-08-29 14:11:36 -07:00
Linux Build Service Account
2b4e8cbd34 Merge "Revert "Merge remote-tracking branch 'msm-4.4/tmp-510d0a3f' into msm-4.4"" 2016-08-29 00:49:25 -07:00
Linux Build Service Account
fcb4d9dd29 Merge "sched: Make use of sysctl_sched_wake_to_idle in select_best_cpu" 2016-08-26 22:22:38 -07:00
Linux Build Service Account
1b7819036e Merge "sched: handle frequency alert notifications better" 2016-08-26 22:22:38 -07:00
Linus Torvalds
41a69b502d x86: remove more uaccess_32.h complexity
I'm looking at trying to possibly merge the 32-bit and 64-bit versions
of the x86 uaccess.h implementation, but first this needs to be cleaned
up.

For example, the 32-bit version of "__copy_from_user_inatomic()" is
mostly the special cases for the constant size, and it's actually almost
never relevant.  Most users aren't actually using a constant size
anyway, and the few cases that do small constant copies are better off
just using __get_user() instead.

So get rid of the unnecessary complexity.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit bd28b14591b98f696bc9f94c5ba2e598ca487dfd)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
2016-08-27 11:23:38 +08:00
Linux Build Service Account
05966eacca Merge "cpuset: Make cpusets restore on hotplug" 2016-08-26 14:48:45 -07:00
Trilok Soni
5ab1e18aa3 Revert "Merge remote-tracking branch 'msm-4.4/tmp-510d0a3f' into msm-4.4"
This reverts commit 9d6fd2c3e9 ("Merge remote-tracking branch
'msm-4.4/tmp-510d0a3f' into msm-4.4"), because it breaks the
dump parsing tools now that the kernel can be loaded anywhere in memory
and is no longer fixed at the linear mapping.

Change-Id: Id416f0a249d803442847d09ac47781147b0d0ee6
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
2016-08-26 14:34:05 -07:00
Riley Andrews
dbc6f463a6 mutex: Add a delay into the SPIN_ON_OWNER wait loop.
On arm systems the spin on owner optimization can intermittently cause a
lockup that's usually as long as the waiting thread's cpu timeslice. The
repeated mutex acquisitions + atomics in a single spinning thread can
completely lock out the owner from releasing the kernel mutex. The
owner needs to acquire a spinlock on the release path, and this spinlock
can share a monitor with the other locks and atomics on the waiter path.
Rate limit the waiter so that the thread releasing the mutex is never
starved.
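
A sketch of the kind of rate limiting meant here, inside the
mutex_spin_on_owner() loop; the back-off mechanism shown is illustrative
and may not be what the patch actually uses:

  while (lock->owner == owner) {
          if (need_resched())
                  break;
          cpu_relax_lowlatency();
          /* Back off briefly so the owner's unlock path can win the
           * cache line / exclusive monitor (illustrative delay). */
          udelay(1);
  }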

Bug 23036902

Change-Id: Ie1b64275a0c6141f94faaf3e63fcbf9b5438140c
Signed-off-by: Riley Andrews <riandrews@google.com>
Git-commit: 84d8ce7e0025cac60a8a379a7ee3e59d640fbc03
Git-repo: https://android.googlesource.com/kernel/msm.git
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
2016-08-25 23:54:08 -07:00
Riley Andrews
1943b682a3 cpuset: Make cpusets restore on hotplug
This deliberately changes the behavior of the per-cpuset
cpus file to not be affected by hotplug. When a cpu is offlined,
it will be removed from the cpuset/cpus file. When a cpu is onlined,
if the cpuset originally requested that cpu as part of the cpuset, that
cpu will be restored to the cpuset. The cpus files still
have to be hierarchical, but the ranges no longer have to be a subset of
the currently online cpus, just of the physically present cpus.

Change-Id: I3efbae24a1f6384be1e603fb56f0d3baef61d924
[ohaugan@codeaurora.org: Port to 4.4]
Git-commit: f180bcac788464a0baf3d79d76dd86d6972ea413
Git-repo: https://android.googlesource.com/kernel/common/msm.git
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2016-08-24 16:03:23 -07:00
Syed Rameez Mustafa
3506942e60 sched: Make use of sysctl_sched_wake_to_idle in select_best_cpu
sysctl_sched_wake_to_idle is a means to allow or disallow a global
task placement preference for idle CPUs. It has been unused thus
far since we've preferred to use a per-task flag instead to control
placement for individual tasks. Using this global flag, however, does
allow greater flexibility for testing and system evaluation.
Incorporate sysctl_sched_wake_to_idle in the placement policy.

Change-Id: I7e830bc914eb9c159ae18f165bc8b0278ec9af40
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-08-22 14:06:37 -07:00
Pavankumar Kondeti
078568e425 sched: Introduce sched_freq_aggregate_threshold tunable
Do the aggregation for frequency only when the total group busy time
is above sched_freq_aggregate_threshold. This filtering is especially
needed for the cases where groups are created by including all threads
of an application process. This knob can be tuned to apply aggregation
only for the heavy workload applications.

When this knob is enabled and load is aggregated, the load is not
clipped to 100% @ current frequency to ramp up the frequency faster.

Change-Id: Icfd91c85938def101a989af3597d3dcaa8026d16
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-08-22 14:06:37 -07:00
Pavankumar Kondeti
2552980f79 sched: handle frequency alert notifications better
The load reporting during frequency alert notifications is broken under
load aggregation. When aggregation is enabled, the total group busy
time is accounted towards the maximum busy CPU of a frequency domain.
If this CPU has a notification pending, its group busy time alone is
accounted and the other CPUs' group busy time is completely ignored.
Similarly, if any CPU other than the maximum busy CPU has a pending
notification, its group busy time is accounted twice.

Maintain the frequency alert notification flag per frequency domain.
When the notification is pending, don't clip the load to 100% @ current
frequency for any of the CPUs in the frequency domain.

Change-Id: Iebc7d74d6fafa20430fa1c7d80f34a6ab198832d
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-08-22 14:06:36 -07:00
Pavankumar Kondeti
5ddfbfec06 sched: inherit the group id from the group leader
When sysctl_sched_enable_thread_grouping is set to 1, any new tasks
created are put in the same group as their group leader.

Change-Id: If1837dd7c8120c8b097cfffa1dc52eb4781f1641
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-08-22 14:06:35 -07:00
Olav Haugan
fbc251af5a sched/fair: Add flag to indicate why we picked the CPU
Add a flag to the trace event that indicates why we picked a particular
CPU. This is a very useful statistic that can be used to
analyse the effectiveness of the scheduler.

Change-Id: Ic9462fef751f9442ae504c09fbf4418e08f018b0
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2016-08-22 14:06:34 -07:00
Syed Rameez Mustafa
67e0df6e33 sched: Move notify_migration() under CONFIG_SCHED_HMP
notify_migration() is an HMP-specific function that relies on all
of its contents to be stubbed out for !CONFIG_SCHED_HMP. However,
it still maintains calls to rcu_read_lock/unlock(). In the !HMP
case these calls are simply redundant. Move the function under
CONFIG_SCHED_HMP and add a stub when the config is not defined so
that there is no overhead.

Change-Id: Iad914f31b629e81e403b0e89796b2b0f1d081695
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-08-22 14:06:33 -07:00
Syed Rameez Mustafa
9095a09ab1 sched: Move most HMP specific code to a separate file.
Most code pertaining to CONFIG_SCHED_HMP has been moved to a separate
file "hmp.c" in order to facilitate kernel upgrades. Fewer changes in
the original scheduler files mean fewer conflicts. Some parts of the code,
however, could not be moved to the separate file either because of
dependencies with other non-HMP code or because the changes are specific
only to the scheduling classes where the code resides.

Change-Id: Ib067ac75e5a494008dcb3c67586b622c1b3962ce
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-08-22 14:06:33 -07:00
Syed Rameez Mustafa
7663fb1d6e sched: Consolidate CONFIG_SCHED_HMP sections in various files
Code sections found under either CONFIG_SCHED_HMP or !CONFIG_SCHED_HMP
have become quite fragmented over time. Some of these fragmented
sections are necessary because of code dependencies. Other
fragmented sections can easily be consolidated. Do so in order
to make kernel upgrades a lot simpler.

Change-Id: I6be476834ce70274aec5a52fd9455b5f0065af87
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-08-22 14:06:32 -07:00
Syed Rameez Mustafa
b01a93838d sched: Fix compile issues for !CONFIG_SCHED_HMP
Fix compile issues observed when CONFIG_SCHED_HMP is not turned on.
There are still targets that may want that config option turned off.

Change-Id: I29e69356da8d003d13d8cd3927a0b166cc1ef95e
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-08-22 14:06:31 -07:00
Syed Rameez Mustafa
62f2600ce9 sched: Remove all existence of CONFIG_SCHED_FREQ_INPUT
CONFIG_SCHED_FREQ_INPUT was created to keep parts of the scheduler
dealing with frequency separate from other parts of the scheduler
that deal with task placement. However, over time the two features
have become intricately linked whereby SCHED_FREQ_INPUT cannot be
turned on without having SCHED_HMP turned on as well. Given this
complex inter-dependency and the fact that all old, existing and
future targets use both config options, remove this unnecessary
feature separation. It will aid in making kernel upgrades a lot
simpler and faster.

Change-Id: Ia20e40d8a088d50909cc28f5be758fa3e9a4af6f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-08-22 11:37:22 -07:00
Syed Rameez Mustafa
e2b9b4a395 sched: Move CPU cstate tracking under CONFIG_SCHED_HMP
While tracking C-states makes sense under plain CONFIG_SMP as well, the
cstate information is currently unused there. Move it under
CONFIG_SCHED_HMP for now since that is the only place it is relevant
at the moment.

Change-Id: Ifc5812cfe14ebf2b4d447100dcd87f02ab29ff7a
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-08-22 11:33:55 -07:00
Syed Rameez Mustafa
e978394406 sched: Remove unused PELT extensions for HMP scheduling
PELT extensions for HMP have never been used since the early days
of the HMP scheduler. Furthermore, changes to PELT itself in newer
kernel versions render some of the code redundant or incorrect. These
extensions have not been tested for a long time and are practically
dead code. Remove it so that future upgrades become easier.

Change-Id: I029f327406ca00b2370c93134158b61dda3b81e3
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-08-22 11:32:57 -07:00
Syed Rameez Mustafa
ef1e55638d sched: Remove unused migration notifier code.
Migration notifiers were created to aid the CPU-boost driver manage
CPU frequencies when tasks migrate from one CPU to another. Over time
with the evolution of scheduler guided frequency, the scheduler now
directly manages load when tasks migrate. Consequently the CPU-boost
driver no longer makes use of this information. Remove unused code
pertaining to this feature.

Change-Id: I3529e4356e15e342a5fcfbcf3654396752a1d7cd
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-08-22 11:32:19 -07:00
Ben Hutchings
bc2318cc76 module: Invalidate signatures on force-loaded modules
commit bca014caaa6130e57f69b5bf527967aa8ee70fdd upstream.

Signing a module should only make it trusted by the specific kernel it
was built for, not anything else.  Loading a signed module meant for a
kernel with a different ABI could have interesting effects.
Therefore, treat all signatures as invalid when a module is
force-loaded.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-20 18:09:27 +02:00
Paul Moore
53eaa3910a audit: fix a double fetch in audit_log_single_execve_arg()
commit 43761473c254b45883a64441dd0bc85a42f3645c upstream.

There is a double fetch problem in audit_log_single_execve_arg()
where we first check the execve(2) arguments for any "bad" characters
which would require hex encoding and then re-fetch the arguments for
logging in the audit record[1].  Of course this leaves a window of
opportunity for an unsavory application to munge with the data.

This patch reworks things by only fetching the argument data once[2]
into a buffer where it is scanned and logged into the audit
records(s).  In addition to fixing the double fetch, this patch
improves on the original code in a few other ways: better handling
of large arguments which require encoding, stricter record length
checking, and some performance improvements (completely unverified,
but we got rid of some strlen() calls, that's got to be a good
thing).

As part of the development of this patch, I've also created a basic
regression test for the audit-testsuite, the test can be tracked on
GitHub at the following link:

 * https://github.com/linux-audit/audit-testsuite/issues/25

[1] If you pay careful attention, there is actually a triple fetch
problem due to a strnlen_user() call at the top of the function.

[2] This is a tiny white lie, we do make a call to strnlen_user()
prior to fetching the argument data.  I don't like it, but due to the
way the audit record is structured we really have no choice unless we
copy the entire argument at once (which would require a rather
wasteful allocation).  The good news is that with this patch the
kernel no longer relies on this strnlen_user() value for anything
beyond recording it in the log, we also update it with a trustworthy
value whenever possible.

Reported-by: Pengfei Wang <wpengfeinudt@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-20 18:09:22 +02:00
Satyajit Desai
c34bf4be22 coresight: abort coresight tracing on kernel crash
Add trace events to control aborting CoreSight tracing
dynamically based on a module parameter.
The CoreSight driver will dump any trace present in the current sink
in case we hit a kernel panic, a user fault or an undefined instruction.

Change-Id: Iee1ccf5cbd7b767753a3115c0570e63fbe2aa8f3
Signed-off-by: Satyajit Desai <sadesai@codeaurora.org>
2016-08-19 14:56:53 -07:00