Commit graph

563294 commits

Author SHA1 Message Date
Sami Tolvanen
249d2baf9b ANDROID: dm verity fec: limit error correction recursion
If the verity tree itself is sufficiently corrupted in addition to data
blocks, it's possible for error correction to end up in a deep recursive
error correction loop that eventually causes a kernel panic as follows:

[   14.728962] [<ffffffc0008c1a14>] verity_fec_decode+0xa8/0x138
[   14.734691] [<ffffffc0008c3ee0>] verity_verify_level+0x11c/0x180
[   14.740681] [<ffffffc0008c482c>] verity_hash_for_block+0x88/0xe0
[   14.746671] [<ffffffc0008c1508>] fec_decode_rsb+0x318/0x75c
[   14.752226] [<ffffffc0008c1a14>] verity_fec_decode+0xa8/0x138
[   14.757956] [<ffffffc0008c3ee0>] verity_verify_level+0x11c/0x180
[   14.763944] [<ffffffc0008c482c>] verity_hash_for_block+0x88/0xe0

This change limits the recursion to a reasonable level during a single
I/O operation.

Bug: 28943429
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Change-Id: I0a7ebff331d259c59a5e03c81918cc1613c3a766
(cherry picked from commit f4b9e40597e73942d2286a73463c55f26f61bfa7)
2016-06-06 14:16:11 -07:00
Jeff Vander Stoep
0ac94b48c0 ANDROID: restrict access to perf events
Add:
CONFIG_SECURITY_PERF_EVENTS_RESTRICT=y

to android-base.cfg

The kernel.perf_event_paranoid sysctl is set to 3 by default.
No unprivileged use of the perf_event_open syscall will be
permitted unless it is changed.

Bug: 29054680
Change-Id: Ie7512259150e146d8e382dc64d40e8faaa438917
2016-06-01 14:21:04 -07:00
Jeff Vander Stoep
9d5f5d9346 FROMLIST: security,perf: Allow further restriction of perf_event_open
When kernel.perf_event_paranoid is set to 3 (or greater), disallow all
access to performance events by users without CAP_SYS_ADMIN.
Add a Kconfig symbol CONFIG_SECURITY_PERF_EVENTS_RESTRICT that
makes this value the default.

This is based on a similar feature in grsecurity
(CONFIG_GRKERNSEC_PERF_HARDEN).  This version doesn't include making
the variable read-only.  It also allows enabling further restriction
at run-time regardless of whether the default is changed.
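
In sketch form, the added gate amounts to the following (the helper name
mirrors the existing perf_paranoid_* predicates; treat naming and
placement as assumptions):

    /* sketch: level-3 check consulted in the perf_event_open() path */
    static inline bool perf_paranoid_any(void)
    {
        return sysctl_perf_event_paranoid > 2;
    }

    /* in the syscall entry path: */
    if (perf_paranoid_any() && !capable(CAP_SYS_ADMIN))
        return -EACCES;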

https://lkml.org/lkml/2016/1/11/587

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

Bug: 29054680
Change-Id: Iff5bff4fc1042e85866df9faa01bce8d04335ab8
2016-05-31 22:22:16 -07:00
Ben Hutchings
ebac2a3dcd BACKPORT: perf tools: Document the perf sysctls
perf_event_paranoid was only documented in source code and a perf error
message.  Copy the documentation from the error message to
Documentation/sysctl/kernel.txt.

perf_cpu_time_max_percent was already documented but missing from the
list at the top, so add it there.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/20160119213515.GG2637@decadent.org.uk
[ Remove reference to external Documentation file, provide info inline, as before ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Bug: 29054680
Change-Id: I13e73cfb2ad761c94762d0c8196df7725abdf5c5
2016-05-31 22:22:01 -07:00
Ard Biesheuvel
d8853fd2e9 UPSTREAM: arm64: module: avoid undefined shift behavior in reloc_data()
Compilers may engage the improbability drive when encountering shifts
by a distance that is a multiple of the size of the operand type. Since
the required bounds check is very simple here, we can get rid of all the
fuzzy masking, shifting and comparing, and use the documented bounds
directly.
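
The resulting check looks roughly like this sketch, following
reloc_data()'s locals ('sval' is the relocated value, 'len' the width
of the field being patched; treat the details as assumptions):

    switch (len) {
    case 16:
        *(s16 *)place = sval;
        /* both signed and unsigned 16-bit values must be representable */
        if (sval < S16_MIN || sval > U16_MAX)
            return -ERANGE;
        break;
    case 32:
        *(s32 *)place = sval;
        if (sval < S32_MIN || sval > U32_MAX)
            return -ERANGE;
        break;
    case 64:
        *(s64 *)place = sval;
        break;
    }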

Change-Id: Ibc1b73f4a630bc182deb6edfa7458b5e29ba9577
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-05-31 13:26:37 -07:00
Ard Biesheuvel
ef55b4532b UPSTREAM: arm64: module: fix relocation of movz instruction with negative immediate
The test whether a movz instruction with a signed immediate should be
turned into a movn instruction (i.e., when the immediate is negative)
is flawed, since the value of imm is always positive. Also, the
subsequent bounds check is incorrect since the limit update never
executes, due to the fact that the imm_type comparison will always be
false for negative signed immediates.

Let's fix this by performing the sign test on sval directly, and
replacing the bounds check with a simple comparison against U16_MAX.
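
Sketched against the module loader's reloc_insn_movw() locals (names
assumed), the fix looks like:

    sval = do_reloc(op, place, val);
    imm = sval >> lsb;

    if (imm_type == AARCH64_INSN_IMM_MOVNZ) {
        insn &= ~(3 << 29);        /* clear the opcode field */
        if (sval >= 0)
            insn |= 2 << 29;       /* non-negative: MOVZ (opcode 10b) */
        else
            imm = ~imm;            /* negative: MOVN, invert the immediate */
    }

    insn = aarch64_insn_encode_immediate(AARCH64_INSN_IMM_16, insn, imm);

    if (imm > U16_MAX)             /* simple bounds check against U16_MAX */
        return -ERANGE;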

Change-Id: I9ad3d8bfd91e5fdc6434b1be6c3062dfec193176
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: tidied up use of sval, renamed MOVK enum value to MOVKZ]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-05-31 13:25:56 -07:00
Amit Pundir
fd2e341b47 Revert "armv6 dcc tty driver"
This reverts commit 97312429c2.

Drop AOSP's "armv6 dcc tty driver" in favor of upstream DCC driver for
ARMv6/v7 16c63f8ea4 (drivers: char: hvc: add arm JTAG DCC console
support) and for ARMv8 4cad4c57e0 (ARM64: TTY: hvc_dcc: Add support
for ARM64 dcc).

Change-Id: I0ca651ef2d854fff03cee070524fe1e3971b6d8f
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2016-05-26 23:53:27 +05:30
Amit Pundir
dbe3b633a7 Revert "arm: dcc_tty: fix armv6 dcc tty build failure"
This reverts commit dfc1d4be88.

Drop AOSP's "armv6 dcc tty driver" in favor of upstream DCC driver for
ARMv6/v7 16c63f8ea4 (drivers: char: hvc: add arm JTAG DCC console
support) and for ARMv8 4cad4c57e0 (ARM64: TTY: hvc_dcc: Add support
for ARM64 dcc).

Change-Id: I8110a4fd649b8ac1ec9bfac00255c1214135e4b2
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2016-05-26 23:52:35 +05:30
Dmitry Shmidt
9f6f7a2724 ARM64: Ignore Image-dtb from git point of view
Change-Id: I5bbf1db90f28ea956383b4a5d91ad508eea656dc
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
2016-05-24 14:42:55 -07:00
Haojian Zhuang
c11dd134ea arm64: add option to build Image-dtb
Some bootloaders couldn't decompress Image.gz-dtb.

Change-Id: I698cd0c4ee6894e8d0655d88f3ecf4826c28a645
Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
2016-05-24 14:21:58 -07:00
Winter Wang
cba8d1f729 ANDROID: usb: gadget: f_midi: set fi->f to NULL when free f_midi function
fi->f is set in f_midi's alloc_func and needs to be reset to NULL in
free_func; otherwise, on a ConfigFS function switch, the midi
usb_function itself is freed, fi->f becomes a wild pointer, and we run
into the kernel panic below:
---------------
[   58.950628] Unable to handle kernel paging request at virtual address 63697664
[   58.957869] pgd = c0004000
[   58.960583] [63697664] *pgd=00000000
[   58.964185] Internal error: Oops: 80000005 [#1] PREEMPT SMP ARM
[   58.970111] Modules linked in:
[   58.973191] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.15-03504-g34c857c-dirty #89
[   58.981024] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[   58.987557] task: c110bd70 ti: c1100000 task.ti: c1100000
[   58.992962] PC is at 0x63697664
[   58.996120] LR is at android_setup+0x78/0x138
<..snip..>
[   60.044980] 1fc0: ffffffff ffffffff c1000684 00000000 00000000 c108ecd0 c11f7294 c11039c0
[   60.053181] 1fe0: c108eccc c110d148 1000406a 412fc09a 00000000 1000807c 00000000 00000000
[   60.061420] [<c073b1fc>] (android_setup) from [<c0730490>] (udc_irq+0x758/0x1034)
[   60.068951] [<c0730490>] (udc_irq) from [<c017c650>] (handle_irq_event_percpu+0x50/0x254)
[   60.077165] [<c017c650>] (handle_irq_event_percpu) from [<c017c890>] (handle_irq_event+0x3c/0x5c)
[   60.086072] [<c017c890>] (handle_irq_event) from [<c017f3ec>] (handle_fasteoi_irq+0xe0/0x198)
[   60.094630] [<c017f3ec>] (handle_fasteoi_irq) from [<c017bcfc>] (generic_handle_irq+0x2c/0x3c)
[   60.103271] [<c017bcfc>] (generic_handle_irq) from [<c017bfb8>] (__handle_domain_irq+0x7c/0xec)
[   60.112000] [<c017bfb8>] (__handle_domain_irq) from [<c0101450>] (gic_handle_irq+0x24/0x5c)
--------------
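
A minimal sketch of the corresponding fix in the free path (the usual
f_midi naming is assumed; locking is omitted):

    static void f_midi_free(struct usb_function *f)
    {
        struct f_midi *midi = func_to_midi(f);
        struct f_midi_opts *opts;

        opts = container_of(f->fi, struct f_midi_opts, func_inst);
        kfree(midi);
        opts->func_inst.f = NULL;   /* clear the stale fi->f pointer */
    }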

Signed-off-by: Winter Wang <wente.wang@nxp.com>
2016-05-23 15:53:44 -07:00
Jeff Mahoney
8f2aea82ca UPSTREAM: mac80211: fix "warning: ‘target_metric’ may be used uninitialized"
(This cherry-picks b4201cc4fc6e1c57d6d306b1f787865043d60129 upstream)

This fixes:

net/mac80211/mesh_hwmp.c:603:26: warning: ‘target_metric’ may be used uninitialized in this function

target_metric is only consumed when reply = true so no bug exists here,
but not all versions of gcc realize it.  Initialize to 0 to remove the
warning.

Change-Id: I13923fda9d314f48196c29e4354133dfe01f5abd
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[jstultz: Cherry-picked to android-4.4]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-05-17 17:36:19 +00:00
Peter Hurley
1f95b0e9be UPSTREAM: tty: Fix unsafe ldisc reference via ioctl(TIOCGETD)
(cherry pick from commit 5c17c861a357e9458001f021a7afa7aab9937439)

ioctl(TIOCGETD) retrieves the line discipline id directly from the
ldisc because the line discipline id (c_line) in termios is untrustworthy;
userspace may have set termios via ioctl(TCSETS*) without actually
changing the line discipline via ioctl(TIOCSETD).

However, directly accessing the current ldisc via tty->ldisc is
unsafe; the ldisc ptr dereferenced may be stale if the line discipline
is changing via ioctl(TIOCSETD) or hangup.

Wait for the line discipline reference (just like read() or write())
to retrieve the "current" line discipline id.
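
The resulting helper looks roughly like this sketch (modeled on the
upstream change; treat the details as assumptions):

    static int tiocgetd(struct tty_struct *tty, int __user *p)
    {
        struct tty_ldisc *ld;
        int ret;

        ld = tty_ldisc_ref_wait(tty);   /* wait, as read()/write() do */
        ret = put_user(ld->ops->num, p);
        tty_ldisc_deref(ld);
        return ret;
    }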

Cc: <stable@vger.kernel.org>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 28409131
Change-Id: I6774bd883a2e48bbe020486c72c42fb410e3f98a
2016-05-17 09:44:29 -07:00
Amit Pundir
9a326f6c08 Revert "drivers: power: use 'current' instead of 'get_current()'"
This reverts commit e1b5d10389.

This patch fixed the AOSP commit ad86cc8ad6 (drivers: power:
Add watchdog timer to catch drivers which lockup during suspend.),
which we dropped in Change-Id Ic72a87432e27844155467817600adc6cf0c2209c,
so this fix is no longer needed. Part of this patch was already
reverted in the above-mentioned Change-Id.

Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2016-05-17 16:33:47 +05:30
Amit Pundir
287d9d0efb cpufreq: interactive: drop cpufreq_{get,put}_global_kobject func calls
Upstream commit 8eec1020f0 (cpufreq: create cpu/cpufreq at boot time)
makes sure that the cpufreq sysfs entry gets created at boot time, so
there is no longer any need to create/destroy it on demand.

Drop the now-removed cpufreq_{get,put}_global_kobject function calls,
which otherwise result in the following compilation errors:

drivers/cpufreq/cpufreq_interactive.c: In function 'cpufreq_governor_interactive':
drivers/cpufreq/cpufreq_interactive.c:1187:4: error: implicit declaration of function 'cpufreq_get_global_kobject' [-Werror=implicit-function-declaration]
    WARN_ON(cpufreq_get_global_kobject());
    ^
drivers/cpufreq/cpufreq_interactive.c:1197:5: error: implicit declaration of function 'cpufreq_put_global_kobject'[-Werror=implicit-function-declaration]
     cpufreq_put_global_kobject();
     ^

Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2016-05-16 13:47:55 +05:30
Amit Pundir
d495067ae1 Revert "cpufreq: interactive: build fixes for 4.4"
This reverts commit bc68f6c4ef.

This build fix broke the interactive governor at runtime with duplicate
sysfs entry warnings at boot time. We no longer need to create/destroy
the cpufreq sysfs entry at run time on demand, thanks to upstream commit
8eec1020f0 (cpufreq: create cpu/cpufreq at boot time), which creates it
at boot time. Hence drop this build fix.

Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2016-05-16 13:43:28 +05:30
John Stultz
cc0063b8eb xt_qtaguid: Fix panic caused by processing non-full socket.
In an issue very similar to 4e461c777e (xt_qtaguid: Fix panic
caused by synack processing), we were seeing panics on occasion
in testing.

In this case, it was the same issue, but caused by a different
call path: the sk returned from qtaguid_find_sk() was not a full
socket, so the sk->sk_socket dereference failed.

This patch adds an extra check to ensure the sk being returned
is a full socket, and returns NULL if it is not.
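
The extra check reduces to a couple of lines, sketched here with the
sk_fullsock() helper that 4.4 provides:

    sk = qtaguid_find_sk(skb, par);
    /* only full sockets have a valid sk->sk_socket to dereference;
     * request/timewait minisockets must be ignored */
    if (sk && !sk_fullsock(sk))
        sk = NULL;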

Reported-by: Milosz Wasilewski <milosz.wasilewski@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-05-12 11:24:10 -07:00
Dmitry Shmidt
239dd54388 fiq_debugger: Add fiq_debugger.disable option
This change allows using the same kernel image with different
console options for the uart and fiq_debugger. If
fiq_debugger.disable is set to 1/y/Y, fiq_debugger will not be
initialized.
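
A sketch of how such a kernel parameter is typically wired in (the
parameter plumbing is real kernel idiom, but names and the probe-path
placement are assumptions):

    static bool fiq_debugger_disable;
    module_param_named(disable, fiq_debugger_disable, bool, 0644);

    static int fiq_debugger_probe(struct platform_device *pdev)
    {
        /* bail out early so no fiq_debugger console is registered */
        if (fiq_debugger_disable) {
            pr_err("serial_debugger: disabled\n");
            return -ENODEV;
        }
        return fiq_debugger_do_probe(pdev);   /* assumed normal path */
    }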

Change-Id: I71fda54f5f863d13b1437b1f909e52dd375d002d
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
2016-05-12 10:45:33 -07:00
Janis Danisevskis
92c4fc6f09 UPSTREAM: procfs: fixes pthread cross-thread naming if !PR_DUMPABLE
The PR_DUMPABLE flag causes the pid related paths of the
proc file system to be owned by ROOT. The implementation
of pthread_set/getname_np however needs access to
/proc/<pid>/task/<tid>/comm.
If PR_DUMPABLE is false this implementation is locked out.

This patch installs a special permission function for
the file "comm" that grants read and write access to
all threads of the same group regardless of the ownership
of the inode. For all other threads the function falls back
to the generic inode permission check.
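
The installed permission function behaves roughly like this sketch
(close to the upstream change, but treat it as illustrative):

    static int proc_tid_comm_permission(struct inode *inode, int mask)
    {
        bool is_same_tgroup;
        struct task_struct *task;

        task = get_proc_task(inode);
        if (!task)
            return -ESRCH;
        is_same_tgroup = same_thread_group(current, task);
        put_task_struct(task);

        if (is_same_tgroup && !(mask & MAY_EXEC)) {
            /* /proc/<pid>/task/<tid>/comm is always readable and
             * writable by members of the same thread group */
            return 0;
        }

        return generic_permission(inode, mask);
    }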

Signed-off-by: Janis Danisevskis <jdanis@google.com>
2016-05-11 08:46:18 +00:00
Jimmy Perchet
e9253e8760 FROMLIST: wlcore: Disable filtering in AP role
When a STA interface is configured (set up), the driver installs
a multicast filter. This is normal behavior: whenever an
application subscribes to a multicast address, the filter is
updated. When an Access Point interface is configured, no filter
is installed and the "filter update" path is disabled in the
driver.

The problem happens when you switch an interface from STA type
to AP type: the filter is installed but there is no means to
update it.

Change-Id: Ied22323af831575303abd548574918baa9852dd0
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
2016-05-10 15:13:21 -07:00
Lianwei Wang
e3b53389cf Revert "drivers: power: Add watchdog timer to catch drivers which lockup during suspend."
This reverts commit ad86cc8ad6.

Commit 70fea60d88 ("PM / Sleep: Detect device suspend/resume lockup...")
added a suspend/resume watchdog timer to catch the lockup. Let's revert the
duplicate one.

Change-Id: Ic72a87432e27844155467817600adc6cf0c2209c
Signed-off-by: Lianwei Wang <lianwei.wang@gmail.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2016-05-10 16:16:06 +05:30
Patrick Bellasi
fed95d9ef1 DEBUG: schedtune: add tracepoint for schedtune_tasks_update() values
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2016-05-10 16:54:43 +08:00
Patrick Bellasi
a09a25c5df DEBUG: schedtune: add tracepoint for CPU boost signal
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2016-05-10 16:54:42 +08:00
Patrick Bellasi
c8a65d2e9a DEBUG: schedtune: add tracepoint for SchedTune configuration update
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2016-05-10 16:54:42 +08:00
Dietmar Eggemann
0a0f4aa12f DEBUG: sched: add energy procfs interface
This patch makes the energy data available via procfs. The related files
are placed in a sub-directory named 'energy' inside the
/proc/sys/kernel/sched_domain/cpuX/domainY/groupZ directory for those
cpu/domain/group tuples which have energy information.

The following example depicts the contents of
/proc/sys/kernel/sched_domain/cpu0/domain0/group[01] for a system which
has energy information attached to domain level 0.

├── cpu0
│   ├── domain0
│   │   ├── busy_factor
│   │   ├── busy_idx
│   │   ├── cache_nice_tries
│   │   ├── flags
│   │   ├── forkexec_idx
│   │   ├── group0
│   │   │   └── energy
│   │   │       ├── cap_states
│   │   │       ├── idle_states
│   │   │       ├── nr_cap_states
│   │   │       └── nr_idle_states
│   │   ├── group1
│   │   │   └── energy
│   │   │       ├── cap_states
│   │   │       ├── idle_states
│   │   │       ├── nr_cap_states
│   │   │       └── nr_idle_states
│   │   ├── idle_idx
│   │   ├── imbalance_pct
│   │   ├── max_interval
│   │   ├── max_newidle_lb_cost
│   │   ├── min_interval
│   │   ├── name
│   │   ├── newidle_idx
│   │   └── wake_idx
│   └── domain1
│       ├── busy_factor
│       ├── busy_idx
│       ├── cache_nice_tries
│       ├── flags
│       ├── forkexec_idx
│       ├── idle_idx
│       ├── imbalance_pct
│       ├── max_interval
│       ├── max_newidle_lb_cost
│       ├── min_interval
│       ├── name
│       ├── newidle_idx
│       └── wake_idx

The files 'nr_idle_states' and 'nr_cap_states' contain a scalar value,
whereas 'idle_states' and 'cap_states' contain a vector of power
consumption values per idle state and of (compute capacity, power
consumption) tuples per capacity state, respectively.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
2016-05-10 16:54:42 +08:00
Juri Lelli
dd2460f387 DEBUG: sched,cpufreq: add cpu_capacity change tracepoint
This is useful when we want to compare cpu utilization and
cpu curr capacity side by side.

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2016-05-10 16:54:42 +08:00
Juri Lelli
e5a25996d3 DEBUG: sched: add tracepoint for CPU load/util signals
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2016-05-10 16:53:25 +08:00
Juri Lelli
8017fd7418 DEBUG: sched: add tracepoint for task load/util signals
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2016-05-10 16:53:25 +08:00
Juri Lelli
99ed4e57cb DEBUG: sched: add tracepoint for cpu/freq scale invariance
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2016-05-10 16:53:25 +08:00
Patrick Bellasi
a2a6dc7508 sched/fair: filter energy_diff() based on energy_payoff value
Once SchedTune support is enabled and the CPU bandwidth demand of a
task is boosted, we can expect increased energy consumption that is
balanced by a corresponding increase in task performance.
However, the current implementation of the energy_diff() function
accepts all and _only_ the schedule candidates which result in a
reduced expected system energy, which works against the boosting
strategy.

This patch links the energy_diff() function with the "energy payoff"
engine provided by SchedTune. The energy variation computed by the
energy_diff() function is now filtered using the SchedTune support to
evaluate the energy payoff for a boosted task.

With this patch, the energy_diff() function reports as an
"acceptable schedule candidate" only a schedule candidate which
corresponds to a positive energy_payoff.
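
Conceptually, the filtering step looks like this sketch
(schedtune_accept_deltas() stands in for the SchedTune hook; treat the
name as an assumption):

    /* inside energy_diff(), once nrg_delta and cap_delta are known */
    if (nrg_delta < 0 && cap_delta > 0)
        return nrg_delta;    /* less energy, more capacity: always accept */

    /* otherwise, accept the candidate only on a positive energy payoff */
    if (schedtune_accept_deltas(nrg_delta, cap_delta, task) < 0)
        return 0;            /* filtered out: no energy benefit reported */

    return nrg_delta;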

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2016-05-10 16:53:25 +08:00
Patrick Bellasi
637ee3715a sched/tune: add support to compute normalized energy
The current EAS implementation considers only energy variations, while it
completely disregards the impact on performance when selecting
a schedule candidate. Moreover, it also bases its decision
on the "absolute" value of the expected energy variations.

In order to properly define a trade-off strategy between increased energy
consumption and performance benefits, it is necessary to compare energy
variations with performance variations.

Thus, both performance and energy metrics must be expressed in comparable
units. While the performance variations are expressed in terms of capacity
deltas, which are defined in the range [0..SCHED_LOAD_SCALE], the same
scale is not used for energy variations.

This patch introduces the function:
  schedtune_normalize_energy(energy_diff)
which returns a normalized value in the same range of capacity variations,
i.e. [0..SCHED_LOAD_SCALE].

A proper set of energy normalization constants is required to provide
a fast division by a constant during the normalization of the energy_diff.
The value of these constants depends on the specific energy model and
topology of a target device.
Thus, this patch also provides the required support for computing this
set of constants at boot time.
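
In sketch form, the normalization relies on a boot-time reciprocal
constant (names assumed; reciprocal_divide() is from
linux/reciprocal_div.h):

    #include <linux/reciprocal_div.h>

    /* computed at boot from the energy model: reciprocal_value(max_delta) */
    static struct reciprocal_value schedtune_max_nrg_rdiv;

    static int schedtune_normalize_energy(int energy_diff)
    {
        u32 normalized = (u32)abs(energy_diff) << SCHED_LOAD_SHIFT;

        /* scale into [0..SCHED_LOAD_SCALE] with a fast constant division */
        normalized = reciprocal_divide(normalized, schedtune_max_nrg_rdiv);
        return energy_diff < 0 ? -(int)normalized : (int)normalized;
    }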

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2016-05-10 16:53:25 +08:00
Patrick Bellasi
36967b2412 sched/fair: keep track of energy/capacity variations
The current EAS implementation does not allow "boosting" task
performance, for example by running a task at a higher OPP (or on a more
capable CPU), even if that would require a "reasonable" increase in
energy consumption.  To define how reasonable an energy increase is
with respect to a required boost value, a trade-off between the
expected energy and performance variations must be defined and
computed.
However, the current EAS implementation considers only energy variations
and completely disregards the impact on performance when selecting
a schedule candidate.

This patch extends the eenv energy environment to keep track of both
energy and performance deltas which are implied by the activation of a
schedule candidate.
The performance variation is estimated considering the different
capacities of the CPUs in which the task could be scheduled. The idea is
that while running on a CPU with higher capacity (e.g. higher operating
point) the task could (potentially) complete faster and thus get better
performance.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2016-05-10 16:53:25 +08:00
Patrick Bellasi
a8f65584fc sched/fair: add boosted task utilization
The task utilization signal, which is derived from PELT signals and
properly scaled to be architecture and frequency invariant, is used by
EAS as an estimation of the task requirements in terms of CPU bandwidth.

When the energy-aware scheduler is in use, this signal affects the CPU
selection. Thus, a convenient and minimally intrusive way to bias that
decision is to boost the task utilization signal each time it is used
to support boosted tasks.

This patch introduces the new function:
  boosted_task_util(task)
which returns a boosted value for the utilization of the specified task.
The margin added to the original utilization is:
  1. computed based on the "boosting strategy" in use
  2. proportional to the boost value defined either by the sysctl
     interface, when global boosting is in use, or by the "taskgroup"
     value, when per-task boosting is enabled.

The boosted signal is used by EAS
  a. transparently, via its integration into the task_fits() function
  b. explicitly, in the energy-aware wakeup path
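
In sketch form (task_util() and the schedtune_task_margin() helper are
assumed names):

    static unsigned long boosted_task_util(struct task_struct *task)
    {
        unsigned long util = task_util(task);          /* PELT-based, scaled */
        long margin = schedtune_task_margin(task);     /* strategy-dependent */

        return util + margin;                          /* boosted utilization */
    }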

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2016-05-10 16:53:25 +08:00
Patrick Bellasi
a515b887ec sched/{fair,tune}: track RUNNABLE tasks impact on per CPU boost value
When per-task boosting is enabled, every time a task enters/exits a CPU
its boost value could impact the currently selected OPP for that CPU.
Thus, the "aggregated" boost value for that CPU potentially needs to
be updated to match the current maximum boost value among all the tasks
currently RUNNABLE on that CPU.

This patch introduces the required support to keep track of which boost
groups are impacting a CPU. Each time a task is enqueued/dequeued to/from
a CPU its boost group is used to increment a per-cpu counter of RUNNABLE
tasks on that CPU.
Only when the number of runnable tasks for a specific boost group
becomes 1 or 0 does the corresponding boost group change its effect on
that CPU, specifically:
  a) boost_group::tasks == 1: this boost group starts impacting the CPU
  b) boost_group::tasks == 0: this boost group stops impacting the CPU
In each of these two conditions the aggregation function:
  sched_cpu_update(cpu)
could be required to run in order to identify the new maximum boost
value required for the CPU.

The proposed patch minimizes the number of times the aggregation
function is executed while still providing the required support to
always boost a CPU to the maximum boost value required by all its
currently RUNNABLE tasks.
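
The bookkeeping reduces to per-CPU counters along these lines (a
sketch, with names assumed):

    static void schedtune_enqueue_task(struct task_struct *p, int cpu)
    {
        struct boost_groups *bg = &per_cpu(cpu_boost_groups, cpu);
        int idx = task_schedtune(p)->idx;        /* task's boost group */

        /* group goes from 0 to 1 runnable tasks: it now impacts the CPU */
        if (++bg->group[idx].tasks == 1)
            schedtune_cpu_update(cpu);
    }

    static void schedtune_dequeue_task(struct task_struct *p, int cpu)
    {
        struct boost_groups *bg = &per_cpu(cpu_boost_groups, cpu);
        int idx = task_schedtune(p)->idx;

        /* group goes from 1 to 0 runnable tasks: it stops impacting the CPU */
        if (--bg->group[idx].tasks == 0)
            schedtune_cpu_update(cpu);
    }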

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2016-05-10 16:53:24 +08:00
Patrick Bellasi
9cd53fbeb2 sched/tune: compute and keep track of per CPU boost value
When per-task boosting is enabled, we could have multiple RUNNABLE tasks
which are concurrently scheduled on the same CPU but each one with a
different boost value.
For example, we could have a scenario like this:

  Task   SchedTune CGroup   Boost Value
    T1               root            0
    T2       low-priority           10
    T3        interactive           90

In these conditions we expect a CPU to be configured according to a
proper "aggregation" of the required boost values for all the tasks
currently scheduled on this CPU.

A suitable aggregation function is one which tracks the MAX boost
value of all the tasks RUNNABLE on a CPU. This approach allows us to
always satisfy the most boost-demanding task while at the same time:
 a) boosting all the concurrently scheduled tasks, thus reducing
    potential co-scheduling side-effects on demanding tasks
 b) reducing the number of frequency switches requested of SchedDVFS,
    thus being friendlier to architectures with slow frequency
    switching times

Every time a task enters/exits the RQ of a CPU the max boost value
should be updated considering all the boost groups currently "affecting"
that CPU, i.e. which have at least one RUNNABLE task currently allocated
on that CPU.

This patch introduces the required support to keep track of the boost
groups currently affecting CPUs. Thanks to the limited number of boost
groups, a small and memory-efficient per-cpu array of boost group
values (cpu_boost_groups) is used, which is updated for each CPU entry
by schedtune_boostgroup_update(), but only when a schedtune CGroup
boost value is updated. However, this is expected to be a rare
operation, perhaps done just once at system boot time.
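
The aggregation itself is a max scan over the boost groups that
currently have RUNNABLE tasks, roughly as in this sketch (names
assumed):

    static void schedtune_cpu_update(int cpu)
    {
        struct boost_groups *bg = &per_cpu(cpu_boost_groups, cpu);
        int boost_max = bg->group[0].boost;  /* root group always applies */
        int idx;

        for (idx = 1; idx < BOOSTGROUPS_COUNT; ++idx) {
            /* only boost groups with RUNNABLE tasks affect this CPU */
            if (bg->group[idx].tasks == 0)
                continue;
            boost_max = max(boost_max, bg->group[idx].boost);
        }
        bg->boost_max = boost_max;
    }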

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2016-05-10 16:53:24 +08:00
Patrick Bellasi
13001f47c9 sched/tune: add initial support for CGroups based boosting
To support task performance boosting, the use of a single knob has the
advantage of being a simple solution, both from the implementation and
the usability standpoint.  However, on a real system it can be difficult
to identify a single value for the knob which fits the needs of multiple
different tasks. For example, some kernel threads and/or user-space
background services are better managed the "standard" way, while we
still want to be able to boost the performance of specific workloads.

In order to improve the flexibility of the task boosting mechanism, this
patch is the first of a small series which extends the previous
implementation to introduce "per task group" support.
This first patch introduces just the basic CGroups support: a new
"schedtune" CGroups controller is added which allows configuring
different boost values for different groups of tasks.
To keep the implementation simple but still effective for a boosting
strategy, the new controller:
  1. allows only a two layer hierarchy
  2. supports only a limited number of boost groups

A two-layer hierarchy allows each task to be placed either:
  a) in the root control group
     thus being subject to a system-wide boosting value
  b) in a child of the root group
     thus being subject to the specific boost value defined by that
     "boost group"

The limited number of supported "boost groups" is mainly motivated by
the observation that in a real system only a few classes of tasks
deserve different treatment.
For example, background vs foreground or interactive vs low-priority.
As an additional benefit, a limited number of boost groups also allows
a simpler implementation, especially for the code required to
compute the boost value for CPUs which have runnable tasks belonging to
different boost groups.
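
The per-group controller state can be pictured as a small struct like
this sketch (field names assumed):

    struct schedtune {
        struct cgroup_subsys_state css;   /* base CGroups controller state */
        int idx;                          /* index into the boost-group array */
        int boost;                        /* boost value for tasks in this group */
    };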

cc: Tejun Heo <tj@kernel.org>
cc: Li Zefan <lizefan@huawei.com>
cc: Johannes Weiner <hannes@cmpxchg.org>
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2016-05-10 16:53:24 +08:00
Patrick Bellasi
07e22949de sched/fair: add boosted CPU usage
The CPU usage signal is used by the scheduler as an estimation of the
overall bandwidth currently allocated on a CPU. When SchedDVFS is in
use, this signal affects the selection of the operating points (OPP)
required to accommodate all the workload allocated in a CPU.
A convenient and minimally intrusive way to boost the performance of
tasks running on a CPU is to boost the CPU usage signal each time it
is used to select an OPP.

This patch introduces a new function:
  get_boosted_cpu_usage(cpu)
to return a boosted value for the usage of a specified CPU.
The margin added to the original usage is:
  1. computed based on the "boosting strategy" in use
  2. proportional to the system-wide boost value defined via the
     provided user-space interface

The boosted signal is used by SchedDVFS (transparently) each time it
needs an estimate of the capacity required by a CPU.

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2016-05-10 16:53:24 +08:00
Patrick Bellasi
6ed2714501 sched/fair: add function to convert boost value into "margin"
The basic idea of the boost knob is to "artificially inflate" a signal
to make a task or logical CPU appear more demanding than it actually
is. Independently of the specific signal, a consistent and reasonably
simple semantic for the concept of "signal boosting" must define:
1. how we translate the boost percentage into a "margin" value to be
   added to the original signal to inflate it
2. what a boost value means from a user-space perspective

This patch provides the implementation of a possible boost semantic,
named "Signal Proportional Compensation" (SPC), where the boost
percentage (BP) is used to compute a margin (M) which is proportional to
the complement of the original signal (OS):
  M = BP * (SCHED_LOAD_SCALE - OS)
The computed margin is then added to the OS to obtain the Boosted Signal (BS)
  BS = OS + M

The proposed boost semantic has these main features:
- each signal gets a boost which is proportional to its delta with respect
  to the maximum available capacity in the system (i.e. SCHED_LOAD_SCALE)
- a 100% boost has a clear meaning from a user-space perspective,
  since it simply means running (possibly) "all" tasks at the max OPP
- each boost value improves task performance by a quantity which is
  proportional to the maximum achievable performance on that system
Thus this semantic in effect forces a behaviour whereby:

  a 50% boost means running half-way between the current and the
    maximum performance which a task could achieve on that system

This patch provides the code to implement a fast integer division to
convert a boost percentage (BP) value into a margin (M).

NOTE: this code is suitable for all signals operating in range
      [0..SCHED_LOAD_SCALE]
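
A sketch of the margin computation, using do_div() for the fast
constant division (function name assumed):

    static unsigned long
    schedtune_margin(unsigned long signal, unsigned long boost_pct)
    {
        unsigned long long margin;

        /* M = BP * (SCHED_LOAD_SCALE - OS), with BP expressed in [0..100] */
        margin  = SCHED_LOAD_SCALE - signal;
        margin *= boost_pct;
        do_div(margin, 100);    /* fast integer division by the constant 100 */

        return margin;
    }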

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2016-05-10 16:53:24 +08:00
Patrick Bellasi
344f4ec429 sched/tune: add sysctl interface to define a boost value
The current (CFS) scheduler implementation does not allow "boosting"
task performance by running tasks at a higher OPP than the minimum
required to meet their workload demands.

To support task performance boosting, the scheduler should provide a
"knob" to tune how much the system is optimised for energy efficiency
vs performance.

This patch is the first of a series which provides a simple interface to
define a tuning knob. One system-wide "boost" tunable is exposed via:
  /proc/sys/kernel/sched_cfs_boost
which can be configured in the range [0..100], to define a percentage
where:
  - 0%   boost means operating in "standard" mode, scheduling tasks at
         the minimum capacities required by the workload demand
  - 100% boost means pushing task performance to the maximum,
         "regardless" of the incurred energy consumption

A boost value between these two boundaries is used to bias the
power/performance trade-off; the higher the boost value, the more the
scheduler is biased toward performance boosting instead of energy
efficiency.
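
The tunable plugs into the standard sysctl table, roughly as in this
sketch (the handler name is assumed; zero/one_hundred are the usual
kernel/sysctl.c bounds):

    {
        .procname     = "sched_cfs_boost",
        .data         = &sysctl_sched_cfs_boost,
        .maxlen       = sizeof(sysctl_sched_cfs_boost),
        .mode         = 0644,
        .proc_handler = &sched_cfs_boost_handler,
        .extra1       = &zero,            /* clamp writes to [0..100] */
        .extra2       = &one_hundred,
    },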

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2016-05-10 16:53:24 +08:00
Patrick Bellasi
8aaf022dd7 sched/tune: add detailed documentation
The topic of a single, simple power-performance tunable that is wholly
scheduler-centric and has well-defined and predictable properties has
come up on several occasions in the past. With techniques such as
scheduler-driven DVFS, we now have a good framework for implementing
such a tunable.

This patch provides a detailed description of the motivations and design
decisions behind the implementation of SchedTune.

cc: Jonathan Corbet <corbet@lwn.net>
cc: linux-doc@vger.kernel.org
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2016-05-10 16:53:24 +08:00
Juri Lelli
08d1cfd7c9 fixup! sched/fair: jump to max OPP when crossing UP threshold
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2016-05-10 16:53:23 +08:00
Juri Lelli
3eb29105cf fixup! sched: scheduler-driven cpu frequency selection
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2016-05-10 16:53:23 +08:00
Vincent Guittot
6e1e1ed872 sched: rt scheduler sets capacity requirement
RT tasks don't provide any running constraints, unlike deadline ones,
except their running priority. The only currently usable input for
estimating the capacity needed by RT tasks is the rt_avg metric. We use
it to estimate the CPU capacity needed for the RT scheduler class.

In order to monitor the evolution of the RT task load, we must
periodically check it during the tick.

Then, we use the estimated capacity of the last activity to estimate
the next one; this cannot be very accurate, but it is a good starting
point without any impact on the wake-up path of RT tasks.
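
A sketch of the tick-time update (set_rt_cpu_capacity() and the exact
scaling are assumptions):

    static void sched_rt_update_capacity_req(struct rq *rq)
    {
        u64 total, used;

        sched_avg_update(rq);    /* age rt_avg before reading it */

        total = sched_avg_period() + (rq_clock(rq) - rq->age_stamp);
        used = div64_u64(rq->rt_avg << SCHED_CAPACITY_SHIFT, total);
        if (unlikely(used > SCHED_CAPACITY_SCALE))
            used = SCHED_CAPACITY_SCALE;

        /* request enough capacity to cover the observed RT activity */
        set_rt_cpu_capacity(rq->cpu, true, (unsigned long)used);
    }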

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
2016-05-10 16:53:23 +08:00
Vincent Guittot
fab5cc59bf sched: deadline: use deadline bandwidth in scale_rt_capacity
Instead of monitoring the exec time of deadline tasks to evaluate the
CPU capacity consumed by deadline scheduler class, we can directly
calculate it thanks to the sum of utilization of deadline tasks on the
CPU.  We can remove deadline tasks from rt_avg metric and directly use
the average bandwidth of deadline scheduler in scale_rt_capacity.

Based in part on a similar patch from Luca Abeni <luca.abeni@unitn.it>.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
2016-05-10 16:53:23 +08:00
Vincent Guittot
cd248fa758 sched: remove call of sched_avg_update from sched_rt_avg_update
rt_avg is only used to scale the available CPU's capacity for CFS
tasks.  As the update of this scaling is done during periodic load
balance, we only have to ensure that sched_avg_update has been called
before any periodic load balancing. This requirement is already
fulfilled by __update_cpu_load so the call in sched_rt_avg_update,
which is part of the hotpath, is useless.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
2016-05-10 16:53:23 +08:00
Steve Muckle
9d44dc7fe2 sched/cpufreq_sched: add trace events
Trace events will aid in debugging, profiling and tuning.

Signed-off-by: Steve Muckle <smuckle@linaro.org>
2016-05-10 16:53:23 +08:00
Steve Muckle
6b6c192453 sched/fair: jump to max OPP when crossing UP threshold
Since the true utilization of a long-running task is not detectable
while it is running and might be bigger than the current cpu capacity,
create maximum cpu capacity headroom by requesting the maximum
cpu capacity once the cpu usage plus the capacity margin exceeds the
current capacity. This also tries to harm the performance of a task
as little as possible.
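
The threshold test itself is simple, per this sketch (helper names
assumed):

    /* in the shared enqueue/tick path, sketch only */
    if (cpu_util(cpu) + capacity_margin > capacity_curr_of(cpu)) {
        /* usage is pushing past the current OPP: jump straight to the
         * maximum capacity to create headroom, rather than stepping up */
        set_cfs_cpu_capacity(cpu, true, capacity_max);
    }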

Original fair-class only version authored by Juri Lelli
<juri.lelli@arm.com>.

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
2016-05-10 16:53:23 +08:00
Juri Lelli
f99e3fe6eb sched/fair: cpufreq_sched triggers for load balancing
As we don't trigger freq changes from {en,de}queue_task_fair() during load
balancing, we need to do so explicitly on the load-balancing paths.

[smuckle@linaro.org: move update_capacity_of calls so rq lock is held]

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
2016-05-10 16:53:23 +08:00
Juri Lelli
7ff814dd71 sched/{core,fair}: trigger OPP change request on fork()
Patch "sched/fair: add triggers for OPP change requests" introduced OPP
change triggers for enqueue_task_fair(), but the trigger was operating only
for wakeups. Fact is that it makes sense to consider wakeup_new also (i.e.,
fork()), as we don't know anything about a newly created task and thus we
most certainly want to jump to max OPP to not harm performance too much.

However, it is not currently possible (or at least it wasn't evident to me
how to do so :/) to tell new wakeups from other (non wakeup) operations.

This patch introduces an additional flag in sched.h that is only set at
fork() time and it is then consumed in enqueue_task_fair() for our purpose.

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
2016-05-10 16:53:22 +08:00
Juri Lelli
ea429ccef1 sched/fair: add triggers for OPP change requests
Each time a task is {en,de}queued we might need to adapt the current
frequency to the new usage. Add triggers on {en,de}queue_task_fair() for
this purpose.  Only trigger a freq request if we are effectively waking up
or going to sleep.  Filter out load balancing related calls to reduce the
number of triggers.

[smuckle@linaro.org: resolve merge conflicts, define task_new,
 use renamed static key sched_freq]

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
2016-05-10 16:53:22 +08:00