Commit graph

22004 commits

Author SHA1 Message Date
Shashank Mittal
e56ad58d2c coresight: enable stm logging for trace events, marker and printk
Dup ftrace event traffic and writes to trace_marker file from
userspace to STM. Also dup trace printk traffic to STM. This
allows Linux tracing and log data to be correlated with other
data transported over STM.

Change-Id: I4fcb42f2e97ab963fdc85853f4f3ea1f208bfc3c
Signed-off-by: Pratik Patel <pratikp@codeaurora.org>
[spjoshi@codeaurora.org: 3.18 code fixup]
Signed-off-by: Sarangdhar Joshi <spjoshi@codeaurora.org>
[mittals@codeaurora.org: 4.4 code fixup]
Signed-off-by: Shashank Mittal <mittals@codeaurora.org>
2016-05-24 14:15:31 -07:00
Joonwoo Park
d9ff0d77af sched: simplify CPU frequency estimation and cycle counter API
Most CPUs increase their cycle counter by one every cycle, which makes
frequency = cycles / time_delta correct.  It's therefore reasonable to
get rid of the current cpu_cycle_max_scale_factor and instead ask the
cycle counter read callback to return a scaled counter value in the
case that the cycle counter doesn't increase every cycle.

Thus multiply the CPU cycle counter delta by NSEC_PER_SEC / HZ_PER_KHZ,
as we calculate frequency in KHz, and remove cpu_cycle_max_scale_factor.
This allows us to simplify the frequency estimation and the cycle
counter API.
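
As a rough sketch of the scaled-delta math (HZ_PER_KHZ and the helper
name here are illustrative, not the exact kernel code):

  #include <linux/math64.h>

  #define HZ_PER_KHZ 1000

  /* freq in KHz: cycles per nsec, scaled down from Hz to KHz */
  static u64 estimate_freq_khz(u64 cycles_delta, u64 time_delta_ns)
  {
      return div64_u64(cycles_delta * (NSEC_PER_SEC / HZ_PER_KHZ),
                       time_delta_ns);
  }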

Change-Id: Ie7a628d4bc77c9b6c769f6099ce8d75740262a14
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-05-20 19:23:47 -07:00
Srinivasarao P
06ae01888e perf: duplicate deletion of perf event
A malicious app can open a perf event with the constraint_duplicate
bit set, disable the event, and close the fd.  On closing the fd, the
perf_release() modification causes the kernel to clean up the event as
if it were still enabled, leading to the event being removed from a
list twice.

CRs-Fixed: 977563
Change-Id: I5fbec3722407d2f3d0ff0d9f7097c5889e31fd62
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
2016-05-18 13:39:58 -07:00
Joonwoo Park
c0cc65346e sched: use correct Kconfig macro name CONFIG_SCHED_HMP_CSTATE_AWARE
Fix the macro name so that CONFIG_SCHED_HMP_CSTATE_AWARE=y takes effect.

Change-Id: I0218b36b2d74974f50a173a0ac3bc59156c57624
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-05-16 20:10:32 -07:00
Joonwoo Park
cd947ad761 Revert "sched: set HMP scheduler's default initial task load to 100%"
This reverts commit 28f67e5a50 ("sched: set HMP scheduler's
default initial task load to 100%") since a 100% initial task load
causes too much power inefficiency on some targets.

CRs-fixed: 1006303
Change-Id: I81b4ba8fdc2e2fe1b40f18904964098fa558989b
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-05-16 20:10:30 -07:00
David Keitel
5834faf085 trace: cpu_freq_switch: use tracefs instead of debugfs
Rather than using debugfs, switch to tracefs, which tracing moved
to in kernel 4.4.

Signed-off-by: David Keitel <dkeitel@codeaurora.org>
Change-Id: I52ef7d45cabb20cc61fbd2fb3ef5016b041bc56c
2016-05-16 20:10:17 -07:00
Biswajit Paul
60c6b65403 kernel: Restrict permissions of /proc/iomem.
The permissions of /proc/iomem currently are -r--r--r--, so everyone
can see its content.  As iomem contains information about the physical
memory content of the device, restrict the information to root only.
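
A minimal sketch of the kind of change involved, assuming the entry is
registered via proc_create() in kernel/resource.c:

  /* previously registered with a world-readable mode (0444) */
  proc_create("iomem", S_IRUSR, NULL, &proc_iomem_operations);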

Change-Id: If0be35c3fac5274151bea87b738a48e6ec0ae891
CRs-Fixed: 786116
Signed-off-by: Biswajit Paul <biswajitpaul@codeaurora.org>
Signed-off-by: Avijit Kanti Das <avijitnsec@codeaurora.org>
2016-05-09 18:35:28 -07:00
Tejun Heo
e2b6ea208b workqueue: implement lockup detector
Workqueue stalls can happen from a variety of usage bugs such as a
missing WQ_MEM_RECLAIM flag or a concurrency managed work item
indefinitely staying RUNNING.  These stalls can be extremely difficult
to hunt down because the usual warning mechanisms can't detect
workqueue stalls and the internal state is pretty opaque.

To alleviate the situation, this patch implements a workqueue lockup
detector.  It periodically monitors all worker_pools and, if any pool
fails to make forward progress for longer than the threshold duration,
triggers a warning and dumps workqueue state as follows.

 BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 31s!
 Showing busy workqueues and worker pools:
 workqueue events: flags=0x0
   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=17/256
     pending: monkey_wrench_fn, e1000_watchdog, cache_reap, vmstat_shepherd, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, cgroup_release_agent
 workqueue events_power_efficient: flags=0x80
   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
     pending: check_lifetime, neigh_periodic_work
 workqueue cgroup_pidlist_destroy: flags=0x0
   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
     pending: cgroup_pidlist_destroy_work_fn
 ...

The detection mechanism is controlled through the kernel parameter
workqueue.watchdog_thresh and can be updated at runtime through the
sysfs module parameter file.

v2: Decoupled from softlockup control knobs.
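
A condensed sketch of the detection loop; field and helper names
approximate the real implementation:

  static void wq_watchdog_timer_fn(unsigned long data)
  {
      unsigned long thresh = READ_ONCE(wq_watchdog_thresh) * HZ;
      struct worker_pool *pool;
      int pi;

      for_each_pool(pool, pi) {
          /* pool->watchdog_ts is touched whenever the pool
           * makes forward progress */
          if (time_after(jiffies, pool->watchdog_ts + thresh))
              pr_emerg("BUG: workqueue lockup - pool stuck for %lus!\n",
                       (jiffies - pool->watchdog_ts) / HZ);
      }

      mod_timer(&wq_watchdog_timer, jiffies + thresh);
  }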

CRs-Fixed: 1007459
Change-Id: Id7dfbbd2701128a942b1bcac2299e07a66db8657
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Git-commit: 82607adcf9cdf40fb7b5331269780c8f70ec6e35
Git-repo: git://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
2016-05-05 15:05:53 -07:00
Tejun Heo
b24e86e268 watchdog: introduce touch_softlockup_watchdog_sched()
touch_softlockup_watchdog() is used to tell the watchdog that a
scheduler stall is expected.  One group of usages is from paths where
the task may not be able to yield for a long time, such as performing
slow PIO to a finicky device or coming out of suspend.  The other is
to account for the scheduler and timer going idle.

For scheduler softlockup detection, there's no reason to distinguish
the two cases; however, a workqueue lockup detector is planned and it
can use the same signals from the former group while the latter would
spuriously prevent detection.  This patch introduces a new function
touch_softlockup_watchdog_sched() and converts the latter group to
call it instead.  For now, it just calls touch_softlockup_watchdog()
and there's no functional difference.
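
As described, the new entry point can start out as a trivial shim (a
sketch, not the exact patch):

  /* called from scheduler/timer idle paths only; a future workqueue
   * lockup detector can then key off touch_softlockup_watchdog() alone */
  notrace void touch_softlockup_watchdog_sched(void)
  {
      touch_softlockup_watchdog();
  }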

CRs-Fixed: 1007459
Change-Id: I6fe77926acd4240458cab29d399f81d8739a16c0
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Git-commit: 03e0d4610bf4d4a93bfa16b2474ed4fd5243aa71
Git-repo: git://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
2016-05-05 15:05:52 -07:00
Joonwoo Park
55b8e041e6 sched: take limited CPU min and max frequencies into account
A CPU's actual min and max frequencies can be limited by hardware
components without the governor being aware of it.  Provide an API for
those components to notify the scheduler, so it can observe the
accurate currently operating frequency boundaries, which helps it make
better task placement decisions.
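
A sketch of what such a notification hook could look like; the name
and signature are assumptions:

  /* called by the component clamping the range; freqs are in KHz */
  void sched_update_cpu_freq_min_max(const cpumask_t *cpus,
                                     u32 fmin, u32 fmax);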

CRs-fixed: 1006303
Change-Id: I608f5fa8b0baff8d9e998731dcddec59c9073d20
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-04-27 19:13:06 -07:00
Joonwoo Park
35f1d99e0a sched: add support for CPU frequency estimation with cycle counter
At present the scheduler calculates a task's demand from the task's
execution time weighted over CPU frequency.  The CPU frequency is
given by the governor's CPU frequency transition notification.  Such
notification may not be available.

Provide an API for the CPU clock driver to register callback functions
so the scheduler can access the CPU's cycle counter to estimate the
CPU's frequency without notification.  At this point the scheduler
assumes the cycle counter always increases, even when the cluster is
idle, which might not be true.  This will be fixed by a subsequent
change for more accurate I/O wait time accounting.
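
A sketch of the registration interface; the names follow the
description above but may not match the code exactly:

  struct cpu_cycle_counter_cb {
      /* monotonic per-CPU cycle count, pre-scaled by the driver if
       * the counter does not tick once per cycle */
      u64 (*get_cpu_cycle_counter)(int cpu);
  };

  int register_cpu_cycle_counter_cb(struct cpu_cycle_counter_cb *cb);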

CRs-fixed: 1006303
Change-Id: I93b187efd7bc225db80da0184683694f5ab99738
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-04-27 19:13:05 -07:00
Joonwoo Park
f8bf0307bc sched: revise sched_boost to make the best of big cluster CPUs
At present sched_boost changes the scheduler to place tasks on the
least loaded CPU, under the assumption that big and little cluster
capacities are the same at the same frequency level.  This is
suboptimal for big.LITTLE systems that don't have such symmetrical
capacity between big and little CPUs.

Fix sched_boost to place tasks on the big CPUs for targets with
non-symmetrical capacity.

CRs-fixed: 1006303
Change-Id: I752f020acf1a76580edb5cd0e5ad283b62edfeed
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-04-25 17:44:39 -07:00
Joonwoo Park
16c433e4c5 sched: fix excessive task packing where CONFIG_SCHED_HMP_CSTATE_AWARE=y
At present, among CPUs with the same power cost and C-state, the
scheduler places a newly waking task on the most loaded CPU, which can
incur too much task packing on the same CPU.  Place the task onto the
most loaded CPU only when the best CPU is in an idle C-state; otherwise
spread out by placing it onto the least loaded CPU.
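
In pseudocode form (the helper names are hypothetical), the placement
decision becomes:

  if (cpu_cstate(best_cpu) > 0)
      /* best CPU is in a low power state: pack onto the most
       * loaded CPU so the others can stay idle */
      target = most_loaded_cpu(candidates);
  else
      /* best CPU is awake: spread to avoid over-packing */
      target = least_loaded_cpu(candidates);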

CRs-fixed: 1006303
Change-Id: I8ae7332971b3293d912b1582f75e33fd81407d86
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-04-22 15:05:34 -07:00
Joonwoo Park
2e0ebb0155 sched: add option whether CPU C-state is used to guide task placement
There are CPUs that don't have an obvious low power mode exit latency
penalty.  Add a new Kconfig option CONFIG_SCHED_HMP_CSTATE_AWARE which
controls whether CPU C-state is used to guide task placement.

CRs-fixed: 1006303
Change-Id: Ie8dbab8e173c3a1842d922f4d1fbd8cc4221789c
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-04-22 15:05:24 -07:00
Syed Rameez Mustafa
d4ca4d767f sched: update placement logic to prefer C-state and busier CPUs
Update the wakeup placement logic used when need_idle is not set.
Break ties in power with C-state.  If the C-state is the same, break
ties with prev_cpu.  Finally, go for the most loaded CPU.
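
A sketch of the tie-break order (hypothetical helpers):

  if (power_cost(cpu) < power_cost(best_cpu))
      best_cpu = cpu;                  /* power still wins outright */
  else if (cpu_cstate(cpu) < cpu_cstate(best_cpu))
      best_cpu = cpu;                  /* shallower C-state */
  else if (cpu == prev_cpu && cpu_cstate(cpu) == cpu_cstate(best_cpu))
      best_cpu = cpu;                  /* prev_cpu bias on a tie */
  else if (cpu_load(cpu) > cpu_load(best_cpu))
      best_cpu = cpu;                  /* finally, the most loaded */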

CRs-fixed: 1006303
Change-Id: Iafa98a909ed464af33f4fe3345bbfc8e77dee963
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed bug where assigns best_cpu_cstate with
 uninitialized cpu_cstate.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-04-22 15:05:13 -07:00
Syed Rameez Mustafa
c34b0b85aa sched: Optimize wakeup placement logic when need_idle is set
Try to find the min C-state CPU within the little cluster when a task
fits there.  If there is no idle CPU, return the least busy CPU.  Also
add a prev CPU bias when C-states or load are the same.

CRs-fixed: 1006303
Change-Id: I577cc70a59f2b0c5309c87b54e106211f96e04a0
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-04-22 15:05:01 -07:00
Alex Shi
2bf7955152 Merge tag 'v4.4.8' into linux-linaro-lsk-v4.4
This is the 4.4.8 stable release
2016-04-21 12:06:25 +08:00
Peter Zijlstra
695ca6389e perf: Cure event->pending_disable race
commit 28a967c3a2f99fa3b5f762f25cb2a319d933571b upstream.

Because event_sched_out() checks event->pending_disable _before_
actually disabling the event, it can happen that the event fires after
it checks but before it gets disabled.

This would leave event->pending_disable set and the queued irq_work
will try and process it.

However, if the event trigger was during schedule(), the event might
have been de-scheduled by the time the irq_work runs, and
perf_event_disable_local() will fail.

Fix this by checking event->pending_disable _after_ we call
event->pmu->del(). This depends on the latter being a compiler
barrier, such that the compiler does not lift the load and re-creates
the problem.

Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160224174948.040469884@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-20 15:42:14 +09:00
Peter Zijlstra
3c1a5d344e perf: Do not double free
commit 130056275ade730e7a79c110212c8815202773ee upstream.

In case of: err_file: fput(event_file), we'll end up calling
perf_release() which in turn will free the event.

Do not then free the event _again_.

Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160224174947.697350349@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-20 15:42:14 +09:00
Alexei Starovoitov
e8e4323262 bpf: avoid copying junk bytes in bpf_get_current_comm()
[ Upstream commit cdc4e47da8f4c32eeb6b2061a8a834f4362a12b7 ]

Lots of places in the kernel use memcpy(buf, comm, TASK_COMM_LEN), but
the result is typically passed to print("%s", buf) and extra bytes
after the zero don't cause any harm.
In bpf the result of bpf_get_current_comm() is used as part of a map
key and was causing spurious hash map mismatches.
Use strlcpy() to guarantee a zero-terminated string.
The bpf verifier checks that the output buffer is zero-initialized, so
even for short task names the output buffer doesn't have junk bytes.
Note it's not a security concern, since kprobe+bpf is root only.
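
The gist of the fix, as a sketch:

  /* memcpy(buf, comm, TASK_COMM_LEN) copies whatever follows the NUL;
   * strlcpy() stops at the NUL and always terminates, so the map key
   * carries no junk bytes */
  strlcpy(buf, current->comm, size);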

Fixes: ffeedafbf0 ("bpf: introduce current->pid, tgid, uid, gid, comm accessors")
Reported-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-20 15:42:01 +09:00
Liam Mark
21f86651a6 android/lowmemorykiller: Ignore tasks with freed mm
A killed task can stay in the task list long after its memory has
been returned to the system; therefore ignore any tasks whose mm
struct has been freed.
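
A sketch of the kind of check involved; find_lock_task_mm() returns
NULL once the mm is gone, though the exact placement in the LMK scan
may differ:

  for_each_process(tsk) {
      struct task_struct *p = find_lock_task_mm(tsk);

      if (!p)
          continue;    /* mm already freed: ignore this task */
      /* consider p as a victim candidate here */
      task_unlock(p);
  }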

Change-Id: I76394b203b4ab2312437c839976f0ecb7b6dde4e
CRs-fixed: 450383
Signed-off-by: Liam Mark <lmark@codeaurora.org>
2016-04-13 11:09:29 -07:00
Alex Shi
ad592b70ae Merge tag 'v4.4.7' into linux-linaro-lsk-v4.4
This is the 4.4.7 stable release
2016-04-13 12:02:17 +08:00
Jeevan Shriram
a6c4b5ad91 kernel: sched: Fix compilation issues for Usermode Linux
Fix compilation errors for ARCH=um for x86_64 architecture.

CRs-Fixed: 996252
Change-Id: I414b551e28a950e4b601f31bb4bfa2f1200d1713
Signed-off-by: Jeevan Shriram <jshriram@codeaurora.org>
2016-04-12 15:49:42 -07:00
Thomas Gleixner
2a8225ef46 sched/cputime: Fix steal time accounting vs. CPU hotplug
commit e9532e69b8d1d1284e8ecf8d2586de34aec61244 upstream.

On CPU hotplug the steal time accounting can keep a stale rq->prev_steal_time
value over CPU down and up. So after the CPU comes up again the delta
calculation in steal_account_process_tick() wrecks itself due to the
unsigned math:

	 u64 steal = paravirt_steal_clock(smp_processor_id());

	 steal -= this_rq()->prev_steal_time;

So if steal is smaller than rq->prev_steal_time we end up with an insane large
value which then gets added to rq->prev_steal_time, resulting in a permanent
wreckage of the accounting. As a consequence the per CPU stats in /proc/stat
become stale.

Nice trick to tell the world how idle the system is (100%) while the CPU is
100% busy running tasks. Though we prefer realistic numbers.

None of the accounting values which use a previous value to account for
fractions is reset at CPU hotplug time. update_rq_clock_task() has a sanity
check for prev_irq_time and prev_steal_time_rq, but that sanity check solely
deals with clock warps and limits the /proc/stat visible wreckage. The
prev_time values are still wrong.

Solution is simple: Reset rq->prev_*_time when the CPU is plugged in again.
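
The fix amounts to a reset helper invoked when the CPU comes up again,
roughly:

  static inline void account_reset_rq(struct rq *rq)
  {
  #ifdef CONFIG_IRQ_TIME_ACCOUNTING
      rq->prev_irq_time = 0;
  #endif
  #ifdef CONFIG_PARAVIRT
      rq->prev_steal_time = 0;
  #endif
  #ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
      rq->prev_steal_time_rq = 0;
  #endif
  }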

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: commit 095c0aa83e "sched: adjust scheduler cpu power for stolen time"
Fixes: commit aa48380851 "sched: Remove irq time from available CPU power"
Fixes: commit e6e6685acc "KVM guest: Steal time accounting"
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603041539490.3686@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:09:05 -07:00
Lukas Wunner
4cd4ebbdf5 PM / sleep: Clear pm_suspend_global_flags upon hibernate
commit 276142730c39c9839465a36a90e5674a8c34e839 upstream.

When suspending to RAM, waking up and later suspending to disk,
we gratuitously runtime resume devices after the thaw phase.
This does not occur if we always suspend to RAM or always to disk.

pm_complete_with_resume_check(), which gets called from
pci_pm_complete() among others, schedules a runtime resume
if PM_SUSPEND_FLAG_FW_RESUME is set. The flag is set during
a suspend-to-RAM cycle. It is cleared at the beginning of
the suspend-to-RAM cycle but not afterwards and it is not
cleared during a suspend-to-disk cycle at all. Fix it.

Fixes: ef25ba0476 (PM / sleep: Add flags to indicate platform firmware involvement)
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:09:05 -07:00
Steven Rostedt (Red Hat)
3dba3f672d tracing: Fix trace_printk() to print when not using bprintk()
commit 3debb0a9ddb16526de8b456491b7db60114f7b5e upstream.

The trace_printk() code will allocate extra buffers if the compile detects
that a trace_printk() is used. To do this, the format of the trace_printk()
is saved to the __trace_printk_fmt section, and if that section is bigger
than zero, the buffers are allocated (along with a message that this has
happened).

If trace_printk() uses a format that is not a constant, and thus something
not guaranteed to be around when the print happens, the compiler optimizes
the fmt out, as it is not used, and the __trace_printk_fmt section is not
filled. This means the kernel will not allocate the special buffers needed
for the trace_printk() and the trace_printk() will not write anything to the
tracing buffer.

Adding a "__used" to the variable in the __trace_printk_fmt section will
keep it around, even though it is set to NULL. This will keep the string
from being printed in the debugfs/tracing/printk_formats section as it is
not needed.
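
Sketched against the declaration inside the trace_printk() macro, the
change is the __used annotation:

  /* __used keeps the symbol even when gcc optimizes the non-constant
   * fmt reference away, so the section stays non-empty and the special
   * buffers are still allocated */
  static const char *trace_printk_fmt __used
      __attribute__((section("__trace_printk_fmt"))) =
      __builtin_constant_p(fmt) ? fmt : NULL;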

Reported-by: Vlastimil Babka <vbabka@suse.cz>
Fixes: 07d777fe8c "tracing: Add percpu buffers for trace_printk()"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:09:00 -07:00
Steven Rostedt (Red Hat)
aab3ba82f8 tracing: Fix crash from reading trace_pipe with sendfile
commit a29054d9478d0435ab01b7544da4f674ab13f533 upstream.

If tracing contains data and the trace_pipe file is read with sendfile(),
then it can trigger a NULL pointer dereference and various BUG_ON within the
VM code.

There's a patch to fix this in the splice_to_pipe() code, but it's also a
good idea to not let that happen from trace_pipe either.

Link: http://lkml.kernel.org/r/1457641146-9068-1-git-send-email-rabin@rab.in

Reported-by: Rabin Vincent <rabin.vincent@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:59 -07:00
Steven Rostedt (Red Hat)
aa60f652ee tracing: Have preempt(irqs)off trace preempt disabled functions
commit cb86e05390debcc084cfdb0a71ed4c5dbbec517d upstream.

Joel Fernandes reported that the function tracing of preempt disabled
sections was not being reported when running either the preemptirqsoff or
preemptoff tracers. This was due to the fact that the function tracer
callback for those tracers checked if irqs were disabled before tracing. But
this fails when we want to trace preempt off locations as well.

Joel explained that he wanted to see functions where interrupts are enabled
but preemption was disabled. The expected output he wanted:

   <...>-2265    1d.h1 3419us : preempt_count_sub <-irq_exit
   <...>-2265    1d..1 3419us : __do_softirq <-irq_exit
   <...>-2265    1d..1 3419us : msecs_to_jiffies <-__do_softirq
   <...>-2265    1d..1 3420us : irqtime_account_irq <-__do_softirq
   <...>-2265    1d..1 3420us : __local_bh_disable_ip <-__do_softirq
   <...>-2265    1..s1 3421us : run_timer_softirq <-__do_softirq
   <...>-2265    1..s1 3421us : hrtimer_run_pending <-run_timer_softirq
   <...>-2265    1..s1 3421us : _raw_spin_lock_irq <-run_timer_softirq
   <...>-2265    1d.s1 3422us : preempt_count_add <-_raw_spin_lock_irq
   <...>-2265    1d.s2 3422us : _raw_spin_unlock_irq <-run_timer_softirq
   <...>-2265    1..s2 3422us : preempt_count_sub <-_raw_spin_unlock_irq
   <...>-2265    1..s1 3423us : rcu_bh_qs <-__do_softirq
   <...>-2265    1d.s1 3423us : irqtime_account_irq <-__do_softirq
   <...>-2265    1d.s1 3423us : __local_bh_enable <-__do_softirq

There's a comment saying that the irq disabled check is because there's a
possible race that tracing_cpu may be set when the function is executed. But
I don't remember that race. For now, I added a check for preemption being
enabled too to not record the function, as there would be no race if that
was the case. I need to re-investigate this, as I'm now thinking that the
tracing_cpu will always be correct. But no harm in keeping the check for
now, except for the slight performance hit.
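
In sketch form, the tracer callback's gate becomes:

  /* trace when irqs *or* preemption are disabled; previously only
   * the irqs-disabled case was accepted */
  if (!irqs_disabled_flags(*flags) && !preempt_count())
      return 0;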

Link: http://lkml.kernel.org/r/1457770386-88717-1-git-send-email-agnel.joel@gmail.com

Fixes: 5e6d2b9cfa "tracing: Use one prologue for the preempt irqs off tracer function tracers"
Reported-by: Joel Fernandes <agnel.joel@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:59 -07:00
Jann Horn
74b23f79f1 fs/coredump: prevent fsuid=0 dumps into user-controlled directories
commit 378c6520e7d29280f400ef2ceaf155c86f05a71a upstream.

This commit fixes the following security hole affecting systems where
all of the following conditions are fulfilled:

 - The fs.suid_dumpable sysctl is set to 2.
 - The kernel.core_pattern sysctl's value starts with "/". (Systems
   where kernel.core_pattern starts with "|/" are not affected.)
 - Unprivileged user namespace creation is permitted. (This is
   true on Linux >=3.8, but some distributions disallow it by
   default using a distro patch.)

Under these conditions, if a program executes under secure exec rules,
causing it to run with the SUID_DUMP_ROOT flag, then unshares its user
namespace, changes its root directory and crashes, the coredump will be
written using fsuid=0 and a path derived from kernel.core_pattern - but
this path is interpreted relative to the root directory of the process,
allowing the attacker to control where a coredump will be written with
root privileges.

To fix the security issue, always interpret core_pattern for dumps that
are written under SUID_DUMP_ROOT relative to the root directory of init.

Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:58 -07:00
Tejun Heo
36591ef19a cgroup: ignore css_sets associated with dead cgroups during migration
commit 2b021cbf3cb6208f0d40fd2f1869f237934340ed upstream.

Before 2e91fa7f6d ("cgroup: keep zombies associated with their
original cgroups"), all dead tasks were associated with init_css_set.
If a zombie task is requested for migration, while migration prep
operations would still be performed on init_css_set, the actual
migration would ignore zombie tasks.  As init_css_set is always valid,
this worked fine.

However, after 2e91fa7f6d, zombie tasks stay with the css_set it was
associated with at the time of death.  Let's say a task T is
associated with cgroup A on hierarchy H-1 and cgroup B on hierarchy
H-2.  After T
becomes a zombie, it would still remain associated with A and B.  If A
only contains zombie tasks, it can be removed.  On removal, A gets
marked offline but stays pinned until all zombies are drained.  At
this point, if migration is initiated on T to a cgroup C on hierarchy
H-2, migration path would try to prepare T's css_set for migration and
trigger the following.

 WARNING: CPU: 0 PID: 1576 at kernel/cgroup.c:474 cgroup_get+0x121/0x160()
 CPU: 0 PID: 1576 Comm: bash Not tainted 4.4.0-work+ #289
 ...
 Call Trace:
  [<ffffffff8127e63c>] dump_stack+0x4e/0x82
  [<ffffffff810445e8>] warn_slowpath_common+0x78/0xb0
  [<ffffffff810446d5>] warn_slowpath_null+0x15/0x20
  [<ffffffff810c33e1>] cgroup_get+0x121/0x160
  [<ffffffff810c349b>] link_css_set+0x7b/0x90
  [<ffffffff810c4fbc>] find_css_set+0x3bc/0x5e0
  [<ffffffff810c5269>] cgroup_migrate_prepare_dst+0x89/0x1f0
  [<ffffffff810c7547>] cgroup_attach_task+0x157/0x230
  [<ffffffff810c7a17>] __cgroup_procs_write+0x2b7/0x470
  [<ffffffff810c7bdc>] cgroup_tasks_write+0xc/0x10
  [<ffffffff810c4790>] cgroup_file_write+0x30/0x1b0
  [<ffffffff811c68fc>] kernfs_fop_write+0x13c/0x180
  [<ffffffff81151673>] __vfs_write+0x23/0xe0
  [<ffffffff81152494>] vfs_write+0xa4/0x1a0
  [<ffffffff811532d4>] SyS_write+0x44/0xa0
  [<ffffffff814af2d7>] entry_SYSCALL_64_fastpath+0x12/0x6f

It doesn't make sense to prepare migration for css_sets pointing to
dead cgroups as they are guaranteed to contain only zombies which are
ignored later during migration.  This patch makes cgroup destruction
path mark all affected css_sets as dead and updates the migration path
to ignore them during preparation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 2e91fa7f6d ("cgroup: keep zombies associated with their original cgroups")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:54 -07:00
Joshua Hunt
5f4a82d5e3 watchdog: don't run proc_watchdog_update if new value is same as old
commit a1ee1932aa6bea0bb074f5e3ced112664e4637ed upstream.

While working on a script to restore all sysctl params before a series
of tests, I found that writing any value into the
/proc/sys/kernel/{nmi_watchdog,soft_watchdog,watchdog,watchdog_thresh}
files causes them to call proc_watchdog_update().

  NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
  NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
  NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
  NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.

There doesn't appear to be a reason for doing this work every time a write
occurs, so only do it when the values change.
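
A sketch of the guard, with the parameter handling simplified:

  old = READ_ONCE(watchdog_enabled);
  err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
  /* only kick the watchdog machinery when the value really changed */
  if (!err && write && old != READ_ONCE(watchdog_enabled))
      err = proc_watchdog_update();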

Signed-off-by: Josh Hunt <johunt@akamai.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:54 -07:00
Chris Friesen
af080e5802 sched/cputime: Fix steal_account_process_tick() to always return jiffies
commit f9c904b7613b8b4c85b10cd6b33ad41b2843fa9d upstream.

The callers of steal_account_process_tick() expect it to return
whether a jiffy should be considered stolen or not.

Currently the return value of steal_account_process_tick() is in
units of cputime, which vary between either jiffies or nsecs
depending on CONFIG_VIRT_CPU_ACCOUNTING_GEN.

If cputime has nsecs granularity and there is a tiny amount of
stolen time (a few nsecs, say) then we will consider the entire
tick stolen and will not account the tick on user/system/idle,
causing /proc/stats to show invalid data.

The fix is to change steal_account_process_tick() to accumulate
the stolen time and only account it once it's worth a jiffy.

(Thanks to Frederic Weisbecker for suggestions to fix a bug in my
first version of the patch.)
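
In sketch form, the accumulate-then-account logic:

  u64 steal = paravirt_steal_clock(smp_processor_id());
  unsigned long steal_jiffies;

  steal -= this_rq()->prev_steal_time;

  /* account only whole jiffies; the nsec remainder stays in the
   * delta and is picked up on a later tick */
  steal_jiffies = nsecs_to_jiffies(steal);
  this_rq()->prev_steal_time += jiffies_to_nsecs(steal_jiffies);

  account_steal_time(jiffies_to_cputime(steal_jiffies));
  return steal_jiffies;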

Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/56DBBDB8.40305@mail.usask.ca
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:35 -07:00
Alexander Shishkin
37014e0c5c perf/core: Fix perf_sched_count derailment
commit 927a5570855836e5d5859a80ce7e91e963545e8f upstream.

The error path in perf_event_open() is such that asking for a sampling
event on a PMU that doesn't generate interrupts will end up in dropping
the perf_sched_count even though it hasn't been incremented for this
event yet.

Given a sufficient amount of these calls, we'll end up disabling
scheduler's jump label even though we'd still have active events in the
system, thereby facilitating the arrival of the infernal regions upon us.

I'm fixing this by moving account_event() inside perf_event_alloc().

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/1456917854-29427-1-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:34 -07:00
Prasad Sodagudi
efea890321 genirq: call cancel_work_sync from irq_set_affinity_notifier
Whenever notification of an IRQ affinity change occurs, call
cancel_work_sync() from irq_set_affinity_notifier() to cancel all
pending work items and avoid work list corruption.
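
Sketched against irq_set_affinity_notifier(), the idea is:

  if (old_notify) {
      /* flush any affinity-notify work still queued for the old
       * notifier before dropping the reference */
      cancel_work_sync(&old_notify->work);
      kref_put(&old_notify->kref, old_notify->release);
  }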

Change-Id: I1f093bcc43be8c6696bad29250e4926cbc6c4029
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
2016-03-25 16:02:35 -07:00
Pavankumar Kondeti
bd887e4a58 sched: fix circular dependency of rq->lock and kswapd waitqueue lock
There is a deadlock scenario due to the circular dependency of CPU's
rq->lock and kswapd's waitqueue lock.

(1) when kswapd is woken up, try_to_wake_up() is called with its
waitqueue lock held. Its previous CPU is offline, so it is woken
up on a different CPU. We try to acquire the offline CPU's rq->lock
in either cpufreq change callback or fixup_busy_time()

(2) At the same time, the offline CPU is coming online and init_idle()
is called from __cpu_up(). init_idle() calls __sched_fork() with
rq->lock held. A debug object allocation in hrtimer_init() called
from __sched_fork() is trying to wakeup the kswapd and attempts to
take the waitqueue lock held in the (1) path.

Task specific initialization is done in __sched_fork() and rq->lock
is not held when it is called for other tasks. The same holds true for
the idle task as well. __sched_fork() for the idle task is called only
when the CPU is not active.

Acquire the rq->lock after calling __sched_fork() in init_idle()
to fix this deadlock.
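
The reordering in init_idle(), in sketch form:

  /* __sched_fork() may allocate (and thus wake kswapd); call it
   * before taking rq->lock to break the circular dependency */
  __sched_fork(0, idle);

  raw_spin_lock_irqsave(&idle->pi_lock, flags);
  raw_spin_lock(&rq->lock);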

CRs-Fixed: 965873
Change-Id: Ib8a265835c29861dba571c9b2a6b7e75b5cb43ee
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
[satyap: trivial merge conflicts resolution and omitted changes for QHMP]
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
2016-03-23 21:30:42 -07:00
Joonwoo Park
375d7195fc sched: move out migration notification out of spinlock
Commit 5e16bbc2fb ("sched: Streamline the task migration locking
a little") hardened task migration locking and now __migrate_task() is
called with the rq lock held.  Move the migration notification out of
the spinlock.

Change-Id: I553adcfe80d5c670f4ddf83438226fd5e0924fe8
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 21:25:25 -07:00
Joonwoo Park
d96fdc91d1 sched: fix compile failure with !CONFIG_SCHED_HMP
Fix various compilation failures when CONFIG_SCHED_HMP or
CONFIG_SCHED_INPUT isn't enabled.

Change-Id: I385dd37cfd778919f54f606bc13bebedd2fb5b9e
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 21:25:24 -07:00
Joonwoo Park
16ecb20600 sched: restrict sync wakee placement bias with waker's demand
Biasing a sync wakee task towards the waker CPU's cluster makes sense
when the waker's demand is high enough that the wakee can also take
advantage of the high CPU frequency voted because of the waker's load.
Placing a sync wakee on a low-demand waker's CPU can lead to placement
imbalance, which can lead to unnecessary migration.

Introduce a new tunable "sched_big_waker_task_load" that defines the
big waker, so the scheduler avoids the wakee bias towards the waker's
cluster when the waker's load is below the tunable.

CRs-fixed: 971295
Change-Id: I1550ede0a71ac8c9be74a7daabe164c6a269a3fb
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
[joonwoop@codeaurora.org: fixed a minor conflict in
 include/linux/sched/sysctl.h.]
2016-03-23 21:25:23 -07:00
Joonwoo Park
616e04a51c sched: add preference for waker cluster CPU in wakee task placement
If a sync wakee task's demand is small, it's worth placing the wakee
task on the waker's cluster for better performance, in the sense that
the waker and wakee are correlated, so the wakee should take advantage
of the waker cluster's frequency, which is voted by the waker, along
with the cache locality benefit.  While biasing towards the waker's
cluster we want to avoid the waker CPU as much as possible, as placing
the wakee on the waker's CPU can cause the waker to get preempted and
migrated by the load balancer.

Introduce a new tunable 'sched_small_wakee_task_load' that identifies
eligible small wakee tasks and place those tasks on the waker's
cluster.
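
Combined with sched_big_waker_task_load from the commit above, the
bias check can be sketched as:

  static bool bias_to_waker_cluster(u64 wakee_load, u64 waker_load)
  {
      return wakee_load < sysctl_sched_small_wakee_task_load &&
             waker_load > sysctl_sched_big_waker_task_load;
  }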

CRs-fixed: 971295
Change-Id: I96897d9a72a6f63dca4986d9219c2058cd5a7916
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
[joonwoop@codeaurora.org: fixed a minor conflict in
 include/linux/sched/sysctl.h.]
2016-03-23 21:25:22 -07:00
Olav Haugan
b29f9a7a84 sched/core: Add protection against null-pointer dereference
p->grp is being accessed outside of the lock, which can cause a
null-pointer dereference.  Fix this and also add an RCU critical
section around accesses to this data structure.
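
The access pattern, sketched (the type name is illustrative):

  struct related_thread_group *grp;

  rcu_read_lock();
  grp = rcu_dereference(p->grp);
  if (grp)
      update_group_demand(grp);    /* valid only inside the section */
  rcu_read_unlock();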

CRs-fixed: 985379
Change-Id: Ic82de6ae2821845d704f0ec18046cc6a24f98e39
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in init_new_task_load().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 21:25:21 -07:00
Joonwoo Park
615b6f6221 sched: allow select_prev_cpu_us to be set to values greater than 100us
At present the sched_select_prev_cpu_us tunable is restricted to
values below 100us.  Fix this unintended restriction.

CRs-Fixed: 972237
Change-Id: I5eaf9f40468805c396328ca1022baef32acf8de0
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 21:25:20 -07:00
Pavankumar Kondeti
fbeb32ce8f sched: clean up idle task's mark_start restoring in init_idle()
The idle task's mark_start can get updated even without the CPU being
online. Hence the mark_start is restored when the CPU is coming online.

The idle task's mark_start is reset in init_idle()->__sched_fork()->
init_new_task_load(). The original mark_start is saved and restored
later. This can be avoided by moving init_new_task_load() to
wake_up_new_task(), which never gets called for an idle task.

We only care about the idle task's ravg.mark_start; not initializing
the other fields of the ravg struct will not have any side effects.

This clean up allows the subsequent patches to drop the rq->lock
while calling __sched_fork() in init_idle().

CRs-Fixed: 965873
Change-Id: I41de6d69944d7d44b9c4d11b2d97ad01bd8fe96d
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
[joonwoop@codeaurora.org: fixed a minor conflict in core.c.  omitted
 changes for CONFIG_SCHED_QHMP.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 21:25:19 -07:00
Pavankumar Kondeti
58d411413f sched: let sched_boost take precedence over sched_restrict_cluster_spill
When sched_restrict_cluster_spill knob is enabled, RT tasks are restricted
to lower power cluster. This knob also restricts inter cluster no-hz kicks.
Ignore this knob setting when sched_boost is enabled so that tasks are
placed on CPUs with highest spare capacity.

CRs-Fixed: 968852
Change-Id: I01b3fc10b39dc834a733d64c2ee29c308d7ff730
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-03-23 21:25:18 -07:00
Pavankumar Kondeti
6d742ce87b sched: Add separate load tracking histogram to predict loads
The current window based load tracking only saves history for five
windows.  A historically heavy task's heavy load will be completely
forgotten after five windows of light load.  Even before the five
windows expire, a heavy task that wakes up on the same CPU it used to
run on won't trigger any frequency change until the end of the window;
it would starve for the entire window.  It also adds one "small" load
window to its history because it is accumulating load at a low
frequency, further reducing the tracked load for this heavy task.

Ideally, scheduler should be able to identify such tasks and notify
governor to increase frequency immediately after it wakes up.

Add a histogram for each task to track a much longer load history. A
prediction will be made based on the runtime of the previous or
current window, histogram data and load tracked in recent windows.  The
predictions of all tasks currently running or runnable on a CPU are
aggregated and reported to the CPUFreq governor in
sched_get_cpus_busy().

sched_get_cpus_busy() now returns predicted busy time in addition
to previous window busy time and new task busy time, scaled to
the CPU maximum possible frequency.

Tunables:

- /proc/sys/kernel/sched_gov_alert_freq (KHz)

This tunable can be used to further filter the notifications.
Frequency alert notification is sent only when the predicted
load exceeds previous window load by sched_gov_alert_freq converted to
load.

Change-Id: If29098cd2c5499163ceaff18668639db76ee8504
Suggested-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
[joonwoop@codeaurora.org: fixed merge conflicts around __migrate_task()
 and removed changes for CONFIG_SCHED_QHMP.]
2016-03-23 21:25:17 -07:00
Junjie Wu
efa673322f sched: Provide a wake up API without sending freq notifications
Each time a task wakes up, the scheduler evaluates its load and
notifies the governor if the resulting frequency of the destination
CPU is larger than a threshold.  However, some governors wake up a
separate task that handles the frequency change, which again calls
wake_up_process().

This is dangerous because if the task being woken up meets the
threshold and ends up being moved around, there is a potential for
endless recursive notifications.

Introduce a new API for waking up a task without triggering
frequency notification.

Change-Id: I24261af81b7dc410c7fb01eaa90920b8d66fbd2a
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
2016-03-23 21:25:17 -07:00
Pavankumar Kondeti
71a8c392b7 sched: Take downmigrate threshold into consideration
If tasks run on the higher capacity cluster solely because they
cannot fit in the lower capacity cluster, the downmigrate threshold
prevents frequent task migrations between the clusters.

Change-Id: I234a23ffd907c2476c94d5f6227dab1bb6c9bebb
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-03-23 21:25:16 -07:00
Pavankumar Kondeti
6003b006be sched: Provide a facility to restrict RT tasks to lower power cluster
The current CPU selection algorithm for RT tasks looks for the
least loaded CPU in all clusters. Stop the search at the lowest
possible power cluster based on "sched_restrict_cluster_spill"
sysctl tunable.

Change-Id: I34fdaefea56e0d1b7e7178d800f1bb86aa0ec01c
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-03-23 21:25:15 -07:00
Pavankumar Kondeti
8cd1d7ef16 sched: Take cluster's minimum power into account for optimizing sbc()
The select_best_cpu() algorithm iterates over all the clusters and
selects the most power efficient CPU that satisfies the task needs.
During the search, skip the next cluster if its minimum power cost
is higher than the power cost of an eligible CPU found in the previous
cluster.

In a b.L system, if the BIG cluster minimum power cost is higher than
the maximum power cost of the little cluster, this optimization avoids
searching the BIG cluster if an eligible CPU is found in the little
cluster.
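
In sketch form (field and helper names are hypothetical):

  for_each_sched_cluster(cluster) {
      /* an eligible CPU was already found and even the cheapest CPU
       * in this cluster costs more, so skip the whole cluster */
      if (cluster->min_power_cost > best_cpu_cost)
          continue;
      best_cpu_cost = search_cluster(cluster, &best_cpu);
  }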

Change-Id: I5e3755f107edb6c72180edbec2a658be931c276d
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-03-23 21:25:14 -07:00
Pavankumar Kondeti
6418f213ab sched: Revise the inter cluster load balance restrictions
The frequency based inter cluster load balance restrictions are not
reliable as frequency does not provide a good estimate of the CPU's
current load. Replace them with the spill_load and spill_nr_run
based checks.

The higher capacity cluster is restricted from pulling the tasks from
the lower capacity cluster unless all of the lower capacity CPUs are
above spill. This behavior can be controlled by a sysctl tunable and
it is disabled by default (i.e. no load balance restrictions).

Change-Id: I45c09c8adcb61a8a7d4e08beadf2f97f1805fb42
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
[joonwoop@codeaurora.org: fixed merge conflicts due to omitted changes
 for CONFIG_SCHED_QHMP.]
2016-03-23 21:25:13 -07:00
Srivatsa Vaddagiri
3004236139 sched: colocate related threads
Provide userspace interface for tasks to be grouped together as
"related" threads. For example, all threads involved in updating
display buffer could be tagged as related.

Scheduler will attempt to provide special treatment for group of
related threads such as:

1) Colocation of related threads in same "preferred" cluster
2) Aggregation of demand towards determination of cluster frequency

This patch extends scheduler to provide best-effort colocation support
for a group of related threads.

Change-Id: Ic2cd769faf5da4d03a8f3cb0ada6224d0101a5f5
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor merge conflicts.  removed ifdefry
 for CONFIG_SCHED_QHMP.]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>

Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 21:25:12 -07:00