Since is_same_group() is only used in the group scheduling code, there is
no need to define it outside CONFIG_FAIR_GROUP_SCHED.
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391005773-29493-1-git-send-email-dietmar.eggemann@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch both merged idle_balance() and pre_schedule() and pushes
both of them into pick_next_task().
Conceptually pre_schedule() and idle_balance() are rather similar,
both are used to pull more work onto the current CPU.
We cannot however first move idle_balance() into pre_schedule_fair()
since there is no guarantee the last runnable task is a fair task, and
thus we would miss newidle balances.
Similarly, the dl and rt pre_schedule calls must be ran before
idle_balance() since their respective tasks have higher priority and
it would not do to delay their execution searching for less important
tasks first.
However, by noticing that pick_next_tasks() already traverses the
sched_class hierarchy in the right order, we can get the right
behaviour and do away with both calls.
We must however change the special case optimization to also require
that prev is of sched_class_fair, otherwise we can miss doing a dl or
rt pull where we needed one.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/n/tip-a8k6vvaebtn64nie345kx1je@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add a new latency tolerance device PM QoS type to be use for
specifying active state (RPM_ACTIVE) memory access (DMA) latency
tolerance requirements for devices. It may be used to prevent
hardware from choosing overly aggressive energy-saving operation
modes (causing too much latency to appear) for the whole platform.
This feature reqiures hardware support, so it only will be
available for devices having a new .set_latency_tolerance()
callback in struct dev_pm_info populated, in which case the
routine pointed to by it should implement whatever is necessary
to transfer the effective requirement value to the hardware.
Whenever the effective latency tolerance changes for the device,
its .set_latency_tolerance() callback will be executed and the
effective value will be passed to it. If that value is negative,
which means that the list of latency tolerance requirements for
the device is empty, the callback is expected to switch the
underlying hardware latency tolerance control mechanism to an
autonomous mode if available. If that value is PM_QOS_LATENCY_ANY,
in turn, and the hardware supports a special "no requirement"
setting, the callback is expected to use it. That allows software
to prevent the hardware from automatically updating the device's
latency tolerance in response to its power state changes (e.g. during
transitions from D3cold to D0), which generally may be done in the
autonomous latency tolerance control mode.
If .set_latency_tolerance() is present for the device, a new
pm_qos_latency_tolerance_us attribute will be present in the
devivce's power directory in sysfs. Then, user space can use
that attribute to specify its latency tolerance requirement for
the device, if any. Writing "any" to it means "no requirement, but
do not let the hardware control latency tolerance" and writing
"auto" to it allows the hardware to be switched to the autonomous
mode if there are no other requirements from the kernel side in the
device's list.
This changeset includes a fix from Mika Westerberg.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a new field, no_constraints_value, to struct pm_qos_constraints
representing a list of PM QoS constraint requests to be returned by
pm_qos_get_value() when that list of requests is empty.
That field will be equal to default_value for all of the existing
global PM QoS classes and for the resume latency device PM QoS type,
but it will be different from default_value for the new latency
tolerance device PM QoS type introduced by the next changeset.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The idle post_schedule flag is just a vile waste of time, furthermore
it appears unneeded, move the idle_enter_fair() call into
pick_next_task_idle().
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: alex.shi@linaro.org
Cc: mingo@kernel.org
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-aljykihtxJt3mkokxi0qZurb@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since commit 2f36825b1 ("sched: Next buddy hint on sleep and preempt
path") it is likely we pick a new task from the same cgroup, doing a put
and then set on all intermediate entities is a waste of time, so try to
avoid this.
Measured using:
mount nodev /cgroup -t cgroup -o cpu
cd /cgroup
mkdir a; cd a
mkdir b; cd b
mkdir c; cd c
echo $$ > tasks
perf stat --repeat 10 -- taskset 1 perf bench sched pipe
PRE : 4.542422684 seconds time elapsed ( +- 0.33% )
POST: 4.389409991 seconds time elapsed ( +- 0.32% )
Which shows a significant improvement of ~3.5%
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1328936700.2476.17.camel@laptop
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In order to avoid having to do put/set on a whole cgroup hierarchy
when we context switch, push the put into pick_next_task() so that
both operations are in the same function. Further changes then allow
us to possibly optimize away redundant work.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1328936700.2476.17.camel@laptop
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Track depth in cgroup tree, this is useful for things like
find_matching_se() where you need to get to a common parent of two
sched entities.
Keeping the depth avoids having to calculate it on the spot, which
saves a number of possible cache-misses.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1328936700.2476.17.camel@laptop
Signed-off-by: Ingo Molnar <mingo@kernel.org>
idle_balance() modifies the rq->idle_stamp field, making this information
shared across core.c and fair.c.
As we know if the cpu is going to idle or not with the previous patch, let's
encapsulate the rq->idle_stamp information in core.c by moving it up to the
caller.
The idle_balance() function returns true in case a balancing occured and the
cpu won't be idle, false if no balance happened and the cpu is going idle.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: alex.shi@linaro.org
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1389949444-14821-3-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The scheduler main function 'schedule()' checks if there are no more tasks
on the runqueue. Then it checks if a task should be pulled in the current
runqueue in idle_balance() assuming it will go to idle otherwise.
But idle_balance() releases the rq->lock in order to look up the sched
domains and takes the lock again right after. That opens a window where
another cpu may put a task in our runqueue, so we won't go to idle but
we have filled the idle_stamp, thinking we will.
This patch closes the window by checking if the runqueue has been modified
but without pulling a task after taking the lock again, so we won't go to idle
right after in the __schedule() function.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: alex.shi@linaro.org
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1389949444-14821-2-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The __lockdep_no_validate check in mark_held_locks() adds the subtle
and (afaics) unnecessary difference between no-validate and check==0.
And this looks even more inconsistent because __lock_acquire() skips
mark_irqflags()->mark_lock() if !check.
Change mark_held_locks() to check hlock->check instead.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140120182013.GA26505@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Test-case:
DEFINE_MUTEX(m1);
DEFINE_MUTEX(m2);
DEFINE_MUTEX(mx);
void lockdep_should_complain(void)
{
lockdep_set_novalidate_class(&mx);
// m1 -> mx -> m2
mutex_lock(&m1);
mutex_lock(&mx);
mutex_lock(&m2);
mutex_unlock(&m2);
mutex_unlock(&mx);
mutex_unlock(&m1);
// m2 -> m1 ; should trigger the warning
mutex_lock(&m2);
mutex_lock(&m1);
mutex_unlock(&m1);
mutex_unlock(&m2);
}
this doesn't trigger any warning, lockdep can't detect the trivial
deadlock.
This is because lock(&mx) correctly avoids m1 -> mx dependency, it
skips validate_chain() due to mx->check == 0. But lock(&m2) wrongly
adds mx -> m2 and thus m1 -> m2 is not created.
rcu_lock_acquire()->lock_acquire(check => 0) is fine due to read == 2,
so currently only __lockdep_no_validate__ can trigger this problem.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140120182010.GA26498@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The "int check" argument of lock_acquire() and held_lock->check are
misleading. This is actually a boolean: 2 means "true", everything
else is "false".
And there is no need to pass 1 or 0 to lock_acquire() depending on
CONFIG_PROVE_LOCKING, __lock_acquire() checks prove_locking at the
start and clears "check" if !CONFIG_PROVE_LOCKING.
Note: probably we can simply kill this member/arg. The only explicit
user of check => 0 is rcu_lock_acquire(), perhaps we can change it to
use lock_acquire(trylock =>, read => 2). __lockdep_no_validate means
check => 0 implicitly, but we can change validate_chain() to check
hlock->instance->key instead. Not to mention it would be nice to get
rid of lockdep_set_novalidate_class().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140120182006.GA26495@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As patch "sched: Move the priority specific bits into a new header file" exposes
the priority related macros in linux/sched/prio.h, we don't have to implement
task_nice() in kernel/sched/core.c any more.
This patch implements it in linux/sched/sched.h as static inline function,
saving the kernel stack and enhancing performance a bit.
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Cc: clark.williams@gmail.com
Cc: rostedt@goodmis.org
Cc: raistlin@linux.it
Cc: juri.lelli@gmail.com
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1390878045-7096-1-git-send-email-yangds.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The hrtimer mode of broadcast is supported only when
GENERIC_CLOCKEVENTS_BROADCAST and TICK_ONESHOT config options
are enabled. Hence compile in the functions for hrtimer mode
of broadcast only when these options are selected.
Also fix max_delta_ticks value for the pseudo clock device.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/52F719EE.9010304@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When p is current and it's not of dl class, then there are no other
dl taks in the rq. If we had had pushable tasks in some other rq,
they would have been pushed earlier. So, skip "p == rq->curr" case.
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Acked-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140128072421.32315.25300.stgit@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Calling printk() from NMI context is bad (TM), so move it to IRQ
context.
This also avoids the problem where the printk() time is measured by
the generic NMI duration goo and triggers a second warning.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-75dv35xf6dhhmeb7nq6fua31@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
cgroup_cfts_commit() walks the cgroup hierarchy that the target
subsystem is attached to and tries to apply the file changes. Due to
the convolution with inode locking, it can't keep cgroup_mutex locked
while iterating. It currently holds only RCU read lock around the
actual iteration and then pins the found cgroup using dget().
Unfortunately, this is incorrect. Although the iteration does check
cgroup_is_dead() before invoking dget(), there's nothing which
prevents the dentry from going away inbetween. Note that this is
different from the usual css iterations where css_tryget() is used to
pin the css - css_tryget() tests whether the css can be pinned and
fails if not.
The problem can be solved by simply holding cgroup_mutex instead of
RCU read lock around the iteration, which actually reduces LOC.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: stable@vger.kernel.org
cgroup_create() was returning 0 after allocation failures. Fix it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: stable@vger.kernel.org
When cgroup_mount() fails to allocate an id for the root, it didn't
set ret before jumping to unlock_drop ending up returning 0 after a
failure. Fix it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: stable@vger.kernel.org
Sometimes the cleanup after memcg hierarchy testing gets stuck in
mem_cgroup_reparent_charges(), unable to bring non-kmem usage down to 0.
There may turn out to be several causes, but a major cause is this: the
workitem to offline parent can get run before workitem to offline child;
parent's mem_cgroup_reparent_charges() circles around waiting for the
child's pages to be reparented to its lrus, but it's holding cgroup_mutex
which prevents the child from reaching its mem_cgroup_reparent_charges().
Just use an ordered workqueue for cgroup_destroy_wq.
tj: Committing as the temporary fix until the reverse dependency can
be removed from memcg. Comment updated accordingly.
Fixes: e5fca243ab ("cgroup: use a dedicated workqueue for cgroup destruction")
Suggested-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Tejun Heo <tj@kernel.org>
Make the stub function static inline instead of static and move the
clockevents related function into the proper ifdeffed section.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Soren Brinkmann <soren.brinkmann@xilinx.com>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
On some architectures, in certain CPU deep idle states the local timers stop.
An external clock device is used to wakeup these CPUs. The kernel support for the
wakeup of these CPUs is provided by the tick broadcast framework by using the
external clock device as the wakeup source.
However not all implementations of architectures provide such an external
clock device. This patch includes support in the broadcast framework to handle
the wakeup of the CPUs in deep idle states on such systems by queuing a hrtimer
on one of the CPUs, which is meant to handle the wakeup of CPUs in deep idle states.
This patchset introduces a pseudo clock device which can be registered by the
archs as tick_broadcast_device in the absence of a real external clock
device. Once registered, the broadcast framework will work as is for these
architectures as long as the archs take care of the BROADCAST_ENTER
notification failing for one of the CPUs. This CPU is made the stand by CPU to
handle wakeup of the CPUs in deep idle and it *must not enter deep idle states*.
The CPU with the earliest wakeup is chosen to be this CPU. Hence this way the
stand by CPU dynamically moves around and so does the hrtimer which is queued
to trigger at the next earliest wakeup time. This is consistent with the case where
an external clock device is present. The smp affinity of this clock device is
set to the CPU with the earliest wakeup. This patchset handles the hotplug of
the stand by CPU as well by moving the hrtimer on to the CPU handling the CPU_DEAD
notification.
Originally-from: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: deepthi@linux.vnet.ibm.com
Cc: paulmck@linux.vnet.ibm.com
Cc: fweisbec@gmail.com
Cc: paulus@samba.org
Cc: srivatsa.bhat@linux.vnet.ibm.com
Cc: svaidy@linux.vnet.ibm.com
Cc: peterz@infradead.org
Cc: benh@kernel.crashing.org
Cc: rafael.j.wysocki@intel.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20140207080632.17187.80532.stgit@preeti.in.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
clockevent devices in periodic mode are not updated when the frequency
of the device changes. Issue a dev->set_mode() callback which forces
the device to reevaluate the timer settings.
Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Michal Simek <michal.simek@xilinx.com>
Link: http://lkml.kernel.org/r/1391466877-28908-3-git-send-email-soren.brinkmann@xilinx.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We can identify the broadcast device in the core and serialize all
callers including interrupts on a different CPU against the update.
Also, disabling interrupts is moved into the core allowing callers to
leave interrutps enabled when calling clockevents_update_freq().
Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Soeren Brinkmann <soren.brinkmann@xilinx.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Michal Simek <michal.simek@xilinx.com>
Link: http://lkml.kernel.org/r/1391466877-28908-2-git-send-email-soren.brinkmann@xilinx.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
For better use of CPU idle time, allow the scheduler to select the CPU
on which the CMOS clock sync work would be scheduled. This improves
idle residency time and conserver power.
This functionality is enabled when CONFIG_WQ_POWER_EFFICIENT is selected.
Signed-off-by: Shaibal Dutta <shaibal.dutta@broadcom.com>
[zoran.markovic@linaro.org: Added commit message. Aligned code.]
Signed-off-by: Zoran Markovic <zoran.markovic@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/1391195904-12497-1-git-send-email-zoran.markovic@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When compiling for the IA-64 ski emulator, HZ is set to 32 because the
emulation is slow and we don't want to waste too many cycles processing
timers. Alpha also has an option to set HZ to 32.
This causes integer underflow in
kernel/time/jiffies.c:
kernel/time/jiffies.c:66:2: warning: large integer implicitly truncated to unsigned type [-Woverflow]
.mult = NSEC_PER_JIFFY << JIFFIES_SHIFT, /* details above */
^
This patch reduces the JIFFIES_SHIFT value to avoid the overflow.
Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1401241639100.23871@file01.intranet.prod.int.rdu2.redhat.com
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This changes 'do_execve()' to get the executable name as a 'struct
filename', and to free it when it is done. This is what the normal
users want, and it simplifies and streamlines their error handling.
The controlled lifetime of the executable name also fixes a
use-after-free problem with the trace_sched_process_exec tracepoint: the
lifetime of the passed-in string for kernel users was not at all
obvious, and the user-mode helper code used UMH_WAIT_EXEC to serialize
the pathname allocation lifetime with the execve() having finished,
which in turn meant that the trace point that happened after
mm_release() of the old process VM ended up using already free'd memory.
To solve the kernel string lifetime issue, this simply introduces
"getname_kernel()" that works like the normal user-space getname()
function, except with the source coming from kernel memory.
As Oleg points out, this also means that we could drop the tcomm[] array
from 'struct linux_binprm', since the pathname lifetime now covers
setup_new_exec(). That would be a separate cleanup.
Reported-by: Igor Zhbanov <i.zhbanov@samsung.com>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The generic_chip.c uses interfaces from irq_domain.c which is
controlled by the IRQ_DOMAIN config option, but there is no Kconfig
dependency so the build can fail:
linux/kernel/irq/generic-chip.c:400:11: error:
'irq_domain_xlate_onetwocell' undeclared here (not in a function)
Select IRQ_DOMAIN when GENERIC_IRQ_CHIP is selected.
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Link: http://lkml.kernel.org/r/1391129410-54548-2-git-send-email-nitin.a.kamble@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org # 3.11+
arch/arm/mach-tegra/pm.c, kernel/power/console.c and mm/vmpressure.c
were somehow getting slab.h indirectly through cgroup.h which in turn
was getting it indirectly through xattr.h. A scheduled cgroup change
drops xattr.h inclusion from cgroup.h and breaks compilation of these
three files. Add explicit slab.h includes to the three files.
A pending cgroup patch depends on this change and it'd be great if
this can be routed through cgroup/for-3.14-fixes branch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Stephen Warren <swarren@wwwdotorg.org>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: linux-tegra@vger.kernel.org
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: cgroups@vger.kernel.org
In compat_sys_old_getrlimit() we pass a kernel pointer to
sys_old_getrlimit() inside a set_fs() bracket. This is okay, so we
can safely cast the affected pointer to __user.
In compat_clock_nanosleep_restart(), the variable "rmtp" holds a user
pointer. Annotate it as such.
Both of these warnings are ancient, but were reported by Fengguang
Wu's test system due to other changes.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Toyo Abe <toyoa@mvista.com>
Link: http://lkml.kernel.org/n/tip-507h7cq5e45eg6ygtykon3bf@git.kernel.org
We have two APIs for compatiblity timespec/val, with confusingly
similar names. compat_(get|put)_time(val|spec) *do* handle the case
where COMPAT_USE_64BIT_TIME is set, whereas
(get|put)_compat_time(val|spec) do not. This is an accident waiting
to happen.
Clean it up by favoring the full-service version; the limited version
is replaced with double-underscore versions static to kernel/compat.c.
A common pattern is to convert a struct timespec to kernel format in
an allocation on the user stack. Unfortunately it is open-coded in
several places. Since this allocation isn't actually needed if
COMPAT_USE_64BIT_TIME is true (since user format == kernel format)
encapsulate that whole pattern into the function
compat_convert_timespec(). An equivalent function should be written
for struct timeval if it is needed in the future.
Finally, get rid of compat_(get|put)_timeval_convert(): each was only
used once, and the latter was not even doing what the function said
(no conversion actually was being done.) Moving the conversion into
compat_sys_settimeofday() itself makes the code much more similar to
sys_settimeofday() itself.
v3: Remove unused compat_convert_timeval().
v2: Drop bogus "const" in the destination argument for
compat_convert_time*().
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Hans Verkuil <hans.verkuil@cisco.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Pull timer/dynticks updates from Ingo Molnar:
"This tree contains misc dynticks updates: a fix and three cleanups"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/nohz: Fix overflow error in scheduler_tick_max_deferment()
nohz_full: fix code style issue of tick_nohz_full_stop_tick
nohz: Get timekeeping max deferment outside jiffies_lock
tick: Rename tick_check_idle() to tick_irq_enter()
Pull core debug changes from Ingo Molnar:
"This contains mostly kernel debugging related updates:
- make hung_task detection more configurable to distros
- add final bits for x86 UV NMI debugging, with related KGDB changes
- update the mailing-list of MAINTAINERS entries I'm involved with"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hung_task: Display every hung task warning
sysctl: Add neg_one as a standard constraint
x86/uv/nmi, kgdb/kdb: Fix UV NMI handler when KDB not configured
x86/uv/nmi: Fix Sparse warnings
kgdb/kdb: Fix no KDB config problem
MAINTAINERS: Restore "L: linux-kernel@vger.kernel.org" entries
After commit 9a46ad6d6d ("smp: make smp_call_function_many() use logic
similar to smp_call_function_single()"), cfd->cpumask is accessed only
in smp_call_function_many(). So there is no more need to copy it into
cfd->cpumask_ipi before putting csd into the list. The cpumask_ipi
field is obsolete and can be removed.
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Wang YanQing <udknight@gmail.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: Shaohua Li <shli@fusionio.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make smp_call_function_single and friends more efficient by using a
lockless list.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull core block IO changes from Jens Axboe:
"The major piece in here is the immutable bio_ve series from Kent, the
rest is fairly minor. It was supposed to go in last round, but
various issues pushed it to this release instead. The pull request
contains:
- Various smaller blk-mq fixes from different folks. Nothing major
here, just minor fixes and cleanups.
- Fix for a memory leak in the error path in the block ioctl code
from Christian Engelmayer.
- Header export fix from CaiZhiyong.
- Finally the immutable biovec changes from Kent Overstreet. This
enables some nice future work on making arbitrarily sized bios
possible, and splitting more efficient. Related fixes to immutable
bio_vecs:
- dm-cache immutable fixup from Mike Snitzer.
- btrfs immutable fixup from Muthu Kumar.
- bio-integrity fix from Nic Bellinger, which is also going to stable"
* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
xtensa: fixup simdisk driver to work with immutable bio_vecs
block/blk-mq-cpu.c: use hotcpu_notifier()
blk-mq: for_each_* macro correctness
block: Fix memory leak in rw_copy_check_uvector() handling
bio-integrity: Fix bio_integrity_verify segment start bug
block: remove unrelated header files and export symbol
blk-mq: uses page->list incorrectly
blk-mq: use __smp_call_function_single directly
btrfs: fix missing increment of bi_remaining
Revert "block: Warn and free bio if bi_end_io is not set"
block: Warn and free bio if bi_end_io is not set
blk-mq: fix initializing request's start time
block: blk-mq: don't export blk_mq_free_queue()
block: blk-mq: make blk_sync_queue support mq
block: blk-mq: support draining mq queue
dm cache: increment bi_remaining when bi_end_io is restored
block: fixup for generic bio chaining
block: Really silence spurious compiler warnings
block: Silence spurious compiler warnings
block: Kill bio_pair_split()
...
Pull vfs updates from Al Viro:
"Assorted stuff; the biggest pile here is Christoph's ACL series. Plus
assorted cleanups and fixes all over the place...
There will be another pile later this week"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits)
__dentry_path() fixes
vfs: Remove second variable named error in __dentry_path
vfs: Is mounted should be testing mnt_ns for NULL or error.
Fix race when checking i_size on direct i/o read
hfsplus: remove can_set_xattr
nfsd: use get_acl and ->set_acl
fs: remove generic_acl
nfs: use generic posix ACL infrastructure for v3 Posix ACLs
gfs2: use generic posix ACL infrastructure
jfs: use generic posix ACL infrastructure
xfs: use generic posix ACL infrastructure
reiserfs: use generic posix ACL infrastructure
ocfs2: use generic posix ACL infrastructure
jffs2: use generic posix ACL infrastructure
hfsplus: use generic posix ACL infrastructure
f2fs: use generic posix ACL infrastructure
ext2/3/4: use generic posix ACL infrastructure
btrfs: use generic posix ACL infrastructure
fs: make posix_acl_create more useful
fs: make posix_acl_chmod more useful
...
Cleanup suggested by Mel Gorman. Now the code contains some more
hints on what statistics go where.
Suggested-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Chegu Vinod <chegu_vinod@hp.com>
Link: http://lkml.kernel.org/r/1390860228-21539-10-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We track both the node of the memory after a NUMA fault, and the node
of the CPU on which the fault happened. Rename the local variables in
task_numa_fault to make things more explicit.
Suggested-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Chegu Vinod <chegu_vinod@hp.com>
Link: http://lkml.kernel.org/r/1390860228-21539-9-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The current code in task_numa_placement calculates the difference
between the old and the new value, but also temporarily stores half
of the old value in the per-process variables.
The NUMA balancing code looks at those per-process variables, and
having other tasks temporarily see halved statistics could lead to
unwanted numa migrations. This can be avoided by doing all the math
in local variables.
This change also simplifies the code a little.
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Chegu Vinod <chegu_vinod@hp.com>
Link: http://lkml.kernel.org/r/1390860228-21539-8-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>