Commit graph

10103 commits

Author SHA1 Message Date
Linus Torvalds
44032e6316 module: reduce stack usage for each_symbol()
And now that I'm looking at that call-chain (to see if it would make sense
to use some other more specific lock - doesn't look like it: all the
readers are using RCU and this is the only writer), I also give you this
trivial one-liner. It changes each_symbol() to not put that constant array
on the stack, resulting in changing

        movq    $C.388.31095, %rsi      #, tmp85
        subq    $376, %rsp      #,
        movq    %rdi, %rbx      # fn, fn
        leaq    -208(%rbp), %rdi        #, tmp84
        movq    %rbx, %rdx      # fn,
        rep movsl
        xorl    %esi, %esi      #
        leaq    -208(%rbp), %rdi        #, tmp87
        movq    %r12, %rcx      # data,
        call    each_symbol_in_section.clone.0  #

into

        xorl    %esi, %esi      #
        subq    $216, %rsp      #,
        movq    %rdi, %rbx      # fn, fn
        movq    $arr.31078, %rdi        #,
        call    each_symbol_in_section.clone.0  #

which is not so much about being obviously shorter and simpler because we
don't unnecessarily copy that constant array around onto the stack, but
also about having a much smaller stack footprint (376 vs 216 bytes - see
the update of 'rsp').

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-08-05 12:59:06 +09:30
Rusty Russell
22e268ebec module: refactor load_module part 5
1) Extract out the relocation loop into apply_relocations
2) Extract license and version checks into check_module_license_and_versions
3) Extract icache flushing into flush_module_icache
4) Move __obsparm warning into find_module_sections
5) Move license setting into check_modinfo.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-08-05 12:59:05 +09:30
Rusty Russell
9f85a4bbb1 module: refactor load_module part 4
Allocate references inside module_unload_init(), clean up inside
module_unload_free().

This version fixed to do allocation before __this_cpu_write, thanks to
bug reports from linux-next from Dave Young <hidave.darkstar@gmail.com>
and Stephen Rothwell <sfr@canb.auug.org.au>.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-08-05 12:59:05 +09:30
Rusty Russell
40dd2560ec module: refactor load_module part 3
Extract out the allocation and copying in from userspace, and the
first set of modinfo checks.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-08-05 12:59:04 +09:30
Linus Torvalds
65b8a9b4d5 module: refactor load_module part 2
Here's a second one. It's slightly less trivial - since we now have error
cases - and equally untested so it may well be totally broken. But it also
cleans up a bit more, and avoids one of the goto targets, because the
"move_module()" helper now does both allocations or none.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-08-05 12:59:03 +09:30
Linus Torvalds
f91a13bb99 module: refactor load_module
I'd start from the trivial stuff. There's a fair amount of straight-line
code that just makes the function hard to read just because you have to
page up and down so far. Some of it is trivial to just create a helper
function for.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-08-05 12:59:02 +09:30
Eric Dumazet
2409e74278 module: module_unload_init() cleanup
No need to clear mod->refptr in module_unload_init(), since
alloc_percpu() already clears allocated chunks.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (removed unused var)
2010-08-05 12:59:02 +09:30
Linus Torvalds
3cfc2c42c1 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (48 commits)
  Documentation: update broken web addresses.
  fix comment typo "choosed" -> "chosen"
  hostap:hostap_hw.c Fix typo in comment
  Fix spelling contorller -> controller in comments
  Kconfig.debug: FAIL_IO_TIMEOUT: typo Faul -> Fault
  fs/Kconfig: Fix typo Userpace -> Userspace
  Removing dead MACH_U300_BS26
  drivers/infiniband: Remove unnecessary casts of private_data
  fs/ocfs2: Remove unnecessary casts of private_data
  libfc: use ARRAY_SIZE
  scsi: bfa: use ARRAY_SIZE
  drm: i915: use ARRAY_SIZE
  drm: drm_edid: use ARRAY_SIZE
  synclink: use ARRAY_SIZE
  block: cciss: use ARRAY_SIZE
  comment typo fixes: charater => character
  fix comment typos concerning "challenge"
  arm: plat-spear: fix typo in kerneldoc
  reiserfs: typo comment fix
  update email address
  ...
2010-08-04 15:31:02 -07:00
Linus Torvalds
b7c8e55db7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (39 commits)
  random: Reorder struct entropy_store to remove padding on 64bits
  padata: update API documentation
  padata: Remove padata_get_cpumask
  crypto: pcrypt - Update pcrypt cpumask according to the padata cpumask notifier
  crypto: pcrypt - Rename pcrypt_instance
  padata: Pass the padata cpumasks to the cpumask_change_notifier chain
  padata: Rearrange set_cpumask functions
  padata: Rename padata_alloc functions
  crypto: pcrypt - Dont calulate a callback cpu on empty callback cpumask
  padata: Check for valid cpumasks
  padata: Allocate cpumask dependend recources in any case
  padata: Fix cpu index counting
  crypto: geode_aes - Convert pci_table entries to PCI_VDEVICE (if PCI_ANY_ID is used)
  pcrypt: Added sysfs interface to pcrypt
  padata: Added sysfs primitives to padata subsystem
  padata: Make two separate cpumasks
  padata: update documentation
  padata: simplify serialization mechanism
  padata: make padata_do_parallel to return zero on success
  padata: Handle empty padata cpumasks
  ...
2010-08-04 15:23:14 -07:00
Linus Torvalds
6ba74014c1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1443 commits)
  phy/marvell: add 88ec048 support
  igb: Program MDICNFG register prior to PHY init
  e1000e: correct MAC-PHY interconnect register offset for 82579
  hso: Add new product ID
  can: Add driver for esd CAN-USB/2 device
  l2tp: fix export of header file for userspace
  can-raw: Fix skb_orphan_try handling
  Revert "net: remove zap_completion_queue"
  net: cleanup inclusion
  phy/marvell: add 88e1121 interface mode support
  u32: negative offset fix
  net: Fix a typo from "dev" to "ndev"
  igb: Use irq_synchronize per vector when using MSI-X
  ixgbevf: fix null pointer dereference due to filter being set for VLAN 0
  e1000e: Fix irq_synchronize in MSI-X case
  e1000e: register pm_qos request on hardware activation
  ip_fragment: fix subtracting PPPOE_SES_HLEN from mtu twice
  net: Add getsockopt support for TCP thin-streams
  cxgb4: update driver version
  cxgb4: add new PCI IDs
  ...

Manually fix up conflicts in:
 - drivers/net/e1000e/netdev.c: due to pm_qos registration
   infrastructure changes
 - drivers/net/phy/marvell.c: conflict between adding 88ec048 support
   and cleaning up the IDs
 - drivers/net/wireless/ipw2x00/ipw2100.c: trivial ipw2100_pm_qos_req
   conflict (registration change vs marking it static)
2010-08-04 11:47:58 -07:00
David Howells
694f690d27 CRED: Fix RCU warning due to previous patch fixing __task_cred()'s checks
Commit 8f92054e7c ("CRED: Fix __task_cred()'s lockdep check and banner
comment") fixed the lockdep checks on __task_cred().  This has shown up
a place in the signalling code where a lock should be held - namely that
check_kill_permission() requires its callers to hold the RCU lock.

Fix group_send_sig_info() to get the RCU read lock around its call to
check_kill_permission().

Without this patch, the following warning can occur:

  ===================================================
  [ INFO: suspicious rcu_dereference_check() usage. ]
  ---------------------------------------------------
  kernel/signal.c:660 invoked rcu_dereference_check() without protection!
  ...

Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-04 11:17:10 -07:00
Linus Torvalds
f46e9913fa Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM / Runtime: Add runtime PM statistics (v3)
  PM / Runtime: Make runtime_status attribute not debug-only (v. 2)
  PM: Do not use dynamically allocated objects in pm_wakeup_event()
  PM / Suspend: Fix ordering of calls in suspend error paths
  PM / Hibernate: Fix snapshot error code path
  PM / Hibernate: Fix hibernation_platform_enter()
  pm_qos: Get rid of the allocation in pm_qos_add_request()
  pm_qos: Reimplement using plists
  plist: Add plist_last
  PM: Make it possible to avoid races between wakeup and system sleep
  PNPACPI: Add support for remote wakeup
  PM: describe kernel policy regarding wakeup defaults (v. 2)
  PM / Hibernate: Fix typos in comments in kernel/power/swap.c
2010-08-04 11:14:36 -07:00
Srikar Dronamraju
9da79ab83e tracing/kprobes: unregister_trace_probe needs to be called under mutex
Comment in unregister_trace_probe() says probe_lock will be held when it
gets called. However there is a case where it might called without the
probe_lock being held. Also since we are traversing the probe_list and
deleting an element from the probe_list, probe_lock should be held.

This was first pointed in uprobes traceevent review by Frederic
Weisbecker here.  (http://lkml.org/lkml/2010/5/12/106)

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20100630084548.GA10325@linux.vnet.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-08-04 12:41:23 -03:00
Jiri Kosina
d790d4d583 Merge branch 'master' into for-next 2010-08-04 15:14:38 +02:00
Patrick Pannuto
5e7f5a178b timer: Added usleep_range timer
usleep_range is a finer precision implementations of msleep
and is designed to be a drop-in replacement for udelay where
a precise sleep / busy-wait is unnecessary.

Since an easy interface to hrtimers could lead to an undesired
proliferation of interrupts, we provide only a "range" API,
forcing the caller to think about an acceptable tolerance on
both ends and hopefully avoiding introducing another interrupt.

INTRO

As discussed here ( http://lkml.org/lkml/2007/8/3/250 ), msleep(1) is not
precise enough for many drivers (yes, sleep precision is an unfair notion,
but consistently sleeping for ~an order of magnitude greater than requested
is worth fixing). This patch adds a usleep API so that udelay does not have
to be used. Obviously not every udelay can be replaced (those in atomic
contexts or being used for simple bitbanging come to mind), but there are
many, many examples of

mydriver_write(...)
/* Wait for hardware to latch */
udelay(100)

in various drivers where a busy-wait loop is neither beneficial nor
necessary, but msleep simply does not provide enough precision and people
are using a busy-wait loop instead.

CONCERNS FROM THE RFC

Why is udelay a problem / necessary? Most callers of udelay are in device/
driver initialization code, which is serial...

	As I see it, there is only benefit to sleeping over a delay; the
	notion of "refactoring" areas that use udelay was presented, but
	I see usleep as the refactoring. Consider i2c, if the bus is busy,
	you need to wait a bit (say 100us) before trying again, your
	current options are:

		* udelay(100)
		* msleep(1) <-- As noted above, actually as high as ~20ms
				on some platforms, so not really an option
		* Manually set up an hrtimer to try again in 100us (which
		  is what usleep does anyway...)

	People choose the udelay route because it is EASY; we need to
	provide a better easy route.

	Device / driver / boot code is *currently* serial, but every few
	months someone makes noise about parallelizing boot, and IMHO, a
	little forward-thinking now is one less thing to worry about
	if/when that ever happens

udelay's could be preempted

	Sure, but if udelay plans on looping 1000 times, and it gets
	preempted on loop 200, whenever it's scheduled again, it is
	going to do the next 800 loops.

Is the interruptible case needed?

	Probably not, but I see usleep as a very logical parallel to msleep,
	so it made sense to include the "full" API. Processors are getting
	faster (albeit not as quickly as they are becoming more parallel),
	so if someone wanted to be interruptible for a few usecs, why not
	let them? If this is a contentious point, I'm happy to remove it.

OTHER THOUGHTS

I believe there is also value in exposing the usleep_range option; it gives
the scheduler a lot more flexibility and allows the programmer to express
his intent much more clearly; it's something I would hope future driver
writers will take advantage of.

To get the results in the NUMBERS section below, I literally s/udelay/usleep
the kernel tree; I had to go in and undo the changes to the USB drivers, but
everything else booted successfully; I find that extremely telling in and
of itself -- many people are using a delay API where a sleep will suit them
just fine.

SOME ATTEMPTS AT NUMBERS

It turns out that calculating quantifiable benefit on this is challenging,
so instead I will simply present the current state of things, and I hope
this to be sufficient:

How many udelay calls are there in 2.6.35-rc5?

	udealy(ARG) >=	| COUNT
	1000		| 319
	500		| 414
	100		| 1146
	20		| 1832

I am working on Android, so that is my focus for this. The following table
is a modified usleep that simply printk's the amount of time requested to
sleep; these tests were run on a kernel with udelay >= 20 --> usleep

"boot" is power-on to lock screen
"power collapse" is when the power button is pushed and the device suspends
"resume" is when the power button is pushed and the lock screen is displayed
         (no touchscreen events or anything, just turning on the display)
"use device" is from the unlock swipe to clicking around a bit; there is no
	sd card in this phone, so fail loading music, video, camera

	ACTION		| TOTAL NUMBER OF USLEEP CALLS	| NET TIME (us)
	boot		| 22				| 1250
	power-collapse	| 9				| 1200
	resume		| 5				| 500
	use device	| 59				| 7700

The most interesting category to me is the "use device" field; 7700us of
busy-wait time that could be put towards better responsiveness, or at the
least less power usage.

Signed-off-by: Patrick Pannuto <ppannuto@codeaurora.org>
Cc: apw@canonical.com
Cc: corbet@lwn.net
Cc: arjan@linux.intel.com
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-08-04 11:00:45 +02:00
Thomas Gleixner
e1b004c3ef Revert "timer: Added usleep[_range] timer"
This reverts commit 22b8f15c2f to merge
an advanced version.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-08-04 10:53:00 +02:00
Benjamin Herrenschmidt
412a4ac5e9 Merge commit 'gcl/next' into next 2010-08-04 10:26:03 +10:00
Jesse Barnes
8cadd2831b timer: add on-stack deferrable timer interfaces
In some cases (for instance with kernel threads) it may be desireable to
use on-stack deferrable timers to get their power saving benefits.  Add
interfaces to support this for the IPS driver.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
2010-08-03 09:48:45 -04:00
Arjan van de Ven
af5ab277de clockevents: Remove the per cpu tick skew
Historically, Linux has tried to make the regular timer tick on the
various CPUs not happen at the same time, to avoid contention on
xtime_lock.

Nowadays, with the tickless kernel, this contention no longer happens
since time keeping and updating are done differently. In addition,
this skew is actually hurting power consumption in a measurable way on
many-core systems.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <20100727210210.58d3118c@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-08-02 21:45:58 +02:00
Ingo Molnar
3772b73472 Merge commit 'v2.6.35' into perf/core
Conflicts:
	tools/perf/Makefile
	tools/perf/util/hist.c

Merge reason: Resolve the conflicts and update to latest upstream.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-02 08:31:54 +02:00
Frederic Weisbecker
669336e4cf perf: Use tracepoint_synchronize_unregister() to flush any pending tracepoint call
We use synchronize_sched() to ensure a tracepoint won't be called
while/after we release the perf buffers it references.

But the tracepoint API has its own API for that:
tracepoint_synchronize_unregister(). Use it instead as it's
self-explanatory and eases maintainance.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
2010-08-02 01:30:56 +02:00
Suresh Siddha
6ee0578b4d workqueue: mark init_workqueues() as early_initcall()
Mark init_workqueues() as early_initcall() and thus it will be initialized
before smp bringup. init_workqueues() registers for the hotcpu notifier
and thus it should cope with the processors that are brought online after
the workqueues are initialized.

x86 smp bringup code uses workqueues and uses a workaround for the
cold boot process (as the workqueues are initialized post smp_init()).
Marking init_workqueues() as early_initcall() will pave the way for
cleaning up this code.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
2010-08-01 13:05:29 +02:00
Tejun Heo
098849516d workqueue: explain for_each_*cwq_cpu() iterators
for_each_*cwq_cpu() are similar to regular CPU iterators except that
it also considers the pseudo CPU number used for unbound workqueues.
Explain them.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
2010-08-01 11:50:12 +02:00
Steffen Klassert
0500e9b3f1 padata: Remove padata_get_cpumask
A function that copies the padata cpumasks to a user buffer
is a bit error prone. The cpumask can change any time so we
can't be sure to have the right cpumask when using this function.
A user who is interested in the padata cpumasks should register
to the padata cpumask notifier chain instead. Users of
padata_get_cpumask are already updated, so we can remove it.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-31 19:53:06 +08:00
Steffen Klassert
c635696c7c padata: Pass the padata cpumasks to the cpumask_change_notifier chain
We pass a pointer to the new padata cpumasks to the cpumask_change_notifier
chain. So users can access the cpumasks without the need of an extra
padata_get_cpumask function.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-31 19:53:05 +08:00
Steffen Klassert
65ff577e6b padata: Rearrange set_cpumask functions
padata_set_cpumask needs to be protected by a lock. We make
__padata_set_cpumasks unlocked and static. So this function
can be used by the exported and locked padata_set_cpumask and
padata_set_cpumasks functions.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-31 19:53:04 +08:00
Steffen Klassert
e6cc117076 padata: Rename padata_alloc functions
We rename padata_alloc to padata_alloc_possible because this
function allocates a padata_instance and uses the cpu_possible
mask for parallel and serial workers. Also we rename __padata_alloc
to padata_alloc to avoid to export underlined functions. Underlined
functions are considered to be private to padata. Users are updated
accordingly.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-31 19:53:04 +08:00
David Howells
de09a9771a CRED: Fix get_task_cred() and task_state() to not resurrect dead credentials
It's possible for get_task_cred() as it currently stands to 'corrupt' a set of
credentials by incrementing their usage count after their replacement by the
task being accessed.

What happens is that get_task_cred() can race with commit_creds():

	TASK_1			TASK_2			RCU_CLEANER
	-->get_task_cred(TASK_2)
	rcu_read_lock()
	__cred = __task_cred(TASK_2)
				-->commit_creds()
				old_cred = TASK_2->real_cred
				TASK_2->real_cred = ...
				put_cred(old_cred)
				  call_rcu(old_cred)
		[__cred->usage == 0]
	get_cred(__cred)
		[__cred->usage == 1]
	rcu_read_unlock()
							-->put_cred_rcu()
							[__cred->usage == 1]
							panic()

However, since a tasks credentials are generally not changed very often, we can
reasonably make use of a loop involving reading the creds pointer and using
atomic_inc_not_zero() to attempt to increment it if it hasn't already hit zero.

If successful, we can safely return the credentials in the knowledge that, even
if the task we're accessing has released them, they haven't gone to the RCU
cleanup code.

We then change task_state() in procfs to use get_task_cred() rather than
calling get_cred() on the result of __task_cred(), as that suffers from the
same problem.

Without this change, a BUG_ON in __put_cred() or in put_cred_rcu() can be
tripped when it is noticed that the usage count is not zero as it ought to be,
for example:

kernel BUG at kernel/cred.c:168!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
CPU 0
Pid: 2436, comm: master Not tainted 2.6.33.3-85.fc13.x86_64 #1 0HR330/OptiPlex
745
RIP: 0010:[<ffffffff81069881>]  [<ffffffff81069881>] __put_cred+0xc/0x45
RSP: 0018:ffff88019e7e9eb8  EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff880161514480 RCX: 00000000ffffffff
RDX: 00000000ffffffff RSI: ffff880140c690c0 RDI: ffff880140c690c0
RBP: ffff88019e7e9eb8 R08: 00000000000000d0 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000040 R12: ffff880140c690c0
R13: ffff88019e77aea0 R14: 00007fff336b0a5c R15: 0000000000000001
FS:  00007f12f50d97c0(0000) GS:ffff880007400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8f461bc000 CR3: 00000001b26ce000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process master (pid: 2436, threadinfo ffff88019e7e8000, task ffff88019e77aea0)
Stack:
 ffff88019e7e9ec8 ffffffff810698cd ffff88019e7e9ef8 ffffffff81069b45
<0> ffff880161514180 ffff880161514480 ffff880161514180 0000000000000000
<0> ffff88019e7e9f28 ffffffff8106aace 0000000000000001 0000000000000246
Call Trace:
 [<ffffffff810698cd>] put_cred+0x13/0x15
 [<ffffffff81069b45>] commit_creds+0x16b/0x175
 [<ffffffff8106aace>] set_current_groups+0x47/0x4e
 [<ffffffff8106ac89>] sys_setgroups+0xf6/0x105
 [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
Code: 48 8d 71 ff e8 7e 4e 15 00 85 c0 78 0b 8b 75 ec 48 89 df e8 ef 4a 15 00
48 83 c4 18 5b c9 c3 55 8b 07 8b 07 48 89 e5 85 c0 74 04 <0f> 0b eb fe 65 48 8b
04 25 00 cc 00 00 48 3b b8 58 04 00 00 75
RIP  [<ffffffff81069881>] __put_cred+0xc/0x45
 RSP <ffff88019e7e9eb8>
---[ end trace df391256a100ebdd ]---

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-29 15:16:17 -07:00
Ian Campbell
685fd0b4ea irq: Add new IRQ flag IRQF_NO_SUSPEND
A small number of users of IRQF_TIMER are using it for the implied no
suspend behaviour on interrupts which are not timer interrupts.

Therefore add a new IRQF_NO_SUSPEND flag, rename IRQF_TIMER to
__IRQF_TIMER and redefine IRQF_TIMER in terms of these new flags.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: xen-devel@lists.xensource.com
Cc: linux-input@vger.kernel.org
Cc: linuxppc-dev@ozlabs.org
Cc: devicetree-discuss@lists.ozlabs.org
LKML-Reference: <1280398595-29708-1-git-send-email-ian.campbell@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-29 13:24:57 +02:00
Thomas Gleixner
157b1a2385 kgdb: Do not access xtime directly
The xtime cleanup missed the kgdb access to xtime. Fix it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-29 10:29:39 +02:00
Sridhar Samudrala
d7926ee38f cgroups: Add an API to attach a task to current task's cgroup
Add a new kernel API to attach a task to current task's cgroup
in all the active hierarchies.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paul Menage <menage@google.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
2010-07-28 15:45:12 +03:00
Jason Baron
b82bab4bbe dynamic debug: move ddebug_remove_module() down into free_module()
The command

	echo "file ec.c +p" >/sys/kernel/debug/dynamic_debug/control

causes an oops.

Move the call to ddebug_remove_module() down into free_module().  In this
way it should be called from all error paths.  Currently, we are missing
the remove if the module init routine fails.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Reported-by: Thomas Renninger <trenn@suse.de>
Tested-by: Thomas Renninger <trenn@suse.de>
Cc: <stable@kernel.org>		[2.6.32+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-27 14:32:06 -07:00
John Stultz
852db46d55 clocksource: Add __clocksource_updatefreq_hz/khz methods
To properly handle clocksources that change frequencies
at the clocksource->enable() point, this patch adds
a method that will update the clocksource's mult/shift and
max_idle_ns values.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-12-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27 12:40:55 +02:00
John Stultz
0fb86b0629 timekeeping: Make xtime and wall_to_monotonic static
This patch makes xtime and wall_to_monotonic static, as planned in
Documentation/feature-removal-schedule.txt. This will allow for
further cleanups to the timekeeping core.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-10-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27 12:40:55 +02:00
John Stultz
8ab4351a4c hrtimer: Cleanup direct access to wall_to_monotonic
Provides an accessor function to replace hrtimer.c's
direct access of wall_to_monotonic.

This will allow wall_to_monotonic to be made static as
planned in Documentation/feature-removal-schedule.txt

Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-9-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27 12:40:55 +02:00
John Stultz
7615856ebf timkeeping: Fix update_vsyscall to provide wall_to_monotonic offset
update_vsyscall() did not provide the wall_to_monotoinc offset,
so arch specific implementations tend to reference wall_to_monotonic
directly. This limits future cleanups in the timekeeping core, so
this patch fixes the update_vsyscall interface to provide
wall_to_monotonic, allowing wall_to_monotonic to be made static
as planned in Documentation/feature-removal-schedule.txt

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Tony Luck <tony.luck@intel.com>
LKML-Reference: <1279068988-21864-7-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27 12:40:54 +02:00
John Stultz
592913ecb8 time: Kill off CONFIG_GENERIC_TIME
Now that all arches have been converted over to use generic time via
clocksources or arch_gettimeoffset(), we can remove the GENERIC_TIME
config option and simplify the generic code.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-4-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27 12:40:54 +02:00
John Stultz
ce3bf7ab22 time: Implement timespec_add
After accidentally misusing timespec_add_safe, I wanted to make sure
we don't accidently trip over that issue again, so I created a simple
timespec_add() function which we can use to replace the instances
of timespec_add_safe() that don't want the overflow detection.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-3-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27 12:40:53 +02:00
Steffen Klassert
7424713b83 padata: Check for valid cpumasks
Now that we allow to change the cpumasks from userspace, we have
to check for valid cpumasks in padata_do_parallel. This patch adds
the necessary check. This fixes a division by zero crash if the
parallel cpumask contains no active cpu.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-26 14:13:58 +08:00
Steffen Klassert
b89661dff5 padata: Allocate cpumask dependend recources in any case
The cpumask separation work assumes the cpumask dependend recources
present regardless of valid or invalid cpumasks. With this patch
we allocate the cpumask dependend recources in any case. This fixes
two NULL pointer dereference crashes in padata_replace and in
padata_get_cpumask.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-26 14:13:58 +08:00
Steffen Klassert
fad3a906d3 padata: Fix cpu index counting
The counting of the cpu index got lost with a recent commit.
This patch restores it. This fixes a hang of the parallel worker
threads on cpu hotplug.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-26 14:13:57 +08:00
Andrey Vagin
2b08de0073 posix_timer: Move copy_to_user(created_timer_id) down in timer_create()
According to Oleg Nesterov:
We can move copy_to_user(created_timer_id) down after
"if (timer_event_spec)" block too. (but before CLOCK_DISPATCH(),
of course).

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-23 15:08:12 +02:00
Patrick Pannuto
22b8f15c2f timer: Added usleep[_range] timer
usleep[_range] are finer precision implementations of msleep
and are designed to be drop-in replacements for udelay where
a precise sleep / busy-wait is unnecessary. They also allow
an easy interface to specify slack when a precise (ish)
wakeup is unnecessary to help minimize wakeups

Signed-off-by: Patrick Pannuto <ppannuto@codeaurora.org>
Cc: akinobu.mita@gmail.com
Cc: sboyd@codeaurora.org
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <4C44CDD2.1070708@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-23 15:08:12 +02:00
J. Bruce Fields
866e26115c timers: Document meaning of deferrable timer
Steal some text from 6e453a6751 "Add support for deferrable timers".  A
reader shouldn't have to dig through the git logs for the basic
description of a deferrable timer.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: johnstul@us.ibm.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-23 15:08:12 +02:00
Tejun Heo
181a51f6e0 slow-work: kill it
slow-work doesn't have any user left.  Kill it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Howells <dhowells@redhat.com>
2010-07-23 13:14:49 +02:00
Ingo Molnar
3a01736e70 Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core 2010-07-23 09:10:29 +02:00
Tejun Heo
e120153ddf workqueue: fix how cpu number is stored in work->data
Once a work starts execution, its data contains the cpu number it was
on instead of pointing to cwq.  This is added by commit 7a22ad75
(workqueue: carry cpu number in work data once execution starts) to
reliably determine the work was last on even if the workqueue itself
was destroyed inbetween.

Whether data points to a cwq or contains a cpu number was
distinguished by comparing the value against PAGE_OFFSET.  The
assumption was that a cpu number should be below PAGE_OFFSET while a
pointer to cwq should be above it.  However, on architectures which
use separate address spaces for user and kernel spaces, this doesn't
hold as PAGE_OFFSET is zero.

Fix it by using an explicit flag, WORK_STRUCT_CWQ, to mark what the
data field contains.  If the flag is set, it's pointing to a cwq;
otherwise, it contains a cpu number.

Reported on s390 and microblaze during linux-next testing.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Sachin Sant <sachinp@in.ibm.com>
Reported-by: Michal Simek <michal.simek@petalogix.com>
Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Tested-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Tested-by: Michal Simek <monstr@monstr.eu>
2010-07-22 22:39:22 +02:00
Dan Carpenter
24a461d537 trace: strlen() return doesn't account for the NULL
We need to add one to the strlen() return because of the NULL
character.  The type->name here generally comes from the kernel and I
don't think any of them come close to being MAX_TRACER_SIZE (100)
characters long so this is basically a cleanup.

Signed-off-by: Dan Carpenter <error27@gmail.com>
LKML-Reference: <20100710100644.GV19184@bicker>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-07-22 14:56:41 -04:00
Jason Wessel
edd63cb6b9 sysrq,kdb: Use __handle_sysrq() for kdb's sysrq function
The kdb code should not toggle the sysrq state in case an end user
wants to try and resume the normal kernel execution.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2010-07-21 19:27:07 -05:00
Jason Wessel
b0679c63db debug_core,kdb: fix kgdb_connected bit set in the wrong place
Immediately following an exit from the kdb shell the kgdb_connected
variable should be set to zero, unless there are breakpoints planted.
If the kgdb_connected variable is not zeroed out with kdb, it is
impossible to turn off kdb.

This patch is merely a work around for now, the real fix will check
for the breakpoints.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-07-21 19:27:07 -05:00