Commit graph

320432 commits

Author SHA1 Message Date
Marek Szyprowski
bdf5e4871f common: DMA-mapping: add DMA_ATTR_SKIP_CPU_SYNC attribute
This patch adds DMA_ATTR_SKIP_CPU_SYNC attribute to the DMA-mapping
subsystem.

By default dma_map_{single,page,sg} functions family transfer a given
buffer from CPU domain to device domain. Some advanced use cases might
require sharing a buffer between more than one device. This requires
having a mapping created separately for each device and is usually
performed by calling dma_map_{single,page,sg} function more than once
for the given buffer with device pointer to each device taking part in
the buffer sharing. The first call transfers a buffer from 'CPU' domain
to 'device' domain, what synchronizes CPU caches for the given region
(usually it means that the cache has been flushed or invalidated
depending on the dma direction). However, next calls to
dma_map_{single,page,sg}() for other devices will perform exactly the
same sychronization operation on the CPU cache. CPU cache sychronization
might be a time consuming operation, especially if the buffers are
large, so it is highly recommended to avoid it if possible.
DMA_ATTR_SKIP_CPU_SYNC allows platform code to skip synchronization of
the CPU cache for the given buffer assuming that it has been already
transferred to 'device' domain. This attribute can be also used for
dma_unmap_{single,page,sg} functions family to force buffer to stay in
device domain after releasing a mapping for it. Use this attribute with
care!

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
2012-07-30 12:25:47 +02:00
Marek Szyprowski
dc2832e1e7 ARM: dma-mapping: add support for dma_get_sgtable()
This patch adds support for dma_get_sgtable() function which is required
to let drivers to share the buffers allocated by DMA-mapping subsystem.

Generic implementation based on virt_to_page() is not suitable for ARM
dma-mapping subsystem.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
2012-07-30 12:25:47 +02:00
Marek Szyprowski
d2b7428eb0 common: dma-mapping: introduce dma_get_sgtable() function
This patch adds dma_get_sgtable() function which is required to let
drivers to share the buffers allocated by DMA-mapping subsystem. Right
now the driver gets a dma address of the allocated buffer and the kernel
virtual mapping for it. If it wants to share it with other device (= map
into its dma address space) it usually hacks around kernel virtual
addresses to get pointers to pages or assumes that both devices share
the DMA address space. Both solutions are just hacks for the special
cases, which should be avoided in the final version of buffer sharing.

To solve this issue in a generic way, a new call to DMA mapping has been
introduced - dma_get_sgtable(). It allocates a scatter-list which
describes the allocated buffer and lets the driver(s) to use it with
other device(s) by calling dma_map_sg() on it.

This patch provides a generic implementation based on virt_to_page()
call. Architectures which require more sophisticated translation might
provide their own get_sgtable() methods.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-30 12:25:46 +02:00
Marek Szyprowski
955c757e09 ARM: dma-mapping: add support for DMA_ATTR_NO_KERNEL_MAPPING attribute
This patch adds support for DMA_ATTR_NO_KERNEL_MAPPING attribute for
IOMMU allocations, what let drivers to save precious kernel virtual
address space for large buffers that are intended to be accessed only
from userspace.

This patch is heavily based on initial work kindly provided by Abhinav
Kochhar <abhinav.k@samsung.com>.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
2012-07-30 12:25:46 +02:00
Marek Szyprowski
d5724f172f common: DMA-mapping: add DMA_ATTR_NO_KERNEL_MAPPING attribute
This patch adds DMA_ATTR_NO_KERNEL_MAPPING attribute which lets the
platform to avoid creating a kernel virtual mapping for the allocated
buffer. On some architectures creating such mapping is non-trivial task
and consumes very limited resources (like kernel virtual address space
or dma consistent address space). Buffers allocated with this attribute
can be only passed to user space by calling dma_mmap_attrs().

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-30 12:25:46 +02:00
Marek Szyprowski
64ccc9c033 common: dma-mapping: add support for generic dma_mmap_* calls
Commit 9adc5374 ('common: dma-mapping: introduce mmap method') added a
generic method for implementing mmap user call to dma_map_ops structure.

This patch converts ARM and PowerPC architectures (the only providers of
dma_mmap_coherent/dma_mmap_writecombine calls) to use this generic
dma_map_ops based call and adds a generic cross architecture
definition for dma_mmap_attrs, dma_mmap_coherent, dma_mmap_writecombine
functions.

The generic mmap virt_to_page-based fallback implementation is provided for
architectures which don't provide their own implementation for mmap method.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
2012-07-30 12:25:46 +02:00
Marek Szyprowski
9fa8af91f0 ARM: dma-mapping: fix error path for memory allocation failure
This patch fixes incorrect check in error path. When the allocation of
first page fails, the kernel ops appears due to accessing -1 element of
the pages array.

Reported-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2012-07-30 12:25:45 +02:00
Marek Szyprowski
50262a4bf3 ARM: dma-mapping: add more sanity checks in arm_dma_mmap()
Add some sanity checks and forbid mmaping of buffers into vma areas larger
than allocated dma buffer.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2012-07-30 12:25:45 +02:00
Marek Szyprowski
e9da6e9905 ARM: dma-mapping: remove custom consistent dma region
This patch changes dma-mapping subsystem to use generic vmalloc areas
for all consistent dma allocations. This increases the total size limit
of the consistent allocations and removes platform hacks and a lot of
duplicated code.

Atomic allocations are served from special pool preallocated on boot,
because vmalloc areas cannot be reliably created in atomic context.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
2012-07-30 12:25:45 +02:00
Marek Szyprowski
5e6cafc83e mm: vmalloc: use const void * for caller argument
'const void *' is a safer type for caller function type. This patch
updates all references to caller function type.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
2012-07-30 12:25:44 +02:00
Tomasz Stanislawski
efc42bc980 scatterlist: add sg_alloc_table_from_pages function
This patch adds a new constructor for an sg table. The table is constructed
from an array of struct pages. All contiguous chunks of the pages are merged
into a single sg nodes. A user may provide an offset and a size of a buffer if
the buffer is not page-aligned.

The function is dedicated for DMABUF exporters which often perform conversion
from an page array to a scatterlist. Moreover the scatterlist should be
squashed in order to save memory and to speed-up the process of DMA mapping
using dma_map_sg.

The code is based on the patch 'v4l: vb2-dma-contig: add support for
scatterlist in userptr mode' and hints from Laurent Pinchart.

Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Andrew Morton <akpm@linux-foundation.org>
2012-07-30 12:25:44 +02:00
Shuah Khan
73a1180e14 mm: Fix build warning in kmem_cache_create()
The label oops is used in CONFIG_DEBUG_VM ifdef block and is defined
outside ifdef CONFIG_DEBUG_VM block. This results in the following
build warning when built with CONFIG_DEBUG_VM disabled. Fix to move
label oops definition to inside a CONFIG_DEBUG_VM block.

mm/slab_common.c: In function ‘kmem_cache_create’:
mm/slab_common.c:101:1: warning: label ‘oops’ defined but not used
[-Wunused-label]

Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-07-30 13:15:40 +03:00
Jan Beulich
e273bd98c9 hwmon: struct x86_cpu_id arrays can be __initconst
... as being referenced from __init code only.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-30 11:33:00 +02:00
Heiko Carstens
7d25617597 s390: make use of user_mode() macro where possible
We use the user_mode() helper already at several places but also
have the open coded variant at other places.
Convert the code to always use the helper function.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-07-30 11:03:12 +02:00
Heiko Carstens
37fe1d73a4 s390/mm: rename user_mode variable to addressing_mode
Fix name clash with user_mode() define which is also used in common code.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-07-30 11:03:11 +02:00
Heiko Carstens
008c2e8f24 s390/mm: fix fault handling for page table walk case
Make sure the kernel does not incorrectly create a SIGBUS signal during
user space accesses:

For user space accesses in the switched addressing mode case the kernel
may walk page tables and access user address space via the kernel
mapping. If a page table entry is invalid the function __handle_fault()
gets called in order to emulate a page fault and trigger all the usual
actions like paging in a missing page etc. by calling handle_mm_fault().

If handle_mm_fault() returns with an error fixup handling is necessary.
For the switched addressing mode case all errors need to be mapped to
-EFAULT, so that the calling uaccess function can return -EFAULT to
user space.

Unfortunately the __handle_fault() incorrectly calls do_sigbus() if
VM_FAULT_SIGBUS is set. This however should only happen if a page fault
was triggered by a user space instruction. For kernel mode uaccesses
the correct action is to only return -EFAULT.
So user space may incorrectly see SIGBUS signals because of this bug.

For current machines this would only be possible for the switched
addressing mode case in conjunction with futex operations.

Cc: stable@vger.kernel.org
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-07-30 11:03:09 +02:00
Heiko Carstens
f2c76e3b6f s390/mm: make page faults killable
This is the s390 variant of 37b23e05 "x86,mm: make pagefault killable".

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-07-30 11:03:08 +02:00
Al Viro
e3fea3f70f selinux: fix selinux_inode_setxattr oops
OK, what we have so far is e.g.
	setxattr(path, name, whatever, 0, XATTR_REPLACE)
with name being good enough to get through xattr_permission().
Then we reach security_inode_setxattr() with the desired value and size.
Aha.  name should begin with "security.selinux", or we won't get that
far in selinux_inode_setxattr().  Suppose we got there and have enough
permissions to relabel that sucker.  We call security_context_to_sid()
with value == NULL, size == 0.  OK, we want ss_initialized to be non-zero.
I.e. after everything had been set up and running.  No problem...

We do 1-byte kmalloc(), zero-length memcpy() (which doesn't oops, even
thought the source is NULL) and put a NUL there.  I.e. form an empty
string.  string_to_context_struct() is called and looks for the first
':' in there.  Not found, -EINVAL we get.  OK, security_context_to_sid_core()
has rc == -EINVAL, force == 0, so it silently returns -EINVAL.
All it takes now is not having CAP_MAC_ADMIN and we are fucked.

All right, it might be a different bug (modulo strange code quoted in the
report), but it's real.  Easily fixed, AFAICS:

Deal with size == 0, value == NULL case in selinux_inode_setxattr()

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Dave Jones <davej@redhat.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-07-30 15:36:50 +10:00
Dmitry Torokhov
cf45b5a252 Merge branch 'next' into for-linus
Prepare second set of changes for 3.6 merge window.
2012-07-29 22:34:47 -07:00
David Howells
5935e6dcaa KEYS: linux/key-type.h needs linux/errno.h
linux/key-type.h needs to #include linux/errno.h as it refers to ENOKEY.
Without this, with sparc's allmodconfig in one of my test trees, the following
error occurs:

include/linux/key-type.h: In function 'key_negate_and_link':
include/linux/key-type.h:122:43: error: 'ENOKEY' undeclared (first use in this function)
include/linux/key-type.h:122:43: note: each undeclared identifier is reported only once for each fun

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-07-30 15:08:46 +10:00
Alan Cox
3b9fc37280 smack: off by one error
Consider the input case of a rule that consists entirely of non space
symbols followed by a \0. Say 64 + \0

In this case strlen(data) = 64
kzalloc of subject and object are 64 byte objects
sscanfdata, "%s %s %s", subject, ...)

will put 65 bytes into subject.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Cc: stable@vger.kernel.org
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-07-30 15:04:17 +10:00
Rusty Russell
6a74389714 virtio-blk: return VIRTIO_BLK_F_FLUSH to header.
This got renamed and clarified, but let's not break any userspace out there.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-07-30 13:30:52 +09:30
Paolo Bonzini
cd5d503862 virtio-blk: allow toggling host cache between writeback and writethrough
This patch adds support for the new VIRTIO_BLK_F_CONFIG_WCE feature,
which exposes the cache mode in the configuration space and lets the
driver modify it.  The cache mode is exposed via sysfs.

Even if the host does not support the new feature, the cache mode is
visible (thanks to the existing VIRTIO_BLK_F_WCE), but not modifiable.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-07-30 13:30:52 +09:30
Asias He
2c95a32909 virtio-blk: Use block layer provided spinlock
Block layer will allocate a spinlock for the queue if the driver does
not provide one in blk_init_queue().

The reason to use the internal spinlock is that blk_cleanup_queue() will
switch to use the internal spinlock in the cleanup code path.

        if (q->queue_lock != &q->__queue_lock)
                q->queue_lock = &q->__queue_lock;

However, processes which are in D state might have taken the driver
provided spinlock, when the processes wake up, they would release the
block provided spinlock.

=====================================
[ BUG: bad unlock balance detected! ]
3.4.0-rc7+ #238 Not tainted
-------------------------------------
fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at:
[<ffffffff813274d2>] blk_queue_bio+0x2a2/0x380
but there are no more locks to release!

other info that might help us debug this:
1 lock held by fio/3587:
 #0:  (&(&vblk->lock)->rlock){......}, at:
[<ffffffff8132661a>] get_request_wait+0x19a/0x250

Other drivers use block layer provided spinlock as well, e.g. SCSI.

Switching to the block layer provided spinlock saves a bit of memory and
does not increase lock contention. Performance test shows no real
difference is observed before and after this patch.

Changes in v2: Improve commit log as Michael suggested.

Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Cc: stable@kernel.org
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-07-30 13:30:51 +09:30
Asias He
483001c765 virtio-blk: Reset device after blk_cleanup_queue()
blk_cleanup_queue() will call blk_drian_queue() to drain all the
requests before queue DEAD marking. If we reset the device before
blk_cleanup_queue() the drain would fail.

1) if the queue is stopped in do_virtblk_request() because device is
full, the q->request_fn() will not be called.

blk_drain_queue() {
   while(true) {
      ...
      if (!list_empty(&q->queue_head))
        __blk_run_queue(q) {
	    if (queue is not stoped)
		q->request_fn()
	}
      ...
   }
}

Do no reset the device before blk_cleanup_queue() gives the chance to
start the queue in interrupt handler blk_done().

2) In commit b79d866c8b, We abort requests
dispatched to driver before blk_cleanup_queue(). There is a race if
requests are dispatched to driver after the abort and before the queue
DEAD mark. To fix this, instead of aborting the requests explicitly, we
can just reset the device after after blk_cleanup_queue so that the
device can complete all the requests before queue DEAD marking in the
drain process.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Cc: stable@kernel.org
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-07-30 13:30:51 +09:30
Asias He
02e2b12494 virtio-blk: Call del_gendisk() before disable guest kick
del_gendisk() might not return due to failing to remove the
/sys/block/vda/serial sysfs entry when another thread (udev) is
trying to read it.

virtblk_remove()
  vdev->config->reset() : guest will not kick us through interrupt
    del_gendisk()
      device_del()
        kobject_del(): got stuck, sysfs entry ref count non zero

sysfs_open_file(): user space process read /sys/block/vda/serial
   sysfs_get_active() : got sysfs entry ref count
      dev_attr_show()
        virtblk_serial_show()
           blk_execute_rq() : got stuck, interrupt is disabled
                              request cannot be finished

This patch fixes it by calling del_gendisk() before we disable guest's
interrupt so that the request sent in virtblk_serial_show() will be
finished and del_gendisk() will success.

This fixes another race in hot-unplug process.

It is save to call del_gendisk(vblk->disk) before
flush_work(&vblk->config_work) which might access vblk->disk, because
vblk->disk is not freed until put_disk(vblk->disk).

Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Cc: stable@kernel.org
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-07-30 13:30:51 +09:30
Amit Shah
0bc1a2ef19 virtio: rng: s3/s4 support
Unregister from the hwrng interface and remove the vq before entering
the S3 or S4 states.  Add the vq and re-register with hwrng on restore.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-07-30 13:30:50 +09:30
Amit Shah
178d855e78 virtio: rng: split out common code in probe / remove for s3/s4 ops
The freeze/restore s3/s4 operations will use code that's common to the
probe and remove routines.  Put the common code in separate funcitons.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-07-30 13:30:50 +09:30
Amit Shah
4476987a9a virtio: rng: don't wait on host when module is going away
No use waiting for input from host when the module is being removed.
We're going to remove the vq in the next step anyway, so just perform
any other steps for cleanup (currently none).

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-07-30 13:30:50 +09:30
Amit Shah
cc8744e129 virtio: rng: allow tasks to be killed that are waiting for rng input
Use wait_for_completion_killable() instead of wait_for_completion() when
waiting for the host to send us entropy.  Without this,

  # cat /dev/hwrng
  ^C

just hangs.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-07-30 13:30:49 +09:30
Amit Shah
ddcc286900 virtio ids: fix comment for virtio-rng
It's virtio-rng, not virtio-ring.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-07-30 13:30:49 +09:30
Mauro Carvalho Chehab
c2078e4c91 Merge branch 'devel'
* devel: (33 commits)
  edac i5000, i5400: fix pointer math in i5000_get_mc_regs()
  edac: allow specifying the error count with fake_inject
  edac: add support for Calxeda highbank L2 cache ecc
  edac: add support for Calxeda highbank memory controller
  edac: create top-level debugfs directory
  sb_edac: properly handle error count
  i7core_edac: properly handle error count
  edac: edac_mc_handle_error(): add an error_count parameter
  edac: remove arch-specific parameter for the error handler
  amd64_edac: Don't pass driver name as an error parameter
  edac_mc: check for allocation failure in edac_mc_alloc()
  edac: Increase version to 3.0.0
  edac_mc: Cleanup per-dimm_info debug messages
  edac: Convert debugfX to edac_dbg(X,
  edac: Use more normal debugging macro style
  edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs
  Edac: Add ABI Documentation for the new device nodes
  edac: move documentation ABI to ABI/testing/sysfs-devices-edac
  i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy
  edac: change the mem allocation scheme to make Documentation/kobject.txt happy
  ...
2012-07-29 21:11:05 -03:00
Mauro Carvalho Chehab
73bcc49959 Linux 3.5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEcBAABAgAGBQJQCxgHAAoJEHm+PkMAQRiG0o4H/2mrSOfkZCdiOchmhmZQQPPC
 aRlsN1sNUaiDblPS2T6gwEwCsxGz5Sh9DX4WqM3QHlFrraUSh51ZytJ+VC4CcDpF
 GL9dVYuY8F+e8b0de0HSmZx0zhWMORstNJLg9TiX2dD5fuhpEajPlK0p+Apv1qPa
 9A3vpvnUMNLI4Lvi7MBYPm+/Suj4ewwHyfMn+86ZOdTzT5PcCszG5UJYLv+ebbNB
 2CzPvsbr4trGtPNtHkrl8eWh//LyUBWzvtk64XZWCFKiWVKESOdKEOSKe0vTP052
 NTiSyY1k8M5UvgidaFTXsG382ywYWbAzCJHu2hKkDPMC2y5lGO/5oKEIidn4l3A=
 =g7OH
 -----END PGP SIGNATURE-----

Merge tag 'v3.5'

Linux 3.5

* tag 'v3.5': (1242 commits)
  Linux 3.5
  Remove SYSTEM_SUSPEND_DISK system state
  kdb: Switch to nolock variants of kmsg_dump functions
  printk: Implement some unlocked kmsg_dump functions
  printk: Remove kdb_syslog_data
  kdb: Revive dmesg command
  dm raid1: set discard_zeroes_data_unsupported
  dm thin: do not send discards to shared blocks
  dm raid1: fix crash with mirror recovery and discard
  pnfs-obj: Fix __r4w_get_page when offset is beyond i_size
  pnfs-obj: don't leak objio_state if ore_write/read fails
  ore: Unlock r4w pages in exact reverse order of locking
  ore: Remove support of partial IO request (NFS crash)
  ore: Fix NFS crash by supporting any unaligned RAID IO
  UBIFS: fix a bug in empty space fix-up
  cx25821: Remove bad strcpy to read-only char*
  HID: hid-multitouch: add support for Zytronic panels
  MIPS: PCI: Move fixups from __init to __devinit.
  MIPS: Fix bug.h MIPS build regression
  MIPS: sync-r4k: remove redundant irq operation
  ...
2012-07-29 21:09:39 -03:00
Mark Tinguely
9a57fa8ee7 xfs: wait for the write the superblock on unmount
v2: Add the xfs_buf_lock to xfs_quiesce_attr().
    Add explaination why xfs_buf_lock() is used to wait for write.

xfs_wait_buftarg() does not wait for the completion of the write of the
uncached superblock. This write can race with the shutdown of the log
and causes a panic if the write does not win the race.

During the log write, xfsaild_push() will lock the buffer and set the
XBF_ASYNC flag. Because the XBF_FLAG is set, complete() is not performed
on the buffer's iowait entry, we cannot call xfs_buf_iowait() to wait
for the write to complete. The buffer's lock is held until the write is
complete, so we can block on a xfs_buf_lock() request to be notified
that the write is complete.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:34:19 -05:00
Brian Foster
8375f922aa xfs: re-enable xfsaild idle mode and fix associated races
xfsaild idle mode logic currently leads to a couple hangs:

1.) If xfsaild is rescheduled in during an incremental scan
    (i.e., tout != 0) and the target has been updated since
    the previous run, we can hit the new target and go into
    idle mode with a still populated ail.
2.) A wake up is only issued when the target is pushed forward.
    The wake up can race with xfsaild if it is currently in the
    process of entering idle mode, causing future wake up
    events to be lost.

These hangs have been reproduced and verified as fixed by
running xfstests 273 in a loop on a slightly modified upstream
kernel. The kernel is modified to re-enable idle mode as
previously implemented (when count == 0) and with a revert of
commit 670ce93f, which includes performance improvements that
make this harder to reproduce.

The solution, the algorithm for which has been outlined by
Dave Chinner, is to modify xfsaild to enter idle mode only when
the ail is empty and the push target has not been moved forward
since the last push.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:27:57 -05:00
Christoph Hellwig
4f59af758f xfs: remove iolock lock classes
Content-Disposition: inline; filename=xfs-remove-iolock-classes

Now that we never take the iolock during inode reclaim we don't need
to play games with lock classes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Rich Johnston <rjohnston@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:23:51 -05:00
Christoph Hellwig
5a15322da1 xfs: avoid the iolock in xfs_free_eofblocks for evicted inodes
Same rational as the last patch - these inodes are not reachable, so
don't bother with locking.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Rich Johnston <rjohnston@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:22:20 -05:00
Christoph Hellwig
0b56185b0d xfs: do not take the iolock in xfs_inactive
An inode that enters xfs_inactive has been removed from all global
lists but the inode hash, and can't be recycled in xfs_iget before
it has been marked reclaimable.  Thus taking the iolock in here
is not nessecary at all, and given the amount of lockdep false
positives it has triggered already I'd rather remove the locking.

The only change outside of xfs_inactive is relaxing an assert in
xfs_itruncate_extents.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Rich Johnston <rjohnston@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:16:49 -05:00
Christoph Hellwig
fe67be036f xfs: remove xfs_inactive_attrs
Remove this helper as the code flow is a lot more obvious when it gets
merged into its only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Rich Johnston <rjohnston@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:15:33 -05:00
Christoph Hellwig
b373e98daa xfs: clean up xfs_inactive
The code to reserve log space and join the inode to the transaction is
common for all cases, so don't duplicate it.  Also remove the trivial
xfs_inactive_symlink_local helper which can simply be opencode now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Rich Johnston <rjohnston@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:13:09 -05:00
Christoph Hellwig
be60fe54b2 xfs: do not read the AGI buffer in xfs_dialloc until nessecary
Refactor the AG selection loop in xfs_dialloc to operate on the in-memory
perag data as much as possible.  We only read the AGI buffer once we have
selected an AG to allocate inodes now instead of for every AG considered.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:10:54 -05:00
Christoph Hellwig
55d6af64cb xfs: refactor xfs_ialloc_ag_select
Loop over the in-core perag structures and prefer using pagi_freecount over
going out to the AGI buffer where possible.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:08:13 -05:00
Christoph Hellwig
4bb61069d2 xfs: add a short cut to xfs_dialloc for the non-NULL agbp case
In this case we already have selected an AG and know it has free space
beause the buffer lock never got released.  Jump directly into xfs_dialloc_ag
and short cut the AG selection loop.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:03:23 -05:00
Christoph Hellwig
08358906ed xfs: remove the alloc_done argument to xfs_dialloc
We can simplify check the IO_agbp pointer for being non-NULL instead of
passing another argument through two layers of function calls.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:00:31 -05:00
Christoph Hellwig
f2ecc5e453 xfs: split xfs_dialloc
Move the actual allocation once we have selected an allocation group into a
separate helper, and make xfs_dialloc a wrapper around it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 15:56:49 -05:00
Linus Torvalds
a410963ba4 Merge branch 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linux
Pull embedded i2c changes from Wolfram Sang:
 "Changes for the "embedded" part of the I2C subsystem:

   - lots of devicetree conversions of drivers (and preparations for
     that)
   - big cleanups for drivers for OMAP, Tegra, Nomadik, Blackfin
   - Rafael's struct dev_pm_ops conversion patches for I2C
   - usual driver cleanups and fixes

  All patches have been in linux-next for an apropriate time and all
  patches touching files outside of i2c-folders should have proper acks
  from the maintainers."

* 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linux: (60 commits)
  Revert "i2c: tegra: convert normal suspend/resume to *_noirq"
  I2C: MV64XYZ: Add Device Tree support
  i2c: stu300: use devm managed resources
  i2c: i2c-ocores: support for 16bit and 32bit IO
  V4L/DVB: mfd: use reg_shift instead of regstep
  i2c: i2c-ocores: Use reg-shift property
  i2c: i2c-ocores: DT bindings and minor fixes.
  i2c: mv64xxxx: remove EXPERIMENTAL tag
  i2c-s3c2410: Use plain pm_runtime_put()
  i2c: s3c2410: Fix pointer type passed to of_match_node()
  i2c: mxs: Set I2C timing registers for mxs-i2c
  i2c: i2c-bfin-twi: Move blackfin TWI register access Macro to head file.
  i2c: i2c-bfin-twi: Move TWI peripheral pin request array to platform data.
  i2c:i2c-bfin-twi: include twi head file
  i2c:i2c-bfin-twi: TWI fails to restart next transfer in high system load.
  i2c: i2c-bfin-twi: Tighten condition when failing I2C transfer if MEN bit is reset unexpectedly.
  i2c: i2c-bfin-twi: Break dead waiting loop if i2c device misbehaves.
  i2c: i2c-bfin-twi: Improve the patch for bug "Illegal i2c bus lock upon certain transfer scenarios".
  i2c: i2c-bfin-twi: Illegal i2c bus lock upon certain transfer scenarios.
  i2c-mv64xxxx: allow more than one driver instance
  ...

Conflicts:
	drivers/i2c/busses/i2c-nomadik.c
2012-07-28 13:43:12 -07:00
Linus Torvalds
f7da9cdf45 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Several bug fixes, some to new features appearing in this merge
  window, some that have been around for a while.

  I have a short list of known problems that need to be sorted out, but
  all of them can be solved easily during the run up to 3.6-final.

  I'll be offline until Sunday afternoon, but nothing need hold up
  3.6-rc1 and the close of the merge window, networking wise, at this
  point.

  1) Fix interface check in ipv4 TCP early demux, from Eric Dumazet.

  2) Fix a long standing bug in TCP DMA to userspace offload that can
     hang applications using MSG_TRUNC, from Jiri Kosina.

  3) Don't allow TCP_USER_TIMEOUT to be negative, from Hangbin Liu.

  4) Don't use GFP_KERNEL under spinlock in kaweth driver, from Dan
     Carpenter"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  tcp: perform DMA to userspace only if there is a task waiting for it
  Revert "openvswitch: potential NULL deref in sample()"
  ipv4: fix TCP early demux
  net: fix rtnetlink IFF_PROMISC and IFF_ALLMULTI handling
  USB: kaweth.c: use GFP_ATOMIC under spin_lock
  tcp: Add TCP_USER_TIMEOUT negative value check
  bcma: add missing iounmap on error path
  bcma: fix regression in interrupt assignment on mips
  mac80211_hwsim: fix possible race condition in usage of info->control.sta & control.vif
2012-07-28 06:00:39 -07:00
Li Dongyang
00d39597e8 thinkpad_acpi: Free hotkey_keycode_map after unregistering tpacpi_inputdev
We should free hotkey_keycode_map after unregistering tpacpi_inputdev, to aviod
use after free like this:

[   99.408388] =============================================================================
[   99.408393] BUG kmalloc-64 (Not tainted): Poison overwritten
[   99.408394] -----------------------------------------------------------------------------
[   99.408394]
[   99.408398] INFO: 0xf2751962-0xf2751995. First byte 0x98 instead of 0x6b
[   99.408402] INFO: Allocated in 0xfdc88c28 age=79 cpu=0 pid=1329
[   99.408407]  __slab_alloc.isra.50.constprop.56+0x49f/0x533
[   99.408410]  kmem_cache_alloc_trace+0x10d/0x140
[   99.408412]  0xfdc88c28
[   99.408414]  0xfdc898cc
[   99.408417]  do_one_initcall+0x112/0x160
[   99.408420]  sys_init_module+0xe6d/0x1bc0
[   99.408422]  sysenter_do_call+0x12/0x28
[   99.408427] INFO: Freed in hotkey_exit+0x50/0xb0 [thinkpad_acpi] age=14 cpu=1 pid=1333
[   99.408429]  __slab_free+0x3d/0x30b
[   99.408431]  kfree+0x129/0x140
[   99.408435]  hotkey_exit+0x50/0xb0 [thinkpad_acpi]
[   99.408438]  ibm_exit+0xe3/0x1a0 [thinkpad_acpi]
[   99.408441]  thinkpad_acpi_module_exit+0x35/0x208 [thinkpad_acpi]
[   99.408443]  sys_delete_module+0x11f/0x280
[   99.408445]  sysenter_do_call+0x12/0x28
[   99.408447] INFO: Slab 0xf4d5ea20 objects=17 used=17 fp=0x  (null) flags=0x40000080
[   99.408449] INFO: Object 0xf2751960 @offset=2400 fp=0xf2751780
[   99.408449]
[   99.408452] Bytes b4 f2751950: 64 02 00 00 ae ce fe ff 5a 5a 5a 5a 5a 5a 5a 5a  d.......ZZZZZZZZ
[   99.408454] Object f2751960: 6b 6b 98 00 ec 00 8e 00 ee 00 6b 6b e3 00 bf 00 kk........kk....
[   99.408456] Object f2751970: c2 00 6b 6b 6b 6b cd 00 6b 6b 6b 6b 6b 6b e1 00 ..kkkk..kkkkkk..
[   99.408458] Object f2751980: e0 00 e4 00 6b 6b 74 01 73 00 72 00 71 00 94 00 ....kkt.s.r.q...
[   99.408460] Object f2751990: 6b 6b 6b 6b f8 00 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkk..kkkkkkkkk.
[   99.408462] Redzone f27519a0: bb bb bb bb ....

Signed-off-by: Li Dongyang <Jerry87905@gmail.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
2012-07-28 00:28:57 -04:00
Li Dongyang
d2be15bdda thinkpad_acpi: Fix a memory leak during module exit
We should free the thinkpad_id.nummodel_str during exit as it's allocated
in get_thinkpad_module_data().

Signed-off-by: Li Dongyang <Jerry87905@gmail.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
2012-07-28 00:28:56 -04:00
Li Dongyang
e03e389da5 thinkpad_acpi: Flush the workqueue before freeing tpacpi_leds
We init work_struct within tpacpi_leds, and we should free tpacpi_leds after
the workqueue is empty, in case of the work_struct is referenced after free.

This script could trigger the OOPS:

#!/bin/sh

while true
do
    modprobe -r thinkpad_acpi
    modprobe thinkpad_acpi
done

And the OOPS looks like this:

[   73.863557] BUG: unable to handle kernel paging request at 45440000
[   73.863925] IP: [<c1051d65>] process_one_work+0x25/0x3b0
[   73.864749] *pde = 00000000
[   73.865571] Oops: 0000 [#1] PREEMPT SMP
[   73.866443] Modules linked in: thinkpad_acpi(-) nvram netconsole configfs
aes_i586 cryptd aes_generic joydev btusb bluetooth arc4 snd_hda_codec_analog
iwl4965 uhci_hcd pcmcia microcode iwlegacy mac80211 cfg80211 firewire_ohci
firewire_core kvm_intel kvm snd_hda_intel acpi_cpufreq mperf ehci_hcd yenta_socket
pcmcia_rsrc crc_itu_t sr_mod snd_hda_codec processor pcmcia_core i2c_i801 usbcore
lpc_ich cdrom serio_raw psmouse coretemp rfkill e1000e snd_pcm snd_page_alloc
snd_hwdep snd_timer snd pcspkr evdev ac battery thermal soundcore usb_common
intel_agp intel_gtt tp_smapi(O) thinkpad_ec(O) ext4 crc16 jbd2 mbcache sd_mod
ata_piix ahci libahci libata scsi_mod nouveau button video mxm_wmi wmi
i2c_algo_bit drm_kms_helper ttm drm agpgart i2c_core [last unloaded: nvram]
 [   73.866676]
 [   73.866676] Pid: 62, comm: kworker/u:4 Tainted: G           O 3.5.0-1-ARCH
 #1 LENOVO 7662CTO/7662CTO
 [   73.866676] EIP: 0060:[<c1051d65>] EFLAGS: 00010002 CPU: 1
 [   73.866676] EIP is at process_one_work+0x25/0x3b0
 [   73.866676] EAX: 45440065 EBX: f5545090 ECX: 00000088 EDX: 45440000
 [   73.866676] ESI: f568ff40 EDI: c164dd40 EBP: f5705f98 ESP: f5705f68
 [   73.866676]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
 [   73.866676] CR0: 8005003b CR2: 45440000 CR3: 357ed000 CR4: 000007d0
 [   73.866676] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
 [   73.866676] DR6: ffff0ff0 DR7: 00000400
 [   73.866676] Process kworker/u:4 (pid: 62, ti=f5704000 task=f5700540 task.ti=f5704000)
 [   73.866676] Stack:
 [   73.866676]  f56fbf24 00000001 f5705f78 c10683e0 c1294950 00000000 00000000 f568ff40
 [   73.866676]  00000000 f568ff40 f568ff50 c164dd40 f5705fb8 c1052589 c1060c7e c15b9300
 [   73.866676]  c164dd40 00000000 f568ff40 c1052490 f5705fe4 c10570b2 00000000 f568ff40
 [   73.866676] Call Trace:
 [   73.866676]  [<c10683e0>] ? default_wake_function+0x10/0x20
 [   73.866676]  [<c1294950>] ? dev_get_drvdata+0x20/0x20
 [   73.866676]  [<c1052589>] worker_thread+0xf9/0x280
 [   73.866676]  [<c1060c7e>] ? complete+0x4e/0x60
 [   73.866676]  [<c1052490>] ? manage_workers.isra.24+0x1c0/0x1c0
 [   73.866676]  [<c10570b2>] kthread+0x72/0x80
 [   73.866676]  [<c1057040>] ? kthread_freezable_should_stop+0x50/0x50
 [   73.866676]  [<c13c20fe>] kernel_thread_helper+0x6/0x10
 [   73.866676] Code: bc 27 00 00 00 00 55 89 e5 57 56 53 83 ec 24 3e 8d 74 26
 00 89 c6 8b 02 89 d3 c7 45 f0 00 00 00 00 89 c2 30 d2 a8 04 0f 44 55 f0 <8b> 02 89 55 f0 89 da c1 ea
 0a 89 45 ec 89 d8 8b 4d ec c1 e8 04
 [   73.866676] EIP: [<c1051d65>] process_one_work+0x25/0x3b0 SS:ESP 0068:f5705f68
 [   73.866676] CR2: 0000000045440000
 [   73.866676] ---[ end trace 4d8a1887edca08c5 ]---
 [   73.866676] note: kworker/u:4[62] exited with preempt_count 1

Signed-off-by: Li Dongyang <Jerry87905@gmail.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
2012-07-28 00:28:56 -04:00