Commit graph

451406 commits

Author SHA1 Message Date
Yoshihiro YUNOMAE
748ec3a20e tracing/kprobes: Avoid self tests if tracing is disabled on boot up
If tracing is disabled on boot up, the kernel should not execute tracing
self tests. The kernel should check whether tracing is disabled or not
before executing any of the tracing self tests.

Link: http://lkml.kernel.org/p/20140605223520.32311.56097.stgit@yunodevel

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-06 04:53:39 -04:00
Yoshihiro YUNOMAE
dc81e5e3ab tracing: Return error if ftrace_trace_arrays list is empty
ftrace_trace_arrays links global_trace.list. However, global_trace
is not added to ftrace_trace_arrays if trace_alloc_buffers() fails.
As a result, ftrace_trace_arrays becomes an empty list. If
ftrace_trace_arrays is an empty list, the current top_trace_array() returns
an invalid pointer, and the kernel can then corrupt memory or panic.

The current implementation does not check whether ftrace_trace_arrays is
an empty list. With this patch, top_trace_array() returns NULL if
ftrace_trace_arrays is empty, and all functions calling top_trace_array()
handle that case appropriately.
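
The empty-list check can be illustrated with a minimal userspace sketch. It assumes a simplified circular doubly linked list and that the "top" entry is the one at the tail of the list. All names ending in _sketch are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's circular doubly linked list node. */
struct list_node {
	struct list_node *next;
	struct list_node *prev;
};

/* Hypothetical stand-in for struct trace_array; only the list link matters. */
struct trace_array_sketch {
	int id;
	struct list_node list;
};

static void list_init_sketch(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* An empty circular list is a head whose next pointer refers back to itself. */
static int list_is_empty_sketch(const struct list_node *head)
{
	return head->next == head;
}

static void list_add_tail_sketch(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/*
 * Return the entry at the tail of the list, or NULL if the list is empty.
 * Without the empty check, the pointer arithmetic below would be applied to
 * the list head itself and produce a pointer to something that is not a
 * struct trace_array_sketch.
 */
static struct trace_array_sketch *top_trace_array_sketch(struct list_node *head)
{
	if (list_is_empty_sketch(head))
		return NULL;
	return (struct trace_array_sketch *)
		((char *)head->prev - offsetof(struct trace_array_sketch, list));
}
```

Callers then have to test the result for NULL before dereferencing it, which is the second half of the change described above.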

Link: http://lkml.kernel.org/p/20140605223517.32311.99233.stgit@yunodevel

Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-06 04:47:46 -04:00
Nicholas Bellinger
f145377351 target: Fix alua_access_state attribute OOPs for un-configured devices
This patch fixes an OOPs where an attempt to write to the per-device
alua_access_state configfs attribute at:

  /sys/kernel/config/target/core/$HBA/$DEV/alua/$TG_PT_GP/alua_access_state

results in a NULL pointer dereference when the backend device has not
yet been configured.

This patch adds an explicit check for DF_CONFIGURED, and fails with
-ENODEV to avoid this case.
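
A minimal sketch of the guard described above, in userspace C. The structure, the flag value and the function name are assumptions made for illustration; only the DF_CONFIGURED test and the -ENODEV return come from the commit message.

```c
#include <errno.h>

/* Hypothetical flag bit; the real DF_CONFIGURED value is not given here. */
#define DF_CONFIGURED_SKETCH 0x1u

/* Hypothetical stand-in for the backend device. */
struct se_device_sketch {
	unsigned int dev_flags;
	int alua_access_state;
};

/*
 * Refuse the write with -ENODEV until the device has been configured,
 * instead of touching state that is not set up yet.
 */
static int alua_access_state_store_sketch(struct se_device_sketch *dev,
					  int new_state)
{
	if (!(dev->dev_flags & DF_CONFIGURED_SKETCH))
		return -ENODEV;

	dev->alua_access_state = new_state;
	return 0;
}
```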

Reported-by: Chris Boot <crb@tiger-computing.co.uk>
Reported-by: Philip Gaw <pgaw@darktech.org.uk>
Cc: Chris Boot <crb@tiger-computing.co.uk>
Cc: Philip Gaw <pgaw@darktech.org.uk>
Cc: stable@vger.kernel.org # 3.8+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-06 01:22:41 -07:00
Nicholas Bellinger
e7810c2d2c target: Allow READ_CAPACITY opcode in ALUA Standby access state
This patch allows READ_CAPACITY + SAI_READ_CAPACITY_16 opcode
processing to occur while the associated ALUA group is in Standby
access state.

This is required to avoid host side LUN probe failures during the
initial scan if an ALUA group has already implicitly changed into
Standby access state.

This addresses a bug reported by Chris + Philip using dm-multipath
+ ESX hosts configured with ALUA multipath.

Reported-by: Chris Boot <crb@tiger-computing.co.uk>
Reported-by: Philip Gaw <pgaw@darktech.org.uk>
Cc: Chris Boot <crb@tiger-computing.co.uk>
Cc: Philip Gaw <pgaw@darktech.org.uk>
Cc: Hannes Reinecke <hare@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-06 01:21:12 -07:00
Steven Rostedt (Red Hat)
34839f5a69 tracing: Only calculate stats of tracepoint benchmarks for 2^32 times
When calculating the average and standard deviation, it is required that
the count be less than UINT_MAX, otherwise the do_div() will get
undefined results. After 2^32 counts of data, the average and standard
deviation should pretty much be set anyway.
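
The constraint can be sketched in userspace C as follows. This is an illustration under assumptions, not the benchmark's code: div64_by_32() stands in for do_div() (64-bit unsigned dividend, 32-bit divisor), and the struct and function names are hypothetical. The statistics are frozen once the count reaches UINT32_MAX, so the divisor always fits in 32 bits.

```c
#include <stdint.h>

/* Hypothetical accumulator for the benchmark statistics. */
struct bench_stats_sketch {
	uint64_t count;
	uint64_t total;
	uint64_t total_sq;
	uint64_t avg;
	uint64_t variance;	/* square of the standard deviation */
};

/* Stand-in for do_div(): unsigned 64-bit dividend, 32-bit divisor. */
static uint64_t div64_by_32(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;
}

static void bench_stats_add_sketch(struct bench_stats_sketch *s, uint64_t delta)
{
	uint64_t v;

	/* Stop updating once the count would no longer fit in 32 bits. */
	if (s->count >= UINT32_MAX)
		return;

	s->count++;
	s->total += delta;
	s->total_sq += delta * delta;
	s->avg = div64_by_32(s->total, (uint32_t)s->count);

	if (s->count > 1) {
		/* variance = (n * sum(x^2) - sum(x)^2) / (n * (n - 1)) */
		v = s->count * s->total_sq - s->total * s->total;
		v = div64_by_32(v, (uint32_t)s->count);
		s->variance = div64_by_32(v, (uint32_t)s->count - 1);
	}
}
```

For the samples 2, 4 and 6 this gives an average of 4 and a variance of 4. All intermediate values are unsigned, so no signed dividend is ever divided.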

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-06 00:41:38 -04:00
Michael Ellerman
d34b661b10 selftests/powerpc: Test the THP bug we fixed in the previous commit
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-06 13:55:04 +10:00
Michael Ellerman
09567e7fd4 powerpc/mm: Check paca psize is up to date for huge mappings
We have a bug in our hugepage handling which exhibits as an infinite
loop of hash faults. If the fault is being taken in the kernel it will
typically trigger the softlockup detector, or the RCU stall detector.

The bug is as follows:

 1. mmap(0xa0000000, ..., MAP_FIXED | MAP_HUGETLB | MAP_ANONYMOUS ..)
 2. Slice code converts the slice psize to 16M.
 3. The code on lines 539-540 of slice.c in slice_get_unmapped_area()
    synchronises the mm->context with the paca->context. So the paca slice
    mask is updated to include the 16M slice.
 4. Either:
    * mmap() fails because there are no huge pages available.
    * mmap() succeeds and the mapping is then munmapped.
    In both cases the slice psize remains at 16M in both the paca & mm.
 5. mmap(0xa0000000, ..., MAP_FIXED | MAP_ANONYMOUS ..)
 6. The slice psize is converted back to 64K. Because of the check on line 539
    of slice.c we DO NOT update the paca->context. The paca slice mask is now
    out of sync with the mm slice mask.
 7. User/kernel accesses 0xa0000000.
 8. The SLB miss handler slb_allocate_realmode() **uses the paca slice mask**
    to create an SLB entry and inserts it in the SLB.
 9. With the 16M SLB entry in place the hardware does a hash lookup, no entry
    is found so a data access exception is generated.
10. The data access handler calls do_page_fault() -> handle_mm_fault().
11. __handle_mm_fault() creates a THP mapping with do_huge_pmd_anonymous_page().
12. The hardware retries the access, there is still nothing in the hash table
    so once again a data access exception is generated.
13. hash_page() calls into __hash_page_thp() and inserts a mapping in the
    hash. Although the THP mapping maps 16M the hashing is done using 64K
    as the segment page size.
14. hash_page() returns immediately after calling __hash_page_thp(), skipping
    over the code at line 1125, so the mismatch between the
    paca->context and mm->context is not detected.
15. The hardware retries the access, the hash it generates using the 16M
    SLB entry does NOT match the hash we inserted.
16. We take another data access and go into __hash_page_thp().
17. We see a valid entry in the hpte_slot_array and so we call updatepp()
    which succeeds.
18. Goto 15.

We could fix this in two ways. The first would be to remove or modify
the check on line 539 of slice.c.

The second option is to cause the check of paca psize in hash_page() on
line 1125 to also be done for THP pages.

We prefer the latter, because the check & update of the paca psize is
not done until we know it's necessary. It's also done only on the
current cpu, so we don't need to IPI all other cpus.

Without further rearranging the code, the simplest fix is to pull out
the code that checks paca psize and call it in two places. Firstly for
THP/hugetlb, and secondly for other mappings as before.
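
The shape of that fix can be sketched in userspace C. Everything here is a simplified model under assumptions: the structures, the page size constants and the function names are hypothetical, and a counter stands in for whatever work the real resynchronisation does.

```c
#include <stdbool.h>

/* Hypothetical page size identifiers. */
enum psize_sketch { PSIZE_64K_SKETCH, PSIZE_16M_SKETCH };

/* Hypothetical model of the mm-wide and per-CPU (paca) copies of the psize. */
struct mm_ctx_sketch {
	enum psize_sketch slice_psize;
};

struct paca_ctx_sketch {
	enum psize_sketch slice_psize;
	unsigned int resyncs;	/* counts how often the copy was refreshed */
};

/* The pulled-out helper: refresh the per-CPU copy only if it is stale. */
static bool check_paca_psize_sketch(struct paca_ctx_sketch *paca,
				    const struct mm_ctx_sketch *mm)
{
	if (paca->slice_psize == mm->slice_psize)
		return false;

	paca->slice_psize = mm->slice_psize;
	paca->resyncs++;
	return true;
}

/* The helper is called on both paths, including the early-return huge path. */
static void hash_page_sketch(struct paca_ctx_sketch *paca,
			     const struct mm_ctx_sketch *mm, bool is_huge)
{
	if (is_huge) {
		check_paca_psize_sketch(paca, mm);
		/* ... insert the huge mapping and return early ... */
		return;
	}

	check_paca_psize_sketch(paca, mm);
	/* ... insert a regular mapping ... */
}
```

Because the check runs only on the CPU taking the fault, no other CPU has to be interrupted, which matches the reasoning given above for preferring this option.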

Thanks to Dave Jones for trinity, which originally found this bug.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: stable@vger.kernel.org [v3.11+]
2014-06-06 13:54:26 +10:00
Steven Rostedt (Red Hat)
72e2fe38ea tracing: Convert stddev into u64 in tracepoint benchmark
I've been told that do_div() expects an unsigned 64 bit number, and
is undefined if a signed is used. This gave a warning on the MIPS
build. I'm not sure if a signed 64 bit dividend is really an issue
or not, but the calculation this is used for is standard deviation,
and that isn't going to be negative. We can just convert it to
unsigned and be safe.

Reported-by: David Daney <ddaney.cavm@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-05 20:35:30 -04:00
Linus Torvalds
2b03adc191 Merge tag 'microblaze-3.16-rc1' of git://git.monstr.eu/linux-2.6-microblaze into next
Pull Microblaze updates from Michal Simek:
 - cleanup PCI and DMA handling
 - use generic device.h
 - some cleanups

* tag 'microblaze-3.16-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: Fix typo in head.S s/substract/subtract/
  microblaze: remove check for CONFIG_XILINX_CONSOLE
  microblaze: Use generic device.h
  microblaze: Do not setup empty unmap_sg function
  microblaze: Remove device_to_mask
  microblaze: Clean device dma_ops structure
  microblaze: Cleanup PCI_DRAM_OFFSET handling
  microblaze: Do not setup pci_dma_ops
  microblaze: Return default dma operations
  microblaze: Enable SERIAL_OF_PLATFORM
2014-06-05 16:15:33 -07:00
Linus Torvalds
eb3d3ec567 Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm into next
Pull ARM updates from Russell King:

 - Major clean-up of the L2 cache support code.  The existing mess was
   becoming rather unmaintainable through all the additions that others
   have done over time.  This turns it into a much nicer structure, and
   implements a few performance improvements as well.

 - Clean up some of the CP15 control register tweaks for alignment
   support, moving some code and data into alignment.c

 - DMA properties for ARM, from Santosh and reviewed by DT people.  This
   adds DT properties to specify bus translations we can't discover
   automatically, and to indicate whether devices are coherent.

 - Hibernation support for ARM

 - Make ftrace work with read-only text in modules

 - add suspend support for PJ4B CPUs

 - rework interrupt masking for undefined instruction handling, which
   allows us to enable interrupts earlier in the handling of these
   exceptions.

 - support for big endian page tables

 - fix stacktrace support to exclude stacktrace functions from the
   trace, and add save_stack_trace_regs() implementation so that kprobes
   can record stack traces.

 - Add support for the Cortex-A17 CPU.

 - Remove last vestiges of ARM710 support.

 - Removal of ARM "meminfo" structure, finally converting us solely to
   memblock to handle the early memory initialisation.

* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (142 commits)
  ARM: ensure C page table setup code follows assembly code (part II)
  ARM: ensure C page table setup code follows assembly code
  ARM: consolidate last remaining open-coded alignment trap enable
  ARM: remove global cr_no_alignment
  ARM: remove CPU_CP15 conditional from alignment.c
  ARM: remove unused adjust_cr() function
  ARM: move "noalign" command line option to alignment.c
  ARM: provide common method to clear bits in CPU control register
  ARM: 8025/1: Get rid of meminfo
  ARM: 8060/1: mm: allow sub-architectures to override PCI I/O memory type
  ARM: 8066/1: correction for ARM patch 8031/2
  ARM: 8049/1: ftrace/add save_stack_trace_regs() implementation
  ARM: 8065/1: remove last use of CONFIG_CPU_ARM710
  ARM: 8062/1: Modify ldrt fixup handler to re-execute the userspace instruction
  ARM: 8047/1: rwsem: use asm-generic rwsem implementation
  ARM: l2c: trial at enabling some Cortex-A9 optimisations
  ARM: l2c: add warnings for stuff modifying aux_ctrl register values
  ARM: l2c: print a warning with L2C-310 caches if the cache size is modified
  ARM: l2c: remove old .set_debug method
  ARM: l2c: kill L2X0_AUX_CTRL_MASK before anyone else makes use of this
  ...
2014-06-05 15:57:04 -07:00
Linus Torvalds
c3c55a0720 Merge branch 'arm64-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull ARM64 EFI update from Peter Anvin:
 "By agreement with the ARM64 EFI maintainers, we have agreed to make
  -tip the upstream for all EFI patches.  That is why this patchset
  comes from me :)

  This patchset enables EFI stub support for ARM64, like we already have
  on x86"

* 'arm64-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arm64: efi: only attempt efi map setup if booting via EFI
  efi/arm64: ignore dtb= when UEFI SecureBoot is enabled
  doc: arm64: add description of EFI stub support
  arm64: efi: add EFI stub
  doc: arm: add UEFI support documentation
  arm64: add EFI runtime services
  efi: Add shared FDT related functions for ARM/ARM64
  arm64: Add function to create identity mappings
  efi: add helper function to get UEFI params from FDT
  doc: efi-stub.txt updates for ARM
  lib: add fdt_empty_tree.c
2014-06-05 13:15:32 -07:00
H. Peter Anvin
177875423e Merge tag 'efi-urgent' into x86/urgent
 * Fix earlyprintk=efi,keep support by switching to an ioremap() mapping
   of the framebuffer when early_ioremap() is no longer available and
   dropping __init from functions that may be invoked after
   free_initmem() - Dave Young

 * We shouldn't be exporting the EFI runtime map in sysfs if not using
   the new 1:1 EFI mapping code since in that case the mappings are not
   static across a kexec reboot - Dave Young

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-06-05 13:09:44 -07:00
Linus Torvalds
951e273060 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Two last minute tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf probe: Fix perf probe to find correct variable DIE
  perf probe: Fix a segfault if asked for variable it doesn't find
2014-06-05 12:51:05 -07:00
Linus Torvalds
1c5aefb5b1 Merge branch 'futex-fixes' (futex fixes from Thomas Gleixner)
Merge futex fixes from Thomas Gleixner:
 "So with more awake and less futex wreckaged brain, I went through my
  list of points again and came up with the following 4 patches.

  1) Prevent pi requeueing on the same futex

     I kept Kees' check for uaddr1 == uaddr2 as an early check for private
     futexes and added a key comparison to both futex_requeue and
     futex_wait_requeue_pi.

     Sebastian, sorry for the confusion yesterday night.  I really
     misunderstood your question.

     You are right the check is pointless for shared futexes where the
     same physical address is mapped to two different virtual addresses.

  2) Sanity check atomic acquisition in futex_lock_pi_atomic

     That's basically what Darren suggested.

     I just simplified it to use futex_top_waiter() to find kernel
     internal state.  If state is found return -EINVAL and do not bother
     to fix up the user space variable.  It's corrupted already.

  3) Ensure state consistency in futex_unlock_pi

     The code is silly versus the owner died bit.  There is no point to
     preserve it on unlock when the user space thread owns the futex.

     What's worse is that it does not update the user space value when
     the owner died bit is set.  So the kernel itself creates observable
     inconsistency.

     Another "optimization" is to retry an atomic unlock.  That's
     pointless as in a sane environment user space would not call into
     that code if it could have unlocked it atomically.  So we always
     check whether there is kernel state around and only if there is
     none, we do the unlock by setting the user space value to 0.

  4) Sanitize lookup_pi_state

     lookup_pi_state is ambiguous about TID == 0 in the user space value.

     This can be a valid state even if there is kernel state on this
     uaddr, but we miss a few corner case checks.

     I tried to come up with a smaller solution hacking the checks into
     the current cruft, but it turned out to be ugly as hell and I got
     more confused than I was before.  So I rewrote the sanity checks
     along the state documentation with awful lots of commentary"

* emailed patches from Thomas Gleixner <tglx@linutronix.de>:
  futex: Make lookup_pi_state more robust
  futex: Always cleanup owner tid in unlock_pi
  futex: Validate atomic acquisition in futex_lock_pi_atomic()
  futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1)
2014-06-05 12:31:32 -07:00
Thomas Gleixner
54a217887a futex: Make lookup_pi_state more robust
The current implementation of lookup_pi_state has ambiguous handling of
the TID value 0 in the user space futex.  We can get into the kernel
even if the TID value is 0, because either there is a stale waiters bit
or the owner died bit is set or we are called from the requeue_pi path
or from user space just for fun.

The current code avoids an explicit sanity check for pid = 0 in case
that kernel internal state (waiters) is found for the user space
address.  This can lead to state leakage and worse under some
circumstances.

Handle the cases explicitly:

       Waiter | pi_state | pi->owner | uTID      | uODIED | ?

  [1]  NULL   | ---      | ---       | 0         | 0/1    | Valid
  [2]  NULL   | ---      | ---       | >0        | 0/1    | Valid

  [3]  Found  | NULL     | --        | Any       | 0/1    | Invalid

  [4]  Found  | Found    | NULL      | 0         | 1      | Valid
  [5]  Found  | Found    | NULL      | >0        | 1      | Invalid

  [6]  Found  | Found    | task      | 0         | 1      | Valid

  [7]  Found  | Found    | NULL      | Any       | 0      | Invalid

  [8]  Found  | Found    | task      | ==taskTID | 0/1    | Valid
  [9]  Found  | Found    | task      | 0         | 0      | Invalid
  [10] Found  | Found    | task      | !=taskTID | 0/1    | Invalid

 [1] Indicates that the kernel can acquire the futex atomically. We
     came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit.

 [2] Valid, if TID does not belong to a kernel thread. If no matching
     thread is found then it indicates that the owner TID has died.

 [3] Invalid. The waiter is queued on a non PI futex

 [4] Valid state after exit_robust_list(), which sets the user space
     value to FUTEX_WAITERS | FUTEX_OWNER_DIED.

 [5] The user space value got manipulated between exit_robust_list()
     and exit_pi_state_list()

 [6] Valid state after exit_pi_state_list() which sets the new owner in
     the pi_state but cannot access the user space value.

 [7] pi_state->owner can only be NULL when the OWNER_DIED bit is set.

 [8] Owner and user space value match

 [9] There is no transient state which sets the user space TID to 0
     except exit_robust_list(), but this is indicated by the
     FUTEX_OWNER_DIED bit. See [4]

[10] There is no transient state which leaves owner and user space
     TID out of sync.
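
The state table above can be encoded as a single predicate. The following userspace sketch is an illustration only: the structure and function names are hypothetical, it covers just the five columns of the table, and it ignores the extra condition in [2] about kernel threads.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the five table columns. */
struct pi_lookup_sketch {
	bool waiter_found;	/* Waiter column */
	bool pi_state_found;	/* pi_state column */
	bool owner_set;		/* false models pi->owner == NULL */
	uint32_t owner_tid;	/* meaningful only when owner_set is true */
	uint32_t utid;		/* uTID column */
	bool uodied;		/* uODIED column */
};

/* Return true for the rows marked Valid, false for the rows marked Invalid. */
static bool pi_lookup_is_valid_sketch(const struct pi_lookup_sketch *s)
{
	if (!s->waiter_found)
		return true;				/* [1], [2] */
	if (!s->pi_state_found)
		return false;				/* [3] */
	if (!s->owner_set)
		return s->uodied && s->utid == 0;	/* [4]; rejects [5], [7] */
	if (s->utid == 0)
		return s->uodied;			/* [6]; rejects [9] */
	return s->utid == s->owner_tid;			/* [8]; rejects [10] */
}
```

Ordering the tests this way keeps each row of the table reachable by exactly one return statement.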

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 12:31:07 -07:00
Thomas Gleixner
13fbca4c6e futex: Always cleanup owner tid in unlock_pi
If the owner died bit is set at futex_unlock_pi, we currently do not
clean up the user space futex.  So the owner TID of the current owner
(the unlocker) persists.  That's observable inconsistent state,
especially when the ownership of the pi state got transferred.

Clean it up unconditionally.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 12:31:07 -07:00
Thomas Gleixner
b3eaa9fc5c futex: Validate atomic acquisition in futex_lock_pi_atomic()
We need to protect the atomic acquisition in the kernel against rogue
user space which sets the user space futex to 0, so the kernel side
acquisition succeeds while there is existing state in the kernel
associated to the real owner.

Verify whether the futex has waiters associated with kernel state.  If
it has, return -EINVAL.  The state is corrupted already, so no point in
cleaning it up.  Subsequent calls will fail as well.  Not our problem.

[ tglx: Use futex_top_waiter() and explain why we do not need to try
  	restoring the already corrupted user space state. ]

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 12:31:07 -07:00
Thomas Gleixner
e9c243a5a6 futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1)
If uaddr == uaddr2, then we have broken the rule of only requeueing from
a non-pi futex to a pi futex with this call.  If we attempt this, then
dangling pointers may be left for rt_waiter resulting in an exploitable
condition.

This change brings futex_requeue() in line with futex_wait_requeue_pi()
which performs the same check as per commit 6f7b0a2a5c ("futex: Forbid
uaddr == uaddr2 in futex_wait_requeue_pi()")

[ tglx: Compare the resulting keys as well, as uaddrs might be
  	different depending on the mapping ]
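
A minimal sketch of the two checks, in userspace C. The key structure and the function names are assumptions for illustration; the point is only that comparing the user addresses is not enough, because two different virtual addresses can resolve to the same underlying futex.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a resolved futex key. */
struct futex_key_sketch {
	uint64_t object;	/* identifies the backing page or mapping */
	uint32_t offset;	/* offset of the futex word within it */
};

static bool futex_keys_match_sketch(const struct futex_key_sketch *a,
				    const struct futex_key_sketch *b)
{
	return a->object == b->object && a->offset == b->offset;
}

/*
 * Reject a requeue-pi request whose source and target are the same futex.
 * The address comparison is the cheap early check; the key comparison also
 * catches the same futex reached through two different mappings.
 */
static int check_requeue_pi_sketch(const void *uaddr1, const void *uaddr2,
				   const struct futex_key_sketch *key1,
				   const struct futex_key_sketch *key2)
{
	if (uaddr1 == uaddr2)
		return -EINVAL;
	if (futex_keys_match_sketch(key1, key2))
		return -EINVAL;
	return 0;
}
```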

Fixes CVE-2014-3153.

Reported-by: Pinkie Pie
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 12:31:07 -07:00
Trond Myklebust
abbec2da13 NFS: Use raw_write_seqcount_begin/end in nfs4_reclaim_open_state
The addition of lockdep code to write_seqcount_begin/end has led to
a bunch of false positive claims of ABBA deadlocks with the so_lock
spinlock. Audits show that this simply cannot happen because the
read side code does not spin while holding so_lock.

Cc: <stable@vger.kernel.org> # 3.13.x
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-05 13:02:47 -04:00
Haggai Eran
584482ac80 IB/core: Fix kobject leak on device register error flow
The ports kobject isn't being released during error flow in device
registration.  This patch refactors the ports kobject cleanup into a
single function called from both the error flow in device registration
and from the unregistration function.

A couple of attributes aren't being deleted (iw_stats_group, and
ib_class_attributes).  While this may be handled implicitly by the
destruction of their kobjects, it seems better to handle all the
attributes the same way.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>

[ Make free_port_list_attributes() static.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-05 09:37:10 -07:00
Yoshihiro YUNOMAE
939c7a4f04 tracing: Introduce saved_cmdlines_size file
Introduce saved_cmdlines_size file for changing the number of saved pid-comms.
saved_cmdlines currently stores 128 command names using SAVED_CMDLINES, but
the names of processes that no longer exist are often lost from saved_cmdlines
by the time we read the trace data. With the saved_cmdlines_size file, we can
now raise the limit of 128 saved command names to something much larger if needed.

When we write a value to saved_cmdlines_size, the pid-comm list is resized
to hold that many entries:

	# echo 1024 > /sys/kernel/debug/tracing/saved_cmdlines_size

Here, 1024 command names can be stored. The default number is 128 and the maximum
number is PID_MAX_DEFAULT (=32768 if CONFIG_BASE_SMALL is not set). So, if we
want to avoid losing any command names, we need to set saved_cmdlines_size
to 32768.

We can read the current size of the list:

	# cat /sys/kernel/debug/tracing/saved_cmdlines_size
	128
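
The limits stated above can be sketched as a small validation helper in userspace C. The names are hypothetical, and rejecting a size of zero is an assumption of this sketch rather than something the message states; the default of 128 and the maximum of 32768 are taken from the text.

```c
#include <errno.h>

/* Values quoted in the commit message (CONFIG_BASE_SMALL not set). */
#define SAVED_CMDLINES_DEFAULT_SKETCH	128u
#define PID_MAX_DEFAULT_SKETCH		32768u

/*
 * Accept a new list size only if it lies in 1..PID_MAX_DEFAULT_SKETCH.
 * On rejection the current size is left untouched.
 */
static int saved_cmdlines_resize_sketch(unsigned int *size, unsigned long val)
{
	if (val == 0 || val > PID_MAX_DEFAULT_SKETCH)
		return -EINVAL;

	*size = (unsigned int)val;
	return 0;
}
```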

Link: http://lkml.kernel.org/p/20140605012427.22115.16173.stgit@yunodevel

Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-05 12:35:49 -04:00
Yann Droneaud
b7dfa8895f RDMA/cxgb4: add missing padding at end of struct c4iw_alloc_ucontext_resp
The i386 ABI disagrees with most other ABIs regarding alignment of
data types larger than 4 bytes: on most ABIs a padding must be added
at end of the structures, while it is not required on i386.

So for most ABIs, struct c4iw_alloc_ucontext_resp gets implicitly padded
to be aligned on an 8-byte multiple, while for i386, such padding is
not added.

The tool pahole can be used to find such implicit padding:

  $ pahole --anon_include \
           --nested_anon_include \
           --recursive \
           --class_name c4iw_alloc_ucontext_resp \
           drivers/infiniband/hw/cxgb4/iw_cxgb4.o

Then, structure layout can be compared between i386 and x86_64:

  +++ obj-i386/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt   2014-03-28 11:43:05.547432195 +0100
  --- obj-x86_64/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 10:55:10.990133017 +0100
  @@ -2,9 +2,8 @@ struct c4iw_alloc_ucontext_resp {
          __u64                      status_page_key;      /*     0     8 */
          __u32                      status_page_size;     /*     8     4 */

  -       /* size: 12, cachelines: 1, members: 2 */
  -       /* last cacheline: 12 bytes */
  +       /* size: 16, cachelines: 1, members: 2 */
  +       /* padding: 4 */
  +       /* last cacheline: 16 bytes */
   };

This ABI disagreement will make an x86_64 kernel try to write past the
buffer provided by an i386 binary.

Once boundary checks are implemented, the x86_64 kernel will refuse
to write past the i386 userspace provided buffer and the uverb will
fail.

If the structure is on a page boundary and the next page is not
mapped, ib_copy_to_udata() will fail and the uverb will fail.

Additionally, as reported by Dan Carpenter, without the implicit
padding being properly cleared, an information leak would take place
in most architectures.

This patch adds an explicit padding to struct c4iw_alloc_ucontext_resp,
and, like 92b0ca7cb1 ("IB/mlx5: Fix stack info leak in
mlx5_ib_alloc_ucontext()"), makes function c4iw_alloc_ucontext()
not write this padding field to userspace. This way, an x86_64 kernel
will be able to write struct c4iw_alloc_ucontext_resp as expected by
both unpatched and patched i386 libcxgb4.
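
The layout difference can be reproduced with a small userspace sketch. The two structures mirror the members shown in the pahole output above; the field name reserved and the helper names are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout before the patch: any trailing padding is left to the ABI. */
struct resp_implicit_sketch {
	uint64_t status_page_key;
	uint32_t status_page_size;
};

/* Layout after the patch: the padding is spelled out as a member. */
struct resp_explicit_sketch {
	uint64_t status_page_key;
	uint32_t status_page_size;
	uint32_t reserved;	/* hypothetical name for the padding field */
};

/* 12 on i386 and 16 on x86_64, as in the pahole comparison above. */
static size_t resp_implicit_size_sketch(void)
{
	return sizeof(struct resp_implicit_sketch);
}

/* 16 on both ABIs, because no implicit padding is left to add. */
static size_t resp_explicit_size_sketch(void)
{
	return sizeof(struct resp_explicit_sketch);
}

/* Bytes to copy to user space when the padding field is not written. */
static size_t resp_copy_len_sketch(void)
{
	return offsetof(struct resp_explicit_sketch, reserved);
}
```

Copying only the bytes before the padding member gives 12 bytes on both ABIs, which is what an unpatched i386 binary expects.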

Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
Link: http://marc.info/?i=1395848977.3297.15.camel@localhost.localdomain
Link: http://marc.info/?i=20140328082428.GH25192@mwanda
Cc: <stable@vger.kernel.org>
Fixes: 05eb23893c ("cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes")
Reported-by: Yann Droneaud <ydroneaud@opteya.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-05 09:13:54 -07:00
Linus Torvalds
046f153343 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull x86 EFI updates from Peter Anvin:
 "A collection of EFI changes.  The perhaps most important one is to
  fully save and restore the FPU state around each invocation of EFI
  runtime, and to not choke on non-ASCII characters in the boot stub"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efivars: Add compatibility code for compat tasks
  efivars: Refactor sanity checking code into separate function
  efivars: Stop passing a struct argument to efivar_validate()
  efivars: Check size of user object
  efivars: Use local variables instead of a pointer dereference
  x86/efi: Save and restore FPU context around efi_calls (i386)
  x86/efi: Save and restore FPU context around efi_calls (x86_64)
  x86/efi: Implement a __efi_call_virt macro
  x86, fpu: Extend the use of static_cpu_has_safe
  x86/efi: Delete most of the efi_call* macros
  efi: x86: Handle arbitrary Unicode characters
  efi: Add get_dram_base() helper function
  efi: Add shared printk wrapper for consistent prefixing
  efi: create memory map iteration helper
  efi: efi-stub-helper cleanup
2014-06-05 08:16:29 -07:00
Linus Torvalds
a0abcf2e8f Merge branch 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull x86 vdso updates from Peter Anvin:
 "Vdso cleanups and improvements largely from Andy Lutomirski.  This
  makes the vdso a lot less ''special''"

* 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vdso, build: Make LE access macros clearer, host-safe
  x86/vdso, build: Fix cross-compilation from big-endian architectures
  x86/vdso, build: When vdso2c fails, unlink the output
  x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET
  x86, mm: Replace arch_vma_name with vm_ops->name for vsyscalls
  x86, mm: Improve _install_special_mapping and fix x86 vdso naming
  mm, fs: Add vm_ops->name as an alternative to arch_vma_name
  x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET
  x86, vdso: Remove vestiges of VDSO_PRELINK and some outdated comments
  x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSO
  x86, vdso: Move the 32-bit vdso special pages after the text
  x86, vdso: Reimplement vdso.so preparation in build-time C
  x86, vdso: Move syscall and sysenter setup into kernel/cpu/common.c
  x86, vdso: Clean up 32-bit vs 64-bit vdso params
  x86, mm: Ensure correct alignment of the fixmap
2014-06-05 08:05:29 -07:00
Linus Torvalds
2071b3e34f Merge branch 'x86/espfix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull x86-64 espfix changes from Peter Anvin:
 "This is the espfix64 code, which fixes the IRET information leak as
  well as the associated functionality problem.  With this code applied,
  16-bit stack segments finally work as intended even on a 64-bit
  kernel.

  Consequently, this patchset also removes the runtime option that we
  added as an interim measure.

  To help the people working on Linux kernels for very small systems,
  this patchset also makes these compile-time configurable features"

* 'x86/espfix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option"
  x86, espfix: Make it possible to disable 16-bit support
  x86, espfix: Make espfix64 a Kconfig option, fix UML
  x86, espfix: Fix broken header guard
  x86, espfix: Move espfix definitions into a separate header file
  x86-32, espfix: Remove filter for espfix32 due to race
  x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack
2014-06-05 07:46:15 -07:00
Linus Torvalds
9df0fe64eb Merge branch 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull x86 x32 ABI fix from Peter Anvin:
 "A single fix for the x32 ABI: the io_setup() and io_submit() system
  call need to use the compat stubs"

* 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, x32: Use compat shims for io_{setup,submit}
2014-06-05 07:38:07 -07:00
Igor Mammedov
3e1a878b7c x86/smpboot: Initialize secondary CPU only if master CPU will wait for it
A hang is observed on virtual machines during CPU hotplug,
especially in big guests with many CPUs. (It reproduces
more often if the host is over-committed.)

It happens because the master CPU gives up waiting on the
secondary CPU and allows it to run wild. As a result, the
AP locks up or crashes the system, for example
as described here:

   https://lkml.org/lkml/2014/3/6/257

If the master CPU has sent the STARTUP IPI successfully,
and the AP has signalled to the master CPU that it is ready
to start initialization, make the master CPU wait
indefinitely until the AP is onlined.
To ensure that the AP won't ever run wild, make it
wait at early startup until the master CPU confirms its
intention to wait for the AP. If the AP doesn't respond within 10
seconds, the master CPU will time out and cancel
AP onlining.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1401975765-22328-4-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 16:33:08 +02:00
Igor Mammedov
feef1e8ecb x86/smpboot: Log error on secondary CPU wakeup failure at ERR level
If the system is running without debug-level logging,
it will not log an error if do_boot_cpu() fails to
wake up an AP. This may lead to silent AP bringup
failures at boot time.
Change the message level to KERN_ERR to make the error
visible to the user, as is done on other architectures.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1401975765-22328-3-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 16:33:07 +02:00
Igor Mammedov
89f898c1e1 x86: Fix list/memory corruption on CPU hotplug
Currently, if AP wakeup fails, the master CPU marks the AP as not
present in do_boot_cpu() by calling set_cpu_present(cpu, false).
That leads to the following list corruption on the next physical CPU
hotplug:

[  418.107336] WARNING: CPU: 1 PID: 45 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
[  418.115268] list_add corruption. prev->next should be next (ffff88003dc57600), but was ffff88003e20c3a0. (prev=ffff88003e20c3a0).
[  418.123693] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT ipt_REJECT cfg80211 xt_conntrack rfkill ee
[  418.138979] CPU: 1 PID: 45 Comm: kworker/u10:1 Not tainted 3.14.0-rc6+ #387
[  418.149989] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[  418.165750] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[  418.166433]  0000000000000021 ffff880038ca7988 ffffffff8159b22d 0000000000000021
[  418.176460]  ffff880038ca79d8 ffff880038ca79c8 ffffffff8106942c ffff880038ca79e8
[  418.177453]  ffff88003e20c3a0 ffff88003dc57600 ffff88003e20c3a0 00000000ffffffea
[  418.178445] Call Trace:
[  418.185811]  [<ffffffff8159b22d>] dump_stack+0x49/0x5c
[  418.186440]  [<ffffffff8106942c>] warn_slowpath_common+0x8c/0xc0
[  418.187192]  [<ffffffff81069516>] warn_slowpath_fmt+0x46/0x50
[  418.191231]  [<ffffffff8136ef51>] ? acpi_ns_get_node+0xb7/0xc7
[  418.193889]  [<ffffffff812f796e>] __list_add+0xbe/0xd0
[  418.196649]  [<ffffffff812e2aa9>] kobject_add_internal+0x79/0x200
[  418.208610]  [<ffffffff812e2e18>] kobject_add_varg+0x38/0x60
[  418.213831]  [<ffffffff812e2ef4>] kobject_add+0x44/0x70
[  418.229961]  [<ffffffff813e2c60>] device_add+0xd0/0x550
[  418.234991]  [<ffffffff813f0e95>] ? pm_runtime_init+0xe5/0xf0
[  418.250226]  [<ffffffff813e32be>] device_register+0x1e/0x30
[  418.255296]  [<ffffffff813e82a3>] register_cpu+0xe3/0x130
[  418.266539]  [<ffffffff81592be5>] arch_register_cpu+0x65/0x150
[  418.285845]  [<ffffffff81355c0d>] acpi_processor_hotadd_init+0x5a/0x9b
...
This is caused by the fact that generic_processor_info() allocates
a logical CPU id by calling:

 cpu = cpumask_next_zero(-1, cpu_present_mask);

which returns the id of the CPU that previously failed to wake up, since
its bit was cleared by do_boot_cpu(). As a result, register_cpu()
tries to register another CPU with the same id as the CPU that is
already present but failed to come online.
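
The id reuse can be modelled in user space. The following is a minimal
sketch, not kernel code: first_zero_bit() is a hypothetical stand-in for
cpumask_next_zero(-1, mask), alloc_cpu_id() is an illustrative name, and
the mask is a plain 32-bit integer rather than a struct cpumask.

```c
#include <stdint.h>

/* Hypothetical stand-in for cpumask_next_zero(-1, mask): returns the index
 * of the lowest clear bit, or 32 if every bit is set. */
int first_zero_bit(uint32_t mask)
{
    for (int bit = 0; bit < 32; bit++) {
        if (!(mask & (UINT32_C(1) << bit)))
            return bit;
    }
    return 32;
}

/* Model of the allocation step: pick the lowest free id and mark it
 * present. Returns 32 when no id is free. */
int alloc_cpu_id(uint32_t *present)
{
    int cpu = first_zero_bit(*present);

    if (cpu < 32)
        *present |= UINT32_C(1) << cpu;
    return cpu;
}
```

If the present bit of a CPU is cleared after a failed wakeup, the next
alloc_cpu_id() call hands out the same id again; if the bit is left set,
the next call returns a fresh id.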

Taking into account that the AP will not do anything if the master CPU
failed to wake it up, there is no reason to mark that AP as not
present and break subsequent CPU hotplug attempts. As a side effect of
not marking the AP as not present, the user is allowed to online
it again later.

Also fix memory corruption in acpi_unmap_lsapic().

If the master CPU fails to wake up an AP during CPU hotplug,
it sets the percpu x86_cpu_to_apicid to BAD_APICID=0xFFFF for that AP.

However, a subsequent attempt to unplug that CPU leads to an
out-of-bounds write to __apicid_to_node[], which is
32768 items long on an x86_64 kernel.

So, with the above fix of cpu_present_mask, make sure that a present
CPU has a valid APIC ID by not setting x86_cpu_to_apicid
to BAD_APICID in do_boot_cpu() on failure, and allow
acpi_processor_remove()->acpi_unmap_lsapic() to cleanly remove the CPU.
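
To see why BAD_APICID is a problem as an index, here is a small user-space
sketch. The table name, the helper set_apicid_node() and the explicit
bounds check are assumptions made for illustration; the patch itself avoids
the bad index by keeping a valid APIC ID rather than by adding a check.

```c
#define BAD_APICID     0xFFFFu   /* value named in the commit message */
#define MAX_LOCAL_APIC 32768u    /* table length quoted for x86_64 */

/* Illustrative stand-in for __apicid_to_node[]. */
int apicid_to_node_model[MAX_LOCAL_APIC];

/* Returns 0 on success, -1 if apicid would index past the end of the
 * table (which is what happens for BAD_APICID). */
int set_apicid_node(unsigned int apicid, int node)
{
    if (apicid >= MAX_LOCAL_APIC)
        return -1;
    apicid_to_node_model[apicid] = node;
    return 0;
}
```

With these values, 0xFFFF is 65535, almost twice the table length, so an
unchecked write would land well outside the array.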

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1401975765-22328-2-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 16:33:07 +02:00
Li Zefan
c731ae1d0f cgroup: disallow disabled controllers on the default hierarchy
After booting with cgroup_disable=memory, I still saw memcg files
in the default hierarchy, and I could write to them, though the writes
had no effect.

  # dmesg
  ...
  Disabling memory control group subsystem
  ...
  # mount -t cgroup -o __DEVEL__sane_behavior xxx /cgroup
  # ls /cgroup
  ...
  memory.failcnt                   memory.move_charge_at_immigrate
  memory.force_empty               memory.numa_stat
  memory.limit_in_bytes            memory.oom_control
  ...
  # cat /cgroup/memory.usage_in_bytes
  0

tj: Minor comment update.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-06-05 09:52:51 -04:00
Laurent Pinchart
642653d16a i2c: pca954x: Fix compilation without CONFIG_GPIOLIB
The pca954x driver recently switched to the GPIO descriptor API without
including the correct <linux/gpio/consumer.h> header. This breaks
compilation without CONFIG_GPIOLIB.

drivers/i2c/muxes/i2c-mux-pca954x.c: In function ‘pca954x_probe’:
drivers/i2c/muxes/i2c-mux-pca954x.c:204:2: error: implicit declaration
of function ‘devm_gpiod_get’ [-Werror=implicit-function-declaration]
  gpio = devm_gpiod_get(&client->dev, "reset");
  ^
drivers/i2c/muxes/i2c-mux-pca954x.c:204:7: warning: assignment makes
pointer from integer without a cast [enabled by default]
  gpio = devm_gpiod_get(&client->dev, "reset");
       ^
drivers/i2c/muxes/i2c-mux-pca954x.c:206:3: error: implicit declaration
of function ‘gpiod_direction_output’
[-Werror=implicit-function-declaration]
   gpiod_direction_output(gpio, 0);
   ^
cc1: some warnings being treated as errors
make[3]: *** [drivers/i2c/muxes/i2c-mux-pca954x.o] Error 1

Fix it by including the right header.

Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2014-06-05 15:09:33 +02:00
Russell King
bd63ce27d9 Merge branch 'devel-stable' into for-next
2014-06-05 12:36:22 +01:00
Russell King
1fb333489f Merge branches 'alignment', 'fixes', 'l2c' (early part) and 'misc' into for-next
2014-06-05 12:35:52 +01:00
Antonio Ospite
225fba2162 microblaze: Fix typo in head.S s/substract/subtract/
Signed-off-by: Antonio Ospite <ao2@ao2.it>
Cc: Michal Simek <monstr@monstr.eu>
Cc: "Edgar E. Iglesias" <edgar.iglesias@gmail.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
2014-06-05 13:02:02 +02:00
Roman Gushchin
09dc4ab039 sched/fair: Fix tg_set_cfs_bandwidth() deadlock on rq->lock
tg_set_cfs_bandwidth() sets cfs_b->timer_active to 0 to
force the period timer restart. This is not safe, because
it can lead to the deadlock described in commit 927b54fccb:
"__start_cfs_bandwidth calls hrtimer_cancel while holding rq->lock,
waiting for the hrtimer to finish. However, if sched_cfs_period_timer
runs for another loop iteration, the hrtimer can attempt to take
rq->lock, resulting in deadlock."

Three CPUs must be involved:

  CPU0               CPU1                         CPU2
  take rq->lock      period timer fired
  ...                take cfs_b lock
  ...                ...                          tg_set_cfs_bandwidth()
  throttle_cfs_rq()  release cfs_b lock           take cfs_b lock
  ...                distribute_cfs_runtime()     timer_active = 0
  take cfs_b->lock   wait for rq->lock            ...
  __start_cfs_bandwidth()
  {wait for timer callback
   break if timer_active == 1}

So, CPU0 and CPU1 are deadlocked.

Instead of resetting cfs_b->timer_active, tg_set_cfs_bandwidth can
wait for period timer callbacks (ignoring cfs_b->timer_active) and
restart the timer explicitly.

Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Reviewed-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/87wqdi9g8e.wl\%klamm@yandex-team.ru
Cc: pjt@google.com
Cc: chris.j.arges@canonical.com
Cc: gregkh@linuxfoundation.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 11:51:34 +02:00
Kirill Tkhai
0f397f2c90 sched/dl: Fix race in dl_task_timer()
A throttled task is still on the rq, and it may be moved to another cpu
if the user is playing with sched_setaffinity(). Therefore, the unlocked
task_rq() access is racy.

Juri Lelli reports he got this race when dl_bandwidth_enabled()
was not set.

Another point, raised by Peter Zijlstra:

   "Now I suppose the problem can still actually happen when
    you change the root domain and trigger a effective affinity
    change that way".

To fix that, we do the same as is done in __task_rq_lock(). We do not
use __task_rq_lock() itself, because it has a useful lockdep check,
which is not correct in the case of dl_task_timer(). We do not need
pi_lock locked here. This case is an exception (PeterZ):

   "The only reason we don't strictly need ->pi_lock now is because
    we're guaranteed to have p->state == TASK_RUNNING here and are
    thus free of ttwu races".
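
The "read, lock, re-check" pattern referred to above can be sketched in
single-threaded user-space C. Everything here is a model under stated
assumptions: struct rq and struct task are reduced to the fields needed,
the lock is a plain flag instead of a raw spinlock, and pending_migration
is a hypothetical hook that simulates a task being moved to another rq
between the unlocked read and the lock acquisition.

```c
#include <stddef.h>

struct rq   { int locked; };
struct task { struct rq *rq; };

/* Hypothetical hook: if non-NULL, the task is moved to this rq once,
 * inside the race window, to imitate a concurrent affinity change. */
struct rq *pending_migration;

/* Lock the rq the task is on, retrying if the task moved meanwhile. */
struct rq *task_rq_lock_model(struct task *p)
{
    for (;;) {
        struct rq *rq = p->rq;          /* unlocked read, may go stale */

        if (pending_migration) {        /* simulated race window */
            p->rq = pending_migration;
            pending_migration = NULL;
        }

        rq->locked = 1;                 /* stands in for taking rq->lock */
        if (rq == p->rq)
            return rq;                  /* still the task's rq: done */
        rq->locked = 0;                 /* task moved: unlock and retry */
    }
}
```

The re-check after taking the lock is what makes the result trustworthy:
a caller that skipped it could end up holding the lock of an rq the task
no longer belongs to.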

Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # v3.14+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/3056991400578422@web14g.yandex.ru
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 11:51:12 +02:00
Richard Weinberger
b14ed2c273 sched: Fix sched_policy < 0 comparison
attr.sched_policy is u32, therefore a comparison against < 0 is never true.
Fix this by casting sched_policy to int.

This issue was reported by coverity CID 1219934.
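
A minimal user-space sketch of the problem follows. The function names are
illustrative and uint32_t stands in for the kernel's u32; the cast assumes
the usual two's-complement conversion that the kernel's supported
compilers provide.

```c
#include <stdint.h>

/* Unsigned comparison: with bound == 0 this can never be true, which is
 * why a "< 0" test on a u32 field is dead code. */
int unsigned_below(uint32_t value, uint32_t bound)
{
    return value < bound;
}

/* Reinterpret the 32-bit value as signed first, so that bit patterns with
 * the top bit set compare as negative. */
int policy_is_negative(uint32_t sched_policy)
{
    return (int32_t)sched_policy < 0;
}
```

A caller passing -1 in the u32 field ends up with 0xffffffff; only the
signed comparison rejects it.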

Fixes: dbdb22754f ("sched: Disallow sched_attr::sched_policy < 0")
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1401741514-7045-1-git-send-email-richard@nod.at
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 11:07:43 +02:00
Steven Rostedt
e9dd685ce8 sched/numa: Fix use of spin_{un}lock_irq() when interrupts are disabled
As Peter Zijlstra told me, we have the following path:

do_exit()
  exit_itimers()
    itimer_delete()
      spin_lock_irqsave(&timer->it_lock, flags);
      timer_delete_hook(timer);
        kc->timer_del(timer) := posix_cpu_timer_del()
          put_task_struct()
            __put_task_struct()
              task_numa_free()
                spin_lock(&grp->lock);

Which means that task_numa_free() can be called with interrupts
disabled, which means that we should not be using spin_lock_irq() but
spin_lock_irqsave() instead. Otherwise we are enabling interrupts while
holding an interrupt unsafe lock!
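
The difference between the two lock flavours can be shown with a toy
user-space model. This is only a sketch: irqs_enabled is a simulated
interrupt flag, the model_* helpers are hypothetical names, and no real
locking takes place.

```c
#include <stdbool.h>

bool irqs_enabled = true;   /* simulated CPU interrupt-enable flag */

/* _irq flavour: disables on lock, unconditionally enables on unlock. */
void model_lock_irq(void)   { irqs_enabled = false; }
void model_unlock_irq(void) { irqs_enabled = true; }

/* _irqsave flavour: remembers the caller's state and restores it. */
void model_lock_irqsave(bool *flags)
{
    *flags = irqs_enabled;
    irqs_enabled = false;
}

void model_unlock_irqrestore(bool flags)
{
    irqs_enabled = flags;
}

/* An inner critical section written with each flavour. */
void inner_with_irq(void)
{
    model_lock_irq();
    model_unlock_irq();
}

void inner_with_irqsave(void)
{
    bool flags;

    model_lock_irqsave(&flags);
    model_unlock_irqrestore(flags);
}
```

When inner_with_irq() runs inside an outer irqsave section, interrupts
come back on while the outer lock is still held; inner_with_irqsave()
leaves them off until the outer section restores them.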

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140527182541.GH11096@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 11:07:41 +02:00
Ingo Molnar
22c91aa235 perf/urgent fixes:
. Fix perf probe to find correct variable DIE (Masami Hiramatsu)
 
 . Fix a segfault in perf probe if asked for variable it doesn't find (Masami Hiramatsu)
 
 Signed-off-by: Jiri Olsa <jolsa@kernel.org>

Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf into perf/urgent

Pull perf/urgent fixes from Jiri Olsa:

 * Fix perf probe to find correct variable DIE (Masami Hiramatsu)

 * Fix a segfault in perf probe if asked for variable it doesn't find (Masami Hiramatsu)

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 09:54:01 +02:00
Vasant Hegde
8b8f7bf4c2 powerpc/powernv: Pass buffer size to OPAL validate flash call
We pass the actual buffer size to the opal_validate_flash() OPAL API call,
and on return it contains the output buffer size.

Commit cc146d1d (Fix little endian issues) failed to set the size
param before making the OPAL call, so firmware image validation fails.

This patch sets the size variable before making the OPAL call.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Tested-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 14:54:04 +10:00
Anton Blanchard
c1931e2181 powerpc/pseries: hcall functions are exported to modules, need _GLOBAL_TOC()
The hcall macros may call out to C code for tracing, so we need
to set up a valid r2. This fixes an oops found when testing
ibmvscsi as a module.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 13:21:28 +10:00
Anton Blanchard
2ac7b0166a powerpc: Exported functions __clear_user and copy_page use r2 so need _GLOBAL_TOC()
__clear_user and copy_page load from the TOC and are also exported
to modules. This means we have to use _GLOBAL_TOC() so that we
create the global entry point that sets up the TOC.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 13:20:41 +10:00
Anton Blanchard
6d97d7a28f powerpc/powernv: Set memory_block_size_bytes to 256MB
powerpc sets a low SECTION_SIZE_BITS to accommodate small pseries
boxes. We default to 16MB memory blocks, and boxes with a lot
of memory end up with enormous numbers of sysfs memory nodes.

Set a more reasonable default for powernv of 256MB.
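
To put numbers on the difference, the block count is simply total memory
divided by block size. The helper below is an illustrative sketch, not
kernel code; the 1 TiB machine size in the note that follows is an
arbitrary example.

```c
#include <stdint.h>

/* Number of memory blocks needed to cover total_bytes, rounding up.
 * block_bytes must be non-zero. */
uint64_t memory_block_count(uint64_t total_bytes, uint64_t block_bytes)
{
    return (total_bytes + block_bytes - 1) / block_bytes;
}
```

For a box with 1 TiB of memory this gives 65536 blocks at 16MB and 4096
blocks at 256MB, a 16x reduction in sysfs memory nodes.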

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 13:20:40 +10:00
Anton Blanchard
a5d862576a powerpc: Allow ppc_md platform hook to override memory_block_size_bytes
The pseries platform code unconditionally overrides
memory_block_size_bytes regardless of the running platform.

Create a ppc_md hook so that each platform can choose to
do what it wants.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 13:20:39 +10:00
Anton Blanchard
223ca9d855 powerpc/powernv: Fix endian issues in memory error handling code
struct OpalMemoryErrorData is passed to us from firmware, so we
have to byteswap it.
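
As a portable illustration of what the byteswap amounts to, the sketch
below decodes a big-endian 64-bit field from raw bytes. In the kernel this
is done with helpers such as be64_to_cpu(); load_be64() is a hypothetical
user-space equivalent, and no claim is made here about the actual field
layout of struct OpalMemoryErrorData.

```c
#include <stdint.h>

/* Decode a big-endian 64-bit value from 8 raw bytes. Works the same on
 * little-endian and big-endian hosts because it never reinterprets
 * memory, it only shifts bytes into place. */
uint64_t load_be64(const unsigned char *p)
{
    uint64_t value = 0;

    for (int i = 0; i < 8; i++)
        value = (value << 8) | p[i];
    return value;
}
```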

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 13:20:39 +10:00
Wei Yang
2213fb142f powerpc/eeh: Skip eeh sysfs when eeh is disabled
When eeh is not enabled and two pci devices are hotplugged on the same bus,
eeh related sysfs would be added twice for the first added pci device, since
the eeh_dev is not created when eeh is not enabled.

This patch adds a check: if eeh is not enabled, eeh sysfs will not be
created.

After applying this patch, the following warnings are reduced:

sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:00.0/eeh_mode'
sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:00.0/eeh_config_addr'
sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:00.0/eeh_pe_config_addr'

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 13:20:38 +10:00
Anton Blanchard
5d73320a96 powerpc: 64bit sendfile is capped at 2GB
Commit 8f9c0119d7 (compat: fs: Generic compat_sys_sendfile
implementation) changed the PowerPC 64bit sendfile call from
sys_sendfile64 to sys_sendfile.

Unfortunately this broke sendfile of lengths greater than 2G because
sys_sendfile caps at MAX_NON_LFS. Restore what we had previously which
fixes the bug.
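
The effect of the cap can be sketched as a simple clamp. This is a
simplified user-space model, not the kernel's do_sendfile(): the names are
illustrative, the 2G limit is modelled as 2^31 - 1, a max of 0 means
"no cap" in this model, and a position at or past the cap yields 0 bytes
instead of an error code.

```c
#include <stdint.h>

#define MAX_NON_LFS_MODEL ((UINT64_C(1) << 31) - 1)

/* Bytes that may be transferred starting at pos when the end of the
 * transfer must not exceed max. */
uint64_t clamp_transfer(uint64_t pos, uint64_t count, uint64_t max)
{
    if (max == 0)
        return count;            /* no cap in this model */
    if (pos >= max)
        return 0;                /* already at or past the cap */
    if (count > max - pos)
        return max - pos;        /* truncate to what fits */
    return count;
}
```

With the cap applied, a 3 GB request from offset 0 is cut down to
2^31 - 1 bytes; without it, the full request goes through.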

Cc: stable@vger.kernel.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 13:20:38 +10:00
Benjamin Herrenschmidt
fa2dbe2e0f powerpc/powernv: Provide debugfs access to the LPC bus via OPAL
This provides debugfs files to access the LPC bus on Power8
non-virtualized using the appropriate OPAL firmware calls.

The usage is simple: one file per space (IO, MEM and FW),
lseek to the address and read/write the data. IO and MEM always
generate series of byte accesses. FW can generate word and dword
accesses if aligned properly.

Based on an original patch from Rob Lippert and reworked.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 13:20:37 +10:00
Benjamin Herrenschmidt
c4cad90f9e powerpc/serial: Use saner flags when creating legacy ports
We had a mix & match of flags used when creating legacy ports
depending on where we found them in the device-tree. Among others
we were missing UPF_SKIP_TEST for some kinds of ISA ports, which is
a problem as quite a few UARTs out there don't support the loopback
test (such as a lot of BMCs).

Let's pick the set of flags used by the SoC code and generalize it
which means autoconf, no loopback test, irq maybe shared and fixed
port.

Sending to stable as the lack of UPF_SKIP_TEST is breaking
serial on some machines so I want this back into distros

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: stable@vger.kernel.org
2014-06-05 13:20:36 +10:00
Michael Ellerman
91a6151be2 powerpc: Add cpu family documentation
This patch adds some documentation on the different cpu families
supported by arch/powerpc.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 13:20:01 +10:00