Commit graph

14298 commits

Author SHA1 Message Date
Linus Torvalds
933425fb00 s390: A bunch of fixes and optimizations for interrupt and time
handling.
 
 PPC: Mostly bug fixes.
 
 ARM: No big features, but many small fixes and prerequisites including:
 - a number of fixes for the arch-timer
 - introducing proper level-triggered semantics for the arch-timers
 - a series of patches to synchronously halt a guest (prerequisite for
   IRQ forwarding)
 - some tracepoint improvements
 - a tweak for the EL2 panic handlers
 - some more VGIC cleanups getting rid of redundant state
 
 x86: quite a few changes:
 
 - support for VT-d posted interrupts (i.e. PCI devices can inject
 interrupts directly into vCPUs).  This introduces a new component (in
 virt/lib/) that connects VFIO and KVM together.  The same infrastructure
 will be used for ARM interrupt forwarding as well.
 
 - more Hyper-V features, though the main one Hyper-V synthetic interrupt
 controller will have to wait for 4.5.  These will let KVM expose Hyper-V
 devices.
 
 - nested virtualization now supports VPID (same as PCID but for vCPUs)
 which makes it quite a bit faster
 
 - for future hardware that supports NVDIMM, there is support for clflushopt,
 clwb, pcommit
 
 - support for "split irqchip", i.e. LAPIC in kernel + IOAPIC/PIC/PIT in
 userspace, which reduces the attack surface of the hypervisor
 
 - obligatory smattering of SMM fixes
 
 - on the guest side, stable scheduler clock support was rewritten to not
 require help from the hypervisor.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWO2IQAAoJEL/70l94x66D/K0H/3AovAgYmJQToZlimsktMk6a
 f2xhdIqfU5lIQQh5uNBCfL3o9o8H9Py1ym7aEw3fmztPHHJYc91oTatt2UEKhmEw
 VtZHp/dFHt3hwaIdXmjRPEXiYctraKCyrhaUYdWmUYkoKi7lW5OL5h+S7frG2U6u
 p/hFKnHRZfXHr6NSgIqvYkKqtnc+C0FWY696IZMzgCksOO8jB1xrxoSN3tANW3oJ
 PDV+4og0fN/Fr1capJUFEc/fejREHneANvlKrLaa8ht0qJQutoczNADUiSFLcMPG
 iHljXeDsv5eyjMtUuIL8+MPzcrIt/y4rY41ZPiKggxULrXc6H+JJL/e/zThZpXc=
 =iv2z
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "First batch of KVM changes for 4.4.

  s390:
     A bunch of fixes and optimizations for interrupt and time handling.

  PPC:
     Mostly bug fixes.

  ARM:
     No big features, but many small fixes and prerequisites including:

      - a number of fixes for the arch-timer

      - introducing proper level-triggered semantics for the arch-timers

      - a series of patches to synchronously halt a guest (prerequisite
        for IRQ forwarding)

      - some tracepoint improvements

      - a tweak for the EL2 panic handlers

      - some more VGIC cleanups getting rid of redundant state

  x86:
     Quite a few changes:

      - support for VT-d posted interrupts (i.e. PCI devices can inject
        interrupts directly into vCPUs).  This introduces a new
        component (in virt/lib/) that connects VFIO and KVM together.
        The same infrastructure will be used for ARM interrupt
        forwarding as well.

      - more Hyper-V features, though the main one Hyper-V synthetic
        interrupt controller will have to wait for 4.5.  These will let
        KVM expose Hyper-V devices.

      - nested virtualization now supports VPID (same as PCID but for
        vCPUs) which makes it quite a bit faster

      - for future hardware that supports NVDIMM, there is support for
        clflushopt, clwb, pcommit

      - support for "split irqchip", i.e.  LAPIC in kernel +
        IOAPIC/PIC/PIT in userspace, which reduces the attack surface of
        the hypervisor

      - obligatory smattering of SMM fixes

      - on the guest side, stable scheduler clock support was rewritten
        to not require help from the hypervisor"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (123 commits)
  KVM: VMX: Fix commit which broke PML
  KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()
  KVM: x86: allow RSM from 64-bit mode
  KVM: VMX: fix SMEP and SMAP without EPT
  KVM: x86: move kvm_set_irq_inatomic to legacy device assignment
  KVM: device assignment: remove pointless #ifdefs
  KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic
  KVM: x86: zero apic_arb_prio on reset
  drivers/hv: share Hyper-V SynIC constants with userspace
  KVM: x86: handle SMBASE as physical address in RSM
  KVM: x86: add read_phys to x86_emulate_ops
  KVM: x86: removing unused variable
  KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs
  KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()
  KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings
  KVM: arm/arm64: Optimize away redundant LR tracking
  KVM: s390: use simple switch statement as multiplexer
  KVM: s390: drop useless newline in debugging data
  KVM: s390: SCA must not cross page boundaries
  KVM: arm: Do not indent the arguments of DECLARE_BITMAP
  ...
2015-11-05 16:26:26 -08:00
Linus Torvalds
1873499e13 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem update from James Morris:
 "This is mostly maintenance updates across the subsystem, with a
  notable update for TPM 2.0, and addition of Jarkko Sakkinen as a
  maintainer of that"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (40 commits)
  apparmor: clarify CRYPTO dependency
  selinux: Use a kmem_cache for allocation struct file_security_struct
  selinux: ioctl_has_perm should be static
  selinux: use sprintf return value
  selinux: use kstrdup() in security_get_bools()
  selinux: use kmemdup in security_sid_to_context_core()
  selinux: remove pointless cast in selinux_inode_setsecurity()
  selinux: introduce security_context_str_to_sid
  selinux: do not check open perm on ftruncate call
  selinux: change CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default
  KEYS: Merge the type-specific data with the payload data
  KEYS: Provide a script to extract a module signature
  KEYS: Provide a script to extract the sys cert list from a vmlinux file
  keys: Be more consistent in selection of union members used
  certs: add .gitignore to stop git nagging about x509_certificate_list
  KEYS: use kvfree() in add_key
  Smack: limited capability for changing process label
  TPM: remove unnecessary little endian conversion
  vTPM: support little endian guests
  char: Drop owner assignment from i2c_driver
  ...
2015-11-05 15:32:38 -08:00
Michael Ellerman
8bdf2023e2 Merge branch 'next' of git://git.denx.de/linux-denx-agust into next
MPC5xxx updates from Anatolij:

"Highlights include a driver for MPC512x LocalPlus Bus FIFO with its
device tree binding documentation, mpc512x device tree updates and some
minor fixes."
2015-11-05 19:35:12 +11:00
Linus Torvalds
b0f85fa11a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

Changes of note:

 1) Allow to schedule ICMP packets in IPVS, from Alex Gartrell.

 2) Provide FIB table ID in ipv4 route dumps just as ipv6 does, from
    David Ahern.

 3) Allow the user to ask for the statistics to be filtered out of
    ipv4/ipv6 address netlink dumps.  From Sowmini Varadhan.

 4) More work to pass the network namespace context around deep into
    various packet path APIs, starting with the netfilter hooks.  From
    Eric W Biederman.

 5) Add layer 2 TX/RX checksum offloading to qeth driver, from Thomas
    Richter.

 6) Use usec resolution for SYN/ACK RTTs in TCP, from Yuchung Cheng.

 7) Support Very High Throughput in wireless MESH code, from Bob
    Copeland.

 8) Allow setting the ageing_time in switchdev/rocker.  From Scott
    Feldman.

 9) Properly autoload L2TP type modules, from Stephen Hemminger.

10) Fix and enable offload features by default in 8139cp driver, from
    David Woodhouse.

11) Support both ipv4 and ipv6 sockets in a single vxlan device, from
    Jiri Benc.

12) Fix CWND limiting of thin streams in TCP, from Bendik Rønning
    Opstad.

13) Fix IPSEC flowcache overflows on large systems, from Steffen
    Klassert.

14) Convert bridging to track VLANs using rhashtable entries rather than
    a bitmap.  From Nikolay Aleksandrov.

15) Make TCP listener handling completely lockless, this is a major
    accomplishment.  Incoming request sockets now live in the
    established hash table just like any other socket too.

    From Eric Dumazet.

15) Provide more bridging attributes to netlink, from Nikolay
    Aleksandrov.

16) Use hash based algorithm for ipv4 multipath routing, this was very
    long overdue.  From Peter Nørlund.

17) Several y2038 cures, mostly avoiding timespec.  From Arnd Bergmann.

18) Allow non-root execution of EBPF programs, from Alexei Starovoitov.

19) Support SO_INCOMING_CPU as setsockopt, from Eric Dumazet.  This
    influences the port binding selection logic used by SO_REUSEPORT.

20) Add ipv6 support to VRF, from David Ahern.

21) Add support for Mellanox Spectrum switch ASIC, from Jiri Pirko.

22) Add rtl8xxxu Realtek wireless driver, from Jes Sorensen.

23) Implement RACK loss recovery in TCP, from Yuchung Cheng.

24) Support multipath routes in MPLS, from Roopa Prabhu.

25) Fix POLLOUT notification for listening sockets in AF_UNIX, from Eric
    Dumazet.

26) Add new QED Qlogic river, from Yuval Mintz, Manish Chopra, and
    Sudarsana Kalluru.

27) Don't fetch timestamps on AF_UNIX sockets, from Hannes Frederic
    Sowa.

28) Support ipv6 geneve tunnels, from John W Linville.

29) Add flood control support to switchdev layer, from Ido Schimmel.

30) Fix CHECKSUM_PARTIAL handling of potentially fragmented frames, from
    Hannes Frederic Sowa.

31) Support persistent maps and progs in bpf, from Daniel Borkmann.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1790 commits)
  sh_eth: use DMA barriers
  switchdev: respect SKIP_EOPNOTSUPP flag in case there is no recursion
  net: sched: kill dead code in sch_choke.c
  irda: Delete an unnecessary check before the function call "irlmp_unregister_service"
  net: dsa: mv88e6xxx: include DSA ports in VLANs
  net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports
  net/core: fix for_each_netdev_feature
  vlan: Invoke driver vlan hooks only if device is present
  arcnet/com20020: add LEDS_CLASS dependency
  bpf, verifier: annotate verbose printer with __printf
  dp83640: Only wait for timestamps for packets with timestamping enabled.
  ptp: Change ptp_class to a proper bitmask
  dp83640: Prune rx timestamp list before reading from it
  dp83640: Delay scheduled work.
  dp83640: Include hash in timestamp/packet matching
  ipv6: fix tunnel error handling
  net/mlx5e: Fix LSO vlan insertion
  net/mlx5e: Re-eanble client vlan TX acceleration
  net/mlx5e: Return error in case mlx5e_set_features() fails
  net/mlx5e: Don't allow more than max supported channels
  ...
2015-11-04 09:41:05 -08:00
Paolo Bonzini
197a4f4b06 KVM/ARM Changes for v4.4-rc1
Includes a number of fixes for the arch-timer, introducing proper
 level-triggered semantics for the arch-timers, a series of patches to
 synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint
 improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups
 getting rid of redundant state, and finally a stylistic change that gets rid of
 some ctags warnings.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWOhgAAAoJEEtpOizt6ddyKS8H/2ZHMTPjo6yChnrusNWy4Qbr
 6laPDlzL+g45oMQRwNL7GnM1deRftaxvT2Wi+X84D/6Y/BD6MPds4HgtBfuWcSZ1
 CyRJ0Ot/zrxenucSuJuOjq+a9gdizdAczkbB1MfYDULJH8fb6D+7RYLo3zgh4Xo4
 pla3L9U6gSWe+YopBjZtZH43m3fwiwSM/v+uHOTIcXrsbR+fEgx/EFSKmA/DUCuo
 P5cFO/ceUGu7nATCexu5V82TgR2hvurrsR7mqfwY8YcF6HRM+NEOoS29xWC77v5S
 u/F08TKuKQLv0YTEFTyLETI/oEeuC0cHtrRQBNf4+9kXEOzKyXaae0wR/I6X2Ss=
 =GMNk
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM Changes for v4.4-rc1

Includes a number of fixes for the arch-timer, introducing proper
level-triggered semantics for the arch-timers, a series of patches to
synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint
improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups
getting rid of redundant state, and finally a stylistic change that gets rid of
some ctags warnings.

Conflicts:
	arch/x86/include/asm/kvm_host.h
2015-11-04 16:24:17 +01:00
Linus Torvalds
b02ac6b18c Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - Improve accuracy of perf/sched clock on x86.  (Adrian Hunter)

   - Intel DS and BTS updates.  (Alexander Shishkin)

   - Intel cstate PMU support.  (Kan Liang)

   - Add group read support to perf_event_read().  (Peter Zijlstra)

   - Branch call hardware sampling support, implemented on x86 and
     PowerPC.  (Stephane Eranian)

   - Event groups transactional interface enhancements.  (Sukadev
     Bhattiprolu)

   - Enable proper x86/intel/uncore PMU support on multi-segment PCI
     systems.  (Taku Izumi)

   - ... misc fixes and cleanups.

  The perf tooling team was very busy again with 200+ commits, the full
  diff doesn't fit into lkml size limits.  Here's an (incomplete) list
  of the tooling highlights:

  New features:

   - Change the default event used in all tools (record/top): use the
     most precise "cycles" hw counter available, i.e. when the user
     doesn't specify any event, it will try using cycles:ppp, cycles:pp,
     etc and fall back transparently until it finds a working counter.
     (Arnaldo Carvalho de Melo)

   - Integration of perf with eBPF that, given an eBPF .c source file
     (or .o file built for the 'bpf' target with clang), will get it
     automatically built, validated and loaded into the kernel via the
     sys_bpf syscall, which can then be used and seen using 'perf trace'
     and other tools.

     (Wang Nan)

  Various user interface improvements:

   - Automatic pager invocation on long help output.  (Namhyung Kim)

   - Search for more options when passing args to -h, e.g.: (Arnaldo
     Carvalho de Melo)

        $ perf report -h interface

        Usage: perf report [<options>]

         --gtk    Use the GTK2 interface
         --stdio  Use the stdio interface
         --tui    Use the TUI interface

   - Show ordered command line options when -h is used or when an
     unknown option is specified.  (Arnaldo Carvalho de Melo)

   - If options are passed after -h, show just its descriptions, not all
     options.  (Arnaldo Carvalho de Melo)

   - Implement column based horizontal scrolling in the hists browser
     (top, report), making it possible to use the TUI for things like
     'perf mem report' where there are many more columns than can fit in
     a terminal.  (Arnaldo Carvalho de Melo)

   - Enhance the error reporting of tracepoint event parsing, e.g.:

       $ oldperf record -e sched:sched_switc usleep 1
       event syntax error: 'sched:sched_switc'
                            \___ unknown tracepoint
       Run 'perf list' for a list of valid events

     Now we get the much nicer:

       $ perf record -e sched:sched_switc ls
       event syntax error: 'sched:sched_switc'
                            \___ can't access trace events

       Error: No permissions to read /sys/kernel/debug/tracing/events/sched/sched_switc
       Hint:  Try 'sudo mount -o remount,mode=755 /sys/kernel/debug'

     And after we have those mount point permissions fixed:

       $ perf record -e sched:sched_switc ls
       event syntax error: 'sched:sched_switc'
                            \___ unknown tracepoint

       Error: File /sys/kernel/debug/tracing/events/sched/sched_switc not found.
       Hint:  Perhaps this kernel misses some CONFIG_ setting to enable this feature?.

     I.e.  basically now the event parsing routing uses the strerror_open()
     routines introduced by and used in 'perf trace' work.  (Jiri Olsa)

   - Fail properly when pattern matching fails to find a tracepoint,
     i.e. '-e non:existent' was being correctly handled, with a proper
     error message about that not being a valid event, but '-e
     non:existent*' wasn't, fix it.  (Jiri Olsa)

   - Do event name substring search as last resort in 'perf list'.
     (Arnaldo Carvalho de Melo)

     E.g.:

       # perf list clock

       List of pre-defined events (to be used in -e):

        cpu-clock                                          [Software event]
        task-clock                                         [Software event]

        uncore_cbox_0/clockticks/                          [Kernel PMU event]
        uncore_cbox_1/clockticks/                          [Kernel PMU event]

        kvm:kvm_pvclock_update                             [Tracepoint event]
        kvm:kvm_update_master_clock                        [Tracepoint event]
        power:clock_disable                                [Tracepoint event]
        power:clock_enable                                 [Tracepoint event]
        power:clock_set_rate                               [Tracepoint event]
        syscalls:sys_enter_clock_adjtime                   [Tracepoint event]
        syscalls:sys_enter_clock_getres                    [Tracepoint event]
        syscalls:sys_enter_clock_gettime                   [Tracepoint event]
        syscalls:sys_enter_clock_nanosleep                 [Tracepoint event]
        syscalls:sys_enter_clock_settime                   [Tracepoint event]
        syscalls:sys_exit_clock_adjtime                    [Tracepoint event]
        syscalls:sys_exit_clock_getres                     [Tracepoint event]
        syscalls:sys_exit_clock_gettime                    [Tracepoint event]
        syscalls:sys_exit_clock_nanosleep                  [Tracepoint event]
        syscalls:sys_exit_clock_settime                    [Tracepoint event]

  Intel PT hardware tracing enhancements:

   - Accept a zero --itrace period, meaning "as often as possible".  In
     the case of Intel PT that is the same as a period of 1 and a unit
     of 'instructions' (i.e.  --itrace=i1i).  (Adrian Hunter)

   - Harmonize itrace's synthesized callchains with the existing
     --max-stack tool option.  (Adrian Hunter)

   - Allow time to be displayed in nanoseconds in 'perf script'.
     (Adrian Hunter)

   - Fix potential infinite loop when handling Intel PT timestamps.
     (Adrian Hunter)

   - Slighly improve Intel PT debug logging.  (Adrian Hunter)

   - Warn when AUX data has been lost, just like when processing
     PERF_RECORD_LOST.  (Adrian Hunter)

   - Further document export-to-postgresql.py script.  (Adrian Hunter)

   - Add option to synthesize branch stack from auxtrace data.  (Adrian
     Hunter)

  Misc notable changes:

   - Switch the default callchain output mode to 'graph,0.5,caller', to
     make it look like the default for other tools, reducing the
     learning curve for people used to 'caller' based viewing.  (Arnaldo
     Carvalho de Melo)

   - various call chain usability enhancements.  (Namhyung Kim)

   - Introduce the 'P' event modifier, meaning 'max precision level,
     please', i.e.:

        $ perf record -e cycles:P usleep 1

     Is now similar to:

        $ perf record usleep 1

     Useful, for instance, when specifying multiple events.  (Jiri Olsa)

   - Add 'socket' sort entry, to sort by the processor socket in 'perf
     top' and 'perf report'.  (Kan Liang)

   - Introduce --socket-filter to 'perf report', for filtering by
     processor socket.  (Kan Liang)

   - Add new "Zoom into Processor Socket" operation in the perf hists
     browser, used in 'perf top' and 'perf report'.  (Kan Liang)

   - Allow probing on kmodules without DWARF.  (Masami Hiramatsu)

   - Fix 'perf probe -l' for probes added to kernel module functions.
     (Masami Hiramatsu)

   - Preparatory work for the 'perf stat record' feature that will allow
     generating perf.data files with counting data in addition to the
     sampling mode we have now (Jiri Olsa)

   - Update libtraceevent KVM plugin.  (Paolo Bonzini)

   - ... plus lots of other enhancements that I failed to list properly,
     by: Adrian Hunter, Alexander Shishkin, Andi Kleen, Andrzej Hajda,
     Arnaldo Carvalho de Melo, Dima Kogan, Don Zickus, Geliang Tang, He
     Kuang, Huaitong Han, Ingo Molnar, Jan Stancek, Jiri Olsa, Kan
     Liang, Kirill Tkhai, Masami Hiramatsu, Matt Fleming, Namhyung Kim,
     Paolo Bonzini, Peter Zijlstra, Rabin Vincent, Scott Wood, Stephane
     Eranian, Sukadev Bhattiprolu, Taku Izumi, Vaishali Thakkar, Wang
     Nan, Yang Shi and Yunlong Song"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (260 commits)
  perf unwind: Pass symbol source to libunwind
  tools build: Fix libiberty feature detection
  perf tools: Compile scriptlets to BPF objects when passing '.c' to --event
  perf record: Add clang options for compiling BPF scripts
  perf bpf: Attach eBPF filter to perf event
  perf tools: Make sure fixdep is built before libbpf
  perf script: Enable printing of branch stack
  perf trace: Add cmd string table to decode sys_bpf first arg
  perf bpf: Collect perf_evsel in BPF object files
  perf tools: Load eBPF object into kernel
  perf tools: Create probe points for BPF programs
  perf tools: Enable passing bpf object file to --event
  perf ebpf: Add the libbpf glue
  perf tools: Make perf depend on libbpf
  perf symbols: Fix endless loop in dso__split_kallsyms_for_kcore
  perf tools: Enable pre-event inherit setting by config terms
  perf symbols: we can now read separate debug-info files based on a build ID
  perf symbols: Fix type error when reading a build-id
  perf tools: Search for more options when passing args to -h
  perf stat: Cache aggregated map entries in extra cpumap
  ...
2015-11-03 17:38:09 -08:00
Linus Torvalds
6aa2fdb87c Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
 "The irq departement delivers:

   - Rework the irqdomain core infrastructure to accomodate ACPI based
     systems.  This is required to support ARM64 without creating
     artificial device tree nodes.

   - Sanitize the ACPI based ARM GIC initialization by making use of the
     new firmware independent irqdomain core

   - Further improvements to the generic MSI management

   - Generalize the irq migration on CPU hotplug

   - Improvements to the threaded interrupt infrastructure

   - Allow the migration of "chained" low level interrupt handlers

   - Allow optional force masking of interrupts in disable_irq[_nosysnc]

   - Support for two new interrupt chips - Sigh!

   - A larger set of errata fixes for ARM gicv3

   - The usual pile of fixes, updates, improvements and cleanups all
     over the place"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
  Document that IRQ_NONE should be returned when IRQ not actually handled
  PCI/MSI: Allow the MSI domain to be device-specific
  PCI: Add per-device MSI domain hook
  of/irq: Use the msi-map property to provide device-specific MSI domain
  of/irq: Split of_msi_map_rid to reuse msi-map lookup
  irqchip/gic-v3-its: Parse new version of msi-parent property
  PCI/MSI: Use of_msi_get_domain instead of open-coded "msi-parent" parsing
  of/irq: Use of_msi_get_domain instead of open-coded "msi-parent" parsing
  of/irq: Add support code for multi-parent version of "msi-parent"
  irqchip/gic-v3-its: Add handling of PCI requester id.
  PCI/MSI: Add helper function pci_msi_domain_get_msi_rid().
  of/irq: Add new function of_msi_map_rid()
  Docs: dt: Add PCI MSI map bindings
  irqchip/gic-v2m: Add support for multiple MSI frames
  irqchip/gic-v3: Fix translation of LPIs after conversion to irq_fwspec
  irqchip/mxs: Add Alphascale ASM9260 support
  irqchip/mxs: Prepare driver for hardware with different offsets
  irqchip/mxs: Panic if ioremap or domain creation fails
  irqdomain: Documentation updates
  irqdomain/msi: Use fwnode instead of of_node
  ...
2015-11-03 14:40:01 -08:00
Michael Ellerman
3b0e21ec3b Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
Freescale updates from Scott:

"Highlights include 64-bit book3e kexec/kdump support, a rework of the
qoriq clock driver, device tree changes including qoriq fman nodes,
support for a new 85xx board, and some fixes.

Note that there is a trivial merge conflict with the clock tree's next
branch, in the clock Makefile."
2015-11-02 13:59:48 +11:00
David S. Miller
b75ec3af27 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-11-01 00:15:30 -04:00
Linus Torvalds
8a28d67457 powerpc fixes for 4.3 #5
- powerpc/dma: dma_set_coherent_mask() should not be GPL only from Ben
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWMFthAAoJEFHr6jzI4aWAZFkP/1Ci8cgh0GIfdN3wafi85pIe
 dhSBicWpHPpwnvcaD+ab/Q73/nfub6NKTpMasBankf53SYUWmPptwu+Gs0lv7GVv
 Q1AjOEP0vhcBVxhXNJpOWDGFyt5wez4BevqWpmPuz9MEy7PCpgevQJjPEsG8qnzh
 BfUvDkWAOjA6jX3mTyZQyefZECGPhDsM31eGYvkFQRB6Ui1KKbYbFMncjjTtBfjW
 fiUnKShJZIcVtpImGg6WQRKjRaT7uJzmFOLUclv/7dDoT8hLztiiLP44zX+kvJtF
 AyCBSQI/LOrQE826YpF93Vq6WZVvzNQZaPIHPC2pj1WIF6Q/gTTng5T16jODX+QI
 ldQ96sLr9WA49UaS6JakA7iv/OhRXUlRO6qyow8I+1XQdoY5yuVyXct9NpQ3A9J6
 lOw7d8o4RuMaGqMx5nUcyDrv/NICYuuyaikDe+8GY4Df6FG6Aepit1WO6BcDkrfC
 FrnQ+XROmvNoKbVRYYhJZRhEqz1PJJwOmCtNYNzjiJXfgKNknYPBfoBXqx8XeDxl
 VTyxer8V/wMP50X+SIg558YqNIXMj9ZA7fU5xnN4Y7MbHC5DEBnVX9nmc0J5PjTF
 t36S+cm1BeG44/LTSrH66DwsnWce0xo0doK7wBx+CKSGXVciu1VPn59upL1UtX5z
 WnyEnMm5NYc5aTGfZf+L
 =sFFh
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.3-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:
 - powerpc/dma: dma_set_coherent_mask() should not be GPL only from Ben

* tag 'powerpc-4.3-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/dma: dma_set_coherent_mask() should not be GPL only
2015-10-28 18:59:53 +09:00
Benjamin Herrenschmidt
977bf062bb powerpc/dma: dma_set_coherent_mask() should not be GPL only
When turning this from inline to an exported function I was a bit
over-eager and made it GPL only. This prevents the use of pretty much
all non-GPL PCI driver which is a bit over the top. Let's bring it
back in line with other architecture.

Fixes: 817820b022 ("powerpc/iommu: Support "hybrid" iommu/direct DMA ops for coherent_mask < dma_mask")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-28 14:20:50 +09:00
Denis Kirjanov
ccde64b51b powerpc/msi: Fix section mismatch warning in msi_bitmap_alloc()
Building with CONFIG_DEBUG_SECTION_MISMATCH gives the following warning:

  The function .msi_bitmap_alloc() references
  the function __init .memblock_virt_alloc_try_nid().

Memory allocation in msi_bitmap_alloc() uses either slab allocator or
memblock boot time allocator depending on slab_is_available().

So the section mismatch warning is correct, but in practice there is no
bug so mark msi_bitmap_alloc() as __init_refok.

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
[mpe: Flesh out change log a bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-28 12:08:33 +09:00
Michael Ellerman
16c1d60626 powerpc/prom: Use of_get_next_parent() in of_get_ibm_chip_id()
Use of_get_next_parent() to simplifiy the logic in of_get_ibm_chip_id().

Original-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-28 12:08:32 +09:00
Nathan Fontenot
f755ecfb8c powerpc/pseries: Correct string length in pseries_of_derive_parent()
Commit a030e1e4bb make a change to use
kstrndup() instead of kmalloc() + strlcpy() in the pseries_of_derive_parent()
routine that introduces a subtle change in the parent path name generated.
The kstrndup() routine will copy n characters followed by a terminating null,
whereas strlcpy() will copy n-1 characters and add a terminating null.

This slight difference results in having a parent path that includes the
tailing '/' character, "/cpus/" vs. "/cpus". This then causes the subsequent
call to of_find_node_by_path() to fail, and in the case of DLPAR add
operations the DLPAR request fails.

This patch decrements the pointer returned from kbasename() to point to the
'/' character before the base name instead of the base name. This then
adjusts the string length calculations to not include the trailing '/'
in the parent path name.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-28 12:08:18 +09:00
Kevin Hao
e1f580e8ce powerpc/e6500: hw tablewalk: make sure we invalidate and write to the same tlb entry
In order to workaround Erratum A-008139, we have to invalidate the
tlb entry with tlbilx before overwriting. Due to the performance
consideration, we don't add any memory barrier when acquire/release
the tcd lock. This means the two load instructions for esel_next do
have the possibility to return different value. This is definitely
not acceptable due to the Erratum A-008139. We have two options to
fix this issue:
  a) Add memory barrier when acquire/release tcd lock to order the
     load/store to esel_next.
  b) Just make sure to invalidate and write to the same tlb entry and
     tolerate the race that we may get the wrong value and overwrite
     the tlb entry just updated by the other thread.

We observe better performance using option b. So reserve an additional
register to save the value of the esel_next.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-27 18:14:40 -05:00
Igal Liberman
da414bb923 powerpc/mpc85xx: Add FSL QorIQ DPAA FMan support to the SoC device tree(s)
Based on prior work by Andy Fleming <afleming@freescale.com>

Signed-off-by: Shruti Kanetkar <Shruti@freescale.com>
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Igal Liberman <Igal.Liberman@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-27 18:14:39 -05:00
Igal Liberman
d55ad2967d powerpc/mpc85xx: Create dts components for the FSL QorIQ DPAA FMan
Based on prior work by Andy Fleming <afleming@freescale.com>

Signed-off-by: Shruti Kanetkar <Shruti@freescale.com>
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Igal Liberman <Igal.Liberman@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-27 18:14:39 -05:00
Scott Wood
7a1db41d83 powerpc/fsl: Add #clock-cells and clockgen label to clockgen nodes
This allows new-style clock references to be used, which is needed for
fman.  The old clock nodes will be removed and all clock references
converted to new-style once the qoriq-cpufreq driver is updated to stop
depending on the old-style references in cpu nodes.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-27 18:14:38 -05:00
Scott Wood
43f2cfcce2 Merge branch 'clock' into HEAD
This is a major overhaul of the clk-qoriq driver, which I'm merging
via PPC with Stephen Boyd's ack in order to apply subsequent PPC patches
that depend on it.
2015-10-27 18:14:16 -05:00
Christophe Leroy
9d28cc811b powerpc: handle error case in cpm_muram_alloc()
rh_alloc() returns (unsigned long)-ERRxx on error, which may
result in overwriting memory outside the MURAM AREA.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-27 18:13:31 -05:00
Sudeep Holla
9100d20c5b powerpc: mpic: use IRQCHIP_SKIP_SET_WAKE instead of redundant mpic_irq_set_wake
mpic_irq_set_wake return -ENXIO for non FSL MPIC and sets IRQF_NO_SUSPEND
flag for FSL ones. enable_irq_wake already returns -ENXIO if irq_set_wak
is not implemented. Also there's no need to set the IRQF_NO_SUSPEND flag
as it doesn't guarantee wakeup for that interrupt.

This patch removes the redundant mpic_irq_set_wake and sets the
IRQCHIP_SKIP_SET_WAKE for only FSL MPIC.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Scott Wood <scottwood@freescale.com>
Cc: Hongtao Jia <hongtao.jia@freescale.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-27 18:13:31 -05:00
Tiejun Chen
96eea6426f powerpc/book3e-64: Enable kexec
Allow KEXEC for book3e, and bypass or convert non-book3e stuff
in kexec code.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
[scottwood@freescale.com: move code to minimize diff, and cleanup]
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-27 18:13:30 -05:00
Scott Wood
ae73e4ccbc powerpc/book3e-64/kexec: Set "r4 = 0" when entering spinloop
book3e_secondary_core_init will only create a TLB entry if r4 = 0,
so do so.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-27 18:13:30 -05:00
Scott Wood
ffda09a994 powerpc/booke: Only use VIRT_PHYS_OFFSET on booke32
The way VIRT_PHYS_OFFSET is not correct on book3e-64, because
it does not account for CONFIG_RELOCATABLE other than via the
32-bit-only virt_phys_offset.

book3e-64 can (and if the comment about a GCC miscompilation is still
relevant, should) use the normal ppc64 __va/__pa.

At this point, only booke-32 will use VIRT_PHYS_OFFSET, so given the
issues with its calculation, restrict its definition to booke-32.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-27 18:13:29 -05:00
Scott Wood
567cf94dc7 powerpc/book3e-64/kexec: Enable SMP release
The SMP release mechanism for FSL book3e is different from when booting
with normal hardware.  In theory we could simulate the normal spin
table mechanism, but not at the addresses U-Boot put in the device tree
-- so there'd need to be even more communication between the kernel and
kexec to set that up.  Instead, kexec-tools will set a boolean property
linux,booted-from-kexec in the /chosen node.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: devicetree@vger.kernel.org
2015-10-27 18:13:29 -05:00
Tiejun Chen
cf904e3088 powerpc/book3e-64/kexec: create an identity TLB mapping
book3e has no real MMU mode so we have to create an identity TLB
mapping to make sure we can access the real physical address.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
[scottwood: cleanup, and split off some changes]
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-27 18:13:28 -05:00
Scott Wood
ecc4999f68 powerpc/book3e-64: Don't limit paca to 256 MiB
This limit only makes sense on book3s, and on book3e it can cause
problems with kdump if we don't have any memory under 256 MiB.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-27 18:13:28 -05:00
Scott Wood
eeaab663a0 powerpc/book3e/kdump: Enable crash_kexec_wait_realmode
While book3e doesn't have "real mode", we still want to wait for
all the non-crash cpus to complete their shutdown.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-27 18:13:27 -05:00
Tiejun Chen
1cb6e06492 powerpc/book3e: support CONFIG_RELOCATABLE
book3e is different with book3s since 3s includes the exception
vectors code in head_64.S as it relies on absolute addressing
which is only possible within this compilation unit. So we have
to get that label address with got.

And when boot a relocated kernel, we should reset ipvr properly again
after .relocate.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
[scottwood: cleanup and ifdef removal]
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-27 18:13:27 -05:00
Tiejun Chen
835c031c98 powerpc/booke64: Fix args to copy_and_flush
Convert r4/r5, not r6, to a virtual address when calling
copy_and_flush.  Otherwise, r3 is already virtual, and copy_to_flush
tries to access r3+r6, PAGE_OFFSET gets added twice.

This isn't normally seen because on book3e we normally enter with
the kernel at zero and thus skip copy_to_flush -- but it will be
needed for kexec support.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
[scottwood: split patch and rewrote changelog]
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-27 18:13:26 -05:00
Tiejun Chen
68d1014019 powerpc/book3e-64: rename interrupt_end_book3e with __end_interrupts
Rename 'interrupt_end_book3e' to '__end_interrupts' so that the symbol
can be used by both book3s and book3e.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
[scottwood: edit changelog]
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-27 18:13:26 -05:00
Scott Wood
f34b3e19fd powerpc/e6500: kexec: Handle hardware threads
The new kernel will be expecting secondary threads to be disabled,
not spinning.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-27 18:13:25 -05:00
Tiejun Chen
939fbf0080 powerpc/85xx: Implement 64-bit kexec support
Unlike 32-bit 85xx kexec, we don't do a core reset.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
[scottwood: edit changelog, and cleanup]
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-27 18:13:25 -05:00
Scott Wood
eba5de8dc1 powerpc/fsl-booke-64: Don't limit ppc64_rma_size to one TLB entry
This is required for kdump to work when loaded at at an address that
does not fall within the first TLB entry -- which can easily happen
because while the lower limit is enforced via reserved memory, which
doesn't affect how much is mapped, the upper limit is enforced via a
different mechanism that does.  Thus, more TLB entries are needed than
would normally be used, as the total memory to be mapped might not be a
power of two.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-27 18:13:22 -05:00
Linus Torvalds
a2c01ed5d4 powerpc fixes for 4.3 #4
- Revert "Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8" from Paul
  - Handle irq_happened flag correctly in off-line loop from Paul
  - Validate rtas.entry before calling enter_rtas() from Vasant
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWKakXAAoJEFHr6jzI4aWAqwsP/22a8aq7SPuPaCK/c7u84aHg
 UkzYNdrC+osDvnrydqyJsmIojVLX+AsJ7TCbBZFaJ6oOuew9bVzmZ5mfNvOVn86H
 dCH2GMr4NbWbkO3SaFi6ZTUVGl7JyjEhf3uCtKGssa2+Do8FubK6Y88L1rhzFFdz
 l1Dx3Jp8CpGKcByQfwYyaNZhC/GEZ06pY36d362mLnyctxcQRYr5l+8boDH81nyE
 f89RE7baNPYOL0YOhZAh3ZilBrZ8DIAaesMXU8LUKFbLTBgWfVPkDy3l5a2m47oP
 V/Yi+oEQBkL/3Itth57iGWpl8vVkzF2MTu8Aep3BzHJXqXCliTzVVdXETW6NCdut
 Nss0xtnNdM18+0mhG3LzzdoZGi/Zb0SYz8j+nY5vE2nf7FDVFkAZzKHeW822zNaV
 A1PLJa+ei4jVhKtTp4wETjpUi+kw+ikM+rR1L1/+IKHbriLsRrj7Zw3xo6Em1KVq
 cI2g7DZLSzptIprxbEv9rNhb1VlBot4jc4mmGhmyMlwKDkpCxRkYVv+Ilfi6jCSc
 6llKTZfKqEV+0sXO6QISv8wfiye84jVTKOlkpQLvpugz9rBTpq8apmInVh4AHF2b
 wDRgs/iyOSZuz+UiPEHHXbW7ZfF2F7lqxxtQgJefiWLDsvBbsfnVTyDJsKibvWzb
 lxorlKx/tH/q4pNBjmoB
 =K7eA
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Revert "Use the POWER8 Micro Partition Prefetch Engine in KVM HV on
   POWER8" from Paul
 - Handle irq_happened flag correctly in off-line loop from Paul
 - Validate rtas.entry before calling enter_rtas() from Vasant

* tag 'powerpc-4.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/rtas: Validate rtas.entry before calling enter_rtas()
  powerpc/powernv: Handle irq_happened flag correctly in off-line loop
  powerpc: Revert "Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8"
2015-10-23 18:49:51 +09:00
Scott Wood
d9e1831a42 powerpc/85xx: Load all early TLB entries at once
Use an AS=1 trampoline TLB entry to allow all normal TLB1 entries to
be loaded at once.  This avoids the need to keep the translation that
code is executing from in the same TLB entry in the final TLB
configuration as during early boot, which in turn is helpful for
relocatable kernels (e.g. kdump) where the kernel is not running from
what would be the first TLB entry.

On e6500, we limit map_mem_in_cams() to the primary hwthread of a
core (the boot cpu is always considered primary, as a kdump kernel
can be entered on any cpu).  Each TLB only needs to be set up once,
and when we do, we don't want another thread to be running when we
create a temporary trampoline TLB1 entry.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-22 22:50:46 -05:00
Christoffer Dall
3217f7c25b KVM: Add kvm_arch_vcpu_{un}blocking callbacks
Some times it is useful for architecture implementations of KVM to know
when the VCPU thread is about to block or when it comes back from
blocking (arm/arm64 needs to know this to properly implement timers, for
example).

Therefore provide a generic architecture callback function in line with
what we do elsewhere for KVM generic-arch interactions.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22 23:01:41 +02:00
Himangi Saraogi
39e69f55f8 powerpc: Introduce the use of the managed version of kzalloc
This patch moves data allocated using kzalloc to managed data allocated
using devm_kzalloc and cleans now unnecessary kfree in probe function.

The following Coccinelle semantic patch was used for making the change:

@platform@
identifier p, probefn, removefn;
@@
struct platform_driver p = {
  .probe = probefn,
  .remove = removefn,
};

@prb@
identifier platform.probefn, pdev;
expression e, e1, e2;
@@
probefn(struct platform_device *pdev, ...) {
  <+...
- e = kzalloc(e1, e2)
+ e = devm_kzalloc(&pdev->dev, e1, e2)
  ...
?-kfree(e);
  ...+>
}

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
2015-10-22 16:13:23 +02:00
Uwe Kleine-König
ffcea122c4 powerpc: mpc512x: drop bogus and unused psc register bit definitions
These were introduced in commit 25ae3a0739 ("[POWERPC] mpc512x: Add
MPC512x PSC support to MPC52xx psc driver") and never used. Moreover
according to the datasheet[1] MEMERROR is bit 25 (0x40) and ORERR is
bit 27 (0x10).

[1] MPC5125RM Rev. 2; 11/2009

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
2015-10-22 16:06:08 +02:00
Alexander Popov
de03fe287b powerpc/512x: add a device tree binding for LocalPlus Bus FIFO
Add a device tree binding for Freescale MPC512x LocalPlus Bus FIFO and
introduce the document describing that binding.

Signed-off-by: Alexander Popov <alex.popov@linux.com>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
2015-10-22 15:20:47 +02:00
Alexander Popov
1a4bb93f79 powerpc/512x: add LocalPlus Bus FIFO device driver
This driver for Freescale MPC512x LocalPlus Bus FIFO (called SCLPC
in the Reference Manual) allows Direct Memory Access transfers
between RAM and peripheral devices on LocalPlus Bus.

Signed-off-by: Alexander Popov <alex.popov@linux.com>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
2015-10-22 15:19:40 +02:00
Luis de Bethencourt
7e610890b5 powerpc: platforms: mpc52xx_lpbfifo: Fix module autoload for OF platform driver
This platform driver has a OF device ID table but the OF module
alias information is not created so module autoloading won't work.

Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
2015-10-22 14:36:53 +02:00
Vasant Hegde
8832317f66 powerpc/rtas: Validate rtas.entry before calling enter_rtas()
Currently we do not validate rtas.entry before calling enter_rtas(). This
leads to a kernel oops when user space calls rtas system call on a powernv
platform (see below). This patch adds code to validate rtas.entry before
making enter_rtas() call.

  Oops: Exception in kernel mode, sig: 4 [#1]
  SMP NR_CPUS=1024 NUMA PowerNV
  task: c000000004294b80 ti: c0000007e1a78000 task.ti: c0000007e1a78000
  NIP: 0000000000000000 LR: 0000000000009c14 CTR: c000000000423140
  REGS: c0000007e1a7b920 TRAP: 0e40   Not tainted  (3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le)
  MSR: 1000000000081000 <HV,ME>  CR: 00000000  XER: 00000000
  CFAR: c000000000009c0c SOFTE: 0
  NIP [0000000000000000]           (null)
  LR [0000000000009c14] 0x9c14
  Call Trace:
  [c0000007e1a7bba0] [c00000000041a7f4] avc_has_perm_noaudit+0x54/0x110 (unreliable)
  [c0000007e1a7bd80] [c00000000002ddc0] ppc_rtas+0x150/0x2d0
  [c0000007e1a7be30] [c000000000009358] syscall_exit+0x0/0x98

Cc: stable@vger.kernel.org # v3.2+
Fixes: 55190f8878 ("powerpc: Add skeleton PowerNV platform")
Reported-by: NAGESWARA R. SASTRY <nasastry@in.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
[mpe: Reword change log, trim oops, and add stable + fixes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-22 11:03:25 +11:00
Scott Wood
9484865447 powerpc/fsl: Move fsl_guts.h out of arch/powerpc
Freescale's Layerscape ARM chips use the same structure.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-21 18:05:50 -05:00
Paul Mackerras
53c656c413 powerpc/powernv: Handle irq_happened flag correctly in off-line loop
This fixes a bug where it is possible for an off-line CPU to fail to go
into a low-power state (nap/sleep/winkle), and to become unresponsive to
requests from the KVM subsystem to wake up and run a VCPU. What can
happen is that a maskable interrupt of some kind (external, decrementer,
hypervisor doorbell, or HMI) after we have called local_irq_disable() at
the beginning of pnv_smp_cpu_kill_self() and before interrupts are
hard-disabled inside power7_nap/sleep/winkle(). In this situation, the
pending event is marked in the irq_happened flag in the PACA. This
pending event prevents power7_nap/sleep/winkle from going to the
requested low-power state; instead they return immediately. We don't
deal with any of these pending event flags in the off-line loop in
pnv_smp_cpu_kill_self() because power7_nap et al. return 0 in this case,
so we will have srr1 == 0, and none of the processing to clear
interrupts or doorbells will be done.

Usually, the most obvious symptom of this is that a KVM guest will fail
with a console message saying "KVM: couldn't grab cpu N".

This fixes the problem by making sure we handle the irq_happened flags
properly. First, we hard-disable before the off-line loop. Once we have
hard-disabled, the irq_happened flags can't change underneath us. We
unconditionally clear the DEC and HMI flags: there is no processing of
timer interrupts while off-line, and the necessary HMI processing is all
done in lower-level code. We leave the EE and DBELL flags alone for the
first iteration of the loop, so that we won't fail to respond to a
split-core request that came in just before hard-disabling. Within the
loop, we handle external interrupts if the EE bit is set in irq_happened
as well as if the low-power state was interrupted by an external
interrupt. (We don't need to do the msgclr for a pending doorbell in
irq_happened, because doorbells are edge-triggered and don't remain
pending in hardware.) Then we clear both the EE and DBELL flags, and
once clear, they cannot be set again (until this CPU comes online again,
that is).

This also fixes the debug check to not be done when we just ran a KVM
guest or when the sleep didn't happen because of a pending event in
irq_happened.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-21 20:52:49 +11:00
Paul Mackerras
23316316c1 powerpc: Revert "Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8"
This reverts commit 9678cdaae9 ("Use the POWER8 Micro Partition
Prefetch Engine in KVM HV on POWER8") because the original commit had
multiple, partly self-cancelling bugs, that could cause occasional
memory corruption.

In fact the logmpp instruction was incorrectly using register r0 as the
source of the buffer address and operation code, and depending on what
was in r0, it would either do nothing or corrupt the 64k page pointed to
by r0.

The logmpp instruction encoding and the operation code definitions could
be corrected, but then there is the problem that there is no clearly
defined way to know when the hardware has finished writing to the
buffer.

The original commit attempted to work around this by aborting the
write-out before starting the prefetch, but this is ineffective in the
case where the virtual core is now executing on a different physical
core from the one where the write-out was initiated.

These problems plus advice from the hardware designers not to use the
function (since the measured performance improvement from using the
feature was actually mostly negative), mean that reverting the code is
the best option.

Fixes: 9678cdaae9 ("Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8")
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-21 20:50:30 +11:00
Gavin Shan
353169acf1 powerpc/eeh: Fix recursive fenced PHB on Broadcom shiner adapter
Similar to commit b6541db ("powerpc/eeh: Block PCI config access
upon frozen PE"), this blocks the PCI config space of Broadcom
Shiner adapter until PE reset is completed, to avoid recursive
fenced PHB when dumping PCI config registers during the period
of error recovery.

   ~# lspci -ns 0003:03:00.0
   0003:03:00.0 0200: 14e4:168a (rev 10)
   ~# lspci -s 0003:03:00.0
   0003:03:00.0 Ethernet controller: Broadcom Corporation \
                NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-21 20:42:16 +11:00
Gavin Shan
f9433718d6 powerpc/powernv: Simplify pnv_eeh_set_option()
This simplifies pnv_eeh_set_option() to avoid unnecessary nested
if statements, to improve readability. No functional changes.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-21 20:42:15 +11:00
Gavin Shan
4d6186ca6f powerpc/powernv: Remove pnv_eeh_cap_start()
This moves the logic of pnv_eeh_cap_start() to pnv_eeh_find_cap()
as the function is only called by pnv_eeh_find_cap(). The logic
of both functions are pretty simple. No need to have separate
functions.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-21 20:42:15 +11:00
Gavin Shan
608fb9c296 powerpc/powernv: Cleanup on EEH comments
This applies cleanup on eeh-powernv.c, no functional changes:

   * Remove unnecessary comments and empty line.
   * Correct inaccurate comments.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-21 20:42:15 +11:00