Commit graph

20906 commits

Author SHA1 Message Date
Rafael J. Wysocki
b5e82233ca Merge branch 'pm-tools'
* pm-tools:
  tools/power turbostat: relax dependency on APERF_MSR
  tools/power turbostat: relax dependency on invariant TSC
  tools/power turbostat: decode MSR_*_PERF_LIMIT_REASONS
  tools/power turbostat: relax dependency on root permission
  cpupower Makefile change to help run the tool without 'make install'
2015-02-10 16:11:26 +01:00
Rafael J. Wysocki
7bc95d4ef1 Merge branch 'pm-cpufreq'
* pm-cpufreq: (46 commits)
  intel_pstate: provide option to only use intel_pstate with HWP
  cpufreq-dt: Drop unnecessary check before cpufreq_cooling_unregister() invocation
  cpufreq: Create for_each_governor()
  cpufreq: Create for_each_policy()
  cpufreq: Drop cpufreq_disabled() check from cpufreq_cpu_{get|put}()
  cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
  intel_pstate: honor user space min_perf_pct override on resume
  intel_pstate: respect cpufreq policy request
  intel_pstate: Add num_pstates to sysfs
  intel_pstate: expose turbo range to sysfs
  intel_pstate: Add support for SkyLake
  cpufreq: stats: drop unnecessary locking
  cpufreq: stats: don't update stats on false notifiers
  cpufreq: stats: don't update stats from show_trans_table()
  cpufreq: stats: time_in_state can't be NULL in cpufreq_stats_update()
  cpufreq: stats: create sysfs group once we are ready
  cpufreq: remove CPUFREQ_UPDATE_POLICY_CPU notifications
  cpufreq: stats: drop 'cpu' field of struct cpufreq_stats
  cpufreq: Remove (now) unused 'last_cpu' from struct cpufreq_policy
  cpufreq: stats: rename 'struct cpufreq_stats' objects as 'stats'
  ...
2015-02-10 16:10:44 +01:00
Rafael J. Wysocki
8fbcf5ecb3 Merge branch 'acpi-resources'
* acpi-resources: (23 commits)
  Merge branch 'pci/host-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into acpi-resources
  x86/irq, ACPI: Implement ACPI driver to support IOAPIC hotplug
  ACPI: Add interfaces to parse IOAPIC ID for IOAPIC hotplug
  x86/PCI: Refine the way to release PCI IRQ resources
  x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation
  x86/PCI: Fix the range check for IO resources
  PCI: Use common resource list management code instead of private implementation
  resources: Move struct resource_list_entry from ACPI into resource core
  ACPI: Introduce helper function acpi_dev_filter_resource_type()
  ACPI: Add field offset to struct resource_list_entry
  ACPI: Translate resource into master side address for bridge window resources
  ACPI: Return translation offset when parsing ACPI address space resources
  ACPI: Enforce stricter checks for address space descriptors
  ACPI: Set flag IORESOURCE_UNSET for unassigned resources
  ACPI: Normalize return value of resource parser functions
  ACPI: Fix a bug in parsing ACPI Memory24 resource
  ACPI: Add prefetch decoding to the address space parser
  ACPI: Move the window flag logic to the combined parser
  ACPI: Unify the parsing of address_space and ext_address_space
  ACPI: Let the parser return false for disabled resources
  ...
2015-02-10 16:05:16 +01:00
Rafael J. Wysocki
ca45c879c2 Merge branches 'acpi-doc', 'acpi-pm', 'acpi-pcc' and 'acpi-tables'
* acpi-doc:
  MAINTAINERS / ACPI: add the necessary '/' according to entry rules
  ACPI / Documentation: add a missing '='

* acpi-pm:
  ACPI / sleep: mark acpi_sleep_dmi_check() __init

* acpi-pcc:
  ACPI / PCC: Use pr_debug() for debug messages in pcc_init()

* acpi-tables:
  ACPI / table: remove duplicate NULL check for the handler of acpi_table_parse()
2015-02-10 16:04:12 +01:00
Rafael J. Wysocki
99e4d89afc Merge branches 'acpi-video' and 'acpi-soc'
* acpi-video:
  ACPI / video: Add disable_native_backlight quirk for Samsung 510R
  ACPI / video: Add disable_native_backlight quirk for Samsung 730U3E/740U3E

* acpi-soc:
  ACPI: add AMD ACPI2Platform device support for x86 system
  ACPI / LPSS: Remove non-existing clock control from Intel Lynxpoint I2C
  ACPI / LPSS: check the result of ioremap()
2015-02-10 16:04:01 +01:00
Rafael J. Wysocki
55c39fc2b1 Merge branch 'acpica'
* acpica:
  ACPICA: Events: Enable APIs to allow interrupt/polling adaptive request based GPE handling model
  ACPICA: Events: Introduce acpi_set_gpe()/acpi_finish_gpe() to reduce divergences
  ACPICA: Events: Introduce ACPI_GPE_DISPATCH_RAW_HANDLER to fix 2 issues for the current GPE APIs
  ACPICA: Update version to 20150204
  ACPICA: Update Copyright headers to 2015
  ACPICA: Hardware: Cast GPE enable_mask before storing
  ACPICA: Events: Cleanup GPE dispatcher type obtaining code
  ACPICA: Events: Cleanup to move acpi_gbl_global_event_handler invocation out of acpi_ev_gpe_dispatch()
  ACPICA: Events: Cleanup of resetting the GPE handler to NULL before removing
  ACPICA: Events: Fix uninitialized variable
  ACPICA: Events: Remove acpi_ev_valid_gpe_event() due to current restriction
  ACPICA: Events: Remove duplicated sanity check in acpi_ev_enable_gpe()
  ACPICA: Events: Back port "ACPICA: Save current masks of enabled GPEs after enable register writes"
  ACPICA: Resources: Provide common part for struct acpi_resource_address structures.
  ACPI: Introduce acpi_unload_parent_table() usages in Linux kernel
  ACPICA: take ACPI_MTX_INTERPRETER in acpi_unload_table_id()
2015-02-10 15:58:57 +01:00
Radim Krčmář
dab2087def KVM: x86: fix build with !CONFIG_SMP
<asm/apic.h> isn't included directly and without CONFIG_SMP, an option
that automagically pulls it can't be enabled.

Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-10 08:53:18 +01:00
Linus Torvalds
e07e0d4cb0 Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS update from Ingo Molnar:
 "The changes in this cycle were:

   - allow mmcfg access to APEI error injection handlers

   - improve MCE error messages

   - smaller cleanups"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, mce: Fix sparse errors
  x86, mce: Improve timeout error messages
  ACPI, EINJ: Enhance error injection tolerance level
2015-02-09 18:22:04 -08:00
Linus Torvalds
57d3629410 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm cleanups from Ingo Molnar:
 "Two cleanups: simplify parse_setup_data() and sanitize_e820_map()
  usage"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, e820: Clean up sanitize_e820_map() users
  x86, setup: Let early_memremap() handle page alignment
2015-02-09 18:16:03 -08:00
Linus Torvalds
a8f7684214 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SoC updates from Ingo Molnar:
 "Various Intel Atom SoC updates (mostly to enhance debuggability), plus
  an apb_timer cleanup"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: pmc_atom: Expose contents of PSS
  x86: pmc_atom: Clean up init function
  x86: pmc-atom: Remove unused macro
  x86: pmc_atom: don%27t check for NULL twice
  x86: pmc-atom: Assign debugfs node as soon as possible
  x86/platform: Remove unused function from apb_timer.c
2015-02-09 18:11:28 -08:00
Linus Torvalds
c93ecedab3 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Ingo Molnar:
 "Initial round of kernel_fpu_begin/end cleanups from Oleg Nesterov,
  plus a cleanup from Borislav Petkov"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, fpu: Fix math_state_restore() race with kernel_fpu_begin()
  x86, fpu: Don't abuse has_fpu in __kernel_fpu_begin/end()
  x86, fpu: Introduce per-cpu in_kernel_fpu state
  x86/fpu: Use a symbolic name for asm operand
2015-02-09 18:01:52 -08:00
Linus Torvalds
80f33a5fdf Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Misc cleanups"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/rtc: Remove duplicate const specifier
  x86, early_serial_console: Remove unnecessary check
  x86, early_serial_console: Remove unused macro XMTRDY
  x86, setup: Rename BOOT_ISDIGIT_H to BOOT_CTYPE_H
  x86, CPU: Fix trivial printk formatting issues with dmesg
2015-02-09 17:50:09 -08:00
Linus Torvalds
7453311d68 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm changes from Ingo Molnar:
 "The main changes in this cycle were the x86/entry and sysret
  enhancements from Andy Lutomirski, see merge commits 772a9aca12 and
  b57c0b5175 for details"

[ Exectutive summary: IST exceptions that interrupt user space will run
  on the regular kernel stack instead of the IST stack.  Which
  simplifies things particularly on return to user space.

  The sysret cleanup ends up simplifying the logic on when we can use
  sysret vs when we have to use iret.                - Linus ]

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86_64, entry: Remove the syscall exit audit and schedule optimizations
  x86_64, entry: Use sysret to return to userspace when possible
  x86, traps: Fix ist_enter from userspace
  x86, vdso: teach 'make clean' remove vdso64 binaries
  x86_64 entry: Fix RCX for ptraced syscalls
  x86: entry_64.S: fold SAVE_ARGS_IRQ macro into its sole user
  x86: ia32entry.S: fix wrong symbolic constant usage: R11->ARGOFFSET
  x86: entry_64.S: delete unused code
  x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks
  x86, traps: Add ist_begin_non_atomic and ist_end_non_atomic
  x86: Clean up current_stack_pointer
  x86, traps: Track entry into and exit from IST context
  x86, entry: Switch stacks on a paranoid entry from userspace
2015-02-09 17:16:44 -08:00
Linus Torvalds
9d43bade34 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 APIC updates from Ingo Molnar:
 "Continued fallout of the conversion of the x86 IRQ code to the
  hierarchical irqdomain framework: more cleanups, simplifications,
  memory allocation behavior enhancements, mainly in the interrupt
  remapping and APIC code"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
  x86, init: Fix UP boot regression on x86_64
  iommu/amd: Fix irq remapping detection logic
  x86/acpi: Make acpi_[un]register_gsi_ioapic() depend on CONFIG_X86_LOCAL_APIC
  x86: Consolidate boot cpu timer setup
  x86/apic: Reuse apic_bsp_setup() for UP APIC setup
  x86/smpboot: Sanitize uniprocessor init
  x86/smpboot: Move apic init code to apic.c
  init: Get rid of x86isms
  x86/apic: Move apic_init_uniprocessor code
  x86/smpboot: Cleanup ioapic handling
  x86/apic: Sanitize ioapic handling
  x86/ioapic: Add proper checks to setp/enable_IO_APIC()
  x86/ioapic: Provide stub functions for IOAPIC%3Dn
  x86/smpboot: Move smpboot inlines to code
  x86/x2apic: Use state information for disable
  x86/x2apic: Split enable and setup function
  x86/x2apic: Disable x2apic from nox2apic setup
  x86/x2apic: Add proper state tracking
  x86/x2apic: Clarify remapping mode for x2apic enablement
  x86/x2apic: Move code in conditional region
  ...
2015-02-09 16:57:56 -08:00
Linus Torvalds
a4cbbf549a Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - AMD range breakpoints support:

     Extend breakpoint tools and core to support address range through
     perf event with initial backend support for AMD extended
     breakpoints.

     The syntax is:

         perf record -e mem:addr/len:type

     For example set write breakpoint from 0x1000 to 0x1200 (0x1000 + 512)

         perf record -e mem:0x1000/512:w

   - event throttling/rotating fixes

   - various event group handling fixes, cleanups and general paranoia
     code to be more robust against bugs in the future.

    - kernel stack overhead fixes

  User-visible tooling side changes:

   - Show precise number of samples in at the end of a 'record' session,
     if processing build ids, since we will then traverse the whole
     perf.data file and see all the PERF_RECORD_SAMPLE records,
     otherwise stop showing the previous off-base heuristicly counted
     number of "samples" (Namhyung Kim).

   - Support to read compressed module from build-id cache (Namhyung
     Kim)

   - Enable sampling loads and stores simultaneously in 'perf mem'
     (Stephane Eranian)

   - 'perf diff' output improvements (Namhyung Kim)

   - Fix error reporting for evsel pgfault constructor (Arnaldo Carvalho
     de Melo)

  Tooling side infrastructure changes:

   - Cache eh/debug frame offset for dwarf unwind (Namhyung Kim)

   - Support parsing parameterized events (Cody P Schafer)

   - Add support for IP address formats in libtraceevent (David Ahern)

  Plus other misc fixes"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
  perf: Decouple unthrottling and rotating
  perf: Drop module reference on event init failure
  perf: Use POLLIN instead of POLL_IN for perf poll data in flag
  perf: Fix put_event() ctx lock
  perf: Fix move_group() order
  perf: Fix event->ctx locking
  perf: Add a bit of paranoia
  perf symbols: Convert lseek + read to pread
  perf tools: Use perf_data_file__fd() consistently
  perf symbols: Support to read compressed module from build-id cache
  perf evsel: Set attr.task bit for a tracking event
  perf header: Set header version correctly
  perf record: Show precise number of samples
  perf tools: Do not use __perf_session__process_events() directly
  perf callchain: Cache eh/debug frame offset for dwarf unwind
  perf tools: Provide stub for missing pthread_attr_setaffinity_np
  perf evsel: Don't rely on malloc working for sz 0
  tools lib traceevent: Add support for IP address formats
  perf ui/tui: Show fatal error message only if exists
  perf tests: Fix typo in sample-parsing.c
  ...
2015-02-09 15:43:55 -08:00
Rafael J. Wysocki
c488ea4613 Merge branch 'sfi' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux into pm-cpufreq
Pull SFI-based cpufreq driver for v3.20 from Len Brown.

* 'sfi' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  cpufreq: Add SFI based cpufreq driver support
  SFI: fix compiler warnings
2015-02-09 23:43:53 +01:00
Linus Torvalds
23e8fe2e16 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main RCU changes in this cycle are:

   - Documentation updates.

   - Miscellaneous fixes.

   - Preemptible-RCU fixes, including fixing an old bug in the
     interaction of RCU priority boosting and CPU hotplug.

   - SRCU updates.

   - RCU CPU stall-warning updates.

   - RCU torture-test updates"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  rcu: Initialize tiny RCU stall-warning timeouts at boot
  rcu: Fix RCU CPU stall detection in tiny implementation
  rcu: Add GP-kthread-starvation checks to CPU stall warnings
  rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors
  rcu: Optionally run grace-period kthreads at real-time priority
  ksoftirqd: Use new cond_resched_rcu_qs() function
  ksoftirqd: Enable IRQs and call cond_resched() before poking RCU
  rcutorture: Add more diagnostics in rcu_barrier() test failure case
  torture: Flag console.log file to prevent holdovers from earlier runs
  torture: Add "-enable-kvm -soundhw pcspk" to qemu command line
  rcutorture: Handle different mpstat versions
  rcutorture: Check from beginning to end of grace period
  rcu: Remove redundant rcu_batches_completed() declaration
  rcutorture: Drop rcu_torture_completed() and friends
  rcu: Provide rcu_batches_completed_sched() for TINY_RCU
  rcutorture: Use unsigned for Reader Batch computations
  rcutorture: Make build-output parsing correctly flag RCU's warnings
  rcu: Make _batches_completed() functions return unsigned long
  rcutorture: Issue warnings on close calls due to Reader Batch blows
  documentation: Fix smp typo in memory-barriers.txt
  ...
2015-02-09 14:28:42 -08:00
Len Brown
3a9a941d0b tools/power turbostat: decode MSR_*_PERF_LIMIT_REASONS
The Processor generation code-named Haswell
added MSR_{CORE | GFX | RING}_PERF_LIMIT_REASONS
to explain when and how the processor limits frequency.

turbostat -v
will now decode these bits.

Each MSR has an "Active" set of bits which describe
current conditions, and a "Logged" set of bits,
which describe what has happened since last cleared.

Turbostat currently doesn't clear the log bits.

Signed-off-by: Len Brown <len.brown@intel.com>
2015-02-09 16:44:24 -05:00
Linus Torvalds
b0c1936c44 spi: Updates for v3.20
The major highlight this release is a refactoring of the core to allow
 us to run synchronous transfers in the context of the caller when there
 is no contention for the bus.  This improves performance in the very
 common case by eliminating context switches and reducing the number of
 hardware setup and teardown operations we need to perform.
 
 Other changes:
 
  - New drivers for DLN-2 USB-SPI adapter and ST SPI controllers.
  - A big round of cleanups, performance and feature improvements
    for the xilinx driver from Ricardo Ribalda Delgado.
  - A wide range of smaller cleanups, fixes and feature improvements
    throughout the subsystem.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJU2GNgAAoJECTWi3JdVIfQLiYH/0uLN43CunPp0gSWllQ2PY1O
 R1QiqXg1fr1uZKRuGy59QF0TkU/JlWPY+tpGiOH1jrnDsoecnWsxDx3YEeuYdV6U
 c//UrlK2uvESivbc48zVUTwCsgxsE8apG0JgqLjsfUpqZTEFxFpeSskepSJ2kIUz
 bsXHU8Xi0WkLalsk/8Ik8aUvOwVi5EtRE9OMvnU6QPqQMCszgv1TH4UbwbhqwwzZ
 U23WbNHQ262XDRwY2LKl/QROULeU5pd9F19wrveKMa42fkbu/e+kk6E3n7/Hd4mV
 CUjv1wTCpPZvzh3bTk50uXwA9XQOzv6ddw6jqsgLcV6jS8Ju3Z3Beya3fmdhOl0=
 =3ZQr
 -----END PGP SIGNATURE-----

Merge tag 'spi-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi updates from Mark Brown:
 "The major highlight this release is a refactoring of the core to allow
  us to run synchronous transfers in the context of the caller when
  there is no contention for the bus.  This improves performance in the
  very common case by eliminating context switches and reducing the
  number of hardware setup and teardown operations we need to perform.

  Other changes:

   - New drivers for DLN-2 USB-SPI adapter and ST SPI controllers.

   - A big round of cleanups, performance and feature improvements for
     the xilinx driver from Ricardo Ribalda Delgado.

   - A wide range of smaller cleanups, fixes and feature improvements
     throughout the subsystem"

* tag 'spi-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (68 commits)
  spi: mxs: cleanup wait_for_completion return handling
  spi: ti-qspi: cleanup wait_for_completion return handling
  spi: spi-imx: cleanup wait_for_completion handling
  spi: sh-msiof: cleanup wait_for_completion return handling
  spi: match var type to return type of wait_for_completion
  spi: spi-pxa2xx: only include mach/dma.h for legacy DMA
  spi: atmel: cleanup wait_for_completion return handling
  spi: fsl-dspi: Remove possible memory leak of 'chip'
  spi: sh-msiof: Update calculation of frequency dividing
  spi: spidev: Convert buf pointers for 32-bit compat SPI_IOC_MESSAGE(n)
  spi/xilinx: Fix access invalid memory on xilinx_spi_tx
  spi: Revert "spi/xilinx: Remove iowrite/ioread wrappers"
  spi/xilinx: Check number of slaves range
  spi/xilinx: Use polling mode on small transfers
  spi/xilinx: Remove remaining_words driver data variable
  spi/xilinx: Remove iowrite/ioread wrappers
  spi/xilinx: Convert bits_per_word in bytes_per_word
  spi/xilinx: Convert remainding_bytes in remaining words
  spi/xilinx: Make spi_tx and spi_rx simmetric
  spi/xilinx: Remove rx_fn and tx_fn pointer
  ...
2015-02-09 13:36:20 -08:00
Tony Luck
a2413d8b29 x86/mce: Fix regression. All error records should report via /dev/mcelog
I'm getting complaints from validation teams that have updated their
Linux kernels from ancient versions to current. They don't see the
error logs they expect. I tell the to unload any EDAC drivers[1], and
things start working again.  The problem is that we short-circuit
the logging process if any function on the decoder chain claims to
have dealt with the problem:

	ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
	if (ret == NOTIFY_STOP)
		return;

The logic we used when we added this code was that we did not want
to confuse users with double reports of the same error.

But it turns out users are not confused - they are upset that they
don't see a log where their tools used to find a log.

I could also get into a long description of how the consumer of this
log does more than just decode model specific details of the error.
It keeps counts, tracks thresholds, takes actions and runs scripts
that can alert administrators to problems.

[1] We've recently compounded the problem because the acpi_extlog
driver also registers for this notifier and also returns NOTIFY_STOP.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2015-02-09 09:36:53 -08:00
Paolo Bonzini
d44e121223 KVM: x86: emulate: correct page fault error code for NoWrite instructions
NoWrite instructions (e.g. cmp or test) never set the "write access"
bit in the error code, even if one of the operands is treated as a
destination.

Fixes: c205fb7d7d
Cc: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-09 13:36:01 +01:00
Mark Brown
fab4b42a9a Merge remote-tracking branches 'spi/topic/atmel', 'spi/topic/config', 'spi/topic/dln2' and 'spi/topic/dw' into spi-next 2015-02-08 11:16:43 +08:00
Linus Torvalds
26cdd1f76a Merge branches 'timers-urgent-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer and x86 fix from Ingo Molnar:
 "A CLOCK_TAI early expiry fix and an x86 microcode driver oops fix"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hrtimer: Fix incorrect tai offset calculation for non high-res timer systems

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode: Return error from driver init code when loader is disabled
2015-02-06 13:56:02 -08:00
Ken Xue
92082a8886 ACPI: add AMD ACPI2Platform device support for x86 system
This new feature is to interpret AMD specific ACPI device to
platform device such as I2C, UART, GPIO found on AMD CZ and
later chipsets. It based on example intel LPSS. Now, it can
support AMD I2C, UART and GPIO.

Signed-off-by: Ken Xue <Ken.Xue@amd.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-06 15:42:16 +01:00
Paolo Bonzini
f781951299 kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.

This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest.  KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too.  When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.

With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest.  This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.

Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host.  The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.

The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter.  It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.

While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls.  During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark.  The wasted time is thus very low.  Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.

The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer.  Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns.  For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-06 13:08:37 +01:00
Hanjun Guo
2fad93083e ACPI / table: remove duplicate NULL check for the handler of acpi_table_parse()
In acpi_table_parse(), pointer of the table to pass to handler() is
checked before handler() called, so remove all the duplicate NULL
check in the handler function.

CC: Tony Luck <tony.luck@intel.com>
CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-06 01:34:47 +01:00
Linus Torvalds
f3c2352df1 PCI updates for v3.19:
Enumeration
     - Scan all device numbers on NEC as well as Stratus (Charlotte Richardson)
 
   Resource management
     - Handle read-only BARs on AMD CS553x devices (Myron Stowe)
 
   Synopsys DesignWare
     - Reject MSI-X IRQs (Lucas Stach)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU0m7MAAoJEFmIoMA60/r8zQoP/iZpoDfkIrvJWhLjRO1agVCd
 7Y5Hrzr+AmESq4n3D7IQt7c+wzaR+05gOG/orDvaFfpbUMamTHlFxUjsk7ib3lLv
 Qg+WUX/I444pt/jDxRlqZbqWC8AkvYrTWPZoR+fagdWMuIGnKYNvjIAdmzRgMwB0
 C3AFGa1S61tqD0+b2Dc177vPZowOUgPBq+ta0Ld8RJ7zBfNnL/GdgCqWxo0OyPNv
 uSeQuCmKEL5J1Sn1D+j0EKL5FdSDN+LJeU4k5hdF3I+GaervFkOYVqstAqv1kqML
 2I9LBjMndRueseArqq7oCAeMn+lCEqleiYr+Y0DtlLOo68eZzh2YWNRV/PcRGMnj
 xOJo59rDSw5amg4baL7DVdPFIW3NGZiVThNiM3ye5CuhfAl3U/PY1xiDxQ85KY2F
 htc5rhnm1b9dE5LrJN0iChX4fT60sYyc4IMHuu3wuS5fRNDgZWscs0TT6GSIXSCH
 ua1xiaq0hLc8XoWdV0A/7R9kX37KJ0kyjCmSFJV/uVWKgKpS3eGi6Ex67qCU1w0C
 Vw2JT/H8V7vbNCpF/+8fYhXYzsTx6Oa0qLfcBEgT7xxjPTMTsH0LUW0wAivL60rV
 J9PPcIBrDui6fBVLDkOlgMBjokkShkSay0MUuaUaZib3BdcDY14RVfm8K8MhYS/I
 ViaWaM1bQ4DGzjnz5dQg
 =hsLb
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
 "Enumeration
    - Scan all device numbers on NEC as well as Stratus (Charlotte Richardson)

  Resource management
    - Handle read-only BARs on AMD CS553x devices (Myron Stowe)

  Synopsys DesignWare
    - Reject MSI-X IRQs (Lucas Stach)"

* tag 'pci-v3.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Handle read-only BARs on AMD CS553x devices
  PCI: Add NEC variants to Stratus ftServer PCIe DMI check
  PCI: designware: Reject MSI-X IRQs
2015-02-05 10:23:12 -08:00
Jiang Liu
b4b55cda58 x86/PCI: Refine the way to release PCI IRQ resources
Some PCI device drivers assume that pci_dev->irq won't change after
calling pci_disable_device() and pci_enable_device() during suspend and
resume.

Commit c03b3b0738 ("x86, irq, mpparse: Release IOAPIC pin when
PCI device is disabled") frees PCI IRQ resources when pci_disable_device()
is called and reallocate IRQ resources when pci_enable_device() is
called again. This breaks above assumption. So commit 3eec595235
("x86, irq, PCI: Keep IRQ assignment for PCI devices during
suspend/hibernation") and 9eabc99a63 ("x86, irq, PCI: Keep IRQ
assignment for runtime power management") fix the issue by avoiding
freeing/reallocating IRQ resources during PCI device suspend/resume.
They achieve this by checking dev.power.is_prepared and
dev.power.runtime_status.  PM maintainer, Rafael, then pointed out that
it's really an ugly fix which leaking PM internal state information to
IRQ subsystem.

Recently David Vrabel <david.vrabel@citrix.com> also reports an
regression in pciback driver caused by commit cffe0a2b5a ("x86, irq:
Keep balance of IOAPIC pin reference count"). Please refer to:
http://lkml.org/lkml/2015/1/14/546

So this patch refine the way to release PCI IRQ resources. Instead of
releasing PCI IRQ resources in pci_disable_device()/
pcibios_disable_device(), we now release it at driver unbinding
notification BUS_NOTIFY_UNBOUND_DRIVER. In other word, we only release
PCI IRQ resources when there's no driver bound to the PCI device, and
it keeps the assumption that pci_dev->irq won't through multiple
invocation of pci_enable_device()/pci_disable_device().

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-05 15:09:26 +01:00
Jiang Liu
593669c2ac x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation
Use common ACPI resource discovery interfaces to simplify PCI host bridge
resource enumeration.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-05 15:09:26 +01:00
Jiang Liu
812dbd9994 x86/PCI: Fix the range check for IO resources
The range check in setup_res() checks the IO range against
iomem_resource. That's just wrong.

Reworked based on Thomas original patch.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-05 15:09:25 +01:00
Jiang Liu
14d76b68f2 PCI: Use common resource list management code instead of private implementation
Use common resource list management data structure and interfaces
instead of private implementation.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-05 15:09:25 +01:00
Tiejun Chen
1c2b364b22 kvm: remove KVM_MMIO_SIZE
After f78146b0f9, "KVM: Fix page-crossing MMIO", and
87da7e66a4, "KVM: x86: fix vcpu->mmio_fragments overflow",
actually KVM_MMIO_SIZE is gone.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-05 12:26:14 +01:00
Andy Lutomirski
a66734297f perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks
While perfmon2 is a sufficiently evil library (it pokes MSRs
directly) that breaking it is fair game, it's still useful, so we
might as well try to support it.  This allows users to write 2 to
/sys/devices/cpu/rdpmc to disable all rdpmc protection so that hack
like perfmon2 can continue to work.

At some point, if perf_event becomes fast enough to replace
perfmon2, then this can go.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vince Weaver <vince@deater.net>
Cc: "hillf.zj" <hillf.zj@alibaba-inc.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/caac3c1c707dcca48ecbc35f4def21495856f479.1414190806.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 12:10:49 +01:00
Andy Lutomirski
7911d3f7af perf/x86: Only allow rdpmc if a perf_event is mapped
We currently allow any process to use rdpmc.  This significantly
weakens the protection offered by PR_TSC_DISABLED, and it could be
helpful to users attempting to exploit timing attacks.

Since we can't enable access to individual counters, use a very
coarse heuristic to limit access to rdpmc: allow access only when
a perf_event is mmapped.  This protects seccomp sandboxes.

There is plenty of room to further tighen these restrictions.  For
example, this allows rdpmc for any x86_pmu event, but it's only
useful for self-monitoring tasks.

As a side effect, cap_user_rdpmc will now be false for AMD uncore
events.  This isn't a real regression, since .event_idx is disabled
for these events anyway for the time being.  Whenever that gets
re-added, the cap_user_rdpmc code can be adjusted or refactored
accordingly.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vince Weaver <vince@deater.net>
Cc: "hillf.zj" <hillf.zj@alibaba-inc.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/a2bdb3cf3a1d70c26980d7c6dddfbaa69f3182bf.1414190806.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 12:10:47 +01:00
Andy Lutomirski
c1317ec2b9 perf: Pass the event to arch_perf_update_userpage()
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vince Weaver <vince@deater.net>
Cc: "hillf.zj" <hillf.zj@alibaba-inc.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/0fea9a7fac3c1eea86cb0a5954184e74f4213666.1414190806.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 12:10:46 +01:00
Andy Lutomirski
22c4bd9fa9 x86: Add a comment clarifying LDT context switching
The code is correct, but only for a rather subtle reason.  This
confused me for quite a while when I read switch_mm, so clarify the
code to avoid confusing other people, too.

TBH, I wouldn't be surprised if this code was only correct by
accident.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vince Weaver <vince@deater.net>
Cc: "hillf.zj" <hillf.zj@alibaba-inc.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/0db86397f968996fb772c443c251415b0b430ddd.1414190806.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 12:10:43 +01:00
Andy Lutomirski
1e02ce4ccc x86: Store a per-cpu shadow copy of CR4
Context switches and TLB flushes can change individual bits of CR4.
CR4 reads take several cycles, so store a shadow copy of CR4 in a
per-cpu variable.

To avoid wasting a cache line, I added the CR4 shadow to
cpu_tlbstate, which is already touched in switch_mm.  The heaviest
users of the cr4 shadow will be switch_mm and __switch_to_xtra, and
__switch_to_xtra is called shortly after switch_mm during context
switch, so the cacheline is likely to be hot.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vince Weaver <vince@deater.net>
Cc: "hillf.zj" <hillf.zj@alibaba-inc.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/3a54dd3353fffbf84804398e00dfdc5b7c1afd7d.1414190806.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 12:10:42 +01:00
Andy Lutomirski
375074cc73 x86: Clean up cr4 manipulation
CR4 manipulation was split, seemingly at random, between direct
(write_cr4) and using a helper (set/clear_in_cr4).  Unfortunately,
the set_in_cr4 and clear_in_cr4 helpers also poke at the boot code,
which only a small subset of users actually wanted.

This patch replaces all cr4 access in functions that don't leave cr4
exactly the way they found it with new helpers cr4_set_bits,
cr4_clear_bits, and cr4_set_bits_and_update_boot.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vince Weaver <vince@deater.net>
Cc: "hillf.zj" <hillf.zj@alibaba-inc.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/495a10bdc9e67016b8fd3945700d46cfd5c12c2f.1414190806.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 12:10:41 +01:00
Josh Poimboeuf
12cf89b550 livepatch: rename config to CONFIG_LIVEPATCH
Rename CONFIG_LIVE_PATCHING to CONFIG_LIVEPATCH to make the naming of
the config and the code more consistent.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-02-04 11:25:51 +01:00
Ingo Molnar
0967160ad6 Merge branch 'x86/asm' into perf/x86, to avoid conflicts with upcoming patches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 09:01:12 +01:00
Ingo Molnar
8f4bf4bcc4 Linux 3.19-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUzvgKAAoJEHm+PkMAQRiG8XQH/1qVbHI4pP0KcnzfZUHq/mXq
 RuS4aJMwLm/Y6cXFraXBDaPde1A3CPtwtpob2C6giKcfu2zXGunY65haOEeJWNpX
 lCbBsLkNC3oDNkygBpVr5Zd6yibaw63WBjjLnpAi7pn2G2Zm2zB8DfILWWWMb7yz
 MH8ZXV+/xIYCTkjNWGWA1iMjmdYqu0PQHPeOgLsYQ+u7rxfM1zb/wHEkjqUZS6iu
 IaaZv7PV2PnFYnqib/iIPYjAEDvSQ4vN/7b82zlFd2Culm9j/568KCCWUPhJTb2l
 X0u4QYs49GnMTWVRa3bgYxS/nTUaE/6DeWs2y2WzqTt0/XDntVUnok0blUeDxGk=
 =o2kS
 -----END PGP SIGNATURE-----

Merge tag 'v3.19-rc7' into perf/core, to merge fixes before applying new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 07:58:29 +01:00
Jan Beulich
75aaf4c3e6 x86/raid6: correctly check for assembler capabilities
Just like for AVX2 (which simply needs an #if -> #ifdef conversion),
SSSE3 assembler support should be checked for before using it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-04 08:35:51 +11:00
Rafael J. Wysocki
b2cd5dd71a Merge branch 'acpica' into acpi-resources 2015-02-03 22:27:01 +01:00
Wincy Van
705699a139 KVM: nVMX: Enable nested posted interrupt processing
If vcpu has a interrupt in vmx non-root mode, injecting that interrupt
requires a vmexit.  With posted interrupt processing, the vmexit
is not needed, and interrupts are fully taken care of by hardware.
In nested vmx, this feature avoids much more vmexits than non-nested vmx.

When L1 asks L0 to deliver L1's posted interrupt vector, and the target
VCPU is in non-root mode, we use a physical ipi to deliver POSTED_INTR_NV
to the target vCPU.  Using POSTED_INTR_NV avoids unexpected interrupts
if a concurrent vmexit happens and L1's vector is different with L0's.
The IPI triggers posted interrupt processing in the target physical CPU.

In case the target vCPU was not in guest mode, complete the posted
interrupt delivery on the next entry to L2.

Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-03 17:15:08 +01:00
Wincy Van
608406e290 KVM: nVMX: Enable nested virtual interrupt delivery
With virtual interrupt delivery, the hardware lets KVM use a more
efficient mechanism for interrupt injection. This is an important feature
for nested VMX, because it reduces vmexits substantially and they are
much more expensive with nested virtualization.  This is especially
important for throughput-bound scenarios.

Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-03 17:07:38 +01:00
Wincy Van
82f0dd4b27 KVM: nVMX: Enable nested apic register virtualization
We can reduce apic register virtualization cost with this feature,
it is also a requirement for virtual interrupt delivery and posted
interrupt processing.

Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-03 17:07:03 +01:00
Wincy Van
b9c237bb1d KVM: nVMX: Make nested control MSRs per-cpu
To enable nested apicv support, we need per-cpu vmx
control MSRs:
  1. If in-kernel irqchip is enabled, we can enable nested
     posted interrupt, we should set posted intr bit in
     the nested_vmx_pinbased_ctls_high.
  2. If in-kernel irqchip is disabled, we can not enable
     nested posted interrupt, the posted intr bit
     in the nested_vmx_pinbased_ctls_high will be cleared.

Since there would be different settings about in-kernel
irqchip between VMs, different nested control MSRs
are needed.

Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-03 17:06:51 +01:00
Wincy Van
f2b93280ed KVM: nVMX: Enable nested virtualize x2apic mode
When L2 is using x2apic, we can use virtualize x2apic mode to
gain higher performance, especially in apicv case.

This patch also introduces nested_vmx_check_apicv_controls
for the nested apicv patches.

Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-03 17:06:17 +01:00
Wincy Van
3af18d9c5f KVM: nVMX: Prepare for using hardware MSR bitmap
Currently, if L1 enables MSR_BITMAP, we will emulate this feature, all
of L2's msr access is intercepted by L0.  Features like "virtualize
x2apic mode" require that the MSR bitmap is enabled, or the hardware
will exit and for example not virtualize the x2apic MSRs.  In order to
let L1 use these features, we need to build a merged bitmap that only
not cause a VMEXIT if 1) L1 requires that 2) the bit is not required by
the processor for APIC virtualization.

For now the guests are still run with MSR bitmap disabled, but this
patch already introduces nested_vmx_merge_msr_bitmap for future use.

Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-03 17:02:32 +01:00
Ingo Molnar
b57c0b5175 x86: Entry cleanups and a bugfix for 3.20
This fixes a bug in the RCU code I added in ist_enter.  It also includes
 the sysret stuff discussed here:
 
 http://lkml.kernel.org/g/cover.1421453410.git.luto%40amacapital.net
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUzhZ0AAoJEK9N98ZeDfrksUEH/j7wkUlMGan5h1AQIZQW6gKk
 OjlE1a4rfcgKocgkc0ix6UMc8Ks/NAUWKpeHR08eqR+Xi6Yk29cqLkboTEmAdYJ3
 jQvKjGu51kiprNjAGqF5wdqxvCT3oBSdm7CWdtY4zHkEr+2W93Ht9PM7xZhj4r+P
 ekUC8mIKQrhyhlC7g7VpXLAi3Bk4mO+f499T7XBVsVoywWpgVpOMYMhtUobV1reW
 V7/zul/dMerzNLB0t3amvdgCLphHBQTQ0fHBAN62RY78UvSDt36EZFyS65isirsR
 LhO4FpWzF5YNMRk8Dep/fB8jYlhsCi40ZIlOtGSE6kNJyLhPt+oLnkpgOwWAMQc=
 =uiRw
 -----END PGP SIGNATURE-----

Merge tag 'pr-20150201-x86-entry' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/asm

Pull "x86: Entry cleanups and a bugfix for 3.20" from Andy Lutomirski:

 " This fixes a bug in the RCU code I added in ist_enter.  It also includes
   the sysret stuff discussed here:

     http://lkml.kernel.org/g/cover.1421453410.git.luto%40amacapital.net "

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-03 12:24:08 +01:00