Pull x86/asm changes from Ingo Molnar:
"Assorted single-commit improvements, as usual"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/mtrr: Slightly simplify print_mtrr_state()
x86/mm/mtrr: Fix alignment determination in range_to_mtrr()
x86/copy_user_generic: Optimize copy_user_generic with CPU erms feature
x86/alternatives: Use atomic_xchg() instead atomic_dec_and_test() for stop_machine_text_poke()
there are 3 funcs which need to be _initcalled in a logic sequence:
1. xen_late_init_mcelog
2. mcheck_init_device
3. threshold_init_device
xen_late_init_mcelog must register xen_mce_chrdev_device before
native mce_chrdev_device registration if running under xen platform;
mcheck_init_device should be inited before threshold_init_device to
initialize mce_device, otherwise a a NULL ptr dereference will cause panic.
so we use following _initcalls
1. device_initcall(xen_late_init_mcelog);
2. device_initcall_sync(mcheck_init_device);
3. late_initcall(threshold_init_device);
when running under xen, the initcall order is 1,2,3;
on baremetal, we skip 1 and we do only 2 and 3.
Acked-and-tested-by: Borislav Petkov <bp@amd64.org>
Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When MCA error occurs, it would be handled by Xen hypervisor first,
and then the error information would be sent to initial domain for logging.
This patch gets error information from Xen hypervisor and convert
Xen format error into Linux format mcelog. This logic is basically
self-contained, not touching other kernel components.
By using tools like mcelog tool users could read specific error information,
like what they did under native Linux.
To test follow directions outlined in Documentation/acpi/apei/einj.txt
Acked-and-tested-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Ke, Liping <liping.ke@intel.com>
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
In commit dad1743e59 ("x86/mce: Only restart instruction after machine
check recovery if it is safe") we fixed mce_notify_process() to force a
signal to the current process if it was not restartable (RIPV bit not
set in MCG_STATUS). But doing it here means that the process doesn't
get told the virtual address of the fault via siginfo_t->si_addr. This
would prevent application level recovery from the fault.
Make a new MF_MUST_KILL flag bit for memory_failure() et al. to use so
that we will provide the right information with the signal.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: stable@kernel.org # 3.4+
While debugging I noticed that unlike all the other hypervisor code in the
kernel, kvm does not have an entry for x86_hyper which is used in
detect_hypervisor_platform() which results in a nice printk in the
syslog. This is only really a stub function but it
does make kvm more consistent with the other hypervisors.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Marcelo Tostatti <mtosatti@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
high_width can be easily calculated in a single expression when
making use of __ffs64().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/4FF71053020000780008E1B5@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With the variable operated on being of "unsigned long" type,
neither ffs() nor fls() are suitable to use on them, as those
truncate their arguments to 32 bits. Using __ffs() and __fls()
respectively at once eliminates the need to subtract 1 from their
results.
Additionally, with the alignment value subsequently used as a
shift count, it must be enforced to be less than BITS_PER_LONG
(and on 64-bit there's no need for it to be any smaller).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/4FF70D54020000780008E179@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use tabs for "intel_perfmon_event_map" formatting in
perf_event_intel.c.
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1341568786-7045-1-git-send-email-penberg@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch adds C-Box and PCU filter support for SandyBridge-EP
uncore. We can filter C-Box events by thread/core ID and filter
PCU events by frequency/voltage.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1341381616-12229-5-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The CBox manages the interface between the core and the LLC, so
the instances of uncore CBox is equal to number of cores.
Reported-by: Andrew Cooks <acooks@gmail.com>
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1341381616-12229-4-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Stephane Eranian suggestted using 0xff as pseudo code for fixed
uncore event and using the umask value to determine which of the
fixed events we want to map to. So far there is at most one fixed
counter in a uncore PMU. So just change the definition of
UNCORE_FIXED_EVENT to 0xff.
Suggested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1340780953-21130-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
All these are basically boolean flags, use a bitfield to save a few
bytes.
Suggested-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-vsevd5g8lhcn129n3s7trl7r@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Recent Intel microcode resolved the SNB-PEBS issues, so conditionally
enable PEBS on SNB hardware depending on the microcode revision.
Thanks to Stephane for figuring out the various microcode revisions.
Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-v3672ziwh9damwqwh1uz3krm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It might be of interest which perfctr msr failed.
Signed-off-by: Robert Richter <robert.richter@amd.com>
[ added hunk to avoid GCC warn ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1340217996-2254-5-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There is no need for keeping separate pmu structs. We can enable
amd_{get,put}_event_constraints() functions also for family 15h event.
The advantage is that there is only a single pmu struct for all AMD
cpus. This patch introduces functions to setup the pmu to enabe core
performance counters or counter constraints.
Also, cpuid checks are used instead of family checks where
possible. Thus, it enables the code independently of cpu families if
the feature flag is set.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1340217996-2254-4-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There is some Intel specific code in the generic x86 path. Move it to
intel_pmu_init().
Since p4 and p6 pmus don't have fixed counters we may skip the check
in case such a pmu is detected.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1340217996-2254-3-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are macros that are Intel specific and not x86 generic. Rename
them into INTEL_*.
This patch removes X86_PMC_IDX_GENERIC and does:
$ sed -i -e 's/X86_PMC_MAX_/INTEL_PMC_MAX_/g' \
arch/x86/include/asm/kvm_host.h \
arch/x86/include/asm/perf_event.h \
arch/x86/kernel/cpu/perf_event.c \
arch/x86/kernel/cpu/perf_event_p4.c \
arch/x86/kvm/pmu.c
$ sed -i -e 's/X86_PMC_IDX_FIXED/INTEL_PMC_IDX_FIXED/g' \
arch/x86/include/asm/perf_event.h \
arch/x86/kernel/cpu/perf_event.c \
arch/x86/kernel/cpu/perf_event_intel.c \
arch/x86/kernel/cpu/perf_event_intel_ds.c \
arch/x86/kvm/pmu.c
$ sed -i -e 's/X86_PMC_MSK_/INTEL_PMC_MSK_/g' \
arch/x86/include/asm/perf_event.h \
arch/x86/kernel/cpu/perf_event.c
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1340217996-2254-2-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Several perf interrupt handlers (PEBS,IBS,BTS) re-write regs->ip but
do not update the segment registers. So use an regs->ip based test
instead of an regs->cs/regs->flags based test.
Reported-and-tested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-xxrt0a1zronm1sm36obwc2vy@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Testing show different CPU type(micro architectures and NUMA mode) has
different balance points between the TLB flush all and multiple invlpg.
And there also has cases the tlb flush change has no any help.
This patch give a interface to let x86 vendor developers have a chance
to set different shift for different CPU type.
like some machine in my hands, balance points is 16 entries on
Romely-EP; while it is at 8 entries on Bloomfield NHM-EP; and is 256 on
IVB mobile CPU. but on model 15 core2 Xeon using invlpg has nothing
help.
For untested machine, do a conservative optimization, same as NHM CPU.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-5-git-send-email-alex.shi@intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
For 4KB pages, x86 CPU has 2 or 1 level TLB, first level is data TLB and
instruction TLB, second level is shared TLB for both data and instructions.
For hupe page TLB, usually there is just one level and seperated by 2MB/4MB
and 1GB.
Although each levels TLB size is important for performance tuning, but for
genernal and rude optimizing, last level TLB entry number is suitable. And
in fact, last level TLB always has the biggest entry number.
This patch will get the biggest TLB entry number and use it in furture TLB
optimizing.
Accroding Borislav's suggestion, except tlb_ll[i/d]_* array, other
function and data will be released after system boot up.
For all kinds of x86 vendor friendly, vendor specific code was moved to its
specific files.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-2-git-send-email-alex.shi@intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
There was a stray %s left from testing, remove it.
Add -w to the #! line (which is parsed by Perl even if the Perl
interpreter is invoked explicitly on the command line) to catch these
kinds of errors in the future.
Reported-by: Jean Delvare <khali@linux-fr.org>
Link: http://lkml.kernel.org/r/20120626143246.0c9bf301@endymion.delvare
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
We had a case of duplicate CPU feature strings, a user space ABI
violation, for almost two years. Make it a build error so that
doesn't happen again.
Link: http://lkml.kernel.org/r/4FE34BCB.5050305@linux.intel.com
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Jean Delvare <khali@linux-fr.org>
It makes sense to label "Digital Thermal Sensor" as "DTS", but
unfortunately the string "dts" was already used for "Debug Store", and
/proc/cpuinfo is a user space ABI.
Therefore, rename this to "dtherm".
This conflict went into mainline via the hwmon tree without any x86
maintainer ack, and without any kind of hint in the subject.
a4659053 x86/hwmon: fix initialization of coretemp
Reported-by: Jean Delvare <khali@linux-fr.org>
Link: http://lkml.kernel.org/r/4FE34BCB.5050305@linux.intel.com
Cc: Jan Beulich <JBeulich@suse.com>
Cc: <stable@vger.kernel.org> v2.6.36..v3.4
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Fix section mismatch in uncore_pci_init():
WARNING: vmlinux.o(.init.text+0x9246): Section mismatch in reference from the function uncore_pci_init() to the function .devexit.text:uncore_pci_remove()
The function __init uncore_pci_init() references
a function __devexit uncore_pci_remove().
[...]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: <a.p.zijlstra@chello.nl>
Cc: <zheng.z.yan@intel.com>
Link: http://lkml.kernel.org/r/20120620163927.GI5046@erda.amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The uncore subsystem in Sandy Bridge-EP consists of 8 components:
Ubox, Cacheing Agent, Home Agent, Memory controller, Power Control,
QPI Link Layer, R2PCIe, R3QPI.
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1339741902-8449-9-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch adds generic support for uncore PMUs presented as
PCI devices. (These come in addition to the CPU/MSR based
uncores.)
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1339741902-8449-8-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch adds the generic Intel uncore PMU support, including helper
functions that add/delete uncore events, a hrtimer that periodically
polls the counters to avoid overflow and code that places all events
for a particular socket onto a single cpu.
The code design is based on the structure of Sandy Bridge-EP's uncore
subsystem, which consists of a variety of components, each component
contains one or more "boxes".
(Tooling support follows in the next patches.)
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1339741902-8449-6-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The RDPMC index calculation is wrong for AMD family 15h
(X86_FEATURE_ PERFCTR_CORE set). This leads to a #GP when
accessing the counter:
Pid: 2237, comm: syslog-ng Not tainted 3.5.0-rc1-perf-x86_64-standard-g130ff90 #135 AMD Pike/Pike
RIP: 0010:[<ffffffff8100dc33>] [<ffffffff8100dc33>] x86_perf_event_update+0x27/0x66
While the msr address offset is (index << 1) we must use index to
select the correct rdpmc.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Vince Weaver <vweaver1@eecs.utk.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fixes from Ingo Molnar.
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/nmi: Fix section mismatch warnings on 32-bit
x86/uv: Fix UV2 BAU legacy mode
x86/mm: Only add extra pages count for the first memory range during pre-allocation early page table space
x86, efi stub: Add .reloc section back into image
x86/ioapic: Fix NULL pointer dereference on CPU hotplug after disabling irqs
x86/reboot: Fix a warning message triggered by stop_other_cpus()
x86/intel/moorestown: Change intel_scu_devices_create() to __devinit
x86/numa: Set numa_nodes_parsed at acpi_numa_memory_affinity_init()
x86/gart: Fix kmemleak warning
x86: mce: Add the dropped timer interval init back
x86/mce: Fix the MCE poll timer logic
Pull perf fixes from Ingo Molnar:
"A bit larger than what I'd wish for - half of it is due to hw driver
updates to Intel Ivy-Bridge which info got recently released,
cycles:pp should work there now too, amongst other things. (but we
are generally making exceptions for hardware enablement of this type.)
There are also callchain fixes in it - responding to mostly
theoretical (but valid) concerns. The tooling side sports perf.data
endianness/portability fixes which did not make it for the merge
window - and various other fixes as well."
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
perf/x86: Check user address explicitly in copy_from_user_nmi()
perf/x86: Check if user fp is valid
perf: Limit callchains to 127
perf/x86: Allow multiple stacks
perf/x86: Update SNB PEBS constraints
perf/x86: Enable/Add IvyBridge hardware support
perf/x86: Implement cycles:p for SNB/IVB
perf/x86: Fix Intel shared extra MSR allocation
x86/decoder: Fix bsr/bsf/jmpe decoding with operand-size prefix
perf: Remove duplicate invocation on perf_event_for_each
perf uprobes: Remove unnecessary check before strlist__delete
perf symbols: Check for valid dso before creating map
perf evsel: Fix 32 bit values endianity swap for sample_id_all header
perf session: Handle endianity swap on sample_id_all header data
perf symbols: Handle different endians properly during symbol load
perf evlist: Pass third argument to ioctl explicitly
perf tools: Update ioctl documentation for PERF_IOC_FLAG_GROUP
perf tools: Make --version show kernel version instead of pull req tag
perf tools: Check if callchain is corrupted
perf callchain: Make callchain cursors TLS
...
Rename checking_wrmsrl() to wrmsrl_safe(), to match the naming
convention used by all the other MSR access functions/macros.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Now that all users of {rd,wr}msr_amd_safe have been fixed, deprecate its
use by making them private to amd.c and adding warnings when used on
anything else beside K8.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1338562358-28182-5-git-send-email-bp@amd64.org
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
f7f286a910 ("x86/amd: Re-enable CPU topology extensions in case BIOS
has disabled it") wrongfully added code which used the AMD-specific
{rd,wr}msr variants for no real reason.
This caused boot panics on xen which wasn't initializing the
{rd,wr}msr_safe_regs pv_ops members properly.
This, in turn, caused a heated discussion leading to us reviewing all
uses of the AMD-specific variants and removing them where unneeded
(almost everywhere except an obscure K8 BIOS fix, see 6b0f43ddfa).
Finally, this patch switches to the standard {rd,wr}msr*_safe* variants
which should've been used in the first place anyway and avoided unneeded
excitation with xen.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Link: http://lkml.kernel.org/r/1338562358-28182-4-git-send-email-bp@amd64.org
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Link: <http://lkml.kernel.org/r/1338383402-3838-1-git-send-email-andre.przywara@amd.com>
[Boris: correct and expand commit message]
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
There's no real reason why, when showing the MSRs on a CPU at boottime,
we should be using the AMD-specific variant. Simply use the generic safe
one which handles #GPs just fine.
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1338562358-28182-3-git-send-email-bp@amd64.org
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Until now, writing to error count caused the code to reset the
thresholding bank to the current thresholding limit and start counting
errors from the beginning.
This is misleading and unclear, and can be accomplished by writing the
old thresholding limit into ->threshold_limit.
Make error_count read-only with the functionality to show the current
error count.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>