Commit graph

9561 commits

Author SHA1 Message Date
Ingo Molnar
b5210b2a34 Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core
Pull uprobes updates from Oleg Nesterov:

 - "uretprobes" - an optimization to uprobes, like kretprobes are an optimization
   to kprobes. "perf probe -x file sym%return" now works like kretprobes.

 - PowerPC fixes plus a couple of cleanups/optimizations in uprobes and trace_uprobes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-16 11:04:10 +02:00
Wang YanQing
26bfc540f6 x86/mm/gart: Drop unnecessary check
The memblock_find_in_range() return value addr is guaranteed
to be within "addr + aper_size" and not beyond GART_MAX_ADDR.

Signed-off-by: Wang YanQing <udknight@gmail.com>
Cc: yinghai@kernel.org
Link: http://lkml.kernel.org/r/20130416013734.GA14641@udknight
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-16 10:54:40 +02:00
Linus Torvalds
6c4c4d4bda Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Flush lazy MMU when DEBUG_PAGEALLOC is set
  x86/mm/cpa/selftest: Fix false positive in CPA self test
  x86/mm/cpa: Convert noop to functional fix
  x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
  x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates
2013-04-14 11:13:24 -07:00
Linus Torvalds
ae9f4939ba Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc fixlets"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix error return code
  ftrace: Fix strncpy() use, use strlcpy() instead of strncpy()
  perf: Fix strncpy() use, use strlcpy() instead of strncpy()
  perf: Fix strncpy() use, always make sure it's NUL terminated
  perf: Fix ring_buffer perf_output_space() boundary calculation
  perf/x86: Fix uninitialized pt_regs in intel_pmu_drain_bts_buffer()
2013-04-14 11:10:44 -07:00
Anton Arapov
791eca1010 uretprobes/x86: Hijack return address
Hijack the return address and replace it with a trampoline address.

Signed-off-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2013-04-13 15:31:55 +02:00
Konrad Rzeszutek Wilk
357d122670 x86, xen, gdt: Remove the pvops variant of store_gdt.
The two use-cases where we needed to store the GDT were during ACPI S3 suspend
and resume. As the patches:
 x86/gdt/i386: store/load GDT for ACPI S3 or hibernation/resume path is not needed
 x86/gdt/64-bit: store/load GDT for ACPI S3 or hibernate/resume path is not needed.

have demonstrated - there are other mechanism by which the GDT is
saved and reloaded during early resume path.

Hence we do not need to worry about the pvops call-chain for saving the
GDT and can and can eliminate it. The other areas where the store_gdt is
used are never going to be hit when running under the pvops platforms.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1365194544-14648-4-git-send-email-konrad.wilk@oracle.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-11 15:40:38 -07:00
Konrad Rzeszutek Wilk
84e70971e6 x86-32, gdt: Store/load GDT for ACPI S3 or hibernation/resume path is not needed
During the ACPI S3 suspend, we store the GDT in the wakup_header (see
wakeup_asm.s) field called 'pmode_gdt'.

Which is then used during the resume path and has the same exact
value as what the store/load_gdt do with the saved_context
(which is saved/restored via save/restore_processor_state()).

The flow during resume from ACPI S3 is simpler than the 64-bit
counterpart. We only use the early bootstrap once (wakeup_gdt) and
do various checks in real mode.

After the checks are completed, we load the saved GDT ('pmode_gdt') and
continue on with the resume (by heading to startup_32 in trampoline_32.S) -
which quickly jumps to what was saved in 'pmode_entry'
aka 'wakeup_pmode_return'.

The 'wakeup_pmode_return' restores the GDT (saved_gdt) again (which was
saved in do_suspend_lowlevel initially). After that it ends up calling
the 'ret_point' which calls 'restore_processor_state()'.

We have two opportunities to remove code where we restore the same GDT
twice.

Here is the call chain:
 wakeup_start
       |- lgdtl wakeup_gdt [the work-around broken BIOSes]
       |
       | - lgdtl pmode_gdt [the real one]
       |
       \-- startup_32 (in trampoline_32.S)
              \-- wakeup_pmode_return (in wakeup_32.S)
                       |- lgdtl saved_gdt [the real one]
                       \-- ret_point
                             |..
                             |- call restore_processor_state

The hibernate path is much simpler. During the saving of the hibernation
image we call save_processor_state() and save the contents of that
along with the rest of the kernel in the hibernation image destination.
We save the EIP of 'restore_registers' (restore_jump_address) and
cr3 (restore_cr3).

During hibernate resume, the 'restore_registers' (via the
'restore_jump_address) in hibernate_asm_32.S is invoked which
restores the contents of most registers. Naturally the resume path benefits
from already being in 32-bit mode, so it does not have to reload the GDT.

It only reloads the cr3 (from restore_cr3) and continues on. Note
that the restoration of the restore image page-tables is done prior to
this.

After the 'restore_registers' it returns and we end up called
restore_processor_state() - where we reload the GDT. The reload of
the GDT is not needed as bootup kernel has already loaded the GDT
which is at the same physical location as the the restored kernel.

Note that the hibernation path assumes the GDT is correct during its
'restore_registers'. The assumption in the code is that the restored
image is the same as saved - meaning we are not trying to restore
an different kernel in the virtual address space of a new kernel.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1365194544-14648-3-git-send-email-konrad.wilk@oracle.com
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-11 15:40:17 -07:00
Kees Cook
4eefbe792b x86: Use a read-only IDT alias on all CPUs
Make a copy of the IDT (as seen via the "sidt" instruction) read-only.
This primarily removes the IDT from being a target for arbitrary memory
write attacks, and has the added benefit of also not leaking the kernel
base offset, if it has been relocated.

We already did this on vendor == Intel and family == 5 because of the
F0 0F bug -- regardless of if a particular CPU had the F0 0F bug or
not.  Since the workaround was so cheap, there simply was no reason to
be very specific.  This patch extends the readonly alias to all CPUs,
but does not activate the #PF to #UD conversion code needed to deliver
the proper exception in the F0 0F case except on Intel family 5
processors.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130410192422.GA17344@www.outflux.net
Cc: Eric Northup <digitaleric@google.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-11 13:53:19 -07:00
Boris Ostrovsky
511ba86e1d x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
Invoking arch_flush_lazy_mmu_mode() results in calls to
preempt_enable()/disable() which may have performance impact.

Since lazy MMU is not used on bare metal we can patch away
arch_flush_lazy_mmu_mode() so that it is never called in such
environment.

[ hpa: the previous patch "Fix vmalloc_fault oops during lazy MMU
  updates" may cause a minor performance regression on
  bare metal.  This patch resolves that performance regression.  It is
  somewhat unclear to me if this is a good -stable candidate. ]

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1364045796-10720-2-git-send-email-konrad.wilk@oracle.com
Tested-by: Josh Boyer <jwboyer@redhat.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> SEE NOTE ABOVE
2013-04-10 11:25:10 -07:00
Andi Kleen
f8378f5259 perf/x86: Add Sandy Bridge constraints for CYCLE_ACTIVITY.*
Add CYCLE_ACTIVITY.CYCLES_NO_DISPATCH/CYCLES_L1D_PENDING constraints.

These recently documented events have restrictions to counter
0-3 and counter 2 respectively. The perf scheduler needs to know
that to schedule them correctly.

IvyBridge already has the necessary constraints.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: a.p.zijlstra@chello.nl
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1362784968-12542-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-10 15:00:07 +02:00
Jacob Shin
9c5320c8ea cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor
Future AMD processors, starting with Family 16h, can provide software
with feedback on how the workload may respond to frequency change --
memory-bound workloads will not benefit from higher frequency, where
as compute-bound workloads will. This patch enables this "frequency
sensitivity feedback" to aid the ondemand governor to make better
frequency change decisions by hooking into the powersave bias.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Acked-by: Thomas Renninger <trenn@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-10 13:19:26 +02:00
Ingo Molnar
b6d5278dc8 Clean up cmci_rediscover code to fix problems found by Dave Jones
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRXI4YAAoJEKurIx+X31iBY1UP/Rq7DkmmffFh2faMVPMdIC/T
 Z3xwbqmqNpqJOjBojocOkwpG44Da51h4dZNIdm6xtEFojqf9UdaTPobe00jOEotX
 MPcc4VsuTS2WLd3dCkRHCfHu2nW6z/F4ysEtIirrHVj5JVqqH9atZPkGBB3Wcsic
 00QTV/JeOP1QjvGEMxnaW2rNIXSE19NNbiHE0tx9jNYZuecOhll/GH2AOW17+Kvn
 ocn4dkisbHRZ/YixuwkqXzDABj2+qiwjHZbP4f4jZbdDuuDR2KWeW0Vkoau1vRHg
 TU3FX2BCBy4foA8St/AQ7WRbmSA/+5obUWvSWevXXZu7A6CxAmkv4s46D3rRuIaB
 MLDqov0NobqM4vjJn4Y4bcbvYMIv8zJijok6sg2MNFIHUy9iBFTlJHOai9guSdiB
 kHKTfLsCUpdkiwHMq3rcFCAMZi/DSxjjPbae8IzP7+IJAG1ff1SP6QwM6UOZq9h1
 wVYpvavIOwBhKIsjghkqF0Ov/8a+4lyXMksQ3nKleKYD5908hpDYMD/2i6MTCkpy
 50k3OVvnpWbGdQmYwbTiGaOW/jQ2tC72v24o/Y9tKYc2Fea0BlWn+yzIGahCn6ok
 j7oQHAW1OJ2oc3h/RWtDWSejXUpYsyclCZhlFB+TQj1J/2d7VytFxo+g156xD1Rj
 VJ0Dseml7vA4Orr6jAMh
 =wipk
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-cmci_rediscover' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull clean up of the cmci_rediscover code to fix problems found by Dave Jones,
from Tony Luck.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-08 17:41:50 +02:00
Thomas Gleixner
7d1a941731 x86: Use generic idle loop
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130321215235.486594473@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
2013-04-08 17:39:29 +02:00
Masami Hiramatsu
8101376dc5 kprobes/x86: Just return error for sanity check failure instead of using BUG_ON
Return an error from __copy_instruction() and use printk() to
give us a more productive message, since this is just an error
case which we can handle and also the BUG_ON() never tells us
why and what happened.

This is related to the following bug-report:

   https://bugzilla.redhat.com/show_bug.cgi?id=910649

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20130404104230.22862.85242.stgit@mhiramat-M0-7522
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-08 17:28:34 +02:00
Ingo Molnar
529801898b Merge branch 'for-tip' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/core
Pull IBM zEnterprise EC12 support patchlet from Robert Richter.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-08 11:43:30 +02:00
Thomas Gleixner
0ed2aef9b3 Merge branch 'fortglx/3.10/time' of git://git.linaro.org/people/jstultz/linux into timers/core 2013-04-03 12:27:29 +02:00
Borislav Petkov
73f460408c x86, quirks: Shut-up a long-standing gcc warning
So gcc nags about those since forever in randconfig builds.

arch/x86/kernel/quirks.c: In function ‘ati_ixp4x0_rev’:
arch/x86/kernel/quirks.c:361:4: warning: ‘b’ is used uninitialized in this function [-Wuninitialized]
arch/x86/kernel/quirks.c: In function ‘ati_force_enable_hpet’:
arch/x86/kernel/quirks.c:367:4: warning: ‘d’ may be used uninitialized in this function [-Wuninitialized]
arch/x86/kernel/quirks.c:357:6: note: ‘d’ was declared here
arch/x86/kernel/quirks.c:407:21: warning: ‘val’ may be used uninitialized in this function [-Wuninitialized]

This function quirk is called on a SB400 chipset only anyway so the
distant possibility of a PCI access failing becomes almost impossible
there. Even if it did fail, then something else more serious is the
problem.

So zero-out the variables so that gcc shuts up but do a coarse check
on the PCI accesses at the end and signal whether any of them had an
error. They shouldn't but in case they do, we'll at least know and we
can address it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1362428180-8865-6-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-02 16:03:34 -07:00
Borislav Petkov
8e3c2a8cf6 x86: Drop KERNEL_IMAGE_START
We have KERNEL_IMAGE_START and __START_KERNEL_map which both contain the
start of the kernel text mapping's virtual address. Remove the prior one
which has been replicated a lot less times around the tree.

No functionality change.

Signed-off-by: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1362428180-8865-3-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-02 16:03:29 -07:00
Srivatsa S. Bhat
7a0c819d28 x86/mce: Rework cmci_rediscover() to play well with CPU hotplug
Dave Jones reports that offlining a CPU leads to this trace:

numa_remove_cpu cpu 1 node 0: mask now 0,2-3
smpboot: CPU 1 is now offline
BUG: using smp_processor_id() in preemptible [00000000] code:
cpu-offline.sh/10591
caller is cmci_rediscover+0x6a/0xe0
Pid: 10591, comm: cpu-offline.sh Not tainted 3.9.0-rc3+ #2
Call Trace:
 [<ffffffff81333bbd>] debug_smp_processor_id+0xdd/0x100
 [<ffffffff8101edba>] cmci_rediscover+0x6a/0xe0
 [<ffffffff815f5b9f>] mce_cpu_callback+0x19d/0x1ae
 [<ffffffff8160ea66>] notifier_call_chain+0x66/0x150
 [<ffffffff8107ad7e>] __raw_notifier_call_chain+0xe/0x10
 [<ffffffff8104c2e3>] cpu_notify+0x23/0x50
 [<ffffffff8104c31e>] cpu_notify_nofail+0xe/0x20
 [<ffffffff815ef082>] _cpu_down+0x302/0x350
 [<ffffffff815ef106>] cpu_down+0x36/0x50
 [<ffffffff815f1c9d>] store_online+0x8d/0xd0
 [<ffffffff813edc48>] dev_attr_store+0x18/0x30
 [<ffffffff81226eeb>] sysfs_write_file+0xdb/0x150
 [<ffffffff811adfb2>] vfs_write+0xa2/0x170
 [<ffffffff811ae16c>] sys_write+0x4c/0xa0
 [<ffffffff81613019>] system_call_fastpath+0x16/0x1b

However, a look at cmci_rediscover shows that it can be simplified quite
a bit, apart from solving the above issue. It invokes functions that
take spin locks with interrupts disabled, and hence it can run in atomic
context. Also, it is run in the CPU_POST_DEAD phase, so the dying CPU
is already dead and out of the cpu_online_mask. So take these points into
account and simplify the code, and thereby also fix the above issue.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-04-02 14:04:01 -07:00
Borislav Petkov
7d7dc116e5 x86, cpu: Convert AMD Erratum 400
Convert AMD erratum 400 to the bug infrastructure. Then, retract all
exports for modules since they're not needed now and make the AMD
erratum checking machinery local to amd.c. Use forward declarations to
avoid shuffling too much code around needlessly.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1363788448-31325-7-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2013-04-02 10:12:55 -07:00
Borislav Petkov
e6ee94d58d x86, cpu: Convert AMD Erratum 383
Convert the AMD erratum 383 testing code to the bug infrastructure. This
allows keeping the AMD-specific erratum testing machinery private to
amd.c and not export symbols to modules needlessly.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1363788448-31325-6-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2013-04-02 10:12:54 -07:00
Borislav Petkov
c5b41a6750 x86, cpu: Convert Cyrix coma bug detection
... to the new facility.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1363788448-31325-5-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2013-04-02 10:12:54 -07:00
Borislav Petkov
93a829e8e2 x86, cpu: Convert FDIV bug detection
... to the new facility. Add a reference to the wikipedia article
explaining the FDIV test we're doing here.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1363788448-31325-4-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2013-04-02 10:12:53 -07:00
Borislav Petkov
e2604b49e8 x86, cpu: Convert F00F bug detection
... to using the new facility and drop the cpuinfo_x86 member.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1363788448-31325-3-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2013-04-02 10:12:52 -07:00
Borislav Petkov
65fc985b37 x86, cpu: Expand cpufeature facility to include cpu bugs
We add another 32-bit vector at the end of the ->x86_capability
bitvector which collects bugs present in CPUs. After all, a CPU bug is a
kind of a capability, albeit a strange one.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1363788448-31325-2-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2013-04-02 10:12:52 -07:00
Stephane Eranian
9ad64c0f48 perf/x86: Add support for PEBS Precise Store
This patch adds support for PEBS Precise Store
which is available on Intel Sandy Bridge and
Ivy Bridge processors.

To use Precise store, the proper PEBS event
must be used: mem_trans_retired:precise_stores.
For the perf tool, the generic mem-stores event
exported via sysfs can be used directly.

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: acme@redhat.com
Cc: jolsa@redhat.com
Cc: namhyung.kim@lge.com
Link: http://lkml.kernel.org/r/1359040242-8269-11-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-04-01 12:17:06 -03:00
Stephane Eranian
a63fcab452 perf/x86: Export PEBS load latency threshold register to sysfs
Make the PEBS Load Latency threshold register layout
and encoding visible to user level tools.

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: acme@redhat.com
Cc: jolsa@redhat.com
Cc: namhyung.kim@lge.com
Link: http://lkml.kernel.org/r/1359040242-8269-10-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-04-01 12:16:49 -03:00
Stephane Eranian
f20093eef5 perf/x86: Add memory profiling via PEBS Load Latency
This patch adds support for memory profiling using the
PEBS Load Latency facility.

Load accesses are sampled by HW and the instruction
address, data address, load latency, data source, tlb,
locked information can be saved in the sampling buffer
if using the PERF_SAMPLE_COST (for latency),
PERF_SAMPLE_ADDR, PERF_SAMPLE_DATA_SRC types.

To enable PEBS Load Latency, users have to use the
model specific event:

 - on NHM/WSM: MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD
 - on SNB/IVB: MEM_TRANS_RETIRED:LATENCY_ABOVE_THRESHOLD

To make things easier, this patch also exports a generic
alias via sysfs: mem-loads. It export the right event
encoding based on the host CPU and can be used directly
by the perf tool.

Loosely based on Intel's Lin Ming patch posted on LKML
in July 2011.

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: acme@redhat.com
Cc: jolsa@redhat.com
Cc: namhyung.kim@lge.com
Link: http://lkml.kernel.org/r/1359040242-8269-9-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-04-01 12:16:31 -03:00
Stephane Eranian
9fac2cf316 perf/x86: Add flags to event constraints
This patch adds a flags field to each event constraint.
It can be used to store event specific features which can
then later be used by scheduling code or low-level x86 code.

The flags are propagated into event->hw.flags during the
get_event_constraint() call. They are cleared during the
put_event_constraint() call.

This mechanism is going to be used by the PEBS-LL patches.
It avoids defining yet another table to hold event specific
information.

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: jolsa@redhat.com
Cc: namhyung.kim@lge.com
Link: http://lkml.kernel.org/r/1359040242-8269-4-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-04-01 12:15:04 -03:00
Stephane Eranian
3a54aaa0a3 perf/x86: Improve sysfs event mapping with event string
This patch extends Jiri's changes to make generic
events mapping visible via sysfs. The patch extends
the mechanism to non-generic events by allowing
the mappings to be hardcoded in strings.

This mechanism will be used by the PEBS-LL patch
later on.

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: acme@redhat.com
Cc: jolsa@redhat.com
Cc: namhyung.kim@lge.com
Link: http://lkml.kernel.org/r/1359040242-8269-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[ fixed up conflict with 2663960 "perf: Make EVENT_ATTR global" ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-03-26 17:36:45 -03:00
Andi Kleen
1a6461b128 perf/x86: Support CPU specific sysfs events
Add a way for the CPU initialization code to register additional
events, and merge them into the events attribute directory. Used
in the next patch.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: acme@redhat.com
Cc: jolsa@redhat.com
Cc: namhyung.kim@lge.com
Link: http://lkml.kernel.org/r/1359040242-8269-2-git-send-email-eranian@google.com
[ small cleanups ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[ merge_attr returns a **, not just * ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-03-26 16:50:23 -03:00
Linus Torvalds
33b73e9b3e Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "A collection of minor fixes, more EFI variables paranoia
  (anti-bricking) plus the ability to disable the pstore either as a
  runtime default or completely, due to bricking concerns."

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efivars: Fix check for CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE
  x86, microcode_intel_early: Mark apply_microcode_early() as cpuinit
  efivars: Handle duplicate names from get_next_variable()
  efivars: explicitly calculate length of VariableName
  efivars: Add module parameter to disable use as a pstore backend
  efivars: Allow disabling use as a pstore backend
  x86-32, microcode_intel_early: Fix crash with CONFIG_DEBUG_VIRTUAL
  x86-64: Fix the failure case in copy_user_handle_tail()
2013-03-24 10:10:34 -07:00
Boris Ostrovsky
bafcdd3b6c x86, MCE, AMD: Use MCG_CAP MSR to find out number of banks on AMD
Currently number of error reporting register banks is hardcoded to
6 on AMD processors. This may break in virtualized scenarios when
a hypervisor prefers to report fewer banks than what the physical
HW provides.

Since number of supported banks is reported in MSR_IA32_MCG_CAP[7:0]
that's what we should use.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1363295441-1859-3-git-send-email-boris.ostrovsky@oracle.com
[ reverse NULL ptr test logic ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-03-22 11:25:01 +01:00
Boris Ostrovsky
c76e81643c x86, MCE, AMD: Replace shared_bank array with is_shared_bank() helper
Use helper function instead of an array to report whether register
bank is shared. Currently only bank 4 (northbridge) is shared.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1363295441-1859-2-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-03-22 11:25:01 +01:00
H. Peter Anvin
f564c24103 x86, microcode_intel_early: Mark apply_microcode_early() as cpuinit
Add missing __cpuinit annotation to apply_microcode_early().

Reported-by: Shaun Ruffell <sruffell@digium.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/20130320170310.GA23362@digium.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-03-21 17:32:36 -07:00
Linus Torvalds
cd82346934 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "A fair chunk of the linecount comes from a fix for a tracing bug that
  corrupts latency tracing buffers when the overwrite mode is changed on
  the fly - the rest is mostly assorted fewliner fixlets."

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Add SNB/SNB-EP scheduling constraints for cycle_activity event
  kprobes/x86: Check Interrupt Flag modifier when registering probe
  kprobes: Make hash_64() as always inlined
  perf: Generate EXIT event only once per task context
  perf: Reset hwc->last_period on sw clock events
  tracing: Prevent buffer overwrite disabled for latency tracers
  tracing: Keep overwrite in sync between regular and snapshot buffers
  tracing: Protect tracer flags with trace_types_lock
  perf tools: Fix LIBNUMA build with glibc 2.12 and older.
  tracing: Fix free of probe entry by calling call_rcu_sched()
  perf/POWER7: Create a sysfs format entry for Power7 events
  perf probe: Fix segfault
  libtraceevent: Remove hard coded include to /usr/local/include in Makefile
  perf record: Fix -C option
  perf tools: check if -DFORTIFY_SOURCE=2 is allowed
  perf report: Fix build with NO_NEWT=1
  perf annotate: Fix build with NO_NEWT=1
  tracing: Fix race in snapshot swapping
2013-03-21 08:29:11 -07:00
Marcelo Tosatti
2ae33b3896 Merge remote-tracking branch 'upstream/master' into queue
Merge reason:

From: Alexander Graf <agraf@suse.de>

"Just recently this really important patch got pulled into Linus' tree for 3.9:

commit 1674400aae
Author: Anton Blanchard <anton <at> samba.org>
Date:   Tue Mar 12 01:51:51 2013 +0000

Without that commit, I can not boot my G5, thus I can't run automated tests on it against my queue.

Could you please merge kvm/next against linus/master, so that I can base my trees against that?"

* upstream/master: (653 commits)
  PCI: Use ROM images from firmware only if no other ROM source available
  sparc: remove unused "config BITS"
  sparc: delete "if !ULTRA_HAS_POPULATION_COUNT"
  KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798)
  KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions (CVE-2013-1797)
  KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796)
  arm64: Kconfig.debug: Remove unused CONFIG_DEBUG_ERRORS
  arm64: Do not select GENERIC_HARDIRQS_NO_DEPRECATED
  inet: limit length of fragment queue hash table bucket lists
  qeth: Fix scatter-gather regression
  qeth: Fix invalid router settings handling
  qeth: delay feature trace
  sgy-cts1000: Remove __dev* attributes
  KVM: x86: fix deadlock in clock-in-progress request handling
  KVM: allow host header to be included even for !CONFIG_KVM
  hwmon: (lm75) Fix tcn75 prefix
  hwmon: (lm75.h) Update header inclusion
  MAINTAINERS: Remove Mark M. Hoffman
  xfs: ensure we capture IO errors correctly
  xfs: fix xfs_iomap_eof_prealloc_initial_size type
  ...

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-03-21 11:11:52 -03:00
Stephane Eranian
0e48026ae7 perf/x86: Fix uninitialized pt_regs in intel_pmu_drain_bts_buffer()
This patch fixes an uninitialized pt_regs struct in drain BTS
function. The pt_regs struct is propagated all the way to the
code_get_segment() function from perf_instruction_pointer()
and may get garbage.

We cannot simply inherit the actual pt_regs from the interrupt
because BTS must be flushed on context-switch or when the
associated event is disabled. And there we do not have a pt_regs
handy.

Setting pt_regs to all zeroes may not be the best option but it
is not clear what else to do given where the drain_bts_buffer()
is called from.

In V2, we move the memset() later in the code to avoid doing it
when we end up returning early without doing the actual BTS
processing. Also dropped the reg.val initialization because it
is redundant with the memset() as suggested by PeterZ.

Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: peterz@infradead.org
Cc: sqazi@google.com
Cc: ak@linux.intel.com
Cc: jolsa@redhat.com
Link: http://lkml.kernel.org/r/20130319151038.GA25439@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-03-21 12:03:29 +01:00
Fenghua Yu
c83a9d5e42 x86-32, microcode_intel_early: Fix crash with CONFIG_DEBUG_VIRTUAL
In 32-bit, __pa_symbol() in CONFIG_DEBUG_VIRTUAL accesses kernel data
(e.g.  max_low_pfn) that not only hasn't been setup yet in such early
boot phase, but since we are in linear mode, cannot even be detected
as uninitialized.

Thus, use __pa_nodebug() rather than __pa_symbol() to get a global
symbol's physical address.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1363705484-27645-1-git-send-email-fenghua.yu@intel.com
Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2013-03-19 19:51:08 -07:00
Stephane Eranian
fd4a5aef00 perf/x86: Add SNB/SNB-EP scheduling constraints for cycle_activity event
Add scheduling constraints for SNB/SNB-EP CYCLE_ACTIVITY event
as defined by SDM Jan 2013 edition. The STALLS umasks are
combinations with the NO_DISPATCH umask.

Signed-off-by: Stephane Eranian <eranian@gmail.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: jolsa@redhat.com
Link: http://lkml.kernel.org/r/20130317134957.GA8550@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-03-18 10:23:13 +01:00
Masami Hiramatsu
9a556ab998 kprobes/x86: Check Interrupt Flag modifier when registering probe
Currently kprobes check whether the copied instruction modifies
IF (interrupt flag) on each probe hit. This results not only in
introducing overhead but also involving
inat_get_opcode_attribute into the kprobes hot path, and it can
cause an infinite recursive call (and kernel panic in the end).

Actually, since the copied instruction itself can never be modified
on the buffer, it is needless to analyze the instruction on every
probe hit.

To fix this issue, we check it only once when registering probe
and store the result on ainsn->if_modifier.

Reported-by: Timo Juhani Lindfors <timo.lindfors@iki.fi>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20130314115242.19690.33573.stgit@mhiramat-M0-7522
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-03-18 10:21:23 +01:00
Linus Torvalds
2a6e06b2ae perf,x86: fix wrmsr_on_cpu() warning on suspend/resume
Commit 1d9d8639c0 ("perf,x86: fix kernel crash with PEBS/BTS after
suspend/resume") fixed a crash when doing PEBS performance profiling
after resuming, but in using init_debug_store_on_cpu() to restore the
DS_AREA mtrr it also resulted in a new WARN_ON() triggering.

init_debug_store_on_cpu() uses "wrmsr_on_cpu()", which in turn uses CPU
cross-calls to do the MSR update.  Which is not really valid at the
early resume stage, and the warning is quite reasonable.  Now, it all
happens to _work_, for the simple reason that smp_call_function_single()
ends up just doing the call directly on the CPU when the CPU number
matches, but we really should just do the wrmsr() directly instead.

This duplicates the wrmsr() logic, but hopefully we can just remove the
wrmsr_on_cpu() version eventually.

Reported-and-tested-by: Parag Warudkar <parag.lkml@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-17 15:44:43 -07:00
Feng Tang
82f9c080b2 x86: tsc: Add support for new S3_NONSTOP feature
Add support for new S3_NONSTOP feature

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-03-15 16:51:18 -07:00
Feng Tang
c54fdbb282 x86: Add cpu capability flag X86_FEATURE_NONSTOP_TSC_S3
On some new Intel Atom processors (Penwell and Cloverview), there is
a feature that the TSC won't stop in S3 state, say the TSC value
won't be reset to 0 after resume. This feature makes TSC a more reliable
clocksource and could benefit the timekeeping code during system
suspend/resume cycle, so add a flag for it.

Signed-off-by: Feng Tang <feng.tang@intel.com>
[jstultz: Fix checkpatch warning]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-03-15 16:50:26 -07:00
Prarit Bhargava
3195ef59cb x86: Do full rtc synchronization with ntp
Every 11 minutes ntp attempts to update the x86 rtc with the current
system time.  Currently, the x86 code only updates the rtc if the system
time is within +/-15 minutes of the current value of the rtc. This
was done originally to avoid setting the RTC if the RTC was in localtime
mode (common with Windows dualbooting).  Other architectures do a full
synchronization and now that we have better infrastructure to detect
when the RTC is in localtime, there is no reason that x86 should be
software limited to a 30 minute window.

This patch changes the behavior of the kernel to do a full synchronization
(year, month, day, hour, minute, and second) of the rtc when ntp requests
a synchronization between the system time and the rtc.

I've used the RTC library functions in this patchset as they do all the
required bounds checking.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: x86@kernel.org
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: linux-efi@vger.kernel.org
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
[jstultz: Tweak commit message, fold in build fix found by fengguang
Also add select RTC_LIB to X86, per new dependency, as found by prarit]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-03-15 16:50:26 -07:00
Stephane Eranian
1d9d8639c0 perf,x86: fix kernel crash with PEBS/BTS after suspend/resume
This patch fixes a kernel crash when using precise sampling (PEBS)
after a suspend/resume. Turns out the CPU notifier code is not invoked
on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
by the kernel and keeps it power-on/resume value of 0 causing any PEBS
measurement to crash when running on CPU0.

The workaround is to add a hook in the actual resume code to restore
the DS Area MSR value. It is invoked for all CPUS. So for all but CPU0,
the DS_AREA will be restored twice but this is harmless.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-15 09:26:35 -07:00
Thomas Gleixner
35b61edb41 x86: Use tick broadcast expired check
Avoid going back into deep idle if the tick broadcast IPI is about to
fire.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Arjan van de Veen <arjan@infradead.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20130306111537.702278273@linutronix.de
2013-03-13 11:39:40 +01:00
Jan Beulich
c391c78846 x86: Constify a few items
This in particular re-does the compiler warning fix 9faec5b
("perf/x86: Fix P6 driver section warning"), tightening the
section attributes rather than relaxing them.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Shaun Ruffell <sruffell@digium.com>
Cc: yangyongqiang <yangyongqiang01@baidu.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/513DB84502000078000C4880@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-03-11 15:11:03 +01:00
Jan Beulich
ec7fd34425 x86: Drop always empty .text..page_aligned section
Commit e44b7b7 ("x86: move suspend wakeup code to C") didn't
care to also eliminate the side effects that the earlier 4c49156
("x86: make arch/x86/kernel/acpi/wakeup_32.S use a separate")
had, thus leaving a now pointless, almost page size gap at the
beginning of .text.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pavel Machek <pavel@ucw.cz>
Link: http://lkml.kernel.org/r/513DBAA402000078000C4896@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-03-11 15:07:56 +01:00
Frederic Weisbecker
6c1e0256fa context_tracking: Restore correct previous context state on exception exit
On exception exit, we restore the previous context tracking state based on
the regs of the interrupted frame. Iff that frame is in user mode as
stated by user_mode() helper, we restore the context tracking user mode.

However there is a tiny chunck of low level arch code after we pass through
user_enter() and until the CPU eventually resumes userspace.
If an exception happens in this tiny area, exception_enter() correctly
exits the context tracking user mode but exception_exit() won't restore
it because of the value returned by user_mode(regs).

As a result we may return to userspace with the wrong context tracking
state.

To fix this, change exception_enter() to return the context tracking state
prior to its call and pass this saved state to exception_exit(). This restores
the real context tracking state of the interrupted frame.

(May be this patch was suggested to me, I don't recall exactly. If so,
sorry for the missing credit).

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Mats Liljegren <mats.liljegren@enea.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-03-07 17:10:11 +01:00