Commit graph

11811 commits

Author SHA1 Message Date
Dmitry Shmidt
f225dbdca9 This is the 4.4.35 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJYOU3vAAoJEDjbvchgkmk+rVkQAJ/SzsPfZL3j+dFrI/H+NO5b
 mPkdSqjvJ0ww+JJ0Q+YoZ4ibyv6l33vhjg7dVkf6Cr0CTSzRXkmDfZy9nOB5JwUN
 Sn/6e7cib5+gs3rFF7Jdy+VoHwrtvvcfKX5ZW1pB9qJDe+24l/B52ur2ILQHReGl
 1sD2JmPsuW1Mu2p1An5HIi15bw1ZO7t9nPwqpyIVEyGFhkuXO1SYdmcXbx3Ugw17
 5O6MBhe8Wzkdn8j4guA5FgZMjdjn74vRQyuWpmhHAtvcJbHjROi2od/mD8S/RaFK
 7XCs6qI+d62KgiAZz/u55Ev13pNDbABmVEJj7zzHEUg8Mep7m7YdCf2z5wp7aYWK
 LjxWxdvv0Geu0cj7POH8VKPnF2AtzA3B74KKj9nq0xovu6T8MThYu/+heBffx+N9
 SmH5iNufDwFP275xXGMpkWux8tSO519Fi9TjB8cKH3zsHUCQRvPRhO4OEAHXJ880
 B4YAV1nk7AqkW3PeyHoDstQH3OTV5S1esUORyYWPEF6bcskd0OlQAAY+kmtItcDd
 xRdpSglGTZ/NIX6c5pv3skdaoZawNv/fYHkRqM6v8ckMwWLFqCc2hDi3+xiIQkxg
 qhmT7MHbAbhBIYjadf3rNCF++GRN1xyGmyLVRWg89odlIoZEH7EKhtiqDuSHC336
 Cr32DpZZCwZPATlJN+ES
 =Qm+E
 -----END PGP SIGNATURE-----

Merge tag 'v4.4.35' into android-4.4.y

This is the 4.4.35 stable release
2016-12-01 13:56:55 -08:00
Yazen Ghannam
aea9d760b8 x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systems
commit b0b6e86846093c5f8820386bc01515f857dd8faa upstream.

cpu_llc_id (Last Level Cache ID) derivation on AMD Fam17h has an
underflow bug when extracting the socket_id value. It starts from 0
so subtracting 1 from it will result in an invalid value. This breaks
scheduling topology later on since the cpu_llc_id will be incorrect.

For example, the the cpu_llc_id of the *other* CPU in the loops in
set_cpu_sibling_map() underflows and we're generating the funniest
thread_siblings masks and then when I run 8 threads of nbench, they get
spread around the LLC domains in a very strange pattern which doesn't
give you the normal scheduling spread one would expect for performance.

Other things like EDAC use cpu_llc_id so they will be b0rked too.

So, the APIC ID is preset in APICx020 for bits 3 and above: they contain
the core complex, node and socket IDs.

The LLC is at the core complex level so we can find a unique cpu_llc_id
by right shifting the APICID by 3 because then the least significant bit
will be the Core Complex ID.

Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Cleaned up and extended the commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 3849e91f57 ("x86/AMD: Fix last level cache topology for AMD Fam17h systems")
Link: http://lkml.kernel.org/r/20161108083506.rvqb5h4chrcptj7d@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-26 09:54:51 +01:00
Dmitry Shmidt
5577070b6b This is the 4.4.30 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJYF/ZyAAoJEDjbvchgkmk+6RcP+wewepsJZuBDDT0hNVglGDoQ
 GeD5/Ck6sMoRMGLSwM8IxHdC0wBml8fFj0DUX103UK+LK2CF8NSzLOgYCM4g7E9P
 E5Ye+SfSeefl47Is3hbW2YesrLjswEYht8zqytHehc7Uz6UAYwxHvAlpmbRvpI8k
 l+sbTnqovTBoXw3S/tAGpOm3hSeb7Qw1lVKlOtnYAk7e7OotqKbQjbbKKzI8Orbg
 uhs5UV2iw6AQY2mHZssilUiw5iEg/wqw+Ksjxos69thbT5oHrhAfx+grdOAWU9//
 m1YHh6ljZb/FXTrQBbt82R96HhlOKCKCj88dPfsHtIBfVOLWSacv9nc0PC8Xt3Uy
 xDq8DgP7OJRPVTNGuFFL1TuReWFtoYsDcxdQDyivn61XOPXYbKuXI4L5PJaSKQ85
 w4cbD0iOsk6MyfZR2wzj/yESzcJQz40A0sAOxGjak0ZgnGL6LrkKgcZhEBRMR6cs
 zl4TF6R2bCbSDSpnSxHVmCKU+rzENoXql6IE/9ExMNMbo1t0NDrlB+Okojj+n2oR
 wZOHxEPxAzeZVNx4pBiuVlRarSZ54TS0miLROnCRUfbwn4chDMOEbB+i5NwFzey2
 2yfqV5cNUAy0bzbz/Zw5DcF+FNkYadgPdDHZLGu0w6HJ8//V1gZKg/6Ca3Cc/iK0
 WNtHNZUOpCFLPesosNpt
 =+QqD
 -----END PGP SIGNATURE-----

Merge tag 'v4.4.30' into android-4.4.y

This is the 4.4.30 stable release
2016-11-07 10:23:59 -08:00
Greg Kroah-Hartman
bb730cc301 Revert "x86/mm: Expand the exception table logic to allow new handling options"
This reverts commit fcf5e5198b which is
548acf19234dbda5a52d5a8e7e205af46e9da840 upstream.

Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-31 19:56:26 -06:00
Tony Luck
fcf5e5198b x86/mm: Expand the exception table logic to allow new handling options
commit 548acf19234dbda5a52d5a8e7e205af46e9da840 upstream.

Huge amounts of help from  Andy Lutomirski and Borislav Petkov to
produce this. Andy provided the inspiration to add classes to the
exception table with a clever bit-squeezing trick, Boris pointed
out how much cleaner it would all be if we just had a new field.

Linus Torvalds blessed the expansion with:

  ' I'd rather not be clever in order to save just a tiny amount of space
    in the exception table, which isn't really criticial for anybody. '

The third field is another relative function pointer, this one to a
handler that executes the actions.

We start out with three handlers:

 1: Legacy - just jumps the to fixup IP
 2: Fault - provide the trap number in %ax to the fixup code
 3: Cleaned up legacy for the uaccess error hack

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f6af78fcbd348cf4939875cfda9c19689b5e50b8.1455732970.git.tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-31 04:14:00 -06:00
Ville Syrjälä
be1cd22fe1 drm/i915: Account for TSEG size when determining 865G stolen base
commit d721b02fd00bf133580f431b82ef37f3b746dfb2 upstream.

Looks like the TSEG lives just above TOUD, stolen comes after TSEG.

The spec seems somewhat self-contradictory in places, in the ESMRAMC
register desctription it says:
 TSEG Size:
  10=(TOUD + 512 KB) to TOUD
  11 =(TOUD + 1 MB) to TOUD

so that agrees with TSEG being at TOUD. But the example given
elsehwere in the spec says:

 TOUD equals 62.5 MB = 03E7FFFFh
 TSEG selected as 512 KB in size,
 Graphics local memory selected as 1 MB in size
 General System RAM available in system = 62.5 MB
 General system RAM range00000000h to 03E7FFFFh
 TSEG address range03F80000h to 03FFFFFFh
 TSEG pre-allocated from03F80000h to 03FFFFFFh
 Graphics local memory pre-allocated from03E80000h to 03F7FFFFh

so here we have TSEG above stolen.

Real world evidence agrees with the TOUD->TSEG->stolen order however, so
let's fix up the code to account for the TSEG size.

Cc: Taketo Kabe <fdporg@vega.pgw.jp>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Fixes: 0ad98c74e0 ("drm/i915: Determine the stolen memory base address on gen2")
Fixes: a4dff76924 ("x86/gpu: Add Intel graphics stolen memory quirk for gen2 platforms")
Reported-by: Taketo Kabe <fdporg@vega.pgw.jp>
Tested-by: Taketo Kabe <fdporg@vega.pgw.jp>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96473
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470653919-27251-1-git-send-email-ville.syrjala@linux.intel.com
Link: http://download.intel.com/design/chipsets/datashts/25251405.pdf
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-31 04:13:58 -06:00
Dmitry Shmidt
c302df26cb This is the 4.4.28 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJYEwP8AAoJEDjbvchgkmk+a+AQALRSaM0ngEiS/y8adwUhRKn/
 C1wimkHCPZaV6vrqRnp1VuvidhPI1YFQrAbcESMegpo1n87jdl2CcwcpHdYvhBrt
 W5a0ezIyugsE6Olgrf+gtwBl0mfpB7ZW2h2/M0yZYskjyRLDBKS5EwEVbX7Y0BEB
 OwdDJJw707U36fPgPEWzzOBDa/DBy+QNYeflzCbsLWX+dCMQ+pjrF7tTT5/oOZoO
 +er1LgO5onAc9kooOqbv8QapfsRD1zGQHjb9QvjYRvONz1VeggfgsywNWiGJ8lS2
 lyqoT+6jODpvuFwRNimb7+EPdZ2siFoTYHbdmSKOE479T8uNPZMP/cFmRt8YDTyl
 c7bmrE/igOH8wcgJniIuZz9BJm/ElT6a+gijI/u0I2ygj1SBKeIa9sThRTCGnmQM
 X2iQ9zK1YdCAQ8PKKt965/AnjnLXojg0NvKoMcMjCzRFtQ7B77hw38KXzq8rY9w9
 mThOivQm3InZK2fRURT4HaBzoc2mhGaaK4HtfUQV0g+kky8MmFbibkKo8dUVEryN
 Vjm2EYjbgbc9idVxkeaMVA4VE1XpNL4CUhrmsK0nOFHncNNuxbk9LVfvdc9y9O65
 ypRiYDGXjhkvRNvTCvOB+pcWLiW/WVeM+bCeZzlFX0fqRWCGjEROHgGio9GbPlnM
 S+jgnP/F/s/szlR4+U8g
 =kv4U
 -----END PGP SIGNATURE-----

Merge tag 'v4.4.28' into android-4.4.y

This is the 4.4.28 stable release
2016-10-28 10:44:19 -07:00
Dan Williams
c234231da4 x86/e820: Don't merge consecutive E820_PRAM ranges
commit 23446cb66c073b827779e5eb3dec301623299b32 upstream.

Commit:

  917db484dc6a ("x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation")

... fixed up the broken manipulations of max_pfn in the presence of
E820_PRAM ranges.

However, it also broke the sanitize_e820_map() support for not merging
E820_PRAM ranges.

Re-introduce the enabling to keep resource boundaries between
consecutive defined ranges. Otherwise, for example, an environment that
boots with memmap=2G!8G,2G!10G will end up with a single 4G /dev/pmem0
device instead of a /dev/pmem0 and /dev/pmem1 device 2G in size.

Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zhang Yi <yizhan@redhat.com>
Cc: linux-nvdimm@lists.01.org
Fixes: 917db484dc6a ("x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation")
Link: http://lkml.kernel.org/r/147629530854.10618.10383744751594021268.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 03:01:33 -04:00
Dmitry Shmidt
b8fa4a3ee5 This is the 4.4.26 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJYCHnGAAoJEDjbvchgkmk+/tAP/RvQZ+Ft5ItRdlAxrXYgTFda
 PXpyUiCPC8XnrzBClXRoA+EiERIpckC5THdU3uUok4BWuALsgrhxK8YxfHXWLQWe
 2cqn19QPiFMZFz6E1sktWtwk63FH4uyfSXQ7/Roun5oQR4Ltg+Q6wRWlf33ox03t
 v8WVv880M4Q2pbp08USCDD8/EcWj3BNK2qFb05RD6pyW9jgIToSXEFoxtKsRiwDv
 QvUTXRkQOfo6Ui9zQTPzQR/ltahb6W/78fEiqu0G1p+uLkTqwusV33t4T64F+829
 s7MTMb7vnQWwuWKEcWy+CMjMwwUfrZHi6gl0PcBxHatFW/AOADixag0PYZWEku9/
 ok9Ltx63ZaSXvGy9odzIgjB/VIjdQXfPwRGOnNojJIvRDkb/zP3fCCglH9kXIwG2
 W5tXCOf43i1a6G3DyEkNEfCk66WYhuwyulGRRpo5RjsiPXnLiKbX9rqVh+WA3ucZ
 jtnZRFKGu2AsXes4tp/aip46zu73BL3wkKhI08z5ipUoV1PDEUb0fxhw0ZkEI/J7
 +tKotIQXdc190qtQNdkBwYAxujlzwhVYJAb9dwgYpWweVL5cEGIp5+sV5isJ5u4M
 0eD9pLVKsgG/BocnNve/h2heRNtTCq34vQrGEnh437NCrZbMGiRhwnI8HRCstqhr
 ECbkLL5cDnwtRiieBYSj
 =RoKA
 -----END PGP SIGNATURE-----

Merge tag 'v4.4.26' into android-4.4.y

This is the 4.4.26 stable release
2016-10-21 13:40:45 -07:00
Josh Poimboeuf
63f9190b09 x86/dumpstack: Fix x86_32 kernel_stack_pointer() previous stack access
commit 72b4f6a5e903b071f2a7c4eb1418cbe4eefdc344 upstream.

On x86_32, when an interrupt happens from kernel space, SS and SP aren't
pushed and the existing stack is used.  So pt_regs is effectively two
words shorter, and the previous stack pointer is normally the memory
after the shortened pt_regs, aka '&regs->sp'.

But in the rare case where the interrupt hits right after the stack
pointer has been changed to point to an empty stack, like for example
when call_on_stack() is used, the address immediately after the
shortened pt_regs is no longer on the stack.  In that case, instead of
'&regs->sp', the previous stack pointer should be retrieved from the
beginning of the current stack page.

kernel_stack_pointer() wants to do that, but it forgets to dereference
the pointer.  So instead of returning a pointer to the previous stack,
it returns a pointer to the beginning of the current stack.

Note that it's probably outside of kernel_stack_pointer()'s scope to be
switching stacks at all.  The x86_64 version of this function doesn't do
it, and it would be better for the caller to do it if necessary.  But
that's a patch for another day.  This just fixes the original intent.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 0788aa6a23 ("x86: Prepare removal of previous_esp from i386 thread_info structure")
Link: http://lkml.kernel.org/r/472453d6e9f6a2d4ab16aaed4935f43117111566.1471535549.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-16 17:36:15 +02:00
Mika Westerberg
eba90a4c76 x86/irq: Prevent force migration of irqs which are not in the vector domain
commit db91aa793ff984ac048e199ea1c54202543952fe upstream.

When a CPU is about to be offlined we call fixup_irqs() that resets IRQ
affinities related to the CPU in question. The same thing is also done when
the system is suspended to S-states like S3 (mem).

For each IRQ we try to complete any on-going move regardless whether the
IRQ is actually part of x86_vector_domain. For each IRQ descriptor we fetch
its chip_data, assume it is of type struct apic_chip_data and manipulate it
by clearing old_domain mask etc. For irq_chips that are not part of the
x86_vector_domain, like those created by various GPIO drivers, will find
their chip_data being changed unexpectly.

Below is an example where GPIO chip owned by pinctrl-sunrisepoint.c gets
corrupted after resume:

  # cat /sys/kernel/debug/gpio
  gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
   gpio-511 (                    |sysfs               ) in  hi

  # rtcwake -s10 -mmem
  <10 seconds passes>

  # cat /sys/kernel/debug/gpio
  gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
   gpio-511 (                    |sysfs               ) in  ?

Note '?' in the output. It means the struct gpio_chip ->get function is
NULL whereas before suspend it was there.

Fix this by first checking that the IRQ belongs to x86_vector_domain before
we try to use the chip_data as struct apic_chip_data.

Reported-and-tested-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Link: http://lkml.kernel.org/r/20161003101708.34795-1-mika.westerberg@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-16 17:36:15 +02:00
Dan Williams
09634475c7 x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation
commit 917db484dc6a69969d317b3e57add4208a8d9d42 upstream.

In commit:

  ec776ef6bb ("x86/mm: Add support for the non-standard protected e820 type")

Christoph references the original patch I wrote implementing pmem support.
The intent of the 'max_pfn' changes in that commit were to enable persistent
memory ranges to be covered by the struct page memmap by default.

However, that approach was abandoned when Christoph ported the patches [1], and
that functionality has since been replaced by devm_memremap_pages().

In the meantime, this max_pfn manipulation is confusing kdump [2] that
assumes that everything covered by the max_pfn is "System RAM".  This
results in kdump hanging or crashing.

 [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-March/000348.html
 [2]: https://bugzilla.redhat.com/show_bug.cgi?id=1351098

So fix it.

Reported-by: Zhang Yi <yizhan@redhat.com>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Tested-by: Zhang Yi <yizhan@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-nvdimm@lists.01.org
Fixes: ec776ef6bb ("x86/mm: Add support for the non-standard protected e820 type")
Link: http://lkml.kernel.org/r/147448744538.34910.11287693517367139607.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-16 17:36:15 +02:00
Dmitry Shmidt
14de94f03d This is the 4.4.24 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJX96H2AAoJEDjbvchgkmk+MqcQAJuhiwLmCsXRKXGujGByPi5P
 vk+mnkt8o2UpamvT4KVRnWQrJuN8EHDHg29esGXKHV9Ahdmw/UfnXvbz3P0auet9
 GvMi4rKZpL3vD/dcMxshQchRKF7SwUbNNMkyJv1WYCjLex7W1LU/NOQV5VLx21i6
 9E/R/ARrazGhrGqanfm4NhIZYOR9QAWCrsc8pbJiE2OSty1HbQCLapA8gWg2PSnz
 sTlH0BF9tJ2kKSnyYjXM1Xb1zHbZuj83qEELhSnXsGK71Sq/8jIH9a5SiUwSDtdt
 szGp+vLODPqIMYa01qyLtFA2tvkusvKDUps8vtZ5mp9t38u2R8TDA+CAbz6w19mb
 C0d9abvZ9l1pIe/96OdgZkdSmGG2DC5hxnk3eaxhsyHn6RkIXfB9igH5+Fk7r/nm
 Yq15xxOIu6DCcuoQesUcAHoIR2961kbo/ZnnUEy2hRsqvR3/21X7qW3oj68uTdnB
 QQtMc1jq32toaZFk21ojLDtxKAlVqVHuslQ0hsMMgKtADZAveWpqZj408aNlPOi8
 CFHoEAxYXCQItOhRCoQeC1mljahvhEBI9N+5Zbpf30q5imLKen9hQphytbKABEWN
 CVJ6h6YndrdnlN7cS/AQ62+SNDk4kLmeMomgXfB701WTJ1cvI6eW4q6WUSS+54DZ
 q+brnDATt0K3nUmsrpGM
 =JaSn
 -----END PGP SIGNATURE-----

Merge tag 'v4.4.24' into android-4.4.y

This is the 4.4.24 stable release
2016-10-14 13:34:43 -07:00
Andy Lutomirski
7ba8db47da x86/boot: Initialize FPU and X86_FEATURE_ALWAYS even if we don't have CPUID
commit 05fb3c199bb09f5b85de56cc3ede194ac95c5e1f upstream.

Otherwise arch_task_struct_size == 0 and we die.  While we're at it,
set X86_FEATURE_ALWAYS, too.

Reported-by: David Saggiorato <david@saggiorato.net>
Tested-by: David Saggiorato <david@saggiorato.net>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: aaeb5c01c5b ("x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86")
Link: http://lkml.kernel.org/r/8de723afbf0811071185039f9088733188b606c9.1475103911.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07 15:23:40 +02:00
Dmitry Shmidt
8760f8e3d9 Merge remote-tracking branch 'common/android-4.4' into android-4.4.y
Change-Id: I6c4e7f9f47392d4b334f71e2b20f2ccf33827632
2016-09-26 14:58:53 -07:00
Dmitry Shmidt
734bcf32c2 This is the 4.4.22 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJX5jR2AAoJEDjbvchgkmk+nXwQAML5WFM1xDL8frXh3vIS3RzD
 fP2YHP0Bm+xE/G9jDnlcoqJmxg4DKPUCP4T/rCZmeNRWc/RaIBX+VTyfVhN969uo
 v5f8jN6fc4TO9WMD+G++Vx3MZqupJbSAXlY2ZSUTF389lM/jHvaWj+DfA1qGLmGJ
 UbfO1jNszadZGIb8yOo/qmR+E3sSV/nT+/y7Sa2rSqkKt5+YI+z1Q1ezLo7BZ+uO
 6p968djKTXSOO7SHciddoegJ8lF2hhgY4cW95CEV+Dqu2O6AVyFyMz+ngYivEueZ
 ZwwQCaYIl+68ssAoI61VmtQHEvuaikTx5g9vjAApScWWijZU+V/M65BLAL6GAMWH
 kWOmilbtZKhyirecAxgnRIkJR8Tp0YcgUYAivsqkYqVPelcPsHvOFRfr4D6HrcBt
 wLrjaoBj+1vAjskozKJEymDNGQJ2Me/nBAWgN44MQYLRGg4kdBxNS/CGyeh8O8wO
 gEeVqa+zDOQCSeg2LJdiql3TdMQfQ+kpCsfjcrrl1oRkRX7OX130+gLuI8Tt1Fno
 6niq6w+QeAY445RSyM45vLeJ6vXB7oFadtuD4QvsB5YFr0X0P0KF3GKlHl0xiyEV
 JFpWJiXYsnOvM8entT23aeCSTlDT1p6os3jLh8p7CBn9TvP3uW2nfgG/FKvy0wGD
 7L7FKYb4Mw+YSrROfxBT
 =5+OA
 -----END PGP SIGNATURE-----

Merge tag 'v4.4.22' into android-4.4.y

This is the 4.4.22 stable release

Change-Id: Id49e3c87d2cacb2fa85d85a17226f718f4a5ac28
2016-09-26 10:37:43 -07:00
Emanuel Czirai
07450b30b7 x86/AMD: Apply erratum 665 on machines without a BIOS fix
commit d1992996753132e2dafe955cccb2fb0714d3cfc4 upstream.

AMD F12h machines have an erratum which can cause DIV/IDIV to behave
unpredictably. The workaround is to set MSRC001_1029[31] but sometimes
there is no BIOS update containing that workaround so let's do it
ourselves unconditionally. It is simple enough.

[ Borislav: Wrote commit message. ]

Signed-off-by: Emanuel Czirai <icanrealizeum@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yaowu Xu <yaowu@google.com>
Link: http://lkml.kernel.org/r/20160902053550.18097-1-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-24 10:07:37 +02:00
Steven Rostedt
7547bc9f03 x86/paravirt: Do not trace _paravirt_ident_*() functions
commit 15301a570754c7af60335d094dd2d1808b0641a5 upstream.

Łukasz Daniluk reported that on a RHEL kernel that his machine would lock up
after enabling function tracer. I asked him to bisect the functions within
available_filter_functions, which he did and it came down to three:

  _paravirt_nop(), _paravirt_ident_32() and _paravirt_ident_64()

It was found that this is only an issue when noreplace-paravirt is added
to the kernel command line.

This means that those functions are most likely called within critical
sections of the funtion tracer, and must not be traced.

In newer kenels _paravirt_nop() is defined within gcc asm(), and is no
longer an issue.  But both _paravirt_ident_{32,64}() causes the
following splat when they are traced:

 mm/pgtable-generic.c:33: bad pmd ffff8800d2435150(0000000001d00054)
 mm/pgtable-generic.c:33: bad pmd ffff8800d3624190(0000000001d00070)
 mm/pgtable-generic.c:33: bad pmd ffff8800d36a5110(0000000001d00054)
 mm/pgtable-generic.c:33: bad pmd ffff880118eb1450(0000000001d00054)
 NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd-journal:469]
 Modules linked in: e1000e
 CPU: 2 PID: 469 Comm: systemd-journal Not tainted 4.6.0-rc4-test+ #513
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
 task: ffff880118f740c0 ti: ffff8800d4aec000 task.ti: ffff8800d4aec000
 RIP: 0010:[<ffffffff81134148>]  [<ffffffff81134148>] queued_spin_lock_slowpath+0x118/0x1a0
 RSP: 0018:ffff8800d4aefb90  EFLAGS: 00000246
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011eb16d40
 RDX: ffffffff82485760 RSI: 000000001f288820 RDI: ffffea0000008030
 RBP: ffff8800d4aefb90 R08: 00000000000c0000 R09: 0000000000000000
 R10: ffffffff821c8e0e R11: 0000000000000000 R12: ffff880000200fb8
 R13: 00007f7a4e3f7000 R14: ffffea000303f600 R15: ffff8800d4b562e0
 FS:  00007f7a4e3d7840(0000) GS:ffff88011eb00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f7a4e3f7000 CR3: 00000000d3e71000 CR4: 00000000001406e0
 Call Trace:
   _raw_spin_lock+0x27/0x30
   handle_pte_fault+0x13db/0x16b0
   handle_mm_fault+0x312/0x670
   __do_page_fault+0x1b1/0x4e0
   do_page_fault+0x22/0x30
   page_fault+0x28/0x30
   __vfs_read+0x28/0xe0
   vfs_read+0x86/0x130
   SyS_read+0x46/0xa0
   entry_SYSCALL_64_fastpath+0x1e/0xa8
 Code: 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 6d 01 00 48 03 14 c5 80 6a 5d 82 48 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7 4c 8b 09 4d 85 c9 74 08 41 0f 18 09 eb 02 f3 90 8b

Reported-by: Łukasz Daniluk <lukasz.daniluk@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-24 10:07:37 +02:00
Kees Cook
fa88d4cf69 UPSTREAM: x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option
This removes the CONFIG_DEBUG_RODATA option and makes it always enabled.

This simplifies the code and also makes it clearer that read-only mapped
memory is just as fundamental a security feature in kernel-space as it is
in user-space.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1455748879-21872-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Bug: 31660652
Change-Id: I3e79c7c4ead79a81c1445f1b3dd28003517faf18
(cherry picked from commit 9ccaf77cf05915f51231d158abfd5448aedde758)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
2016-09-23 10:04:06 -07:00
Dmitry Shmidt
a517d900c6 This is the 4.4.21 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJX2j/VAAoJEDjbvchgkmk+I68P/0F4XGYxJfbjXbZSv1Q5IZRG
 62mtwjjgF9ig3ORxAORfFEI8jNYtERvjpPWrCuvjwBqgcYb6AEsL62+AxNAg1ow/
 foMuSIDqgFDboVsLBIVWpyiHzOh598X7dakB3evFtceTbLsd1b03O4PYMmL1QbtP
 NJe1ZwK52abVzbH8lR3Utqh6oUX0p7gtNgG4KaU9eKu2Y/K7p/j1vUyrvVBANLLs
 gP26Y49SiSg5ARhbi+GLfoJ5mtrt4T6/i4U6rwjpveaKf5l5tx6smCg93OH7qLxP
 uhEsTKXgU+6/czPQSnR3LvPtX08c9HTfgBiJhqlBKVf9ClnLUKN+6b3l7FvQMxGP
 Sxu8YtKvCfYzm6GITZftlicZoEDmlU1wkOiJqH6QyR+FxQODMw/Y0InCsFoCY7WG
 09483Z32VJOLLSObHMzPsO1tETjJOkAAhZemg8WHhY4XVXTnN4llTOG+/LtTHyQd
 DEWoAqBPhNZgEH6ktrVQRcGcxqUiIeO/aOPa230yCxL0bQ+bNfn1MzrKO1fKUCt9
 fMpi8DLbof1zs39PYW18DAZbAp7/M07vugICZ56ugTPUVIzrGH4KpO4sA37XBQHX
 RUZKejCqdVgxPgBSADqEUfw8FnlEJAQjaU4ozp5Za1wKVAG0YMmnkBtM+lkaTIi6
 rGg8KFalYaZj+hLaPr02
 =aXSG
 -----END PGP SIGNATURE-----

Merge tag 'v4.4.21' into android-4.4.y

This is the 4.4.21 stable release

Change-Id: I03e47d6fdca8084641c4b4f9658ea0b0edb8f297
2016-09-16 14:34:07 -07:00
Wanpeng Li
2e133f2d2c x86/apic: Do not init irq remapping if ioapic is disabled
commit 2e63ad4bd5dd583871e6602f9d398b9322d358d9 upstream.

native_smp_prepare_cpus
  -> default_setup_apic_routing
    -> enable_IR_x2apic
      -> irq_remapping_prepare
        -> intel_prepare_irq_remapping
          -> intel_setup_irq_remapping

So IR table is setup even if "noapic" boot parameter is added. As a result we
crash later when the interrupt affinity is set due to a half initialized
remapping infrastructure.

Prevent remap initialization when IOAPIC is disabled.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Link: http://lkml.kernel.org/r/1471954039-3942-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:53 +02:00
Vitaly Kuznetsov
c5852a85ed x86/hyperv: Avoid reporting bogus NMI status for Gen2 instances
[ Upstream commit 1e2ae9ec072f3b7887f456426bc2cf23b80f661a ]

Generation2 instances don't support reporting the NMI status on port 0x61,
read from there returns 'ff' and we end up reporting nonsensical PCI
error (as there is no PCI bus in these instances) on all NMIs:

    NMI: PCI system error (SERR) for reason ff on CPU 0.
    Dazed and confused, but trying to continue

Fix the issue by overriding x86_platform.get_nmi_reason. Use 'booted on
EFI' flag to detect Gen2 instances.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Link: http://lkml.kernel.org/r/1460728232-31433-1-git-send-email-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:48 +02:00
Vikas Shivappa
12850f3616 perf/x86/cqm: Fix CQM memory leak and notifier leak
[ Upstream commit ada2f634cd50d050269b67b4e2966582387e7c27 ]

Fixes the hotcpu notifier leak and other global variable memory leaks
during CQM (cache quality of service monitoring) initialization.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: fenghua.yu@intel.com
Cc: h.peter.anvin@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1457652732-4499-3-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:46 +02:00
Vikas Shivappa
0b152db042 perf/x86/cqm: Fix CQM handling of grouping events into a cache_group
[ Upstream commit a223c1c7ab4cc64537dc4b911f760d851683768a ]

Currently CQM (cache quality of service monitoring) is grouping all
events belonging to same PID to use one RMID. However its not counting
all of these different events. Hence we end up with a count of zero
for all events other than the group leader.

The patch tries to address the issue by keeping a flag in the
perf_event.hw which has other CQM related fields. The field is updated
at event creation and during grouping.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
[peterz: Changed hw_perf_event::is_group_event to an int]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: fenghua.yu@intel.com
Cc: h.peter.anvin@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1457652732-4499-2-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:46 +02:00
Dmitry Shmidt
5c0fc54c9b This is the 4.4.20 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJXz7SnAAoJEDjbvchgkmk+bSYP/A9pJTr4mYzV6eG85FYupYcy
 HLeVRgUoLM6ShgqdsEDI1U6GH60GjzeCEWQ3ZC3Tvkj9BrbhSkFIIetF7ikszedN
 rEZBtvbQxn2j6ST2E98zBY6AMX/j65XJw5iiSKaG9iyWdAPinfUmtMUJpMDJkuUE
 3chDjKpNXyAMmlvwf1UCmN27OH4mrVKyK0nbLujQCGFC1Yh3ELlGV5GumGFApmyB
 +oBF4jcqn8b/9JwRWUB9P3IuFMtGC4ESn36lI8FgRWRpfHxa2i/Arc8CTJO7mEIX
 ojLyiEYMoi3fM4qCVN75oKnStD73qpQIYkUE2H22uj5ovt4sPu/R7TtMJcwXl7RJ
 3c+LV0n+b6RhIws4USoFv9unK+t6F2u6zq0cqQ9Az7qU9T1V37An+Pqzwim8Evf7
 bPYlCVHfa3mimYi/1bNRyNp+RaBwOhqCdZD96am/wEysQ1mwWlhYNEU/Vy0+tUfg
 cP7NhjMmFdP321QEimJwVp0SVW0tekd/kCQ4zcY5VNS3aQaxopinzSyKWsC+o9Ji
 WG6Yi6SLSgLnvS0fQJaRv12pXZGiermyKzMvIW6wq5NpwvzQhm0x44fzhWcgMVxv
 LuMvu0TQIbeDwucBC8XUjw4RYZpJdq/WK5Er7QVS4giknP8FqtBil8QtOWpqzsJ9
 ZsWht54uQw0Jx9dj6vcR
 =jg6w
 -----END PGP SIGNATURE-----

Merge tag 'v4.4.20' into android-4.4.y

This is the 4.4.20 stable release
2016-09-07 14:36:44 -07:00
Denys Vlasenko
77b0e10991 uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions
commit 68187872c76a96ed4db7bfb064272591f02e208b upstream.

Since instruction decoder now supports EVEX-encoded instructions, two fixes
are needed to correctly handle them in uprobes.

Extended bits for MODRM.rm field need to be sanitized just like we do it
for VEX3, to avoid encoding wrong register for register-relative access.

EVEX has _two_ extended bits: b and x. Theoretically, EVEX.x should be
ignored by the CPU (since GPRs go only up to 15, not 31), but let's be
paranoid here: proper encoding for register-relative access
should have EVEX.x = 1.

Secondly, we should fetch vex.vvvv for EVEX too.
This is now super easy because instruction decoder populates
vex_prefix.bytes[2] for all flavors of (e)vex encodings, even for VEX2.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Fixes: 8a764a875f ("x86/asm/decoder: Create artificial 3rd byte for 2-byte VEX")
Link: http://lkml.kernel.org/r/20160811154521.20469-1-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-07 08:32:36 +02:00
Dmitry Shmidt
aa349c0a96 This is the 4.4.19 stable release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJXuIDJAAoJEDjbvchgkmk+i10QALySg/PFXDJ6AwUskGbetHBz
 RnsJ8WzjtzBR5vAyaru2vkD/GhFmM3ziG8guQK3uWGhhfpB+CPJjDmYIY1O5Djma
 CviyB6UsEIuf2zN7U70WSmjJ/FyD7XRqjGnEX9u5YGS4WQTFPnPttE4HE82ErEEW
 IocnBGFZriGye9D/2O6OjTDgIusLsZ6WKawK0OyeKiUrTUsmhLBtW0nfMHd/snNw
 4Aas0j6g5tjYrNBUyKqmkYhi7S2kFyZ7QH1vqrXxUHu4CNslTa6i1VTkQ+uVxbuF
 Vw9DLP6KEmB/Q5KyIVFMmEv6E5vvgymv7rrQ4c7pu6vqmHzbdtaWxZFM18EnIXOk
 qe8/9wzF4ahw+h/0ddmjpjmWi/SRYG8PmobgTWmIqJl+SNq4VK2G/GRkWce45EDi
 lMO6UI4qUd8vMw1OJOdKwp8C/D+l5V1qrVlQTVba8IJsH2fKFw9aSKAGwpppawfl
 CiESwHhSINGfhGzDyYS/keo1JM0KDyGc3EYQG5DaSzNZu4jqkhNPjBlQEOJug3/I
 6LDrWQo4+qC6vJJ836NyRvakv1WDL8AsHmTOuiW8h8LzcGsaxac9L7HMRgwItXAs
 aWTXg2eBoJXkBQalglvhSzGqBJl2ytlu0Efxg97zEL1huZuYDdzf9tO7hqMujZhc
 k+SnQTS6JXVuDe46uDyb
 =JLSE
 -----END PGP SIGNATURE-----

Merge tag 'v4.4.19' into android-4.4.y

This is the 4.4.19 stable release
2016-08-22 14:09:08 -07:00
Toshi Kani
a23b299b4a x86/mtrr: Fix PAT init handling when MTRR is disabled
commit ad025a73f0e9344ac73ffe1b74c184033e08e7d5 upstream.

get_mtrr_state() calls pat_init() on BSP even if MTRR is disabled.
This results in calling pat_init() on BSP only since APs do not call
pat_init() when MTRR is disabled.  This inconsistency between BSP
and APs leads to undefined behavior.

Make BSP's calling condition to pat_init() consistent with AP's,
mtrr_ap_init() and mtrr_aps_init().

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-6-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16 09:30:49 +02:00
Toshi Kani
594055cf63 x86/mtrr: Fix Xorg crashes in Qemu sessions
commit edfe63ec97ed8d4496225f7ba54c9ce4207c5431 upstream.

A Xorg failure on qemu32 was reported as a regression [1] caused by
commit 9cd25aac1f ("x86/mm/pat: Emulate PAT when it is disabled").

This patch fixes the Xorg crash.

Negative effects of this regression were the following two failures [2]
in Xorg on QEMU with QEMU CPU model "qemu32" (-cpu qemu32), which were
triggered by the fact that its virtual CPU does not support MTRRs.

 #1. copy_process() failed in the check in reserve_pfn_range()

    copy_process
     copy_mm
      dup_mm
       dup_mmap
        copy_page_range
         track_pfn_copy
          reserve_pfn_range

 A WC map request was tracked as WC in memtype, which set a PTE as
 UC (pgprot) per __cachemode2pte_tbl[].  This led to this error in
 reserve_pfn_range() called from track_pfn_copy(), which obtained
 a pgprot from a PTE.  It converts pgprot to page_cache_mode, which
 does not necessarily result in the original page_cache_mode since
 __cachemode2pte_tbl[] redirects multiple types to UC.

 #2. error path in copy_process() then hit WARN_ON_ONCE in
     untrack_pfn().

     x86/PAT: Xorg:509 map pfn expected mapping type uncached-
     minus for [mem 0xfd000000-0xfdffffff], got write-combining
      Call Trace:
     dump_stack
     warn_slowpath_common
     ? untrack_pfn
     ? untrack_pfn
     warn_slowpath_null
     untrack_pfn
     ? __kunmap_atomic
     unmap_single_vma
     ? pagevec_move_tail_fn
     unmap_vmas
     exit_mmap
     mmput
     copy_process.part.47
     _do_fork
     SyS_clone
     do_syscall_32_irqs_on
     entry_INT80_32

These negative effects are caused by two separate bugs, but they
can be addressed in separate patches.  Fixing the pat_init() issue
described below addresses the root cause, and avoids Xorg to hit
these cases.

When the CPU does not support MTRRs, MTRR does not call pat_init(),
which leaves PAT enabled without initializing PAT.  This pat_init()
issue is a long-standing issue, but manifested as issue #1 (and then
hit issue #2) with the above-mentioned commit because the memtype
now tracks cache attribute with 'page_cache_mode'.

This pat_init() issue existed before the commit, but we used pgprot
in memtype.  Hence, we did not have issue #1 before.  But WC request
resulted in WT in effect because WC pgrot is actually WT when PAT
is not initialized.  This is not how it was designed to work.  When
PAT is set to disable properly, WC is converted to UC.  The use of
WT can result in a system crash if the target range does not support
WT.  Fortunately, nobody ran into such issue before.

To fix this pat_init() issue, PAT code has been enhanced to provide
pat_disable() interface.  Call this interface when MTRRs are disabled.
By setting PAT to disable properly, PAT bypasses the memtype check,
and avoids issue #1.

  [1]: https://lkml.org/lkml/2016/3/3/828
  [2]: https://lkml.org/lkml/2016/3/4/775

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-5-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16 09:30:49 +02:00
Dmitry Shmidt
1f369b24e2 This is the 4.4.17 stable release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJXqvi+AAoJEDjbvchgkmk+M2IQANcL1Lok1bc5SMvO9HQuR2B0
 zFJBchx4ik11wQXiKr1BEZOeHJcOSwau53+IaAEtIPqZ2KKkJ7z1iXhml6rC9TQO
 aYrXNJbObFkwDBLMfC6ViJrahoUq8bRrzAtcs+agPIlg0gm86lC2D3yjiM92RJJ1
 FNzzNwgS8Kr7js1b/O+FgplMyWM9uXhFZdq+uYeRKIEYbwhisZR+uJ2mX4qUDE5I
 lZd81Z3UvklQnqZcsXqzIG4xWFjO5z5Kt7IiGRy+36rz6EZpuNizZrBshjC11N3C
 XMFj3+wrc3cF2dV6mKPbk0KXWZuPrnFBg7Khp/d0OHxJQ5TXjsuCqvfcZVMg6bZS
 xv4imOX6GQbNrVSuMysV1ShZFJibAHIF0DFO6q/xUSXribSX8Q5WFaBcD9CG8+C/
 JpOzcuv35E/EHomkw7EjX/Lg9h4NLksebeWiDMsYSlvQv7LDqHUTkZqLB/DsSJto
 wSrSA9VLv/ABEWcQhnObsWsheXeT4d+bqjPCHBRzEyMQU3kbXF2P7fAHL70fk+XF
 QFfl7132e7y4QUYKCRr6bdc7kehNLQiNOPgSNG0HrWxt2NNBMzfzuUDi37c2N5of
 dstEVBpxdOdt/sfJILUc88ovXMADVSsJ4PiCO26PRcZeUEqAQog+j7AHlYr1t/bX
 MQwdejyXKsOKXoHRsP1C
 =JPsI
 -----END PGP SIGNATURE-----

Merge tag 'v4.4.17' into android-4.4.y

This is the 4.4.17 stable release
2016-08-11 15:12:05 -07:00
Stephane Eranian
b4fedbef96 perf/x86: fix PEBS issues on Intel Atom/Core2
commit 1424a09a9e1839285e948d4ea9fdfca26c9a2086 upstream.

This patch fixes broken PEBS support on Intel Atom and Core2
due to wrong pointer arithmetic in intel_pmu_drain_pebs_core().

The get_next_pebs_record_by_bit() was called on PEBS format fmt0
which does not use the pebs_record_nhm layout.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Fixes: 21509084f9 ("perf/x86/intel: Handle multiple records in the PEBS buffer")
Link: http://lkml.kernel.org/r/1449182000-31524-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10 11:49:28 +02:00
Minfei Huang
ca3455867e pvclock: Add CPU barriers to get correct version value
commit 749d088b8e7f4b9826ede02b9a043e417fa84aa1 upstream.

Protocol for the "version" fields is: hypervisor raises it (making it
uneven) before it starts updating the fields and raises it again (making
it even) when it is done.  Thus the guest can make sure the time values
it got are consistent by checking the version before and after reading
them.

Add CPU barries after getting version value just like what function
vread_pvclock does, because all of callees in this function is inline.

Fixes: 502dfeff23
Signed-off-by: Minfei Huang <mnghuan@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10 11:49:27 +02:00
Lukas Wunner
ba1eebc72d x86/quirks: Add early quirk to reset Apple AirPort card
commit abb2bafd295fe962bbadc329dbfb2146457283ac upstream.

The EFI firmware on Macs contains a full-fledged network stack for
downloading OS X images from osrecovery.apple.com. Unfortunately
on Macs introduced 2011 and 2012, EFI brings up the Broadcom 4331
wireless card on every boot and leaves it enabled even after
ExitBootServices has been called. The card continues to assert its IRQ
line, causing spurious interrupts if the IRQ is shared. It also corrupts
memory by DMAing received packets, allowing for remote code execution
over the air. This only stops when a driver is loaded for the wireless
card, which may be never if the driver is not installed or blacklisted.

The issue seems to be constrained to the Broadcom 4331. Chris Milsted
has verified that the newer Broadcom 4360 built into the MacBookPro11,3
(2013/2014) does not exhibit this behaviour. The chances that Apple will
ever supply a firmware fix for the older machines appear to be zero.

The solution is to reset the card on boot by writing to a reset bit in
its mmio space. This must be done as an early quirk and not as a plain
vanilla PCI quirk to successfully combat memory corruption by DMAed
packets: Matthew Garrett found out in 2012 that the packets are written
to EfiBootServicesData memory (http://mjg59.dreamwidth.org/11235.html).
This type of memory is made available to the page allocator by
efi_free_boot_services(). Plain vanilla PCI quirks run much later, in
subsys initcall level. In-between a time window would be open for memory
corruption. Random crashes occurring in this time window and attributed
to DMAed packets have indeed been observed in the wild by Chris
Bainbridge.

When Matthew Garrett analyzed the memory corruption issue in 2012, he
sought to fix it with a grub quirk which transitions the card to D3hot:
http://git.savannah.gnu.org/cgit/grub.git/commit/?id=9d34bb85da56

This approach does not help users with other bootloaders and while it
may prevent DMAed packets, it does not cure the spurious interrupts
emanating from the card. Unfortunately the card's mmio space is
inaccessible in D3hot, so to reset it, we have to undo the effect of
Matthew's grub patch and transition the card back to D0.

Note that the quirk takes a few shortcuts to reduce the amount of code:
The size of BAR 0 and the location of the PM capability is identical
on all affected machines and therefore hardcoded. Only the address of
BAR 0 differs between models. Also, it is assumed that the BCMA core
currently mapped is the 802.11 core. The EFI driver seems to always take
care of this.

Michael Büsch, Bjorn Helgaas and Matt Fleming contributed feedback
towards finding the best solution to this problem.

The following should be a comprehensive list of affected models:
    iMac13,1        2012  21.5"       [Root Port 00:1c.3 = 8086:1e16]
    iMac13,2        2012  27"         [Root Port 00:1c.3 = 8086:1e16]
    Macmini5,1      2011  i5 2.3 GHz  [Root Port 00:1c.1 = 8086:1c12]
    Macmini5,2      2011  i5 2.5 GHz  [Root Port 00:1c.1 = 8086:1c12]
    Macmini5,3      2011  i7 2.0 GHz  [Root Port 00:1c.1 = 8086:1c12]
    Macmini6,1      2012  i5 2.5 GHz  [Root Port 00:1c.1 = 8086:1e12]
    Macmini6,2      2012  i7 2.3 GHz  [Root Port 00:1c.1 = 8086:1e12]
    MacBookPro8,1   2011  13"         [Root Port 00:1c.1 = 8086:1c12]
    MacBookPro8,2   2011  15"         [Root Port 00:1c.1 = 8086:1c12]
    MacBookPro8,3   2011  17"         [Root Port 00:1c.1 = 8086:1c12]
    MacBookPro9,1   2012  15"         [Root Port 00:1c.1 = 8086:1e12]
    MacBookPro9,2   2012  13"         [Root Port 00:1c.1 = 8086:1e12]
    MacBookPro10,1  2012  15"         [Root Port 00:1c.1 = 8086:1e12]
    MacBookPro10,2  2012  13"         [Root Port 00:1c.1 = 8086:1e12]

For posterity, spurious interrupts caused by the Broadcom 4331 wireless
card resulted in splats like this (stacktrace omitted):

    irq 17: nobody cared (try booting with the "irqpoll" option)
    handlers:
    [<ffffffff81374370>] pcie_isr
    [<ffffffffc0704550>] sdhci_irq [sdhci] threaded [<ffffffffc07013c0>] sdhci_thread_irq [sdhci]
    [<ffffffffc0a0b960>] azx_interrupt [snd_hda_codec]
    Disabling IRQ #17

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111781
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=728916
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951#c16
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1098621
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632#c5
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1279130
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1332732
Tested-by: Konstantin Simanov <k.simanov@stlk.ru>        # [MacBookPro8,1]
Tested-by: Lukas Wunner <lukas@wunner.de>                # [MacBookPro9,1]
Tested-by: Bryan Paradis <bryan.paradis@gmail.com>       # [MacBookPro9,2]
Tested-by: Andrew Worsley <amworsley@gmail.com>          # [MacBookPro10,1]
Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> # [MacBookPro10,2]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Rafał Miłecki <zajec5@gmail.com>
Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chris Milsted <cmilsted@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Michael Buesch <m@bues.ch>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: b43-dev@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Link: http://lkml.kernel.org/r/48d0972ac82a53d460e5fce77a07b2560db95203.1465690253.git.lukas@wunner.de
[ Did minor readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10 11:49:24 +02:00
Lukas Wunner
dd4eb74efb x86/quirks: Reintroduce scanning of secondary buses
commit 850c321027c2e31d0afc71588974719a4b565550 upstream.

We used to scan secondary buses until the following commit that
was applied in 2009:

  8659c406ad ("x86: only scan the root bus in early PCI quirks")

which commit constrained early quirks to the root bus only. Its
motivation was to prevent application of the nvidia_bugs quirk
on secondary buses.

We're about to add a quirk to reset the Broadcom 4331 wireless card on
2011/2012 Macs, which is located on a secondary bus behind a PCIe root
port. To facilitate that, reintroduce scanning of secondary buses.

The commit message of 8659c406ad notes that scanning only the root bus
"saves quite some unnecessary scanning work". The algorithm used prior
to 8659c406ad was particularly time consuming because it scanned
buses 0 to 31 brute force. To avoid lengthening boot time, employ a
recursive strategy which only scans buses that are actually reachable
from the root bus.

Yinghai Lu pointed out that the secondary bus number read from a
bridge's config space may be invalid, in particular a value of 0 would
cause an infinite loop. The PCI core goes beyond that and recurses to a
child bus only if its bus number is greater than the parent bus number
(see pci_scan_bridge()). Since the root bus is numbered 0, this implies
that secondary buses may not be 0. Do the same on early scanning.

If this algorithm is found to significantly impact boot time or cause
infinite loops on broken hardware, it would be possible to limit its
recursion depth: The Broadcom 4331 quirk applies at depth 1, all others
at depth 0, so the bus need not be scanned deeper than that for now. An
alternative approach would be to revert to scanning only the root bus,
and apply the Broadcom 4331 quirk to the root ports 8086:1c12, 8086:1e12
and 8086:1e16. Apple always positioned the card behind either of these
three ports. The quirk would then check presence of the card in slot 0
below the root port and do its deed.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-pci@vger.kernel.org
Link: http://lkml.kernel.org/r/f0daa70dac1a9b2483abdb31887173eb6ab77bdf.1465690253.git.lukas@wunner.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10 11:49:24 +02:00
Lukas Wunner
bab5a36c19 x86/quirks: Apply nvidia_bugs quirk only on root bus
commit 447d29d1d3aed839e74c2401ef63387780ac51ed upstream.

Since the following commit:

  8659c406ad ("x86: only scan the root bus in early PCI quirks")

... early quirks are only applied to devices on the root bus.

The motivation was to prevent application of the nvidia_bugs quirk on
secondary buses.

We're about to reintroduce scanning of secondary buses for a quirk to
reset the Broadcom 4331 wireless card on 2011/2012 Macs. To prevent
regressions, open code the requirement to apply nvidia_bugs only on the
root bus.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/4d5477c1d76b2f0387a780f2142bbcdd9fee869b.1465690253.git.lukas@wunner.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10 11:49:24 +02:00
Dmitry Shmidt
b558f17a13 This is the 4.4.16 stable release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJXmOXmAAoJEDjbvchgkmk+QYIP/1S8oBZsvjfDzvH8t63HyLeH
 i43MFlYoFAqUIZc002XpluSvZ8uHoG+r7R8Hq3wmv48wxe3M6OBnMdBVTht6mPw+
 t5OLTZr40lWaJm2EIi4aekueMIrCgmL+Et+IFYv7ZVBuYLteVcfny+zdq4EqGmgj
 /a19+L/sTTr4SHtJIhHxWhiVJ9fVMgQk/N3VgQmIiNF2+lVbiFI7QQiDPLbFl0KK
 CM4ETO22HxHCYilGpzhpSMsHCxv12VqNaXNLAsPAepGGW7PqvUmrEWAqgwsbOfRc
 GxTLNk0dUgJqMrfEpQ8ZOMlgzvCAYG2jZuNSuT+nuzrWSUP+WOGRi9TTTxp1CYuZ
 PHlhNTH7ZnqosxJUUZS2d9N5ygpqD48Rhlfl824YzOWCy94VeUnedkVLb20uJwPF
 Y5aQ5WjktBC9why5e4OgGQERvx/U9KTk8E1zRfZZPc2oft9My0YxuemjjKAKZiYN
 ne4WhXbgOJTQkAoZwh2xqny3bWyEaoSrWpQ3R7bBJ9SIRLEOdCKzKpduDbAnbMP7
 QWgQOQC/6qA1mKqjrqF4KPA1Quo9PcUK2Ajh523ewMGCowgY90vyejAgh4Q8g0GC
 fKlx+jJDoKVDbQ8v4hc9PPHMsNNIKT9a1ptwVS3lE+bq1D5Ffm57A4/uOTMYHVab
 gKqu8h1CA0MCVBsH3nNA
 =nY8S
 -----END PGP SIGNATURE-----

Merge tag 'v4.4.16' into android-4.4.y

This is the 4.4.16 stable release

Change-Id: Ibaf7b7e03695e1acebc654a2ca1a4bfcc48fcea4
2016-08-01 15:57:55 -07:00
Andrey Ryabinin
e73be162f4 perf/x86: Fix undefined shift on 32-bit kernels
commit 6d6f2833bfbf296101f9f085e10488aef2601ba5 upstream.

Jim reported:

	UBSAN: Undefined behaviour in arch/x86/events/intel/core.c:3708:12
	shift exponent 35 is too large for 32-bit type 'long unsigned int'

The use of 'unsigned long' type obviously is not correct here, make it
'unsigned long long' instead.

Reported-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Imre Palik <imrep@amazon.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 2c33645d36 ("perf/x86: Honor the architectural performance monitoring version")
Link: http://lkml.kernel.org/r/1462974711-10037-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Christopher <kevinc@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27 09:47:36 -07:00
Borislav Petkov
d9c5952f0a x86/amd_nb: Fix boot crash on non-AMD systems
commit 1ead852dd88779eda12cb09cc894a03d9abfe1ec upstream.

Fix boot crash that triggers if this driver is built into a kernel and
run on non-AMD systems.

AMD northbridges users call amd_cache_northbridges() and it returns
a negative value to signal that we weren't able to cache/detect any
northbridges on the system.

At least, it should do so as all its callers expect it to do so. But it
does return a negative value only when kmalloc() fails.

Fix it to return -ENODEV if there are no NBs cached as otherwise, amd_nb
users like amd64_edac, for example, which relies on it to know whether
it should load or not, gets loaded on systems like Intel Xeons where it
shouldn't.

Reported-and-tested-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1466097230-5333-2-git-send-email-bp@alien8.de
Link: https://lkml.kernel.org/r/5761BEB0.9000807@cybernetics.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27 09:47:29 -07:00
Masami Hiramatsu
66af3f62b6 kprobes/x86: Clear TF bit in fault on single-stepping
commit dcfc47248d3f7d28df6f531e6426b933de94370d upstream.

Fix kprobe_fault_handler() to clear the TF (trap flag) bit of
the flags register in the case of a fault fixup on single-stepping.

If we put a kprobe on the instruction which caused a
page fault (e.g. actual mov instructions in copy_user_*),
that fault happens on the single-stepping buffer. In this
case, kprobes resets running instance so that the CPU can
retry execution on the original ip address.

However, current code forgets to reset the TF bit. Since this
fault happens with TF bit set for enabling single-stepping,
when it retries, it causes a debug exception and kprobes
can not handle it because it already reset itself.

On the most of x86-64 platform, it can be easily reproduced
by using kprobe tracer. E.g.

  # cd /sys/kernel/debug/tracing
  # echo p copy_user_enhanced_fast_string+5 > kprobe_events
  # echo 1 > events/kprobes/enable

And you'll see a kernel panic on do_debug(), since the debug
trap is not handled by kprobes.

To fix this problem, we just need to clear the TF bit when
resetting running kprobe.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: systemtap@sourceware.org
Link: http://lkml.kernel.org/r/20160611140648.25885.37482.stgit@devbox
[ Updated the comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27 09:47:29 -07:00
Andy Lutomirski
035a94d8d1 x86/entry/traps: Don't force in_interrupt() to return true in IST handlers
commit aaee8c3c5cce2d9107310dd9f3026b4f901d441c upstream.

Forcing in_interrupt() to return true if we're not in a bona fide
interrupt confuses the softirq code.  This fixes warnings like:

  NOHZ: local_softirq_pending 282

... which can happen when running things like selftests/x86.

This will change perf's static percpu buffer usage in IST context.
I think this is okay, and it's changing the behavior to match
historical (pre-4.0) behavior.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 9592747538 ("x86, traps: Track entry into and exit from IST context")
Link: http://lkml.kernel.org/r/cdc215f94d118d691d73df35275022331156fb45.1464130360.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24 10:18:20 -07:00
Alexander Shishkin
a7a9e0efc8 perf/x86/intel/pt: Generate PMI in the STOP region as well
commit ab92b232ae05c382c3df0e3d6a5c6d16b639ac8c upstream.

Currently, the PT driver always sets the PMI bit one region (page) before
the STOP region so that we can wake up the consumer before we run out of
room in the buffer and have to disable the event. However, we also need
an interrupt in the last output region, so that we actually get to disable
the event (if no more room from new data is available at that point),
otherwise hardware just quietly refuses to start, but the event is
scheduled in and we end up losing trace data till the event gets removed.

For a cpu-wide event it is even worse since there may not be any
re-scheduling at all and no chance for the ring buffer code to notice
that its buffer is filled up and the event needs to be disabled (so that
the consumer can re-enable it when it finishes reading the data out). In
other words, all the trace data will be lost after the buffer gets filled
up.

This patch makes PT also generate a PMI when the last output region is
full.

Reported-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/1462886313-13660-2-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01 12:15:47 -07:00
Srinivas Pandruvada
dfa11d5862 ACPI / processor: Request native thermal interrupt handling via _OSC
commit a21211672c9a1d730a39aa65d4a5b3414700adfb upstream.

There are several reports of freeze on enabling HWP (Hardware PStates)
feature on Skylake-based systems by the Intel P-states driver. The root
cause is identified as the HWP interrupts causing BIOS code to freeze.

HWP interrupts use the thermal LVT which can be handled by Linux
natively, but on the affected Skylake-based systems SMM will respond
to it by default.  This is a problem for several reasons:
 - On the affected systems the SMM thermal LVT handler is broken (it
   will crash when invoked) and a BIOS update is necessary to fix it.
 - With thermal interrupt handled in SMM we lose all of the reporting
   features of the arch/x86/kernel/cpu/mcheck/therm_throt driver.
 - Some thermal drivers like x86-package-temp depend on the thermal
   threshold interrupts signaled via the thermal LVT.
 - The HWP interrupts are useful for debugging and tuning
   performance (if the kernel can handle them).
The native handling of thermal interrupts needs to be enabled
because of that.

This requires some way to tell SMM that the OS can handle thermal
interrupts.  That can be done by using _OSC/_PDC in processor
scope very early during ACPI initialization.

The meaning of _OSC/_PDC bit 12 in processor scope is whether or
not the OS supports native handling of interrupts for Collaborative
Processor Performance Control (CPPC) notifications.  Since on
HWP-capable systems CPPC is a firmware interface to HWP, setting
this bit effectively tells the firmware that the OS will handle
thermal interrupts natively going forward.

For details on _OSC/_PDC refer to:
http://www.intel.com/content/www/us/en/standards/processor-vendor-specific-acpi-specification.html

To implement the _OSC/_PDC handshake as described, introduce a new
function, acpi_early_processor_osc(), that walks the ACPI
namespace looking for ACPI processor objects and invokes _OSC for
them with bit 12 in the capabilities buffer set and terminates the
namespace walk on the first success.

Also modify intel_thermal_interrupt() to clear HWP status bits in
the HWP_STATUS MSR to acknowledge HWP interrupts (which prevents
them from firing continuously).

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject & changelog, function rename ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-11 11:21:26 +02:00
Wang YanQing
ac8fc72dec x86/sysfb_efi: Fix valid BAR address range check
commit c10fcb14c7afd6688c7b197a814358fecf244222 upstream.

The code for checking whether a BAR address range is valid will break
out of the loop when a start address of 0x0 is encountered.

This behaviour is wrong since by breaking out of the loop we may miss
the BAR that describes the EFI frame buffer in a later iteration.

Because of this bug I can't use video=efifb: boot parameter to get
efifb on my new ThinkPad E550 for my old linux system hard disk with
3.10 kernel. In 3.10, efifb is the only choice due to DRM/I915 not
supporting the GPU.

This patch also add a trivial optimization to break out after we find
the frame buffer address range without testing later BARs.

Signed-off-by: Wang YanQing <udknight@gmail.com>
[ Rewrote changelog. ]
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Peter Jones <pjones@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1462454061-21561-2-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-11 11:21:20 +02:00
Chen Yu
73c1fd0aa1 x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO
commit 886123fb3a8656699dff40afa0573df359abeb18 upstream.

Currently we read the tsc radio: ratio = (MSR_PLATFORM_INFO >> 8) & 0x1f;

Thus we get bit 8-12 of MSR_PLATFORM_INFO, however according to the SDM
(35.5), the ratio bits are bit 8-15.

Ignoring the upper bits can result in an incorrect tsc ratio, which causes the
TSC calibration and the Local APIC timer frequency to be incorrect.

Fix this problem by masking 0xff instead.

[ tglx: Massaged changelog ]

Fixes: 7da7c15613 "x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCs"
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Bin Gao <bin.gao@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/1462505619-5516-1-git-send-email-yu.c.chen@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-11 11:21:19 +02:00
Keith Busch
01d5ccd341 x86/apic: Handle zero vector gracefully in clear_vector_irq()
commit 1bdb8970392a68489b469c3a330a1adb5ef61beb upstream.

If x86_vector_alloc_irq() fails x86_vector_free_irqs() is invoked to cleanup
the already allocated vectors. This subsequently calls clear_vector_irq().

The failed irq has no vector assigned, which triggers the BUG_ON(!vector) in
clear_vector_irq().

We cannot suppress the call to x86_vector_free_irqs() for the failed
interrupt, because the other data related to this irq must be cleaned up as
well. So calling clear_vector_irq() with vector == 0 is legitimate.

Remove the BUG_ON and return if vector is zero,

[ tglx: Massaged changelog ]

Fixes: b5dc8e6c21 "x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors"
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-04 14:48:49 -07:00
Tony Luck
adbe236b95 x86/mce: Avoid using object after free in genpool
commit a3125494cff084b098c80bb36fbe2061ffed9d52 upstream.

When we loop over all queued machine check error records to pass them
to the registered notifiers we use llist_for_each_entry(). But the loop
calls gen_pool_free() for the entry in the body of the loop - and then
the iterator looks at node->next after the free.

Use llist_for_each_entry_safe() instead.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Gong Chen <gong.chen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/0205920@agluck-desk.sc.intel.com
Link: http://lkml.kernel.org/r/1459929916-12852-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-04 14:48:40 -07:00
Andi Kleen
4b3d06d989 perf/x86/intel: Fix PEBS data source interpretation on Nehalem/Westmere
commit e17dc65328057c00db7e1bfea249c8771a78b30b upstream.

Jiri reported some time ago that some entries in the PEBS data source table
in perf do not agree with the SDM. We investigated and the bits
changed for Sandy Bridge, but the SDM was not updated.

perf already implements the bits correctly for Sandy Bridge
and later. This patch patches it up for Nehalem and Westmere.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: jolsa@kernel.org
Link: http://lkml.kernel.org/r/1456871124-15985-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:09:06 -07:00
Jiri Olsa
a54af124cd perf/x86/intel: Use PAGE_SIZE for PEBS buffer size on Core2
commit e72daf3f4d764c47fb71c9bdc7f9c54a503825b1 upstream.

Using PAGE_SIZE buffers makes the WRMSR to PERF_GLOBAL_CTRL in
intel_pmu_enable_all() mysteriously hang on Core2. As a workaround, we
don't do this.

The hard lockup is easily triggered by running 'perf test attr'
repeatedly. Most of the time it gets stuck on sample session with
small periods.

  # perf test attr -vv
  14: struct perf_event_attr setup                             :
  --- start ---
  ...
    'PERF_TEST_ATTR=/tmp/tmpuEKz3B /usr/bin/perf record -o /tmp/tmpuEKz3B/perf.data -c 123 kill >/dev/null 2>&1' ret 1

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20160301190352.GA8355@krava.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:09:06 -07:00
Kan Liang
886629ebb2 perf/x86/intel: Fix PEBS warning by only restoring active PMU in pmi
commit c3d266c8a9838cc141b69548bc3b1b18808ae8c4 upstream.

This patch tries to fix a PEBS warning found in my stress test. The
following perf command can easily trigger the pebs warning or spurious
NMI error on Skylake/Broadwell/Haswell platforms:

  sudo perf record -e 'cpu/umask=0x04,event=0xc4/pp,cycles,branches,ref-cycles,cache-misses,cache-references' --call-graph fp -b -c1000 -a

Also the NMI watchdog must be enabled.

For this case, the events number is larger than counter number. So
perf has to do multiplexing.

In perf_mux_hrtimer_handler, it does perf_pmu_disable(), schedule out
old events, rotate_ctx, schedule in new events and finally
perf_pmu_enable().

If the old events include precise event, the MSR_IA32_PEBS_ENABLE
should be cleared when perf_pmu_disable().  The MSR_IA32_PEBS_ENABLE
should keep 0 until the perf_pmu_enable() is called and the new event is
precise event.

However, there is a corner case which could restore PEBS_ENABLE to
stale value during the above period. In perf_pmu_disable(), GLOBAL_CTRL
will be set to 0 to stop overflow and followed PMI. But there may be
pending PMI from an earlier overflow, which cannot be stopped. So even
GLOBAL_CTRL is cleared, the kernel still be possible to get PMI. At
the end of the PMI handler, __intel_pmu_enable_all() will be called,
which will restore the stale values if old events haven't scheduled
out.

Once the stale pebs value is set, it's impossible to be corrected if
the new events are non-precise. Because the pebs_enabled will be set
to 0. x86_pmu.enable_all() will ignore the MSR_IA32_PEBS_ENABLE
setting. As a result, the following NMI with stale PEBS_ENABLE
trigger pebs warning.

The pending PMI after enabled=0 will become harmless if the NMI handler
does not change the state. This patch checks cpuc->enabled in pmi and
only restore the state when PMU is active.

Here is the dump:

  Call Trace:
   <NMI>  [<ffffffff813c3a2e>] dump_stack+0x63/0x85
   [<ffffffff810a46f2>] warn_slowpath_common+0x82/0xc0
   [<ffffffff810a483a>] warn_slowpath_null+0x1a/0x20
   [<ffffffff8100fe2e>] intel_pmu_drain_pebs_nhm+0x2be/0x320
   [<ffffffff8100caa9>] intel_pmu_handle_irq+0x279/0x460
   [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
   [<ffffffff811f290d>] ? vunmap_page_range+0x20d/0x330
   [<ffffffff811f2f11>] ?  unmap_kernel_range_noflush+0x11/0x20
   [<ffffffff8148379f>] ? ghes_copy_tofrom_phys+0x10f/0x2a0
   [<ffffffff814839c8>] ? ghes_read_estatus+0x98/0x170
   [<ffffffff81005a7d>] perf_event_nmi_handler+0x2d/0x50
   [<ffffffff810310b9>] nmi_handle+0x69/0x120
   [<ffffffff810316f6>] default_do_nmi+0xe6/0x100
   [<ffffffff810317f2>] do_nmi+0xe2/0x130
   [<ffffffff817aea71>] end_repeat_nmi+0x1a/0x1e
   [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
   [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
   [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
   <<EOE>>  <IRQ>  [<ffffffff81006df8>] ?  x86_perf_event_set_period+0xd8/0x180
   [<ffffffff81006eec>] x86_pmu_start+0x4c/0x100
   [<ffffffff8100722d>] x86_pmu_enable+0x28d/0x300
   [<ffffffff811994d7>] perf_pmu_enable.part.81+0x7/0x10
   [<ffffffff8119cb70>] perf_mux_hrtimer_handler+0x200/0x280
   [<ffffffff8119c970>] ?  __perf_install_in_context+0xc0/0xc0
   [<ffffffff8110f92d>] __hrtimer_run_queues+0xfd/0x280
   [<ffffffff811100d8>] hrtimer_interrupt+0xa8/0x190
   [<ffffffff81199080>] ?  __perf_read_group_add.part.61+0x1a0/0x1a0
   [<ffffffff81051bd8>] local_apic_timer_interrupt+0x38/0x60
   [<ffffffff817af01d>] smp_apic_timer_interrupt+0x3d/0x50
   [<ffffffff817ad15c>] apic_timer_interrupt+0x8c/0xa0
   <EOI>  [<ffffffff81199080>] ?  __perf_read_group_add.part.61+0x1a0/0x1a0
   [<ffffffff81123de5>] ?  smp_call_function_single+0xd5/0x130
   [<ffffffff81123ddb>] ?  smp_call_function_single+0xcb/0x130
   [<ffffffff81199080>] ?  __perf_read_group_add.part.61+0x1a0/0x1a0
   [<ffffffff8119765a>] event_function_call+0x10a/0x120
   [<ffffffff8119c660>] ? ctx_resched+0x90/0x90
   [<ffffffff811971e0>] ? cpu_clock_event_read+0x30/0x30
   [<ffffffff811976d0>] ? _perf_event_disable+0x60/0x60
   [<ffffffff8119772b>] _perf_event_enable+0x5b/0x70
   [<ffffffff81197388>] perf_event_for_each_child+0x38/0xa0
   [<ffffffff811976d0>] ? _perf_event_disable+0x60/0x60
   [<ffffffff811a0ffd>] perf_ioctl+0x12d/0x3c0
   [<ffffffff8134d855>] ? selinux_file_ioctl+0x95/0x1e0
   [<ffffffff8124a3a1>] do_vfs_ioctl+0xa1/0x5a0
   [<ffffffff81036d29>] ? sched_clock+0x9/0x10
   [<ffffffff8124a919>] SyS_ioctl+0x79/0x90
   [<ffffffff817ac4b2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
  ---[ end trace aef202839fe9a71d ]---
  Uhhuh. NMI received for unknown reason 2d on CPU 2.
  Do you have a strange power saving mode enabled?

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1457046448-6184-1-git-send-email-kan.liang@intel.com
[ Fixed various typos and other small details. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:09:06 -07:00
Stephane Eranian
d2b56a0758 perf/x86/pebs: Add workaround for broken OVFL status on HSW+
commit 8077eca079a212f26419c57226f28696b7100683 upstream.

This patch fixes an issue with the GLOBAL_OVERFLOW_STATUS bits on
Haswell, Broadwell and Skylake processors when using PEBS.

The SDM stipulates that when the PEBS iterrupt threshold is crossed,
an interrupt is posted and the kernel is interrupted. The kernel will
find GLOBAL_OVF_SATUS bit 62 set indicating there are PEBS records to
drain. But the bits corresponding to the actual counters should NOT be
set. The kernel follows the SDM and assumes that all PEBS events are
processed in the drain_pebs() callback. The kernel then checks for
remaining overflows on any other (non-PEBS) events and processes these
in the for_each_bit_set(&status) loop.

As it turns out, under certain conditions on HSW and later processors,
on PEBS buffer interrupt, bit 62 is set but the counter bits may be
set as well. In that case, the kernel drains PEBS and generates
SAMPLES with the EXACT tag, then it processes the counter bits, and
generates normal (non-EXACT) SAMPLES.

I ran into this problem by trying to understand why on HSW sampling on
a PEBS event was sometimes returning SAMPLES without the EXACT tag.
This should not happen on user level code because HSW has the
eventing_ip which always point to the instruction that caused the
event.

The workaround in this patch simply ensures that the bits for the
counters used for PEBS events are cleared after the PEBS buffer has
been drained. With this fix 100% of the PEBS samples on my user code
report the EXACT tag.

Before:
  $ perf record -e cpu/event=0xd0,umask=0x81/upp ./multichase
  $ perf report -D | fgrep SAMPLES
  PERF_RECORD_SAMPLE(IP, 0x2): 11775/11775: 0x406de5 period: 73469 addr: 0 exact=Y
                           \--- EXACT tag is missing

After:
  $ perf record -e cpu/event=0xd0,umask=0x81/upp ./multichase
  $ perf report -D | fgrep SAMPLES
  PERF_RECORD_SAMPLE(IP, 0x4002): 11775/11775: 0x406de5 period: 73469 addr: 0 exact=Y
                           \--- EXACT tag is set

The problem tends to appear more often when multiple PEBS events are used.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: namhyung@kernel.org
Link: http://lkml.kernel.org/r/1457034642-21837-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:09:05 -07:00