-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlqlSRUACgkQONu9yGCS
aT4pDQ//Y4lGqfzLeZpwhm4wWFL3ZiC0IY4WEFP0Mb4N6LPtDqhmZoW1nFT5l5sS
uJaKU/7w49GB9jjgYIlYu5+aiX/PMgTbHD7NRRIoWtRDKxgVBuP4iIhUDdB9u61f
t3W49sRHnGoc8apEOzp7zeNZMhJB+RXJnhvPkzOX1ctXsoL50aSQXhSDQUXt5g8U
yey0bvKs1aWW2f1sjpQJIDRtzFbZP+hLlfwoiz5T5d/50zovaqt2yhw6ke5vBMbi
eP8i+g/91t5+MT6Bc7SUpsYBxMyrcZvnEHb/HSMxez2twueK6O6TXbQWGxAcHYSQ
WF6W8xOKAl7z92MsDuVM4wy1Hamwc0R3jF+KTriYNboGThfuNZDD4OlUrmWfafoY
Uaoa6wI7fiYvPBq5zvLkOc7ufE+ECqRq9pC2VXZp+G4veEgcc4ftZ57blhULryKS
Nl0E25BXEpyix9m0Racl26OLFx3DUI3wSfPb1aqJx4EP13FYBAGJIVMcZfSEjN7/
zpfPZExelkF1Q5P+OWbRrg5bi/DQoy/EGmZMA0LvpQl8rSFBilLdUjhGGPvmVwBE
P9qElRjdSqeoOiDUyDmARO3iWm42s2cBqAwgxlgP8LUcyzxUperkvnbM25pvkHfn
hh+QV9zOFUlI+rVj7pwu4DLmdOteRw7+7mTY4jgBVlZcAMBBLd0=
=KZem
-----END PGP SIGNATURE-----
Merge 4.4.121 into android-4.4
Changes in 4.4.121
tpm: st33zp24: fix potential buffer overruns caused by bit glitches on the bus
tpm_i2c_infineon: fix potential buffer overruns caused by bit glitches on the bus
tpm_i2c_nuvoton: fix potential buffer overruns caused by bit glitches on the bus
ALSA: usb-audio: Add a quirck for B&W PX headphones
ALSA: hda: Add a power_save blacklist
cpufreq: s3c24xx: Fix broken s3c_cpufreq_init()
media: m88ds3103: don't call a non-initalized function
nospec: Allow index argument to have const-qualified type
ARM: mvebu: Fix broken PL310_ERRATA_753970 selects
KVM: mmu: Fix overlap between public and private memslots
x86/syscall: Sanitize syscall table de-references under speculation fix
btrfs: Don't clear SGID when inheriting ACLs
ARM: dts: LogicPD Torpedo: Fix I2C1 pinmux
x86/apic/vector: Handle legacy irq data correctly
leds: do not overflow sysfs buffer in led_trigger_show
x86/spectre: Fix an error message
Revert "led: core: Fix brightness setting when setting delay_off=0"
bridge: check brport attr show in brport_show
fib_semantics: Don't match route with mismatching tclassid
hdlc_ppp: carrier detect ok, don't turn off negotiation
ipv6 sit: work around bogus gcc-8 -Wrestrict warning
net: fix race on decreasing number of TX queues
net: ipv4: don't allow setting net.ipv4.route.min_pmtu below 68
netlink: ensure to loop over all netns in genlmsg_multicast_allns()
ppp: prevent unregistered channels from connecting to PPP units
udplite: fix partial checksum initialization
sctp: fix dst refcnt leak in sctp_v4_get_dst
sctp: fix dst refcnt leak in sctp_v6_get_dst()
s390/qeth: fix SETIP command handling
s390/qeth: fix IPA command submission race
sctp: verify size of a new chunk in _sctp_make_chunk()
net: mpls: Pull common label check into helper
mpls, nospec: Sanitize array index in mpls_label_ok()
dm io: fix duplicate bio completion due to missing ref count
bpf, x64: implement retpoline for tail call
btrfs: preserve i_mode if __btrfs_set_acl() fails
Linux 4.4.121
Change-Id: Ifc1f73c407f35cc1815e6f69bbed838c8ca60bc2
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
The backport of upstream commit 45d55e7bac40 ("x86/apic/vector: Fix off by
one in error path") missed to fixup the legacy interrupt data which is not
longer available upstream.
Handle legacy irq data correctly by clearing the legacy storage to prevent
use after free.
Fixes: 7fd1335392 ("x86/apic/vector: Fix off by one in error path") - 4.4.y
Fixes: c557481a9491 ("x86/apic/vector: Fix off by one in error path") - 4.9.y
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlpxo0gACgkQONu9yGCS
aT78EhAAs+LNVHZzqBuwgiuB/7Fsx5RvnzetpCstjWQnJHUCPjU9iCc4oTgpTGeC
jLZeQeUlwAguL87+GLEhKEflSqKd5O/3VozLd4Xw7tGTUSqkV/0yUbmKuXzMwqTP
ZlUDtM8eK4nfQ9ci/9yF6D3jMcpboVzFSlfu+HYLFxNUhr3NOf8jpPrMqDqTWEbP
ncT4habS87sQSDtZLFVsGLq2rtOg91NkiXSJEwyDeioTwR9kUju5eJGhF1yhmJZh
GEBOddmpD+RndL/Q0SN9poThWEFtWHwaBKeittHYzwnn5J7+ov9pjMmXkvGf8Slc
pWVx7WADcPkmyx18x53szI05uR0VycPB8YhwQW28yB9+4LabPzLSz9KNVDNcs7Tf
1GpP7Au0YVJBMjbUuJfZVe0MgSM6pRsw+I/etz47O27zsm/HEoRqHNwTk7T6B6jd
W0vjw2HohpQUxVa6AAgqVqCgzw4ALCmlIcepaOxtU6l3XEWLrMe8OwwUl6pQY+Fr
8dLk87SnFMgWVMyQf6M4Bse5EGHwfVEvA8z82HOlGNbynycexYDFWWxI/0P4CjPx
VCRg3XZF4OyoRWmy/9NKgpeQBXS+fiIGIGp0opjeMfpw6t6IJqeeFik6DzXZ3Hhe
FHYlCCtc45TAr3kJwSgfIoS7PcorBK93MDoEV58yJa6kcu0OOFQ=
=vx5p
-----END PGP SIGNATURE-----
Merge 4.4.114 into android-4.4
Changes in 4.4.114
x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels
usbip: prevent vhci_hcd driver from leaking a socket pointer address
usbip: Fix implicit fallthrough warning
usbip: Fix potential format overflow in userspace tools
x86/microcode/intel: Fix BDW late-loading revision check
x86/cpu/intel: Introduce macros for Intel family numbers
x86/retpoline: Fill RSB on context switch for affected CPUs
sched/deadline: Use the revised wakeup rule for suspending constrained dl tasks
can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once
can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once
PM / sleep: declare __tracedata symbols as char[] rather than char
time: Avoid undefined behaviour in ktime_add_safe()
timers: Plug locking race vs. timer migration
Prevent timer value 0 for MWAITX
drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled
drivers: base: cacheinfo: fix boot error message when acpi is enabled
PCI: layerscape: Add "fsl,ls2085a-pcie" compatible ID
PCI: layerscape: Fix MSG TLP drop setting
mmc: sdhci-of-esdhc: add/remove some quirks according to vendor version
fs/select: add vmalloc fallback for select(2)
mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack
hwpoison, memcg: forcibly uncharge LRU pages
cma: fix calculation of aligned offset
mm, page_alloc: fix potential false positive in __zone_watermark_ok
ipc: msg, make msgrcv work with LONG_MIN
x86/ioapic: Fix incorrect pointers in ioapic_setup_resources()
ACPI / processor: Avoid reserving IO regions too early
ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
ACPICA: Namespace: fix operand cache leak
netfilter: x_tables: speed up jump target validation
netfilter: arp_tables: fix invoking 32bit "iptable -P INPUT ACCEPT" failed in 64bit kernel
netfilter: nf_dup_ipv6: set again FLOWI_FLAG_KNOWN_NH at flowi6_flags
netfilter: nf_ct_expect: remove the redundant slash when policy name is empty
netfilter: nfnetlink_queue: reject verdict request from different portid
netfilter: restart search if moved to other chain
netfilter: nf_conntrack_sip: extend request line validation
netfilter: use fwmark_reflect in nf_send_reset
netfilter: fix IS_ERR_VALUE usage
netfilter: nfnetlink_cthelper: Add missing permission checks
netfilter: xt_osf: Add missing permission checks
ext2: Don't clear SGID when inheriting ACLs
reiserfs: fix race in prealloc discard
reiserfs: don't preallocate blocks for extended attributes
reiserfs: Don't clear SGID when inheriting ACLs
fs/fcntl: f_setown, avoid undefined behaviour
scsi: libiscsi: fix shifting of DID_REQUEUE host byte
Revert "module: Add retpoline tag to VERMAGIC"
Input: trackpoint - force 3 buttons if 0 button is reported
usb: usbip: Fix possible deadlocks reported by lockdep
usbip: fix stub_rx: get_pipe() to validate endpoint number
usbip: fix stub_rx: harden CMD_SUBMIT path to handle malicious input
usbip: prevent leaking socket pointer address in messages
um: link vmlinux with -no-pie
vsyscall: Fix permissions for emulate mode with KAISER/PTI
eventpoll.h: add missing epoll event masks
x86/microcode/intel: Extend BDW late-loading further with LLC size check
hrtimer: Reset hrtimer cpu base proper on CPU hotplug
dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state
ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL
ipv6: fix udpv6 sendmsg crash caused by too small MTU
ipv6: ip6_make_skb() needs to clear cork.base.dst
lan78xx: Fix failure in USB Full Speed
net: igmp: fix source address check for IGMPv3 reports
tcp: __tcp_hdrlen() helper
net: qdisc_pkt_len_init() should be more robust
pppoe: take ->needed_headroom of lower device into account on xmit
r8169: fix memory corruption on retrieval of hardware statistics.
sctp: do not allow the v4 socket to bind a v4mapped v6 address
sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
vmxnet3: repair memory leak
net: Allow neigh contructor functions ability to modify the primary_key
ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
flow_dissector: properly cap thoff field
net: tcp: close sock if net namespace is exiting
nfsd: auth: Fix gid sorting when rootsquash enabled
Linux 4.4.114
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 9d98bcec731756b8688b59ec998707924d716d7b upstream.
On a 4-socket Brickland system, hot-removing one ioapic is fine.
Hot-removing the 2nd one causes panic in mp_unregister_ioapic()
while calling release_resource().
It is because the iomem_res pointer has already been released
when removing the first ioapic.
To explain the use of &res[num] here: res is assigned to ioapic_resources,
and later in ioapic_insert_resources() we do:
struct resource *r = ioapic_resources;
for_each_ioapic(i) {
insert_resource(&iomem_resource, r);
r++;
}
Here 'r' is treated as an arry of 'struct resource', and the r++ ensures
that each element of the array is inserted separately. Thus we should call
release_resouce() on each element at &res[num].
Fix it by assigning the correct pointers to ioapics[i].iomem_res in
ioapic_setup_resources().
Signed-off-by: Rui Wang <rui.y.wang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: tony.luck@intel.com
Cc: linux-pci@vger.kernel.org
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1465369193-4816-3-git-send-email-rui.y.wang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlpnhEgACgkQONu9yGCS
aT6wiBAAszhEwuUQy79/r5C8BTgpQNkt7rGWwZGRMz/nd/FTZSdJjZCI93NdT144
2i9x0ejQXkdpld2Al3Rl5GOlqEw43XTWqgiU3h/fW4nS+l/gpVZu2b9/2jsmsz36
cJGikTqwofs8wMzIlrAvfHIdXKrEAzeIbsp1NuDFq7WTdeUGorzu4ZSw7MfjQN70
tXSctd1IAhr776p6OqihVkasKV4S3D83vowivpvSCRsHR8HmmtS2kIl9QlHwNJo6
KzH3z5DHupJev+qYMsy7AucZjiDuQbXCw+9kPb9jAqFC00fBOng6DwNA63DaAL7N
QIx+tGJNUT/OPJTl0oift33Zg2fWALmsoSqHH6eJal7XjcP0sSLEnF91ayWms+BQ
m8qURMCYFShguk3om9jO4yZr6C+YbaqXxqGnhjPhnX2TvueUf7zTinXUk6d3JEfX
wnaugvqHyzWdPdxCOdBkUJ7YWRoODRKKrCHIB17A9063bZN0PombhimAPOR69NC5
kqd0bzK/lnY7OUGHipK/nfPRVJfSJlR43AFehaloowI/6hUe057v2bc3IQgTBUf1
kqX5wQD/VfhEtVibk5GomsgE/ERBkhIqpKNhm5U+/Qe2szO/XiKYuh3rEKGsTXus
0vx+TqIFpKt+oSY5rhtv9coRJov5kMnw2PYVsO+qr2TQ6TMILyQ=
=nlXw
-----END PGP SIGNATURE-----
Merge 4.4.113 into android-4.4
Changes in 4.4.113
gcov: disable for COMPILE_TEST
x86/cpu/AMD: Make LFENCE a serializing instruction
x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier
x86/asm: Use register variable to get stack pointer value
x86/kbuild: enable modversions for symbols exported from asm
x86/asm: Make asm/alternative.h safe from assembly
EXPORT_SYMBOL() for asm
kconfig.h: use __is_defined() to check if MODULE is defined
x86/retpoline: Add initial retpoline support
x86/spectre: Add boot time option to select Spectre v2 mitigation
x86/retpoline/crypto: Convert crypto assembler indirect jumps
x86/retpoline/entry: Convert entry assembler indirect jumps
x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
x86/retpoline/hyperv: Convert assembler indirect jumps
x86/retpoline/xen: Convert Xen hypercall indirect jumps
x86/retpoline/checksum32: Convert assembler indirect jumps
x86/retpoline/irq32: Convert assembler indirect jumps
x86/retpoline: Fill return stack buffer on vmexit
x86/retpoline: Remove compile time warning
scsi: sg: disable SET_FORCE_LOW_DMA
futex: Prevent overflow by strengthen input validation
ALSA: pcm: Remove yet superfluous WARN_ON()
ALSA: hda - Apply headphone noise quirk for another Dell XPS 13 variant
ALSA: hda - Apply the existing quirk to iMac 14,1
af_key: fix buffer overread in verify_address_len()
af_key: fix buffer overread in parse_exthdrs()
scsi: hpsa: fix volume offline state
sched/deadline: Zero out positive runtime after throttling constrained tasks
x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros
module: Add retpoline tag to VERMAGIC
pipe: avoid round_pipe_size() nr_pages overflow on 32-bit
x86/apic/vector: Fix off by one in error path
Input: 88pm860x-ts - fix child-node lookup
Input: twl6040-vibra - fix DT node memory management
Input: twl6040-vibra - fix child-node lookup
Input: twl4030-vibra - fix sibling-node lookup
tracing: Fix converting enum's from the map in trace_event_eval_update()
phy: work around 'phys' references to usb-nop-xceiv devices
ARM: dts: kirkwood: fix pin-muxing of MPP7 on OpenBlocks A7
can: peak: fix potential bug in packet fragmentation
libata: apply MAX_SEC_1024 to all LITEON EP1 series devices
dm btree: fix serious bug in btree_split_beneath()
dm thin metadata: THIN_MAX_CONCURRENT_LOCKS should be 6
arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
x86/cpu, x86/pti: Do not enable PTI on AMD processors
kbuild: modversions for EXPORT_SYMBOL() for asm
x86/mce: Make machine check speculation protected
retpoline: Introduce start/end markers of indirect thunk
kprobes/x86: Blacklist indirect thunk functions for kprobes
kprobes/x86: Disable optimizing on the function jumps to indirect thunk
x86/pti: Document fix wrong index
x86/retpoline: Optimize inline assembler for vmexit_fill_RSB
MIPS: AR7: ensure the port type's FCR value is used
Linux 4.4.113
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 45d55e7bac4028af93f5fa324e69958a0b868e96 upstream.
Keith reported the following warning:
WARNING: CPU: 28 PID: 1420 at kernel/irq/matrix.c:222 irq_matrix_remove_managed+0x10f/0x120
x86_vector_free_irqs+0xa1/0x180
x86_vector_alloc_irqs+0x1e4/0x3a0
msi_domain_alloc+0x62/0x130
The reason for this is that if the vector allocation fails the error
handling code tries to free the failed vector as well, which causes the
above imbalance warning to trigger.
Adjust the error path to handle this correctly.
Fixes: b5dc8e6c21 ("x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors")
Reported-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Keith Busch <keith.busch@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801161217300.1823@nanos
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kcov provides code coverage collection for coverage-guided fuzzing
(randomized testing). Coverage-guided fuzzing is a testing technique
that uses coverage feedback to determine new interesting inputs to a
system. A notable user-space example is AFL
(http://lcamtuf.coredump.cx/afl/). However, this technique is not
widely used for kernel testing due to missing compiler and kernel
support.
kcov does not aim to collect as much coverage as possible. It aims to
collect more or less stable coverage that is function of syscall inputs.
To achieve this goal it does not collect coverage in soft/hard
interrupts and instrumentation of some inherently non-deterministic or
non-interesting parts of kernel is disbled (e.g. scheduler, locking).
Currently there is a single coverage collection mode (tracing), but the
API anticipates additional collection modes. Initially I also
implemented a second mode which exposes coverage in a fixed-size hash
table of counters (what Quentin used in his original patch). I've
dropped the second mode for simplicity.
This patch adds the necessary support on kernel side. The complimentary
compiler support was added in gcc revision 231296.
We've used this support to build syzkaller system call fuzzer, which has
found 90 kernel bugs in just 2 months:
https://github.com/google/syzkaller/wiki/Found-Bugs
We've also found 30+ bugs in our internal systems with syzkaller.
Another (yet unexplored) direction where kcov coverage would greatly
help is more traditional "blob mutation". For example, mounting a
random blob as a filesystem, or receiving a random blob over wire.
Why not gcov. Typical fuzzing loop looks as follows: (1) reset
coverage, (2) execute a bit of code, (3) collect coverage, repeat. A
typical coverage can be just a dozen of basic blocks (e.g. an invalid
input). In such context gcov becomes prohibitively expensive as
reset/collect coverage steps depend on total number of basic
blocks/edges in program (in case of kernel it is about 2M). Cost of
kcov depends only on number of executed basic blocks/edges. On top of
that, kernel requires per-thread coverage because there are always
background threads and unrelated processes that also produce coverage.
With inlined gcov instrumentation per-thread coverage is not possible.
kcov exposes kernel PCs and control flow to user-space which is
insecure. But debugfs should not be mapped as user accessible.
Based on a patch by Quentin Casasnovas.
[akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
[akpm@linux-foundation.org: unbreak allmodconfig]
[akpm@linux-foundation.org: follow x86 Makefile layout standards]
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tavis Ormandy <taviso@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@google.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: David Drysdale <drysdale@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 64145065
(cherry-picked from 5c9a8750a6409c63a0f01d51a9024861022f6593)
Change-Id: I17b5e04f6e89b241924e78ec32ead79c38b860ce
Signed-off-by: Paul Lawrence <paullawrence@google.com>
commit e708e35ba6d89ff785b225cd07dcccab04fa954a upstream.
One of the rarely executed code pathes in check_timer() calls
unmask_ioapic_irq() passing irq_get_chip_data(0) as argument.
That's wrong as unmask_ioapic_irq() expects a pointer to the irq data of
interrupt 0. irq_get_chip_data(0) returns NULL, so the following
dereference in unmask_ioapic_irq() causes a kernel panic.
The issue went unnoticed in the first place because irq_get_chip_data()
returns a void pointer so the compiler cannot do a type check on the
argument. The code path was added for machines with broken configuration,
but it seems that those machines are either not running current kernels or
simply do not longer exist.
Hand in irq_get_irq_data(0) as argument which provides the correct data.
[ tglx: Rewrote changelog ]
Fixes: 4467715a44 ("x86/irq: Move irq_cfg.irq_2_pin into io_apic.c")
Signed-off-by: Seunghun Han <kkamagui@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1500369644-45767-1-git-send-email-kkamagui@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a9b4f08770b415f30f2fb0f8329a370c8f554aa3 upstream.
commit d32932d02e removed the irq_retrigger callback from the IO-APIC
chip and did not add it to the new IO-APIC-IR irq chip.
There is no harm because the interrupts are resent in software when the
retrigger callback is NULL, but it's less efficient. So restore them.
[ tglx: Massaged changelog ]
Fixes: d32932d02e ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
Signed-off-by: Ruslan Ruslichenko <rruslich@cisco.com>
Cc: xe-linux-external@cisco.com
Link: http://lkml.kernel.org/r/1484662432-13580-1-git-send-email-rruslich@cisco.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d966564fcdc19e13eb6ba1fbe6b8101070339c3d upstream.
This reverts commit 020eb3daaba2857b32c4cf4c82f503d6a00a67de.
Gabriel C reports that it causes his machine to not boot, and we haven't
tracked down the reason for it yet. Since the bug it fixes has been
around for a longish time, we're better off reverting the fix for now.
Gabriel says:
"It hangs early and freezes with a lot RCU warnings.
I bisected it down to :
> Ruslan Ruslichenko (1):
> x86/ioapic: Restore IO-APIC irq_chip retrigger callback
Reverting this one fixes the problem for me..
The box is a PRIMERGY TX200 S5 , 2 socket , 2 x E5520 CPU(s) installed"
and Ruslan and Thomas are currently stumped.
Reported-and-bisected-by: Gabriel C <nix.or.die@gmail.com>
Cc: Ruslan Ruslichenko <rruslich@cisco.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aaaec6fc755447a1d056765b11b24d8ff2b81366 upstream.
The recent commit which prevents double activation of interrupts unearthed
interesting code in x86. The code (ab)uses irq_domain_activate_irq() to
reconfigure an already activated interrupt. That trips over the prevention
code now.
Fix it by deactivating the interrupt before activating the new configuration.
Fixes: 08d85f3ea99f1 "irqdomain: Avoid activating interrupts more than once"
Reported-and-tested-by: Mike Galbraith <efault@gmx.de>
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1701311901580.3457@nanos
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 020eb3daaba2857b32c4cf4c82f503d6a00a67de upstream.
commit d32932d02e removed the irq_retrigger callback from the IO-APIC
chip and did not add it to the new IO-APIC-IR irq chip.
Unfortunately the software resend fallback is not enabled on X86, so edge
interrupts which are received during the lazy disabled state of the
interrupt line are not retriggered and therefor lost.
Restore the callbacks.
[ tglx: Massaged changelog ]
Fixes: d32932d02e ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
Signed-off-by: Ruslan Ruslichenko <rruslich@cisco.com>
Cc: xe-linux-external@cisco.com
Link: http://lkml.kernel.org/r/1484662432-13580-1-git-send-email-rruslich@cisco.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit db91aa793ff984ac048e199ea1c54202543952fe upstream.
When a CPU is about to be offlined we call fixup_irqs() that resets IRQ
affinities related to the CPU in question. The same thing is also done when
the system is suspended to S-states like S3 (mem).
For each IRQ we try to complete any on-going move regardless whether the
IRQ is actually part of x86_vector_domain. For each IRQ descriptor we fetch
its chip_data, assume it is of type struct apic_chip_data and manipulate it
by clearing old_domain mask etc. For irq_chips that are not part of the
x86_vector_domain, like those created by various GPIO drivers, will find
their chip_data being changed unexpectly.
Below is an example where GPIO chip owned by pinctrl-sunrisepoint.c gets
corrupted after resume:
# cat /sys/kernel/debug/gpio
gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
gpio-511 ( |sysfs ) in hi
# rtcwake -s10 -mmem
<10 seconds passes>
# cat /sys/kernel/debug/gpio
gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
gpio-511 ( |sysfs ) in ?
Note '?' in the output. It means the struct gpio_chip ->get function is
NULL whereas before suspend it was there.
Fix this by first checking that the IRQ belongs to x86_vector_domain before
we try to use the chip_data as struct apic_chip_data.
Reported-and-tested-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Link: http://lkml.kernel.org/r/20161003101708.34795-1-mika.westerberg@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2e63ad4bd5dd583871e6602f9d398b9322d358d9 upstream.
native_smp_prepare_cpus
-> default_setup_apic_routing
-> enable_IR_x2apic
-> irq_remapping_prepare
-> intel_prepare_irq_remapping
-> intel_setup_irq_remapping
So IR table is setup even if "noapic" boot parameter is added. As a result we
crash later when the interrupt affinity is set due to a half initialized
remapping infrastructure.
Prevent remap initialization when IOAPIC is disabled.
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Link: http://lkml.kernel.org/r/1471954039-3942-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1bdb8970392a68489b469c3a330a1adb5ef61beb upstream.
If x86_vector_alloc_irq() fails x86_vector_free_irqs() is invoked to cleanup
the already allocated vectors. This subsequently calls clear_vector_irq().
The failed irq has no vector assigned, which triggers the BUG_ON(!vector) in
clear_vector_irq().
We cannot suppress the call to x86_vector_free_irqs() for the failed
interrupt, because the other data related to this irq must be cleaned up as
well. So calling clear_vector_irq() with vector == 0 is legitimate.
Remove the BUG_ON and return if vector is zero,
[ tglx: Massaged changelog ]
Fixes: b5dc8e6c21 "x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors"
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 551adc60573cb68e3d55cacca9ba1b7437313df7 upstream.
Harry reported, that he's able to trigger a system freeze with cpu hot
unplug. The freeze turned out to be a live lock caused by recent changes in
irq_force_complete_move().
When fixup_irqs() and from there irq_force_complete_move() is called on the
dying cpu, then all other cpus are in stop machine an wait for the dying cpu
to complete the teardown. If there is a move of an interrupt pending then
irq_force_complete_move() sends the cleanup IPI to the cpus in the old_domain
mask and waits for them to clear the mask. That's obviously impossible as
those cpus are firmly stuck in stop machine with interrupts disabled.
I should have known that, but I completely overlooked it being concentrated on
the locking issues around the vectors. And the existance of the call to
__irq_complete_move() in the code, which actually sends the cleanup IPI made
it reasonable to wait for that cleanup to complete. That call was bogus even
before the recent changes as it was just a pointless distraction.
We have to look at two cases:
1) The move_in_progress flag of the interrupt is set
This means the ioapic has been updated with the new vector, but it has not
fired yet. In theory there is a race:
set_ioapic(new_vector) <-- Interrupt is raised before update is effective,
i.e. it's raised on the old vector.
So if the target cpu cannot handle that interrupt before the old vector is
cleaned up, we get a spurious interrupt and in the worst case the ioapic
irq line becomes stale, but my experiments so far have only resulted in
spurious interrupts.
But in case of cpu hotplug this should be a non issue because if the
affinity update happens right before all cpus rendevouz in stop machine,
there is no way that the interrupt can be blocked on the target cpu because
all cpus loops first with interrupts enabled in stop machine, so the old
vector is not yet cleaned up when the interrupt fires.
So the only way to run into this issue is if the delivery of the interrupt
on the apic/system bus would be delayed beyond the point where the target
cpu disables interrupts in stop machine. I doubt that it can happen, but at
least there is a theroretical chance. Virtualization might be able to
expose this, but AFAICT the IOAPIC emulation is not as stupid as the real
hardware.
I've spent quite some time over the weekend to enforce that situation,
though I was not able to trigger the delayed case.
2) The move_in_progress flag is not set and the old_domain cpu mask is not
empty.
That means, that an interrupt was delivered after the change and the
cleanup IPI has been sent to the cpus in old_domain, but not all CPUs have
responded to it yet.
In both cases we can assume that the next interrupt will arrive on the new
vector, so we can cleanup the old vectors on the cpus in the old_domain cpu
mask.
Fixes: 98229aa36caa "x86/irq: Plug vector cleanup race"
Reported-by: Harry Junior <harryjr@outlook.fr>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ben Hutchings <ben@decadent.org.uk>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603140931430.3657@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 98229aa36caa9c769b13565523de9b813013c703 upstream.
We still can end up with a stale vector due to the following:
CPU0 CPU1 CPU2
lock_vector()
data->move_in_progress=0
sendIPI()
unlock_vector()
set_affinity()
assign_irq_vector()
lock_vector() handle_IPI
move_in_progress = 1 lock_vector()
unlock_vector()
move_in_progress == 1
So we need to serialize the vector assignment against a pending cleanup. The
solution is rather simple now. We not only check for the move_in_progress flag
in assign_irq_vector(), we also check whether there is still a cleanup pending
in the old_domain cpumask. If so, we return -EBUSY to the caller and let him
deal with it. Though we have to be careful in the cpu unplug case. If the
cleanout has not yet completed then the following setaffinity() call would
return -EBUSY. Add code which prevents this.
Full context is here: http://lkml.kernel.org/r/5653B688.4050809@stratus.com
Reported-and-tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/20151231160107.207265407@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 90a2282e23f0522e4b3f797ad447c5e91bf7fe32 upstream.
First of all there is no point in looking up the irq descriptor again, but we
also need the descriptor for the final cleanup race fix in the next
patch. Make that change seperate. No functional difference.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/20151231160107.125211743@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 56d7d2f4bbd00fb198b7907cb3ab657d06115a42 upstream.
We want to synchronize new vector assignments with a pending cleanup. Remove a
dying cpu from a pending cleanup mask.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/20151231160107.045961667@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5da0c1217f05d2ccc9a8ed6e6e5c23a8a1d24dd6 upstream.
There is no need to allocate a new cpumask for sending the cleanup vector. The
old_domain mask is now protected by the vector_lock, so we can safely remove
the offline cpus from it and send the IPI with the resulting mask.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/20151231160106.967993932@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c1684f5035b60e9f98566493e869496fb5de1d89 upstream.
send_cleanup_vector() fiddles with the old_domain mask unprotected because it
relies on the protection by the move_in_progress flag. But this is fatal, as
the flag is reset after the IPI has been sent. So a cpu which receives the IPI
can still see the flag set and therefor ignores the cleanup request. If no
other cleanup request happens then the vector stays stale on that cpu and in
case of an irq removal the vector still persists. That can lead to use after
free when the next cleanup IPI happens.
Protect the code with vector_lock and clear move_in_progress before sending
the IPI.
This does not plug the race which Joe reported because:
CPU0 CPU1 CPU2
lock_vector()
data->move_in_progress=0
sendIPI()
unlock_vector()
set_affinity()
assign_irq_vector()
lock_vector() handle_IPI
move_in_progress = 1 lock_vector()
unlock_vector()
move_in_progress == 1
The full fix comes with a later patch.
Reported-and-tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/20151231160106.892412198@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 847667ef10356b824a11c853fc8a8b1b437b6a8d upstream.
No point of keeping offline cpus in the cleanup mask.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/20151231160106.808642683@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ab25ac02148b600e645f77cfb8b8ea415ed75bb4 upstream.
Reusing an existing vector and assigning a new vector has duplicated
code. Consolidate it.
This is also a preparatory patch for finally plugging the cleanup race.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/20151231160106.721599216@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9ac15b7a8af4cf3337a101498c0ed690d23ade75 upstream.
In the case that the new vector mask is a subset of the existing mask there is
no point to do a AND operation of currentmask & newmask. The result is
newmask. So we can simply copy the new mask to the current mask and be done
with it. Preparatory patch for further consolidation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/20151231160106.640253454@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3716fd27a604d61a91cda47083504971486b80f1 upstream.
__assign_irq_vector() uses the vector_cpumask which is assigned by
apic->vector_allocation_domain() without doing basic sanity checks. That can
result in a situation where the final assignement of a newly found vector
fails in apic->cpu_mask_to_apicid_and(). So we have to do rollbacks for no
reason.
apic->cpu_mask_to_apicid_and() only fails if
vector_cpumask & requested_cpumask & cpu_online_mask
is empty.
Check for this condition right away and if the result is empty try immediately
the next possible cpu in the requested mask. So in case of a failure the old
setting is unchanged and we can remove the rollback code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/20151231160106.561877324@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 95ffeb4b5baca266e1d0d2bc90f1513e6f419cdd upstream.
Split out the code which advances the target cpu for the search so we can
reuse it for the next patch which adds an early validation check for the
vectormask which we get from the apic.
Add comments while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/20151231160106.484562040@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 433cbd57d190a1cdd02f243df41c3d7f55ec4b94 upstream.
Use an explicit goto for the cases where we have success in the search/update
and return -ENOSPC if the search loop ends due to no space.
Preparatory patch for fixes. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/20151231160106.403491024@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8a580f70f6936ec095da217018cdeeb5835c0207 upstream.
Function __assign_irq_vector() makes use of apic_chip_data.old_domain as a
temporary buffer, which is in the way of using apic_chip_data.old_domain for
synchronizing the vector cleanup with the vector assignement code.
Use a proper temporary cpumask for this.
[ tglx: Renamed the mask to searched_cpumask for clarity ]
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/1450880014-11741-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 111abeba67e0dbdc26537429de9155e4f1d807d8 upstream.
There's a race condition between
x86_vector_free_irqs()
{
free_apic_chip_data(irq_data->chip_data);
xxxxx //irq_data->chip_data has been freed, but the pointer
//hasn't been reset yet
irq_domain_reset_irq_data(irq_data);
}
and
smp_irq_move_cleanup_interrupt()
{
raw_spin_lock(&vector_lock);
data = apic_chip_data(irq_desc_get_irq_data(desc));
access data->xxxx // may access freed memory
raw_spin_unlock(&desc->lock);
}
which may cause smp_irq_move_cleanup_interrupt() to access freed memory.
Call irq_domain_reset_irq_data(), which clears the pointer with vector lock
held.
[ tglx: Free memory outside of lock held region. ]
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/1450880014-11741-3-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e23b257c293ce4bcc8cabb2aa3097b6ed8a8261a upstream.
setup_ioapic_dest() calls irqchip->irq_set_affinity() completely
unprotected. That's wrong in several aspects:
- it opens a race window where irq_set_affinity() can be interrupted and the
irq chip left in unconsistent state.
- it triggers a lockdep splat when we fix the vector race for 4.3+ because
vector lock is taken with interrupts enabled.
The proper calling convention is irq descriptor lock held and interrupts
disabled.
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Joe Lawrence <joe.lawrence@stratus.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1601140919420.3575@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The MMCFG PCI accessors weren't being setup for NumacConnect2
correctly due to over-early assignment; this would create the
potential for the wrong PCI domain to be accessed.
Fix this by using the correct arch-specific PCI init function.
Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Steffen Persvold <sp@numascale.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1451498807-15920-1-git-send-email-daniel@numascale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Commit d32932d02e ("x86/irq: Convert IOAPIC to use hierarchical irqdomain
interfaces") brought a regression for Hyper-V Gen2 instances. These
instances don't have i8259 legacy PIC but they use legacy IRQs for serial
port, rtc, and acpi. With this commit included we end up with these IRQs
not initialized. Earlier, there was a special workaround for legacy IRQs
in mp_map_pin_to_irq() doing mp_irqdomain_map() without looking at
nr_legacy_irqs() and now we fail in __irq_domain_alloc_irqs() when
irq_domain_alloc_descs() returns -EEXIST.
The essence of the issue seems to be that early_irq_init() calls
arch_probe_nr_irqs() to figure out the number of legacy IRQs before
we probe for i8259 and gets 16. Later when init_8259A() is called we switch
to NULL legacy PIC and nr_legacy_irqs() starts to return 0 but we already
have 16 descs allocated.
Solve the issue by separating i8259 probe from init and calling it in
arch_probe_nr_irqs() before we actually use nr_legacy_irqs() information.
Fixes: d32932d02e ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1446543614-3621-1-git-send-email-vkuznets@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull x86 apic changes from Ingo Molnar:
"The main changes in this cycle were:
- Numachip updates: new hardware support, fixes and cleanups.
(Daniel J Blueman)
- misc smaller cleanups and fixlets"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/io_apic: Make eoi_ioapic_pin() static
x86/irq: Drop unlikely before IS_ERR_OR_NULL
x86/x2apic: Make stub functions available even if !CONFIG_X86_LOCAL_APIC
x86/apic: Deinline various functions
x86/numachip: Fix timer build conflict
x86/numachip: Introduce Numachip2 timer mechanisms
x86/numachip: Add Numachip IPI optimisations
x86/numachip: Add Numachip2 APIC support
x86/numachip: Cleanup Numachip support
Commit 4857c91f0d changed the way how irq affinity is setup in
setup_ioapic_dest() from using the core helper function to
unconditionally calling the irq_set_affinity() callback of the
underlying irq chip.
That results in a NULL pointer dereference for the rare case where the
underlying irq chip is lapic_chip which has no irq_set_affinity()
callback. lapic_chip is occasionally used for the timer interrupt (irq
0).
The fix is simple: Check the availability of the callback instead of
calling it unconditionally.
Fixes: 4857c91f0d "x86/ioapic: Force affinity setting in setup_ioapic_dest()"
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
A sporadic hang with consequent crash is observed when booting Hyper-V Gen1
guests:
Call Trace:
<IRQ>
[<ffffffff810ab68d>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff8107b616>] queue_work_on+0x46/0x90
[<ffffffff81365696>] ? add_interrupt_randomness+0x176/0x1d0
...
<EOI>
[<ffffffff81471ddb>] ? _raw_spin_unlock_irqrestore+0x3b/0x60
[<ffffffff810c295e>] __irq_put_desc_unlock+0x1e/0x40
[<ffffffff810c5c35>] irq_modify_status+0xb5/0xd0
[<ffffffff8104adbb>] mp_register_handler+0x4b/0x70
[<ffffffff8104c55a>] mp_irqdomain_alloc+0x1ea/0x2a0
[<ffffffff810c7f10>] irq_domain_alloc_irqs_recursive+0x40/0xa0
[<ffffffff810c860c>] __irq_domain_alloc_irqs+0x13c/0x2b0
[<ffffffff8104b070>] alloc_isa_irq_from_domain.isra.1+0xc0/0xe0
[<ffffffff8104bfa5>] mp_map_pin_to_irq+0x165/0x2d0
[<ffffffff8104c157>] pin_2_irq+0x47/0x80
[<ffffffff81744253>] setup_IO_APIC+0xfe/0x802
...
[<ffffffff814631c0>] ? rest_init+0x140/0x140
The issue is easily reproducible with a simple instrumentation: if
mdelay(10) is put between mp_setup_entry() and mp_register_handler() calls
in mp_irqdomain_alloc() Hyper-V guest always fails to boot when re-routing
IRQ0. The issue seems to be caused by the fact that we don't disable
interrupts while doing IOPIC programming for legacy IRQs and IRQ0 actually
happens.
Protect the setup sequence against concurrent interrupts.
[ tglx: Make the protection unconditional and not only for legacy
interrupts ]
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Link: http://lkml.kernel.org/r/1444930943-19336-1-git-send-email-vkuznets@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We have to define internally used function as static, otherwise the following
warning will be generated:
arch/x86/kernel/apic/io_apic.c:532:6: warning: no previous prototype for 'eoi_ioapic_pin' [-Wmissing-prototypes]
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
Link: http://lkml.kernel.org/r/1444400685-98611-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When sending IPIs, first check if the non-local part of the source and
destination APIC IDs match; if so, send via the local APIC for efficiency.
Secondly, since the AMD BIOS-kernel developer guide states IPI delivery
will occur invarient of prior deliver status, avoid polling the delivery
status bit for efficiency.
Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Steffen Persvold <sp@numascale.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: http://lkml.kernel.org/r/1442768522-19217-3-git-send-email-daniel@numascale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Introduce support for Numachip2 remote interrupts via detecting the right
ACPI SRAT signature.
Access is performed via a fixed mapping in the x86 physical address space.
Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Steffen Persvold <sp@numascale.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: http://lkml.kernel.org/r/1442768522-19217-2-git-send-email-daniel@numascale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Drop unused code and includes in Numachip header files and APIC driver.
Additionally, use the 'numachip1' prefix on Numachip1-specific functions;
this prepares for adding Numachip2 support in later patches.
Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Steffen Persvold <sp@numascale.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: http://lkml.kernel.org/r/1442768522-19217-1-git-send-email-daniel@numascale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull irq updates from Thomas Gleixner:
"This is a rather large update post rc1 due to the final steps of
cleanups and API changes which had to wait for the preparatory patches
to hit your tree.
- Regression fixes for ARM GIC irqchips
- Regression fixes and lockdep anotations for renesas irq chips
- The leftovers of the cleanup and preparatory patches which have
been ignored by maintainers
- Final conversions of the newly merged users of obsolete APIs
- Final removal of obsolete APIs
- Final removal of ARM artifacts which had been introduced during the
conversion of ARM to the generic interrupt code.
- Final split of the irq_data into chip specific and common data to
reflect the needs of hierarchical irq domains.
- Treewide removal of the first argument of interrupt flow handlers,
i.e. the irq number, which is not used by the majority of handlers
and simple to retrieve from the other argument the irq descriptor.
- A few comment updates and build warning fixes"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
arm64: Remove ununsed set_irq_flags
ARM: Remove ununsed set_irq_flags
sh: Kill off set_irq_flags usage
irqchip: Kill off set_irq_flags usage
gpu/drm: Kill off set_irq_flags usage
genirq: Remove irq argument from irq flow handlers
genirq: Move field 'msi_desc' from irq_data into irq_common_data
genirq: Move field 'affinity' from irq_data into irq_common_data
genirq: Move field 'handler_data' from irq_data into irq_common_data
genirq: Move field 'node' from irq_data into irq_common_data
irqchip/gic-v3: Use IRQD_FORWARDED_TO_VCPU flag
irqchip/gic: Use IRQD_FORWARDED_TO_VCPU flag
genirq: Provide IRQD_FORWARDED_TO_VCPU status flag
genirq: Simplify irq_data_to_desc()
genirq: Remove __irq_set_handler_locked()
pinctrl/pistachio: Use irq_set_handler_locked
gpio: vf610: Use irq_set_handler_locked
powerpc/mpc8xx: Use irq_set_handler_locked()
powerpc/ipic: Use irq_set_handler_locked()
powerpc/cpm2: Use irq_set_handler_locked()
...
Pull x86 fixes from Ingo Molnar:
- misc fixes all around the map
- block non-root vm86(old) if mmap_min_addr != 0
- two small debuggability improvements
- removal of obsolete paravirt op
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform: Fix Geode LX timekeeping in the generic x86 build
x86/apic: Serialize LVTT and TSC_DEADLINE writes
x86/ioapic: Force affinity setting in setup_ioapic_dest()
x86/paravirt: Remove the unused pv_time_ops::get_tsc_khz method
x86/ldt: Fix small LDT allocation for Xen
x86/vm86: Fix the misleading CONFIG_VM86 Kconfig help text
x86/cpu: Print family/model/stepping in hex
x86/vm86: Block non-root vm86(old) if mmap_min_addr != 0
x86/alternatives: Make optimize_nops() interrupt safe and synced
x86/mm/srat: Print non-volatile flag in SRAT
x86/cpufeatures: Enable cpuid for Intel SHA extensions
Irq affinity mask is per-irq instead of per irqchip, so move it into
struct irq_common_data.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Kevin Cernekee <cernekee@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/1433303281-27688-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The APIC LVTT register is MMIO mapped but the TSC_DEADLINE register is an
MSR. The write to the TSC_DEADLINE MSR is not serializing, so it's not
guaranteed that the write to LVTT has reached the APIC before the
TSC_DEADLINE MSR is written. In such a case the write to the MSR is
ignored and as a consequence the local timer interrupt never fires.
The SDM decribes this issue for xAPIC and x2APIC modes. The
serialization methods recommended by the SDM differ.
xAPIC:
"1. Memory-mapped write to LVT Timer Register, setting bits 18:17 to 10b.
2. WRMSR to the IA32_TSC_DEADLINE MSR a value much larger than current time-stamp counter.
3. If RDMSR of the IA32_TSC_DEADLINE MSR returns zero, go to step 2.
4. WRMSR to the IA32_TSC_DEADLINE MSR the desired deadline."
x2APIC:
"To allow for efficient access to the APIC registers in x2APIC mode,
the serializing semantics of WRMSR are relaxed when writing to the
APIC registers. Thus, system software should not use 'WRMSR to APIC
registers in x2APIC mode' as a serializing instruction. Read and write
accesses to the APIC registers will occur in program order. A WRMSR to
an APIC register may complete before all preceding stores are globally
visible; software can prevent this by inserting a serializing
instruction, an SFENCE, or an MFENCE before the WRMSR."
The xAPIC method is to just wait for the memory mapped write to hit
the LVTT by checking whether the MSR write has reached the hardware.
There is no reason why a proper MFENCE after the memory mapped write would
not do the same. Andi Kleen confirmed that MFENCE is sufficient for the
xAPIC case as well.
Issue MFENCE before writing to the TSC_DEADLINE MSR. This can be done
unconditionally as all CPUs which have TSC_DEADLINE also have MFENCE
support.
[ tglx: Massaged the changelog ]
Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: <Kernel-team@fb.com>
Cc: <lenb@kernel.org>
Cc: <fenghua.yu@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: stable@vger.kernel.org #v3.7+
Link: http://lkml.kernel.org/r/20150909041352.GA2059853@devbig257.prn2.facebook.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The recent ioapic cleanups changed the affinity setting in
setup_ioapic_dest() from a direct write to the hardware to the delayed
affinity setup via irq_set_affinity().
That results in a warning from chained_irq_exit():
WARNING: CPU: 0 PID: 5 at kernel/irq/migration.c:32 irq_move_masked_irq
[<ffffffff810a0a88>] irq_move_masked_irq+0xb8/0xc0
[<ffffffff8103c161>] ioapic_ack_level+0x111/0x130
[<ffffffff812bbfe8>] intel_gpio_irq_handler+0x148/0x1c0
The reason is that irq_set_affinity() does not write directly to the
hardware. It marks the affinity setting as pending and executes it
from the next interrupt. The chained handler infrastructure does not
take the irq descriptor lock for performance reasons because such a
chained interrupt is not visible to any interfaces. So the delayed
affinity setting triggers the warning in irq_move_masked_irq().
Restore the old behaviour by calling the set_affinity function of the
ioapic chip in setup_ioapic_dest(). This is safe as none of the
interrupts can be on the fly at this point.
Fixes: aa5cb97f14 'x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces'
Reported-and-tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: jarkko.nikula@linux.intel.com
Pull NMI backtrace update from Russell King:
"These changes convert the x86 NMI handling to be a library
implementation which other architectures can make use of. Thomas
Gleixner has reviewed and tested these changes, and wishes me to send
these rather than taking them through the tip tree.
The final patch in the set adds an initial implementation using this
infrastructure to ARM, even though it doesn't send the IPI at "NMI"
level. Patches are in progress to add the ARM equivalent of NMI, but
we still need the IRQ-level fallback for systems where the "NMI" isn't
available due to secure firmware denying access to it"
* 'nmi' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: add basic support for on-demand backtrace of other CPUs
nmi: x86: convert to generic nmi handler
nmi: create generic NMI backtrace implementation
Pull x86 apic updates from Thomas Gleixner:
"This udpate contains:
- rework the irq vector array to store a pointer to the irq
descriptor instead of the irq number to avoid a lookup of the irq
descriptor in the irq entry path
- lguest interrupt handling cleanups
- conversion of the local apic timer to the new clockevent callbacks
- preparatory changes for the irq argument removal of interrupt flow
handlers"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/irq: Do not dereference irq descriptor before checking it
tools/lguest: Clean up include dir
tools/lguest: Fix redefinition of struct virtio_pci_cfg_cap
x86/irq: Store irq descriptor in vector array
genirq: Provide irq_desc_has_action
x86/irq: Get rid of an indentation level
x86/irq: Rename VECTOR_UNDEFINED to VECTOR_UNUSED
x86/irq: Replace numeric constant
x86/irq: Protect smp_cleanup_move
x86/lguest: Do not setup unused irq vectors
x86/lguest: Clean up lguest_setup_irq
x86/apic: Drop local_irq_save/restore in timer callbacks
x86/apic: Migrate apic timer to new set_state interface
x86/irq: Use access helper irq_data_get_affinity_mask()
x86/irq: Use accessor irq_data_get_irq_handler_data()
x86/irq: Use accessor irq_data_get_node()
Pull x86 init code fixlet from Ingo Molnar:
"A single change: fix obsolete init code annotations"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Drop bogus __ref / __refdata annotations
Pull x86 boot updates from Ingo Molnar:
"The main x86 bootup related changes in this cycle were:
- more boot time optimizations. (Len Brown)
- implement hex output to allow the debugging of early bootup
parameters. (Kees Cook)
- remove obsolete MCA leftovers. (Paolo Pisati)"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/smpboot: Remove APIC.wait_for_init_deassert and atomic init_deasserted
x86/smpboot: Remove SIPI delays from cpu_up()
x86/smpboot: Remove udelay(100) when polling cpu_callin_map
x86/smpboot: Remove udelay(100) when polling cpu_initialized_map
x86/boot: Obsolete the MCA sys_desc_table
x86/boot: Add hex output for debugging
Pull x86 asm changes from Ingo Molnar:
"The biggest changes in this cycle were:
- Revamp, simplify (and in some cases fix) Time Stamp Counter (TSC)
primitives. (Andy Lutomirski)
- Add new, comprehensible entry and exit handlers written in C.
(Andy Lutomirski)
- vm86 mode cleanups and fixes. (Brian Gerst)
- 32-bit compat code cleanups. (Brian Gerst)
The amount of simplification in low level assembly code is already
palpable:
arch/x86/entry/entry_32.S | 130 +----
arch/x86/entry/entry_64.S | 197 ++-----
but more simplifications are planned.
There's also the usual laudry mix of low level changes - see the
changelog for details"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (83 commits)
x86/asm: Drop repeated macro of X86_EFLAGS_AC definition
x86/asm/msr: Make wrmsrl() a function
x86/asm/delay: Introduce an MWAITX-based delay with a configurable timer
x86/asm: Add MONITORX/MWAITX instruction support
x86/traps: Weaken context tracking entry assertions
x86/asm/tsc: Add rdtscll() merge helper
selftests/x86: Add syscall_nt selftest
selftests/x86: Disable sigreturn_64
x86/vdso: Emit a GNU hash
x86/entry: Remove do_notify_resume(), syscall_trace_leave(), and their TIF masks
x86/entry/32: Migrate to C exit path
x86/entry/32: Remove 32-bit syscall audit optimizations
x86/vm86: Rename vm86->v86flags and v86mask
x86/vm86: Rename vm86->vm86_info to user_vm86
x86/vm86: Clean up vm86.h includes
x86/vm86: Move the vm86 IRQ definitions to vm86.h
x86/vm86: Use the normal pt_regs area for vm86
x86/vm86: Eliminate 'struct kernel_vm86_struct'
x86/vm86: Move fields from 'struct kernel_vm86_struct' to 'struct vm86'
x86/vm86: Move vm86 fields out of 'thread_struct'
...