When auditing cmpxchg call sites, Chuck noted that gcc was optimizing
away some of the desired LDs.
| do {
| new = old = *ipi_data_ptr;
| new |= 1U << msg;
| } while (cmpxchg(ipi_data_ptr, old, new) != old);
was generating to below
| 8015cef8: ld r2,[r4,0] <-- First LD
| 8015cefc: bset r1,r2,r1
|
| 8015cf00: llock r3,[r4] <-- atomic op
| 8015cf04: brne r3,r2,8015cf10
| 8015cf08: scond r1,[r4]
| 8015cf0c: bnz 8015cf00
|
| 8015cf10: brne r3,r2,8015cf00 <-- Branch doesn't go to orig LD
Although this was fixed by adding a ACCESS_ONCE in this call site, it
seems safer (for now at least) to add compiler barrier to LLSC based
cmpxchg
Reported-by: Chuck Jordan <cjordan@synopsys,com>
Cc: <stable@vger.kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Pull x86 core updates from Ingo Molnar:
"There were so many changes in the x86/asm, x86/apic and x86/mm topics
in this cycle that the topical separation of -tip broke down somewhat -
so the result is a more traditional architecture pull request,
collected into the 'x86/core' topic.
The topics were still maintained separately as far as possible, so
bisectability and conceptual separation should still be pretty good -
but there were a handful of merge points to avoid excessive
dependencies (and conflicts) that would have been poorly tested in the
end.
The next cycle will hopefully be much more quiet (or at least will
have fewer dependencies).
The main changes in this cycle were:
* x86/apic changes, with related IRQ core changes: (Jiang Liu, Thomas
Gleixner)
- This is the second and most intrusive part of changes to the x86
interrupt handling - full conversion to hierarchical interrupt
domains:
[IOAPIC domain] -----
|
[MSI domain] --------[Remapping domain] ----- [ Vector domain ]
| (optional) |
[HPET MSI domain] ----- |
|
[DMAR domain] -----------------------------
|
[Legacy domain] -----------------------------
This now reflects the actual hardware and allowed us to distangle
the domain specific code from the underlying parent domain, which
can be optional in the case of interrupt remapping. It's a clear
separation of functionality and removes quite some duct tape
constructs which plugged the remap code between ioapic/msi/hpet
and the vector management.
- Intel IOMMU IRQ remapping enhancements, to allow direct interrupt
injection into guests (Feng Wu)
* x86/asm changes:
- Tons of cleanups and small speedups, micro-optimizations. This
is in preparation to move a good chunk of the low level entry
code from assembly to C code (Denys Vlasenko, Andy Lutomirski,
Brian Gerst)
- Moved all system entry related code to a new home under
arch/x86/entry/ (Ingo Molnar)
- Removal of the fragile and ugly CFI dwarf debuginfo annotations.
Conversion to C will reintroduce many of them - but meanwhile
they are only getting in the way, and the upstream kernel does
not rely on them (Ingo Molnar)
- NOP handling refinements. (Borislav Petkov)
* x86/mm changes:
- Big PAT and MTRR rework: making the code more robust and
preparing to phase out exposing direct MTRR interfaces to drivers -
in favor of using PAT driven interfaces (Toshi Kani, Luis R
Rodriguez, Borislav Petkov)
- New ioremap_wt()/set_memory_wt() interfaces to support
Write-Through cached memory mappings. This is especially
important for good performance on NVDIMM hardware (Toshi Kani)
* x86/ras changes:
- Add support for deferred errors on AMD (Aravind Gopalakrishnan)
This is an important RAS feature which adds hardware support for
poisoned data. That means roughly that the hardware marks data
which it has detected as corrupted but wasn't able to correct, as
poisoned data and raises an APIC interrupt to signal that in the
form of a deferred error. It is the OS's responsibility then to
take proper recovery action and thus prolonge system lifetime as
far as possible.
- Add support for Intel "Local MCE"s: upcoming CPUs will support
CPU-local MCE interrupts, as opposed to the traditional system-
wide broadcasted MCE interrupts (Ashok Raj)
- Misc cleanups (Borislav Petkov)
* x86/platform changes:
- Intel Atom SoC updates
... and lots of other cleanups, fixlets and other changes - see the
shortlog and the Git log for details"
* 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (222 commits)
x86/hpet: Use proper hpet device number for MSI allocation
x86/hpet: Check for irq==0 when allocating hpet MSI interrupts
x86/mm/pat, drivers/infiniband/ipath: Use arch_phys_wc_add() and require PAT disabled
x86/mm/pat, drivers/media/ivtv: Use arch_phys_wc_add() and require PAT disabled
x86/platform/intel/baytrail: Add comments about why we disabled HPET on Baytrail
genirq: Prevent crash in irq_move_irq()
genirq: Enhance irq_data_to_desc() to support hierarchy irqdomain
iommu, x86: Properly handle posted interrupts for IOMMU hotplug
iommu, x86: Provide irq_remapping_cap() interface
iommu, x86: Setup Posted-Interrupts capability for Intel iommu
iommu, x86: Add cap_pi_support() to detect VT-d PI capability
iommu, x86: Avoid migrating VT-d posted interrupts
iommu, x86: Save the mode (posted or remapped) of an IRTE
iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
iommu: dmar: Provide helper to copy shared irte fields
iommu: dmar: Extend struct irte for VT-d Posted-Interrupts
iommu: Add new member capability to struct irq_remap_ops
x86/asm/entry/64: Disentangle error_entry/exit gsbase/ebx/usermode code
x86/asm/entry/32: Shorten __audit_syscall_entry() args preparation
x86/asm/entry/32: Explain reloading of registers after __audit_syscall_entry()
...
This allows platforms to provide their own cpu wakeup routines
as well as IPI send / clear backends, while allowing a SMP kernel w/o
any such backend to build/boot
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Caveats about cache flush on ARCv2 based cores
- dcache is PIPT so paddr is sufficient for cache maintenance ops (no
need to setup PTAG reg
- icache is still VIPT but only aliasing configs need PTAG setup
So basically this is departure from MMU-v3 which always need vaddr in
line ops registers (DC_IVDL, DC_FLDL, IC_IVIL) but paddr in DC_PTAG,
IC_PTAG respectively.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
The issue was, on HS when interrupt is taken, IRQ_ACT is set and that is
NOT cleared unless we do RTIE (or manually clear it). Linux interrupt
handling has top and bottom halves. Latter lead to softirqs (which can
reschedule) AND expect interrupts to be REALLY re-enabled which was NOT
happening for us since we only SETI, dont clear IRQ_ACT
So we can have a state when both cores have taken interrupt (IRQ_ACT set),
get rescheduled, both send IPI and wait in CSD lock which will never be
cleared as cores can't take the pending IPI IRQ due to existing IRQ_ACT
set.
So local_irq_enable() now drops the IRQ_ACT.act bit to re-enable IRQs.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Reported by Anton as LTP:munmap01 failing with Illegal Instruction
Exception.
--------------------->8--------------------------------------
mmap2(NULL, 24576, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x200d2000
munmap(0x200d2000, 24576) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x200d2000}
---
potentially unexpected fatal signal 4.
Path: /munmap01
CPU: 0 PID: 61 Comm: munmap01 Not tainted 3.13.0-g5d5c46d9a556 #8
task: 9f1a8000 ti: 9f154000 task.ti: 9f154000
[ECR ]: 0x00020100 => Illegal Insn
[EFA ]: 0x0001354c
[BLINK ]: 0x200515d4
[ERET ]: 0x1354c
@off 0x1354c in [/munmap01]
VMA: 0x00010000 to 0x00018000
[STAT32]: 0x800802c0
...
--------------------->8--------------------------------------
The issue was
1. munmap01 accessed unmapped memory (on purpose) with signal handler
installed for SIGSEGV
2. The faulting instruction happened to be in Delay Slot
00011864 <main>:
11908: bl.d 13284 <tst_resm>
1190c: stb r16,[r2]
3. kernel sets up the reg file for signal handler and correctly clears
the DE bit in pt_regs->status32 placeholder
4. However RESTORE_CALLEE_SAVED_USER macro is not adjusted for ARCv2,
and it over-writes the above with orig/stale value of status32
5. After RTIE, userspace signal handler executes a non branch
instruction with DE bit set, triggering Illegal Instruction Exception.
Reported-by: Anton Kolesov <akolesov@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
These have been register compatible so far. However ARCv2 mandates
different pt_regs layout (due to h/w auto save). To keep pt_regs same
for both, we start by removing the assumption - used mainly for block
copies between the 2 structs in signal handling and ptrace
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Previously this macro was overloaded with stack switching, saving SP at right
slot in pt_regs, saving/setup of r25 and setting SP baseline to where
pt_regs->sp is saved (vs. bottom of pt_regs)
Now it only does SP switch, and leaves SP pointing to bottom of pt_regs.
r25 saving is no longer done here to allow for future reordering of
regfile in pt_regs w/o touching this macro
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Returning from pure kernel mode and exception mode use the same code
anyways. Remove one the duplicate blocks
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Elide the need to re-read ECR in Trap handler by ensuring that
EXCEPTION_PROLOGUE does that at the very end just before returning
to Trap handler
ARCv2 EXCEPTION_PROLOGUE already did that, so same for ARcompact and the
common trap handler adjusted to use cached ECR
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This fixes the possible link/relo errors, since restore_regs will be
provided by ISA code, but called from ARC common code.
The .L prefix reassures binutils that it will be in same compilation
unit.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
- Remove the ifdef'ery and write distinct versions for each mmu ver even
if there is some code duplication
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
That is because __after_dc_op() already reads it for status check, so it
is better anyways to use that "newer" value.
Also reduces the clutter in callers for passing from/to these routines.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
As DW Mobile Storage databook says it's required to use "Hold Register"
if card is enumerated in SDR12 or SDR25 modes.
It means we need to act in the same way as in Altera's Socfpga
implementation - set "use hold reg" bit in commad.
Note that for upstream proper solution would be to remove
dw_mci_pltfm_prepare_command() at all and set the bit right in
dw_mci_prepare_command() for all platforms.
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Earlycon calculates UART clock as "BASE_BAUD * 16". In case of ARC
"BASE_BAUD" is calculated dynamically in runtime, basically it is an
alias to arc_early_base_baud(), which in turn just does
"arc_base_baud/16".
8250 UART on AXS/SDP board uses 33.3MHz clock source which is set in
"arc_base_baud" with this change.
Additional compatibility string "snps,arc-sdp" is introduced as well
because there're different flavours of AXS boards but they all share the
same motherboard and so it's possible to re-use the same code for
motherbord even if CPU daughterboard changes.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
The AXS10x platforms consist of a mainboard with peripherals,
on which several daughter cards can be placed. The daughter cards
typically contain a CPU and memory.
Signed-off-by: Mischa Jonker <mjonker@synopsys.com>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Currently, it doesn't invoke the callback but continues to unwind
Also while at it - simplify the code a bit
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Directly return the result of perf_pmu_register() in
arc_pmu_device_probe() instead of assigning and returning variable ret.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
static arc_pmu in the arch/arc/kernel/perf_event.c is not initialized as
it's shadowed by a local variable of the same name in the
arc_pmu_device_probe.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Fixes: 03c94fcf95 "ARC: perf: make @arc_pmu static global"
CC: <stable@vger.kernel.org> # 4.1
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>