Commit graph

2892 commits

Author SHA1 Message Date
Thomas Tai
899b60535a sparc64: fix compile warning section mismatch in find_node()
[ Upstream commit 87a349f9cc0908bc0cfac0c9ece3179f650ae95a ]

A compile warning is introduced by a commit to fix the find_node().
This patch fix the compile warning by moving find_node() into __init
section. Because find_node() is only used by memblock_nid_range() which
is only used by a __init add_node_ranges(). find_node() and
memblock_nid_range() should also be inside __init section.

Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-10 19:07:25 +01:00
Thomas Tai
ed7b60db00 sparc64: Fix find_node warning if numa node cannot be found
[ Upstream commit 74a5ed5c4f692df2ff0a2313ea71e81243525519 ]

When booting up LDOM, find_node() warns that a physical address
doesn't match a NUMA node.

WARNING: CPU: 0 PID: 0 at arch/sparc/mm/init_64.c:835
find_node+0xf4/0x120 find_node: A physical address doesn't
match a NUMA node rule. Some physical memory will be
owned by node 0.Modules linked in:

CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.0-rc3 #4
Call Trace:
 [0000000000468ba0] __warn+0xc0/0xe0
 [0000000000468c74] warn_slowpath_fmt+0x34/0x60
 [00000000004592f4] find_node+0xf4/0x120
 [0000000000dd0774] add_node_ranges+0x38/0xe4
 [0000000000dd0b1c] numa_parse_mdesc+0x268/0x2e4
 [0000000000dd0e9c] bootmem_init+0xb8/0x160
 [0000000000dd174c] paging_init+0x808/0x8fc
 [0000000000dcb0d0] setup_arch+0x2c8/0x2f0
 [0000000000dc68a0] start_kernel+0x48/0x424
 [0000000000dcb374] start_early_boot+0x27c/0x28c
 [0000000000a32c08] tlb_fixup_done+0x4c/0x64
 [0000000000027f08] 0x27f08

It is because linux use an internal structure node_masks[] to
keep the best memory latency node only. However, LDOM mdesc can
contain single latency-group with multiple memory latency nodes.

If the address doesn't match the best latency node within
node_masks[], it should check for an alternative via mdesc.
The warning message should only be printed if the address
doesn't match any node_masks[] nor within mdesc. To minimize
the impact of searching mdesc every time, the last matched
mask and index is stored in a variable.

Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
Reviewed-by: Chris Hyser <chris.hyser@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-10 19:07:25 +01:00
Andreas Larsson
438e91da24 sparc32: Fix inverted invalid_frame_pointer checks on sigreturns
[ Upstream commit 07b5ab3f71d318e52c18cc3b73c1d44c908aacfa ]

Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-10 19:07:25 +01:00
David S. Miller
b4bbdcef7d sparc64: Delete now unused user copy fixup functions.
[ Upstream commit 0fd0ff01d4c3c01e7fe69b762ee1a13236639acc ]

Now that all of the user copy routines are converted to return
accurate residual lengths when an exception occurs, we no longer need
the broken fixup routines.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:42 +01:00
David S. Miller
cb85910b0d sparc64: Delete now unused user copy assembler helpers.
[ Upstream commit 614da3d9685b67917cab48c8452fd8bf93de0867 ]

All of __ret{,l}_mone{_asi,_fp,_asi_fpu} are now unused.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:42 +01:00
David S. Miller
1c7e17b1c4 sparc64: Convert U3copy_{from,to}_user to accurate exception reporting.
[ Upstream commit ee841d0aff649164080e445e84885015958d8ff4 ]

Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:42 +01:00
David S. Miller
7181969338 sparc64: Convert NG2copy_{from,to}_user to accurate exception reporting.
[ Upstream commit e93704e4464fdc191f73fce35129c18de2ebf95d ]

Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:42 +01:00
David S. Miller
bfc8be6593 sparc64: Convert NGcopy_{from,to}_user to accurate exception reporting.
[ Upstream commit 7ae3aaf53f1695877ccd5ebbc49ea65991e41f1e ]

Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
David S. Miller
dc3a7a7d2c sparc64: Convert NG4copy_{from,to}_user to accurate exception reporting.
[ Upstream commit 95707704800988093a9b9a27e0f2f67f5b4bf2fa ]

Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
David S. Miller
1731d90d8a sparc64: Convert U1copy_{from,to}_user to accurate exception reporting.
[ Upstream commit cb736fdbb208eb3420f1a2eb2bfc024a6e9dcada ]

Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
David S. Miller
8a444c770f sparc64: Convert GENcopy_{from,to}_user to accurate exception reporting.
[ Upstream commit d0796b555ba60c22eb41ae39a8362156cb08eee9 ]

Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
David S. Miller
c718e917b3 sparc64: Convert copy_in_user to accurate exception reporting.
[ Upstream commit 0096ac9f47b1a2e851b3165d44065d18e5f13d58 ]

Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
David S. Miller
dd8a78b2b6 sparc64: Prepare to move to more saner user copy exception handling.
[ Upstream commit 83a17d2661674d8c198adc0e183418f72aabab79 ]

The fixup helper function mechanism for handling user copy fault
handling is not %100 accurrate, and can never be made so.

We are going to transition the code to return the running return
return length, which is always kept track in one or more registers
of each of these routines.

In order to convert them one by one, we have to allow the existing
behavior to continue functioning.

Therefore make all the copy code that wants the fixup helper to be
used return negative one.

After all of the user copy routines have been converted, this logic
and the fixup helpers themselves can be removed completely.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
David S. Miller
756723ad55 sparc64: Delete __ret_efault.
[ Upstream commit aa95ce361ed95c72ac42dcb315166bce5cf1a014 ]

It is completely unused.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
David S. Miller
f5a69ff748 sparc64: Handle extremely large kernel TLB range flushes more gracefully.
[ Upstream commit a74ad5e660a9ee1d071665e7e8ad822784a2dc7f ]

When the vmalloc area gets fragmented, and because the firmware
mapping area sits between where modules live and the vmalloc area, we
can sometimes receive requests for enormous kernel TLB range flushes.

When this happens the cpu just spins flushing billions of pages and
this triggers the NMI watchdog and other problems.

We took care of this on the TSB side by doing a linear scan of the
table once we pass a certain threshold.

Do something similar for the TLB flush, however we are limited by
the TLB flush facilities provided by the different chip variants.

First of all we use an (mostly arbitrary) cut-off of 256K which is
about 32 pages.  This can be tuned in the future.

The huge range code path for each chip works as follows:

1) On spitfire we flush all non-locked TLB entries using diagnostic
   acceses.

2) On cheetah we use the "flush all" TLB flush.

3) On sun4v/hypervisor we do a TLB context flush on context 0, which
   unlike previous chips does not remove "permanent" or locked
   entries.

We could probably do something better on spitfire, such as limiting
the flush to kernel TLB entries or even doing range comparisons.
However that probably isn't worth it since those chips are old and
the TLB only had 64 entries.

Reported-by: James Clarke <jrtc27@jrtc27.com>
Tested-by: James Clarke <jrtc27@jrtc27.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
David S. Miller
d36a1ac49d sparc64: Fix illegal relative branches in hypervisor patched TLB cross-call code.
[ Upstream commit a236441bb69723032db94128761a469030c3fe6d ]

Just like the non-cross-call TLB flush handlers, the cross-call ones need
to avoid doing PC-relative branches outside of their code blocks.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
David S. Miller
5d8eb95476 sparc64: Fix instruction count in comment for __hypervisor_flush_tlb_pending.
[ Upstream commit 830cda3f9855ff092b0e9610346d110846fc497c ]

Noticed by James Clarke.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
David S. Miller
217f829ae9 sparc64: Fix illegal relative branches in hypervisor patched TLB code.
[ Upstream commit b429ae4d5b565a71dfffd759dfcd4f6c093ced94 ]

When we copy code over to patch another piece of code, we can only use
PC-relative branches that target code within that piece of code.

Such PC-relative branches cannot be made to external symbols because
the patch moves the location of the code and thus modifies the
relative address of external symbols.

Use an absolute jmpl to fix this problem.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
David S. Miller
2ba06323db sparc64: Handle extremely large kernel TSB range flushes sanely.
[ Upstream commit 849c498766060a16aad5b0e0d03206726e7d2fa4 ]

If the number of pages we are flushing is more than twice the number
of entries in the TSB, just scan the TSB table for matches rather
than probing each and every page in the range.

Based upon a patch and report by James Clarke.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
James Clarke
7593180073 sparc: Handle negative offsets in arch_jump_label_transform
[ Upstream commit 9d9fa230206a3aea6ef451646c97122f04777983 ]

Additionally, if the offset will overflow the immediate for a ba,pt
instruction, fall back on a standard ba to get an extra 3 bits.

Signed-off-by: James Clarke <jrtc27@jrtc27.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
Mike Kravetz
8fd11efa21 sparc64 mm: Fix base TSB sizing when hugetlb pages are used
[ Upstream commit af1b1a9b36b8f9d583d4b4f90dd8946ed0cd4bd0 ]

do_sparc64_fault() calculates both the base and huge page RSS sizes and
uses this information in calls to tsb_grow().  The calculation for base
page TSB size is not correct if the task uses hugetlb pages.  hugetlb
pages are not accounted for in RSS, therefore the call to get_mm_rss(mm)
does not include hugetlb pages.  However, the number of pages based on
huge_pte_count (which does include hugetlb pages) is subtracted from
this value.  This will result in an artificially small and often negative
RSS calculation.  The base TSB size is then often set to max_tsb_size
as the passed RSS is unsigned, so a negative value looks really big.

THP pages are also accounted for in huge_pte_count, and THP pages are
accounted for in RSS so the calculation in do_sparc64_fault() is correct
if a task only uses THP pages.

A single huge_pte_count is not sufficient for TSB sizing if both hugetlb
and THP pages can be used.  Instead of a single counter, use two:  one
for hugetlb and one for THP.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:40 +01:00
David S. Miller
a395f7a66e sparc: Don't leak context bits into thread->fault_address
[ Upstream commit 4f6deb8cbab532a8d7250bc09234c1795ecb5e2c ]

On pre-Niagara systems, we fetch the fault address on data TLB
exceptions from the TLB_TAG_ACCESS register.  But this register also
contains the context ID assosciated with the fault in the low 13 bits
of the register value.

This propagates into current_thread_info()->fault_address and can
cause trouble later on.

So clear the low 13-bits out of the TLB_TAG_ACCESS value in the cases
where it matters.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:40 +01:00
Al Viro
6de81788b4 sparc32: fix copy_from_user()
commit 917400cecb4b52b5cde5417348322bb9c8272fa6 upstream.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-24 10:07:45 +02:00
David S. Miller
561e4453dd sparc64: Fix return from trap window fill crashes.
[ Upstream commit 7cafc0b8bf130f038b0ec2dcdd6a9de6dc59b65a ]

We must handle data access exception as well as memory address unaligned
exceptions from return from trap window fill faults, not just normal
TLB misses.

Otherwise we can get an OOPS that looks like this:

ld-linux.so.2(36808): Kernel bad sw trap 5 [#1]
CPU: 1 PID: 36808 Comm: ld-linux.so.2 Not tainted 4.6.0 #34
task: fff8000303be5c60 ti: fff8000301344000 task.ti: fff8000301344000
TSTATE: 0000004410001601 TPC: 0000000000a1a784 TNPC: 0000000000a1a788 Y: 00000002    Not tainted
TPC: <do_sparc64_fault+0x5c4/0x700>
g0: fff8000024fc8248 g1: 0000000000db04dc g2: 0000000000000000 g3: 0000000000000001
g4: fff8000303be5c60 g5: fff800030e672000 g6: fff8000301344000 g7: 0000000000000001
o0: 0000000000b95ee8 o1: 000000000000012b o2: 0000000000000000 o3: 0000000200b9b358
o4: 0000000000000000 o5: fff8000301344040 sp: fff80003013475c1 ret_pc: 0000000000a1a77c
RPC: <do_sparc64_fault+0x5bc/0x700>
l0: 00000000000007ff l1: 0000000000000000 l2: 000000000000005f l3: 0000000000000000
l4: fff8000301347e98 l5: fff8000024ff3060 l6: 0000000000000000 l7: 0000000000000000
i0: fff8000301347f60 i1: 0000000000102400 i2: 0000000000000000 i3: 0000000000000000
i4: 0000000000000000 i5: 0000000000000000 i6: fff80003013476a1 i7: 0000000000404d4c
I7: <user_rtt_fill_fixup+0x6c/0x7c>
Call Trace:
 [0000000000404d4c] user_rtt_fill_fixup+0x6c/0x7c

The window trap handlers are slightly clever, the trap table entries for them are
composed of two pieces of code.  First comes the code that actually performs
the window fill or spill trap handling, and then there are three instructions at
the end which are for exception processing.

The userland register window fill handler is:

	add	%sp, STACK_BIAS + 0x00, %g1;		\
	ldxa	[%g1 + %g0] ASI, %l0;			\
	mov	0x08, %g2;				\
	mov	0x10, %g3;				\
	ldxa	[%g1 + %g2] ASI, %l1;			\
	mov	0x18, %g5;				\
	ldxa	[%g1 + %g3] ASI, %l2;			\
	ldxa	[%g1 + %g5] ASI, %l3;			\
	add	%g1, 0x20, %g1;				\
	ldxa	[%g1 + %g0] ASI, %l4;			\
	ldxa	[%g1 + %g2] ASI, %l5;			\
	ldxa	[%g1 + %g3] ASI, %l6;			\
	ldxa	[%g1 + %g5] ASI, %l7;			\
	add	%g1, 0x20, %g1;				\
	ldxa	[%g1 + %g0] ASI, %i0;			\
	ldxa	[%g1 + %g2] ASI, %i1;			\
	ldxa	[%g1 + %g3] ASI, %i2;			\
	ldxa	[%g1 + %g5] ASI, %i3;			\
	add	%g1, 0x20, %g1;				\
	ldxa	[%g1 + %g0] ASI, %i4;			\
	ldxa	[%g1 + %g2] ASI, %i5;			\
	ldxa	[%g1 + %g3] ASI, %i6;			\
	ldxa	[%g1 + %g5] ASI, %i7;			\
	restored;					\
	retry; nop; nop; nop; nop;			\
	b,a,pt	%xcc, fill_fixup_dax;			\
	b,a,pt	%xcc, fill_fixup_mna;			\
	b,a,pt	%xcc, fill_fixup;

And the way this works is that if any of those memory accesses
generate an exception, the exception handler can revector to one of
those final three branch instructions depending upon which kind of
exception the memory access took.  In this way, the fault handler
doesn't have to know if it was a spill or a fill that it's handling
the fault for.  It just always branches to the last instruction in
the parent trap's handler.

For example, for a regular fault, the code goes:

winfix_trampoline:
	rdpr	%tpc, %g3
	or	%g3, 0x7c, %g3
	wrpr	%g3, %tnpc
	done

All window trap handlers are 0x80 aligned, so if we "or" 0x7c into the
trap time program counter, we'll get that final instruction in the
trap handler.

On return from trap, we have to pull the register window in but we do
this by hand instead of just executing a "restore" instruction for
several reasons.  The largest being that from Niagara and onward we
simply don't have enough levels in the trap stack to fully resolve all
possible exception cases of a window fault when we are already at
trap level 1 (which we enter to get ready to return from the original
trap).

This is executed inline via the FILL_*_RTRAP handlers.  rtrap_64.S's
code branches directly to these to do the window fill by hand if
necessary.  Now if you look at them, we'll see at the end:

	    ba,a,pt    %xcc, user_rtt_fill_fixup;
	    ba,a,pt    %xcc, user_rtt_fill_fixup;
	    ba,a,pt    %xcc, user_rtt_fill_fixup;

And oops, all three cases are handled like a fault.

This doesn't work because each of these trap types (data access
exception, memory address unaligned, and faults) store their auxiliary
info in different registers to pass on to the C handler which does the
real work.

So in the case where the stack was unaligned, the unaligned trap
handler sets up the arg registers one way, and then we branched to
the fault handler which expects them setup another way.

So the FAULT_TYPE_* value ends up basically being garbage, and
randomly would generate the backtrace seen above.

Reported-by: Nick Alcock <nix@esperi.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24 10:18:21 -07:00
David S. Miller
1fda90c39d sparc: Harden signal return frame checks.
[ Upstream commit d11c2a0de2824395656cf8ed15811580c9dd38aa ]

All signal frames must be at least 16-byte aligned, because that is
the alignment we explicitly create when we build signal return stack
frames.

All stack pointers must be at least 8-byte aligned.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24 10:18:21 -07:00
David S. Miller
6bb3290ce9 sparc64: Take ctx_alloc_lock properly in hugetlb_setup().
[ Upstream commit 9ea46abe22550e3366ff7cee2f8391b35b12f730 ]

On cheetahplus chips we take the ctx_alloc_lock in order to
modify the TLB lookup parameters for the indexed TLBs, which
are stored in the context register.

This is called with interrupts disabled, however ctx_alloc_lock
is an IRQ safe lock, therefore we must take acquire/release it
properly with spin_{lock,unlock}_irq().

Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24 10:18:21 -07:00
Nitin Gupta
87575e31be sparc64: Reduce TLB flushes during hugepte changes
[ Upstream commit 24e49ee3d76b70853a96520e46b8837e5eae65b2 ]

During hugepage map/unmap, TSB and TLB flushes are currently
issued at every PAGE_SIZE'd boundary which is unnecessary.
We now issue the flush at REAL_HPAGE_SIZE boundaries only.

Without this patch workloads which unmap a large hugepage
backed VMA region get CPU lockups due to excessive TLB
flush calls.

Orabug: 22365539, 22643230, 22995196

Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24 10:18:21 -07:00
Babu Moger
ccd02310db sparc/PCI: Fix for panic while enabling SR-IOV
[ Upstream commit d0c31e02005764dae0aab130a57e9794d06b824d ]

We noticed this panic while enabling SR-IOV in sparc.

mlx4_core: Mellanox ConnectX core driver v2.2-1 (Jan  1 2015)
mlx4_core: Initializing 0007:01:00.0
mlx4_core 0007:01:00.0: Enabling SR-IOV with 5 VFs
mlx4_core: Initializing 0007:01:00.1
Unable to handle kernel NULL pointer dereference
insmod(10010): Oops [#1]
CPU: 391 PID: 10010 Comm: insmod Not tainted
		4.1.12-32.el6uek.kdump2.sparc64 #1
TPC: <dma_supported+0x20/0x80>
I7: <__mlx4_init_one+0x324/0x500 [mlx4_core]>
Call Trace:
 [00000000104c5ea4] __mlx4_init_one+0x324/0x500 [mlx4_core]
 [00000000104c613c] mlx4_init_one+0xbc/0x120 [mlx4_core]
 [0000000000725f14] local_pci_probe+0x34/0xa0
 [0000000000726028] pci_call_probe+0xa8/0xe0
 [0000000000726310] pci_device_probe+0x50/0x80
 [000000000079f700] really_probe+0x140/0x420
 [000000000079fa24] driver_probe_device+0x44/0xa0
 [000000000079fb5c] __device_attach+0x3c/0x60
 [000000000079d85c] bus_for_each_drv+0x5c/0xa0
 [000000000079f588] device_attach+0x88/0xc0
 [000000000071acd0] pci_bus_add_device+0x30/0x80
 [0000000000736090] virtfn_add.clone.1+0x210/0x360
 [00000000007364a4] sriov_enable+0x2c4/0x520
 [000000000073672c] pci_enable_sriov+0x2c/0x40
 [00000000104c2d58] mlx4_enable_sriov+0xf8/0x180 [mlx4_core]
 [00000000104c49ac] mlx4_load_one+0x42c/0xd40 [mlx4_core]
Disabling lock debugging due to kernel taint
Caller[00000000104c5ea4]: __mlx4_init_one+0x324/0x500 [mlx4_core]
Caller[00000000104c613c]: mlx4_init_one+0xbc/0x120 [mlx4_core]
Caller[0000000000725f14]: local_pci_probe+0x34/0xa0
Caller[0000000000726028]: pci_call_probe+0xa8/0xe0
Caller[0000000000726310]: pci_device_probe+0x50/0x80
Caller[000000000079f700]: really_probe+0x140/0x420
Caller[000000000079fa24]: driver_probe_device+0x44/0xa0
Caller[000000000079fb5c]: __device_attach+0x3c/0x60
Caller[000000000079d85c]: bus_for_each_drv+0x5c/0xa0
Caller[000000000079f588]: device_attach+0x88/0xc0
Caller[000000000071acd0]: pci_bus_add_device+0x30/0x80
Caller[0000000000736090]: virtfn_add.clone.1+0x210/0x360
Caller[00000000007364a4]: sriov_enable+0x2c4/0x520
Caller[000000000073672c]: pci_enable_sriov+0x2c/0x40
Caller[00000000104c2d58]: mlx4_enable_sriov+0xf8/0x180 [mlx4_core]
Caller[00000000104c49ac]: mlx4_load_one+0x42c/0xd40 [mlx4_core]
Caller[00000000104c5f90]: __mlx4_init_one+0x410/0x500 [mlx4_core]
Caller[00000000104c613c]: mlx4_init_one+0xbc/0x120 [mlx4_core]
Caller[0000000000725f14]: local_pci_probe+0x34/0xa0
Caller[0000000000726028]: pci_call_probe+0xa8/0xe0
Caller[0000000000726310]: pci_device_probe+0x50/0x80
Caller[000000000079f700]: really_probe+0x140/0x420
Caller[000000000079fa24]: driver_probe_device+0x44/0xa0
Caller[000000000079fb08]: __driver_attach+0x88/0xa0
Caller[000000000079d90c]: bus_for_each_dev+0x6c/0xa0
Caller[000000000079f29c]: driver_attach+0x1c/0x40
Caller[000000000079e35c]: bus_add_driver+0x17c/0x220
Caller[00000000007a02d4]: driver_register+0x74/0x120
Caller[00000000007263fc]: __pci_register_driver+0x3c/0x60
Caller[00000000104f62bc]: mlx4_init+0x60/0xcc [mlx4_core]
Kernel panic - not syncing: Fatal exception
Press Stop-A (L1-A) to return to the boot prom
---[ end Kernel panic - not syncing: Fatal exception

Details:
Here is the call sequence
virtfn_add->__mlx4_init_one->dma_set_mask->dma_supported

The panic happened at line 760(file arch/sparc/kernel/iommu.c)

758 int dma_supported(struct device *dev, u64 device_mask)
759 {
760         struct iommu *iommu = dev->archdata.iommu;
761         u64 dma_addr_mask = iommu->dma_addr_mask;
762
763         if (device_mask >= (1UL << 32UL))
764                 return 0;
765
766         if ((device_mask & dma_addr_mask) == dma_addr_mask)
767                 return 1;
768
769 #ifdef CONFIG_PCI
770         if (dev_is_pci(dev))
771		return pci64_dma_supported(to_pci_dev(dev), device_mask);
772 #endif
773
774         return 0;
775 }
776 EXPORT_SYMBOL(dma_supported);

Same panic happened with Intel ixgbe driver also.

SR-IOV code looks for arch specific data while enabling
VFs. When VF device is added, driver probe function makes set
of calls to initialize the pci device. Because the VF device is
added different way than the normal PF device(which happens via
of_create_pci_dev for sparc), some of the arch specific initialization
does not happen for VF device.  That causes panic when archdata is
accessed.

To fix this, I have used already defined weak function
pcibios_setup_device to copy archdata from PF to VF.
Also verified the fix.

Signed-off-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24 10:18:21 -07:00
David S. Miller
b120609082 sparc64: Fix sparc64_set_context stack handling.
[ Upstream commit 397d1533b6cce0ccb5379542e2e6d079f6936c46 ]

Like a signal return, we should use synchronize_user_stack() rather
than flush_user_windows().

Reported-by: Ilya Malakhov <ilmalakhovthefirst@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24 10:18:21 -07:00
Nitin Gupta
4185bd68ef sparc64: Fix numa node distance initialization
[ Upstream commit 36beca6571c941b28b0798667608239731f9bc3a ]

Orabug: 22495713

Currently, NUMA node distance matrix is initialized only
when a machine descriptor (MD) exists. However, sun4u
machines (e.g. Sun Blade 2500) do not have an MD and thus
distance values were left uninitialized. The initialization
is now moved such that it happens on both sun4u and sun4v.

Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Tested-by: Mikael Pettersson <mikpelinux@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24 10:18:21 -07:00
David S. Miller
e9c74337a7 sparc64: Fix bootup regressions on some Kconfig combinations.
[ Upstream commit 49fa5230462f9f2c4e97c81356473a6bdf06c422 ]

The system call tracing bug fix mentioned in the Fixes tag
below increased the amount of assembler code in the sequence
of assembler files included by head_64.S

This caused to total set of code to exceed 0x4000 bytes in
size, which overflows the expression in head_64.S that works
to place swapper_tsb at address 0x408000.

When this is violated, the TSB is not properly aligned, and
also the trap table is not aligned properly either.  All of
this together results in failed boots.

So, do two things:

1) Simplify some code by using ba,a instead of ba/nop to get
   those bytes back.

2) Add a linker script assertion to make sure that if this
   happens again the build will fail.

Fixes: 1a40b95374f6 ("sparc: Fix system call tracing register handling.")
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Joerg Abraham <joerg.abraham@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24 10:18:21 -07:00
Mike Frysinger
c9bc125c92 sparc: Fix system call tracing register handling.
[ Upstream commit 1a40b95374f680625318ab61d81958e949e0afe3 ]

A system call trace trigger on entry allows the tracing
process to inspect and potentially change the traced
process's registers.

Account for that by reloading the %g1 (syscall number)
and %i0-%i5 (syscall argument) values.  We need to be
careful to revalidate the range of %g1, and reload the
system call table entry it corresponds to into %l7.

Reported-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24 10:18:21 -07:00
Dmitry V. Levin
a0b1c2d0f5 sparc64: fix incorrect sign extension in sys_sparc64_personality
commit 525fd5a94e1be0776fa652df5c687697db508c91 upstream.

The value returned by sys_personality has type "long int".
It is saved to a variable of type "int", which is not a problem
yet because the type of task_struct->pesonality is "unsigned int".
The problem is the sign extension from "int" to "long int"
that happens on return from sys_sparc64_personality.

For example, a userspace call personality((unsigned) -EINVAL) will
result to any subsequent personality call, including absolutely
harmless read-only personality(0xffffffff) call, failing with
errno set to EINVAL.

Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03 15:07:17 -08:00
Rabin Vincent
55795ef546 net: filter: make JITs zero A for SKF_AD_ALU_XOR_X
The SKF_AD_ALU_XOR_X ancillary is not like the other ancillary data
instructions since it XORs A with X while all the others replace A with
some loaded value.  All the BPF JITs fail to clear A if this is used as
the first instruction in a filter.  This was found using american fuzzy
lop.

Add a helper to determine if A needs to be cleared given the first
instruction in a filter, and use this in the JITs.  Except for ARM, the
rest have only been compile-tested.

Fixes: 3480593131 ("net: filter: get rid of BPF_S_* enum")
Signed-off-by: Rabin Vincent <rabin@rab.in>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06 00:43:52 -05:00
David S. Miller
42d85c52f8 sparc: Wire up mlock2 system call.
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-31 15:38:56 -05:00
David S. Miller
8b30ca73b7 sparc: Add all necessary direct socket system calls.
The GLIBC folks would like to eliminate socketcall support
eventually, and this makes sense regardless so wire them
all up.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-31 15:18:02 -05:00
Rob Gardner
a7c5724b5c sparc64: fix FP corruption in user copy functions
Short story: Exception handlers used by some copy_to_user() and
copy_from_user() functions do not diligently clean up floating point
register usage, and this can result in a user process seeing invalid
values in floating point registers. This sometimes makes the process
fail.

Long story: Several cpu-specific (NG4, NG2, U1, U3) memcpy functions
use floating point registers and VIS alignaddr/faligndata to
accelerate data copying when source and dest addresses don't align
well. Linux uses a lazy scheme for saving floating point registers; It
is not done upon entering the kernel since it's a very expensive
operation. Rather, it is done only when needed. If the kernel ends up
not using FP regs during the course of some trap or system call, then
it can return to user space without saving or restoring them.

The various memcpy functions begin their FP code with VISEntry (or a
variation thereof), which saves the FP regs. They conclude their FP
code with VISExit (or a variation) which essentially marks the FP regs
"clean", ie, they contain no unsaved values. fprs.FPRS_FEF is turned
off so that a lazy restore will be triggered when/if the user process
accesses floating point regs again.

The bug is that the user copy variants of memcpy, copy_from_user() and
copy_to_user(), employ an exception handling mechanism to detect faults
when accessing user space addresses, and when this handler is invoked,
an immediate return from the function is forced, and VISExit is not
executed, thus leaving the fprs register in an indeterminate state,
but often with fprs.FPRS_FEF set and one or more dirty bits. This
results in a return to user space with invalid values in the FP regs,
and since fprs.FPRS_FEF is on, no lazy restore occurs.

This bug affects copy_to_user() and copy_from_user() for NG4, NG2,
U3, and U1. All are fixed by using a new exception handler for those
loads and stores that are done during the time between VISEnter and
VISExit.

n.b. In NG4memcpy, the problematic code can be triggered by a copy
size greater than 128 bytes and an unaligned source address.  This bug
is known to be the cause of random user process memory corruptions
while perf is running with the callgraph option (ie, perf record -g).
This occurs because perf uses copy_from_user() to read user stacks,
and may fault when it follows a stack frame pointer off to an
invalid page. Validation checks on the stack address just obscure
the underlying problem.

Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-24 12:13:18 -05:00
Rob Gardner
833526941f sparc64: Perf should save/restore fault info
There have been several reports of random processes being killed with
a bus error or segfault during userspace stack walking in perf.  One
of the root causes of this problem is an asynchronous modification to
thread_info fault_address and fault_code, which stems from a perf
counter interrupt arriving during kernel processing of a "benign"
fault, such as a TSB miss. Since perf_callchain_user() invokes
copy_from_user() to read user stacks, a fault is not only possible,
but probable. Validity checks on the stack address merely cover up the
problem and reduce its frequency.

The solution here is to save and restore fault_address and fault_code
in perf_callchain_user() so that the benign fault handler is not
disturbed by a perf interrupt.

Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-24 12:12:46 -05:00
Rob Gardner
3f74306ac8 sparc64: Ensure perf can access user stacks
When an interrupt (such as a perf counter interrupt) is delivered
while executing in user space, the trap entry code puts ASI_AIUS in
%asi so that copy_from_user() and copy_to_user() will access the
correct memory. But if a perf counter interrupt is delivered while the
cpu is already executing in kernel space, then the trap entry code
will put ASI_P in %asi, and this will prevent copy_from_user() from
reading any useful stack data in either of the perf_callchain_user_X
functions, and thus no user callgraph data will be collected for this
sample period. An additional problem is that a fault is guaranteed
to occur, and though it will be silently covered up, it wastes time
and could perturb state.

In perf_callchain_user(), we ensure that %asi contains ASI_AIUS
because we know for a fact that the subsequent calls to
copy_from_user() are intended to read the user's stack.

[ Use get_fs()/set_fs() -DaveM ]

Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-24 12:10:29 -05:00
Rob Gardner
1ca04a4ce0 sparc64: Don't set %pil in rtrap_nmi too early
Commit 28a1f53 delays setting %pil to avoid potential
hardirq stack overflow in the common rtrap_irq path.
Setting %pil also needs to be delayed in the rtrap_nmi
path for the same reason.

Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-24 12:07:16 -05:00
Khalid Aziz
82924e542f sparc64: Add ADI capability to cpu capabilities
Add ADI (Application Data Integrity) capability to cpu capabilities list.
ADI capability allows virtual addresses to be encoded with a tag in
bits 63-60. This tag serves as an access control key for the regions
of virtual address with ADI enabled and a key set on them. Hypervisor
encodes this capability as "adp" in "hwcap-list" property in machine
description.

Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-24 12:05:06 -05:00
Mike Kravetz
9bcfd78ac0 sparc: Hook up userfaultfd system call
After hooking up system call, userfaultfd selftest was successful for
both 32 and 64 bit version of test.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23 15:41:13 -05:00
Peter Zijlstra
90eec103b9 treewide: Remove old email address
There were still a number of references to my old Red Hat email
address in the kernel source. Remove these while keeping the
Red Hat copyright notices intact.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:44:58 +01:00
Mathieu Desnoyers
9c2d5eebfe sparc/sparc64: allocate sys_membarrier system call number
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Linus Torvalds
3c87b79188 PCI changes for the v4.4 merge window:
Resource management
     Add support for Enhanced Allocation devices (Sean O. Stalley)
     Add Enhanced Allocation register entries (Sean O. Stalley)
     Handle IORESOURCE_PCI_FIXED when sizing resources (David Daney)
     Handle IORESOURCE_PCI_FIXED when assigning resources (David Daney)
     Handle Enhanced Allocation capability for SR-IOV devices (David Daney)
     Clear IORESOURCE_UNSET when reverting to firmware-assigned address (Bjorn Helgaas)
     Make Enhanced Allocation bitmasks more obvious (Bjorn Helgaas)
     Expand Enhanced Allocation BAR output (Bjorn Helgaas)
     Add of_pci_check_probe_only to parse "linux,pci-probe-only" (Marc Zyngier)
     Fix lookup of linux,pci-probe-only property (Marc Zyngier)
     Add sparc mem64 resource parsing for root bus (Yinghai Lu)
 
   PCI device hotplug
     pciehp: Queue power work requests in dedicated function (Guenter Roeck)
 
   Driver binding
     Add builtin_pci_driver() to avoid registration boilerplate (Paul Gortmaker)
 
   Virtualization
     Set SR-IOV NumVFs to zero after enumeration (Alexander Duyck)
     Remove redundant validation of SR-IOV offset/stride registers (Alexander Duyck)
     Remove VFs in reverse order if virtfn_add() fails (Alexander Duyck)
     Reorder pcibios_sriov_disable() (Alexander Duyck)
     Wait 1 second between disabling VFs and clearing NumVFs (Alexander Duyck)
     Fix sriov_enable() error path for pcibios_enable_sriov() failures (Alexander Duyck)
     Enable SR-IOV ARI Capable Hierarchy before reading TotalVFs (Ben Shelton)
     Don't try to restore VF BARs (Wei Yang)
 
   MSI
     Don't alloc pcibios-irq when MSI is enabled (Joerg Roedel)
     Add msi_controller setup_irqs() method for special multivector setup (Lucas Stach)
     Export all remapped MSIs to sysfs attributes (Romain Bezut)
     Disable MSI on SiS 761 (Ondrej Zary)
 
   AER
     Clear error status registers during enumeration and restore (Taku Izumi)
 
   Generic host bridge driver
     Fix lookup of linux,pci-probe-only property (Marc Zyngier)
     Allow multiple hosts with different map_bus() methods (David Daney)
     Pass starting bus number to pci_scan_root_bus() (David Daney)
     Fix address window calculation for non-zero starting bus (David Daney)
 
   Altera host bridge driver
     Add msi.h to ARM Kbuild (Ley Foon Tan)
     Add Altera PCIe host controller driver (Ley Foon Tan)
     Add Altera PCIe MSI driver (Ley Foon Tan)
 
   APM X-Gene host bridge driver
     Remove msi_controller assignment (Duc Dang)
 
   Broadcom iProc host bridge driver
     Fix header comment "Corporation" misspelling (Florian Fainelli)
     Fix code comment to match code (Ray Jui)
     Remove unused struct iproc_pcie.irqs[] (Ray Jui)
     Call pci_fixup_irqs() for ARM64 as well as ARM (Ray Jui)
     Fix PCIe reset logic (Ray Jui)
     Improve link detection logic (Ray Jui)
     Update PCIe device tree bindings (Ray Jui)
     Add outbound mapping support (Ray Jui)
 
   Freescale i.MX6 host bridge driver
     Return real error code from imx6_add_pcie_port() (Fabio Estevam)
     Add PCIE_PHY_RX_ASIC_OUT_VALID definition (Fabio Estevam)
 
   Freescale Layerscape host bridge driver
     Remove ls_pcie_establish_link() (Minghuan Lian)
     Ignore PCIe controllers in Endpoint mode (Minghuan Lian)
     Factor out SCFG related function (Minghuan Lian)
     Update ls_add_pcie_port() (Minghuan Lian)
     Remove unused fields from struct ls_pcie (Minghuan Lian)
     Add support for LS1043a and LS2080a (Minghuan Lian)
     Add ls_pcie_msi_host_init() (Minghuan Lian)
 
   HiSilicon host bridge driver
     Add HiSilicon SoC Hip05 PCIe driver (Zhou Wang)
 
   Marvell MVEBU host bridge driver
     Return zero for reserved or unimplemented config space (Russell King)
     Use exact config access size; don't read/modify/write (Russell King)
     Use of_get_available_child_count() (Russell King)
     Use for_each_available_child_of_node() to walk child nodes (Russell King)
     Report full node name when reporting a DT error (Russell King)
     Use port->name rather than "PCIe%d.%d" (Russell King)
     Move port parsing and resource claiming to  separate function (Russell King)
     Fix memory leaks and refcount leaks (Russell King)
     Split port parsing and resource claiming from  port setup (Russell King)
     Use gpio_set_value_cansleep() (Russell King)
     Use devm_kcalloc() to allocate an array (Russell King)
     Use gpio_desc to carry around gpio (Russell King)
     Improve clock/reset handling (Russell King)
     Add PCI Express root complex capability block (Russell King)
     Remove code restricting accesses to slot 0 (Russell King)
 
   NVIDIA Tegra host bridge driver
     Wrap static pgprot_t initializer with __pgprot() (Ard Biesheuvel)
 
   Renesas R-Car host bridge driver
     Build pci-rcar-gen2.c only on ARM (Geert Uytterhoeven)
     Build pcie-rcar.c only on ARM (Geert Uytterhoeven)
     Make PCI aware of the I/O resources (Phil Edworthy)
     Remove dependency on ARM-specific struct hw_pci (Phil Edworthy)
     Set root bus nr to that provided in DT (Phil Edworthy)
     Fix I/O offset for multiple host bridges (Phil Edworthy)
 
   ST Microelectronics SPEAr13xx host bridge driver
     Fix dw_pcie_cfg_read/write() usage (Gabriele Paoloni)
 
   Synopsys DesignWare host bridge driver
     Make "clocks" and "clock-names" optional DT properties (Bhupesh Sharma)
     Use exact access size in dw_pcie_cfg_read() (Gabriele Paoloni)
     Simplify dw_pcie_cfg_read/write() interfaces (Gabriele Paoloni)
     Require config accesses to be naturally aligned (Gabriele Paoloni)
     Make "num-lanes" an optional DT property (Gabriele Paoloni)
     Move calculation of bus addresses to DRA7xx (Gabriele Paoloni)
     Replace ARM pci_sys_data->align_resource with global function pointer (Gabriele Paoloni)
     Factor out MSI msg setup (Lucas Stach)
     Implement multivector MSI IRQ setup (Lucas Stach)
     Make get_msi_addr() return phys_addr_t, not u32 (Lucas Stach)
     Set up high part of MSI target address (Lucas Stach)
     Fix PORT_LOGIC_LINK_WIDTH_MASK (Zhou Wang)
     Revert "PCI: designware: Program ATU with untranslated address" (Zhou Wang)
     Use of_pci_get_host_bridge_resources() to parse DT (Zhou Wang)
     Make driver arch-agnostic (Zhou Wang)
 
   Miscellaneous
     Make x86 pci_subsys_init() static (Alexander Kuleshov)
     Turn off Request Attributes to avoid Chelsio T5 Completion erratum (Hariprasad Shenai)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWPM9aAAoJEFmIoMA60/r89f8QALFMHegqKk5M08ZcCaG7unLF
 5U5t88y6KF/kNYF6HOqLQiHh79U3ToU5HdrNaAtutr0UnvgbFC2WulLqKYgLiq6Y
 YJnwR3EfgGmdG7DKAVAXq19+nc2hgTPAEe8sciU7HKlTbqQmGj6//y3sQULGNLx3
 zur0C33DCtrDgKDP7to273lkHO8Vl0YuLzyqwt0ePCMNcXR0h1dK8QxSTjuXBxaR
 c+T1V1Hw64MTxLz3xJd1/1ipy32u+9LnqqcdRz0zRq6qi48G9ch/i4Z6DHa8kTbj
 DUZrrTYKILQ2TcjcZSBmTueX11Z1Xa4/I/45sehIi6gVWL9qQbmGpt2E5YtFED+4
 GdcmBSbWG/qsNsabXk38uiM3ww7+ltXEOhTXbcK+EgjvIhE6gSK/plYG0fU9pybs
 AKViEXVdHoT1X0N1dLK12mq7kvDCQvShHn08lz97Q9YrZ32wv1Fnij6WVSbJvfWt
 DubtPtisVM+rVy+VTpOImNR9wO54lTmG5jK53yNqH7I20K89y1kqARlN9nMXMB1a
 2nQnwe9yWlsGj9gVNCn1KmyQSPOWjg+3Z+ekfwbxpca14s1AaN3Jm0N9Z61dXFoF
 y2ygoQtZ/z9BHr3quBpxXGt+aVUg2kcNw5GYeDYiALxXdJSObyzRrZ6HDb/zicU2
 ZH9hBj0ctXvucmy6I2mt
 =uZrt
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "Resource management:
   - Add support for Enhanced Allocation devices (Sean O. Stalley)
   - Add Enhanced Allocation register entries (Sean O. Stalley)
   - Handle IORESOURCE_PCI_FIXED when sizing resources (David Daney)
   - Handle IORESOURCE_PCI_FIXED when assigning resources (David Daney)
   - Handle Enhanced Allocation capability for SR-IOV devices (David Daney)
   - Clear IORESOURCE_UNSET when reverting to firmware-assigned address (Bjorn Helgaas)
   - Make Enhanced Allocation bitmasks more obvious (Bjorn Helgaas)
   - Expand Enhanced Allocation BAR output (Bjorn Helgaas)
   - Add of_pci_check_probe_only to parse "linux,pci-probe-only" (Marc Zyngier)
   - Fix lookup of linux,pci-probe-only property (Marc Zyngier)
   - Add sparc mem64 resource parsing for root bus (Yinghai Lu)

  PCI device hotplug:
   - pciehp: Queue power work requests in dedicated function (Guenter Roeck)

  Driver binding:
   - Add builtin_pci_driver() to avoid registration boilerplate (Paul Gortmaker)

  Virtualization:
   - Set SR-IOV NumVFs to zero after enumeration (Alexander Duyck)
   - Remove redundant validation of SR-IOV offset/stride registers (Alexander Duyck)
   - Remove VFs in reverse order if virtfn_add() fails (Alexander Duyck)
   - Reorder pcibios_sriov_disable() (Alexander Duyck)
   - Wait 1 second between disabling VFs and clearing NumVFs (Alexander Duyck)
   - Fix sriov_enable() error path for pcibios_enable_sriov() failures (Alexander Duyck)
   - Enable SR-IOV ARI Capable Hierarchy before reading TotalVFs (Ben Shelton)
   - Don't try to restore VF BARs (Wei Yang)

  MSI:
   - Don't alloc pcibios-irq when MSI is enabled (Joerg Roedel)
   - Add msi_controller setup_irqs() method for special multivector setup (Lucas Stach)
   - Export all remapped MSIs to sysfs attributes (Romain Bezut)
   - Disable MSI on SiS 761 (Ondrej Zary)

  AER:
   - Clear error status registers during enumeration and restore (Taku Izumi)

  Generic host bridge driver:
   - Fix lookup of linux,pci-probe-only property (Marc Zyngier)
   - Allow multiple hosts with different map_bus() methods (David Daney)
   - Pass starting bus number to pci_scan_root_bus() (David Daney)
   - Fix address window calculation for non-zero starting bus (David Daney)

  Altera host bridge driver:
   - Add msi.h to ARM Kbuild (Ley Foon Tan)
   - Add Altera PCIe host controller driver (Ley Foon Tan)
   - Add Altera PCIe MSI driver (Ley Foon Tan)

  APM X-Gene host bridge driver:
   - Remove msi_controller assignment (Duc Dang)

  Broadcom iProc host bridge driver:
   - Fix header comment "Corporation" misspelling (Florian Fainelli)
   - Fix code comment to match code (Ray Jui)
   - Remove unused struct iproc_pcie.irqs[] (Ray Jui)
   - Call pci_fixup_irqs() for ARM64 as well as ARM (Ray Jui)
   - Fix PCIe reset logic (Ray Jui)
   - Improve link detection logic (Ray Jui)
   - Update PCIe device tree bindings (Ray Jui)
   - Add outbound mapping support (Ray Jui)

  Freescale i.MX6 host bridge driver:
   - Return real error code from imx6_add_pcie_port() (Fabio Estevam)
   - Add PCIE_PHY_RX_ASIC_OUT_VALID definition (Fabio Estevam)

  Freescale Layerscape host bridge driver:
   - Remove ls_pcie_establish_link() (Minghuan Lian)
   - Ignore PCIe controllers in Endpoint mode (Minghuan Lian)
   - Factor out SCFG related function (Minghuan Lian)
   - Update ls_add_pcie_port() (Minghuan Lian)
   - Remove unused fields from struct ls_pcie (Minghuan Lian)
   - Add support for LS1043a and LS2080a (Minghuan Lian)
   - Add ls_pcie_msi_host_init() (Minghuan Lian)

  HiSilicon host bridge driver:
   - Add HiSilicon SoC Hip05 PCIe driver (Zhou Wang)

  Marvell MVEBU host bridge driver:
   - Return zero for reserved or unimplemented config space (Russell King)
   - Use exact config access size; don't read/modify/write (Russell King)
   - Use of_get_available_child_count() (Russell King)
   - Use for_each_available_child_of_node() to walk child nodes (Russell King)
   - Report full node name when reporting a DT error (Russell King)
   - Use port->name rather than "PCIe%d.%d" (Russell King)
   - Move port parsing and resource claiming to  separate function (Russell King)
   - Fix memory leaks and refcount leaks (Russell King)
   - Split port parsing and resource claiming from  port setup (Russell King)
   - Use gpio_set_value_cansleep() (Russell King)
   - Use devm_kcalloc() to allocate an array (Russell King)
   - Use gpio_desc to carry around gpio (Russell King)
   - Improve clock/reset handling (Russell King)
   - Add PCI Express root complex capability block (Russell King)
   - Remove code restricting accesses to slot 0 (Russell King)

  NVIDIA Tegra host bridge driver:
   - Wrap static pgprot_t initializer with __pgprot() (Ard Biesheuvel)

  Renesas R-Car host bridge driver:
   - Build pci-rcar-gen2.c only on ARM (Geert Uytterhoeven)
   - Build pcie-rcar.c only on ARM (Geert Uytterhoeven)
   - Make PCI aware of the I/O resources (Phil Edworthy)
   - Remove dependency on ARM-specific struct hw_pci (Phil Edworthy)
   - Set root bus nr to that provided in DT (Phil Edworthy)
   - Fix I/O offset for multiple host bridges (Phil Edworthy)

  ST Microelectronics SPEAr13xx host bridge driver:
   - Fix dw_pcie_cfg_read/write() usage (Gabriele Paoloni)

  Synopsys DesignWare host bridge driver:
   - Make "clocks" and "clock-names" optional DT properties (Bhupesh Sharma)
   - Use exact access size in dw_pcie_cfg_read() (Gabriele Paoloni)
   - Simplify dw_pcie_cfg_read/write() interfaces (Gabriele Paoloni)
   - Require config accesses to be naturally aligned (Gabriele Paoloni)
   - Make "num-lanes" an optional DT property (Gabriele Paoloni)
   - Move calculation of bus addresses to DRA7xx (Gabriele Paoloni)
   - Replace ARM pci_sys_data->align_resource with global function pointer (Gabriele Paoloni)
   - Factor out MSI msg setup (Lucas Stach)
   - Implement multivector MSI IRQ setup (Lucas Stach)
   - Make get_msi_addr() return phys_addr_t, not u32 (Lucas Stach)
   - Set up high part of MSI target address (Lucas Stach)
   - Fix PORT_LOGIC_LINK_WIDTH_MASK (Zhou Wang)
   - Revert "PCI: designware: Program ATU with untranslated address" (Zhou Wang)
   - Use of_pci_get_host_bridge_resources() to parse DT (Zhou Wang)
   - Make driver arch-agnostic (Zhou Wang)

  Miscellaneous:
   - Make x86 pci_subsys_init() static (Alexander Kuleshov)
   - Turn off Request Attributes to avoid Chelsio T5 Completion erratum (Hariprasad Shenai)"

* tag 'pci-v4.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (94 commits)
  PCI: altera: Add Altera PCIe MSI driver
  PCI: hisi: Add HiSilicon SoC Hip05 PCIe driver
  PCI: layerscape: Add ls_pcie_msi_host_init()
  PCI: layerscape: Add support for LS1043a and LS2080a
  PCI: layerscape: Remove unused fields from struct ls_pcie
  PCI: layerscape: Update ls_add_pcie_port()
  PCI: layerscape: Factor out SCFG related function
  PCI: layerscape: Ignore PCIe controllers in Endpoint mode
  PCI: layerscape: Remove ls_pcie_establish_link()
  PCI: designware: Make "clocks" and "clock-names" optional DT properties
  PCI: designware: Make driver arch-agnostic
  ARM/PCI: Replace pci_sys_data->align_resource with global function pointer
  PCI: designware: Use of_pci_get_host_bridge_resources() to parse DT
  Revert "PCI: designware: Program ATU with untranslated address"
  PCI: designware: Move calculation of bus addresses to DRA7xx
  PCI: designware: Make "num-lanes" an optional DT property
  PCI: designware: Require config accesses to be naturally aligned
  PCI: designware: Simplify dw_pcie_cfg_read/write() interfaces
  PCI: designware: Use exact access size in dw_pcie_cfg_read()
  PCI: spear: Fix dw_pcie_cfg_read/write() usage
  ...
2015-11-06 11:29:53 -08:00
Linus Torvalds
2e3078af2c Merge branch 'akpm' (patches from Andrew)
Merge patch-bomb from Andrew Morton:

 - inotify tweaks

 - some ocfs2 updates (many more are awaiting review)

 - various misc bits

 - kernel/watchdog.c updates

 - Some of mm.  I have a huge number of MM patches this time and quite a
   lot of it is quite difficult and much will be held over to next time.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits)
  selftests: vm: add tests for lock on fault
  mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage
  mm: introduce VM_LOCKONFAULT
  mm: mlock: add new mlock system call
  mm: mlock: refactor mlock, munlock, and munlockall code
  kasan: always taint kernel on report
  mm, slub, kasan: enable user tracking by default with KASAN=y
  kasan: use IS_ALIGNED in memory_is_poisoned_8()
  kasan: Fix a type conversion error
  lib: test_kasan: add some testcases
  kasan: update reference to kasan prototype repo
  kasan: move KASAN_SANITIZE in arch/x86/boot/Makefile
  kasan: various fixes in documentation
  kasan: update log messages
  kasan: accurately determine the type of the bad access
  kasan: update reported bug types for kernel memory accesses
  kasan: update reported bug types for not user nor kernel memory accesses
  mm/kasan: prevent deadlock in kasan reporting
  mm/kasan: don't use kasan shadow pointer in generic functions
  mm/kasan: MODULE_VADDR is not available on all archs
  ...
2015-11-05 23:10:54 -08:00
Eric B Munson
b0f205c2a3 mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage
The previous patch introduced a flag that specified pages in a VMA should
be placed on the unevictable LRU, but they should not be made present when
the area is created.  This patch adds the ability to set this state via
the new mlock system calls.

We add MLOCK_ONFAULT for mlock2 and MCL_ONFAULT for mlockall.
MLOCK_ONFAULT will set the VM_LOCKONFAULT modifier for VM_LOCKED.
MCL_ONFAULT should be used as a modifier to the two other mlockall flags.
When used with MCL_CURRENT, all current mappings will be marked with
VM_LOCKED | VM_LOCKONFAULT.  When used with MCL_FUTURE, the mm->def_flags
will be marked with VM_LOCKED | VM_LOCKONFAULT.  When used with both
MCL_CURRENT and MCL_FUTURE, all current mappings and mm->def_flags will be
marked with VM_LOCKED | VM_LOCKONFAULT.

Prior to this patch, mlockall() will unconditionally clear the
mm->def_flags any time it is called without MCL_FUTURE.  This behavior is
maintained after adding MCL_ONFAULT.  If a call to mlockall(MCL_FUTURE) is
followed by mlockall(MCL_CURRENT), the mm->def_flags will be cleared and
new VMAs will be unlocked.  This remains true with or without MCL_ONFAULT
in either mlockall() invocation.

munlock() will unconditionally clear both vma flags.  munlockall()
unconditionally clears for VMA flags on all VMAs and in the mm->def_flags
field.

Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Linus Torvalds
2c302e7e41 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc updates from David Miller:
 "Just a couple of fixes/cleanups:

   - Correct NUMA latency calculations on sparc64, from Nitin Gupta.

   - ASI_ST_BLKINIT_MRU_S value was wrong, from Rob Gardner.

   - Fix non-faulting load handling of non-quad values, also from Rob
     Gardner.

   - Cleanup VISsave assembler, from Sam Ravnborg.

   - Fix iommu-common code so it doesn't emit rediculous warnings on
     some architectures, particularly ARM"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix numa distance values
  sparc64: Don't restrict fp regs for no-fault loads
  iommu-common: Fix error code used in iommu_tbl_range_{alloc,free}().
  sparc64: use ENTRY/ENDPROC in VISsave
  sparc64: Fix incorrect ASI_ST_BLKINIT_MRU_S value
2015-11-05 16:34:48 -08:00
Nitin Gupta
52708d690b sparc64: Fix numa distance values
Orabug: 21896119

Use machine descriptor (MD) to get node latency
values instead of just using default values.

Testing:
On an T5-8 system with:
 - total nodes = 8
 - self latencies = 0x26d18
 - latency to other nodes = 0x3a598
   => latency ratio = ~1.5

output of numactl --hardware

 - before fix:

node distances:
node   0   1   2   3   4   5   6   7
  0:  10  20  20  20  20  20  20  20
  1:  20  10  20  20  20  20  20  20
  2:  20  20  10  20  20  20  20  20
  3:  20  20  20  10  20  20  20  20
  4:  20  20  20  20  10  20  20  20
  5:  20  20  20  20  20  10  20  20
  6:  20  20  20  20  20  20  10  20
  7:  20  20  20  20  20  20  20  10

 - after fix:

node distances:
node   0   1   2   3   4   5   6   7
  0:  10  15  15  15  15  15  15  15
  1:  15  10  15  15  15  15  15  15
  2:  15  15  10  15  15  15  15  15
  3:  15  15  15  10  15  15  15  15
  4:  15  15  15  15  10  15  15  15
  5:  15  15  15  15  15  10  15  15
  6:  15  15  15  15  15  15  10  15
  7:  15  15  15  15  15  15  15  10

Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Reviewed-by: Chris Hyser <chris.hyser@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-04 12:14:49 -08:00
Rob Gardner
cae9af6a82 sparc64: Don't restrict fp regs for no-fault loads
The function handle_ldf_stq() deals with no-fault ASI
loads and stores, but restricts fp registers to quad
word regs (ie, %f0, %f4 etc). This is valid for the
STQ case, but unnecessarily restricts loads, which
may be single precision, double, or quad. This results
in SIGFPE being raised for this instruction when the
source address is invalid:
	ldda [%g1] ASI_PNF, %f2
but not for this one:
	ldda [%g1] ASI_PNF, %f4
The validation check for quad register is moved to
within the STQ block so that loads are not affected
by the check.

An additional problem is that the calculation for freg
is incorrect when a single precision load is being
handled. This causes %f1 to be seen as %f32 etc,
and the incorrect register ends up being overwritten.
This code sequence demonstrates the problem:
	ldd [%g1], %f32		! g1 = valid address
	lda [%i3] ASI_PNF, %f1  ! i3 = invalid address
	std %f32, [%g1]
This is corrected by basing the freg calculation on
the load size.

Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-04 15:00:49 -05:00