User visible:
- Use index, not CPU id, to find core/pkg id in 'perf stat' (Kan Liang)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJV4JVmAAoJENZQFvNTUqpARzsP/jFMKCGqK+hg1Tr3m05X3K3+
CFp1VIM7dCJLQJDGAFcxqTJtub5aE32j+RmEEfGsSvyjcE8dMqapw+JkUnk1Z8Tp
jxN/WgqZA0SP+Iwt//vEBWwRszuuiAQ7Re0x232DJBWgAeMdvhGcOaTStVnZq6Yk
d693xnmh/kxqJ3bB3Es/TCYYf8vKTfjPn2T8jCwmKQLHL18j8OwS92W5PdYhy2T/
MoqxEaz+SDf4jon1NS1sIYZZECFjeKAc8V7KDhfRoMhxwvnvkQzSwpGHOTOMrIqv
PetQbeZZou+R6LSpIBnZ/1nysQ7YWZYupgX30RCsQ3mYzUAAYC+itGJqEvTJxFZc
ykBJXSO3chbYR3m7dDeAPgQLYA+47QL3vGBfDhEtWgkVg9E58yfVaztaJzBxv22+
X69SbyIcrCvkQs8EYB10n58r2aAqtwTxl2DpLKl4g8XIeJWHjL1r+WVPgA418Sco
owr0ejIcP48LDNbMZMgF0TUfiXY7KzrXcWmsywq5Pet5xkes6CQ+iSDKBbO4PLQS
YivCVVq+egmzUtQR4xHZ8wsCVSzP6vP+WWJTGJbcfpM7OxOI+RBi34QM7rP4XE7f
Cs6R26xWusLFbfDwyA4EalSJcL85b4NZ1wIpOVeOz1xsP5DKmbQ0ZIbN8mxPGqoQ
rUs4NgaIP/q0KB0+h1pC
=u4R/
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fix from Arnaldo Carvalho de Melo:
- Use index, not CPU id, to find core/pkg id in 'perf stat' (Kan Liang)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
net/ipv4/af_inet.c: In function 'snmp_get_cpu_field64':
>> net/ipv4/af_inet.c:1486:26: error: 'offt' undeclared (first use in this function)
v = *(((u64 *)bhptr) + offt);
^
net/ipv4/af_inet.c:1486:26: note: each undeclared identifier is reported only once for each function it appears in
net/ipv4/af_inet.c: In function 'snmp_fold_field64':
>> net/ipv4/af_inet.c:1499:39: error: 'offct' undeclared (first use in this function)
res += snmp_get_cpu_field(mib, cpu, offct, syncp_offset);
^
>> net/ipv4/af_inet.c:1499:10: error: too many arguments to function 'snmp_get_cpu_field'
res += snmp_get_cpu_field(mib, cpu, offct, syncp_offset);
^
net/ipv4/af_inet.c:1455:5: note: declared here
u64 snmp_get_cpu_field(void __percpu *mib, int cpu, int offt)
^
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Poll() returns immediately after setting the kernel current frame
(ring->head) to SKIP from user space even though there is no new
frame. And in a case of all frames is VALID, user space program
unintensionally sets (only) kernel current frame to UNUSED, then
calls poll(), it will not return immediately even though there are
VALID frames.
To avoid situations like above, I think we need to scan all frames
to find VALID frames at poll() like netlink_alloc_skb(),
netlink_forward_ring() finding an UNUSED frame at skb allocation.
Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aleksey Makarov says:
====================
net: thunderx: New features and fixes
v2:
- The unused affinity_mask field of the structure cmp_queue
has been deleted. (thanks to David Miller)
- The unneeded initializers have been dropped. (thanks to Alexey Klimov)
- The commit message "net: thunderx: Rework interrupt handling"
has been fixed. (thanks to Alexey Klimov)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Support for setting VF's corresponding BGX LMAC in internal
loopback mode. This mode can be used for verifying basic HW
functionality such as packet I/O, RX checksum validation,
CQ/RBDR interrupts, stats e.t.c. Useful when DUT has no external
network connectivity.
'loopback' mode can be enabled or disabled via ethtool.
Note: This feature is not supported when no of VFs enabled are
morethan no of physical interfaces i.e active BGX LMACs
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for handling multiple qsets assigned to a
single VF. There by increasing no of queues from earlier 8 to max
no of CPUs in the system i.e 48 queues on a single node and 96 on
dual node system. User doesn't have option to assign which Qsets/VFs
to be merged. Upon request from VF, PF assigns next free Qsets as
secondary qsets. To maintain current behavior no of queues is kept
to 8 by default which can be increased via ethtool.
If user wants to unbind NICVF driver from a secondary Qset then it
should be done after tearing down primary VF's interface.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rework interrupt handler to avoid checking IRQ affinity of
CQ interrupts. Now separate handlers are registered for each IRQ
including RBDR. Register interrupt handlers for only those
which are being used. Add nicvf_dump_intr_status() and use it
in irq handlers.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch configures HW to strip 802.1Q header if found in a
receiving packet. The stripped VLAN ID and TCI information is
passed on to software via CQE_RX. Also sets netdev's 'vlan_features'
so that other HW offload features can be used for tagged packets.
This offload feature can be enabled or disabled via ethtool.
Network stack normally ignores RPS for 802.1Q packets and hence low
throughput. With this offload enabled throughput for tagged packets
will be almost same as normal packets.
Note: This patch doesn't enable HW VLAN insertion for transmit packets.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding support for receive hashing HW offload by using RSS_ALG
and RSS_TAG fields of CQE_RX descriptor. Also removed dependency
on minimum receive queue count to configure RSS so that hash is
always generated.
This hash is used by RPS logic to distribute flows across multiple
CPUs. Offload can be disabled via ethtool.
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the nicvf_send_msg_to_pf() function in the mailbox code.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added ethtool support to dump receive packet error statistics reported
in CQE. Also made some small fixes
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The liquidio and thunder drivers have different maintainers.
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raghavendra K T says:
====================
Optimize the snmp stat aggregation for large cpus
While creating 1000 containers, perf is showing lot of time spent in
snmp_fold_field on a large cpu system.
The current patch tries to improve by reordering the statistics gathering.
Please note that similar overhead was also reported while creating
veth pairs https://lkml.org/lkml/2013/3/19/556
Changes in V4:
- remove 'item' variable and use IPSTATS_MIB_MAX to avoid sparse
warning (Eric) also remove 'item' parameter (Joe)
- add missing memset of padding.
Changes in V3:
- use memset to initialize temp buffer in leaf function. (David)
- use memcpy to copy the buffer data to stat instead of unalign_pu (Joe)
- Move buffer definition to leaf function __snmp6_fill_stats64() (Eric)
-
Changes in V2:
- Allocate the stat calculation buffer in stack. (Eric)
Setup:
160 cpu (20 core) baremetal powerpc system with 1TB memory
1000 docker containers was created with command
docker run -itd ubuntu:15.04 /bin/bash in loop
observation:
Docker container creation linearly increased from around 1.6 sec to 7.5 sec
(at 1000 containers) perf data showed, creating veth interfaces resulting in
the below code path was taking more time.
rtnl_fill_ifinfo
-> inet6_fill_link_af
-> inet6_fill_ifla6_attrs
-> snmp_fold_field
proposed idea:
currently __snmp6_fill_stats64 calls snmp_fold_field that walks
through per cpu data to of an item (iteratively for around 36 items).
The patch tries to aggregate the statistics by going through
all the items of each cpu sequentially which is reducing cache
misses.
Performance of docker creation improved by around more than 2x
after the patch.
before the patch:
================
3f45ba571a42e925c4ec4aaee0e48d7610a9ed82a4c931f83324d41822cf6617
real 0m6.836s
user 0m0.095s
sys 0m0.011s
perf record -a docker run -itd ubuntu:15.04 /bin/bash
=======================================================
50.73% docker [kernel.kallsyms] [k] snmp_fold_field
9.07% swapper [kernel.kallsyms] [k] snooze_loop
3.49% docker [kernel.kallsyms] [k] veth_stats_one
2.85% swapper [kernel.kallsyms] [k] _raw_spin_lock
1.37% docker docker [.] backtrace_qsort
1.31% docker docker [.] strings.FieldsFunc
cache-misses: 2.7%
after the patch:
=============
9178273e9df399c8290b6c196e4aef9273be2876225f63b14a60cf97eacfafb5
real 0m3.249s
user 0m0.088s
sys 0m0.020s
perf record -a docker run -itd ubuntu:15.04 /bin/bash
=======================================================
10.57% docker docker [.] scanblock
8.37% swapper [kernel.kallsyms] [k] snooze_loop
6.91% docker [kernel.kallsyms] [k] snmp_get_cpu_field
6.67% docker [kernel.kallsyms] [k] veth_stats_one
3.96% docker docker [.] runtime_MSpan_Sweep
2.47% docker docker [.] strings.FieldsFunc
cache-misses: 1.41 %
Please let me know if you have suggestions/comments.
Thanks Eric, Joe and David for the comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Docker container creation linearly increased from around 1.6 sec to 7.5 sec
(at 1000 containers) and perf data showed 50% ovehead in snmp_fold_field.
reason: currently __snmp6_fill_stats64 calls snmp_fold_field that walks
through per cpu data of an item (iteratively for around 36 items).
idea: This patch tries to aggregate the statistics by going through
all the items of each cpu sequentially which is reducing cache
misses.
Docker creation got faster by more than 2x after the patch.
Result:
Before After
Docker creation time 6.836s 3.25s
cache miss 2.7% 1.41%
perf before:
50.73% docker [kernel.kallsyms] [k] snmp_fold_field
9.07% swapper [kernel.kallsyms] [k] snooze_loop
3.49% docker [kernel.kallsyms] [k] veth_stats_one
2.85% swapper [kernel.kallsyms] [k] _raw_spin_lock
perf after:
10.57% docker docker [.] scanblock
8.37% swapper [kernel.kallsyms] [k] snooze_loop
6.91% docker [kernel.kallsyms] [k] snmp_get_cpu_field
6.67% docker [kernel.kallsyms] [k] veth_stats_one
changes/ideas suggested:
Using buffer in stack (Eric), Usage of memset (David), Using memcpy in
place of unaligned_put (Joe).
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On Alpha we have spinlocks that are 32b in size and an efficient
cmpxchg64 implementation, so we qualify to make use of cmpxchg backed
lockrefs. Select the ARCH_USE_CMPXCHG_LOCKREF Kconfig symbol and provide
a trivial implementation of arch_spin_value_unlocked to satisfy the
lockref code.
Using Linus' simple testcase from
http://article.gmane.org/gmane.linux.file-systems/77466 on a dual CPU
ES47 system I see around an 8% gain:
N Min Max Median Avg Stddev
x 30 6194580 6295654 6272504 6272514 17694.232
+ 30 6731164 6786334 6767982 6764274 13738.863
Difference at 95.0% confidence
491760 +/- 8188.17
7.83992% +/- 0.130541%
(Student's t, pooled s = 15840.5)
Signed-off-by: Matt Turner <mattst88@gmail.com>
This is a second pull-request which adds last part of
atomic modeset/pageflip support, render node support,
clean-up, and fix-up.
* 'exynos-drm-next' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
drm/exynos: fix build warning to exynos_drm_gem.c
drm/exynos: Properly report supported formats for each device
drm/exynos: add render node support
drm/exynos: implement atomic_{begin/flush} of DECON
drm/exynos: remove legacy ->suspend()/resume()
drm/exynos: Enable atomic modesetting feature
drm/exynos: remove wait queue for pending page flip
drm/exynos: wait all planes updates to finish
drm/exynos: add atomic asynchronous commit
drm/exynos: fimd: only finish update if START == START_S
drm/exynos: add macro to get the address of START_S reg
drm/exynos: check for pending fb before finish update
drm/exynos: fimd: move window protect code to prepare/cleanup_plane
drm/exynos: add prepare and cleanup phases for planes
drm/exynos: fimd: unify call to exynos_drm_crtc_finish_pageflip()
drm/exynos: don't track enabled state at exynos_crtc
Some i915 fixes headed for v4.3. SKL DDI-E is a wip, but here's the
first in a series.
* tag 'drm-intel-next-fixes-2015-08-28' of git://anongit.freedesktop.org/drm-intel:
drm/i915/skl: enable DDI-E hotplug
drm/i915: Fix build warning on 32-bit
drm/i915/skl: Update DDI buffer translation programming.
drm/i915: Allow parsing of variable size child device entries from VBT
drm/i915: fix link rates reported for SKL
drm/i915: fix VBT parsing for SDVO child device mapping
Just one small fix before 4.3 merge window:
- Use linux/mman.h instead of uapi's mman-common.h inside the driver.
* tag 'drm-amdkfd-next-fixes-2015-08-30' of git://people.freedesktop.org/~gabbayo/linux:
amdkfd: use <linux/mman.h> instead of <uapi/asm-generic/mman-common.h>
Exynos DRM reported that all planes for all supported sub-devices supports
only three pixel formats: XRGB24, ARGB24 and NV12. This patch lets each
Exynos DRM sub-drivers to provide the list of supported pixel formats
and registers this list to DRM core.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
This patch allows clients who want to use render node to access
rendering relevant ioctls - g2d, post processor and gem allocation.
Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Each CRTC's atomic_{begin/flush} must stop/start the update of shadow
registers to active register in the functions. This patch achieves these
purpose by moving the setting of protection bits to those functions from
decon_update_plane.
v2: rebased to the branch exynos-drm-next
Signed-off-by: Hyungwon Hwang <human.hwang@samsung.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
These legacy helpers should only be used by shadow-attaching drivers.
KMS drivers has its own way to handle suspend/resume and don't need to
use these two helpers.
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Inki Dae <daeinki@gmail.com>
From: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Now that atomic modesetting is implemented for exynos enable the
DRIVER_ATOMIC flag on the driver's features.
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Exynos atomic commit procedures already does this job of waiting for
pending updates to finish, that means using pending_flip_queue is
pointless now because the disable CRTC procedure will never happen
during a page_flip.
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Add infrastructure to wait for all planes updates to finish by using
an atomic_t variable to track how many pending updates we are waiting
plus a wait_queue for the wait part.
It also changes vblank behaviour and keeps it enabled for all types
of updates
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
The atomic modesetting interfaces supports async commits that should be
implemented by the drivers. If drm core requests an async commit
exynos_atomic_commit() will now schedule a work task to run the update later.
It also serializes commits that needs to run on the same crtc, putting the
following commit to wait until the current one is finished.
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
fimd_update_plane() programs BUF_START[win] and during the update
BUF_START[win] is copied to BUF_START_S[win] (its shadow register)
and starts scanning out, then it raises a irq.
The fimd_irq_handler, in the case we have a pending_fb, will check
the fb value was copied to START_S register and finish the update
in case of success.
Based on patch from Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
This macro is need to get the value of the START shadow register, that
will tell if an framebuffer is currently displayed on the screen or not.
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
The current code was ignoring the end of update for all overlay planes,
caring only for the primary plane update in case of pageflip.
This change adds a change to start to check for pending updates for all
planes through exynos_plane->pending_fb. At the start of plane update the
pending_fb is set with the fb to be shown on the screen. Then only when to
fb is already presented in the screen we set pending_fb to NULL to
signal that the update was finished.
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
fixup! drm/exynos: check for pending fb before finish update
Only set/clear the update bit in the CRTC's .atomic_begin()/flush()
so all planes are really committed at the same time.
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
From: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
.prepare_plane() and .cleanup_plane() allows to perform extra operations
before and after the update of planes. For FIMD for example this will
be used to enable disable the shadow protection bit.
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Unify handling of finished plane update to prepare for a following patch
that will check for the START and START_S regs to really make sure that
the plane was updated.
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
struct drm_crtc already stores the enabled state of the crtc
thus we don't need to replicate enabled in exynos_drm_crtc.
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Inki Dae <inki.dae@samsung.com>