This patch uses a seqlock to ensure consistency between idst->dst and
idst->cookie. It also makes dst freeing from fib tree to undergo a
rcu grace period.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is a prep work to get dst freeing from fib tree undergo
a rcu grace period.
The following is a common paradigm:
if (ip6_del_rt(rt))
dst_free(rt)
which means, if rt cannot be deleted from the fib tree, dst_free(rt) now.
1. We don't know the ip6_del_rt(rt) failure is because it
was not managed by fib tree (e.g. DST_NOCACHE) or it had already been
removed from the fib tree.
2. If rt had been managed by the fib tree, ip6_del_rt(rt) failure means
dst_free(rt) has been called already. A second
dst_free(rt) is not always obviously safe. The rt may have
been destroyed already.
3. If rt is a DST_NOCACHE, dst_free(rt) should not be called.
4. It is a stopper to make dst freeing from fib tree undergo a
rcu grace period.
This patch is to use a DST_NOCACHE flag to indicate a rt is
not managed by the fib tree.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Problems in the current dst_entry cache in the ip6_tunnel:
1. ip6_tnl_dst_set is racy. There is no lock to protect it:
- One major problem is that the dst refcnt gets messed up. F.e.
the same dst_cache can be released multiple times and then
triggering the infamous dst refcnt < 0 warning message.
- Another issue is the inconsistency between dst_cache and
dst_cookie.
It can be reproduced by adding and removing the ip6gre tunnel
while running a super_netperf TCP_CRR test.
2. ip6_tnl_dst_get does not take the dst refcnt before returning
the dst.
This patch:
1. Create a percpu dst_entry cache in ip6_tnl
2. Use a spinlock to protect the dst_cache operations
3. ip6_tnl_dst_get always takes the dst refcnt before returning
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is a prep work to fix the dst_entry refcnt bugs in
ip6_tunnel.
This patch rename:
1. ip6_tnl_dst_check() to ip6_tnl_dst_get() to better
reflect that it will take a dst refcnt in the next patch.
2. ip6_tnl_dst_store() to ip6_tnl_dst_set() to have a more
conventional name matching with ip6_tnl_dst_get().
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is a prep work to fix the dst_entry refcnt bugs in ip6_tunnel.
This patch refactors some common init codes used by both
ip6gre_tunnel_init and ip6gre_tap_init.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The GPL does not permit us to link against the OpenSSL library. Use
LGPL for sign-file and extract-file instead.
[ The whole "openssl isn't compatible with gpl" is really just
fear-mongering, but there's no reason not to make modsign LGPL, so
nobody cares. - Linus ]
Reported-by: Julian Andres Klode <jak@jak-linux.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Julian Andres Klode <jak@jak-linux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert dff22d2054 ("PCI: Call pci_read_bridge_bases() from core instead
of arch code").
Reading PCI bridge windows is not arch-specific in itself, but there is PCI
core code that doesn't work correctly if we read them too early. For
example, Hannes found this case on an ARM Freescale i.mx6 board:
pci_bus 0000:00: root bus resource [mem 0x01000000-0x01efffff]
pci 0000:00:00.0: PCI bridge to [bus 01-ff]
pci 0000:00:00.0: BAR 8: no space for [mem size 0x01000000] (mem window)
pci 0000:01:00.0: BAR 2: failed to assign [mem size 0x00200000]
pci 0000:01:00.0: BAR 1: failed to assign [mem size 0x00004000]
pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x00000100]
The 00:00.0 mem window needs to be at least 3MB: the 01:00.0 device needs
0x204100 of space, and mem windows are megabyte-aligned.
Bus sizing can increase a bridge window size, but never *decrease* it (see
d65245c329 ("PCI: don't shrink bridge resources")). Prior to
dff22d2054, ARM didn't read bridge windows at all, so the "original size"
was zero, and we assigned a 3MB window.
After dff22d2054, we read the bridge windows before sizing the bus. The
firmware programmed a 16MB window (size 0x01000000) in 00:00.0, and since
we never decrease the size, we kept 16MB even though we only needed 3MB.
But 16MB doesn't fit in the host bridge aperture, so we failed to assign
space for the window and the downstream devices.
I think this is a defect in the PCI core: we shouldn't rely on the firmware
to assign sensible windows.
Ray reported a similar problem, also on ARM, with Broadcom iProc.
Issues like this are too hard to fix right now, so revert dff22d2054.
Reported-by: Hannes <oe5hpm@gmail.com>
Reported-by: Ray Jui <rjui@broadcom.com>
Link: http://lkml.kernel.org/r/CAAa04yFQEUJm7Jj1qMT57-LG7ZGtnhNDBe=PpSRa70Mj+XhW-A@mail.gmail.com
Link: http://lkml.kernel.org/r/55F75BB8.4070405@broadcom.com
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
The renesas-irqc interrupt controller is cascaded to the GIC, but its
driver doesn't propagate wake-up settings to the parent interrupt
controller.
Since commit aec89ef72b ("irqchip/gic: Enable SKIP_SET_WAKE and
MASK_ON_SUSPEND"), the GIC driver masks interrupts during suspend, and
wake-up through gpio-keys now fails on r8a73a4/ape6evm.
Fix this by propagating wake-up settings to the parent interrupt
controller. There's no need to handle irq_set_irq_wake() failures, as
the renesas-irqc interrupt controller is always cascaded to a GIC, and
the GIC driver always sets SKIP_SET_WAKE since the aforementioned
commit.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Link: http://lkml.kernel.org/r/1441731636-17610-3-git-send-email-geert%2Brenesas@glider.be
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The renesas-intc-irqpin interrupt controller is cascaded to the GIC, but
its driver doesn't propagate wake-up settings to the parent interrupt
controller.
Since commit aec89ef72b ("irqchip/gic: Enable SKIP_SET_WAKE and
MASK_ON_SUSPEND"), the GIC driver masks interrupts during suspend, and
wake-up through gpio-keys now fails on r8a7740/armadillo and
sh73a0/kzm9g.
Fix this by propagating wake-up settings to the parent interrupt
controller. There's no need to handle irq_set_irq_wake() failures, as
the renesas-intc-irqpin interrupt controller is always cascaded to a
GIC, and the GIC driver always sets SKIP_SET_WAKE since the
aforementioned commit.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Link: http://lkml.kernel.org/r/1441731636-17610-2-git-send-email-geert%2Brenesas@glider.be
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The renesas-intc-irqpin interrupt controller is cascaded to the GIC.
Hence when propagating wake-up settings to its parent interrupt
controller, the following lockdep warning is printed:
=============================================
[ INFO: possible recursive locking detected ]
4.2.0-armadillo-10725-g50fcd7643c034198 #781 Not tainted
---------------------------------------------
s2ram/1179 is trying to acquire lock:
(&irq_desc_lock_class){-.-...}, at: [<c005bb54>] __irq_get_desc_lock+0x78/0x94
but task is already holding lock:
(&irq_desc_lock_class){-.-...}, at: [<c005bb54>] __irq_get_desc_lock+0x78/0x94
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&irq_desc_lock_class);
lock(&irq_desc_lock_class);
*** DEADLOCK ***
May be due to missing lock nesting notation
7 locks held by s2ram/1179:
#0: (sb_writers#7){.+.+.+}, at: [<c00c9708>] __sb_start_write+0x64/0xb8
#1: (&of->mutex){+.+.+.}, at: [<c0125a00>] kernfs_fop_write+0x78/0x1a0
#2: (s_active#23){.+.+.+}, at: [<c0125a08>] kernfs_fop_write+0x80/0x1a0
#3: (autosleep_lock){+.+.+.}, at: [<c0058244>] pm_autosleep_lock+0x18/0x20
#4: (pm_mutex){+.+.+.}, at: [<c0057e50>] pm_suspend+0x54/0x248
#5: (&dev->mutex){......}, at: [<c0243a20>] __device_suspend+0xdc/0x240
#6: (&irq_desc_lock_class){-.-...}, at: [<c005bb54>] __irq_get_desc_lock+0x78/0x94
stack backtrace:
CPU: 0 PID: 1179 Comm: s2ram Not tainted 4.2.0-armadillo-10725-g50fcd7643c034198
Hardware name: Generic R8A7740 (Flattened Device Tree)
[<c00129f4>] (dump_backtrace) from [<c0012bec>] (show_stack+0x18/0x1c)
[<c0012bd4>] (show_stack) from [<c03f5d94>] (dump_stack+0x20/0x28)
[<c03f5d74>] (dump_stack) from [<c00514d4>] (__lock_acquire+0x67c/0x1b88)
[<c0050e58>] (__lock_acquire) from [<c0052df8>] (lock_acquire+0x9c/0xbc)
[<c0052d5c>] (lock_acquire) from [<c03fb068>] (_raw_spin_lock_irqsave+0x44/0x58)
[<c03fb024>] (_raw_spin_lock_irqsave) from [<c005bb54>] (__irq_get_desc_lock+0x78/0x94
[<c005badc>] (__irq_get_desc_lock) from [<c005c3d8>] (irq_set_irq_wake+0x28/0x100)
[<c005c3b0>] (irq_set_irq_wake) from [<c01e50d0>] (intc_irqpin_irq_set_wake+0x24/0x4c)
[<c01e50ac>] (intc_irqpin_irq_set_wake) from [<c005c17c>] (set_irq_wake_real+0x3c/0x50
[<c005c140>] (set_irq_wake_real) from [<c005c414>] (irq_set_irq_wake+0x64/0x100)
[<c005c3b0>] (irq_set_irq_wake) from [<c02a19b4>] (gpio_keys_suspend+0x60/0xa0)
[<c02a1954>] (gpio_keys_suspend) from [<c023b750>] (platform_pm_suspend+0x3c/0x5c)
Avoid this false positive by using a separate lockdep class for INTC
External IRQ Pin interrupts.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1441798974-25716-3-git-send-email-geert%2Brenesas@glider.be
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The renesas-irqc interrupt controller is cascaded to the GIC. Hence when
propagating wake-up settings to its parent interrupt controller, the
following lockdep warning is printed:
=============================================
[ INFO: possible recursive locking detected ]
4.2.0-ape6evm-10725-g50fcd7643c034198 #280 Not tainted
---------------------------------------------
s2ram/1072 is trying to acquire lock:
(&irq_desc_lock_class){-.-...}, at: [<c008d3fc>] __irq_get_desc_lock+0x58/0x98
but task is already holding lock:
(&irq_desc_lock_class){-.-...}, at: [<c008d3fc>] __irq_get_desc_lock+0x58/0x98
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&irq_desc_lock_class);
lock(&irq_desc_lock_class);
*** DEADLOCK ***
May be due to missing lock nesting notation
6 locks held by s2ram/1072:
#0: (sb_writers#7){.+.+.+}, at: [<c012eb14>] __sb_start_write+0xa0/0xa8
#1: (&of->mutex){+.+.+.}, at: [<c019396c>] kernfs_fop_write+0x4c/0x1bc
#2: (s_active#24){.+.+.+}, at: [<c0193974>] kernfs_fop_write+0x54/0x1bc
#3: (pm_mutex){+.+.+.}, at: [<c008213c>] pm_suspend+0x10c/0x510
#4: (&dev->mutex){......}, at: [<c02af3c4>] __device_suspend+0xdc/0x2cc
#5: (&irq_desc_lock_class){-.-...}, at: [<c008d3fc>] __irq_get_desc_lock+0x58/0x98
stack backtrace:
CPU: 0 PID: 1072 Comm: s2ram Not tainted 4.2.0-ape6evm-10725-g50fcd7643c034198 #280
Hardware name: Generic R8A73A4 (Flattened Device Tree)
[<c0018078>] (unwind_backtrace) from [<c00144f0>] (show_stack+0x10/0x14)
[<c00144f0>] (show_stack) from [<c0451f14>] (dump_stack+0x88/0x98)
[<c0451f14>] (dump_stack) from [<c007b29c>] (__lock_acquire+0x15cc/0x20e4)
[<c007b29c>] (__lock_acquire) from [<c007c6e0>] (lock_acquire+0xac/0x12c)
[<c007c6e0>] (lock_acquire) from [<c0457c00>] (_raw_spin_lock_irqsave+0x40/0x54)
[<c0457c00>] (_raw_spin_lock_irqsave) from [<c008d3fc>] (__irq_get_desc_lock+0x58/0x98)
[<c008d3fc>] (__irq_get_desc_lock) from [<c008ebbc>] (irq_set_irq_wake+0x20/0xf8)
[<c008ebbc>] (irq_set_irq_wake) from [<c0260770>] (irqc_irq_set_wake+0x20/0x4c)
[<c0260770>] (irqc_irq_set_wake) from [<c008ec28>] (irq_set_irq_wake+0x8c/0xf8)
[<c008ec28>] (irq_set_irq_wake) from [<c02cb8c0>] (gpio_keys_suspend+0x74/0xc0)
[<c02cb8c0>] (gpio_keys_suspend) from [<c02ae8cc>] (dpm_run_callback+0x54/0x124)
Avoid this false positive by using a separate lockdep class for IRQC
interrupts.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1441798974-25716-2-git-send-email-geert%2Brenesas@glider.be
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
After GICv2m was enabled for 32-bit ARM kernel, a warning popped up:
drivers/irqchip/irq-gic-v2m.c: In function gicv2m_compose_msi_msg:
drivers/irqchip/irq-gic-v2m.c💯2: warning: right shift count >= width
of type [enabled by default]
msg->address_hi = (u32) (addr >> 32);
^
This patch fixes it by using proper macros for splitting up the value.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Stuart Yoder <stuart.yoder@freescale.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1442142873-20213-4-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When the ITS is configured for non-cacheable transactions, make sure
that the allocated, zeroed memory is flushed to the Point of
Coherency, allowing the ITS to observe the zeros instead of random
garbage (or even get its own data overwritten by zeros being evicted
from the cache...).
Fixes: 241a386c7d "irqchip: gicv3-its: Use non-cacheable accesses when no shareability"
Reported-and-tested-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Pavel Fedin <p.fedin@samsung.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1442142873-20213-3-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The GICv2 architecture mandates that the two 4kB GIC regions are
contiguous, and on two separate physical pages (so that access to
the second page can be trapped by a hypervisor). This doesn't work
very well when PAGE_SIZE is 64kB.
A relatively common hack^Wway to work around this is to alias each
4kB region over its own 64kB page. Of course in this case, the base
address you want to use is not really the begining of the region,
but base + 60kB (so that you get a contiguous 8kB region over two
distinct pages).
Normally, this would be described in DT with a new property, but
some HW is already out there, and the firmware makes sure that
it will override whatever you put in the GIC node. Duh. And of course,
said firmware source code is not available, despite being based
on u-boot.
The workaround is to detect the case where the CPU interface size
is set to 128kB, and verify the aliasing by checking that the ID
register for GIC400 (which is the only GIC wired this way so far)
is the same at base and base + 0xF000. In this case, we update
the GIC base address and let it roll.
And if you feel slightly sick by looking at this, rest assured that
I do too...
Reported-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Stuart Yoder <stuart.yoder@freescale.com>
Cc: Pavel Fedin <p.fedin@samsung.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1442142873-20213-2-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The current implementation of platform MSI caches the msi_desc
pointer in irq_data::handler_data. This is a bit silly, as
we also have irq_data::msi_desc, which is perfectly valid.
Remove the useless assignment and simplify the whole flow.
Reported-by: Ma Jun <majun258@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
Link: http://lkml.kernel.org/r/1442147824-20971-1-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is a preparatory patch for moving irq_data struct members. Search
and replace was done with coccinelle
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Amir Vadai <amirv@mellanox.com>
Currently, if we had a zero length mmio eventfd assigned on
KVM_MMIO_BUS. It will never be found by kvm_io_bus_cmp() since it
always compares the kvm_io_range() with the length that guest
wrote. This will cause e.g for vhost, kick will be trapped by qemu
userspace instead of vhost. Fixing this by using zero length if an
iodevice is zero length.
Cc: stable@vger.kernel.org
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch factors out core eventfd assign/deassign logic and leaves
the argument checking and bus index selection to callers.
Cc: stable@vger.kernel.org
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We only want zero length mmio eventfd to be registered on
KVM_FAST_MMIO_BUS. So check this explicitly when arg->len is zero to
make sure this.
Cc: stable@vger.kernel.org
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When entering the kernel at EL2, we fail to initialise the MDCR_EL2
register which controls debug access and PMU capabilities at EL1.
This patch ensures that the register is initialised so that all traps
are disabled and all the PMU counters are available to the host. When a
guest is scheduled, KVM takes care to configure trapping appropriately.
Cc: <stable@vger.kernel.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Enable generic idle loop for ARM64, so can support for hlt/nohlt
command line options to override default idle loop behavior.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The test titled "Test software clock events have valid period values"
was setting cpu/thread maps directly. Make it use the proper function
perf_evlist__set_maps() especially now that it also propagates the maps.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1441699142-18905-15-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The test titled "Test number of exit event of a simple workload" was
setting cpu/thread maps directly. Make it use the proper function
perf_evlist__set_maps() especially now that it also propagates the maps.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1441699142-18905-14-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix it by making it call perf_evlist__set_maps() instead of setting the
maps itself.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1441699142-18905-13-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If evsels are added after maps are created, then they won't have any
maps propagated to them. Fix that.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1441699142-18905-12-git-send-email-adrian.hunter@intel.com
[ Moved the moving of propagate_maps() to the patch before, so that this
one does _just_ the one lile fix calling in add()]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Subsequent fixes will need a function that just propagates maps for a
single evsel so factor it out.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1441699142-18905-11-git-send-email-adrian.hunter@intel.com
[ Moved them to before perf_evlist__add() to avoid having to move it in the next patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since there is a function to set maps, perf_evlist__create_maps() should
use it.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1441699142-18905-10-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Make perf_evlist__set_maps() more resilient by allowing for the
possibility that one or another of the maps isn't being changed and
therefore should not be "put".
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1441699142-18905-9-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf_evlist__propagate_maps() cannot easily tell if an evsel has its own
cpu map. To make that simpler, keep a copy of the PMU cpu map and
adjust the propagation logic accordingly.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1441699142-18905-8-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf_evlist__propagate_maps() incorrectly assumes evsel->threads is NULL
before reassigning it, but it won't be NULL when perf_evlist__set_maps()
is used to set different (or NULL) maps. Thus thread_map__put must be
used, which works even if evsel->threads is NULL.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1441699142-18905-7-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit d49e469507 ("perf evsel: Add a backpointer to the evlist a
evsel is in") updated perf_evlist__add() but not
perf_evlist__splice_list_tail().
This illustrates that it is better if perf_evlist__splice_list_tail()
calls perf_evlist__add() instead of duplicating the logic, so do that.
This will also simplify a subsequent fix for propagating maps.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1441699142-18905-6-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Subsequent patches will need to call perf_evlist__propagate_maps without
reference to a "target". Add evlist->has_user_cpus to record whether
the user has specified which cpus to target (and therefore whether that
list of cpus should override the default settings for a selected event
i.e. the cpu maps should be propagated)
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1441699142-18905-5-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The validation checks that the values that were just assigned, got
assigned i.e. the error can't ever happen. Subsequent patches will call
this code in places where errors are not being returned. Changing those
code paths to return this non-existent error is counter-productive, so
just remove it.
That in turn results in perf_evlist__set_maps not needing to return an
error, but callers aren't checking it either, so remove that too.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1441699142-18905-4-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Don't need to check for NULL when "putting" evlist->maps and
evlist->threads because the "put" functions already do that.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1441699142-18905-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If evsel->cpus is to be reassigned then the current value must be "put",
which works even if it is NULL. Simplify the current logic by moving
the "put" next to the assignment.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1441699142-18905-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
A recent change in gcc caused this build failure:
/var/lib/jenkins/workspace/gcc_kernel_build/linux/drivers/misc/cxl/cxl.h:72:27:
error: ‘CXL_PSL_DLCNTL’ defined but not used [-Werror=unused-const-variable]
static const cxl_p1_reg_t CXL_PSL_DLCNTL = {0x0060};
Because of this gcc commit:
Commit 1bca8cbd0c68366f07277f98ce6963e10c2aa617 by mark
PR28901 -Wunused-variable ignores unused const initialised variables in C
12 years ago it was decided that -Wunused-variable shouldn't warn about
static const variables because some code used const static char rcsid[]
strings which were never used but wanted in the code anyway. But as the
bug points out this hides some real bugs. These days the usage of
rcsids is not very popular anymore. So this patch changes the default
to warn about unused static const variables in C with
-Wunused-variable. And it adds a new option -Wno-unused-const-variable
to turn this warning off. For C++ this new warning is off by default,
since const variables can be used as #defines in C++. New testcases for
the new defaults in C and C++ are included testing the new warning and
suppressing it with an unused attribute or using
-Wno-unused-const-variable. gcc/ChangeLog
The cxl driver uses static consts in place of #defines in some cases
for type safety, so this change causes the driver to fail to build on
new copilers as these constants are not all used in every file that
imports the header. Suppress the warning for this driver to return to
the old behaviour of -Wunused-variable.
Reported-by: Anton Blanchard <anton@au1.ibm.com>
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently the first thing we do in cxl_probe is to grab a reference
on the pci device. Later on, we call device_register on our adapter.
In our remove path, we call device_unregister, but we never call
pci_dev_put. We therefore leak the device every time we do a
reflash.
device_register/unregister is sufficient to hold the reference.
Therefore, drop the call to pci_dev_get.
Here's why this is safe.
The proposed cxl_probe(pdev) calls cxl_adapter_init:
a) init calls cxl_adapter_alloc, which creates a struct cxl,
conventionally called adapter. This struct contains a
device entry, adapter->dev.
b) init calls cxl_configure_adapter, where we set
adapter->dev.parent = &dev->dev (here dev is the pci dev)
So at this point, the cxl adapter's device's parent is the PCI
device that I want to be refcounted properly.
c) init calls cxl_register_adapter
*) cxl_register_adapter calls device_register(&adapter->dev)
So now we're in device_register, where dev is the adapter device, and
we want to know if the PCI device is safe after we return.
device_register(&adapter->dev) calls device_initialize() and then
device_add().
device_add() does a get_device(). device_add() also explicitly grabs
the device's parent, and calls get_device() on it:
parent = get_device(dev->parent);
So therefore, device_register() takes a lock on the parent PCI dev,
which is what pci_dev_get() was guarding. pci_dev_get() can therefore
be safely removed.
Fixes: f204e0b8ce ("cxl: Driver code for powernv PCIe based cards for userspace access")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Document the binding for the zynq specific chipidea UDC binding.
Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
Signed-off-by: Peter Chen <peter.chen@freescale.com>
Due to having hardware tx buffers less than 512 bytes in size, streaming
must be enabled on the Zynq for the udc to work at all. Add platform data
specific to the Zynq udc, which does not set the CI_HDRC_DISABLE_STREAMING
flag.
Based on a patch by the same name from the Xilinx vendor tree.
Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
Signed-off-by: Peter Chen <peter.chen@freescale.com>
According to spec, there are functional and protocol stalls.
For functional stall, it is for bulk and interrupt endpoints,
below are cases for it:
- Host sends SET_FEATURE request for Set-Halt, the udc driver
needs to set stall, and return true unconditionally.
- The gadget driver may call usb_ep_set_halt to stall certain
endpoints, if there is a transfer in pending, the udc driver
should not set stall, and return -EAGAIN accordingly.
These two kinds of stall need to be cleared by host using CLEAR_FEATURE
request (Clear-Halt).
For protocol stall, it is for control endpoint, this stall will
be set if the control request has failed. This stall will be
cleared by next setup request (hardware will do it).
It fixed usbtest (drivers/usb/misc/usbtest.c) Test 13 "set/clear halt"
test failure, meanwhile, this change has been verified by
USB2 CV Compliance Test and MSC Tests.
Cc: <stable@vger.kernel.org> #3.10+
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Felipe Balbi <balbi@ti.com>
Signed-off-by: Peter Chen <peter.chen@freescale.com>
In the event that TTM doesn't find a compatible memory type for the
driver's first placement choice (placement without eviction), TTM
returns -EINVAL without trying the driver's second choice.
This causes problems on vmwgfx when VRAM is disabled before first modeset
and during VT switches when fbdev is not enabled.
Fix this by also trying the driver's second choice before returning
-EINVAL.
v2: Also check that man->use_type is true for the driver's second choice.
Fixes a bug where disallowed memory types could be used.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Fix a few small mistakes in the static key documentation and
delete an unneeded sentence.
Suggested-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150914171105.511e1e21@lwn.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In btrfs_evict_inode, we properly truncate the page cache for evicted
inodes but then we call btrfs_wait_ordered_range for every inode as well.
It's the right thing to do for regular files but results in incorrect
behavior for device inodes for block devices.
filemap_fdatawrite_range gets called with inode->i_mapping which gets
resolved to the block device inode before getting passed to
wbc_attach_fdatawrite_inode and ultimately to inode_to_bdi. What happens
next depends on whether there's an open file handle associated with the
inode. If there is, we write to the block device, which is unexpected
behavior. If there isn't, we through normally and inode->i_data is used.
We can also end up racing against open/close which can result in crashes
when i_mapping points to a block device inode that has been closed.
Since there can't be any page cache associated with special file inodes,
it's safe to skip the btrfs_wait_ordered_range call entirely and avoid
the problem.
Cc: <stable@vger.kernel.org>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=100911
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
This commit removes all CONFIG_.*{,_MODULE} in ACPI code, replacing it
with IS_ENABLED().
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch adds the missing CONFIG_ prefix to INTEL_SOC_DTS_THERMAL
macros.
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If a file has a range pointing to a compressed extent, followed by
another range that points to the same compressed extent and a read
operation attempts to read both ranges (either completely or part of
them), the pages that correspond to the second range are incorrectly
filled with zeroes.
Consider the following example:
File layout
[0 - 8K] [8K - 24K]
| |
| |
points to extent X, points to extent X,
offset 4K, length of 8K offset 0, length 16K
[extent X, compressed length = 4K uncompressed length = 16K]
If a readpages() call spans the 2 ranges, a single bio to read the extent
is submitted - extent_io.c:submit_extent_page() would only create a new
bio to cover the second range pointing to the extent if the extent it
points to had a different logical address than the extent associated with
the first range. This has a consequence of the compressed read end io
handler (compression.c:end_compressed_bio_read()) finish once the extent
is decompressed into the pages covering the first range, leaving the
remaining pages (belonging to the second range) filled with zeroes (done
by compression.c:btrfs_clear_biovec_end()).
So fix this by submitting the current bio whenever we find a range
pointing to a compressed extent that was preceded by a range with a
different extent map. This is the simplest solution for this corner
case. Making the end io callback populate both ranges (or more, if we
have multiple pointing to the same extent) is a much more complex
solution since each bio is tightly coupled with a single extent map and
the extent maps associated to the ranges pointing to the shared extent
can have different offsets and lengths.
The following test case for fstests triggers the issue:
seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
tmp=/tmp/$$
status=1 # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15
_cleanup()
{
rm -f $tmp.*
}
# get standard environment, filters and checks
. ./common/rc
. ./common/filter
# real QA test starts here
_need_to_be_root
_supported_fs btrfs
_supported_os Linux
_require_scratch
_require_cloner
rm -f $seqres.full
test_clone_and_read_compressed_extent()
{
local mount_opts=$1
_scratch_mkfs >>$seqres.full 2>&1
_scratch_mount $mount_opts
# Create a test file with a single extent that is compressed (the
# data we write into it is highly compressible no matter which
# compression algorithm is used, zlib or lzo).
$XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 4K" \
-c "pwrite -S 0xbb 4K 8K" \
-c "pwrite -S 0xcc 12K 4K" \
$SCRATCH_MNT/foo | _filter_xfs_io
# Now clone our extent into an adjacent offset.
$CLONER_PROG -s $((4 * 1024)) -d $((16 * 1024)) -l $((8 * 1024)) \
$SCRATCH_MNT/foo $SCRATCH_MNT/foo
# Same as before but for this file we clone the extent into a lower
# file offset.
$XFS_IO_PROG -f -c "pwrite -S 0xaa 8K 4K" \
-c "pwrite -S 0xbb 12K 8K" \
-c "pwrite -S 0xcc 20K 4K" \
$SCRATCH_MNT/bar | _filter_xfs_io
$CLONER_PROG -s $((12 * 1024)) -d 0 -l $((8 * 1024)) \
$SCRATCH_MNT/bar $SCRATCH_MNT/bar
echo "File digests before unmounting filesystem:"
md5sum $SCRATCH_MNT/foo | _filter_scratch
md5sum $SCRATCH_MNT/bar | _filter_scratch
# Evicting the inode or clearing the page cache before reading
# again the file would also trigger the bug - reads were returning
# all bytes in the range corresponding to the second reference to
# the extent with a value of 0, but the correct data was persisted
# (it was a bug exclusively in the read path). The issue happened
# only if the same readpages() call targeted pages belonging to the
# first and second ranges that point to the same compressed extent.
_scratch_remount
echo "File digests after mounting filesystem again:"
# Must match the same digests we got before.
md5sum $SCRATCH_MNT/foo | _filter_scratch
md5sum $SCRATCH_MNT/bar | _filter_scratch
}
echo -e "\nTesting with zlib compression..."
test_clone_and_read_compressed_extent "-o compress=zlib"
_scratch_unmount
echo -e "\nTesting with lzo compression..."
test_clone_and_read_compressed_extent "-o compress=lzo"
status=0
exit
Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo<quwenruo@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>