i915_drm_thaw was not locking the mode_config lock when calling
drm_helper_resume_force_mode. When there were multiple wake sources,
this caused FDI training failure on SNB which in turn corrupted the
display.
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reported-by: Konstantin Belousov <kostikbel@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Totally unexpected that this regressed. Luckily it sounds like we just
need to have dmar disable on the igfx, not the entire system. At least
that's what a few days of testing between Tony Vroon and me indicates.
Reported-by: Tony Vroon <tony@linx.net>
Cc: Tony Vroon <tony@linx.net>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=43024
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This adds PCI ID for IVB GT2 server variant which we were missing.
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
[danvet: fix up conflict because the patch has been diffed against next. tsk.]
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Noticed by staring at intel_reg_dumper diffs. Unfortunately it does
not seem to completely fix the bug.
Still, it's good to get this right, and maybe it helps someplace else.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47117
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ben Widawsky reported missed IRQ issues and this patch here helps.
We have one other missed IRQ report still left on snb, reported by QA:
https://bugs.freedesktop.org/show_bug.cgi?id=46145
This is _not_ a regression due to the forcewake voodoo though, it
started showing up before that was applied and has been on-and-off for
the past few weeks. According to QA this patch does not help. But the
missed IRQ is always from the blt ring (despite running piglit, so
also render activity expected), so I'm hopefully that this is an issue
with the blt ring itself.
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Somehow the BIOS manages to screw things up when copying the VBT
around, because the one we scrap from the VBIOS rom actually works.
Cc: stable@kernel.org
Tested-by: Markus Heinz <markus.heinz@uni-dortmund.de>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=28812
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is yet another chapter in the ongoing saga of bringing RC6 to Sandy
Bridge machines by default.
Now that we have discovered that RC6 issues are triggered by RC6+ state,
let's try to disable it by default. Plain RC6 is the one responsible for
most energy savings, and so far it haven't given any problems - at least,
none we are aware of.
So with this, when i915_enable_rc6=-1 (e.g., the default value), we'll
attempt to enable plain RC6 only on SNB. For Ivy Bridge, the behavior
stays the same as always - we enable both RC6 and deep RC6.
Note that while this exact patch does not has explicit tested-by's, the
equivalent settings were fixed in 3.3 kernel by a smaller patch. And it
has also received considerable testing through Canonical RC6 task-force
testing at https://wiki.ubuntu.com/Kernel/PowerManagementRC6. Up to date,
it looks like all the known issues are gone.
v2: improve description and reference a couple of open bugs related to
RC6 which seem to be fixed with this change.
References: https://bugs.freedesktop.org/show_bug.cgi?id=41682
References: https://bugs.freedesktop.org/show_bug.cgi?id=38567
References: https://bugs.freedesktop.org/show_bug.cgi?id=44867
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This allows to select which rc6 modes are to be used via kernel parameter,
via a bitmask parameter. E.g.:
- to enable rc6, i915_enable_rc6=1
- to enable rc6 and deep rc6, i915_enable_rc6=3
- to enable rc6 and deepest rc6, use i915_enable_rc6=5
- to enable rc6, deep and deepest rc6, use i915_enable_rc6=7
Please keep in mind that the deepest RC6 state really should NOT be used
by default, as it could potentially worsen the issues with deep RC6. So do
enable it only when you know what you are doing. However, having it around
could help solving possible future rc6-related issues and their debugging
on user machines.
Note that this changes behavior - previously, value of 1 would enable both
RC6 and deep RC6. Now it should only enable RC6 and deep/deepest RC6
stages must be enabled manually.
v2: address Chris Wilson comments and clean up the code.
References: https://bugs.freedesktop.org/show_bug.cgi?id=42579
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The BLT commands on gen2/3 utilize the fence registers and so we cannot
modify any fences for the object whilst those commands are in flight.
Currently we marked tiled commands as occupying a fence, but forgot to
restrict the untiled commands from preventing a fence being assigned
before they were completed.
One side-effect is that we ten have to double check that a fence was
allocated for a fenced buffer during move-to-active.
Reported-by: Jiri Slaby <jirislaby@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43427
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47990
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Testcase: i-g-t/tests/gem_tiled_after_untiled_blt
Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The ppgtt page directory lives in a snatched part of the gtt pte
range. Which naturally gets cleared on hibernate when we pull the
power. Suspend to ram (which is what I've tested) works because
despite the fact that this is a mmio region, it is actually back by
system ram.
Fix this by moving the page directory setup code to the ppgtt init
code (which gets called on resume).
This fixes hibernate on my ivb and snb.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Quoting the BSpec from time immemorial:
PIPEACONF, bits 28:27: Frame Start Delay (Debug)
Used to delay the frame start signal that is sent to the display planes.
Care must be taken to insure that there are enough lines during VBLANK
to support this setting.
An instance of the BIOS leaving these bits set was found in the wild,
where it caused our modesetting to go all squiffy and skewiff.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47271
Reported-and-tested-by: Eva Wang <evawang@linpus.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=43012
Reported-and-tested-by: Carl Richell <carl@system76.com>
Cc: stable@kernel.org
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Looking at hibernate overwriting I though it looked like a cursor,
so I tracked down this missing piece to stop the cursor blink
timer. I've no idea if this is sufficient to fix the hibernate
problems people are seeing, but please test it.
Both radeon and nouveau have done this for a long time.
I've run this personally all night hib/resume cycles with no fails.
Reviewed-by: Keith Packard <keithp@keithp.com>
Reported-by: Petr Tesarik <kernel@tesarici.cz>
Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
Reported-by: Lots of misc segfaults after hibernate across the world.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=37142
Tested-by: Dave Airlie <airlied@redhat.com>
Tested-by: Bojan Smojver <bojan@rexursive.com>
Tested-by: Andreas Hartmann <andihartmann@01019freenet.de>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Haven't seen this yet, but it doesn't hurt.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
ValleyView has a new interrupt architecture; best to put it in a new set
of functions. Also make sure the ring mask functions handle ValleyView.
FIXME: fix flipping; need to enable interrupts and call prepare/finish
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
ValleyView handles force wake differently than previous chipsets, so add
a couple of new functions for it. But leave it disabled by default
until we test it (need a chip with the Punit enabled first).
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
HDMI register offsets are different in Valleyview. Add support for the
same.
v2: drop superfluous comments in HDMI init (Daniel)
Signed-off-by: Beeresh G <beeresh.g@intel.com>
Signed-off-by: Shobhit Kumar <shobhit.kumar@intel.com>
Reviewed-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com>
Reviewed-by: Jesse Barnes <jesse.barnes@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Set required clock gating and chicken bits on VLV.
v2: set PIXEL_SUBSPAN_COLLECT_OPT_DISABLE too (Ben)
move function below ivb version to pretend to be consistent (Ben)
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
ValleyView puts some display related registers like the PLL controls and
dividers behind the DPIO bus. Add simple indirect register access
routines to get to those registers.
v2: move new wait_for macro to intel_drv.h (Ben)
fix DPIO_PKT double write (Ben)
add debugfs file
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Add support for ValleyView watermark handling.
v2: remove unused reg & bit definitions (Ben)
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
For use by the rest of the ValleyView code.
v2: fix desktop variant to not set is_mobile (Ben)
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Makes it more readable and maintainable. ValleyView will add its own
PLL update function in a later patch.
v2: split LVDS bits out of this patch (Daniel)
v3: fix dropped DP dithering hunk (Daniel)
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
danvet:
- fixup spurious whitespace change
- reorder patches to fix bisect breakage
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Just to make things clearer and reduce the size of this monstrosity.
v2: make sure 8xx PLL update function calls update_lvds too (Daniel)
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
danvet: fixed patch ordering to avoid breaking bisect.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Fixes a regression from 9e984bc1 (drm/i915: Don't do MTRR setup if PAT
is enabled) where we left the MTRR as 0 and so tried to free a MTRR we
did not own during unload.
Reported-and-tested-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This memory is always allocated, and it is always a fixed size, so just
allocate it along with the rest of the driver state.
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There is no GMBUS "disabled" port 0, nor "reserved" port 7.
For the other 6 ports there is a fixed 1:1 mapping between pin pairs and
gmbus ports, which means every real gmbus port has a gpio pin.
Given these realizations, clean up gmbus initialization.
Tested on Sandybridge (gen 6, PCH == CougarPoint) hardware.
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Instead of letting other modules directly access the ->gmbus array,
introduce intel_gmbus_get_adapter() for looking up an i2c_adapter
for a given gmbus port identifier. This will enable later refactoring
of the gmbus port list.
Note: Before requesting an adapter for a given gmbus port number, the
driver must first check its validity using i2c_intel_gmbus_is_port_valid().
If this check fails, a call to intel_gmbus_get_adapter() will WARN_ON and
return NULL. This is relevant for parts of the driver that read a port
from VBIOS, which might be improperly initialized and contain an invalid
port. In these cases, the driver must fall back to using a safer default
port.
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Instead of rolling our own custom quirk_xfer function, use the bit_algo
pre_xfer and post_xfer functions to setup and teardown bit-banged
i2c transactions.
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
According to i915 documentation [1], "Port D" (DP/HDMI Port D) is
actually gmbus pin pair 6 (gmbus0.2:0 == 110b GPIOF), not 7 (111b).
Pin pair 7 is a reserved pair.
[1] Documentation for [DevSNB+] and [DevIBX], as found on
http://intellinuxgraphics.org:
[DevSNB+]:
http://intellinuxgraphics.org/documentation/SNB/IHD_OS_Vol3_Part3.pdf
Section 2.2.2 lists the 6 gmbus ports (gpio pin pairs):
[ 5: HDMI/DPD, 4: HDMIB, 3: HDMI/DPC, 2: LVDS, 1: SSC, 0: VGA ]
2.2.2.1 lists the GPIO registers to control these 6 ports.
2.2.3.1 lists the mapping between 5 of these gmbus ports and the 3
Pin_Pair_Select bits (of the GMBUS0 register). This table is missing
HDMIB (port 101).
[DevIBX]: http://intellinuxgraphics.org/IHD_OS_Vol3_Part3r2.pdf
Section 2.2.2 lists the same 6 gmbus ports plus two 'reserved' gpio
ports.
2.2.2.1 lists 8 GPIO registers... however, it says the size of the
block is 6x32, which implies that those 2 reserved GPIO registers
(GPIO_6 & GPIO_7) don't actually exist (or are irrelevant).
2.2.3.1 lists the mapping between the 6 named gmbus ports and the 3
Pin_Pair_Select bits (of the GMBUS0 register). This table has HDMIB.
Note: the "reserved" and "disabled" pairs do not actually map to a
physical pair of pins, nor GPIO regs and shouldn't be initialized or used.
Fixing this is left for a later patch.
This bug had not been noticed earlier for two reasons:
1) Until recently, "gmbus" mode was disabled - all transfers actually
used "bit-bang" mode on GPIO port 5 (the "HDMI/DPD CTLDATA/CLK"
pair), at register 0x5024 (defined as GPIOF i915_reg.h).
Since this is the correct pair of pins for HDMI1, transfers succeed.
2) Even if gmbus mode is re-enabled, the first attempted transaction
will fail because it tries to use the wrong ("Reserved") pin pair.
However, the driver immediately falls back again to the bit-bang
method, which correctly uses GPIOF, so again, transfers succeed.
However, if gmbus mode is re-enabled and the GPIO fall-back mode is
disabled, then reading an attached monitor's EDID fail.
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Split out gmbus_xfer_read/write() helper functions.
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Beside helping the compiler untangle this maze they double-up as
documentation for which parts of the code aren't performance-critical
but just around to keep old (but already dead-slow) userspace from
breaking.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The issue is that with inline clflushing the clflushing isn't properly
swizzled. Fix this by
- always clflushing entire 128 byte chunks and
- unconditionally flush before writes when swizzling a given page.
We could be clever and check whether we pwrite a partial 128 byte
chunk instead of a partial cacheline, but I've figured that's not
worth it.
Now the usual approach is to fold this into the original patch series, but
I've opted against this because
- this fixes a corner case only very old userspace relies on and
- I'd like to not invalidate all the testing the pwrite rewrite has gotten.
This fixes the regression notice by tests/gem_tiled_partial_prite_pread
from i-g-t. Unfortunately it doesn't fix the issues with partial pwrites to
tiled buffers on bit17 swizzling machines. But that is also broken without
the pwrite patches, so likely a different issue (or a problem with the
testcase).
v2: Simplify the patch by dropping the overly clever partial write
logic for swizzled pages.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
drm/i915 wants to read/write more than one page in its fastpath
and hence needs to prefault more than PAGE_SIZE bytes.
Add new functions in filemap.h to make that possible.
Also kill a copy&pasted spurious space in both functions while at it.
v2: As suggested by Andrew Morton, add a multipage parameter to both
functions to avoid the additional branch for the pagemap.c hotpath.
My gcc 4.6 here seems to dtrt and indeed reap these branches where not
needed.
v3: Becaus I couldn't find a way around adding a uaddr += PAGE_SIZE to
the filemap.c hotpaths (that the compiler couldn't remove again),
let's go with separate new functions for the multipage use-case.
v4: Adjust comment to CodingStlye and fix spelling.
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
While moving around things, this two functions slowly grew out of any
sane bounds. So extract a few lines that do the copying and
clflushing. Also add a few comments to explain what's going on.
v2: Again do s/needs_clflush/needs_clflush_after/ in the write paths
as suggested by Chris Wilson.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It's around 20% faster.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It's too expensive to move it around just for that pwrite, especially
when we're trashing on the mappable gtt part like crazy.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In micro-benchmarking of the usual pwrite use-pattern of alternating
pwrites with gtt domain reads from the gpu, this yields around 30%
improvement of pwrite throughput across all buffers size. The trick is
that we can avoid clflush cachelines that we will overwrite completely
anyway.
Furthermore for partial pwrites it gives a proportional speedup on top
of the 30% percent because we only clflush back the part of the buffer
we're actually writing.
v2: Simplify the clflush-before-write logic, as suggested by Chris
Wilson.
v3: Finishing touches suggested by Chris Wilson:
- add comment to needs_clflush_before and only set this if the bo is
uncached.
- s/needs_clflush/needs_clflush_after/ in the write paths to clearly
differentiate it from needs_clflush_before.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The pagemap.h prefault helpers do the prefaulting by simply writing
some data into every page. Hence we should not prefault when we're not
yet commited to to actually writing data to userspace. The problem is
now that
- we can't prefault while holding dev->struct_mutex for we could
deadlock with our own pagefault handler
- we need to grab dev->struct_mutex before copying to sync up with any
outsanding gpu writes.
Therefore only prefault when we're dropping the lock the first time in
the pread slowpath - at that point we're committed to the write, don't
wait on the gpu anymore and hence won't return early (with e.g.
-EINTR).
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the proper prefault, it's extremely unlikely that we fall back
to the gtt slowpath.
So just kill it and use the shmem_pwrite path as fallback.
To further clean up the code, move the preparatory gem calls into the
respective pwrite functions. This way the gtt_fast->shmem fallback
is much more obvious.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This speeds up pwrite and pread from ~120 µs ro ~100 µs for
reading/writing 1mb on my snb (if the backing storage pages
are already pinned, of course).
v2: Chris Wilson pointed out a glaring page reference bug - I've
unconditionally dropped the reference. With that fixed (and the
associated reduction of dirt in dmesg) it's now even a notch faster.
v3: Unconditionaly grab a page reference when dropping
dev->struct_mutex to simplify the code-flow.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
~120 µs instead fo ~210 µs to write 1mb on my snb. I like this.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
No longer needed.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is obviously gonna slow down pread. But for a half-way realistic
micro-benchmark, it doesn't matter: Non-broken userspace reads back
data from the gpu once before the gpu again dirties it.
So all this ranged clflush tracking is just a waste of time.
No pread performance change (neglecting the dumb benchmark of
constantly reading the same data) measured.
As an added bonus, this avoids clflush on read on coherent objects.
Which means that partial preads on snb are now roughly 4x as fast.
This will be usefull for e.g. the libva encoder - when I finally get
around to fix that up.
v2: Properly sync with the gpu on LLC machines.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the previous rewrite, they've become essential identical.
v2: Simplify the page_do_bit17_swizzling logic as suggested by Chris
Wilson.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the previous rewrite, they've become essential identical.
v2: Simplify the page_do_bit17_swizzling logic as suggested by Chris
Wilson.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We try to avoid writing the relocations through the uncached GTT, if the
buffer is currently in the CPU write domain and so will be flushed out to
main memory afterwards anyway. Also on SandyBridge we can safely write
to the pages in cacheable memory, so long as the buffer is LLC mapped.
In either of these cases, we therefore do not need to force the
reallocation of the buffer into the mappable region of the GTT, reducing
the aperture pressure.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>