commit 29a90b70893817e2f2bb3cea40a29f5308e21b21 upstream.
The intel-iommu DMA ops fail to correctly handle scatterlists where
sg->offset is greater than PAGE_SIZE - the IOVA allocation is computed
appropriately based on the page-aligned portion of the offset, but the
mapping is set up relative to sg->page, which means it fails to actually
cover the whole buffer (and in the worst case doesn't cover it at all):
(sg->dma_address + sg->dma_len) ----+
sg->dma_address ---------+ |
iov_pfn------+ | |
| | |
v v v
iova: a b c d e f
|--------|--------|--------|--------|--------|
<...calculated....>
[_____mapped______]
pfn: 0 1 2 3 4 5
|--------|--------|--------|--------|--------|
^ ^ ^
| | |
sg->page ----+ | |
sg->offset --------------+ |
(sg->offset + sg->length) ----------+
As a result, the caller ends up overrunning the mapping into whatever
lies beyond, which usually goes badly:
[ 429.645492] DMAR: DRHD: handling fault status reg 2
[ 429.650847] DMAR: [DMA Write] Request device [02:00.4] fault addr f2682000 ...
Whilst this is a fairly rare occurrence, it can happen from the result
of intermediate scatterlist processing such as scatterwalk_ffwd() in the
crypto layer. Whilst that particular site could be fixed up, it still
seems worthwhile to bring intel-iommu in line with other DMA API
implementations in handling this robustly.
To that end, fix the intel_map_sg() path to line up the mapping
correctly (in units of MM pages rather than VT-d pages to match the
aligned_nrpages() calculation) regardless of the offset, and use
sg_phys() consistently for clarity.
Reported-by: Harsh Jain <Harsh@chelsio.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed by: Ashok Raj <ashok.raj@intel.com>
Tested by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f7116e115acdd74bc75a4daf6492b11d43505125 upstream.
dma_pte_free_level() recurses down the IOMMU page tables and frees
directory pages that are entirely contained in the given PFN range.
Unfortunately, it incorrectly calculates the starting address covered
by the PTE under consideration, which can lead to it clearing an entry
that is still in use.
This occurs if we have a scatterlist with an entry that has a length
greater than 1026 MB and is aligned to 2 MB for both the IOMMU and
physical addresses. For example, if __domain_mapping() is asked to map a
two-entry scatterlist with 2 MB and 1028 MB segments to PFN 0xffff80000,
it will ask if dma_pte_free_pagetable() is asked to PFNs from
0xffff80200 to 0xffffc05ff, it will also incorrectly clear the PFNs from
0xffff80000 to 0xffff801ff because of this issue. The current code will
set level_pfn to 0xffff80200, and 0xffff80200-0xffffc01ff fits inside
the range being cleared. Properly setting the level_pfn for the current
level under consideration catches that this PTE is outside of the range
being cleared.
This patch also changes the value passed into dma_pte_free_level() when
it recurses. This only affects the first PTE of the range being cleared,
and is handled by the existing code that ensures we start our cursor no
lower than start_pfn.
This was found when using dma_map_sg() to map large chunks of contiguous
memory, which immediatedly led to faults on the first access of the
erroneously-deleted mappings.
Fixes: 3269ee0bd6 ("intel-iommu: Fix leaks in pagetable freeing")
Reviewed-by: Benjamin Serebrin <serebrin@google.com>
Signed-off-by: David Dillow <dillow@google.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f73a7eee900e95404b61408a23a1df5c5811704c upstream.
Ever since commit 091d42e43d ("iommu/vt-d: Copy translation tables from
old kernel") the kdump kernel copies the IOMMU context tables from the
previous kernel. Each device mappings will be destroyed once the driver
for the respective device takes over.
This unfortunately breaks the workflow of mapping and unmapping a new
context to the IOMMU. The mapping function assumes that either:
1) Unmapping did the proper IOMMU flushing and it only ever flush if the
IOMMU unit supports caching invalid entries.
2) The system just booted and the initialization code took care of
flushing all IOMMU caches.
This assumption is not true for the kdump kernel since the context
tables have been copied from the previous kernel and translations could
have been cached ever since. So make sure to flush the IOTLB as well
when we destroy these old copied mappings.
Cc: Joerg Roedel <joro@8bytes.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Anthony Liguori <aliguori@amazon.com>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Fixes: 091d42e43d ("iommu/vt-d: Copy translation tables from old kernel")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 21e722c4c8377b5bc82ad058fed12165af739c1b upstream.
The check to set identity map for tylersburg is done too late. It needs
to be done before the check for identity_map domain is done.
To: Joerg Roedel <joro@8bytes.org>
To: David Woodhouse <dwmw2@infradead.org>
Cc: iommu@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Cc: Ashok Raj <ashok.raj@intel.com>
Fixes: 86080ccc22 ("iommu/vt-d: Allocate si_domain in init_dmars()")
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Reported-by: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aec0e86172a79eb5e44aff1055bb953fe4d47c59 upstream.
We met the DMAR fault both on hpsa P420i and P421 SmartArray controllers
under kdump, it can be steadily reproduced on several different machines,
the dmesg log is like:
HP HPSA Driver (v 3.4.16-0)
hpsa 0000:02:00.0: using doorbell to reset controller
hpsa 0000:02:00.0: board ready after hard reset.
hpsa 0000:02:00.0: Waiting for controller to respond to no-op
DMAR: Setting identity map for device 0000:02:00.0 [0xe8000 - 0xe8fff]
DMAR: Setting identity map for device 0000:02:00.0 [0xf4000 - 0xf4fff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf6e000 - 0xbdf6efff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf6f000 - 0xbdf7efff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf7f000 - 0xbdf82fff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf83000 - 0xbdf84fff]
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Read] Request device [02:00.0] fault addr fffff000 [fault reason 06] PTE Read access is not set
hpsa 0000:02:00.0: controller message 03:00 timed out
hpsa 0000:02:00.0: no-op failed; re-trying
After some debugging, we found that the fault addr is from DMA initiated at
the driver probe stage after reset(not in-flight DMA), and the corresponding
pte entry value is correct, the fault is likely due to the old iommu caches
of the in-flight DMA before it.
Thus we need to flush the old cache after context mapping is setup for the
device, where the device is supposed to finish reset at its driver probe
stage and no in-flight DMA exists hereafter.
I'm not sure if the hardware is responsible for invalidating all the related
caches allocated in the iommu hardware before, but seems not the case for hpsa,
actually many device drivers have problems in properly resetting the hardware.
Anyway flushing (again) by software in kdump kernel when the device gets context
mapped which is a quite infrequent operation does little harm.
With this patch, the problematic machine can survive the kdump tests.
CC: Myron Stowe <myron.stowe@gmail.com>
CC: Joseph Szczypek <jszczype@redhat.com>
CC: Don Brace <don.brace@microsemi.com>
CC: Baoquan He <bhe@redhat.com>
CC: Dave Young <dyoung@redhat.com>
Fixes: 091d42e43d ("iommu/vt-d: Copy translation tables from old kernel")
Fixes: dbcd861f25 ("iommu/vt-d: Do not re-use domain-ids from the old kernel")
Fixes: cf484d0e69 ("iommu/vt-d: Mark copied context entries")
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Tested-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 65ca7f5f7d1cdde6c25172fe6107cd16902f826f upstream.
Different encodings are used to represent supported PASID bits
and number of PASID table entries.
The current code assigns ecap_pss directly to extended context
table entry PTS which is wrong and could result in writing
non-zero bits to the reserved fields. IOMMU fault reason
11 will be reported when reserved bits are nonzero.
This patch converts ecap_pss to extend context entry pts encoding
based on VT-d spec. Chapter 9.4 as follows:
- number of PASID bits = ecap_pss + 1
- number of PASID table entries = 2^(pts + 5)
Software assigned limit of pasid_max value is also respected to
match the allocation limitation of PASID table.
cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Tested-by: Mika Kuoppala <mika.kuoppala@intel.com>
Fixes: 2f26e0a9c9 ('iommu/vt-d: Add basic SVM PASID support')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1c387188c60f53b338c20eee32db055dfe022a9b upstream.
The VT-d specification (§8.3.3) says:
‘Virtual Functions’ of a ‘Physical Function’ are under the scope
of the same remapping unit as the ‘Physical Function’.
The BIOS is not required to list all the possible VFs in the scope
tables, and arguably *shouldn't* make any attempt to do so, since there
could be a huge number of them.
This has been broken basically for ever — the VF is never going to match
against a specific unit's scope, so it ends up being assigned to the
INCLUDE_ALL IOMMU. Which was always actually correct by coincidence, but
now we're looking at Root-Complex integrated devices with SR-IOV support
it's going to start being wrong.
Fix it to simply use pci_physfn() before doing the lookup for PCI devices.
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bea64033dd7b5fb6296eda8266acab6364ce1554 upstream.
It turns out that the disable_dmar_iommu() code-path tried
to get the device_domain_lock recursivly, which will
dead-lock when this code runs on dmar removal. Fix both
code-paths that could lead to the dead-lock.
Fixes: 55d940430a ('iommu/vt-d: Get rid of domain->iommu_lock')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 194dc870a5890e855ecffb30f3b80ba7c88f96d6 upstream.
Some of our "for_each_xyz()" macro constructs make gcc unhappy about
lack of braces around if-statements inside or outside the loop, because
the loop construct itself has a "if-then-else" statement inside of it.
The resulting warnings look something like this:
drivers/gpu/drm/i915/i915_debugfs.c: In function ‘i915_dump_lrc’:
drivers/gpu/drm/i915/i915_debugfs.c:2103:6: warning: suggest explicit braces to avoid ambiguous ‘else’ [-Wparentheses]
if (ctx != dev_priv->kernel_context)
^
even if the code itself is fine.
Since the warning is fairly easy to avoid by adding a braces around the
if-statement near the for_each_xyz() construct, do so, rather than
disabling the otherwise potentially useful warning.
(The if-then-else statements used in the "for_each_xyz()" constructs are
designed to be inherently safe even with no braces, but in this case
it's quite understandable that gcc isn't really able to tell that).
This finally leaves the standard "allmodconfig" build with just a
handful of remaining warnings, so new and valid warnings hopefully will
stand out.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5c365d18a73d3979db37006eaacefc0008869c0f upstream.
In 'commit <55d940430ab9> ("iommu/vt-d: Get rid of domain->iommu_lock")',
the error handling path is changed a little, which makes the function
always return 0.
This path fixes this.
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Fixes: 55d940430a ('iommu/vt-d: Get rid of domain->iommu_lock')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a4c34ff1c029e90e7d5f8dd8d29b0a93b31c3cb2 upstream.
This seems to be required on some X58 chipsets on systems
with more than one IOMMU. QI does not work until it is
enabled on all IOMMUs in the system.
Reported-by: Dheeraj CVR <cvr.dheeraj@gmail.com>
Tested-by: Dheeraj CVR <cvr.dheeraj@gmail.com>
Fixes: 5f0a7f7614 ('iommu/vt-d: Make root entry visible for hardware right after allocation')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e6a8c9b337eed56eb481e1b4dd2180c25a1e5310 upstream.
In the PCI hotplug path of the Intel IOMMU driver, replace
the usage of the BUS_NOTIFY_DEL_DEVICE notifier, which is
executed before the driver is unbound from the device, with
BUS_NOTIFY_REMOVED_DEVICE, which runs after that.
This fixes a kernel BUG being triggered in the VT-d code
when the device driver tries to unmap DMA buffers and the
VT-d driver already destroyed all mappings.
Reported-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit db0fa0cb01 "scatterlist: use sg_phys()" did replacements of
the form:
phys_addr_t phys = page_to_phys(sg_page(s));
phys_addr_t phys = sg_phys(s) & PAGE_MASK;
However, this breaks platforms where sizeof(phys_addr_t) >
sizeof(unsigned long). Revert for 4.3 and 4.4 to make room for a
combined helper in 4.5.
Cc: <stable@vger.kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: db0fa0cb01 ("scatterlist: use sg_phys()")
Suggested-by: Joerg Roedel <joro@8bytes.org>
Reported-by: Vitaly Lavrov <vel21ripn@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".
Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.
This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.
This patch then converts a number of sites
o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.
o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.
o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.
o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.
The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.
The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This time including:
* A new IOMMU driver for s390 pci devices
* Common dma-ops support based on iommu-api for ARM64. The plan is to
use this as a basis for ARM32 and hopefully other architectures as
well in the future.
* MSI support for ARM-SMMUv3
* Cleanups and dead code removal in the AMD IOMMU driver
* Better RMRR handling for the Intel VT-d driver
* Various other cleanups and small fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJWOz7hAAoJECvwRC2XARrjbvYQALwtITTA5iTm0y/ApwNMxI7n
pZpjZVPoBPNsGBc4t/MT8pVhUSdmpBOljbV4Y4CayL1mSSB6Bl2gooZjd66m7Z81
qMJYEVWhFQqVsIKkCSNOgaO7W5y+xt3rTgqN6vCu86/CCDfKrTPP/+CRl1T/z9bo
1J8ioM3KnZG9KzG8JuXYFg5wwbKToaBh6swSmj+O4U9hru7zV/ILP7ikcc9pyMji
12WbzCqchRORsJZD65xMRYAqRaPNN/3IlDejs00TOFhY3qpWgEgFUucyeRJBJ/+q
K4U8T5vZsnr1a04l7/BeYbLmP7y/9Qv0N0xMGtTyoy/w/BieGqRWu4hHhqf/44NO
EhCSXcEThMNCGTjP2VWC4dnQ/s7Y8OmSW9nCreUcFVxHoE5LfDoh8RngA2fpeNuS
ixb3OwP+YXHN9Ck+1BQqQCeBznsPTLuDxlhRjCJsWntIfMSkXebOkz83YxyZ9b0Q
gFvptfuknU7cotUwWa3dg8RiUB8kNlKJyEEByaVpWEbEOabnONKEMkstvuBx6Ots
kA63wbe7QcPgbUYuq7g0nijDw6E2aEtf0nx2Xx4ZDL932qjg/xUkiBpmbDXHw4Gu
nimNXVQtbCzF74SyTvxEtupiijOTm5eHtoKtg0mYnqPZ+V9eOwEvW8IHaFFf8XHD
SecikoTtH1Q4RVtqOcAQ
=jLlB
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
"This time including:
- A new IOMMU driver for s390 pci devices
- Common dma-ops support based on iommu-api for ARM64. The plan is
to use this as a basis for ARM32 and hopefully other architectures
as well in the future.
- MSI support for ARM-SMMUv3
- Cleanups and dead code removal in the AMD IOMMU driver
- Better RMRR handling for the Intel VT-d driver
- Various other cleanups and small fixes"
* tag 'iommu-updates-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (41 commits)
iommu/vt-d: Fix return value check of parse_ioapics_under_ir()
iommu/vt-d: Propagate error-value from ir_parse_ioapic_hpet_scope()
iommu/vt-d: Adjust the return value of the parse_ioapics_under_ir
iommu: Move default domain allocation to iommu_group_get_for_dev()
iommu: Remove is_pci_dev() fall-back from iommu_group_get_for_dev
iommu/arm-smmu: Switch to device_group call-back
iommu/fsl: Convert to device_group call-back
iommu: Add device_group call-back to x86 iommu drivers
iommu: Add generic_device_group() function
iommu: Export and rename iommu_group_get_for_pci_dev()
iommu: Revive device_group iommu-ops call-back
iommu/amd: Remove find_last_devid_on_pci()
iommu/amd: Remove first/last_device handling
iommu/amd: Initialize amd_iommu_last_bdf for DEV_ALL
iommu/amd: Cleanup buffer allocation
iommu/amd: Remove cmd_buf_size and evt_buf_size from struct amd_iommu
iommu/amd: Align DTE flag definitions
iommu/amd: Remove old alias handling code
iommu/amd: Set alias DTE in do_attach/do_detach
iommu/amd: WARN when __[attach|detach]_device are called with irqs enabled
...
Pull intel iommu updates from David Woodhouse:
"This adds "Shared Virtual Memory" (aka PASID support) for the Intel
IOMMU. This allows devices to do DMA using process address space,
translated through the normal CPU page tables for the relevant mm.
With corresponding support added to the i915 driver, this has been
tested with the graphics device on Skylake. We don't have the
required TLP support in our PCIe root ports for supporting discrete
devices yet, so it's only integrated devices that can do it so far"
* git://git.infradead.org/intel-iommu: (23 commits)
iommu/vt-d: Fix rwxp flags in SVM device fault callback
iommu/vt-d: Expose struct svm_dev_ops without CONFIG_INTEL_IOMMU_SVM
iommu/vt-d: Clean up pasid_enabled() and ecs_enabled() dependencies
iommu/vt-d: Handle Caching Mode implementations of SVM
iommu/vt-d: Fix SVM IOTLB flush handling
iommu/vt-d: Use dev_err(..) in intel_svm_device_to_iommu(..)
iommu/vt-d: fix a loop in prq_event_thread()
iommu/vt-d: Fix IOTLB flushing for global pages
iommu/vt-d: Fix address shifting in page request handler
iommu/vt-d: shift wrapping bug in prq_event_thread()
iommu/vt-d: Fix NULL pointer dereference in page request error case
iommu/vt-d: Implement SVM_FLAG_SUPERVISOR_MODE for kernel access
iommu/vt-d: Implement SVM_FLAG_PRIVATE_PASID to allocate unique PASIDs
iommu/vt-d: Add callback to device driver on page faults
iommu/vt-d: Implement page request handling
iommu/vt-d: Generalise DMAR MSI setup to allow for page request events
iommu/vt-d: Implement deferred invalidate for SVM
iommu/vt-d: Add basic SVM PASID support
iommu/vt-d: Always enable PASID/PRI PCI capabilities before ATS
iommu/vt-d: Add initial support for PASID tables
...
When booted with intel_iommu=ecs_off we were still allocating the PASID
tables even though we couldn't actually use them. We really want to make
the pasid_enabled() macro depend on ecs_enabled().
Which is unfortunate, because currently they're the other way round to
cope with the Broadwell/Skylake problems with ECS.
Instead of having ecs_enabled() depend on pasid_enabled(), which was never
something that made me happy anyway, make it depend in the normal case
on the "broken PASID" bit 28 *not* being set.
Then pasid_enabled() can depend on ecs_enabled() as it should. And we also
don't need to mess with it if we ever see an implementation that has some
features requiring ECS (like PRI) but which *doesn't* have PASID support.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Pull intel-iommu bugfix from David Woodhouse:
"This contains a single fix, for when the IOMMU API is used to overlay
an existing mapping comprised of 4KiB pages, with a mapping that can
use superpages.
For the *first* superpage in the new mapping, we were correctly¹
freeing the old bottom-level page table page and clearing the link to
it, before installing the superpage. For subsequent superpages,
however, we weren't. This causes a memory leak, and a warning about
setting a PTE which is already set.
¹ Well, not *entirely* correctly. We just free the page table pages
right there and then, which is wrong. In fact they should only be
freed *after* the IOTLB is flushed so we know the hardware will no
longer be looking at them.... and in fact I note that the IOTLB
flush is completely missing from the intel_iommu_map() code path,
although it needs to be there if it's permitted to overwrite
existing mappings.
Fixing those is somewhat more intrusive though, and will probably
need to wait for 4.4 at this point"
* tag 'for-linus-20151021' of git://git.infradead.org/intel-iommu:
iommu/vt-d: fix range computation when making room for large pages
This will give a little bit of assistance to those developing drivers
using SVM. It might cause a slight annoyance to end-users whose kernel
disables the IOMMU when drivers are trying to use it. But the fix there
is to fix the kernel to enable the IOMMU.
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This provides basic PASID support for endpoint devices, tested with a
version of the i915 driver.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The behaviour if you enable PASID support after ATS is undefined. So we
have to enable it first, even if we don't know whether we'll need it.
This is safe enough; unless we set up a context that permits it, the device
can't actually *do* anything with it.
Also shift the feature detction to dmar_insert_one_dev_info() as it only
needs to happen once.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
As long as we use an identity mapping to work around the worst of the
hardware bugs which caused us to defeature it and change the definition
of the capability bit, we *can* use PASID support on the devices which
advertised it in bit 28 of the Extended Capability Register.
Allow people to do so with 'intel_iommu=pasid28' on the command line.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The VT-d specification says that "Software must enable ATS on endpoint
devices behind a Root Port only if the Root Port is reported as
supporting ATS transactions."
We walk up the tree to find a Root Port, but for integrated devices we
don't find one — we get to the host bridge. In that case we *should*
allow ATS. Currently we don't, which means that we are incorrectly
failing to use ATS for the integrated graphics. Fix that.
We should never break out of this loop "naturally" with bus==NULL,
since we'll always find bridge==NULL in that case (and now return 1).
So remove the check for (!bridge) after the loop, since it can never
happen. If it did, it would be worthy of a BUG_ON(!bridge). But since
it'll oops anyway in that case, that'll do just as well.
Cc: stable@vger.kernel.org
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
In preparation for deprecating ioremap_cache() convert its usage in
intel-iommu to memremap. This also eliminates the mishandling of the
__iomem annotation in the implementation.
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In preparation for the installation of a large page, any small page
tables that may still exist in the target IOV address range are
removed. However, if a scatter/gather list entry is large enough to
fit more than one large page, the address space for any subsequent
large pages is not cleared of conflicting small page tables.
This can cause legitimate mapping requests to fail with errors of the
form below, potentially followed by a series of IOMMU faults:
ERROR: DMA PTE for vPFN 0xfde00 already set (to 7f83a4003 not 7e9e00083)
In this example, a 4MiB scatter/gather list entry resulted in the
successful installation of a large page @ vPFN 0xfdc00, followed by
a failed attempt to install another large page @ vPFN 0xfde00, due to
the presence of a pointer to a small page table @ 0x7f83a4000.
To address this problem, compute the number of large pages that fit
into a given scatter/gather list entry, and use it to derive the
last vPFN covered by the large page(s).
Cc: stable@vger.kernel.org
Signed-off-by: Christian Zander <christian@nervanasys.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
A few fixes piled up:
* Fix for a suspend/resume issue where PCI probing code overwrote
dev->irq for the MSI irq of the AMD IOMMU.
* Fix for a kernel crash when a 32 bit PCI device was assigned to a KVM
guest.
* Fix for a possible memory leak in the VT-d driver
* A couple of fixes for the ARM-SMMU driver
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJWHNdbAAoJECvwRC2XARrjB/YQAKouJaRMjBaehx6kbaZMhMJy
hXDsh8Xl6TtCe6kLD2uXrvjLZAdu32kjrtzhhcM21EO5Ms2Weq6A60/98LwnJ4Eg
AqftjfxQsIwf2G1PvHb+xepgcFxIAhW6a3nORzx6d2AGrNWmMtUhbLTSncYjmojf
Td4dscuRmRPenJUV1JhcJQBR62QonknIHV99QmevaCSAoUdyuMH+t5kQVEgPjx7C
GlMPNEZZmGl7J3NXSWRtDSkUxFZ1OU8MTKc1LmPPHHAOZk37wbePihQbLLySlHPH
v4G1R05e2hG7C66yu959fyOleL87lDToUXhwQNFJMqEc+e7IzBzZsB3ANEHjpLQH
UJC9COU+sf8mPafja4ge/KbyGDmgDg/OMQJDhU6+DSXUflwymeWJmXr7sLFQex6O
nZO/SVzkbKj+PKxV7UnGD0sTeAAk0X6vfhFCL0l/acPpQg0T6Fpky5D5fUMv5dWS
xxxvxfwBcDoI44fxWBhfPYvmLFT9f5da+bpbzeeGjVSNezOkPJ65AJcVk5An4kQu
PRzJGoq3XpZHOeg5+O7IKzeuJ+3qc7Tz4wAzMxcaNFpVBl2qp1RUkTbmS9/YV1b5
ZOcIFBMLuUROE1ExsU19c5Uo0j1Bvh9jtdy6lNFCagQYzihtA0Jk19ucllx1jIjD
sdv2hgDIauRToKF1d9xz
=v5G4
-----END PGP SIGNATURE-----
Merge tag 'iommu-fixes-v4.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"A few fixes piled up:
- Fix for a suspend/resume issue where PCI probing code overwrote
dev->irq for the MSI irq of the AMD IOMMU.
- Fix for a kernel crash when a 32 bit PCI device was assigned to a
KVM guest.
- Fix for a possible memory leak in the VT-d driver
- A couple of fixes for the ARM-SMMU driver"
* tag 'iommu-fixes-v4.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Fix NULL pointer deref on device detach
iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices
iommu/vt-d: Fix memory leak in dmar_insert_one_dev_info()
iommu/arm-smmu: Use correct address mask for CMD_TLBI_S2_IPA
iommu/arm-smmu: Ensure IAS is set correctly for AArch32-capable SMMUs
iommu/io-pgtable-arm: Don't use dma_to_phys()
Currently the RMRR entries are created only at boot time.
This means they will vanish when the domain allocated at
boot time is destroyed.
This patch makes sure that also newly allocated domains will
get RMRR mappings.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Split the part of the function that fetches the domain out
and put the rest into into a domain_prepare_identity_map, so
that the code can also be used with when the domain is
already known.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Pull IOVA fixes from David Woodhouse:
"The main fix here is the first one, fixing the over-allocation of
size-aligned requests. The other patches simply make the existing
IOVA code available to users other than the Intel VT-d driver, with no
functional change.
I concede the latter really *should* have been submitted during the
merge window, but since it's basically risk-free and people are
waiting to build on top of it and it's my fault I didn't get it in, I
(and they) would be grateful if you'd take it"
* git://git.infradead.org/intel-iommu:
iommu: Make the iova library a module
iommu: iova: Export symbols
iommu: iova: Move iova cache management to the iova library
iommu/iova: Avoid over-allocating when size-aligned
We are returning NULL if we are not able to attach the iommu
to the domain but while returning we missed freeing info.
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This time the IOMMU updates are mostly cleanups or fixes. No big new
features or drivers this time. In particular the changes include:
* Bigger cleanup of the Domain<->IOMMU data structures and the
code that manages them in the Intel VT-d driver. This makes
the code easier to understand and maintain, and also easier to
keep the data structures in sync. It is also a preparation
step to make use of default domains from the IOMMU core in the
Intel VT-d driver.
* Fixes for a couple of DMA-API misuses in ARM IOMMU drivers,
namely in the ARM and Tegra SMMU drivers.
* Fix for a potential buffer overflow in the OMAP iommu driver's
debug code
* A couple of smaller fixes and cleanups in various drivers
* One small new feature: Report domain-id usage in the Intel
VT-d driver to easier detect bugs where these are leaked.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJV7sCEAAoJECvwRC2XARrjz3YP/Au4IIfqykfPvmI0cmPhVnAV
Q72tltwkbK2u2iP+pHheveaMngJtAshsZrnhBon4KJRIt/KTLZQvsFplHDaRhPfY
yw3LIxhC5kLG/S6irY9Ozb0+uTMdQ3BU2uS23pyoFVfCz+RngBrAwDBcTKqZDCDG
8dNd+T21XlzxuyeGr58h9upz2VFtq6feoGFhLU5PNxTlf4JWZe77D7NlbSvx6Nwy
7Ai8dVRgpV9ciUP7w8FXrCUvbMZQDIoTMiWGNSlogVMgA0dllGES91UZYhWf3pil
abuX6DeFul/cOhEOnH2xa+j5zz2O/upe9stU4wAFw6IhPiAELTHc2NKlWAhwb0SY
bpDRf7dgLnUfqpmZLpWjTwN4jllc0qS2MIHj+eUu0uhdFi4Z0BuH2wSCdbR7xkqk
u5u0Jq7hDNKs5FmQTSsWSiAdjakMsRjIN7jMrBbOeZnBSmUnLx74KGPLTb63ncR3
WIOi4Iyu+LSXBIvZDiLu3lIIh7Atzd+y7IDnb8KXdyqfy+h53OZZOJNbP/qTWHgT
ZUdm/qrqjIQpTQfleOEadC7vY/y3fR5sBtOQHUamfntni3oYCc4AMRlNdf3eV9lb
Tyss6F699mU7d/vennTaIToBgVwaXdLYtmvGWjnoT/kqOMclyDf3cIUtZGtp2rJR
ddmzDA3vBUC5pGj8Hd8R
=yoGE
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates for from Joerg Roedel:
"This time the IOMMU updates are mostly cleanups or fixes. No big new
features or drivers this time. In particular the changes include:
- Bigger cleanup of the Domain<->IOMMU data structures and the code
that manages them in the Intel VT-d driver. This makes the code
easier to understand and maintain, and also easier to keep the data
structures in sync. It is also a preparation step to make use of
default domains from the IOMMU core in the Intel VT-d driver.
- Fixes for a couple of DMA-API misuses in ARM IOMMU drivers, namely
in the ARM and Tegra SMMU drivers.
- Fix for a potential buffer overflow in the OMAP iommu driver's
debug code
- A couple of smaller fixes and cleanups in various drivers
- One small new feature: Report domain-id usage in the Intel VT-d
driver to easier detect bugs where these are leaked"
* tag 'iommu-updates-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (83 commits)
iommu/vt-d: Really use upper context table when necessary
x86/vt-d: Fix documentation of DRHD
iommu/fsl: Really fix init section(s) content
iommu/io-pgtable-arm: Unmap and free table when overwriting with block
iommu/io-pgtable-arm: Move init-fn declarations to io-pgtable.h
iommu/msm: Use BUG_ON instead of if () BUG()
iommu/vt-d: Access iomem correctly
iommu/vt-d: Make two functions static
iommu/vt-d: Use BUG_ON instead of if () BUG()
iommu/vt-d: Return false instead of 0 in irq_remapping_cap()
iommu/amd: Use BUG_ON instead of if () BUG()
iommu/amd: Make a symbol static
iommu/amd: Simplify allocation in irq_remapping_alloc()
iommu/tegra-smmu: Parameterize number of TLB lines
iommu/tegra-smmu: Factor out tegra_smmu_set_pde()
iommu/tegra-smmu: Extract tegra_smmu_pte_get_use()
iommu/tegra-smmu: Use __GFP_ZERO to allocate zeroed pages
iommu/tegra-smmu: Remove PageReserved manipulation
iommu/tegra-smmu: Convert to use DMA API
iommu/tegra-smmu: smmu_flush_ptc() wants device addresses
...
Pull SG updates from Jens Axboe:
"This contains a set of scatter-gather related changes/fixes for 4.3:
- Add support for limited chaining of sg tables even for
architectures that do not set ARCH_HAS_SG_CHAIN. From Christoph.
- Add sg chain support to target_rd. From Christoph.
- Fixup open coded sg->page_link in crypto/omap-sham. From
Christoph.
- Fixup open coded crypto ->page_link manipulation. From Dan.
- Also from Dan, automated fixup of manual sg_unmark_end()
manipulations.
- Also from Dan, automated fixup of open coded sg_phys()
implementations.
- From Robert Jarzmik, addition of an sg table splitting helper that
drivers can use"
* 'for-4.3/sg' of git://git.kernel.dk/linux-block:
lib: scatterlist: add sg splitting function
scatterlist: use sg_phys()
crypto/omap-sham: remove an open coded access to ->page_link
scatterlist: remove open coded sg_unmark_end instances
crypto: replace scatterwalk_sg_chain with sg_chain
target/rd: always chain S/G list
scatterlist: allow limited chaining without ARCH_HAS_SG_CHAIN
There is a bug in iommu_context_addr() which will always use
the lower context table, even when the upper context table
needs to be used. Fix this issue.
Fixes: 03ecc32c52 ("iommu/vt-d: support extended root and context entries")
Reported-by: Xiao, Nan <nan.xiao@hp.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When a 'struct device_domain_info' is created as an alias
for another device, this struct will not be re-used when the
real device is encountered. Fix that to avoid duplicate
device_domain_info structures being added.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
For devices without an PCI alias there will be two
device_domain_info structures added. Prevent that by
checking if the alias is different from the device.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This struct contains all necessary information for the
function already. Also handle the info->dev == NULL case
while at it.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The code in the locked section does not touch anything
protected by the dmar_global_lock. Remove it from there.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When this lock is held the device_domain_lock is also
required to make sure the device_domain_info does not vanish
while in use. So this lock can be removed as it gives no
additional protection.
Signed-off-by: Joerg Roedel <jroedel@suse.de>