Add a vma_is_dax() helper macro to test whether the VMA is DAX, and use it
in zap_huge_pmd() and __split_huge_page_pmd().
[akpm@linux-foundation.org: fix build]
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Undo the change which "userfaultfd: call handle_userfault() for
userfaultfd_missing() faults" made to set_huge_zero_page(). DAX will
need that return value.
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In order to handle the !CONFIG_TRANSPARENT_HUGEPAGES case, we need to
return VM_FAULT_FALLBACK from the inlined dax_pmd_fault(), which is
defined in linux/mm.h. Given that we don't want to include <linux/mm.h>
in <linux/fs.h>, the easiest solution is to move the DAX-related
functions to a new header, <linux/dax.h>. We could also have moved
VM_FAULT_* definitions to a new header, or a different header that isn't
quite such a boil-the-ocean header as <linux/mm.h>, but this felt like
the best option.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This series of patches adds support for using PMD page table entries to
map DAX files. We expect NV-DIMMs to start showing up that are many
gigabytes in size and the memory consumption of 4kB PTEs will be
astronomical.
The patch series leverages much of the Transparant Huge Pages
infrastructure, going so far as to borrow one of Kirill's patches from
his THP page cache series.
This patch (of 10):
Since we're going to have huge pages in page cache, we need to call adjust
file-backed VMA, which potentially can contain huge pages.
For now we call it for all VMAs.
Probably later we will need to introduce a flag to indicate that the VMA
has huge pages.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Test-case:
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <assert.h>
void *find_vdso_vaddr(void)
{
FILE *perl;
char buf[32] = {};
perl = popen("perl -e 'open STDIN,qq|/proc/@{[getppid]}/maps|;"
"/^(.*?)-.*vdso/ && print hex $1 while <>'", "r");
fread(buf, sizeof(buf), 1, perl);
fclose(perl);
return (void *)atol(buf);
}
#define PAGE_SIZE 4096
int main(void)
{
void *vdso = find_vdso_vaddr();
assert(vdso);
// of course they should differ, and they do so far
printf("vdso pages differ: %d\n",
!!memcmp(vdso, vdso + PAGE_SIZE, PAGE_SIZE));
// split into 2 vma's
assert(mprotect(vdso, PAGE_SIZE, PROT_READ) == 0);
// force another fault on the next check
assert(madvise(vdso, 2 * PAGE_SIZE, MADV_DONTNEED) == 0);
// now they no longer differ, the 2nd vm_pgoff is wrong
printf("vdso pages differ: %d\n",
!!memcmp(vdso, vdso + PAGE_SIZE, PAGE_SIZE));
return 0;
}
Output:
vdso pages differ: 1
vdso pages differ: 0
This is because split_vma() correctly updates ->vm_pgoff, but the logic
in insert_vm_struct() and special_mapping_fault() is absolutely broken,
so the fault at vdso + PAGE_SIZE return the 1st page. The same happens
if you simply unmap the 1st page.
special_mapping_fault() does:
pgoff = vmf->pgoff - vma->vm_pgoff;
and this is _only_ correct if vma->vm_start mmaps the first page from
->vm_private_data array.
vdso or any other user of install_special_mapping() is not anonymous,
it has the "backing storage" even if it is just the array of pages.
So we actually need to make vm_pgoff work as an offset in this array.
Note: this also allows to fix another problem: currently gdb can't access
"[vvar]" memory because in this case special_mapping_fault() doesn't work.
Now that we can use ->vm_pgoff we can implement ->access() and fix this.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
special_mapping_fault() is absolutely broken. It seems it was always
wrong, but this didn't matter until vdso/vvar started to use more than
one page.
And after this change vma_is_anonymous() becomes really trivial, it
simply checks vm_ops == NULL. However, I do think the helper makes
sense. There are a lot of ->vm_ops != NULL checks, the helper makes the
caller's code more understandable (self-documented) and this is more
grep-friendly.
This patch (of 3):
Preparation. Add the new simple helper, vma_is_anonymous(vma), and change
handle_pte_fault() to use it. It will have more users.
The name is not accurate, say a hpet_mmap()'ed vma is not anonymous.
Perhaps it should be named vma_has_fault() instead. But it matches the
logic in mmap.c/memory.c (see next changes). "True" just means that a
page fault will use do_anonymous_page().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On 32-bit:
userfaultfd.c: In function 'locking_thread':
userfaultfd.c:152: warning: left shift count >= width of type
userfaultfd.c: In function 'uffd_poll_thread':
userfaultfd.c:295: warning: cast to pointer from integer of different size
userfaultfd.c: In function 'uffd_read_thread':
userfaultfd.c:332: warning: cast to pointer from integer of different size
Fix the shift warning by splitting the shift in two parts, and the
integer/pointer warnigns by adding intermediate casts to "unsigned long".
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When seq_show_option (commit a068acf2ee: "fs: create and use
seq_show_option for escaping") was merged, it did not correctly collide
with cgroup's addition of legacy_name (commit 3e1d2eed39: "cgroup:
introduce cgroup_subsys->legacy_name") changes.
This fixes the reported name.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1/ Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
mechanism for adding device-driver-discovered memory regions to the
kernel's direct map. This facility is used by the pmem driver to
enable pfn_to_page() operations on the page frames returned by DAX
('direct_access' in 'struct block_device_operations'). For now, the
'memmap' allocation for these "device" pages comes from "System
RAM". Support for allocating the memmap from device memory will
arrive in a later kernel.
2/ Introduce memremap() to replace usages of ioremap_cache() and
ioremap_wt(). memremap() drops the __iomem annotation for these
mappings to memory that do not have i/o side effects. The
replacement of ioremap_cache() with memremap() is limited to the
pmem driver to ease merging the api change in v4.3. Completion of
the conversion is targeted for v4.4.
3/ Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
driver, update the VFS DAX implementation and PMEM api to provide
persistence guarantees for kernel operations on a DAX mapping.
4/ Convert the ACPI NFIT 'BLK' driver to map the block apertures as
cacheable to improve performance.
5/ Miscellaneous updates and fixes to libnvdimm including support
for issuing "address range scrub" commands, clarifying the optimal
'sector size' of pmem devices, a clarification of the usage of the
ACPI '_STA' (status) property for DIMM devices, and other minor
fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJV6Nx7AAoJEB7SkWpmfYgCWyYQAI5ju6Gvw27RNFtPovHcZUf5
JGnxXejI6/AqeTQ+IulgprxtEUCrXOHjCDA5dkjr1qvsoqK1qxug+vJHOZLgeW0R
OwDtmdW4Qrgeqm+CPoxETkorJ8wDOc8mol81kTiMgeV3UqbYeeHIiTAmwe7VzZ0C
nNdCRDm5g8dHCjTKcvK3rvozgyoNoWeBiHkPe76EbnxDICxCB5dak7XsVKNMIVFQ
NuYlnw6IYN7+rMHgpgpRux38NtIW8VlYPWTmHExejc2mlioWMNBG/bmtwLyJ6M3e
zliz4/cnonTMUaizZaVozyinTa65m7wcnpjK+vlyGV2deDZPJpDRvSOtB0lH30bR
1gy+qrKzuGKpaN6thOISxFLLjmEeYwzYd7SvC9n118r32qShz+opN9XX0WmWSFlA
sajE1ehm4M7s5pkMoa/dRnAyR8RUPu4RNINdQ/Z9jFfAOx+Q26rLdQXwf9+uqbEb
bIeSQwOteK5vYYCstvpAcHSMlJAglzIX5UfZBvtEIJN7rlb0VhmGWfxAnTu+ktG1
o9cqAt+J4146xHaFwj5duTsyKhWb8BL9+xqbKPNpXEp+PbLsrnE/+WkDLFD67jxz
dgIoK60mGnVXp+16I2uMqYYDgAyO5zUdmM4OygOMnZNa1mxesjbDJC6Wat1Wsndn
slsw6DkrWT60CRE42nbK
=o57/
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"This update has successfully completed a 0day-kbuild run and has
appeared in a linux-next release. The changes outside of the typical
drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the
removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and
the introduction of ZONE_DEVICE + devm_memremap_pages().
Summary:
- Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
mechanism for adding device-driver-discovered memory regions to the
kernel's direct map.
This facility is used by the pmem driver to enable pfn_to_page()
operations on the page frames returned by DAX ('direct_access' in
'struct block_device_operations').
For now, the 'memmap' allocation for these "device" pages comes
from "System RAM". Support for allocating the memmap from device
memory will arrive in a later kernel.
- Introduce memremap() to replace usages of ioremap_cache() and
ioremap_wt(). memremap() drops the __iomem annotation for these
mappings to memory that do not have i/o side effects. The
replacement of ioremap_cache() with memremap() is limited to the
pmem driver to ease merging the api change in v4.3.
Completion of the conversion is targeted for v4.4.
- Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
driver, update the VFS DAX implementation and PMEM api to provide
persistence guarantees for kernel operations on a DAX mapping.
- Convert the ACPI NFIT 'BLK' driver to map the block apertures as
cacheable to improve performance.
- Miscellaneous updates and fixes to libnvdimm including support for
issuing "address range scrub" commands, clarifying the optimal
'sector size' of pmem devices, a clarification of the usage of the
ACPI '_STA' (status) property for DIMM devices, and other minor
fixes"
* tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits)
libnvdimm, pmem: direct map legacy pmem by default
libnvdimm, pmem: 'struct page' for pmem
libnvdimm, pfn: 'struct page' provider infrastructure
x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
add devm_memremap_pages
mm: ZONE_DEVICE for "device memory"
mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
dax: drop size parameter to ->direct_access()
nd_blk: change aperture mapping from WC to WB
nvdimm: change to use generic kvfree()
pmem, dax: have direct_access use __pmem annotation
dax: update I/O path to do proper PMEM flushing
pmem: add copy_from_iter_pmem() and clear_pmem()
pmem, x86: clean up conditional pmem includes
pmem: remove layer when calling arch_has_wmb_pmem()
pmem, x86: move x86 PMEM API to new pmem.h header
libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
pmem: switch to devm_ allocations
devres: add devm_memremap
libnvdimm, btt: write and validate parent_uuid
...
Pull misc kbuild updates from Michal Marek:
- deb-pkg:
+ module signing fix
+ dtb files are added to the package
+ do not require `hostname -f` to work during build
+ make deb-pkg generates a source package, bindeb-pkg has been
added to only generate the binary package
- rpm-pkg packages /lib/modules as well
- new coccinelle patch and updates to existing ones
- new stackusage & stackdelta script to collect and compare stack usage
info (using gcc's -fstack-usage)
- make tags understands trace_*_rcuidle() macros
- .gitignore updates, misc cleanups
* 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (27 commits)
deb-pkg: add source package
package/Makefile: move source tar creation to a function
scripts: add stackdelta script
kbuild: remove *.su files generated by -fstack-usage
.gitignore: add *.su pattern
scripts: add stackusage script
kbuild: avoid listing /lib/modules in kernel spec file
fallback to hostname in scripts/package/builddeb
coccinelle: api: extend spatch for dropping unnecessary owner
deb-pkg: simplify directory creation
scripts/tags.sh: Include trace_*_rcuidle() in tags
scripts/package/Makefile: rpmbuild is needed for rpm targets
Kbuild: Add ID files to .gitignore
gitignore: Add MIPS vmlinux.32 to the list
coccinelle: simple_return: Add a blank line
coccinelle: irqf_oneshot.cocci: Improve the generated commit log
coccinelle: api: add vma_pages.cocci
scripts/coccinelle/misc/irqf_oneshot.cocci: Fix grammar
scripts/coccinelle/misc/semicolon.cocci: Use imperative mood
coccinelle: simple_open: Use imperative mood
...
Pull kconfig updates from Michal Marek:
- kconfig warns about junk characters in Kconfig files
- merge_config.sh error handling
- small cleanup
* 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
merge_config.sh: exit on missing input files
kconfig: Regenerate shipped zconf.{hash,lex}.c files
kconfig: warn of unhandled characters in Kconfig commands
kconfig: Delete unnecessary checks before the function call "sym_calc_value"
The changes with more meat are:
o Allowing the trace event filters to filter on CPU number and process ids
o Two new markers for trace output latency were added
(10 and 100 msec latencies)
o Have tracing_thresh filter function profiling time
I also worked on modifying the ring buffer code for some future
work, and moved the adding of the timestamp around. One of my changes
caused a regression, and since other changes were built on top of it
and already tested, I had to operate a revert of that change. Instead
of rebasing, this change set has the code that caused a regression
as well as the code to revert that change without touching the other
changes that were made on top of it.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJV6aZEAAoJEEjnJuOKh9ldrR4H/A1RcQf1prLLoUibPP4w3lat
dmQcdpS1NY+cqyiKuKPAOkFDGQL7qWzRqZ8whcPSJIsHq57ufqNSLf+0bbQYPzg9
g3CgGL7OApmGi5ulj0sNxhadvc9TFm/SAN0nVJlNuUWdm8e1UWHLsrJZaMfopu2r
RDEtkOhg619mhDL4rktNdS6rk0B92Fhu2o2PwLZPVlUl1NNEt4WJU+ejitXUVO1A
Nb70/rTGGJKtyHbW+74on4LnEN5Uu0Viu6rMwGfYyIgRmC2otdBDvE4xfKMiTUKr
SzBjzrhIoMIRn4Vl0vElfulkpYaw7pcC2BdpZ4d9VpIOiLSlZs0x/TgCtpFEv5M=
=baZ3
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing update from Steven Rostedt:
"Mostly this is just clean ups and micro optimizations.
The changes with more meat are:
- Allowing the trace event filters to filter on CPU number and
process ids
- Two new markers for trace output latency were added (10 and 100
msec latencies)
- Have tracing_thresh filter function profiling time
I also worked on modifying the ring buffer code for some future work,
and moved the adding of the timestamp around. One of my changes
caused a regression, and since other changes were built on top of it
and already tested, I had to operate a revert of that change. Instead
of rebasing, this change set has the code that caused a regression as
well as the code to revert that change without touching the other
changes that were made on top of it"
* tag 'trace-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Revert "ring-buffer: Get timestamp after event is allocated"
tracing: Don't make assumptions about length of string on task rename
tracing: Allow triggers to filter for CPU ids and process names
ftrace: Format MCOUNT_ADDR address as type unsigned long
tracing: Introduce two additional marks for delay
ftrace: Fix function_graph duration spacing with 7-digits
ftrace: add tracing_thresh to function profile
tracing: Clean up stack tracing and fix fentry updates
ring-buffer: Reorganize function locations
ring-buffer: Make sure event has enough room for extend and padding
ring-buffer: Get timestamp after event is allocated
ring-buffer: Move the adding of the extended timestamp out of line
ring-buffer: Add event descriptor to simplify passing data
ftrace: correct the counter increment for trace_buffer data
tracing: Fix for non-continuous cpu ids
tracing: Prefer kcalloc over kzalloc with multiply
The function device_get_mac_address is trying different property names
in order to get the mac address. To check the return value, the variable
addr (which contain the buffer pass by the caller) will be re-used. This
means that if the previous property is not found, the next property will
be read using a NULL buffer.
Therefore it's only possible to retrieve the mac if node contains a
property "mac-address". Fix it by using a temporary buffer for the
return value.
This has been introduced by commit 4c96b7dc0d
"Add a matching set of device_ functions for determining mac/phy"
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull audit update from Paul Moore:
"This is one of the larger audit patchsets in recent history,
consisting of eight patches and almost 400 lines of changes.
The bulk of the patchset is the new "audit by executable"
functionality which allows admins to set an audit watch based on the
executable on disk. Prior to this, admins could only track an
application by PID, which has some obvious limitations.
Beyond the new functionality we also have some refcnt fixes and a few
minor cleanups"
* 'upstream' of git://git.infradead.org/users/pcmoore/audit:
fixup: audit: implement audit by executable
audit: implement audit by executable
audit: clean simple fsnotify implementation
audit: use macros for unset inode and device values
audit: make audit_del_rule() more robust
audit: fix uninitialized variable in audit_add_rule()
audit: eliminate unnecessary extra layer of watch parent references
audit: eliminate unnecessary extra layer of watch references
The race may happen when a device (e.g. YOTA 4G LTE Modem) is
unplugged while the system is downloading a large file from the Net.
Hardware breakpoints and Kprobes with delays were used to confirm that
the race does actually happen.
The race is on skb_queue ('next' pointer) between usbnet_stop()
and rx_complete(), which, in turn, calls usbnet_bh().
Here is a part of the call stack with the code where the changes to the
queue happen. The line numbers are for the kernel 4.1.0:
*0 __skb_unlink (skbuff.h:1517)
prev->next = next;
*1 defer_bh (usbnet.c:430)
spin_lock_irqsave(&list->lock, flags);
old_state = entry->state;
entry->state = state;
__skb_unlink(skb, list);
spin_unlock(&list->lock);
spin_lock(&dev->done.lock);
__skb_queue_tail(&dev->done, skb);
if (dev->done.qlen == 1)
tasklet_schedule(&dev->bh);
spin_unlock_irqrestore(&dev->done.lock, flags);
*2 rx_complete (usbnet.c:640)
state = defer_bh(dev, skb, &dev->rxq, state);
At the same time, the following code repeatedly checks if the queue is
empty and reads these values concurrently with the above changes:
*0 usbnet_terminate_urbs (usbnet.c:765)
/* maybe wait for deletions to finish. */
while (!skb_queue_empty(&dev->rxq)
&& !skb_queue_empty(&dev->txq)
&& !skb_queue_empty(&dev->done)) {
schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
set_current_state(TASK_UNINTERRUPTIBLE);
netif_dbg(dev, ifdown, dev->net,
"waited for %d urb completions\n", temp);
}
*1 usbnet_stop (usbnet.c:806)
if (!(info->flags & FLAG_AVOID_UNLINK_URBS))
usbnet_terminate_urbs(dev);
As a result, it is possible, for example, that the skb is removed from
dev->rxq by __skb_unlink() before the check
"!skb_queue_empty(&dev->rxq)" in usbnet_terminate_urbs() is made. It is
also possible in this case that the skb is added to dev->done queue
after "!skb_queue_empty(&dev->done)" is checked. So
usbnet_terminate_urbs() may stop waiting and return while dev->done
queue still has an item.
Locking in defer_bh() and usbnet_terminate_urbs() was revisited to avoid
this race.
Signed-off-by: Eugene Shatokhin <eugene.shatokhin@rosalab.ru>
Reviewed-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull security subsystem updates from James Morris:
"Highlights:
- PKCS#7 support added to support signed kexec, also utilized for
module signing. See comments in 3f1e1bea.
** NOTE: this requires linking against the OpenSSL library, which
must be installed, e.g. the openssl-devel on Fedora **
- Smack
- add IPv6 host labeling; ignore labels on kernel threads
- support smack labeling mounts which use binary mount data
- SELinux:
- add ioctl whitelisting (see
http://kernsec.org/files/lss2015/vanderstoep.pdf)
- fix mprotect PROT_EXEC regression caused by mm change
- Seccomp:
- add ptrace options for suspend/resume"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (57 commits)
PKCS#7: Add OIDs for sha224, sha284 and sha512 hash algos and use them
Documentation/Changes: Now need OpenSSL devel packages for module signing
scripts: add extract-cert and sign-file to .gitignore
modsign: Handle signing key in source tree
modsign: Use if_changed rule for extracting cert from module signing key
Move certificate handling to its own directory
sign-file: Fix warning about BIO_reset() return value
PKCS#7: Add MODULE_LICENSE() to test module
Smack - Fix build error with bringup unconfigured
sign-file: Document dependency on OpenSSL devel libraries
PKCS#7: Appropriately restrict authenticated attributes and content type
KEYS: Add a name for PKEY_ID_PKCS7
PKCS#7: Improve and export the X.509 ASN.1 time object decoder
modsign: Use extract-cert to process CONFIG_SYSTEM_TRUSTED_KEYS
extract-cert: Cope with multiple X.509 certificates in a single file
sign-file: Generate CMS message as signature instead of PKCS#7
PKCS#7: Support CMS messages also [RFC5652]
X.509: Change recorded SKID & AKID to not include Subject or Issuer
PKCS#7: Check content type and versions
MAINTAINERS: The keyrings mailing list has moved
...
Pull NMI backtrace update from Russell King:
"These changes convert the x86 NMI handling to be a library
implementation which other architectures can make use of. Thomas
Gleixner has reviewed and tested these changes, and wishes me to send
these rather than taking them through the tip tree.
The final patch in the set adds an initial implementation using this
infrastructure to ARM, even though it doesn't send the IPI at "NMI"
level. Patches are in progress to add the ARM equivalent of NMI, but
we still need the IRQ-level fallback for systems where the "NMI" isn't
available due to secure firmware denying access to it"
* 'nmi' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: add basic support for on-demand backtrace of other CPUs
nmi: x86: convert to generic nmi handler
nmi: create generic NMI backtrace implementation
- Convert xen-blkfront to the multiqueue API
- [arm] Support binding event channels to different VCPUs.
- [x86] Support > 512 GiB in a PV guests (off by default as such a
guest cannot be migrated with the current toolstack).
- [x86] PMU support for PV dom0 (limited support for using perf with
Xen and other guests).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJV7wIdAAoJEFxbo/MsZsTR0hEH/04HTKLKGnSJpZ5WbMPxqZxE
UqGlvhvVWNAmFocZmbPcEi9T1qtcFrX5pM55JQr6UmAp3ovYsT2q1Q1kKaOaawks
pSfc/YEH3oQW5VUQ9Lm9Ru5Z8Btox0WrzRREO92OF36UOgUOBOLkGsUfOwDinNIM
lSk2djbYwDYAsoeC3PHB32wwMI//Lz6B/9ZVXcyL6ULynt1ULdspETjGnptRPZa7
JTB5L4/soioKOn18HDwwOhKmvaFUPQv9Odnv7dc85XwZreajhM/KMu3qFbMDaF/d
WVB1NMeCBdQYgjOrUjrmpyr5uTMySiQEG54cplrEKinfeZgKlEyjKvjcAfJfiac=
=Ktjl
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.3-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from David Vrabel:
"Xen features and fixes for 4.3:
- Convert xen-blkfront to the multiqueue API
- [arm] Support binding event channels to different VCPUs.
- [x86] Support > 512 GiB in a PV guests (off by default as such a
guest cannot be migrated with the current toolstack).
- [x86] PMU support for PV dom0 (limited support for using perf with
Xen and other guests)"
* tag 'for-linus-4.3-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (33 commits)
xen: switch extra memory accounting to use pfns
xen: limit memory to architectural maximum
xen: avoid another early crash of memory limited dom0
xen: avoid early crash of memory limited dom0
arm/xen: Remove helpers which are PV specific
xen/x86: Don't try to set PCE bit in CR4
xen/PMU: PMU emulation code
xen/PMU: Intercept PMU-related MSR and APIC accesses
xen/PMU: Describe vendor-specific PMU registers
xen/PMU: Initialization code for Xen PMU
xen/PMU: Sysfs interface for setting Xen PMU mode
xen: xensyms support
xen: remove no longer needed p2m.h
xen: allow more than 512 GB of RAM for 64 bit pv-domains
xen: move p2m list if conflicting with e820 map
xen: add explicit memblock_reserve() calls for special pages
mm: provide early_memremap_ro to establish read-only mapping
xen: check for initrd conflicting with e820 map
xen: check pre-allocated page tables for conflict with memory map
xen: check for kernel memory conflicting with memory layout
...
Pull more irq updates from Thomas Gleixner:
"The second part of irq related updates:
- Provide EOImode for GIC[V3] irq chips, which is a prerequisite for
direct interrupt handling in [KVM] guests"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/GIC: Fix EOImode setting for non-DT/ACPI systems
irqchip/GIC: Don't deactivate interrupts forwarded to a guest
irqchip/GIC: Convert to EOImode == 1
irqchip/GICv3: Don't deactivate interrupts forwarded to a guest
irqchip/GICv3: Convert to EOImode == 1
Pull m68k/colfire fixes from Greg Ungerer:
"Only a couple of patches this time. One migrating the clock driver
code to the new set-state interface. The other cleaning up to use the
PFN_DOWN macro"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k/coldfire: use PFN_DOWN macro
m68k/coldfire/pit: Migrate to new 'set-state' interface
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCgAGBQJV7v55AAoJENaSAD2qAscKAwoQAKkBJiRk4s8xmhHh++qgMxKr
JPOJqiwsV86KWMrxzT7xYO9IUr/D8zFo0m7mXydMeJvRZl1MJe9atnoZL9xRuvch
+ALh6i2c1o/tjuWze4ewMatxMiWffJ7W72p8dcihDiacDp0gxArd6zqaURjQRyns
nwk/vt41PerXv2dloFFsnC3XibojwS5/dltM8B98dzdP8lj9XYSHAhwhsa4/MUvF
pkv/PGuoDuuK0mbNaCEvjy2c40qjETlW0Hn/YkvJhBYYDR3awuXqCvNO+dZbhoym
NeHb7Jfvzkagfd8v5Us9aQ0lK1ABTjmRdW5zGIymCke5HbpzzkMMKfAJKBvx186I
M3eNnztc78lOjWOfqrfwOHM3/X1ELKiHgrZcG3KR+XX9J3fCmp7fL5EYL+JDf86H
2FPEU1RZue3oJUo3PYhCbEvWekUyHbG/18JHxRXAwVsUpcjeHprfvmn7hBN0sFPs
hom3Mcc1IMUuos00Mj3H6CxpD2xe2+77L6VngQ681zNIhkG2nnCLx5zNPUx7d+68
RjV/KKs17/YNGZKxuNVD//RYJqx58NyM2BwBQ7caVm3dPWHa5tLoAxJstFjhKwhZ
lon7mS4odEidSU4s1apEtoTeCjQMSAMgtQbTHUj+p+dNXx+CVdXXFaMqNvBHhCEI
5h01NR6QzfE8QGzLUcXf
=ejQ5
-----END PGP SIGNATURE-----
Merge tag 'ecryptfs-4.3-rc1-stale-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull ecryptfs fixes from Tyler Hicks:
"Invalidate stale eCryptfs dcache entries caused by unlinked lower
inodes"
* tag 'ecryptfs-4.3-rc1-stale-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: Delete a check before the function call "key_put"
eCryptfs: Invalidate dcache entries when lower i_nlink is zero
Pull crypto fix from Herbert Xu:
"This fixes a memory corruption bug in ghash-clmulni-intel due to
insufficient memory allocation"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ghash-clmulni: specify context size for ghash async algorithm
It changed as result of other syscalls, and while the system call list
itself was correctly updated, the selftest program was not.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The variable xen_store_mfn is effectively storing a GFN and not an MFN.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
The privcmd code is mixing the usage of GFN and MFN within the same
functions which make the code difficult to understand when you only work
with auto-translated guests.
The privcmd driver is only dealing with GFN so replace all the mention
of MFN into GFN.
The ioctl structure used to map foreign change has been left unchanged
given that the userspace is using it. Nonetheless, add a comment to
explain the expected value within the "mfn" field.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
HVM_PARAM_CONSOLE_PFN is used to retrieved the console PFN for HVM
guest. It returns a PFN (aka GFN) and not a MFN.
Furthermore, use directly virt_to_gfn for both PV and HVM domain rather
than doing a special case for each of the them.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
The PV driver xen-fbfront is only dealing with GFN and not MFN. Rename
all the occurence of MFN to GFN.
Also take the opportunity to replace to usage of pfn_to_gfn by
xen_page_to_gfn.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
All the caller of xen_tmem_{get,put}_page have a struct page * in hand
and call pfn_to_gfn for the only benefits of these 2 functions.
Rather than passing the pfn in parameter, pass directly the page and use
directly xen_page_to_gfn.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Based on include/xen/mm.h [1], Linux is mistakenly using MFN when GFN
is meant, I suspect this is because the first support for Xen was for
PV. This resulted in some misimplementation of helpers on ARM and
confused developers about the expected behavior.
For instance, with pfn_to_mfn, we expect to get an MFN based on the name.
Although, if we look at the implementation on x86, it's returning a GFN.
For clarity and avoid new confusion, replace any reference to mfn with
gfn in any helpers used by PV drivers. The x86 code will still keep some
reference of pfn_to_mfn which may be used by all kind of guests
No changes as been made in the hypercall field, even
though they may be invalid, in order to keep the same as the defintion
in xen repo.
Note that page_to_mfn has been renamed to xen_page_to_gfn to avoid a
name to close to the KVM function gfn_to_page.
Take also the opportunity to simplify simple construction such
as pfn_to_mfn(page_to_pfn(page)) into xen_page_to_gfn. More complex clean up
will come in follow-up patches.
[1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=e758ed14f390342513405dd766e874934573e6cb
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
After the commit introducing convertion between DMA and guest addresses,
all the callers of pfn_to_mfn are expecting to get a GFN (Guest Frame
Number). On ARM, all the guests are auto-translated so the GFN is equal
to the Linux PFN (Pseudo-physical Frame Number).
The current implementation may return an MFN if the caller is passing a
PFN associated to a mapped foreign grant. In pratice, I haven't seen
the problem on running guest but we should fix it for the sake of
correctness.
Correct the implementation by always returning the pfn passed in parameter.
A follow-up patch will take care to rename pfn_to_mfn to a suitable
name.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
The swiotlb is required when programming a DMA address on ARM when a
device is not protected by an IOMMU.
In this case, the DMA address should always be equal to the machine address.
For DOM0 memory, Xen ensure it by have an identity mapping between the
guest address and host address. However, when mapping a foreign grant
reference, the 1:1 model doesn't work.
For ARM guest, most of the callers of pfn_to_mfn expects to get a GFN
(Guest Frame Number), i.e a PFN (Page Frame Number) from the Linux point
of view given that all ARM guest are auto-translated.
Even though the name pfn_to_mfn is misleading, we need to ensure that
those caller get a GFN and not by mistake a MFN. In pratical, I haven't
seen error related to this but we should fix it for the sake of
correctness.
In order to fix the implementation of pfn_to_mfn on ARM in a follow-up
patch, we have to introduce new helpers to return the DMA from a PFN and
the invert.
On x86, the new helpers will be an alias of pfn_to_mfn and mfn_to_pfn.
The helpers will be used in swiotlb and xen_biovec_phys_mergeable.
This is necessary in the latter because we have to ensure that the
biovec code will not try to merge a biovec using foreign page and
another using Linux memory.
Lastly, the helper mfn_to_local_pfn has been renamed to bfn_to_local_pfn
given that the only usage was in swiotlb.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
No need to use CONFIG_SMP around update_cr16_clocksource(). It checks for
num_online_cpus() beeing greater than 1, which is always 1 in UP builds.
Signed-off-by: Helge Deller <deller@gmx.de>
Instead of using physical addresses for accounting of extra memory
areas available for ballooning switch to pfns as this is much less
error prone regarding partial pages.
Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
When a pv-domain (including dom0) is started it tries to size it's
p2m list according to the maximum possible memory amount it ever can
achieve. Limit the initial maximum memory size to the architectural
limit of the hardware in order to avoid overflows during remapping
of memory.
This problem will occur when dom0 is started with an initial memory
size being a multiple of 1GB, but without specifying it's maximum
memory size. The kernel must be configured without
CONFIG_XEN_BALLOON_MEMORY_HOTPLUG for the problem to happen.
Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Commit b1c9f169047b ("xen: split counting of extra memory pages...")
introduced an error when dom0 was started with limited memory occurring
only on some hardware.
The problem arises in case dom0 is started with initial memory and
maximum memory being the same. The kernel must be configured without
CONFIG_XEN_BALLOON_MEMORY_HOTPLUG for the problem to happen. If all
of this is true and the E820 map of the machine is sparse (some areas
are not covered) then the machine might crash early in the boot
process.
An example E820 map triggering the problem looks like this:
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009d7ff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009d800-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000cf7fafff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000cf7fb000-0x00000000cf95ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000cf960000-0x00000000cfb62fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfb63000-0x00000000cfd14fff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000cfd15000-0x00000000cfd61fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfd62000-0x00000000cfd6cfff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x00000000cfd6d000-0x00000000cfd6ffff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfd70000-0x00000000cfd70fff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000cfd71000-0x00000000cfea8fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000cfea9000-0x00000000cfeb9fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfeba000-0x00000000cfecafff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000cfecb000-0x00000000cfecbfff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfecc000-0x00000000cfedbfff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000cfedc000-0x00000000cfedcfff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfedd000-0x00000000cfeddfff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000cfede000-0x00000000cfee3fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfee4000-0x00000000cfef6fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000cfef7000-0x00000000cfefffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fec10000-0x00000000fec10fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed40000-0x00000000fed44fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed61000-0x00000000fed70fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed80000-0x00000000fed8ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100001000-0x000000020effffff] usable
In this case the area a0000-dffff isn't present in the map. This will
confuse the memory setup of the domain when remapping the memory from
such holes to populated areas.
To avoid the problem the accounting of to be remapped memory has to
count such holes in the E820 map as well.
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Commit b1c9f169047b ("xen: split counting of extra memory pages...")
introduced an error when dom0 was started with limited memory.
The problem arises in case dom0 is started with initial memory and
maximum memory being the same and exactly a multiple of 1 GB. The
kernel must be configured without CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
for the problem to happen. In this case it will crash very early
during boot due to the virtual mapped p2m list not being large
enough to be able to remap any memory:
(XEN) Freed 304kB init memory.
mapping kernel into physical memory
about to get started...
(XEN) traps.c:459:d0v0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080229a93 create_bounce_frame+0x12b/0x13a
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.5.2-pre x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e033:[<ffffffff81d120cb>]
(XEN) RFLAGS: 0000000000000206 EM: 1 CONTEXT: pv guest (d0v0)
(XEN) rax: ffffffff81db2000 rbx: 000000004d000000 rcx: 0000000000000000
(XEN) rdx: 000000004d000000 rsi: 0000000000063000 rdi: 000000004d063000
(XEN) rbp: ffffffff81c03d78 rsp: ffffffff81c03d28 r8: 0000000000023000
(XEN) r9: 00000001040ff000 r10: 0000000000007ff0 r11: 0000000000000000
(XEN) r12: 0000000000063000 r13: 000000000004d000 r14: 0000000000000063
(XEN) r15: 0000000000000063 cr0: 0000000080050033 cr4: 00000000000006f0
(XEN) cr3: 0000000105c0f000 cr2: ffffc90000268000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
(XEN) Guest stack trace from rsp=ffffffff81c03d28:
(XEN) 0000000000000000 0000000000000000 ffffffff81d120cb 000000010000e030
(XEN) 0000000000010006 ffffffff81c03d68 000000000000e02b ffffffffffffffff
(XEN) 0000000000000063 000000000004d063 ffffffff81c03de8 ffffffff81d130a7
(XEN) ffffffff81c03de8 000000000004d000 00000001040ff000 0000000000105db1
(XEN) 00000001040ff001 000000000004d062 ffff8800092d6ff8 0000000002027000
(XEN) ffff8800094d8340 ffff8800092d6ff8 00003ffffffff000 ffff8800092d7ff8
(XEN) ffffffff81c03e48 ffffffff81d13c43 ffff8800094d8000 ffff8800094d9000
(XEN) 0000000000000000 ffff8800092d6000 00000000092d6000 000000004cfbf000
(XEN) 00000000092d6000 00000000052d5442 0000000000000000 0000000000000000
(XEN) ffffffff81c03ed8 ffffffff81d185c1 0000000000000000 0000000000000000
(XEN) ffffffff81c03e78 ffffffff810f8ca4 ffffffff81c03ed8 ffffffff8171a15d
(XEN) 0000000000000010 ffffffff81c03ee8 0000000000000000 0000000000000000
(XEN) ffffffff81f0e402 ffffffffffffffff ffffffff81dae900 0000000000000000
(XEN) 0000000000000000 0000000000000000 ffffffff81c03f28 ffffffff81d0cf0f
(XEN) 0000000000000000 0000000000000000 0000000000000000 ffffffff81db82e0
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) ffffffff81c03f38 ffffffff81d0c603 ffffffff81c03ff8 ffffffff81d11c86
(XEN) 0300000100000032 0000000000000005 0000000000000020 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.
This can be avoided by allocating aneough space for the p2m to cover
the maximum memory of dom0 plus the identity mapped holes required
for PCI space, BIOS etc.
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
The attached change fixes the condition used in the "sub" instruction.
A double word comparison is needed. This fixes the 64-bit LWS CAS
operation on 64-bit kernels.
I can now enable 64-bit atomic support in GCC.
Cc: <stable@vger.kernel.org>
Signed-off-by: John David Anglin <dave.anglin>
Signed-off-by: Helge Deller <deller@gmx.de>
When detecting a serial port on newer PA-RISC machines (with iosapic) we have a
long way to go to find the right IRQ line, registering it, then registering the
serial port and the irq handler for the serial port. During this phase spurious
interrupts for the serial port may happen which then crashes the kernel because
the action handler might not have been set up yet.
So, basically it's a race condition between the serial port hardware and the
CPU which sets up the necessary fields in the irq sructs. The main reason for
this race is, that we unmask the serial port irqs too early without having set
up everything properly before (which isn't easily possible because we need the
IRQ number to register the serial ports).
This patch is a work-around for this problem. It adds checks to the CPU irq
handler to verify if the IRQ action field has been initialized already. If not,
we just skip this interrupt (which isn't critical for a serial port at bootup).
The real fix would probably involve rewriting all PA-RISC specific IRQ code
(for CPU, IOSAPIC, GSC and EISA) to use IRQ domains with proper parenting of
the irq chips and proper irq enabling along this line.
This bug has been in the PA-RISC port since the beginning, but the crashes
happened very rarely with currently used hardware. But on the latest machine
which I bought (a C8000 workstation), which uses the fastest CPUs (4 x PA8900,
1GHz) and which has the largest possible L1 cache size (64MB each), the kernel
crashed at every boot because of this race. So, without this patch the machine
would currently be unuseable.
For the record, here is the flow logic:
1. serial_init_chip() in 8250_gsc.c calls iosapic_serial_irq().
2. iosapic_serial_irq() calls txn_alloc_irq() to find the irq.
3. iosapic_serial_irq() calls cpu_claim_irq() to register the CPU irq
4. cpu_claim_irq() unmasks the CPU irq (which it shouldn't!)
5. serial_init_chip() then registers the 8250 port.
Problems:
- In step 4 the CPU irq shouldn't have been registered yet, but after step 5
- If serial irq happens between 4 and 5 have finished, the kernel will crash
Signed-off-by: Helge Deller <deller@gmx.de>
Craig Estey noticed that we didn't checked for in_atomic() in our page fault
handler like other architectures. This commit adds this check by using
faulthandler_disabled() which includes a check for pagefault_disabled() and
in_atomic().
Reported-by: Craig Estey <cae370@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Commit 3a9ad0b ("PCI: Add pci_bus_addr_t") unconditionally introduced usage of
64-bit PCI bus addresses on all 64-bit platforms which broke PA-RISC.
It turned out that due to enabling the 64-bit addresses, the PCI logic decided
to use the GMMIO instead of the LMMIO region. This commit simply disables
registering the GMMIO and thus we fall back to use the LMMIO region as before.
Reverts commit 45ea2a5fed
("PCI: Don't use 64-bit bus addresses on PA-RISC")
To: linux-parisc@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Meelis Roos <mroos@linux.ee>
Cc: stable@vger.kernel.org # v3.19+
Signed-off-by: Helge Deller <deller@gmx.de>
Commit 3cc2dac5be ("drivers/video/fbdev/atyfb: Replace MTRR UC hole
with strong UC") introduces calls to ioremap_wc and ioremap_uc. This
causes build failures with parisc:allmodconfig. Map the missing
functions to ioremap_nocache.
Fixes: 3cc2dac5be ("drivers/video/fbdev/atyfb:
Replace MTRR UC hole with strong UC")
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Helge Deller <deller@gmx.de>
connector->encoder is initialized as NULL. Fix this by setting it in
during pre enable. MST connectors are not read out during initial hw
readout, and have no fixed encoder mappings. So it's harmless to
return false when the connector has never been assigned to an encoder.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Balloon device is frequently used as a mean of cooperative memory control
in between guest and host to manage memory overcommitment. This is the
typical case for any hosting workload when KVM guest is provided for
end-user.
Though there is a problem in this setup. The end-user and hosting provider
have signed SLA agreement in which some amount of memory is guaranted for
the guest. The good thing is that this memory will be given to the guest
when the guest will really need it (f.e. with OOM in guest and with
VIRTIO_BALLOON_F_DEFLATE_ON_OOM configuration flag set). The bad thing
is that end-user does not know this.
Balloon by default reduce the amount of memory exposed to the end-user
each time when the page is stolen from guest or returned back by using
adjust_managed_page_count and thus /proc/meminfo shows reduced amount
of memory.
Fortunately the solution is simple, we should just avoid to call
adjust_managed_page_count with VIRTIO_BALLOON_F_DEFLATE_ON_OOM set.
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
and rename it to release_pages_balloon. The function originally takes
arrays of pfns and now it takes pointer to struct virtio_ballon.
This change is necessary to conditionally call adjust_managed_page_count
in the next patch.
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This will allow up to DISK_MAX_PARTS (256) partitions, with for example
GPT in the guest. Otherwise, the partition scan code will only discover
the first 15 partitions.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Added the match table and pointers for ACPI probing to the driver.
This uses the same identifier for virt devices as being used for qemu
ARM64 ACPI support.
d0bf1955a3
Signed-off-by: Graeme Gregory <graeme.gregory@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
VIRTIO_BLK_F_CONFIG_WCE is important in order to achieve good performance
(up to 2x, though more realistically +30-40%) in latency-bound workloads.
However, it was removed by mistake together with VIRTIO_BLK_F_FLUSH.
It will be restored in the next revision of the virtio 1.0 standard, so
do the same in Linux.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>