* refs/heads/tmp-55b3b8c
Linux 4.4.108
alpha: fix build failures
ALSA: hda - Fix yet another i915 pointer leftover in error path
ALSA: hda - Degrade i915 binding failure message
ALSA: hda - Clear the leftover component assignment at snd_hdac_i915_exit()
Revert "Bluetooth: btusb: driver to enable the usb-wakeup feature"
MIPS: math-emu: Fix final emulation phase for certain instructions
thermal: hisilicon: Handle return value of clk_prepare_enable
cpuidle: fix broadcast control when broadcast can not be entered
rtc: set the alarm to the next expiring timer
tcp: fix under-evaluated ssthresh in TCP Vegas
fm10k: ensure we process SM mbx when processing VF mbx
scsi: lpfc: PLOGI failures during NPIV testing
scsi: lpfc: Fix secure firmware updates
PCI/AER: Report non-fatal errors only to the affected endpoint
ixgbe: fix use of uninitialized padding
igb: check memory allocation failure
PCI: Create SR-IOV virtfn/physfn links before attaching driver
scsi: mpt3sas: Fix IO error occurs on pulling out a drive from RAID1 volume created on two SATA drive
scsi: cxgb4i: fix Tx skb leak
PCI: Avoid bus reset if bridge itself is broken
net: phy: at803x: Change error to EINVAL for invalid MAC
rtc: pl031: make interrupt optional
crypto: crypto4xx - increase context and scatter ring buffer elements
backlight: pwm_bl: Fix overflow condition
bnxt_en: Fix NULL pointer dereference in reopen failure path
cpuidle: powernv: Pass correct drv->cpumask for registration
ARM: dma-mapping: disallow dma_get_sgtable() for non-kernel managed memory
netfilter: nfnetlink_queue: fix secctx memory leak
xhci: plat: Register shutdown for xhci_plat
isdn: kcapi: avoid uninitialized data
KVM: pci-assign: do not map smm memory slot pages in vt-d page tables
ARM: dts: am335x-evmsk: adjust mmc2 param to allow suspend
netfilter: nf_nat_snmp: Fix panic when snmp_trap_helper fails to register
netfilter: nfnl_cthelper: fix a race when walk the nf_ct_helper_hash table
irda: vlsi_ir: fix check for DMA mapping errors
RDMA/iser: Fix possible mr leak on device removal event
i40e: Do not enable NAPI on q_vectors that have no rings
net: Do not allow negative values for busy_read and busy_poll sysctl interfaces
bna: avoid writing uninitialized data into hw registers
s390/qeth: no ETH header for outbound AF_IUCV
r8152: prevent the driver from transmitting packets with carrier off
HID: xinmo: fix for out of range for THT 2P arcade controller.
hwmon: (asus_atk0110) fix uninitialized data access
ARM: dts: ti: fix PCI bus dtc warnings
KVM: VMX: Fix enable VPID conditions
KVM: x86: correct async page present tracepoint
scsi: lpfc: Fix PT2PT PRLI reject
pinctrl: st: add irq_request/release_resources callbacks
inet: frag: release spinlock before calling icmp_send()
netfilter: nfnl_cthelper: Fix memory leak
netfilter: nfnl_cthelper: fix runtime expectation policy updates
usb: gadget: udc: remove pointer dereference after free
usb: gadget: f_uvc: Sanity check wMaxPacketSize for SuperSpeed
net: qmi_wwan: Add USB IDs for MDM6600 modem on Motorola Droid 4
bna: integer overflow bug in debugfs
sch_dsmark: fix invalid skb_cow() usage
crypto: deadlock between crypto_alg_sem/rtnl_mutex/genl_mutex
r8152: fix the list rx_done may be used without initialization
cpuidle: Validate cpu_dev in cpuidle_add_sysfs()
arm: kprobes: Align stack to 8-bytes in test code
arm: kprobes: Fix the return address of multiple kretprobes
ALSA: hda - add support for docking station for HP 840 G3
ALSA: hda - add support for docking station for HP 820 G2
x86/irq: Do not substract irq_tlb_count from irq_call_count
sched/core: Idle_task_exit() shouldn't use switch_mm_irqs_off()
ARM: Hide finish_arch_post_lock_switch() from modules
x86/mm, sched/core: Turn off IRQs in switch_mm()
x86/mm, sched/core: Uninline switch_mm()
x86/mm: Build arch/x86/mm/tlb.c even on !SMP
sched/core: Add switch_mm_irqs_off() and use it in the scheduler
mm/mmu_context, sched/core: Fix mmu_context.h assumption
mm/rmap: batched invalidations should use existing api
x86/mm: If INVPCID is available, use it to flush global mappings
x86/mm: Add a 'noinvpcid' boot option to turn off INVPCID
x86/mm: Fix INVPCID asm constraint
x86/mm: Add INVPCID helpers
cxl: Check if vphb exists before iterating over AFU devices
arm64: Initialise high_memory global variable earlier
ANDROID: binder: Remove obsolete proc waitqueue.
Change-Id: Ie954ccd1dbd861672345bb0ee879273be4d0a441
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
* refs/heads/tmp-8a53962
Linux 4.4.105
xen-netfront: avoid crashing on resume after a failure in talk_to_netback()
usb: host: fix incorrect updating of offset
USB: usbfs: Filter flags passed in from user space
USB: devio: Prevent integer overflow in proc_do_submiturb()
USB: Increase usbfs transfer limit
USB: core: Add type-specific length check of BOS descriptors
usb: ch9: Add size macro for SSP dev cap descriptor
usb: Add USB 3.1 Precision time measurement capability descriptor support
usb: xhci: fix panic in xhci_free_virt_devices_depth_first
usb: hub: Cycle HUB power when initialization fails
Revert "ocfs2: should wait dio before inode lock in ocfs2_setattr()"
net: fec: fix multicast filtering hardware setup
xen-netfront: Improve error handling during initialization
mm: avoid returning VM_FAULT_RETRY from ->page_mkwrite handlers
tcp: correct memory barrier usage in tcp_check_space()
dmaengine: pl330: fix double lock
tipc: fix cleanup at module unload
net: sctp: fix array overrun read on sctp_timer_tbl
drm/exynos/decon5433: set STANDALONE_UPDATE_F on output enablement
NFSv4: Fix client recovery when server reboots multiple times
KVM: arm/arm64: Fix occasional warning from the timer work function
nfs: Don't take a reference on fl->fl_file for LOCK operation
ravb: Remove Rx overflow log messages
net/appletalk: Fix kernel memory disclosure
vti6: fix device register to report IFLA_INFO_KIND
ARM: OMAP1: DMA: Correct the number of logical channels
net: systemport: Pad packet before inserting TSB
net: systemport: Utilize skb_put_padto()
kprobes/x86: Disable preemption in ftrace-based jprobes
perf test attr: Fix ignored test case result
sysrq : fix Show Regs call trace on ARM
EDAC, sb_edac: Fix missing break in switch
x86/entry: Use SYSCALL_DEFINE() macros for sys_modify_ldt()
serial: 8250: Preserve DLD[7:4] for PORT_XR17V35X
usb: phy: tahvo: fix error handling in tahvo_usb_probe()
spi: sh-msiof: Fix DMA transfer size check
serial: 8250_fintek: Fix rs485 disablement on invalid ioctl()
selftests/x86/ldt_get: Add a few additional tests for limits
s390/pci: do not require AIS facility
ima: fix hash algorithm initialization
USB: serial: option: add Quectel BG96 id
s390/runtime instrumentation: simplify task exit handling
serial: 8250_pci: Add Amazon PCI serial device ID
usb: quirks: Add no-lpm quirk for KY-688 USB 3.1 Type-C Hub
uas: Always apply US_FL_NO_ATA_1X quirk to Seagate devices
bcache: recover data from backing when data is clean
bcache: only permit to recovery read error when cache device is clean
ANDROID: initramfs: call free_initrd() when skipping init
Conflicts:
drivers/usb/core/config.c
include/linux/usb.h
include/uapi/linux/usb/ch9.h
Change-Id: Ibada5100be12f3a1389461f7738ee2ecb0d427af
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
[ Upstream commit 0292e169b2d9c8377a168778f0b16eadb1f578fd ]
or VM memory are not put thus leaked in kvm_iommu_unmap_memslots() when
destroy VM.
This is consistent with current vfio implementation.
Signed-off-by: herongguang <herongguang.he@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 63e41226afc3f7a044b70325566fa86ac3142538 ]
When a VCPU blocks (WFI) and has programmed the vtimer, we program a
soft timer to expire in the future to wake up the vcpu thread when
appropriate. Because such as wake up involves a vcpu kick, and the
timer expire function can get called from interrupt context, and the
kick may sleep, we have to schedule the kick in the work function.
The work function currently has a warning that gets raised if it turns
out that the timer shouldn't fire when it's run, which was added because
the idea was that in that case the work should never have been cancelled.
However, it turns out that this whole thing is racy and we can get
spurious warnings. The problem is that we clear the armed flag in the
work function, which may run in parallel with the
kvm_timer_unschedule->timer_disarm() call. This results in a possible
situation where the timer_disarm() call does not call
cancel_work_sync(), which effectively synchronizes the completion of the
work function with running the VCPU. As a result, the VCPU thread
proceeds before the work function completees, causing changes to the
timer state such that kvm_timer_should_fire(vcpu) returns false in the
work function.
All we do in the work function is to kick the VCPU, and an occasional
rare extra kick never harmed anyone. Since the race above is extremely
rare, we don't bother checking if the race happens but simply remove the
check and the clearing of the armed flag from the work function.
Reported-by: Matthias Brugger <mbrugger@suse.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* refs/heads/tmp-9f764bb
Linux 4.4.80
ASoC: dpcm: Avoid putting stream state to STOP when FE stream is paused
scsi: snic: Return error code on memory allocation failure
scsi: fnic: Avoid sending reset to firmware when another reset is in progress
HID: ignore Petzl USB headlamp
ALSA: usb-audio: test EP_FLAG_RUNNING at urb completion
sh_eth: enable RX descriptor word 0 shift on SH7734
nvmem: imx-ocotp: Fix wrong register size
arm64: mm: fix show_pte KERN_CONT fallout
vfio-pci: Handle error from pci_iomap
video: fbdev: cobalt_lcdfb: Handle return NULL error from devm_ioremap
perf symbols: Robustify reading of build-id from sysfs
perf tools: Install tools/lib/traceevent plugins with install-bin
xfrm: Don't use sk_family for socket policy lookups
tools lib traceevent: Fix prev/next_prio for deadline tasks
Btrfs: adjust outstanding_extents counter properly when dio write is split
usb: gadget: Fix copy/pasted error message
ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
ARM: s3c2410_defconfig: Fix invalid values for NF_CT_PROTO_*
ARM64: zynqmp: Fix i2c node's compatible string
ARM64: zynqmp: Fix W=1 dtc 1.4 warnings
dmaengine: ti-dma-crossbar: Add some 'of_node_put()' in error path.
dmaengine: ioatdma: workaround SKX ioatdma version
dmaengine: ioatdma: Add Skylake PCI Dev ID
openrisc: Add _text symbol to fix ksym build error
irqchip/mxs: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
ASoC: nau8825: fix invalid configuration in Pre-Scalar of FLL
spi: dw: Make debugfs name unique between instances
ASoC: tlv320aic3x: Mark the RESET register as volatile
irqchip/keystone: Fix "scheduling while atomic" on rt
vfio-pci: use 32-bit comparisons for register address for gcc-4.5
drm/msm: Verify that MSM_SUBMIT_BO_FLAGS are set
drm/msm: Ensure that the hardware write pointer is valid
net/mlx4: Remove BUG_ON from ICM allocation routine
ipv6: Should use consistent conditional judgement for ip6 fragment between __ip6_append_data and ip6_finish_output
ARM: dts: n900: Mark eMMC slot with no-sdio and no-sd flags
r8169: add support for RTL8168 series add-on card.
x86/mce/AMD: Make the init code more robust
tpm: Replace device number bitmap with IDR
tpm: fix a kernel memory leak in tpm-sysfs.c
xen/blkback: don't use xen_blkif_get() in xen-blkback kthread
xen/blkback: don't free be structure too early
sched/cputime: Fix prev steal time accouting during CPU hotplug
net: skb_needs_check() accepts CHECKSUM_NONE for tx
pstore: Use dynamic spinlock initializer
pstore: Correctly initialize spinlock and flags
pstore: Allow prz to control need for locking
vlan: Propagate MAC address to VLANs
/proc/iomem: only expose physical resource addresses to privileged users
Make file credentials available to the seqfile interfaces
v4l: s5c73m3: fix negation operator
dentry name snapshots
ipmi/watchdog: fix watchdog timeout set on reboot
libnvdimm, btt: fix btt_rw_page not returning errors
RDMA/uverbs: Fix the check for port number
PM / Domains: defer dev_pm_domain_set() until genpd->attach_dev succeeds if present
sched/cgroup: Move sched_online_group() back into css_online() to fix crash
kaweth: fix oops upon failed memory allocation
kaweth: fix firmware download
mpt3sas: Don't overreach ioc->reply_post[] during initialization
mailbox: handle empty message in tx_tick
mailbox: skip complete wait event if timer expired
mailbox: always wait in mbox_send_message for blocking Tx mode
wil6210: fix deadlock when using fw_no_recovery option
ath10k: fix null deref on wmi-tlv when trying spectral scan
isdn/i4l: fix buffer overflow
isdn: Fix a sleep-in-atomic bug
net: phy: Do not perform software reset for Generic PHY
nfc: fdp: fix NULL pointer dereference
xfs: don't BUG() on mixed direct and mapped I/O
perf intel-pt: Ensure never to set 'last_ip' when packet 'count' is zero
perf intel-pt: Use FUP always when scanning for an IP
perf intel-pt: Fix last_ip usage
perf intel-pt: Fix ip compression
drm: rcar-du: Simplify and fix probe error handling
drm: rcar-du: Perform initialization/cleanup at probe/remove time
drm/rcar: Nuke preclose hook
Staging: comedi: comedi_fops: Avoid orphaned proc entry
Revert "powerpc/numa: Fix percpu allocations to be NUMA aware"
KVM: PPC: Book3S HV: Save/restore host values of debug registers
KVM: PPC: Book3S HV: Reload HTM registers explicitly
KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit
KVM: PPC: Book3S HV: Context-switch EBB registers properly
drm/nouveau/bar/gf100: fix access to upper half of BAR2
drm/vmwgfx: Fix gcc-7.1.1 warning
md/raid5: add thread_group worker async_tx_issue_pending_all
crypto: authencesn - Fix digest_null crash
powerpc/pseries: Fix of_node_put() underflow during reconfig remove
net: reduce skb_warn_bad_offload() noise
pstore: Make spinlock per zone instead of global
af_key: Add lock to key dump
ANDROID: binder: Don't BUG_ON(!spin_is_locked()).
Linux 4.4.79
alarmtimer: don't rate limit one-shot timers
tracing: Fix kmemleak in instance_rmdir
spmi: Include OF based modalias in device uevent
of: device: Export of_device_{get_modalias, uvent_modalias} to modules
drm/mst: Avoid processing partially received up/down message transactions
drm/mst: Avoid dereferencing a NULL mstb in drm_dp_mst_handle_up_req()
drm/mst: Fix error handling during MST sideband message reception
RDMA/core: Initialize port_num in qp_attr
ceph: fix race in concurrent readdir
staging: rtl8188eu: add TL-WN722N v2 support
Revert "perf/core: Drop kernel samples even though :u is specified"
perf annotate: Fix broken arrow at row 0 connecting jmp instruction to its target
target: Fix COMPARE_AND_WRITE caw_sem leak during se_cmd quiesce
udf: Fix deadlock between writeback and udf_setsize()
NFS: only invalidate dentrys that are clearly invalid.
Input: i8042 - fix crash at boot time
MIPS: Fix a typo: s/preset/present/ in r2-to-r6 emulation error message
MIPS: Send SIGILL for linked branches in `__compute_return_epc_for_insn'
MIPS: Rename `sigill_r6' to `sigill_r2r6' in `__compute_return_epc_for_insn'
MIPS: Send SIGILL for BPOSGE32 in `__compute_return_epc_for_insn'
MIPS: math-emu: Prevent wrong ISA mode instruction emulation
MIPS: Fix unaligned PC interpretation in `compute_return_epc'
MIPS: Actually decode JALX in `__compute_return_epc_for_insn'
MIPS: Save static registers before sysmips
MIPS: Fix MIPS I ISA /proc/cpuinfo reporting
x86/ioapic: Pass the correct data to unmask_ioapic_irq()
x86/acpi: Prevent out of bound access caused by broken ACPI tables
MIPS: Negate error syscall return in trace
MIPS: Fix mips_atomic_set() with EVA
MIPS: Fix mips_atomic_set() retry condition
ftrace: Fix uninitialized variable in match_records()
vfio: New external user group/file match
vfio: Fix group release deadlock
f2fs: Don't clear SGID when inheriting ACLs
ipmi:ssif: Add missing unlock in error branch
ipmi: use rcu lock around call to intf->handlers->sender()
drm/radeon: Fix eDP for single-display iMac10,1 (v2)
drm/radeon/ci: disable mclk switching for high refresh rates (v2)
drm/amd/amdgpu: Return error if initiating read out of range on vram
s390/syscalls: Fix out of bounds arguments access
Raid5 should update rdev->sectors after reshape
cx88: Fix regression in initial video standard setting
x86/xen: allow userspace access during hypercalls
md: don't use flush_signals in userspace processes
usb: renesas_usbhs: gadget: disable all eps when the driver stops
usb: renesas_usbhs: fix usbhsc_resume() for !USBHSF_RUNTIME_PWCTRL
USB: cdc-acm: add device-id for quirky printer
usb: storage: return on error to avoid a null pointer dereference
xhci: Fix NULL pointer dereference when cleaning up streams for removed host
xhci: fix 20000ms port resume timeout
ipvs: SNAT packet replies only for NATed connections
PCI/PM: Restore the status of PCI devices across hibernation
af_key: Fix sadb_x_ipsecrequest parsing
powerpc/asm: Mark cr0 as clobbered in mftb()
powerpc: Fix emulation of mfocrf in emulate_step()
powerpc: Fix emulation of mcrf in emulate_step()
powerpc/64: Fix atomic64_inc_not_zero() to return an int
iscsi-target: Add login_keys_workaround attribute for non RFC initiators
scsi: ses: do not add a device to an enclosure if enclosure_add_links() fails.
PM / Domains: Fix unsafe iteration over modified list of domain providers
PM / Domains: Fix unsafe iteration over modified list of device links
ASoC: compress: Derive substream from stream based on direction
wlcore: fix 64K page support
Bluetooth: use constant time memory comparison for secret values
perf intel-pt: Clear FUP flag on error
perf intel-pt: Ensure IP is zero when state is INTEL_PT_STATE_NO_IP
perf intel-pt: Fix missing stack clear
perf intel-pt: Improve sample timestamp
perf intel-pt: Move decoder error setting into one condition
NFC: Add sockaddr length checks before accessing sa_family in bind handlers
nfc: Fix the sockaddr length sanitization in llcp_sock_connect
nfc: Ensure presence of required attributes in the activate_target handler
NFC: nfcmrvl: fix firmware-management initialisation
NFC: nfcmrvl: use nfc-device for firmware download
NFC: nfcmrvl: do not use device-managed resources
NFC: nfcmrvl_uart: add missing tty-device sanity check
NFC: fix broken device allocation
ath9k: fix tx99 bus error
ath9k: fix tx99 use after free
thermal: cpu_cooling: Avoid accessing potentially freed structures
s5p-jpeg: don't return a random width/height
ir-core: fix gcc-7 warning on bool arithmetic
disable new gcc-7.1.1 warnings for now
sched/fair: Add a backup_cpu to find_best_target
sched/fair: Try to estimate possible idle states.
sched/fair: Sync task util before EAS wakeup
Revert "sched/fair: ensure utilization signals are synchronized before use"
sched/fair: kick nohz idle balance for misfit task
sched/fair: Update signals of nohz cpus if we are going idle
events: add tracepoint for find_best_target
sched/fair: streamline find_best_target heuristics
UPSTREAM: af_key: Fix sadb_x_ipsecrequest parsing
ANDROID: lowmemorykiller: Add tgid to kill message
Revert "proc: smaps: Allow smaps access for CAP_SYS_RESOURCE"
Conflicts:
drivers/gpu/drm/msm/adreno/adreno_gpu.c
drivers/gpu/drm/msm/msm_ringbuffer.c
drivers/staging/android/lowmemorykiller.c
kernel/sched/fair.c
Change-Id: Ic3b3a522b79b1deb178e513b56b9c39eea48e079
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
commit 5d6dee80a1e94cc284d03e06d930e60e8d3ecf7d upstream.
At the point where the kvm-vfio pseudo device wants to release its
vfio group reference, we can't always acquire a new reference to make
that happen. The group can be in a state where we wouldn't allow a
new reference to be added. This new helper function allows a caller
to match a file to a group to facilitate this. Given a file and
group, report if they match. Thus the caller needs to already have a
group reference to match to the file. This allows the deletion of a
group without acquiring a new reference.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* refs/heads/tmp-b834e92
Revert "USB: gadget: u_ether: Fix data stall issue in RNDIS tethering mode"
Linux 4.4.63
MIPS: fix Select HAVE_IRQ_EXIT_ON_IRQ_STACK patch.
sctp: deny peeloff operation on asocs with threads sleeping on it
net: ipv6: check route protocol when deleting routes
tty/serial: atmel: RS485 half duplex w/DMA: enable RX after TX is done
SUNRPC: fix refcounting problems with auth_gss messages.
ibmveth: calculate gso_segs for large packets
catc: Use heap buffer for memory size test
catc: Combine failure cleanup code in catc_probe()
rtl8150: Use heap buffers for all register access
pegasus: Use heap buffers for all register access
virtio-console: avoid DMA from stack
dvb-usb-firmware: don't do DMA on stack
dvb-usb: don't use stack for firmware load
mm: Tighten x86 /dev/mem with zeroing reads
rtc: tegra: Implement clock handling
platform/x86: acer-wmi: setup accelerometer when machine has appropriate notify event
ext4: fix inode checksum calculation problem if i_extra_size is small
dvb-usb-v2: avoid use-after-free
ath9k: fix NULL pointer dereference
crypto: ahash - Fix EINPROGRESS notification callback
powerpc: Disable HFSCR[TM] if TM is not supported
zram: do not use copy_page with non-page aligned address
kvm: fix page struct leak in handle_vmon
Revert "MIPS: Lantiq: Fix cascaded IRQ setup"
char: lack of bool string made CONFIG_DEVPORT always on
char: Drop bogus dependency of DEVPORT on !M68K
ftrace: Fix removing of second function probe
irqchip/irq-imx-gpcv2: Fix spinlock initialization
libnvdimm: fix reconfig_mutex, mmap_sem, and jbd2_handle lockdep splat
xen, fbfront: fix connecting to backend
scsi: sd: Fix capacity calculation with 32-bit sector_t
scsi: sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable
scsi: sr: Sanity check returned mode data
iscsi-target: Drop work-around for legacy GlobalSAN initiator
iscsi-target: Fix TMR reference leak during session shutdown
acpi, nfit, libnvdimm: fix interleave set cookie calculation (64-bit comparison)
x86/vdso: Plug race between mapping and ELF header setup
x86/vdso: Ensure vdso32_enabled gets set to valid values only
perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32()
Input: xpad - add support for Razer Wildcat gamepad
CIFS: store results of cifs_reopen_file to avoid infinite wait
drm/nouveau/mmu/nv4a: use nv04 mmu rather than the nv44 one
drm/nouveau/mpeg: mthd returns true on success now
thp: fix MADV_DONTNEED vs clear soft dirty race
cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups
ANDROID: uid_sys_stats: reduce update_io_stats overhead
UPSTREAM: char: lack of bool string made CONFIG_DEVPORT always on
UPSTREAM: char: Drop bogus dependency of DEVPORT on !M68K
Revert "Android: sdcardfs: Don't do d_add for lower fs"
ANDROID: usb: gadget: fix MTP enumeration issue under super speed mode
Android: sdcardfs: Don't complain in fixup_lower_ownership
Android: sdcardfs: Don't do d_add for lower fs
ANDROID: sdcardfs: ->iget fixes
Android: sdcardfs: Change cache GID value
BACKPORT: [UPSTREAM] ext2: convert to mbcache2
BACKPORT [UPSTREAM] ext4: convert to mbcache2
BACKPORT: [UPSTREAM] mbcache2: reimplement mbcache
Linux 4.4.62
ibmveth: set correct gso_size and gso_type
net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions
net/mlx4_core: Fix racy CQ (Completion Queue) free
net/mlx4_en: Fix bad WQE issue
usb: hub: Wait for connection to be reestablished after port reset
blk-mq: Avoid memory reclaim when remapping queues
net/packet: fix overflow in check for priv area size
crypto: caam - fix RNG deinstantiation error checking
MIPS: IRQ Stack: Fix erroneous jal to plat_irq_dispatch
MIPS: Select HAVE_IRQ_EXIT_ON_IRQ_STACK
MIPS: Switch to the irq_stack in interrupts
MIPS: Only change $28 to thread_info if coming from user mode
MIPS: Stack unwinding while on IRQ stack
MIPS: Introduce irq_stack
mtd: bcm47xxpart: fix parsing first block after aligned TRX
usb: dwc3: gadget: delay unmap of bounced requests
drm/i915: Stop using RP_DOWN_EI on Baytrail
drm/i915: Avoid tweaking evaluation thresholds on Baytrail v3
UPSTREAM: net: socket: Make unnecessarily global sockfs_setattr() static
UPSTREAM: net: ipv4: Don't crash if passing a null sk to ip_do_redirect.
UPSTREAM: net/packet: fix overflow in check for priv area size
Linux 4.4.61
mm/mempolicy.c: fix error handling in set_mempolicy and mbind.
MIPS: Flush wrong invalid FTLB entry for huge page
MIPS: Lantiq: fix missing xbar kernel panic
MIPS: End spinlocks with .insn
MIPS: ralink: Fix typos in rt3883 pinctrl
MIPS: Force o32 fp64 support on 32bit MIPS64r6 kernels
s390/uaccess: get_user() should zero on failure (again)
s390/decompressor: fix initrd corruption caused by bss clear
nios2: reserve boot memory for device tree
powerpc: Don't try to fix up misaligned load-with-reservation instructions
powerpc/mm: Add missing global TLB invalidate if cxl is active
metag/usercopy: Add missing fixups
metag/usercopy: Fix src fixup in from user rapf loops
metag/usercopy: Set flags before ADDZ
metag/usercopy: Zero rest of buffer from copy_from_user
metag/usercopy: Add early abort to copy_to_user
metag/usercopy: Fix alignment error checking
metag/usercopy: Drop unused macros
ring-buffer: Fix return value check in test_ringbuffer()
ptrace: fix PTRACE_LISTEN race corrupting task->state
Reset TreeId to zero on SMB2 TREE_CONNECT
iio: bmg160: reset chip when probing
arm/arm64: KVM: Take mmap_sem in kvm_arch_prepare_memory_region
arm/arm64: KVM: Take mmap_sem in stage2_unmap_vm
staging: android: ashmem: lseek failed due to no FMODE_LSEEK.
sysfs: be careful of error returns from ops->show()
drm/vmwgfx: fix integer overflow in vmw_surface_define_ioctl()
drm/vmwgfx: Remove getparam error message
drm/ttm, drm/vmwgfx: Relax permission checking when opening surfaces
drm/vmwgfx: avoid calling vzalloc with a 0 size in vmw_get_cap_3d_ioctl()
drm/vmwgfx: NULL pointer dereference in vmw_surface_define_ioctl()
drm/vmwgfx: Type-check lookups of fence objects
Revert "Revert "Revert "CHROMIUM: android: binder: Fix potential scheduling-while-atomic"""
ANDROID: sdcardfs: Directly pass lower file for mmap
UPSTREAM: checkpatch: special audit for revert commit line
UPSTREAM: PM / sleep: make PM notifiers called symmetrically
Revert "Revert "CHROMIUM: android: binder: Fix potential scheduling-while-atomic""
Linux 4.4.60
padata: avoid race in reordering
blk: Ensure users for current->bio_list can see the full list.
blk: improve order of bio handling in generic_make_request()
power: reset: at91-poweroff: timely shutdown LPDDR memories
KVM: kvm_io_bus_unregister_dev() should never fail
rtc: s35390a: improve irq handling
rtc: s35390a: implement reset routine as suggested by the reference
rtc: s35390a: make sure all members in the output are set
rtc: s35390a: fix reading out alarm
MIPS: Lantiq: Fix cascaded IRQ setup
mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd()
drm/radeon: Override fpfn for all VRAM placements in radeon_evict_flags
KVM: x86: clear bus pointer when destroyed
USB: fix linked-list corruption in rh_call_control()
tty/serial: atmel: fix TX path in atmel_console_write()
tty/serial: atmel: fix race condition (TX+DMA)
ACPI: Do not create a platform_device for IOAPIC/IOxAPIC
ACPI: Fix incompatibility with mcount-based function graph tracing
ASoC: atmel-classd: fix audio clock rate
ALSA: hda - fix a problem for lineout on a Dell AIO machine
ALSA: seq: Fix race during FIFO resize
scsi: libsas: fix ata xfer length
scsi: sg: check length passed to SG_NEXT_CMD_LEN
scsi: mpt3sas: fix hang on ata passthrough commands
xen/setup: Don't relocate p2m over existing one
libceph: force GFP_NOIO for socket allocations
Linux 4.4.59
sched/rt: Add a missing rescheduling point
fscrypt: remove broken support for detecting keyring key revocation
metag/ptrace: Reject partial NT_METAG_RPIPE writes
metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS
metag/ptrace: Preserve previous registers for short regset write
sparc/ptrace: Preserve previous registers for short regset write
mips/ptrace: Preserve previous registers for short regset write
h8300/ptrace: Fix incorrect register transfer count
c6x/ptrace: Remove useless PTRACE_SETREGSET implementation
pinctrl: qcom: Don't clear status bit on irq_unmask
virtio_balloon: init 1st buffer in stats vq
xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder
xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window
xfrm: policy: init locks early
Conflicts:
drivers/scsi/sd.c
drivers/usb/gadget/function/f_mtp.c
drivers/usb/gadget/function/u_ether.c
Change-Id: I80501cf02d04204f8c0f3a7f5a036eaa4d54546e
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
commit 90db10434b163e46da413d34db8d0e77404cc645 upstream.
No caller currently checks the return value of
kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on
freeing their device. A stale reference will remain in the io_bus,
getting at least used again, when the iobus gets teared down on
kvm_destroy_vm() - leading to use after free errors.
There is nothing the callers could do, except retrying over and over
again.
So let's simply remove the bus altogether, print an error and make
sure no one can access this broken bus again (returning -ENOMEM on any
attempt to access it).
Fixes: e93f8a0f82 ("KVM: convert io_bus to SRCU")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit df630b8c1e851b5e265dc2ca9c87222e342c093b upstream.
When releasing the bus, let's clear the bus pointers to mark it out. If
any further device unregister happens on this bus, we know that we're
done if we found the bus being released already.
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4f3dbdf47e150016aacd734e663347fcaa768303 upstream.
Reported syzkaller:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
PGD 0
Oops: 0002 [#1] SMP
CPU: 1 PID: 125 Comm: kworker/1:1 Not tainted 4.9.0+ #1
Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
task: ffff9bbe0dfbb900 task.stack: ffffb61802014000
RIP: 0010:irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
Call Trace:
irqfd_shutdown+0x66/0xa0 [kvm]
process_one_work+0x16b/0x480
worker_thread+0x4b/0x500
kthread+0x101/0x140
? process_one_work+0x480/0x480
? kthread_create_on_node+0x60/0x60
ret_from_fork+0x25/0x30
RIP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] RSP: ffffb61802017e20
CR2: 0000000000000008
The syzkaller folks reported a NULL pointer dereference that due to
unregister an consumer which fails registration before. The syzkaller
creates two VMs w/ an equal eventfd occasionally. So the second VM
fails to register an irqbypass consumer. It will make irqfd as inactive
and queue an workqueue work to shutdown irqfd and unregister the irqbypass
consumer when eventfd is closed. However, the second consumer has been
initialized though it fails registration. So the token(same as the first
VM's) is taken to unregister the consumer through the workqueue, the
consumer of the first VM is found and unregistered, then NULL deref incurred
in the path of deleting consumer from the consumers list.
This patch fixes it by making irq_bypass_register/unregister_consumer()
looks for the consumer entry based on consumer pointer itself instead of
token matching.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Having the system register numbers as #defines has been a pain
since day one, as the ordering is pretty fragile, and moving
things around leads to renumbering and epic conflict resolutions.
Now that we're mostly acessing the sysreg file in C, an enum is
a much better type to use, and we can clean things up a bit.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit 9d8415d6c148a16b6d906a96f0596851d7e4d607)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
We store GICv3 LRs in reverse order so that the CPU can save/restore
them in rever order as well (don't ask why, the design is crazy),
and yet generate memory traffic that doesn't completely suck.
We need this macro to be available to the C version of save/restore.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(cherry picked from commit 3c13b8f435acb452eac62d966148a8b6fa92151f)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
commit 2f1fe81123f59271bddda673b60116bde9660385 upstream.
When freeing the nested resources of a vcpu, there is an assumption that
the vcpu's vmcs01 is the current VMCS on the CPU that executes
nested_release_vmcs12(). If this assumption is violated, the vcpu's
vmcs01 may be made active on multiple CPUs at the same time, in
violation of Intel's specification. Moreover, since the vcpu's vmcs01 is
not VMCLEARed on every CPU on which it is active, it can linger in a
CPU's VMCS cache after it has been freed and potentially
repurposed. Subsequent eviction from the CPU's VMCS cache on a capacity
miss can result in memory corruption.
It is not sufficient for vmx_free_vcpu() to call vmx_load_vmcs01(). If
the vcpu in question was last loaded on a different CPU, it must be
migrated to the current CPU before calling vmx_load_vmcs01().
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit caf1ff26e1aa178133df68ac3d40815fed2187d9 upstream.
These days, we experienced one guest crash with 8 cores and 3 disks,
with qemu error logs as bellow:
qemu-system-x86_64: /build/qemu-2.0.0/kvm-all.c:984:
kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
And then we found one patch(bdf026317d) in qemu tree, which said
could fix this bug.
Execute the following script will reproduce the BUG quickly:
irq_affinity.sh
========================================================================
vda_irq_num=25
vdb_irq_num=27
while [ 1 ]
do
for irq in {1,2,4,8,10,20,40,80}
do
echo $irq > /proc/irq/$vda_irq_num/smp_affinity
echo $irq > /proc/irq/$vdb_irq_num/smp_affinity
dd if=/dev/vda of=/dev/zero bs=4K count=100 iflag=direct
dd if=/dev/vdb of=/dev/zero bs=4K count=100 iflag=direct
done
done
========================================================================
The following qemu log is added in the qemu code and is displayed when
this bug reproduced:
kvm_irqchip_commit_routes: max gsi: 1008, nr_allocated_irq_routes: 1024,
irq_routes->nr: 1024, gsi_count: 1024.
That's to say when irq_routes->nr == 1024, there are 1024 routing entries,
but in the kernel code when routes->nr >= 1024, will just return -EINVAL;
The nr is the number of the routing entries which is in of
[1 ~ KVM_MAX_IRQ_ROUTES], not the index in [0 ~ KVM_MAX_IRQ_ROUTES - 1].
This patch fix the BUG above.
Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Signed-off-by: Wei Tang <tangwei@cmss.chinamobile.com>
Signed-off-by: Zhang Zhuoyu <zhangzhuoyu@cmss.chinamobile.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1c5631c73fc2261a5df64a72c155cb53dcdc0c45 upstream.
On a host that runs NTP, corrections can have a direct impact on
the background timer that we program on the behalf of a vcpu.
In particular, NTP performing a forward correction will result in
a timer expiring sooner than expected from a guest point of view.
Not a big deal, we kick the vcpu anyway.
But on wake-up, the vcpu thread is going to perform a check to
find out whether or not it should block. And at that point, the
timer check is going to say "timer has not expired yet, go back
to sleep". This results in the timer event being lost forever.
There are multiple ways to handle this. One would be record that
the timer has expired and let kvm_cpu_has_pending_timer return
true in that case, but that would be fairly invasive. Another is
to check for the "short sleep" condition in the hrtimer callback,
and restart the timer for the remaining time when the condition
is detected.
This patch implements the latter, with a bit of refactoring in
order to avoid too much code duplication.
Reported-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e9ad4ec8379ad1ba6f68b8ca1c26b50b5ae0a327 upstream.
Moving the initialization earlier is needed in 4.6 because
kvm_arch_init_vm is now using mmu_lock, causing lockdep to
complain:
[ 284.440294] INFO: trying to register non-static key.
[ 284.445259] the code is fine but needs lockdep annotation.
[ 284.450736] turning off the locking correctness validator.
...
[ 284.528318] [<ffffffff810aecc3>] lock_acquire+0xd3/0x240
[ 284.533733] [<ffffffffa0305aa0>] ? kvm_page_track_register_notifier+0x20/0x60 [kvm]
[ 284.541467] [<ffffffff81715581>] _raw_spin_lock+0x41/0x80
[ 284.546960] [<ffffffffa0305aa0>] ? kvm_page_track_register_notifier+0x20/0x60 [kvm]
[ 284.554707] [<ffffffffa0305aa0>] kvm_page_track_register_notifier+0x20/0x60 [kvm]
[ 284.562281] [<ffffffffa02ece70>] kvm_mmu_init_vm+0x20/0x30 [kvm]
[ 284.568381] [<ffffffffa02dbf7a>] kvm_arch_init_vm+0x1ea/0x200 [kvm]
[ 284.574740] [<ffffffffa02bff3f>] kvm_dev_ioctl+0xbf/0x4d0 [kvm]
However, it also helps fixing a preexisting problem, which is why this
patch is also good for stable kernels: kvm_create_vm was incrementing
current->mm->mm_count but not decrementing it at the out_err label (in
case kvm_init_mmu_notifier failed). The new initialization order makes
it possible to add the required mmdrop without adding a new error label.
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 313f636d5c490c9741d3f750dc8da33029edbc6b upstream.
When growing halt-polling, there is no check that the poll time exceeds
the limit. It's possible for vcpu->halt_poll_ns grow once past
halt_poll_ns, and stay there until a halt which takes longer than
vcpu->halt_poll_ns. For example, booting a Linux guest with
halt_poll_ns=11000:
... kvm:kvm_halt_poll_ns: vcpu 0: halt_poll_ns 0 (shrink 10000)
... kvm:kvm_halt_poll_ns: vcpu 0: halt_poll_ns 10000 (grow 0)
... kvm:kvm_halt_poll_ns: vcpu 0: halt_poll_ns 20000 (grow 10000)
Signed-off-by: David Matlack <dmatlack@google.com>
Fixes: aca6ff29c4
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 236cf17c2502007a9d2dda3c39fb0d9a6bd03cc2 upstream.
When we allocate bitmaps in vgic_vcpu_init_maps, we divide the number of
bits we need by 8 to figure out how many bytes to allocate. However,
bitmap elements are always accessed as unsigned longs, and if we didn't
happen to allocate a size such that size % sizeof(unsigned long) == 0,
bitmap accesses may go past the end of the allocation.
When using KASAN (which does byte-granular access checks), this results
in a continuous stream of BUGs whenever these bitmaps are accessed:
=============================================================================
BUG kmalloc-128 (Tainted: G B ): kasan: bad access detected
-----------------------------------------------------------------------------
INFO: Allocated in vgic_init.part.25+0x55c/0x990 age=7493 cpu=3 pid=1730
INFO: Slab 0xffffffbde6d5da40 objects=16 used=15 fp=0xffffffc935769700 flags=0x4000000000000080
INFO: Object 0xffffffc935769500 @offset=1280 fp=0x (null)
Bytes b4 ffffffc9357694f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffffffc935769500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffffffc935769510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffffffc935769520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffffffc935769530: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffffffc935769540: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffffffc935769550: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffffffc935769560: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffffffc935769570: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding ffffffc9357695b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding ffffffc9357695c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding ffffffc9357695d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding ffffffc9357695e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding ffffffc9357695f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
CPU: 3 PID: 1740 Comm: kvm-vcpu-0 Tainted: G B 4.4.0+ #17
Hardware name: ARM Juno development board (r1) (DT)
Call trace:
[<ffffffc00008e770>] dump_backtrace+0x0/0x280
[<ffffffc00008ea04>] show_stack+0x14/0x20
[<ffffffc000726360>] dump_stack+0x100/0x188
[<ffffffc00030d324>] print_trailer+0xfc/0x168
[<ffffffc000312294>] object_err+0x3c/0x50
[<ffffffc0003140fc>] kasan_report_error+0x244/0x558
[<ffffffc000314548>] __asan_report_load8_noabort+0x48/0x50
[<ffffffc000745688>] __bitmap_or+0xc0/0xc8
[<ffffffc0000d9e44>] kvm_vgic_flush_hwstate+0x1bc/0x650
[<ffffffc0000c514c>] kvm_arch_vcpu_ioctl_run+0x2ec/0xa60
[<ffffffc0000b9a6c>] kvm_vcpu_ioctl+0x474/0xa68
[<ffffffc00036b7b0>] do_vfs_ioctl+0x5b8/0xcb0
[<ffffffc00036bf34>] SyS_ioctl+0x8c/0xa0
[<ffffffc000086cb0>] el0_svc_naked+0x24/0x28
Memory state around the buggy address:
ffffffc935769400: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffffffc935769480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffffffc935769500: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffffffc935769580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffffffc935769600: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Fix the issue by always allocating a multiple of sizeof(unsigned long),
as we do elsewhere in the vgic code.
Fixes: c1bfb577a ("arm/arm64: KVM: vgic: switch to dynamic allocation")
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b3aff6ccbb1d25e506b60ccd9c559013903f3464 upstream.
Commit 4b4b4512da ("arm/arm64: KVM: Rework the arch timer to use
level-triggered semantics") brought the virtual architected timer
closer to the VGIC. There is one occasion were we don't properly
check for the VGIC actually having been initialized before, but
instead go on to check the active state of some IRQ number.
If userland hasn't instantiated a virtual GIC, we end up with a
kernel NULL pointer dereference:
=========
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = ffffffc9745c5000
[00000000] *pgd=00000009f631e003, *pud=00000009f631e003, *pmd=0000000000000000
Internal error: Oops: 96000006 [#2] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 2144 Comm: kvm_simplest-ar Tainted: G D 4.5.0-rc2+ #1300
Hardware name: ARM Juno development board (r1) (DT)
task: ffffffc976da8000 ti: ffffffc976e28000 task.ti: ffffffc976e28000
PC is at vgic_bitmap_get_irq_val+0x78/0x90
LR is at kvm_vgic_map_is_active+0xac/0xc8
pc : [<ffffffc0000b7e28>] lr : [<ffffffc0000b972c>] pstate: 20000145
....
=========
Fix this by bailing out early of kvm_timer_flush_hwstate() if we don't
have a VGIC at all.
Reported-by: Cosmin Gorgovan <cosmin@linux-geek.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
External inputs to the vgic from time to time need to poke into the
state of a virtual interrupt, the prime example is the architected timer
code.
Since the IRQ's active state can be represented in two places; the LR or
the distributor, we first loop over the LRs but if not active in the LRs
we just return if *any* IRQ is active on the VCPU in question.
This is of course bogus, as we should check if the specific IRQ in
quesiton is active on the distributor instead.
Reported-by: Eric Auger <eric.auger@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We were probing the physial distributor state for the active state of a
HW virtual IRQ, because we had seen evidence that the LR state was not
cleared when the guest deactivated a virtual interrupted.
However, this issue turned out to be a software bug in the GIC, which
was solved by: 84aab5e68c2a5e1e18d81ae8308c3ce25d501b29
(KVM: arm/arm64: arch_timer: Preserve physical dist. active
state on LR.active, 2015-11-24)
Therefore, get rid of the complexities and just look at the LR.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
We were incorrectly removing the active state from the physical
distributor on the timer interrupt when the timer output level was
deasserted. We shouldn't be doing this without considering the virtual
interrupt's active state, because the architecture requires that when an
LR has the HW bit set and the pending or active bits set, then the
physical interrupt must also have the corresponding bits set.
This addresses an issue where we have been observing an inconsistency
between the LR state and the physical distributor state where the LR
state was active and the physical distributor was not active, which
shouldn't happen.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
handling.
PPC: Mostly bug fixes.
ARM: No big features, but many small fixes and prerequisites including:
- a number of fixes for the arch-timer
- introducing proper level-triggered semantics for the arch-timers
- a series of patches to synchronously halt a guest (prerequisite for
IRQ forwarding)
- some tracepoint improvements
- a tweak for the EL2 panic handlers
- some more VGIC cleanups getting rid of redundant state
x86: quite a few changes:
- support for VT-d posted interrupts (i.e. PCI devices can inject
interrupts directly into vCPUs). This introduces a new component (in
virt/lib/) that connects VFIO and KVM together. The same infrastructure
will be used for ARM interrupt forwarding as well.
- more Hyper-V features, though the main one Hyper-V synthetic interrupt
controller will have to wait for 4.5. These will let KVM expose Hyper-V
devices.
- nested virtualization now supports VPID (same as PCID but for vCPUs)
which makes it quite a bit faster
- for future hardware that supports NVDIMM, there is support for clflushopt,
clwb, pcommit
- support for "split irqchip", i.e. LAPIC in kernel + IOAPIC/PIC/PIT in
userspace, which reduces the attack surface of the hypervisor
- obligatory smattering of SMM fixes
- on the guest side, stable scheduler clock support was rewritten to not
require help from the hypervisor.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJWO2IQAAoJEL/70l94x66D/K0H/3AovAgYmJQToZlimsktMk6a
f2xhdIqfU5lIQQh5uNBCfL3o9o8H9Py1ym7aEw3fmztPHHJYc91oTatt2UEKhmEw
VtZHp/dFHt3hwaIdXmjRPEXiYctraKCyrhaUYdWmUYkoKi7lW5OL5h+S7frG2U6u
p/hFKnHRZfXHr6NSgIqvYkKqtnc+C0FWY696IZMzgCksOO8jB1xrxoSN3tANW3oJ
PDV+4og0fN/Fr1capJUFEc/fejREHneANvlKrLaa8ht0qJQutoczNADUiSFLcMPG
iHljXeDsv5eyjMtUuIL8+MPzcrIt/y4rY41ZPiKggxULrXc6H+JJL/e/zThZpXc=
=iv2z
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"First batch of KVM changes for 4.4.
s390:
A bunch of fixes and optimizations for interrupt and time handling.
PPC:
Mostly bug fixes.
ARM:
No big features, but many small fixes and prerequisites including:
- a number of fixes for the arch-timer
- introducing proper level-triggered semantics for the arch-timers
- a series of patches to synchronously halt a guest (prerequisite
for IRQ forwarding)
- some tracepoint improvements
- a tweak for the EL2 panic handlers
- some more VGIC cleanups getting rid of redundant state
x86:
Quite a few changes:
- support for VT-d posted interrupts (i.e. PCI devices can inject
interrupts directly into vCPUs). This introduces a new
component (in virt/lib/) that connects VFIO and KVM together.
The same infrastructure will be used for ARM interrupt
forwarding as well.
- more Hyper-V features, though the main one Hyper-V synthetic
interrupt controller will have to wait for 4.5. These will let
KVM expose Hyper-V devices.
- nested virtualization now supports VPID (same as PCID but for
vCPUs) which makes it quite a bit faster
- for future hardware that supports NVDIMM, there is support for
clflushopt, clwb, pcommit
- support for "split irqchip", i.e. LAPIC in kernel +
IOAPIC/PIC/PIT in userspace, which reduces the attack surface of
the hypervisor
- obligatory smattering of SMM fixes
- on the guest side, stable scheduler clock support was rewritten
to not require help from the hypervisor"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (123 commits)
KVM: VMX: Fix commit which broke PML
KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()
KVM: x86: allow RSM from 64-bit mode
KVM: VMX: fix SMEP and SMAP without EPT
KVM: x86: move kvm_set_irq_inatomic to legacy device assignment
KVM: device assignment: remove pointless #ifdefs
KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic
KVM: x86: zero apic_arb_prio on reset
drivers/hv: share Hyper-V SynIC constants with userspace
KVM: x86: handle SMBASE as physical address in RSM
KVM: x86: add read_phys to x86_emulate_ops
KVM: x86: removing unused variable
KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs
KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()
KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings
KVM: arm/arm64: Optimize away redundant LR tracking
KVM: s390: use simple switch statement as multiplexer
KVM: s390: drop useless newline in debugging data
KVM: s390: SCA must not cross page boundaries
KVM: arm: Do not indent the arguments of DECLARE_BITMAP
...
We do not want to do too much work in atomic context, in particular
not walking all the VCPUs of the virtual machine. So we want
to distinguish the architecture-specific injection function for irqfd
from kvm_set_msi. Since it's still empty, reuse the newly added
kvm_arch_set_irq and rename it to kvm_arch_set_irq_inatomic.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Includes a number of fixes for the arch-timer, introducing proper
level-triggered semantics for the arch-timers, a series of patches to
synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint
improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups
getting rid of redundant state, and finally a stylistic change that gets rid of
some ctags warnings.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJWOhgAAAoJEEtpOizt6ddyKS8H/2ZHMTPjo6yChnrusNWy4Qbr
6laPDlzL+g45oMQRwNL7GnM1deRftaxvT2Wi+X84D/6Y/BD6MPds4HgtBfuWcSZ1
CyRJ0Ot/zrxenucSuJuOjq+a9gdizdAczkbB1MfYDULJH8fb6D+7RYLo3zgh4Xo4
pla3L9U6gSWe+YopBjZtZH43m3fwiwSM/v+uHOTIcXrsbR+fEgx/EFSKmA/DUCuo
P5cFO/ceUGu7nATCexu5V82TgR2hvurrsR7mqfwY8YcF6HRM+NEOoS29xWC77v5S
u/F08TKuKQLv0YTEFTyLETI/oEeuC0cHtrRQBNf4+9kXEOzKyXaae0wR/I6X2Ss=
=GMNk
-----END PGP SIGNATURE-----
Merge tag 'kvm-arm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/ARM Changes for v4.4-rc1
Includes a number of fixes for the arch-timer, introducing proper
level-triggered semantics for the arch-timers, a series of patches to
synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint
improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups
getting rid of redundant state, and finally a stylistic change that gets rid of
some ctags warnings.
Conflicts:
arch/x86/include/asm/kvm_host.h
Now we see that vgic_set_lr() and vgic_sync_lr_elrsr() are always used
together. Merge them into one function, saving from second vgic_ops
dereferencing every time.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
1. Remove unnecessary 'irq' argument, because irq number can be retrieved
from the LR.
2. Since cff9211eb1
("arm/arm64: KVM: Fix arch timer behavior for disabled interrupts ")
LR_STATE_PENDING is queued back by vgic_retire_lr() itself. Also, it
clears vlr.state itself. Therefore, we remove the same, now duplicated,
check with all accompanying bit manipulations from vgic_unqueue_irqs().
3. vgic_retire_lr() is always accompanied by vgic_irq_clear_queued(). Since
it already does more than just clearing the LR, move
vgic_irq_clear_queued() inside of it.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Currently we use vgic_irq_lr_map in order to track which LRs hold which
IRQs, and lr_used bitmap in order to track which LRs are used or free.
vgic_irq_lr_map is actually used only for piggy-back optimization, and
can be easily replaced by iteration over lr_used. This is good because in
future, when LPI support is introduced, number of IRQs will grow up to at
least 16384, while numbers from 1024 to 8192 are never going to be used.
This would be a huge memory waste.
In its turn, lr_used is also completely redundant since
ae705930fc ("arm/arm64: KVM: Keep elrsr/aisr
in sync with software model"), because together with lr_used we also update
elrsr. This allows to easily replace lr_used with elrsr, inverting all
conditions (because in elrsr '1' means 'free').
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Pull irq updates from Thomas Gleixner:
"The irq departement delivers:
- Rework the irqdomain core infrastructure to accomodate ACPI based
systems. This is required to support ARM64 without creating
artificial device tree nodes.
- Sanitize the ACPI based ARM GIC initialization by making use of the
new firmware independent irqdomain core
- Further improvements to the generic MSI management
- Generalize the irq migration on CPU hotplug
- Improvements to the threaded interrupt infrastructure
- Allow the migration of "chained" low level interrupt handlers
- Allow optional force masking of interrupts in disable_irq[_nosysnc]
- Support for two new interrupt chips - Sigh!
- A larger set of errata fixes for ARM gicv3
- The usual pile of fixes, updates, improvements and cleanups all
over the place"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
Document that IRQ_NONE should be returned when IRQ not actually handled
PCI/MSI: Allow the MSI domain to be device-specific
PCI: Add per-device MSI domain hook
of/irq: Use the msi-map property to provide device-specific MSI domain
of/irq: Split of_msi_map_rid to reuse msi-map lookup
irqchip/gic-v3-its: Parse new version of msi-parent property
PCI/MSI: Use of_msi_get_domain instead of open-coded "msi-parent" parsing
of/irq: Use of_msi_get_domain instead of open-coded "msi-parent" parsing
of/irq: Add support code for multi-parent version of "msi-parent"
irqchip/gic-v3-its: Add handling of PCI requester id.
PCI/MSI: Add helper function pci_msi_domain_get_msi_rid().
of/irq: Add new function of_msi_map_rid()
Docs: dt: Add PCI MSI map bindings
irqchip/gic-v2m: Add support for multiple MSI frames
irqchip/gic-v3: Fix translation of LPIs after conversion to irq_fwspec
irqchip/mxs: Add Alphascale ASM9260 support
irqchip/mxs: Prepare driver for hardware with different offsets
irqchip/mxs: Panic if ioremap or domain creation fails
irqdomain: Documentation updates
irqdomain/msi: Use fwnode instead of of_node
...
The VGIC and timer code for KVM arm/arm64 doesn't have any tracepoints
or tracepoint infrastructure defined. Rewriting some of the timer code
handling showed me how much we need this, so let's add these simple
trace points once and for all and we can easily expand with additional
trace points in these files as we go along.
Cc: Wei Huang <wei@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
We mark edge-triggered interrupts with the HW bit set as queued to
prevent the VGIC code from injecting LRs with both the Active and
Pending bits set at the same time while also setting the HW bit,
because the hardware does not support this.
However, this means that we must also clear the queued flag when we sync
back a LR where the state on the physical distributor went from active
to inactive because the guest deactivated the interrupt. At this point
we must also check if the interrupt is pending on the distributor, and
tell the VGIC to queue it again if it is.
Since these actions on the sync path are extremely close to those for
level-triggered interrupts, rename process_level_irq to
process_queued_irq, allowing it to cater for both cases.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
The arch timer currently uses edge-triggered semantics in the sense that
the line is never sampled by the vgic and lowering the line from the
timer to the vgic doesn't have any effect on the pending state of
virtual interrupts in the vgic. This means that we do not support a
guest with the otherwise valid behavior of (1) disable interrupts (2)
enable the timer (3) disable the timer (4) enable interrupts. Such a
guest would validly not expect to see any interrupts on real hardware,
but will see interrupts on KVM.
This patch fixes this shortcoming through the following series of
changes.
First, we change the flow of the timer/vgic sync/flush operations. Now
the timer is always flushed/synced before the vgic, because the vgic
samples the state of the timer output. This has the implication that we
move the timer operations in to non-preempible sections, but that is
fine after the previous commit getting rid of hrtimer schedules on every
entry/exit.
Second, we change the internal behavior of the timer, letting the timer
keep track of its previous output state, and only lower/raise the line
to the vgic when the state changes. Note that in theory this could have
been accomplished more simply by signalling the vgic every time the
state *potentially* changed, but we don't want to be hitting the vgic
more often than necessary.
Third, we get rid of the use of the map->active field in the vgic and
instead simply set the interrupt as active on the physical distributor
whenever the input to the GIC is asserted and conversely clear the
physical active state when the input to the GIC is deasserted.
Fourth, and finally, we now initialize the timer PPIs (and all the other
unused PPIs for now), to be level-triggered, and modify the sync code to
sample the line state on HW sync and re-inject a new interrupt if it is
still pending at that time.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
We currently initialize the SGIs to be enabled in the VGIC code, but we
use the VGIC_NR_PPIS define for this purpose, instead of the the more
natural VGIC_NR_SGIS. Change this slightly confusing use of the
defines.
Note: This should have no functional change, as both names are defined
to the number 16.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
The GICD_ICFGR allows the bits for the SGIs and PPIs to be read only.
We currently simulate this behavior by writing a hardcoded value to the
register for the SGIs and PPIs on every write of these bits to the
register (ignoring what the guest actually wrote), and by writing the
same value as the reset value to the register.
This is a bit counter-intuitive, as the register is RO for these bits,
and we can just implement it that way, allowing us to control the value
of the bits purely in the reset code.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Currently vgic_process_maintenance() processes dealing with a completed
level-triggered interrupt directly, but we are soon going to reuse this
logic for level-triggered mapped interrupts with the HW bit set, so
move this logic into a separate static function.
Probably the most scary part of this commit is convincing yourself that
the current flow is safe compared to the old one. In the following I
try to list the changes and why they are harmless:
Move vgic_irq_clear_queued after kvm_notify_acked_irq:
Harmless because the only potential effect of clearing the queued
flag wrt. kvm_set_irq is that vgic_update_irq_pending does not set
the pending bit on the emulated CPU interface or in the
pending_on_cpu bitmask if the function is called with level=1.
However, the point of kvm_notify_acked_irq is to call kvm_set_irq
with level=0, and we set the queued flag again in
__kvm_vgic_sync_hwstate later on if the level is stil high.
Move vgic_set_lr before kvm_notify_acked_irq:
Also, harmless because the LR are cpu-local operations and
kvm_notify_acked only affects the dist
Move vgic_dist_irq_clear_soft_pend after kvm_notify_acked_irq:
Also harmless, because now we check the level state in the
clear_soft_pend function and lower the pending bits if the level is
low.
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
We currently schedule a soft timer every time we exit the guest if the
timer did not expire while running the guest. This is really not
necessary, because the only work we do in the timer work function is to
kick the vcpu.
Kicking the vcpu does two things:
(1) If the vpcu thread is on a waitqueue, make it runnable and remove it
from the waitqueue.
(2) If the vcpu is running on a different physical CPU from the one
doing the kick, it sends a reschedule IPI.
The second case cannot happen, because the soft timer is only ever
scheduled when the vcpu is not running. The first case is only relevant
when the vcpu thread is on a waitqueue, which is only the case when the
vcpu thread has called kvm_vcpu_block().
Therefore, we only need to make sure a timer is scheduled for
kvm_vcpu_block(), which we do by encapsulating all calls to
kvm_vcpu_block() with kvm_timer_{un}schedule calls.
Additionally, we only schedule a soft timer if the timer is enabled and
unmasked, since it is useless otherwise.
Note that theoretically userspace can use the SET_ONE_REG interface to
change registers that should cause the timer to fire, even if the vcpu
is blocked without a scheduled timer, but this case was not supported
before this patch and we leave it for future work for now.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Some times it is useful for architecture implementations of KVM to know
when the VCPU thread is about to block or when it comes back from
blocking (arm/arm64 needs to know this to properly implement timers, for
example).
Therefore provide a generic architecture callback function in line with
what we do elsewhere for KVM generic-arch interactions.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
We currently do a single update of the vgic state when the distributor
enable/disable control register is accessed and then bypass updating the
state for as long as the distributor remains disabled.
This is incorrect, because updating the state does not consider the
distributor enable bit, and this you can end up in a situation where an
interrupt is marked as pending on the CPU interface, but not pending on
the distributor, which is an impossible state to be in, and triggers a
warning. Consider for example the following sequence of events:
1. An interrupt is marked as pending on the distributor
- the interrupt is also forwarded to the CPU interface
2. The guest turns off the distributor (it's about to do a reboot)
- we stop updating the CPU interface state from now on
3. The guest disables the pending interrupt
- we remove the pending state from the distributor, but don't touch
the CPU interface, see point 2.
Since the distributor disable bit really means that no interrupts should
be forwarded to the CPU interface, we modify the code to keep updating
the internal VGIC state, but always set the CPU interface pending bits
to zero when the distributor is disabled.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
When a guest reboots or offlines/onlines CPUs, it is not uncommon for it
to clear the pending and active states of an interrupt through the
emulated VGIC distributor. However, since the architected timers are
defined by the architecture to be level triggered and the guest
rightfully expects them to be that, but we emulate them as
edge-triggered, we have to mimic level-triggered behavior for an
edge-triggered virtual implementation.
We currently do not signal the VGIC when the map->active field is true,
because it indicates that the guest has already been signalled of the
interrupt as required. Normally this field is set to false when the
guest deactivates the virtual interrupt through the sync path.
We also need to catch the case where the guest deactivates the interrupt
through the emulated distributor, again allowing guests to boot even if
the original virtual timer signal hit before the guest's GIC
initialization sequence is run.
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
We have an interesting issue when the guest disables the timer interrupt
on the VGIC, which happens when turning VCPUs off using PSCI, for
example.
The problem is that because the guest disables the virtual interrupt at
the VGIC level, we never inject interrupts to the guest and therefore
never mark the interrupt as active on the physical distributor. The
host also never takes the timer interrupt (we only use the timer device
to trigger a guest exit and everything else is done in software), so the
interrupt does not become active through normal means.
The result is that we keep entering the guest with a programmed timer
that will always fire as soon as we context switch the hardware timer
state and run the guest, preventing forward progress for the VCPU.
Since the active state on the physical distributor is really part of the
timer logic, it is the job of our virtual arch timer driver to manage
this state.
The timer->map->active boolean field indicates whether we have signalled
this interrupt to the vgic and if that interrupt is still pending or
active. As long as that is the case, the hardware doesn't have to
generate physical interrupts and therefore we mark the interrupt as
active on the physical distributor.
We also have to restore the pending state of an interrupt that was
queued to an LR but was retired from the LR for some reason, while
remaining pending in the LR.
Cc: Marc Zyngier <marc.zyngier@arm.com>
Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
When lowering a level-triggered line from userspace, we forgot to lower
the pending bit on the emulated CPU interface and we also did not
re-compute the pending_on_cpu bitmap for the CPU affected by the change.
Update vgic_update_irq_pending() to fix the two issues above and also
raise a warning in vgic_quue_irq_to_lr if we encounter an interrupt
pending on a CPU which is neither marked active nor pending.
[ Commit text reworked completely - Christoffer ]
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Allow for arch-specific interrupt types to be set. For that, add
kvm_arch_set_irq() which takes interrupt type-specific action if it
recognizes the interrupt type given, and -EWOULDBLOCK otherwise.
The default implementation always returns -EWOULDBLOCK.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Factor out kvm_notify_acked_gsi() helper to iterate over EOI listeners
and notify those matching the given gsi.
It will be reused in the upcoming Hyper-V SynIC implementation.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The loop(for) inside irqfd_update() is unnecessary
because any other value for irq_entry.type will just trigger
schedule_work(&irqfd->inject) in irqfd_wakeup.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
async_pf_execute() seems to be missing a memory barrier which might
cause the waker to not notice the waiter and miss sending a wake_up as
in the following figure.
async_pf_execute kvm_vcpu_block
------------------------------------------------------------------------
spin_lock(&vcpu->async_pf.lock);
if (waitqueue_active(&vcpu->wq))
/* The CPU might reorder the test for
the waitqueue up here, before
prior writes complete */
prepare_to_wait(&vcpu->wq, &wait,
TASK_INTERRUPTIBLE);
/*if (kvm_vcpu_check_block(vcpu) < 0) */
/*if (kvm_arch_vcpu_runnable(vcpu)) { */
...
return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
!vcpu->arch.apf.halted)
|| !list_empty_careful(&vcpu->async_pf.done)
...
return 0;
list_add_tail(&apf->link,
&vcpu->async_pf.done);
spin_unlock(&vcpu->async_pf.lock);
waited = true;
schedule();
------------------------------------------------------------------------
The attached patch adds the missing memory barrier.
I found this issue when I was looking through the linux source code
for places calling waitqueue_active() before wake_up*(), but without
preceding memory barriers, after sending a patch to fix a similar
issue in drivers/tty/n_tty.c (Details about the original issue can be
found here: https://lkml.org/lkml/2015/9/28/849).
Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>