Commit graph

962 commits

Author SHA1 Message Date
Srinivasarao P
3726391f05 Merge android-4.4.108 (55b3b8c) into msm-4.4
* refs/heads/tmp-55b3b8c
  Linux 4.4.108
  alpha: fix build failures
  ALSA: hda - Fix yet another i915 pointer leftover in error path
  ALSA: hda - Degrade i915 binding failure message
  ALSA: hda - Clear the leftover component assignment at snd_hdac_i915_exit()
  Revert "Bluetooth: btusb: driver to enable the usb-wakeup feature"
  MIPS: math-emu: Fix final emulation phase for certain instructions
  thermal: hisilicon: Handle return value of clk_prepare_enable
  cpuidle: fix broadcast control when broadcast can not be entered
  rtc: set the alarm to the next expiring timer
  tcp: fix under-evaluated ssthresh in TCP Vegas
  fm10k: ensure we process SM mbx when processing VF mbx
  scsi: lpfc: PLOGI failures during NPIV testing
  scsi: lpfc: Fix secure firmware updates
  PCI/AER: Report non-fatal errors only to the affected endpoint
  ixgbe: fix use of uninitialized padding
  igb: check memory allocation failure
  PCI: Create SR-IOV virtfn/physfn links before attaching driver
  scsi: mpt3sas: Fix IO error occurs on pulling out a drive from RAID1 volume created on two SATA drive
  scsi: cxgb4i: fix Tx skb leak
  PCI: Avoid bus reset if bridge itself is broken
  net: phy: at803x: Change error to EINVAL for invalid MAC
  rtc: pl031: make interrupt optional
  crypto: crypto4xx - increase context and scatter ring buffer elements
  backlight: pwm_bl: Fix overflow condition
  bnxt_en: Fix NULL pointer dereference in reopen failure path
  cpuidle: powernv: Pass correct drv->cpumask for registration
  ARM: dma-mapping: disallow dma_get_sgtable() for non-kernel managed memory
  netfilter: nfnetlink_queue: fix secctx memory leak
  xhci: plat: Register shutdown for xhci_plat
  isdn: kcapi: avoid uninitialized data
  KVM: pci-assign: do not map smm memory slot pages in vt-d page tables
  ARM: dts: am335x-evmsk: adjust mmc2 param to allow suspend
  netfilter: nf_nat_snmp: Fix panic when snmp_trap_helper fails to register
  netfilter: nfnl_cthelper: fix a race when walk the nf_ct_helper_hash table
  irda: vlsi_ir: fix check for DMA mapping errors
  RDMA/iser: Fix possible mr leak on device removal event
  i40e: Do not enable NAPI on q_vectors that have no rings
  net: Do not allow negative values for busy_read and busy_poll sysctl interfaces
  bna: avoid writing uninitialized data into hw registers
  s390/qeth: no ETH header for outbound AF_IUCV
  r8152: prevent the driver from transmitting packets with carrier off
  HID: xinmo: fix for out of range for THT 2P arcade controller.
  hwmon: (asus_atk0110) fix uninitialized data access
  ARM: dts: ti: fix PCI bus dtc warnings
  KVM: VMX: Fix enable VPID conditions
  KVM: x86: correct async page present tracepoint
  scsi: lpfc: Fix PT2PT PRLI reject
  pinctrl: st: add irq_request/release_resources callbacks
  inet: frag: release spinlock before calling icmp_send()
  netfilter: nfnl_cthelper: Fix memory leak
  netfilter: nfnl_cthelper: fix runtime expectation policy updates
  usb: gadget: udc: remove pointer dereference after free
  usb: gadget: f_uvc: Sanity check wMaxPacketSize for SuperSpeed
  net: qmi_wwan: Add USB IDs for MDM6600 modem on Motorola Droid 4
  bna: integer overflow bug in debugfs
  sch_dsmark: fix invalid skb_cow() usage
  crypto: deadlock between crypto_alg_sem/rtnl_mutex/genl_mutex
  r8152: fix the list rx_done may be used without initialization
  cpuidle: Validate cpu_dev in cpuidle_add_sysfs()
  arm: kprobes: Align stack to 8-bytes in test code
  arm: kprobes: Fix the return address of multiple kretprobes
  ALSA: hda - add support for docking station for HP 840 G3
  ALSA: hda - add support for docking station for HP 820 G2
  x86/irq: Do not substract irq_tlb_count from irq_call_count
  sched/core: Idle_task_exit() shouldn't use switch_mm_irqs_off()
  ARM: Hide finish_arch_post_lock_switch() from modules
  x86/mm, sched/core: Turn off IRQs in switch_mm()
  x86/mm, sched/core: Uninline switch_mm()
  x86/mm: Build arch/x86/mm/tlb.c even on !SMP
  sched/core: Add switch_mm_irqs_off() and use it in the scheduler
  mm/mmu_context, sched/core: Fix mmu_context.h assumption
  mm/rmap: batched invalidations should use existing api
  x86/mm: If INVPCID is available, use it to flush global mappings
  x86/mm: Add a 'noinvpcid' boot option to turn off INVPCID
  x86/mm: Fix INVPCID asm constraint
  x86/mm: Add INVPCID helpers
  cxl: Check if vphb exists before iterating over AFU devices
  arm64: Initialise high_memory global variable earlier
  ANDROID: binder: Remove obsolete proc waitqueue.

Change-Id: Ie954ccd1dbd861672345bb0ee879273be4d0a441
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
2018-01-18 12:50:06 +05:30
Srinivasarao P
33260fbfb3 Merge android-4.4.105 (8a53962) into msm-4.4
* refs/heads/tmp-8a53962
  Linux 4.4.105
  xen-netfront: avoid crashing on resume after a failure in talk_to_netback()
  usb: host: fix incorrect updating of offset
  USB: usbfs: Filter flags passed in from user space
  USB: devio: Prevent integer overflow in proc_do_submiturb()
  USB: Increase usbfs transfer limit
  USB: core: Add type-specific length check of BOS descriptors
  usb: ch9: Add size macro for SSP dev cap descriptor
  usb: Add USB 3.1 Precision time measurement capability descriptor support
  usb: xhci: fix panic in xhci_free_virt_devices_depth_first
  usb: hub: Cycle HUB power when initialization fails
  Revert "ocfs2: should wait dio before inode lock in ocfs2_setattr()"
  net: fec: fix multicast filtering hardware setup
  xen-netfront: Improve error handling during initialization
  mm: avoid returning VM_FAULT_RETRY from ->page_mkwrite handlers
  tcp: correct memory barrier usage in tcp_check_space()
  dmaengine: pl330: fix double lock
  tipc: fix cleanup at module unload
  net: sctp: fix array overrun read on sctp_timer_tbl
  drm/exynos/decon5433: set STANDALONE_UPDATE_F on output enablement
  NFSv4: Fix client recovery when server reboots multiple times
  KVM: arm/arm64: Fix occasional warning from the timer work function
  nfs: Don't take a reference on fl->fl_file for LOCK operation
  ravb: Remove Rx overflow log messages
  net/appletalk: Fix kernel memory disclosure
  vti6: fix device register to report IFLA_INFO_KIND
  ARM: OMAP1: DMA: Correct the number of logical channels
  net: systemport: Pad packet before inserting TSB
  net: systemport: Utilize skb_put_padto()
  kprobes/x86: Disable preemption in ftrace-based jprobes
  perf test attr: Fix ignored test case result
  sysrq : fix Show Regs call trace on ARM
  EDAC, sb_edac: Fix missing break in switch
  x86/entry: Use SYSCALL_DEFINE() macros for sys_modify_ldt()
  serial: 8250: Preserve DLD[7:4] for PORT_XR17V35X
  usb: phy: tahvo: fix error handling in tahvo_usb_probe()
  spi: sh-msiof: Fix DMA transfer size check
  serial: 8250_fintek: Fix rs485 disablement on invalid ioctl()
  selftests/x86/ldt_get: Add a few additional tests for limits
  s390/pci: do not require AIS facility
  ima: fix hash algorithm initialization
  USB: serial: option: add Quectel BG96 id
  s390/runtime instrumentation: simplify task exit handling
  serial: 8250_pci: Add Amazon PCI serial device ID
  usb: quirks: Add no-lpm quirk for KY-688 USB 3.1 Type-C Hub
  uas: Always apply US_FL_NO_ATA_1X quirk to Seagate devices
  bcache: recover data from backing when data is clean
  bcache: only permit to recovery read error when cache device is clean
  ANDROID: initramfs: call free_initrd() when skipping init

Conflicts:
	drivers/usb/core/config.c
	include/linux/usb.h
	include/uapi/linux/usb/ch9.h

Change-Id: Ibada5100be12f3a1389461f7738ee2ecb0d427af
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
2018-01-08 10:02:41 +05:30
Herongguang (Stephen)
f15394085d KVM: pci-assign: do not map smm memory slot pages in vt-d page tables
[ Upstream commit 0292e169b2d9c8377a168778f0b16eadb1f578fd ]

or VM memory are not put thus leaked in kvm_iommu_unmap_memslots() when
destroy VM.

This is consistent with current vfio implementation.

Signed-off-by: herongguang <herongguang.he@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25 14:22:13 +01:00
Christoffer Dall
48222dd037 KVM: arm/arm64: Fix occasional warning from the timer work function
[ Upstream commit 63e41226afc3f7a044b70325566fa86ac3142538 ]

When a VCPU blocks (WFI) and has programmed the vtimer, we program a
soft timer to expire in the future to wake up the vcpu thread when
appropriate.  Because such as wake up involves a vcpu kick, and the
timer expire function can get called from interrupt context, and the
kick may sleep, we have to schedule the kick in the work function.

The work function currently has a warning that gets raised if it turns
out that the timer shouldn't fire when it's run, which was added because
the idea was that in that case the work should never have been cancelled.

However, it turns out that this whole thing is racy and we can get
spurious warnings.  The problem is that we clear the armed flag in the
work function, which may run in parallel with the
kvm_timer_unschedule->timer_disarm() call.  This results in a possible
situation where the timer_disarm() call does not call
cancel_work_sync(), which effectively synchronizes the completion of the
work function with running the VCPU.  As a result, the VCPU thread
proceeds before the work function completees, causing changes to the
timer state such that kvm_timer_should_fire(vcpu) returns false in the
work function.

All we do in the work function is to kick the VCPU, and an occasional
rare extra kick never harmed anyone.  Since the race above is extremely
rare, we don't bother checking if the race happens but simply remove the
check and the clearing of the armed flag from the work function.

Reported-by: Matthias Brugger <mbrugger@suse.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-09 18:42:42 +01:00
Blagovest Kolenichev
899e6b9605 Merge android-4.4@9f764bb (v4.4.80) into msm-4.4
* refs/heads/tmp-9f764bb
  Linux 4.4.80
  ASoC: dpcm: Avoid putting stream state to STOP when FE stream is paused
  scsi: snic: Return error code on memory allocation failure
  scsi: fnic: Avoid sending reset to firmware when another reset is in progress
  HID: ignore Petzl USB headlamp
  ALSA: usb-audio: test EP_FLAG_RUNNING at urb completion
  sh_eth: enable RX descriptor word 0 shift on SH7734
  nvmem: imx-ocotp: Fix wrong register size
  arm64: mm: fix show_pte KERN_CONT fallout
  vfio-pci: Handle error from pci_iomap
  video: fbdev: cobalt_lcdfb: Handle return NULL error from devm_ioremap
  perf symbols: Robustify reading of build-id from sysfs
  perf tools: Install tools/lib/traceevent plugins with install-bin
  xfrm: Don't use sk_family for socket policy lookups
  tools lib traceevent: Fix prev/next_prio for deadline tasks
  Btrfs: adjust outstanding_extents counter properly when dio write is split
  usb: gadget: Fix copy/pasted error message
  ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
  ARM: s3c2410_defconfig: Fix invalid values for NF_CT_PROTO_*
  ARM64: zynqmp: Fix i2c node's compatible string
  ARM64: zynqmp: Fix W=1 dtc 1.4 warnings
  dmaengine: ti-dma-crossbar: Add some 'of_node_put()' in error path.
  dmaengine: ioatdma: workaround SKX ioatdma version
  dmaengine: ioatdma: Add Skylake PCI Dev ID
  openrisc: Add _text symbol to fix ksym build error
  irqchip/mxs: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
  ASoC: nau8825: fix invalid configuration in Pre-Scalar of FLL
  spi: dw: Make debugfs name unique between instances
  ASoC: tlv320aic3x: Mark the RESET register as volatile
  irqchip/keystone: Fix "scheduling while atomic" on rt
  vfio-pci: use 32-bit comparisons for register address for gcc-4.5
  drm/msm: Verify that MSM_SUBMIT_BO_FLAGS are set
  drm/msm: Ensure that the hardware write pointer is valid
  net/mlx4: Remove BUG_ON from ICM allocation routine
  ipv6: Should use consistent conditional judgement for ip6 fragment between __ip6_append_data and ip6_finish_output
  ARM: dts: n900: Mark eMMC slot with no-sdio and no-sd flags
  r8169: add support for RTL8168 series add-on card.
  x86/mce/AMD: Make the init code more robust
  tpm: Replace device number bitmap with IDR
  tpm: fix a kernel memory leak in tpm-sysfs.c
  xen/blkback: don't use xen_blkif_get() in xen-blkback kthread
  xen/blkback: don't free be structure too early
  sched/cputime: Fix prev steal time accouting during CPU hotplug
  net: skb_needs_check() accepts CHECKSUM_NONE for tx
  pstore: Use dynamic spinlock initializer
  pstore: Correctly initialize spinlock and flags
  pstore: Allow prz to control need for locking
  vlan: Propagate MAC address to VLANs
  /proc/iomem: only expose physical resource addresses to privileged users
  Make file credentials available to the seqfile interfaces
  v4l: s5c73m3: fix negation operator
  dentry name snapshots
  ipmi/watchdog: fix watchdog timeout set on reboot
  libnvdimm, btt: fix btt_rw_page not returning errors
  RDMA/uverbs: Fix the check for port number
  PM / Domains: defer dev_pm_domain_set() until genpd->attach_dev succeeds if present
  sched/cgroup: Move sched_online_group() back into css_online() to fix crash
  kaweth: fix oops upon failed memory allocation
  kaweth: fix firmware download
  mpt3sas: Don't overreach ioc->reply_post[] during initialization
  mailbox: handle empty message in tx_tick
  mailbox: skip complete wait event if timer expired
  mailbox: always wait in mbox_send_message for blocking Tx mode
  wil6210: fix deadlock when using fw_no_recovery option
  ath10k: fix null deref on wmi-tlv when trying spectral scan
  isdn/i4l: fix buffer overflow
  isdn: Fix a sleep-in-atomic bug
  net: phy: Do not perform software reset for Generic PHY
  nfc: fdp: fix NULL pointer dereference
  xfs: don't BUG() on mixed direct and mapped I/O
  perf intel-pt: Ensure never to set 'last_ip' when packet 'count' is zero
  perf intel-pt: Use FUP always when scanning for an IP
  perf intel-pt: Fix last_ip usage
  perf intel-pt: Fix ip compression
  drm: rcar-du: Simplify and fix probe error handling
  drm: rcar-du: Perform initialization/cleanup at probe/remove time
  drm/rcar: Nuke preclose hook
  Staging: comedi: comedi_fops: Avoid orphaned proc entry
  Revert "powerpc/numa: Fix percpu allocations to be NUMA aware"
  KVM: PPC: Book3S HV: Save/restore host values of debug registers
  KVM: PPC: Book3S HV: Reload HTM registers explicitly
  KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit
  KVM: PPC: Book3S HV: Context-switch EBB registers properly
  drm/nouveau/bar/gf100: fix access to upper half of BAR2
  drm/vmwgfx: Fix gcc-7.1.1 warning
  md/raid5: add thread_group worker async_tx_issue_pending_all
  crypto: authencesn - Fix digest_null crash
  powerpc/pseries: Fix of_node_put() underflow during reconfig remove
  net: reduce skb_warn_bad_offload() noise
  pstore: Make spinlock per zone instead of global
  af_key: Add lock to key dump
  ANDROID: binder: Don't BUG_ON(!spin_is_locked()).
  Linux 4.4.79
  alarmtimer: don't rate limit one-shot timers
  tracing: Fix kmemleak in instance_rmdir
  spmi: Include OF based modalias in device uevent
  of: device: Export of_device_{get_modalias, uvent_modalias} to modules
  drm/mst: Avoid processing partially received up/down message transactions
  drm/mst: Avoid dereferencing a NULL mstb in drm_dp_mst_handle_up_req()
  drm/mst: Fix error handling during MST sideband message reception
  RDMA/core: Initialize port_num in qp_attr
  ceph: fix race in concurrent readdir
  staging: rtl8188eu: add TL-WN722N v2 support
  Revert "perf/core: Drop kernel samples even though :u is specified"
  perf annotate: Fix broken arrow at row 0 connecting jmp instruction to its target
  target: Fix COMPARE_AND_WRITE caw_sem leak during se_cmd quiesce
  udf: Fix deadlock between writeback and udf_setsize()
  NFS: only invalidate dentrys that are clearly invalid.
  Input: i8042 - fix crash at boot time
  MIPS: Fix a typo: s/preset/present/ in r2-to-r6 emulation error message
  MIPS: Send SIGILL for linked branches in `__compute_return_epc_for_insn'
  MIPS: Rename `sigill_r6' to `sigill_r2r6' in `__compute_return_epc_for_insn'
  MIPS: Send SIGILL for BPOSGE32 in `__compute_return_epc_for_insn'
  MIPS: math-emu: Prevent wrong ISA mode instruction emulation
  MIPS: Fix unaligned PC interpretation in `compute_return_epc'
  MIPS: Actually decode JALX in `__compute_return_epc_for_insn'
  MIPS: Save static registers before sysmips
  MIPS: Fix MIPS I ISA /proc/cpuinfo reporting
  x86/ioapic: Pass the correct data to unmask_ioapic_irq()
  x86/acpi: Prevent out of bound access caused by broken ACPI tables
  MIPS: Negate error syscall return in trace
  MIPS: Fix mips_atomic_set() with EVA
  MIPS: Fix mips_atomic_set() retry condition
  ftrace: Fix uninitialized variable in match_records()
  vfio: New external user group/file match
  vfio: Fix group release deadlock
  f2fs: Don't clear SGID when inheriting ACLs
  ipmi:ssif: Add missing unlock in error branch
  ipmi: use rcu lock around call to intf->handlers->sender()
  drm/radeon: Fix eDP for single-display iMac10,1 (v2)
  drm/radeon/ci: disable mclk switching for high refresh rates (v2)
  drm/amd/amdgpu: Return error if initiating read out of range on vram
  s390/syscalls: Fix out of bounds arguments access
  Raid5 should update rdev->sectors after reshape
  cx88: Fix regression in initial video standard setting
  x86/xen: allow userspace access during hypercalls
  md: don't use flush_signals in userspace processes
  usb: renesas_usbhs: gadget: disable all eps when the driver stops
  usb: renesas_usbhs: fix usbhsc_resume() for !USBHSF_RUNTIME_PWCTRL
  USB: cdc-acm: add device-id for quirky printer
  usb: storage: return on error to avoid a null pointer dereference
  xhci: Fix NULL pointer dereference when cleaning up streams for removed host
  xhci: fix 20000ms port resume timeout
  ipvs: SNAT packet replies only for NATed connections
  PCI/PM: Restore the status of PCI devices across hibernation
  af_key: Fix sadb_x_ipsecrequest parsing
  powerpc/asm: Mark cr0 as clobbered in mftb()
  powerpc: Fix emulation of mfocrf in emulate_step()
  powerpc: Fix emulation of mcrf in emulate_step()
  powerpc/64: Fix atomic64_inc_not_zero() to return an int
  iscsi-target: Add login_keys_workaround attribute for non RFC initiators
  scsi: ses: do not add a device to an enclosure if enclosure_add_links() fails.
  PM / Domains: Fix unsafe iteration over modified list of domain providers
  PM / Domains: Fix unsafe iteration over modified list of device links
  ASoC: compress: Derive substream from stream based on direction
  wlcore: fix 64K page support
  Bluetooth: use constant time memory comparison for secret values
  perf intel-pt: Clear FUP flag on error
  perf intel-pt: Ensure IP is zero when state is INTEL_PT_STATE_NO_IP
  perf intel-pt: Fix missing stack clear
  perf intel-pt: Improve sample timestamp
  perf intel-pt: Move decoder error setting into one condition
  NFC: Add sockaddr length checks before accessing sa_family in bind handlers
  nfc: Fix the sockaddr length sanitization in llcp_sock_connect
  nfc: Ensure presence of required attributes in the activate_target handler
  NFC: nfcmrvl: fix firmware-management initialisation
  NFC: nfcmrvl: use nfc-device for firmware download
  NFC: nfcmrvl: do not use device-managed resources
  NFC: nfcmrvl_uart: add missing tty-device sanity check
  NFC: fix broken device allocation
  ath9k: fix tx99 bus error
  ath9k: fix tx99 use after free
  thermal: cpu_cooling: Avoid accessing potentially freed structures
  s5p-jpeg: don't return a random width/height
  ir-core: fix gcc-7 warning on bool arithmetic
  disable new gcc-7.1.1 warnings for now
  sched/fair: Add a backup_cpu to find_best_target
  sched/fair: Try to estimate possible idle states.
  sched/fair: Sync task util before EAS wakeup
  Revert "sched/fair: ensure utilization signals are synchronized before use"
  sched/fair: kick nohz idle balance for misfit task
  sched/fair: Update signals of nohz cpus if we are going idle
  events: add tracepoint for find_best_target
  sched/fair: streamline find_best_target heuristics
  UPSTREAM: af_key: Fix sadb_x_ipsecrequest parsing
  ANDROID: lowmemorykiller: Add tgid to kill message
  Revert "proc: smaps: Allow smaps access for CAP_SYS_RESOURCE"

Conflicts:
	drivers/gpu/drm/msm/adreno/adreno_gpu.c
	drivers/gpu/drm/msm/msm_ringbuffer.c
	drivers/staging/android/lowmemorykiller.c
	kernel/sched/fair.c

Change-Id: Ic3b3a522b79b1deb178e513b56b9c39eea48e079
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2017-08-15 09:32:23 -07:00
Alex Williamson
3457c04594 vfio: New external user group/file match
commit 5d6dee80a1e94cc284d03e06d930e60e8d3ecf7d upstream.

At the point where the kvm-vfio pseudo device wants to release its
vfio group reference, we can't always acquire a new reference to make
that happen.  The group can be in a state where we wouldn't allow a
new reference to be added.  This new helper function allows a caller
to match a file to a group to facilitate this.  Given a file and
group, report if they match.  Thus the caller needs to already have a
group reference to match to the file.  This allows the deletion of a
group without acquiring a new reference.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27 15:06:07 -07:00
Blagovest Kolenichev
d877e94313 Merge branch 'android-4.4@b834e92' into branch 'msm-4.4'
* refs/heads/tmp-b834e92
  Revert "USB: gadget: u_ether: Fix data stall issue in RNDIS tethering mode"
  Linux 4.4.63
  MIPS: fix Select HAVE_IRQ_EXIT_ON_IRQ_STACK patch.
  sctp: deny peeloff operation on asocs with threads sleeping on it
  net: ipv6: check route protocol when deleting routes
  tty/serial: atmel: RS485 half duplex w/DMA: enable RX after TX is done
  SUNRPC: fix refcounting problems with auth_gss messages.
  ibmveth: calculate gso_segs for large packets
  catc: Use heap buffer for memory size test
  catc: Combine failure cleanup code in catc_probe()
  rtl8150: Use heap buffers for all register access
  pegasus: Use heap buffers for all register access
  virtio-console: avoid DMA from stack
  dvb-usb-firmware: don't do DMA on stack
  dvb-usb: don't use stack for firmware load
  mm: Tighten x86 /dev/mem with zeroing reads
  rtc: tegra: Implement clock handling
  platform/x86: acer-wmi: setup accelerometer when machine has appropriate notify event
  ext4: fix inode checksum calculation problem if i_extra_size is small
  dvb-usb-v2: avoid use-after-free
  ath9k: fix NULL pointer dereference
  crypto: ahash - Fix EINPROGRESS notification callback
  powerpc: Disable HFSCR[TM] if TM is not supported
  zram: do not use copy_page with non-page aligned address
  kvm: fix page struct leak in handle_vmon
  Revert "MIPS: Lantiq: Fix cascaded IRQ setup"
  char: lack of bool string made CONFIG_DEVPORT always on
  char: Drop bogus dependency of DEVPORT on !M68K
  ftrace: Fix removing of second function probe
  irqchip/irq-imx-gpcv2: Fix spinlock initialization
  libnvdimm: fix reconfig_mutex, mmap_sem, and jbd2_handle lockdep splat
  xen, fbfront: fix connecting to backend
  scsi: sd: Fix capacity calculation with 32-bit sector_t
  scsi: sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable
  scsi: sr: Sanity check returned mode data
  iscsi-target: Drop work-around for legacy GlobalSAN initiator
  iscsi-target: Fix TMR reference leak during session shutdown
  acpi, nfit, libnvdimm: fix interleave set cookie calculation (64-bit comparison)
  x86/vdso: Plug race between mapping and ELF header setup
  x86/vdso: Ensure vdso32_enabled gets set to valid values only
  perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32()
  Input: xpad - add support for Razer Wildcat gamepad
  CIFS: store results of cifs_reopen_file to avoid infinite wait
  drm/nouveau/mmu/nv4a: use nv04 mmu rather than the nv44 one
  drm/nouveau/mpeg: mthd returns true on success now
  thp: fix MADV_DONTNEED vs clear soft dirty race
  cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups
  ANDROID: uid_sys_stats: reduce update_io_stats overhead
  UPSTREAM: char: lack of bool string made CONFIG_DEVPORT always on
  UPSTREAM: char: Drop bogus dependency of DEVPORT on !M68K
  Revert "Android: sdcardfs: Don't do d_add for lower fs"
  ANDROID: usb: gadget: fix MTP enumeration issue under super speed mode
  Android: sdcardfs: Don't complain in fixup_lower_ownership
  Android: sdcardfs: Don't do d_add for lower fs
  ANDROID: sdcardfs: ->iget fixes
  Android: sdcardfs: Change cache GID value
  BACKPORT: [UPSTREAM] ext2: convert to mbcache2
  BACKPORT [UPSTREAM] ext4: convert to mbcache2
  BACKPORT: [UPSTREAM] mbcache2: reimplement mbcache
  Linux 4.4.62
  ibmveth: set correct gso_size and gso_type
  net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions
  net/mlx4_core: Fix racy CQ (Completion Queue) free
  net/mlx4_en: Fix bad WQE issue
  usb: hub: Wait for connection to be reestablished after port reset
  blk-mq: Avoid memory reclaim when remapping queues
  net/packet: fix overflow in check for priv area size
  crypto: caam - fix RNG deinstantiation error checking
  MIPS: IRQ Stack: Fix erroneous jal to plat_irq_dispatch
  MIPS: Select HAVE_IRQ_EXIT_ON_IRQ_STACK
  MIPS: Switch to the irq_stack in interrupts
  MIPS: Only change $28 to thread_info if coming from user mode
  MIPS: Stack unwinding while on IRQ stack
  MIPS: Introduce irq_stack
  mtd: bcm47xxpart: fix parsing first block after aligned TRX
  usb: dwc3: gadget: delay unmap of bounced requests
  drm/i915: Stop using RP_DOWN_EI on Baytrail
  drm/i915: Avoid tweaking evaluation thresholds on Baytrail v3
  UPSTREAM: net: socket: Make unnecessarily global sockfs_setattr() static
  UPSTREAM: net: ipv4: Don't crash if passing a null sk to ip_do_redirect.
  UPSTREAM: net/packet: fix overflow in check for priv area size
  Linux 4.4.61
  mm/mempolicy.c: fix error handling in set_mempolicy and mbind.
  MIPS: Flush wrong invalid FTLB entry for huge page
  MIPS: Lantiq: fix missing xbar kernel panic
  MIPS: End spinlocks with .insn
  MIPS: ralink: Fix typos in rt3883 pinctrl
  MIPS: Force o32 fp64 support on 32bit MIPS64r6 kernels
  s390/uaccess: get_user() should zero on failure (again)
  s390/decompressor: fix initrd corruption caused by bss clear
  nios2: reserve boot memory for device tree
  powerpc: Don't try to fix up misaligned load-with-reservation instructions
  powerpc/mm: Add missing global TLB invalidate if cxl is active
  metag/usercopy: Add missing fixups
  metag/usercopy: Fix src fixup in from user rapf loops
  metag/usercopy: Set flags before ADDZ
  metag/usercopy: Zero rest of buffer from copy_from_user
  metag/usercopy: Add early abort to copy_to_user
  metag/usercopy: Fix alignment error checking
  metag/usercopy: Drop unused macros
  ring-buffer: Fix return value check in test_ringbuffer()
  ptrace: fix PTRACE_LISTEN race corrupting task->state
  Reset TreeId to zero on SMB2 TREE_CONNECT
  iio: bmg160: reset chip when probing
  arm/arm64: KVM: Take mmap_sem in kvm_arch_prepare_memory_region
  arm/arm64: KVM: Take mmap_sem in stage2_unmap_vm
  staging: android: ashmem: lseek failed due to no FMODE_LSEEK.
  sysfs: be careful of error returns from ops->show()
  drm/vmwgfx: fix integer overflow in vmw_surface_define_ioctl()
  drm/vmwgfx: Remove getparam error message
  drm/ttm, drm/vmwgfx: Relax permission checking when opening surfaces
  drm/vmwgfx: avoid calling vzalloc with a 0 size in vmw_get_cap_3d_ioctl()
  drm/vmwgfx: NULL pointer dereference in vmw_surface_define_ioctl()
  drm/vmwgfx: Type-check lookups of fence objects
  Revert "Revert "Revert "CHROMIUM: android: binder: Fix potential scheduling-while-atomic"""
  ANDROID: sdcardfs: Directly pass lower file for mmap
  UPSTREAM: checkpatch: special audit for revert commit line
  UPSTREAM: PM / sleep: make PM notifiers called symmetrically
  Revert "Revert "CHROMIUM: android: binder: Fix potential scheduling-while-atomic""
  Linux 4.4.60
  padata: avoid race in reordering
  blk: Ensure users for current->bio_list can see the full list.
  blk: improve order of bio handling in generic_make_request()
  power: reset: at91-poweroff: timely shutdown LPDDR memories
  KVM: kvm_io_bus_unregister_dev() should never fail
  rtc: s35390a: improve irq handling
  rtc: s35390a: implement reset routine as suggested by the reference
  rtc: s35390a: make sure all members in the output are set
  rtc: s35390a: fix reading out alarm
  MIPS: Lantiq: Fix cascaded IRQ setup
  mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd()
  drm/radeon: Override fpfn for all VRAM placements in radeon_evict_flags
  KVM: x86: clear bus pointer when destroyed
  USB: fix linked-list corruption in rh_call_control()
  tty/serial: atmel: fix TX path in atmel_console_write()
  tty/serial: atmel: fix race condition (TX+DMA)
  ACPI: Do not create a platform_device for IOAPIC/IOxAPIC
  ACPI: Fix incompatibility with mcount-based function graph tracing
  ASoC: atmel-classd: fix audio clock rate
  ALSA: hda - fix a problem for lineout on a Dell AIO machine
  ALSA: seq: Fix race during FIFO resize
  scsi: libsas: fix ata xfer length
  scsi: sg: check length passed to SG_NEXT_CMD_LEN
  scsi: mpt3sas: fix hang on ata passthrough commands
  xen/setup: Don't relocate p2m over existing one
  libceph: force GFP_NOIO for socket allocations
  Linux 4.4.59
  sched/rt: Add a missing rescheduling point
  fscrypt: remove broken support for detecting keyring key revocation
  metag/ptrace: Reject partial NT_METAG_RPIPE writes
  metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS
  metag/ptrace: Preserve previous registers for short regset write
  sparc/ptrace: Preserve previous registers for short regset write
  mips/ptrace: Preserve previous registers for short regset write
  h8300/ptrace: Fix incorrect register transfer count
  c6x/ptrace: Remove useless PTRACE_SETREGSET implementation
  pinctrl: qcom: Don't clear status bit on irq_unmask
  virtio_balloon: init 1st buffer in stats vq
  xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder
  xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window
  xfrm: policy: init locks early

Conflicts:
	drivers/scsi/sd.c
	drivers/usb/gadget/function/f_mtp.c
	drivers/usb/gadget/function/u_ether.c

Change-Id: I80501cf02d04204f8c0f3a7f5a036eaa4d54546e
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2017-04-25 12:51:55 -07:00
David Hildenbrand
42462d23e6 KVM: kvm_io_bus_unregister_dev() should never fail
commit 90db10434b163e46da413d34db8d0e77404cc645 upstream.

No caller currently checks the return value of
kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on
freeing their device. A stale reference will remain in the io_bus,
getting at least used again, when the iobus gets teared down on
kvm_destroy_vm() - leading to use after free errors.

There is nothing the callers could do, except retrying over and over
again.

So let's simply remove the bus altogether, print an error and make
sure no one can access this broken bus again (returning -ENOMEM on any
attempt to access it).

Fixes: e93f8a0f82 ("KVM: convert io_bus to SRCU")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-08 09:53:32 +02:00
Peter Xu
3eb392056a KVM: x86: clear bus pointer when destroyed
commit df630b8c1e851b5e265dc2ca9c87222e342c093b upstream.

When releasing the bus, let's clear the bus pointers to mark it out. If
any further device unregister happens on this bus, we know that we're
done if we found the bus being released already.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-08 09:53:31 +02:00
Alex Shi
261e8dbdb9 Merge tag 'v4.4.44' into linux-linaro-lsk-v4.4
This is the 4.4.44 stable release
2017-01-22 12:01:41 +08:00
Wanpeng Li
34a55c9d4a KVM: eventfd: fix NULL deref irqbypass consumer
commit 4f3dbdf47e150016aacd734e663347fcaa768303 upstream.

Reported syzkaller:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
    PGD 0

    Oops: 0002 [#1] SMP
    CPU: 1 PID: 125 Comm: kworker/1:1 Not tainted 4.9.0+ #1
    Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
    task: ffff9bbe0dfbb900 task.stack: ffffb61802014000
    RIP: 0010:irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
    Call Trace:
     irqfd_shutdown+0x66/0xa0 [kvm]
     process_one_work+0x16b/0x480
     worker_thread+0x4b/0x500
     kthread+0x101/0x140
     ? process_one_work+0x480/0x480
     ? kthread_create_on_node+0x60/0x60
     ret_from_fork+0x25/0x30
    RIP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] RSP: ffffb61802017e20
    CR2: 0000000000000008

The syzkaller folks reported a NULL pointer dereference that due to
unregister an consumer which fails registration before. The syzkaller
creates two VMs w/ an equal eventfd occasionally. So the second VM
fails to register an irqbypass consumer. It will make irqfd as inactive
and queue an workqueue work to shutdown irqfd and unregister the irqbypass
consumer when eventfd is closed. However, the second consumer has been
initialized though it fails registration. So the token(same as the first
VM's) is taken to unregister the consumer through the workqueue, the
consumer of the first VM is found and unregistered, then NULL deref incurred
in the path of deleting consumer from the consumers list.

This patch fixes it by making irq_bypass_register/unregister_consumer()
looks for the consumer entry based on consumer pointer itself instead of
token matching.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-19 20:17:19 +01:00
Marc Zyngier
f65bf332f0 arm64: KVM: Turn system register numbers to an enum
Having the system register numbers as #defines has been a pain
since day one, as the ordering is pretty fragile, and moving
things around leads to renumbering and epic conflict resolutions.

Now that we're mostly acessing the sysreg file in C, an enum is
a much better type to use, and we can clean things up a bit.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
(cherry picked from commit 9d8415d6c148a16b6d906a96f0596851d7e4d607)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
2016-11-09 22:15:49 +08:00
Marc Zyngier
ac185e4487 KVM: arm/arm64: vgic-v3: Make the LR indexing macro public
We store GICv3 LRs in reverse order so that the CPU can save/restore
them in rever order as well (don't ask why, the design is crazy),
and yet generate memory traffic that doesn't completely suck.

We need this macro to be available to the C version of save/restore.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(cherry picked from commit 3c13b8f435acb452eac62d966148a8b6fa92151f)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
2016-11-09 22:15:40 +08:00
Jim Mattson
144941bd99 KVM: nVMX: Fix memory corruption when using VMCS shadowing
commit 2f1fe81123f59271bddda673b60116bde9660385 upstream.

When freeing the nested resources of a vcpu, there is an assumption that
the vcpu's vmcs01 is the current VMCS on the CPU that executes
nested_release_vmcs12(). If this assumption is violated, the vcpu's
vmcs01 may be made active on multiple CPUs at the same time, in
violation of Intel's specification. Moreover, since the vcpu's vmcs01 is
not VMCLEARed on every CPU on which it is active, it can linger in a
CPU's VMCS cache after it has been freed and potentially
repurposed. Subsequent eviction from the CPU's VMCS cache on a capacity
miss can result in memory corruption.

It is not sufficient for vmx_free_vcpu() to call vmx_load_vmcs01(). If
the vcpu in question was last loaded on a different CPU, it must be
migrated to the current CPU before calling vmx_load_vmcs01().

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-20 18:09:18 +02:00
Xiubo Li
54f87e16e0 kvm: Fix irq route entries exceeding KVM_MAX_IRQ_ROUTES
commit caf1ff26e1aa178133df68ac3d40815fed2187d9 upstream.

These days, we experienced one guest crash with 8 cores and 3 disks,
with qemu error logs as bellow:

qemu-system-x86_64: /build/qemu-2.0.0/kvm-all.c:984:
kvm_irqchip_commit_routes: Assertion `ret == 0' failed.

And then we found one patch(bdf026317d) in qemu tree, which said
could fix this bug.

Execute the following script will reproduce the BUG quickly:

irq_affinity.sh
========================================================================

vda_irq_num=25
vdb_irq_num=27
while [ 1 ]
do
    for irq in {1,2,4,8,10,20,40,80}
        do
            echo $irq > /proc/irq/$vda_irq_num/smp_affinity
            echo $irq > /proc/irq/$vdb_irq_num/smp_affinity
            dd if=/dev/vda of=/dev/zero bs=4K count=100 iflag=direct
            dd if=/dev/vdb of=/dev/zero bs=4K count=100 iflag=direct
        done
done
========================================================================

The following qemu log is added in the qemu code and is displayed when
this bug reproduced:

kvm_irqchip_commit_routes: max gsi: 1008, nr_allocated_irq_routes: 1024,
irq_routes->nr: 1024, gsi_count: 1024.

That's to say when irq_routes->nr == 1024, there are 1024 routing entries,
but in the kernel code when routes->nr >= 1024, will just return -EINVAL;

The nr is the number of the routing entries which is in of
[1 ~ KVM_MAX_IRQ_ROUTES], not the index in [0 ~ KVM_MAX_IRQ_ROUTES - 1].

This patch fix the BUG above.

Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Signed-off-by: Wei Tang <tangwei@cmss.chinamobile.com>
Signed-off-by: Zhang Zhuoyu <zhangzhuoyu@cmss.chinamobile.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27 09:47:31 -07:00
Paolo Bonzini
2cb77b0ad4 KVM: irqfd: fix NULL pointer dereference in kvm_irq_map_gsi
commit c622a3c21ede892e370b56e1ceb9eb28f8bbda6b upstream.

Found by syzkaller:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000120
    IP: [<ffffffffa0797202>] kvm_irq_map_gsi+0x12/0x90 [kvm]
    PGD 6f80b067 PUD b6535067 PMD 0
    Oops: 0000 [#1] SMP
    CPU: 3 PID: 4988 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1
    [...]
    Call Trace:
     [<ffffffffa0795f62>] irqfd_update+0x32/0xc0 [kvm]
     [<ffffffffa0796c7c>] kvm_irqfd+0x3dc/0x5b0 [kvm]
     [<ffffffffa07943f4>] kvm_vm_ioctl+0x164/0x6f0 [kvm]
     [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480
     [<ffffffff812418a9>] SyS_ioctl+0x79/0x90
     [<ffffffff817a1062>] tracesys_phase2+0x84/0x89
    Code: b5 71 a7 e0 5b 41 5c 41 5d 5d f3 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 8f 10 2e 00 00 31 c0 48 89 e5 <39> 91 20 01 00 00 76 6a 48 63 d2 48 8b 94 d1 28 01 00 00 48 85
    RIP  [<ffffffffa0797202>] kvm_irq_map_gsi+0x12/0x90 [kvm]
     RSP <ffff8800926cbca8>
    CR2: 0000000000000120

Testcase:

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <stdint.h>
    #include <linux/kvm.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>

    long r[26];

    int main()
    {
        memset(r, -1, sizeof(r));
        r[2] = open("/dev/kvm", 0);
        r[3] = ioctl(r[2], KVM_CREATE_VM, 0);

        struct kvm_irqfd ifd;
        ifd.fd = syscall(SYS_eventfd2, 5, 0);
        ifd.gsi = 3;
        ifd.flags = 2;
        ifd.resamplefd = ifd.fd;
        r[25] = ioctl(r[3], KVM_IRQFD, &ifd);
        return 0;
    }

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24 10:18:18 -07:00
Marc Zyngier
5716a93fef KVM: arm/arm64: Handle forward time correction gracefully
commit 1c5631c73fc2261a5df64a72c155cb53dcdc0c45 upstream.

On a host that runs NTP, corrections can have a direct impact on
the background timer that we program on the behalf of a vcpu.

In particular, NTP performing a forward correction will result in
a timer expiring sooner than expected from a guest point of view.
Not a big deal, we kick the vcpu anyway.

But on wake-up, the vcpu thread is going to perform a check to
find out whether or not it should block. And at that point, the
timer check is going to say "timer has not expired yet, go back
to sleep". This results in the timer event being lost forever.

There are multiple ways to handle this. One would be record that
the timer has expired and let kvm_cpu_has_pending_timer return
true in that case, but that would be fairly invasive. Another is
to check for the "short sleep" condition in the hrtimer callback,
and restart the timer for the remaining time when the condition
is detected.

This patch implements the latter, with a bit of refactoring in
order to avoid too much code duplication.

Reported-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-04 14:48:40 -07:00
Paolo Bonzini
4e2fa4bbba KVM: fix spin_lock_init order on x86
commit e9ad4ec8379ad1ba6f68b8ca1c26b50b5ae0a327 upstream.

Moving the initialization earlier is needed in 4.6 because
kvm_arch_init_vm is now using mmu_lock, causing lockdep to
complain:

[  284.440294] INFO: trying to register non-static key.
[  284.445259] the code is fine but needs lockdep annotation.
[  284.450736] turning off the locking correctness validator.
...
[  284.528318]  [<ffffffff810aecc3>] lock_acquire+0xd3/0x240
[  284.533733]  [<ffffffffa0305aa0>] ? kvm_page_track_register_notifier+0x20/0x60 [kvm]
[  284.541467]  [<ffffffff81715581>] _raw_spin_lock+0x41/0x80
[  284.546960]  [<ffffffffa0305aa0>] ? kvm_page_track_register_notifier+0x20/0x60 [kvm]
[  284.554707]  [<ffffffffa0305aa0>] kvm_page_track_register_notifier+0x20/0x60 [kvm]
[  284.562281]  [<ffffffffa02ece70>] kvm_mmu_init_vm+0x20/0x30 [kvm]
[  284.568381]  [<ffffffffa02dbf7a>] kvm_arch_init_vm+0x1ea/0x200 [kvm]
[  284.574740]  [<ffffffffa02bff3f>] kvm_dev_ioctl+0xbf/0x4d0 [kvm]

However, it also helps fixing a preexisting problem, which is why this
patch is also good for stable kernels: kvm_create_vm was incrementing
current->mm->mm_count but not decrementing it at the out_err label (in
case kvm_init_mmu_notifier failed).  The new initialization order makes
it possible to add the required mmdrop without adding a new error label.

Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:34 -07:00
David Matlack
c9e1bbef7e kvm: cap halt polling at exactly halt_poll_ns
commit 313f636d5c490c9741d3f750dc8da33029edbc6b upstream.

When growing halt-polling, there is no check that the poll time exceeds
the limit. It's possible for vcpu->halt_poll_ns grow once past
halt_poll_ns, and stay there until a halt which takes longer than
vcpu->halt_poll_ns. For example, booting a Linux guest with
halt_poll_ns=11000:

 ... kvm:kvm_halt_poll_ns: vcpu 0: halt_poll_ns 0 (shrink 10000)
 ... kvm:kvm_halt_poll_ns: vcpu 0: halt_poll_ns 10000 (grow 0)
 ... kvm:kvm_halt_poll_ns: vcpu 0: halt_poll_ns 20000 (grow 10000)

Signed-off-by: David Matlack <dmatlack@google.com>
Fixes: aca6ff29c4
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-16 08:42:58 -07:00
Mark Rutland
d62cca1106 KVM: arm/arm64: vgic: Ensure bitmaps are long enough
commit 236cf17c2502007a9d2dda3c39fb0d9a6bd03cc2 upstream.

When we allocate bitmaps in vgic_vcpu_init_maps, we divide the number of
bits we need by 8 to figure out how many bytes to allocate. However,
bitmap elements are always accessed as unsigned longs, and if we didn't
happen to allocate a size such that size % sizeof(unsigned long) == 0,
bitmap accesses may go past the end of the allocation.

When using KASAN (which does byte-granular access checks), this results
in a continuous stream of BUGs whenever these bitmaps are accessed:

=============================================================================
BUG kmalloc-128 (Tainted: G    B          ): kasan: bad access detected
-----------------------------------------------------------------------------

INFO: Allocated in vgic_init.part.25+0x55c/0x990 age=7493 cpu=3 pid=1730
INFO: Slab 0xffffffbde6d5da40 objects=16 used=15 fp=0xffffffc935769700 flags=0x4000000000000080
INFO: Object 0xffffffc935769500 @offset=1280 fp=0x          (null)

Bytes b4 ffffffc9357694f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769530: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769540: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769550: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769560: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769570: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
CPU: 3 PID: 1740 Comm: kvm-vcpu-0 Tainted: G    B           4.4.0+ #17
Hardware name: ARM Juno development board (r1) (DT)
Call trace:
[<ffffffc00008e770>] dump_backtrace+0x0/0x280
[<ffffffc00008ea04>] show_stack+0x14/0x20
[<ffffffc000726360>] dump_stack+0x100/0x188
[<ffffffc00030d324>] print_trailer+0xfc/0x168
[<ffffffc000312294>] object_err+0x3c/0x50
[<ffffffc0003140fc>] kasan_report_error+0x244/0x558
[<ffffffc000314548>] __asan_report_load8_noabort+0x48/0x50
[<ffffffc000745688>] __bitmap_or+0xc0/0xc8
[<ffffffc0000d9e44>] kvm_vgic_flush_hwstate+0x1bc/0x650
[<ffffffc0000c514c>] kvm_arch_vcpu_ioctl_run+0x2ec/0xa60
[<ffffffc0000b9a6c>] kvm_vcpu_ioctl+0x474/0xa68
[<ffffffc00036b7b0>] do_vfs_ioctl+0x5b8/0xcb0
[<ffffffc00036bf34>] SyS_ioctl+0x8c/0xa0
[<ffffffc000086cb0>] el0_svc_naked+0x24/0x28
Memory state around the buggy address:
 ffffffc935769400: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffffffc935769480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffffffc935769500: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                   ^
 ffffffc935769580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffffffc935769600: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

Fix the issue by always allocating a multiple of sizeof(unsigned long),
as we do elsewhere in the vgic code.

Fixes: c1bfb577a ("arm/arm64: KVM: vgic: switch to dynamic allocation")
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03 15:07:29 -08:00
Christian Borntraeger
85212a3690 KVM: async_pf: do not warn on page allocation failures
commit d7444794a02ff655eda87e3cc54e86b940e7736f upstream.

In async_pf we try to allocate with NOWAIT to get an element quickly
or fail. This code also handle failures gracefully. Lets silence
potential page allocation failures under load.

qemu-system-s39: page allocation failure: order:0,mode:0x2200000
[...]
Call Trace:
([<00000000001146b8>] show_trace+0xf8/0x148)
[<000000000011476a>] show_stack+0x62/0xe8
[<00000000004a36b8>] dump_stack+0x70/0x98
[<0000000000272c3a>] warn_alloc_failed+0xd2/0x148
[<000000000027709e>] __alloc_pages_nodemask+0x94e/0xb38
[<00000000002cd36a>] new_slab+0x382/0x400
[<00000000002cf7ac>] ___slab_alloc.constprop.30+0x2dc/0x378
[<00000000002d03d0>] kmem_cache_alloc+0x160/0x1d0
[<0000000000133db4>] kvm_setup_async_pf+0x6c/0x198
[<000000000013dee8>] kvm_arch_vcpu_ioctl_run+0xd48/0xd58
[<000000000012fcaa>] kvm_vcpu_ioctl+0x372/0x690
[<00000000002f66f6>] do_vfs_ioctl+0x3be/0x510
[<00000000002f68ec>] SyS_ioctl+0xa4/0xb8
[<0000000000781c5e>] system_call+0xd6/0x264
[<000003ffa24fa06a>] 0x3ffa24fa06a

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03 15:07:29 -08:00
Andre Przywara
b3e336de65 KVM: arm/arm64: Fix reference to uninitialised VGIC
commit b3aff6ccbb1d25e506b60ccd9c559013903f3464 upstream.

Commit 4b4b4512da ("arm/arm64: KVM: Rework the arch timer to use
level-triggered semantics") brought the virtual architected timer
closer to the VGIC. There is one occasion were we don't properly
check for the VGIC actually having been initialized before, but
instead go on to check the active state of some IRQ number.
If userland hasn't instantiated a virtual GIC, we end up with a
kernel NULL pointer dereference:
=========
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = ffffffc9745c5000
[00000000] *pgd=00000009f631e003, *pud=00000009f631e003, *pmd=0000000000000000
Internal error: Oops: 96000006 [#2] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 2144 Comm: kvm_simplest-ar Tainted: G      D 4.5.0-rc2+ #1300
Hardware name: ARM Juno development board (r1) (DT)
task: ffffffc976da8000 ti: ffffffc976e28000 task.ti: ffffffc976e28000
PC is at vgic_bitmap_get_irq_val+0x78/0x90
LR is at kvm_vgic_map_is_active+0xac/0xc8
pc : [<ffffffc0000b7e28>] lr : [<ffffffc0000b972c>] pstate: 20000145
....
=========

Fix this by bailing out early of kvm_timer_flush_hwstate() if we don't
have a VGIC at all.

Reported-by: Cosmin Gorgovan <cosmin@linux-geek.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:20 -08:00
Christoffer Dall
fdec12c12e KVM: arm/arm64: vgic: Fix kvm_vgic_map_is_active's dist check
External inputs to the vgic from time to time need to poke into the
state of a virtual interrupt, the prime example is the architected timer
code.

Since the IRQ's active state can be represented in two places; the LR or
the distributor, we first loop over the LRs but if not active in the LRs
we just return if *any* IRQ is active on the VCPU in question.

This is of course bogus, as we should check if the specific IRQ in
quesiton is active on the distributor instead.

Reported-by: Eric Auger <eric.auger@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-12-11 16:33:31 +00:00
Christoffer Dall
9f958c11b7 KVM: arm/arm64: vgic: Trust the LR state for HW IRQs
We were probing the physial distributor state for the active state of a
HW virtual IRQ, because we had seen evidence that the LR state was not
cleared when the guest deactivated a virtual interrupted.

However, this issue turned out to be a software bug in the GIC, which
was solved by: 84aab5e68c2a5e1e18d81ae8308c3ce25d501b29
(KVM: arm/arm64: arch_timer: Preserve physical dist. active
state on LR.active, 2015-11-24)

Therefore, get rid of the complexities and just look at the LR.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24 18:08:37 +01:00
Christoffer Dall
0e3dfda91d KVM: arm/arm64: arch_timer: Preserve physical dist. active state on LR.active
We were incorrectly removing the active state from the physical
distributor on the timer interrupt when the timer output level was
deasserted.  We shouldn't be doing this without considering the virtual
interrupt's active state, because the architecture requires that when an
LR has the HW bit set and the pending or active bits set, then the
physical interrupt must also have the corresponding bits set.

This addresses an issue where we have been observing an inconsistency
between the LR state and the physical distributor state where the LR
state was active and the physical distributor was not active, which
shouldn't happen.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24 18:07:40 +01:00
Linus Torvalds
933425fb00 s390: A bunch of fixes and optimizations for interrupt and time
handling.
 
 PPC: Mostly bug fixes.
 
 ARM: No big features, but many small fixes and prerequisites including:
 - a number of fixes for the arch-timer
 - introducing proper level-triggered semantics for the arch-timers
 - a series of patches to synchronously halt a guest (prerequisite for
   IRQ forwarding)
 - some tracepoint improvements
 - a tweak for the EL2 panic handlers
 - some more VGIC cleanups getting rid of redundant state
 
 x86: quite a few changes:
 
 - support for VT-d posted interrupts (i.e. PCI devices can inject
 interrupts directly into vCPUs).  This introduces a new component (in
 virt/lib/) that connects VFIO and KVM together.  The same infrastructure
 will be used for ARM interrupt forwarding as well.
 
 - more Hyper-V features, though the main one Hyper-V synthetic interrupt
 controller will have to wait for 4.5.  These will let KVM expose Hyper-V
 devices.
 
 - nested virtualization now supports VPID (same as PCID but for vCPUs)
 which makes it quite a bit faster
 
 - for future hardware that supports NVDIMM, there is support for clflushopt,
 clwb, pcommit
 
 - support for "split irqchip", i.e. LAPIC in kernel + IOAPIC/PIC/PIT in
 userspace, which reduces the attack surface of the hypervisor
 
 - obligatory smattering of SMM fixes
 
 - on the guest side, stable scheduler clock support was rewritten to not
 require help from the hypervisor.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWO2IQAAoJEL/70l94x66D/K0H/3AovAgYmJQToZlimsktMk6a
 f2xhdIqfU5lIQQh5uNBCfL3o9o8H9Py1ym7aEw3fmztPHHJYc91oTatt2UEKhmEw
 VtZHp/dFHt3hwaIdXmjRPEXiYctraKCyrhaUYdWmUYkoKi7lW5OL5h+S7frG2U6u
 p/hFKnHRZfXHr6NSgIqvYkKqtnc+C0FWY696IZMzgCksOO8jB1xrxoSN3tANW3oJ
 PDV+4og0fN/Fr1capJUFEc/fejREHneANvlKrLaa8ht0qJQutoczNADUiSFLcMPG
 iHljXeDsv5eyjMtUuIL8+MPzcrIt/y4rY41ZPiKggxULrXc6H+JJL/e/zThZpXc=
 =iv2z
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "First batch of KVM changes for 4.4.

  s390:
     A bunch of fixes and optimizations for interrupt and time handling.

  PPC:
     Mostly bug fixes.

  ARM:
     No big features, but many small fixes and prerequisites including:

      - a number of fixes for the arch-timer

      - introducing proper level-triggered semantics for the arch-timers

      - a series of patches to synchronously halt a guest (prerequisite
        for IRQ forwarding)

      - some tracepoint improvements

      - a tweak for the EL2 panic handlers

      - some more VGIC cleanups getting rid of redundant state

  x86:
     Quite a few changes:

      - support for VT-d posted interrupts (i.e. PCI devices can inject
        interrupts directly into vCPUs).  This introduces a new
        component (in virt/lib/) that connects VFIO and KVM together.
        The same infrastructure will be used for ARM interrupt
        forwarding as well.

      - more Hyper-V features, though the main one Hyper-V synthetic
        interrupt controller will have to wait for 4.5.  These will let
        KVM expose Hyper-V devices.

      - nested virtualization now supports VPID (same as PCID but for
        vCPUs) which makes it quite a bit faster

      - for future hardware that supports NVDIMM, there is support for
        clflushopt, clwb, pcommit

      - support for "split irqchip", i.e.  LAPIC in kernel +
        IOAPIC/PIC/PIT in userspace, which reduces the attack surface of
        the hypervisor

      - obligatory smattering of SMM fixes

      - on the guest side, stable scheduler clock support was rewritten
        to not require help from the hypervisor"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (123 commits)
  KVM: VMX: Fix commit which broke PML
  KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()
  KVM: x86: allow RSM from 64-bit mode
  KVM: VMX: fix SMEP and SMAP without EPT
  KVM: x86: move kvm_set_irq_inatomic to legacy device assignment
  KVM: device assignment: remove pointless #ifdefs
  KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic
  KVM: x86: zero apic_arb_prio on reset
  drivers/hv: share Hyper-V SynIC constants with userspace
  KVM: x86: handle SMBASE as physical address in RSM
  KVM: x86: add read_phys to x86_emulate_ops
  KVM: x86: removing unused variable
  KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs
  KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()
  KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings
  KVM: arm/arm64: Optimize away redundant LR tracking
  KVM: s390: use simple switch statement as multiplexer
  KVM: s390: drop useless newline in debugging data
  KVM: s390: SCA must not cross page boundaries
  KVM: arm: Do not indent the arguments of DECLARE_BITMAP
  ...
2015-11-05 16:26:26 -08:00
Paolo Bonzini
b97e6de9c9 KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic
We do not want to do too much work in atomic context, in particular
not walking all the VCPUs of the virtual machine.  So we want
to distinguish the architecture-specific injection function for irqfd
from kvm_set_msi.  Since it's still empty, reuse the newly added
kvm_arch_set_irq and rename it to kvm_arch_set_irq_inatomic.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 16:24:35 +01:00
Jan Beulich
6956d8946d KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs
The symbol was missing a KVM dependency.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 16:24:30 +01:00
Paolo Bonzini
197a4f4b06 KVM/ARM Changes for v4.4-rc1
Includes a number of fixes for the arch-timer, introducing proper
 level-triggered semantics for the arch-timers, a series of patches to
 synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint
 improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups
 getting rid of redundant state, and finally a stylistic change that gets rid of
 some ctags warnings.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWOhgAAAoJEEtpOizt6ddyKS8H/2ZHMTPjo6yChnrusNWy4Qbr
 6laPDlzL+g45oMQRwNL7GnM1deRftaxvT2Wi+X84D/6Y/BD6MPds4HgtBfuWcSZ1
 CyRJ0Ot/zrxenucSuJuOjq+a9gdizdAczkbB1MfYDULJH8fb6D+7RYLo3zgh4Xo4
 pla3L9U6gSWe+YopBjZtZH43m3fwiwSM/v+uHOTIcXrsbR+fEgx/EFSKmA/DUCuo
 P5cFO/ceUGu7nATCexu5V82TgR2hvurrsR7mqfwY8YcF6HRM+NEOoS29xWC77v5S
 u/F08TKuKQLv0YTEFTyLETI/oEeuC0cHtrRQBNf4+9kXEOzKyXaae0wR/I6X2Ss=
 =GMNk
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM Changes for v4.4-rc1

Includes a number of fixes for the arch-timer, introducing proper
level-triggered semantics for the arch-timers, a series of patches to
synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint
improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups
getting rid of redundant state, and finally a stylistic change that gets rid of
some ctags warnings.

Conflicts:
	arch/x86/include/asm/kvm_host.h
2015-11-04 16:24:17 +01:00
Pavel Fedin
26caea7693 KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()
Now we see that vgic_set_lr() and vgic_sync_lr_elrsr() are always used
together. Merge them into one function, saving from second vgic_ops
dereferencing every time.

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-04 15:29:49 +01:00
Pavel Fedin
212c76545d KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings
1. Remove unnecessary 'irq' argument, because irq number can be retrieved
   from the LR.
2. Since cff9211eb1
   ("arm/arm64: KVM: Fix arch timer behavior for disabled interrupts ")
   LR_STATE_PENDING is queued back by vgic_retire_lr() itself. Also, it
   clears vlr.state itself. Therefore, we remove the same, now duplicated,
   check with all accompanying bit manipulations from vgic_unqueue_irqs().
3. vgic_retire_lr() is always accompanied by vgic_irq_clear_queued(). Since
   it already does more than just clearing the LR, move
   vgic_irq_clear_queued() inside of it.

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-04 15:29:49 +01:00
Pavel Fedin
c4cd4c168b KVM: arm/arm64: Optimize away redundant LR tracking
Currently we use vgic_irq_lr_map in order to track which LRs hold which
IRQs, and lr_used bitmap in order to track which LRs are used or free.

vgic_irq_lr_map is actually used only for piggy-back optimization, and
can be easily replaced by iteration over lr_used. This is good because in
future, when LPI support is introduced, number of IRQs will grow up to at
least 16384, while numbers from 1024 to 8192 are never going to be used.
This would be a huge memory waste.

In its turn, lr_used is also completely redundant since
ae705930fc ("arm/arm64: KVM: Keep elrsr/aisr
in sync with software model"), because together with lr_used we also update
elrsr. This allows to easily replace lr_used with elrsr, inverting all
conditions (because in elrsr '1' means 'free').

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-04 15:29:49 +01:00
Linus Torvalds
6aa2fdb87c Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
 "The irq departement delivers:

   - Rework the irqdomain core infrastructure to accomodate ACPI based
     systems.  This is required to support ARM64 without creating
     artificial device tree nodes.

   - Sanitize the ACPI based ARM GIC initialization by making use of the
     new firmware independent irqdomain core

   - Further improvements to the generic MSI management

   - Generalize the irq migration on CPU hotplug

   - Improvements to the threaded interrupt infrastructure

   - Allow the migration of "chained" low level interrupt handlers

   - Allow optional force masking of interrupts in disable_irq[_nosysnc]

   - Support for two new interrupt chips - Sigh!

   - A larger set of errata fixes for ARM gicv3

   - The usual pile of fixes, updates, improvements and cleanups all
     over the place"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
  Document that IRQ_NONE should be returned when IRQ not actually handled
  PCI/MSI: Allow the MSI domain to be device-specific
  PCI: Add per-device MSI domain hook
  of/irq: Use the msi-map property to provide device-specific MSI domain
  of/irq: Split of_msi_map_rid to reuse msi-map lookup
  irqchip/gic-v3-its: Parse new version of msi-parent property
  PCI/MSI: Use of_msi_get_domain instead of open-coded "msi-parent" parsing
  of/irq: Use of_msi_get_domain instead of open-coded "msi-parent" parsing
  of/irq: Add support code for multi-parent version of "msi-parent"
  irqchip/gic-v3-its: Add handling of PCI requester id.
  PCI/MSI: Add helper function pci_msi_domain_get_msi_rid().
  of/irq: Add new function of_msi_map_rid()
  Docs: dt: Add PCI MSI map bindings
  irqchip/gic-v2m: Add support for multiple MSI frames
  irqchip/gic-v3: Fix translation of LPIs after conversion to irq_fwspec
  irqchip/mxs: Add Alphascale ASM9260 support
  irqchip/mxs: Prepare driver for hardware with different offsets
  irqchip/mxs: Panic if ioremap or domain creation fails
  irqdomain: Documentation updates
  irqdomain/msi: Use fwnode instead of of_node
  ...
2015-11-03 14:40:01 -08:00
Christoffer Dall
e21f091087 arm/arm64: KVM: Add tracepoints for vgic and timer
The VGIC and timer code for KVM arm/arm64 doesn't have any tracepoints
or tracepoint infrastructure defined.  Rewriting some of the timer code
handling showed me how much we need this, so let's add these simple
trace points once and for all and we can easily expand with additional
trace points in these files as we go along.

Cc: Wei Huang <wei@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22 23:01:48 +02:00
Christoffer Dall
8fe2f19e6e arm/arm64: KVM: Support edge-triggered forwarded interrupts
We mark edge-triggered interrupts with the HW bit set as queued to
prevent the VGIC code from injecting LRs with both the Active and
Pending bits set at the same time while also setting the HW bit,
because the hardware does not support this.

However, this means that we must also clear the queued flag when we sync
back a LR where the state on the physical distributor went from active
to inactive because the guest deactivated the interrupt.  At this point
we must also check if the interrupt is pending on the distributor, and
tell the VGIC to queue it again if it is.

Since these actions on the sync path are extremely close to those for
level-triggered interrupts, rename process_level_irq to
process_queued_irq, allowing it to cater for both cases.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22 23:01:44 +02:00
Christoffer Dall
4b4b4512da arm/arm64: KVM: Rework the arch timer to use level-triggered semantics
The arch timer currently uses edge-triggered semantics in the sense that
the line is never sampled by the vgic and lowering the line from the
timer to the vgic doesn't have any effect on the pending state of
virtual interrupts in the vgic.  This means that we do not support a
guest with the otherwise valid behavior of (1) disable interrupts (2)
enable the timer (3) disable the timer (4) enable interrupts.  Such a
guest would validly not expect to see any interrupts on real hardware,
but will see interrupts on KVM.

This patch fixes this shortcoming through the following series of
changes.

First, we change the flow of the timer/vgic sync/flush operations.  Now
the timer is always flushed/synced before the vgic, because the vgic
samples the state of the timer output.  This has the implication that we
move the timer operations in to non-preempible sections, but that is
fine after the previous commit getting rid of hrtimer schedules on every
entry/exit.

Second, we change the internal behavior of the timer, letting the timer
keep track of its previous output state, and only lower/raise the line
to the vgic when the state changes.  Note that in theory this could have
been accomplished more simply by signalling the vgic every time the
state *potentially* changed, but we don't want to be hitting the vgic
more often than necessary.

Third, we get rid of the use of the map->active field in the vgic and
instead simply set the interrupt as active on the physical distributor
whenever the input to the GIC is asserted and conversely clear the
physical active state when the input to the GIC is deasserted.

Fourth, and finally, we now initialize the timer PPIs (and all the other
unused PPIs for now), to be level-triggered, and modify the sync code to
sample the line state on HW sync and re-inject a new interrupt if it is
still pending at that time.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22 23:01:44 +02:00
Christoffer Dall
54723bb37f arm/arm64: KVM: Use appropriate define in VGIC reset code
We currently initialize the SGIs to be enabled in the VGIC code, but we
use the VGIC_NR_PPIS define for this purpose, instead of the the more
natural VGIC_NR_SGIS.  Change this slightly confusing use of the
defines.

Note: This should have no functional change, as both names are defined
to the number 16.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22 23:01:43 +02:00
Christoffer Dall
8bf9a701e1 arm/arm64: KVM: Implement GICD_ICFGR as RO for PPIs
The GICD_ICFGR allows the bits for the SGIs and PPIs to be read only.
We currently simulate this behavior by writing a hardcoded value to the
register for the SGIs and PPIs on every write of these bits to the
register (ignoring what the guest actually wrote), and by writing the
same value as the reset value to the register.

This is a bit counter-intuitive, as the register is RO for these bits,
and we can just implement it that way, allowing us to control the value
of the bits purely in the reset code.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22 23:01:42 +02:00
Christoffer Dall
9103617df2 arm/arm64: KVM: vgic: Factor out level irq processing on guest exit
Currently vgic_process_maintenance() processes dealing with a completed
level-triggered interrupt directly, but we are soon going to reuse this
logic for level-triggered mapped interrupts with the HW bit set, so
move this logic into a separate static function.

Probably the most scary part of this commit is convincing yourself that
the current flow is safe compared to the old one.  In the following I
try to list the changes and why they are harmless:

  Move vgic_irq_clear_queued after kvm_notify_acked_irq:
    Harmless because the only potential effect of clearing the queued
    flag wrt.  kvm_set_irq is that vgic_update_irq_pending does not set
    the pending bit on the emulated CPU interface or in the
    pending_on_cpu bitmask if the function is called with level=1.
    However, the point of kvm_notify_acked_irq is to call kvm_set_irq
    with level=0, and we set the queued flag again in
    __kvm_vgic_sync_hwstate later on if the level is stil high.

  Move vgic_set_lr before kvm_notify_acked_irq:
    Also, harmless because the LR are cpu-local operations and
    kvm_notify_acked only affects the dist

  Move vgic_dist_irq_clear_soft_pend after kvm_notify_acked_irq:
    Also harmless, because now we check the level state in the
    clear_soft_pend function and lower the pending bits if the level is
    low.

Reviewed-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22 23:01:42 +02:00
Christoffer Dall
d35268da66 arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block
We currently schedule a soft timer every time we exit the guest if the
timer did not expire while running the guest.  This is really not
necessary, because the only work we do in the timer work function is to
kick the vcpu.

Kicking the vcpu does two things:
(1) If the vpcu thread is on a waitqueue, make it runnable and remove it
from the waitqueue.
(2) If the vcpu is running on a different physical CPU from the one
doing the kick, it sends a reschedule IPI.

The second case cannot happen, because the soft timer is only ever
scheduled when the vcpu is not running.  The first case is only relevant
when the vcpu thread is on a waitqueue, which is only the case when the
vcpu thread has called kvm_vcpu_block().

Therefore, we only need to make sure a timer is scheduled for
kvm_vcpu_block(), which we do by encapsulating all calls to
kvm_vcpu_block() with kvm_timer_{un}schedule calls.

Additionally, we only schedule a soft timer if the timer is enabled and
unmasked, since it is useless otherwise.

Note that theoretically userspace can use the SET_ONE_REG interface to
change registers that should cause the timer to fire, even if the vcpu
is blocked without a scheduled timer, but this case was not supported
before this patch and we leave it for future work for now.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22 23:01:42 +02:00
Christoffer Dall
3217f7c25b KVM: Add kvm_arch_vcpu_{un}blocking callbacks
Some times it is useful for architecture implementations of KVM to know
when the VCPU thread is about to block or when it comes back from
blocking (arm/arm64 needs to know this to properly implement timers, for
example).

Therefore provide a generic architecture callback function in line with
what we do elsewhere for KVM generic-arch interactions.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22 23:01:41 +02:00
Christoffer Dall
0d997491f8 arm/arm64: KVM: Fix disabled distributor operation
We currently do a single update of the vgic state when the distributor
enable/disable control register is accessed and then bypass updating the
state for as long as the distributor remains disabled.

This is incorrect, because updating the state does not consider the
distributor enable bit, and this you can end up in a situation where an
interrupt is marked as pending on the CPU interface, but not pending on
the distributor, which is an impossible state to be in, and triggers a
warning.  Consider for example the following sequence of events:

1. An interrupt is marked as pending on the distributor
   - the interrupt is also forwarded to the CPU interface
2. The guest turns off the distributor (it's about to do a reboot)
   - we stop updating the CPU interface state from now on
3. The guest disables the pending interrupt
   - we remove the pending state from the distributor, but don't touch
     the CPU interface, see point 2.

Since the distributor disable bit really means that no interrupts should
be forwarded to the CPU interface, we modify the code to keep updating
the internal VGIC state, but always set the CPU interface pending bits
to zero when the distributor is disabled.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-20 18:09:13 +02:00
Christoffer Dall
544c572e03 arm/arm64: KVM: Clear map->active on pend/active clear
When a guest reboots or offlines/onlines CPUs, it is not uncommon for it
to clear the pending and active states of an interrupt through the
emulated VGIC distributor.  However, since the architected timers are
defined by the architecture to be level triggered and the guest
rightfully expects them to be that, but we emulate them as
edge-triggered, we have to mimic level-triggered behavior for an
edge-triggered virtual implementation.

We currently do not signal the VGIC when the map->active field is true,
because it indicates that the guest has already been signalled of the
interrupt as required.  Normally this field is set to false when the
guest deactivates the virtual interrupt through the sync path.

We also need to catch the case where the guest deactivates the interrupt
through the emulated distributor, again allowing guests to boot even if
the original virtual timer signal hit before the guest's GIC
initialization sequence is run.

Reviewed-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-20 18:06:34 +02:00
Christoffer Dall
cff9211eb1 arm/arm64: KVM: Fix arch timer behavior for disabled interrupts
We have an interesting issue when the guest disables the timer interrupt
on the VGIC, which happens when turning VCPUs off using PSCI, for
example.

The problem is that because the guest disables the virtual interrupt at
the VGIC level, we never inject interrupts to the guest and therefore
never mark the interrupt as active on the physical distributor.  The
host also never takes the timer interrupt (we only use the timer device
to trigger a guest exit and everything else is done in software), so the
interrupt does not become active through normal means.

The result is that we keep entering the guest with a programmed timer
that will always fire as soon as we context switch the hardware timer
state and run the guest, preventing forward progress for the VCPU.

Since the active state on the physical distributor is really part of the
timer logic, it is the job of our virtual arch timer driver to manage
this state.

The timer->map->active boolean field indicates whether we have signalled
this interrupt to the vgic and if that interrupt is still pending or
active.  As long as that is the case, the hardware doesn't have to
generate physical interrupts and therefore we mark the interrupt as
active on the physical distributor.

We also have to restore the pending state of an interrupt that was
queued to an LR but was retired from the LR for some reason, while
remaining pending in the LR.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-20 18:04:54 +02:00
Pavel Fedin
437f9963bc KVM: arm/arm64: Do not inject spurious interrupts
When lowering a level-triggered line from userspace, we forgot to lower
the pending bit on the emulated CPU interface and we also did not
re-compute the pending_on_cpu bitmap for the CPU affected by the change.

Update vgic_update_irq_pending() to fix the two issues above and also
raise a warning in vgic_quue_irq_to_lr if we encounter an interrupt
pending on a CPU which is neither marked active nor pending.

  [ Commit text reworked completely - Christoffer ]

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-20 18:04:43 +02:00
Andrey Smetanin
f33143d809 kvm/irqchip: allow only multiple irqchip routes per GSI
Any other irq routing types (MSI, S390_ADAPTER, upcoming Hyper-V
SynIC) map one-to-one to GSI.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:34:30 +02:00
Andrey Smetanin
c9a5eccac1 kvm/eventfd: add arch-specific set_irq
Allow for arch-specific interrupt types to be set.  For that, add
kvm_arch_set_irq() which takes interrupt type-specific action if it
recognizes the interrupt type given, and -EWOULDBLOCK otherwise.

The default implementation always returns -EWOULDBLOCK.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:34:29 +02:00
Andrey Smetanin
ba1aefcd6d kvm/eventfd: factor out kvm_notify_acked_gsi()
Factor out kvm_notify_acked_gsi() helper to iterate over EOI listeners
and notify those matching the given gsi.

It will be reused in the upcoming Hyper-V SynIC implementation.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:34:29 +02:00
Andrey Smetanin
351dc6477c kvm/eventfd: avoid loop inside irqfd_update()
The loop(for) inside irqfd_update() is unnecessary
because any other value for irq_entry.type will just trigger
schedule_work(&irqfd->inject) in irqfd_wakeup.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:34:28 +02:00
Kosuke Tatsukawa
6003a42010 kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c
async_pf_execute() seems to be missing a memory barrier which might
cause the waker to not notice the waiter and miss sending a wake_up as
in the following figure.

        async_pf_execute                    kvm_vcpu_block
------------------------------------------------------------------------
spin_lock(&vcpu->async_pf.lock);
if (waitqueue_active(&vcpu->wq))
/* The CPU might reorder the test for
   the waitqueue up here, before
   prior writes complete */
                                    prepare_to_wait(&vcpu->wq, &wait,
                                      TASK_INTERRUPTIBLE);
                                    /*if (kvm_vcpu_check_block(vcpu) < 0) */
                                     /*if (kvm_arch_vcpu_runnable(vcpu)) { */
                                      ...
                                      return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
                                        !vcpu->arch.apf.halted)
                                        || !list_empty_careful(&vcpu->async_pf.done)
                                     ...
                                     return 0;
list_add_tail(&apf->link,
  &vcpu->async_pf.done);
spin_unlock(&vcpu->async_pf.lock);
                                    waited = true;
                                    schedule();
------------------------------------------------------------------------

The attached patch adds the missing memory barrier.

I found this issue when I was looking through the linux source code
for places calling waitqueue_active() before wake_up*(), but without
preceding memory barriers, after sending a patch to fix a similar
issue in drivers/tty/n_tty.c  (Details about the original issue can be
found here: https://lkml.org/lkml/2015/9/28/849).

Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:41:08 +02:00