* refs/heads/tmp-f057ff9
Linux 4.4.148
x86/speculation/l1tf: Unbreak !__HAVE_ARCH_PFN_MODIFY_ALLOWED architectures
x86/init: fix build with CONFIG_SWAP=n
x86/speculation/l1tf: Fix up CPU feature flags
x86/mm/kmmio: Make the tracer robust against L1TF
x86/mm/pat: Make set_memory_np() L1TF safe
x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert
x86/speculation/l1tf: Invert all not present mappings
x86/speculation/l1tf: Fix up pte->pfn conversion for PAE
x86/speculation/l1tf: Protect PAE swap entries against L1TF
x86/cpufeatures: Add detection of L1D cache flush support.
x86/speculation/l1tf: Extend 64bit swap file size limit
x86/bugs: Move the l1tf function and define pr_fmt properly
x86/speculation/l1tf: Limit swap file size to MAX_PA/2
x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings
mm: fix cache mode tracking in vm_insert_mixed()
mm: Add vm_insert_pfn_prot()
x86/speculation/l1tf: Add sysfs reporting for l1tf
x86/speculation/l1tf: Make sure the first page is always reserved
x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation
x86/speculation/l1tf: Protect swap entries against L1TF
x86/speculation/l1tf: Change order of offset/type in swap entry
mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1
x86/mm: Fix swap entry comment and macro
x86/mm: Move swap offset/type up in PTE to work around erratum
x86/speculation/l1tf: Increase 32bit PAE __PHYSICAL_PAGE_SHIFT
x86/irqflags: Provide a declaration for native_save_fl
kprobes/x86: Fix %p uses in error messages
x86/speculation: Protect against userspace-userspace spectreRSB
x86/paravirt: Fix spectre-v2 mitigations for paravirt guests
ARM: dts: imx6sx: fix irq for pcie bridge
IB/ocrdma: fix out of bounds access to local buffer
IB/mlx4: Mark user MR as writable if actual virtual memory is writable
IB/core: Make testing MR flags for writability a static inline function
fix __legitimize_mnt()/mntput() race
fix mntput/mntput race
root dentries need RCU-delayed freeing
scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled
ACPI / LPSS: Add missing prv_offset setting for byt/cht PWM devices
xen/netfront: don't cache skb_shinfo()
parisc: Define mb() and add memory barriers to assembler unlock sequences
parisc: Enable CONFIG_MLONGCALLS by default
fork: unconditionally clear stack on fork
ipv4+ipv6: Make INET*_ESP select CRYPTO_ECHAINIV
tpm: fix race condition in tpm_common_write()
ext4: fix check to prevent initializing reserved inodes
Linux 4.4.147
jfs: Fix inconsistency between memory allocation and ea_buf->max_size
i2c: imx: Fix reinit_completion() use
ring_buffer: tracing: Inherit the tracing setting to next ring buffer
ACPI / PCI: Bail early in acpi_pci_add_bus() if there is no ACPI handle
ext4: fix false negatives *and* false positives in ext4_check_descriptors()
netlink: Don't shift on 64 for ngroups
netlink: Don't shift with UB on nlk->ngroups
netlink: Do not subscribe to non-existent groups
nohz: Fix local_timer_softirq_pending()
genirq: Make force irq threading setup more robust
scsi: qla2xxx: Return error when TMF returns
scsi: qla2xxx: Fix ISP recovery on unload
Conflicts:
include/linux/swapfile.h
Removed CONFIG_CRYPTO_ECHAINIV from defconfig files since this upmerge is
adding this config to Kconfig file.
Change-Id: Ide96c29f919d76590c2bdccf356d1d464a892fd7
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAltsFTEACgkQONu9yGCS
aT7VPg/+KkII11i9uplgHn1nTKf1NTePO+Ur/LWJ8w1JR+2aRp2nresV3chI6gN/
zgraNSzRMrcfno7ERl5Ryrd9YKlsR1JTFBjW6Q9xHfiVu5jKxm7tDU1Quzknmwdy
qXReqtQCtmttOcJqRGVIgDWEy6XxUB2eOLU++nZNCrTw90M+hiTC0COVnY/qaoUd
+pYjjdMdG/qIB345gua+o+4q/yuV/cpfSwKf3ycEQZistzS8wvKwV1Szm4DXp1v/
mGIOx/a5NBRUKlHSdD46QBR9TvugeS4kb5m5vBh6LLum0TWl+Gh0PCg3Q2pBHGWp
iofDHcZga3LnX5rckXVwI69MPoCG3gXei5F8soYcdiGf0XOK2nZN/HSNUB2rBdhw
G8n/Ojr4owedpc8X8Vle19/iQGu2RDh8UfeMRAeUujG2DaWF+YCTy69IY3aNI2Vo
YCNUApib56YnG7/Y/SPLua7kEYIK2z99q8Vc1dW98nqqDXmLPzH78dHmVvLz0WmL
vQfKkPKGM6Ae4YTLM+2Le2BtyQu42FC5fRm1ewPIATo/6Dxdq/+5+O+G2bAg2qD6
kySslEtyKQ/B1IthALmD5ZDO5Q4B2GhewUtwlbo0LbfVB97otdOOlvLyCjNYdRbz
HlCU+BPuh7SDkaJ9spz9P6j8OcDk+/vhgtAd3g16kIXAWecCvf4=
=0wLN
-----END PGP SIGNATURE-----
Merge 4.4.147 into android-4.4
Changes in 4.4.147
scsi: qla2xxx: Fix ISP recovery on unload
scsi: qla2xxx: Return error when TMF returns
genirq: Make force irq threading setup more robust
nohz: Fix local_timer_softirq_pending()
netlink: Do not subscribe to non-existent groups
netlink: Don't shift with UB on nlk->ngroups
netlink: Don't shift on 64 for ngroups
ext4: fix false negatives *and* false positives in ext4_check_descriptors()
ACPI / PCI: Bail early in acpi_pci_add_bus() if there is no ACPI handle
ring_buffer: tracing: Inherit the tracing setting to next ring buffer
i2c: imx: Fix reinit_completion() use
jfs: Fix inconsistency between memory allocation and ea_buf->max_size
Linux 4.4.147
Change-Id: I067f9844278976dddef8063961a70e189c423de3
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 80d20d35af1edd632a5e7a3b9c0ab7ceff92769e upstream.
local_timer_softirq_pending() checks whether the timer softirq is
pending with: local_softirq_pending() & TIMER_SOFTIRQ.
This is wrong because TIMER_SOFTIRQ is the softirq number and not a
bitmask. So the test checks for the wrong bit.
Use BIT(TIMER_SOFTIRQ) instead.
Fixes: 5d62c183f9e9 ("nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()")
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Cc: bigeasy@linutronix.de
Cc: peterz@infradead.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180731161358.29472-1-anna-maria@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* refs/heads/tmp-8cbe01c
Linux 4.4.109
mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even on UP
n_tty: fix EXTPROC vs ICANON interaction with TIOCINQ (aka FIONREAD)
x86/smpboot: Remove stale TLB flush invocations
nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()
usb: xhci: Add XHCI_TRUST_TX_LENGTH for Renesas uPD720201
USB: Fix off by one in type-specific length check of BOS SSP capability
usb: add RESET_RESUME for ELSA MicroLink 56K
usb: Add device quirk for Logitech HD Pro Webcam C925e
USB: serial: option: adding support for YUGA CLM920-NC5
USB: serial: option: add support for Telit ME910 PID 0x1101
USB: serial: qcserial: add Sierra Wireless EM7565
USB: serial: ftdi_sio: add id for Airbus DS P8GR
usbip: vhci: stop printing kernel pointer addresses in messages
usbip: stub: stop printing kernel pointer addresses in messages
usbip: fix usbip bind writing random string after command in match_busid
sock: free skb in skb_complete_tx_timestamp on error
net: phy: micrel: ksz9031: reconfigure autoneg after phy autoneg workaround
net: Fix double free and memory corruption in get_net_ns_by_id()
net: bridge: fix early call to br_stp_change_bridge_id and plug newlink leaks
ipv4: Fix use-after-free when flushing FIB tables
sctp: Replace use of sockets_allocated with specified macro.
net: mvmdio: disable/unprepare clocks in EPROBE_DEFER case
net: ipv4: fix for a race condition in raw_sendmsg
tg3: Fix rx hang on MTU change with 5717/5719
tcp md5sig: Use skb's saddr when replying to an incoming segment
net: reevalulate autoflowlabel setting after sysctl setting
net: qmi_wwan: add Sierra EM7565 1199:9091
netlink: Add netns check on taps
net: igmp: Use correct source address on IGMPv3 reports
ipv6: mcast: better catch silly mtu values
ipv4: igmp: guard against silly MTU values
kbuild: add '-fno-stack-check' to kernel build options
x86/mm/64: Fix reboot interaction with CR4.PCIDE
x86/mm: Enable CR4.PCIDE on supported systems
x86/mm: Add the 'nopcid' boot option to turn off PCID
x86/mm: Disable PCID on 32-bit kernels
x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code
x86/mm: Reimplement flush_tlb_page() using flush_tlb_mm_range()
x86/mm: Make flush_tlb_mm_range() more predictable
x86/mm: Remove flush_tlb() and flush_tlb_current_task()
x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
ALSA: hda - fix headset mic detection issue on a Dell machine
ALSA: hda: Drop useless WARN_ON()
ASoC: twl4030: fix child-node lookup
ASoC: fsl_ssi: AC'97 ops need regmap, clock and cleaning up on failure
iw_cxgb4: Only validate the MSN for successful completions
ring-buffer: Mask out the info bits when returning buffer page length
tracing: Fix crash when it fails to alloc ring buffer
tracing: Fix possible double free on failure of allocating trace buffer
tracing: Remove extra zeroing out of the ring buffer page
net: mvneta: clear interface link status on port disable
powerpc/perf: Dereference BHRB entries safely
kvm: x86: fix RSM when PCID is non-zero
KVM: X86: Fix load RFLAGS w/o the fixed bit
spi: xilinx: Detect stall with Unknown commands
parisc: Hide Diva-built-in serial aux and graphics card
PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
ALSA: usb-audio: Fix the missing ctl name suffix at parsing SU
ALSA: rawmidi: Avoid racy info ioctl via ctl device
mfd: twl6040: Fix child-node lookup
mfd: twl4030-audio: Fix sibling-node lookup
mfd: cros ec: spi: Don't send first message too soon
crypto: mcryptd - protect the per-CPU queue with a lock
ACPI: APEI / ERST: Fix missing error handling in erst_reader()
Change-Id: I3823f793c0c85d1639e9be10358cf70cfcd13afc
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlpL3okACgkQONu9yGCS
aT6p5g/8CAG9NU/fLu7IMcIlyqfVvdOhzxn44oHCxq08eycqoggdnb3TZXxBUBgY
+w8uZk8yxNdjXR39GjkMSUy06WRvl2XDSrd36sDGRCBP62Fi8l5scmlRaNEnI/E8
ltBSB93P16SmnpKa/3Zscz+7LcaoXHpU5Xhs8Zmf4I69qmzOFX2qSKsUyzVT+gNI
ZoSN/mYuXf7+dzrcKhVdYzm4ZdMRvxdT0WefeoeZMekfAtU9D8zaFOA9jTIAMHSZ
adNn18s7UKmaipZf/01mW9srvZce4nPKiUC8WVGstiyl27ws+IDleKVmDnqFALjy
2LIxDvjDth/x8jfqTb7F6bFh6dVtMJjwUmd3KL7hgPuTddoQQe/GfKnjSHkbNxyR
qNxNtbOgQ2EVOf59fejxWshCP/fButNo8uvCI1ERdm4axGXcf9hiucdlwzCYezHs
UN0xrxAXprhqTq4hQFB9E4C49e8nMPNsyXTMZwSZRPe2z53spD53JR/0sl5Z2RWe
ueO21tBZ6ev9jPNi+lJrCVw1oBO+PKOmdNPAaSynUVm96grRnW6grUI3mX9FqMXb
r62UWG3YCWWBgxA3iQQrMxf/3S2YZXz59TBbp9GU8xOYJZLhKL29/iB7Rv4ANtkR
aMDrABjWqrCZpIazqkZ5uwbsNl6Q51e3Mji3EfwkBaMqjc41++I=
=B52+
-----END PGP SIGNATURE-----
Merge 4.4.109 into android-4.4
Changes in 4.4.109
ACPI: APEI / ERST: Fix missing error handling in erst_reader()
crypto: mcryptd - protect the per-CPU queue with a lock
mfd: cros ec: spi: Don't send first message too soon
mfd: twl4030-audio: Fix sibling-node lookup
mfd: twl6040: Fix child-node lookup
ALSA: rawmidi: Avoid racy info ioctl via ctl device
ALSA: usb-audio: Fix the missing ctl name suffix at parsing SU
PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
parisc: Hide Diva-built-in serial aux and graphics card
spi: xilinx: Detect stall with Unknown commands
KVM: X86: Fix load RFLAGS w/o the fixed bit
kvm: x86: fix RSM when PCID is non-zero
powerpc/perf: Dereference BHRB entries safely
net: mvneta: clear interface link status on port disable
tracing: Remove extra zeroing out of the ring buffer page
tracing: Fix possible double free on failure of allocating trace buffer
tracing: Fix crash when it fails to alloc ring buffer
ring-buffer: Mask out the info bits when returning buffer page length
iw_cxgb4: Only validate the MSN for successful completions
ASoC: fsl_ssi: AC'97 ops need regmap, clock and cleaning up on failure
ASoC: twl4030: fix child-node lookup
ALSA: hda: Drop useless WARN_ON()
ALSA: hda - fix headset mic detection issue on a Dell machine
x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
x86/mm: Remove flush_tlb() and flush_tlb_current_task()
x86/mm: Make flush_tlb_mm_range() more predictable
x86/mm: Reimplement flush_tlb_page() using flush_tlb_mm_range()
x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code
x86/mm: Disable PCID on 32-bit kernels
x86/mm: Add the 'nopcid' boot option to turn off PCID
x86/mm: Enable CR4.PCIDE on supported systems
x86/mm/64: Fix reboot interaction with CR4.PCIDE
kbuild: add '-fno-stack-check' to kernel build options
ipv4: igmp: guard against silly MTU values
ipv6: mcast: better catch silly mtu values
net: igmp: Use correct source address on IGMPv3 reports
netlink: Add netns check on taps
net: qmi_wwan: add Sierra EM7565 1199:9091
net: reevalulate autoflowlabel setting after sysctl setting
tcp md5sig: Use skb's saddr when replying to an incoming segment
tg3: Fix rx hang on MTU change with 5717/5719
net: ipv4: fix for a race condition in raw_sendmsg
net: mvmdio: disable/unprepare clocks in EPROBE_DEFER case
sctp: Replace use of sockets_allocated with specified macro.
ipv4: Fix use-after-free when flushing FIB tables
net: bridge: fix early call to br_stp_change_bridge_id and plug newlink leaks
net: Fix double free and memory corruption in get_net_ns_by_id()
net: phy: micrel: ksz9031: reconfigure autoneg after phy autoneg workaround
sock: free skb in skb_complete_tx_timestamp on error
usbip: fix usbip bind writing random string after command in match_busid
usbip: stub: stop printing kernel pointer addresses in messages
usbip: vhci: stop printing kernel pointer addresses in messages
USB: serial: ftdi_sio: add id for Airbus DS P8GR
USB: serial: qcserial: add Sierra Wireless EM7565
USB: serial: option: add support for Telit ME910 PID 0x1101
USB: serial: option: adding support for YUGA CLM920-NC5
usb: Add device quirk for Logitech HD Pro Webcam C925e
usb: add RESET_RESUME for ELSA MicroLink 56K
USB: Fix off by one in type-specific length check of BOS SSP capability
usb: xhci: Add XHCI_TRUST_TX_LENGTH for Renesas uPD720201
nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()
x86/smpboot: Remove stale TLB flush invocations
n_tty: fix EXTPROC vs ICANON interaction with TIOCINQ (aka FIONREAD)
mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even on UP
Linux 4.4.109
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 5d62c183f9e9df1deeea0906d099a94e8a43047a upstream.
The conditions in irq_exit() to invoke tick_nohz_irq_exit() which
subsequently invokes tick_nohz_stop_sched_tick() are:
if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))
If need_resched() is not set, but a timer softirq is pending then this is
an indication that the softirq code punted and delegated the execution to
softirqd. need_resched() is not true because the current interrupted task
takes precedence over softirqd.
Invoking tick_nohz_irq_exit() in this case can cause an endless loop of
timer interrupts because the timer wheel contains an expired timer, but
softirqs are not yet executed. So it returns an immediate expiry request,
which causes the timer to fire immediately again. Lather, rinse and
repeat....
Prevent that by adding a check for a pending timer soft interrupt to the
conditions in tick_nohz_stop_sched_tick() which avoid calling
get_next_timer_interrupt(). That keeps the tick sched timer on the tick and
prevents a repetitive programming of an already expired timer.
Reported-by: Sebastian Siewior <bigeasy@linutronix.d>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272156050.2431@nanos
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* refs/heads/tmp-59ff2e1
Linux 4.4.78
kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
kvm: vmx: Check value written to IA32_BNDCFGS
kvm: x86: Guest BNDCFGS requires guest MPX support
kvm: vmx: Do not disable intercepts for BNDCFGS
KVM: x86: disable MPX if host did not enable MPX XSAVE features
tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results
PM / QoS: return -EINVAL for bogus strings
PM / wakeirq: Convert to SRCU
sched/topology: Optimize build_group_mask()
sched/topology: Fix overlapping sched_group_mask
crypto: caam - fix signals handling
crypto: sha1-ssse3 - Disable avx2
crypto: atmel - only treat EBUSY as transient if backlog
crypto: talitos - Extend max key length for SHA384/512-HMAC and AEAD
mm: fix overflow check in expand_upwards()
tpm: Issue a TPM2_Shutdown for TPM2 devices.
Add "shutdown" to "struct class".
tpm: Provide strong locking for device removal
tpm: Get rid of chip->pdev
selftests/capabilities: Fix the test_execve test
mnt: Make propagate_umount less slow for overlapping mount propagation trees
mnt: In propgate_umount handle visiting mounts in any order
mnt: In umount propagation reparent in a separate pass
vt: fix unchecked __put_user() in tioclinux ioctls
exec: Limit arg stack to at most 75% of _STK_LIM
s390: reduce ELF_ET_DYN_BASE
powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB
arm64: move ELF_ET_DYN_BASE to 4GB / 4MB
arm: move ELF_ET_DYN_BASE to 4MB
binfmt_elf: use ELF_ET_DYN_BASE only for PIE
checkpatch: silence perl 5.26.0 unescaped left brace warnings
fs/dcache.c: fix spin lockup issue on nlru->lock
mm/list_lru.c: fix list_lru_count_node() to be race free
kernel/extable.c: mark core_kernel_text notrace
tools/lib/lockdep: Reduce MAX_LOCK_DEPTH to avoid overflowing lock_chain/: Depth
parisc/mm: Ensure IRQs are off in switch_mm()
parisc: DMA API: return error instead of BUG_ON for dma ops on non dma devs
parisc: use compat_sys_keyctl()
parisc: Report SIGSEGV instead of SIGBUS when running out of stack
irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity
cfg80211: Check if PMKID attribute is of expected size
cfg80211: Validate frequencies nested in NL80211_ATTR_SCAN_FREQUENCIES
cfg80211: Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE
brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()
rds: tcp: use sock_create_lite() to create the accept socket
vrf: fix bug_on triggered by rx when destroying a vrf
net: ipv6: Compare lwstate in detecting duplicate nexthops
ipv6: dad: don't remove dynamic addresses if link is down
net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish()
bpf: prevent leaking pointer via xadd on unpriviledged
net: prevent sign extension in dev_get_stats()
tcp: reset sk_rx_dst in tcp_disconnect()
net: dp83640: Avoid NULL pointer dereference.
ipv6: avoid unregistering inet6_dev for loopback
net/phy: micrel: configure intterupts after autoneg workaround
net: sched: Fix one possible panic when no destroy callback
net_sched: fix error recovery at qdisc creation
ANDROID: android-verity: mark dev as rw for linear target
ANDROID: sdcardfs: Remove unnecessary lock
ANDROID: binder: don't check prio permissions on restore.
Add BINDER_GET_NODE_DEBUG_INFO ioctl
UPSTREAM: cpufreq: schedutil: Trace frequency only if it has changed
UPSTREAM: cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
UPSTREAM: cpufreq: schedutil: Refactor sugov_next_freq_shared()
UPSTREAM: cpufreq: schedutil: Fix per-CPU structure initialization in sugov_start()
UPSTREAM: cpufreq: schedutil: Pass sg_policy to get_next_freq()
UPSTREAM: cpufreq: schedutil: move cached_raw_freq to struct sugov_policy
UPSTREAM: cpufreq: schedutil: Rectify comment in sugov_irq_work() function
UPSTREAM: cpufreq: schedutil: irq-work and mutex are only used in slow path
UPSTREAM: cpufreq: schedutil: enable fast switch earlier
UPSTREAM: cpufreq: schedutil: Avoid indented labels
Linux 4.4.77
saa7134: fix warm Medion 7134 EEPROM read
x86/mm/pat: Don't report PAT on CPUs that don't support it
ext4: check return value of kstrtoull correctly in reserved_clusters_store
staging: comedi: fix clean-up of comedi_class in comedi_init()
staging: vt6556: vnt_start Fix missing call to vnt_key_init_table.
tcp: fix tcp_mark_head_lost to check skb len before fragmenting
md: fix super_offset endianness in super_1_rdev_size_change
md: fix incorrect use of lexx_to_cpu in does_sb_need_changing
perf tools: Use readdir() instead of deprecated readdir_r() again
perf tests: Remove wrong semicolon in while loop in CQM test
perf trace: Do not process PERF_RECORD_LOST twice
perf dwarf: Guard !x86_64 definitions under #ifdef else clause
perf pmu: Fix misleadingly indented assignment (whitespace)
perf annotate browser: Fix behaviour of Shift-Tab with nothing focussed
perf tools: Remove duplicate const qualifier
perf script: Use readdir() instead of deprecated readdir_r()
perf thread_map: Use readdir() instead of deprecated readdir_r()
perf tools: Use readdir() instead of deprecated readdir_r()
perf bench numa: Avoid possible truncation when using snprintf()
perf tests: Avoid possible truncation with dirent->d_name + snprintf
perf scripting perl: Fix compile error with some perl5 versions
perf thread_map: Correctly size buffer used with dirent->dt_name
perf intel-pt: Use __fallthrough
perf top: Use __fallthrough
tools strfilter: Use __fallthrough
tools string: Use __fallthrough in perf_atoll()
tools include: Add a __fallthrough statement
mqueue: fix a use-after-free in sys_mq_notify()
RDMA/uverbs: Check port number supplied by user verbs cmds
KEYS: Fix an error code in request_master_key()
ath10k: override CE5 config for QCA9377
x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings
x86/tools: Fix gcc-7 warning in relocs.c
gfs2: Fix glock rhashtable rcu bug
USB: serial: qcserial: new Sierra Wireless EM7305 device ID
USB: serial: option: add two Longcheer device ids
pinctrl: sh-pfc: Update info pointer after SoC-specific init
pinctrl: mxs: atomically switch mux and drive strength config
pinctrl: sunxi: Fix SPDIF function name for A83T
pinctrl: meson: meson8b: fix the NAND DQS pins
pinctrl: sh-pfc: r8a7791: Fix SCIF2 pinmux data
sysctl: report EINVAL if value is larger than UINT_MAX for proc_douintvec
sysctl: don't print negative flag for proc_douintvec
mac80211_hwsim: Replace bogus hrtimer clockid
usb: Fix typo in the definition of Endpoint[out]Request
usb: usbip: set buffer pointers to NULL after free
Add USB quirk for HVR-950q to avoid intermittent device resets
USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick
usb: dwc3: replace %p with %pK
drm/virtio: don't leak bo on drm_gem_object_init failure
tracing/kprobes: Allow to create probe with a module name starting with a digit
mm: fix classzone_idx underflow in shrink_zones()
bgmac: reset & enable Ethernet core before using it
driver core: platform: fix race condition with driver_override
fs: completely ignore unknown open flags
fs: add a VALID_OPEN_FLAGS
ANDROID: binder: add RT inheritance flag to node.
ANDROID: binder: improve priority inheritance.
ANDROID: binder: add min sched_policy to node.
ANDROID: binder: add support for RT prio inheritance.
ANDROID: binder: push new transactions to waiting threads.
ANDROID: binder: remove proc waitqueue
FROMLIST: binder: remove global binder lock
FROMLIST: binder: fix death race conditions
FROMLIST: binder: protect against stale pointers in print_binder_transaction
FROMLIST: binder: protect binder_ref with outer lock
FROMLIST: binder: use inner lock to protect thread accounting
FROMLIST: binder: protect transaction_stack with inner lock.
FROMLIST: binder: protect proc->threads with inner_lock
FROMLIST: binder: protect proc->nodes with inner lock
FROMLIST: binder: add spinlock to protect binder_node
FROMLIST: binder: add spinlocks to protect todo lists
FROMLIST: binder: use inner lock to sync work dq and node counts
FROMLIST: binder: introduce locking helper functions
FROMLIST: binder: use node->tmp_refs to ensure node safety
FROMLIST: binder: refactor binder ref inc/dec for thread safety
FROMLIST: binder: make sure accesses to proc/thread are safe
FROMLIST: binder: make sure target_node has strong ref
FROMLIST: binder: guarantee txn complete / errors delivered in-order
FROMLIST: binder: refactor binder_pop_transaction
FROMLIST: binder: use atomic for transaction_log index
FROMLIST: binder: add more debug info when allocation fails.
FROMLIST: binder: protect against two threads freeing buffer
FROMLIST: binder: remove dead code in binder_get_ref_for_node
FROMLIST: binder: don't modify thread->looper from other threads
FROMLIST: binder: avoid race conditions when enqueuing txn
FROMLIST: binder: refactor queue management in binder_thread_read
FROMLIST: binder: add log information for binder transaction failures
FROMLIST: binder: make binder_last_id an atomic
FROMLIST: binder: change binder_stats to atomics
FROMLIST: binder: add protection for non-perf cases
FROMLIST: binder: remove binder_debug_no_lock mechanism
FROMLIST: binder: move binder_alloc to separate file
FROMLIST: binder: separate out binder_alloc functions
FROMLIST: binder: remove unneeded cleanup code
FROMLIST: binder: separate binder allocator structure from binder proc
FROMLIST: binder: Use wake up hint for synchronous transactions.
Revert "android: binder: move global binder state into context struct."
sched: walt: fix window misalignment when HZ=300
ANDROID: android-base.cfg: remove CONFIG_CGROUP_DEBUG
ANDROID: sdcardfs: use mount_nodev and fix a issue in sdcardfs_kill_sb
Conflicts:
drivers/android/binder.c
drivers/net/wireless/ath/ath10k/pci.c
Change-Id: Ic6f82c2ec9929733a16a03bb3b745187e002f4f6
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
The way the schedutil governor uses the PELT metric causes it to
underestimate the CPU utilization in some cases.
That can be easily demonstrated by running kernel compilation on
a Sandy Bridge Intel processor, running turbostat in parallel with
it and looking at the values written to the MSR_IA32_PERF_CTL
register. Namely, the expected result would be that when all CPUs
were 100% busy, all of them would be requested to run in the maximum
P-state, but observation shows that this clearly isn't the case.
The CPUs run in the maximum P-state for a while and then are
requested to run slower and go back to the maximum P-state after
a while again. That causes the actual frequency of the processor to
visibly oscillate below the sustainable maximum in a jittery fashion
which clearly is not desirable.
That has been attributed to CPU utilization metric updates on task
migration that cause the total utilization value for the CPU to be
reduced by the utilization of the migrated task. If that happens,
the schedutil governor may see a CPU utilization reduction and will
attempt to reduce the CPU frequency accordingly right away. That
may be premature, though, for example if the system is generally
busy and there are other runnable tasks waiting to be run on that
CPU already.
This is unlikely to be an issue on systems where cpufreq policies are
shared between multiple CPUs, because in those cases the policy
utilization is computed as the maximum of the CPU utilization values
over the whole policy and if that turns out to be low, reducing the
frequency for the policy most likely is a good idea anyway. On
systems with one CPU per policy, however, it may affect performance
adversely and even lead to increased energy consumption in some cases.
On those systems it may be addressed by taking another utilization
metric into consideration, like whether or not the CPU whose
frequency is about to be reduced has been idle recently, because if
that's not the case, the CPU is likely to be busy in the near future
and its frequency should not be reduced.
To that end, use the counter of idle calls in the timekeeping code.
Namely, make the schedutil governor look at that counter for the
current CPU every time before its frequency is about to be reduced.
If the counter has not changed since the previous iteration of the
governor computations for that CPU, the CPU has been busy for all
that time and its frequency should not be decreased, so if the new
frequency would be lower than the one set previously, the governor
will skip the frequency update.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Joel Fernandes <joelaf@google.com>
(cherry picked from commit b7eaf1aab9f8bd2e49fceed77ebc66c1b5800718)
(simple CPUFREQ_RT_DL vs CPUFREQ_DL usage conflicts)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: I531ec02c052944ee07a904dc2a25c59948ee762b
Add a check for cpu unbound deferrable timer expiry and raise
softirq for handling the expired timers so that the CPU can
process the cpu unbound deferrable times as early as possible
when a cpu tries to enter/exit idle loop.
Change-Id: Ieffa74fa22a4d25493f5590b5ac1e0d784fcbbad
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
* tmp-917a9:
ARM/vdso: Mark the vDSO code read-only after init
x86/vdso: Mark the vDSO code read-only after init
lkdtm: Verify that '__ro_after_init' works correctly
arch: Introduce post-init read-only memory
x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option
mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings
asm-generic: Consolidate mark_rodata_ro()
Linux 4.4.6
ld-version: Fix awk regex compile failure
target: Drop incorrect ABORT_TASK put for completed commands
block: don't optimize for non-cloned bio in bio_get_last_bvec()
MIPS: smp.c: Fix uninitialised temp_foreign_map
MIPS: Fix build error when SMP is used without GIC
ovl: fix getcwd() failure after unsuccessful rmdir
ovl: copy new uid/gid into overlayfs runtime inode
userfaultfd: don't block on the last VM updates at exit time
powerpc/powernv: Fix OPAL_CONSOLE_FLUSH prototype and usages
powerpc/powernv: Add a kmsg_dumper that flushes console output on panic
powerpc: Fix dedotify for binutils >= 2.26
Revert "drm/radeon/pm: adjust display configuration after powerstate"
drm/radeon: Fix error handling in radeon_flip_work_func.
drm/amdgpu: Fix error handling in amdgpu_flip_work_func.
Revert "drm/radeon: call hpd_irq_event on resume"
x86/mm: Fix slow_virt_to_phys() for X86_PAE again
gpu: ipu-v3: Do not bail out on missing optional port nodes
mac80211: Fix Public Action frame RX in AP mode
mac80211: check PN correctly for GCMP-encrypted fragmented MPDUs
mac80211: minstrel_ht: fix a logic error in RTS/CTS handling
mac80211: minstrel_ht: set default tx aggregation timeout to 0
mac80211: fix use of uninitialised values in RX aggregation
mac80211: minstrel: Change expected throughput unit back to Kbps
iwlwifi: mvm: inc pending frames counter also when txing non-sta
can: gs_usb: fixed disconnect bug by removing erroneous use of kfree()
cfg80211/wext: fix message ordering
wext: fix message delay/ordering
ovl: fix working on distributed fs as lower layer
ovl: ignore lower entries when checking purity of non-directory entries
ASoC: wm8958: Fix enum ctl accesses in a wrong type
ASoC: wm8994: Fix enum ctl accesses in a wrong type
ASoC: samsung: Use IRQ safe spin lock calls
ASoC: dapm: Fix ctl value accesses in a wrong type
ncpfs: fix a braino in OOM handling in ncp_fill_cache()
jffs2: reduce the breakage on recovery from halfway failed rename()
dmaengine: at_xdmac: fix residue computation
tracing: Fix check for cpu online when event is disabled
s390/dasd: fix diag 0x250 inline assembly
s390/mm: four page table levels vs. fork
KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0
KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
KVM: PPC: Book3S HV: Sanitize special-purpose register values on guest exit
KVM: s390: correct fprs on SIGP (STOP AND) STORE STATUS
KVM: VMX: disable PEBS before a guest entry
kvm: cap halt polling at exactly halt_poll_ns
PCI: Allow a NULL "parent" pointer in pci_bus_assign_domain_nr()
ARM: OMAP2+: hwmod: Introduce ti,no-idle dt property
ARM: dts: dra7: do not gate cpsw clock due to errata i877
ARM: mvebu: fix overlap of Crypto SRAM with PCIe memory window
arm64: account for sparsemem section alignment when choosing vmemmap offset
Linux 4.4.5
drm/amdgpu: fix topaz/tonga gmc assignment in 4.4 stable
modules: fix longstanding /proc/kallsyms vs module insertion race.
drm/i915: refine qemu south bridge detection
drm/i915: more virtual south bridge detection
block: get the 1st and last bvec via helpers
block: check virt boundary in bio_will_gap()
drm/amdgpu: Use drm_calloc_large for VM page_tables array
thermal: cpu_cooling: fix out of bounds access in time_in_idle
i2c: brcmstb: allocate correct amount of memory for regmap
ubi: Fix out of bounds write in volume update code
cxl: Fix PSL timebase synchronization detection
MIPS: traps: Fix SIGFPE information leak from `do_ov' and `do_trap_or_bp'
MIPS: scache: Fix scache init with invalid line size.
USB: serial: option: add support for Quectel UC20
USB: serial: option: add support for Telit LE922 PID 0x1045
USB: qcserial: add Sierra Wireless EM74xx device ID
USB: qcserial: add Dell Wireless 5809e Gobi 4G HSPA+ (rev3)
USB: cp210x: Add ID for Parrot NMEA GPS Flight Recorder
usb: chipidea: otg: change workqueue ci_otg as freezable
ALSA: timer: Fix broken compat timer user status ioctl
ALSA: hdspm: Fix zero-division
ALSA: hdsp: Fix wrong boolean ctl value accesses
ALSA: hdspm: Fix wrong boolean ctl value accesses
ALSA: seq: oss: Don't drain at closing a client
ALSA: pcm: Fix ioctls for X32 ABI
ALSA: timer: Fix ioctls for X32 ABI
ALSA: rawmidi: Fix ioctls X32 ABI
ALSA: hda - Fix mic issues on Acer Aspire E1-472
ALSA: ctl: Fix ioctls for X32 ABI
ALSA: usb-audio: Add a quirk for Plantronics DA45
adv7604: fix tx 5v detect regression
dmaengine: pxa_dma: fix cyclic transfers
Fix directory hardlinks from deleted directories
jffs2: Fix page lock / f->sem deadlock
Revert "jffs2: Fix lock acquisition order bug in jffs2_write_begin"
Btrfs: fix loading of orphan roots leading to BUG_ON
pata-rb532-cf: get rid of the irq_to_gpio() call
tracing: Do not have 'comm' filter override event 'comm' field
ata: ahci: don't mark HotPlugCapable Ports as external/removable
PM / sleep / x86: Fix crash on graph trace through x86 suspend
arm64: vmemmap: use virtual projection of linear region
Adding Intel Lewisburg device IDs for SATA
writeback: flush inode cgroup wb switches instead of pinning super_block
block: bio: introduce helpers to get the 1st and last bvec
libata: Align ata_device's id on a cacheline
libata: fix HDIO_GET_32BIT ioctl
drm/amdgpu: return from atombios_dp_get_dpcd only when error
drm/amdgpu/gfx8: specify which engine to wait before vm flush
drm/amdgpu: apply gfx_v8 fixes to gfx_v7 as well
drm/amdgpu/pm: update current crtc info after setting the powerstate
drm/radeon/pm: update current crtc info after setting the powerstate
drm/ast: Fix incorrect register check for DRAM width
target: Fix WRITE_SAME/DISCARD conversion to linux 512b sectors
iommu/vt-d: Use BUS_NOTIFY_REMOVED_DEVICE in hotplug path
iommu/amd: Fix boot warning when device 00:00.0 is not iommu covered
iommu/amd: Apply workaround for ATS write permission check
arm/arm64: KVM: Fix ioctl error handling
KVM: x86: fix root cause for missed hardware breakpoints
vfio: fix ioctl error handling
Fix cifs_uniqueid_to_ino_t() function for s390x
CIFS: Fix SMB2+ interim response processing for read requests
cifs: fix out-of-bounds access in lease parsing
fbcon: set a default value to blink interval
kvm: x86: Update tsc multiplier on change.
mips/kvm: fix ioctl error handling
parisc: Fix ptrace syscall number and return value modification
PCI: keystone: Fix MSI code that retrieves struct pcie_port pointer
block: Initialize max_dev_sectors to 0
drm/amdgpu: mask out WC from BO on unsupported arches
btrfs: async-thread: Fix a use-after-free error for trace
btrfs: Fix no_space in write and rm loop
Btrfs: fix deadlock running delayed iputs at transaction commit time
drivers: sh: Restore legacy clock domain on SuperH platforms
use ->d_seq to get coherency between ->d_inode and ->d_flags
Linux 4.4.4
iwlwifi: mvm: don't allow sched scans without matches to be started
iwlwifi: update and fix 7265 series PCI IDs
iwlwifi: pcie: properly configure the debug buffer size for 8000
iwlwifi: dvm: fix WoWLAN
security: let security modules use PTRACE_MODE_* with bitmasks
IB/cma: Fix RDMA port validation for iWarp
x86/irq: Plug vector cleanup race
x86/irq: Call irq_force_move_complete with irq descriptor
x86/irq: Remove outgoing CPU from vector cleanup mask
x86/irq: Remove the cpumask allocation from send_cleanup_vector()
x86/irq: Clear move_in_progress before sending cleanup IPI
x86/irq: Remove offline cpus from vector cleanup
x86/irq: Get rid of code duplication
x86/irq: Copy vectormask instead of an AND operation
x86/irq: Check vector allocation early
x86/irq: Reorganize the search in assign_irq_vector
x86/irq: Reorganize the return path in assign_irq_vector
x86/irq: Do not use apic_chip_data.old_domain as temporary buffer
x86/irq: Validate that irq descriptor is still active
x86/irq: Fix a race in x86_vector_free_irqs()
x86/irq: Call chip->irq_set_affinity in proper context
x86/entry/compat: Add missing CLAC to entry_INT80_32
x86/mpx: Fix off-by-one comparison with nr_registers
hpfs: don't truncate the file when delete fails
do_last(): ELOOP failure exit should be done after leaving RCU mode
should_follow_link(): validate ->d_seq after having decided to follow
xen/pcifront: Fix mysterious crashes when NUMA locality information was extracted.
xen/pciback: Save the number of MSI-X entries to be copied later.
xen/pciback: Check PF instead of VF for PCI_COMMAND_MEMORY
xen/scsiback: correct frontend counting
xen/arm: correctly handle DMA mapping of compound pages
ARM: at91/dt: fix typo in sama5d2 pinmux descriptions
ARM: OMAP2+: Fix onenand initialization to avoid filesystem corruption
do_last(): don't let a bogus return value from ->open() et.al. to confuse us
kernel/resource.c: fix muxed resource handling in __request_region()
sunrpc/cache: fix off-by-one in qword_get()
tracing: Fix showing function event in available_events
powerpc/eeh: Fix partial hotplug criterion
KVM: x86: MMU: fix ubsan index-out-of-range warning
KVM: x86: fix conversion of addresses to linear in 32-bit protected mode
KVM: x86: fix missed hardware breakpoints
KVM: arm/arm64: vgic: Ensure bitmaps are long enough
KVM: async_pf: do not warn on page allocation failures
of/irq: Fix msi-map calculation for nonzero rid-base
NFSv4: Fix a dentry leak on alias use
nfs: fix nfs_size_to_loff_t
block: fix use-after-free in dio_bio_complete
bio: return EINTR if copying to user space got interrupted
i2c: i801: Adding Intel Lewisburg support for iTCO
phy: core: fix wrong err handle for phy_power_on
writeback: keep superblock pinned during cgroup writeback association switches
cgroup: make sure a parent css isn't offlined before its children
cpuset: make mm migration asynchronous
PCI/AER: Flush workqueue on device remove to avoid use-after-free
ARCv2: SMP: Emulate IPI to self using software triggered interrupt
ARCv2: STAR 9000950267: Handle return from intr to Delay Slot #2
libata: fix sff host state machine locking while polling
qla2xxx: Fix stale pointer access.
spi: atmel: fix gpio chip-select in case of non-DT platform
target: Fix race with SCF_SEND_DELAYED_TAS handling
target: Fix remote-port TMR ABORT + se_cmd fabric stop
target: Fix TAS handling for multi-session se_node_acls
target: Fix LUN_RESET active TMR descriptor handling
target: Fix LUN_RESET active I/O handling for ACK_KREF
ALSA: hda - Fixing background noise on Dell Inspiron 3162
ALSA: hda - Apply clock gate workaround to Skylake, too
Revert "workqueue: make sure delayed work run in local cpu"
workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup
mac80211: Requeue work after scan complete for all VIF types.
rfkill: fix rfkill_fop_read wait_event usage
tick/nohz: Set the correct expiry when switching to nohz/lowres mode
perf stat: Do not clean event's private stats
cdc-acm:exclude Samsung phone 04e8:685d
Revert "Staging: panel: usleep_range is preferred over udelay"
Staging: speakup: Fix getting port information
sd: Optimal I/O size is in bytes, not sectors
libceph: don't spam dmesg with stray reply warnings
libceph: use the right footer size when skipping a message
libceph: don't bail early from try_read() when skipping a message
libceph: fix ceph_msg_revoke()
seccomp: always propagate NO_NEW_PRIVS on tsync
cpufreq: Fix NULL reference crash while accessing policy->governor_data
cpufreq: pxa2xx: fix pxa_cpufreq_change_voltage prototype
hwmon: (ads1015) Handle negative conversion values correctly
hwmon: (gpio-fan) Remove un-necessary speed_index lookup for thermal hook
hwmon: (dell-smm) Blacklist Dell Studio XPS 8000
Thermal: do thermal zone update after a cooling device registered
Thermal: handle thermal zone device properly during system sleep
Thermal: initialize thermal zone device correctly
IB/mlx5: Expose correct maximum number of CQE capacity
IB/qib: Support creating qps with GFP_NOIO flag
IB/qib: fix mcast detach when qp not attached
IB/cm: Fix a recently introduced deadlock
dmaengine: dw: disable BLOCK IRQs for non-cyclic xfer
dmaengine: at_xdmac: fix resume for cyclic transfers
dmaengine: dw: fix cyclic transfer callbacks
dmaengine: dw: fix cyclic transfer setup
nfit: fix multi-interface dimm handling, acpi6.1 compatibility
ACPI / PCI / hotplug: unlock in error path in acpiphp_enable_slot()
ACPI: Revert "ACPI / video: Add Dell Inspiron 5737 to the blacklist"
ACPI / video: Add disable_backlight_sysfs_if quirk for the Toshiba Satellite R830
ACPI / video: Add disable_backlight_sysfs_if quirk for the Toshiba Portege R700
lib: sw842: select crc32
uapi: update install list after nvme.h rename
ideapad-laptop: Add Lenovo Yoga 700 to no_hw_rfkill dmi list
ideapad-laptop: Add Lenovo ideapad Y700-17ISK to no_hw_rfkill dmi list
toshiba_acpi: Fix blank screen at boot if transflective backlight is supported
make sure that freeing shmem fast symlinks is RCU-delayed
drm/radeon/pm: adjust display configuration after powerstate
drm/radeon: Don't hang in radeon_flip_work_func on disabled crtc. (v2)
drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on() (v2)
drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4
drm: Prevent vblank counter bumps > 1 with active vblank clients. (v2)
drm: No-Op redundant calls to drm_vblank_off() (v2)
drm/radeon: use post-decrement in error handling
drm/qxl: use kmalloc_array to alloc reloc_info in qxl_process_single_command
drm/i915: fix error path in intel_setup_gmbus()
drm/i915/dsi: don't pass arbitrary data to sideband
drm/i915/dsi: defend gpio table against out of bounds access
drm/i915/skl: Don't skip mst encoders in skl_ddi_pll_select()
drm/i915: Don't reject primary plane windowing with color keying enabled on SKL+
drm/i915/dp: fall back to 18 bpp when sink capability is unknown
drm/i915: Make sure DC writes are coherent on flush.
drm/i915: Init power domains early in driver load
drm/i915: intel_hpd_init(): Fix suspend/resume reprobing
drm/i915: Restore inhibiting the load of the default context
drm: fix missing reference counting decrease
drm/radeon: hold reference to fences in radeon_sa_bo_new
drm/radeon: mask out WC from BO on unsupported arches
drm: add helper to check for wc memory support
drm/radeon: fix DP audio support for APU with DCE4.1 display engine
drm/radeon: Add a common function for DFS handling
drm/radeon: cleaned up VCO output settings for DP audio
drm/radeon: properly byte swap vce firmware setup
drm/radeon: clean up fujitsu quirks
drm/radeon: Fix "slow" audio over DP on DCE8+
drm/radeon: call hpd_irq_event on resume
drm/radeon: Fix off-by-one errors in radeon_vm_bo_set_addr
drm/dp/mst: deallocate payload on port destruction
drm/dp/mst: Reverse order of MST enable and clearing VC payload table.
drm/dp/mst: move GUID storage from mgr, port to only mst branch
drm/dp/mst: Calculate MST PBN with 31.32 fixed point
drm: Add drm_fixp_from_fraction and drm_fixp2int_ceil
drm/dp/mst: fix in RAD element access
drm/dp/mst: fix in MSTB RAD initialization
drm/dp/mst: always send reply for UP request
drm/dp/mst: process broadcast messages correctly
drm/nouveau: platform: Fix deferred probe
drm/nouveau/disp/dp: ensure sink is powered up before attempting link training
drm/nouveau/display: Enable vblank irqs after display engine is on again.
drm/nouveau/kms: take mode_config mutex in connector hotplug path
drm/amdgpu/pm: adjust display configuration after powerstate
drm/amdgpu: Don't hang in amdgpu_flip_work_func on disabled crtc.
drm/amdgpu: use post-decrement in error handling
drm/amdgpu: fix issue with overlapping userptrs
drm/amdgpu: hold reference to fences in amdgpu_sa_bo_new (v2)
drm/amdgpu: remove unnecessary forward declaration
drm/amdgpu: fix s4 resume
drm/amdgpu: remove exp hardware support from iceland
drm/amdgpu: don't load MEC2 on topaz
drm/amdgpu: drop topaz support from gmc8 module
drm/amdgpu: pull topaz gmc bits into gmc_v7
drm/amdgpu: The VI specific EXE bit should only apply to GMC v8.0 above
drm/amdgpu: iceland use CI based MC IP
drm/amdgpu: move gmc7 support out of CIK dependency
drm/amdgpu: no need to load MC firmware on fiji
drm/amdgpu: fix amdgpu_bo_pin_restricted VRAM placing v2
drm/amdgpu: fix tonga smu resume
drm/amdgpu: fix lost sync_to if scheduler is enabled.
drm/amdgpu: call hpd_irq_event on resume
drm/amdgpu: Fix off-by-one errors in amdgpu_vm_bo_map
drm/vmwgfx: respect 'nomodeset'
drm/vmwgfx: Fix a width / pitch mismatch on framebuffer updates
drm/vmwgfx: Fix an incorrect lock check
virtio_pci: fix use after free on release
virtio_balloon: fix race between migration and ballooning
virtio_balloon: fix race by fill and leak
regulator: mt6311: MT6311_REGULATOR needs to select REGMAP_I2C
regulator: axp20x: Fix GPIO LDO enable value for AXP22x
clk: exynos: use irqsave version of spin_lock to avoid deadlock with irqs
cxl: use correct operator when writing pcie config space values
sparc64: fix incorrect sign extension in sys_sparc64_personality
EDAC, mc_sysfs: Fix freeing bus' name
EDAC: Robustify workqueues destruction
MIPS: Fix buffer overflow in syscall_get_arguments()
MIPS: Fix some missing CONFIG_CPU_MIPSR6 #ifdefs
MIPS: hpet: Choose a safe value for the ETIME check
MIPS: Loongson-3: Fix SMP_ASK_C0COUNT IPI handler
Revert "MIPS: Fix PAGE_MASK definition"
cputime: Prevent 32bit overflow in time[val|spec]_to_cputime()
time: Avoid signed overflow in timekeeping_get_ns()
Bluetooth: 6lowpan: Fix handling of uncompressed IPv6 packets
Bluetooth: 6lowpan: Fix kernel NULL pointer dereferences
Bluetooth: Fix incorrect removing of IRKs
Bluetooth: Add support of Toshiba Broadcom based devices
Bluetooth: Use continuous scanning when creating LE connections
Drivers: hv: vmbus: Fix a Host signaling bug
tools: hv: vss: fix the write()'s argument: error -> vss_msg
mmc: sdhci: Allow override of get_cd() called from sdhci_request()
mmc: sdhci: Allow override of mmc host operations
mmc: sdhci-pci: Fix card detect race for Intel BXT/APL
mmc: pxamci: fix again read-only gpio detection polarity
mmc: sdhci-acpi: Fix card detect race for Intel BXT/APL
mmc: mmci: fix an ages old detection error
mmc: core: Enable tuning according to the actual timing
mmc: sdhci: Fix sdhci_runtime_pm_bus_on/off()
mmc: mmc: Fix incorrect use of driver strength switching HS200 and HS400
mmc: sdio: Fix invalid vdd in voltage switch power cycle
mmc: sdhci: Fix DMA descriptor with zero data length
mmc: sdhci-pci: Do not default to 33 Ohm driver strength for Intel SPT
mmc: usdhi6rol0: handle NULL data in timeout
clockevents/tcb_clksrc: Prevent disabling an already disabled clock
posix-clock: Fix return code on the poll method's error path
irqchip/gic-v3-its: Fix double ICC_EOIR write for LPI in EOImode==1
irqchip/atmel-aic: Fix wrong bit operation for IRQ priority
irqchip/mxs: Add missing set_handle_irq()
irqchip/omap-intc: Add support for spurious irq handling
coresight: checking for NULL string in coresight_name_match()
dm: fix dm_rq_target_io leak on faults with .request_fn DM w/ blk-mq paths
dm snapshot: fix hung bios when copy error occurs
dm space map metadata: remove unused variable in brb_pop()
tda1004x: only update the frontend properties if locked
vb2: fix a regression in poll() behavior for output,streams
gspca: ov534/topro: prevent a division by 0
si2157: return -EINVAL if firmware blob is too big
media: dvb-core: Don't force CAN_INVERSION_AUTO in oneshot mode
rc: sunxi-cir: Initialize the spinlock properly
namei: ->d_inode of a pinned dentry is stable only for positives
mei: validate request value in client notify request ioctl
mei: fix fasync return value on error
rtlwifi: rtl8723be: Fix module parameter initialization
rtlwifi: rtl8188ee: Fix module parameter initialization
rtlwifi: rtl8192se: Fix module parameter initialization
rtlwifi: rtl8723ae: Fix initialization of module parameters
rtlwifi: rtl8192de: Fix incorrect module parameter descriptions
rtlwifi: rtl8192ce: Fix handling of module parameters
rtlwifi: rtl8192cu: Add missing parameter setup
rtlwifi: rtl_pci: Fix kernel panic
locks: fix unlock when fcntl_setlk races with a close
um: link with -lpthread
uml: fix hostfs mknod()
uml: flush stdout before forking
s390/fpu: signals vs. floating point control register
s390/compat: correct restore of high gprs on signal return
s390/dasd: fix performance drop
s390/dasd: fix refcount for PAV reassignment
s390/dasd: prevent incorrect length error under z/VM after PAV changes
s390: fix normalization bug in exception table sorting
btrfs: initialize the seq counter in struct btrfs_device
Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots
Btrfs: fix transaction handle leak on failure to create hard link
Btrfs: fix number of transaction units required to create symlink
Btrfs: send, don't BUG_ON() when an empty symlink is found
btrfs: statfs: report zero available if metadata are exhausted
Btrfs: igrab inode in writepage
Btrfs: add missing brelse when superblock checksum fails
KVM: s390: fix memory overwrites when vx is disabled
s390/kvm: remove dependency on struct save_area definition
clocksource/drivers/vt8500: Increase the minimum delta
genirq: Validate action before dereferencing it in handle_irq_event_percpu()
mm: numa: quickly fail allocations for NUMA balancing on full nodes
mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
ocfs2: unlock inode if deleting inode from orphan fails
drm/i915: shut up gen8+ SDE irq dmesg noise
iw_cxgb3: Fix incorrectly returning error on success
spi: omap2-mcspi: Prevent duplicate gpio_request
drivers: android: correct the size of struct binder_uintptr_t for BC_DEAD_BINDER_DONE
USB: option: add "4G LTE usb-modem U901"
USB: option: add support for SIM7100E
USB: cp210x: add IDs for GE B650V3 and B850V3 boards
usb: dwc3: Fix assignment of EP transfer resources
can: ems_usb: Fix possible tx overflow
dm thin: fix race condition when destroying thin pool workqueue
bcache: Change refill_dirty() to always scan entire disk if necessary
bcache: prevent crash on changing writeback_running
bcache: allows use of register in udev to avoid "device_busy" error.
bcache: unregister reboot notifier if bcache fails to unregister device
bcache: fix a leak in bch_cached_dev_run()
bcache: clear BCACHE_DEV_UNLINK_DONE flag when attaching a backing device
bcache: Add a cond_resched() call to gc
bcache: fix a livelock when we cause a huge number of cache misses
lib/ucs2_string: Correct ucs2 -> utf8 conversion
efi: Add pstore variables to the deletion whitelist
efi: Make efivarfs entries immutable by default
efi: Make our variable validation list include the guid
efi: Do variable name validation tests in utf8
efi: Use ucs2_as_utf8 in efivarfs instead of open coding a bad version
lib/ucs2_string: Add ucs2 -> utf8 helper functions
ARM: 8457/1: psci-smp is built only for SMP
drm/gma500: Use correct unref in the gem bo create function
devm_memremap: Fix error value when memremap failed
KVM: s390: fix guest fprs memory leak
arm64: errata: Add -mpc-relative-literal-loads to build flags
ARM: debug-ll: fix BCM63xx entry for multiplatform
ext4: fix bh->b_state corruption
sctp: Fix port hash table size computation
unix_diag: fix incorrect sign extension in unix_lookup_by_ino
tipc: unlock in error path
rtnl: RTM_GETNETCONF: fix wrong return value
IFF_NO_QUEUE: Fix for drivers not calling ether_setup()
tcp/dccp: fix another race at listener dismantle
route: check and remove route cache when we get route
net_sched fix: reclassification needs to consider ether protocol changes
pppoe: fix reference counting in PPPoE proxy
l2tp: Fix error creating L2TP tunnels
net/mlx4_en: Avoid changing dev->features directly in run-time
net/mlx4_en: Choose time-stamping shift value according to HW frequency
net/mlx4_en: Count HW buffer overrun only once
qmi_wwan: add "4G LTE usb-modem U901"
tcp: md5: release request socket instead of listener
tipc: fix premature addition of node to lookup table
af_unix: Guard against other == sk in unix_dgram_sendmsg
af_unix: Don't set err in unix_stream_read_generic unless there was an error
ipv4: fix memory leaks in ip_cmsg_send() callers
bonding: Fix ARP monitor validation
bpf: fix branch offset adjustment on backjumps after patching ctx expansion
flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
net: Copy inner L3 and L4 headers as unaligned on GRE TEB
sctp: translate network order to host order when users get a hmacid
enic: increment devcmd2 result ring in case of timeout
tg3: Fix for tg3 transmit queue 0 timed out when too many gso_segs
net:Add sysctl_max_skb_frags
tcp: do not drop syn_recv on all icmp reports
unix: correctly track in-flight fds in sending process user_struct
ipv6: fix a lockdep splat
ipv6: addrconf: Fix recursive spin lock call
ipv6/udp: use sticky pktinfo egress ifindex on connect()
ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()
tcp: beware of alignments in tcp_get_info()
switchdev: Require RTNL mutex to be held when sending FDB notifications
inet: frag: Always orphan skbs inside ip_defrag()
tipc: fix connection abort during subscription cancel
net: dsa: fix mv88e6xxx switches
sctp: allow setting SCTP_SACK_IMMEDIATELY by the application
pptp: fix illegal memory access caused by multiple bind()s
af_unix: fix struct pid memory leak
tcp: fix NULL deref in tcp_v4_send_ack()
lwt: fix rx checksum setting for lwt devices tunneling over ipv6
tunnels: Allow IPv6 UDP checksums to be correctly controlled.
net: dp83640: Fix tx timestamp overflow handling.
gro: Make GRO aware of lightweight tunnels.
af_iucv: Validate socket address length in iucv_sock_bind()
Conflicts:
arch/arm64/Makefile
arch/arm64/include/asm/cacheflush.h
drivers/mmc/host/sdhci.c
drivers/usb/dwc3/ep0.c
drivers/usb/dwc3/gadget.c
kernel/module.c
sound/core/pcm_compat.c
CRs-Fixed: 1010239
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
Change-Id: I41a28636fc9ad91f9d979b191784609476294cdf
touch_softlockup_watchdog() is used to tell watchdog that scheduler
stall is expected. One group of usage is from paths where the task
may not be able to yield for a long time such as performing slow PIO
to finicky device and coming out of suspend. The other is to account
for scheduler and timer going idle.
For scheduler softlockup detection, there's no reason to distinguish
the two cases; however, workqueue lockup detector is planned and it
can use the same signals from the former group while the latter would
spuriously prevent detection. This patch introduces a new function
touch_softlockup_watchdog_sched() and convert the latter group to call
it instead. For now, it just calls touch_softlockup_watchdog() and
there's no functional difference.
CRs-Fixed: 1007459
Change-Id: I6fe77926acd4240458cab29d399f81d8739a16c0
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Git-commit: 03e0d4610bf4d4a93bfa16b2474ed4fd5243aa71
Git-repo: git://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
At present, HMP scheduler uses sched_clock to setup window boundary to
be aligned with timer interrupt to ensure timer interrupt fires after
window rollover. However this alignment won't last long since the timer
interrupt rearms next timer based on time measured by ktime which isn't
coupled with sched_clock.
Convert sched_clock to ktime to avoid wallclock discrepancy between
scheduler and timer so that we can ensure scheduler's window boundary is
always aligned with timer.
CRs-fixed: 933330
Change-Id: I4108819a4382f725b3ce6075eb46aab0cf670b7e
[joonwoop@codeaurora.org: fixed minor conflict in include/linux/tick.h
and kernel/sched/core.c. omitted fixes for kernel/sched/qhmp_core.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
The tick code already tracks exact time a tick is expected
to arrive. This can be used to eliminate slack in the jiffy
to sched_clock mapping that aligns windows between a caller
of sched_set_window and the scheduler itself.
Change-Id: I9d47466658d01e6857d7457405459436d504a2ca
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in include/linux/tick.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Add a snapshot of the run queue stats driver as of msm-3.10 commit
4bf320bd ("Merge "ASoC: msm8952: set async flag for 8952 dailink"")
Resolve checkpatch warnings in the process, notably the replacement
of sscanf with kstrtouint.
Change-Id: I7e2f98223677e6477df114ffe770c0740ed37de9
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
Signed-off-by: Abhimanyu Kapur <abhimany@codeaurora.org>
The use of clockevents_notify is deprcated and targeted APIs are used
instead of the clockevents_notify callbacks. Switch broadcast timer
notifications to tick_broadcast_enter and tick_broadcast_exit.
Change-Id: I3441873eb4009b105db04f4a18d28ae9ccd07e95
commit 1ca8ec532fc2d986f1f4a319857bb18e0c9739b4 upstream.
commit 0ff53d0964 sets the next tick interrupt to the last jiffies update,
i.e. in the past, because the forward operation is invoked before the set
operation. There is no resulting damage (yet), but we get an extra pointless
tick interrupt.
Revert the order so we get the next tick interrupt in the future.
Fixes: commit 0ff53d0964 "tick: sched: Force tick interrupt and get rid of softirq magic"
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1453893967-3458-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The code ensures that when nohz full is running, at least the
boot CPU serves as a housekeeper and it can't be later offlined.
Let's assert this assumption to make sure that we have CPUs to
handle unbound jobs like workqueues and timers while nohz full
CPUs run undisturbed.
Also improve the comments on housekeeper offlining prevention.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Vatika Harlalka <vatikaharlalka@gmail.com>
Link: http://lkml.kernel.org/r/1441119060-2230-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Restart the tick when necessary from the irq exit path. It makes nohz
full more flexible, simplify the related IPIs and doesn't bring
significant overhead on irq exit.
In a longer term view, it will allow us to piggyback the nohz kick
on the scheduler IPI in the future instead of sending a dedicated IPI
that often doubles the scheduler IPI on task wakeup. This will require
more changes though including careful review of resched_curr() callers
to include nohz full needs.
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
On nohz full early days, idle dynticks and full dynticks weren't well
integrated and we couldn't risk full dynticks calls on idle without
risking messing up tick idle statistics. This is why we prevented such
thing to happen.
Nowadays full dynticks and idle dynticks are better integrated and
interact without known issue.
So lets remove that.
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
If nohz is disabled on the kernel command line the [hr]timer code
still calls wake_up_nohz_cpu() and tick_nohz_full_cpu(), a pretty
pointless exercise. Cache nohz_active in [hr]timer per cpu bases and
avoid the overhead.
Before:
48.10% hog [.] main
15.25% [kernel] [k] _raw_spin_lock_irqsave
9.76% [kernel] [k] _raw_spin_unlock_irqrestore
6.50% [kernel] [k] mod_timer
6.44% [kernel] [k] lock_timer_base.isra.38
3.87% [kernel] [k] detach_if_pending
3.80% [kernel] [k] del_timer
2.67% [kernel] [k] internal_add_timer
1.33% [kernel] [k] __internal_add_timer
0.73% [kernel] [k] timerfn
0.54% [kernel] [k] wake_up_nohz_cpu
After:
48.73% hog [.] main
15.36% [kernel] [k] _raw_spin_lock_irqsave
9.77% [kernel] [k] _raw_spin_unlock_irqrestore
6.61% [kernel] [k] lock_timer_base.isra.38
6.42% [kernel] [k] mod_timer
3.90% [kernel] [k] detach_if_pending
3.76% [kernel] [k] del_timer
2.41% [kernel] [k] internal_add_timer
1.39% [kernel] [k] __internal_add_timer
0.76% [kernel] [k] timerfn
We probably should have a cached value for nohz full in the per cpu
bases as well to avoid the cpumask check. The base cache line is hot
already, the cpumask not necessarily.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.207378134@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Eric reported that the timer_migration sysctl is not really nice
performance wise as it needs to check at every timer insertion whether
the feature is enabled or not. Further the check does not live in the
timer code, so we have an extra function call which checks an extra
cache line to figure out that it is disabled.
We can do better and store that information in the per cpu (hr)timer
bases. I pondered to use a static key, but that's a nightmare to
update from the nohz code and the timer base cache line is hot anyway
when we select a timer base.
The old logic enabled the timer migration unconditionally if
CONFIG_NO_HZ was set even if nohz was disabled on the kernel command
line.
With this modification, we start off with migration disabled. The user
visible sysctl is still set to enabled. If the kernel switches to NOHZ
migration is enabled, if the user did not disable it via the sysctl
prior to the switch. If nohz=off is on the kernel command line,
migration stays disabled no matter what.
Before:
47.76% hog [.] main
14.84% [kernel] [k] _raw_spin_lock_irqsave
9.55% [kernel] [k] _raw_spin_unlock_irqrestore
6.71% [kernel] [k] mod_timer
6.24% [kernel] [k] lock_timer_base.isra.38
3.76% [kernel] [k] detach_if_pending
3.71% [kernel] [k] del_timer
2.50% [kernel] [k] internal_add_timer
1.51% [kernel] [k] get_nohz_timer_target
1.28% [kernel] [k] __internal_add_timer
0.78% [kernel] [k] timerfn
0.48% [kernel] [k] wake_up_nohz_cpu
After:
48.10% hog [.] main
15.25% [kernel] [k] _raw_spin_lock_irqsave
9.76% [kernel] [k] _raw_spin_unlock_irqrestore
6.50% [kernel] [k] mod_timer
6.44% [kernel] [k] lock_timer_base.isra.38
3.87% [kernel] [k] detach_if_pending
3.80% [kernel] [k] del_timer
2.67% [kernel] [k] internal_add_timer
1.33% [kernel] [k] __internal_add_timer
0.73% [kernel] [k] timerfn
0.54% [kernel] [k] wake_up_nohz_cpu
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Simon Horman reported this crash on a system with
high-res timers disabled but nohz enabled:
> ------------[ cut here ]------------
> kernel BUG at kernel/irq_work.c:135!
BUG_ON(!irqs_disabled());
So something enabled interrupts in the periodic tick handling machinery,
and that code path indeed has a local_irq_disable()/enable pair in
tick_nohz_switch_to_nohz() which causes havoc. Fix it.
This patch also fixes a +nohz -hrtimers hang reported by Ingo Molnar.
Reported-by: Simon Horman <horms@verge.net.au>
Reported-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1505071425520.4225@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The evaluation of the next timer in the nohz code is based on jiffies
while all the tick internals are nano seconds based. We have also to
convert hrtimer nanoseconds to jiffies in the !highres case. That's
just wrong and introduces interesting corner cases.
Turn it around and convert the next timer wheel timer expiry and the
rcu event to clock monotonic and base all calculations on
nanoseconds. That identifies the case where no timer is pending
clearly with an absolute expiry value of KTIME_MAX.
Makes the code more readable and gets rid of the jiffies magic in the
nohz code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Link: http://lkml.kernel.org/r/20150414203502.184198593@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Get rid of one indentation level. Preparatory patch for a major
rework. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Link: http://lkml.kernel.org/r/20150414203502.101563235@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We already got rid of the hrtimer reprogramming loops and hoops as
hrtimer now enforces an interrupt if the enqueued time is in the past.
Do the same for the nohz non highres mode. That gets rid of the need
to raise the softirq which only serves the purpose of getting the
machine out of the inner idle loop.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Link: http://lkml.kernel.org/r/20150414203502.023464878@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
hrtimer_start() enforces a timer interrupt if the timer is already
expired. Get rid of the checks and the forward loop.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Link: http://lkml.kernel.org/r/20150414203501.943658239@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
No point to expose everything to the world. People just believe
such functions can be abused for whatever purposes. Sigh.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Rebased on top of 4.0-rc5 ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/28017337.VbCUc39Gme@vostro.rjw.lan
[ Merged to latest timers/core ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
printk and friends can now format bitmaps using '%*pb[l]'. cpumask
and nodemask also provide cpumask_pr_args() and nodemask_pr_args()
respectively which can be used to generate the two printf arguments
necessary to format the specified cpu/nodemask.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull NOHZ update from Thomas Gleixner:
"Remove the call into the nohz idle code from the fake 'idle' thread in
the powerclamp driver along with the export of those functions which
was smuggeled in via the thermal tree. People have tried to hack
around it in the nohz core code, but it just violates all rightful
assumptions of that code about the only valid calling context (i.e.
the proper idle task).
The powerclamp trainwreck will still work, it just wont get the
benefit of long idle sleeps"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/powerclamp: Remove tick_nohz_idle abuse
commit 4dbd27711c "tick: export nohz tick idle symbols for module
use" was merged via the thermal tree without an explicit ack from the
relevant maintainers.
The exports are abused by the intel powerclamp driver which implements
a fake idle state from a sched FIFO task. This causes all kinds of
wreckage in the NOHZ core code which rightfully assumes that
tick_nohz_idle_enter/exit() are only called from the idle task itself.
Recent changes in the NOHZ core lead to a failure of the powerclamp
driver and now people try to hack completely broken and backwards
workarounds into the NOHZ core code. This is completely unacceptable
and just papers over the real problem. There are way more subtle
issues lurking around the corner.
The real solution is to fix the powerclamp driver by rewriting it with
a sane concept, but that's beyond the scope of this.
So the only solution for now is to remove the calls into the core NOHZ
code from the powerclamp trainwreck along with the exports.
Fixes: d6d71ee4a1 "PM: Introduce Intel PowerClamp Driver"
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Pan Jacob jun <jacob.jun.pan@intel.com>
Cc: LKP <lkp@01.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412181110110.17382@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull percpu updates from Tejun Heo:
"Nothing interesting. A patch to convert the remaining __get_cpu_var()
users, another to fix non-critical off-by-one in an assertion and a
cosmetic conversion to lockless_dereference() in percpu-ref.
The back-merge from mainline is to receive lockless_dereference()"
* 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu: Replace smp_read_barrier_depends() with lockless_dereference()
percpu: Convert remaining __get_cpu_var uses in 3.18-rcX
percpu: off by one in BUG_ON()
The "cpu" argument to rcu_needs_cpu() is always the current CPU, so drop
it. This in turn allows the "cpu" argument to rcu_cpu_has_callbacks()
to be removed, which allows the uses of "cpu" in both functions to be
replaced with a this_cpu_ptr(). Again, the anticipated cross-CPU uses
of these functions has been replaced by NO_HZ_FULL.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
During the 3.18 merge period additional __get_cpu_var uses were
added. The patch converts these to this_cpu_ptr().
Signed-off-by: Christoph Lameter <cl@linux.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull percpu consistent-ops changes from Tejun Heo:
"Way back, before the current percpu allocator was implemented, static
and dynamic percpu memory areas were allocated and handled separately
and had their own accessors. The distinction has been gone for many
years now; however, the now duplicate two sets of accessors remained
with the pointer based ones - this_cpu_*() - evolving various other
operations over time. During the process, we also accumulated other
inconsistent operations.
This pull request contains Christoph's patches to clean up the
duplicate accessor situation. __get_cpu_var() uses are replaced with
with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().
Unfortunately, the former sometimes is tricky thanks to C being a bit
messy with the distinction between lvalues and pointers, which led to
a rather ugly solution for cpumask_var_t involving the introduction of
this_cpu_cpumask_var_ptr().
This converts most of the uses but not all. Christoph will follow up
with the remaining conversions in this merge window and hopefully
remove the obsolete accessors"
* 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
irqchip: Properly fetch the per cpu offset
percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
Revert "powerpc: Replace __get_cpu_var uses"
percpu: Remove __this_cpu_ptr
clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
sparc: Replace __get_cpu_var uses
avr32: Replace __get_cpu_var with __this_cpu_write
blackfin: Replace __get_cpu_var uses
tile: Use this_cpu_ptr() for hardware counters
tile: Replace __get_cpu_var uses
powerpc: Replace __get_cpu_var uses
alpha: Replace __get_cpu_var
ia64: Replace __get_cpu_var uses
s390: cio driver &__get_cpu_var replacements
s390: Replace __get_cpu_var uses
mips: Replace __get_cpu_var uses
MIPS: Replace __get_cpu_var uses in FPU emulator.
arm: Replace __this_cpu_ptr with raw_cpu_ptr
...
Pull s390 updates from Martin Schwidefsky:
"This patch set contains the main portion of the changes for 3.18 in
regard to the s390 architecture. It is a bit bigger than usual,
mainly because of a new driver and the vector extension patches.
The interesting bits are:
- Quite a bit of work on the tracing front. Uprobes is enabled and
the ftrace code is reworked to get some of the lost performance
back if CONFIG_FTRACE is enabled.
- To improve boot time with CONFIG_DEBIG_PAGEALLOC, support for the
IPTE range facility is added.
- The rwlock code is re-factored to improve writer fairness and to be
able to use the interlocked-access instructions.
- The kernel part for the support of the vector extension is added.
- The device driver to access the CD/DVD on the HMC is added, this
will hopefully come in handy to improve the installation process.
- Add support for control-unit initiated reconfiguration.
- The crypto device driver is enhanced to enable the additional AP
domains and to allow the new crypto hardware to be used.
- Bug fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits)
s390/ftrace: simplify enabling/disabling of ftrace_graph_caller
s390/ftrace: remove 31 bit ftrace support
s390/kdump: add support for vector extension
s390/disassembler: add vector instructions
s390: add support for vector extension
s390/zcrypt: Toleration of new crypto hardware
s390/idle: consolidate idle functions and definitions
s390/nohz: use a per-cpu flag for arch_needs_cpu
s390/vtime: do not reset idle data on CPU hotplug
s390/dasd: add support for control unit initiated reconfiguration
s390/dasd: fix infinite loop during format
s390/mm: make use of ipte range facility
s390/setup: correct 4-level kernel page table detection
s390/topology: call set_sched_topology early
s390/uprobes: architecture backend for uprobes
s390/uprobes: common library for kprobes and uprobes
s390/rwlock: use the interlocked-access facility 1 instructions
s390/rwlock: improve writer fairness
s390/rwlock: remove interrupt-enabling rwlock variant.
s390/mm: remove change bit override support
...
Pull timer updates from Thomas Gleixner:
"Nothing really exciting this time:
- a few fixlets in the NOHZ code
- a new ARM SoC timer abomination. One should expect that we have
enough of them already, but they insist on inventing new ones.
- the usual bunch of ARM SoC timer updates. That feels like herding
cats"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: arm_arch_timer: Consolidate arch_timer_evtstrm_enable
clocksource: arm_arch_timer: Enable counter access for 32-bit ARM
clocksource: arm_arch_timer: Change clocksource name if CP15 unavailable
clocksource: sirf: Disable counter before re-setting it
clocksource: cadence_ttc: Add support for 32bit mode
clocksource: tcb_clksrc: Sanitize IRQ request
clocksource: arm_arch_timer: Discard unavailable timers correctly
clocksource: vf_pit_timer: Support shutdown mode
ARM: meson6: clocksource: Add Meson6 timer support
ARM: meson: documentation: Add timer documentation
clocksource: sh_tmu: Document r8a7779 binding
clocksource: sh_mtu2: Document r7s72100 binding
clocksource: sh_cmt: Document SoC specific bindings
timerfd: Remove an always true check
nohz: Avoid tick's double reprogramming in highres mode
nohz: Fix spurious periodic tick behaviour in low-res dynticks mode
Move the nohz_delay bit from the s390_idle data structure to the
per-cpu flags. Clear the nohz delay flag in __cpu_disable and
remove the cpu hotplug notifier that used to do this.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The nohz full functionality depends on IRQ work to trigger its own
interrupts. As it's used to restart the tick, we can't rely on the tick
fallback for irq work callbacks, ie: we can't use the tick to restart
the tick itself.
Lets reject the full dynticks initialization if that arch support isn't
available.
As a side effect, this makes sure that nohz kick is never called from
the tick. That otherwise would result in illegal hrtimer self-cancellation
and lockup.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
The supports for CONFIG_NO_HZ_FULL_ALL=y and the nohz_full= kernel
parameter both have their own way to do the same thing: allocate
full dynticks cpumasks, fill them and initialize some state variables.
Lets consolidate that all in the same place.
While at it, convert some regular printk message to warnings when
fundamental allocations fail.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
The local nohz kick is currently used by perf which needs it to be
NMI-safe. Recent commit though (7d1311b93e)
changed its implementation to fire the local kick using the remote kick
API. It was convenient to make the code more generic but the remote kick
isn't NMI-safe.
As a result:
WARNING: CPU: 3 PID: 18062 at kernel/irq_work.c:72 irq_work_queue_on+0x11e/0x140()
CPU: 3 PID: 18062 Comm: trinity-subchil Not tainted 3.16.0+ #34
0000000000000009 00000000903774d1 ffff880244e06c00 ffffffff9a7f1e37
0000000000000000 ffff880244e06c38 ffffffff9a0791dd ffff880244fce180
0000000000000003 ffff880244e06d58 ffff880244e06ef8 0000000000000000
Call Trace:
<NMI> [<ffffffff9a7f1e37>] dump_stack+0x4e/0x7a
[<ffffffff9a0791dd>] warn_slowpath_common+0x7d/0xa0
[<ffffffff9a07930a>] warn_slowpath_null+0x1a/0x20
[<ffffffff9a17ca1e>] irq_work_queue_on+0x11e/0x140
[<ffffffff9a10a2c7>] tick_nohz_full_kick_cpu+0x57/0x90
[<ffffffff9a186cd5>] __perf_event_overflow+0x275/0x350
[<ffffffff9a184f80>] ? perf_event_task_disable+0xa0/0xa0
[<ffffffff9a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150
[<ffffffff9a187934>] perf_event_overflow+0x14/0x20
[<ffffffff9a020386>] intel_pmu_handle_irq+0x206/0x410
[<ffffffff9a0b54d3>] ? arch_vtime_task_switch+0x63/0x130
[<ffffffff9a01937b>] perf_event_nmi_handler+0x2b/0x50
[<ffffffff9a007b72>] nmi_handle+0xd2/0x390
[<ffffffff9a007aa5>] ? nmi_handle+0x5/0x390
[<ffffffff9a0d131b>] ? lock_release+0xab/0x330
[<ffffffff9a008062>] default_do_nmi+0x72/0x1c0
[<ffffffff9a0c925f>] ? cpuacct_account_field+0xcf/0x200
[<ffffffff9a008268>] do_nmi+0xb8/0x100
Lets fix this by restoring the use of local irq work for the nohz local
kick.
Reported-by: Catalin Iacob <iacobcatalin@gmail.com>
Reported-and-tested-by: Dave Jones <davej@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Convert all uses of __get_cpu_var for address calculation to use
this_cpu_ptr instead.
[Uses of __get_cpu_var with cpumask_var_t are no longer
handled by this patch]
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Convert uses of __get_cpu_var for creating a address from a percpu
offset to this_cpu_ptr.
The two cases where get_cpu_var is used to actually access a percpu
variable are changed to use this_cpu_read/raw_cpu_read.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
In highres mode, the tick reschedules itself unconditionally to the
next jiffies.
However while this clock reprogramming is relevant when the tick is
in periodic mode, it's not that interesting when we run in dynticks mode
because irq exit is likely going to overwrite the next tick to some
randomly deferred future.
So lets just get rid of this tick self rescheduling in dynticks mode.
This way we can avoid some clockevents double write in favourable
scenarios like when we stop the tick completely in idle while no other
hrtimer is pending.
Suggested-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
When we reach the end of the tick handler, we unconditionally reschedule
the next tick to the next jiffy. Then on irq exit, the nohz code
overrides that setting if needed and defers the next tick as far away in
the future as possible.
Now in the best dynticks case, when we actually don't need any tick in
the future (ie: expires == KTIME_MAX), low-res and high-res behave
differently. What we want in this case is to cancel the next tick
programmed by the previous one. That's what we do in high-res mode. OTOH
we lack a low-res mode equivalent of hrtimer_cancel() so we simply don't
do anything in this case and the next tick remains scheduled to jiffies + 1.
As a result, in low-res mode, when the dynticks code determines that no
tick is needed in the future, we can recursively get a spurious tick
every jiffy because then the next tick is always reprogrammed from the
tick handler and is never cancelled. And this can happen indefinetly
until some subsystem actually needs a precise tick in the future and only
then we eventually overwrite the previous tick handler setting to defer
the next tick.
We are fixing this by introducing the ONESHOT_STOPPED mode which will
let us pause a clockevent when no further interrupt is needed. Meanwhile
we can't expect all drivers to support this new mode.
So lets reduce much of the symptoms by skipping the nohz-blind tick
rescheduling from the tick-handler when the CPU is in dynticks mode.
That tick rescheduling wrongly assumed periodicity and the low-res
dynticks code can't cancel such decision. This breaks the recursive (and
thus the worst) part of the problem. In the worst case now, we'll get
only one extra tick due to uncancelled tick scheduled before we entered
dynticks mode.
This also removes a needless clockevent write on idle ticks. Since those
clock write are usually considered to be slow, it's a general win.
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Pull scheduler updates from Ingo Molnar:
- Move the nohz kick code out of the scheduler tick to a dedicated IPI,
from Frederic Weisbecker.
This necessiated quite some background infrastructure rework,
including:
* Clean up some irq-work internals
* Implement remote irq-work
* Implement nohz kick on top of remote irq-work
* Move full dynticks timer enqueue notification to new kick
* Move multi-task notification to new kick
* Remove unecessary barriers on multi-task notification
- Remove proliferation of wait_on_bit() action functions and allow
wait_on_bit_action() functions to support a timeout. (Neil Brown)
- Another round of sched/numa improvements, cleanups and fixes. (Rik
van Riel)
- Implement fast idling of CPUs when the system is partially loaded,
for better scalability. (Tim Chen)
- Restructure and fix the CPU hotplug handling code that may leave
cfs_rq and rt_rq's throttled when tasks are migrated away from a dead
cpu. (Kirill Tkhai)
- Robustify the sched topology setup code. (Peterz Zijlstra)
- Improve sched_feat() handling wrt. static_keys (Jason Baron)
- Misc fixes.
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
sched/fair: Fix 'make xmldocs' warning caused by missing description
sched: Use macro for magic number of -1 for setparam
sched: Robustify topology setup
sched: Fix sched_setparam() policy == -1 logic
sched: Allow wait_on_bit_action() functions to support a timeout
sched: Remove proliferation of wait_on_bit() action functions
sched/numa: Revert "Use effective_load() to balance NUMA loads"
sched: Fix static_key race with sched_feat()
sched: Remove extra static_key*() function indirection
sched/rt: Fix replenish_dl_entity() comments to match the current upstream code
sched: Transform resched_task() into resched_curr()
sched/deadline: Kill task_struct->pi_top_task
sched: Rework check_for_tasks()
sched/rt: Enqueue just unthrottled rt_rq back on the stack in __disable_runtime()
sched/fair: Disable runtime_enabled on dying rq
sched/numa: Change scan period code to match intent
sched/numa: Rework best node setting in task_numa_migrate()
sched/numa: Examine a task move when examining a task swap
sched/numa: Simplify task_numa_compare()
sched/numa: Use effective_load() to balance NUMA loads
...
Binding the grace-period kthreads to the timekeeping CPU resulted in
significant performance decreases for some workloads. For more detail,
see:
https://lkml.org/lkml/2014/6/3/395 for benchmark numbers
https://lkml.org/lkml/2014/6/4/218 for CPU statistics
It turns out that it is necessary to bind the grace-period kthreads
to the timekeeping CPU only when all but CPU 0 is a nohz_full CPU
on the one hand or if CONFIG_NO_HZ_FULL_SYSIDLE=y on the other.
In other cases, it suffices to bind the grace-period kthreads to the
set of non-nohz_full CPUs.
This commit therefore creates a tick_nohz_not_full_mask that is the
complement of tick_nohz_full_mask, and then binds the grace-period
kthread to the set of CPUs indicated by this new mask, which covers
the CONFIG_NO_HZ_FULL_SYSIDLE=n case. The CONFIG_NO_HZ_FULL_SYSIDLE=y
case still binds the grace-period kthreads to the timekeeping CPU.
This commit also includes the tick_nohz_full_enabled() check suggested
by Frederic Weisbecker.
Reported-by: Jet Chen <jet.chen@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Created housekeeping_affine() and housekeeping_mask per
fweisbec feedback. ]
Remotely kicking a full nohz CPU in order to make it re-evaluate its
next tick is currently implemented using the scheduler IPI.
However this bloats a scheduler fast path with an off-topic feature.
The scheduler tick was abused here for its cool "callable
anywhere/anytime" properties.
But now that the irq work subsystem can queue remote callbacks, it's
a perfect fit to safely queue IPIs when interrupts are disabled
without worrying about concurrent callers.
So lets implement remote kick on top of irq work. This is going to
be used when a new event requires the next tick to be recalculated:
more than 1 task competing on the CPU, timer armed, ...
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>