evie/android_kernel_oneplus_msm8998 - Gay Catgirls Forgejo: gay catgirls having sex

evie/android_kernel_oneplus_msm8998

Author	SHA1	Message	Date
Greg Kroah-Hartman	79e7124a51	This is the 4.4.123 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlqzaAQACgkQONu9yGCS aT6/iw//SWWvNvIYVs9SUcAk1lLRdxIPEJ2mR4CELU8S9WblInX8MS/w3+qGUGVt 9LzQUfxqbwpX0luel6QnyvME66qQI+VQ0CzA7Qtuzoy2UNlr5woW112nOzkBOuKt +KjHg3cUwPijljEZnhPq7eEv6TrHrb/ZsU4jjTLMrlL1u/TYfJQGrU+JAFpTbHto 9IyVu/F80E/Vln2GHwYidHUbWG70lYIY75J/CXvklbfrxuRpsrOhQla52YspzI2p B9Q5p6Ct7FIO1SEBdVdBYLMMBbrt/+dbQzHMRWyUtTudiCSgeQ+8JEWRo6Pu+dGs 1cW9uCJ+Q7AolQWicG/H80YBZczkB/p77PRCua2vVpx8e/FYSujP3E5T8Mnl7k81 mmSoNRF8SqYXSwWDCqd+c/ck4zPRKPM++eNpkor+M0QyT9MOfh9oq4vIW4HoZQEm j+mZWUNihLO+O7znNGLoQLWn2mzDLjcrfnM6DOaKc5FJht/7OG8yyaobINYoUCmA +1cQP0RsRrWo87zpcWoHKRSZ6Iw+YtEu3i7q2bqEdtb2Jc/88nQ5/eYcBLYDaPtw DZ8GXL8kxXCP2J2GumwHIBTY68yZPVBknABGo5odqZwfkSgjK5fL17MzWX10Occp 0U6r+IluHkRCSxcCuyS8w10+TPn/Rw26wJCstPYargfiw/twILA= =H2Hj -----END PGP SIGNATURE----- Merge 4.4.123 into android-4.4 Changes in 4.4.123 blkcg: fix double free of new_blkg in blkcg_init_queue Input: tsc2007 - check for presence and power down tsc2007 during probe staging: speakup: Replace BUG_ON() with WARN_ON(). staging: wilc1000: add check for kmalloc allocation failure. HID: reject input outside logical range only if null state is set drm: qxl: Don't alloc fbdev if emulation is not supported ath10k: fix a warning during channel switch with multiple vaps PCI/MSI: Stop disabling MSI/MSI-X in pci_device_shutdown() selinux: check for address length in selinux_socket_bind() perf sort: Fix segfault with basic block 'cycles' sort dimension i40e: Acquire NVM lock before reads on all devices i40e: fix ethtool to get EEPROM data from X722 interface perf tools: Make perf_event__synthesize_mmap_events() scale drivers: net: xgene: Fix hardware checksum setting drm: Defer disabling the vblank IRQ until the next interrupt (for instant-off) ath10k: disallow DFS simulation if DFS channel is not enabled perf probe: Return errno when not hitting any event HID: clamp input to logical range if no null state net/8021q: create device with all possible features in wanted_features ARM: dts: Adjust moxart IRQ controller and flags batman-adv: handle race condition for claims between gateways of: fix of_device_get_modalias returned length when truncating buffers solo6x10: release vb2 buffers in solo_stop_streaming() scsi: ipr: Fix missed EH wakeup media: i2c/soc_camera: fix ov6650 sensor getting wrong clock timers, sched_clock: Update timeout for clock wrap sysrq: Reset the watchdog timers while displaying high-resolution timers Input: qt1070 - add OF device ID table sched: act_csum: don't mangle TCP and UDP GSO packets ASoC: rcar: ssi: don't set SSICR.CKDV = 000 with SSIWSR.CONT spi: omap2-mcspi: poll OMAP2_MCSPI_CHSTAT_RXS for PIO transfer tcp: sysctl: Fix a race to avoid unexpected 0 window from space dmaengine: imx-sdma: add 1ms delay to ensure SDMA channel is stopped driver: (adm1275) set the m,b and R coefficients correctly for power mm: Fix false-positive VM_BUG_ON() in page_cache_{get,add}_speculative() blk-throttle: make sure expire time isn't too big f2fs: relax node version check for victim data in gc bonding: refine bond_fold_stats() wrap detection braille-console: Fix value returned by _braille_console_setup drm/vmwgfx: Fixes to vmwgfx_fb vxlan: vxlan dev should inherit lowerdev's gso_max_size NFC: nfcmrvl: Include unaligned.h instead of access_ok.h NFC: nfcmrvl: double free on error path ARM: dts: r8a7790: Correct parent of SSI[0-9] clocks ARM: dts: r8a7791: Correct parent of SSI[0-9] clocks powerpc: Avoid taking a data miss on every userspace instruction miss net/faraday: Add missing include of of.h ARM: dts: koelsch: Correct clock frequency of X2 DU clock input reiserfs: Make cancel_old_flush() reliable ALSA: firewire-digi00x: handle all MIDI messages on streaming packets fm10k: correctly check if interface is removed scsi: ses: don't get power status of SES device slot on probe apparmor: Make path_max parameter readonly iommu/iova: Fix underflow bug in __alloc_and_insert_iova_range video: ARM CLCD: fix dma allocation size drm/radeon: Fail fb creation from imported dma-bufs. drm/amdgpu: Fail fb creation from imported dma-bufs. (v2) coresight: Fixes coresight DT parse to get correct output port ID. MIPS: BPF: Quit clobbering callee saved registers in JIT code. MIPS: BPF: Fix multiple problems in JIT skb access helpers. MIPS: r2-on-r6-emu: Fix BLEZL and BGTZL identification MIPS: r2-on-r6-emu: Clear BLTZALL and BGEZALL debugfs counters regulator: isl9305: fix array size md/raid6: Fix anomily when recovering a single device in RAID6. usb: dwc2: Make sure we disconnect the gadget state usb: gadget: dummy_hcd: Fix wrong power status bit clear/reset in dummy_hub_control() drivers/perf: arm_pmu: handle no platform_device perf inject: Copy events when reordering events in pipe mode perf session: Don't rely on evlist in pipe mode scsi: sg: check for valid direction before starting the request scsi: sg: close race condition in sg_remove_sfp_usercontext() kprobes/x86: Fix kprobe-booster not to boost far call instructions kprobes/x86: Set kprobes pages read-only pwm: tegra: Increase precision in PWM rate calculation wil6210: fix memory access violation in wil_memcpy_from/toio_32 drm/edid: set ELD connector type in drm_edid_to_eld() video/hdmi: Allow "empty" HDMI infoframes HID: elo: clear BTN_LEFT mapping ARM: dts: exynos: Correct Trats2 panel reset line sched: Stop switched_to_rt() from sending IPIs to offline CPUs sched: Stop resched_cpu() from sending IPIs to offline CPUs test_firmware: fix setting old custom fw path back on exit net: xfrm: allow clearing socket xfrm policies. mtd: nand: fix interpretation of NAND_CMD_NONE in nand_command[_lp]() ARM: dts: am335x-pepper: Fix the audio CODEC's reset pin ARM: dts: omap3-n900: Fix the audio CODEC's reset pin ath10k: update tdls teardown state to target cpufreq: Fix governor module removal race clk: qcom: msm8916: fix mnd_width for codec_digcodec ath10k: fix invalid STS_CAP_OFFSET_MASK tools/usbip: fixes build with musl libc toolchain spi: sun6i: disable/unprepare clocks on remove scsi: core: scsi_get_device_flags_keyed(): Always return device flags scsi: devinfo: apply to HP XP the same flags as Hitachi VSP scsi: dh: add new rdac devices media: cpia2: Fix a couple off by one bugs veth: set peer GSO values drm/amdkfd: Fix memory leaks in kfd topology agp/intel: Flush all chipset writes after updating the GGTT mac80211_hwsim: enforce PS_MANUAL_POLL to be set after PS_ENABLED mac80211: remove BUG() when interface type is invalid ASoC: nuc900: Fix a loop timeout test ipvlan: add L2 check for packets arriving via virtual devices rcutorture/configinit: Fix build directory error message ima: relax requiring a file signature for new files with zero length selftests/x86/entry_from_vm86: Exit with 1 if we fail selftests/x86: Add tests for User-Mode Instruction Prevention selftests/x86: Add tests for the STR and SLDT instructions selftests/x86/entry_from_vm86: Add test cases for POPF x86/vm86/32: Fix POPF emulation x86/mm: Fix vmalloc_fault to use pXd_large ALSA: pcm: Fix UAF in snd_pcm_oss_get_formats() ALSA: hda - Revert power_save option default value ALSA: seq: Fix possible UAF in snd_seq_check_queue() ALSA: seq: Clear client entry before deleting else at closing drm/amdgpu/dce: Don't turn off DP sink when disconnected fs: Teach path_connected to handle nfs filesystems with multiple roots. lock_parent() needs to recheck if dentry got __dentry_kill'ed under it fs/aio: Add explicit RCU grace period when freeing kioctx fs/aio: Use RCU accessors for kioctx_table->table[] irqchip/gic-v3-its: Ensure nr_ites >= nr_lpis scsi: sg: fix SG_DXFER_FROM_DEV transfers scsi: sg: fix static checker warning in sg_is_valid_dxfer scsi: sg: only check for dxfer_len greater than 256M ARM: dts: LogicPD Torpedo: Fix I2C1 pinmux btrfs: alloc_chunk: fix DUP stripe size handling btrfs: Fix use-after-free when cleaning up fs_devs with a single stale device USB: gadget: udc: Add missing platform_device_put() on error in bdc_pci_probe() usb: gadget: bdc: 64-bit pointer capability check bpf: fix incorrect sign extension in check_alu_op() Linux 4.4.123 Change-Id: Ieb89411248f93522dde29edb8581f8ece22e33a7 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2018-03-22 09:57:28 +01:00
Paul E. McKenney	35be5af4d2	sched: Stop switched_to_rt() from sending IPIs to offline CPUs [ Upstream commit 2fe2582649aa2355f79acddb86bd4d6c5363eb63 ] The rcutorture test suite occasionally provokes a splat due to invoking rt_mutex_lock() which needs to boost the priority of a task currently sitting on a runqueue that belongs to an offline CPU: WARNING: CPU: 0 PID: 12 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x37/0x40 Modules linked in: CPU: 0 PID: 12 Comm: rcub/7 Not tainted 4.14.0-rc4+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 task: ffff9ed3de5f8cc0 task.stack: ffffbbf80012c000 RIP: 0010:native_smp_send_reschedule+0x37/0x40 RSP: 0018:ffffbbf80012fd10 EFLAGS: 00010082 RAX: 000000000000002f RBX: ffff9ed3dd9cb300 RCX: 0000000000000004 RDX: 0000000080000004 RSI: 0000000000000086 RDI: 00000000ffffffff RBP: ffffbbf80012fd10 R08: 000000000009da7a R09: 0000000000007b9d R10: 0000000000000001 R11: ffffffffbb57c2cd R12: 000000000000000d R13: ffff9ed3de5f8cc0 R14: 0000000000000061 R15: ffff9ed3ded59200 FS: 0000000000000000(0000) GS:ffff9ed3dea00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000080686f0 CR3: 000000001b9e0000 CR4: 00000000000006f0 Call Trace: resched_curr+0x61/0xd0 switched_to_rt+0x8f/0xa0 rt_mutex_setprio+0x25c/0x410 task_blocks_on_rt_mutex+0x1b3/0x1f0 rt_mutex_slowlock+0xa9/0x1e0 rt_mutex_lock+0x29/0x30 rcu_boost_kthread+0x127/0x3c0 kthread+0x104/0x140 ? rcu_report_unblock_qs_rnp+0x90/0x90 ? kthread_create_on_node+0x40/0x40 ret_from_fork+0x22/0x30 Code: f0 00 0f 92 c0 84 c0 74 14 48 8b 05 34 74 c5 00 be fd 00 00 00 ff 90 a0 00 00 00 5d c3 89 fe 48 c7 c7 a0 c6 fc b9 e8 d5 b5 06 00 <0f> ff 5d c3 0f 1f 44 00 00 8b 05 a2 d1 13 02 85 c0 75 38 55 48 But the target task's priority has already been adjusted, so the only purpose of switched_to_rt() invoking resched_curr() is to wake up the CPU running some task that needs to be preempted by the boosted task. But the CPU is offline, which presumably means that the task must be migrated to some other CPU, and that this other CPU will undertake any needed preemption at the time of migration. Because the runqueue lock is held when resched_curr() is invoked, we know that the boosted task cannot go anywhere, so it is not necessary to invoke resched_curr() in this particular case. This commit therefore makes switched_to_rt() refrain from invoking resched_curr() when the target CPU is offline. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-03-22 09:23:27 +01:00
Greg Kroah-Hartman	20ddb25b3e	This is the 4.4.116 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlqHLN8ACgkQONu9yGCS aT7eyQ/+NGK3/MPgoqRtg8sEvr1CVk8VhH1BiBfiQPGXe/D4nqPrKQzQBBzsW8QX 6Z9PY7wDz9RgFkw+FoOyG0eLuYdgNYOelASdQ4kJzteVH8pB2GxxTbX0drttzV+F liNy0w39YLYxbjR4FavOSuDekd46dNQsHBvzTawaFKh0BEtQO+1uUGMg1LjMKVPn F9ry0mEPrOoC2+nRvU6QXIUZy6y4+Pgdda0sfGcO3yXwQev9HoW5h9qMCnGah30J D3Glt86dtpQcuqeIaXrfX+HnkvAOxTHjP8uRn3O7A7h8+WYBWq5Xms6A7EE9duNV 0UA8OZpvq0r0YSTmBFzrDexAcf/cXW8ajd/VKseI/d53iIauLV5FUaGldLJ3IQMc gYZ2uNxGTI4z3V+nIiVQ0NCm4kmqogVY8PvMlgUwiFVG2B088iYGZ7iTOQ9b7wBO VgDo0ouC/yDA8Lmz/A0l3SuvkJDNIPJit5lWzqCGRjk1F8WdPpI5C3ONfp8R3Lko sTllldOo982KW5up/fg5HfuMg1OjgXZtzO+/NlTtyTpSr9bb1OoniSROG8eEcMqO lKI1MB8Xx/pqqW1E8OOtb7A/8JPCBFzVV9xVGKwI0uZa2XOQeAwGruOe8Ub6nEpU 8w30DlSgy8MB1BPL6UGC6k+001k8jkohdl/qjpYb6aK55CfbhlA= =a3k5 -----END PGP SIGNATURE----- Merge 4.4.116 into android-4.4 Changes in 4.4.116 powerpc/bpf/jit: Disable classic BPF JIT on ppc64le powerpc/64: Fix flush_(d\|i)cache_range() called from modules powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC powerpc: Simplify module TOC handling powerpc/pseries: Add H_GET_CPU_CHARACTERISTICS flags & wrapper powerpc/64: Add macros for annotating the destination of rfid/hrfid powerpc/64s: Simple RFI macro conversions powerpc/64: Convert fast_exception_return to use RFI_TO_USER/KERNEL powerpc/64: Convert the syscall exit path to use RFI_TO_USER/KERNEL powerpc/64s: Convert slb_miss_common to use RFI_TO_USER/KERNEL powerpc/64s: Add support for RFI flush of L1-D cache powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti powerpc/pseries: Query hypervisor for RFI flush settings powerpc/powernv: Check device-tree for RFI flush settings powerpc/64s: Wire up cpu_show_meltdown() powerpc/64s: Allow control of RFI flush via debugfs ASoC: pcm512x: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE usbip: vhci_hcd: clear just the USB_PORT_STAT_POWER bit usbip: fix 3eee23c3ec14 tcp_socket address still in the status file net: cdc_ncm: initialize drvflags before usage ASoC: simple-card: Fix misleading error message ASoC: rsnd: don't call free_irq() on Parent SSI ASoC: rsnd: avoid duplicate free_irq() drm: rcar-du: Use the VBK interrupt for vblank events drm: rcar-du: Fix race condition when disabling planes at CRTC stop x86/asm: Fix inline asm call constraints for GCC 4.4 ip6mr: fix stale iterator net: igmp: add a missing rcu locking section qlcnic: fix deadlock bug r8169: fix RTL8168EP take too long to complete driver initialization. tcp: release sk_frag.page in tcp_disconnect vhost_net: stop device during reset owner media: soc_camera: soc_scale_crop: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE KEYS: encrypted: fix buffer overread in valid_master_desc() don't put symlink bodies in pagecache into highmem crypto: tcrypt - fix S/G table for test_aead_speed() x86/microcode/AMD: Do not load when running on a hypervisor x86/microcode: Do the family check first powerpc/pseries: include linux/types.h in asm/hvcall.h cifs: Fix missing put_xid in cifs_file_strict_mmap cifs: Fix autonegotiate security settings mismatch CIFS: zero sensitive data when freeing dmaengine: dmatest: fix container_of member in dmatest_callback x86/kaiser: fix build error with KASAN && !FUNCTION_GRAPH_TRACER kaiser: fix compile error without vsyscall netfilter: nf_queue: Make the queue_handler pernet posix-timer: Properly check sigevent->sigev_notify usb: gadget: uvc: Missing files for configfs interface sched/rt: Use container_of() to get root domain in rto_push_irq_work_func() sched/rt: Up the root domain ref count when passing it around via IPIs dccp: CVE-2017-8824: use-after-free in DCCP code media: dvb-usb-v2: lmedm04: Improve logic checking of warm start media: dvb-usb-v2: lmedm04: move ts2020 attach to dm04_lme2510_tuner mtd: cfi: convert inline functions to macros mtd: nand: brcmnand: Disable prefetch by default mtd: nand: Fix nand_do_read_oob() return value mtd: nand: sunxi: Fix ECC strength choice ubi: block: Fix locking for idr_alloc/idr_remove nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the mds NFS: Add a cond_resched() to nfs_commit_release_pages() NFS: commit direct writes even if they fail partially NFS: reject request for id_legacy key without auxdata kernfs: fix regression in kernfs_fop_write caused by wrong type ahci: Annotate PCI ids for mobile Intel chipsets as such ahci: Add PCI ids for Intel Bay Trail, Cherry Trail and Apollo Lake AHCI ahci: Add Intel Cannon Lake PCH-H PCI ID crypto: hash - introduce crypto_hash_alg_has_setkey() crypto: cryptd - pass through absence of ->setkey() crypto: poly1305 - remove ->setkey() method nsfs: mark dentry with DCACHE_RCUACCESS media: v4l2-ioctl.c: don't copy back the result for -ENOTTY vb2: V4L2_BUF_FLAG_DONE is set after DQBUF media: v4l2-compat-ioctl32.c: add missing VIDIOC_PREPARE_BUF media: v4l2-compat-ioctl32.c: fix the indentation media: v4l2-compat-ioctl32.c: move 'helper' functions to __get/put_v4l2_format32 media: v4l2-compat-ioctl32.c: avoid sizeof(type) media: v4l2-compat-ioctl32.c: copy m.userptr in put_v4l2_plane32 media: v4l2-compat-ioctl32.c: fix ctrl_is_pointer media: v4l2-compat-ioctl32.c: make ctrl_is_pointer work for subdevs media: v4l2-compat-ioctl32: Copy v4l2_window->global_alpha media: v4l2-compat-ioctl32.c: copy clip list in put_v4l2_window32 media: v4l2-compat-ioctl32.c: drop pr_info for unknown buffer type media: v4l2-compat-ioctl32.c: don't copy back the result for certain errors media: v4l2-compat-ioctl32.c: refactor compat ioctl32 logic crypto: caam - fix endless loop when DECO acquire fails arm: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2 watchdog: imx2_wdt: restore previous timeout after suspend+resume media: ts2020: avoid integer overflows on 32 bit machines media: cxusb, dib0700: ignore XC2028_I2C_FLUSH kernel/async.c: revert "async: simplify lowest_in_progress()" HID: quirks: Fix keyboard + touchpad on Toshiba Click Mini not working Bluetooth: btsdio: Do not bind to non-removable BCM43341 Revert "Bluetooth: btusb: fix QCA Rome suspend/resume" Bluetooth: btusb: Restore QCA Rome suspend/resume fix with a "rewritten" version signal/openrisc: Fix do_unaligned_access to send the proper signal signal/sh: Ensure si_signo is initialized in do_divide_error alpha: fix crash if pthread_create races with signal delivery alpha: fix reboot on Avanti platform xtensa: fix futex_atomic_cmpxchg_inatomic EDAC, octeon: Fix an uninitialized variable warning pktcdvd: Fix pkt_setup_dev() error path btrfs: Handle btrfs_set_extent_delalloc failure in fixup worker nvme: Fix managing degraded controllers ACPI: sbshc: remove raw pointer from printk() message ovl: fix failure to fsync lower dir mn10300/misalignment: Use SIGSEGV SEGV_MAPERR to report a failed user copy ftrace: Remove incorrect setting of glob search field Linux 4.4.116 Change-Id: Id000cb8d59b74de063902e9ad24dd07fe1b1694b Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2018-02-20 16:23:06 +01:00
Steven Rostedt (VMware)	911357aed6	sched/rt: Up the root domain ref count when passing it around via IPIs commit 364f56653708ba8bcdefd4f0da2a42904baa8eeb upstream. When issuing an IPI RT push, where an IPI is sent to each CPU that has more than one RT task scheduled on it, it references the root domain's rto_mask, that contains all the CPUs within the root domain that has more than one RT task in the runable state. The problem is, after the IPIs are initiated, the rq->lock is released. This means that the root domain that is associated to the run queue could be freed while the IPIs are going around. Add a sched_get_rd() and a sched_put_rd() that will increment and decrement the root domain's ref count respectively. This way when initiating the IPIs, the scheduler will up the root domain's ref count before releasing the rq->lock, ensuring that the root domain does not go away until the IPI round is complete. Reported-by: Pavan Kondeti <pkondeti@codeaurora.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 4bdced5c9a292 ("sched/rt: Simplify the IPI based RT balancing logic") Link: http://lkml.kernel.org/r/CAEU1=PkiHO35Dzna8EQqNSKW1fr1y1zRQ5y66X117MG06sQtNA@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-02-16 20:09:40 +01:00
Steven Rostedt (VMware)	af9de1a10f	sched/rt: Use container_of() to get root domain in rto_push_irq_work_func() commit ad0f1d9d65938aec72a698116cd73a980916895e upstream. When the rto_push_irq_work_func() is called, it looks at the RT overloaded bitmask in the root domain via the runqueue (rq->rd). The problem is that during CPU up and down, nothing here stops rq->rd from changing between taking the rq->rd->rto_lock and releasing it. That means the lock that is released is not the same lock that was taken. Instead of using this_rq()->rd to get the root domain, as the irq work is part of the root domain, we can simply get the root domain from the irq work that is passed to the routine: container_of(work, struct root_domain, rto_push_work) This keeps the root domain consistent. Reported-by: Pavan Kondeti <pkondeti@codeaurora.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 4bdced5c9a292 ("sched/rt: Simplify the IPI based RT balancing logic") Link: http://lkml.kernel.org/r/CAEU1=PkiHO35Dzna8EQqNSKW1fr1y1zRQ5y66X117MG06sQtNA@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-02-16 20:09:40 +01:00
Joel Fernandes	a81d322647	ANDROID: sched/rt: schedtune: Add boost retention to RT Boosted RT tasks can be deboosted quickly, this makes boost usless for RT tasks and causes lots of glitching. Use timers to prevent de-boost too soon and wait for long enough such that next enqueue happens after a threshold. While this can be solved in the governor, there are following advantages: - The approach used is governor-independent - Reduces boost group lock contention for frequently sleepers/wakers Note: Fixed build breakage due to schedfreq dependency which isn't used for RT anymore. Bug: 30210506 Change-Id: I428a2695cac06cc3458cdde0dea72315e4e66c00 Signed-off-by: Joel Fernandes <joelaf@google.com>	2018-02-01 11:19:48 -08:00
Greg Kroah-Hartman	79f138ac8c	This is the 4.4.107 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlo6J9YACgkQONu9yGCS aT7Cdg/8D7+btjAJPjk/suKUBSOpfSkYIoakaVSGf7r7gFv7SkMF023SikLUK+vN xa0FL1bYASzXuKgcxY9vB7ZkCDShrglqTCIpbWwHJhwS0fRGTGrMN2MM+opVgeoG 4ngnEPLue5TqZs3LVrpTySQFODlxnY3C4lpKopN7QNrcr1M5iiMELXCJu/qy6JhC ZBsRGUY8GHbouqC0YSqNlrv+C7zbfAlaawIBDSmYm0R4F+TuqoKZlBGAJ9lbALcZ pM8OaOXx9v471RhE7Tcsl3Eiz3vKHFKWxG/ZujkSqB21wPq6gd4VuP/wMuelX0GC rDTb/nn9Zhmv7UCOn62htlRLrAnSaJ9FlEK+u3TJ+XBGE9gmanH9IjIljCehEZeI 55Vm7q6IwQT2WvgTzqUoco4AYI37T9pqJ++I1E3jY/zk+bfCIQ1ZMpMXmwAUx738 m7boO38eRnyXMxqf4hfVQ4BFPkwaxdW/I3LDanE6U85Hw2nI2uZIPHRbVrC6gAPS aY9EUFEancxu4mW92mWWKrEnEWs5Jsb313ISAKSU75WwWmZ/tgsUtSzvY7XNPwYr G//HbPA5zNFdM1zSO4HQiLYjOqAiwNNAbDEWKoYr8MQpgYbTB4/SgT15mJzw6uTo WtKZsIWMYiXVW8cNhQiWAVUJVf66GMlc6kHcyHU7YHH3obZAYXU= =DNsf -----END PGP SIGNATURE----- Merge 4.4.107 into android-4.4 Changes in 4.4.107 crypto: hmac - require that the underlying hash algorithm is unkeyed crypto: salsa20 - fix blkcipher_walk API usage autofs: fix careless error in recent commit tracing: Allocate mask_str buffer dynamically USB: uas and storage: Add US_FL_BROKEN_FUA for another JMicron JMS567 ID USB: core: prevent malicious bNumInterfaces overflow usbip: fix stub_send_ret_submit() vulnerability to null transfer_buffer ceph: drop negative child dentries before try pruning inode's alias Bluetooth: btusb: driver to enable the usb-wakeup feature xhci: Don't add a virt_dev to the devs array before it's fully allocated sched/rt: Do not pull from current CPU if only one CPU to pull dmaengine: dmatest: move callback wait queue to thread context ext4: fix fdatasync(2) after fallocate(2) operation ext4: fix crash when a directory's i_size is too small KEYS: add missing permission check for request_key() destination mac80211: Fix addition of mesh configuration element usb: phy: isp1301: Add OF device ID table md-cluster: free md_cluster_info if node leave cluster userfaultfd: shmem: __do_fault requires VM_FAULT_NOPAGE userfaultfd: selftest: vm: allow to build in vm/ directory net: initialize msg.msg_flags in recvfrom net: bcmgenet: correct the RBUF_OVFL_CNT and RBUF_ERR_CNT MIB values net: bcmgenet: correct MIB access of UniMAC RUNT counters net: bcmgenet: reserved phy revisions must be checked first net: bcmgenet: power down internal phy if open or resume fails net: bcmgenet: Power up the internal PHY before probing the MII NFSD: fix nfsd_minorversion(.., NFSD_AVAIL) NFSD: fix nfsd_reset_versions for NFSv4. Input: i8042 - add TUXEDO BU1406 (N24_25BU) to the nomux list drm/omap: fix dmabuf mmap for dma_alloc'ed buffers netfilter: bridge: honor frag_max_size when refragmenting writeback: fix memory leak in wb_queue_work() net: wimax/i2400m: fix NULL-deref at probe dmaengine: Fix array index out of bounds warning in __get_unmap_pool() net: Resend IGMP memberships upon peer notification. mlxsw: reg: Fix SPVM max record count mlxsw: reg: Fix SPVMLR max record count intel_th: pci: Add Gemini Lake support openrisc: fix issue handling 8 byte get_user calls scsi: hpsa: update check for logical volume status scsi: hpsa: limit outstanding rescans fjes: Fix wrong netdevice feature flags drm/radeon/si: add dpm quirk for Oland sched/deadline: Make sure the replenishment timer fires in the next period sched/deadline: Throttle a constrained deadline task activated after the deadline sched/deadline: Use deadline instead of period when calculating overflow mmc: mediatek: Fixed bug where clock frequency could be set wrong drm/radeon: reinstate oland workaround for sclk afs: Fix missing put_page() afs: Populate group ID from vnode status afs: Adjust mode bits processing afs: Flush outstanding writes when an fd is closed afs: Migrate vlocation fields to 64-bit afs: Prevent callback expiry timer overflow afs: Fix the maths in afs_fs_store_data() afs: Populate and use client modification time afs: Fix page leak in afs_write_begin() afs: Fix afs_kill_pages() net/mlx4_core: Avoid delays during VF driver device shutdown perf symbols: Fix symbols__fixup_end heuristic for corner cases efi/esrt: Cleanup bad memory map log messages NFSv4.1 respect server's max size in CREATE_SESSION btrfs: add missing memset while reading compressed inline extents target: Use system workqueue for ALUA transitions target: fix ALUA transition timeout handling target: fix race during implicit transition work flushes sfc: don't warn on successful change of MAC fbdev: controlfb: Add missing modes to fix out of bounds access video: udlfb: Fix read EDID timeout video: fbdev: au1200fb: Release some resources if a memory allocation fails video: fbdev: au1200fb: Return an error code if a memory allocation fails rtc: pcf8563: fix output clock rate dmaengine: ti-dma-crossbar: Correct am335x/am43xx mux value type PCI/PME: Handle invalid data when reading Root Status powerpc/powernv/cpufreq: Fix the frequency read by /proc/cpuinfo netfilter: ipvs: Fix inappropriate output of procfs powerpc/opal: Fix EBUSY bug in acquiring tokens powerpc/ipic: Fix status get and status clear target/iscsi: Fix a race condition in iscsit_add_reject_from_cmd() iscsi-target: fix memory leak in lio_target_tiqn_addtpg() target:fix condition return in core_pr_dump_initiator_port() target/file: Do not return error for UNMAP if length is zero arm-ccn: perf: Prevent module unload while PMU is in use crypto: tcrypt - fix buffer lengths in test_aead_speed() mm: Handle 0 flags in _calc_vm_trans() macro clk: mediatek: add the option for determining PLL source clock clk: imx6: refine hdmi_isfr's parent to make HDMI work on i.MX6 SoCs w/o VPU clk: tegra: Fix cclk_lp divisor register ppp: Destroy the mutex when cleanup thermal/drivers/step_wise: Fix temperature regulation misbehavior GFS2: Take inode off order_write list when setting jdata flag bcache: explicitly destroy mutex while exiting bcache: fix wrong cache_misses statistics l2tp: cleanup l2tp_tunnel_delete calls xfs: fix log block underflow during recovery cycle verification xfs: fix incorrect extent state in xfs_bmap_add_extent_unwritten_real PCI: Detach driver before procfs & sysfs teardown on device remove scsi: hpsa: cleanup sas_phy structures in sysfs when unloading scsi: hpsa: destroy sas transport properties before scsi_host powerpc/perf/hv-24x7: Fix incorrect comparison in memord tty fix oops when rmmod 8250 usb: musb: da8xx: fix babble condition handling pinctrl: adi2: Fix Kconfig build problem raid5: Set R5_Expanded on parity devices as well as data. scsi: scsi_devinfo: Add REPORTLUN2 to EMC SYMMETRIX blacklist entry vt6655: Fix a possible sleep-in-atomic bug in vt6655_suspend scsi: sd: change manage_start_stop to bool in sysfs interface scsi: sd: change allow_restart to bool in sysfs interface scsi: bfa: integer overflow in debugfs udf: Avoid overflow when session starts at large offset macvlan: Only deliver one copy of the frame to the macvlan interface RDMA/cma: Avoid triggering undefined behavior IB/ipoib: Grab rtnl lock on heavy flush when calling ndo_open/stop ath9k: fix tx99 potential info leak Linux 4.4.107 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2017-12-20 10:49:07 +01:00
Steven Rostedt	af36d95af5	sched/rt: Do not pull from current CPU if only one CPU to pull commit f73c52a5bcd1710994e53fbccc378c42b97a06b6 upstream. Daniel Wagner reported a crash on the BeagleBone Black SoC. This is a single CPU architecture, and does not have a functional arch_send_call_function_single_ipi() implementation which can crash the kernel if that is called. As it only has one CPU, it shouldn't be called, but if the kernel is compiled for SMP, the push/pull RT scheduling logic now calls it for irq_work if the one CPU is overloaded, it can use that function to call itself and crash the kernel. Ideally, we should disable the SCHED_FEAT(RT_PUSH_IPI) if the system only has a single CPU. But SCHED_FEAT is a constant if sched debugging is turned off. Another fix can also be used, and this should also help with normal SMP machines. That is, do not initiate the pull code if there's only one RT overloaded CPU, and that CPU happens to be the current CPU that is scheduling in a lower priority task. Even on a system with many CPUs, if there's many RT tasks waiting to run on a single CPU, and that CPU schedules in another RT task of lower priority, it will initiate the PULL logic in case there's a higher priority RT task on another CPU that is waiting to run. But if there is no other CPU with waiting RT tasks, it will initiate the RT pull logic on itself (as it still has RT tasks waiting to run). This is a wasted effort. Not only does this help with SMP code where the current CPU is the only one with RT overloaded tasks, it should also solve the issue that Daniel encountered, because it will prevent the PULL logic from executing, as there's only one CPU on the system, and the check added here will cause it to exit the RT pull code. Reported-by: Daniel Wagner <wagi@monom.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-rt-users <linux-rt-users@vger.kernel.org> Fixes: 4bdced5c9 ("sched/rt: Simplify the IPI based RT balancing logic") Link: http://lkml.kernel.org/r/20171202130454.4cbbfe8d@vmware.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-20 10:04:52 +01:00
Greg Kroah-Hartman	9fbf3d7374	This is the 4.4.103 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlofw0sACgkQONu9yGCS aT4MPBAAo85uk2d6CXKRkNl3qKWtiStKXUet+NJFVr4GotOeg6ul9yul5jcs4pvl BJYnBh2LE77oDCOUKaSKI/0nDOHJs9n5m8GxjvG6cAvfn9RdgNm6kCCxNQFEhpNT IrmRrmCMd3aKPNrdz2Cbd4qHzNr0JuIv/bykNHDA/rw+PkQeLzZgiGIw9ftg1yHJ npzNLCjfVDPRy4qUCDYSS7+p83oHpWq3tHfha7M1S5HphsjVWjG79ABIKkN8w86z 5KnY3dqt5tqO4w0gZzKXv0gg4IJS62YqeJbF/dSefASvnBkINIzxBOEu0+xOFQ5t ezKkukpe8ivX4eUP2ruF9jAjVLCPYCm6UaWbYQZBAAf04KHC09uXDjB4wdGCINt6 tdOgfm60OsPHUFjx9KBn8M81Iabq8DYNubp+naG2U/j7lGzh3+mvyAlzQKetXMct b69skOxrjfT+2cCYeqz0UupHJigi5VLjX8hjpraXJA9oEwdS5gr9CfckEN3aUysu YmQ2LtgGuglUdV3Lc4QptFxRDoKna3E/Gx6rzMDPtRdV1L6dn9CULRz+Pw4T+nWl m6Ly9QXJVmC+d6fPW7cOEytPKRIqAUHSXQZxcPNPEcaPxD9CPWGO6TJLanc0BNYS g7u9kLA2fWmWnAkvEosP8lxJlQvgorhkXdCpEWuL+mAbnaImpts= =2wPT -----END PGP SIGNATURE----- Merge 4.4.103 into android-4.4 Changes in 4.4.103 s390: fix transactional execution control register handling s390/runtime instrumention: fix possible memory corruption s390/disassembler: add missing end marker for e7 table s390/disassembler: increase show_code buffer size ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER AF_VSOCK: Shrink the area influenced by prepare_to_wait vsock: use new wait API for vsock_stream_sendmsg() sched: Make resched_cpu() unconditional lib/mpi: call cond_resched() from mpi_powm() loop x86/decoder: Add new TEST instruction pattern ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAE ARM: 8721/1: mm: dump: check hardware RO bit for LPAE MIPS: ralink: Fix MT7628 pinmux MIPS: ralink: Fix typo in mt7628 pinmux function ALSA: hda: Add Raven PCI ID dm bufio: fix integer overflow when limiting maximum cache size dm: fix race between dm_get_from_kobject() and __dm_destroy() MIPS: Fix an n32 core file generation regset support regression MIPS: BCM47XX: Fix LED inversion for WRT54GSv1 autofs: don't fail mount for transient error nilfs2: fix race condition that causes file system corruption eCryptfs: use after free in ecryptfs_release_messaging() bcache: check ca->alloc_thread initialized before wake up it isofs: fix timestamps beyond 2027 NFS: Fix typo in nomigration mount option nfs: Fix ugly referral attributes nfsd: deal with revoked delegations appropriately rtlwifi: rtl8192ee: Fix memory leak when loading firmware rtlwifi: fix uninitialized rtlhal->last_suspend_sec time ata: fixes kernel crash while tracing ata_eh_link_autopsy event ext4: fix interaction between i_size, fallocate, and delalloc after a crash ALSA: pcm: update tstamp only if audio_tstamp changed ALSA: usb-audio: Add sanity checks to FE parser ALSA: usb-audio: Fix potential out-of-bound access at parsing SU ALSA: usb-audio: Add sanity checks in v2 clock parsers ALSA: timer: Remove kernel warning at compat ioctl error paths ALSA: hda/realtek - Fix ALC700 family no sound issue fix a page leak in vhost_scsi_iov_to_sgl() error recovery fs/9p: Compare qid.path in v9fs_test_inode iscsi-target: Fix non-immediate TMR reference leak target: Fix QUEUE_FULL + SCSI task attribute handling KVM: nVMX: set IDTR and GDTR limits when loading L1 host state KVM: SVM: obey guest PAT SUNRPC: Fix tracepoint storage issues with svc_recv and svc_rqst_status clk: ti: dra7-atl-clock: Fix of_node reference counting clk: ti: dra7-atl-clock: fix child-node lookups libnvdimm, namespace: fix label initialization to use valid seq numbers libnvdimm, namespace: make 'resource' attribute only readable by root IB/srpt: Do not accept invalid initiator port names IB/srp: Avoid that a cable pull can trigger a kernel crash NFC: fix device-allocation error return i40e: Use smp_rmb rather than read_barrier_depends igb: Use smp_rmb rather than read_barrier_depends igbvf: Use smp_rmb rather than read_barrier_depends ixgbevf: Use smp_rmb rather than read_barrier_depends i40evf: Use smp_rmb rather than read_barrier_depends fm10k: Use smp_rmb rather than read_barrier_depends ixgbe: Fix skb list corruption on Power systems parisc: Fix validity check of pointer size argument in new CAS implementation powerpc/signal: Properly handle return value from uprobe_deny_signal() media: Don't do DMA on stack for firmware upload in the AS102 driver media: rc: check for integer overflow cx231xx-cards: fix NULL-deref on missing association descriptor media: v4l2-ctrl: Fix flags field on Control events sched/rt: Simplify the IPI based RT balancing logic fscrypt: lock mutex before checking for bounce page pool net/9p: Switch to wait_event_killable() PM / OPP: Add missing of_node_put(np) e1000e: Fix error path in link detection e1000e: Fix return value test e1000e: Separate signaling for link check/link up RDS: RDMA: return appropriate error on rdma map failures PCI: Apply _HPX settings only to relevant devices dmaengine: zx: set DMA_CYCLIC cap_mask bit net: Allow IP_MULTICAST_IF to set index to L3 slave net: 3com: typhoon: typhoon_init_one: make return values more specific net: 3com: typhoon: typhoon_init_one: fix incorrect return values drm/armada: Fix compile fail ath10k: fix incorrect txpower set by P2P_DEVICE interface ath10k: ignore configuring the incorrect board_id ath10k: fix potential memory leak in ath10k_wmi_tlv_op_pull_fw_stats() ath10k: set CTS protection VDEV param only if VDEV is up ALSA: hda - Apply ALC269_FIXUP_NO_SHUTUP on HDA_FIXUP_ACT_PROBE drm: Apply range restriction after color adjustment when allocation mac80211: Remove invalid flag operations in mesh TSF synchronization mac80211: Suppress NEW_PEER_CANDIDATE event if no room iio: light: fix improper return value staging: iio: cdc: fix improper return value spi: SPI_FSL_DSPI should depend on HAS_DMA netfilter: nft_queue: use raw_smp_processor_id() netfilter: nf_tables: fix oob access ASoC: rsnd: don't double free kctrl btrfs: return the actual error value from from btrfs_uuid_tree_iterate ASoC: wm_adsp: Don't overrun firmware file buffer when reading region data s390/kbuild: enable modversions for symbols exported from asm xen: xenbus driver must not accept invalid transaction ids Revert "sctp: do not peel off an assoc from one netns to another one" Linux 4.4.103 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2017-11-30 15:43:08 +00:00
Steven Rostedt (Red Hat)	cb1831a83e	sched/rt: Simplify the IPI based RT balancing logic commit 4bdced5c9a2922521e325896a7bbbf0132c94e56 upstream. When a CPU lowers its priority (schedules out a high priority task for a lower priority one), a check is made to see if any other CPU has overloaded RT tasks (more than one). It checks the rto_mask to determine this and if so it will request to pull one of those tasks to itself if the non running RT task is of higher priority than the new priority of the next task to run on the current CPU. When we deal with large number of CPUs, the original pull logic suffered from large lock contention on a single CPU run queue, which caused a huge latency across all CPUs. This was caused by only having one CPU having overloaded RT tasks and a bunch of other CPUs lowering their priority. To solve this issue, commit: `b6366f048e` ("sched/rt: Use IPI to trigger RT task push migration instead of pulling") changed the way to request a pull. Instead of grabbing the lock of the overloaded CPU's runqueue, it simply sent an IPI to that CPU to do the work. Although the IPI logic worked very well in removing the large latency build up, it still could suffer from a large number of IPIs being sent to a single CPU. On a 80 CPU box, I measured over 200us of processing IPIs. Worse yet, when I tested this on a 120 CPU box, with a stress test that had lots of RT tasks scheduling on all CPUs, it actually triggered the hard lockup detector! One CPU had so many IPIs sent to it, and due to the restart mechanism that is triggered when the source run queue has a priority status change, the CPU spent minutes! processing the IPIs. Thinking about this further, I realized there's no reason for each run queue to send its own IPI. As all CPUs with overloaded tasks must be scanned regardless if there's one or many CPUs lowering their priority, because there's no current way to find the CPU with the highest priority task that can schedule to one of these CPUs, there really only needs to be one IPI being sent around at a time. This greatly simplifies the code! The new approach is to have each root domain have its own irq work, as the rto_mask is per root domain. The root domain has the following fields attached to it: rto_push_work - the irq work to process each CPU set in rto_mask rto_lock - the lock to protect some of the other rto fields rto_loop_start - an atomic that keeps contention down on rto_lock the first CPU scheduling in a lower priority task is the one to kick off the process. rto_loop_next - an atomic that gets incremented for each CPU that schedules in a lower priority task. rto_loop - a variable protected by rto_lock that is used to compare against rto_loop_next rto_cpu - The cpu to send the next IPI to, also protected by the rto_lock. When a CPU schedules in a lower priority task and wants to make sure overloaded CPUs know about it. It increments the rto_loop_next. Then it atomically sets rto_loop_start with a cmpxchg. If the old value is not "0", then it is done, as another CPU is kicking off the IPI loop. If the old value is "0", then it will take the rto_lock to synchronize with a possible IPI being sent around to the overloaded CPUs. If rto_cpu is greater than or equal to nr_cpu_ids, then there's either no IPI being sent around, or one is about to finish. Then rto_cpu is set to the first CPU in rto_mask and an IPI is sent to that CPU. If there's no CPUs set in rto_mask, then there's nothing to be done. When the CPU receives the IPI, it will first try to push any RT tasks that is queued on the CPU but can't run because a higher priority RT task is currently running on that CPU. Then it takes the rto_lock and looks for the next CPU in the rto_mask. If it finds one, it simply sends an IPI to that CPU and the process continues. If there's no more CPUs in the rto_mask, then rto_loop is compared with rto_loop_next. If they match, everything is done and the process is over. If they do not match, then a CPU scheduled in a lower priority task as the IPI was being passed around, and the process needs to start again. The first CPU in rto_mask is sent the IPI. This change removes this duplication of work in the IPI logic, and greatly lowers the latency caused by the IPIs. This removed the lockup happening on the 120 CPU machine. It also simplifies the code tremendously. What else could anyone ask for? Thanks to Peter Zijlstra for simplifying the rto_loop_start atomic logic and supplying me with the rto_start_trylock() and rto_start_unlock() helper functions. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Clark Williams <williams@redhat.com> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott Wood <swood@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170424114732.1aac6dc4@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:37:25 +00:00
Todd Kjos	3822fe484c	Revert "ANDROID: sched/rt: schedtune: Add boost retention to RT" This reverts commit `d194ba5d71`. Reason for revert: Broke some builds. Will fix and resubmit. Change-Id: I4e6fa1562346eda1bbf058f1d5ace5ba6256ce07	2017-11-08 00:43:53 +00:00
Viresh Kumar	df147c9e33	cpufreq: Drop schedfreq governor We all should be using (and improving) the schedutil governor now. Get rid of the non-upstream governor. Tested on Hikey. Change-Id: Ic660756536e5da51952738c3c18b94e31f58cd57 Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2017-11-07 23:57:47 +00:00
Joel Fernandes	d194ba5d71	ANDROID: sched/rt: schedtune: Add boost retention to RT Boosted RT tasks can be deboosted quickly, this makes boost usless for RT tasks and causes lots of glitching. Use timers to prevent de-boost too soon and wait for long enough such that next enqueue happens after a threshold. While this can be solved in the governor, there are following advantages: - The approach used is governor-independent - Reduces boost group lock contention for frequently sleepers/wakers - Works with schedfreq without any other schedfreq hacks. Bug: 30210506 Change-Id: I41788b235586988be446505deb7c0529758a9898 Signed-off-by: Joel Fernandes <joelaf@google.com>	2017-11-07 23:47:42 +00:00
Joel Fernandes	cd04e987d1	ANDROID: sched/rt: add schedtune accounting This patch adds schedtune enqueue/dequeue to RT scheduling class. Change-Id: If416e64319d62191f3aedd675d3e9a21fe2102fb Signed-off-by: Joel Fernandes <joelaf@google.com>	2017-11-07 23:47:17 +00:00
Brendan Jackman	38ddcff85a	FROMLIST: sched/fair: Use wake_q length as a hint for wake_wide (from https://patchwork.kernel.org/patch/9895261/) This patch adds a parameter to select_task_rq, sibling_count_hint allowing the caller, where it has this information, to inform the sched_class the number of tasks that are being woken up as part of the same event. The wake_q mechanism is one case where this information is available. select_task_rq_fair can then use the information to detect that it needs to widen the search space for task placement in order to avoid overloading the last-level cache domain's CPUs. * * * The reason I am investigating this change is the following use case on ARM big.LITTLE (asymmetrical CPU capacity): 1 task per CPU, which all repeatedly do X amount of work then pthread_barrier_wait (i.e. sleep until the last task finishes its X and hits the barrier). On big.LITTLE, the tasks which get a "big" CPU finish faster, and then those CPUs pull over the tasks that are still running: v CPU v ->time-> ------------- 0 (big) 11111 /333 ------------- 1 (big) 22222 /444\| ------------- 2 (LITTLE) 333333/ ------------- 3 (LITTLE) 444444/ ------------- Now when task 4 hits the barrier (at \|) and wakes the others up, there are 4 tasks with prev_cpu=<big> and 0 tasks with prev_cpu=<little>. want_affine therefore means that we'll only look in CPUs 0 and 1 (sd_llc), so tasks will be unnecessarily coscheduled on the bigs until the next load balance, something like this: v CPU v ->time-> ------------------------ 0 (big) 11111 /333 31313\33333 ------------------------ 1 (big) 22222 /444\|424\4444444 ------------------------ 2 (LITTLE) 333333/ \222222 ------------------------ 3 (LITTLE) 444444/ \1111 ------------------------ ^^^ underutilization So, I'm trying to get want_affine = 0 for these tasks. I don't _think_ any incarnation of the wakee_flips mechanism can help us here because which task is waker and which tasks are wakees generally changes with each iteration. However pthread_barrier_wait (or more accurately FUTEX_WAKE) has the nice property that we know exactly how many tasks are being woken, so we can cheat. It might be a disadvantage that we "widen" _every_ task that's woken in an event, while select_idle_sibling would work fine for the first sd_llc_size - 1 tasks. IIUC, if wake_affine() behaves correctly this trick wouldn't be necessary on SMP systems, so it might be best guarded by the presence of SD_ASYM_CPUCAPACITY? * * * Final note.. In order to observe "perfect" behaviour for this use case, I also had to disable the TTWU_QUEUE sched feature. Suppose during the wakeup above we are working through the work queue and have placed tasks 3 and 2, and are about to place task 1: v CPU v ->time-> -------------- 0 (big) 11111 /333 3 -------------- 1 (big) 22222 /444\|4 -------------- 2 (LITTLE) 333333/ 2 -------------- 3 (LITTLE) 444444/ <- Task 1 should go here -------------- If TTWU_QUEUE is enabled, we will not yet have enqueued task 2 (having instead sent a reschedule IPI) or attached its load to CPU 2. So we are likely to also place task 1 on cpu 2. Disabling TTWU_QUEUE means that we enqueue task 2 before placing task 1, solving this issue. TTWU_QUEUE is there to minimise rq lock contention, and I guess that this contention is less of an issue on big.LITTLE systems since they have relatively few CPUs, which suggests the trade-off makes sense here. Change-Id: I2080302839a263e0841a89efea8589ea53bbda9c Signed-off-by: Brendan Jackman <brendan.jackman@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Matt Fleming <matt@codeblueprint.co.uk>	2017-10-27 18:50:59 +00:00
Olav Haugan	ec888d46d8	sched: Update task->on_rq when tasks are moving between runqueues Task->on_rq has three states: 0 - Task is not on runqueue (rq) 1 (TASK_ON_RQ_QUEUED) - Task is on rq 2 (TASK_ON_RQ_MIGRATING) - Task is on rq but in the process of being migrated to another rq When a task is moving between rqs task->on_rq state should be TASK_ON_RQ_MIGRATING in order for WALT to account rq's cumulative runnable average correctly. Without such state marking for all the classes, WALT's update_history() would try to fixup task's demand which was never contributed to any of CPUs during migration. Change-Id: Iced3428f3924fe8ab5d0075698273ead04f12d5b Signed-off-by: Olav Haugan <ohaugan@codeaurora.org> [joonwoop: Reinforced changelog to explain why this is needed by WALT. Fixed conflicts in deadline.c] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2017-10-16 21:07:06 +00:00
Steve Muckle	f02702dcf2	sched: backport cpufreq hooks from 4.9-rc4 The scheduler cpufreq hooks are required by the schedutil cpufreq governor. Change-Id: Ied6c46262bb33b7e81bbb3d3d2761124e0c676b7 Signed-off-by: Steve Muckle <smuckle@linaro.org> [trivial cherry-picking fixes] Signed-off-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2017-06-02 08:01:50 -07:00
Greg Kroah-Hartman	3a75d7a947	Merge 4.4.59 into android-4.4 Changes in 4.4.59: xfrm: policy: init locks early xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder virtio_balloon: init 1st buffer in stats vq pinctrl: qcom: Don't clear status bit on irq_unmask c6x/ptrace: Remove useless PTRACE_SETREGSET implementation h8300/ptrace: Fix incorrect register transfer count mips/ptrace: Preserve previous registers for short regset write sparc/ptrace: Preserve previous registers for short regset write metag/ptrace: Preserve previous registers for short regset write metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS metag/ptrace: Reject partial NT_METAG_RPIPE writes fscrypt: remove broken support for detecting keyring key revocation sched/rt: Add a missing rescheduling point Linux 4.4.59 Change-Id: Ifa35307b133cbf29d0a0084bb78a7b0436182b53 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2017-04-06 19:01:38 +00:00
Sebastian Andrzej Siewior	2bed598769	sched/rt: Add a missing rescheduling point commit 619bd4a71874a8fd78eb6ccf9f272c5e98bcc7b7 upstream. Since the change in commit: `fd7a4bed18` ("sched, rt: Convert switched_{from, to}_rt() / prio_changed_rt() to balance callbacks") ... we don't reschedule a task under certain circumstances: Lets say task-A, SCHED_OTHER, is running on CPU0 (and it may run only on CPU0) and holds a PI lock. This task is removed from the CPU because it used up its time slice and another SCHED_OTHER task is running. Task-B on CPU1 runs at RT priority and asks for the lock owned by task-A. This results in a priority boost for task-A. Task-B goes to sleep until the lock has been made available. Task-A is already runnable (but not active), so it receives no wake up. The reality now is that task-A gets on the CPU once the scheduler decides to remove the current task despite the fact that a high priority task is enqueued and waiting. This may take a long time. The desired behaviour is that CPU0 immediately reschedules after the priority boost which made task-A the task with the lowest priority. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `fd7a4bed18` ("sched, rt: Convert switched_{from, to}_rt() prio_changed_rt() to balance callbacks") Link: http://lkml.kernel.org/r/20170124144006.29821-1-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-03-31 09:49:54 +02:00
Matt Wagantall	2a4445395f	sched/rt: Add Kconfig option to enable panicking for RT throttling This may be useful for detecting and debugging RT throttling issues. Change-Id: I5807a897d11997d76421c1fcaa2918aad988c6c9 Signed-off-by: Matt Wagantall <mattw@codeaurora.org> [rameezmustafa@codeaurora.org]: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [jstultz: forwardported to 4.4] Signed-off-by: John Stultz <john.stultz@linaro.org>	2016-08-11 14:26:55 -07:00
Matt Wagantall	989f33f789	sched/rt: print RT tasks when RT throttling is activated Existing debug prints do not provide any clues about which tasks may have triggered RT throttling. Print the names and PIDs of all tasks on the throttled rt_rq to help narrow down the source of the problem. Change-Id: I180534c8a647254ed38e89d0c981a8f8bccd741c Signed-off-by: Matt Wagantall <mattw@codeaurora.org> [rameezmustafa@codeaurora.org]: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>	2016-08-11 14:26:54 -07:00
Srivatsa Vaddagiri	efb86bd08a	sched: Introduce Window Assisted Load Tracking (WALT) use a window based view of time in order to track task demand and CPU utilization in the scheduler. Window Assisted Load Tracking (WALT) implementation credits: Srivatsa Vaddagiri, Steve Muckle, Syed Rameez Mustafa, Joonwoo Park, Pavan Kumar Kondeti, Olav Haugan 2016-03-06: Integration with EAS/refactoring by Vikram Mulukutla and Todd Kjos Change-Id: I21408236836625d4e7d7de1843d20ed5ff36c708 Includes fixes for issues: eas/walt: Use walt_ktime_clock() instead of ktime_get_ns() to avoid a race resulting in watchdog resets BUG: 29353986 Change-Id: Ic1820e22a136f7c7ebd6f42e15f14d470f6bbbdb Handle walt accounting anomoly during resume During resume, there is a corner case where on wakeup, a task's prev_runnable_sum can go negative. This is a workaround that fixes the condition and warns (instead of crashing). BUG: 29464099 Change-Id: I173e7874324b31a3584435530281708145773508 Signed-off-by: Todd Kjos <tkjos@google.com> Signed-off-by: Srinath Sridharan <srinathsr@google.com> Signed-off-by: Juri Lelli <juri.lelli@arm.com> [jstultz: fwdported to 4.4] Signed-off-by: John Stultz <john.stultz@linaro.org>	2016-08-11 14:26:43 -07:00
Vincent Guittot	6e1e1ed872	sched: rt scheduler sets capacity requirement RT tasks don't provide any running constraints like deadline ones except their running priority. The only current usable input to estimate the capacity needed by RT tasks is the rt_avg metric. We use it to estimate the CPU capacity needed for the RT scheduler class. In order to monitor the evolution for RT task load, we must peridiocally check it during the tick. Then, we use the estimated capacity of the last activity to estimate the next one which can not be that accurate but is a good starting point without any impact on the wake up path of RT tasks. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Steve Muckle <smuckle@linaro.org>	2016-05-10 16:53:23 +08:00
Arnd Bergmann	89b411081d	sched/rt: Hide the push_irq_work_func() declaration The push_irq_work_func() function is conditionally defined only when both CONFIG_SMP and HAVE_RT_PUSH_IPI are defined, but the forward declaration remains visibile without HAVE_RT_PUSH_IPI, causing a gcc warning in ARM64 allnoconfig: kernel/sched/rt.c:68:13: warning: 'push_irq_work_func' declared 'static' but never defined [-Wunused-function] This changes the code to use the same condition for both the declaration and the function definition, which gets rid of the warning. As Peter Zijlstra, we can possibly get rid of the whole HAVE_RT_PUSH_IPI thing after: `8053871d0f` ("smp: Fix smp_call_function_single_async() locking") Until that is done, this patch can be used to avoid the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `b6366f048e` ("sched/rt: Use IPI to trigger RT task push migration instead of pulling") Link: http://lkml.kernel.org/r/3828565.oKfGk7yNIT@wuerfel Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-11-23 09:25:08 +01:00
Juri Lelli	269b26a5ef	sched/rt: Make (do_)balance_runtime() return void The return value of (do_)balance_runtime() is not consumed by anybody. Make them return void. Signed-off-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1441188096-23021-5-git-send-email-juri.lelli@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-23 09:51:26 +02:00
Peter Zijlstra	6c37067e27	sched: Change the sched_class::set_cpus_allowed() calling context Change the calling context of sched_class::set_cpus_allowed() such that we can assume the task is inactive. This allows us to easily make changes that affect accounting done by enqueue/dequeue. This does in fact completely remove set_cpus_allowed_rt() and greatly reduces set_cpus_allowed_dl(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dedekind1@gmail.com Cc: juri.lelli@arm.com Cc: mgorman@suse.de Cc: riel@redhat.com Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/20150515154833.667516139@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-12 12:06:10 +02:00
Peter Zijlstra	c5b2803840	sched: Make sched_class::set_cpus_allowed() unconditional Give every class a set_cpus_allowed() method, this enables some small optimization in the RT,DL implementation by avoiding a double cpumask_weight() call. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dedekind1@gmail.com Cc: juri.lelli@arm.com Cc: mgorman@suse.de Cc: riel@redhat.com Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/20150515154833.614517487@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-12 12:06:09 +02:00
Xunlei Pang	8fd373548e	sched/rt: Remove a redundant condition from task_woken_rt() 'p' has been already queued at this point, so "!task_running(rq, p)" and "p->nr_cpus_allowed > 1" imply that "has_pushable_tasks(rq)" is true, so it can be removed. Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1435995563-3723-1-git-send-email-xlpang@126.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-03 12:21:19 +02:00
Peter Zijlstra	cbce1a6867	sched,lockdep: Employ lock pinning Employ the new lockdep lock pinning annotation to ensure no 'accidental' lock-breaks happen with rq->lock. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124744.003233193@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-06-19 00:25:27 +02:00
Peter Zijlstra	fd7a4bed18	sched, rt: Convert switched_{from, to}_rt() / prio_changed_rt() to balance callbacks Remove the direct {push,pull} balancing operations from switched_{from,to}_rt() / prio_changed_rt() and use the balance callback queue. Again, err on the side of too many reschedules; since too few is a hard bug while too many is just annoying. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124742.766832367@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-06-19 00:25:26 +02:00
Peter Zijlstra	8046d68062	sched,rt: Remove return value from pull_rt_task() In order to be able to use pull_rt_task() from a callback, we need to do away with the return value. Since the return value indicates if we should reschedule, do this inside the function. Since not all callers currently do this, this can increase the number of reschedules due rt balancing. Too many reschedules is not a correctness issues, too few are. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124742.679002000@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-06-19 00:25:26 +02:00
Peter Zijlstra	e3fca9e7cb	sched: Replace post_schedule with a balance callback list Generalize the post_schedule() stuff into a balance callback list. This allows us to more easily use it outside of schedule() and cross sched_class. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124742.424032725@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-06-19 00:25:26 +02:00
Thomas Gleixner	624bbdfac9	Merge branch 'timers/core' into sched/hrtimers Merge sched/core and timers/core so we can apply the sched balancing patch queue, which depends on both.	2015-06-19 00:17:47 +02:00
Peter Zijlstra	4cfafd3082	sched,perf: Fix periodic timers In the below two commits (see Fixes) we have periodic timers that can stop themselves when they're no longer required, but need to be (re)-started when their idle condition changes. Further complications is that we want the timer handler to always do the forward such that it will always correctly deal with the overruns, and we do not want to race such that the handler has already decided to stop, but the (external) restart sees the timer still active and we end up with a 'lost' timer. The problem with the current code is that the re-start can come before the callback does the forward, at which point the forward from the callback will WARN about forwarding an enqueued timer. Now, conceptually its easy to detect if you're before or after the fwd by comparing the expiration time against the current time. Of course, that's expensive (and racy) because we don't have the current time. Alternatively one could cache this state inside the timer, but then everybody pays the overhead of maintaining this extra state, and that is undesired. The only other option that I could see is the external timer_active variable, which I tried to kill before. I would love a nicer interface for this seemingly simple 'problem' but alas. Fixes: `272325c482` ("perf: Fix mux_interval hrtimer wreckage") Fixes: `77a4d1a1b9` ("sched: Cleanup bandwidth timers") Cc: pjt@google.com Cc: tglx@linutronix.de Cc: klamm@yandex-team.ru Cc: mingo@kernel.org Cc: bsegall@google.com Cc: hpa@zytor.com Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150514102311.GX21418@twins.programming.kicks-ass.net	2015-05-18 17:17:42 +02:00
Jason Low	316c1608d1	sched, timer: Convert usages of ACCESS_ONCE() in the scheduler to READ_ONCE()/WRITE_ONCE() ACCESS_ONCE doesn't work reliably on non-scalar types. This patch removes the rest of the existing usages of ACCESS_ONCE() in the scheduler, and use the new READ_ONCE() and WRITE_ONCE() APIs as appropriate. Signed-off-by: Jason Low <jason.low2@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Waiman Long <Waiman.Long@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1430251224-5764-2-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-08 12:11:32 +02:00
Peter Zijlstra	77a4d1a1b9	sched: Cleanup bandwidth timers Roman reported a 3 cpu lockup scenario involving __start_cfs_bandwidth(). The more I look at that code the more I'm convinced its crack, that entire __start_cfs_bandwidth() thing is brain melting, we don't need to cancel a timer before starting it, hrtimer_start() will happily remove the timer for you if its still enqueued. Removing that, removes a big part of the problem, no more ugly cancel loop to get stuck in. So now, if I understand things right, the entire reason you have this cfs_b->lock guarded ->timer_active nonsense is to make sure we don't accidentally lose the timer. It appears to me that it should be possible to guarantee that same by unconditionally (re)starting the timer when !queued. Because regardless what hrtimer::function will return, if we beat it to (re)enqueue the timer, it doesn't matter. Now, because hrtimers don't come with any serialization guarantees we must ensure both handler and (re)start loop serialize their access to the hrtimer to avoid both trying to forward the timer at the same time. Update the rt bandwidth timer to match. This effectively reverts: `09dc4ab039` ("sched/fair: Fix tg_set_cfs_bandwidth() deadlock on rq->lock"). Reported-by: Roman Gushchin <klamm@yandex-team.ru> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ben Segall <bsegall@google.com> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20150415095011.804589208@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-22 17:06:53 +02:00
Abel Vesa	07c54f7a7f	sched/core: Remove unused argument from init_[rt\|dl]_rq() Obviously, 'rq' is not used in these two functions, therefore, there is no reason for it to be passed as an argument. Signed-off-by: Abel Vesa <abelvesa@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1425383427-26244-1-git-send-email-abelvesa@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:42:55 +02:00
Steven Rostedt	b6366f048e	sched/rt: Use IPI to trigger RT task push migration instead of pulling When debugging the latencies on a 40 core box, where we hit 300 to 500 microsecond latencies, I found there was a huge contention on the runqueue locks. Investigating it further, running ftrace, I found that it was due to the pulling of RT tasks. The test that was run was the following: cyclictest --numa -p95 -m -d0 -i100 This created a thread on each CPU, that would set its wakeup in iterations of 100 microseconds. The -d0 means that all the threads had the same interval (100us). Each thread sleeps for 100us and wakes up and measures its latencies. cyclictest is maintained at: git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git What happened was another RT task would be scheduled on one of the CPUs that was running our test, when the other CPU tests went to sleep and scheduled idle. This caused the "pull" operation to execute on all these CPUs. Each one of these saw the RT task that was overloaded on the CPU of the test that was still running, and each one tried to grab that task in a thundering herd way. To grab the task, each thread would do a double rq lock grab, grabbing its own lock as well as the rq of the overloaded CPU. As the sched domains on this box was rather flat for its size, I saw up to 12 CPUs block on this lock at once. This caused a ripple affect with the rq locks especially since the taking was done via a double rq lock, which means that several of the CPUs had their own rq locks held while trying to take this rq lock. As these locks were blocked, any wakeups or load balanceing on these CPUs would also block on these locks, and the wait time escalated. I've tried various methods to lessen the load, but things like an atomic counter to only let one CPU grab the task wont work, because the task may have a limited affinity, and we may pick the wrong CPU to take that lock and do the pull, to only find out that the CPU we picked isn't in the task's affinity. Instead of doing the PULL, I now have the CPUs that want the pull to send over an IPI to the overloaded CPU, and let that CPU pick what CPU to push the task to. No more need to grab the rq lock, and the push/pull algorithm still works fine. With this patch, the latency dropped to just 150us over a 20 hour run. Without the patch, the huge latencies would trigger in seconds. I've created a new sched feature called RT_PUSH_IPI, which is enabled by default. When RT_PUSH_IPI is not enabled, the old method of grabbing the rq locks and having the pulling CPU do the work is implemented. When RT_PUSH_IPI is enabled, the IPI is sent to the overloaded CPU to do a push. To enabled or disable this at run time: # mount -t debugfs nodev /sys/kernel/debug # echo RT_PUSH_IPI > /sys/kernel/debug/sched_features or # echo NO_RT_PUSH_IPI > /sys/kernel/debug/sched_features Update: This original patch would send an IPI to all CPUs in the RT overload list. But that could theoretically cause the reverse issue. That is, there could be lots of overloaded RT queues and one CPU lowers its priority. It would then send an IPI to all the overloaded RT queues and they could then all try to grab the rq lock of the CPU lowering its priority, and then we have the same problem. The latest design sends out only one IPI to the first overloaded CPU. It tries to push any tasks that it can, and then looks for the next overloaded CPU that can push to the source CPU. The IPIs stop when all overloaded CPUs that have pushable tasks that have priorities greater than the source CPU are covered. In case the source CPU lowers its priority again, a flag is set to tell the IPI traversal to restart with the first RT overloaded CPU after the source CPU. Parts-suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Joern Engel <joern@purestorage.com> Cc: Clark Williams <williams@redhat.com> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150318144946.2f3cc982@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-23 10:55:22 +01:00
Tim Chen	80e3d87b2c	sched/rt: Reduce rq lock contention by eliminating locking of non-feasible target This patch adds checks that prevens futile attempts to move rt tasks to a CPU with active tasks of equal or higher priority. This reduces run queue lock contention and improves the performance of a well known OLTP benchmark by 0.7%. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Shawn Bohrer <sbohrer@rgmadvisors.com> Cc: Suruchi Kadu <suruchi.a.kadu@intel.com> Cc: Doug Nelson<doug.nelson@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1421430374.2399.27.camel@schen9-desk2.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-30 19:38:49 +01:00
Peter Zijlstra	9edfbfed3f	sched/core: Rework rq->clock update skips The original purpose of rq::skip_clock_update was to avoid 'costly' clock updates for back to back wakeup-preempt pairs. The big problem with it has always been that the rq variable is unaware of the context and causes indiscrimiate clock skips. Rework the entire thing and create a sense of context by only allowing schedule() to skip clock updates. (XXX can we measure the cost of the added store?) By ensuring only schedule can ever skip an update, we guarantee we're never more than 1 tick behind on the update. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150105103554.432381549@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-14 13:34:20 +01:00
Wanpeng Li	6c1d9410f0	sched: Move p->nr_cpus_allowed check to select_task_rq() Move the p->nr_cpus_allowed check into kernel/sched/core.c: select_task_rq(). This change will make fair.c, rt.c, and deadline.c all start with the same logic. Suggested-and-Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "pang.xunlei" <pang.xunlei@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1415150077-59053-1-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 10:58:55 +01:00
Ingo Molnar	e9ac5f0fa8	Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying more changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 10:50:25 +01:00
Stanislaw Gruszka	6e998916df	sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency Commit `d670ec1317` "posix-cpu-timers: Cure SMP wobbles" fixes one glibc test case in cost of breaking another one. After that commit, calling clock_nanosleep(TIMER_ABSTIME, X) and then clock_gettime(&Y) can result of Y time being smaller than X time. Reproducer/tester can be found further below, it can be compiled and ran by: gcc -o tst-cpuclock2 tst-cpuclock2.c -pthread while ./tst-cpuclock2 ; do : ; done This reproducer, when running on a buggy kernel, will complain about "clock_gettime difference too small". Issue happens because on start in thread_group_cputimer() we initialize sum_exec_runtime of cputimer with threads runtime not yet accounted and then add the threads runtime to running cputimer again on scheduler tick, making it's sum_exec_runtime bigger than actual threads runtime. KOSAKI Motohiro posted a fix for this problem, but that patch was never applied: https://lkml.org/lkml/2013/5/26/191 . This patch takes different approach to cure the problem. It calls update_curr() when cputimer starts, that assure we will have updated stats of running threads and on the next schedule tick we will account only the runtime that elapsed from cputimer start. That also assure we have consistent state between cpu times of individual threads and cpu time of the process consisted by those threads. Full reproducer (tst-cpuclock2.c): #define _GNU_SOURCE #include <unistd.h> #include <sys/syscall.h> #include <stdio.h> #include <time.h> #include <pthread.h> #include <stdint.h> #include <inttypes.h> /* Parameters for the Linux kernel ABI for CPU clocks. / #define CPUCLOCK_SCHED 2 #define MAKE_PROCESS_CPUCLOCK(pid, clock) \ ((~(clockid_t) (pid) << 3) \| (clockid_t) (clock)) static pthread_barrier_t barrier; / Help advance the clock. / static void chew_cpu(void arg) { pthread_barrier_wait(&barrier); while (1) ; return NULL; } / Don't use the glibc wrapper. / static int do_nanosleep(int flags, const struct timespec req) { clockid_t clock_id = MAKE_PROCESS_CPUCLOCK(0, CPUCLOCK_SCHED); return syscall(SYS_clock_nanosleep, clock_id, flags, req, NULL); } static int64_t tsdiff(const struct timespec before, const struct timespec after) { int64_t before_i = before->tv_sec * 1000000000ULL + before->tv_nsec; int64_t after_i = after->tv_sec * 1000000000ULL + after->tv_nsec; return after_i - before_i; } int main(void) { int result = 0; pthread_t th; pthread_barrier_init(&barrier, NULL, 2); if (pthread_create(&th, NULL, chew_cpu, NULL) != 0) { perror("pthread_create"); return 1; } pthread_barrier_wait(&barrier); /* The test. / struct timespec before, after, sleeptimeabs; int64_t sleepdiff, diffabs; const struct timespec sleeptime = {.tv_sec = 0,.tv_nsec = 100000000 }; / The relative nanosleep. Not sure why this is needed, but its presence seems to make it easier to reproduce the problem. / if (do_nanosleep(0, &sleeptime) != 0) { perror("clock_nanosleep"); return 1; } / Get the current time. / if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &before) < 0) { perror("clock_gettime[2]"); return 1; } / Compute the absolute sleep time based on the current time. / uint64_t nsec = before.tv_nsec + sleeptime.tv_nsec; sleeptimeabs.tv_sec = before.tv_sec + nsec / 1000000000; sleeptimeabs.tv_nsec = nsec % 1000000000; / Sleep for the computed time. / if (do_nanosleep(TIMER_ABSTIME, &sleeptimeabs) != 0) { perror("absolute clock_nanosleep"); return 1; } / Get the time after the sleep. / if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &after) < 0) { perror("clock_gettime[3]"); return 1; } / The time after sleep should always be equal to or after the absolute sleep time passed to clock_nanosleep. / sleepdiff = tsdiff(&sleeptimeabs, &after); if (sleepdiff < 0) { printf("absolute clock_nanosleep woke too early: %" PRId64 "\n", sleepdiff); result = 1; printf("Before %llu.%09llu\n", before.tv_sec, before.tv_nsec); printf("After %llu.%09llu\n", after.tv_sec, after.tv_nsec); printf("Sleep %llu.%09llu\n", sleeptimeabs.tv_sec, sleeptimeabs.tv_nsec); } / The difference between the timestamps taken before and after the clock_nanosleep call should be equal to or more than the duration of the sleep. */ diffabs = tsdiff(&before, &after); if (diffabs < sleeptime.tv_nsec) { printf("clock_gettime difference too small: %" PRId64 "\n", diffabs); result = 1; } pthread_cancel(th); return result; } Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20141112155843.GA24803@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 10:04:20 +01:00
Wanpeng Li	308a623a40	sched/rt: Clean up check_preempt_equal_prio() This patch checks if current can be pushed/pulled somewhere else in advance to make logic clear, the same behavior as dl class. - If current can't be migrated, useless to reschedule, let's hope task can move out. - If task is migratable, so let's not schedule it and see if it can be pushed or pulled somewhere else. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414708776-124078-1-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-04 07:17:52 +01:00
Linus Torvalds	0429fbc0bd	Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu consistent-ops changes from Tejun Heo: "Way back, before the current percpu allocator was implemented, static and dynamic percpu memory areas were allocated and handled separately and had their own accessors. The distinction has been gone for many years now; however, the now duplicate two sets of accessors remained with the pointer based ones - this_cpu_() - evolving various other operations over time. During the process, we also accumulated other inconsistent operations. This pull request contains Christoph's patches to clean up the duplicate accessor situation. __get_cpu_var() uses are replaced with with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr(). Unfortunately, the former sometimes is tricky thanks to C being a bit messy with the distinction between lvalues and pointers, which led to a rather ugly solution for cpumask_var_t involving the introduction of this_cpu_cpumask_var_ptr(). This converts most of the uses but not all. Christoph will follow up with the remaining conversions in this merge window and hopefully remove the obsolete accessors" 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits) irqchip: Properly fetch the per cpu offset percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write. percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t Revert "powerpc: Replace __get_cpu_var uses" percpu: Remove __this_cpu_ptr clocksource: Replace __this_cpu_ptr with raw_cpu_ptr sparc: Replace __get_cpu_var uses avr32: Replace __get_cpu_var with __this_cpu_write blackfin: Replace __get_cpu_var uses tile: Use this_cpu_ptr() for hardware counters tile: Replace __get_cpu_var uses powerpc: Replace __get_cpu_var uses alpha: Replace __get_cpu_var ia64: Replace __get_cpu_var uses s390: cio driver &__get_cpu_var replacements s390: Replace __get_cpu_var uses mips: Replace __get_cpu_var uses MIPS: Replace __get_cpu_var uses in FPU emulator. arm: Replace __this_cpu_ptr with raw_cpu_ptr ...	2014-10-15 07:48:18 +02:00
Kirill Tkhai	8aa6f0ebf4	sched/rt: Use resched_curr() in task_tick_rt() Some time ago PREEMPT_NEED_RESCHED was implemented, so reschedule technics is a little more difficult now. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140922183642.11015.66039.stgit@localhost Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-09-24 14:47:12 +02:00
Kirill Tkhai	f3f1768f89	sched/rt: Remove useless if from cleanup pick_next_task_rt() _pick_next_task_rt() never returns NULL. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1410529321.3569.26.camel@tkhai Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-09-19 12:35:20 +02:00
Christoph Lameter	4ba2968420	percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t __get_cpu_var can paper over differences in the definitions of cpumask_var_t and either use the address of the cpumask variable directly or perform a fetch of the address of the struct cpumask allocated elsewhere. This is important particularly when using per cpu cpumask_var_t declarations because in one case we have an offset into a per cpu area to handle and in the other case we need to fetch a pointer from the offset. This patch introduces a new macro this_cpu_cpumask_var_ptr() that is defined where cpumask_var_t is defined and performs the proper actions. All use cases where __get_cpu_var is used with cpumask_var_t are converted to the use of this_cpu_cpumask_var_ptr(). Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-08-28 08:58:57 -04:00
Kirill Tkhai	da0c1e65b5	sched: Add wrapper for checking task_struct::on_rq Implement task_on_rq_queued() and use it everywhere instead of on_rq check. No functional changes. The only exception is we do not use the wrapper in check_for_tasks(), because it requires to export task_on_rq_queued() in global header files. Next patch in series would return it back, so we do not twist it from here to there. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Kirill Tkhai <tkhai@yandex.ru> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1408528052.23412.87.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-08-20 14:52:59 +02:00
Kirill Tkhai	8875125efe	sched: Transform resched_task() into resched_curr() We always use resched_task() with rq->curr argument. It's not possible to reschedule any task but rq's current. The patch introduces resched_curr(struct rq *) to replace all of the repeating patterns. The main aim is cleanup, but there is a little size profit too: (before) $ size kernel/sched/built-in.o text data bss dec hex filename 155274 16445 7042 178761 2ba49 kernel/sched/built-in.o $ size vmlinux text data bss dec hex filename 7411490 1178376 991232 9581098 92322a vmlinux (after) $ size kernel/sched/built-in.o text data bss dec hex filename 155130 16445 7042 178617 2b9b9 kernel/sched/built-in.o $ size vmlinux text data bss dec hex filename 7411362 1178376 991232 9580970 9231aa vmlinux I was choosing between resched_curr() and resched_rq(), and the first name looks better for me. A little lie in Documentation/trace/ftrace.txt. I have not actually collected the tracing again. With a hope the patch won't make execution times much worse :) Signed-off-by: Kirill Tkhai <tkhai@yandex.ru> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20140628200219.1778.18735.stgit@localhost Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-07-16 13:38:19 +02:00

1 2 3