Commit graph

240 commits

Author SHA1 Message Date
Greg Kroah-Hartman
20ddb25b3e This is the 4.4.116 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlqHLN8ACgkQONu9yGCS
 aT7eyQ/+NGK3/MPgoqRtg8sEvr1CVk8VhH1BiBfiQPGXe/D4nqPrKQzQBBzsW8QX
 6Z9PY7wDz9RgFkw+FoOyG0eLuYdgNYOelASdQ4kJzteVH8pB2GxxTbX0drttzV+F
 liNy0w39YLYxbjR4FavOSuDekd46dNQsHBvzTawaFKh0BEtQO+1uUGMg1LjMKVPn
 F9ry0mEPrOoC2+nRvU6QXIUZy6y4+Pgdda0sfGcO3yXwQev9HoW5h9qMCnGah30J
 D3Glt86dtpQcuqeIaXrfX+HnkvAOxTHjP8uRn3O7A7h8+WYBWq5Xms6A7EE9duNV
 0UA8OZpvq0r0YSTmBFzrDexAcf/cXW8ajd/VKseI/d53iIauLV5FUaGldLJ3IQMc
 gYZ2uNxGTI4z3V+nIiVQ0NCm4kmqogVY8PvMlgUwiFVG2B088iYGZ7iTOQ9b7wBO
 VgDo0ouC/yDA8Lmz/A0l3SuvkJDNIPJit5lWzqCGRjk1F8WdPpI5C3ONfp8R3Lko
 sTllldOo982KW5up/fg5HfuMg1OjgXZtzO+/NlTtyTpSr9bb1OoniSROG8eEcMqO
 lKI1MB8Xx/pqqW1E8OOtb7A/8JPCBFzVV9xVGKwI0uZa2XOQeAwGruOe8Ub6nEpU
 8w30DlSgy8MB1BPL6UGC6k+001k8jkohdl/qjpYb6aK55CfbhlA=
 =a3k5
 -----END PGP SIGNATURE-----

Merge 4.4.116 into android-4.4

Changes in 4.4.116
	powerpc/bpf/jit: Disable classic BPF JIT on ppc64le
	powerpc/64: Fix flush_(d|i)cache_range() called from modules
	powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC
	powerpc: Simplify module TOC handling
	powerpc/pseries: Add H_GET_CPU_CHARACTERISTICS flags & wrapper
	powerpc/64: Add macros for annotating the destination of rfid/hrfid
	powerpc/64s: Simple RFI macro conversions
	powerpc/64: Convert fast_exception_return to use RFI_TO_USER/KERNEL
	powerpc/64: Convert the syscall exit path to use RFI_TO_USER/KERNEL
	powerpc/64s: Convert slb_miss_common to use RFI_TO_USER/KERNEL
	powerpc/64s: Add support for RFI flush of L1-D cache
	powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti
	powerpc/pseries: Query hypervisor for RFI flush settings
	powerpc/powernv: Check device-tree for RFI flush settings
	powerpc/64s: Wire up cpu_show_meltdown()
	powerpc/64s: Allow control of RFI flush via debugfs
	ASoC: pcm512x: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
	usbip: vhci_hcd: clear just the USB_PORT_STAT_POWER bit
	usbip: fix 3eee23c3ec14 tcp_socket address still in the status file
	net: cdc_ncm: initialize drvflags before usage
	ASoC: simple-card: Fix misleading error message
	ASoC: rsnd: don't call free_irq() on Parent SSI
	ASoC: rsnd: avoid duplicate free_irq()
	drm: rcar-du: Use the VBK interrupt for vblank events
	drm: rcar-du: Fix race condition when disabling planes at CRTC stop
	x86/asm: Fix inline asm call constraints for GCC 4.4
	ip6mr: fix stale iterator
	net: igmp: add a missing rcu locking section
	qlcnic: fix deadlock bug
	r8169: fix RTL8168EP take too long to complete driver initialization.
	tcp: release sk_frag.page in tcp_disconnect
	vhost_net: stop device during reset owner
	media: soc_camera: soc_scale_crop: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
	KEYS: encrypted: fix buffer overread in valid_master_desc()
	don't put symlink bodies in pagecache into highmem
	crypto: tcrypt - fix S/G table for test_aead_speed()
	x86/microcode/AMD: Do not load when running on a hypervisor
	x86/microcode: Do the family check first
	powerpc/pseries: include linux/types.h in asm/hvcall.h
	cifs: Fix missing put_xid in cifs_file_strict_mmap
	cifs: Fix autonegotiate security settings mismatch
	CIFS: zero sensitive data when freeing
	dmaengine: dmatest: fix container_of member in dmatest_callback
	x86/kaiser: fix build error with KASAN && !FUNCTION_GRAPH_TRACER
	kaiser: fix compile error without vsyscall
	netfilter: nf_queue: Make the queue_handler pernet
	posix-timer: Properly check sigevent->sigev_notify
	usb: gadget: uvc: Missing files for configfs interface
	sched/rt: Use container_of() to get root domain in rto_push_irq_work_func()
	sched/rt: Up the root domain ref count when passing it around via IPIs
	dccp: CVE-2017-8824: use-after-free in DCCP code
	media: dvb-usb-v2: lmedm04: Improve logic checking of warm start
	media: dvb-usb-v2: lmedm04: move ts2020 attach to dm04_lme2510_tuner
	mtd: cfi: convert inline functions to macros
	mtd: nand: brcmnand: Disable prefetch by default
	mtd: nand: Fix nand_do_read_oob() return value
	mtd: nand: sunxi: Fix ECC strength choice
	ubi: block: Fix locking for idr_alloc/idr_remove
	nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the mds
	NFS: Add a cond_resched() to nfs_commit_release_pages()
	NFS: commit direct writes even if they fail partially
	NFS: reject request for id_legacy key without auxdata
	kernfs: fix regression in kernfs_fop_write caused by wrong type
	ahci: Annotate PCI ids for mobile Intel chipsets as such
	ahci: Add PCI ids for Intel Bay Trail, Cherry Trail and Apollo Lake AHCI
	ahci: Add Intel Cannon Lake PCH-H PCI ID
	crypto: hash - introduce crypto_hash_alg_has_setkey()
	crypto: cryptd - pass through absence of ->setkey()
	crypto: poly1305 - remove ->setkey() method
	nsfs: mark dentry with DCACHE_RCUACCESS
	media: v4l2-ioctl.c: don't copy back the result for -ENOTTY
	vb2: V4L2_BUF_FLAG_DONE is set after DQBUF
	media: v4l2-compat-ioctl32.c: add missing VIDIOC_PREPARE_BUF
	media: v4l2-compat-ioctl32.c: fix the indentation
	media: v4l2-compat-ioctl32.c: move 'helper' functions to __get/put_v4l2_format32
	media: v4l2-compat-ioctl32.c: avoid sizeof(type)
	media: v4l2-compat-ioctl32.c: copy m.userptr in put_v4l2_plane32
	media: v4l2-compat-ioctl32.c: fix ctrl_is_pointer
	media: v4l2-compat-ioctl32.c: make ctrl_is_pointer work for subdevs
	media: v4l2-compat-ioctl32: Copy v4l2_window->global_alpha
	media: v4l2-compat-ioctl32.c: copy clip list in put_v4l2_window32
	media: v4l2-compat-ioctl32.c: drop pr_info for unknown buffer type
	media: v4l2-compat-ioctl32.c: don't copy back the result for certain errors
	media: v4l2-compat-ioctl32.c: refactor compat ioctl32 logic
	crypto: caam - fix endless loop when DECO acquire fails
	arm: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
	KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2
	watchdog: imx2_wdt: restore previous timeout after suspend+resume
	media: ts2020: avoid integer overflows on 32 bit machines
	media: cxusb, dib0700: ignore XC2028_I2C_FLUSH
	kernel/async.c: revert "async: simplify lowest_in_progress()"
	HID: quirks: Fix keyboard + touchpad on Toshiba Click Mini not working
	Bluetooth: btsdio: Do not bind to non-removable BCM43341
	Revert "Bluetooth: btusb: fix QCA Rome suspend/resume"
	Bluetooth: btusb: Restore QCA Rome suspend/resume fix with a "rewritten" version
	signal/openrisc: Fix do_unaligned_access to send the proper signal
	signal/sh: Ensure si_signo is initialized in do_divide_error
	alpha: fix crash if pthread_create races with signal delivery
	alpha: fix reboot on Avanti platform
	xtensa: fix futex_atomic_cmpxchg_inatomic
	EDAC, octeon: Fix an uninitialized variable warning
	pktcdvd: Fix pkt_setup_dev() error path
	btrfs: Handle btrfs_set_extent_delalloc failure in fixup worker
	nvme: Fix managing degraded controllers
	ACPI: sbshc: remove raw pointer from printk() message
	ovl: fix failure to fsync lower dir
	mn10300/misalignment: Use SIGSEGV SEGV_MAPERR to report a failed user copy
	ftrace: Remove incorrect setting of glob search field
	Linux 4.4.116

Change-Id: Id000cb8d59b74de063902e9ad24dd07fe1b1694b
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-02-20 16:23:06 +01:00
Steven Rostedt (VMware)
911357aed6 sched/rt: Up the root domain ref count when passing it around via IPIs
commit 364f56653708ba8bcdefd4f0da2a42904baa8eeb upstream.

When issuing an IPI RT push, where an IPI is sent to each CPU that has more
than one RT task scheduled on it, it references the root domain's rto_mask,
which contains all the CPUs within the root domain that have more than one RT
task in the runnable state. The problem is that, after the IPIs are initiated,
the rq->lock is released. This means that the root domain associated with the
run queue could be freed while the IPIs are going around.

Add a sched_get_rd() and a sched_put_rd() that will increment and decrement
the root domain's ref count respectively. This way when initiating the IPIs,
the scheduler will up the root domain's ref count before releasing the
rq->lock, ensuring that the root domain does not go away until the IPI round
is complete.
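
A minimal sketch of the two helpers (essentially what the upstream patch
adds; free_rootdomain() and the root domain's rcu head already exist):

  void sched_get_rd(struct root_domain *rd)
  {
  	atomic_inc(&rd->refcount);
  }

  void sched_put_rd(struct root_domain *rd)
  {
  	if (!atomic_dec_and_test(&rd->refcount))
  		return;

  	/* Last reference dropped: free the root domain after a grace period. */
  	call_rcu_sched(&rd->rcu, free_rootdomain);
  }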

Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 4bdced5c9a292 ("sched/rt: Simplify the IPI based RT balancing logic")
Link: http://lkml.kernel.org/r/CAEU1=PkiHO35Dzna8EQqNSKW1fr1y1zRQ5y66X117MG06sQtNA@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16 20:09:40 +01:00
Joel Fernandes
a81d322647 ANDROID: sched/rt: schedtune: Add boost retention to RT
Boosted RT tasks can be deboosted quickly; this makes boost useless
for RT tasks and causes lots of glitching. Use timers to prevent
de-boosting too soon, waiting long enough that the next enqueue
happens after a threshold.
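
A purely illustrative sketch of the idea (not the actual Android patch; the
timer field, hold period and schedtune helpers below are assumed names):

  /* Timer callback: the task did not wake again within the hold window. */
  static enum hrtimer_restart rt_boost_timeout(struct hrtimer *timer)
  {
  	struct task_struct *p = container_of(timer, struct task_struct,
  					     rt_boost_timer);	/* assumed field */

  	schedtune_dequeue_boost(p);	/* assumed helper: drop the boost now */
  	return HRTIMER_NORESTART;
  }

  static void rt_schedtune_dequeue(struct task_struct *p)
  {
  	/* Defer the de-boost instead of applying it at dequeue time. */
  	hrtimer_start(&p->rt_boost_timer, ms_to_ktime(RT_BOOST_HOLD_MS),
  		      HRTIMER_MODE_REL);
  }

  static void rt_schedtune_enqueue(struct task_struct *p)
  {
  	/* Re-enqueued within the hold window: keep the boost. */
  	hrtimer_cancel(&p->rt_boost_timer);
  }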

While this can be solved in the governor, there are following
advantages:
- The approach used is governor-independent
- Reduces boost group lock contention for frequently sleepers/wakers

Note:
Fixed build breakage due to schedfreq dependency which isn't used
for RT anymore.

Bug: 30210506

Change-Id: I428a2695cac06cc3458cdde0dea72315e4e66c00
Signed-off-by: Joel Fernandes <joelaf@google.com>
2018-02-01 11:19:48 -08:00
Greg Kroah-Hartman
9fbf3d7374 This is the 4.4.103 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlofw0sACgkQONu9yGCS
 aT4MPBAAo85uk2d6CXKRkNl3qKWtiStKXUet+NJFVr4GotOeg6ul9yul5jcs4pvl
 BJYnBh2LE77oDCOUKaSKI/0nDOHJs9n5m8GxjvG6cAvfn9RdgNm6kCCxNQFEhpNT
 IrmRrmCMd3aKPNrdz2Cbd4qHzNr0JuIv/bykNHDA/rw+PkQeLzZgiGIw9ftg1yHJ
 npzNLCjfVDPRy4qUCDYSS7+p83oHpWq3tHfha7M1S5HphsjVWjG79ABIKkN8w86z
 5KnY3dqt5tqO4w0gZzKXv0gg4IJS62YqeJbF/dSefASvnBkINIzxBOEu0+xOFQ5t
 ezKkukpe8ivX4eUP2ruF9jAjVLCPYCm6UaWbYQZBAAf04KHC09uXDjB4wdGCINt6
 tdOgfm60OsPHUFjx9KBn8M81Iabq8DYNubp+naG2U/j7lGzh3+mvyAlzQKetXMct
 b69skOxrjfT+2cCYeqz0UupHJigi5VLjX8hjpraXJA9oEwdS5gr9CfckEN3aUysu
 YmQ2LtgGuglUdV3Lc4QptFxRDoKna3E/Gx6rzMDPtRdV1L6dn9CULRz+Pw4T+nWl
 m6Ly9QXJVmC+d6fPW7cOEytPKRIqAUHSXQZxcPNPEcaPxD9CPWGO6TJLanc0BNYS
 g7u9kLA2fWmWnAkvEosP8lxJlQvgorhkXdCpEWuL+mAbnaImpts=
 =2wPT
 -----END PGP SIGNATURE-----

Merge 4.4.103 into android-4.4

Changes in 4.4.103
	s390: fix transactional execution control register handling
	s390/runtime instrumention: fix possible memory corruption
	s390/disassembler: add missing end marker for e7 table
	s390/disassembler: increase show_code buffer size
	ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER
	AF_VSOCK: Shrink the area influenced by prepare_to_wait
	vsock: use new wait API for vsock_stream_sendmsg()
	sched: Make resched_cpu() unconditional
	lib/mpi: call cond_resched() from mpi_powm() loop
	x86/decoder: Add new TEST instruction pattern
	ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAE
	ARM: 8721/1: mm: dump: check hardware RO bit for LPAE
	MIPS: ralink: Fix MT7628 pinmux
	MIPS: ralink: Fix typo in mt7628 pinmux function
	ALSA: hda: Add Raven PCI ID
	dm bufio: fix integer overflow when limiting maximum cache size
	dm: fix race between dm_get_from_kobject() and __dm_destroy()
	MIPS: Fix an n32 core file generation regset support regression
	MIPS: BCM47XX: Fix LED inversion for WRT54GSv1
	autofs: don't fail mount for transient error
	nilfs2: fix race condition that causes file system corruption
	eCryptfs: use after free in ecryptfs_release_messaging()
	bcache: check ca->alloc_thread initialized before wake up it
	isofs: fix timestamps beyond 2027
	NFS: Fix typo in nomigration mount option
	nfs: Fix ugly referral attributes
	nfsd: deal with revoked delegations appropriately
	rtlwifi: rtl8192ee: Fix memory leak when loading firmware
	rtlwifi: fix uninitialized rtlhal->last_suspend_sec time
	ata: fixes kernel crash while tracing ata_eh_link_autopsy event
	ext4: fix interaction between i_size, fallocate, and delalloc after a crash
	ALSA: pcm: update tstamp only if audio_tstamp changed
	ALSA: usb-audio: Add sanity checks to FE parser
	ALSA: usb-audio: Fix potential out-of-bound access at parsing SU
	ALSA: usb-audio: Add sanity checks in v2 clock parsers
	ALSA: timer: Remove kernel warning at compat ioctl error paths
	ALSA: hda/realtek - Fix ALC700 family no sound issue
	fix a page leak in vhost_scsi_iov_to_sgl() error recovery
	fs/9p: Compare qid.path in v9fs_test_inode
	iscsi-target: Fix non-immediate TMR reference leak
	target: Fix QUEUE_FULL + SCSI task attribute handling
	KVM: nVMX: set IDTR and GDTR limits when loading L1 host state
	KVM: SVM: obey guest PAT
	SUNRPC: Fix tracepoint storage issues with svc_recv and svc_rqst_status
	clk: ti: dra7-atl-clock: Fix of_node reference counting
	clk: ti: dra7-atl-clock: fix child-node lookups
	libnvdimm, namespace: fix label initialization to use valid seq numbers
	libnvdimm, namespace: make 'resource' attribute only readable by root
	IB/srpt: Do not accept invalid initiator port names
	IB/srp: Avoid that a cable pull can trigger a kernel crash
	NFC: fix device-allocation error return
	i40e: Use smp_rmb rather than read_barrier_depends
	igb: Use smp_rmb rather than read_barrier_depends
	igbvf: Use smp_rmb rather than read_barrier_depends
	ixgbevf: Use smp_rmb rather than read_barrier_depends
	i40evf: Use smp_rmb rather than read_barrier_depends
	fm10k: Use smp_rmb rather than read_barrier_depends
	ixgbe: Fix skb list corruption on Power systems
	parisc: Fix validity check of pointer size argument in new CAS implementation
	powerpc/signal: Properly handle return value from uprobe_deny_signal()
	media: Don't do DMA on stack for firmware upload in the AS102 driver
	media: rc: check for integer overflow
	cx231xx-cards: fix NULL-deref on missing association descriptor
	media: v4l2-ctrl: Fix flags field on Control events
	sched/rt: Simplify the IPI based RT balancing logic
	fscrypt: lock mutex before checking for bounce page pool
	net/9p: Switch to wait_event_killable()
	PM / OPP: Add missing of_node_put(np)
	e1000e: Fix error path in link detection
	e1000e: Fix return value test
	e1000e: Separate signaling for link check/link up
	RDS: RDMA: return appropriate error on rdma map failures
	PCI: Apply _HPX settings only to relevant devices
	dmaengine: zx: set DMA_CYCLIC cap_mask bit
	net: Allow IP_MULTICAST_IF to set index to L3 slave
	net: 3com: typhoon: typhoon_init_one: make return values more specific
	net: 3com: typhoon: typhoon_init_one: fix incorrect return values
	drm/armada: Fix compile fail
	ath10k: fix incorrect txpower set by P2P_DEVICE interface
	ath10k: ignore configuring the incorrect board_id
	ath10k: fix potential memory leak in ath10k_wmi_tlv_op_pull_fw_stats()
	ath10k: set CTS protection VDEV param only if VDEV is up
	ALSA: hda - Apply ALC269_FIXUP_NO_SHUTUP on HDA_FIXUP_ACT_PROBE
	drm: Apply range restriction after color adjustment when allocation
	mac80211: Remove invalid flag operations in mesh TSF synchronization
	mac80211: Suppress NEW_PEER_CANDIDATE event if no room
	iio: light: fix improper return value
	staging: iio: cdc: fix improper return value
	spi: SPI_FSL_DSPI should depend on HAS_DMA
	netfilter: nft_queue: use raw_smp_processor_id()
	netfilter: nf_tables: fix oob access
	ASoC: rsnd: don't double free kctrl
	btrfs: return the actual error value from from btrfs_uuid_tree_iterate
	ASoC: wm_adsp: Don't overrun firmware file buffer when reading region data
	s390/kbuild: enable modversions for symbols exported from asm
	xen: xenbus driver must not accept invalid transaction ids
	Revert "sctp: do not peel off an assoc from one netns to another one"
	Linux 4.4.103

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-11-30 15:43:08 +00:00
Steven Rostedt (Red Hat)
cb1831a83e sched/rt: Simplify the IPI based RT balancing logic
commit 4bdced5c9a2922521e325896a7bbbf0132c94e56 upstream.

When a CPU lowers its priority (schedules out a high priority task for a
lower priority one), a check is made to see if any other CPU has overloaded
RT tasks (more than one). It checks the rto_mask to determine this and if so
it will request to pull one of those tasks to itself if the non running RT
task is of higher priority than the new priority of the next task to run on
the current CPU.

When we deal with a large number of CPUs, the original pull logic suffered
from heavy lock contention on a single CPU run queue, which caused huge
latency across all CPUs. This was caused by only one CPU having overloaded
RT tasks while a bunch of other CPUs lowered their priority. To solve this
issue, commit:

  b6366f048e ("sched/rt: Use IPI to trigger RT task push migration instead of pulling")

changed the way to request a pull. Instead of grabbing the lock of the
overloaded CPU's runqueue, it simply sent an IPI to that CPU to do the work.

Although the IPI logic worked very well in removing the large latency build
up, it still could suffer from a large number of IPIs being sent to a single
CPU. On an 80 CPU box, I measured over 200us of processing IPIs. Worse yet,
when I tested this on a 120 CPU box, with a stress test that had lots of
RT tasks scheduling on all CPUs, it actually triggered the hard lockup
detector! One CPU had so many IPIs sent to it, and due to the restart
mechanism that is triggered when the source run queue has a priority status
change, the CPU spent minutes! processing the IPIs.

Thinking about this further, I realized there's no reason for each run queue
to send its own IPI. All CPUs with overloaded tasks must be scanned regardless
of whether one or many CPUs are lowering their priority, and because there's
no current way to find the CPU with the highest priority task that can
schedule to one of these CPUs, there really only needs to be one IPI being
sent around at a time.

This greatly simplifies the code!

The new approach is to have each root domain have its own irq work, as the
rto_mask is per root domain. The root domain has the following fields
attached to it:

  rto_push_work	 - the irq work to process each CPU set in rto_mask
  rto_lock	 - the lock to protect some of the other rto fields
  rto_loop_start - an atomic that keeps contention down on rto_lock;
		    the first CPU scheduling in a lower priority task
		    is the one to kick off the process.
  rto_loop_next	 - an atomic that gets incremented for each CPU that
		    schedules in a lower priority task.
  rto_loop	 - a variable protected by rto_lock that is used to
		    compare against rto_loop_next
  rto_cpu	 - The cpu to send the next IPI to, also protected by
		    the rto_lock.

When a CPU schedules in a lower priority task and wants to make sure
overloaded CPUs know about it, it increments rto_loop_next. Then it
atomically sets rto_loop_start with a cmpxchg. If the old value is not "0",
then it is done, as another CPU is kicking off the IPI loop. If the old
value is "0", then it will take the rto_lock to synchronize with a possible
IPI being sent around to the overloaded CPUs.

If rto_cpu is greater than or equal to nr_cpu_ids, then there's either no
IPI being sent around, or one is about to finish. Then rto_cpu is set to the
first CPU in rto_mask and an IPI is sent to that CPU. If there's no CPUs set
in rto_mask, then there's nothing to be done.

When the CPU receives the IPI, it will first try to push any RT tasks that are
queued on the CPU but can't run because a higher priority RT task is
currently running on that CPU.

Then it takes the rto_lock and looks for the next CPU in the rto_mask. If it
finds one, it simply sends an IPI to that CPU and the process continues.

If there's no more CPUs in the rto_mask, then rto_loop is compared with
rto_loop_next. If they match, everything is done and the process is over. If
they do not match, then a CPU scheduled in a lower priority task as the IPI
was being passed around, and the process needs to start again. The first CPU
in rto_mask is sent the IPI.

This change removes this duplication of work in the IPI logic, and greatly
lowers the latency caused by the IPIs. This removed the lockup happening on
the 120 CPU machine. It also simplifies the code tremendously. What else
could anyone ask for?
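
For reference, a condensed sketch of the loop-walking helper described above
(close to, but simplified from, the upstream change):

  /* Pick the next CPU in rto_mask to IPI; -1 ends the current sweep. */
  static int rto_next_cpu(struct root_domain *rd)
  {
  	int next, cpu;

  	for (;;) {
  		/* When rto_cpu is -1 this acts like cpumask_first(). */
  		cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
  		rd->rto_cpu = cpu;

  		if (cpu < nr_cpu_ids)
  			return cpu;

  		rd->rto_cpu = -1;

  		/*
  		 * If a CPU lowered its priority while the IPI was in
  		 * flight, rto_loop_next has moved on: restart the sweep
  		 * instead of stopping.
  		 */
  		next = atomic_read_acquire(&rd->rto_loop_next);
  		if (rd->rto_loop == next)
  			break;

  		rd->rto_loop = next;
  	}

  	return -1;
  }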

Thanks to Peter Zijlstra for simplifying the rto_loop_start atomic logic and
supplying me with the rto_start_trylock() and rto_start_unlock() helper
functions.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Clark Williams <williams@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott Wood <swood@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170424114732.1aac6dc4@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-30 08:37:25 +00:00
Todd Kjos
3822fe484c Revert "ANDROID: sched/rt: schedtune: Add boost retention to RT"
This reverts commit d194ba5d71.

Reason for revert: Broke some builds. Will fix and resubmit.

Change-Id: I4e6fa1562346eda1bbf058f1d5ace5ba6256ce07
2017-11-08 00:43:53 +00:00
Viresh Kumar
df147c9e33 cpufreq: Drop schedfreq governor
We all should be using (and improving) the schedutil governor now. Get
rid of the non-upstream governor.

Tested on Hikey.

Change-Id: Ic660756536e5da51952738c3c18b94e31f58cd57
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2017-11-07 23:57:47 +00:00
Joel Fernandes
d194ba5d71 ANDROID: sched/rt: schedtune: Add boost retention to RT
Boosted RT tasks can be deboosted quickly; this makes boost useless
for RT tasks and causes lots of glitching. Use timers to prevent
de-boosting too soon, waiting long enough that the next enqueue
happens after a threshold.

While this can be solved in the governor, there are the following
advantages:
- The approach used is governor-independent
- Reduces boost group lock contention for frequent sleepers/wakers
- Works with schedfreq without any other schedfreq hacks.

Bug: 30210506

Change-Id: I41788b235586988be446505deb7c0529758a9898
Signed-off-by: Joel Fernandes <joelaf@google.com>
2017-11-07 23:47:42 +00:00
Joonwoo Park
9e293db052 sched: EAS: upmigrate misfit current task
Upmigrate misfit current task upon scheduler tick with stopper.

We can kick a random NOHZ-idle CPU (not necessarily a big CPU) when a
CPU-bound task is in need of upmigration.  But that is not efficient, as it
requires the following unnecessary wakeups:

  1. Busy little CPU A to kick idle B
  2. B runs idle balancer and enqueue migration/A
  3. B goes idle
  4. A runs migration/A, enqueues busy task on B.
  5. B wakes up again.

This change makes active upmigration more efficient by doing:

  1. Busy little CPU A finds target CPU B upon tick.
  2. CPU A enqueues migration/A.

Change-Id: Ie865738054ea3296f28e6ba01710635efa7193c0
[joonwoop: The original version had logic to reserve CPU.  The logic is
 omitted in this version.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2017-11-01 15:09:35 -07:00
Srivatsa Vaddagiri
2da014c0d8 sched: Extend active balance to accept 'push_task' argument
Active balance currently picks one task to migrate from busy cpu to
a chosen cpu (push_cpu). This patch extends active load balance to
recognize a particular task ('push_task') that needs to be migrated to
'push_cpu'. This capability will be leveraged by HMP-aware task
placement in a subsequent patch.

Change-Id: If31320111e6cc7044e617b5c3fd6d8e0c0e16952
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2017-11-01 15:09:34 -07:00
Vikram Mulukutla
e79f447a97 sched: walt: Correct WALT window size initialization
It is preferable that WALT window rollover occurs just
before a tick, since the tick is an opportune moment
to record a complete window's statistics, as well as report
those stats to the cpu frequency governor. When CONFIG_HZ
results in a TICK_NSEC that isn't an integral number, this
requirement may be violated. Account for this by reducing
the WALT window size to the nearest multiple of TICK_NSEC.

Commit d368c6faa1 ("sched: walt: fix window misalignment
when HZ=300") attempted to do this, but WALT isn't using
MIN_SCHED_RAVG_WINDOW as the window size, so the patch had
no effect.

Also, change the type of 'walt_disabled' to bool and warn
if an invalid window size causes WALT to be disabled.
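
An illustrative sketch of the initialisation described above (variable and
constant names assumed):

  static void __init walt_init_window_size(void)
  {
  	/*
  	 * Round the configured window down to a whole number of ticks so
  	 * that window rollover lands just before a tick.
  	 */
  	walt_ravg_window -= walt_ravg_window % TICK_NSEC;

  	/* Warn and disable WALT if the resulting window is unusable. */
  	walt_disabled = (walt_ravg_window < MIN_SCHED_RAVG_WINDOW);
  	WARN_ON(walt_disabled);
  }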

Change-Id: Ie3dcfc21a3df4408254ca1165a355bbe391ed5c7
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2017-10-27 11:58:48 -07:00
Brendan Jackman
38ddcff85a FROMLIST: sched/fair: Use wake_q length as a hint for wake_wide
(from https://patchwork.kernel.org/patch/9895261/)

This patch adds a parameter to select_task_rq, sibling_count_hint,
allowing the caller, where it has this information, to inform the
sched_class of the number of tasks that are being woken up as part of
the same event.

The wake_q mechanism is one case where this information is available.

select_task_rq_fair can then use the information to detect that it
needs to widen the search space for task placement in order to avoid
overloading the last-level cache domain's CPUs.

                               * * *

The reason I am investigating this change is the following use case
on ARM big.LITTLE (asymmetrical CPU capacity): 1 task per CPU, which
all repeatedly do X amount of work then
pthread_barrier_wait (i.e. sleep until the last task finishes its X
and hits the barrier). On big.LITTLE, the tasks which get a "big" CPU
finish faster, and then those CPUs pull over the tasks that are still
running:

     v CPU v           ->time->

                    -------------
   0  (big)         11111  /333
                    -------------
   1  (big)         22222   /444|
                    -------------
   2  (LITTLE)      333333/
                    -------------
   3  (LITTLE)      444444/
                    -------------

Now when task 4 hits the barrier (at |) and wakes the others up,
there are 4 tasks with prev_cpu=<big> and 0 tasks with
prev_cpu=<little>. want_affine therefore means that we'll only look
in CPUs 0 and 1 (sd_llc), so tasks will be unnecessarily coscheduled
on the bigs until the next load balance, something like this:

     v CPU v           ->time->

                    ------------------------
   0  (big)         11111  /333  31313\33333
                    ------------------------
   1  (big)         22222   /444|424\4444444
                    ------------------------
   2  (LITTLE)      333333/          \222222
                    ------------------------
   3  (LITTLE)      444444/            \1111
                    ------------------------
                                 ^^^
                           underutilization

So, I'm trying to get want_affine = 0 for these tasks.

I don't _think_ any incarnation of the wakee_flips mechanism can help
us here because which task is waker and which tasks are wakees
generally changes with each iteration.

However pthread_barrier_wait (or more accurately FUTEX_WAKE) has the
nice property that we know exactly how many tasks are being woken, so
we can cheat.

It might be a disadvantage that we "widen" _every_ task that's woken in
an event, while select_idle_sibling would work fine for the first
sd_llc_size - 1 tasks.

IIUC, if wake_affine() behaves correctly this trick wouldn't be
necessary on SMP systems, so it might be best guarded by the presence
of SD_ASYM_CPUCAPACITY?
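
A minimal sketch of how the hint could be consumed in wake_wide()
(simplified from the posted patch; sd_llc_size is the existing per-CPU
LLC-domain size):

  static int wake_wide(struct task_struct *p, int sibling_count_hint)
  {
  	unsigned int llc_size = this_cpu_read(sd_llc_size);

  	/*
  	 * One wakeup event is fanning out to at least as many tasks as the
  	 * LLC domain has CPUs: widen the search space regardless of what
  	 * the wakee_flips heuristic would say.
  	 */
  	if (sibling_count_hint >= llc_size)
  		return 1;

  	/* ... otherwise fall back to the existing wakee_flips logic ... */
  	return 0;
  }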

                               * * *

Final note..

In order to observe "perfect" behaviour for this use case, I also had
to disable the TTWU_QUEUE sched feature. Suppose during the wakeup
above we are working through the work queue and have placed tasks 3
and 2, and are about to place task 1:

     v CPU v           ->time->

                    --------------
   0  (big)         11111  /333  3
                    --------------
   1  (big)         22222   /444|4
                    --------------
   2  (LITTLE)      333333/      2
                    --------------
   3  (LITTLE)      444444/          <- Task 1 should go here
                    --------------

If TTWU_QUEUE is enabled, we will not yet have enqueued task
2 (having instead sent a reschedule IPI) or attached its load to CPU
2. So we are likely to also place task 1 on cpu 2. Disabling
TTWU_QUEUE means that we enqueue task 2 before placing task 1,
solving this issue. TTWU_QUEUE is there to minimise rq lock
contention, and I guess that this contention is less of an issue on
big.LITTLE systems since they have relatively few CPUs, which
suggests the trade-off makes sense here.

Change-Id: I2080302839a263e0841a89efea8589ea53bbda9c
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
2017-10-27 18:50:59 +00:00
Joonwoo Park
43bd960dfe sched: WALT: account cumulative window demand
Energy cost estimation has been a long-lasting challenge for WALT
because WALT guides CPU frequency based on the CPU utilization of the
previous window.  Consequently it's not possible to know a newly
waking-up task's energy cost until the end of WALT's current window.

The WALT already tracks 'Previous Runnable Sum' (prev_runnable_sum)
and 'Cumulative Runnable Average' (cr_avg).  They are designed for
CPU frequency guidance and task placement but unfortunately both
are not suitable for the energy cost estimation.

This is because using prev_runnable_sum for energy cost calculation would
make us account CPU and task energy solely based on activity in the
previous window, so, for example, any task that had no activity in the
previous window will be accounted as a 'zero energy cost' task.
Energy estimation with cr_avg is what energy_diff() relies on at present.
However, cr_avg can only represent an instantaneous picture of energy cost,
so, for example, if a CPU was fully occupied for an entire WALT window
and became idle just before the window boundary, and there is then a wake-up,
energy_diff() accounts that CPU as a 'zero energy cost' CPU.

As a result, introduce a new accounting unit, 'Cumulative Window Demand'.
The cumulative window demand tracks all the tasks' demands seen in the
current window, which is neither instantaneous nor actual execution time.
Because task demand represents estimated scaled execution time when the
task runs a full window, the accumulation of all the demands represents the
predicted CPU load at the end of the window.

Thus we can estimate CPU's frequency at the end of current WALT window
with the cumulative window demand.
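
An illustrative sketch of the bookkeeping (field and helper names assumed):
every task that becomes runnable in the window contributes its full demand,
so the sum predicts the CPU load at the end of the window.

  static inline void walt_inc_cum_window_demand(struct rq *rq,
  						struct task_struct *p)
  {
  	rq->cum_window_demand += p->ravg.demand;
  }

  static inline void walt_dec_cum_window_demand(struct rq *rq,
  						struct task_struct *p)
  {
  	rq->cum_window_demand -= p->ravg.demand;

  	/* Demands are estimates, so guard against transient underflow. */
  	if ((s64)rq->cum_window_demand < 0)
  		rq->cum_window_demand = 0;
  }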

The use of prev_runnable_sum for CPU frequency guidance and of cr_avg
for task placement has not changed; both continue to be used for those
purposes, while this patch aims to add an additional statistic.

Change-Id: I9908c77ead9973a26dea2b36c001c2baf944d4f5
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2017-10-27 18:10:15 +00:00
Vincent Guittot
138a670d97 BACKPORT: sched/cgroup: Fix cpu_cgroup_fork() handling
A new fair task is detached and attached from/to task_group with:

  cgroup_post_fork()
    ss->fork(child) := cpu_cgroup_fork()
      sched_move_task()
        task_move_group_fair()

Which is wrong, because at this point in fork() the task isn't fully
initialized and it cannot 'move' to another group, because it's not
attached to any group as yet.

In fact, cpu_cgroup_fork() needs a small part of sched_move_task(), so we
can just call this small part directly instead of sched_move_task(). And
the task doesn't really migrate because it is not yet attached so we
need the following sequence:

  do_fork()
    sched_fork()
      __set_task_cpu()

    cgroup_post_fork()
      set_task_rq() # set task group and runqueue

    wake_up_new_task()
      select_task_rq() can select a new cpu
      __set_task_cpu
      post_init_entity_util_avg
        attach_task_cfs_rq()
      activate_task
        enqueue_task

This patch makes that happen.
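
A sketch of the resulting cgroup fork hook (simplified; as the backport note
below says, "unsigned long flags" replaces the mainline rq_flags):

  static void cpu_cgroup_fork(struct task_struct *task, void *private)
  {
  	unsigned long flags;
  	struct rq *rq;

  	rq = task_rq_lock(task, &flags);

  	/*
  	 * Only set the task's group and runqueue; the task is not yet
  	 * attached/runnable, so no detach+attach "move" is performed.
  	 */
  	sched_change_group(task, TASK_SET_GROUP);

  	task_rq_unlock(rq, task, &flags);
  }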

BACKPORT: Difference from original commit:

- Removed use of DEQUEUE_MOVE (which isn't defined in 4.4) in
  dequeue_task flags
- Replaced "struct rq_flags rf" with "unsigned long flags".

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
[ Added TASK_SET_GROUP to set depth properly. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit ea86cb4b7621e1298a37197005bf0abcc86348d4)
Change-Id: I8126fd923288acf961218431ffd29d6bf6fd8d72
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:32 +01:00
Chris Redpath
792510d9b3 BACKPORT: sched/fair: Make it possible to account fair load avg consistently
While set_task_rq_fair() is introduced in mainline by commit ad936d8658fd
("sched/fair: Make it possible to account fair load avg consistently"),
the function ends up being introduced here by the backport of
commit 09a43ace1f98 ("sched/fair: Propagate load during synchronous
attach/detach"). The problem (apart from the confusion introduced by the
backport) is actually that set_task_rq_fair() is currently not called at
all.

Fix the problem by backporting again commit ad936d8658fd
("sched/fair: Make it possible to account fair load avg consistently").

Original change log:

The current code accounts for the time a task was absent from the fair
class (per ATTACH_AGE_LOAD). However it does not work correctly when a
task got migrated or moved to another cgroup while outside of the fair
class.

This patch tries to address that by aging on migration. We locklessly
read the 'last_update_time' stamp from both the old and new cfs_rq,
age the load up to the old time, and set it to the new time.

These timestamps should in general not be more than 1 tick apart from
one another, so there is a definite bound on things.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
[ Changelog, a few edits and !SMP build fix ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445616981-29904-2-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked from ad936d8658fd348338cb7d42c577dac77892b074)
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: I17294ab0ada3901d35895014715fd60952949358
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
2017-10-27 13:30:32 +01:00
Chris Redpath
fac311be26 cpufreq/sched: Consider max cpu capacity when choosing frequencies
When using schedfreq on cpus with max capacity significantly smaller than
1024, the tick update uses non-normalised capacities - this leads to
selecting an incorrect OPP as we were scaling the frequency as if the
max capacity achievable was 1024 rather than the max for that particular
cpu or group. This could result in a cpu being stuck at the lowest OPP
and unable to generate enough utilisation to climb out if the max
capacity is significantly smaller than 1024.

Instead, normalize the capacity to be in the range 0-1024 in the tick
so that when we later select a frequency, we get the correct one.
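
An illustrative sketch of the normalisation (helper and hook names assumed;
this targets the non-upstream schedfreq code, so treat it as pseudocode for
the idea rather than the actual patch):

  static void sched_freq_tick_request(int cpu)
  {
  	unsigned long cap_orig = capacity_orig_of(cpu);	/* e.g. ~430 on a little CPU */
  	unsigned long req_cap;

  	/*
  	 * Scale so that a fully utilised little CPU requests 1024 (its own
  	 * top OPP) rather than util/1024 of the global capacity scale.
  	 */
  	req_cap = cpu_util(cpu) * SCHED_CAPACITY_SCALE / cap_orig;
  	req_cap = min(req_cap, (unsigned long)SCHED_CAPACITY_SCALE);

  	set_cfs_cpu_capacity(cpu, true, req_cap);
  }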

Also comments updated to be clearer about what is needed.

Change-Id: Id84391c7ac015311002ada21813a353ee13bee60
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:32 +01:00
Vikram Mulukutla
be832f69a9 sched: walt: Leverage existing helper APIs to apply invariance
There's no need for a separate hierarchy of notifiers, APIs
and variables in walt.c for the purpose of applying frequency
and IPC invariance. Let's just use capacity_curr_of and get
rid of a lot of the infrastructure relating to capacity,
load_scale_factor etc.

Change-Id: Ia220e2c896373fa535db05bff60f9aa33aefc978
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2017-10-19 11:56:50 -07:00
Joonwoo Park
c8b8c92bbc sched: WALT: fix potential overflow
Task demand and CPU util are in u64.

Change-Id: If7ec1623e723026d3346201122aab0303a6d2ba2
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2017-09-01 17:23:31 -07:00
Joonwoo Park
ee4cebd75e sched: EAS/WALT: use cr_avg instead of prev_runnable_sum
WALT accounts for two major statistics: CPU load and cumulative task
demand.

CPU load, which is an account of each CPU's accumulated absolute
execution time, is for CPU frequency guidance, whereas cumulative
task demand, which is each CPU's instantaneous load reflecting the
CPU's load at a given time, is for task placement decisions.

Use cumulative tasks demand for cpu_util() for task placement and
introduce cpu_util_freq() for frequency guidance.

Change-Id: Id928f01dbc8cb2a617cdadc584c1f658022565c5
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2017-09-01 17:20:59 -07:00
Andy Lutomirski
8bc69d462a UPSTREAM: sched/core: Allow putting thread_info into task_struct
If an arch opts in by setting CONFIG_THREAD_INFO_IN_TASK_STRUCT,
then thread_info is defined as a single 'u32 flags' and is the first
entry of task_struct.  thread_info::task is removed (it serves no
purpose if thread_info is embedded in task_struct), and
thread_info::cpu gets its own slot in task_struct.
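
In outline, the opt-in layout looks roughly like this (simplified sketch,
not the full upstream diff):

  #ifdef CONFIG_THREAD_INFO_IN_TASK
  struct thread_info {
  	u32	flags;		/* low-level flags, the only remaining field */
  };
  #endif

  struct task_struct {
  #ifdef CONFIG_THREAD_INFO_IN_TASK
  	/*
  	 * Must stay first: low-level code reaches the flags straight
  	 * through the task pointer.
  	 */
  	struct thread_info	thread_info;
  #endif
  	volatile long		state;
  	/* ... */
  #ifdef CONFIG_THREAD_INFO_IN_TASK
  	unsigned int		cpu;	/* moved here from thread_info */
  #endif
  	/* ... */
  };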

This is heavily based on a patch written by Linus.

Originally-from: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a0898196f0476195ca02713691a5037a14f2aac5.1473801993.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Bug: 38331309
Change-Id: I25e5a830f2ada5e74fa93661e97e5e701b1b70d2
(cherry picked from commit c65eacbe290b8141554c71b2c94489e73ade8c8d)
Signed-off-by: Zubin Mithra <zsm@google.com>
2017-08-09 15:23:22 +01:00
Greg Kroah-Hartman
9f764bbe06 This is the 4.4.80 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlmHzogACgkQONu9yGCS
 aT72Kg/9Ea02hrf7SCaEmReH0CNBsZiWBp0u/4b6QtXt3TrPDXK0oteIB4SUIVi/
 zOzjU5SkssMLL9RoRQob81DLFJlL0b9ME5nLXxAACe2P74DaRSxA3DDmrYILgerH
 Gnv4k9xjbVMXMjdk6qAZ/SahCFfYPfnPCRO/zPeb3+6EZk8UQpaaB/GNxVCsGFTZ
 AfThsAHYzfFOg2fYdK0T09eDtAFqAokwGY6O8uaigkJt3u5mbMXcgxSp4o322OcG
 V3jxCUPzSk/78QtoSqQErXDCj/30451oLVByMBuRpBJAilsDf6VaURuz1dVfKFW8
 PdkLiy397sir696HwPU0HwHz++kRnZK2u2z//TRDE5wmgsC9VSq9fkggZdmNBol5
 N4ekCWjhYyyJzxf9hTxK/fA4t4KRFtOcdRiEkJj9RDIhT9jxsxPMr3TGJ25LJaUH
 8Qae+nNlYVe7lmaojckGa+AjIMm5HRB7LZnf4VQr1E8kvWpWpwA/0YtnduzPsXhH
 6xqT0rL/1/Z1Jz63/zPAtZ9OSL/ne0hJs+xOuUhKHGwH3oWBKrgmxAH8CAxYq0x9
 Y6ALkDweS3e+vVt+4BcHpUz8JTNTlspMcebt4VvjqvmERpKwmVsl7tEY242Uw4LQ
 wMF50vA9Cc0bVkVS7w2Ns/dn6XEWYpqS4a/MninjaBOMbtMia78=
 =l+tE
 -----END PGP SIGNATURE-----

Merge 4.4.80 into android-4.4

Changes in 4.4.80
	af_key: Add lock to key dump
	pstore: Make spinlock per zone instead of global
	net: reduce skb_warn_bad_offload() noise
	powerpc/pseries: Fix of_node_put() underflow during reconfig remove
	crypto: authencesn - Fix digest_null crash
	md/raid5: add thread_group worker async_tx_issue_pending_all
	drm/vmwgfx: Fix gcc-7.1.1 warning
	drm/nouveau/bar/gf100: fix access to upper half of BAR2
	KVM: PPC: Book3S HV: Context-switch EBB registers properly
	KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit
	KVM: PPC: Book3S HV: Reload HTM registers explicitly
	KVM: PPC: Book3S HV: Save/restore host values of debug registers
	Revert "powerpc/numa: Fix percpu allocations to be NUMA aware"
	Staging: comedi: comedi_fops: Avoid orphaned proc entry
	drm/rcar: Nuke preclose hook
	drm: rcar-du: Perform initialization/cleanup at probe/remove time
	drm: rcar-du: Simplify and fix probe error handling
	perf intel-pt: Fix ip compression
	perf intel-pt: Fix last_ip usage
	perf intel-pt: Use FUP always when scanning for an IP
	perf intel-pt: Ensure never to set 'last_ip' when packet 'count' is zero
	xfs: don't BUG() on mixed direct and mapped I/O
	nfc: fdp: fix NULL pointer dereference
	net: phy: Do not perform software reset for Generic PHY
	isdn: Fix a sleep-in-atomic bug
	isdn/i4l: fix buffer overflow
	ath10k: fix null deref on wmi-tlv when trying spectral scan
	wil6210: fix deadlock when using fw_no_recovery option
	mailbox: always wait in mbox_send_message for blocking Tx mode
	mailbox: skip complete wait event if timer expired
	mailbox: handle empty message in tx_tick
	mpt3sas: Don't overreach ioc->reply_post[] during initialization
	kaweth: fix firmware download
	kaweth: fix oops upon failed memory allocation
	sched/cgroup: Move sched_online_group() back into css_online() to fix crash
	PM / Domains: defer dev_pm_domain_set() until genpd->attach_dev succeeds if present
	RDMA/uverbs: Fix the check for port number
	libnvdimm, btt: fix btt_rw_page not returning errors
	ipmi/watchdog: fix watchdog timeout set on reboot
	dentry name snapshots
	v4l: s5c73m3: fix negation operator
	Make file credentials available to the seqfile interfaces
	/proc/iomem: only expose physical resource addresses to privileged users
	vlan: Propagate MAC address to VLANs
	pstore: Allow prz to control need for locking
	pstore: Correctly initialize spinlock and flags
	pstore: Use dynamic spinlock initializer
	net: skb_needs_check() accepts CHECKSUM_NONE for tx
	sched/cputime: Fix prev steal time accouting during CPU hotplug
	xen/blkback: don't free be structure too early
	xen/blkback: don't use xen_blkif_get() in xen-blkback kthread
	tpm: fix a kernel memory leak in tpm-sysfs.c
	tpm: Replace device number bitmap with IDR
	x86/mce/AMD: Make the init code more robust
	r8169: add support for RTL8168 series add-on card.
	ARM: dts: n900: Mark eMMC slot with no-sdio and no-sd flags
	ipv6: Should use consistent conditional judgement for ip6 fragment between __ip6_append_data and ip6_finish_output
	net/mlx4: Remove BUG_ON from ICM allocation routine
	drm/msm: Ensure that the hardware write pointer is valid
	drm/msm: Verify that MSM_SUBMIT_BO_FLAGS are set
	vfio-pci: use 32-bit comparisons for register address for gcc-4.5
	irqchip/keystone: Fix "scheduling while atomic" on rt
	ASoC: tlv320aic3x: Mark the RESET register as volatile
	spi: dw: Make debugfs name unique between instances
	ASoC: nau8825: fix invalid configuration in Pre-Scalar of FLL
	irqchip/mxs: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
	openrisc: Add _text symbol to fix ksym build error
	dmaengine: ioatdma: Add Skylake PCI Dev ID
	dmaengine: ioatdma: workaround SKX ioatdma version
	dmaengine: ti-dma-crossbar: Add some 'of_node_put()' in error path.
	ARM64: zynqmp: Fix W=1 dtc 1.4 warnings
	ARM64: zynqmp: Fix i2c node's compatible string
	ARM: s3c2410_defconfig: Fix invalid values for NF_CT_PROTO_*
	ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
	usb: gadget: Fix copy/pasted error message
	Btrfs: adjust outstanding_extents counter properly when dio write is split
	tools lib traceevent: Fix prev/next_prio for deadline tasks
	xfrm: Don't use sk_family for socket policy lookups
	perf tools: Install tools/lib/traceevent plugins with install-bin
	perf symbols: Robustify reading of build-id from sysfs
	video: fbdev: cobalt_lcdfb: Handle return NULL error from devm_ioremap
	vfio-pci: Handle error from pci_iomap
	arm64: mm: fix show_pte KERN_CONT fallout
	nvmem: imx-ocotp: Fix wrong register size
	sh_eth: enable RX descriptor word 0 shift on SH7734
	ALSA: usb-audio: test EP_FLAG_RUNNING at urb completion
	HID: ignore Petzl USB headlamp
	scsi: fnic: Avoid sending reset to firmware when another reset is in progress
	scsi: snic: Return error code on memory allocation failure
	ASoC: dpcm: Avoid putting stream state to STOP when FE stream is paused
	Linux 4.4.80

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-08-07 14:29:16 -07:00
Wanpeng Li
62208707b4 sched/cputime: Fix prev steal time accouting during CPU hotplug
commit 3d89e5478bf550a50c99e93adf659369798263b0 upstream.

Commit:

  e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")

... set rq->prev_* to 0 after a CPU hotplug comes back, in order to
fix the case where (after CPU hotplug) steal time is smaller than
rq->prev_steal_time.

However, this should never happen. Steal time was only smaller because of the
KVM-specific bug fixed by the previous patch.  Worse, the previous patch
triggers a bug on CPU hot-unplug/plug operation: because
rq->prev_steal_time is cleared, all of the CPU's past steal time will be
accounted again on hot-plug.

Since the root cause has been fixed, we can just revert commit e9532e69b8d1.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 'commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")'
Link: http://lkml.kernel.org/r/1465813966-3116-3-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andres Oportus <andresoportus@google.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-06 19:19:43 -07:00
Chris Redpath
fce0ecf04a schedstats/eas: guard properly to avoid breaking non-smp schedstats users
Add appropriate #ifdef guards to ensure the smp-only easstats structs
are not used when smp is not enabled. Arnd got a report from buildbot,
analysed it, and pointed out exactly what the issue was.

Reported-by: "Arnd Bergmann" <arnd@arndb.de>
Suggested-by: "Arnd Bergmann" <arnd@arndb.de>
Fixes: 4b85765a3d ("sched/fair: Add eas (& cas)
 specific rq, sd and task stats")
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: I60554dea20137f6774db3f59b4afd40a06554cfc
2017-06-03 15:03:03 +01:00
Dietmar Eggemann
4b85765a3d sched/fair: Add eas (& cas) specific rq, sd and task stats
The statistic counters are placed in the eas (& cas) wakeup path. Each
of them has one representation for the runqueue (rq), the sched_domain
(sd) and the task.
A task counter is always incremented. An rq counter is always
incremented for the rq the scheduler is currently running on. An sd
counter is only incremented if a relation to an sd exists.

The counters are exposed:

(1) In /proc/schedstat for rq's and sd's:

$ cat /proc/schedstat
...
cpu0 71422 0 2321254 ...
eas  44144 0 0 19446 0 24698 568435 51621 156932 133 222011 17459 120279 516814 83 0 156962 359235 176439 139981
  <- runqueue for cpu0
...
domain0 3 42430 42331 ...
eas 0 0 0 14200 0 0 0 0 0 0 0 0 0 0 0 0 0 0 66355 0  <- MC sched domain for cpu0
...

The per-cpu eas vector has the following elements:

sis_attempts  sis_idle   sis_cache_affine sis_suff_cap    sis_idle_cpu    sis_count               ||
secb_attempts secb_sync  secb_idle_bt     secb_insuff_cap secb_no_nrg_sav secb_nrg_sav secb_count ||
fbt_attempts  fbt_no_cpu fbt_no_sd        fbt_pref_idle   fbt_count                               ||
cas_attempts  cas_count

The following relations exist between these counters (from cpu0 eas
vector above):

sis_attempts = sis_idle + sis_cache_affine + sis_suff_cap + sis_idle_cpu + sis_count

44144        = 0        + 0                + 19446        + 0            + 24698

secb_attempts = secb_sync + secb_idle_bt + secb_insuff_cap + secb_no_nrg_sav + secb_nrg_sav + secb_count

568435        = 51621     + 156932       + 133             + 222011          + 17459        + 120279

fbt_attempts = fbt_no_cpu + fbt_no_sd + fbt_pref_idle + fbt_count + (return -1)

516814       = 83         + 0         + 156962        + 359235    + (534)

cas_attempts = cas_count + (return -1 or smp_processor_id())

176439       = 139981    + (36458)

(2) In /proc/$PROCESS_PID/task/$TASK_PID/sched for a task.

example: main thread of system_server

$ cat /proc/1083/task/1083/sched

...
se.statistics.nr_wakeups_sis_attempts        :                  945
se.statistics.nr_wakeups_sis_idle            :                    0
se.statistics.nr_wakeups_sis_cache_affine    :                    0
se.statistics.nr_wakeups_sis_suff_cap        :                  219
se.statistics.nr_wakeups_sis_idle_cpu        :                    0
se.statistics.nr_wakeups_sis_count           :                  726
se.statistics.nr_wakeups_secb_attempts       :                10376
se.statistics.nr_wakeups_secb_sync           :                 1462
se.statistics.nr_wakeups_secb_idle_bt        :                 6984
se.statistics.nr_wakeups_secb_insuff_cap     :                    3
se.statistics.nr_wakeups_secb_no_nrg_sav     :                  927
se.statistics.nr_wakeups_secb_nrg_sav        :                  206
se.statistics.nr_wakeups_secb_count          :                  794
se.statistics.nr_wakeups_fbt_attempts        :                 8914
se.statistics.nr_wakeups_fbt_no_cpu          :                    0
se.statistics.nr_wakeups_fbt_no_sd           :                    0
se.statistics.nr_wakeups_fbt_pref_idle       :                 6987
se.statistics.nr_wakeups_fbt_count           :                 1554
se.statistics.nr_wakeups_cas_attempts        :                 3107
se.statistics.nr_wakeups_cas_count           :                 1195
...

The same relations between the counters as in the per-cpu case apply.

Change-Id: Ie7d01267c78a3f41f60a3ef52917d5a5d463f195
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:54 -07:00
Vincent Guittot
e875665411 UPSTREAM: sched/fair: Propagate load during synchronous attach/detach
When a task moves from/to a cfs_rq, we set a flag which is then used to
propagate the change at parent level (sched_entity and cfs_rq) during
next update. If the cfs_rq is throttled, the flag will stay pending until
the cfs_rq is unthrottled.

For propagating the utilization, we copy the utilization of group cfs_rq to
the sched_entity.

For propagating the load, we have to take into account the load of the
whole task group in order to evaluate the load of the sched_entity.
Similarly to what was done before the rewrite of PELT, we add a correction
factor in case the task group's load is greater than its share so it will
contribute the same load of a task of equal weight.
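
A sketch of the utilization half of the propagation (simplified from the
upstream change; the load half additionally applies the group-share
correction factor described above):

  static void update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se)
  {
  	struct cfs_rq *gcfs_rq = group_cfs_rq(se);	/* the group's own cfs_rq */
  	long delta = gcfs_rq->avg.util_avg - se->avg.util_avg;

  	if (!delta)
  		return;

  	/* The group entity simply mirrors its group cfs_rq's utilization. */
  	se->avg.util_avg = gcfs_rq->avg.util_avg;
  	se->avg.util_sum = se->avg.util_avg * LOAD_AVG_MAX;

  	/* ... and the delta is forwarded to the parent cfs_rq. */
  	cfs_rq->avg.util_avg += delta;
  	cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * LOAD_AVG_MAX;
  }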

Change-Id: Id34a9888484716961c9027299c0b4d82881a39d1
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten.Rasmussen@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: kernellwp@gmail.com
Cc: pjt@google.com
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1478598827-32372-5-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 09a43ace1f986b003c118fdf6ddf1fd685692d49)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:54 -07:00
Vincent Guittot
8370e07d82 UPSTREAM: sched/fair: Fix hierarchical order in rq->leaf_cfs_rq_list
Fix the insertion of cfs_rq in rq->leaf_cfs_rq_list to ensure that a
child will always be called before its parent.

The hierarchical order in shares update list has been introduced by
commit:

  67e86250f8 ("sched: Introduce hierarchal order on shares update list")

With the current implementation a child can be still put after its
parent.

Lets take the example of:

       root
        \
         b
         /\
         c d*
           |
           e*

with root -> b -> c already enqueued but not d -> e so the
leaf_cfs_rq_list looks like: head -> c -> b -> root -> tail

The branch d -> e will be added the first time that they are enqueued,
starting with e then d.

When e is added, its parent is not already on the list so e is put at
the tail: head -> c -> b -> root -> e -> tail

Then, d is added at the head because its parent is already on the
list: head -> d -> c -> b -> root -> e -> tail

e is not placed at the right position and will be called last,
whereas it should be called at the beginning.

Because it follows the bottom-up enqueue sequence, we are sure that we
will finish by adding either a cfs_rq without a parent or a cfs_rq with a
parent that is already on the list. We can use this event to detect
when we have finished adding a new branch. For the others, whose
parents are not already added, we have to ensure that they will be
added after their children that have just been inserted in the steps
before, and after any potential parents that are already in the list.
The easiest way is to put the cfs_rq just after the last inserted one
and to keep track of it until the branch is fully added.

Change-Id: I4fe0b8502ea628c13d14e8e5c5279bce67fb8845
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten.Rasmussen@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: kernellwp@gmail.com
Cc: pjt@google.com
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1478598827-32372-3-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 9c2791f936ef5fd04a118b5c284f2c9a95f4a647)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:54 -07:00
Yuyang Du
9de438d27c BACKPORT: sched/fair: Initiate a new task's util avg to a bounded value
A new task's util_avg is set to full utilization of a CPU (100% time
running). This accelerates a new task's utilization ramp-up, useful to
boost its execution in early time. However, it may result in
(insanely) high utilization for a transient time period when a flood
of tasks are spawned. Importantly, it violates the "fundamentally
bounded" CPU utilization, and its side effect is negative if we don't
take any measure to bound it.

This patch proposes an algorithm to address this issue. It has
two methods to approach a sensible initial util_avg:

(1) An expected (or average) util_avg based on its cfs_rq's util_avg:

  util_avg = cfs_rq->util_avg / (cfs_rq->load_avg + 1) * se.load.weight

(2) A trajectory of how successive new tasks' util develops, which
gives 1/2 of the left utilization budget to a new task such that
the additional util is noticeably large (when overall util is low) or
unnoticeably small (when overall util is high enough). In the meantime,
the aggregate utilization is well bounded:

  util_avg_cap = (1024 - cfs_rq->avg.util_avg) / 2^n

where n denotes the nth task.

If util_avg is larger than util_avg_cap, then the effective util is
clamped to the util_avg_cap.
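
Both methods combine into the initialisation helper roughly as follows
(condensed from the upstream change):

  void post_init_entity_util_avg(struct sched_entity *se)
  {
  	struct cfs_rq *cfs_rq = cfs_rq_of(se);
  	struct sched_avg *sa = &se->avg;
  	long cap = (long)(SCHED_CAPACITY_SCALE - cfs_rq->avg.util_avg) / 2;

  	if (cap > 0) {
  		if (cfs_rq->avg.util_avg != 0) {
  			/* Method (1): scale the cfs_rq's average by the weight. */
  			sa->util_avg  = cfs_rq->avg.util_avg * se->load.weight;
  			sa->util_avg /= (cfs_rq->avg.load_avg + 1);

  			/* Clamp to the trajectory cap from method (2). */
  			if (sa->util_avg > cap)
  				sa->util_avg = cap;
  		} else {
  			/* Method (2): half of the remaining utilization budget. */
  			sa->util_avg = cap;
  		}
  		sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
  	}
  }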

Change-Id: Idafe989b24d9e70911666f09800bf1d5a011e1f4
Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: morten.rasmussen@arm.com
Cc: pjt@google.com
Cc: steve.muckle@linaro.org
Link: http://lkml.kernel.org/r/1459283456-21682-1-git-send-email-yuyang.du@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 2b8c41daba327c633228169e8bd8ec067ab443f8)
[integrate with schedfreq - schedfreq has a tuneable for init task util
 but this commit removes the use of the tuneable since we have a new
 algorithm for calculating an initial utilisation. I've left the tuneable
 in place, but it is no longer used even when schedfreq is the CPUFreq
 governor]
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:53 -07:00
Dietmar Eggemann
633b98b651 sched/core: Add first cpu w/ max/min orig capacity to root domain
This will allow iteration to start from a cpu with max or min original
capacity in the wakeup path, regardless of which cpu the scheduler is
currently running on (smp_processor_id()) or the previous cpu of the task
(task_cpu(p)). This iteration has to happen on a sched_domain spanning
all cpus in the order of the sched_groups of this sched_domain seen by
the starting cpu.

In case of an SMP system the first cpu with max orig capacity and the
one with min orig capacity are the same. This can temporarily be the
case on a big.LITTLE system with hotplug as well.

E.g. the different order of cpu iteration can be used to map schedtune
task parameter 'boosted' into the cpu iteration order in
find_best_target().

Use of READ_ONCE()/WRITE_ONCE() to avoid load/store tearing.
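
For illustration, assuming the root domain fields are named
max_cap_orig_cpu/min_cap_orig_cpu, the accesses look like:

  /* writer (domain build / hotplug): avoid store tearing */
  WRITE_ONCE(rd->max_cap_orig_cpu, new_max_cpu);

  /* reader (wakeup path): pick the iteration start cpu locklessly */
  int start_cpu = boosted ? READ_ONCE(rd->max_cap_orig_cpu)
                          : READ_ONCE(rd->min_cap_orig_cpu);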

Change-Id: I812fbd9c7e5f506617e456c0eec3edcd2c016e92
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
(cherry picked from commit fd6e9543c1fd8971a5e2e68e39b2f6e591d46114)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:53 -07:00
Morten Rasmussen
60cc9f4e1e UPSTREAM: sched/fair: Add per-CPU min capacity to sched_group_capacity
struct sched_group_capacity currently represents the compute capacity
sum of all CPUs in the sched_group.

Unless it is divided by the group_weight to get the average capacity
per CPU, it hides differences in CPU capacity for mixed capacity systems
(e.g. high RT/IRQ utilization or ARM big.LITTLE).

But even the average may not be sufficient if the group covers CPUs of
different capacities.

Instead, by extending struct sched_group_capacity to indicate the min per-CPU
capacity in the group, a suitable group for a given task utilization can
more easily be found such that CPUs with reduced capacity can be avoided
for tasks with high utilization (not implemented by this patch).
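
Roughly, the new field and its maintenance can be sketched as:

  struct sched_group_capacity {
          unsigned long capacity;      /* sum of per-CPU capacities      */
          unsigned long min_capacity;  /* min per-CPU capacity in group  */
          /* ... */
  };

  /* lowest level: a single CPU */
  sdg->sgc->capacity     = capacity;
  sdg->sgc->min_capacity = capacity;

  /* higher levels: aggregate over the child groups */
  capacity     += group->sgc->capacity;
  min_capacity  = min(group->sgc->min_capacity, min_capacity);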

Change-Id: If3cae1be62d01a199e752bca5abb45357d5d0fbd
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: freedom.tan@mediatek.com
Cc: keita.kobayashi.ym@renesas.com
Cc: mgalbraith@suse.de
Cc: sgurrappadi@nvidia.com
Cc: vincent.guittot@linaro.org
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1476452472-24740-4-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit bf475ce0a3dd75b5d1df6c6c14ae25168caa15ac)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:52 -07:00
Steve Muckle
f02702dcf2 sched: backport cpufreq hooks from 4.9-rc4
The scheduler cpufreq hooks are required by the schedutil cpufreq
governor.

Change-Id: Ied6c46262bb33b7e81bbb3d3d2761124e0c676b7
Signed-off-by: Steve Muckle <smuckle@linaro.org>
[trivial cherry-picking fixes]
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:50 -07:00
Greg Kroah-Hartman
9bc462220d This is the 4.4.70 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlkm0zAACgkQONu9yGCS
 aT5QnxAAh9uZYFJtQ7wYngD7cQcDH1KVztqEYxCP5OtxzAZBrSNBufLdhKBbc1ZP
 C04Mo+FzzNiJtBwkmlOqYaEPYUSx/uwCEk9mNX85VtchIhKBrwWF7GxkeXCPs6e5
 yP5TUXmxbbSp3qM4q2Z4XSW8eEPZ2l3zoy0fkjz2kS02e4RW0yQ34dvzw0BG2urr
 +9ocyVjDBoU3QNKyVw3fd1AltKesSZK0fa2vEO+TOTW6Bm3xD4egCJdOzu9saUwK
 hfSKXsJ0/pf1r1iyfz2foR/Hi3i4j6vRqnneyqozT7nxEJEuBQ3B5WhnsbDfzrXu
 +CY23KBkDkQ1RBngmtTQd3ABHEN1E2StpBImG5RUr+5giV6/e4rdz0/HWGMvCvAz
 iWqXdgZNdCnc96HPEWaDGUKxndCxsiaJOhgZwW2zm/0drVWRE+vjsOmFLyUp2Ky1
 1vnKfwlvTFU4xjQ5H44AuuSHQsv+GNEtPPIHrbBv/wg90/2VuF0aYuNYjHSsc4Ca
 3YM53S6/sjQqmsKixWboax8Kh2wRrEuFbqSFQV64JjFpGau61JQFMtRNl4+FFXzm
 Cm+26Fan4Wtyo5zB9xnBZbDwCOXqwTXQYUP2SejtObq+Uk2tXxF05emeta9pURF3
 vdgv6N0cTPm4K3VZyBZvj8JitEr2OEaIxoUqE2BXkA1MPmbqOoI=
 =Z1no
 -----END PGP SIGNATURE-----

Merge 4.4.70 into android-4.4

Changes in 4.4.70
	usb: misc: legousbtower: Fix buffers on stack
	usb: misc: legousbtower: Fix memory leak
	USB: ene_usb6250: fix DMA to the stack
	watchdog: pcwd_usb: fix NULL-deref at probe
	char: lp: fix possible integer overflow in lp_setup()
	USB: core: replace %p with %pK
	ARM: tegra: paz00: Mark panel regulator as enabled on boot
	tpm_crb: check for bad response size
	infiniband: call ipv6 route lookup via the stub interface
	dm btree: fix for dm_btree_find_lowest_key()
	dm raid: select the Kconfig option CONFIG_MD_RAID0
	dm bufio: avoid a possible ABBA deadlock
	dm bufio: check new buffer allocation watermark every 30 seconds
	dm cache metadata: fail operations if fail_io mode has been established
	dm bufio: make the parameter "retain_bytes" unsigned long
	dm thin metadata: call precommit before saving the roots
	dm space map disk: fix some book keeping in the disk space map
	md: update slab_cache before releasing new stripes when stripes resizing
	rtlwifi: rtl8821ae: setup 8812ae RFE according to device type
	mwifiex: pcie: fix cmd_buf use-after-free in remove/reset
	ima: accept previously set IMA_NEW_FILE
	KVM: x86: Fix load damaged SSEx MXCSR register
	KVM: X86: Fix read out-of-bounds vulnerability in kvm pio emulation
	regulator: tps65023: Fix inverted core enable logic.
	s390/kdump: Add final note
	s390/cputime: fix incorrect system time
	ath9k_htc: Add support of AirTies 1eda:2315 AR9271 device
	ath9k_htc: fix NULL-deref at probe
	drm/amdgpu: Avoid overflows/divide-by-zero in latency_watermark calculations.
	drm/amdgpu: Make display watermark calculations more accurate
	drm/nouveau/therm: remove ineffective workarounds for alarm bugs
	drm/nouveau/tmr: ack interrupt before processing alarms
	drm/nouveau/tmr: fix corruption of the pending list when rescheduling an alarm
	drm/nouveau/tmr: avoid processing completed alarms when adding a new one
	drm/nouveau/tmr: handle races with hw when updating the next alarm time
	cdc-acm: fix possible invalid access when processing notification
	proc: Fix unbalanced hard link numbers
	of: fix sparse warning in of_pci_range_parser_one
	iio: dac: ad7303: fix channel description
	pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes
	pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes()
	USB: serial: ftdi_sio: fix setting latency for unprivileged users
	USB: serial: ftdi_sio: add Olimex ARM-USB-TINY(H) PIDs
	ext4 crypto: don't let data integrity writebacks fail with ENOMEM
	ext4 crypto: fix some error handling
	net: qmi_wwan: Add SIMCom 7230E
	fscrypt: fix context consistency check when key(s) unavailable
	f2fs: check entire encrypted bigname when finding a dentry
	fscrypt: avoid collisions when presenting long encrypted filenames
	sched/fair: Do not announce throttled next buddy in dequeue_task_fair()
	sched/fair: Initialize throttle_count for new task-groups lazily
	usb: host: xhci-plat: propagate return value of platform_get_irq()
	xhci: apply PME_STUCK_QUIRK and MISSING_CAS quirk for Denverton
	usb: host: xhci-mem: allocate zeroed Scratchpad Buffer
	net: irda: irda-usb: fix firmware name on big-endian hosts
	usbvision: fix NULL-deref at probe
	mceusb: fix NULL-deref at probe
	ttusb2: limit messages to buffer size
	usb: musb: tusb6010_omap: Do not reset the other direction's packet size
	USB: iowarrior: fix info ioctl on big-endian hosts
	usb: serial: option: add Telit ME910 support
	USB: serial: qcserial: add more Lenovo EM74xx device IDs
	USB: serial: mct_u232: fix big-endian baud-rate handling
	USB: serial: io_ti: fix div-by-zero in set_termios
	USB: hub: fix SS hub-descriptor handling
	USB: hub: fix non-SS hub-descriptor handling
	ipx: call ipxitf_put() in ioctl error path
	iio: proximity: as3935: fix as3935_write
	ceph: fix recursion between ceph_set_acl() and __ceph_setattr()
	gspca: konica: add missing endpoint sanity check
	s5p-mfc: Fix unbalanced call to clock management
	dib0700: fix NULL-deref at probe
	zr364xx: enforce minimum size when reading header
	dvb-frontends/cxd2841er: define symbol_rate_min/max in T/C fe-ops
	cx231xx-audio: fix init error path
	cx231xx-audio: fix NULL-deref at probe
	cx231xx-cards: fix NULL-deref at probe
	powerpc/book3s/mce: Move add_taint() later in virtual mode
	powerpc/pseries: Fix of_node_put() underflow during DLPAR remove
	powerpc/64e: Fix hang when debugging programs with relocated kernel
	ARM: dts: at91: sama5d3_xplained: fix ADC vref
	ARM: dts: at91: sama5d3_xplained: not all ADC channels are available
	arm64: xchg: hazard against entire exchange variable
	arm64: uaccess: ensure extension of access_ok() addr
	arm64: documentation: document tagged pointer stack constraints
	xc2028: Fix use-after-free bug properly
	mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
	staging: rtl8192e: fix 2 byte alignment of register BSSIDR.
	staging: rtl8192e: rtl92e_get_eeprom_size Fix read size of EPROM_CMD.
	iommu/vt-d: Flush the IOTLB to get rid of the initial kdump mappings
	metag/uaccess: Fix access_ok()
	metag/uaccess: Check access_ok in strncpy_from_user
	uwb: fix device quirk on big-endian hosts
	genirq: Fix chained interrupt data ordering
	osf_wait4(): fix infoleak
	tracing/kprobes: Enforce kprobes teardown after testing
	PCI: Fix pci_mmap_fits() for HAVE_PCI_RESOURCE_TO_USER platforms
	PCI: Freeze PME scan before suspending devices
	drm/edid: Add 10 bpc quirk for LGD 764 panel in HP zBook 17 G2
	nfsd: encoders mustn't use unitialized values in error cases
	drivers: char: mem: Check for address space wraparound with mmap()
	Linux 4.4.70

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-05-25 17:31:28 +02:00
Konstantin Khlebnikov
ada79b5ecd sched/fair: Initialize throttle_count for new task-groups lazily
commit 094f469172e00d6ab0a3130b0e01c83b3cf3a98d upstream.

A cgroup created inside a throttled group must inherit the current
throttle_count. A broken throttle_count allows a throttled entry to be
nominated as the next buddy, which later leads to a null pointer
dereference in pick_next_task_fair().

This patch initializes cfs_rq->throttle_count at first enqueue: laziness
allows us to skip locking all runqueues at group creation. The lazy
approach also allows skipping a full sub-tree scan when throttling the
hierarchy (not in this patch).
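
Condensed, the lazy initialisation at first enqueue can be sketched as
follows (an up-to-date flag marks cfs_rqs whose counter has already been
synced):

  if (unlikely(!cfs_rq->throttle_uptodate)) {
          struct cfs_rq *pcfs_rq;
          struct task_group *tg;

          cfs_rq->throttle_uptodate = 1;
          /* find the closest ancestor that is already up to date */
          for (tg = cfs_rq->tg->parent; tg; tg = tg->parent) {
                  pcfs_rq = tg->cfs_rq[cpu_of(rq)];
                  if (pcfs_rq->throttle_uptodate)
                          break;
          }
          if (tg) {
                  cfs_rq->throttle_count = pcfs_rq->throttle_count;
                  cfs_rq->throttled_clock_task = rq_clock_task(rq);
          }
  }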

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Link: http://lkml.kernel.org/r/146608182119.21870.8439834428248129633.stgit@buzz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Ben Pineau <benjamin.pineau@mirakl.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 14:30:12 +02:00
Greg Hackmann
a5e2a1ddbc ANDROID: sched: fix duplicate sched_group_energy const specifiers
EAS uses "const struct sched_group_energy * const" fairly consistently.
But a couple of places swap the "*" and second "const", making the
pointer mutable.

In the case of struct sched_group, "* const" would have been an error,
since init_sched_energy() writes to sd->groups->sge.
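
For reference, the two forms differ as follows (a minimal illustration,
not the exact declarations touched here):

  const struct sched_group_energy *const sge;  /* data and pointer both const */
  const struct sched_group_energy const *sge;  /* duplicate const on the data,
                                                  pointer itself still mutable */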

Change-Id: Ic6a8fcf99e65c0f25d9cc55c32625ef3ca5c9aca
Signed-off-by: Greg Hackmann <ghackmann@google.com>
2017-03-17 16:26:10 +00:00
Dmitry Shmidt
441e10ac4c Merge remote-tracking branch 'common/android-4.4' into android-4.4.y 2016-09-13 14:47:50 -07:00
Amit Pundir
aeb4a3112e sched/walt: use do_div instead of division operator
Use do_div() instead of "/" operator to fix undefined references to
"__aeabi_uldivmod" build error for ARCH=arm.

Also in TP_fast_assign(), along with the do_div() usage, replace "," with
";": the comma would have resulted in a syntax error (!), because
'#define TP_fast_assign(args...) args' would have stripped it off and left
only white space between these two assignments after the CPP phase.
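
For example (variable names made up), given a u64 numerator and a u32
divisor:

  u64 total_ns;        /* accumulated time */
  u32 period_ns;       /* window length    */
  u64 avg;

  /* A plain "/" on a u64 emits a call to __aeabi_uldivmod on 32-bit
   * ARM, which the kernel does not provide: */
  avg = total_ns / period_ns;          /* breaks the ARCH=arm build */

  /* do_div() divides the u64 in place by a u32 and returns the
   * remainder, avoiding the libgcc helper: */
  do_div(total_ns, period_ns);
  avg = total_ns;                      /* quotient */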

Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
[jstultz: Cherry-picked from common/android-3.18]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-09-09 15:19:25 -07:00
Todd Kjos
8935b6b4d2 FIXUP: sched: Fix double-release of spinlock in move_queued_task
BUG: 29519455
Change-Id: I4d1c27a1b4bcbba03d4b175d170cfe1701a90ffd
2016-08-11 14:26:47 -07:00
Patrick Bellasi
dfc1151b46 FIXUP: sched: fix set_cfs_cpu_capacity when WALT is in use
The CPU utilization reported when WALT is in use already tracks the
contributions due to RT and DL workloads. However, SchedFreq exposes
different capacity update functions, one for each class, and aggregates
the classes' utilization internally at update_cpu_capacity_request() call
time.

This patch ensures that when WALT is in use, the
  cpu_sched_capacity_reqs::cfs
value tracks just the load generated by SCHED_OTHER tasks.

Change-Id: Ibd9c9a10874a1d91f62477034548f7664e57cd6a
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2016-08-11 14:26:44 -07:00
Srinath Sridharan
519c62750e sched/walt: Accounting for number of irqs pending on each core
Schedules on a core whose irq count is less than a threshold.
Improves I/O performance of EAS.

Change-Id: I08ff7dd0d22502a0106fc636b1af2e6fe9e758b5
2016-08-11 14:26:43 -07:00
Srivatsa Vaddagiri
efb86bd08a sched: Introduce Window Assisted Load Tracking (WALT)
Use a window-based view of time in order to track task
demand and CPU utilization in the scheduler.

Window Assisted Load Tracking (WALT) implementation credits:
 Srivatsa Vaddagiri, Steve Muckle, Syed Rameez Mustafa, Joonwoo Park,
 Pavan Kumar Kondeti, Olav Haugan

2016-03-06: Integration with EAS/refactoring by Vikram Mulukutla
            and Todd Kjos

Change-Id: I21408236836625d4e7d7de1843d20ed5ff36c708

Includes fixes for issues:

eas/walt: Use walt_ktime_clock() instead of ktime_get_ns() to avoid a
race resulting in watchdog resets
BUG: 29353986
Change-Id: Ic1820e22a136f7c7ebd6f42e15f14d470f6bbbdb

Handle WALT accounting anomaly during resume

During resume, there is a corner case where on wakeup, a task's
prev_runnable_sum can go negative. This is a workaround that
fixes the condition and warns (instead of crashing).

BUG: 29464099
Change-Id: I173e7874324b31a3584435530281708145773508

Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Srinath Sridharan <srinathsr@google.com>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
[jstultz: fwdported to 4.4]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-08-11 14:26:43 -07:00
Patrick Bellasi
af14760e19 FIXUP: sched/tune: fix accounting for runnable tasks
Contains:

sched/tune: fix accounting for runnable tasks (1/5)

The accounting for tasks into boost groups of different CPUs is currently
broken mainly because:
a) we do not properly track the change of boost group of a RUNNABLE task
b) there are race conditions between migration code and accounting code

This patch provides a fix to ensure enqueue/dequeue
accounting also for throttled tasks.

Without this patch it can happen that a task is enqueued into a throttled
RQ and thus is not accounted for in the boosting of the corresponding RQ.
We could argue that a throttled task should not boost a CPU, however:
a) properly implementing CPU boosting while considering throttled tasks
   would greatly increase the complexity of the solution
b) it's not easy to quantify the benefits introduced by such a more
   complex solution

Since task throttling requires the usage of the CFS bandwidth controller,
which is not widely used on mobile systems (at least not by Android kernels
so far), for the time being we go for the simple solution and boost also
for throttled RQs.

sched/tune: fix accounting for runnable tasks (2/5)

This patch provides the code required to enforce proper locking.
A per boost group spinlock has been added to grant atomic
accounting of tasks as well as to serialise enqueue/dequeue operations,
triggered by tasks migrations, with cgroups's attach/detach operations.

sched/tune: fix accounting for runnable tasks (3/5)

This patch adds cgroups {allow,can,cancel}_attach callbacks.

Since a task can be migrated between boost groups while it's running,
the CGroups's attach callbacks have been added to properly migrate
boost contributions of RUNNABLE tasks.

The RQ's lock is used to serialise enqueue/dequeue operations, triggered
by task migrations, with cgroups's attach/detach operations, while
SchedTune's per-CPU lock is used to guarantee atomicity of the accounting
within the CPU.

NOTE: the current implementation does not allow a concurrent CPU migration
      and CGroups change.

sched/tune: fix accounting for runnable tasks (4/5)

This fixes accounting for exiting tasks by adding a dedicated call early
in the do_exit() syscall, which disables SchedTune accounting as soon as a
task is flagged PF_EXITING.

This flag is set before the multiple dequeue/enqueue dance triggered
by cgroup_exit(), which only injects useless task movements and thus
increases the possibility of race conditions with the migration code.
The schedtune_exit_task() call does the last dequeue of a task from its
current boost group. This is a solution more aligned with what happens in
mainline kernels (>v4.4), where cgroup_exit() no longer moves a dying
task to the root control group.

sched/tune: fix accounting for runnable tasks (5/5)

To avoid accounting issues at startup, this patch disables the SchedTune
accounting until the required data structures have been properly
initialized.
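
A minimal sketch of the locking scheme described above (names are
illustrative, not the exact ones in schedtune.c):

  struct boost_groups {
          raw_spinlock_t lock;         /* per-CPU, protects the counters */
          struct {
                  int boost;           /* boost value of this group      */
                  int tasks;           /* RUNNABLE tasks accounted to it */
          } group[BOOSTGROUPS_COUNT];
  };

  static void schedtune_tasks_update(int cpu, int idx, int count)
  {
          struct boost_groups *bg = &per_cpu(cpu_boost_groups, cpu);
          unsigned long flags;

          raw_spin_lock_irqsave(&bg->lock, flags);
          bg->group[idx].tasks += count;      /* +1 enqueue, -1 dequeue */
          raw_spin_unlock_irqrestore(&bg->lock, flags);
  }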

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
[jstultz: fwdported to 4.4]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-08-11 14:26:41 -07:00
Patrick Bellasi
765c2ab363 FIXUP: sched: fix build for non-SMP target
Currently the build for a single-core (e.g. user-mode) Linux is broken
and this configuration is required (at least) to run some network tests.

The main issues for the current code support on single-core systems are:
1. {se,rq}::sched_avg is not available nor maintained for !SMP systems
   This means that load and utilisation signals are NOT available in single
   core systems. All the EAS code depends on these signals.
2. sched_group_energy is also SMP dependent. Again this means that all the
   EAS setup and preparation code (energy model initialization) has to be
   properly guarded/disabled for !SMP systems.
3. SchedFreq depends on utilization signal, which is not available on
   !SMP systems.
4. SchedTune is useless on unicore systems if SchedFreq is not available.
5. WALT machinery is not required on single-core systems.

This patch addresses all these issues by enforcing some constraints for
single-core systems:
a) WALT, SchedFreq and SchedTune are now dependent on SMP
b) The default governor for !SMP systems is INTERACTIVE
c) The energy model initialisation/build functions are guarded by CONFIG_SMP
d) Other minor code re-arrangements and CONFIG_SMP guarding to enable
   single core builds.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2016-08-10 15:08:01 -07:00
Joseph Lo
3a400abdc5 CHROMIUM: sched: update the average of nr_running
Doing an exponential moving average per nr_running++/-- does not
guarantee a fixed sample rate, which induces errors if there are lots of
threads being enqueued/dequeued from the rq (Linpack mt). Instead of
keeping track of the average, the scheduler now keeps track of the
integral of nr_running and allows the readers to perform filtering on top.
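
Schematically (names illustrative): the writer accumulates nr_running
over time just before it changes, and readers average over their own
window:

  /* called right before rq->nr_running is incremented/decremented */
  static void nr_running_integral_update(struct rq *rq, u64 now)
  {
          rq->nr_running_integral += rq->nr_running *
                                     (now - rq->nr_last_stamp);
          rq->nr_last_stamp = now;
  }

  /* reader: avg nr_running over [t0, t1] is
   *   (integral(t1) - integral(t0)) / (t1 - t0)
   * so the sample rate / filtering is chosen by the reader. */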

Original-author: Sai Charan Gurrappadi <sgurrappadi@nvidia.com>

Change-Id: Id946654f32fa8be0eaf9d8fa7c9a8039b5ef9fab
Signed-off-by: Joseph Lo <josephl@nvidia.com>
Signed-off-by: Andrew Bresticker <abrestic@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/174694
Reviewed-on: https://chromium-review.googlesource.com/272853
[jstultz: fwdported to 4.4]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-08-10 15:01:22 -07:00
Juri Lelli
dd2460f387 DEBUG: sched,cpufreq: add cpu_capacity change tracepoint
This is useful when we want to compare cpu utilization and
cpu curr capacity side by side.

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2016-05-10 16:54:42 +08:00
Vincent Guittot
fab5cc59bf sched: deadline: use deadline bandwidth in scale_rt_capacity
Instead of monitoring the exec time of deadline tasks to evaluate the
CPU capacity consumed by deadline scheduler class, we can directly
calculate it thanks to the sum of utilization of deadline tasks on the
CPU.  We can remove deadline tasks from rt_avg metric and directly use
the average bandwidth of deadline scheduler in scale_rt_capacity.

Based in part on a similar patch from Luca Abeni <luca.abeni@unitn.it>.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
2016-05-10 16:53:23 +08:00
Vincent Guittot
cd248fa758 sched: remove call of sched_avg_update from sched_rt_avg_update
rt_avg is only used to scale the available CPU's capacity for CFS
tasks.  As the update of this scaling is done during periodic load
balance, we only have to ensure that sched_avg_update has been called
before any periodic load balancing. This requirement is already
fulfilled by __update_cpu_load so the call in sched_rt_avg_update,
which is part of the hotpath, is useless.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
2016-05-10 16:53:23 +08:00
Steve Muckle
6b6c192453 sched/fair: jump to max OPP when crossing UP threshold
Since the true utilization of a long running task is not detectable
while it is running and might be bigger than the current cpu capacity,
create the maximum cpu capacity headroom by requesting the maximum
cpu capacity once the cpu usage plus the capacity margin exceeds the
current capacity. This is also done to harm the performance of the
task as little as possible.
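
Schematically (the margin and helper names are illustrative, not the
exact ones in the patch):

  /* after updating this cpu's utilization in the enqueue/tick path */
  if (cpu_util(cpu) + capacity_margin > capacity_curr_of(cpu))
          request_capacity(cpu, capacity_max_of(cpu));   /* jump to max OPP */
  else
          request_capacity(cpu, cpu_util(cpu) + capacity_margin);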

Original fair-class only version authored by Juri Lelli
<juri.lelli@arm.com>.

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
2016-05-10 16:53:23 +08:00
Juri Lelli
7ff814dd71 sched/{core,fair}: trigger OPP change request on fork()
Patch "sched/fair: add triggers for OPP change requests" introduced OPP
change triggers for enqueue_task_fair(), but the trigger was operating only
for wakeups. In fact, it also makes sense to consider wakeup_new (i.e.,
fork()), as we don't know anything about a newly created task and thus we
most certainly want to jump to max OPP to not harm performance too much.

However, it is not currently possible (or at least it wasn't evident to me
how to do so :/) to tell new wakeups from other (non wakeup) operations.

This patch introduces an additional flag in sched.h that is only set at
fork() time and is then consumed in enqueue_task_fair() for our purpose.
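
Schematically (the flag name here is illustrative):

  /* wake_up_new_task(): mark the very first enqueue of the task */
  activate_task(rq, p, ENQUEUE_WAKEUP_NEW);

  /* enqueue_task_fair(): treat it like a wakeup for the OPP trigger */
  if (flags & ENQUEUE_WAKEUP_NEW)
          request_max_capacity(cpu_of(rq));   /* unknown task: go to max */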

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
2016-05-10 16:53:22 +08:00
Michael Turquette
a967a45c71 sched: scheduler-driven cpu frequency selection
Scheduler-driven CPU frequency selection hopes to exploit both
per-task and global information in the scheduler to improve frequency
selection policy, achieving lower power consumption, improved
responsiveness/performance, and less reliance on heuristics and
tunables. For further discussion on the motivation of this integration
see [0].

This patch implements a shim layer between the Linux scheduler and the
cpufreq subsystem. The interface accepts capacity requests from the
CFS, RT and deadline sched classes. The requests from each sched class
are summed on each CPU with a margin applied to the CFS and RT
capacity requests to provide some headroom. Deadline requests are
expected to be precise enough given their nature to not require
headroom. The maximum total capacity request for a CPU in a frequency
domain drives the requested frequency for that domain.

Policy is determined by both the sched classes and this shim layer.

Note that this algorithm is event-driven. There is no polling loop to
check cpu idle time nor any other method which is unsynchronized with
the scheduler, aside from a throttling mechanism to ensure frequency
changes are not attempted faster than the hardware can accommodate them.
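
Schematically, the per-cpu aggregation described above amounts to the
following (simplified; field names follow the cpu_sched_capacity_reqs
structure referenced elsewhere in this log):

  /* per-cpu capacity requests from the sched classes */
  reqs->cfs   = cfs_request + margin;    /* headroom for CFS        */
  reqs->rt    = rt_request  + margin;    /* headroom for RT         */
  reqs->dl    = dl_request;              /* deadline: precise as-is */
  reqs->total = reqs->cfs + reqs->rt + reqs->dl;

  /* the max total across the frequency domain selects the OPP,
   * e.g. freq_new = max_total_in_domain * policy->max / capacity_max */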

Thanks to Juri Lelli <juri.lelli@arm.com> for contributing design ideas,
code and test results, and to Ricky Liang <jcliang@chromium.org>
for initialization and static key inc/dec fixes.

[0] http://article.gmane.org/gmane.linux.kernel/1499836

[smuckle@linaro.org: various additions and fixes, revised commit text]

CC: Ricky Liang <jcliang@chromium.org>
Signed-off-by: Michael Turquette <mturquette@baylibre.com>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
2016-05-10 16:53:22 +08:00
Morten Rasmussen
f2a8923298 sched: Add group_misfit_task load-balance type
To maximize throughput in systems with reduced capacity cpus (e.g.
high RT/IRQ load and/or ARM big.LITTLE), load-balancing has to consider
task and cpu utilization as well as per-cpu compute capacity, in
addition to the current average-load-based policy. Tasks that are
scheduled on a reduced capacity
cpu need to be identified and migrated to a higher capacity cpu if
possible.

To implement this additional policy an additional group_type
(load-balance scenario) is added: group_misfit_task. This represents
scenarios where a sched_group has tasks that are not suitable for its
per-cpu capacity. group_misfit_task is only considered if the system is
not overloaded in any other way (group_imbalanced or group_overloaded).

Identifying misfit tasks requires the rq lock to be held. To avoid
taking remote rq locks to examine source sched_groups for misfit tasks,
each cpu is responsible for tracking its own misfit tasks and updating
the rq->misfit_task flag. This means checking task utilization when
tasks are scheduled and on each sched_tick.
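
The per-cpu side of this amounts to something like the following
(illustrative; the fits test is simplified):

  /* e.g. in task_tick_fair(), with the rq lock already held */
  rq->misfit_task = !task_fits_cpu(rq->curr, rq->cpu);

  /* group_classify(): only considered when the group is neither
   * overloaded nor imbalanced */
  if (sgs->group_misfit_task)
          return group_misfit_task;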

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
2016-05-10 16:49:54 +08:00
Morten Rasmussen
563ddb604e sched: Add per-cpu max capacity to sched_group_capacity
struct sched_group_capacity currently represents the compute capacity
sum of all cpus in the sched_group. Unless it is divided by the
group_weight to get the average capacity per cpu, it hides differences in
cpu capacity for mixed capacity systems (e.g. high RT/IRQ utilization or
ARM big.LITTLE). But even the average may not be sufficient if the group
covers cpus of different capacities. Instead, by extending struct
sched_group_capacity to indicate the max per-cpu capacity in the group, a
suitable group for a given task utilization can easily be found such
that cpus with reduced capacity can be avoided for tasks with high
utilization (not implemented by this patch).

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
2016-05-10 16:49:54 +08:00