Commit graph

22923 commits

Author SHA1 Message Date
Pavankumar Kondeti
cd6d19b8e4 genriq: pick only one CPU while overriding the affinity during migration
With commit bfc60d474137 ("genirq: Use irq_set_affinity_locked to change
irq affinity"), affinity listeners receive the notification when the irq
affinity is changed during migration. If there is no online and
un-isolated CPU available from the user specified affinity, the affinity
is overridden with all online and un-isolated CPUs. The same cpumask is
notified to PM QOS affinity listener which applies PM_QOS_CPU_DMA_LATENCY
vote to all those CPUs. As the low level irqchip driver sets affinity to
only one CPU, do the same while overriding the affinity during migration.

Change-Id: I0bcb75dd356658da100fbeeefd33ef8b121f4d6d
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2017-06-30 10:41:00 +05:30
Pavankumar Kondeti
76aa496f89 cpu-hotplug: Keep atleast 1 online and un-isolated CPU
The PM_QOS_CPU_DMA_LATENCY vote attached to an IRQ is discarded,
if it is affined to an isolated CPU. So we need atleast 1 CPU
in online and un-isolate state. The scheduler rejects isolating
a CPU if it is the only online and un-isolated CPU in the system.
Add the same check for CPU hotplug.

Change-Id: I5bdfe6e3bb0352ed3ae5a2de90097b73d248f3fc
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2017-06-29 16:27:33 +05:30
Neil Leeder
5f71e693df perf: stop deadlock if attempt to bring cpu up fails
When an attempt is made to free an event on a CPU which is
no longer online, perf tries to bring the CPU online. This
can fail, resulting in an UP_CANCELLED notifier, which
eventually tries to acquire the ctx->mutex which is already
being held by the code, which brings up the CPU.

Removing the attempt to bring the cpu up will remove this
deadlock, but also requires temporarily removing support of
counting events across hotplug. This will be restored in a
later patch.

Conflicts:
	kernel/events/core.c
	kernel/events/hw_breakpoint.c

Change-Id: Iaafa3c6688d26508857472fd5bb32139a137880e
Signed-off-by: Neil Leeder <nleeder@codeaurora.org>
2017-06-28 10:09:20 +05:30
Linux Build Service Account
c9b4dc7067 Merge "Merge branch 'android-4.4@e76c0fa' into branch 'msm-4.4'" 2017-06-22 23:41:14 -07:00
Linux Build Service Account
af39cfe11e Merge "sched: avoid migrating when softint on tgt cpu should be short" 2017-06-22 14:00:20 -07:00
Linux Build Service Account
4dcf7a50c5 Merge "Merge branch 'android-4.4@6fc0573' into branch 'msm-4.4'" 2017-06-22 07:40:22 -07:00
Blagovest Kolenichev
e0a0b484bf Merge branch 'android-4.4@e76c0fa' into branch 'msm-4.4'
* refs/heads/tmp-e76c0fa
  Linux 4.4.72
  arm64: ensure extension of smp_store_release value
  arm64: armv8_deprecated: ensure extension of addr
  usercopy: Adjust tests to deal with SMAP/PAN
  RDMA/qib,hfi1: Fix MR reference count leak on write with immediate
  arm64: entry: improve data abort handling of tagged pointers
  arm64: hw_breakpoint: fix watchpoint matching for tagged pointers
  Make __xfs_xattr_put_listen preperly report errors.
  NFSv4: Don't perform cached access checks before we've OPENed the file
  NFS: Ensure we revalidate attributes before using execute_ok()
  mm: consider memblock reservations for deferred memory initialization sizing
  net: better skb->sender_cpu and skb->napi_id cohabitation
  serial: sh-sci: Fix panic when serial console and DMA are enabled
  tty: Drop krefs for interrupted tty lock
  drivers: char: mem: Fix wraparound check to allow mappings up to the end
  ASoC: Fix use-after-free at card unregistration
  ALSA: timer: Fix missing queue indices reset at SNDRV_TIMER_IOCTL_SELECT
  ALSA: timer: Fix race between read and ioctl
  drm/nouveau/tmr: fully separate alarm execution/pending lists
  drm/vmwgfx: Make sure backup_handle is always valid
  drm/vmwgfx: limit the number of mip levels in vmw_gb_surface_define_ioctl()
  drm/vmwgfx: Handle vmalloc() failure in vmw_local_fifo_reserve()
  perf/core: Drop kernel samples even though :u is specified
  powerpc/hotplug-mem: Fix missing endian conversion of aa_index
  powerpc/numa: Fix percpu allocations to be NUMA aware
  powerpc/eeh: Avoid use after free in eeh_handle_special_event()
  scsi: qla2xxx: don't disable a not previously enabled PCI device
  KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages
  btrfs: fix memory leak in update_space_info failure path
  btrfs: use correct types for page indices in btrfs_page_exists_in_range
  cxl: Fix error path on bad ioctl
  ufs_getfrag_block(): we only grab ->truncate_mutex on block creation path
  ufs_extend_tail(): fix the braino in calling conventions of ufs_new_fragments()
  ufs: set correct ->s_maxsize
  ufs: restore maintaining ->i_blocks
  fix ufs_isblockset()
  ufs: restore proper tail allocation
  fs: add i_blocksize()
  cpuset: consider dying css as offline
  Input: elantech - add Fujitsu Lifebook E546/E557 to force crc_enabled
  drm/msm: Expose our reservation object when exporting a dmabuf.
  target: Re-add check to reject control WRITEs with overflow data
  cpufreq: cpufreq_register_driver() should return -ENODEV if init fails
  stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms
  random: properly align get_random_int_hash
  drivers: char: random: add get_random_long()
  iio: proximity: as3935: fix AS3935_INT mask
  iio: light: ltr501 Fix interchanged als/ps register field
  staging/lustre/lov: remove set_fs() call from lov_getstripe()
  usb: chipidea: debug: check before accessing ci_role
  usb: chipidea: udc: fix NULL pointer dereference if udc_start failed
  usb: gadget: f_mass_storage: Serialize wake and sleep execution
  ext4: fix fdatasync(2) after extent manipulation operations
  ext4: keep existing extra fields when inode expands
  ext4: fix SEEK_HOLE
  xen-netfront: cast grant table reference first to type int
  xen-netfront: do not cast grant table reference to signed short
  xen/privcmd: Support correctly 64KB page granularity when mapping memory
  dmaengine: ep93xx: Always start from BASE0
  dmaengine: usb-dmac: Fix DMAOR AE bit definition
  KVM: async_pf: avoid async pf injection when in guest mode
  arm: KVM: Allow unaligned accesses at HYP
  KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation
  kvm: async_pf: fix rcu_irq_enter() with irqs enabled
  nfsd: Fix up the "supattr_exclcreat" attributes
  nfsd4: fix null dereference on replay
  drm/amdgpu/ci: disable mclk switching for high refresh rates (v2)
  crypto: gcm - wait for crypto op not signal safe
  KEYS: fix freeing uninitialized memory in key_update()
  KEYS: fix dereferencing NULL payload with nonzero length
  ptrace: Properly initialize ptracer_cred on fork
  serial: ifx6x60: fix use-after-free on module unload
  arch/sparc: support NR_CPUS = 4096
  sparc64: delete old wrap code
  sparc64: new context wrap
  sparc64: add per-cpu mm of secondary contexts
  sparc64: redefine first version
  sparc64: combine activate_mm and switch_mm
  sparc64: reset mm cpumask after wrap
  sparc: Machine description indices can vary
  sparc64: mm: fix copy_tsb to correctly copy huge page TSBs
  net: bridge: start hello timer only if device is up
  net: ethoc: enable NAPI before poll may be scheduled
  net: ping: do not abuse udp_poll()
  ipv6: Fix leak in ipv6_gso_segment().
  vxlan: fix use-after-free on deletion
  tcp: disallow cwnd undo when switching congestion control
  cxgb4: avoid enabling napi twice to the same queue
  ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt()
  bnx2x: Fix Multi-Cos
  ANDROID: uid_sys_stats: check previous uid_entry before call find_or_register_uid
  ANDROID: sdcardfs: d_splice_alias can return error values

Change-Id: I829ebf1a9271dcf0462c537e7bfcbcfde322f336
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2017-06-20 14:55:15 -07:00
Blagovest Kolenichev
c5f247dd6d Merge branch 'android-4.4@6fc0573' into branch 'msm-4.4'
* refs/heads/tmp-6fc0573:
  Linux 4.4.71
  xfs: only return -errno or success from attr ->put_listent
  xfs: in _attrlist_by_handle, copy the cursor back to userspace
  xfs: fix unaligned access in xfs_btree_visit_blocks
  xfs: bad assertion for delalloc an extent that start at i_size
  xfs: fix indlen accounting error on partial delalloc conversion
  xfs: wait on new inodes during quotaoff dquot release
  xfs: update ag iterator to support wait on new inodes
  xfs: support ability to wait on new inodes
  xfs: fix up quotacheck buffer list error handling
  xfs: prevent multi-fsb dir readahead from reading random blocks
  xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
  xfs: fix over-copying of getbmap parameters from userspace
  xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff()
  xfs: Fix missed holes in SEEK_HOLE implementation
  mlock: fix mlock count can not decrease in race condition
  mm/migrate: fix refcount handling when !hugepage_migration_supported()
  drm/gma500/psb: Actually use VBT mode when it is found
  slub/memcg: cure the brainless abuse of sysfs attributes
  ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430
  pcmcia: remove left-over %Z format
  drm/radeon: Unbreak HPD handling for r600+
  drm/radeon/ci: disable mclk switching for high refresh rates (v2)
  scsi: mpt3sas: Force request partial completion alignment
  HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference
  mmc: sdhci-iproc: suppress spurious interrupt with Multiblock read
  i2c: i2c-tiny-usb: fix buffer not being DMA capable
  vlan: Fix tcp checksum offloads in Q-in-Q vlans
  net: phy: marvell: Limit errata to 88m1101
  netem: fix skb_orphan_partial()
  ipv4: add reference counting to metrics
  sctp: fix ICMP processing if skb is non-linear
  tcp: avoid fastopen API to be used on AF_UNSPEC
  virtio-net: enable TSO/checksum offloads for Q-in-Q vlans
  be2net: Fix offload features for Q-in-Q packets
  ipv6: fix out of bound writes in __ip6_append_data()
  bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
  qmi_wwan: add another Lenovo EM74xx device ID
  bridge: netlink: check vlan_default_pvid range
  ipv6: Check ip6_find_1stfragopt() return value properly.
  ipv6: Prevent overrun when parsing v6 header options
  net: Improve handling of failures on link and route dumps
  tcp: eliminate negative reordering in tcp_clean_rtx_queue
  sctp: do not inherit ipv6_{mc|ac|fl}_list from parent
  sctp: fix src address selection if using secondary addresses for ipv6
  tcp: avoid fragmenting peculiar skbs in SACK
  s390/qeth: avoid null pointer dereference on OSN
  s390/qeth: unbreak OSM and OSN support
  s390/qeth: handle sysfs error during initialization
  ipv6/dccp: do not inherit ipv6_mc_list from parent
  dccp/tcp: do not inherit mc_list from parent
  sparc: Fix -Wstringop-overflow warning
  android: base-cfg: disable CONFIG_NFS_FS and CONFIG_NFSD
  schedstats/eas: guard properly to avoid breaking non-smp schedstats users
  BACKPORT: f2fs: sanity check size of nat and sit cache
  FROMLIST: f2fs: sanity check checkpoint segno and blkoff
  sched/tune: don't use schedtune before it is ready
  sched/fair: use SCHED_CAPACITY_SCALE for energy normalization
  sched/{fair,tune}: use reciprocal_value to compute boost margin
  sched/tune: Initialize raw_spin_lock in boosted_groups
  sched/tune: report when SchedTune has not been initialized
  sched/tune: fix sched_energy_diff tracepoint
  sched/tune: increase group count to 5
  cpufreq/schedutil: use boosted_cpu_util for PELT to match WALT
  sched/fair: Fix sched_group_energy() to support per-cpu capacity states
  sched/fair: discount task contribution to find CPU with lowest utilization
  sched/fair: ensure utilization signals are synchronized before use
  sched/fair: remove task util from own cpu when placing waking task
  trace:sched: Make util_avg in load_avg trace reflect PELT/WALT as used
  sched/fair: Add eas (& cas) specific rq, sd and task stats
  sched/core: Fix PELT jump to max OPP upon util increase
  sched: EAS & 'single cpu per cluster'/cpu hotplug interoperability
  UPSTREAM: sched/core: Fix group_entity's share update
  UPSTREAM: sched/fair: Fix calc_cfs_shares() fixed point arithmetics width confusion
  UPSTREAM: sched/fair: Fix incorrect task group ->load_avg
  UPSTREAM: sched/fair: Fix effective_load() to consistently use smoothed load
  UPSTREAM: sched/fair: Propagate asynchrous detach
  UPSTREAM: sched/fair: Propagate load during synchronous attach/detach
  UPSTREAM: sched/fair: Fix hierarchical order in rq->leaf_cfs_rq_list
  BACKPORT: sched/fair: Factorize PELT update
  UPSTREAM: sched/fair: Factorize attach/detach entity
  UPSTREAM: sched/fair: Improve PELT stuff some more
  UPSTREAM: sched/fair: Apply more PELT fixes
  UPSTREAM: sched/fair: Fix post_init_entity_util_avg() serialization
  BACKPORT: sched/fair: Initiate a new task's util avg to a bounded value
  sched/fair: Simplify idle_idx handling in select_idle_sibling()
  sched/fair: refactor find_best_target() for simplicity
  sched/fair: Change cpu iteration order in find_best_target()
  sched/core: Add first cpu w/ max/min orig capacity to root domain
  sched/core: Remove remnants of commit fd5c98da1a42
  sched: Remove sysctl_sched_is_big_little
  sched/fair: Code !is_big_little path into select_energy_cpu_brute()
  EAS: sched/fair: Re-integrate 'honor sync wakeups' into wakeup path
  Fixup!: sched/fair.c: Set SchedTune specific struct energy_env.task
  sched/fair: Energy-aware wake-up task placement
  sched/fair: Add energy_diff dead-zone margin
  sched/fair: Decommission energy_aware_wake_cpu()
  sched/fair: Do not force want_affine eq. true if EAS is enabled
  arm64: Set SD_ASYM_CPUCAPACITY sched_domain flag on DIE level
  UPSTREAM: sched/fair: Fix incorrect comment for capacity_margin
  UPSTREAM: sched/fair: Avoid pulling tasks from non-overloaded higher capacity groups
  UPSTREAM: sched/fair: Add per-CPU min capacity to sched_group_capacity
  UPSTREAM: sched/fair: Consider spare capacity in find_idlest_group()
  UPSTREAM: sched/fair: Compute task/cpu utilization at wake-up correctly
  UPSTREAM: sched/fair: Let asymmetric CPU configurations balance at wake-up
  UPSTREAM: sched/core: Enable SD_BALANCE_WAKE for asymmetric capacity systems
  UPSTREAM: sched/core: Pass child domain into sd_init()
  UPSTREAM: sched/core: Introduce SD_ASYM_CPUCAPACITY sched_domain topology flag
  UPSTREAM: sched/core: Remove unnecessary NULL-pointer check
  UPSTREAM: sched/fair: Optimize find_idlest_cpu() when there is no choice
  BACKPORT: sched/fair: Make the use of prev_cpu consistent in the wakeup path
  UPSTREAM: sched/core: Fix power to capacity renaming in comment
  Partial Revert: "WIP: sched: Add cpu capacity awareness to wakeup balancing"
  Revert "WIP: sched: Consider spare cpu capacity at task wake-up"
  FROM-LIST: cpufreq: schedutil: Redefine the rate_limit_us tunable
  cpufreq: schedutil: add up/down frequency transition rate limits
  trace/sched: add rq utilization signal for WALT
  sched/cpufreq: make schedutil use WALT signal
  sched: cpufreq: use rt_avg as estimate of required RT CPU capacity
  cpufreq: schedutil: move slow path from workqueue to SCHED_FIFO task
  BACKPORT: kthread: allow to cancel kthread work
  sched/cpufreq: fix tunables for schedfreq governor
  BACKPORT: cpufreq: schedutil: New governor based on scheduler utilization data
  sched: backport cpufreq hooks from 4.9-rc4
  ANDROID: Kconfig: add depends for UID_SYS_STATS
  ANDROID: hid: uhid: implement refcount for open and close
  Revert "ext4: require encryption feature for EXT4_IOC_SET_ENCRYPTION_POLICY"
  ANDROID: mnt: Fix next_descendent

Conflicts:
	include/trace/events/sched.h
	kernel/sched/Makefile
	kernel/sched/core.c
	kernel/sched/fair.c
	kernel/sched/sched.h

Change-Id: I55318828f2c858e192ac7015bcf2bf0ec5c5b2c5
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2017-06-19 16:59:55 -07:00
Greg Kroah-Hartman
e76c0faf11 This is the 4.4.72 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllBIXAACgkQONu9yGCS
 aT6T+w//VjXDZ+MddWJ4UeQDyIANYeFpa4tJNoqR3JsnT6yg1HODRZDR7aP5QJmN
 GIoRWU/2Q2nmYbAO0c8RPxs07w2xtIZzTUn+H+i6sG7bRs5RbLM5AMg4W/A/X88L
 V5c34kCvCf1HRfrdd4rXIZiibFnSZGqUv6o1YyQqCIvx15pyB6elMM714zt8uubk
 iL4/WJ2M4SrmamHWA349ldEtPjQKpwpwdBcCn+M4awbimdc0pm8oZqNkAfwJ+vLO
 HsuClO57I699ESU2Zt5bfEdVsW/gc7WiJOAr1Mrl2suToryrWfs2YT+sC/IQhkfC
 gUsi9Cm/6YMu+tiP4o6aqYvTFoFplFErpEbC3mqAEvHGGHKhrgEDotYJ+FnvI3q7
 Jaxix0B/Q/NIqsJPnqe5ONOCKFmW7rGR2e2j5+45GuiofioNVNF12HWfQkoItPOL
 YeR2JB8K9aywzYM4gaJuy8ScJ1shN8TY1FKgZa5gBT2ym4pDDcQmxz7Jr7agREHe
 F2sJ23zMU+o9guGA4Is2yqWCQ5yM+3kpPPISz+Pcgh8Q95o+ftCSyOeB2F5roW8I
 EO22AlJPlQH0LWDQhOJ5ZuAVe+qB8EdrQqqdLbP4/oHp7MtlR5ge+idRuZc+AUsa
 UoASccPsEwHyBErQmHoWNI4nPRciFrKliOqERmPLcuzewUwSatw=
 =wXRR
 -----END PGP SIGNATURE-----

Merge 4.4.72 into android-4.4

Changes in 4.4.72
	bnx2x: Fix Multi-Cos
	ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt()
	cxgb4: avoid enabling napi twice to the same queue
	tcp: disallow cwnd undo when switching congestion control
	vxlan: fix use-after-free on deletion
	ipv6: Fix leak in ipv6_gso_segment().
	net: ping: do not abuse udp_poll()
	net: ethoc: enable NAPI before poll may be scheduled
	net: bridge: start hello timer only if device is up
	sparc64: mm: fix copy_tsb to correctly copy huge page TSBs
	sparc: Machine description indices can vary
	sparc64: reset mm cpumask after wrap
	sparc64: combine activate_mm and switch_mm
	sparc64: redefine first version
	sparc64: add per-cpu mm of secondary contexts
	sparc64: new context wrap
	sparc64: delete old wrap code
	arch/sparc: support NR_CPUS = 4096
	serial: ifx6x60: fix use-after-free on module unload
	ptrace: Properly initialize ptracer_cred on fork
	KEYS: fix dereferencing NULL payload with nonzero length
	KEYS: fix freeing uninitialized memory in key_update()
	crypto: gcm - wait for crypto op not signal safe
	drm/amdgpu/ci: disable mclk switching for high refresh rates (v2)
	nfsd4: fix null dereference on replay
	nfsd: Fix up the "supattr_exclcreat" attributes
	kvm: async_pf: fix rcu_irq_enter() with irqs enabled
	KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation
	arm: KVM: Allow unaligned accesses at HYP
	KVM: async_pf: avoid async pf injection when in guest mode
	dmaengine: usb-dmac: Fix DMAOR AE bit definition
	dmaengine: ep93xx: Always start from BASE0
	xen/privcmd: Support correctly 64KB page granularity when mapping memory
	xen-netfront: do not cast grant table reference to signed short
	xen-netfront: cast grant table reference first to type int
	ext4: fix SEEK_HOLE
	ext4: keep existing extra fields when inode expands
	ext4: fix fdatasync(2) after extent manipulation operations
	usb: gadget: f_mass_storage: Serialize wake and sleep execution
	usb: chipidea: udc: fix NULL pointer dereference if udc_start failed
	usb: chipidea: debug: check before accessing ci_role
	staging/lustre/lov: remove set_fs() call from lov_getstripe()
	iio: light: ltr501 Fix interchanged als/ps register field
	iio: proximity: as3935: fix AS3935_INT mask
	drivers: char: random: add get_random_long()
	random: properly align get_random_int_hash
	stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms
	cpufreq: cpufreq_register_driver() should return -ENODEV if init fails
	target: Re-add check to reject control WRITEs with overflow data
	drm/msm: Expose our reservation object when exporting a dmabuf.
	Input: elantech - add Fujitsu Lifebook E546/E557 to force crc_enabled
	cpuset: consider dying css as offline
	fs: add i_blocksize()
	ufs: restore proper tail allocation
	fix ufs_isblockset()
	ufs: restore maintaining ->i_blocks
	ufs: set correct ->s_maxsize
	ufs_extend_tail(): fix the braino in calling conventions of ufs_new_fragments()
	ufs_getfrag_block(): we only grab ->truncate_mutex on block creation path
	cxl: Fix error path on bad ioctl
	btrfs: use correct types for page indices in btrfs_page_exists_in_range
	btrfs: fix memory leak in update_space_info failure path
	KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages
	scsi: qla2xxx: don't disable a not previously enabled PCI device
	powerpc/eeh: Avoid use after free in eeh_handle_special_event()
	powerpc/numa: Fix percpu allocations to be NUMA aware
	powerpc/hotplug-mem: Fix missing endian conversion of aa_index
	perf/core: Drop kernel samples even though :u is specified
	drm/vmwgfx: Handle vmalloc() failure in vmw_local_fifo_reserve()
	drm/vmwgfx: limit the number of mip levels in vmw_gb_surface_define_ioctl()
	drm/vmwgfx: Make sure backup_handle is always valid
	drm/nouveau/tmr: fully separate alarm execution/pending lists
	ALSA: timer: Fix race between read and ioctl
	ALSA: timer: Fix missing queue indices reset at SNDRV_TIMER_IOCTL_SELECT
	ASoC: Fix use-after-free at card unregistration
	drivers: char: mem: Fix wraparound check to allow mappings up to the end
	tty: Drop krefs for interrupted tty lock
	serial: sh-sci: Fix panic when serial console and DMA are enabled
	net: better skb->sender_cpu and skb->napi_id cohabitation
	mm: consider memblock reservations for deferred memory initialization sizing
	NFS: Ensure we revalidate attributes before using execute_ok()
	NFSv4: Don't perform cached access checks before we've OPENed the file
	Make __xfs_xattr_put_listen preperly report errors.
	arm64: hw_breakpoint: fix watchpoint matching for tagged pointers
	arm64: entry: improve data abort handling of tagged pointers
	RDMA/qib,hfi1: Fix MR reference count leak on write with immediate
	usercopy: Adjust tests to deal with SMAP/PAN
	arm64: armv8_deprecated: ensure extension of addr
	arm64: ensure extension of smp_store_release value
	Linux 4.4.72

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-06-14 16:33:25 +02:00
Jin Yao
e582b82c16 perf/core: Drop kernel samples even though :u is specified
commit cc1582c231ea041fbc68861dfaf957eaf902b829 upstream.

When doing sampling, for example:

  perf record -e cycles:u ...

On workloads that do a lot of kernel entry/exits we see kernel
samples, even though :u is specified. This is due to skid existing.

This might be a security issue because it can leak kernel addresses even
though kernel sampling support is disabled.

The patch drops the kernel samples if exclude_kernel is specified.

For example, test on Haswell desktop:

  perf record -e cycles:u <mgen>
  perf report --stdio

Before patch applied:

    99.77%  mgen     mgen              [.] buf_read
     0.20%  mgen     mgen              [.] rand_buf_init
     0.01%  mgen     [kernel.vmlinux]  [k] apic_timer_interrupt
     0.00%  mgen     mgen              [.] last_free_elem
     0.00%  mgen     libc-2.23.so      [.] __random_r
     0.00%  mgen     libc-2.23.so      [.] _int_malloc
     0.00%  mgen     mgen              [.] rand_array_init
     0.00%  mgen     [kernel.vmlinux]  [k] page_fault
     0.00%  mgen     libc-2.23.so      [.] __random
     0.00%  mgen     libc-2.23.so      [.] __strcasestr
     0.00%  mgen     ld-2.23.so        [.] strcmp
     0.00%  mgen     ld-2.23.so        [.] _dl_start
     0.00%  mgen     libc-2.23.so      [.] sched_setaffinity@@GLIBC_2.3.4
     0.00%  mgen     ld-2.23.so        [.] _start

We can see kernel symbols apic_timer_interrupt and page_fault.

After patch applied:

    99.79%  mgen     mgen           [.] buf_read
     0.19%  mgen     mgen           [.] rand_buf_init
     0.00%  mgen     libc-2.23.so   [.] __random_r
     0.00%  mgen     mgen           [.] rand_array_init
     0.00%  mgen     mgen           [.] last_free_elem
     0.00%  mgen     libc-2.23.so   [.] vfprintf
     0.00%  mgen     libc-2.23.so   [.] rand
     0.00%  mgen     libc-2.23.so   [.] __random
     0.00%  mgen     libc-2.23.so   [.] _int_malloc
     0.00%  mgen     libc-2.23.so   [.] _IO_doallocbuf
     0.00%  mgen     ld-2.23.so     [.] do_lookup_x
     0.00%  mgen     ld-2.23.so     [.] open_verify.constprop.7
     0.00%  mgen     ld-2.23.so     [.] _dl_important_hwcaps
     0.00%  mgen     libc-2.23.so   [.] sched_setaffinity@@GLIBC_2.3.4
     0.00%  mgen     ld-2.23.so     [.] _start

There are only userspace symbols.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Cc: kan.liang@intel.com
Cc: mark.rutland@arm.com
Cc: will.deacon@arm.com
Cc: yao.jin@intel.com
Link: http://lkml.kernel.org/r/1495706947-3744-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 13:16:25 +02:00
Tejun Heo
c8acec90d9 cpuset: consider dying css as offline
commit 41c25707d21716826e3c1f60967f5550610ec1c9 upstream.

In most cases, a cgroup controller don't care about the liftimes of
cgroups.  For the controller, a css becomes online when ->css_online()
is called on it and offline when ->css_offline() is called.

However, cpuset is special in that the user interface it exposes cares
whether certain cgroups exist or not.  Combined with the RCU delay
between cgroup removal and css offlining, this can lead to user
visible behavior oddities where operations which should succeed after
cgroup removals fail for some time period.  The effects of cgroup
removals are delayed when seen from userland.

This patch adds css_is_dying() which tests whether offline is pending
and updates is_cpuset_online() so that the function returns false also
while offline is pending.  This gets rid of the userland visible
delays.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Link: http://lkml.kernel.org/r/327ca1f5-7957-fbb9-9e5f-9ba149d40ba2@oracle.com
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 13:16:23 +02:00
Daniel Micay
2ff1edbbb2 stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms
commit 5ea30e4e58040cfd6434c2f33dc3ea76e2c15b05 upstream.

The stack canary is an 'unsigned long' and should be fully initialized to
random data rather than only 32 bits of random data.

Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Arjan van Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170504133209.3053-1-danielmicay@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 13:16:23 +02:00
Eric W. Biederman
c94bea2e4b ptrace: Properly initialize ptracer_cred on fork
commit c70d9d809fdeecedb96972457ee45c49a232d97f upstream.

When I introduced ptracer_cred I failed to consider the weirdness of
fork where the task_struct copies the old value by default.  This
winds up leaving ptracer_cred set even when a process forks and
the child process does not wind up being ptraced.

Because ptracer_cred is not set on non-ptraced processes whose
parents were ptraced this has broken the ability of the enlightenment
window manager to start setuid children.

Fix this by properly initializing ptracer_cred in ptrace_init_task

This must be done with a little bit of care to preserve the current value
of ptracer_cred when ptrace carries through fork.  Re-reading the
ptracer_cred from the ptracing process at this point is inconsistent
with how PT_PTRACE_CAP has been maintained all of these years.

Tested-by: Takashi Iwai <tiwai@suse.de>
Fixes: 64b875f7ac8a ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 13:16:20 +02:00
Linux Build Service Account
3d12c58f77 Merge "sched: Fix the bug in select_best_cpu() that returns -1 as target_cpu" 2017-06-09 02:45:19 -07:00
John Dias
25e8ecf9da sched: avoid migrating when softint on tgt cpu should be short
The scheduling change (bug 31501544) to avoid putting RT threads on cores that
are handling softint's was catching cases where there was no reason
to believe the softint would take a long time, resulting in unnecessary
migration overhead. This patch reduces the migration to cases where
the core has a softint that is actually likely to take a long time,
as opposed to the RCU, SCHED, and TIMER softints that are rather quick.
Bug: 31752786

Change-Id: Ib4e179f1e15c736b2fdba31070494e357e9fbbe2
Git-commit: ce05770bd37b8065b61ef650108ecef2b97b148b
Git-repo: https://android.googlesource.com/kernel/msm
[pkondeti@codeaurora.org: resolved minor merge conflicts]
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2017-06-09 15:14:07 +05:30
John Dias
c3544e35ef sched: avoid scheduling RT threads on cores currently handling softirqs
Bug: 31501544
Change-Id: I99dd7aaa12c11270b28dbabea484bcc8fb8ba0c1
Git-commit: 080ea011fd9f47315e1fc53185872ef813b59d00
Git-repo: https://android.googlesource.com/kernel/msm
[pkondeti@codeaurora.org: resolved minor merge conflicts and fixed
checkpatch warnings]
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2017-06-09 15:14:07 +05:30
Linux Build Service Account
bc22546551 Merge "Merge branch 'android-4.4@9bc4622' into branch 'msm-4.4'" 2017-06-08 19:03:18 -07:00
Pavankumar Kondeti
a761ae8501 sched: Fix the bug in select_best_cpu() that returns -1 as target_cpu
select_best_cpu() has previous CPU's cluster bias which overrides
the best_cpu with best_sibling_cpu when the power cost is same.
When the power table is configured incorrectly or static_cpu_pwr_cost/
static_cluster_pwr_cost tunables are set to a large value, the
power_cost() for all candidate CPUs can return INT_MAX. So the
stats.min_cost is never changed from it's initial value i.e INT_MAX.

In the above scenario, we find stats.best_cpu >= 0 &&  stats.min_cost =
stats.best_sibling_cpu_cost = INT_MAX && stats.best_sibling_cpu_cost = -1
and replace best_cpu with best_sibling_cpu i.e -1.

Change-Id: I09829e278e41daaaff959428ff50927aba29104c
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2017-06-09 06:40:02 +05:30
Linux Build Service Account
e3954e1572 Merge "drivers: Warning fixes to disable CC_OPTIMIZE_FOR_SIZE" 2017-06-07 13:13:31 -07:00
Blagovest Kolenichev
2025064255 Merge branch 'android-4.4@9bc4622' into branch 'msm-4.4'
* refs/heads/tmp-9bc4622:
  Linux 4.4.70
  drivers: char: mem: Check for address space wraparound with mmap()
  nfsd: encoders mustn't use unitialized values in error cases
  drm/edid: Add 10 bpc quirk for LGD 764 panel in HP zBook 17 G2
  PCI: Freeze PME scan before suspending devices
  PCI: Fix pci_mmap_fits() for HAVE_PCI_RESOURCE_TO_USER platforms
  tracing/kprobes: Enforce kprobes teardown after testing
  osf_wait4(): fix infoleak
  genirq: Fix chained interrupt data ordering
  uwb: fix device quirk on big-endian hosts
  metag/uaccess: Check access_ok in strncpy_from_user
  metag/uaccess: Fix access_ok()
  iommu/vt-d: Flush the IOTLB to get rid of the initial kdump mappings
  staging: rtl8192e: rtl92e_get_eeprom_size Fix read size of EPROM_CMD.
  staging: rtl8192e: fix 2 byte alignment of register BSSIDR.
  mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
  xc2028: Fix use-after-free bug properly
  arm64: documentation: document tagged pointer stack constraints
  arm64: uaccess: ensure extension of access_ok() addr
  arm64: xchg: hazard against entire exchange variable
  ARM: dts: at91: sama5d3_xplained: not all ADC channels are available
  ARM: dts: at91: sama5d3_xplained: fix ADC vref
  powerpc/64e: Fix hang when debugging programs with relocated kernel
  powerpc/pseries: Fix of_node_put() underflow during DLPAR remove
  powerpc/book3s/mce: Move add_taint() later in virtual mode
  cx231xx-cards: fix NULL-deref at probe
  cx231xx-audio: fix NULL-deref at probe
  cx231xx-audio: fix init error path
  dvb-frontends/cxd2841er: define symbol_rate_min/max in T/C fe-ops
  zr364xx: enforce minimum size when reading header
  dib0700: fix NULL-deref at probe
  s5p-mfc: Fix unbalanced call to clock management
  gspca: konica: add missing endpoint sanity check
  ceph: fix recursion between ceph_set_acl() and __ceph_setattr()
  iio: proximity: as3935: fix as3935_write
  ipx: call ipxitf_put() in ioctl error path
  USB: hub: fix non-SS hub-descriptor handling
  USB: hub: fix SS hub-descriptor handling
  USB: serial: io_ti: fix div-by-zero in set_termios
  USB: serial: mct_u232: fix big-endian baud-rate handling
  USB: serial: qcserial: add more Lenovo EM74xx device IDs
  usb: serial: option: add Telit ME910 support
  USB: iowarrior: fix info ioctl on big-endian hosts
  usb: musb: tusb6010_omap: Do not reset the other direction's packet size
  ttusb2: limit messages to buffer size
  mceusb: fix NULL-deref at probe
  usbvision: fix NULL-deref at probe
  net: irda: irda-usb: fix firmware name on big-endian hosts
  usb: host: xhci-mem: allocate zeroed Scratchpad Buffer
  xhci: apply PME_STUCK_QUIRK and MISSING_CAS quirk for Denverton
  usb: host: xhci-plat: propagate return value of platform_get_irq()
  sched/fair: Initialize throttle_count for new task-groups lazily
  sched/fair: Do not announce throttled next buddy in dequeue_task_fair()
  fscrypt: avoid collisions when presenting long encrypted filenames
  f2fs: check entire encrypted bigname when finding a dentry
  fscrypt: fix context consistency check when key(s) unavailable
  net: qmi_wwan: Add SIMCom 7230E
  ext4 crypto: fix some error handling
  ext4 crypto: don't let data integrity writebacks fail with ENOMEM
  USB: serial: ftdi_sio: add Olimex ARM-USB-TINY(H) PIDs
  USB: serial: ftdi_sio: fix setting latency for unprivileged users
  pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes()
  pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes
  iio: dac: ad7303: fix channel description
  of: fix sparse warning in of_pci_range_parser_one
  proc: Fix unbalanced hard link numbers
  cdc-acm: fix possible invalid access when processing notification
  drm/nouveau/tmr: handle races with hw when updating the next alarm time
  drm/nouveau/tmr: avoid processing completed alarms when adding a new one
  drm/nouveau/tmr: fix corruption of the pending list when rescheduling an alarm
  drm/nouveau/tmr: ack interrupt before processing alarms
  drm/nouveau/therm: remove ineffective workarounds for alarm bugs
  drm/amdgpu: Make display watermark calculations more accurate
  drm/amdgpu: Avoid overflows/divide-by-zero in latency_watermark calculations.
  ath9k_htc: fix NULL-deref at probe
  ath9k_htc: Add support of AirTies 1eda:2315 AR9271 device
  s390/cputime: fix incorrect system time
  s390/kdump: Add final note
  regulator: tps65023: Fix inverted core enable logic.
  KVM: X86: Fix read out-of-bounds vulnerability in kvm pio emulation
  KVM: x86: Fix load damaged SSEx MXCSR register
  ima: accept previously set IMA_NEW_FILE
  mwifiex: pcie: fix cmd_buf use-after-free in remove/reset
  rtlwifi: rtl8821ae: setup 8812ae RFE according to device type
  md: update slab_cache before releasing new stripes when stripes resizing
  dm space map disk: fix some book keeping in the disk space map
  dm thin metadata: call precommit before saving the roots
  dm bufio: make the parameter "retain_bytes" unsigned long
  dm cache metadata: fail operations if fail_io mode has been established
  dm bufio: check new buffer allocation watermark every 30 seconds
  dm bufio: avoid a possible ABBA deadlock
  dm raid: select the Kconfig option CONFIG_MD_RAID0
  dm btree: fix for dm_btree_find_lowest_key()
  infiniband: call ipv6 route lookup via the stub interface
  tpm_crb: check for bad response size
  ARM: tegra: paz00: Mark panel regulator as enabled on boot
  USB: core: replace %p with %pK
  char: lp: fix possible integer overflow in lp_setup()
  watchdog: pcwd_usb: fix NULL-deref at probe
  USB: ene_usb6250: fix DMA to the stack
  usb: misc: legousbtower: Fix memory leak
  usb: misc: legousbtower: Fix buffers on stack
  ANDROID: uid_sys_stats: defer io stats calulation for dead tasks
  ANDROID: AVB: Fix linter errors.
  ANDROID: AVB: Fix invalidate_vbmeta_submit().
  ANDROID: sdcardfs: Check for NULL in revalidate
  Linux 4.4.69
  ipmi: Fix kernel panic at ipmi_ssif_thread()
  wlcore: Add RX_BA_WIN_SIZE_CHANGE_EVENT event
  wlcore: Pass win_size taken from ieee80211_sta to FW
  mac80211: RX BA support for sta max_rx_aggregation_subframes
  mac80211: pass block ack session timeout to to driver
  mac80211: pass RX aggregation window size to driver
  Bluetooth: hci_intel: add missing tty-device sanity check
  Bluetooth: hci_bcm: add missing tty-device sanity check
  Bluetooth: Fix user channel for 32bit userspace on 64bit kernel
  tty: pty: Fix ldisc flush after userspace become aware of the data already
  serial: omap: suspend device on probe errors
  serial: omap: fix runtime-pm handling on unbind
  serial: samsung: Use right device for DMA-mapping calls
  arm64: KVM: Fix decoding of Rt/Rt2 when trapping AArch32 CP accesses
  padata: free correct variable
  CIFS: add misssing SFM mapping for doublequote
  cifs: fix CIFS_IOC_GET_MNT_INFO oops
  CIFS: fix mapping of SFM_SPACE and SFM_PERIOD
  SMB3: Work around mount failure when using SMB3 dialect to Macs
  Set unicode flag on cifs echo request to avoid Mac error
  fs/block_dev: always invalidate cleancache in invalidate_bdev()
  ceph: fix memory leak in __ceph_setxattr()
  fs/xattr.c: zero out memory copied to userspace in getxattr
  ext4: evict inline data when writing to memory map
  IB/mlx4: Reduce SRIOV multicast cleanup warning message to debug level
  IB/mlx4: Fix ib device initialization error flow
  IB/IPoIB: ibX: failed to create mcg debug file
  IB/core: Fix sysfs registration error flow
  vfio/type1: Remove locked page accounting workqueue
  dm era: save spacemap metadata root after the pre-commit
  crypto: algif_aead - Require setkey before accept(2)
  block: fix blk_integrity_register to use template's interval_exp if not 0
  KVM: arm/arm64: fix races in kvm_psci_vcpu_on
  KVM: x86: fix user triggerable warning in kvm_apic_accept_events()
  um: Fix PTRACE_POKEUSER on x86_64
  x86, pmem: Fix cache flushing for iovec write < 8 bytes
  selftests/x86/ldt_gdt_32: Work around a glibc sigaction() bug
  x86/boot: Fix BSS corruption/overwrite bug in early x86 kernel startup
  usb: hub: Do not attempt to autosuspend disconnected devices
  usb: hub: Fix error loop seen after hub communication errors
  usb: Make sure usb/phy/of gets built-in
  usb: misc: add missing continue in switch
  staging: comedi: jr3_pci: cope with jiffies wraparound
  staging: comedi: jr3_pci: fix possible null pointer dereference
  staging: gdm724x: gdm_mux: fix use-after-free on module unload
  staging: vt6656: use off stack for out buffer USB transfers.
  staging: vt6656: use off stack for in buffer USB transfers.
  USB: Proper handling of Race Condition when two USB class drivers try to call init_usb_class simultaneously
  USB: serial: ftdi_sio: add device ID for Microsemi/Arrow SF2PLUS Dev Kit
  usb: host: xhci: print correct command ring address
  iscsi-target: Set session_fall_back_to_erl0 when forcing reinstatement
  target: Convert ACL change queue_depth se_session reference usage
  target/fileio: Fix zero-length READ and WRITE handling
  target: Fix compare_and_write_callback handling for non GOOD status
  xen: adjust early dom0 p2m handling to xen hypervisor behavior
  ANDROID: AVB: Only invalidate vbmeta when told to do so.
  ANDROID: sdcardfs: Move top to its own struct
  ANDROID: lowmemorykiller: account for unevictable pages
  ANDROID: usb: gadget: fix NULL pointer issue in mtp_read()
  ANDROID: usb: f_mtp: return error code if transfer error in receive_file_work function

Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>

Conflicts:
	drivers/usb/gadget/function/f_mtp.c
	fs/ext4/page-io.c
	net/mac80211/agg-rx.c

Change-Id: Id65e75bf3bcee4114eb5d00730a9ef2444ad58eb
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2017-06-07 09:31:32 -07:00
Linux Build Service Account
fb0ac8cd2b Merge "hrtimer: Don't drop the base lock when migration during isolation" 2017-06-06 21:33:06 -07:00
Linux Build Service Account
1d5844ba9d Merge "sched: hmp: Optimize cycle counter reads" 2017-06-06 13:21:50 -07:00
Linux Build Service Account
6ed51e0bab Merge "sched: Don't active migrate tasks to CPUs in the same cluster" 2017-06-06 13:21:48 -07:00
Linux Build Service Account
0d1b465cb8 Merge "sched: Fix load tracking bug to avoid adding phantom task demand" 2017-06-06 13:21:39 -07:00
Chris Redpath
fce0ecf04a schedstats/eas: guard properly to avoid breaking non-smp schedstats users
Add appropriate #ifdef guards to ensure the smp-only easstats structs
are not used when smp is not enabled. Arnd got a report from buildbot,
analysed it, and pointed out exactly what the issue was.

Reported-by: "Arnd Bergmann" <arnd@arndb.de>
Suggested-by: "Arnd Bergmann" <arnd@arndb.de>
Fixes: 4b85765a3d ("sched/fair: Add eas (& cas)
 specific rq, sd and task stats")
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: I60554dea20137f6774db3f59b4afd40a06554cfc
2017-06-03 15:03:03 +01:00
Chris Redpath
c47d00b57b sched/tune: don't use schedtune before it is ready
When EAS is enabled during boot, we have to be careful not to use
schedtune from fair.c before it is ready or it will warn us and we'll
get a traceback in the console.

Change-Id: I1a5cf29b18af626545c636c51219f9ed497c19fa
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:55 -07:00
Patrick Bellasi
9e3c04bef7 sched/fair: use SCHED_CAPACITY_SCALE for energy normalization
Change-Id: I686d26975f4a7dd830ff8441ff986e35461a7d55
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Srinath Sridharan <srinathsr@google.com>
2017-06-02 08:01:55 -07:00
Patrick Bellasi
7b8577d94c sched/{fair,tune}: use reciprocal_value to compute boost margin
Change-Id: I493b07360c46eee0b72c2a046dab9ec6cb3427ef
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Srinath Sridharan <srinathsr@google.com>
2017-06-02 08:01:55 -07:00
Srinath Sridharan
41d9288e3e sched/tune: Initialize raw_spin_lock in boosted_groups
bug: 32668852
Change-Id: Ice96230d88939d5973b1b6310085d1b3df9c47d9
Signed-off-by: Srinath Sridharan <srinathsr@google.com>
2017-06-02 08:01:55 -07:00
Patrick Bellasi
3757f95741 sched/tune: report when SchedTune has not been initialized
Change-Id: Iba4e5e3d220451f04272d555e6b8e0af83a7f09d
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Srinath Sridharan <srinathsr@google.com>
2017-06-02 08:01:55 -07:00
Chris Redpath
f9b83b3e6e sched/tune: fix sched_energy_diff tracepoint
sched_energy_diff tracepoint is in a place where it can never trace
payoff or nrg.delta. If CONFIG_SCHED_TUNE is enabled, put it in
a place where those values exist. If it is not enabled, trace from
the current location

Change-Id: Id5442f2b34ec76625491d27c0f4285433ca12699
Reported-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:55 -07:00
Chris Redpath
2e829cf17f sched/tune: increase group count to 5
We use 5 groups everywhere else, this should default to the same.

Change-Id: I05a20bdcf8046ea90a2e36979940cef11246e735
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2017-06-02 08:01:55 -07:00
Chris Redpath
4c031f0e6f cpufreq/schedutil: use boosted_cpu_util for PELT to match WALT
When using WALT we always used boosted cpu util for OPP selection.
This is the primary purpose for boosted cpu util, but we hadn't
changed the PELT utilization check to do the same thing.

Fix that here.

Change-Id: Id5ffb26eac23b25fe754255221f6d21b8cededfd
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:55 -07:00
Morten Rasmussen
fc969e3bfa sched/fair: Fix sched_group_energy() to support per-cpu capacity states
sched_group_energy() was supposed to support per-cpu capacity states
(DVFS), however, while fixing a hotplug issue this was broken as we bail
out if there is no SD_SHARE_CAP_STATES flag set.

This patch implements the hotplug race check differently and should
therefore reinstate support for per-cpu capacity states.

Change-Id: I5b865666c9ce833dcfa6514c574580d75aa0a195
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
2017-06-02 08:01:55 -07:00
Valentin Schneider
fef0112a63 sched/fair: discount task contribution to find CPU with lowest utilization
In some cases, the new_util of a task can be the same on several
CPUs. This causes an issue because the target_util is only updated
if the current new_util is strictly smaller than target_util.

To fix that, the cpu_util_wake() return value is used alongside the
new_util value. If two CPUs compute the same new_util value,
we'll now also look at their cpu_util_wake() return value. In this
case, the CPU that last ran the task will be chosen in priority.

Change-Id: Ia1ea2c4b3ec39621372c2f748862317d5b497723
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
2017-06-02 08:01:54 -07:00
Chris Redpath
83f462daa3 sched/fair: ensure utilization signals are synchronized before use
wake_cap performs task and cpu utilization synchronization which is
what allows us to subtract current task util from prev_cpu util and
have a sensible number to work with.

It looks as though if wake_wide returns 0, we could potentially not
execute wake_cap, which would result in unsynced signals we then use
for energy calculations.

This is not necessarily an issue we've seen in traces, but it looks
as though it should be changed.

Change-Id: Ic54a3cba2a10d946ea20113a04371dea04115e82
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:54 -07:00
Chris Redpath
8865f07600 sched/fair: remove task util from own cpu when placing waking task
When we place a waking task with find_best_target, we calculate the
existing and new utilisation of each candidate cpu. However, we do
not remove any blocked load resulting from the waking task on the
previous cpu which might cause unnecessary migrations.

Switch to using cpu_util_wake which does this for us, which requires
moving cpu_util_wake a few functions earlier.

Also, we have multiple potential cpu utilization signals here, so
update the necessary bits to allow WALT to work properly (including
not subtracting task util for WALT).

When WALT is in use, cpu utilization is the utilization
in the previous completed window, whilst the task utilization
ignores fully idle windows. There seems to be no way to have a
decently accurate estimate of how much (if any) utilization from
this task remains on the prev cpu.

Instead, just return cpu_util when we're using WALT.

Change-Id: I448203ab98ffb5c020dfb6b218581eef1f5601f7
Reported-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:54 -07:00
Chris Redpath
8ac52cbaf4 trace:sched: Make util_avg in load_avg trace reflect PELT/WALT as used
With the ability to choose between WALT and PELT for utilisation tracking
we can have the situation where we're using WALT to make all the
decisions and reporting PELT figures in the sched_load_avg_(cpu|task)
trace points. This is not too much of an issue, but when analysing trace
it is nice to see numbers representing what the scheduler is using rather
than needing to add in additional sched_walt_* traces to figure it out.

Add reporting for both types, and make the util_avg member reflect what
will be seen from cpu or task_util functions in the scheduler.

Change-Id: I2abbd2c5fa70822096d0f3372b4c12b1c6af1590
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:54 -07:00
Dietmar Eggemann
4b85765a3d sched/fair: Add eas (& cas) specific rq, sd and task stats
The statistic counter are placed in the eas (& cas) wakeup path. Each
of them has one representation for the runqueue (rq), the sched_domain
(sd) and the task.
A task counter is always incremented. A rq counter is always
incremented for the rq the scheduler is currently running on. A sd
counter is only incremented if a relation to a sd exists.

The counters are exposed:

(1) In /proc/schedstat for rq's and sd's:

$ cat /proc/schedstat
...
cpu0 71422 0 2321254 ...
eas  44144 0 0 19446 0 24698 568435 51621 156932 133 222011 17459 120279 516814 83 0 156962 359235 176439 139981
  <- runqueue for cpu0
...
domain0 3 42430 42331 ...
eas 0 0 0 14200 0 0 0 0 0 0 0 0 0 0 0 0 0 0 66355 0  <- MC sched domain for cpu0
...

The per-cpu eas vector has the following elements:

sis_attempts  sis_idle   sis_cache_affine sis_suff_cap    sis_idle_cpu    sis_count               ||
secb_attempts secb_sync  secb_idle_bt     secb_insuff_cap secb_no_nrg_sav secb_nrg_sav secb_count ||
fbt_attempts  fbt_no_cpu fbt_no_sd        fbt_pref_idle   fbt_count                               ||
cas_attempts  cas_count

The following relations exist between these counters (from cpu0 eas
vector above):

sis_attempts = sis_idle + sis_cache_affine + sis_suff_cap + sis_idle_cpu + sis_count

44144        = 0        + 0                + 19446        + 0            + 24698

secb_attempts = secb_sync + secb_idle_bt + secb_insuff_cap + secb_no_nrg_sav + secb_nrg_sav + secb_count

568435        = 51621     + 156932       + 133             + 222011          + 17459        + 120279

fbt_attempts = fbt_no_cpu + fbt_no_sd + fbt_pref_idle + fbt_count + (return -1)

516814       = 83         + 0         + 156962        + 359235    + (534)

cas_attempts = cas_count + (return -1 or smp_processor_id())

176439       = 139981    + (36458)

(2) In /proc/$PROCESS_PID/task/$TASK_PID/sched for a task.

example: main thread of system_server

$ cat /proc/1083/task/1083/sched

...
se.statistics.nr_wakeups_sis_attempts        :                  945
se.statistics.nr_wakeups_sis_idle            :                    0
se.statistics.nr_wakeups_sis_cache_affine    :                    0
se.statistics.nr_wakeups_sis_suff_cap        :                  219
se.statistics.nr_wakeups_sis_idle_cpu        :                    0
se.statistics.nr_wakeups_sis_count           :                  726
se.statistics.nr_wakeups_secb_attempts       :                10376
se.statistics.nr_wakeups_secb_sync           :                 1462
se.statistics.nr_wakeups_secb_idle_bt        :                 6984
se.statistics.nr_wakeups_secb_insuff_cap     :                    3
se.statistics.nr_wakeups_secb_no_nrg_sav     :                  927
se.statistics.nr_wakeups_secb_nrg_sav        :                  206
se.statistics.nr_wakeups_secb_count          :                  794
se.statistics.nr_wakeups_fbt_attempts        :                 8914
se.statistics.nr_wakeups_fbt_no_cpu          :                    0
se.statistics.nr_wakeups_fbt_no_sd           :                    0
se.statistics.nr_wakeups_fbt_pref_idle       :                 6987
se.statistics.nr_wakeups_fbt_count           :                 1554
se.statistics.nr_wakeups_cas_attempts        :                 3107
se.statistics.nr_wakeups_cas_count           :                 1195
...

The same relation between the counters as in the per-cpu case apply.

Change-Id: Ie7d01267c78a3f41f60a3ef52917d5a5d463f195
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:54 -07:00
Andres Oportus
aa8882923a sched/core: Fix PELT jump to max OPP upon util increase
Change-Id: Ic80b588ec466ef707f658dcea039fd0d6b384b63
Signed-off-by: Andres Oportus <andresoportus@google.com>
2017-06-02 08:01:54 -07:00
Dietmar Eggemann
55af384815 sched: EAS & 'single cpu per cluster'/cpu hotplug interoperability
For Energy-Aware Scheduling (EAS) to work properly, even in the
case that there is only one cpu per cluster or that cpus are hot-plugged
out, the Energy Model (EM) data on all energy-aware sched domains (sd)
has to be present for all online cpus.

Mainline sd hierarchy setup code will remove sd's which are not useful
for task scheduling e.g. in the following situations:

1. Only 1 cpu is/remains in one cluster of a multi cluster system.

   This remaining cpu only has DIE and no MC sd.

2. A complete cluster in a two cluster system is hot-plugged out.

   The cpus of the remaining cluster only have MC and no DIE sd.

To make sure that all online cpus keep all their energy-aware sd's,
the sd degenerate functionality has been changed to not free a sd if
its first sched group (sg) contains EM data in case:

1. There is only 1 cpu left in the sd.

2. There have to be at least 2 sg's if certain sd flags are set.

Instead of freeing such a sd it now clears only its SD_LOAD_BALANCE
flag. This will make sure that the EAS functionality will always see
all energy-aware sd's for all online cpus.

It will introduce a tiny performance degradation for operations on
affected cpus since the hot-path macro for_each_domain() has to deal
with sd's not contributing to task scheduling at all now.

In most cases the exisiting code makes sure that task scheduling is not
invoked on a sd with !SD_LOAD_BALANCE.

However, a small change is necessary in update_sd_lb_stats() to make
sure that sd->parent is only initialized to !NULL in case the parent sd
contains more than 1 sg.

The handling of newidle decay values before the SD_LOAD_BALANCE check in
rebalance_domains() stays unchanged.

Test (w/ CONFIG_SCHED_DEBUG):

JUNO r0 default system:

$ cat /proc/cpuinfo | grep "^CPU part"
CPU part        : 0xd03
CPU part        : 0xd07
CPU part        : 0xd07
CPU part        : 0xd03
CPU part        : 0xd03
CPU part        : 0xd03

SD names and flags:

$ cat /proc/sys/kernel/sched_domain/cpu*/domain*/name
MC
DIE
MC
DIE
MC
DIE
MC
DIE
MC
DIE
MC
DIE

$ printf "%x\n" `cat /proc/sys/kernel/sched_domain/cpu*/domain*/flags`
832f
102f
832f
102f
832f
102f
832f
102f
832f
102f
832f
102f

Test 1: Hotplug-out one A57 (CPU part 0xd07) cpu:

$ echo 0 > /sys/devices/system/cpu/cpu1/online

$ cat /proc/cpuinfo | grep "^CPU part"
CPU part        : 0xd03
CPU part        : 0xd07
CPU part        : 0xd03
CPU part        : 0xd03
CPU part        : 0xd03

SD names and flags for remaining A57 (cpu2) cpu:

$ cat /proc/sys/kernel/sched_domain/cpu2/domain*/name
MC
DIE

$ printf "%x\n" `cat /proc/sys/kernel/sched_domain/cpu2/domain*/flags`
832e <-- MC SD with !SD_LOAD_BALANCE
102f

Test 2: Hotplug-out the entire A57 cluster:

$ echo 0 > /sys/devices/system/cpu/cpu1/online
$ echo 0 > /sys/devices/system/cpu/cpu2/online

$ cat /proc/cpuinfo | grep "^CPU part"
CPU part        : 0xd03
CPU part        : 0xd03
CPU part        : 0xd03
CPU part        : 0xd03

SD names and flags for the remaining A53 (CPU part 0xd03) cluster:

$ cat /proc/sys/kernel/sched_domain/cpu*/domain*/name
MC
DIE
MC
DIE
MC
DIE
MC
DIE

$ printf "%x\n" `cat /proc/sys/kernel/sched_domain/cpu*/domain*/flags`
832f
102e <-- DIE SD with !SD_LOAD_BALANCE
832f
102e
832f
102e
832f
102e

Change-Id: If24aa2b2628f334abbf0207d39e2a86168d9d673
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
2017-06-02 08:01:54 -07:00
Vincent Guittot
e62a1ca36b UPSTREAM: sched/core: Fix group_entity's share update
The update of the share of a cfs_rq is done when its load_avg is updated
but before the group_entity's load_avg has been updated for the past time
slot. This generates wrong load_avg accounting which can be significant
when small tasks are involved in the scheduling.

Let take the example of a task a that is dequeued of its task group A:
   root
  (cfs_rq)
    \
    (se)
     A
    (cfs_rq)
      \
      (se)
       a

Task "a" was the only task in task group A which becomes idle when a is
dequeued.

We have the sequence:

- dequeue_entity a->se
    - update_load_avg(a->se)
    - dequeue_entity_load_avg(A->cfs_rq, a->se)
    - update_cfs_shares(A->cfs_rq)
	A->cfs_rq->load.weight == 0
        A->se->load.weight is updated with the new share (0 in this case)
- dequeue_entity A->se
    - update_load_avg(A->se) but its weight is now null so the last time
      slot (up to a tick) will be accounted with a weight of 0 instead of
      its real weight during the time slot. The last time slot will be
      accounted as an idle one whereas it was a running one.

If the running time of task a is short enough that no tick happens when it
runs, all running time of group entity A->se will be accounted as idle
time.

Instead, we should update the share of a cfs_rq (in fact the weight of its
group entity) only after having updated the load_avg of the group_entity.

update_cfs_shares() now takes the sched_entity as a parameter instead of the
cfs_rq, and the weight of the group_entity is updated only once its load_avg
has been synced with current time.

Change-Id: Id6ce3be1767b44b444ce2a77ed1ba063e57c0664
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/1482335426-7664-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 89ee048f3cc796db6f26906c6bef4edf0bee70fd)
[minor cherry pick stuff]
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:54 -07:00
Peter Zijlstra
baaa21b59b UPSTREAM: sched/fair: Fix calc_cfs_shares() fixed point arithmetics width confusion
Commit:

  fde7d22e01 ("sched/fair: Fix overly small weight for interactive group entities")

did something non-obvious but also did it buggy yet latent.

The problem was exposed for real by a later commit in the v4.7 merge window:

  2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels")

... after which tg->load_avg and cfs_rq->load.weight had different
units (10 bit fixed point and 20 bit fixed point resp.).

Add a comment to explain the use of cfs_rq->load.weight over the
'natural' cfs_rq->avg.load_avg and add scale_load_down() to correct
for the difference in unit.

Since this is (now, as per a previous commit) the only user of
calc_tg_weight(), collapse it.

The effects of this bug should be randomly inconsistent SMP-balancing
of cgroups workloads.

Change-Id: If1e565662ea163485edd94a12aef644d0e0dfe7a
Reported-by: Jirka Hladky <jhladky@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels")
Fixes: fde7d22e01 ("sched/fair: Fix overly small weight for interactive group entities")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit ea1dc6fc6242f991656e35e2ed3d90ec1cd13418)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:54 -07:00
Vincent Guittot
20bbd92679 UPSTREAM: sched/fair: Fix incorrect task group ->load_avg
A scheduler performance regression has been reported by Joseph Salisbury,
which he bisected back to:

  3d30544f0212 ("sched/fair: Apply more PELT fixes)

The regression triggers when several levels of task groups are involved
(read: SystemD) and cpu_possible_mask != cpu_present_mask.

The root cause is that group entity's load (tg_child->se[i]->avg.load_avg)
is initialized to scale_load_down(se->load.weight). During the creation of
a child task group, its group entities on possible CPUs are attached to
parent's cfs_rq (tg_parent) and their loads are added to the parent's load
(tg_parent->load_avg) with update_tg_load_avg().

But only the load on online CPUs will then be updated to reflect real load,
whereas load on other CPUs will stay at the initial value.

The result is a tg_parent->load_avg that is higher than the real load, the
weight of group entities (tg_parent->se[i]->load.weight) on online CPUs is
smaller than it should be, and the task group gets a less running time than
what it could expect.

( This situation can be detected with /proc/sched_debug. The ".tg_load_avg"
  of the task group will be much higher than sum of ".tg_load_avg_contrib"
  of online cfs_rqs of the task group. )

The load of group entities don't have to be intialized to something else
than 0 because their load will increase when an entity is attached.

Change-Id: Ie55021ff98ba49016adfddb2444e9c9709939226
Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@vger.kernel.org> # 4.8.x
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: joonwoop@codeaurora.org
Fixes: 3d30544f0212 ("sched/fair: Apply more PELT fixes)
Link: http://lkml.kernel.org/r/1476881123-10159-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit b5a9b340789b2b24c6896bcf7a065c31a4db671c)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:54 -07:00
Peter Zijlstra
640c909c34 UPSTREAM: sched/fair: Fix effective_load() to consistently use smoothed load
Starting with the following commit:

  fde7d22e01 ("sched/fair: Fix overly small weight for interactive group entities")

calc_tg_weight() doesn't compute the right value as expected by effective_load().

The difference is in the 'correction' term. In order to ensure \Sum
rw_j >= rw_i we cannot use tg->load_avg directly, since that might be
lagging a correction on the current cfs_rq->avg.load_avg value.
Therefore we use tg->load_avg - cfs_rq->tg_load_avg_contrib +
cfs_rq->avg.load_avg.

Now, per the referenced commit, calc_tg_weight() doesn't use
cfs_rq->avg.load_avg, as is later used in @w, but uses
cfs_rq->load.weight instead.

So stop using calc_tg_weight() and do it explicitly.

The effects of this bug are wake_affine() making randomly
poor choices in cgroup-intense workloads.

Change-Id: I1c0058ff674650cf295c8dc3b88a5a3de4bddab0
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # v4.3+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: fde7d22e01 ("sched/fair: Fix overly small weight for interactive group entities")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 7dd4912594daf769a46744848b05bd5bc6d62469)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:54 -07:00
Vincent Guittot
89e4d18a67 UPSTREAM: sched/fair: Propagate asynchrous detach
A task can be asynchronously detached from cfs_rq when migrating
between CPUs. The load of the migrated task is then removed from
source cfs_rq during its next update. We use this event to set
propagation flag.

During the load balance, we take advantage of the update of blocked
load to propagate any pending changes.

The propagation relies on patch:

  "sched: Fix hierarchical order in rq->leaf_cfs_rq_list"

... which orders children and parents, to ensure that it's done in one pass.

Change-Id: I33782e35fc4711f5901e8c23d6aa7ec5f2ff7ee5
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten.Rasmussen@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: kernellwp@gmail.com
Cc: pjt@google.com
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1478598827-32372-6-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 4e5160766fcc9f41bbd38bac11f92dce993644aa)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:54 -07:00
Vincent Guittot
e875665411 UPSTREAM: sched/fair: Propagate load during synchronous attach/detach
When a task moves from/to a cfs_rq, we set a flag which is then used to
propagate the change at parent level (sched_entity and cfs_rq) during
next update. If the cfs_rq is throttled, the flag will stay pending until
the cfs_rq is unthrottled.

For propagating the utilization, we copy the utilization of group cfs_rq to
the sched_entity.

For propagating the load, we have to take into account the load of the
whole task group in order to evaluate the load of the sched_entity.
Similarly to what was done before the rewrite of PELT, we add a correction
factor in case the task group's load is greater than its share so it will
contribute the same load of a task of equal weight.

Change-Id: Id34a9888484716961c9027299c0b4d82881a39d1
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten.Rasmussen@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: kernellwp@gmail.com
Cc: pjt@google.com
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1478598827-32372-5-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 09a43ace1f986b003c118fdf6ddf1fd685692d49)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:54 -07:00
Vincent Guittot
8370e07d82 UPSTREAM: sched/fair: Fix hierarchical order in rq->leaf_cfs_rq_list
Fix the insertion of cfs_rq in rq->leaf_cfs_rq_list to ensure that a
child will always be called before its parent.

The hierarchical order in shares update list has been introduced by
commit:

  67e86250f8 ("sched: Introduce hierarchal order on shares update list")

With the current implementation a child can be still put after its
parent.

Lets take the example of:

       root
        \
         b
         /\
         c d*
           |
           e*

with root -> b -> c already enqueued but not d -> e so the
leaf_cfs_rq_list looks like: head -> c -> b -> root -> tail

The branch d -> e will be added the first time that they are enqueued,
starting with e then d.

When e is added, its parents is not already on the list so e is put at
the tail : head -> c -> b -> root -> e -> tail

Then, d is added at the head because its parent is already on the
list: head -> d -> c -> b -> root -> e -> tail

e is not placed at the right position and will be called the last
whereas it should be called at the beginning.

Because it follows the bottom-up enqueue sequence, we are sure that we
will finished to add either a cfs_rq without parent or a cfs_rq with a
parent that is already on the list. We can use this event to detect
when we have finished to add a new branch. For the others, whose
parents are not already added, we have to ensure that they will be
added after their children that have just been inserted the steps
before, and after any potential parents that are already in the list.
The easiest way is to put the cfs_rq just after the last inserted one
and to keep track of it untl the branch is fully added.

Change-Id: I4fe0b8502ea628c13d14e8e5c5279bce67fb8845
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten.Rasmussen@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: kernellwp@gmail.com
Cc: pjt@google.com
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1478598827-32372-3-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 9c2791f936ef5fd04a118b5c284f2c9a95f4a647)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:54 -07:00
Vincent Guittot
723dab7871 BACKPORT: sched/fair: Factorize PELT update
Every time we modify load/utilization of sched_entity, we start to
sync it with its cfs_rq. This update is done in different ways:

 - when attaching/detaching a sched_entity, we update cfs_rq and then
   we sync the entity with the cfs_rq.

 - when enqueueing/dequeuing the sched_entity, we update both
   sched_entity and cfs_rq metrics to now.

Use update_load_avg() everytime we have to update and sync cfs_rq and
sched_entity before changing the state of a sched_enity.

Change-Id: Ibde9a7e07ac80e9d5753bb4a0c30dfb3643cc666
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten.Rasmussen@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: kernellwp@gmail.com
Cc: pjt@google.com
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1478598827-32372-4-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[backported FROMLIST]
Signed-off-by: Andres Oportus <andresoportus@google.com>
(cherry picked from commit d31b1a66cbe0931733583ad9d9e8c6cfd710907d)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:53 -07:00
Vincent Guittot
18d09a45ec UPSTREAM: sched/fair: Factorize attach/detach entity
Factorize post_init_entity_util_avg() and part of attach_task_cfs_rq()
in one function attach_entity_cfs_rq().

Create symmetric detach_entity_cfs_rq() function.

Change-Id: I44fc6bb5e71460be65f6b8928d4620c6c27a6a67
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten.Rasmussen@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: kernellwp@gmail.com
Cc: pjt@google.com
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1478598827-32372-2-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit df217913e72ec7e603d8b68cc4c70646cf7000db)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:53 -07:00