Commit graph

588 commits

Author SHA1 Message Date
Syed Rameez Mustafa
30fc774235 sched/hmp: Enhance co-location and scheduler boost features
The recent introduction of the schedtune cgroup controller has provided
the scheduler with added flexibility in terms of some of it's placement
features. In particular each cgroup under the schedtune controller can
now specify:

1) Whether it needs co-location along with other cgroups
2) Whether it is eligible for scheduler boost (sched_boost_enabled)
3) Whether the kernel can override the boost eligibility when necessary
   (sched_boost_no_override)

The scheduler now creates a reserved co-location group at boot. This
group is used to co-locate all tasks that form part of any one of the
cgroups that have co-location enabled. This reserved group can neither
be destroyed nor reused for other purposes. Furthermore, cgroups are
only allowed to indicate their co-location preference once at boot.
Further updates are disallowed.

Since we are now creating co-location groups for an extended period of
time, there are a few other factors to consider when determining the
preferred cluster for the group. We first exclude any tasks in the
group that have not been observed to be running for a significant
amount of time. Secondly we introduce the notion of group up and down
migrate tunables to allow different migration policies than individual
tasks. Lastly we break co-location if a single task in a group exceeds
up-migrate but the total load of the group does not exceed group
up-migrate.

In terms of sched_boost, the scheduler now supports multiple types of
boost. These are:

1) FULL_THROTTLE : Force up-migrate tasks belonging any cgroup that
                   has the sched_boost_enabled flag turned on. Little
                   CPUs will only be used when big CPUs can no longer
                   accommodate tasks. Also up-migrate all RT tasks.

2) CONSERVATIVE : Override the sched_boost_enabled flag for all cgroups
                  except those that have the sched_boost_no_override
                  flag set. Force up-migrate all tasks belonging to only
                  those cgroups that still remain eligible for boost.
                  RT tasks do not get force up migrated.

3) RESTRAINED : Start frequency aggregation for co-located tasks. This
                type of boost does not force up-migrate any task.

Finally the boost API removes ref-counting. This means that there can
only be a single entity using boost at any given time. If multiple
entities are managing boost, they are required to be well behaved so
that they don't interfere with one another. Even for a single client,
it is not possible to switch directly from one boost type to another.
Boost must be first turned off before switching over to a new type.

Change-Id: I8d224a70cbef162f27078b62b73acaa22670861d
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-11-16 17:57:56 -08:00
Runmin Wang
9cc5c789d9 Merge remote-tracking branch 'msm4.4/tmp-da9a92f' into msm-4.4
* origin/tmp-da9a92f:
  arm64: kaslr: increase randomization granularity
  arm64: relocatable: deal with physically misaligned kernel images
  arm64: don't map TEXT_OFFSET bytes below the kernel if we can avoid it
  arm64: kernel: replace early 64-bit literal loads with move-immediates
  arm64: introduce mov_q macro to move a constant into a 64-bit register
  arm64: kernel: perform relocation processing from ID map
  arm64: kernel: use literal for relocated address of __secondary_switched
  arm64: kernel: don't export local symbols from head.S
  arm64: simplify kernel segment mapping granularity
  arm64: cover the .head.text section in the .text segment mapping
  arm64: move early boot code to the .init segment
  arm64: use 'segment' rather than 'chunk' to describe mapped kernel regions
  arm64: mm: Mark .rodata as RO
  Linux 4.4.16
  ovl: verify upper dentry before unlink and rename
  drm/i915: Revert DisplayPort fast link training feature
  tmpfs: fix regression hang in fallocate undo
  tmpfs: don't undo fallocate past its last page
  crypto: qat - make qat_asym_algs.o depend on asn1 headers
  xen/acpi: allow xen-acpi-processor driver to load on Xen 4.7
  File names with trailing period or space need special case conversion
  cifs: dynamic allocation of ntlmssp blob
  Fix reconnect to not defer smb3 session reconnect long after socket reconnect
  53c700: fix BUG on untagged commands
  s390: fix test_fp_ctl inline assembly contraints
  scsi: fix race between simultaneous decrements of ->host_failed
  ovl: verify upper dentry in ovl_remove_and_whiteout()
  ovl: Copy up underlying inode's ->i_mode to overlay inode
  ARM: mvebu: fix HW I/O coherency related deadlocks
  ARM: dts: armada-38x: fix MBUS_ID for crypto SRAM on Armada 385 Linksys
  ARM: sunxi/dt: make the CHIP inherit from allwinner,sun5i-a13
  ALSA: hda: add AMD Stoney PCI ID with proper driver caps
  ALSA: hda - fix use-after-free after module unload
  ALSA: ctl: Stop notification after disconnection
  ALSA: pcm: Free chmap at PCM free callback, too
  ALSA: hda/realtek - add new pin definition in alc225 pin quirk table
  ALSA: hda - fix read before array start
  ALSA: hda - Add PCI ID for Kabylake-H
  ALSA: hda/realtek: Add Lenovo L460 to docking unit fixup
  ALSA: timer: Fix negative queue usage by racy accesses
  ALSA: echoaudio: Fix memory allocation
  ALSA: au88x0: Fix calculation in vortex_wtdma_bufshift()
  ALSA: hda / realtek - add two more Thinkpad IDs (5050,5053) for tpt460 fixup
  ALSA: hda - Fix the headset mic jack detection on Dell machine
  ALSA: dummy: Fix a use-after-free at closing
  hwmon: (dell-smm) Cache fan_type() calls and change fan detection
  hwmon: (dell-smm) Disallow fan_type() calls on broken machines
  hwmon: (dell-smm) Restrict fan control and serial number to CAP_SYS_ADMIN by default
  tty/vt/keyboard: fix OOB access in do_compute_shiftstate()
  tty: vt: Fix soft lockup in fbcon cursor blink timer.
  iio:ad7266: Fix probe deferral for vref
  iio:ad7266: Fix support for optional regulators
  iio:ad7266: Fix broken regulator error handling
  iio: accel: kxsd9: fix the usage of spi_w8r8()
  staging: iio: accel: fix error check
  iio: hudmidity: hdc100x: fix incorrect shifting and scaling
  iio: humidity: hdc100x: fix IIO_TEMP channel reporting
  iio: humidity: hdc100x: correct humidity integration time mask
  iio: proximity: as3935: fix buffer stack trashing
  iio: proximity: as3935: remove triggered buffer processing
  iio: proximity: as3935: correct IIO_CHAN_INFO_RAW output
  iio: light apds9960: Add the missing dev.parent
  iio:st_pressure: fix sampling gains (bring inline with ABI)
  iio: Fix error handling in iio_trigger_attach_poll_func
  xen/balloon: Fix declared-but-not-defined warning
  perf/x86: Fix undefined shift on 32-bit kernels
  memory: omap-gpmc: Fix omap gpmc EXTRADELAY timing
  drm/vmwgfx: Fix error paths when mapping framebuffer
  drm/vmwgfx: Delay pinning fbdev framebuffer until after mode set
  drm/vmwgfx: Check pin count before attempting to move a buffer
  drm/vmwgfx: Work around mode set failure in 2D VMs
  drm/vmwgfx: Add an option to change assumed FB bpp
  drm/ttm: Make ttm_bo_mem_compat available
  drm: atmel-hlcdc: actually disable scaling when no scaling is required
  drm: make drm_atomic_set_mode_prop_for_crtc() more reliable
  drm: add missing drm_mode_set_crtcinfo call
  drm/i915: Update CDCLK_FREQ register on BDW after changing cdclk frequency
  drm/i915: Update ifdeffery for mutex->owner
  drm/i915: Refresh cached DP port register value on resume
  drm/i915/ilk: Don't disable SSC source if it's in use
  drm/nouveau/disp/sor/gf119: select correct sor when poking training pattern
  drm/nouveau: fix for disabled fbdev emulation
  drm/nouveau/fbcon: fix out-of-bounds memory accesses
  drm/nouveau/gr/gf100-: update sm error decoding from gk20a nvgpu headers
  drm/nouveau/disp/sor/gf119: both links use the same training register
  virtio_balloon: fix PFN format for virtio-1
  drm/dp/mst: Always clear proposed vcpi table for port.
  drm/amdkfd: destroy dbgmgr in notifier release
  drm/amdkfd: unbind only existing processes
  ubi: Make recover_peb power cut aware
  drm/amdgpu/gfx7: fix broken condition check
  drm/radeon: fix asic initialization for virtualized environments
  btrfs: account for non-CoW'd blocks in btrfs_abort_transaction
  percpu: fix synchronization between synchronous map extension and chunk destruction
  percpu: fix synchronization between chunk->map_extend_work and chunk destruction
  af_unix: fix hard linked sockets on overlay
  vfs: add d_real_inode() helper
  arm64: Rework valid_user_regs
  ipmi: Remove smi_msg from waiting_rcv_msgs list before handle_one_recv_msg()
  drm/mgag200: Black screen fix for G200e rev 4
  iommu/amd: Fix unity mapping initialization race
  iommu/vt-d: Enable QI on all IOMMUs before setting root entry
  iommu/arm-smmu: Wire up map_sg for arm-smmu-v3
  base: make module_create_drivers_dir race-free
  tracing: Handle NULL formats in hold_module_trace_bprintk_format()
  HID: multitouch: enable palm rejection for Windows Precision Touchpad
  HID: hiddev: validate num_values for HIDIOCGUSAGES, HIDIOCSUSAGES commands
  HID: elo: kill not flush the work
  KVM: nVMX: VMX instructions: fix segment checks when L1 is in long mode.
  kvm: Fix irq route entries exceeding KVM_MAX_IRQ_ROUTES
  KEYS: potential uninitialized variable
  ARCv2: LLSC: software backoff is NOT needed starting HS2.1c
  ARCv2: Check for LL-SC livelock only if LLSC is enabled
  ipv6: Fix mem leak in rt6i_pcpu
  cdc_ncm: workaround for EM7455 "silent" data interface
  net_sched: fix mirrored packets checksum
  packet: Use symmetric hash for PACKET_FANOUT_HASH.
  sched/fair: Fix cfs_rq avg tracking underflow
  UBIFS: Implement ->migratepage()
  mm: Export migrate_page_move_mapping and migrate_page_copy
  MIPS: KVM: Fix modular KVM under QEMU
  ARM: 8579/1: mm: Fix definition of pmd_mknotpresent
  ARM: 8578/1: mm: ensure pmd_present only checks the valid bit
  ARM: imx6ul: Fix Micrel PHY mask
  NFS: Fix another OPEN_DOWNGRADE bug
  make nfs_atomic_open() call d_drop() on all ->open_context() errors.
  nfsd: check permissions when setting ACLs
  posix_acl: Add set_posix_acl
  nfsd: Extend the mutex holding region around in nfsd4_process_open2()
  nfsd: Always lock state exclusively.
  nfsd4/rpc: move backchannel create logic into rpc code
  writeback: use higher precision calculation in domain_dirty_limits()
  thermal: cpu_cooling: fix improper order during initialization
  uvc: Forward compat ioctls to their handlers directly
  Revert "gpiolib: Split GPIO flags parsing and GPIO configuration"
  x86/amd_nb: Fix boot crash on non-AMD systems
  kprobes/x86: Clear TF bit in fault on single-stepping
  x86, build: copy ldlinux.c32 to image.iso
  locking/static_key: Fix concurrent static_key_slow_inc()
  locking/qspinlock: Fix spin_unlock_wait() some more
  locking/ww_mutex: Report recursive ww_mutex locking early
  of: irq: fix of_irq_get[_byname]() kernel-doc
  of: fix autoloading due to broken modalias with no 'compatible'
  mnt: If fs_fully_visible fails call put_filesystem.
  mnt: Account for MS_RDONLY in fs_fully_visible
  mnt: fs_fully_visible test the proper mount for MNT_LOCKED
  usb: common: otg-fsm: add license to usb-otg-fsm
  USB: EHCI: declare hostpc register as zero-length array
  usb: dwc2: fix regression on big-endian PowerPC/ARM systems
  powerpc/tm: Always reclaim in start_thread() for exec() class syscalls
  powerpc/pseries: Fix IBM_ARCH_VEC_NRCORES_OFFSET since POWER8NVL was added
  powerpc/pseries: Fix PCI config address for DDW
  powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism
  IB/mlx4: Properly initialize GRH TClass and FlowLabel in AHs
  IB/cm: Fix a recently introduced locking bug
  EDAC, sb_edac: Fix rank lookup on Broadwell
  mac80211: Fix mesh estab_plinks counting in STA removal case
  mac80211_hwsim: Add missing check for HWSIM_ATTR_SIGNAL
  mac80211: mesh: flush mesh paths unconditionally
  mac80211: fix fast_tx header alignment
  Linux 4.4.15
  usb: dwc3: exynos: Fix deferred probing storm.
  usb: host: ehci-tegra: Grab the correct UTMI pads reset
  usb: gadget: fix spinlock dead lock in gadgetfs
  USB: mos7720: delete parport
  xhci: Fix handling timeouted commands on hosts in weird states.
  USB: xhci: Add broken streams quirk for Frescologic device id 1009
  usb: xhci-plat: properly handle probe deferral for devm_clk_get()
  xhci: Cleanup only when releasing primary hcd
  usb: musb: host: correct cppi dma channel for isoch transfer
  usb: musb: Ensure rx reinit occurs for shared_fifo endpoints
  usb: musb: Stop bulk endpoint while queue is rotated
  usb: musb: only restore devctl when session was set in backup
  usb: quirks: Add no-lpm quirk for Acer C120 LED Projector
  usb: quirks: Fix sorting
  USB: uas: Fix slave queue_depth not being set
  crypto: user - re-add size check for CRYPTO_MSG_GETALG
  crypto: ux500 - memmove the right size
  crypto: vmx - Increase priority of aes-cbc cipher
  AX.25: Close socket connection on session completion
  bpf: try harder on clones when writing into skb
  net: alx: Work around the DMA RX overflow issue
  net: macb: fix default configuration for GMAC on AT91
  neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit()
  bpf, perf: delay release of BPF prog after grace period
  sock_diag: do not broadcast raw socket destruction
  Bridge: Fix ipv6 mc snooping if bridge has no ipv6 address
  ipmr/ip6mr: Initialize the last assert time of mfc entries.
  netem: fix a use after free
  esp: Fix ESN generation under UDP encapsulation
  sit: correct IP protocol used in ipip6_err
  net: Don't forget pr_fmt on net_dbg_ratelimited for CONFIG_DYNAMIC_DEBUG
  net_sched: fix pfifo_head_drop behavior vs backlog
  sdcardfs: Truncate packages_gid.list on overflow
  UPSTREAM: cdc_ncm: do not call usbnet_link_change from cdc_ncm_bind
  BACKPORT: proc: add /proc/<pid>/timerslack_ns interface
  BACKPORT: timer: convert timer_slack_ns from unsigned long to u64
  netfilter: xt_quota2: make quota2_log work well
  Revert "usb: gadget: prevent change of Host MAC address of 'usb0' interface"
  BACKPORT: PM / sleep: Go direct_complete if driver has no callbacks
  ANDROID: base-cfg: enable UID_CPUTIME
  UPSTREAM: USB: usbfs: fix potential infoleak in devio
  UPSTREAM: ALSA: timer: Fix leak in events via snd_timer_user_ccallback
  UPSTREAM: ALSA: timer: Fix leak in events via snd_timer_user_tinterrupt
  UPSTREAM: ALSA: timer: Fix leak in SNDRV_TIMER_IOCTL_PARAMS
  ANDROID: configs: remove unused configs
  ANDROID: cpu: send KOBJ_ONLINE event when enabling cpus
  ANDROID: dm verity fec: initialize recursion level
  ANDROID: dm verity fec: fix RS block calculation
  Linux 4.4.14
  netfilter: x_tables: introduce and use xt_copy_counters_from_user
  netfilter: x_tables: do compat validation via translate_table
  netfilter: x_tables: xt_compat_match_from_user doesn't need a retval
  netfilter: ip6_tables: simplify translate_compat_table args
  netfilter: ip_tables: simplify translate_compat_table args
  netfilter: arp_tables: simplify translate_compat_table args
  netfilter: x_tables: don't reject valid target size on some architectures
  netfilter: x_tables: validate all offsets and sizes in a rule
  netfilter: x_tables: check for bogus target offset
  netfilter: x_tables: check standard target size too
  netfilter: x_tables: add compat version of xt_check_entry_offsets
  netfilter: x_tables: assert minimum target size
  netfilter: x_tables: kill check_entry helper
  netfilter: x_tables: add and use xt_check_entry_offsets
  netfilter: x_tables: validate targets of jumps
  netfilter: x_tables: don't move to non-existent next rule
  drm/core: Do not preserve framebuffer on rmfb, v4.
  crypto: qat - fix adf_ctl_drv.c:undefined reference to adf_init_pf_wq
  netfilter: x_tables: fix unconditional helper
  netfilter: x_tables: make sure e->next_offset covers remaining blob size
  netfilter: x_tables: validate e->target_offset early
  MIPS: Fix 64k page support for 32 bit kernels.
  sparc64: Fix return from trap window fill crashes.
  sparc: Harden signal return frame checks.
  sparc64: Take ctx_alloc_lock properly in hugetlb_setup().
  sparc64: Reduce TLB flushes during hugepte changes
  sparc/PCI: Fix for panic while enabling SR-IOV
  sparc64: Fix sparc64_set_context stack handling.
  sparc64: Fix numa node distance initialization
  sparc64: Fix bootup regressions on some Kconfig combinations.
  sparc: Fix system call tracing register handling.
  fix d_walk()/non-delayed __d_free() race
  sched: panic on corrupted stack end
  proc: prevent stacking filesystems on top
  x86/entry/traps: Don't force in_interrupt() to return true in IST handlers
  wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel
  ecryptfs: forbid opening files without mmap handler
  memcg: add RCU locking around css_for_each_descendant_pre() in memcg_offline_kmem()
  parisc: Fix pagefault crash in unaligned __get_user() call
  pinctrl: mediatek: fix dual-edge code defect
  powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call
  powerpc: Use privileged SPR number for MMCR2
  powerpc: Fix definition of SIAR and SDAR registers
  powerpc/pseries/eeh: Handle RTAS delay requests in configure_bridge
  arm64: mm: always take dirty state from new pte in ptep_set_access_flags
  arm64: Provide "model name" in /proc/cpuinfo for PER_LINUX32 tasks
  crypto: ccp - Fix AES XTS error for request sizes above 4096
  crypto: public_key: select CRYPTO_AKCIPHER
  irqchip/gic-v3: Fix ICC_SGI1R_EL1.INTID decoding mask
  s390/bpf: reduce maximum program size to 64 KB
  s390/bpf: fix recache skb->data/hlen for skb_vlan_push/pop
  gpio: bcm-kona: fix bcm_kona_gpio_reset() warnings
  ARM: fix PTRACE_SETVFPREGS on SMP systems
  ALSA: hda/realtek: Add T560 docking unit fixup
  ALSA: hda/realtek - Add support for new codecs ALC700/ALC701/ALC703
  ALSA: hda/realtek - ALC256 speaker noise issue
  ALSA: hda - Fix headset mic detection problem for Dell machine
  ALSA: hda - Add PCI ID for Kabylake
  KVM: irqfd: fix NULL pointer dereference in kvm_irq_map_gsi
  KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS
  vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices
  geneve: Relax MTU constraints
  vxlan: Relax MTU constraints
  ipv6: Skip XFRM lookup if dst_entry in socket cache is valid
  l2tp: fix configuration passed to setup_udp_tunnel_sock()
  bridge: Don't insert unnecessary local fdb entry on changing mac address
  tcp: record TLP and ER timer stats in v6 stats
  vxlan: Accept user specified MTU value when create new vxlan link
  team: don't call netdev_change_features under team->lock
  sfc: on MC reset, clear PIO buffer linkage in TXQs
  bpf, inode: disallow userns mounts
  uapi glibc compat: fix compilation when !__USE_MISC in glibc
  udp: prevent skbs lingering in tunnel socket queues
  bpf: Use mount_nodev not mount_ns to mount the bpf filesystem
  tuntap: correctly wake up process during uninit
  switchdev: pass pointer to fib_info instead of copy
  tipc: fix nametable publication field in nl compat
  netlink: Fix dump skb leak/double free
  tipc: check nl sock before parsing nested attributes
  scsi: Add QEMU CD-ROM to VPD Inquiry Blacklist
  scsi_lib: correctly retry failed zero length REQ_TYPE_FS commands
  cs-etm: associating output packet with CPU they executed on
  cs-etm: removing unecessary structure field
  cs-etm: account for each trace buffer in the queue
  cs-etm: avoid casting variable
  perf tools: fixing Makefile problems
  perf tools: new naming convention for openCSD
  perf scripts: Add python scripts for CoreSight traces
  perf tools: decoding capailitity for CoreSight traces
  perf symbols: Check before overwriting build_id
  perf tools: pushing driver configuration down to the kernel
  perf tools: add infrastructure for PMU specific configuration
  coresight: etm-perf: incorporating sink definition from the cmd line
  coresight: adding sink parameter to function coresight_build_path()
  perf: passing struct perf_event to function setup_aux()
  perf/core: adding PMU driver specific configuration
  perf tools: adding coresight etm PMU record capabilities
  perf tools: making coresight PMU listable
  coresight: tmc: implementing TMC-ETR AUX space API
  coresight: Add support for Juno platform
  coresight: Handle build path error
  coresight: Fix erroneous memset in tmc_read_unprepare_etr
  coresight: Fix tmc_read_unprepare_etr
  coresight: Fix NULL pointer dereference in _coresight_build_path
  ANDROID: dm verity fec: add missing release from fec_ktype
  ANDROID: dm verity fec: limit error correction recursion
  ANDROID: restrict access to perf events
  FROMLIST: security,perf: Allow further restriction of perf_event_open
  BACKPORT: perf tools: Document the perf sysctls
  Revert "armv6 dcc tty driver"
  Revert "arm: dcc_tty: fix armv6 dcc tty build failure"
  ARM64: Ignore Image-dtb from git point of view
  arm64: add option to build Image-dtb
  ANDROID: usb: gadget: f_midi: set fi->f to NULL when free f_midi function
  Linux 4.4.13
  xfs: handle dquot buffer readahead in log recovery correctly
  xfs: print name of verifier if it fails
  xfs: skip stale inodes in xfs_iflush_cluster
  xfs: fix inode validity check in xfs_iflush_cluster
  xfs: xfs_iflush_cluster fails to abort on error
  xfs: Don't wrap growfs AGFL indexes
  xfs: disallow rw remount on fs with unknown ro-compat features
  gcov: disable tree-loop-im to reduce stack usage
  scripts/package/Makefile: rpmbuild add support of RPMOPTS
  dma-debug: avoid spinlock recursion when disabling dma-debug
  PM / sleep: Handle failures in device_suspend_late() consistently
  ext4: silence UBSAN in ext4_mb_init()
  ext4: address UBSAN warning in mb_find_order_for_block()
  ext4: fix oops on corrupted filesystem
  ext4: clean up error handling when orphan list is corrupted
  ext4: fix hang when processing corrupted orphaned inode list
  drm/imx: Match imx-ipuv3-crtc components using device node in platform data
  drm/i915: Don't leave old junk in ilk active watermarks on readout
  drm/atomic: Verify connector->funcs != NULL when clearing states
  drm/fb_helper: Fix references to dev->mode_config.num_connector
  drm/i915/fbdev: Fix num_connector references in intel_fb_initial_config()
  drm/amdgpu: Fix hdmi deep color support.
  drm/amdgpu: use drm_mode_vrefresh() rather than mode->vrefresh
  drm/vmwgfx: Fix order of operation
  drm/vmwgfx: use vmw_cmd_dx_cid_check for query commands.
  drm/vmwgfx: Enable SVGA_3D_CMD_DX_SET_PREDICATION
  drm/gma500: Fix possible out of bounds read
  sunrpc: fix stripping of padded MIC tokens
  xen: use same main loop for counting and remapping pages
  xen/events: Don't move disabled irqs
  powerpc/eeh: Restore initial state in eeh_pe_reset_and_recover()
  Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell"
  powerpc/eeh: Don't report error in eeh_pe_reset_and_recover()
  powerpc/book3s64: Fix branching to OOL handlers in relocatable kernel
  pipe: limit the per-user amount of pages allocated in pipes
  QE-UART: add "fsl,t1040-ucc-uart" to of_device_id
  wait/ptrace: assume __WALL if the child is traced
  mm: use phys_addr_t for reserve_bootmem_region() arguments
  media: v4l2-compat-ioctl32: fix missing reserved field copy in put_v4l2_create32
  PCI: Disable all BAR sizing for devices with non-compliant BARs
  pinctrl: exynos5440: Use off-stack memory for pinctrl_gpio_range
  clk: bcm2835: divider value has to be 1 or more
  clk: bcm2835: pll_off should only update CM_PLL_ANARST
  clk: at91: fix check of clk_register() returned value
  clk: bcm2835: Fix PLL poweron
  cpuidle: Fix cpuidle_state_is_coupled() argument in cpuidle_enter()
  cpuidle: Indicate when a device has been unregistered
  PM / Runtime: Fix error path in pm_runtime_force_resume()
  mfd: intel_soc_pmic_core: Terminate panel control GPIO lookup table correctly
  mfd: intel-lpss: Save register context on suspend
  hwmon: (ads7828) Enable internal reference
  aacraid: Fix for KDUMP driver hang
  aacraid: Fix for aac_command_thread hang
  aacraid: Relinquish CPU during timeout wait
  rtlwifi: pci: use dev_kfree_skb_irq instead of kfree_skb in rtl_pci_reset_trx_ring
  rtlwifi: Fix logic error in enter/exit power-save mode
  rtlwifi: btcoexist: Implement antenna selection
  rtlwifi: rtl8723be: Add antenna select module parameter
  hwrng: exynos - Fix unbalanced PM runtime put on timeout error path
  ath5k: Change led pin configuration for compaq c700 laptop
  ath10k: fix kernel panic, move arvifs list head init before htt init
  ath10k: fix rx_channel during hw reconfigure
  ath10k: fix firmware assert in monitor mode
  ath10k: fix debugfs pktlog_filter write
  ath9k: Fix LED polarity for some Mini PCI AR9220 MB92 cards.
  ath9k: Add a module parameter to invert LED polarity.
  ARM: dts: imx35: restore existing used clock enumeration
  ARM: dts: exynos: Add interrupt line to MAX8997 PMIC on exynos4210-trats
  ARM: dts: at91: fix typo in sama5d2 PIN_PD24 description
  ARM: mvebu: fix GPIO config on the Linksys boards
  Input: uinput - handle compat ioctl for UI_SET_PHYS
  ASoC: ak4642: Enable cache usage to fix crashes on resume
  affs: fix remount failure when there are no options changed
  MIPS: VDSO: Build with `-fno-strict-aliasing'
  MIPS: lib: Mark intrinsics notrace
  MIPS: Build microMIPS VDSO for microMIPS kernels
  MIPS: Fix sigreturn via VDSO on microMIPS kernel
  MIPS: ptrace: Prevent writes to read-only FCSR bits
  MIPS: ptrace: Fix FP context restoration FCSR regression
  MIPS: Disable preemption during prctl(PR_SET_FP_MODE, ...)
  MIPS: Prevent "restoration" of MSA context in non-MSA kernels
  MIPS: Fix MSA ld_*/st_* asm macros to use PTR_ADDU
  MIPS: Use copy_s.fmt rather than copy_u.fmt
  MIPS: Loongson-3: Reserve 32MB for RS780E integrated GPU
  MIPS: Reserve nosave data for hibernation
  MIPS: ath79: make bootconsole wait for both THRE and TEMT
  MIPS: Sync icache & dcache in set_pte_at
  MIPS: Handle highmem pages in __update_cache
  MIPS: Flush highmem pages in __flush_dcache_page
  MIPS: Fix watchpoint restoration
  MIPS: Fix uapi include in exported asm/siginfo.h
  MIPS: Fix siginfo.h to use strict posix types
  MIPS: Avoid using unwind_stack() with usermode
  MIPS: Don't unwind to user mode with EVA
  MIPS: MSA: Fix a link error on `_init_msa_upper' with older GCC
  MIPS: math-emu: Fix jalr emulation when rd == $0
  MIPS64: R6: R2 emulation bugfix
  coresight: etb10: adjust read pointer only when needed
  coresight: configuring ETF in FIFO mode when acting as link
  coresight: tmc: implementing TMC-ETF AUX space API
  coresight: moving struct cs_buffers to header file
  coresight: tmc: keep track of memory width
  coresight: tmc: make sysFS and Perf mode mutually exclusive
  coresight: tmc: dump system memory content only when needed
  coresight: tmc: adding mode of operation for link/sinks
  coresight: tmc: getting rid of multiple read access
  coresight: tmc: allocating memory when needed
  coresight: tmc: making prepare/unprepare functions generic
  coresight: tmc: splitting driver in ETB/ETF and ETR components
  coresight: tmc: cleaning up header file
  coresight: tmc: introducing new header file
  coresight: tmc: clearly define number of transfers per burst
  coresight: tmc: re-implementing tmc_read_prepare/unprepare() functions
  coresight: tmc: waiting for TMCReady bit before programming
  coresight: tmc: modifying naming convention
  coresight: tmc: adding sysFS management entries
  coresight: etm4x: add tracer ID for A72 Maia processor.
  coresight: etb10: fixing the right amount of words to read
  coresight: stm: adding driver for CoreSight STM component
  coresight: adding path for STM device
  coresight: etm4x: modify q_support type
  coresight: no need to do the forced type conversion
  coresight: removing gratuitous boot time log messages
  coresight: etb10: splitting sysFS "status" entry
  coresight: moving coresight_simple_func() to header file
  coresight: etm4x: implementing the perf PMU API
  coresight: etm4x: implementing user/kernel mode tracing
  coresight: etm4x: moving etm_drvdata::enable to atomic field
  coresight: etm4x: unlocking tracers in default arch init
  coresight: etm4x: splitting etmv4 default configuration
  coresight: etm4x: splitting struct etmv4_drvdata
  coresight: etm4x: adding config and traceid registers
  coresight: etm4x: moving sysFS entries to a dedicated file
  stm class: Support devices that override software assigned masters
  stm class: Remove unnecessary pointer increment
  stm class: Fix stm device initialization order
  stm class: Do not leak the chrdev in error path
  stm class: Remove a pointless line
  stm class: stm_heartbeat: Make nr_devs parameter read-only
  stm class: dummy_stm: Make nr_dummies parameter read-only
  MAINTAINERS: Add a git tree for the stm class
  perf/ring_buffer: Document AUX API usage
  perf/core: Free AUX pages in unmap path
  perf/ring_buffer: Refuse to begin AUX transaction after rb->aux_mmap_count drops
  perf auxtrace: Add perf_evlist pointer to *info_priv_size()
  perf session: Simplify tool stubs
  perf inject: Hit all DSOs for AUX data in JIT and other cases
  perf tools: tracepoint_error() can receive e=NULL, robustify it
  perf evlist: Make perf_evlist__open() open evsels with their cpus and threads (like perf record does)
  perf evsel: Introduce disable() method
  perf cpumap: Auto initialize cpu__max_{node,cpu}
  drivers/hwtracing: make coresight-etm-perf.c explicitly non-modular
  drivers/hwtracing: make coresight-* explicitly non-modular
  coresight: introducing a global trace ID function
  coresight: etm-perf: new PMU driver for ETM tracers
  coresight: etb10: implementing AUX API
  coresight: etb10: adding operation mode for sink->enable()
  coresight: etb10: moving to local atomic operations
  coresight: etm3x: implementing perf_enable/disable() API
  coresight: etm3x: implementing user/kernel mode tracing
  coresight: etm3x: consolidating initial config
  coresight: etm3x: changing default trace configuration
  coresight: etm3x: set progbit to stop trace collection
  coresight: etm3x: adding operation mode for etm_enable()
  coresight: etm3x: splitting struct etm_drvdata
  coresight: etm3x: unlocking tracers in default arch init
  coresight: etm3x: moving sysFS entries to dedicated file
  coresight: etm3x: moving etm_readl/writel to header file
  coresight: moving PM runtime operations to core framework
  coresight: add API to get sink from path
  coresight: associating path with session rather than tracer
  coresight: etm4x: Check every parameter used by dma_xx_coherent.
  coresight: "DEVICE_ATTR_RO" should defined as static.
  coresight: implementing 'cpu_id()' API
  coresight: removing bind/unbind options from sysfs
  coresight: remove csdev's link from topology
  coresight: release reference taken by 'bus_find_device()'
  coresight: coresight_unregister() function cleanup
  coresight: fixing lockdep error
  coresight: fixing indentation problem
  coresight: Fix a typo in Kconfig
  coresight: checking for NULL string in coresight_name_match()
  perf/core: Disable the event on a truncated AUX record
  perf/core: Don't leak event in the syscall error path
  perf/core: Fix perf_sched_count derailment
  stm class: dummy_stm: Add link callback for fault injection
  stm class: Plug stm device's unlink callback
  stm class: Fix a race in unlinking
  stm class: Fix unbalanced module/device refcounting
  stm class: Guard output assignment against concurrency
  stm class: Fix unlocking braino in the error path
  stm class: Add heartbeat stm source device
  stm class: dummy_stm: Create multiple devices
  stm class: Support devices with multiple instances
  stm class: Use driver's packet callback return value
  stm class: Prevent user-controllable allocations
  stm class: Fix link list locking
  stm class: Fix locking in unbinding policy path
  stm class: Select CONFIG_SRCU
  stm class: Hide STM-specific options if STM is disabled
  perf: Synchronously free aux pages in case of allocation failure
  Linux 4.4.12
  kbuild: move -Wunused-const-variable to W=1 warning level
  Revert "scsi: fix soft lockup in scsi_remove_target() on module removal"
  scsi: Add intermediate STARGET_REMOVE state to scsi_target_state
  hpfs: implement the show_options method
  hpfs: fix remount failure when there are no options changed
  UBI: Fix static volume checks when Fastmap is used
  SIGNAL: Move generic copy_siginfo() to signal.h
  thunderbolt: Fix double free of drom buffer
  IB/srp: Fix a debug kernel crash
  ALSA: hda - Fix headset mic detection problem for one Dell machine
  ALSA: hda/realtek - Add support for ALC295/ALC3254
  ALSA: hda - Fix headphone noise on Dell XPS 13 9360
  ALSA: hda/realtek - New codecs support for ALC234/ALC274/ALC294
  mcb: Fixed bar number assignment for the gdd
  clk: bcm2835: add locking to pll*_on/off methods
  locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()
  serial: samsung: Reorder the sequence of clock control when call s3c24xx_serial_set_termios()
  serial: 8250_mid: recognize interrupt source in handler
  serial: 8250_mid: use proper bar for DNV platform
  serial: 8250_pci: fix divide error bug if baud rate is 0
  Fix OpenSSH pty regression on close
  tty/serial: atmel: fix hardware handshake selection
  TTY: n_gsm, fix false positive WARN_ON
  tty: vt, return error when con_startup fails
  xen/x86: actually allocate legacy interrupts on PV guests
  KVM: x86: mask CPUID(0xD,0x1).EAX against host value
  MIPS: KVM: Fix timer IRQ race when writing CP0_Compare
  MIPS: KVM: Fix timer IRQ race when freezing timer
  KVM: x86: fix ordering of cr0 initialization code in vmx_cpu_reset
  KVM: MTRR: remove MSR 0x2f8
  staging: comedi: das1800: fix possible NULL dereference
  usb: gadget: udc: core: Fix argument of dev_err() in usb_gadget_map_request()
  USB: leave LPM alone if possible when binding/unbinding interface drivers
  usb: misc: usbtest: fix pattern tests for scatterlists.
  usb: f_mass_storage: test whether thread is running before starting another
  usb: gadget: f_fs: Fix EFAULT generation for async read operations
  USB: serial: option: add even more ZTE device ids
  USB: serial: option: add more ZTE device ids
  USB: serial: option: add support for Cinterion PH8 and AHxx
  USB: serial: io_edgeport: fix memory leaks in probe error path
  USB: serial: io_edgeport: fix memory leaks in attach error path
  USB: serial: quatech2: fix use-after-free in probe error path
  USB: serial: keyspan: fix use-after-free in probe error path
  USB: serial: mxuport: fix use-after-free in probe error path
  mei: bus: call mei_cl_read_start under device lock
  mei: amthif: discard not read messages
  mei: fix NULL dereferencing during FW initiated disconnection
  Bluetooth: vhci: Fix race at creating hci device
  Bluetooth: vhci: purge unhandled skbs
  Bluetooth: vhci: fix open_timeout vs. hdev race
  mmc: sdhci-pci: Remove MMC_CAP_BUS_WIDTH_TEST for Intel controllers
  mmc: longer timeout for long read time quirk
  dell-rbtn: Ignore ACPI notifications if device is suspended
  ACPI / osi: Fix an issue that acpi_osi=!* cannot disable ACPICA internal strings
  mmc: sdhci-acpi: Remove MMC_CAP_BUS_WIDTH_TEST for Intel controllers
  mmc: mmc: Fix partition switch timeout for some eMMCs
  can: fix handling of unmodifiable configuration options
  irqchip/gic-v3: Configure all interrupts as non-secure Group-1
  irqchip/gic: Ensure ordering between read of INTACK and shared data
  Input: pwm-beeper - fix - scheduling while atomic
  mfd: omap-usb-tll: Fix scheduling while atomic BUG
  sched/loadavg: Fix loadavg artifacts on fully idle and on fully loaded systems
  clk: qcom: msm8916: Fix crypto clock flags
  crypto: sun4i-ss - Replace spinlock_bh by spin_lock_irq{save|restore}
  crypto: talitos - fix ahash algorithms registration
  crypto: caam - fix caam_jr_alloc() ret code
  ring-buffer: Prevent overflow of size in ring_buffer_resize()
  ring-buffer: Use long for nr_pages to avoid overflow failures
  asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions
  fs/cifs: correctly to anonymous authentication for the NTLM(v2) authentication
  fs/cifs: correctly to anonymous authentication for the NTLM(v1) authentication
  fs/cifs: correctly to anonymous authentication for the LANMAN authentication
  fs/cifs: correctly to anonymous authentication via NTLMSSP
  remove directory incorrectly tries to set delete on close on non-empty directories
  kvm: arm64: Fix EC field in inject_abt64
  arm/arm64: KVM: Enforce Break-Before-Make on Stage-2 page tables
  arm64: cpuinfo: Missing NULL terminator in compat_hwcap_str
  arm64: Implement pmdp_set_access_flags() for hardware AF/DBM
  arm64: Implement ptep_set_access_flags() for hardware AF/DBM
  arm64: Ensure pmd_present() returns false after pmd_mknotpresent()
  arm64: Fix typo in the pmdp_huge_get_and_clear() definition
  ext4: iterate over buffer heads correctly in move_extent_per_page()
  perf test: Fix build of BPF and LLVM on older glibc libraries
  perf/core: Fix perf_event_open() vs. execve() race
  perf/x86/intel/pt: Generate PMI in the STOP region as well
  Btrfs: don't use src fd for printk
  UPSTREAM: mac80211: fix "warning: ‘target_metric’ may be used uninitialized"
  Revert "drivers: power: use 'current' instead of 'get_current()'"
  cpufreq: interactive: drop cpufreq_{get,put}_global_kobject func calls
  Revert "cpufreq: interactive: build fixes for 4.4"
  xt_qtaguid: Fix panic caused by processing non-full socket.
  fiq_debugger: Add fiq_debugger.disable option
  UPSTREAM: procfs: fixes pthread cross-thread naming if !PR_DUMPABLE
  FROMLIST: wlcore: Disable filtering in AP role
  Revert "drivers: power: Add watchdog timer to catch drivers which lockup during suspend."
  fiq_debugger: Add option to apply uart overlay by FIQ_DEBUGGER_UART_OVERLAY
  Revert "Recreate asm/mach/mmc.h include file"
  Revert "ARM: Add 'card_present' state to mmc_platfrom_data"
  usb: dual-role: make stub functions inline
  Revert "mmc: Add status IRQ and status callback function to mmc platform data"
  quick selinux support for tracefs
  Revert "hid-multitouch: Filter collections by application usage."
  Revert "HID: steelseries: validate output report details"
  xt_qtaguid: Fix panic caused by synack processing
  Revert "mm: vmscan: Add a debug file for shrinkers"
  Revert "SELinux: Enable setting security contexts on rootfs inodes."
  Revert "SELinux: build fix for 4.1"
  fuse: Add support for d_canonical_path
  vfs: change d_canonical_path to take two paths
  android: recommended.cfg: remove CONFIG_UID_STAT
  netfilter: xt_qtaguid: seq_printf fixes
  Revert "misc: uidstat: Adding uid stat driver to collect network statistics."
  Revert "net: activity_stats: Add statistics for network transmission activity"
  Revert "net: activity_stats: Stop using obsolete create_proc_read_entry api"
  Revert "misc: uidstat: avoid create_stat() race and blockage."
  Revert "misc: uidstat: Remove use of obsolete create_proc_read_entry api"
  Revert "misc seq_printf fixes for 4.4"
  Revert "misc: uid_stat: Include linux/atomic.h instead of asm/atomic.h"
  Revert "net: socket ioctl to reset connections matching local address"
  Revert "net: fix iterating over hashtable in tcp_nuke_addr()"
  Revert "net: fix crash in tcp_nuke_addr()"
  Revert "Don't kill IPv4 sockets when killing IPv6 sockets was requested."
  Revert "tcp: Fix IPV6 module build errors"
  android: base-cfg: remove CONFIG_SWITCH
  Revert "switch: switch class and GPIO drivers."
  Revert "drivers: switch: remove S_IWUSR from dev_attr"
  ANDROID: base-cfg: enable CONFIG_IP_NF_NAT
  BACKPORT: selinux: restrict kernel module loading
  android: base-cfg: enable CONFIG_QUOTA

Conflicts:
	Documentation/sysctl/kernel.txt
	drivers/cpufreq/cpufreq_interactive.c
	drivers/hwtracing/coresight/Kconfig
	drivers/hwtracing/coresight/Makefile
	drivers/hwtracing/coresight/coresight-etm4x.c
	drivers/hwtracing/coresight/coresight-etm4x.h
	drivers/hwtracing/coresight/coresight-priv.h
	drivers/hwtracing/coresight/coresight-stm.c
	drivers/hwtracing/coresight/coresight-tmc.c
	drivers/mmc/core/core.c
	include/linux/coresight-stm.h
	include/linux/coresight.h
	include/linux/msm_mdp.h
	include/uapi/linux/coresight-stm.h
	kernel/events/core.c
	kernel/sched/fair.c
	net/Makefile
	net/ipv4/netfilter/arp_tables.c
	net/ipv4/netfilter/ip_tables.c
	net/ipv4/tcp.c
	net/ipv6/netfilter/ip6_tables.c
	net/netfilter/xt_quota2.c
	sound/core/pcm.c

Change-Id: I17aa0002815014e9bddc47e67769a53c15768a99
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
2016-10-28 10:48:35 -07:00
Syed Rameez Mustafa
066f712e43 sched: Add multiple load reporting policies for cpu frequency
The previous patches in this series introduce the mechanics of CPU
load tracking without fixups for intra cluster migration and top task
load tracking. Add a tunable that dictates what of the above needs to
be considered when reporting load to the governor. The default policy
is to take the maximum of the CPU load and top task load.

Change-Id: Ie585a11ed774b929910d04c41471db3a2a102ec5
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-10-17 12:45:54 -07:00
Patrick Bellasi
e89595cd93 sched/tune: add initial support for CGroups based boosting
To support task performance boosting, the usage of a single knob has the
advantage to be a simple solution, both from the implementation and the
usability standpoint.  However, on a real system it can be difficult to
identify a single value for the knob which fits the needs of multiple
different tasks. For example, some kernel threads and/or user-space
background services should be better managed the "standard" way while we
still want to be able to boost the performance of specific workloads.

In order to improve the flexibility of the task boosting mechanism this
patch is the first of a small series which extends the previous
implementation to introduce a "per task group" support.
This first patch introduces just the basic CGroups support, a new
"schedtune" CGroups controller is added which allows to configure
different boost value for different groups of tasks.
To keep the implementation simple but still effective for a boosting
strategy, the new controller:
  1. allows only a two layer hierarchy
  2. supports only a limited number of boost groups

A two layer hierarchy allows to place each task either:
  a) in the root control group
     thus being subject to a system-wide boosting value
  b) in a child of the root group
     thus being subject to the specific boost value defined by that
     "boost group"

The limited number of "boost groups" supported is mainly motivated by
the observation that in a real system it could be useful to have only
few classes of tasks which deserve different treatment.
For example, background vs foreground or interactive vs low-priority.
As an additional benefit, a limited number of boost groups allows also
to have a simpler implementation especially for the code required to
compute the boost value for CPUs which have runnable tasks belonging to
different boost groups.

Change-Id: I1304e33a8440bfdad9c8bcf8129ff390216f2e32
cc: Tejun Heo <tj@kernel.org>
cc: Li Zefan <lizefan@huawei.com>
cc: Johannes Weiner <hannes@cmpxchg.org>
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Git-commit: 13001f47c9
Git-repo: https://android.googlesource.com/kernel/common
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2016-10-10 11:09:53 -07:00
Patrick Bellasi
754a122792 sched/tune: add sysctl interface to define a boost value
The current (CFS) scheduler implementation does not allow "to boost"
tasks performance by running them at a higher OPP compared to the
minimum required to meet their workload demands.

To support tasks performance boosting the scheduler should provide a
"knob" which allows to tune how much the system is going to be optimised
for energy efficiency vs performance.

This patch is the first of a series which provides a simple interface to
define a tuning knob. One system-wide "boost" tunable is exposed via:
  /proc/sys/kernel/sched_cfs_boost
which can be configured in the range [0..100], to define a percentage
where:
  - 0%   boost requires to operate in "standard" mode by scheduling
         tasks at the minimum capacities required by the workload demand
  - 100% boost requires to push at maximum the task performances,
         "regardless" of the incurred energy consumption

A boost value in between these two boundaries is used to bias the
power/performance trade-off, the higher the boost value the more the
scheduler is biased toward performance boosting instead of energy
efficiency.

Change-Id: I59a41725e2d8f9238a61dfb0c909071b53560fc0
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Git-commit: 63c8fad2b06805ef88f1220551289f0a3c3529f1
Git-repo: https://source.codeaurora.org/quic/la/kernel/msm-4.4
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2016-10-05 17:24:23 -07:00
Pavankumar Kondeti
f1e9995fe4 sched: add a knob to prefer the waker CPU for sync wakeups
The current policy has a preference to select an idle CPU in the waker
cluster compared to the waker CPU running only 1 task. By selecting
an idle CPU, it eliminates the chance of waker migrating to a
different CPU after the wakee preempts it. This policy is also not
susceptible to the incorrect "sync" usage i.e the waker does not
goto sleep after waking up the wakee.

However LPM exit latency associated with an idle CPU outweigh the
above benefits on some targets. So add a knob to prefer the waker
CPU having only 1 runnable task over idle CPUs in the waker cluster.

Change-Id: Id974748c07625c1b19112235f426a5d204dfdb33
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-10-02 10:52:06 +05:30
Joonwoo Park
cc60f0790f sched: constrain HMP scheduler tunable range with in better way
HMP scheduler tunables can be constrained via extra1 and extra2 of
ctl_table.  Having valid range in the sysctl table gives clearer
view of tunable's range.

Also add range for sched_select_prev_cpu_us so we can avoid invalid
value configuration of that tunable.

CRs-fixed: 1056910
Change-Id: I09fcc019133f4d37b7be3287da8e0733e40fc0ac
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-09-21 16:50:42 -07:00
Pavankumar Kondeti
078568e425 sched: Introduce sched_freq_aggregate_threshold tunable
Do the aggregation for frequency only when the total group busy time
is above sched_freq_aggregate_threshold. This filtering is especially
needed for the cases where groups are created by including all threads
of an application process. This knob can be tuned to apply aggregation
only for the heavy workload applications.

When this knob is enabled and load is aggregated, the load is not
clipped to 100% @ current frequency to ramp up the frequency faster.

Change-Id: Icfd91c85938def101a989af3597d3dcaa8026d16
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-08-22 14:06:37 -07:00
Pavankumar Kondeti
5ddfbfec06 sched: inherit the group id from the group leader
When sysctl_sched_enable_thread_grouping is set to 1, any new tasks
created are put in the same group as their group leader.

Change-Id: If1837dd7c8120c8b097cfffa1dc52eb4781f1641
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-08-22 14:06:35 -07:00
Syed Rameez Mustafa
62f2600ce9 sched: Remove all existence of CONFIG_SCHED_FREQ_INPUT
CONFIG_SCHED_FREQ_INPUT was created to keep parts of the scheduler
dealing with frequency separate from other parts of the scheduler
that deal with task placement. However, overtime the two features
have become intricately linked whereby SCHED_FREQ_INPUT cannot be
turned on without having SCHED_HMP turned on as well. Given this
complex inter-dependency and the fact that all old, existing and
future targets use both config options, remove this unnecessary
feature separation. It will aid in making kernel upgrades a lot
simpler and faster.

Change-Id: Ia20e40d8a088d50909cc28f5be758fa3e9a4af6f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-08-22 11:37:22 -07:00
Syed Rameez Mustafa
ef1e55638d sched: Remove unused migration notifier code.
Migration notifiers were created to aid the CPU-boost driver manage
CPU frequencies when tasks migrate from one CPU to another. Over time
with the evolution of scheduler guided frequency, the scheduler now
directly manages load when tasks migrate. Consequently the CPU-boost
driver no longer makes use of this information. Remove unused code
pertaining to this feature.

Change-Id: I3529e4356e15e342a5fcfbcf3654396752a1d7cd
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-08-22 11:32:19 -07:00
Subash Abhinov Kasiviswanathan
ea60e2fbe4 Revert "kernel/sysctl.c: detect overflows when converting to int"
We have scripts which write to certain fields on 3.18 kernels but
this seems to be failing on 4.4 kernels.
An entry which we write to here is xfrm_aevent_rseqth which is u32.

echo 4294967295  > /proc/sys/net/core/xfrm_aevent_rseqth

Commit 230633d109 ("kernel/sysctl.c:
detect overflows when converting to int") prevented writing to
sysctl entries when integer overflow occurs. However, this does not
apply to unsigned integers.

u32 should be able to hold 4294967295 here, however it fails due to
this check.

static int do_proc_dointvec_conv(bool *negp, unsigned long *lvalp,
			if (*lvalp > (unsigned long) INT_MAX)
				return -EINVAL;

Fix this for now by reverting this commit till a solution is
finalized upstream.

CRs-Fixed: 1026507
Change-Id: I4fae5f442e4cc2c2414a69e960d42c05c3062415
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
2016-06-15 16:16:33 -07:00
Alex Shi
9ad8208bd7 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2016-06-14 17:08:03 +08:00
Willy Tarreau
fa6d0ba12a pipe: limit the per-user amount of pages allocated in pipes
commit 759c01142a5d0f364a462346168a56de28a80f52 upstream.

On no-so-small systems, it is possible for a single process to cause an
OOM condition by filling large pipes with data that are never read. A
typical process filling 4000 pipes with 1 MB of data will use 4 GB of
memory. On small systems it may be tricky to set the pipe max size to
prevent this from happening.

This patch makes it possible to enforce a per-user soft limit above
which new pipes will be limited to a single page, effectively limiting
them to 4 kB each, as well as a hard limit above which no new pipes may
be created for this user. This has the effect of protecting the system
against memory abuse without hurting other users, and still allowing
pipes to work correctly though with less data at once.

The limit are controlled by two new sysctls : pipe-user-pages-soft, and
pipe-user-pages-hard. Both may be disabled by setting them to zero. The
default soft limit allows the default number of FDs per process (1024)
to create pipes of the default size (64kB), thus reaching a limit of 64MB
before starting to create only smaller pipes. With 256 processes limited
to 1024 FDs each, this results in 1024*64kB + (256*1024 - 1024) * 4kB =
1084 MB of memory allocated for a user. The hard limit is disabled by
default to avoid breaking existing applications that make intensive use
of pipes (eg: for splicing).

Reported-by: socketpair@gmail.com
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Mitigates: CVE-2013-4312 (Linux 2.0+)
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Moritz Muehlenhoff <moritz@wikimedia.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07 18:14:35 -07:00
Joonwoo Park
42ab5394f4 Revert "sched: warn/panic upon excessive scheduling latency"
This reverts commit 8f90803a45 ("sched: warn/panic upon excessive
scheduling latency") as this feature is no longer used.

Change-Id: I200d0e9e8dad5047522cd02a68de25d4a70a91a4
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-06-03 14:48:17 -07:00
Joonwoo Park
9103cfbaa1 Revert "sched: add scheduling latency tracking procfs node"
This reverts commit b40bf941f6 ("sched: add scheduling latency
tracking procfs node") as this feature is no longer used.

Change-Id: I5de789b6349e6ea78ae3725af2a3ffa72b7b7f11
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-06-03 14:48:05 -07:00
Joonwoo Park
11ad3c4f92 sched: eliminate sched_early_detection_duration knob
Kill unused scheduler knob sched_early_detection_duration.

Change-Id: I36b7a10982367f9c7ab8eefcb8ef1d0f9955601d
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-06-03 14:47:51 -07:00
Joonwoo Park
eedf0821f6 sched: Remove the sched heavy task frequency guidance feature
This has always been unused feature given its limitation of adding
phantom load to the system. Since there are no immediate plans of
using this and the fact that it adds unnecessary complications to
the new load fixup mechanism, remove this feature for now. It can
be revisited later in light of the new mechanism.

Change-Id: Ie9501a898d0f423338293a8dde6bc56f493f1e75
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-06-03 14:47:39 -07:00
Joonwoo Park
6b2c4343e7 sched: eliminate sched_migration_fixup knob
Kill unused scheduler knob sched_migration_fixup.  With this change
scheduler always adjusts CPU's busy time during migration.

Change-Id: I5d59e89d5cc0f2c705c40036cd7b47f5d3f89e58
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-06-03 14:47:25 -07:00
Joonwoo Park
dc284e65df sched: eliminate sched_upmigrate_min_nice knob
Kill unused scheduler knob sched_upmigrate_min_nice.

Change-Id: I53ddfde39c78e78306bd746c1c4da9a94ec67cd8
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-06-03 14:46:44 -07:00
Joonwoo Park
d009f9c149 sched: eliminate sched_enable_power_aware knob and parameter
Kill unused scheduler knob and parameter sched_enable_power_aware.  HMP
scheduler always take into account power cost for placing task.

Change-Id: Ib26a21df9b903baac26c026862b0a41b4a8834f3
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-06-01 15:21:29 -07:00
Joonwoo Park
462213d1ac sched: eliminate sched_freq_account_wait_time knob
Kill unused scheduler knob sched_freq_account_wait_time.

Change-Id: Ib74123ebd69dfa3f86cf7335099f50c12a6e93c3
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-06-01 15:21:18 -07:00
Joonwoo Park
5160d93b6d sched: eliminate sched_account_wait_time knob
Kill unused scheduler knob sched_account_wait_time.  With this change
scheduler always accounts task's wait time into demand.

Change-Id: Ifa4bcb5685798f48fd020f3d0c9853220b3f5fdc
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-06-01 15:21:04 -07:00
Srivatsa Vaddagiri
e6aae1c3e0 sched: Aggregate for frequency
Related threads in a group could execute on different CPUs and hence
present a split-demand picture to cpufreq governor. IOW the governor
fails to see the net cpu demand of all related threads in a given
window if the threads's execution were to be split across CPUs. That
could result in sub-optimal frequency chosen in comparison to the
ideal frequency at which the aggregate work (taken up by related
threads) needs to be run.

This patch aggregates cpu execution stats in a window for all related
threads in a group. This helps present cpu busy time to governor as if
all related threads were part of the same thread and thus help select
the right frequency required by related threads. This aggregation
is done per-cluster.

Change-Id: I71e6047620066323721c6d542034ddd4b2950e7f
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: Fixed notify_migration() to hold rcu read
 lock as this version of Linux doesn't hold p->pi_lock when the
 function gets called while keeping use of rcu_access_pointer() since
 we never dereference return value.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-05-26 15:28:59 -07:00
Joonwoo Park
16ecb20600 sched: restrict sync wakee placement bias with waker's demand
Biasing sync wakee task towards waker CPU's cluster makes sense when the
waker's demand is high enough so the wakee also can take advantage
of high CPU frequency voted because of waker's load.  Placing sync wakee
on the low demand waker's CPU can lead placement imbalance which can
lead unnecessary migration.

Introduce a new tunable "sched_big_waker_task_load" that defines the big
waker so scheduler avoid wakee on waker's cluster bias when the waker's
load is below the tunable.

CRs-fixed: 971295
Change-Id: I1550ede0a71ac8c9be74a7daabe164c6a269a3fb
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
[joonwoop@codeaurora.org: fixed a minor conflict in
 include/linux/sched/sysctl.h.]
2016-03-23 21:25:23 -07:00
Joonwoo Park
616e04a51c sched: add preference for waker cluster CPU in wakee task placement
If sync wakee task's demand is small it's worth to place the wakee task
on waker's cluster for better performance in the sense that waker and
wakee are corelated so the wakee should take advantage of waker cluster's
frequency which is voted by the waker along with cache locality benefit.
While biasing towards the waker's cluster we want to avoid the waker CPU
as much as possible as placing the wakee on the waker's CPU can make the
waker got preempted and migrated by load balancer.

Introduce a new tunable 'sched_small_wakee_task_load' that differentiates
eligible small wakee task and place the small wakee tasks on the waker's
cluster.

CRs-fixed: 971295
Change-Id: I96897d9a72a6f63dca4986d9219c2058cd5a7916
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
[joonwoop@codeaurora.org: fixed a minor conflict in
 include/linux/sched/sysctl.h.]
2016-03-23 21:25:22 -07:00
Pavankumar Kondeti
6d742ce87b sched: Add separate load tracking histogram to predict loads
Current window based load tracking only saves history for five
windows. A historically heavy task's heavy load will be completely
forgotten after five windows of light load. Even before the five
window expires, a heavy task wakes up on same CPU it used to run won't
trigger any frequency change until end of the window. It would starve
for the entire window. It also adds one "small" load window to
history because it's accumulating load at a low frequency, further
reducing the tracked load for this heavy task.

Ideally, scheduler should be able to identify such tasks and notify
governor to increase frequency immediately after it wakes up.

Add a histogram for each task to track a much longer load history. A
prediction will be made based on runtime of previous or current
window, histogram data and load tracked in recent windows. Prediction
of all tasks that is currently running or runnable on a CPU is
aggregated and reported to CPUFreq governor in sched_get_cpus_busy().

sched_get_cpus_busy() now returns predicted busy time in addition
to previous window busy time and new task busy time, scaled to
the CPU maximum possible frequency.

Tunables:

- /proc/sys/kernel/sched_gov_alert_freq (KHz)

This tunable can be used to further filter the notifications.
Frequency alert notification is sent only when the predicted
load exceeds previous window load by sched_gov_alert_freq converted to
load.

Change-Id: If29098cd2c5499163ceaff18668639db76ee8504
Suggested-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
[joonwoop@codeaurora.org: fixed merge conflicts around __migrate_task()
 and removed changes for CONFIG_SCHED_QHMP.]
2016-03-23 21:25:17 -07:00
Pavankumar Kondeti
6418f213ab sched: Revise the inter cluster load balance restrictions
The frequency based inter cluster load balance restrictions are not
reliable as frequency does not provide a good estimate of the CPU's
current load. Replace them with the spill_load and spill_nr_run
based checks.

The higher capacity cluster is restricted from pulling the tasks from
the lower capacity cluster unless all of the lower capacity CPUs are
above spill. This behavior can be controlled by a sysctl tunable and
it is disabled by default (i.e. no load balance restrictions).

Change-Id: I45c09c8adcb61a8a7d4e08beadf2f97f1805fb42
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
[joonwoop@codeaurora.org: fixed merge conflicts due to omitted changes
 for CONFIG_SCHED_QHMP.]
2016-03-23 21:25:13 -07:00
Srivatsa Vaddagiri
3004236139 sched: colocate related threads
Provide userspace interface for tasks to be grouped together as
"related" threads. For example, all threads involved in updating
display buffer could be tagged as related.

Scheduler will attempt to provide special treatment for group of
related threads such as:

1) Colocation of related threads in same "preferred" cluster
2) Aggregation of demand towards determination of cluster frequency

This patch extends scheduler to provide best-effort colocation support
for a group of related threads.

Change-Id: Ic2cd769faf5da4d03a8f3cb0ada6224d0101a5f5
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor merge conflicts.  removed ifdefry
 for CONFIG_SCHED_QHMP.]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>

Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 21:25:12 -07:00
Srivatsa Vaddagiri
df6bfcaf70 sched: Update fair and rt placement logic to use scheduler clusters
Make use of clusters in the fair and rt scheduling classes. This is
needed as the freq domain mask can no longer be used to do correct
task placement. The freq domain mask was being used to demarcate
clusters.

Change-Id: I57f74147c7006f22d6760256926c10fd0bf50cbd
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed merge conflicts due to omitted changes
 for CONFIG_SCHED_QHMP.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 21:25:11 -07:00
Vinayak Menon
ad6f486284 mm: swap: swap ratio support
Add support to receive a static ratio from userspace to
divide the swap pages between ZRAM and disk based swap
devices. The existing infrastructure allows to keep
same priority for multiple swap devices, which results
in round robin distribution of pages. With this patch,
the ratio can be defined.

CRs-fixed: 968416
Change-Id: I54f54489db84cabb206569dd62d61a8a7a898991
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-03-23 21:19:04 -07:00
Joonwoo Park
20af1732c6 sched: fix compile error where !CONFIG_SCHED_FREQ_INPUT
The sysctl node sched_new_task_windows is only for CONFIG_SCHED_HMP and
CONFIG_SCHED_FREQ_INPUT.

Change-Id: I4791e977fa8516fd2cd31198f71103b8d7e874c3
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:44 -07:00
Joonwoo Park
07eb3f803b sched: select task's prev_cpu as the best CPU when it was chosen recently
Select given task's prev_cpu when the task slept for short period to
reduce latency of task placement and migrations.  A new tunable
/proc/sys/kernel/sched_select_prev_cpu_us introduced to determine whether
tasks are eligible to go through fast path.

CRs-fixed: 947467
Change-Id: Ia507665b91f4e9f0e6ee1448d8df8994ead9739a
[joonwoop@codeaurora.org: fixed conflict in include/linux/sched.h,
 include/linux/sched/sysctl.h, kernel/sched/core.c and kernel/sysctl.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:43 -07:00
Syed Rameez Mustafa
c00814c023 sched: Notify cpufreq governor early about potential big tasks
Tasks that are on the runqueue continuously for a certain amount of time
have the potential to be big tasks at the end of the window in which they
are runnable. In such scenarios ramping the CPU frequency early can
boost performance rather than waiting till the end of a window for the
governor to query load. Notify the governor early at every tick when a
task has been observed to execute beyond some percentage of the tick
period.

The threshold beyond which a task is eligible for early detection can be
changed via the tunable sched_early_detection_duration. The feature itself
is enabled only when scheduler boost is in effect.

Change-Id: I528b72bbc79a55b4593d1b8ab45450411c6d70f3
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in scheduler_tick() in
 kernel/sched/core.c.  fixed minor conflicts in include/linux/sched.h,
 include/linux/sched/sysctl.h and kernel/sysctl.c due to
 CONFIG_SCHED_QHMP.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:34 -07:00
Joonwoo Park
446beddcd4 sched: account new task load so that governor can apply different policy
Account amount of load contributed by new tasks within CPU load so that
governor can apply different policy when CPU is loaded by new tasks.

To be able to distinguish new task load a new tunable
sched_new_task_windows also introduced.  The tunable defines tasks as new
when the tasks are have been active less than configured windows.

Change-Id: I2e2e62e4103882f7362154b792ab978b181b9f59
Suggested-by: Saravana Kannan <skannan@codeaurora.org>
[joonwoop@codeaurora.org: ommited changes for
 drivers/cpufreq/cpufreq_interactive.c.  cpufreq changes needs to be
 applied separately later.  fixed conflict in include/linux/sched.h and
 include/linux/sched/sysctl.h.  omitted changes for qhmp_core.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:29 -07:00
Syed Rameez Mustafa
ca42a1bec8 sched: add frequency zone awareness to the load balancer
Add zone awareness to the load balancer. Remove all earlier restrictions
that the load balancer had for inter cluster kicks and migration.

Change-Id: I12ad3d0c2d2e9bb498f49a231810f2ad418b061f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in nohz_kick_needed() due
 to its return type change.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:21 -07:00
Syed Rameez Mustafa
d590f25153 sched: remove the notion of small tasks and small task packing
Task packing will now be determined solely on the basis of the
power cost of task placement. All tasks are eligible for packing.
Remove the notion of "small" tasks from the scheduler.

Change-Id: I72d52d04b2677c6a8d0bc6aa7d50ff0f1a4f5ebb
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:02:19 -07:00
Syed Rameez Mustafa
f2ea07a155 sched: Rework energy aware scheduling
Energy aware core rotation is not compatible with the power
based task placement being introduced in subsequent patches.
Remove all existing EA based task placement/migration logic.
power_cost() is the only function remaining. This function has
been modified to return the total power cost associated with a
task on a given CPU taking existing load on that CPU into
account.

Change-Id: Ia00501e3cbfc6e11446a9a2e93e318c4c42bdab4
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed multiple conflicts in fair.c and minor
 conflict in features.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:18 -07:00
Joonwoo Park
81280a6963 sched: check HMP scheduler tunables validity
Check tunables validity to take valid values only.

CRs-fixed: 812443
Change-Id: Ibb9ec0d6946247068174ab7abe775a6389412d5b
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:53 -07:00
Joonwoo Park
b40bf941f6 sched: add scheduling latency tracking procfs node
Add a new procfs node /proc/sys/kernel/sched_max_latency_us to track the
worst scheduling latency.  It provides easier way to identify maximum
scheduling latency seen across the CPUs.

Change-Id: I6e435bbf825c0a4dff2eded4a1256fb93f108d0e
[joonwoop@codeaurora.org: fixed conflict in update_stats_wait_end().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:50 -07:00
Joonwoo Park
8f90803a45 sched: warn/panic upon excessive scheduling latency
Add new tunables /proc/sys/kernel/sched_latency_warn_threshold_us and
/proc/sys/kernel/sched_latency_panic_threshold_us to warn or panic for the
cases that tasks are runnable but not scheduled more than configured time.

This helps to find out unacceptably high scheduling latency more easily.

Change-Id: If077aba6211062cf26ee289970c5abcd1c218c82
[joonwoop@codeaurora.org: fixed conflict in update_stats_wait_end().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:49 -07:00
Srivatsa Vaddagiri
5b45dc56e5 sched: Per-cpu prefer_idle flag
Remove the global sysctl_sched_prefer_idle flag and replace it with a
per-cpu prefer_idle flag. The per-cpu flag is expected to same for all
cpus in a cluster. It thus provides convenient means to disable
packing in one cluster while allowing packing in another cluster.

Change-Id: Ie4cc73bb1a55b4eac5697be38e558546161faca1
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 20:01:26 -07:00
Olav Haugan
3f947e7ba7 sched: Add sysctl to enable power aware scheduling
Add sysctl to enable energy awareness at runtime. This is useful for
performance/power tuning/measurements and debugging. In addition this
will match up with the Documentation/scheduler/sched-hmp.txt documentation.

Change-Id: I0a9185498640d66917b38bf5d55f6c59fc60ad5c
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org
2016-03-23 20:01:24 -07:00
Joonwoo Park
790d5d8a4a sched: Prevent race conditions where upmigrate_min_nice changes
When upmigrate_min_nice is changed dec_nr_big_small_task() can trigger
BUG_ON(rq->nr_big_tasks < 0).  This happens when there is a task which was
considered as non-big task due to its nice > upmigrate_min_nice and later
upmigrate_min_nice is changed to higher value so the task becomes big task.
In this case runqueue still has nr_big_tasks = 0 incorrectly with current
implementation.  Consequently next scheduler tick sees a big task to
schedule and try to decrease nr_big_tasks which is already 0.

Introduce sched_upmigrate_min_nice which is updated atomically and re-count
the number of big and small tasks to fix BUG_ON() triggering.

Change-Id: I6f5fc62ed22bbe5c52ec71613082a6e64f406e58
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:22 -07:00
Olav Haugan
90dc3fa9a7 sched: Avoid frequent task migration due to EA in lb
A new tunable exists that allow task migration to be throttled when the
scheduler tries to do task migrations due to Energy Awareness (EA). This
tunable is only taken into account when migrations occur in the tick
path. Extend the usage of the tunable to take into account the load
balancer (lb) path also.

In addition ensure that the start of task execution on a CPU is updated
correctly. If a task is preempted but still runnable on the same CPU the
start of execution should not be updated. Only update the start of
execution when a task wakes up after sleep or moves to a new CPU.

Change-Id: I6b2a8e06d8d2df8e0f9f62b7aba3b4ee4b2c1c4d
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org
[joonwoop@codeaurora.org: fixed conflict in group_classify() and
 set_task_cpu().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:21 -07:00
Srivatsa Vaddagiri
29a412dffa sched: Avoid frequent migration of running task
Power values for cpus can drop quite considerably when it goes idle.
As a result, the best choice for running a single task in a cluster
can vary quite rapidly. As the task keeps hopping cpus, other cpus go
idle and start being seen as more favorable target for running a task,
leading to task migrating almost every scheduler tick!

Prevent this by keeping track of when a task started running on a cpu
and allowing task migration in tick path (migration_needed()) on
account of energy efficiency reasons only if the task has run
sufficiently long (as determined by sysctl_sched_min_runtime
variable).

Note that currently sysctl_sched_min_runtime setting is considered
only in scheduler_tick()->migration_needed() path and not in
idle_balance() path. In other words, a task could be migrated to
another cpu which did a idle_balance(). This limitation should not
affect high-frequency migrations seen typically (when a single
high-demand task runs on high-performance cpu).

CRs-Fixed: 756570
Change-Id: I96413b7a81b623193c3bbcec6f3fa9dfec367d99
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in set_task_cpu() and
 __schedule().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:17 -07:00
Srivatsa Vaddagiri
72b7c5d36c sched: Provide knob to prefer mostly_idle over idle cpus
sysctl_sched_prefer_idle lets the scheduler bias selection of
idle cpus over mostly idle cpus for tasks. This knob could be
useful to control balance between power and performance.

Change-Id: Ide6eef684ef94ac8b9927f53c220ccf94976fe67
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 20:01:12 -07:00
Steve Muckle
588055e8c7 sched: make sched_cpu_high_irqload a runtime tunable
It may be desirable to be able to alter the scehd_cpu_high_irqload
setting easily, so make it a runtime tunable value.

Change-Id: I832030eec2aafa101f0f435a4fd2d401d447880d
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23 20:01:11 -07:00
Srivatsa Vaddagiri
b2e57842c0 sched: per-cpu mostly_idle threshold
sched_mostly_idle_load and sched_mostly_idle_nr_run knobs help pack
tasks on cpus to some extent. In some cases, it may be desirable to
have different packing limits for different cpus. For example, pack to
a higher limit on high-performance cpus compared to power-efficient
cpus.

This patch removes the global mostly_idle tunables and makes them
per-cpu, thus letting task packing behavior to be controlled in a
fine-grained manner.

Change-Id: Ifc254cda34b928eae9d6c342ce4c0f64e531e6c2
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 20:00:59 -07:00
Srivatsa Vaddagiri
98f89f00dc sched: update governor notification logic
Make criteria for notifying governor to be per-cpu. Governor is
notified of any large change in cpu's busy time statistics
(rq->prev_runnable_sum) since the last reported value.

Change-Id: I727354d994d909b166d093b94d3dade7c7dddc0d
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 20:00:54 -07:00