* msm-4.4/tmp-510d0a3f:
Linux 4.4.11
nf_conntrack: avoid kernel pointer value leak in slab name
drm/radeon: fix DP link training issue with second 4K monitor
drm/i915/bdw: Add missing delay during L3 SQC credit programming
drm/i915: Bail out of pipe config compute loop on LPT
drm/radeon: fix PLL sharing on DCE6.1 (v2)
Revert "[media] videobuf2-v4l2: Verify planes array in buffer dequeueing"
Input: max8997-haptic - fix NULL pointer dereference
get_rock_ridge_filename(): handle malformed NM entries
tools lib traceevent: Do not reassign parg after collapse_tree()
qla1280: Don't allocate 512kb of host tags
atomic_open(): fix the handling of create_error
regulator: axp20x: Fix axp22x ldo_io voltage ranges
regulator: s2mps11: Fix invalid selector mask and voltages for buck9
workqueue: fix rebind bound workers warning
ARM: dts: at91: sam9x5: Fix the memory range assigned to the PMC
vfs: rename: check backing inode being equal
vfs: add vfs_select_inode() helper
perf/core: Disable the event on a truncated AUX record
regmap: spmi: Fix regmap_spmi_ext_read in multi-byte case
pinctrl: at91-pio4: fix pull-up/down logic
spi: spi-ti-qspi: Handle truncated frames properly
spi: spi-ti-qspi: Fix FLEN and WLEN settings if bits_per_word is overridden
spi: pxa2xx: Do not detect number of enabled chip selects on Intel SPT
ALSA: hda - Fix broken reconfig
ALSA: hda - Fix white noise on Asus UX501VW headset
ALSA: hda - Fix subwoofer pin on ASUS N751 and N551
ALSA: usb-audio: Yet another Phoneix Audio device quirk
ALSA: usb-audio: Quirk for yet another Phoenix Audio devices (v2)
crypto: testmgr - Use kmalloc memory for RSA input
crypto: hash - Fix page length clamping in hash walk
crypto: qat - fix invalid pf2vf_resp_wq logic
s390/mm: fix asce_bits handling with dynamic pagetable levels
zsmalloc: fix zs_can_compact() integer overflow
ocfs2: fix posix_acl_create deadlock
ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang
net/route: enforce hoplimit max value
tcp: refresh skb timestamp at retransmit time
net: thunderx: avoid exposing kernel stack
net: fix a kernel infoleak in x25 module
uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h MIME-Version: 1.0
bridge: fix igmp / mld query parsing
net: bridge: fix old ioctl unlocked net device walk
VSOCK: do not disconnect socket when peer has shutdown SEND only
net/mlx4_en: Fix endianness bug in IPV6 csum calculation
net: fix infoleak in rtnetlink
net: fix infoleak in llc
net: fec: only clear a queue's work bit if the queue was emptied
netem: Segment GSO packets on enqueue
sch_dsmark: update backlog as well
sch_htb: update backlog as well
net_sched: update hierarchical backlog too
net_sched: introduce qdisc_replace() helper
gre: do not pull header in ICMP error processing
net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case
samples/bpf: fix trace_output example
bpf: fix check_map_func_compatibility logic
bpf: fix refcnt overflow
bpf: fix double-fdput in replace_map_fd_with_map_ptr()
net/mlx4_en: fix spurious timestamping callbacks
ipv4/fib: don't warn when primary address is missing if in_dev is dead
net/mlx5e: Fix minimum MTU
net/mlx5e: Device's mtu field is u16 and not int
openvswitch: use flow protocol when recalculating ipv6 checksums
atl2: Disable unimplemented scatter/gather feature
vlan: pull on __vlan_insert_tag error path and fix csum correction
net: use skb_postpush_rcsum instead of own implementations
cdc_mbim: apply "NDP to end" quirk to all Huawei devices
bpf/verifier: reject invalid LD_ABS | BPF_DW instruction
net: sched: do not requeue a NULL skb
packet: fix heap info leak in PACKET_DIAG_MCLIST sock_diag interface
route: do not cache fib route info on local routes with oif
decnet: Do not build routes to devices without decnet private data.
parisc: Use generic extable search and sort routines
arm64: kasan: Use actual memory node when populating the kernel image shadow
arm64: mm: treat memstart_addr as a signed quantity
arm64: lse: deal with clobbered IP registers after branch via PLT
arm64: mm: check at build time that PAGE_OFFSET divides the VA space evenly
arm64: kasan: Fix zero shadow mapping overriding kernel image shadow
arm64: consistently use p?d_set_huge
arm64: fix KASLR boot-time I-cache maintenance
arm64: hugetlb: partial revert of 66b3923a1a0f
arm64: make irq_stack_ptr more robust
arm64: efi: invoke EFI_RNG_PROTOCOL to supply KASLR randomness
efi: stub: use high allocation for converted command line
efi: stub: add implementation of efi_random_alloc()
efi: stub: implement efi_get_random_bytes() based on EFI_RNG_PROTOCOL
arm64: kaslr: randomize the linear region
arm64: add support for kernel ASLR
arm64: add support for building vmlinux as a relocatable PIE binary
arm64: switch to relative exception tables
extable: add support for relative extables to search and sort routines
scripts/sortextable: add support for ET_DYN binaries
arm64: futex.h: Add missing PAN toggling
arm64: make asm/elf.h available to asm files
arm64: avoid dynamic relocations in early boot code
arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
arm64: add support for module PLTs
arm64: move brk immediate argument definitions to separate header
arm64: mm: use bit ops rather than arithmetic in pa/va translations
arm64: mm: only perform memstart_addr sanity check if DEBUG_VM
arm64: User die() instead of panic() in do_page_fault()
arm64: allow kernel Image to be loaded anywhere in physical memory
arm64: defer __va translation of initrd_start and initrd_end
arm64: move kernel image to base of vmalloc area
arm64: kvm: deal with kernel symbols outside of linear mapping
arm64: decouple early fixmap init from linear mapping
arm64: pgtable: implement static [pte|pmd|pud]_offset variants
arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region
arm64: add support for ioremap() block mappings
arm64: prevent potential circular header dependencies in asm/bug.h
of/fdt: factor out assignment of initrd_start/initrd_end
of/fdt: make memblock minimum physical address arch configurable
arm64: Remove the get_thread_info() function
arm64: kernel: Don't toggle PAN on systems with UAO
arm64: cpufeature: Test 'matches' pointer to find the end of the list
arm64: kernel: Add support for User Access Override
arm64: add ARMv8.2 id_aa64mmfr2 boiler plate
arm64: cpufeature: Change read_cpuid() to use sysreg's mrs_s macro
arm64: use local label prefixes for __reg_num symbols
arm64: vdso: Mark vDSO code as read-only
arm64: ubsan: select ARCH_HAS_UBSAN_SANITIZE_ALL
arm64: ptdump: Indicate whether memory should be faulting
arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC
arm64: Drop alloc function from create_mapping
arm64: prefetch: add missing #include for spin_lock_prefetch
arm64: lib: patch in prfm for copy_page if requested
arm64: lib: improve copy_page to deal with 128 bytes at a time
arm64: prefetch: add alternative pattern for CPUs without a prefetcher
arm64: prefetch: don't provide spin_lock_prefetch with LSE
arm64: allow vmalloc regions to be set with set_memory_*
arm64: kernel: implement ACPI parking protocol
arm64: mm: create new fine-grained mappings at boot
arm64: ensure _stext and _etext are page-aligned
arm64: mm: allow passing a pgdir to alloc_init_*
arm64: mm: allocate pagetables anywhere
arm64: mm: use fixmap when creating page tables
arm64: mm: add functions to walk tables in fixmap
arm64: mm: add __{pud,pgd}_populate
arm64: mm: avoid redundant __pa(__va(x))
arm64: mm: add functions to walk page tables by PA
arm64: mm: move pte_* macros
arm64: kasan: avoid TLB conflicts
arm64: mm: add code to safely replace TTBR1_EL1
arm64: add function to install the idmap
arm64: unmap idmap earlier
arm64: unify idmap removal
arm64: mm: place empty_zero_page in bss
arm64: mm: specialise pagetable allocators
asm-generic: Fix local variable shadow in __set_fixmap_offset
Eliminate the .eh_frame sections from the aarch64 vmlinux and kernel modules
arm64: Fix an enum typo in mm/dump.c
arm64: kasan: ensure that the KASAN zero page is mapped read-only
arch/arm64/include/asm/pgtable.h: add pmd_mkclean for THP
arm64: hide __efistub_ aliases from kallsyms
Linux 4.4.10
drm/i915/skl: Fix DMC load on Skylake J0 and K0
lib/test-string_helpers.c: fix and improve string_get_size() tests
ACPI / processor: Request native thermal interrupt handling via _OSC
drm/i915: Fake HDMI live status
drm/i915: Make RPS EI/thresholds multiple of 25 on SNB-BDW
drm/i915: Fix eDP low vswing for Broadwell
drm/i915/ddi: Fix eDP VDD handling during booting and suspend/resume
drm/radeon: make sure vertical front porch is at least 1
iio: ak8975: fix maybe-uninitialized warning
iio: ak8975: Fix NULL pointer exception on early interrupt
drm/amdgpu: set metadata pointer to NULL after freeing.
drm/amdgpu: make sure vertical front porch is at least 1
gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading
nvmem: mxs-ocotp: fix buffer overflow in read
USB: serial: cp210x: add Straizona Focusers device ids
USB: serial: cp210x: add ID for Link ECU
ata: ahci-platform: Add ports-implemented DT bindings.
libahci: save port map for forced port map
powerpc: Fix bad inline asm constraint in create_zero_mask()
ACPICA: Dispatcher: Update thread ID for recursive method calls
x86/sysfb_efi: Fix valid BAR address range check
ARC: Add missing io barriers to io{read,write}{16,32}be()
ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value
propogate_mnt: Handle the first propogated copy being a slave
fs/pnode.c: treat zero mnt_group_id-s as unequal
x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO
MAINTAINERS: Remove asterisk from EFI directory names
writeback: Fix performance regression in wb_over_bg_thresh()
batman-adv: Reduce refcnt of removed router when updating route
batman-adv: Fix broadcast/ogm queue limit on a removed interface
batman-adv: Check skb size before using encapsulated ETH+VLAN header
batman-adv: fix DAT candidate selection (must use vid)
mm: update min_free_kbytes from khugepaged after core initialization
proc: prevent accessing /proc/<PID>/environ until it's ready
Input: zforce_ts - fix dual touch recognition
HID: Fix boot delay for Creative SB Omni Surround 5.1 with quirk
HID: wacom: Add support for DTK-1651
xen/evtchn: fix ring resize when binding new events
xen/balloon: Fix crash when ballooning on x86 32 bit PAE
xen: Fix page <-> pfn conversion on 32 bit systems
ARM: SoCFPGA: Fix secondary CPU startup in thumb2 kernel
ARM: EXYNOS: Properly skip unitialized parent clock in power domain on
mm/zswap: provide unique zpool name
mm, cma: prevent nr_isolated_* counters from going negative
Minimal fix-up of bad hashing behavior of hash_64()
MD: make bio mergeable
tracing: Don't display trigger file for events that can't be enabled
mac80211: fix statistics leak if dev_alloc_name() fails
ath9k: ar5008_hw_cmn_spur_mitigate: add missing mask_m & mask_p initialisation
lpfc: fix misleading indentation
clk: qcom: msm8960: Fix ce3_src register offset
clk: versatile: sp810: support reentrance
clk: qcom: msm8960: fix ce3_core clk enable register
clk: meson: Fix meson_clk_register_clks() signature type mismatch
clk: rockchip: free memory in error cases when registering clock branches
soc: rockchip: power-domain: fix err handle while probing
clk-divider: make sure read-only dividers do not write to their register
CNS3xxx: Fix PCI cns3xxx_write_config()
mwifiex: fix corner case association failure
ata: ahci_xgene: dereferencing uninitialized pointer in probe
nbd: ratelimit error msgs after socket close
mfd: intel-lpss: Remove clock tree on error path
ipvs: drop first packet to redirect conntrack
ipvs: correct initial offset of Call-ID header search in SIP persistence engine
ipvs: handle ip_vs_fill_iph_skb_off failure
RDMA/iw_cxgb4: Fix bar2 virt addr calculation for T4 chips
Revert: "powerpc/tm: Check for already reclaimed tasks"
arm64: head.S: use memset to clear BSS
efi: stub: define DISABLE_BRANCH_PROFILING for all architectures
arm64: entry: remove pointless SPSR mode check
arm64: mm: move pgd_cache initialisation to pgtable_cache_init
arm64: module: avoid undefined shift behavior in reloc_data()
arm64: module: fix relocation of movz instruction with negative immediate
arm64: traps: address fallout from printk -> pr_* conversion
arm64: ftrace: fix a stack tracer's output under function graph tracer
arm64: pass a task parameter to unwind_frame()
arm64: ftrace: modify a stack frame in a safe way
arm64: remove irq_count and do_softirq_own_stack()
arm64: hugetlb: add support for PTE contiguous bit
arm64: Use PoU cache instr for I/D coherency
arm64: Defer dcache flush in __cpu_copy_user_page
arm64: reduce stack use in irq_handler
arm64: Documentation: add list of software workarounds for errata
arm64: mm: place __cpu_setup in .text
arm64: cmpxchg: Don't incldue linux/mmdebug.h
arm64: mm: fold alternatives into .init
arm64: Remove redundant padding from linker script
arm64: mm: remove pointless PAGE_MASKing
arm64: don't call C code with el0's fp register
arm64: when walking onto the task stack, check sp & fp are in current->stack
arm64: Add this_cpu_ptr() assembler macro for use in entry.S
arm64: irq: fix walking from irq stack to task stack
arm64: Add do_softirq_own_stack() and enable irq_stacks
arm64: Modify stack trace and dump for use with irq_stack
arm64: Store struct thread_info in sp_el0
arm64: Add trace_hardirqs_off annotation in ret_to_user
arm64: ftrace: fix the comments for ftrace_modify_code
arm64: ftrace: stop using kstop_machine to enable/disable tracing
arm64: spinlock: serialise spin_unlock_wait against concurrent lockers
arm64: enable HAVE_IRQ_TIME_ACCOUNTING
arm64: fix COMPAT_SHMLBA definition for large pages
arm64: add __init/__initdata section marker to some functions/variables
arm64: pgtable: implement pte_accessible()
arm64: mm: allow sections for unaligned bases
arm64: mm: detect bad __create_mapping uses
Linux 4.4.9
extcon: max77843: Use correct size for reading the interrupt register
stm class: Select CONFIG_SRCU
megaraid_sas: add missing curly braces in ioctl handler
sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race
thermal: rockchip: fix a impossible condition caused by the warning
unbreak allmodconfig KCONFIG_ALLCONFIG=...
jme: Fix device PM wakeup API usage
jme: Do not enable NIC WoL functions on S0
bus: imx-weim: Take the 'status' property value into account
ARM: dts: pxa: fix dma engine node to pxa3xx-nand
ARM: dts: armada-375: use armada-370-sata for SATA
ARM: EXYNOS: select THERMAL_OF
ARM: prima2: always enable reset controller
ARM: OMAP3: Add cpuidle parameters table for omap3430
ext4: fix races of writeback with punch hole and zero range
ext4: fix races between buffered IO and collapse / insert range
ext4: move unlocked dio protection from ext4_alloc_file_blocks()
ext4: fix races between page faults and hole punching
perf stat: Document --detailed option
perf tools: handle spaces in file names obtained from /proc/pid/maps
perf hists browser: Only offer symbol scripting when a symbol is under the cursor
mtd: nand: Drop mtd.owner requirement in nand_scan
mtd: brcmnand: Fix v7.1 register offsets
mtd: spi-nor: remove micron_quad_enable()
serial: sh-sci: Remove cpufreq notifier to fix crash/deadlock
ext4: fix NULL pointer dereference in ext4_mark_inode_dirty()
x86/mm/kmmio: Fix mmiotrace for hugepages
perf evlist: Reference count the cpu and thread maps at set_maps()
drivers/misc/ad525x_dpot: AD5274 fix RDAC read back errors
rtc: max77686: Properly handle regmap_irq_get_virq() error code
rtc: rx8025: remove rv8803 id
rtc: ds1685: passing bogus values to irq_restore
rtc: vr41xx: Wire up alarm_irq_enable
rtc: hym8563: fix invalid year calculation
PM / Domains: Fix removal of a subdomain
PM / OPP: Initialize u_volt_min/max to a valid value
misc: mic/scif: fix wrap around tests
misc/bmp085: Enable building as a module
lib/mpi: Endianness fix
fbdev: da8xx-fb: fix videomodes of lcd panels
scsi_dh: force modular build if SCSI is a module
paride: make 'verbose' parameter an 'int' again
regulator: s5m8767: fix get_register() error handling
irqchip/mxs: Fix error check of of_io_request_and_map()
irqchip/sunxi-nmi: Fix error check of of_io_request_and_map()
spi/rockchip: Make sure spi clk is on in rockchip_spi_set_cs
locking/mcs: Fix mcs_spin_lock() ordering
regulator: core: Fix nested locking of supplies
regulator: core: Ensure we lock all regulators
regulator: core: fix regulator_lock_supply regression
Revert "regulator: core: Fix nested locking of supplies"
videobuf2-v4l2: Verify planes array in buffer dequeueing
videobuf2-core: Check user space planes array in dqbuf
USB: usbip: fix potential out-of-bounds write
cgroup: make sure a parent css isn't freed before its children
mm/hwpoison: fix wrong num_poisoned_pages accounting
mm: vmscan: reclaim highmem zone if buffer_heads is over limit
numa: fix /proc/<pid>/numa_maps for THP
mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check
memcg: relocate charge moving from ->attach to ->post_attach
cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback
slub: clean up code for kmem cgroup support to kmem_cache_free_bulk
workqueue: fix ghost PENDING flag while doing MQ IO
x86/apic: Handle zero vector gracefully in clear_vector_irq()
efi: Expose non-blocking set_variable() wrapper to efivars
efi: Fix out-of-bounds read in variable_matches()
IB/security: Restrict use of the write() interface
IB/mlx5: Expose correct max_sge_rd limit
cxl: Keep IRQ mappings on context teardown
v4l2-dv-timings.h: fix polarity for 4k formats
vb2-memops: Fix over allocation of frame vectors
ASoC: rt5640: Correct the digital interface data select
ASoC: dapm: Make sure we have a card when displaying component widgets
ASoC: ssm4567: Reset device before regcache_sync()
ASoC: s3c24xx: use const snd_soc_component_driver pointer
EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback
toshiba_acpi: Fix regression caused by hotkey enabling value
i2c: exynos5: Fix possible ABBA deadlock by keeping I2C clock prepared
i2c: cpm: Fix build break due to incompatible pointer types
perf intel-pt: Fix segfault tracing transactions
drm/i915: Use fw_domains_put_with_fifo() on HSW
drm/i915: Fixup the free space logic in ring_prepare
drm/amdkfd: uninitialized variable in dbgdev_wave_control_set_registers()
drm/i915: skl_update_scaler() wants a rotation bitmask instead of bit number
drm/i915: Cleanup phys status page too
pwm: brcmstb: Fix check of devm_ioremap_resource() return code
drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()
drm/dp/mst: Restore primary hub guid on resume
drm/dp/mst: Validate port in drm_dp_payload_send_msg()
drm/nouveau/gr/gf100: select a stream master to fixup tfb offset queries
drm: Loongson-3 doesn't fully support wc memory
drm/radeon: fix vertical bars appear on monitor (v2)
drm/radeon: forbid mapping of userptr bo through radeon device file
drm/radeon: fix initial connector audio value
drm/radeon: add a quirk for a XFX R9 270X
drm/amdgpu: fix regression on CIK (v2)
amdgpu/uvd: add uvd fw version for amdgpu
drm/amdgpu: bump the afmt limit for CZ, ST, Polaris
drm/amdgpu: use defines for CRTCs and AMFT blocks
drm/amdgpu: when suspending, if uvd/vce was running. need to cancel delay work.
iommu/dma: Restore scatterlist offsets correctly
iommu/amd: Fix checking of pci dma aliases
pinctrl: single: Fix pcs_parse_bits_in_pinctrl_entry to use __ffs than ffs
pinctrl: mediatek: correct debounce time unit in mtk_gpio_set_debounce
xen kconfig: don't "select INPUT_XEN_KBDDEV_FRONTEND"
Input: pmic8xxx-pwrkey - fix algorithm for converting trigger delay
Input: gtco - fix crash on detecting device without endpoints
netlink: don't send NETLINK_URELEASE for unbound sockets
nl80211: check netlink protocol in socket release notification
powerpc: Update TM user feature bits in scan_features()
powerpc: Update cpu_user_features2 in scan_features()
powerpc: scan_features() updates incorrect bits for REAL_LE
crypto: talitos - fix AEAD tcrypt tests
crypto: talitos - fix crash in talitos_cra_init()
crypto: sha1-mb - use corrcet pointer while completing jobs
crypto: ccp - Prevent information leakage on export
iwlwifi: mvm: fix memory leak in paging
iwlwifi: pcie: lower the debug level for RSA semaphore access
s390/pci: add extra padding to function measurement block
cpufreq: intel_pstate: Fix processing for turbo activation ratio
Revert "drm/amdgpu: disable runtime pm on PX laptops without dGPU power control"
Revert "drm/radeon: disable runtime pm on PX laptops without dGPU power control"
drm/i915: Fix race condition in intel_dp_destroy_mst_connector()
drm/qxl: fix cursor position with non-zero hotspot
drm/nouveau/core: use vzalloc for allocating ramht
futex: Acknowledge a new waiter in counter before plist
futex: Handle unlock_pi race gracefully
asm-generic/futex: Re-enable preemption in futex_atomic_cmpxchg_inatomic()
ALSA: hda - Add dock support for ThinkPad X260
ALSA: pcxhr: Fix missing mutex unlock
ALSA: hda - add PCI ID for Intel Broxton-T
ALSA: hda - Keep powering up ADCs on Cirrus codecs
ALSA: hda/realtek - Add ALC3234 headset mode for Optiplex 9020m
ALSA: hda - Don't trust the reported actual power state
x86 EDAC, sb_edac.c: Repair damage introduced when "fixing" channel address
x86/mm/xen: Suppress hugetlbfs in PV guests
arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission
arm64: Honour !PTE_WRITE in set_pte_at() for kernel mappings
sched/cgroup: Fix/cleanup cgroup teardown/init
dmaengine: pxa_dma: fix the maximum requestor line
dmaengine: hsu: correct use of channel status register
dmaengine: dw: fix master selection
debugfs: Make automount point inodes permanently empty
lib: lz4: fixed zram with lz4 on big endian machines
dm cache metadata: fix cmd_read_lock() acquiring write lock
dm cache metadata: fix READ_LOCK macros and cleanup WRITE_LOCK macros
usb: gadget: f_fs: Fix use-after-free
usb: hcd: out of bounds access in for_each_companion
xhci: fix 10 second timeout on removal of PCI hotpluggable xhci controllers
usb: xhci: fix wild pointers in xhci_mem_cleanup
xhci: resume USB 3 roothub first
usb: xhci: applying XHCI_PME_STUCK_QUIRK to Intel BXT B0 host
assoc_array: don't call compare_object() on a node
ARM: OMAP2+: hwmod: Fix updating of sysconfig register
ARM: OMAP2: Fix up interconnect barrier initialization for DRA7
ARM: mvebu: Correct unit address for linksys
ARM: dts: AM43x-epos: Fix clk parent for synctimer
KVM: arm/arm64: Handle forward time correction gracefully
kvm: x86: do not leak guest xcr0 into host interrupt handlers
x86/mce: Avoid using object after free in genpool
block: loop: fix filesystem corruption in case of aio/dio
block: partition: initialize percpuref before sending out KOBJ_ADD
Conflicts:
arch/arm64/Kconfig
arch/arm64/include/asm/cputype.h
arch/arm64/include/asm/hardirq.h
arch/arm64/include/asm/irq.h
arch/arm64/include/asm/mmu_context.h
arch/arm64/kernel/cpu_errata.c
arch/arm64/kernel/cpuinfo.c
arch/arm64/kernel/setup.c
arch/arm64/kernel/smp.c
arch/arm64/kernel/stacktrace.c
arch/arm64/mm/init.c
arch/arm64/mm/mmu.c
arch/arm64/mm/pageattr.c
mm/memcontrol.c
CRs-Fixed: 1069136
Signed-off-by: Bryan Huntsman <bryanh@codeaurora.org>
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
Change-Id: Ie9a16debd0578331a66947376f3b787a7bb54d65
A discrepancy between cpu_online_mask and cpuset's effective_cpus
mask is inevitable during hotplug since cpuset defers updating of
effective_cpus mask using a workqueue, during which time nothing
prevents the system from more hotplug operations. For that reason
guarantee_online_cpus() walks up the cpuset hierarchy until it finds
an intersection under the assumption that top cpuset's effective_cpus
mask intersects with cpu_online_mask even with such a race occurring.
However a sequence of CPU hotplugs can open a time window, during which
none of the effective CPUs in the top cpuset intersect with
cpu_online_mask.
For example when there are 4 possible CPUs 0-3 and only CPU0 is online:
======================== ===========================
cpu_online_mask top_cpuset.effective_cpus
======================== ===========================
echo 1 > cpu2/online.
CPU hotplug notifier woke up hotplug work but not yet scheduled.
[0,2] [0]
echo 0 > cpu0/online.
The workqueue is still runnable.
[2] [0]
======================== ===========================
Now there is no intersection between cpu_online_mask and
top_cpuset.effective_cpus. Thus invoking sys_sched_setaffinity() at
this moment can cause following:
Unable to handle kernel NULL pointer dereference at virtual address 000000d0
------------[ cut here ]------------
Kernel BUG at ffffffc0001389b0 [verbose debug info unavailable]
Internal error: Oops - BUG: 96000005 [#1] PREEMPT SMP
Modules linked in:
CPU: 2 PID: 1420 Comm: taskset Tainted: G W 4.4.8+ #98
task: ffffffc06a5c4880 ti: ffffffc06e124000 task.ti: ffffffc06e124000
PC is at guarantee_online_cpus+0x2c/0x58
LR is at cpuset_cpus_allowed+0x4c/0x6c
<snip>
Process taskset (pid: 1420, stack limit = 0xffffffc06e124020)
Call trace:
[<ffffffc0001389b0>] guarantee_online_cpus+0x2c/0x58
[<ffffffc00013b208>] cpuset_cpus_allowed+0x4c/0x6c
[<ffffffc0000d61f0>] sched_setaffinity+0xc0/0x1ac
[<ffffffc0000d6374>] SyS_sched_setaffinity+0x98/0xac
[<ffffffc000085cb0>] el0_svc_naked+0x24/0x28
The top cpuset's effective_cpus are guaranteed to be identical to
cpu_online_mask eventually. Hence fall back to cpu_online_mask when
there is no intersection between top cpuset's effective_cpus and
cpu_online_mask.
CRs-fixed: 1058529
Change-Id: I83ee4619feff2ca7452119c9baecb6ffde755287
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: cgroups@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.17+
Signed-off-by: Tejun Heo <tj@kernel.org>
This reverts commit 9d6fd2c3e9 ("Merge remote-tracking branch
'msm-4.4/tmp-510d0a3f' into msm-4.4"), because it breaks the
dump parsing tools due to kernel can be loaded anywhere in the memory
now and not fixed at linear mapping.
Change-Id: Id416f0a249d803442847d09ac47781147b0d0ee6
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
This deliberately changes the behavior of the per-cpuset
cpus file to not be effected by hotplug. When a cpu is offlined,
it will be removed from the cpuset/cpus file. When a cpu is onlined,
if the cpuset originally requested that that cpu was part of the cpuset, that
cpu will be restored to the cpuset. The cpus files still
have to be hierachical, but the ranges no longer have to be out of
the currently online cpus, just the physically present cpus.
Change-Id: I3efbae24a1f6384be1e603fb56f0d3baef61d924
[ohaugan@codeaurora.org: Port to 4.4]
Git-commit: f180bcac788464a0baf3d79d76dd86d6972ea413
Git-repo: https://android.googlesource.com/kernel/common/msm.git
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
* msm-4.4/tmp-510d0a3f:
Linux 4.4.11
nf_conntrack: avoid kernel pointer value leak in slab name
drm/radeon: fix DP link training issue with second 4K monitor
drm/i915/bdw: Add missing delay during L3 SQC credit programming
drm/i915: Bail out of pipe config compute loop on LPT
drm/radeon: fix PLL sharing on DCE6.1 (v2)
Revert "[media] videobuf2-v4l2: Verify planes array in buffer dequeueing"
Input: max8997-haptic - fix NULL pointer dereference
get_rock_ridge_filename(): handle malformed NM entries
tools lib traceevent: Do not reassign parg after collapse_tree()
qla1280: Don't allocate 512kb of host tags
atomic_open(): fix the handling of create_error
regulator: axp20x: Fix axp22x ldo_io voltage ranges
regulator: s2mps11: Fix invalid selector mask and voltages for buck9
workqueue: fix rebind bound workers warning
ARM: dts: at91: sam9x5: Fix the memory range assigned to the PMC
vfs: rename: check backing inode being equal
vfs: add vfs_select_inode() helper
perf/core: Disable the event on a truncated AUX record
regmap: spmi: Fix regmap_spmi_ext_read in multi-byte case
pinctrl: at91-pio4: fix pull-up/down logic
spi: spi-ti-qspi: Handle truncated frames properly
spi: spi-ti-qspi: Fix FLEN and WLEN settings if bits_per_word is overridden
spi: pxa2xx: Do not detect number of enabled chip selects on Intel SPT
ALSA: hda - Fix broken reconfig
ALSA: hda - Fix white noise on Asus UX501VW headset
ALSA: hda - Fix subwoofer pin on ASUS N751 and N551
ALSA: usb-audio: Yet another Phoneix Audio device quirk
ALSA: usb-audio: Quirk for yet another Phoenix Audio devices (v2)
crypto: testmgr - Use kmalloc memory for RSA input
crypto: hash - Fix page length clamping in hash walk
crypto: qat - fix invalid pf2vf_resp_wq logic
s390/mm: fix asce_bits handling with dynamic pagetable levels
zsmalloc: fix zs_can_compact() integer overflow
ocfs2: fix posix_acl_create deadlock
ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang
net/route: enforce hoplimit max value
tcp: refresh skb timestamp at retransmit time
net: thunderx: avoid exposing kernel stack
net: fix a kernel infoleak in x25 module
uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h MIME-Version: 1.0
bridge: fix igmp / mld query parsing
net: bridge: fix old ioctl unlocked net device walk
VSOCK: do not disconnect socket when peer has shutdown SEND only
net/mlx4_en: Fix endianness bug in IPV6 csum calculation
net: fix infoleak in rtnetlink
net: fix infoleak in llc
net: fec: only clear a queue's work bit if the queue was emptied
netem: Segment GSO packets on enqueue
sch_dsmark: update backlog as well
sch_htb: update backlog as well
net_sched: update hierarchical backlog too
net_sched: introduce qdisc_replace() helper
gre: do not pull header in ICMP error processing
net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case
samples/bpf: fix trace_output example
bpf: fix check_map_func_compatibility logic
bpf: fix refcnt overflow
bpf: fix double-fdput in replace_map_fd_with_map_ptr()
net/mlx4_en: fix spurious timestamping callbacks
ipv4/fib: don't warn when primary address is missing if in_dev is dead
net/mlx5e: Fix minimum MTU
net/mlx5e: Device's mtu field is u16 and not int
openvswitch: use flow protocol when recalculating ipv6 checksums
atl2: Disable unimplemented scatter/gather feature
vlan: pull on __vlan_insert_tag error path and fix csum correction
net: use skb_postpush_rcsum instead of own implementations
cdc_mbim: apply "NDP to end" quirk to all Huawei devices
bpf/verifier: reject invalid LD_ABS | BPF_DW instruction
net: sched: do not requeue a NULL skb
packet: fix heap info leak in PACKET_DIAG_MCLIST sock_diag interface
route: do not cache fib route info on local routes with oif
decnet: Do not build routes to devices without decnet private data.
parisc: Use generic extable search and sort routines
arm64: kasan: Use actual memory node when populating the kernel image shadow
arm64: mm: treat memstart_addr as a signed quantity
arm64: lse: deal with clobbered IP registers after branch via PLT
arm64: mm: check at build time that PAGE_OFFSET divides the VA space evenly
arm64: kasan: Fix zero shadow mapping overriding kernel image shadow
arm64: consistently use p?d_set_huge
arm64: fix KASLR boot-time I-cache maintenance
arm64: hugetlb: partial revert of 66b3923a1a0f
arm64: make irq_stack_ptr more robust
arm64: efi: invoke EFI_RNG_PROTOCOL to supply KASLR randomness
efi: stub: use high allocation for converted command line
efi: stub: add implementation of efi_random_alloc()
efi: stub: implement efi_get_random_bytes() based on EFI_RNG_PROTOCOL
arm64: kaslr: randomize the linear region
arm64: add support for kernel ASLR
arm64: add support for building vmlinux as a relocatable PIE binary
arm64: switch to relative exception tables
extable: add support for relative extables to search and sort routines
scripts/sortextable: add support for ET_DYN binaries
arm64: futex.h: Add missing PAN toggling
arm64: make asm/elf.h available to asm files
arm64: avoid dynamic relocations in early boot code
arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
arm64: add support for module PLTs
arm64: move brk immediate argument definitions to separate header
arm64: mm: use bit ops rather than arithmetic in pa/va translations
arm64: mm: only perform memstart_addr sanity check if DEBUG_VM
arm64: User die() instead of panic() in do_page_fault()
arm64: allow kernel Image to be loaded anywhere in physical memory
arm64: defer __va translation of initrd_start and initrd_end
arm64: move kernel image to base of vmalloc area
arm64: kvm: deal with kernel symbols outside of linear mapping
arm64: decouple early fixmap init from linear mapping
arm64: pgtable: implement static [pte|pmd|pud]_offset variants
arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region
arm64: add support for ioremap() block mappings
arm64: prevent potential circular header dependencies in asm/bug.h
of/fdt: factor out assignment of initrd_start/initrd_end
of/fdt: make memblock minimum physical address arch configurable
arm64: Remove the get_thread_info() function
arm64: kernel: Don't toggle PAN on systems with UAO
arm64: cpufeature: Test 'matches' pointer to find the end of the list
arm64: kernel: Add support for User Access Override
arm64: add ARMv8.2 id_aa64mmfr2 boiler plate
arm64: cpufeature: Change read_cpuid() to use sysreg's mrs_s macro
arm64: use local label prefixes for __reg_num symbols
arm64: vdso: Mark vDSO code as read-only
arm64: ubsan: select ARCH_HAS_UBSAN_SANITIZE_ALL
arm64: ptdump: Indicate whether memory should be faulting
arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC
arm64: Drop alloc function from create_mapping
arm64: prefetch: add missing #include for spin_lock_prefetch
arm64: lib: patch in prfm for copy_page if requested
arm64: lib: improve copy_page to deal with 128 bytes at a time
arm64: prefetch: add alternative pattern for CPUs without a prefetcher
arm64: prefetch: don't provide spin_lock_prefetch with LSE
arm64: allow vmalloc regions to be set with set_memory_*
arm64: kernel: implement ACPI parking protocol
arm64: mm: create new fine-grained mappings at boot
arm64: ensure _stext and _etext are page-aligned
arm64: mm: allow passing a pgdir to alloc_init_*
arm64: mm: allocate pagetables anywhere
arm64: mm: use fixmap when creating page tables
arm64: mm: add functions to walk tables in fixmap
arm64: mm: add __{pud,pgd}_populate
arm64: mm: avoid redundant __pa(__va(x))
arm64: mm: add functions to walk page tables by PA
arm64: mm: move pte_* macros
arm64: kasan: avoid TLB conflicts
arm64: mm: add code to safely replace TTBR1_EL1
arm64: add function to install the idmap
arm64: unmap idmap earlier
arm64: unify idmap removal
arm64: mm: place empty_zero_page in bss
arm64: mm: specialise pagetable allocators
asm-generic: Fix local variable shadow in __set_fixmap_offset
Eliminate the .eh_frame sections from the aarch64 vmlinux and kernel modules
arm64: Fix an enum typo in mm/dump.c
arm64: kasan: ensure that the KASAN zero page is mapped read-only
arch/arm64/include/asm/pgtable.h: add pmd_mkclean for THP
arm64: hide __efistub_ aliases from kallsyms
Linux 4.4.10
drm/i915/skl: Fix DMC load on Skylake J0 and K0
lib/test-string_helpers.c: fix and improve string_get_size() tests
ACPI / processor: Request native thermal interrupt handling via _OSC
drm/i915: Fake HDMI live status
drm/i915: Make RPS EI/thresholds multiple of 25 on SNB-BDW
drm/i915: Fix eDP low vswing for Broadwell
drm/i915/ddi: Fix eDP VDD handling during booting and suspend/resume
drm/radeon: make sure vertical front porch is at least 1
iio: ak8975: fix maybe-uninitialized warning
iio: ak8975: Fix NULL pointer exception on early interrupt
drm/amdgpu: set metadata pointer to NULL after freeing.
drm/amdgpu: make sure vertical front porch is at least 1
gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading
nvmem: mxs-ocotp: fix buffer overflow in read
USB: serial: cp210x: add Straizona Focusers device ids
USB: serial: cp210x: add ID for Link ECU
ata: ahci-platform: Add ports-implemented DT bindings.
libahci: save port map for forced port map
powerpc: Fix bad inline asm constraint in create_zero_mask()
ACPICA: Dispatcher: Update thread ID for recursive method calls
x86/sysfb_efi: Fix valid BAR address range check
ARC: Add missing io barriers to io{read,write}{16,32}be()
ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value
propogate_mnt: Handle the first propogated copy being a slave
fs/pnode.c: treat zero mnt_group_id-s as unequal
x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO
MAINTAINERS: Remove asterisk from EFI directory names
writeback: Fix performance regression in wb_over_bg_thresh()
batman-adv: Reduce refcnt of removed router when updating route
batman-adv: Fix broadcast/ogm queue limit on a removed interface
batman-adv: Check skb size before using encapsulated ETH+VLAN header
batman-adv: fix DAT candidate selection (must use vid)
mm: update min_free_kbytes from khugepaged after core initialization
proc: prevent accessing /proc/<PID>/environ until it's ready
Input: zforce_ts - fix dual touch recognition
HID: Fix boot delay for Creative SB Omni Surround 5.1 with quirk
HID: wacom: Add support for DTK-1651
xen/evtchn: fix ring resize when binding new events
xen/balloon: Fix crash when ballooning on x86 32 bit PAE
xen: Fix page <-> pfn conversion on 32 bit systems
ARM: SoCFPGA: Fix secondary CPU startup in thumb2 kernel
ARM: EXYNOS: Properly skip unitialized parent clock in power domain on
mm/zswap: provide unique zpool name
mm, cma: prevent nr_isolated_* counters from going negative
Minimal fix-up of bad hashing behavior of hash_64()
MD: make bio mergeable
tracing: Don't display trigger file for events that can't be enabled
mac80211: fix statistics leak if dev_alloc_name() fails
ath9k: ar5008_hw_cmn_spur_mitigate: add missing mask_m & mask_p initialisation
lpfc: fix misleading indentation
clk: qcom: msm8960: Fix ce3_src register offset
clk: versatile: sp810: support reentrance
clk: qcom: msm8960: fix ce3_core clk enable register
clk: meson: Fix meson_clk_register_clks() signature type mismatch
clk: rockchip: free memory in error cases when registering clock branches
soc: rockchip: power-domain: fix err handle while probing
clk-divider: make sure read-only dividers do not write to their register
CNS3xxx: Fix PCI cns3xxx_write_config()
mwifiex: fix corner case association failure
ata: ahci_xgene: dereferencing uninitialized pointer in probe
nbd: ratelimit error msgs after socket close
mfd: intel-lpss: Remove clock tree on error path
ipvs: drop first packet to redirect conntrack
ipvs: correct initial offset of Call-ID header search in SIP persistence engine
ipvs: handle ip_vs_fill_iph_skb_off failure
RDMA/iw_cxgb4: Fix bar2 virt addr calculation for T4 chips
Revert: "powerpc/tm: Check for already reclaimed tasks"
arm64: head.S: use memset to clear BSS
efi: stub: define DISABLE_BRANCH_PROFILING for all architectures
arm64: entry: remove pointless SPSR mode check
arm64: mm: move pgd_cache initialisation to pgtable_cache_init
arm64: module: avoid undefined shift behavior in reloc_data()
arm64: module: fix relocation of movz instruction with negative immediate
arm64: traps: address fallout from printk -> pr_* conversion
arm64: ftrace: fix a stack tracer's output under function graph tracer
arm64: pass a task parameter to unwind_frame()
arm64: ftrace: modify a stack frame in a safe way
arm64: remove irq_count and do_softirq_own_stack()
arm64: hugetlb: add support for PTE contiguous bit
arm64: Use PoU cache instr for I/D coherency
arm64: Defer dcache flush in __cpu_copy_user_page
arm64: reduce stack use in irq_handler
arm64: Documentation: add list of software workarounds for errata
arm64: mm: place __cpu_setup in .text
arm64: cmpxchg: Don't incldue linux/mmdebug.h
arm64: mm: fold alternatives into .init
arm64: Remove redundant padding from linker script
arm64: mm: remove pointless PAGE_MASKing
arm64: don't call C code with el0's fp register
arm64: when walking onto the task stack, check sp & fp are in current->stack
arm64: Add this_cpu_ptr() assembler macro for use in entry.S
arm64: irq: fix walking from irq stack to task stack
arm64: Add do_softirq_own_stack() and enable irq_stacks
arm64: Modify stack trace and dump for use with irq_stack
arm64: Store struct thread_info in sp_el0
arm64: Add trace_hardirqs_off annotation in ret_to_user
arm64: ftrace: fix the comments for ftrace_modify_code
arm64: ftrace: stop using kstop_machine to enable/disable tracing
arm64: spinlock: serialise spin_unlock_wait against concurrent lockers
arm64: enable HAVE_IRQ_TIME_ACCOUNTING
arm64: fix COMPAT_SHMLBA definition for large pages
arm64: add __init/__initdata section marker to some functions/variables
arm64: pgtable: implement pte_accessible()
arm64: mm: allow sections for unaligned bases
arm64: mm: detect bad __create_mapping uses
Linux 4.4.9
extcon: max77843: Use correct size for reading the interrupt register
stm class: Select CONFIG_SRCU
megaraid_sas: add missing curly braces in ioctl handler
sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race
thermal: rockchip: fix a impossible condition caused by the warning
unbreak allmodconfig KCONFIG_ALLCONFIG=...
jme: Fix device PM wakeup API usage
jme: Do not enable NIC WoL functions on S0
bus: imx-weim: Take the 'status' property value into account
ARM: dts: pxa: fix dma engine node to pxa3xx-nand
ARM: dts: armada-375: use armada-370-sata for SATA
ARM: EXYNOS: select THERMAL_OF
ARM: prima2: always enable reset controller
ARM: OMAP3: Add cpuidle parameters table for omap3430
ext4: fix races of writeback with punch hole and zero range
ext4: fix races between buffered IO and collapse / insert range
ext4: move unlocked dio protection from ext4_alloc_file_blocks()
ext4: fix races between page faults and hole punching
perf stat: Document --detailed option
perf tools: handle spaces in file names obtained from /proc/pid/maps
perf hists browser: Only offer symbol scripting when a symbol is under the cursor
mtd: nand: Drop mtd.owner requirement in nand_scan
mtd: brcmnand: Fix v7.1 register offsets
mtd: spi-nor: remove micron_quad_enable()
serial: sh-sci: Remove cpufreq notifier to fix crash/deadlock
ext4: fix NULL pointer dereference in ext4_mark_inode_dirty()
x86/mm/kmmio: Fix mmiotrace for hugepages
perf evlist: Reference count the cpu and thread maps at set_maps()
drivers/misc/ad525x_dpot: AD5274 fix RDAC read back errors
rtc: max77686: Properly handle regmap_irq_get_virq() error code
rtc: rx8025: remove rv8803 id
rtc: ds1685: passing bogus values to irq_restore
rtc: vr41xx: Wire up alarm_irq_enable
rtc: hym8563: fix invalid year calculation
PM / Domains: Fix removal of a subdomain
PM / OPP: Initialize u_volt_min/max to a valid value
misc: mic/scif: fix wrap around tests
misc/bmp085: Enable building as a module
lib/mpi: Endianness fix
fbdev: da8xx-fb: fix videomodes of lcd panels
scsi_dh: force modular build if SCSI is a module
paride: make 'verbose' parameter an 'int' again
regulator: s5m8767: fix get_register() error handling
irqchip/mxs: Fix error check of of_io_request_and_map()
irqchip/sunxi-nmi: Fix error check of of_io_request_and_map()
spi/rockchip: Make sure spi clk is on in rockchip_spi_set_cs
locking/mcs: Fix mcs_spin_lock() ordering
regulator: core: Fix nested locking of supplies
regulator: core: Ensure we lock all regulators
regulator: core: fix regulator_lock_supply regression
Revert "regulator: core: Fix nested locking of supplies"
videobuf2-v4l2: Verify planes array in buffer dequeueing
videobuf2-core: Check user space planes array in dqbuf
USB: usbip: fix potential out-of-bounds write
cgroup: make sure a parent css isn't freed before its children
mm/hwpoison: fix wrong num_poisoned_pages accounting
mm: vmscan: reclaim highmem zone if buffer_heads is over limit
numa: fix /proc/<pid>/numa_maps for THP
mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check
memcg: relocate charge moving from ->attach to ->post_attach
cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback
slub: clean up code for kmem cgroup support to kmem_cache_free_bulk
workqueue: fix ghost PENDING flag while doing MQ IO
x86/apic: Handle zero vector gracefully in clear_vector_irq()
efi: Expose non-blocking set_variable() wrapper to efivars
efi: Fix out-of-bounds read in variable_matches()
IB/security: Restrict use of the write() interface
IB/mlx5: Expose correct max_sge_rd limit
cxl: Keep IRQ mappings on context teardown
v4l2-dv-timings.h: fix polarity for 4k formats
vb2-memops: Fix over allocation of frame vectors
ASoC: rt5640: Correct the digital interface data select
ASoC: dapm: Make sure we have a card when displaying component widgets
ASoC: ssm4567: Reset device before regcache_sync()
ASoC: s3c24xx: use const snd_soc_component_driver pointer
EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback
toshiba_acpi: Fix regression caused by hotkey enabling value
i2c: exynos5: Fix possible ABBA deadlock by keeping I2C clock prepared
i2c: cpm: Fix build break due to incompatible pointer types
perf intel-pt: Fix segfault tracing transactions
drm/i915: Use fw_domains_put_with_fifo() on HSW
drm/i915: Fixup the free space logic in ring_prepare
drm/amdkfd: uninitialized variable in dbgdev_wave_control_set_registers()
drm/i915: skl_update_scaler() wants a rotation bitmask instead of bit number
drm/i915: Cleanup phys status page too
pwm: brcmstb: Fix check of devm_ioremap_resource() return code
drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()
drm/dp/mst: Restore primary hub guid on resume
drm/dp/mst: Validate port in drm_dp_payload_send_msg()
drm/nouveau/gr/gf100: select a stream master to fixup tfb offset queries
drm: Loongson-3 doesn't fully support wc memory
drm/radeon: fix vertical bars appear on monitor (v2)
drm/radeon: forbid mapping of userptr bo through radeon device file
drm/radeon: fix initial connector audio value
drm/radeon: add a quirk for a XFX R9 270X
drm/amdgpu: fix regression on CIK (v2)
amdgpu/uvd: add uvd fw version for amdgpu
drm/amdgpu: bump the afmt limit for CZ, ST, Polaris
drm/amdgpu: use defines for CRTCs and AMFT blocks
drm/amdgpu: when suspending, if uvd/vce was running. need to cancel delay work.
iommu/dma: Restore scatterlist offsets correctly
iommu/amd: Fix checking of pci dma aliases
pinctrl: single: Fix pcs_parse_bits_in_pinctrl_entry to use __ffs than ffs
pinctrl: mediatek: correct debounce time unit in mtk_gpio_set_debounce
xen kconfig: don't "select INPUT_XEN_KBDDEV_FRONTEND"
Input: pmic8xxx-pwrkey - fix algorithm for converting trigger delay
Input: gtco - fix crash on detecting device without endpoints
netlink: don't send NETLINK_URELEASE for unbound sockets
nl80211: check netlink protocol in socket release notification
powerpc: Update TM user feature bits in scan_features()
powerpc: Update cpu_user_features2 in scan_features()
powerpc: scan_features() updates incorrect bits for REAL_LE
crypto: talitos - fix AEAD tcrypt tests
crypto: talitos - fix crash in talitos_cra_init()
crypto: sha1-mb - use corrcet pointer while completing jobs
crypto: ccp - Prevent information leakage on export
iwlwifi: mvm: fix memory leak in paging
iwlwifi: pcie: lower the debug level for RSA semaphore access
s390/pci: add extra padding to function measurement block
cpufreq: intel_pstate: Fix processing for turbo activation ratio
Revert "drm/amdgpu: disable runtime pm on PX laptops without dGPU power control"
Revert "drm/radeon: disable runtime pm on PX laptops without dGPU power control"
drm/i915: Fix race condition in intel_dp_destroy_mst_connector()
drm/qxl: fix cursor position with non-zero hotspot
drm/nouveau/core: use vzalloc for allocating ramht
futex: Acknowledge a new waiter in counter before plist
futex: Handle unlock_pi race gracefully
asm-generic/futex: Re-enable preemption in futex_atomic_cmpxchg_inatomic()
ALSA: hda - Add dock support for ThinkPad X260
ALSA: pcxhr: Fix missing mutex unlock
ALSA: hda - add PCI ID for Intel Broxton-T
ALSA: hda - Keep powering up ADCs on Cirrus codecs
ALSA: hda/realtek - Add ALC3234 headset mode for Optiplex 9020m
ALSA: hda - Don't trust the reported actual power state
x86 EDAC, sb_edac.c: Repair damage introduced when "fixing" channel address
x86/mm/xen: Suppress hugetlbfs in PV guests
arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission
arm64: Honour !PTE_WRITE in set_pte_at() for kernel mappings
sched/cgroup: Fix/cleanup cgroup teardown/init
dmaengine: pxa_dma: fix the maximum requestor line
dmaengine: hsu: correct use of channel status register
dmaengine: dw: fix master selection
debugfs: Make automount point inodes permanently empty
lib: lz4: fixed zram with lz4 on big endian machines
dm cache metadata: fix cmd_read_lock() acquiring write lock
dm cache metadata: fix READ_LOCK macros and cleanup WRITE_LOCK macros
usb: gadget: f_fs: Fix use-after-free
usb: hcd: out of bounds access in for_each_companion
xhci: fix 10 second timeout on removal of PCI hotpluggable xhci controllers
usb: xhci: fix wild pointers in xhci_mem_cleanup
xhci: resume USB 3 roothub first
usb: xhci: applying XHCI_PME_STUCK_QUIRK to Intel BXT B0 host
assoc_array: don't call compare_object() on a node
ARM: OMAP2+: hwmod: Fix updating of sysconfig register
ARM: OMAP2: Fix up interconnect barrier initialization for DRA7
ARM: mvebu: Correct unit address for linksys
ARM: dts: AM43x-epos: Fix clk parent for synctimer
KVM: arm/arm64: Handle forward time correction gracefully
kvm: x86: do not leak guest xcr0 into host interrupt handlers
x86/mce: Avoid using object after free in genpool
block: loop: fix filesystem corruption in case of aio/dio
block: partition: initialize percpuref before sending out KOBJ_ADD
Conflicts:
arch/arm64/Kconfig
arch/arm64/include/asm/cputype.h
arch/arm64/include/asm/hardirq.h
arch/arm64/include/asm/irq.h
arch/arm64/kernel/cpu_errata.c
arch/arm64/kernel/cpuinfo.c
arch/arm64/kernel/setup.c
arch/arm64/kernel/smp.c
arch/arm64/kernel/stacktrace.c
arch/arm64/mm/init.c
arch/arm64/mm/mmu.c
arch/arm64/mm/pageattr.c
mm/memcontrol.c
CRs-Fixed: 1054234
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
Change-Id: I2a7a34631ffee36ce18b9171f16d023be777392f
This patch provides a allow_attach hook for cpusets,
which resolves lots of the following logcat noise.
W SchedPolicy: add_tid_to_cgroup failed to write '2816' (Permission denied); fd=29
W ActivityManager: Failed setting process group of 2816 to 0
W System.err: java.lang.IllegalArgumentException
W System.err: at android.os.Process.setProcessGroup(Native Method)
W System.err: at com.android.server.am.ActivityManagerService.applyOomAdjLocked(ActivityManagerService.java:18763)
W System.err: at com.android.server.am.ActivityManagerService.updateOomAdjLocked(ActivityManagerService.java:19028)
W System.err: at com.android.server.am.ActivityManagerService.updateOomAdjLocked(ActivityManagerService.java:19106)
W System.err: at com.android.server.am.ActiveServices.serviceDoneExecutingLocked(ActiveServices.java:2015)
W System.err: at com.android.server.am.ActiveServices.publishServiceLocked(ActiveServices.java:905)
W System.err: at com.android.server.am.ActivityManagerService.publishService(ActivityManagerService.java:16065)
W System.err: at android.app.ActivityManagerNative.onTransact(ActivityManagerNative.java:1007)
W System.err: at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:2493)
W System.err: at android.os.Binder.execTransact(Binder.java:453)
Change-Id: Ic1b61b2bbb7ce74c9e9422b5e22ee9078251de21
[Ported to 4.4, added commit message]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Git-commit: dd802a9c97
Git-repo: https://android.googlesource.com/kernel/common/
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
commit 5cf1cacb49aee39c3e02ae87068fc3c6430659b0 upstream.
Since e93ad19d0564 ("cpuset: make mm migration asynchronous"), cpuset
kicks off asynchronous NUMA node migration if necessary during task
migration and flushes it from cpuset_post_attach_flush() which is
called at the end of __cgroup_procs_write(). This is to avoid
performing migration with cgroup_threadgroup_rwsem write-locked which
can lead to deadlock through dependency on kworker creation.
memcg has a similar issue with charge moving, so let's convert it to
an official callback rather than the current one-off cpuset specific
function. This patch adds cgroup_subsys->post_attach callback and
makes cpuset register cpuset_post_attach_flush() as its ->post_attach.
The conversion is mostly one-to-one except that the new callback is
called under cgroup_mutex. This is to guarantee that no other
migration operations are started before ->post_attach callbacks are
finished. cgroup_mutex is one of the outermost mutex in the system
and has never been and shouldn't be a problem. We can add specialized
synchronization around __cgroup_procs_write() but I don't think
there's any noticeable benefit.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e93ad19d05648397ef3bcb838d26aec06c245dc0 upstream.
If "cpuset.memory_migrate" is set, when a process is moved from one
cpuset to another with a different memory node mask, pages in used by
the process are migrated to the new set of nodes. This was performed
synchronously in the ->attach() callback, which is synchronized
against process management. Recently, the synchronization was changed
from per-process rwsem to global percpu rwsem for simplicity and
optimization.
Combined with the synchronous mm migration, this led to deadlocks
because mm migration could schedule a work item which may in turn try
to create a new worker blocking on the process management lock held
from cgroup process migration path.
This heavy an operation shouldn't be performed synchronously from that
deep inside cgroup migration in the first place. This patch punts the
actual migration to an ordered workqueue and updates cgroup process
migration and cpuset config update paths to flush the workqueue after
all locks are released. This way, the operations still seem
synchronous to userland without entangling mm migration with process
management synchronization. CPU hotplug can also invoke mm migration
but there's no reason for it to wait for mm migrations and thus
doesn't synchronize against their completions.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Consider the following v2 hierarchy.
P0 (+memory) --- P1 (-memory) --- A
\- B
P0 has memory enabled in its subtree_control while P1 doesn't. If
both A and B contain processes, they would belong to the memory css of
P1. Now if memory is enabled on P1's subtree_control, memory csses
should be created on both A and B and A's processes should be moved to
the former and B's processes the latter. IOW, enabling controllers
can cause atomic migrations into different csses.
The core cgroup migration logic has been updated accordingly but the
controller migration methods haven't and still assume that all tasks
migrate to a single target css; furthermore, the methods were fed the
css in which subtree_control was updated which is the parent of the
target csses. pids controller depends on the migration methods to
move charges and this made the controller attribute charges to the
wrong csses often triggering the following warning by driving a
counter negative.
WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 pids_cancel.constprop.6+0x31/0x40()
Modules linked in:
CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #29
...
ffffffff81f65382 ffff88007c043b90 ffffffff81551ffc 0000000000000000
ffff88007c043bc8 ffffffff810de202 ffff88007a752000 ffff88007a29ab00
ffff88007c043c80 ffff88007a1d8400 0000000000000001 ffff88007c043bd8
Call Trace:
[<ffffffff81551ffc>] dump_stack+0x4e/0x82
[<ffffffff810de202>] warn_slowpath_common+0x82/0xc0
[<ffffffff810de2fa>] warn_slowpath_null+0x1a/0x20
[<ffffffff8118e031>] pids_cancel.constprop.6+0x31/0x40
[<ffffffff8118e0fd>] pids_can_attach+0x6d/0xf0
[<ffffffff81188a4c>] cgroup_taskset_migrate+0x6c/0x330
[<ffffffff81188e05>] cgroup_migrate+0xf5/0x190
[<ffffffff81189016>] cgroup_attach_task+0x176/0x200
[<ffffffff8118949d>] __cgroup_procs_write+0x2ad/0x460
[<ffffffff81189684>] cgroup_procs_write+0x14/0x20
[<ffffffff811854e5>] cgroup_file_write+0x35/0x1c0
[<ffffffff812e26f1>] kernfs_fop_write+0x141/0x190
[<ffffffff81265f88>] __vfs_write+0x28/0xe0
[<ffffffff812666fc>] vfs_write+0xac/0x1a0
[<ffffffff81267019>] SyS_write+0x49/0xb0
[<ffffffff81bcef32>] entry_SYSCALL_64_fastpath+0x12/0x76
This patch fixes the bug by removing @css parameter from the three
migration methods, ->can_attach, ->cancel_attach() and ->attach() and
updating cgroup_taskset iteration helpers also return the destination
css in addition to the task being migrated. All controllers are
updated accordingly.
* Controllers which don't care whether there are one or multiple
target csses can be converted trivially. cpu, io, freezer, perf,
netclassid and netprio fall in this category.
* cpuset's current implementation assumes that there's single source
and destination and thus doesn't support v2 hierarchy already. The
only change made by this patchset is how that single destination css
is obtained.
* memory migration path already doesn't do anything on v2. How the
single destination css is obtained is updated and the prep stage of
mem_cgroup_can_attach() is reordered to accomodate the change.
* pids is the only controller which was affected by this bug. It now
correctly handles multi-destination migrations and no longer causes
counter underflow from incorrect accounting.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Merge patch-bomb from Andrew Morton:
- inotify tweaks
- some ocfs2 updates (many more are awaiting review)
- various misc bits
- kernel/watchdog.c updates
- Some of mm. I have a huge number of MM patches this time and quite a
lot of it is quite difficult and much will be held over to next time.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits)
selftests: vm: add tests for lock on fault
mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage
mm: introduce VM_LOCKONFAULT
mm: mlock: add new mlock system call
mm: mlock: refactor mlock, munlock, and munlockall code
kasan: always taint kernel on report
mm, slub, kasan: enable user tracking by default with KASAN=y
kasan: use IS_ALIGNED in memory_is_poisoned_8()
kasan: Fix a type conversion error
lib: test_kasan: add some testcases
kasan: update reference to kasan prototype repo
kasan: move KASAN_SANITIZE in arch/x86/boot/Makefile
kasan: various fixes in documentation
kasan: update log messages
kasan: accurately determine the type of the bad access
kasan: update reported bug types for kernel memory accesses
kasan: update reported bug types for not user nor kernel memory accesses
mm/kasan: prevent deadlock in kasan reporting
mm/kasan: don't use kasan shadow pointer in generic functions
mm/kasan: MODULE_VADDR is not available on all archs
...
The oom killer takes task_lock() in a couple of places solely to protect
printing the task's comm.
A process's comm, including current's comm, may change due to
/proc/pid/comm or PR_SET_NAME.
The comm will always be NULL-terminated, so the worst race scenario would
only be during update. We can tolerate a comm being printed that is in
the middle of an update to avoid taking the lock.
Other locations in the kernel have already dropped task_lock() when
printing comm, so this is consistent.
Signed-off-by: David Rientjes <rientjes@google.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, cgroup_has_tasks() tests whether the target cgroup has any
css_set linked to it. This works because a css_set's refcnt converges
with the number of tasks linked to it and thus there's no css_set
linked to a cgroup if it doesn't have any live tasks.
To help tracking resource usage of zombie tasks, putting the ref of
css_set will be separated from disassociating the task from the
css_set which means that a cgroup may have css_sets linked to it even
when it doesn't have any live tasks.
This patch replaces cgroup_has_tasks() with cgroup_is_populated()
which tests cgroup->nr_populated instead which locally counts the
number of populated css_sets. Unlike cgroup_has_tasks(),
cgroup_is_populated() is recursive - if any of the descendants is
populated, the cgroup is populated too. While this changes the
meaning of the test, all the existing users are okay with the change.
While at it, replace the open-coded ->populated_cnt test in
cgroup_events_show() with cgroup_is_populated().
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
It wasn't explicitly documented but, when a process is being migrated,
cpuset and memcg depend on cgroup_taskset_first() returning the
threadgroup leader; however, this approach is somewhat ghetto and
would no longer work for the planned multi-process migration.
This patch introduces explicit cgroup_taskset_for_each_leader() which
iterates over only the threadgroup leaders and replaces
cgroup_taskset_first() usages for accessing the leader with it.
This prepares both memcg and cpuset for multi-process migration. This
patch also updates the documentation for cgroup_taskset_for_each() to
clarify the iteration rules and removes comments mentioning task
ordering in tasksets.
v2: A previous patch which added threadgroup leader test was dropped.
Patch updated accordingly.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Zefan Li <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
If memory_migrate flag is set, cpuset migrates memory according to the
destnation css's nodemask. The current implementation migrates memory
whenever any thread of a process is migrated making the behavior
somewhat arbitrary. Let's tie memory operations to the threadgroup
leader so that memory is migrated only when the leader is migrated.
While this is a behavior change, given the inherent fuziness, this
change is not too likely to be noticed and allows us to clearly define
who owns the memory (always the leader) and helps the planned atomic
multi-process migration.
Note that we're currently migrating memory in migration path proper
while holding all the locks. In the long term, this should be moved
out to an async work item.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Zefan Li <lizefan@huawei.com>
cftype->mode allows controllers to give arbitrary permissions to
interface knobs. Except for "cgroup.event_control", the existing uses
are spurious.
* Some explicitly specify S_IRUGO | S_IWUSR even though that's the
default.
* "cpuset.memory_pressure" specifies S_IRUGO while also setting a
write callback which returns -EACCES. All it needs to do is simply
not setting a write callback.
"cgroup.event_control" uses cftype->mode to make the file
world-writable. It's a misdesigned interface and we don't want
controllers to be tweaking interface file permissions in general.
This patch removes cftype->mode and all its spurious uses and
implements CFTYPE_WORLD_WRITABLE for "cgroup.event_control" which is
marked as compatibility-only.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
cgroup_on_dfl() tests whether the cgroup's root is the default
hierarchy; however, an individual controller is only interested in
whether the controller is attached to the default hierarchy and never
tests a cgroup which doesn't belong to the hierarchy that the
controller is attached to.
This patch replaces cgroup_on_dfl() tests in controllers with faster
static_key based cgroup_subsys_on_dfl(). This leaves cgroup core as
the only user of cgroup_on_dfl() and the function is moved from the
header file to cgroup.c.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Zefan Li <lizefan@huawei.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
The comment says it's using trialcs->mems_allowed as a temp variable but
it didn't match the code. Change the code to match the comment.
This fixes an issue when writing in cpuset.mems when a sub-directory
exists: we need to write several times for the information to persist:
| root@alban:/sys/fs/cgroup/cpuset# mkdir footest9
| root@alban:/sys/fs/cgroup/cpuset# cd footest9
| root@alban:/sys/fs/cgroup/cpuset/footest9# mkdir aa
| root@alban:/sys/fs/cgroup/cpuset/footest9# cat cpuset.mems
|
| root@alban:/sys/fs/cgroup/cpuset/footest9# echo 0 > cpuset.mems
| root@alban:/sys/fs/cgroup/cpuset/footest9# cat cpuset.mems
|
| root@alban:/sys/fs/cgroup/cpuset/footest9# echo 0 > cpuset.mems
| root@alban:/sys/fs/cgroup/cpuset/footest9# cat cpuset.mems
| 0
| root@alban:/sys/fs/cgroup/cpuset/footest9# cat aa/cpuset.mems
|
| root@alban:/sys/fs/cgroup/cpuset/footest9# echo 0 > aa/cpuset.mems
| root@alban:/sys/fs/cgroup/cpuset/footest9# cat aa/cpuset.mems
| 0
| root@alban:/sys/fs/cgroup/cpuset/footest9#
This should help to fix the following issue in Docker:
https://github.com/opencontainers/runc/issues/133
In some conditions, a Docker container needs to be started twice in
order to work.
Signed-off-by: Alban Crequy <alban@endocode.com>
Tested-by: Iago López Galeiras <iago@endocode.com>
Cc: <stable@vger.kernel.org> # 3.17+
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Nothing calls __cpuset_node_allowed() with __GFP_THISNODE set anymore, so
remove the obscure comment about it and its special-case exception.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Pravin Shelar <pshelar@nicira.com>
Cc: Jarno Rajahalme <jrajahalme@nicira.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ensure that cpus specified with the isolcpus= boot commandline
option stay outside of the load balancing in the kernel scheduler.
Operations like load balancing can introduce unwanted latencies,
which is exactly what the isolcpus= commandline is there to prevent.
Previously, simply creating a new cpuset, without even touching the
cpuset.cpus field inside the new cpuset, would undo the effects of
isolcpus=, by creating a scheduler domain spanning the whole system,
and setting up load balancing inside that domain. The cpuset root
cpuset.cpus file is read-only, so there was not even a way to undo
that effect.
This does not impact the majority of cpusets users, since isolcpus=
is a fairly specialized feature used for realtime purposes.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <williams@redhat.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: cgroups@vger.kernel.org
Signed-off-by: Rik van Riel <riel@redhat.com>
Tested-by: David Rientjes <rientjes@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The cpuset.sched_relax_domain_level can control how far we do
immediate load balancing on a system. However, it was found on recent
kernels that echo'ing a value into cpuset.sched_relax_domain_level
did not reduce any immediate load balancing.
The reason this occurred was because the update_domain_attr_tree() traversal
did not update for the "top_cpuset". This resulted in nothing being changed
when modifying the sched_relax_domain_level parameter.
This patch is able to address that problem by having update_domain_attr_tree()
allow updates for the root in the cpuset traversal.
Fixes: fc560a26ac ("cpuset: replace cpuset->stack_list with cpuset_for_each_descendant_pre()")
Cc: <stable@vger.kernel.org> # 3.9+
Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Serge Hallyn <serge.hallyn@canonical.com>
If clone_children is enabled, effective masks won't be initialized
due to the bug:
# mount -t cgroup -o cpuset xxx /mnt
# echo 1 > cgroup.clone_children
# mkdir /mnt/tmp
# cat /mnt/tmp/
# cat cpuset.effective_cpus
# cat cpuset.cpus
0-15
And then this cpuset won't constrain the tasks in it.
Either the bug or the fix has no effect on unified hierarchy, as
there's no clone_chidren flag there any more.
Reported-by: Christian Brauner <christianvanbrauner@gmail.com>
Reported-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: <stable@vger.kernel.org> # 3.17+
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Serge Hallyn <serge.hallyn@canonical.com>
printk and friends can now format bitmaps using '%*pb[l]'. cpumask
and nodemask also provide cpumask_pr_args() and nodemask_pr_args()
respectively which can be used to generate the two printf arguments
necessary to format the specified cpu/nodemask.
* kernel/cpuset.c::cpuset_print_task_mems_allowed() used a static
buffer which is protected by a dedicated spinlock. Removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The only caller of cpuset_init_current_mems_allowed is the __init
annotated build_all_zonelists_init, so we can also make the former __init.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vishnu Pratap Singh <vishnu.ps@samsung.com>
Cc: Pintu Kumar <pintu.k@samsung.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull cgroup update from Tejun Heo:
"cpuset got simplified a bit. cgroup core got a fix on unified
hierarchy and grew some effective css related interfaces which will be
used for blkio support for writeback IO traffic which is currently
being worked on"
* 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: implement cgroup_get_e_css()
cgroup: add cgroup_subsys->css_e_css_changed()
cgroup: add cgroup_subsys->css_released()
cgroup: fix the async css offline wait logic in cgroup_subtree_control_write()
cgroup: restructure child_subsys_mask handling in cgroup_subtree_control_write()
cgroup: separate out cgroup_calc_child_subsys_mask() from cgroup_refresh_child_subsys_mask()
cpuset: lock vs unlock typo
cpuset: simplify cpuset_node_allowed API
cpuset: convert callback_mutex to a spinlock
How we deal with updates to exclusive cpusets is currently broken.
As an example, suppose we have an exclusive cpuset composed of
two cpus: A[cpu0,cpu1]. We can assign SCHED_DEADLINE task to it
up to the allowed bandwidth. If we want now to modify cpusetA's
cpumask, we have to check that removing a cpu's amount of
bandwidth doesn't break AC guarantees. This thing isn't checked
in the current code.
This patch fixes the problem above, denying an update if the
new cpumask won't have enough bandwidth for SCHED_DEADLINE tasks
that are currently active.
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: cgroups@vger.kernel.org
Link: http://lkml.kernel.org/r/5433E6AF.5080105@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Exclusive cpusets are the only way users can restrict SCHED_DEADLINE tasks
affinity (performing what is commonly called clustered scheduling).
Unfortunately, such thing is currently broken for two reasons:
- No check is performed when the user tries to attach a task to
an exlusive cpuset (recall that exclusive cpusets have an
associated maximum allowed bandwidth).
- Bandwidths of source and destination cpusets are not correctly
updated after a task is migrated between them.
This patch fixes both things at once, as they are opposite faces
of the same coin.
The check is performed in cpuset_can_attach(), as there aren't any
points of failure after that function. The updated is split in two
halves. We first reserve bandwidth in the destination cpuset, after
we pass the check in cpuset_can_attach(). And we then release
bandwidth from the source cpuset when the task's affinity is
actually changed. Even if there can be time windows when sched_setattr()
may erroneously fail in the source cpuset, we are fine with it, as
we can't perfom an atomic update of both cpusets at once.
Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Reported-by: Vincent Legout <vincent@legout.info>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dario Faggioli <raistlin@linux.it>
Cc: Michael Trimarchi <michael@amarulasolutions.com>
Cc: Fabio Checconi <fchecconi@gmail.com>
Cc: michael@amarulasolutions.com
Cc: luca.abeni@unitn.it
Cc: Li Zefan <lizefan@huawei.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: cgroups@vger.kernel.org
Link: http://lkml.kernel.org/r/1411118561-26323-3-git-send-email-juri.lelli@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This will deadlock instead of unlocking.
Fixes: f73eae8d8384 ('cpuset: simplify cpuset_node_allowed API')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Current cpuset API for checking if a zone/node is allowed to allocate
from looks rather awkward. We have hardwall and softwall versions of
cpuset_node_allowed with the softwall version doing literally the same
as the hardwall version if __GFP_HARDWALL is passed to it in gfp flags.
If it isn't, the softwall version may check the given node against the
enclosing hardwall cpuset, which it needs to take the callback lock to
do.
Such a distinction was introduced by commit 02a0e53d82 ("cpuset:
rework cpuset_zone_allowed api"). Before, we had the only version with
the __GFP_HARDWALL flag determining its behavior. The purpose of the
commit was to avoid sleep-in-atomic bugs when someone would mistakenly
call the function without the __GFP_HARDWALL flag for an atomic
allocation. The suffixes introduced were intended to make the callers
think before using the function.
However, since the callback lock was converted from mutex to spinlock by
the previous patch, the softwall check function cannot sleep, and these
precautions are no longer necessary.
So let's simplify the API back to the single check.
Suggested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The callback_mutex is only used to synchronize reads/updates of cpusets'
flags and cpu/node masks. These operations should always proceed fast so
there's no reason why we can't use a spinlock instead of the mutex.
Converting the callback_mutex into a spinlock will let us call
cpuset_zone_allowed_softwall from atomic context. This, in turn, makes
it possible to simplify the code by merging the hardwall and asoftwall
checks into the same function, which is the business of the next patch.
Suggested-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull cgroup updates from Tejun Heo:
"Nothing too interesting. Just a handful of cleanup patches"
* 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
Revert "cgroup: remove redundant variable in cgroup_mount()"
cgroup: remove redundant variable in cgroup_mount()
cgroup: fix missing unlock in cgroup_release_agent()
cgroup: remove CGRP_RELEASABLE flag
perf/cgroup: Remove perf_put_cgroup()
cgroup: remove redundant check in cgroup_ino()
cpuset: simplify proc_cpuset_show()
cgroup: simplify proc_cgroup_show()
cgroup: use a per-cgroup work for release agent
cgroup: remove bogus comments
cgroup: remove redundant code in cgroup_rmdir()
cgroup: remove some useless forward declarations
cgroup: fix a typo in comment.
When we change cpuset.memory_spread_{page,slab}, cpuset will flip
PF_SPREAD_{PAGE,SLAB} bit of tsk->flags for each task in that cpuset.
This should be done using atomic bitops, but currently we don't,
which is broken.
Tetsuo reported a hard-to-reproduce kernel crash on RHEL6, which happened
when one thread tried to clear PF_USED_MATH while at the same time another
thread tried to flip PF_SPREAD_PAGE/PF_SPREAD_SLAB. They both operate on
the same task.
Here's the full report:
https://lkml.org/lkml/2014/9/19/230
To fix this, we make PF_SPREAD_PAGE and PF_SPREAD_SLAB atomic flags.
v4:
- updated mm/slab.c. (Fengguang Wu)
- updated Documentation.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Kees Cook <keescook@chromium.org>
Fixes: 950592f7b9 ("cpusets: update tasks' page/slab spread flags in time")
Cc: <stable@vger.kernel.org> # 2.6.31+
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Use the ONE macro instead of REG, and we can simplify proc_cpuset_show().
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull cgroup changes from Tejun Heo:
"Mostly changes to get the v2 interface ready. The core features are
mostly ready now and I think it's reasonable to expect to drop the
devel mask in one or two devel cycles at least for a subset of
controllers.
- cgroup added a controller dependency mechanism so that block cgroup
can depend on memory cgroup. This will be used to finally support
IO provisioning on the writeback traffic, which is currently being
implemented.
- The v2 interface now uses a separate table so that the interface
files for the new interface are explicitly declared in one place.
Each controller will explicitly review and add the files for the
new interface.
- cpuset is getting ready for the hierarchical behavior which is in
the similar style with other controllers so that an ancestor's
configuration change doesn't change the descendants' configurations
irreversibly and processes aren't silently migrated when a CPU or
node goes down.
All the changes are to the new interface and no behavior changed for
the multiple hierarchies"
* 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (29 commits)
cpuset: fix the WARN_ON() in update_nodemasks_hier()
cgroup: initialize cgrp_dfl_root_inhibit_ss_mask from !->dfl_files test
cgroup: make CFTYPE_ONLY_ON_DFL and CFTYPE_NO_ internal to cgroup core
cgroup: distinguish the default and legacy hierarchies when handling cftypes
cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes()
cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes
cgroup: split cgroup_base_files[] into cgroup_{dfl|legacy}_base_files[]
cpuset: export effective masks to userspace
cpuset: allow writing offlined masks to cpuset.cpus/mems
cpuset: enable onlined cpu/node in effective masks
cpuset: refactor cpuset_hotplug_update_tasks()
cpuset: make cs->{cpus, mems}_allowed as user-configured masks
cpuset: apply cs->effective_{cpus,mems}
cpuset: initialize top_cpuset's configured masks at mount
cpuset: use effective cpumask to build sched domains
cpuset: inherit ancestor's masks if effective_{cpus, mems} becomes empty
cpuset: update cs->effective_{cpus, mems} when config changes
cpuset: update cpuset->effective_{cpus,mems} at hotplug
cpuset: add cs->effective_cpus and cs->effective_mems
cgroup: clean up sane_behavior handling
...
The WARN_ON() is used to check if we break the legal hierarchy, on
which the effective mems should be equal to configured mems.
Reported-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Tested-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Currently, cgroup_subsys->base_cftypes is used for both the unified
default hierarchy and legacy ones and subsystems can mark each file
with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to appear
only on one of them. This is quite hairy and error-prone. Also, we
may end up exposing interface files to the default hierarchy without
thinking it through.
cgroup_subsys will grow two separate cftype arrays and apply each only
on the hierarchies of the matching type. This will allow organizing
cftypes in a lot clearer way and encourage subsystems to scrutinize
the interface which is being exposed in the new default hierarchy.
In preparation, this patch renames cgroup_subsys->base_cftypes to
cgroup_subsys->legacy_cftypes. This patch is pure rename.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
cpuset.cpus and cpuset.mems are the configured masks, and we need
to export effective masks to userspace, so users know the real
cpus_allowed and mems_allowed that apply to the tasks in a cpuset.
v2:
- export those masks unconditionally, suggested by Tejun.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
As the configured masks won't be limited by its parent, and the top
cpuset's masks won't change when hotplug happens, it's natural to
allow writing offlined masks to the configured masks.
If on default hierarchy:
# echo 0 > /sys/devices/system/cpu/cpu1/online
# mkdir /cpuset/sub
# echo 1 > /cpuset/sub/cpuset.cpus
# cat /cpuset/sub/cpuset.cpus
1
If on legacy hierarchy:
# echo 0 > /sys/devices/system/cpu/cpu1/online
# mkdir /cpuset/sub
# echo 1 > /cpuset/sub/cpuset.cpus
-bash: echo: write error: Invalid argument
Note the checks don't need to be gated by cgroup_on_dfl, because we've
initialized top_cpuset.{cpus,mems}_allowed accordingly in cpuset_bind().
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Firstly offline cpu1:
# echo 0-1 > cpuset.cpus
# echo 0 > /sys/devices/system/cpu/cpu1/online
# cat cpuset.cpus
0-1
# cat cpuset.effective_cpus
0
Then online it:
# echo 1 > /sys/devices/system/cpu/cpu1/online
# cat cpuset.cpus
0-1
# cat cpuset.effective_cpus
0-1
And cpuset will bring it back to the effective mask.
The implementation is quite straightforward. Instead of calculating the
offlined cpus/mems and do updates, we just set the new effective_mask
to online_mask & congifured_mask.
This is a behavior change for default hierarchy, so legacy hierarchy
won't be affected.
v2:
- make refactoring of cpuset_hotplug_update_tasks() as seperate patch,
suggested by Tejun.
- make hotplug_update_tasks_insane() use @new_cpus and @new_mems as
hotplug_update_tasks_sane() does.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
We mix the handling for both default hierarchy and legacy hierarchy in
the same function, and it's quite messy, so split into two functions.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Now we've used effective cpumasks to enforce hierarchical manner,
we can use cs->{cpus,mems}_allowed as configured masks.
Configured masks can be changed by writing cpuset.cpus and cpuset.mems
only. The new behaviors are:
- They won't be changed by hotplug anymore.
- They won't be limited by its parent's masks.
This ia a behavior change, but won't take effect unless mount with
sane_behavior.
v2:
- Add comments to explain the differences between configured masks and
effective masks.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Now we can use cs->effective_{cpus,mems} as effective masks. It's
used whenever:
- we update tasks' cpus_allowed/mems_allowed,
- we want to retrieve tasks_cs(tsk)'s cpus_allowed/mems_allowed.
They actually replace effective_{cpu,node}mask_cpuset().
effective_mask == configured_mask & parent effective_mask except when
the reault is empty, in which case it inherits parent effective_mask.
The result equals the mask computed from effective_{cpu,node}mask_cpuset().
This won't affect the original legacy hierarchy, because in this case we
make sure the effective masks are always the same with user-configured
masks.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
We now have to support different behaviors for default hierachy and
legacy hiearchy, top_cpuset's configured masks need to be initialized
accordingly.
Suppose we've offlined cpu1.
On default hierarchy:
# mount -t cgroup -o __DEVEL__sane_behavior xxx /cpuset
# cat /cpuset/cpuset.cpus
0-15
On legacy hierarchy:
# mount -t cgroup xxx /cpuset
# cat /cpuset/cpuset.cpus
0,2-15
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
We're going to have separate user-configured masks and effective ones.
Eventually configured masks can only be changed by writing cpuset.cpus
and cpuset.mems, and they won't be restricted by parent cpuset. While
effective masks reflect cpu/memory hotplug and hierachical restriction,
and these are the real masks that apply to the tasks in the cpuset.
We calculate effective mask this way:
- top cpuset's effective_mask == online_mask, otherwise
- cpuset's effective_mask == configured_mask & parent effective_mask,
if the result is empty, it inherits parent effective mask.
Those behavior changes are for default hierarchy only. For legacy
hierarchy, effective_mask and configured_mask are the same, so we won't
break old interfaces.
We should partition sched domains according to effective_cpus, which
is the real cpulist that takes effects on tasks in the cpuset.
This won't introduce behavior change.
v2:
- Add a comment for the call of rebuild_sched_domains(), suggested
by Tejun.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
We're going to have separate user-configured masks and effective ones.
Eventually configured masks can only be changed by writing cpuset.cpus
and cpuset.mems, and they won't be restricted by parent cpuset. While
effective masks reflect cpu/memory hotplug and hierachical restriction,
and these are the real masks that apply to the tasks in the cpuset.
We calculate effective mask this way:
- top cpuset's effective_mask == online_mask, otherwise
- cpuset's effective_mask == configured_mask & parent effective_mask,
if the result is empty, it inherits parent effective mask.
Those behavior changes are for default hierarchy only. For legacy
hierarchy, effective_mask and configured_mask are the same, so we won't
break old interfaces.
To make cs->effective_{cpus,mems} to be effective masks, we need to
- update the effective masks at hotplug
- update the effective masks at config change
- take on ancestor's mask when the effective mask is empty
The last item is done here.
This won't introduce behavior change.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
We're going to have separate user-configured masks and effective ones.
Eventually configured masks can only be changed by writing cpuset.cpus
and cpuset.mems, and they won't be restricted by parent cpuset. While
effective masks reflect cpu/memory hotplug and hierachical restriction,
and these are the real masks that apply to the tasks in the cpuset.
We calculate effective mask this way:
- top cpuset's effective_mask == online_mask, otherwise
- cpuset's effective_mask == configured_mask & parent effective_mask,
if the result is empty, it inherits parent effective mask.
Those behavior changes are for default hierarchy only. For legacy
hierarchy, effective_mask and configured_mask are the same, so we won't
break old interfaces.
To make cs->effective_{cpus,mems} to be effective masks, we need to
- update the effective masks at hotplug
- update the effective masks at config change
- take on ancestor's mask when the effective mask is empty
The second item is done here. We don't need to treat root_cs specially
in update_cpumasks_hier().
This won't introduce behavior change.
v3:
- add a WARN_ON() to check if effective masks are the same with configured
masks on legacy hierarchy.
- pass trialcs->cpus_allowed to update_cpumasks_hier() and add a comment for
it. Similar change for update_nodemasks_hier(). Suggested by Tejun.
v2:
- revise the comment in update_{cpu,node}masks_hier(), suggested by Tejun.
- fix to use @cp instead of @cs in these two functions.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
We're going to have separate user-configured masks and effective ones.
Eventually configured masks can only be changed by writing cpuset.cpus
and cpuset.mems, and they won't be restricted by parent cpuset. While
effective masks reflect cpu/memory hotplug and hierachical restriction,
and these are the real masks that apply to the tasks in the cpuset.
We calculate effective mask this way:
- top cpuset's effective_mask == online_mask, otherwise
- cpuset's effective_mask == configured_mask & parent effective_mask,
if the result is empty, it inherits parent effective mask.
Those behavior changes are for default hierarchy only. For legacy
hierarchy, effective_mask and configured_mask are the same, so we won't
break old interfaces.
To make cs->effective_{cpus,mems} to be effective masks, we need to
- update the effective masks at hotplug
- update the effective masks at config change
- take on ancestor's mask when the effective mask is empty
The first item is done here.
This won't introduce behavior change.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
We're going to have separate user-configured masks and effective ones.
Eventually configured masks can only be changed by writing cpuset.cpus
and cpuset.mems, and they won't be restricted by parent cpuset. While
effective masks reflect cpu/memory hotplug and hierachical restriction,
and these are the real masks that apply to the tasks in the cpuset.
We calculate effective mask this way:
- top cpuset's effective_mask == online_mask, otherwise
- cpuset's effective_mask == configured_mask & parent effective_mask,
if the result is empty, it inherits parent effective mask.
Those behavior changes are for default hierarchy only. For legacy
hierachy, effective_mask and configured_mask are the same, so we won't
break old interfaces.
This patch adds the effective masks to struct cpuset and initializes
them. The effective masks of the top cpuset is the same with configured
masks, and a child cpuset inherits its parent's effective masks.
This won't introduce behavior change.
v2:
- s/real_{mems,cpus}_allowed/effective_{mems,cpus}, suggested by Tejun.
- don't init effective masks in cpuset_css_online() if !cgroup_on_dfl.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
sane_behavior has been used as a development vehicle for the default
unified hierarchy. Now that the default hierarchy is in place, the
flag became redundant and confusing as its usage is allowed on all
hierarchies. There are gonna be either the default hierarchy or
legacy ones. Let's make that clear by removing sane_behavior support
on non-default hierarchies.
This patch replaces cgroup_sane_behavior() with cgroup_on_dfl(). The
comment on top of CGRP_ROOT_SANE_BEHAVIOR is moved to on top of
cgroup_on_dfl() with sane_behavior specific part dropped.
On the default and legacy hierarchies w/o sane_behavior, this
shouldn't cause any behavior differences.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>