Commit graph

700 commits

Author SHA1 Message Date
Runmin Wang
4b7c952db6 Merge tag 'lsk-v4.4-16.12-android' into branch 'msm-4.4'
* remotes/origin/tmp-2f0de51:
  Linux 4.4.38
  esp6: Fix integrity verification when ESN are used
  esp4: Fix integrity verification when ESN are used
  ipv4: Set skb->protocol properly for local output
  ipv6: Set skb->protocol properly for local output
  Don't feed anything but regular iovec's to blk_rq_map_user_iov
  constify iov_iter_count() and iter_is_iovec()
  sparc64: fix compile warning section mismatch in find_node()
  sparc64: Fix find_node warning if numa node cannot be found
  sparc32: Fix inverted invalid_frame_pointer checks on sigreturns
  net: ping: check minimum size on ICMP header length
  net: avoid signed overflows for SO_{SND|RCV}BUFFORCE
  geneve: avoid use-after-free of skb->data
  sh_eth: remove unchecked interrupts for RZ/A1
  net: bcmgenet: Utilize correct struct device for all DMA operations
  packet: fix race condition in packet_set_ring
  net/dccp: fix use-after-free in dccp_invalid_packet
  netlink: Do not schedule work from sk_destruct
  netlink: Call cb->done from a worker thread
  net/sched: pedit: make sure that offset is valid
  net, sched: respect rcu grace period on cls destruction
  net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change
  l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind()
  rtnetlink: fix FDB size computation
  af_unix: conditionally use freezable blocking calls in read
  net: sky2: Fix shutdown crash
  ip6_tunnel: disable caching when the traffic class is inherited
  net: check dead netns for peernet2id_alloc()
  virtio-net: add a missing synchronize_net()
  Linux 4.4.37
  arm64: suspend: Reconfigure PSTATE after resume from idle
  arm64: mm: Set PSTATE.PAN from the cpu_enable_pan() call
  arm64: cpufeature: Schedule enable() calls instead of calling them via IPI
  pwm: Fix device reference leak
  mwifiex: printk() overflow with 32-byte SSIDs
  PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)
  PCI: Export pcie_find_root_port
  rcu: Fix soft lockup for rcu_nocb_kthread
  ALSA: pcm : Call kill_fasync() in stream lock
  x86/traps: Ignore high word of regs->cs in early_fixup_exception()
  kasan: update kasan_global for gcc 7
  zram: fix unbalanced idr management at hot removal
  ARC: Don't use "+l" inline asm constraint
  Linux 4.4.36
  scsi: mpt3sas: Unblock device after controller reset
  flow_dissect: call init_default_flow_dissectors() earlier
  mei: fix return value on disconnection
  mei: me: fix place for kaby point device ids.
  mei: me: disable driver on SPT SPS firmware
  drm/radeon: Ensure vblank interrupt is enabled on DPMS transition to on
  mpi: Fix NULL ptr dereference in mpi_powm() [ver #3]
  parisc: Also flush data TLB in flush_icache_page_asm
  parisc: Fix race in pci-dma.c
  parisc: Fix races in parisc_setup_cache_timing()
  NFSv4.x: hide array-bounds warning
  apparmor: fix change_hat not finding hat after policy replacement
  cfg80211: limit scan results cache size
  tile: avoid using clocksource_cyc2ns with absolute cycle count
  scsi: mpt3sas: Fix secure erase premature termination
  Fix USB CB/CBI storage devices with CONFIG_VMAP_STACK=y
  USB: serial: ftdi_sio: add support for TI CC3200 LaunchPad
  USB: serial: cp210x: add ID for the Zone DPMX
  usb: chipidea: move the lock initialization to core file
  KVM: x86: check for pic and ioapic presence before use
  KVM: x86: drop error recovery in em_jmp_far and em_ret_far
  iommu/vt-d: Fix IOMMU lookup for SR-IOV Virtual Functions
  iommu/vt-d: Fix PASID table allocation
  sched: tune: Fix lacking spinlock initialization
  UPSTREAM: trace: Update documentation for mono, mono_raw and boot clock
  UPSTREAM: trace: Add an option for boot clock as trace clock
  UPSTREAM: timekeeping: Add a fast and NMI safe boot clock
  ANDROID: goldfish_pipe: fix allmodconfig build
  ANDROID: goldfish: goldfish_pipe: fix locking errors
  ANDROID: video: goldfishfb: fix platform_no_drv_owner.cocci warnings
  ANDROID: goldfish_pipe: fix call_kern.cocci warnings
  arm64: rename ranchu defconfig to ranchu64
  ANDROID: arch: x86: disable pic for Android toolchain
  ANDROID: goldfish_pipe: An implementation of more parallel pipe
  ANDROID: goldfish_pipe: bugfixes and performance improvements.
  ANDROID: goldfish: Add goldfish sync driver
  ANDROID: goldfish: add ranchu defconfigs
  ANDROID: goldfish_audio: Clear audio read buffer status after each read
  ANDROID: goldfish_events: no extra EV_SYN; register goldfish
  ANDROID: goldfish_fb: Set pixclock = 0
  ANDROID: goldfish: Enable ACPI-based enumeration for goldfish audio
  ANDROID: goldfish: Enable ACPI-based enumeration for goldfish framebuffer
  ANDROID: video: goldfishfb: add devicetree bindings
  BACKPORT: staging: goldfish: audio: fix compiliation on arm
  BACKPORT: Input: goldfish_events - enable ACPI-based enumeration for goldfish events
  BACKPORT: goldfish: Enable ACPI-based enumeration for goldfish battery
  BACKPORT: drivers: tty: goldfish: Add device tree bindings
  BACKPORT: tty: goldfish: support platform_device with id -1
  BACKPORT: Input: goldfish_events - add devicetree bindings
  BACKPORT: power: goldfish_battery: add devicetree bindings
  BACKPORT: staging: goldfish: audio: add devicetree bindings
  ANDROID: usb: gadget: function: cleanup: Add blank line after declaration
  cpufreq: sched: Fix kernel crash on accessing sysfs file
  usb: gadget: f_mtp: simplify ptp NULL pointer check
  cgroup: replace unified-hierarchy.txt with a proper cgroup v2 documentation
  cgroup: rename Documentation/cgroups/ to Documentation/cgroup-legacy/
  cgroup: replace __DEVEL__sane_behavior with cgroup2 fs type
  writeback: initialize inode members that track writeback history
  mm: page_alloc: generalize the dirty balance reserve
  block: fix module reference leak on put_disk() call for cgroups throttle
  Linux 4.4.35
  netfilter: nft_dynset: fix element timeout for HZ != 1000
  IB/cm: Mark stale CM id's whenever the mad agent was unregistered
  IB/uverbs: Fix leak of XRC target QPs
  IB/core: Avoid unsigned int overflow in sg_alloc_table
  IB/mlx5: Fix fatal error dispatching
  IB/mlx5: Use cache line size to select CQE stride
  IB/mlx4: Fix create CQ error flow
  IB/mlx4: Check gid_index return value
  PM / sleep: don't suspend parent when async child suspend_{noirq, late} fails
  PM / sleep: fix device reference leak in test_suspend
  uwb: fix device reference leaks
  mfd: core: Fix device reference leak in mfd_clone_cell
  iwlwifi: pcie: fix SPLC structure parsing
  rtc: omap: Fix selecting external osc
  clk: mmp: mmp2: fix return value check in mmp2_clk_init()
  clk: mmp: pxa168: fix return value check in pxa168_clk_init()
  clk: mmp: pxa910: fix return value check in pxa910_clk_init()
  drm/amdgpu: Attach exclusive fence to prime exported bo's. (v5)
  crypto: caam - do not register AES-XTS mode on LP units
  ext4: sanity check the block and cluster size at mount time
  kbuild: Steal gcc's pie from the very beginning
  x86/kexec: add -fno-PIE
  scripts/has-stack-protector: add -fno-PIE
  kbuild: add -fno-PIE
  i2c: mux: fix up dependencies
  can: bcm: fix warning in bcm_connect/proc_register
  mfd: intel-lpss: Do not put device in reset state on suspend
  fuse: fix fuse_write_end() if zero bytes were copied
  KVM: Disable irq while unregistering user notifier
  KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr
  x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systems
  Linux 4.4.34
  sparc64: Delete now unused user copy fixup functions.
  sparc64: Delete now unused user copy assembler helpers.
  sparc64: Convert U3copy_{from,to}_user to accurate exception reporting.
  sparc64: Convert NG2copy_{from,to}_user to accurate exception reporting.
  sparc64: Convert NGcopy_{from,to}_user to accurate exception reporting.
  sparc64: Convert NG4copy_{from,to}_user to accurate exception reporting.
  sparc64: Convert U1copy_{from,to}_user to accurate exception reporting.
  sparc64: Convert GENcopy_{from,to}_user to accurate exception reporting.
  sparc64: Convert copy_in_user to accurate exception reporting.
  sparc64: Prepare to move to more saner user copy exception handling.
  sparc64: Delete __ret_efault.
  sparc64: Handle extremely large kernel TLB range flushes more gracefully.
  sparc64: Fix illegal relative branches in hypervisor patched TLB cross-call code.
  sparc64: Fix instruction count in comment for __hypervisor_flush_tlb_pending.
  sparc64: Fix illegal relative branches in hypervisor patched TLB code.
  sparc64: Handle extremely large kernel TSB range flushes sanely.
  sparc: Handle negative offsets in arch_jump_label_transform
  sparc64 mm: Fix base TSB sizing when hugetlb pages are used
  sparc: serial: sunhv: fix a double lock bug
  sparc: Don't leak context bits into thread->fault_address
  tty: Prevent ldisc drivers from re-using stale tty fields
  tcp: take care of truncations done by sk_filter()
  ipv4: use new_gw for redirect neigh lookup
  net: __skb_flow_dissect() must cap its return value
  sock: fix sendmmsg for partial sendmsg
  fib_trie: Correct /proc/net/route off by one error
  sctp: assign assoc_id earlier in __sctp_connect
  ipv6: dccp: add missing bind_conflict to dccp_ipv6_mapped
  ipv6: dccp: fix out of bound access in dccp_v6_err()
  dccp: fix out of bound access in dccp_v4_err()
  dccp: do not send reset to already closed sockets
  tcp: fix potential memory corruption
  ip6_tunnel: Clear IP6CB in ip6tunnel_xmit()
  bgmac: stop clearing DMA receive control register right after it is set
  net: mangle zero checksum in skb_checksum_help()
  net: clear sk_err_soft in sk_clone_lock()
  dctcp: avoid bogus doubling of cwnd after loss
  ARM: 8485/1: cpuidle: remove cpu parameter from the cpuidle_ops suspend hook
  Linux 4.4.33
  netfilter: fix namespace handling in nf_log_proc_dostring
  btrfs: qgroup: Prevent qgroup->reserved from going subzero
  mmc: mxs: Initialize the spinlock prior to using it
  ASoC: sun4i-codec: return error code instead of NULL when create_card fails
  ACPI / APEI: Fix incorrect return value of ghes_proc()
  i40e: fix call of ndo_dflt_bridge_getlink()
  hwrng: core - Don't use a stack buffer in add_early_randomness()
  lib/genalloc.c: start search from start of chunk
  mei: bus: fix received data size check in NFC fixup
  iommu/vt-d: Fix dead-locks in disable_dmar_iommu() path
  iommu/amd: Free domain id when free a domain of struct dma_ops_domain
  tty/serial: at91: fix hardware handshake on Atmel platforms
  dmaengine: at_xdmac: fix spurious flag status for mem2mem transfers
  drm/i915: Respect alternate_ddc_pin for all DDI ports
  KVM: MIPS: Precalculate MMIO load resume PC
  scsi: mpt3sas: Fix for block device of raid exists even after deleting raid disk
  scsi: qla2xxx: Fix scsi scan hang triggered if adapter fails during init
  iio: orientation: hid-sensor-rotation: Add PM function (fix non working driver)
  iio: hid-sensors: Increase the precision of scale to fix wrong reading interpretation.
  clk: qoriq: Don't allow CPU clocks higher than starting value
  toshiba-wmi: Fix loading the driver on non Toshiba laptops
  drbd: Fix kernel_sendmsg() usage - potential NULL deref
  usb: gadget: u_ether: remove interrupt throttling
  USB: cdc-acm: fix TIOCMIWAIT
  staging: nvec: remove managed resource from PS2 driver
  Revert "staging: nvec: ps2: change serio type to passthrough"
  drivers: staging: nvec: remove bogus reset command for PS/2 interface
  staging: iio: ad5933: avoid uninitialized variable in error case
  pinctrl: cherryview: Prevent possible interrupt storm on resume
  pinctrl: cherryview: Serialize register access in suspend/resume
  ARC: timer: rtc: implement read loop in "C" vs. inline asm
  s390/hypfs: Use get_free_page() instead of kmalloc to ensure page alignment
  coredump: fix unfreezable coredumping task
  swapfile: fix memory corruption via malformed swapfile
  dib0700: fix nec repeat handling
  ASoC: cs4270: fix DAPM stream name mismatch
  ALSA: info: Limit the proc text input size
  ALSA: info: Return error for invalid read/write
  arm64: Enable KPROBES/HIBERNATION/CORESIGHT in defconfig
  arm64: kvm: allows kvm cpu hotplug
  arm64: KVM: Register CPU notifiers when the kernel runs at HYP
  arm64: KVM: Skip HYP setup when already running in HYP
  arm64: hyp/kvm: Make hyp-stub reject kvm_call_hyp()
  arm64: hyp/kvm: Make hyp-stub extensible
  arm64: kvm: Move lr save/restore from do_el2_call into EL1
  arm64: kvm: deal with kernel symbols outside of linear mapping
  arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region
  ANDROID: video: adf: Avoid directly referencing user pointers
  ANDROID: usb: gadget: audio_source: fix comparison of distinct pointer types
  android: binder: support for file-descriptor arrays.
  android: binder: support for scatter-gather.
  android: binder: add extra size to allocator.
  android: binder: refactor binder_transact()
  android: binder: support multiple /dev instances.
  android: binder: deal with contexts in debugfs.
  android: binder: support multiple context managers.
  android: binder: split flat_binder_object.
  disable aio support in recommended configuration
  Linux 4.4.32
  scsi: megaraid_sas: fix macro MEGASAS_IS_LOGICAL to avoid regression
  drm/radeon: fix DP mode validation
  drm/radeon/dp: add back special handling for NUTMEG
  drm/amdgpu: fix DP mode validation
  drm/amdgpu/dp: add back special handling for NUTMEG
  KVM: MIPS: Drop other CPU ASIDs on guest MMU changes
  Revert KVM: MIPS: Drop other CPU ASIDs on guest MMU changes
  of: silence warnings due to max() usage
  packet: on direct_xmit, limit tso and csum to supported devices
  sctp: validate chunk len before actually using it
  net sched filters: fix notification of filter delete with proper handle
  udp: fix IP_CHECKSUM handling
  net: sctp, forbid negative length
  ipv4: use the right lock for ping_group_range
  ipv4: disable BH in set_ping_group_range()
  net: add recursion limit to GRO
  rtnetlink: Add rtnexthop offload flag to compare mask
  bridge: multicast: restore perm router ports on multicast enable
  net: pktgen: remove rcu locking in pktgen_change_name()
  ipv6: correctly add local routes when lo goes up
  ip6_tunnel: fix ip6_tnl_lookup
  ipv6: tcp: restore IP6CB for pktoptions skbs
  netlink: do not enter direct reclaim from netlink_dump()
  packet: call fanout_release, while UNREGISTERING a netdev
  net: Add netdev all_adj_list refcnt propagation to fix panic
  net/sched: act_vlan: Push skb->data to mac_header prior calling skb_vlan_*() functions
  net: pktgen: fix pkt_size
  net: fec: set mac address unconditionally
  tg3: Avoid NULL pointer dereference in tg3_io_error_detected()
  ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route
  ip6_gre: fix flowi6_proto value in ip6gre_xmit_other()
  tcp: fix a compile error in DBGUNDO()
  tcp: fix wrong checksum calculation on MTU probing
  net: avoid sk_forward_alloc overflows
  tcp: fix overflow in __tcp_retransmit_skb()
  arm64/kvm: fix build issue on kvm debug
  arm64: ptdump: Indicate whether memory should be faulting
  arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC
  arm64: Drop alloc function from create_mapping
  arm64: allow vmalloc regions to be set with set_memory_*
  arm64: kernel: implement ACPI parking protocol
  arm64: mm: create new fine-grained mappings at boot
  arm64: ensure _stext and _etext are page-aligned
  arm64: mm: allow passing a pgdir to alloc_init_*
  arm64: mm: allocate pagetables anywhere
  arm64: mm: use fixmap when creating page tables
  arm64: mm: add functions to walk tables in fixmap
  arm64: mm: add __{pud,pgd}_populate
  arm64: mm: avoid redundant __pa(__va(x))
  Linux 4.4.31
  HID: usbhid: add ATEN CS962 to list of quirky devices
  ubi: fastmap: Fix add_vol() return value test in ubi_attach_fastmap()
  kvm: x86: Check memopp before dereference (CVE-2016-8630)
  tty: vt, fix bogus division in csi_J
  usb: dwc3: Fix size used in dma_free_coherent()
  pwm: Unexport children before chip removal
  UBI: fastmap: scrub PEB when bitflips are detected in a free PEB EC header
  Disable "frame-address" warning
  smc91x: avoid self-comparison warning
  cgroup: avoid false positive gcc-6 warning
  drm/exynos: fix error handling in exynos_drm_subdrv_open
  mm/cma: silence warnings due to max() usage
  ARM: 8584/1: floppy: avoid gcc-6 warning
  powerpc/ptrace: Fix out of bounds array access warning
  x86/xen: fix upper bound of pmd loop in xen_cleanhighmap()
  perf build: Fix traceevent plugins build race
  drm/dp/mst: Check peer device type before attempting EDID read
  drm/radeon: drop register readback in cayman_cp_int_cntl_setup
  drm/radeon/si_dpm: workaround for SI kickers
  drm/radeon/si_dpm: Limit clocks on HD86xx part
  Revert "drm/radeon: fix DP link training issue with second 4K monitor"
  mmc: dw_mmc-pltfm: fix the potential NULL pointer dereference
  scsi: arcmsr: Send SYNCHRONIZE_CACHE command to firmware
  scsi: scsi_debug: Fix memory leak if LBP enabled and module is unloaded
  scsi: megaraid_sas: Fix data integrity failure for JBOD (passthrough) devices
  mac80211: discard multicast and 4-addr A-MSDUs
  firewire: net: fix fragmented datagram_size off-by-one
  firewire: net: guard against rx buffer overflows
  Input: i8042 - add XMG C504 to keyboard reset table
  dm mirror: fix read error on recovery after default leg failure
  virtio: console: Unlock vqs while freeing buffers
  virtio_ring: Make interrupt suppression spec compliant
  parisc: Ensure consistent state when switching to kernel stack at syscall entry
  ovl: fsync after copy-up
  KVM: MIPS: Make ERET handle ERL before EXL
  KVM: x86: fix wbinvd_dirty_mask use-after-free
  dm: free io_barrier after blk_cleanup_queue call
  USB: serial: cp210x: fix tiocmget error handling
  tty: limit terminal size to 4M chars
  xhci: add restart quirk for Intel Wildcatpoint PCH
  hv: do not lose pending heartbeat vmbus packets
  vt: clear selection before resizing
  Fix potential infoleak in older kernels
  GenWQE: Fix bad page access during abort of resource allocation
  usb: increase ohci watchdog delay to 275 msec
  xhci: use default USB_RESUME_TIMEOUT when resuming ports.
  USB: serial: ftdi_sio: add support for Infineon TriBoard TC2X7
  USB: serial: fix potential NULL-dereference at probe
  usb: gadget: function: u_ether: don't starve tx request queue
  mei: txe: don't clean an unprocessed interrupt cause.
  ubifs: Fix regression in ubifs_readdir()
  ubifs: Abort readdir upon error
  btrfs: fix races on root_log_ctx lists
  ANDROID: binder: Clear binder and cookie when setting handle in flat binder struct
  ANDROID: binder: Add strong ref checks
  ALSA: hda - Fix headset mic detection problem for two Dell laptops
  ALSA: hda - Adding a new group of pin cfg into ALC295 pin quirk table
  ALSA: hda - allow 40 bit DMA mask for NVidia devices
  ALSA: hda - Raise AZX_DCAPS_RIRB_DELAY handling into top drivers
  ALSA: hda - Merge RIRB_PRE_DELAY into CTX_WORKAROUND caps
  ALSA: usb-audio: Add quirk for Syntek STK1160
  KEYS: Fix short sprintf buffer in /proc/keys show function
  mm: memcontrol: do not recurse in direct reclaim
  mm/list_lru.c: avoid error-path NULL pointer deref
  libxfs: clean up _calc_dquots_per_chunk
  h8300: fix syscall restarting
  drm/dp/mst: Clear port->pdt when tearing down the i2c adapter
  i2c: core: fix NULL pointer dereference under race condition
  i2c: xgene: Avoid dma_buffer overrun
  arm64:cpufeature ARM64_NCAPS is the indicator of last feature
  arm64: hibernate: Refuse to hibernate if the boot cpu is offline
  PM / sleep: Add support for read-only sysfs attributes
  arm64: kernel: Add support for hibernate/suspend-to-disk
  arm64: mm: add functions to walk page tables by PA
  arm64: mm: move pte_* macros
  PM / Hibernate: Call flush_icache_range() on pages restored in-place
  arm64: Add new asm macro copy_page
  arm64: Promote KERNEL_START/KERNEL_END definitions to a header file
  arm64: kernel: Include _AC definition in page.h
  arm64: Change cpu_resume() to enable mmu early then access sleep_sp by va
  arm64: kernel: Rework finisher callback out of __cpu_suspend_enter()
  arm64: Cleanup SCTLR flags
  arm64: Fold proc-macros.S into assembler.h
  arm/arm64: KVM: Add hook for C-based stage2 init
  arm/arm64: KVM: Detect vGIC presence at runtime
  arm64: KVM: Add support for 16-bit VMID
  arm: KVM: Make kvm_arm.h friendly to assembly code
  arm/arm64: KVM: Remove unreferenced S2_PGD_ORDER
  arm64: KVM: debug: Remove spurious inline attributes
  ARM: KVM: Cleanup exception injection
  arm64: KVM: Remove weak attributes
  arm64: KVM: Cleanup asm-offset.c
  arm64: KVM: Turn system register numbers to an enum
  arm64: KVM: VHE: Patch out use of HVC
  arm64: Add ARM64_HAS_VIRT_HOST_EXTN feature
  arm/arm64: Add new is_kernel_in_hyp_mode predicate
  arm64: KVM: Move away from the assembly version of the world switch
  arm64: KVM: Map the kernel RO section into HYP
  arm64: KVM: Add compatibility aliases
  arm64: KVM: Implement vgic-v3 save/restore
  arm64: KVM: Add panic handling
  arm64: KVM: HYP mode entry points
  arm64: KVM: Implement TLB handling
  arm64: KVM: Implement fpsimd save/restore
  arm64: KVM: Implement the core world switch
  arm64: KVM: Add patchable function selector
  arm64: KVM: Implement guest entry
  arm64: KVM: Implement debug save/restore
  arm64: KVM: Implement 32bit system register save/restore
  arm64: KVM: Implement system register save/restore
  arm64: KVM: Implement timer save/restore
  arm64: KVM: Implement vgic-v2 save/restore
  arm64: KVM: Add a HYP-specific header file
  KVM: arm/arm64: vgic-v3: Make the LR indexing macro public
  arm64: Add macros to read/write system registers
  Linux 4.4.30
  Revert "fix minor infoleak in get_user_ex()"
  Revert "x86/mm: Expand the exception table logic to allow new handling options"
  Linux 4.4.29
  ARM: pxa: pxa_cplds: fix interrupt handling
  powerpc/nvram: Fix an incorrect partition merge
  mpt3sas: Don't spam logs if logging level is 0
  perf symbols: Fixup symbol sizes before picking best ones
  perf symbols: Check symbol_conf.allow_aliases for kallsyms loading too
  perf hists browser: Fix event group display
  clk: divider: Fix clk_divider_round_rate() to use clk_readl()
  clk: qoriq: fix a register offset error
  s390/con3270: fix insufficient space padding
  s390/con3270: fix use of uninitialised data
  s390/cio: fix accidental interrupt enabling during resume
  x86/mm: Expand the exception table logic to allow new handling options
  dmaengine: ipu: remove bogus NO_IRQ reference
  power: bq24257: Fix use of uninitialized pointer bq->charger
  staging: r8188eu: Fix scheduling while atomic splat
  ASoC: dapm: Fix kcontrol creation for output driver widget
  ASoC: dapm: Fix value setting for _ENUM_DOUBLE MUX's second channel
  ASoC: dapm: Fix possible uninitialized variable in snd_soc_dapm_get_volsw()
  ASoC: topology: Fix error return code in soc_tplg_dapm_widget_create()
  hwrng: omap - Only fail if pm_runtime_get_sync returns < 0
  crypto: arm/ghash-ce - add missing async import/export
  crypto: gcm - Fix IV buffer size in crypto_gcm_setkey
  mwifiex: correct aid value during tdls setup
  spi: spi-fsl-dspi: Drop extra spi_master_put in device remove function
  ARM: clk-imx35: fix name for ckil clk
  uio: fix dmem_region_start computation
  genirq/generic_chip: Add irq_unmap callback
  perf stat: Fix interval output values
  powerpc/eeh: Null check uses of eeh_pe_bus_get
  tunnels: Remove encapsulation offloads on decap.
  tunnels: Don't apply GRO to multiple layers of encapsulation.
  ipip: Properly mark ipip GRO packets as encapsulated.
  posix_acl: Clear SGID bit when setting file permissions
  brcmfmac: avoid potential stack overflow in brcmf_cfg80211_start_ap()
  mm/hugetlb: fix memory offline with hugepage size > memory block size
  drm/i915: Unalias obj->phys_handle and obj->userptr
  drm/i915: Account for TSEG size when determining 865G stolen base
  Revert "drm/i915: Check live status before reading edid"
  drm/i915/gen9: fix the WaWmMemoryReadLatency implementation
  xenbus: don't look up transaction IDs for ordinary writes
  drm/vmwgfx: Limit the user-space command buffer size
  drm/radeon: change vblank_time's calculation method to reduce computational error.
  drm/radeon/si/dpm: fix phase shedding setup
  drm/radeon: narrow asic_init for virtualization
  drm/amdgpu: change vblank_time's calculation method to reduce computational error.
  drm/amdgpu/dce11: add missing drm_mode_config_cleanup call
  drm/amdgpu/dce11: disable hpd on local panels
  drm/amdgpu/dce8: disable hpd on local panels
  drm/amdgpu/dce10: disable hpd on local panels
  drm/amdgpu: fix IB alignment for UVD
  drm/prime: Pass the right module owner through to dma_buf_export()
  Linux 4.4.28
  target: Don't override EXTENDED_COPY xcopy_pt_cmd SCSI status code
  target: Make EXTENDED_COPY 0xe4 failure return COPY TARGET DEVICE NOT REACHABLE
  target: Re-add missing SCF_ACK_KREF assignment in v4.1.y
  ubifs: Fix xattr_names length in exit paths
  jbd2: fix incorrect unlock on j_list_lock
  ext4: do not advertise encryption support when disabled
  mmc: rtsx_usb_sdmmc: Handle runtime PM while changing the led
  mmc: rtsx_usb_sdmmc: Avoid keeping the device runtime resumed when unused
  mmc: core: Annotate cmd_hdr as __le32
  powerpc/mm: Prevent unlikely crash in copro_calculate_slb()
  ceph: fix error handling in ceph_read_iter
  arm64: kernel: Init MDCR_EL2 even in the absence of a PMU
  arm64: percpu: rewrite ll/sc loops in assembly
  memstick: rtsx_usb_ms: Manage runtime PM when accessing the device
  memstick: rtsx_usb_ms: Runtime resume the device when polling for cards
  isofs: Do not return EACCES for unknown filesystems
  irqchip/gic-v3-its: Fix entry size mask for GITS_BASER
  s390/mm: fix gmap tlb flush issues
  Using BUG_ON() as an assert() is _never_ acceptable
  mm: filemap: fix mapping->nrpages double accounting in fuse
  mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()
  acpi, nfit: check for the correct event code in notifications
  net/mlx4_core: Allow resetting VF admin mac to zero
  bnx2x: Prevent false warning for lack of FC NPIV
  PKCS#7: Don't require SpcSpOpusInfo in Authenticode pkcs7 signatures
  hpsa: correct skipping masked peripherals
  sd: Fix rw_max for devices that report an optimal xfer size
  irqchip/gicv3: Handle loop timeout proper
  kvm: x86: memset whole irq_eoi
  x86/e820: Don't merge consecutive E820_PRAM ranges
  blkcg: Unlock blkcg_pol_mutex only once when cpd == NULL
  Fix regression which breaks DFS mounting
  Cleanup missing frees on some ioctls
  Do not send SMB3 SET_INFO request if nothing is changing
  SMB3: GUIDs should be constructed as random but valid uuids
  Set previous session id correctly on SMB3 reconnect
  Display number of credits available
  Clarify locking of cifs file and tcon structures and make more granular
  fs/cifs: keep guid when assigning fid to fileinfo
  cifs: Limit the overall credit acquired
  fs/super.c: fix race between freeze_super() and thaw_super()
  arc: don't leak bits of kernel stack into coredump
  lightnvm: ensure that nvm_dev_ops can be used without CONFIG_NVM
  ipc/sem.c: fix complex_count vs. simple op race
  mm: filemap: don't plant shadow entries without radix tree node
  metag: Only define atomic_dec_if_positive conditionally
  scsi: Fix use-after-free
  NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
  NFSv4: Open state recovery must account for file permission changes
  NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid
  NFSv4: Don't report revoked delegations as valid in nfs_have_delegation()
  sunrpc: fix write space race causing stalls
  Input: elantech - add Fujitsu Lifebook E556 to force crc_enabled
  Input: elantech - force needed quirks on Fujitsu H760
  Input: i8042 - skip selftest on ASUS laptops
  lib: add "on"/"off" support to kstrtobool
  lib: update single-char callers of strtobool()
  lib: move strtobool() to kstrtobool()
  MIPS: ptrace: Fix regs_return_value for kernel context
  MIPS: Fix -mabi=64 build of vdso.lds
  ALSA: hda - Fix a failure of micmute led when having multi adcs
  cx231xx: fix GPIOs for Pixelview SBTVD hybrid
  cx231xx: don't return error on success
  mb86a20s: fix demod settings
  mb86a20s: fix the locking logic
  ovl: copy_up_xattr(): use strnlen
  ovl: Fix info leak in ovl_lookup_temp()
  fbdev/efifb: Fix 16 color palette entry calculation
  scsi: zfcp: spin_lock_irqsave() is not nestable
  zfcp: trace full payload of all SAN records (req,resp,iels)
  zfcp: fix payload trace length for SAN request&response
  zfcp: fix D_ID field with actual value on tracing SAN responses
  zfcp: restore tracing of handle for port and LUN with HBA records
  zfcp: trace on request for open and close of WKA port
  zfcp: restore: Dont use 0 to indicate invalid LUN in rec trace
  zfcp: retain trace level for SCSI and HBA FSF response records
  zfcp: close window with unblocked rport during rport gone
  zfcp: fix ELS/GS request&response length for hardware data router
  zfcp: fix fc_host port_type with NPIV
  ubi: Deal with interrupted erasures in WL
  powerpc/pseries: Fix stack corruption in htpe code
  powerpc/64: Fix incorrect return value from __copy_tofrom_user
  powerpc/powernv: Use CPU-endian PEST in pnv_pci_dump_p7ioc_diag_data()
  powerpc/powernv: Use CPU-endian hub diag-data type in pnv_eeh_get_and_dump_hub_diag()
  powerpc/powernv: Pass CPU-endian PE number to opal_pci_eeh_freeze_clear()
  powerpc/vdso64: Use double word compare on pointers
  dm crypt: fix crash on exit
  dm mpath: check if path's request_queue is dying in activate_path()
  dm: return correct error code in dm_resume()'s retry loop
  dm: mark request_queue dead before destroying the DM device
  perf intel-pt: Fix MTC timestamp calculation for large MTC periods
  perf intel-pt: Fix estimated timestamps for cycle-accurate mode
  perf intel-pt: Fix snapshot overlap detection decoder errors
  pstore/ram: Use memcpy_fromio() to save old buffer
  pstore/ram: Use memcpy_toio instead of memcpy
  pstore/core: drop cmpxchg based updates
  pstore/ramoops: fixup driver removal
  parisc: Increase initial kernel mapping size
  parisc: Fix kernel memory layout regarding position of __gp
  parisc: Increase KERNEL_INITIAL_SIZE for 32-bit SMP kernels
  cpufreq: intel_pstate: Fix unsafe HWP MSR access
  platform: don't return 0 from platform_get_irq[_byname]() on error
  PCI: Mark Atheros AR9580 to avoid bus reset
  mmc: sdhci: cast unsigned int to unsigned long long to avoid unexpeted error
  mmc: block: don't use CMD23 with very old MMC cards
  rtlwifi: Fix missing country code for Great Britain
  PM / devfreq: event: remove duplicate devfreq_event_get_drvdata()
  clk: imx6: initialize GPU clocks
  regulator: tps65910: Work around silicon erratum SWCZ010
  mei: me: add kaby point device ids
  gpio: mpc8xxx: Correct irq handler function
  cgroup: Change from CAP_SYS_NICE to CAP_SYS_RESOURCE for cgroup migration permissions
  UPSTREAM: cpu/hotplug: Handle unbalanced hotplug enable/disable
  UPSTREAM: arm64: kaslr: fix breakage with CONFIG_MODVERSIONS=y
  UPSTREAM: arm64: kaslr: keep modules close to the kernel when DYNAMIC_FTRACE=y
  cgroup: Remove leftover instances of allow_attach
  BACKPORT: lib: harden strncpy_from_user
  CHROMIUM: cgroups: relax permissions on moving tasks between cgroups
  CHROMIUM: remove Android's cgroup generic permissions checks
  Linux 4.4.27
  cfq: fix starvation of asynchronous writes
  vfs: move permission checking into notify_change() for utimes(NULL)
  dlm: free workqueues after the connections
  crypto: vmx - Fix memory corruption caused by p8_ghash
  crypto: ghash-generic - move common definitions to a new header file
  ext4: release bh in make_indexed_dir
  ext4: allow DAX writeback for hole punch
  ext4: fix memory leak in ext4_insert_range()
  ext4: reinforce check of i_dtime when clearing high fields of uid and gid
  ext4: enforce online defrag restriction for encrypted files
  scsi: ibmvfc: Fix I/O hang when port is not mapped
  scsi: arcmsr: Simplify user_len checking
  scsi: arcmsr: Buffer overflow in arcmsr_iop_message_xfer()
  async_pq_val: fix DMA memory leak
  reiserfs: switch to generic_{get,set,remove}xattr()
  reiserfs: Unlock superblock before calling reiserfs_quota_on_mount()
  ASoC: Intel: Atom: add a missing star in a memcpy call
  brcmfmac: fix memory leak in brcmf_fill_bss_param
  i40e: avoid NULL pointer dereference and recursive errors on early PCI error
  fuse: fix killing s[ug]id in setattr
  fuse: invalidate dir dentry after chmod
  fuse: listxattr: verify xattr list
  drivers: base: dma-mapping: page align the size when unmap_kernel_range
  btrfs: assign error values to the correct bio structs
  serial: 8250_dw: Check the data->pclk when get apb_pclk
  arm64: Use PoU cache instr for I/D coherency
  arm64: mm: add code to safely replace TTBR1_EL1
  arm64: mm: place __cpu_setup in .text
  arm64: add function to install the idmap
  arm64: unmap idmap earlier
  arm64: unify idmap removal
  arm64: mm: place empty_zero_page in bss
  arm64: head.S: use memset to clear BSS
  arm64: mm: specialise pagetable allocators
  arm64: mm: remove pointless PAGE_MASKing
  asm-generic: Fix local variable shadow in __set_fixmap_offset
  arm64: mm: fold alternatives into .init
  ARM: 8511/1: ARM64: kernel: PSCI: move PSCI idle management code to drivers/firmware
  ARM: 8481/2: drivers: psci: replace psci firmware calls
  ARM: 8480/2: arm64: add implementation for arm-smccc
  ARM: 8479/2: add implementation for arm-smccc
  ARM: 8478/2: arm/arm64: add arm-smccc
  ARM: 8510/1: rework ARM_CPU_SUSPEND dependencies
  ARM: 8458/1: bL_switcher: add GIC dependency
  Linux 4.4.26
  mm: remove gup_flags FOLL_WRITE games from __get_user_pages()
  x86/build: Build compressed x86 kernels as PIE
  arm64: Remove stack duplicating code from jprobes
  arm64: kprobes: Add KASAN instrumentation around stack accesses
  arm64: kprobes: Cleanup jprobe_return
  arm64: kprobes: Fix overflow when saving stack
  arm64: kprobes: WARN if attempting to step with PSTATE.D=1
  kprobes: Add arm64 case in kprobe example module
  arm64: Add kernel return probes support (kretprobes)
  arm64: Add trampoline code for kretprobes
  arm64: kprobes instruction simulation support
  arm64: Treat all entry code as non-kprobe-able
  arm64: Blacklist non-kprobe-able symbol
  arm64: Kprobes with single stepping support
  arm64: add conditional instruction simulation support
  arm64: Add more test functions to insn.c
  arm64: Add HAVE_REGS_AND_STACK_ACCESS_API feature
  Linux 4.4.25
  tpm_crb: fix crb_req_canceled behavior
  tpm: fix a race condition in tpm2_unseal_trusted()
  ima: use file_dentry()
  ARM: cpuidle: Fix error return code
  ARM: dts: MSM8064 remove flags from SPMI/MPP IRQs
  ARM: dts: mvebu: armada-390: add missing compatibility string and bracket
  x86/dumpstack: Fix x86_32 kernel_stack_pointer() previous stack access
  x86/irq: Prevent force migration of irqs which are not in the vector domain
  x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation
  KVM: PPC: BookE: Fix a sanity check
  KVM: MIPS: Drop other CPU ASIDs on guest MMU changes
  KVM: PPC: Book3s PR: Allow access to unprivileged MMCR2 register
  mfd: wm8350-i2c: Make sure the i2c regmap functions are compiled
  mfd: 88pm80x: Double shifting bug in suspend/resume
  mfd: atmel-hlcdc: Do not sleep in atomic context
  mfd: rtsx_usb: Avoid setting ucr->current_sg.status
  ALSA: usb-line6: use the same declaration as definition in header for MIDI manufacturer ID
  ALSA: usb-audio: Extend DragonFly dB scale quirk to cover other variants
  ALSA: ali5451: Fix out-of-bound position reporting
  timekeeping: Fix __ktime_get_fast_ns() regression
  time: Add cycles to nanoseconds translation
  mm: Fix build for hardened usercopy
  ANDROID: binder: Clear binder and cookie when setting handle in flat binder struct
  ANDROID: binder: Add strong ref checks
  UPSTREAM: staging/android/ion : fix a race condition in the ion driver
  ANDROID: android-base: CONFIG_HARDENED_USERCOPY=y
  UPSTREAM: fs/proc/kcore.c: Add bounce buffer for ktext data
  UPSTREAM: fs/proc/kcore.c: Make bounce buffer global for read
  BACKPORT: arm64: Correctly bounds check virt_addr_valid
  Fix a build breakage in IO latency hist code.
  UPSTREAM: efi: include asm/early_ioremap.h not asm/efi.h to get early_memremap
  UPSTREAM: ia64: split off early_ioremap() declarations into asm/early_ioremap.h
  FROMLIST: arm64: Enable CONFIG_ARM64_SW_TTBR0_PAN
  FROMLIST: arm64: xen: Enable user access before a privcmd hvc call
  FROMLIST: arm64: Handle faults caused by inadvertent user access with PAN enabled
  FROMLIST: arm64: Disable TTBR0_EL1 during normal kernel execution
  FROMLIST: arm64: Introduce uaccess_{disable,enable} functionality based on TTBR0_EL1
  FROMLIST: arm64: Factor out TTBR0_EL1 post-update workaround into a specific asm macro
  FROMLIST: arm64: Factor out PAN enabling/disabling into separate uaccess_* macros
  UPSTREAM: arm64: Handle el1 synchronous instruction aborts cleanly
  UPSTREAM: arm64: include alternative handling in dcache_by_line_op
  UPSTREAM: arm64: fix "dc cvau" cache operation on errata-affected core
  UPSTREAM: Revert "arm64: alternatives: add enable parameter to conditional asm macros"
  UPSTREAM: arm64: Add new asm macro copy_page
  UPSTREAM: arm64: kill ESR_LNX_EXEC
  UPSTREAM: arm64: add macro to extract ESR_ELx.EC
  UPSTREAM: arm64: mm: mark fault_info table const
  UPSTREAM: arm64: fix dump_instr when PAN and UAO are in use
  BACKPORT: arm64: Fold proc-macros.S into assembler.h
  UPSTREAM: arm64: choose memstart_addr based on minimum sparsemem section alignment
  UPSTREAM: arm64/mm: ensure memstart_addr remains sufficiently aligned
  UPSTREAM: arm64/kernel: fix incorrect EL0 check in inv_entry macro
  UPSTREAM: arm64: Add macros to read/write system registers
  UPSTREAM: arm64/efi: refactor EFI init and runtime code for reuse by 32-bit ARM
  UPSTREAM: arm64/efi: split off EFI init and runtime code for reuse by 32-bit ARM
  UPSTREAM: arm64/efi: mark UEFI reserved regions as MEMBLOCK_NOMAP
  BACKPORT: arm64: only consider memblocks with NOMAP cleared for linear mapping
  UPSTREAM: mm/memblock: add MEMBLOCK_NOMAP attribute to memblock memory table
  ANDROID: dm: android-verity: Remove fec_header location constraint
  BACKPORT: audit: consistently record PIDs with task_tgid_nr()
  android-base.cfg: Enable kernel ASLR
  UPSTREAM: vmlinux.lds.h: allow arch specific handling of ro_after_init data section
  UPSTREAM: arm64: spinlock: fix spin_unlock_wait for LSE atomics
  UPSTREAM: arm64: avoid TLB conflict with CONFIG_RANDOMIZE_BASE
  UPSTREAM: arm64: Only select ARM64_MODULE_PLTS if MODULES=y
  sched: Add Kconfig option DEFAULT_USE_ENERGY_AWARE to set ENERGY_AWARE feature flag
  sched/fair: remove printk while schedule is in progress
  ANDROID: fs: FS tracepoints to track IO.
  sched/walt: Drop arch-specific timer access
  ANDROID: fiq_debugger: Pass task parameter to unwind_frame()
  eas/sched/fair: Fixing comments in find_best_target.
  input: keyreset: switch to orderly_reboot
  UPSTREAM: tun: fix transmit timestamp support
  UPSTREAM: arch/arm/include/asm/pgtable-3level.h: add pmd_mkclean for THP
  net: inet: diag: expose the socket mark to privileged processes.
  net: diag: make udp_diag_destroy work for mapped addresses.
  net: diag: support SOCK_DESTROY for UDP sockets
  net: diag: allow socket bytecode filters to match socket marks
  net: diag: slightly refactor the inet_diag_bc_audit error checks.
  net: diag: Add support to filter on device index
  UPSTREAM: brcmfmac: avoid potential stack overflow in brcmf_cfg80211_start_ap()
  Linux 4.4.24
  ALSA: hda - Add the top speaker pin config for HP Spectre x360
  ALSA: hda - Fix headset mic detection problem for several Dell laptops
  ACPICA: acpi_get_sleep_type_data: Reduce warnings
  ALSA: hda - Adding one more ALC255 pin definition for headset problem
  Revert "usbtmc: convert to devm_kzalloc"
  USB: serial: cp210x: Add ID for a Juniper console
  Staging: fbtft: Fix bug in fbtft-core
  usb: misc: legousbtower: Fix NULL pointer deference
  USB: serial: cp210x: fix hardware flow-control disable
  dm log writes: fix bug with too large bios
  clk: xgene: Add missing parenthesis when clearing divider value
  aio: mark AIO pseudo-fs noexec
  batman-adv: remove unused callback from batadv_algo_ops struct
  IB/mlx4: Use correct subnet-prefix in QP1 mads under SR-IOV
  IB/mlx4: Fix code indentation in QP1 MAD flow
  IB/mlx4: Fix incorrect MC join state bit-masking on SR-IOV
  IB/ipoib: Don't allow MC joins during light MC flush
  IB/core: Fix use after free in send_leave function
  IB/ipoib: Fix memory corruption in ipoib cm mode connect flow
  KVM: nVMX: postpone VMCS changes on MSR_IA32_APICBASE write
  dmaengine: at_xdmac: fix to pass correct device identity to free_irq()
  kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
  ASoC: omap-mcpdm: Fix irq resource handling
  sysctl: handle error writing UINT_MAX to u32 fields
  powerpc/prom: Fix sub-processor option passed to ibm, client-architecture-support
  brcmsmac: Initialize power in brcms_c_stf_ss_algo_channel_get()
  brcmsmac: Free packet if dma_mapping_error() fails in dma_rxfill
  brcmfmac: Fix glob_skb leak in brcmf_sdiod_recv_chain
  ASoC: Intel: Skylake: Fix error return code in skl_probe()
  pNFS/flexfiles: Fix layoutcommit after a commit to DS
  pNFS/files: Fix layoutcommit after a commit to DS
  NFS: Don't drop CB requests with invalid principals
  svc: Avoid garbage replies when pc_func() returns rpc_drop_reply
  dmaengine: at_xdmac: fix debug string
  fnic: pci_dma_mapping_error() doesn't return an error code
  avr32: off by one in at32_init_pio()
  ath9k: Fix programming of minCCA power threshold
  gspca: avoid unused variable warnings
  em28xx-i2c: rt_mutex_trylock() returns zero on failure
  NFC: fdp: Detect errors from fdp_nci_create_conn()
  iwlmvm: mvm: set correct state in smart-fifo configuration
  tile: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO
  pstore: drop file opened reference count
  blk-mq: actually hook up defer list when running requests
  hwrng: omap - Fix assumption that runtime_get_sync will always succeed
  ARM: sa1111: fix pcmcia suspend/resume
  ARM: shmobile: fix regulator quirk for Gen2
  ARM: sa1100: clear reset status prior to reboot
  ARM: sa1100: fix 3.6864MHz clock
  ARM: sa1100: register clocks early
  ARM: sun5i: Fix typo in trip point temperature
  regulator: qcom_smd: Fix voltage ranges for pm8x41
  regulator: qcom_spmi: Update mvs1/mvs2 switches on pm8941
  regulator: qcom_spmi: Add support for get_mode/set_mode on switches
  regulator: qcom_spmi: Add support for S4 supply on pm8941
  tpm: fix byte-order for the value read by tpm2_get_tpm_pt
  printk: fix parsing of "brl=" option
  MIPS: uprobes: fix use of uninitialised variable
  MIPS: Malta: Fix IOCU disable switch read for MIPS64
  MIPS: fix uretprobe implementation
  MIPS: uprobes: remove incorrect set_orig_insn
  arm64: debug: avoid resetting stepping state machine when TIF_SINGLESTEP
  ARM: 8618/1: decompressor: reset ttbcr fields to use TTBR0 on ARMv7
  irqchip/gicv3: Silence noisy DEBUG_PER_CPU_MAPS warning
  gpio: sa1100: fix irq probing for ucb1x00
  usb: gadget: fsl_qe_udc: signedness bug in qe_get_frame()
  ceph: fix race during filling readdir cache
  iwlwifi: mvm: don't use ret when not initialised
  iwlwifi: pcie: fix access to scratch buffer
  spi: sh-msiof: Avoid invalid clock generator parameters
  hwmon: (adt7411) set bit 3 in CFG1 register
  nvmem: Declare nvmem_cell_read() consistently
  ipvs: fix bind to link-local mcast IPv6 address in backup
  tools/vm/slabinfo: fix an unintentional printf
  mmc: pxamci: fix potential oops
  drivers/perf: arm_pmu: Fix leak in error path
  pinctrl: Flag strict is a field in struct pinmux_ops
  pinctrl: uniphier: fix .pin_dbg_show() callback
  i40e: avoid null pointer dereference
  perf/core: Fix pmu::filter_match for SW-led groups
  iwlwifi: mvm: fix a few firmware capability checks
  usb: musb: fix DMA for host mode
  usb: musb: Fix DMA desired mode for Mentor DMA engine
  ARM: 8617/1: dma: fix dma_max_pfn()
  ARM: 8616/1: dt: Respect property size when parsing CPUs
  drm/radeon/si/dpm: add workaround for for Jet parts
  drm/nouveau/fifo/nv04: avoid ramht race against cookie insertion
  x86/boot: Initialize FPU and X86_FEATURE_ALWAYS even if we don't have CPUID
  x86/init: Fix cr4_init_shadow() on CR4-less machines
  can: dev: fix deadlock reported after bus-off
  mm,ksm: fix endless looping in allocating memory when ksm enable
  mtd: nand: davinci: Reinitialize the HW ECC engine in 4bit hwctl
  cpuset: handle race between CPU hotplug and cpuset_hotplug_work
  usercopy: fold builtin_const check into inline function
  Linux 4.4.23
  hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common()
  qxl: check for kmap failures
  power: supply: max17042_battery: fix model download bug.
  power_supply: tps65217-charger: fix missing platform_set_drvdata()
  PM / hibernate: Fix rtree_next_node() to avoid walking off list ends
  PM / hibernate: Restore processor state before using per-CPU variables
  MIPS: paravirt: Fix undefined reference to smp_bootstrap
  MIPS: Add a missing ".set pop" in an early commit
  MIPS: Avoid a BUG warning during prctl(PR_SET_FP_MODE, ...)
  MIPS: Remove compact branch policy Kconfig entries
  MIPS: vDSO: Fix Malta EVA mapping to vDSO page structs
  MIPS: SMP: Fix possibility of deadlock when bringing CPUs online
  MIPS: Fix pre-r6 emulation FPU initialisation
  i2c: qup: skip qup_i2c_suspend if the device is already runtime suspended
  i2c-eg20t: fix race between i2c init and interrupt enable
  btrfs: ensure that file descriptor used with subvol ioctls is a dir
  nl80211: validate number of probe response CSA counters
  can: flexcan: fix resume function
  mm: delete unnecessary and unsafe init_tlb_ubc()
  tracing: Move mutex to protect against resetting of seq data
  fix memory leaks in tracing_buffers_splice_read()
  power: reset: hisi-reboot: Unmap region obtained by of_iomap
  mtd: pmcmsp-flash: Allocating too much in init_msp_flash()
  mtd: maps: sa1100-flash: potential NULL dereference
  fix fault_in_multipages_...() on architectures with no-op access_ok()
  fanotify: fix list corruption in fanotify_get_response()
  fsnotify: add a way to stop queueing events on group shutdown
  xfs: prevent dropping ioend completions during buftarg wait
  autofs: use dentry flags to block walks during expire
  autofs races
  pwm: Mark all devices as "might sleep"
  bridge: re-introduce 'fix parsing of MLDv2 reports'
  net: smc91x: fix SMC accesses
  Revert "phy: IRQ cannot be shared"
  net: dsa: bcm_sf2: Fix race condition while unmasking interrupts
  net/mlx5: Added missing check of msg length in verifying its signature
  tipc: fix NULL pointer dereference in shutdown()
  net/irda: handle iriap_register_lsap() allocation failure
  vti: flush x-netns xfrm cache when vti interface is removed
  af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'
  Revert "af_unix: Fix splice-bind deadlock"
  bonding: Fix bonding crash
  megaraid: fix null pointer check in megasas_detach_one().
  nouveau: fix nv40_perfctr_next() cleanup regression
  Staging: iio: adc: fix indent on break statement
  iwlegacy: avoid warning about missing braces
  ath9k: fix misleading indentation
  am437x-vfpe: fix typo in vpfe_get_app_input_index
  Add braces to avoid "ambiguous ‘else’" compiler warnings
  net: caif: fix misleading indentation
  Makefile: Mute warning for __builtin_return_address(>0) for tracing only
  Disable "frame-address" warning
  Disable "maybe-uninitialized" warning globally
  gcov: disable -Wmaybe-uninitialized warning
  Kbuild: disable 'maybe-uninitialized' warning for CONFIG_PROFILE_ALL_BRANCHES
  kbuild: forbid kernel directory to contain spaces and colons
  tools: Support relative directory path for 'O='
  Makefile: revert "Makefile: Document ability to make file.lst and file.S" partially
  kbuild: Do not run modules_install and install in paralel
  ocfs2: fix start offset to ocfs2_zero_range_for_truncate()
  ocfs2/dlm: fix race between convert and migration
  crypto: echainiv - Replace chaining with multiplication
  crypto: skcipher - Fix blkcipher walk OOM crash
  crypto: arm/aes-ctr - fix NULL dereference in tail processing
  crypto: arm64/aes-ctr - fix NULL dereference in tail processing
  tcp: properly scale window in tcp_v[46]_reqsk_send_ack()
  tcp: fix use after free in tcp_xmit_retransmit_queue()
  tcp: cwnd does not increase in TCP YeAH
  ipv6: release dst in ping_v6_sendmsg
  ipv4: panic in leaf_walk_rcu due to stale node pointer
  reiserfs: fix "new_insert_key may be used uninitialized ..."
  Fix build warning in kernel/cpuset.c
  include/linux/kernel.h: change abs() macro so it uses consistent return type
  Linux 4.4.22
  openrisc: fix the fix of copy_from_user()
  avr32: fix 'undefined reference to `___copy_from_user'
  ia64: copy_from_user() should zero the destination on access_ok() failure
  genirq/msi: Fix broken debug output
  ppc32: fix copy_from_user()
  sparc32: fix copy_from_user()
  mn10300: copy_from_user() should zero on access_ok() failure...
  nios2: copy_from_user() should zero the tail of destination
  openrisc: fix copy_from_user()
  parisc: fix copy_from_user()
  metag: copy_from_user() should zero the destination on access_ok() failure
  alpha: fix copy_from_user()
  asm-generic: make copy_from_user() zero the destination properly
  mips: copy_from_user() must zero the destination on access_ok() failure
  hexagon: fix strncpy_from_user() error return
  sh: fix copy_from_user()
  score: fix copy_from_user() and friends
  blackfin: fix copy_from_user()
  cris: buggered copy_from_user/copy_to_user/clear_user
  frv: fix clear_user()
  asm-generic: make get_user() clear the destination on errors
  ARC: uaccess: get_user to zero out dest in cause of fault
  s390: get_user() should zero on failure
  score: fix __get_user/get_user
  nios2: fix __get_user()
  sh64: failing __get_user() should zero
  m32r: fix __get_user()
  mn10300: failing __get_user() and get_user() should zero
  fix minor infoleak in get_user_ex()
  microblaze: fix copy_from_user()
  avr32: fix copy_from_user()
  microblaze: fix __get_user()
  fix iov_iter_fault_in_readable()
  irqchip/atmel-aic: Fix potential deadlock in ->xlate()
  genirq: Provide irq_gc_{lock_irqsave,unlock_irqrestore}() helpers
  drm: Only use compat ioctl for addfb2 on X86/IA64
  drm: atmel-hlcdc: Fix vertical scaling
  net: simplify napi_synchronize() to avoid warnings
  kconfig: tinyconfig: provide whole choice blocks to avoid warnings
  soc: qcom/spm: shut up uninitialized variable warning
  pinctrl: at91-pio4: use %pr format string for resource
  mmc: dw_mmc: use resource_size_t to store physical address
  drm/i915: Avoid pointer arithmetic in calculating plane surface offset
  mpssd: fix buffer overflow warning
  gma500: remove annoying deprecation warning
  ipv6: addrconf: fix dev refcont leak when DAD failed
  sched/core: Fix a race between try_to_wake_up() and a woken up task
  Revert "wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel"
  ath9k: fix using sta->drv_priv before initializing it
  md-cluster: make md-cluster also can work when compiled into kernel
  xhci: fix null pointer dereference in stop command timeout function
  fuse: direct-io: don't dirty ITER_BVEC pages
  Btrfs: remove root_log_ctx from ctx list before btrfs_sync_log returns
  crypto: cryptd - initialize child shash_desc on import
  arm64: spinlocks: implement smp_mb__before_spinlock() as smp_mb()
  pinctrl: sunxi: fix uart1 CTS/RTS pins at PG on A23/A33
  pinctrl: pistachio: fix mfio pll_lock pinmux
  dm crypt: fix error with too large bios
  dm log writes: move IO accounting earlier to fix error path
  dm log writes: fix check of kthread_run() return value
  bus: arm-ccn: Fix XP watchpoint settings bitmask
  bus: arm-ccn: Do not attempt to configure XPs for cycle counter
  bus: arm-ccn: Fix PMU handling of MN
  ARM: dts: STiH407-family: Provide interconnect clock for consumption in ST SDHCI
  ARM: dts: overo: fix gpmc nand on boards with ethernet
  ARM: dts: overo: fix gpmc nand cs0 range
  ARM: dts: imx6qdl: Fix SPDIF regression
  ARM: OMAP3: hwmod data: Add sysc information for DSI
  ARM: kirkwood: ib62x0: fix size of u-boot environment partition
  ARM: imx6: add missing BM_CLPCR_BYPASS_PMIC_READY setting for imx6sx
  ARM: imx6: add missing BM_CLPCR_BYP_MMDC_CH0_LPM_HS setting for imx6ul
  ARM: AM43XX: hwmod: Fix RSTST register offset for pruss
  cpuset: make sure new tasks conform to the current config of the cpuset
  net: thunderx: Fix OOPs with ethtool --register-dump
  USB: change bInterval default to 10 ms
  ARM: dts: STiH410: Handle interconnect clock required by EHCI/OHCI (USB)
  usb: chipidea: udc: fix NULL ptr dereference in isr_setup_status_phase
  usb: renesas_usbhs: fix clearing the {BRDY,BEMP}STS condition
  USB: serial: simple: add support for another Infineon flashloader
  serial: 8250: added acces i/o products quad and octal serial cards
  serial: 8250_mid: fix divide error bug if baud rate is 0
  iio: ensure ret is initialized to zero before entering do loop
  iio:core: fix IIO_VAL_FRACTIONAL sign handling
  iio: accel: kxsd9: Fix scaling bug
  iio: fix pressure data output unit in hid-sensor-attributes
  iio: accel: bmc150: reset chip at init time
  iio: adc: at91: unbreak channel adc channel 3
  iio: ad799x: Fix buffered capture for ad7991/ad7995/ad7999
  iio: adc: ti_am335x_adc: Increase timeout value waiting for ADC sample
  iio: adc: ti_am335x_adc: Protect FIFO1 from concurrent access
  iio: adc: rockchip_saradc: reset saradc controller before programming it
  iio: proximity: as3935: set up buffer timestamps for non-zero values
  iio: accel: kxsd9: Fix raw read return
  kvm-arm: Unmap shadow pagetables properly
  x86/AMD: Apply erratum 665 on machines without a BIOS fix
  x86/paravirt: Do not trace _paravirt_ident_*() functions
  ARC: mm: fix build breakage with STRICT_MM_TYPECHECKS
  IB/uverbs: Fix race between uverbs_close and remove_one
  dm flakey: fix reads to be issued if drop_writes configured
  audit: fix exe_file access in audit_exe_compare
  mm: introduce get_task_exe_file
  kexec: fix double-free when failing to relocate the purgatory
  NFSv4.1: Fix the CREATE_SESSION slot number accounting
  pNFS: Ensure LAYOUTGET and LAYOUTRETURN are properly serialised
  nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock
  NFSv4.x: Fix a refcount leak in nfs_callback_up_net
  pNFS: The client must not do I/O to the DS if it's lease has expired
  kernfs: don't depend on d_find_any_alias() when generating notifications
  powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET
  powerpc/powernv : Drop reference added by kset_find_obj()
  powerpc/tm: do not use r13 for tabort_syscall
  tipc: move linearization of buffers to generic code
  lightnvm: put bio before return
  fscrypto: require write access to mount to set encryption policy
  Revert "KVM: x86: fix missed hardware breakpoints"
  MIPS: KVM: Check for pfn noslot case
  clocksource/drivers/sun4i: Clear interrupts after stopping timer in probe function
  fscrypto: add authorization check for setting encryption policy
  ext4: use __GFP_NOFAIL in ext4_free_blocks()

Conflicts:
	arch/arm/kernel/devtree.c
	arch/arm64/Kconfig
	arch/arm64/kernel/arm64ksyms.c
	arch/arm64/kernel/psci.c
	arch/arm64/mm/fault.c
	drivers/android/binder.c
	drivers/usb/host/xhci-hub.c
	fs/ext4/readpage.c
	include/linux/mmc/core.h
	include/linux/mmzone.h
	mm/memcontrol.c
	net/core/filter.c
	net/netlink/af_netlink.c
	net/netlink/af_netlink.h

Change-Id: I99fe7a0914e83e284b11b33185b71448a8999d1f
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2017-02-28 17:10:49 -08:00
Syed Rameez Mustafa
b9b63b0c62 sched/hmp: Fix memory leak when task fork fails
The scheduler allocates memory for the task load structures during
fork. It then relies to sched_exit() to be called to free that memory.
However, if the fork itself fails at any point after the allocation,
the memory is left unclaimed forever. Fix this memory leak by freeing
the allocated memory under error conditions.

Change-Id: I14a8290c9fcc4174ec80560e9f9d7bcdb119761f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-11-07 14:46:22 -08:00
Alex Shi
a66f9577c6 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2016-10-18 12:31:07 +08:00
Syed Rameez Mustafa
eb7300e9a8 sched: Add per CPU load tracking for each task
Keeping a track of the load footprint of each task on every CPU
that it executed on gives the scheduler much more flexibility in
terms of the number of frequency guidance policies. These new fields
will be used in subsequent patches as we alter the load fixup
mechanism upon task migration. We still need to maintain the
curr/prev_window sums as they will also be required in subsequent
patches as we start to track top tasks based on cumulative load.

Also, we need to call init_new_task_load() for the idle task. This
is an existing harmless bug as load tracking for the idle task is
irrelevant. However, in this patch we are adding pointers to the
ravg structure. These pointers have to be initialized even for the
idle task.

Finally move init_new_task_load() to sched_fork(). This was always
the more appropriate place, however, following the introduction of
new pointers in the ravg struct, this is necessary to avoid races
with functions such as reset_all_task_stats().

Change-Id: Ib584372eb539706da4319973314e54dae04e5934
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-10-17 12:43:53 -07:00
Alex Shi
16d185eee4 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android
Conflicts:
	kernel/cpuset.c
2016-10-11 23:33:37 +02:00
Michal Hocko
82b7839a40 kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
commit 735f2770a770156100f534646158cb58cb8b2939 upstream.

Commit fec1d01152 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal
exit") has caused a subtle regression in nscd which uses
CLONE_CHILD_CLEARTID to clear the nscd_certainly_running flag in the
shared databases, so that the clients are notified when nscd is
restarted.  Now, when nscd uses a non-persistent database, clients that
have it mapped keep thinking the database is being updated by nscd, when
in fact nscd has created a new (anonymous) one (for non-persistent
databases it uses an unlinked file as backend).

The original proposal for the CLONE_CHILD_CLEARTID change claimed
(https://lkml.org/lkml/2006/10/25/233):

: The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
: on behalf of pthread_create() library calls.  This feature is used to
: request that the kernel clear the thread-id in user space (at an address
: provided in the syscall) when the thread disassociates itself from the
: address space, which is done in mm_release().
:
: Unfortunately, when a multi-threaded process incurs a core dump (such as
: from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
: the other threads, which then proceed to clear their user-space tids
: before synchronizing in exit_mm() with the start of core dumping.  This
: misrepresents the state of process's address space at the time of the
: SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
: problems (misleading him/her to conclude that the threads had gone away
: before the fault).
:
: The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
: core dump has been initiated.

The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269)
seems to have a larger scope than the original patch asked for.  It
seems that limitting the scope of the check to core dumping should work
for SIGSEGV issue describe above.

[Changelog partly based on Andreas' description]
Fixes: fec1d01152 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
Link: http://lkml.kernel.org/r/1471968749-26173-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Tested-by: William Preston <wpreston@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Andreas Schwab <schwab@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07 15:23:46 +02:00
Mateusz Guzik
f750847daa mm: introduce get_task_exe_file
commit cd81a9170e69e018bbaba547c1fd85a585f5697a upstream.

For more convenient access if one has a pointer to the task.

As a minor nit take advantage of the fact that only task lock + rcu are
needed to safely grab ->exe_file. This saves mm refcount dance.

Use the helper in proc_exe_link.

Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-24 10:07:36 +02:00
Balbir Singh
db8c7fff99 cgroup: reduce read locked section of cgroup_threadgroup_rwsem during fork
commit 568ac888215c7fb2fabe8ea739b00ec3c1f5d440 upstream.

cgroup_threadgroup_rwsem is acquired in read mode during process exit
and fork.  It is also grabbed in write mode during
__cgroups_proc_write().  I've recently run into a scenario with lots
of memory pressure and OOM and I am beginning to see

systemd

 __switch_to+0x1f8/0x350
 __schedule+0x30c/0x990
 schedule+0x48/0xc0
 percpu_down_write+0x114/0x170
 __cgroup_procs_write.isra.12+0xb8/0x3c0
 cgroup_file_write+0x74/0x1a0
 kernfs_fop_write+0x188/0x200
 __vfs_write+0x6c/0xe0
 vfs_write+0xc0/0x230
 SyS_write+0x6c/0x110
 system_call+0x38/0xb4

This thread is waiting on the reader of cgroup_threadgroup_rwsem to
exit.  The reader itself is under memory pressure and has gone into
reclaim after fork. There are times the reader also ends up waiting on
oom_lock as well.

 __switch_to+0x1f8/0x350
 __schedule+0x30c/0x990
 schedule+0x48/0xc0
 jbd2_log_wait_commit+0xd4/0x180
 ext4_evict_inode+0x88/0x5c0
 evict+0xf8/0x2a0
 dispose_list+0x50/0x80
 prune_icache_sb+0x6c/0x90
 super_cache_scan+0x190/0x210
 shrink_slab.part.15+0x22c/0x4c0
 shrink_zone+0x288/0x3c0
 do_try_to_free_pages+0x1dc/0x590
 try_to_free_pages+0xdc/0x260
 __alloc_pages_nodemask+0x72c/0xc90
 alloc_pages_current+0xb4/0x1a0
 page_table_alloc+0xc0/0x170
 __pte_alloc+0x58/0x1f0
 copy_page_range+0x4ec/0x950
 copy_process.isra.5+0x15a0/0x1870
 _do_fork+0xa8/0x4b0
 ppc_clone+0x8/0xc

In the meanwhile, all processes exiting/forking are blocked almost
stalling the system.

This patch moves the threadgroup_change_begin from before
cgroup_fork() to just before cgroup_canfork().  There is no nee to
worry about threadgroup changes till the task is actually added to the
threadgroup.  This avoids having to call reclaim with
cgroup_threadgroup_rwsem held.

tj: Subject and description edits.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Zefan Li <lizefan@huawei.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:51 +02:00
Balbir Singh
48bb58c012 RFC: FROMLIST: cgroup: reduce read locked section of cgroup_threadgroup_rwsem during fork
cgroup_threadgroup_rwsem is acquired in read mode during process exit
and fork.  It is also grabbed in write mode during
__cgroups_proc_write().  I've recently run into a scenario with lots
of memory pressure and OOM and I am beginning to see

systemd

 __switch_to+0x1f8/0x350
 __schedule+0x30c/0x990
 schedule+0x48/0xc0
 percpu_down_write+0x114/0x170
 __cgroup_procs_write.isra.12+0xb8/0x3c0
 cgroup_file_write+0x74/0x1a0
 kernfs_fop_write+0x188/0x200
 __vfs_write+0x6c/0xe0
 vfs_write+0xc0/0x230
 SyS_write+0x6c/0x110
 system_call+0x38/0xb4

This thread is waiting on the reader of cgroup_threadgroup_rwsem to
exit.  The reader itself is under memory pressure and has gone into
reclaim after fork. There are times the reader also ends up waiting on
oom_lock as well.

 __switch_to+0x1f8/0x350
 __schedule+0x30c/0x990
 schedule+0x48/0xc0
 jbd2_log_wait_commit+0xd4/0x180
 ext4_evict_inode+0x88/0x5c0
 evict+0xf8/0x2a0
 dispose_list+0x50/0x80
 prune_icache_sb+0x6c/0x90
 super_cache_scan+0x190/0x210
 shrink_slab.part.15+0x22c/0x4c0
 shrink_zone+0x288/0x3c0
 do_try_to_free_pages+0x1dc/0x590
 try_to_free_pages+0xdc/0x260
 __alloc_pages_nodemask+0x72c/0xc90
 alloc_pages_current+0xb4/0x1a0
 page_table_alloc+0xc0/0x170
 __pte_alloc+0x58/0x1f0
 copy_page_range+0x4ec/0x950
 copy_process.isra.5+0x15a0/0x1870
 _do_fork+0xa8/0x4b0
 ppc_clone+0x8/0xc

In the meanwhile, all processes exiting/forking are blocked almost
stalling the system.

This patch moves the threadgroup_change_begin from before
cgroup_fork() to just before cgroup_canfork().  There is no nee to
worry about threadgroup changes till the task is actually added to the
threadgroup.  This avoids having to call reclaim with
cgroup_threadgroup_rwsem held.

tj: Subject and description edits.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Zefan Li <lizefan@huawei.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Tejun Heo <tj@kernel.org>
[jstultz: Cherry-picked from:
 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git 568ac888215c7f]
Change-Id: Ie8ece84fb613cf6a7b08cea1468473a8df2b9661
Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-09-14 14:26:20 +05:30
Balbir Singh
35f2961082 RFC: FROMLIST: cgroup: reduce read locked section of cgroup_threadgroup_rwsem during fork
cgroup_threadgroup_rwsem is acquired in read mode during process exit
and fork.  It is also grabbed in write mode during
__cgroups_proc_write().  I've recently run into a scenario with lots
of memory pressure and OOM and I am beginning to see

systemd

 __switch_to+0x1f8/0x350
 __schedule+0x30c/0x990
 schedule+0x48/0xc0
 percpu_down_write+0x114/0x170
 __cgroup_procs_write.isra.12+0xb8/0x3c0
 cgroup_file_write+0x74/0x1a0
 kernfs_fop_write+0x188/0x200
 __vfs_write+0x6c/0xe0
 vfs_write+0xc0/0x230
 SyS_write+0x6c/0x110
 system_call+0x38/0xb4

This thread is waiting on the reader of cgroup_threadgroup_rwsem to
exit.  The reader itself is under memory pressure and has gone into
reclaim after fork. There are times the reader also ends up waiting on
oom_lock as well.

 __switch_to+0x1f8/0x350
 __schedule+0x30c/0x990
 schedule+0x48/0xc0
 jbd2_log_wait_commit+0xd4/0x180
 ext4_evict_inode+0x88/0x5c0
 evict+0xf8/0x2a0
 dispose_list+0x50/0x80
 prune_icache_sb+0x6c/0x90
 super_cache_scan+0x190/0x210
 shrink_slab.part.15+0x22c/0x4c0
 shrink_zone+0x288/0x3c0
 do_try_to_free_pages+0x1dc/0x590
 try_to_free_pages+0xdc/0x260
 __alloc_pages_nodemask+0x72c/0xc90
 alloc_pages_current+0xb4/0x1a0
 page_table_alloc+0xc0/0x170
 __pte_alloc+0x58/0x1f0
 copy_page_range+0x4ec/0x950
 copy_process.isra.5+0x15a0/0x1870
 _do_fork+0xa8/0x4b0
 ppc_clone+0x8/0xc

In the meanwhile, all processes exiting/forking are blocked almost
stalling the system.

This patch moves the threadgroup_change_begin from before
cgroup_fork() to just before cgroup_canfork().  There is no nee to
worry about threadgroup changes till the task is actually added to the
threadgroup.  This avoids having to call reclaim with
cgroup_threadgroup_rwsem held.

tj: Subject and description edits.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Zefan Li <lizefan@huawei.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Tejun Heo <tj@kernel.org>
[jstultz: Cherry-picked from:
 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git 568ac888215c7f]
Change-Id: Ie8ece84fb613cf6a7b08cea1468473a8df2b9661
Signed-off-by: John Stultz <john.stultz@linaro.org>
Git-commit: e91f1799ff
Git-repo: https://android.googlesource.com/kernel/common/+/android-4.4
Signed-off-by: Omprakash Dhyade <odhyade@codeaurora.org>
2016-08-29 14:19:13 -07:00
Liam Mark
21f86651a6 android/lowmemorykiller: Ignore tasks with freed mm
A killed task can stay in the task list long after its
memory has been returned to the system, therefore
ignore any tasks whose mm struct has been freed.

Change-Id: I76394b203b4ab2312437c839976f0ecb7b6dde4e
CRs-fixed: 450383
Signed-off-by: Liam Mark <lmark@codeaurora.org>
2016-04-13 11:09:29 -07:00
Se Wang (Patrick) Oh
dae9a397e1 kernel: fork: Call KASan alloc before release the thread info pages
the pages allocated for thread info is used for stack. KAsan marks
some stack memory region for guarding area and the bitmasks for
that region are not cleared until the pages are freed. When
CONFIG_PAGE_POISONING is enabled, as the pages still have special
bitmasks, a out of bound access KASan report arises during pages
poisoning. So mark the pages as alloc status before poisoning the
pages.
==================================================================
BUG: KASan: out of bounds on stack in memset+0x24/0x44 at addr ffffffc0b8e3f000
Write of size 4096 by task swapper/0/0
page:ffffffbacc38e760 count:0 mapcount:0 mapping:          (null) index:0x0
flags: 0x4000000000000000()
page dumped because: kasan: bad access detected
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W      3.18.0-g5a4a5d5-07244-g488682c-dirty #12
Hardware name: Qualcomm Technologies, Inc. MSM 8996 v2.0 LiQUID (DT)
Call trace:
[<ffffffc00008c010>] dump_backtrace+0x0/0x250
[<ffffffc00008c270>] show_stack+0x10/0x1c
[<ffffffc001b6f9e4>] dump_stack+0x74/0xfc
[<ffffffc0002debf4>] kasan_report_error+0x2b0/0x408
[<ffffffc0002dee28>] kasan_report+0x34/0x40
[<ffffffc0002de240>] __asan_storeN+0x15c/0x168
[<ffffffc0002de47c>] memset+0x20/0x44
[<ffffffc0002d77bc>] kernel_map_pages+0x2e8/0x384
[<ffffffc000266458>] free_pages_prepare+0x340/0x3a0
[<ffffffc0002694cc>] __free_pages_ok+0x20/0x12c
[<ffffffc00026a698>] __free_pages+0x34/0x44
[<ffffffc00026abb0>] free_kmem_pages+0x68/0x80
[<ffffffc0000b0424>] free_task+0x80/0xac
[<ffffffc0000b05a8>] __put_task_struct+0x158/0x23c
[<ffffffc0000b9194>] delayed_put_task_struct+0x188/0x1cc
[<ffffffc00018586c>] rcu_process_callbacks+0x6cc/0xbb0
[<ffffffc0000bfdb0>] __do_softirq+0x368/0x750
[<ffffffc0000c0630>] irq_exit+0xd8/0x15c
[<ffffffc00016f610>] __handle_domain_irq+0x108/0x168
[<ffffffc000081af8>] gic_handle_irq+0x50/0xc0
Memory state around the buggy address:
 ffffffc0b8e3f980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffffc0b8e3fa00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 >ffffffc0b8e3fa80: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00
                                            ^
 ffffffc0b8e3fb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffffc0b8e3fb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Change-Id: I90aa1c6e82a0bde58d2d5d68d84e67f932728a88
Signed-off-by: Se Wang (Patrick) Oh <sewango@codeaurora.org>
2016-03-22 11:10:44 -07:00
San Mehat
9d19f72b43 proc: smaps: Allow smaps access for CAP_SYS_RESOURCE
Signed-off-by: San Mehat <san@google.com>
2016-02-16 13:53:50 -08:00
Sebastian Andrzej Siewior
093e5840ae sched/core: Reset task's lockless wake-queues on fork()
In the following commit:

  7675104990 ("sched: Implement lockless wake-queues")

we gained lockless wake-queues.

The -RT kernel managed to lockup itself with those. There could be multiple
attempts for task X to enqueue it for a wakeup _even_ if task X is already
running.

The reason is that task X could be runnable but not yet on CPU. The the
task performing the wakeup did not leave the CPU it could performe
multiple wakeups.

With the proper timming task X could be running and enqueued for a
wakeup. If this happens while X is performing a fork() then its its
child will have a !NULL `wake_q` member copied.

This is not a problem as long as the child task does not participate in
lockless wakeups :)

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 7675104990 ("sched: Implement lockless wake-queues")
Link: http://lkml.kernel.org/r/20151221171710.GA5499@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-06 11:01:07 +01:00
Oleg Nesterov
c9e75f0492 cgroup: pids: fix race between cgroup_post_fork() and cgroup_migrate()
If the new child migrates to another cgroup before cgroup_post_fork() calls
subsys->fork(), then both pids_can_attach() and pids_fork() will do the same
pids_uncharge(old_pids) + pids_charge(pids) sequence twice.

Change copy_process() to call threadgroup_change_begin/threadgroup_change_end
unconditionally. percpu_down_read() is cheap and this allows other cleanups,
see the next changes.

Also, this way we can unify cgroup_threadgroup_rwsem and dup_mmap_sem.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2015-11-30 09:48:18 -05:00
Linus Torvalds
2e3078af2c Merge branch 'akpm' (patches from Andrew)
Merge patch-bomb from Andrew Morton:

 - inotify tweaks

 - some ocfs2 updates (many more are awaiting review)

 - various misc bits

 - kernel/watchdog.c updates

 - Some of mm.  I have a huge number of MM patches this time and quite a
   lot of it is quite difficult and much will be held over to next time.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits)
  selftests: vm: add tests for lock on fault
  mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage
  mm: introduce VM_LOCKONFAULT
  mm: mlock: add new mlock system call
  mm: mlock: refactor mlock, munlock, and munlockall code
  kasan: always taint kernel on report
  mm, slub, kasan: enable user tracking by default with KASAN=y
  kasan: use IS_ALIGNED in memory_is_poisoned_8()
  kasan: Fix a type conversion error
  lib: test_kasan: add some testcases
  kasan: update reference to kasan prototype repo
  kasan: move KASAN_SANITIZE in arch/x86/boot/Makefile
  kasan: various fixes in documentation
  kasan: update log messages
  kasan: accurately determine the type of the bad access
  kasan: update reported bug types for kernel memory accesses
  kasan: update reported bug types for not user nor kernel memory accesses
  mm/kasan: prevent deadlock in kasan reporting
  mm/kasan: don't use kasan shadow pointer in generic functions
  mm/kasan: MODULE_VADDR is not available on all archs
  ...
2015-11-05 23:10:54 -08:00
Eric B Munson
de60f5f10c mm: introduce VM_LOCKONFAULT
The cost of faulting in all memory to be locked can be very high when
working with large mappings.  If only portions of the mapping will be used
this can incur a high penalty for locking.

For the example of a large file, this is the usage pattern for a large
statical language model (probably applies to other statical or graphical
models as well).  For the security example, any application transacting in
data that cannot be swapped out (credit card data, medical records, etc).

This patch introduces the ability to request that pages are not
pre-faulted, but are placed on the unevictable LRU when they are finally
faulted in.  The VM_LOCKONFAULT flag will be used together with VM_LOCKED
and has no effect when set without VM_LOCKED.  Setting the VM_LOCKONFAULT
flag for a VMA will cause pages faulted into that VMA to be added to the
unevictable LRU when they are faulted or if they are already present, but
will not cause any missing pages to be faulted in.

Exposing this new lock state means that we cannot overload the meaning of
the FOLL_POPULATE flag any longer.  Prior to this patch it was used to
mean that the VMA for a fault was locked.  This means we need the new
FOLL_MLOCK flag to communicate the locked state of a VMA.  FOLL_POPULATE
will now only control if the VMA should be populated and in the case of
VM_LOCKONFAULT, it will not be set.

Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Linus Torvalds
69234acee5 Merge branch 'for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "The cgroup core saw several significant updates this cycle:

   - percpu_rwsem for threadgroup locking is reinstated.  This was
     temporarily dropped due to down_write latency issues.  Oleg's
     rework of percpu_rwsem which is scheduled to be merged in this
     merge window resolves the issue.

   - On the v2 hierarchy, when controllers are enabled and disabled, all
     operations are atomic and can fail and revert cleanly.  This allows
     ->can_attach() failure which is necessary for cpu RT slices.

   - Tasks now stay associated with the original cgroups after exit
     until released.  This allows tracking resources held by zombies
     (e.g.  pids) and makes it easy to find out where zombies came from
     on the v2 hierarchy.  The pids controller was broken before these
     changes as zombies escaped the limits; unfortunately, updating this
     behavior required too many invasive changes and I don't think it's
     a good idea to backport them, so the pids controller on 4.3, the
     first version which included the pids controller, will stay broken
     at least until I'm sure about the cgroup core changes.

   - Optimization of a couple common tests using static_key"

* 'for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (38 commits)
  cgroup: fix race condition around termination check in css_task_iter_next()
  blkcg: don't create "io.stat" on the root cgroup
  cgroup: drop cgroup__DEVEL__legacy_files_on_dfl
  cgroup: replace error handling in cgroup_init() with WARN_ON()s
  cgroup: add cgroup_subsys->free() method and use it to fix pids controller
  cgroup: keep zombies associated with their original cgroups
  cgroup: make css_set_rwsem a spinlock and rename it to css_set_lock
  cgroup: don't hold css_set_rwsem across css task iteration
  cgroup: reorganize css_task_iter functions
  cgroup: factor out css_set_move_task()
  cgroup: keep css_set and task lists in chronological order
  cgroup: make cgroup_destroy_locked() test cgroup_is_populated()
  cgroup: make css_sets pin the associated cgroups
  cgroup: relocate cgroup_[try]get/put()
  cgroup: move check_for_release() invocation
  cgroup: replace cgroup_has_tasks() with cgroup_is_populated()
  cgroup: make cgroup->nr_populated count the number of populated css_sets
  cgroup: remove an unused parameter from cgroup_task_migrate()
  cgroup: fix too early usage of static_branch_disable()
  cgroup: make cgroup_update_dfl_csses() migrate all target processes atomically
  ...
2015-11-05 14:51:32 -08:00
Tejun Heo
2e91fa7f6d cgroup: keep zombies associated with their original cgroups
cgroup_exit() is called when a task exits and disassociates the
exiting task from its cgroups and half-attach it to the root cgroup.
This is unnecessary and undesirable.

No controller actually needs an exiting task to be disassociated with
non-root cgroups.  Both cpu and perf_event controllers update the
association to the root cgroup from their exit callbacks just to keep
consistent with the cgroup core behavior.

Also, this disassociation makes it difficult to track resources held
by zombies or determine where the zombies came from.  Currently, pids
controller is completely broken as it uncharges on exit and zombies
always escape the resource restriction.  With cgroup association being
reset on exit, fixing it is pretty painful.

There's no reason to reset cgroup membership on exit.  The zombie can
be removed from its css_set so that it doesn't show up on
"cgroup.procs" and thus can't be migrated or interfere with cgroup
removal.  It can still pin and point to the css_set so that its cgroup
membership is maintained.  This patch makes cgroup core keep zombies
associated with their cgroups at the time of exit.

* Previous patches decoupled populated_cnt tracking from css_set
  lifetime, so a dying task can be simply unlinked from its css_set
  while pinning and pointing to the css_set.  This keeps css_set
  association from task side alive while hiding it from "cgroup.procs"
  and populated_cnt tracking.  The css_set reference is dropped when
  the task_struct is freed.

* ->exit() callback no longer needs the css arguments as the
  associated css never changes once PF_EXITING is set.  Removed.

* cpu and perf_events controllers no longer need ->exit() callbacks.
  There's no reason to explicitly switch away on exit.  The final
  schedule out is enough.  The callbacks are removed.

* On traditional hierarchies, nothing changes.  "/proc/PID/cgroup"
  still reports "/" for all zombies.  On the default hierarchy,
  "/proc/PID/cgroup" keeps reporting the cgroup that the task belonged
  to at the time of exit.  If the cgroup gets removed before the task
  is reaped, " (deleted)" is appended.

v2: Build brekage due to missing dummy cgroup_free() when
    !CONFIG_CGROUP fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
2015-10-15 16:41:53 -04:00
Jason Low
d5c373eb56 posix_cpu_timer: Convert cputimer->running to bool
In the next patch in this series, a new field 'checking_timer' will
be added to 'struct thread_group_cputimer'. Both this and the
existing 'running' integer field are just used as boolean values. To
save space in the structure, we can make both of these fields booleans.

This is a preparatory patch to convert the existing running integer
field to a boolean.

Suggested-by: George Spelvin <linux@horizon.com>
Signed-off-by: Jason Low <jason.low2@hp.com>
Reviewed: George Spelvin <linux@horizon.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: hideaki.kimura@hpe.com
Cc: terry.rudd@hpe.com
Cc: scott.norton@hpe.com
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1444849677-29330-4-git-send-email-jason.low2@hp.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-15 11:23:41 +02:00
Tejun Heo
1ed1328792 sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem
Note: This commit was originally committed as d59cfc09c3 but got
      reverted by 0c986253b9 due to the performance regression from
      the percpu_rwsem write down/up operations added to cgroup task
      migration path.  percpu_rwsem changes which alleviate the
      performance issue are pending for v4.4-rc1 merge window.
      Re-apply.

The cgroup side of threadgroup locking uses signal_struct->group_rwsem
to synchronize against threadgroup changes.  This per-process rwsem
adds small overhead to thread creation, exit and exec paths, forces
cgroup code paths to do lock-verify-unlock-retry dance in a couple
places and makes it impossible to atomically perform operations across
multiple processes.

This patch replaces signal_struct->group_rwsem with a global
percpu_rwsem cgroup_threadgroup_rwsem which is cheaper on the reader
side and contained in cgroups proper.  This patch converts one-to-one.

This does make writer side heavier and lower the granularity; however,
cgroup process migration is a fairly cold path, we do want to optimize
thread operations over it and cgroup migration operations don't take
enough time for the lower granularity to matter.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/g/55F8097A.7000206@de.ibm.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
2015-09-16 12:53:17 -04:00
Tejun Heo
0c986253b9 Revert "sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem"
This reverts commit d59cfc09c3.

d59cfc09c3 ("sched, cgroup: replace signal_struct->group_rwsem with
a global percpu_rwsem") and b5ba75b5fc ("cgroup: simplify
threadgroup locking") changed how cgroup synchronizes against task
fork and exits so that it uses global percpu_rwsem instead of
per-process rwsem; unfortunately, the write [un]lock paths of
percpu_rwsem always involve synchronize_rcu_expedited() which turned
out to be too expensive.

Improvements for percpu_rwsem are scheduled to be merged in the coming
v4.4-rc1 merge window which alleviates this issue.  For now, revert
the two commits to restore per-process rwsem.  They will be re-applied
for the v4.4-rc1 merge window.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/g/55F8097A.7000206@de.ibm.com
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org # v4.2+
2015-09-16 11:51:12 -04:00
Andrea Arcangeli
16ba6f811d userfaultfd: add VM_UFFD_MISSING and VM_UFFD_WP
These two flags gets set in vma->vm_flags to tell the VM common code
if the userfaultfd is armed and in which mode (only tracking missing
faults, only tracking wrprotect faults or both). If neither flags is
set it means the userfaultfd is not armed on the vma.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: zhang.zhanghailiang@huawei.com
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Andrea Arcangeli
745f234be1 userfaultfd: add vm_userfaultfd_ctx to the vm_area_struct
This adds the vm_userfaultfd_ctx to the vm_area_struct.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: zhang.zhanghailiang@huawei.com
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Linus Torvalds
8bdc69b764 Merge branch 'for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - a new PIDs controller is added.  It turns out that PIDs are actually
   an independent resource from kmem due to the limited PID space.

 - more core preparations for the v2 interface.  Once cpu side interface
   is settled, it should be ready for lifting the devel mask.
   for-4.3-unified-base was temporarily branched so that other trees
   (block) can pull cgroup core changes that blkcg changes depend on.

 - a non-critical idr_preload usage bug fix.

* 'for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: pids: fix invalid get/put usage
  cgroup: introduce cgroup_subsys->legacy_name
  cgroup: don't print subsystems for the default hierarchy
  cgroup: make cftype->private a unsigned long
  cgroup: export cgrp_dfl_root
  cgroup: define controller file conventions
  cgroup: fix idr_preload usage
  cgroup: add documentation for the PIDs controller
  cgroup: implement the PIDs subsystem
  cgroup: allow a cgroup subsystem to reject a fork
2015-09-02 08:04:23 -07:00
Linus Torvalds
73b6fa8e49 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace updates from Eric Biederman:
 "This finishes up the changes to ensure proc and sysfs do not start
  implementing executable files, as the there are application today that
  are only secure because such files do not exist.

  It akso fixes a long standing misfeature of /proc/<pid>/mountinfo that
  did not show the proper source for files bind mounted from
  /proc/<pid>/ns/*.

  It also straightens out the handling of clone flags related to user
  namespaces, fixing an unnecessary failure of unshare(CLONE_NEWUSER)
  when files such as /proc/<pid>/environ are read while <pid> is calling
  unshare.  This winds up fixing a minor bug in unshare flag handling
  that dates back to the first version of unshare in the kernel.

  Finally, this fixes a minor regression caused by the introduction of
  sysfs_create_mount_point, which broke someone's in house application,
  by restoring the size of /sys/fs/cgroup to 0 bytes.  Apparently that
  application uses the directory size to determine if a tmpfs is mounted
  on /sys/fs/cgroup.

  The bind mount escape fixes are present in Al Viros for-next branch.
  and I expect them to come from there.  The bind mount escape is the
  last of the user namespace related security bugs that I am aware of"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  fs: Set the size of empty dirs to 0.
  userns,pidns: Force thread group sharing, not signal handler sharing.
  unshare: Unsharing a thread does not require unsharing a vm
  nsfs: Add a show_path method to fix mountinfo
  mnt: fs_fully_visible enforce noexec and nosuid  if !SB_I_NOEXEC
  vfs: Commit to never having exectuables on proc and sysfs.
2015-09-01 16:13:25 -07:00
Linus Torvalds
a1d8561172 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The biggest change in this cycle is the rewrite of the main SMP load
  balancing metric: the CPU load/utilization.  The main goal was to make
  the metric more precise and more representative - see the changelog of
  this commit for the gory details:

    9d89c257df ("sched/fair: Rewrite runnable load and utilization average tracking")

  It is done in a way that significantly reduces complexity of the code:

    5 files changed, 249 insertions(+), 494 deletions(-)

  and the performance testing results are encouraging.  Nevertheless we
  need to keep an eye on potential regressions, since this potentially
  affects every SMP workload in existence.

  This work comes from Yuyang Du.

  Other changes:

   - SCHED_DL updates.  (Andrea Parri)

   - Simplify architecture callbacks by removing finish_arch_switch().
     (Peter Zijlstra et al)

   - cputime accounting: guarantee stime + utime == rtime.  (Peter
     Zijlstra)

   - optimize idle CPU wakeups some more - inspired by Facebook server
     loads.  (Mike Galbraith)

   - stop_machine fixes and updates.  (Oleg Nesterov)

   - Introduce the 'trace_sched_waking' tracepoint.  (Peter Zijlstra)

   - sched/numa tweaks.  (Srikar Dronamraju)

   - misc fixes and small cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
  sched/deadline: Fix comment in enqueue_task_dl()
  sched/deadline: Fix comment in push_dl_tasks()
  sched: Change the sched_class::set_cpus_allowed() calling context
  sched: Make sched_class::set_cpus_allowed() unconditional
  sched: Fix a race between __kthread_bind() and sched_setaffinity()
  sched: Ensure a task has a non-normalized vruntime when returning back to CFS
  sched/numa: Fix NUMA_DIRECT topology identification
  tile: Reorganize _switch_to()
  sched, sparc32: Update scheduler comments in copy_thread()
  sched: Remove finish_arch_switch()
  sched, tile: Remove finish_arch_switch
  sched, sh: Fold finish_arch_switch() into switch_to()
  sched, score: Remove finish_arch_switch()
  sched, avr32: Remove finish_arch_switch()
  sched, MIPS: Get rid of finish_arch_switch()
  sched, arm: Remove finish_arch_switch()
  sched/fair: Clean up load average references
  sched/fair: Provide runnable_load_avg back to cfs_rq
  sched/fair: Remove task and group entity load when they are dead
  sched/fair: Init cfs_rq's sched_entity load average
  ...
2015-08-31 20:26:22 -07:00
Eric W. Biederman
faf00da544 userns,pidns: Force thread group sharing, not signal handler sharing.
The code that places signals in signal queues computes the uids, gids,
and pids at the time the signals are enqueued.  Which means that tasks
that share signal queues must be in the same pid and user namespaces.

Sharing signal handlers is fine, but bizarre.

So make the code in fork and userns_install clearer by only testing
for what is functionally necessary.

Also update the comment in unshare about unsharing a user namespace to
be a little more explicit and make a little more sense.

Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-08-12 14:55:28 -05:00
Eric W. Biederman
12c641ab82 unshare: Unsharing a thread does not require unsharing a vm
In the logic in the initial commit of unshare made creating a new
thread group for a process, contingent upon creating a new memory
address space for that process.  That is wrong.  Two separate
processes in different thread groups can share a memory address space
and clone allows creation of such proceses.

This is significant because it was observed that mm_users > 1 does not
mean that a process is multi-threaded, as reading /proc/PID/maps
temporarily increments mm_users, which allows other processes to
(accidentally) interfere with unshare() calls.

Correct the check in check_unshare_flags() to test for
!thread_group_empty() for CLONE_THREAD, CLONE_SIGHAND, and CLONE_VM.
For sighand->count > 1 for CLONE_SIGHAND and CLONE_VM.
For !current_is_single_threaded instead of mm_users > 1 for CLONE_VM.

By using the correct checks in unshare this removes the possibility of
an accidental denial of service attack.

Additionally using the correct checks in unshare ensures that only an
explicit unshare(CLONE_VM) can possibly trigger the slow path of
current_is_single_threaded().  As an explict unshare(CLONE_VM) is
pointless it is not expected there are many applications that make
that call.

Cc: stable@vger.kernel.org
Fixes: b2e0d98705 userns: Implement unshare of the user namespace
Reported-by: Ricky Zhou <rickyz@chromium.org>
Reported-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-08-12 14:54:26 -05:00
Peter Zijlstra
9d7fb04276 sched/cputime: Guarantee stime + utime == rtime
While the current code guarantees monotonicity for stime and utime
independently of one another, it does not guarantee that the sum of
both is equal to the total time we started out with.

This confuses things (and peoples) who look at this sum, like top, and
will report >100% usage followed by a matching period of 0%.

Rework the code to provide both individual monotonicity and a coherent
sum.

Suggested-by: Fredrik Markstrom <fredrik.markstrom@gmail.com>
Reported-by: Fredrik Markstrom <fredrik.markstrom@gmail.com>
Tested-by: Fredrik Markstrom <fredrik.markstrom@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: jason.low2@hp.com
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-03 12:21:21 +02:00
Ingo Molnar
5aaeb5c01c x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86
Don't burden architectures without dynamic task_struct sizing
with the overhead of dynamic sizing.

Also optimize the x86 code a bit by caching task_struct_size.

Acked-and-Tested-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-3-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-18 03:42:51 +02:00
Dave Hansen
0c8c0f03e3 x86/fpu, sched: Dynamically allocate 'struct fpu'
The FPU rewrite removed the dynamic allocations of 'struct fpu'.
But, this potentially wastes massive amounts of memory (2k per
task on systems that do not have AVX-512 for instance).

Instead of having a separate slab, this patch just appends the
space that we need to the 'task_struct' which we dynamically
allocate already.  This saves from doing an extra slab
allocation at fork().

The only real downside here is that we have to stick everything
and the end of the task_struct.  But, I think the
BUILD_BUG_ON()s I stuck in there should keep that from being too
fragile.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-2-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-18 03:42:35 +02:00
Aleksa Sarai
7e47682ea5 cgroup: allow a cgroup subsystem to reject a fork
Add a new cgroup subsystem callback can_fork that conditionally
states whether or not the fork is accepted or rejected by a cgroup
policy. In addition, add a cancel_fork callback so that if an error
occurs later in the forking process, any state modified by can_fork can
be reverted.

Allow for a private opaque pointer to be passed from cgroup_can_fork to
cgroup_post_fork, allowing for the fork state to be stored by each
subsystem separately.

Also add a tagging system for cgroup_subsys.h to allow for CGROUP_<TAG>
enumerations to be be defined and used. In addition, explicitly add a
CGROUP_CANFORK_COUNT macro to make arrays easier to define.

This is in preparation for implementing the pids cgroup subsystem.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2015-07-14 17:29:23 -04:00
Linus Torvalds
bbe179f88d Merge branch 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - threadgroup_lock got reorganized so that its users can pick the
   actual locking mechanism to use.  Its only user - cgroups - is
   updated to use a percpu_rwsem instead of per-process rwsem.

   This makes things a bit lighter on hot paths and allows cgroups to
   perform and fail multi-task (a process) migrations atomically.
   Multi-task migrations are used in several places including the
   unified hierarchy.

 - Delegation rule and documentation added to unified hierarchy.  This
   will likely be the last interface update from the cgroup core side
   for unified hierarchy before lifting the devel mask.

 - Some groundwork for the pids controller which is scheduled to be
   merged in the coming devel cycle.

* 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: add delegation section to unified hierarchy documentation
  cgroup: require write perm on common ancestor when moving processes on the default hierarchy
  cgroup: separate out cgroup_procs_write_permission() from __cgroup_procs_write()
  kernfs: make kernfs_get_inode() public
  MAINTAINERS: add a cgroup core co-maintainer
  cgroup: fix uninitialised iterator in for_each_subsys_which
  cgroup: replace explicit ss_mask checking with for_each_subsys_which
  cgroup: use bitmask to filter for_each_subsys
  cgroup: add seq_file forward declaration for struct cftype
  cgroup: simplify threadgroup locking
  sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem
  sched, cgroup: reorganize threadgroup locking
  cgroup: switch to unsigned long for bitmasks
  cgroup: reorganize include/linux/cgroup.h
  cgroup: separate out include/linux/cgroup-defs.h
  cgroup: fix some comment typos
2015-06-26 19:50:04 -07:00
Josh Triplett
3033f14ab7 clone: support passing tls argument via C rather than pt_regs magic
clone has some of the quirkiest syscall handling in the kernel, with a
pile of special cases, historical curiosities, and architecture-specific
calling conventions.  In particular, clone with CLONE_SETTLS accepts a
parameter "tls" that the C entry point completely ignores and some
assembly entry points overwrite; instead, the low-level arch-specific
code pulls the tls parameter out of the arch-specific register captured
as part of pt_regs on entry to the kernel.  That's a massive hack, and
it makes the arch-specific code only work when called via the specific
existing syscall entry points; because of this hack, any new clone-like
system call would have to accept an identical tls argument in exactly
the same arch-specific position, rather than providing a unified system
call entry point across architectures.

The first patch allows architectures to handle the tls argument via
normal C parameter passing, if they opt in by selecting
HAVE_COPY_THREAD_TLS.  The second patch makes 32-bit and 64-bit x86 opt
into this.

These two patches came out of the clone4 series, which isn't ready for
this merge window, but these first two cleanup patches were entirely
uncontroversial and have acks.  I'd like to go ahead and submit these
two so that other architectures can begin building on top of this and
opting into HAVE_COPY_THREAD_TLS.  However, I'm also happy to wait and
send these through the next merge window (along with v3 of clone4) if
anyone would prefer that.

This patch (of 2):

clone with CLONE_SETTLS accepts an argument to set the thread-local
storage area for the new thread.  sys_clone declares an int argument
tls_val in the appropriate point in the argument list (based on the
various CLONE_BACKWARDS variants), but doesn't actually use or pass along
that argument.  Instead, sys_clone calls do_fork, which calls
copy_process, which calls the arch-specific copy_thread, and copy_thread
pulls the corresponding syscall argument out of the pt_regs captured at
kernel entry (knowing what argument of clone that architecture passes tls
in).

Apart from being awful and inscrutable, that also only works because only
one code path into copy_thread can pass the CLONE_SETTLS flag, and that
code path comes from sys_clone with its architecture-specific
argument-passing order.  This prevents introducing a new version of the
clone system call without propagating the same architecture-specific
position of the tls argument.

However, there's no reason to pull the argument out of pt_regs when
sys_clone could just pass it down via C function call arguments.

Introduce a new CONFIG_HAVE_COPY_THREAD_TLS for architectures to opt into,
and a new copy_thread_tls that accepts the tls parameter as an additional
unsigned long (syscall-argument-sized) argument.  Change sys_clone's tls
argument to an unsigned long (which does not change the ABI), and pass
that down to copy_thread_tls.

Architectures that don't opt into copy_thread_tls will continue to ignore
the C argument to sys_clone in favor of the pt_regs captured at kernel
entry, and thus will be unable to introduce new versions of the clone
syscall.

Patch co-authored by Josh Triplett and Thiago Macieira.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thiago Macieira <thiago.macieira@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:38 -07:00
Tejun Heo
d59cfc09c3 sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem
The cgroup side of threadgroup locking uses signal_struct->group_rwsem
to synchronize against threadgroup changes.  This per-process rwsem
adds small overhead to thread creation, exit and exec paths, forces
cgroup code paths to do lock-verify-unlock-retry dance in a couple
places and makes it impossible to atomically perform operations across
multiple processes.

This patch replaces signal_struct->group_rwsem with a global
percpu_rwsem cgroup_threadgroup_rwsem which is cheaper on the reader
side and contained in cgroups proper.  This patch converts one-to-one.

This does make writer side heavier and lower the granularity; however,
cgroup process migration is a fairly cold path, we do want to optimize
thread operations over it and cgroup migration operations don't take
enough time for the lower granularity to matter.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
2015-05-26 20:35:00 -04:00
David Hildenbrand
8bcbde5480 sched/preempt, mm/fault: Count pagefault_disable() levels in pagefault_disabled
Until now, pagefault_disable()/pagefault_enabled() used the preempt
count to track whether in an environment with pagefaults disabled (can
be queried via in_atomic()).

This patch introduces a separate counter in task_struct to count the
level of pagefault_disable() calls. We'll keep manipulating the preempt
count to retain compatibility to existing pagefault handlers.

It is now possible to verify whether in a pagefault_disable() envionment
by calling pagefault_disabled(). In contrast to in_atomic() it will not
be influenced by preempt_enable()/preempt_disable().

This patch is based on a patch from Ingo Molnar.

Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-2-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 08:39:13 +02:00
Jason Low
1018016c70 sched, timer: Replace spinlocks with atomics in thread_group_cputimer(), to improve scalability
While running a database workload, we found a scalability issue with itimers.

Much of the problem was caused by the thread_group_cputimer spinlock.
Each time we account for group system/user time, we need to obtain a
thread_group_cputimer's spinlock to update the timers. On larger systems
(such as a 16 socket machine), this caused more than 30% of total time
spent trying to obtain this kernel lock to update these group timer stats.

This patch converts the timers to 64-bit atomic variables and use
atomic add to update them without a lock. With this patch, the percent
of total time spent updating thread group cputimer timers was reduced
from 30% down to less than 1%.

Note: On 32-bit systems using the generic 64-bit atomics, this causes
sample_group_cputimer() to take locks 3 times instead of just 1 time.
However, we tested this patch on a 32-bit system ARM system using the
generic atomics and did not find the overhead to be much of an issue.
An explanation for why this isn't an issue is that 32-bit systems usually
have small numbers of CPUs, and cacheline contention from extra spinlocks
called periodically is not really apparent on smaller systems.

Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Waiman Long <Waiman.Long@hp.com>
Link: http://lkml.kernel.org/r/1430251224-5764-4-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08 12:15:31 +02:00
Jason Low
316c1608d1 sched, timer: Convert usages of ACCESS_ONCE() in the scheduler to READ_ONCE()/WRITE_ONCE()
ACCESS_ONCE doesn't work reliably on non-scalar types. This patch removes
the rest of the existing usages of ACCESS_ONCE() in the scheduler, and use
the new READ_ONCE() and WRITE_ONCE() APIs as appropriate.

Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Waiman Long <Waiman.Long@hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1430251224-5764-2-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08 12:11:32 +02:00
Davidlohr Bueso
11163348a2 oprofile: reduce mmap_sem hold for mm->exe_file
sync_buffer() needs the mmap_sem for two distinct operations, both only
occurring upon user context switch handling:

 1) Dealing with the exe_file.

 2) Adding the dcookie data as we need to lookup the vma that
   backs it. This is done via add_sample() and add_data().

This patch isolates 1), for it will no longer need the mmap_sem for
serialization.  However, for now, make of the more standard
get_mm_exe_file(), requiring only holding the mmap_sem to read the value,
and relying on reference counting to make sure that the exe file won't
dissappear underneath us while doing the get dcookie.

As a consequence, for 2) we move the mmap_sem locking into where we really
need it, in lookup_dcookie().  The benefits are twofold: reduce mmap_sem
hold times, and cleaner code.

[akpm@linux-foundation.org: export get_mm_exe_file for arch/x86/oprofile/oprofile.ko]
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Robert Richter <rric@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:11 -04:00
Davidlohr Bueso
6e399cd144 prctl: avoid using mmap_sem for exe_file serialization
Oleg cleverly suggested using xchg() to set the new mm->exe_file instead
of calling set_mm_exe_file() which requires some form of serialization --
mmap_sem in this case.  For archs that do not have atomic rmw instructions
we still fallback to a spinlock alternative, so this should always be
safe.  As such, we only need the mmap_sem for looking up the backing
vm_file, which can be done sharing the lock.  Naturally, this means we
need to manually deal with both the new and old file reference counting,
and we need not worry about the MMF_EXE_FILE_CHANGED bits, which can
probably be deleted in the future anyway.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:07 -04:00
Konstantin Khlebnikov
90f31d0ea8 mm: rcu-protected get_mm_exe_file()
This patch removes mm->mmap_sem from mm->exe_file read side.
Also it kills dup_mm_exe_file() and moves exe_file duplication into
dup_mmap() where both mmap_sems are locked.

[akpm@linux-foundation.org: fix comment typo]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:07 -04:00
Heinrich Schuchardt
16db3d3f11 kernel/sysctl.c: threads-max observe limits
Users can change the maximum number of threads by writing to
/proc/sys/kernel/threads-max.

With the patch the value entered is checked against the same limits that
apply when fork_init is called.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:07 -04:00
Heinrich Schuchardt
ac1b398de1 kernel/fork.c: avoid division by zero
PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
THREAD_SIZE.

E.g.  architecture hexagon may have page size 1M and thread size 4096.
This would lead to a division by zero in the calculation of max_threads.

With 32-bit calculation there is no solution which delivers valid results
for all possible combinations of the parameters.  The code is only called
once.  Hence a 64-bit calculation can be used as solution.

[akpm@linux-foundation.org: use clamp_t(), per Oleg]
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:07 -04:00
Heinrich Schuchardt
ff691f6e03 kernel/fork.c: new function for max_threads
PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
THREAD_SIZE.

E.g.  architecture hexagon may have page size 1M and thread size 4096.
This would lead to a division by zero in the calculation of max_threads.

With this patch the buggy code is moved to a separate function
set_max_threads.  The error is not fixed.

After fixing the problem in a separate patch the new function can be
reused to adjust max_threads after adding or removing memory.

Argument mempages of function fork_init() is removed as totalram_pages is
an exported symbol.

The creation of separate patches for refactoring to a new function and for
fixing the logic was suggested by Ingo Molnar.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:06 -04:00
Jean Delvare
3ea7f5e25e fork_init: update max_threads comment
The comment explaining what value max_threads is set to is outdated.  The
maximum memory consumption ratio for thread structures was 1/2 until
February 2002, then it was briefly changed to 1/16 before being set to 1/8
which we still use today.  The comment was never updated to reflect that
change, it's about time.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:06 -04:00
Michal Hocko
35f71bc0a0 fork: report pid reservation failure properly
copy_process will report any failure in alloc_pid as ENOMEM currently
which is misleading because the pid allocation might fail not only when
the memory is short but also when the pid space is consumed already.

The current man page even mentions this case:

: EAGAIN
:
:       A system-imposed limit on the number of threads was encountered.
:       There are a number of limits that may trigger this error: the
:       RLIMIT_NPROC soft resource limit (set via setrlimit(2)), which
:       limits the number of processes and threads for a real user ID, was
:       reached; the kernel's system-wide limit on the number of processes
:       and threads, /proc/sys/kernel/threads-max, was reached (see
:       proc(5)); or the maximum number of PIDs, /proc/sys/kernel/pid_max,
:       was reached (see proc(5)).

so the current behavior is also incorrect wrt.  documentation.  POSIX man
page also suggest returing EAGAIN when the process count limit is reached.

This patch simply propagates error code from alloc_pid and makes sure we
return -EAGAIN due to reservation failure.  This will make behavior of
fork closer to both our documentation and POSIX.

alloc_pid might alsoo fail when the reaper in the pid namespace is dead
(the namespace basically disallows all new processes) and there is no
good error code which would match documented ones. We have traditionally
returned ENOMEM for this case which is misleading as well but as per
Eric W. Biederman this behavior is documented in man pid_namespaces(7)

: If the "init" process of a PID namespace terminates, the kernel
: terminates all of the processes in the namespace via a SIGKILL signal.
: This behavior reflects the fact that the "init" process is essential for
: the correct operation of a PID namespace.  In this case, a subsequent
: fork(2) into this PID namespace will fail with the error ENOMEM; it is
: not possible to create a new processes in a PID namespace whose "init"
: process has terminated.

and introducing a new error code would be too risky so let's stick to
ENOMEM for this case.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:06 -04:00
Richard Weinberger
973f911f55 Remove execution domain support
All users of exec_domain are gone, now we can get rid
of that abandoned feature.
To not break existing userspace we keep a dummy
/proc/execdomains file which will always contain
"0-0     Linux                   [kernel]".

Signed-off-by: Richard Weinberger <richard@nod.at>
2015-04-12 20:58:24 +02:00
Kirill A. Shutemov
2d2f5119b8 mm: do not use mm->nr_pmds on !MMU configurations
mm->nr_pmds doesn't make sense on !MMU configurations

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:10 -08:00
Kirill A. Shutemov
b30fe6c7ce mm: fix false-positive warning on exit due mm_nr_pmds(mm)
The problem is that we check nr_ptes/nr_pmds in exit_mmap() which happens
*before* pgd_free().  And if an arch does pte/pmd allocation in
pgd_alloc() and frees them in pgd_free() we see offset in counters by the
time of the checks.

We tried to workaround this by offsetting expected counter value according
to FIRST_USER_ADDRESS for both nr_pte and nr_pmd in exit_mmap().  But it
doesn't work in some cases:

1. ARM with LPAE enabled also has non-zero USER_PGTABLES_CEILING, but
   upper addresses occupied with huge pmd entries, so the trick with
   offsetting expected counter value will get really ugly: we will have
   to apply it nr_pmds, but not nr_ptes.

2. Metag has non-zero FIRST_USER_ADDRESS, but doesn't do allocation
   pte/pmd page tables allocation in pgd_alloc(), just setup a pgd entry
   which is allocated at boot and shared accross all processes.

The proposal is to move the check to check_mm() which happens *after*
pgd_free() and do proper accounting during pgd_alloc() and pgd_free()
which would bring counters to zero if nothing leaked.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Tyler Baker <tyler.baker@linaro.org>
Tested-by: Tyler Baker <tyler.baker@linaro.org>
Tested-by: Nishanth Menon <nm@ti.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:04 -08:00