evie/android_kernel_oneplus_msm8998 - Gay Catgirls Forgejo: gay catgirls having sex

evie/android_kernel_oneplus_msm8998

Author	SHA1	Message	Date
Yisheng Xie	0eac567d2d	mm/hotplug: enable memory hotplug for non-lru movable pages We had considered all of the non-lru pages as unmovable before commit bda807d44454 ("mm: migrate: support non-lru movable page migration"). But now some of non-lru pages like zsmalloc, virtio-balloon pages also become movable. So we can offline such blocks by using non-lru page migration. This patch straightforwardly adds non-lru migration code, which means adding non-lru related code to the functions which scan over pfn and collect pages to be migrated and isolate them before migration. Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Yisheng Xie <xieyisheng1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Git-commit: 0efadf48bca01f17cb64ebceaf528590b2bc7665 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Change-Id: Ieebf8f7965b42866ba6a25e18c5cf18d7f6a997b Signed-off-by: Arun KS <arunks@codeaurora.org>	2018-02-22 17:51:21 +05:30
Srinivasarao P	dd4f1e35fa	Merge android-4.4.106 (`2fea039`) into msm-4.4 * refs/heads/tmp-2fea039 Linux 4.4.106 usb: gadget: ffs: Forbid usb_ep_alloc_request from sleeping arm: KVM: Fix VTTBR_BADDR_MASK BUG_ON off-by-one Revert "x86/mm/pat: Ensure cpa->pfn only contains page frame numbers" Revert "x86/efi: Hoist page table switching code into efi_call_virt()" Revert "x86/efi: Build our own page table structures" net/packet: fix a race in packet_bind() and packet_notifier() packet: fix crash in fanout_demux_rollover() sit: update frag_off info rds: Fix NULL pointer dereference in __rds_rdma_map tipc: fix memory leak in tipc_accept_from_sock() more bio_map_user_iov() leak fixes s390: always save and restore all registers on context switch ipmi: Stop timers before cleaning up the module audit: ensure that 'audit=1' actually enables audit for PID 1 ipvlan: fix ipv6 outbound device afs: Connect up the CB.ProbeUuid IB/mlx5: Assign send CQ and recv CQ of UMR QP IB/mlx4: Increase maximal message size under UD QP xfrm: Copy policy family in clone_policy jump_label: Invoke jump_label_test() via early_initcall() atm: horizon: Fix irq release error sctp: use the right sk after waking up from wait_buf sleep sctp: do not free asoc when it is already dead in sctp_sendmsg sparc64/mm: set fields in deferred pages block: wake up all tasks blocked in get_request() sunrpc: Fix rpc_task_begin trace point NFS: Fix a typo in nfs_rename() dynamic-debug-howto: fix optional/omitted ending line number to be LARGE instead of 0 lib/genalloc.c: make the avail variable an atomic_long_t route: update fnhe_expires for redirect when the fnhe exists route: also update fnhe_genid when updating a route cache mac80211_hwsim: Fix memory leak in hwsim_new_radio_nl() kbuild: pkg: use --transform option to prefix paths in tar EDAC, i5000, i5400: Fix definition of NRECMEMB register EDAC, i5000, i5400: Fix use of MTR_DRAM_WIDTH macro powerpc/powernv/ioda2: Gracefully fail if too many TCE levels requested drm/amd/amdgpu: fix console deadlock if late init failed axonram: Fix gendisk handling netfilter: don't track fragmented packets zram: set physical queue limits to avoid array out of bounds accesses i2c: riic: fix restart condition crypto: s5p-sss - Fix completing crypto request in IRQ handler ipv6: reorder icmpv6_init() and ip6_mr_init() bnx2x: do not rollback VF MAC/VLAN filters we did not configure bnx2x: fix possible overrun of VFPF multicast addresses array bnx2x: prevent crash when accessing PTP with interface down spi_ks8995: fix "BUG: key accdaa28 not in .data!" arm64: KVM: Survive unknown traps from guests arm: KVM: Survive unknown traps from guests KVM: nVMX: reset nested_run_pending if the vCPU is going to be reset irqchip/crossbar: Fix incorrect type of register size scsi: lpfc: Fix crash during Hardware error recovery on SLI3 adapters workqueue: trigger WARN if queue_delayed_work() is called with NULL @wq libata: drop WARN from protocol error in ata_sff_qc_issue() kvm: nVMX: VMCLEAR should not cause the vCPU to shut down USB: gadgetfs: Fix a potential memory leak in 'dev_config()' usb: gadget: configs: plug memory leak HID: chicony: Add support for another ASUS Zen AiO keyboard gpio: altera: Use handle_level_irq when configured as a level_high ARM: OMAP2+: Release device node after it is no longer needed. ARM: OMAP2+: Fix device node reference counts module: set __jump_table alignment to 8 selftest/powerpc: Fix false failures for skipped tests x86/hpet: Prevent might sleep splat on resume ARM: OMAP2+: gpmc-onenand: propagate error on initialization failure vti6: Don't report path MTU below IPV6_MIN_MTU. Revert "s390/kbuild: enable modversions for symbols exported from asm" Revert "spi: SPI_FSL_DSPI should depend on HAS_DMA" Revert "drm/armada: Fix compile fail" mm: drop unused pmdp_huge_get_and_clear_notify() thp: fix MADV_DONTNEED vs. numa balancing race thp: reduce indentation level in change_huge_pmd() scsi: storvsc: Workaround for virtual DVD SCSI version ARM: avoid faulting on qemu ARM: BUG if jumping to usermode address in kernel mode arm64: fpsimd: Prevent registers leaking from dead tasks KVM: VMX: remove I/O port 0x80 bypass on Intel hosts arm64: KVM: fix VTTBR_BADDR_MASK BUG_ON off-by-one media: dvb: i2c transfers over usb cannot be done from stack drm/exynos: gem: Drop NONCONTIG flag for buffers allocated without IOMMU drm: extra printk() wrapper macros kdb: Fix handling of kallsyms_symbol_next() return value s390: fix compat system call table iommu/vt-d: Fix scatterlist offset handling ALSA: usb-audio: Add check return value for usb_string() ALSA: usb-audio: Fix out-of-bound error ALSA: seq: Remove spurious WARN_ON() at timer check ALSA: pcm: prevent UAF in snd_pcm_info x86/PCI: Make broadcom_postcore_init() check acpi_disabled X.509: reject invalid BIT STRING for subjectPublicKey ASN.1: check for error from ASN1_OP_END__ACT actions ASN.1: fix out-of-bounds read when parsing indefinite length item efi: Move some sysfs files to be read-only by root scsi: libsas: align sata_device's rps_resp on a cacheline isa: Prevent NULL dereference in isa_bus driver callbacks hv: kvp: Avoid reading past allocated blocks from KVP file virtio: release virtio index when fail to device_register can: usb_8dev: cancel urb on -EPIPE and -EPROTO can: esd_usb2: cancel urb on -EPIPE and -EPROTO can: ems_usb: cancel urb on -EPIPE and -EPROTO can: kvaser_usb: cancel urb on -EPIPE and -EPROTO can: kvaser_usb: ratelimit errors if incomplete messages are received can: kvaser_usb: Fix comparison bug in kvaser_usb_read_bulk_callback() can: kvaser_usb: free buf in error paths can: ti_hecc: Fix napi poll return value for repoll BACKPORT: irq: Make the irqentry text section unconditional UPSTREAM: arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections UPSTREAM: x86, kasan, ftrace: Put APIC interrupt handlers into .irqentry.text UPSTREAM: kasan: make get_wild_bug_type() static UPSTREAM: kasan: separate report parts by empty lines UPSTREAM: kasan: improve double-free report format UPSTREAM: kasan: print page description after stacks UPSTREAM: kasan: improve slab object description UPSTREAM: kasan: change report header UPSTREAM: kasan: simplify address description logic UPSTREAM: kasan: change allocation and freeing stack traces headers UPSTREAM: kasan: unify report headers UPSTREAM: kasan: introduce helper functions for determining bug type BACKPORT: kasan: report only the first error by default UPSTREAM: kasan: fix races in quarantine_remove_cache() UPSTREAM: kasan: resched in quarantine_remove_cache() BACKPORT: kasan, sched/headers: Uninline kasan_enable/disable_current() BACKPORT: kasan: drain quarantine of memcg slab objects UPSTREAM: kasan: eliminate long stalls during quarantine reduction UPSTREAM: kasan: support panic_on_warn UPSTREAM: x86/suspend: fix false positive KASAN warning on suspend/resume UPSTREAM: kasan: support use-after-scope detection UPSTREAM: kasan/tests: add tests for user memory access functions UPSTREAM: mm, kasan: add a ksize() test UPSTREAM: kasan: test fix: warn if the UAF could not be detected in kmalloc_uaf2 UPSTREAM: kasan: modify kmalloc_large_oob_right(), add kmalloc_pagealloc_oob_right() UPSTREAM: lib/stackdepot: export save/fetch stack for drivers UPSTREAM: lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB BACKPORT: kprobes: Unpoison stack in jprobe_return() for KASAN UPSTREAM: kasan: remove the unnecessary WARN_ONCE from quarantine.c UPSTREAM: kasan: avoid overflowing quarantine size on low memory systems UPSTREAM: kasan: improve double-free reports BACKPORT: mm: coalesce split strings BACKPORT: mm/kasan: get rid of ->state in struct kasan_alloc_meta UPSTREAM: mm/kasan: get rid of ->alloc_size in struct kasan_alloc_meta UPSTREAM: mm: kasan: remove unused 'reserved' field from struct kasan_alloc_meta UPSTREAM: mm/kasan, slub: don't disable interrupts when object leaves quarantine UPSTREAM: mm/kasan: don't reduce quarantine in atomic contexts UPSTREAM: mm/kasan: fix corruptions and false positive reports UPSTREAM: lib/stackdepot.c: use __GFP_NOWARN for stack allocations BACKPORT: mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB UPSTREAM: kasan/quarantine: fix bugs on qlist_move_cache() UPSTREAM: mm: mempool: kasan: don't poot mempool objects in quarantine UPSTREAM: kasan: change memory hot-add error messages to info messages BACKPORT: mm/kasan: add API to check memory regions UPSTREAM: mm/kasan: print name of mem[set,cpy,move]() caller in report UPSTREAM: mm: kasan: initial memory quarantine implementation UPSTREAM: lib/stackdepot: avoid to return 0 handle UPSTREAM: lib/stackdepot.c: allow the stack trace hash to be zero UPSTREAM: mm, kasan: fix compilation for CONFIG_SLAB BACKPORT: mm, kasan: stackdepot implementation. Enable stackdepot for SLAB BACKPORT: mm, kasan: add GFP flags to KASAN API UPSTREAM: mm, kasan: SLAB support UPSTREAM: mm/slab: align cache size first before determination of OFF_SLAB candidate UPSTREAM: mm/slab: use more appropriate condition check for debug_pagealloc UPSTREAM: mm/slab: factor out debugging initialization in cache_init_objs() UPSTREAM: mm/slab: remove object status buffer for DEBUG_SLAB_LEAK UPSTREAM: mm/slab: alternative implementation for DEBUG_SLAB_LEAK UPSTREAM: mm/slab: clean up DEBUG_PAGEALLOC processing code UPSTREAM: mm/slab: activate debug_pagealloc in SLAB when it is actually enabled sched: EAS/WALT: Don't take into account of running task's util BACKPORT: schedutil: Reset cached freq if it is not in sync with next_freq UPSTREAM: kasan: add functions to clear stack poison Conflicts: arch/arm/include/asm/kvm_arm.h arch/arm64/kernel/vmlinux.lds.S include/linux/kasan.h kernel/softirq.c lib/Kconfig lib/Kconfig.kasan lib/Makefile lib/stackdepot.c mm/kasan/kasan.c sound/usb/mixer.c Change-Id: If70ced6da5f19be3dd92d10a8d8cd4d5841e5870 Signed-off-by: Srinivasarao P <spathi@codeaurora.org>	2018-01-18 12:45:07 +05:30
Joe Perches	1a44264ae8	BACKPORT: mm: coalesce split strings Kernel style prefers a single string over split strings when the string is 'user-visible'. Miscellanea: - Add a missing newline - Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Tejun Heo <tj@kernel.org> [percpu] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Bug: 64145065 (cherry-picked from 756a025f00091918d9d09ca3229defb160b409c0) Change-Id: I377fb1542980c15d2f306924656227ad17b02b5e Signed-off-by: Paul Lawrence <paullawrence@google.com>	2017-12-14 08:21:34 -08:00
Arun KS	ea4e70e7d6	arm64: Honor limits set by bootloader Introduce a varible to save bootloader enforced memory limits and restricts adding beyond this boundary during a memory hotplug. Change-Id: I28c100644b7287ec4625c4c018b5fffc865e2e72 Signed-off-by: Arun KS <arunks@codeaurora.org>	2017-11-22 17:12:55 +05:30
Runmin Wang	78cbd38fd5	Merge tag 'lsk-v4.4-17.02-android' into branch 'msm-4.4' * refs/heads/tmp-26c8156: Linux 4.4.49 drm/i915: fix use-after-free in page_flip_completed() ALSA: seq: Don't handle loop timeout at snd_seq_pool_done() ALSA: seq: Fix race at creating a queue xen-netfront: Delete rx_refill_timer in xennet_disconnect_backend() scsi: mpt3sas: disable ASPM for MPI2 controllers scsi: aacraid: Fix INTx/MSI-x issue with older controllers scsi: zfcp: fix use-after-free by not tracing WKA port open/close on failed send netvsc: Set maximum GSO size in the right place mac80211: Fix adding of mesh vendor IEs ARM: 8642/1: LPAE: catch pending imprecise abort on unmask target: Fix COMPARE_AND_WRITE ref leak for non GOOD status target: Fix early transport_generic_handle_tmr abort scenario target: Use correct SCSI status during EXTENDED_COPY exception target: Don't BUG_ON during NodeACL dynamic -> explicit conversion ARM: 8643/3: arm/ptrace: Preserve previous registers for short regset write hns: avoid stack overflow with CONFIG_KASAN cpumask: use nr_cpumask_bits for parsing functions Revert "x86/ioapic: Restore IO-APIC irq_chip retrigger callback" selinux: fix off-by-one in setprocattr ARC: [arcompact] brown paper bag bug in unaligned access delay slot fixup Linux 4.4.48 base/memory, hotplug: fix a kernel oops in show_valid_zones() x86/irq: Make irq activate operations symmetric USB: serial: option: add device ID for HP lt2523 (Novatel E371) usb: gadget: f_fs: Assorted buffer overflow checks. USB: Add quirk for WORLDE easykey.25 MIDI keyboard USB: serial: pl2303: add ATEN device ID USB: serial: qcserial: add Dell DW5570 QDL KVM: x86: do not save guest-unsupported XSAVE state HID: wacom: Fix poor prox handling in 'wacom_pl_irq' percpu-refcount: fix reference leak during percpu-atomic transition mmc: sdhci: Ignore unexpected CARD_INT interrupts can: bcm: fix hrtimer/tasklet termination in bcm op removal mm, fs: check for fatal signals in do_generic_file_read() mm/memory_hotplug.c: check start_pfn in test_pages_in_a_zone() cifs: initialize file_info_lock zswap: disable changing params if init fails svcrpc: fix oops in absence of krb5 module NFSD: Fix a null reference case in find_or_create_lock_stateid() powerpc: Add missing error check to prom_find_boot_cpu() powerpc/eeh: Fix wrong flag passed to eeh_unfreeze_pe() libata: apply MAX_SEC_1024 to all CX1-JB-HP devices ata: sata_mv:- Handle return value of devm_ioremap. perf/core: Fix PERF_RECORD_MMAP2 prot/flags for anonymous memory crypto: arm64/aes-blk - honour iv_out requirement in CBC and CTR modes crypto: api - Clear CRYPTO_ALG_DEAD bit before registering an alg drm/nouveau/nv1a,nv1f/disp: fix memory clock rate retrieval drm/nouveau/disp/gt215: Fix HDA ELD handling (thus, HDMI audio) on gt215 ext4: validate s_first_meta_bg at mount time PCI/ASPM: Handle PCI-to-PCIe bridges as roots of PCIe hierarchies ANDROID: security: export security_path_chown() Linux 4.4.47 net: dsa: Bring back device detaching in dsa_slave_suspend() qmi_wwan/cdc_ether: add device ID for HP lt2523 (Novatel E371) WWAN card af_unix: move unix_mknod() out of bindlock r8152: don't execute runtime suspend if the tx is not empty bridge: netlink: call br_changelink() during br_dev_newlink() tcp: initialize max window for a new fastopen socket ipv6: addrconf: Avoid addrconf_disable_change() using RCU read-side lock net: phy: bcm63xx: Utilize correct config_intr function net: fix harmonize_features() vs NETIF_F_HIGHDMA ax25: Fix segfault after sock connection timeout ravb: do not use zero-length alignment DMA descriptor openvswitch: maintain correct checksum state in conntrack actions tcp: fix tcp_fastopen unaligned access complaints on sparc net: systemport: Decouple flow control from __bcm_sysport_tx_reclaim net: ipv4: fix table id in getroute response net: lwtunnel: Handle lwtunnel_fill_encap failure mlxsw: pci: Fix EQE structure definition mlxsw: switchx2: Fix memory leak at skb reallocation mlxsw: spectrum: Fix memory leak at skb reallocation r8152: fix the sw rx checksum is unavailable ANDROID: sdcardfs: Switch strcasecmp for internal call ANDROID: sdcardfs: switch to full_name_hash and qstr ANDROID: sdcardfs: Add GID Derivation to sdcardfs ANDROID: sdcardfs: Remove redundant operation ANDROID: sdcardfs: add support for user permission isolation ANDROID: sdcardfs: Refactor configfs interface ANDROID: sdcardfs: Allow non-owners to touch ANDROID: binder: fix format specifier for type binder_size_t ANDROID: fs: Export vfs_rmdir2 ANDROID: fs: Export free_fs_struct and set_fs_pwd ANDROID: mnt: remount should propagate to slaves of slaves ANDROID: sdcardfs: Switch ->d_inode to d_inode() ANDROID: sdcardfs: Fix locking issue with permision fix up ANDROID: sdcardfs: Change magic value ANDROID: sdcardfs: Use per mount permissions ANDROID: sdcardfs: Add gid and mask to private mount data ANDROID: sdcardfs: User new permission2 functions ANDROID: vfs: Add setattr2 for filesystems with per mount permissions ANDROID: vfs: Add permission2 for filesystems with per mount permissions ANDROID: vfs: Allow filesystems to access their private mount data ANDROID: mnt: Add filesystem private data to mount points ANDROID: sdcardfs: Move directory unlock before touch ANDROID: sdcardfs: fix external storage exporting incorrect uid ANDROID: sdcardfs: Added top to sdcardfs_inode_info ANDROID: sdcardfs: Switch package list to RCU ANDROID: sdcardfs: Fix locking for permission fix up ANDROID: sdcardfs: Check for other cases on path lookup ANDROID: sdcardfs: override umask on mkdir and create Linux 4.4.46 mm, memcg: do not retry precharge charges platform/x86: intel_mid_powerbtn: Set IRQ_ONESHOT pinctrl: broxton: Use correct PADCFGLOCK offset s5k4ecgx: select CRC32 helper IB/umem: Release pid in error and ODP flow IB/ipoib: move back IB LL address into the hard header drm/i915: Don't leak edid in intel_crt_detect_ddc() SUNRPC: cleanup ida information when removing sunrpc module NFSv4.0: always send mode in SETATTR after EXCLUSIVE4 nfs: Don't increment lock sequence ID after NFS4ERR_MOVED parisc: Don't use BITS_PER_LONG in userspace-exported swab.h header ARC: [arcompact] handle unaligned access delay slot corner case ARC: udelay: fix inline assembler by adding LP_COUNT to clobber list can: ti_hecc: add missing prepare and unprepare of the clock can: c_can_pci: fix null-pointer-deref in c_can_start() - set device pointer s390/ptrace: Preserve previous registers for short regset write RDMA/cma: Fix unknown symbol when CONFIG_IPV6 is not enabled ISDN: eicon: silence misleading array-bounds warning sysctl: fix proc_doulongvec_ms_jiffies_minmax() mm/mempolicy.c: do not put mempolicy before using its nodemask drm: Fix broken VT switch with video=1366x768 option tile/ptrace: Preserve previous registers for short regset write fbdev: color map copying bounds checking Linux 4.4.45 arm64: avoid returning from bad_mode selftest/powerpc: Wrong PMC initialized in pmc56_overflow test dmaengine: pl330: Fix runtime PM support for terminated transfers ite-cir: initialize use_demodulator before using it blackfin: check devm_pinctrl_get() for errors ARM: 8613/1: Fix the uaccess crash on PB11MPCore ARM: ux500: fix prcmu_is_cpu_in_wfi() calculation ARM: dts: imx6qdl-nitrogen6_max: fix sgtl5000 pinctrl init arm64/ptrace: Reject attempts to set incomplete hardware breakpoint fields arm64/ptrace: Avoid uninitialised struct padding in fpr_set() arm64/ptrace: Preserve previous registers for short regset write - 3 arm64/ptrace: Preserve previous registers for short regset write - 2 arm64/ptrace: Preserve previous registers for short regset write ARM: dts: da850-evm: fix read access to SPI flash ceph: fix bad endianness handling in parse_reply_info_extra ARM: 8634/1: hw_breakpoint: blacklist Scorpion CPUs svcrdma: avoid duplicate dma unmapping during error recovery clocksource/exynos_mct: Clear interrupt when cpu is shut down ubifs: Fix journal replay wrt. xattr nodes qla2xxx: Fix crash due to null pointer access x86/ioapic: Restore IO-APIC irq_chip retrigger callback mtd: nand: xway: disable module support ieee802154: atusb: do not use the stack for buffers to make them DMA able mmc: mxs-mmc: Fix additional cycles after transmission stop HID: corsair: fix control-transfer error handling HID: corsair: fix DMA buffers on stack PCI: Enumerate switches below PCI-to-PCIe bridges fuse: clear FR_PENDING flag when moving requests out of pending queue svcrpc: don't leak contexts on PROC_DESTROY x86/PCI: Ignore _CRS on Supermicro X8DTH-i/6/iF/6F tmpfs: clear S_ISGID when setting posix ACLs ARM: dts: imx31: fix AVIC base address ARM: dts: imx31: move CCM device node to AIPS2 bus devices ARM: dts: imx31: fix clock control module interrupts description perf scripting: Avoid leaking the scripting_context variable IB/IPoIB: Remove can't use GFP_NOIO warning IB/mlx4: When no DMFS for IPoIB, don't allow NET_IF QPs IB/mlx4: Fix port query for 56Gb Ethernet links IB/mlx4: Fix out-of-range array index in destroy qp flow IB/mlx4: Set traffic class in AH IB/mlx5: Wait for all async command completions to complete ftrace/x86: Set ftrace_stub to weak to prevent gcc from using short jumps to it Linux 4.4.44 pinctrl: sh-pfc: Do not unconditionally support PIN_CONFIG_BIAS_DISABLE powerpc/ibmebus: Fix device reference leaks in sysfs interface powerpc/ibmebus: Fix further device reference leaks bus: vexpress-config: fix device reference leak blk-mq: Always schedule hctx->next_cpu ACPI / APEI: Fix NMI notification handling block: cfq_cpd_alloc() should use @gfp cpufreq: powernv: Disable preemption while checking CPU throttling state NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success. NFS: Fix a performance regression in readdir pNFS: Fix race in pnfs_wait_on_layoutreturn pinctrl: meson: fix gpio request disabling other modes btrfs: fix error handling when run_delayed_extent_op fails btrfs: fix locking when we put back a delayed ref that's too new x86/cpu: Fix bootup crashes by sanitizing the argument of the 'clearcpuid=' command-line option USB: serial: ch341: fix modem-control and B0 handling USB: serial: ch341: fix resume after reset drm/radeon: drop verde dpm quirks sysctl: Drop reference added by grab_header in proc_sys_readdir sysrq: attach sysrq handler correctly for 32-bit kernel tty/serial: atmel_serial: BUG: stop DMA from transmitting in stop_tx mnt: Protect the mountpoint hashtable with mount_lock vme: Fix wrong pointer utilization in ca91cx42_slave_get xhci: fix deadlock at host remove by running watchdog correctly i2c: fix kernel memory disclosure in dev interface i2c: print correct device invalid address Input: elants_i2c - avoid divide by 0 errors on bad touchscreen data USB: serial: ch341: fix open and resume after B0 USB: serial: ch341: fix control-message error handling USB: serial: ch341: fix open error handling USB: serial: ch341: fix initial modem-control state USB: serial: kl5kusb105: fix line-state error handling nl80211: fix sched scan netlink socket owner destruction KVM: x86: Introduce segmented_write_std KVM: x86: emulate FXSAVE and FXRSTOR KVM: x86: add asm_safe wrapper KVM: x86: add Align16 instruction flag KVM: x86: flush pending lapic jump label updates on module unload jump_labels: API for flushing deferred jump label updates KVM: eventfd: fix NULL deref irqbypass consumer KVM: x86: fix emulation of "MOV SS, null selector" mm/hugetlb.c: fix reservation race when freeing surplus pages ocfs2: fix crash caused by stale lvb with fsdlm plugin mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done} selftests: do not require bash for the generated test selftests: do not require bash to run netsocktests testcase Input: i8042 - add Pegatron touchpad to noloop table Input: xpad - use correct product id for x360w controllers DEBUG: sched/fair: Fix sched_load_avg_cpu events for task_groups DEBUG: sched/fair: Fix missing sched_load_avg_cpu events net: socket: don't set sk_uid to garbage value in ->setattr() ANDROID: configs: CONFIG_ARM64_SW_TTBR0_PAN=y UPSTREAM: arm64: Disable PAN on uaccess_enable() UPSTREAM: arm64: Enable CONFIG_ARM64_SW_TTBR0_PAN UPSTREAM: arm64: xen: Enable user access before a privcmd hvc call UPSTREAM: arm64: Handle faults caused by inadvertent user access with PAN enabled BACKPORT: arm64: Disable TTBR0_EL1 during normal kernel execution BACKPORT: arm64: Introduce uaccess_{disable,enable} functionality based on TTBR0_EL1 BACKPORT: arm64: Factor out TTBR0_EL1 post-update workaround into a specific asm macro BACKPORT: arm64: Factor out PAN enabling/disabling into separate uaccess_ macros UPSTREAM: arm64: alternative: add auto-nop infrastructure UPSTREAM: arm64: barriers: introduce nops and __nops macros for NOP sequences Revert "FROMLIST: arm64: Factor out PAN enabling/disabling into separate uaccess_* macros" Revert "FROMLIST: arm64: Factor out TTBR0_EL1 post-update workaround into a specific asm macro" Revert "FROMLIST: arm64: Introduce uaccess_{disable,enable} functionality based on TTBR0_EL1" Revert "FROMLIST: arm64: Disable TTBR0_EL1 during normal kernel execution" Revert "FROMLIST: arm64: Handle faults caused by inadvertent user access with PAN enabled" Revert "FROMLIST: arm64: xen: Enable user access before a privcmd hvc call" Revert "FROMLIST: arm64: Enable CONFIG_ARM64_SW_TTBR0_PAN" ANDROID: sched/walt: fix build failure if FAIR_GROUP_SCHED=n Linux 4.4.43 mm/init: fix zone boundary creation ALSA: usb-audio: Add a quirk for Plantronics BT600 spi: mvebu: fix baudrate calculation for armada variant ARM: OMAP4+: Fix bad fallthrough for cpuidle ARM: zynq: Reserve correct amount of non-DMA RAM powerpc: Fix build warning on 32-bit PPC ALSA: firewire-tascam: Fix to handle error from initialization of stream data HID: hid-cypress: validate length of report net: vrf: do not allow table id 0 net: ipv4: Fix multipath selection with vrf gro: Disable frag0 optimization on IPv6 ext headers gro: use min_t() in skb_gro_reset_offset() gro: Enter slow-path if there is no tailroom r8152: fix rx issue for runtime suspend r8152: split rtl8152_suspend function ipv4: Do not allow MAIN to be alias for new LOCAL w/ custom rules igmp: Make igmp group member RFC 3376 compliant drop_monitor: consider inserted data in genlmsg_end drop_monitor: add missing call to genlmsg_end net/mlx5: Avoid shadowing numa_node net/mlx5: Check FW limitations on log_max_qp before setting it net: stmmac: Fix race between stmmac_drv_probe and stmmac_open net, sched: fix soft lockup in tc_classify ipv6: handle -EFAULT from skb_copy_bits net: vrf: Drop conntrack data after pass through VRF device on Tx ser_gigaset: return -ENOMEM on error instead of success netvsc: reduce maximum GSO size Linux 4.4.42 usb: gadget: composite: always set ep->mult to a sensible value Revert "usb: gadget: composite: always set ep->mult to a sensible value" tick/broadcast: Prevent NULL pointer dereference drm/radeon: Always store CRTC relative radeon_crtc->cursor_x/y values cx23885-dvb: move initialization of a8293_pdata net: vxge: avoid unused function warnings net: ti: cpmac: Fix compiler warning due to type confusion cred/userns: define current_user_ns() as a function staging: comedi: dt282x: tidy up register bit defines powerpc/pci/rpadlpar: Fix device reference leaks md: MD_RECOVERY_NEEDED is set for mddev->recovery crypto: arm64/aes-ce - fix for big endian crypto: arm64/aes-xts-ce: fix for big endian crypto: arm64/sha1-ce - fix for big endian crypto: arm64/aes-neon - fix for big endian crypto: arm64/aes-ccm-ce: fix for big endian crypto: arm/aes-ce - fix for big endian crypto: arm64/ghash-ce - fix for big endian crypto: arm64/sha2-ce - fix for big endian s390/crypto: unlock on error in prng_tdes_read() mmc: mmc_test: Uninitialized return value PM / wakeirq: Fix dedicated wakeirq for drivers not using autosuspend irqchip/bcm7038-l1: Implement irq_cpu_offline() callback target/iscsi: Fix double free in lio_target_tiqn_addtpg() scsi: mvsas: fix command_active typo ASoC: samsung: i2s: Fixup last IRQ unsafe spin lock call iommu/vt-d: Flush old iommu caches for kdump when the device gets context mapped iommu/vt-d: Fix pasid table size encoding iommu/amd: Fix the left value check of cmd buffer iommu/amd: Missing error code in amd_iommu_init_device() clk: imx31: fix rewritten input argument of mx31_clocks_init() clk: clk-wm831x: fix a logic error hwmon: (g762) Fix overflows and crash seen when writing limit attributes hwmon: (nct7802) Fix overflows seen when writing into limit attributes hwmon: (ds620) Fix overflows seen when writing temperature limits hwmon: (amc6821) sign extension temperature hwmon: (scpi) Fix module autoload cris: Only build flash rescue image if CONFIG_ETRAX_AXISFLASHMAP is selected ath10k: use the right length of "background" stable-fixup: hotplug: fix unused function warning usb: dwc3: ep0: explicitly call dwc3_ep0_prepare_one_trb() usb: dwc3: ep0: add dwc3_ep0_prepare_one_trb() usb: dwc3: gadget: always unmap EP0 requests staging: iio: ad7606: fix improper setting of oversampling pins mei: bus: fix mei_cldev_enable KDoc USB: serial: io_ti: bind to interface after fw download USB: phy: am335x-control: fix device and of_node leaks ARM: dts: r8a7794: Correct hsusb parent clock USB: serial: kl5kusb105: abort on open exception path ALSA: usb-audio: Fix bogus error return in snd_usb_create_stream() usb: musb: blackfin: add bfin_fifo_offset in bfin_ops usb: hub: Move hub_port_disable() to fix warning if PM is disabled usb: musb: Fix trying to free already-free IRQ 4 usb: dwc3: pci: add Intel Gemini Lake PCI ID xhci: Fix race related to abort operation xhci: Use delayed_work instead of timer for command timeout usb: xhci-mem: use passed in GFP flags instead of GFP_KERNEL USB: serial: mos7720: fix parallel probe USB: serial: mos7720: fix parport use-after-free on probe errors USB: serial: mos7720: fix use-after-free on probe errors USB: serial: mos7720: fix NULL-deref at open USB: serial: mos7840: fix NULL-deref at open USB: serial: kobil_sct: fix NULL-deref in write USB: serial: cyberjack: fix NULL-deref at open USB: serial: oti6858: fix NULL-deref at open USB: serial: io_edgeport: fix NULL-deref at open USB: serial: ti_usb_3410_5052: fix NULL-deref at open USB: serial: garmin_gps: fix memory leak on failed URB submit USB: serial: iuu_phoenix: fix NULL-deref at open USB: serial: io_ti: fix I/O after disconnect USB: serial: io_ti: fix another NULL-deref at open USB: serial: io_ti: fix NULL-deref at open USB: serial: spcp8x5: fix NULL-deref at open USB: serial: keyspan_pda: verify endpoints at probe USB: serial: pl2303: fix NULL-deref at open USB: serial: quatech2: fix sleep-while-atomic in close USB: serial: omninet: fix NULL-derefs at open and disconnect usb: xhci: hold lock over xhci_abort_cmd_ring() xhci: Handle command completion and timeout race usb: host: xhci: Fix possible wild pointer when handling abort command usb: xhci: fix return value of xhci_setup_device() xhci: free xhci virtual devices with leaf nodes first usb: xhci: apply XHCI_PME_STUCK_QUIRK to Intel Apollo Lake xhci: workaround for hosts missing CAS bit usb: xhci: fix possible wild pointer usb: dwc3: core: avoid Overflow events usb: gadget: composite: Test get_alt() presence instead of set_alt() USB: dummy-hcd: fix bug in stop_activity (handle ep0) USB: fix problems with duplicate endpoint addresses USB: gadgetfs: fix checks of wTotalLength in config descriptors USB: gadgetfs: fix use-after-free bug USB: gadgetfs: fix unbounded memory allocation bug usb: gadgetfs: restrict upper bound on device configuration size usb: storage: unusual_uas: Add JMicron JMS56x to unusual device usb: musb: dsps: implement clear_ep_rxintr() callback usb: musb: core: add clear_ep_rxintr() to musb_platform_ops KVM: MIPS: Flush KVM entry code from icache globally KVM: x86: reset MMU on KVM_SET_VCPU_EVENTS mac80211: initialize fast-xmit 'info' later ARM: davinci: da850: don't add emac clock to lookup table twice ALSA: usb-audio: Fix irq/process data synchronization ALSA: hda - Apply asus-mode8 fixup to ASUS X71SL ALSA: hda - Fix up GPIO for ASUS ROG Ranger Linux 4.4.41 net: mvpp2: fix dma unmapping of TX buffers for fragments sg_write()/bsg_write() is not fit to be called under KERNEL_DS kconfig/nconf: Fix hang when editing symbol with a long prompt target/user: Fix use-after-free of tcmu_cmds if they are expired powerpc: Convert cmp to cmpd in idle enter sequence powerpc/ps3: Fix system hang with GCC 5 builds nfs_write_end(): fix handling of short copies libceph: verify authorize reply on connect PCI: Check for PME in targeted sleep state Input: drv260x - fix input device's parent assignment media: solo6x10: fix lockup by avoiding delayed register write IB/cma: Fix a race condition in iboe_addr_get_sgid() IB/multicast: Check ib_find_pkey() return value IPoIB: Avoid reading an uninitialized member variable IB/mad: Fix an array index check fgraph: Handle a case where a tracer ignores set_graph_notrace platform/x86: asus-nb-wmi.c: Add X45U quirk ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it kvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF) KVM: PPC: Book3S HV: Don't lose hardware R/C bit updates in H_PROTECT KVM: PPC: Book3S HV: Save/restore XER in checkpointed register state md/raid5: limit request size according to implementation limits sc16is7xx: Drop bogus use of IRQF_ONESHOT s390/vmlogrdr: fix IUCV buffer allocation firmware: fix usermode helper fallback loading ARC: mm: arc700: Don't assume 2 colours for aliasing VIPT dcache scsi: avoid a permanent stop of the scsi device's request queue scsi: zfcp: fix rport unblock race with LUN recovery scsi: zfcp: do not trace pure benign residual HBA responses at default level scsi: zfcp: fix use-after-"free" in FC ingress path after TMF scsi: megaraid_sas: Do not set MPI2_TYPE_CUDA for JBOD FP path for FW which does not support JBOD sequence map scsi: megaraid_sas: For SRIOV enabled firmware, ensure VF driver waits for 30secs before reset vt: fix Scroll Lock LED trigger name block: protect iterate_bdevs() against concurrent close mei: request async autosuspend at the end of enumeration drivers/gpu/drm/ast: Fix infinite loop if read fails drm/gma500: Add compat ioctl drm/radeon: add additional pci revision to dpm workaround drm/radeon: Hide the HW cursor while it's out of bounds drm/radeon: Also call cursor_move_locked when the cursor size changes drm/nouveau/i2c/gk110b,gm10x: use the correct implementation drm/nouveau/fifo/gf100-: protect channel preempt with subdev mutex drm/nouveau/ltc: protect clearing of comptags with mutex drm/nouveau/bios: require checksum to match for fast acpi shadow method drm/nouveau/kms: lvds panel strap moved again on maxwell ACPI / video: Add force_native quirk for HP Pavilion dv6 ACPI / video: Add force_native quirk for Dell XPS 17 L702X staging: comedi: ni_mio_common: fix E series ni_ai_insn_read() data staging: comedi: ni_mio_common: fix M Series ni_ai_insn_read() data mask thermal: hwmon: Properly report critical temperature in sysfs clk: bcm2835: Avoid overwriting the div info when disabling a pll_div clk timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion regulator: stw481x-vmmc: fix ages old enable error mmc: sdhci: Fix recovery from tuning timeout ath9k: Really fix LED polarity for some Mini PCI AR9220 MB92 cards. cfg80211/mac80211: fix BSS leaks when abandoning assoc attempts rtlwifi: Fix enter/exit power_save ssb: Fix error routine when fallback SPROM fails Linux 4.4.40 ppp: defer netns reference release for ppp channel driver core: fix race between creating/querying glue dir and its cleanup xfs: set AGI buffer type in xlog_recover_clear_agi_bucket arm/xen: Use alloc_percpu rather than __alloc_percpu xen/gntdev: Use VM_MIXEDMAP instead of VM_IO to avoid NUMA balancing tpm xen: Remove bogus tpm_chip_unregister kernel/debug/debug_core.c: more properly delay for secondary CPUs kernel/watchdog: use nmi registers snapshot in hardlockup handler CIFS: Fix a possible memory corruption in push locks CIFS: Fix missing nls unload in smb2_reconnect() CIFS: Fix a possible memory corruption during reconnect ASoC: intel: Fix crash at suspend/resume without card registration dm space map metadata: fix 'struct sm_metadata' leak on failed create dm crypt: mark key as invalid until properly loaded dm flakey: return -EINVAL on interval bounds error in flakey_ctr() blk-mq: Do not invoke .queue_rq() for a stopped queue usb: gadget: composite: always set ep->mult to a sensible value exec: Ensure mm->user_ns contains the execed files fs: exec: apply CLOEXEC before changing dumpable task flags mm/vmscan.c: set correct defer count for shrinker loop: return proper error from loop_queue_rq() f2fs: set ->owner for debugfs status file's file_operations ext4: do not perform data journaling when data is encrypted ext4: return -ENOMEM instead of success ext4: reject inodes with negative size ext4: add sanity checking to count_overhead() ext4: fix in-superblock mount options processing ext4: use more strict checks for inodes_per_block on mount ext4: fix stack memory corruption with 64k block size ext4: fix mballoc breakage with 64k block size crypto: caam - fix AEAD givenc descriptors ptrace: Capture the ptracer's creds not PT_PTRACE_CAP mm: Add a user_ns owner to mm_struct and fix ptrace permission checks block_dev: don't test bdev->bd_contains when it is not stable btrfs: make file clone aware of fatal signals Btrfs: don't BUG() during drop snapshot Btrfs: fix memory leak in do_walk_down Btrfs: don't leak reloc root nodes on error Btrfs: return gracefully from balance if fs tree is corrupted Btrfs: bail out if block group has different mixed flag Btrfs: fix memory leak in reading btree blocks clk: ti: omap36xx: Work around sprz319 advisory 2.1 ALSA: hda: when comparing pin configurations, ignore assoc in addition to seq ALSA: hda - Gate the mic jack on HP Z1 Gen3 AiO ALSA: hda - fix headset-mic problem on a Dell laptop ALSA: hda - ignore the assoc and seq when comparing pin configurations ALSA: hda/ca0132 - Add quirk for Alienware 15 R2 2016 ALSA: hiface: Fix M2Tech hiFace driver sampling rate change ALSA: usb-audio: Add QuickCam Communicate Deluxe/S7500 to volume_control_quirks USB: UHCI: report non-PME wakeup signalling for Intel hardware usb: gadget: composite: correctly initialize ep->maxpacket usb: gadget: f_uac2: fix error handling at afunc_bind usb: hub: Fix auto-remount of safely removed or ejected USB-3 devices USB: cdc-acm: add device id for GW Instek AFG-125 USB: serial: kl5kusb105: fix open error path USB: serial: option: add dlink dwm-158 USB: serial: option: add support for Telit LE922A PIDs 0x1040, 0x1041 Btrfs: fix qgroup rescan worker initialization btrfs: store and load values of stripes_min/stripes_max in balance status item Btrfs: fix tree search logic when replaying directory entry deletes btrfs: limit async_work allocation and worker func duration ANDROID: trace: net: use %pK for kernel pointers ANDROID: android-base: Enable QUOTA related configs net: ipv4: Don't crash if passing a null sk to ip_rt_update_pmtu. net: inet: Support UID-based routing in IP protocols. Revert "net: ipv6: fix virtual tunneling build" net: core: add UID to flows, rules, and routes net: core: Add a UID field to struct sock. Revert "net: core: Support UID-based routing." Revert "net: core: Handle 'sk' being NULL in UID-based routing" Revert "ANDROID: net: fix 'const' warnings" Revert "ANDROID: net: fib: remove duplicate assignment" Revert "ANDROID: net: core: fix UID-based routing" UPSTREAM: efi/arm64: Don't apply MEMBLOCK_NOMAP to UEFI memory map mapping UPSTREAM: arm64: enable CONFIG_DEBUG_RODATA by default goldfish: enable CONFIG_INET_DIAG_DESTROY sched/walt: kill {min,max}_capacity sched: fix wrong truncation of walt_avg ANDROID: dm verity: add minimum prefetch size Linux 4.4.39 crypto: rsa - Add Makefile dependencies to fix parallel builds hotplug: Make register and unregister notifier API symmetric batman-adv: Check for alloc errors when preparing TT local data m68k: Fix ndelay() macro arm64: futex.h: Add missing PAN toggling can: peak: fix bad memory access and free sequence can: raw: raw_setsockopt: limit number of can_filter that can be set crypto: mcryptd - Check mcryptd algorithm compatibility perf/x86: Fix full width counter, counter overflow locking/rtmutex: Use READ_ONCE() in rt_mutex_owner() locking/rtmutex: Prevent dequeue vs. unlock race zram: restrict add/remove attributes to root only parisc: Fix TLB related boot crash on SMP machines parisc: Remove unnecessary TLB purges from flush_dcache_page_asm and flush_icache_page_asm parisc: Purge TLB before setting PTE powerpc/eeh: Fix deadlock when PE frozen state can't be cleared Conflicts: arch/arm64/kernel/traps.c drivers/usb/dwc3/core.h drivers/usb/dwc3/ep0.c drivers/usb/gadget/function/f_fs.c drivers/usb/host/xhci-mem.c drivers/usb/host/xhci-ring.c drivers/usb/host/xhci.c drivers/video/fbdev/core/fbcmap.c include/trace/events/sched.h mm/vmscan.c Change-Id: I3faa0010ecb98972cd8e6470377a493b56d95f89 Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org> Signed-off-by: Runmin Wang <runminw@codeaurora.org>	2017-03-18 08:55:10 -07:00
Vlastimil Babka	28aea75c4b	mm, memory hotplug: small cleanup in online_pages() We can reuse the nid we've determined instead of repeated pfn_to_nid() usages. Also zone_to_nid() should be a bit cheaper in general than pfn_to_nid(). Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Git-commit: e888ca3545dc6823603b976e40b62af2c68b6fcc Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Change-Id: I6bdb2530fd6306ceb49022ef9cdc82b5598ebe8c Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>	2017-02-21 12:37:10 +05:30
Vlastimil Babka	f9d29d58eb	mm, compaction: introduce kcompactd Memory compaction can be currently performed in several contexts: - kswapd balancing a zone after a high-order allocation failure - direct compaction to satisfy a high-order allocation, including THP page fault attemps - khugepaged trying to collapse a hugepage - manually from /proc The purpose of compaction is two-fold. The obvious purpose is to satisfy a (pending or future) high-order allocation, and is easy to evaluate. The other purpose is to keep overal memory fragmentation low and help the anti-fragmentation mechanism. The success wrt the latter purpose is more The current situation wrt the purposes has a few drawbacks: - compaction is invoked only when a high-order page or hugepage is not available (or manually). This might be too late for the purposes of keeping memory fragmentation low. - direct compaction increases latency of allocations. Again, it would be better if compaction was performed asynchronously to keep fragmentation low, before the allocation itself comes. - (a special case of the previous) the cost of compaction during THP page faults can easily offset the benefits of THP. - kswapd compaction appears to be complex, fragile and not working in some scenarios. It could also end up compacting for a high-order allocation request when it should be reclaiming memory for a later order-0 request. To improve the situation, we should be able to benefit from an equivalent of kswapd, but for compaction - i.e. a background thread which responds to fragmentation and the need for high-order allocations (including hugepages) somewhat proactively. One possibility is to extend the responsibilities of kswapd, which could however complicate its design too much. It should be better to let kswapd handle reclaim, as order-0 allocations are often more critical than high-order ones. Another possibility is to extend khugepaged, but this kthread is a single instance and tied to THP configs. This patch goes with the option of a new set of per-node kthreads called kcompactd, and lays the foundations, without introducing any new tunables. The lifecycle mimics kswapd kthreads, including the memory hotplug hooks. For compaction, kcompactd uses the standard compaction_suitable() and ompact_finished() criteria and the deferred compaction functionality. Unlike direct compaction, it uses only sync compaction, as there's no allocation latency to minimize. This patch doesn't yet add a call to wakeup_kcompactd. The kswapd compact/reclaim loop for high-order pages will be replaced by waking up kcompactd in the next patch with the description of what's wrong with the old approach. Waking up of the kcompactd threads is also tied to kswapd activity and follows these rules: - we don't want to affect any fastpaths, so wake up kcompactd only from the slowpath, as it's done for kswapd - if kswapd is doing reclaim, it's more important than compaction, so don't invoke kcompactd until kswapd goes to sleep - the target order used for kswapd is passed to kcompactd Future possible future uses for kcompactd include the ability to wake up kcompactd on demand in special situations, such as when hugepages are not available (currently not done due to __GFP_NO_KSWAPD) or when a fragmentation event (i.e. __rmqueue_fallback()) occurs. It's also possible to perform periodic compaction with kcompactd. [arnd@arndb.de: fix build errors with kcompactd] [paul.gortmaker@windriver.com: don't use modular references for non modular code] Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Git-commit: 698b1b30642f1ff0ea10ef1de9745ab633031377 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Change-Id: I987ae548cba936987b8479dc02de67d0f88b9cb6 Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>	2017-02-21 12:36:57 +05:30
Toshi Kani	87ebcc534d	base/memory, hotplug: fix a kernel oops in show_valid_zones() commit a96dfddbcc04336bbed50dc2b24823e45e09e80c upstream. Reading a sysfs "memoryN/valid_zones" file leads to the following oops when the first page of a range is not backed by struct page. show_valid_zones() assumes that 'start_pfn' is always valid for page_zone(). BUG: unable to handle kernel paging request at ffffea017a000000 IP: show_valid_zones+0x6f/0x160 This issue may happen on x86-64 systems with 64GiB or more memory since their memory block size is bumped up to 2GiB. [1] An example of such systems is desribed below. 0x3240000000 is only aligned by 1GiB and this memory block starts from 0x3200000000, which is not backed by struct page. BIOS-e820: [mem 0x0000003240000000-0x000000603fffffff] usable Since test_pages_in_a_zone() already checks holes, fix this issue by extending this function to return 'valid_start' and 'valid_end' for a given range. show_valid_zones() then proceeds with the valid range. [1] 'Commit `bdee237c03` ("x86: mm: Use 2GB memory block size on large-memory x86-64 systems")' Link: http://lkml.kernel.org/r/20170127222149.30893-3-toshi.kani@hpe.com Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Zhang Zhen <zhenzhang.zhang@huawei.com> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> [4.4+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-02-09 08:02:47 +01:00
Toshi Kani	e86a876957	mm/memory_hotplug.c: check start_pfn in test_pages_in_a_zone() commit deb88a2a19e85842d79ba96b05031739ec327ff4 upstream. Patch series "fix a kernel oops when reading sysfs valid_zones", v2. A sysfs memory file is created for each 2GiB memory block on x86-64 when the system has 64GiB or more memory. [1] When the start address of a memory block is not backed by struct page, i.e. a memory range is not aligned by 2GiB, reading its 'valid_zones' attribute file leads to a kernel oops. This issue was observed on multiple x86-64 systems with more than 64GiB of memory. This patch-set fixes this issue. Patch 1 first fixes an issue in test_pages_in_a_zone(), which does not test the start section. Patch 2 then fixes the kernel oops by extending test_pages_in_a_zone() to return valid [start, end). Note for stable kernels: The memory block size change was made by commit `bdee237c03` ("x86: mm: Use 2GB memory block size on large-memory x86-64 systems"), which was accepted to 3.9. However, this patch-set depends on (and fixes) the change to test_pages_in_a_zone() made by commit `5f0f2887f4` ("mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone()"), which was accepted to 4.4. So, I recommend that we backport it up to 4.4. [1] 'Commit `bdee237c03` ("x86: mm: Use 2GB memory block size on large-memory x86-64 systems")' This patch (of 2): test_pages_in_a_zone() does not check 'start_pfn' when it is aligned by section since 'sec_end_pfn' is set equal to 'pfn'. Since this function is called for testing the range of a sysfs memory file, 'start_pfn' is always aligned by section. Fix it by properly setting 'sec_end_pfn' to the next section pfn. Also make sure that this function returns 1 only when the range belongs to a zone. Link: http://lkml.kernel.org/r/20170127222149.30893-2-toshi.kani@hpe.com Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Andrew Banman <abanman@sgi.com> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-02-09 08:02:45 +01:00
Andrew Banman	5f0f2887f4	mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone() test_pages_in_a_zone() does not account for the possibility of missing sections in the given pfn range. pfn_valid_within always returns 1 when CONFIG_HOLES_IN_ZONE is not set, allowing invalid pfns from missing sections to pass the test, leading to a kernel oops. Wrap an additional pfn loop with PAGES_PER_SECTION granularity to check for missing sections before proceeding into the zone-check code. This also prevents a crash from offlining memory devices with missing sections. Despite this, it may be a good idea to keep the related patch '[PATCH 3/3] drivers: memory: prohibit offlining of memory blocks with missing sections' because missing sections in a memory block may lead to other problems not covered by the scope of this fix. Signed-off-by: Andrew Banman <abanman@sgi.com> Acked-by: Alex Thorlton <athorlton@sgi.com> Cc: Russ Anderson <rja@sgi.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Greg KH <greg@kroah.com> Cc: Seth Jennings <sjennings@variantweb.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-12-29 17:45:49 -08:00
Yaowei Bai	b171e40930	mm/page_alloc: remove unused parameter in init_currently_empty_zone() Commit `a2f3aa0257` ("[PATCH] Fix sparsemem on Cell") fixed an oops experienced on the Cell architecture when init-time functions, early_*(), are called at runtime by introducing an 'enum memmap_context' parameter to memmap_init_zone() and init_currently_empty_zone(). This parameter is intended to be used to tell whether the call of these two functions is being made on behalf of a hotplug event, or happening at boot-time. However, init_currently_empty_zone() does not use this parameter at all, so remove it. Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-05 19:34:48 -08:00
David Vrabel	62cedb9f13	mm: memory hotplug with an existing resource Add add_memory_resource() to add memory using an existing "System RAM" resource. This is useful if the memory region is being located by finding a free resource slot with allocate_resource(). Xen guests will make use of this in their balloon driver to hotplug arbitrary amounts of memory in response to toolstack requests. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>	2015-10-23 14:19:58 +01:00
Linus Torvalds	12f03ee606	libnvdimm for 4.3: 1/ Introduce ZONE_DEVICE and devm_memremap_pages() as a generic mechanism for adding device-driver-discovered memory regions to the kernel's direct map. This facility is used by the pmem driver to enable pfn_to_page() operations on the page frames returned by DAX ('direct_access' in 'struct block_device_operations'). For now, the 'memmap' allocation for these "device" pages comes from "System RAM". Support for allocating the memmap from device memory will arrive in a later kernel. 2/ Introduce memremap() to replace usages of ioremap_cache() and ioremap_wt(). memremap() drops the __iomem annotation for these mappings to memory that do not have i/o side effects. The replacement of ioremap_cache() with memremap() is limited to the pmem driver to ease merging the api change in v4.3. Completion of the conversion is targeted for v4.4. 3/ Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem driver, update the VFS DAX implementation and PMEM api to provide persistence guarantees for kernel operations on a DAX mapping. 4/ Convert the ACPI NFIT 'BLK' driver to map the block apertures as cacheable to improve performance. 5/ Miscellaneous updates and fixes to libnvdimm including support for issuing "address range scrub" commands, clarifying the optimal 'sector size' of pmem devices, a clarification of the usage of the ACPI '_STA' (status) property for DIMM devices, and other minor fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJV6Nx7AAoJEB7SkWpmfYgCWyYQAI5ju6Gvw27RNFtPovHcZUf5 JGnxXejI6/AqeTQ+IulgprxtEUCrXOHjCDA5dkjr1qvsoqK1qxug+vJHOZLgeW0R OwDtmdW4Qrgeqm+CPoxETkorJ8wDOc8mol81kTiMgeV3UqbYeeHIiTAmwe7VzZ0C nNdCRDm5g8dHCjTKcvK3rvozgyoNoWeBiHkPe76EbnxDICxCB5dak7XsVKNMIVFQ NuYlnw6IYN7+rMHgpgpRux38NtIW8VlYPWTmHExejc2mlioWMNBG/bmtwLyJ6M3e zliz4/cnonTMUaizZaVozyinTa65m7wcnpjK+vlyGV2deDZPJpDRvSOtB0lH30bR 1gy+qrKzuGKpaN6thOISxFLLjmEeYwzYd7SvC9n118r32qShz+opN9XX0WmWSFlA sajE1ehm4M7s5pkMoa/dRnAyR8RUPu4RNINdQ/Z9jFfAOx+Q26rLdQXwf9+uqbEb bIeSQwOteK5vYYCstvpAcHSMlJAglzIX5UfZBvtEIJN7rlb0VhmGWfxAnTu+ktG1 o9cqAt+J4146xHaFwj5duTsyKhWb8BL9+xqbKPNpXEp+PbLsrnE/+WkDLFD67jxz dgIoK60mGnVXp+16I2uMqYYDgAyO5zUdmM4OygOMnZNa1mxesjbDJC6Wat1Wsndn slsw6DkrWT60CRE42nbK =o57/ -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This update has successfully completed a 0day-kbuild run and has appeared in a linux-next release. The changes outside of the typical drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and the introduction of ZONE_DEVICE + devm_memremap_pages(). Summary: - Introduce ZONE_DEVICE and devm_memremap_pages() as a generic mechanism for adding device-driver-discovered memory regions to the kernel's direct map. This facility is used by the pmem driver to enable pfn_to_page() operations on the page frames returned by DAX ('direct_access' in 'struct block_device_operations'). For now, the 'memmap' allocation for these "device" pages comes from "System RAM". Support for allocating the memmap from device memory will arrive in a later kernel. - Introduce memremap() to replace usages of ioremap_cache() and ioremap_wt(). memremap() drops the __iomem annotation for these mappings to memory that do not have i/o side effects. The replacement of ioremap_cache() with memremap() is limited to the pmem driver to ease merging the api change in v4.3. Completion of the conversion is targeted for v4.4. - Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem driver, update the VFS DAX implementation and PMEM api to provide persistence guarantees for kernel operations on a DAX mapping. - Convert the ACPI NFIT 'BLK' driver to map the block apertures as cacheable to improve performance. - Miscellaneous updates and fixes to libnvdimm including support for issuing "address range scrub" commands, clarifying the optimal 'sector size' of pmem devices, a clarification of the usage of the ACPI '_STA' (status) property for DIMM devices, and other minor fixes" * tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits) libnvdimm, pmem: direct map legacy pmem by default libnvdimm, pmem: 'struct page' for pmem libnvdimm, pfn: 'struct page' provider infrastructure x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB add devm_memremap_pages mm: ZONE_DEVICE for "device memory" mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h dax: drop size parameter to ->direct_access() nd_blk: change aperture mapping from WC to WB nvdimm: change to use generic kvfree() pmem, dax: have direct_access use __pmem annotation dax: update I/O path to do proper PMEM flushing pmem: add copy_from_iter_pmem() and clear_pmem() pmem, x86: clean up conditional pmem includes pmem: remove layer when calling arch_has_wmb_pmem() pmem, x86: move x86 PMEM API to new pmem.h header libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option pmem: switch to devm_ allocations devres: add devm_memremap libnvdimm, btt: write and validate parent_uuid ...	2015-09-08 14:35:59 -07:00
Tang Chen	7f36e3e56d	memory-hotplug: add hot-added memory ranges to memblock before allocate node_data for a node. Commit `f9126ab924` ("memory-hotplug: fix wrong edge when hot add a new node") hot-added memory range to memblock, after creating pgdat for new node. But there is a problem: add_memory() \|--> hotadd_new_pgdat() \|--> free_area_init_node() \|--> get_pfn_range_for_nid() \|--> find start_pfn and end_pfn in memblock \|--> ...... \|--> memblock_add_node(start, size, nid) -------- Here, just too late. get_pfn_range_for_nid() will find that start_pfn and end_pfn are both 0. As a result, when adding memory, dmesg will give the following wrong message. Initmem setup node 5 [mem 0x0000000000000000-0xffffffffffffffff] On node 5 totalpages: 0 Built 5 zonelists in Node order, mobility grouping on. Total pages: 32588823 Policy zone: Normal init_memory_mapping: [mem 0x60000000000-0x607ffffffff] The solution is simple, just add the memory range to memblock a little earlier, before hotadd_new_pgdat(). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [4.2.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Dan Williams	033fbae988	mm: ZONE_DEVICE for "device memory" While pmem is usable as a block device or via DAX mappings to userspace there are several usage scenarios that can not target pmem due to its lack of struct page coverage. In preparation for "hot plugging" pmem into the vmemmap add ZONE_DEVICE as a new zone to tag these pages separately from the ones that are subject to standard page allocations. Importantly "device memory" can be removed at will by userspace unbinding the driver of the device. Having a separate zone prevents allocation and otherwise marks these pages that are distinct from typical uniform memory. Device memory has different lifetime and performance characteristics than RAM. However, since we have run out of ZONES_SHIFT bits this functionality currently depends on sacrificing ZONE_DMA. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Jerome Glisse <j.glisse@gmail.com> [hch: various simplifications in the arch interface] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2015-08-27 19:40:58 -04:00
Xishi Qiu	f9126ab924	memory-hotplug: fix wrong edge when hot add a new node When we add a new node, the edge of memory may be wrong. e.g. system has 4 nodes, and node3 is movable, node3 mem:[24G-32G], 1. hotremove the node3, 2. then hotadd node3 with a part of memory, mem:[26G-30G], 3. call hotadd_new_pgdat() free_area_init_node() get_pfn_range_for_nid() 4. it will return wrong start_pfn and end_pfn, because we have not update the memblock. This patch also fixes a BUG_ON during hot-addition, please see http://marc.info/?l=linux-kernel&m=142961156129456&w=2 Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-08-14 15:56:32 -07:00
Mel Gorman	e298ff75f1	mm: initialize hotplugged pages as reserved Commit `92923ca3aa` ("mm: meminit: only set page reserved in the memblock region") broke memory hotplug which expects the memmap for newly added sections to be reserved until onlined by online_pages_range(). This patch marks hotplugged pages as reserved when adding new zones. Signed-off-by: Mel Gorman <mgorman@suse.de> Reported-by: David Vrabel <david.vrabel@citrix.com> Tested-by: David Vrabel <david.vrabel@citrix.com> Cc: Nathan Zimmer <nzimmer@sgi.com> Cc: Robin Holt <holt@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-08-07 04:39:41 +03:00
Zhu Guihua	c435a39057	mm/memory hotplug: print the last vmemmap region at the end of hot add memory When hot add two nodes continuously, we found the vmemmap region info is a bit messed. The last region of node 2 is printed when node 3 hot added, like the following: Initmem setup node 2 [mem 0x0000000000000000-0xffffffffffffffff] On node 2 totalpages: 0 Built 2 zonelists in Node order, mobility grouping on. Total pages: 16090539 Policy zone: Normal init_memory_mapping: [mem 0x40000000000-0x407ffffffff] [mem 0x40000000000-0x407ffffffff] page 1G [ffffea1000000000-ffffea10001fffff] PMD -> [ffff8a077d800000-ffff8a077d9fffff] on node 2 [ffffea1000200000-ffffea10003fffff] PMD -> [ffff8a077de00000-ffff8a077dffffff] on node 2 ... [ffffea101f600000-ffffea101f9fffff] PMD -> [ffff8a074ac00000-ffff8a074affffff] on node 2 [ffffea101fa00000-ffffea101fdfffff] PMD -> [ffff8a074a800000-ffff8a074abfffff] on node 2 Initmem setup node 3 [mem 0x0000000000000000-0xffffffffffffffff] On node 3 totalpages: 0 Built 3 zonelists in Node order, mobility grouping on. Total pages: 16090539 Policy zone: Normal init_memory_mapping: [mem 0x60000000000-0x607ffffffff] [mem 0x60000000000-0x607ffffffff] page 1G [ffffea101fe00000-ffffea101fffffff] PMD -> [ffff8a074a400000-ffff8a074a5fffff] on node 2 <=== node 2 ??? [ffffea1800000000-ffffea18001fffff] PMD -> [ffff8a074a600000-ffff8a074a7fffff] on node 3 [ffffea1800200000-ffffea18005fffff] PMD -> [ffff8a074a000000-ffff8a074a3fffff] on node 3 [ffffea1800600000-ffffea18009fffff] PMD -> [ffff8a0749c00000-ffff8a0749ffffff] on node 3 ... The cause is the last region was missed at the and of hot add memory, and p_start, p_end, node_start were not reset, so when hot add memory to a new node, it will consider they are not contiguous blocks and print the previous one. So we print the last vmemmap region at the end of hot add memory to avoid the confusion. Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-06-24 17:49:45 -07:00
Gu Zheng	85bd839983	mm/memory_hotplug.c: set zone->wait_table to null after freeing it Izumi found the following oops when hot re-adding a node: BUG: unable to handle kernel paging request at ffffc90008963690 IP: __wake_up_bit+0x20/0x70 Oops: 0000 [#1] SMP CPU: 68 PID: 1237 Comm: rs:main Q:Reg Not tainted 4.1.0-rc5 #80 Hardware name: FUJITSU PRIMEQUEST2800E/SB, BIOS PRIMEQUEST 2000 Series BIOS Version 1.87 04/28/2015 task: ffff880838df8000 ti: ffff880017b94000 task.ti: ffff880017b94000 RIP: 0010:[<ffffffff810dff80>] [<ffffffff810dff80>] __wake_up_bit+0x20/0x70 RSP: 0018:ffff880017b97be8 EFLAGS: 00010246 RAX: ffffc90008963690 RBX: 00000000003c0000 RCX: 000000000000a4c9 RDX: 0000000000000000 RSI: ffffea101bffd500 RDI: ffffc90008963648 RBP: ffff880017b97c08 R08: 0000000002000020 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a0797c73800 R13: ffffea101bffd500 R14: 0000000000000001 R15: 00000000003c0000 FS: 00007fcc7ffff700(0000) GS:ffff880874800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc90008963690 CR3: 0000000836761000 CR4: 00000000001407e0 Call Trace: unlock_page+0x6d/0x70 generic_write_end+0x53/0xb0 xfs_vm_write_end+0x29/0x80 [xfs] generic_perform_write+0x10a/0x1e0 xfs_file_buffered_aio_write+0x14d/0x3e0 [xfs] xfs_file_write_iter+0x79/0x120 [xfs] __vfs_write+0xd4/0x110 vfs_write+0xac/0x1c0 SyS_write+0x58/0xd0 system_call_fastpath+0x12/0x76 Code: 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 45 f8 31 c0 48 8d 47 48 <48> 39 47 48 48 c7 45 e8 00 00 00 00 48 c7 45 f0 00 00 00 00 48 RIP [<ffffffff810dff80>] __wake_up_bit+0x20/0x70 RSP <ffff880017b97be8> CR2: ffffc90008963690 Reproduce method (re-add a node):: Hot-add nodeA --> remove nodeA --> hot-add nodeA (panic) This seems an use-after-free problem, and the root cause is zone->wait_table was not set to NULL after free it in try_offline_node. When hot re-add a node, we will reuse the pgdat of it, so does the zone struct, and when add pages to the target zone, it will init the zone first (including the wait_table) if the zone is not initialized. The judgement of zone initialized is based on zone->wait_table: static inline bool zone_is_initialized(struct zone zone) { return !!zone->wait_table; } so if we do not set the zone->wait_table to NULL* after free it, the memory hotplug routine will skip the init of new zone when hot re-add the node, and the wait_table still points to the freed memory, then we will access the invalid address when trying to wake up the waiting people after the i/o operation with the page is done, such as mentioned above. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reported-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Reviewed by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-06-10 16:43:43 -07:00
Naoya Horiguchi	7e1f049efb	mm: hugetlb: cleanup using paeg_huge_active() Now we have an easy access to hugepages' activeness, so existing helpers to get the information can be cleaned up. [akpm@linux-foundation.org: s/PageHugeActive/page_huge_active/] Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hugh Dickins <hughd@google.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-04-15 16:35:19 -07:00
David Rientjes	30467e0b3b	mm, hotplug: fix concurrent memory hot-add deadlock There's a deadlock when concurrently hot-adding memory through the probe interface and switching a memory block from offline to online. When hot-adding memory via the probe interface, add_memory() first takes mem_hotplug_begin() and then device_lock() is later taken when registering the newly initialized memory block. This creates a lock dependency of (1) mem_hotplug.lock (2) dev->mutex. When switching a memory block from offline to online, dev->mutex is first grabbed in device_online() when the write(2) transitions an existing memory block from offline to online, and then online_pages() will take mem_hotplug_begin(). This creates a lock inversion between mem_hotplug.lock and dev->mutex. Vitaly reports that this deadlock can happen when kworker handling a probe event races with systemd-udevd switching a memory block's state. This patch requires the state transition to take mem_hotplug_begin() before dev->mutex. Hot-adding memory via the probe interface creates a memory block while holding mem_hotplug_begin(), there is no way to take dev->mutex first in this case. online_pages() and offline_pages() are only called when transitioning memory block state. We now require that mem_hotplug_begin() is taken before calling them -- this requires exporting the mem_hotplug_begin() and mem_hotplug_done() to generic code. In all hot-add and hot-remove cases, mem_hotplug_begin() is done prior to device_online(). This is all that is needed to avoid the deadlock. Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zhang Zhen <zhenzhang.zhang@huawei.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Wang Nan <wangnan0@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-04-14 16:49:00 -07:00
Sheng Yong	19c07d5e04	memory hotplug: use macro to switch between section and pfn Use macro section_nr_to_pfn() to switch between section and pfn, instead of open-coding it. No semantic changes. Signed-off-by: Sheng Yong <shengyong1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-04-14 16:49:00 -07:00
Gu Zheng	b0dc3a342a	mm/memory hotplug: postpone the reset of obsolete pgdat Qiu Xishi reported the following BUG when testing hot-add/hot-remove node under stress condition: BUG: unable to handle kernel paging request at 0000000000025f60 IP: next_online_pgdat+0x1/0x50 PGD 0 Oops: 0000 [#1] SMP ACPI: Device does not support D3cold Modules linked in: fuse nls_iso8859_1 nls_cp437 vfat fat loop dm_mod coretemp mperf crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 pcspkr microcode igb dca i2c_algo_bit ipv6 megaraid_sas iTCO_wdt i2c_i801 i2c_core iTCO_vendor_support tg3 sg hwmon ptp lpc_ich pps_core mfd_core acpi_pad rtc_cmos button ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh ahci libahci libata scsi_mod [last unloaded: rasf] CPU: 23 PID: 238 Comm: kworker/23:1 Tainted: G O 3.10.15-5885-euler0302 #1 Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. Huawei N1/Huawei N1, BIOS V100R001 03/02/2015 Workqueue: events vmstat_update task: ffffa800d32c0000 ti: ffffa800d32ae000 task.ti: ffffa800d32ae000 RIP: 0010: next_online_pgdat+0x1/0x50 RSP: 0018:ffffa800d32afce8 EFLAGS: 00010286 RAX: 0000000000001440 RBX: ffffffff81da53b8 RCX: 0000000000000082 RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000000 RBP: ffffa800d32afd28 R08: ffffffff81c93bfc R09: ffffffff81cbdc96 R10: 00000000000040ec R11: 00000000000000a0 R12: ffffa800fffb3440 R13: ffffa800d32afd38 R14: 0000000000000017 R15: ffffa800e6616800 FS: 0000000000000000(0000) GS:ffffa800e6600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000025f60 CR3: 0000000001a0b000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: refresh_cpu_vm_stats+0xd0/0x140 vmstat_update+0x11/0x50 process_one_work+0x194/0x3d0 worker_thread+0x12b/0x410 kthread+0xc6/0xd0 ret_from_fork+0x7c/0xb0 The cause is the "memset(pgdat, 0, sizeof(pgdat))" at the end of try_offline_node, which will reset all the content of pgdat to 0, as the pgdat is accessed lock-free, so that the users still using the pgdat will panic, such as the vmstat_update routine. process A: offline node XX: vmstat_updat() refresh_cpu_vm_stats() for_each_populated_zone() find online node XX cond_resched() offline cpu and memory, then try_offline_node() node_set_offline(nid), and memset(pgdat, 0, sizeof(pgdat)) zone = next_zone(zone) pg_data_t *pgdat = zone->zone_pgdat; // here pgdat is NULL now next_online_pgdat(pgdat) next_online_node(pgdat->node_id); // NULL pointer access So the solution here is postponing the reset of obsolete pgdat from try_offline_node() to hotadd_new_pgdat(), and just resetting pgdat->nr_zones and pgdat->classzone_idx to be 0 rather than the memset 0 to avoid breaking pointer information in pgdat. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reported-by: Xishi Qiu <qiuxishi@huawei.com> Suggested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Xie XiuQi <xiexiuqi@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-03-25 16:20:30 -07:00
Vlastimil Babka	c05543293e	mm, memory_hotplug/failure: drain single zone pcplists Memory hotplug and failure mechanisms have several places where pcplists are drained so that pages are returned to the buddy allocator and can be e.g. prepared for offlining. This is always done in the context of a single zone, we can reduce the pcplists drain to the single zone, which is now possible. The change should make memory offlining due to hotremove or failure faster and not disturbing unrelated pcplists anymore. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:05 -08:00
Vlastimil Babka	93481ff0e5	mm: introduce single zone pcplists drain The functions for draining per-cpu pages back to buddy allocators currently always operate on all zones. There are however several cases where the drain is only needed in the context of a single zone, and spilling other pcplists is a waste of time both due to the extra spilling and later refilling. This patch introduces new zone pointer parameter to drain_all_pages() and changes the dummy parameter of drain_local_pages() to be also a zone pointer. When NULL is passed, the functions operate on all zones as usual. Passing a specific zone pointer reduces the work to the single zone. All callers are updated to pass the NULL pointer in this patch. Conversion to single zone (where appropriate) is done in further patches. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:05 -08:00
Tang Chen	0bd8542008	mem-hotplug: reset node present pages when hot-adding a new pgdat When memory is hot-added, all the memory is in offline state. So clear all zones' present_pages because they will be updated in online_pages() and offline_pages(). Otherwise, /proc/zoneinfo will corrupt: When the memory of node2 is offline: # cat /proc/zoneinfo ...... Node 2, zone Movable ...... spanned 8388608 present 8388608 managed 0 When we online memory on node2: # cat /proc/zoneinfo ...... Node 2, zone Movable ...... spanned 8388608 present 16777216 managed 8388608 Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: <stable@vger.kernel.org> [3.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-13 16:17:06 -08:00
Tang Chen	f784a3f196	mem-hotplug: reset node managed pages when hot-adding a new pgdat In free_area_init_core(), zone->managed_pages is set to an approximate value for lowmem, and will be adjusted when the bootmem allocator frees pages into the buddy system. But free_area_init_core() is also called by hotadd_new_pgdat() when hot-adding memory. As a result, zone->managed_pages of the newly added node's pgdat is set to an approximate value in the very beginning. Even if the memory on that node has node been onlined, /sys/device/system/node/nodeXXX/meminfo has wrong value: hot-add node2 (memory not onlined) cat /sys/device/system/node/node2/meminfo Node 2 MemTotal: 33554432 kB Node 2 MemFree: 0 kB Node 2 MemUsed: 33554432 kB Node 2 Active: 0 kB This patch fixes this problem by reset node managed pages to 0 after hot-adding a new node. 1. Move reset_managed_pages_done from reset_node_managed_pages() to reset_all_zones_managed_pages() 2. Make reset_node_managed_pages() non-static 3. Call reset_node_managed_pages() in hotadd_new_pgdat() after pgdat is initialized Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: <stable@vger.kernel.org> [3.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-13 16:17:06 -08:00
Yasuaki Ishimatsu	35dca71c1f	memory-hotplug: clear pgdat which is allocated by bootmem in try_offline_node() When hot adding the same memory after hot removal, the following messages are shown: WARNING: CPU: 20 PID: 6 at mm/page_alloc.c:4968 free_area_init_node+0x3fe/0x426() ... Call Trace: dump_stack+0x46/0x58 warn_slowpath_common+0x81/0xa0 warn_slowpath_null+0x1a/0x20 free_area_init_node+0x3fe/0x426 hotadd_new_pgdat+0x90/0x110 add_memory+0xd4/0x200 acpi_memory_device_add+0x1aa/0x289 acpi_bus_attach+0xfd/0x204 acpi_bus_attach+0x178/0x204 acpi_bus_scan+0x6a/0x90 acpi_device_hotplug+0xe8/0x418 acpi_hotplug_work_fn+0x1f/0x2b process_one_work+0x14e/0x3f0 worker_thread+0x11b/0x510 kthread+0xe1/0x100 ret_from_fork+0x7c/0xb0 The detaled explanation is as follows: When hot removing memory, pgdat is set to 0 in try_offline_node(). But if the pgdat is allocated by bootmem allocator, the clearing step is skipped. And when hot adding the same memory, the uninitialized pgdat is reused. But free_area_init_node() checks wether pgdat is set to zero. As a result, free_area_init_node() hits WARN_ON(). This patch clears pgdat which is allocated by bootmem allocator in try_offline_node(). Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Zhen <zhenzhang.zhang@huawei.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-29 16:33:14 -07:00
Zhang Zhen	ed2f240094	memory-hotplug: add sysfs valid_zones attribute Currently memory-hotplug has two limits: 1. If the memory block is in ZONE_NORMAL, you can change it to ZONE_MOVABLE, but this memory block must be adjacent to ZONE_MOVABLE. 2. If the memory block is in ZONE_MOVABLE, you can change it to ZONE_NORMAL, but this memory block must be adjacent to ZONE_NORMAL. With this patch, we can easy to know a memory block can be onlined to which zone, and don't need to know the above two limits. Updated the related Documentation. [akpm@linux-foundation.org: use conventional comment layout] [akpm@linux-foundation.org: fix build with CONFIG_MEMORY_HOTREMOVE=n] [akpm@linux-foundation.org: remove unused local zone_prev] Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Wang Nan <wangnan0@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-09 22:25:52 -04:00
Wang Nan	6326440077	memory-hotplug: add zone_for_memory() for selecting zone for new memory This series of patches fixes a problem when adding memory in bad manner. For example: for a x86_64 machine booted with "mem=400M" and with 2GiB memory installed, following commands cause problem: # echo 0x40000000 > /sys/devices/system/memory/probe [ 28.613895] init_memory_mapping: [mem 0x40000000-0x47ffffff] # echo 0x48000000 > /sys/devices/system/memory/probe [ 28.693675] init_memory_mapping: [mem 0x48000000-0x4fffffff] # echo online_movable > /sys/devices/system/memory/memory9/state # echo 0x50000000 > /sys/devices/system/memory/probe [ 29.084090] init_memory_mapping: [mem 0x50000000-0x57ffffff] # echo 0x58000000 > /sys/devices/system/memory/probe [ 29.151880] init_memory_mapping: [mem 0x58000000-0x5fffffff] # echo online_movable > /sys/devices/system/memory/memory11/state # echo online> /sys/devices/system/memory/memory8/state # echo online> /sys/devices/system/memory/memory10/state # echo offline> /sys/devices/system/memory/memory9/state [ 30.558819] Offlined Pages 32768 # free total used free shared buffers cached Mem: 780588 18014398509432020 830552 0 0 51180 -/+ buffers/cache: 18014398509380840 881732 Swap: 0 0 0 This is because the above commands probe higher memory after online a section with online_movable, which causes ZONE_HIGHMEM (or ZONE_NORMAL for systems without ZONE_HIGHMEM) overlaps ZONE_MOVABLE. After the second online_movable, the problem can be observed from zoneinfo: # cat /proc/zoneinfo ... Node 0, zone Movable pages free 65491 min 250 low 312 high 375 scanned 0 spanned 18446744073709518848 present 65536 managed 65536 ... This series of patches solve the problem by checking ZONE_MOVABLE when choosing zone for new memory. If new memory is inside or higher than ZONE_MOVABLE, makes it go there instead. After applying this series of patches, following are free and zoneinfo result (after offlining memory9): bash-4.2# free total used free shared buffers cached Mem: 780956 80112 700844 0 0 51180 -/+ buffers/cache: 28932 752024 Swap: 0 0 0 bash-4.2# cat /proc/zoneinfo Node 0, zone DMA pages free 3389 min 14 low 17 high 21 scanned 0 spanned 4095 present 3998 managed 3977 nr_free_pages 3389 ... start_pfn: 1 inactive_ratio: 1 Node 0, zone DMA32 pages free 73724 min 341 low 426 high 511 scanned 0 spanned 98304 present 98304 managed 92958 nr_free_pages 73724 ... start_pfn: 4096 inactive_ratio: 1 Node 0, zone Normal pages free 32630 min 120 low 150 high 180 scanned 0 spanned 32768 present 32768 managed 32768 nr_free_pages 32630 ... start_pfn: 262144 inactive_ratio: 1 Node 0, zone Movable pages free 65476 min 241 low 301 high 361 scanned 0 spanned 98304 present 65536 managed 65536 nr_free_pages 65476 ... start_pfn: 294912 inactive_ratio: 1 This patch (of 7): Introduce zone_for_memory() in arch independent code for arch_add_memory() use. Many arch_add_memory() function simply selects ZONE_HIGHMEM or ZONE_NORMAL and add new memory into it. However, with the existance of ZONE_MOVABLE, the selection method should be carefully considered: if new, higher memory is added after ZONE_MOVABLE is setup, the default zone and ZONE_MOVABLE may overlap each other. should_add_memory_movable() checks the status of ZONE_MOVABLE. If it has already contain memory, compare the address of new memory and movable memory. If new memory is higher than movable, it should be added into ZONE_MOVABLE instead of default zone. Signed-off-by: Wang Nan <wangnan0@huawei.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: "Mel Gorman" <mgorman@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-06 18:01:21 -07:00
Tang Chen	4f7c6b49c4	mem-hotplug: introduce MMOP_OFFLINE to replace the hard coding -1 In store_mem_state(), we have: ... 334 else if (!strncmp(buf, "offline", min_t(int, count, 7))) 335 online_type = -1; ... 355 case -1: 356 ret = device_offline(&mem->dev); 357 break; ... Here, "offline" is hard coded as -1. This patch does the following renaming: ONLINE_KEEP -> MMOP_ONLINE_KEEP ONLINE_KERNEL -> MMOP_ONLINE_KERNEL ONLINE_MOVABLE -> MMOP_ONLINE_MOVABLE and introduces MMOP_OFFLINE = -1 to avoid hard coding. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: Hu Tao <hutao@cn.fujitsu.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-06 18:01:16 -07:00
Fabian Frederick	f276540441	mm/memory_hotplug.c: add __meminit to grow_zone_span/grow_pgdat_span grow_zone_span and grow_pgdat_span are only called by __meminit __add_zone Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Toshi Kani <toshi.kani@hp.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-06 18:01:15 -07:00
David Rientjes	68711a7463	mm, migration: add destination page freeing callback Memory migration uses a callback defined by the caller to determine how to allocate destination pages. When migration fails for a source page, however, it frees the destination page back to the system. This patch adds a memory migration callback defined by the caller to determine how to free destination pages. If a caller, such as memory compaction, builds its own freelist for migration targets, this can reuse already freed memory instead of scanning additional memory. If the caller provides a function to handle freeing of destination pages, it is called when page migration fails. If the caller passes NULL then freeing back to the system will be handled as usual. This patch introduces no functional change. Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Greg Thelen <gthelen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-04 16:54:06 -07:00
Fabian Frederick	c8e861a531	mm/memory_hotplug.c: use PFN_DOWN() Replace ((x) >> PAGE_SHIFT) with the pfn macro. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-04 16:54:02 -07:00
Vladimir Davydov	bfc8c90139	mem-hotplug: implement get/put_online_mems kmem_cache_{create,destroy,shrink} need to get a stable value of cpu/node online mask, because they init/destroy/access per-cpu/node kmem_cache parts, which can be allocated or destroyed on cpu/mem hotplug. To protect against cpu hotplug, these functions use {get,put}_online_cpus. However, they do nothing to synchronize with memory hotplug - taking the slab_mutex does not eliminate the possibility of race as described in patch 2. What we need there is something like get_online_cpus, but for memory. We already have lock_memory_hotplug, which serves for the purpose, but it's a bit of a hammer right now, because it's backed by a mutex. As a result, it imposes some limitations to locking order, which are not desirable, and can't be used just like get_online_cpus. That's why in patch 1 I substitute it with get/put_online_mems, which work exactly like get/put_online_cpus except they block not cpu, but memory hotplug. [ v1 can be found at https://lkml.org/lkml/2014/4/6/68. I NAK'ed it by myself, because it used an rw semaphore for get/put_online_mems, making them dead lock prune. ] This patch (of 2): {un}lock_memory_hotplug, which is used to synchronize against memory hotplug, is currently backed by a mutex, which makes it a bit of a hammer - threads that only want to get a stable value of online nodes mask won't be able to proceed concurrently. Also, it imposes some strong locking ordering rules on it, which narrows down the set of its usage scenarios. This patch introduces get/put_online_mems, which are the same as get/put_online_cpus, but for memory hotplug, i.e. executing a code inside a get/put_online_mems section will guarantee a stable value of online nodes, present pages, etc. lock_memory_hotplug()/unlock_memory_hotplug() are removed altogether. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-04 16:53:59 -07:00
Nathan Zimmer	ac13c4622b	mm/memory_hotplug.c: move register_memory_resource out of the lock_memory_hotplug We don't need to do register_memory_resource() under lock_memory_hotplug() since it has its own lock and doesn't make any callbacks. Also register_memory_resource return NULL on failure so we don't have anything to cleanup at this point. The reason for this rfc is I was doing some experiments with hotplugging of memory on some of our larger systems. While it seems to work, it can be quite slow. With some preliminary digging I found that lock_memory_hotplug is clearly ripe for breakup. It could be broken up per nid or something but it also covers the online_page_callback. The online_page_callback shouldn't be very hard to break out. Also there is the issue of various structures(wmarks come to mind) that are only updated under the lock_memory_hotplug that would need to be dealt with. Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Hedi <hedi@sgi.com> Cc: Mike Travis <travis@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-01-23 16:36:52 -08:00
Dave Hansen	f0b791a34c	mm: print more details for bad_page() bad_page() is cool in that it prints out a bunch of data about the page. But, I can never remember which page flags are good and which are bad, or whether ->index or ->mapping is required to be NULL. This patch allows bad/dump_page() callers to specify a string about why they are dumping the page and adds explanation strings to a number of places. It also adds a 'bad_flags' argument to bad_page(), which it then dumps out separately from the flags which are actually set. This way, the messages will show specifically why the page was bad, specifically which flags it is complaining about, if it was a page flag combination which was the problem. [akpm@linux-foundation.org: switch to pr_alert] Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Christoph Lameter <cl@linux.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-01-23 16:36:50 -08:00
Santosh Shilimkar	9e43aa2b8d	mm/memory_hotplug.c: use memblock apis for early memory allocations Correct ensure_zone_is_initialized() function description according to the introduced memblock APIs for early memory allocations. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Paul Walmsley <paul@pwsan.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Russell King <linux@arm.linux.org.uk> Cc: Tejun Heo <tj@kernel.org> Cc: Tony Lindgren <tony@atomide.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-01-21 16:19:47 -08:00
Grygorii Strashko	869a84e1ca	mm/memblock: remove unnecessary inclusions of bootmem.h Clean-up to remove depedency with bootmem headers. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Paul Walmsley <paul@pwsan.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Russell King <linux@arm.linux.org.uk> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-01-21 16:19:46 -08:00
Tang Chen	55ac590c2f	memblock, mem_hotplug: make memblock skip hotpluggable regions if needed Linux kernel cannot migrate pages used by the kernel. As a result, hotpluggable memory used by the kernel won't be able to be hot-removed. To solve this problem, the basic idea is to prevent memblock from allocating hotpluggable memory for the kernel at early time, and arrange all hotpluggable memory in ACPI SRAT(System Resource Affinity Table) as ZONE_MOVABLE when initializing zones. In the previous patches, we have marked hotpluggable memory regions with MEMBLOCK_HOTPLUG flag in memblock.memory. In this patch, we make memblock skip these hotpluggable memory regions in the default top-down allocation function if movable_node boot option is specified. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Rafael J . Wysocki" <rjw@sisk.pl> Cc: Chen Tang <imtangchen@gmail.com> Cc: Gong Chen <gong.chen@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Liu Jiang <jiang.liu@huawei.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Renninger <trenn@suse.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-01-21 16:19:45 -08:00
Tang Chen	c5320926e3	mem-hotplug: introduce movable_node boot option The hot-Pluggable field in SRAT specifies which memory is hotpluggable. As we mentioned before, if hotpluggable memory is used by the kernel, it cannot be hot-removed. So memory hotplug users may want to set all hotpluggable memory in ZONE_MOVABLE so that the kernel won't use it. Memory hotplug users may also set a node as movable node, which has ZONE_MOVABLE only, so that the whole node can be hot-removed. But the kernel cannot use memory in ZONE_MOVABLE. By doing this, the kernel cannot use memory in movable nodes. This will cause NUMA performance down. And other users may be unhappy. So we need a way to allow users to enable and disable this functionality. In this patch, we introduce movable_node boot option to allow users to choose to not to consume hotpluggable memory at early boot time and later we can set it as ZONE_MOVABLE. To achieve this, the movable_node boot option will control the memblock allocation direction. That said, after memblock is ready, before SRAT is parsed, we should allocate memory near the kernel image as we explained in the previous patches. So if movable_node boot option is set, the kernel does the following: 1. After memblock is ready, make memblock allocate memory bottom up. 2. After SRAT is parsed, make memblock behave as default, allocate memory top down. Users can specify "movable_node" in kernel commandline to enable this functionality. For those who don't use memory hotplug or who don't want to lose their NUMA performance, just don't specify anything. The kernel will work as before. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Suggested-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Suggested-by: Ingo Molnar <mingo@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Toshi Kani <toshi.kani@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Thomas Renninger <trenn@suse.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:09 +09:00
Zhang Yanfei	85b35feaec	mm/sparsemem: use PAGES_PER_SECTION to remove redundant nr_pages parameter For below functions, - sparse_add_one_section() - kmalloc_section_memmap() - __kmalloc_section_memmap() - __kfree_section_memmap() they are always invoked to operate on one memory section, so it is redundant to always pass a nr_pages parameter, which is the page numbers in one section. So we can directly use predefined macro PAGES_PER_SECTION instead of passing the parameter. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:06 +09:00
Toshi Kani	01b0f19707	cpu/mem hotplug: add try_online_node() for cpu_up() cpu_up() has #ifdef CONFIG_MEMORY_HOTPLUG code blocks, which call mem_online_node() to put its node online if offlined and then call build_all_zonelists() to initialize the zone list. These steps are specific to memory hotplug, and should be managed in mm/memory_hotplug.c. lock_memory_hotplug() should also be held for the whole steps. For this reason, this patch replaces mem_online_node() with try_online_node(), which performs the whole steps with lock_memory_hotplug() held. try_online_node() is named after try_offline_node() as they have similar purpose. There is no functional change in this patch. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:04 +09:00
Xishi Qiu	9c2606b77d	mm/memory_hotplug.c: use pfn_to_nid() instead of page_to_nid(pfn_to_page()) Use "pfn_to_nid(pfn)" instead of "page_to_nid(pfn_to_page(pfn))". Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:04 +09:00
Xishi Qiu	d6de9d5349	mm/memory_hotplug.c: rename the function is_memblock_offlined_cb() A is_memblock_offlined() return or 1 means memory block is offlined, but is_memblock_offlined_cb() returning 1 means memory block is not offlined, this will confuse somebody, so rename the function. Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:04 +09:00
Xishi Qiu	83285c72e0	mm: use pgdat_end_pfn() to simplify the code in others Use "pgdat_end_pfn()" instead of "pgdat->node_start_pfn + pgdat->node_spanned_pages". Simplify the code, no functional change. Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:03 +09:00
Linus Torvalds	02b9735c12	ACPI and power management fixes for 3.12-rc1 1) ACPI-based PCI hotplug (ACPIPHP) fixes related to spurious events After the recent ACPIPHP changes we've seen some interesting breakage on a system that triggers device check notifications during boot for non-existing devices. Although those notifications are really spurious, we should be able to deal with them nevertheless and that shouldn't introduce too much overhead. Four commits to make that work properly. 2) Memory hotplug and hibernation mutual exclusion rework This was maent to be a cleanup, but it happens to fix a classical ABBA deadlock between system suspend/hibernation and ACPI memory hotplug which is possible if they are started roughly at the same time. Three commits rework memory hotplug so that it doesn't acquire pm_mutex and make hibernation use device_hotplug_lock which prevents it from racing with memory hotplug. 3) ACPI Intel LPSS (Low-Power Subsystem) driver crash fix The ACPI LPSS driver crashes during boot on Apple Macbook Air with Haswell that has slightly unusual BIOS configuration in which one of the LPSS device's _CRS method doesn't return all of the information expected by the driver. Fix from Mika Westerberg, for stable. 4) ACPICA fix related to Store->ArgX operation AML interpreter fix for obscure breakage that causes AML to be executed incorrectly on some machines (observed in practice). From Bob Moore. 5) ACPI core fix for PCI ACPI device objects lookup There still are cases in which there is more than one ACPI device object matching a given PCI device and we don't choose the one that the BIOS expects us to choose, so this makes the lookup take more criteria into account in those cases. 6) Fix to prevent cpuidle from crashing in some rare cases If the result of cpuidle_get_driver() is NULL, which can happen on some systems, cpuidle_driver_ref() will crash trying to use that pointer and the Daniel Fu's fix prevents that from happening. 7) cpufreq fixes related to CPU hotplug Stephen Boyd reported a number of concurrency problems with cpufreq related to CPU hotplug which are addressed by a series of fixes from Srivatsa S Bhat and Viresh Kumar. 8) cpufreq fix for time conversion in time_in_state attribute Time conversion carried out by cpufreq when user space attempts to read /sys/devices/system/cpu/cpu/cpufreq/stats/time_in_state won't work correcty if cputime_t doesn't map directly to jiffies. Fix from Andreas Schwab. 9) Revert of a troublesome cpufreq commit Commit `7c30ed5` (cpufreq: make sure frequency transitions are serialized) was intended to address some known concurrency problems in cpufreq related to the ordering of transitions, but unfortunately it introduced several problems of its own, so I decided to revert it now and address the original problems later in a more robust way. 10) Intel Haswell CPU models for intel_pstate from Nell Hardcastle. 11) cpufreq fixes related to system suspend/resume The recent cpufreq changes that made it preserve CPU sysfs attributes over suspend/resume cycles introduced a possible NULL pointer dereference that caused it to crash during the second attempt to suspend. Three commits from Srivatsa S Bhat fix that problem and a couple of related issues. 12) cpufreq locking fix cpufreq_policy_restore() should acquire the lock for reading, but it acquires it for writing. Fix from Lan Tianyu. / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABCAAGBQJSMbdRAAoJEKhOf7ml8uNsiFkQAKSh1iBXuiUCxBApEGZgoQio 8lmnuyWdhNQWdjZTnh7ptjpDxdrWhxcoxvoaGABU++reDObjef1QnyrQtdO3r8dl oy0C/YGh5kq5SIffIDEwPIb/ipDe/47cgRMW8iBlnViDa1MJBqICuLyefcTRIrKp QGvv0owUM2o7TXpA10+qm8zXjv6m5mu1DTtxYI+2Eodhwi54neAqb+aKMspa2thy V9KFcVv3Td4rJrNvw6BhXNM81QbaYpRxaK3DRr1T6SM++EKvbqYFA1jgW24YvqTL nrCZlDMb6KRww5DCxA/ns9Kx5H+ZyicoRwdtAM3PBYA6MGqsLqPozC/8VKV1fSvZ sgUdbUSuLqKRAkOqM1bjKAhi9PdCGBvkQAg2AqbRK6IBl4HJC8xhdb5E6eZ/J42G GyNBpKef7wVJwYKXE2hSChZ5dYjqMizNHWxFHf8Xy1dveExbQ2nmSJmaWMy2A3kx YOXFkcTV5F6GOIZB8WCRruzUalff9xal4G+iVhGF+AZIOCm7bC+FDXfwIS82uVor ej2l+uQLLZCB499IRmM6942ZIAXshmtN7eRfGtKBc6jsbSCEdQDqf1Z7oRwqAD6h WkD/k/zz30CyM8y4snOkAXkZgqAQsZodtqfowE3e9OHd51tfcNiqdht+obwCx+eD MWXc2xATMAX6NcZTXSZS =U/Jw -----END PGP SIGNATURE----- Merge tag 'pm+acpi-fixes-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from Rafael Wysocki: "All of these commits are fixes that have emerged recently and some of them fix bugs introduced during this merge window. Specifics: 1) ACPI-based PCI hotplug (ACPIPHP) fixes related to spurious events After the recent ACPIPHP changes we've seen some interesting breakage on a system that triggers device check notifications during boot for non-existing devices. Although those notifications are really spurious, we should be able to deal with them nevertheless and that shouldn't introduce too much overhead. Four commits to make that work properly. 2) Memory hotplug and hibernation mutual exclusion rework This was maent to be a cleanup, but it happens to fix a classical ABBA deadlock between system suspend/hibernation and ACPI memory hotplug which is possible if they are started roughly at the same time. Three commits rework memory hotplug so that it doesn't acquire pm_mutex and make hibernation use device_hotplug_lock which prevents it from racing with memory hotplug. 3) ACPI Intel LPSS (Low-Power Subsystem) driver crash fix The ACPI LPSS driver crashes during boot on Apple Macbook Air with Haswell that has slightly unusual BIOS configuration in which one of the LPSS device's _CRS method doesn't return all of the information expected by the driver. Fix from Mika Westerberg, for stable. 4) ACPICA fix related to Store->ArgX operation AML interpreter fix for obscure breakage that causes AML to be executed incorrectly on some machines (observed in practice). From Bob Moore. 5) ACPI core fix for PCI ACPI device objects lookup There still are cases in which there is more than one ACPI device object matching a given PCI device and we don't choose the one that the BIOS expects us to choose, so this makes the lookup take more criteria into account in those cases. 6) Fix to prevent cpuidle from crashing in some rare cases If the result of cpuidle_get_driver() is NULL, which can happen on some systems, cpuidle_driver_ref() will crash trying to use that pointer and the Daniel Fu's fix prevents that from happening. 7) cpufreq fixes related to CPU hotplug Stephen Boyd reported a number of concurrency problems with cpufreq related to CPU hotplug which are addressed by a series of fixes from Srivatsa S Bhat and Viresh Kumar. 8) cpufreq fix for time conversion in time_in_state attribute Time conversion carried out by cpufreq when user space attempts to read /sys/devices/system/cpu/cpu/cpufreq/stats/time_in_state won't work correcty if cputime_t doesn't map directly to jiffies. Fix from Andreas Schwab. 9) Revert of a troublesome cpufreq commit Commit `7c30ed5` (cpufreq: make sure frequency transitions are serialized) was intended to address some known concurrency problems in cpufreq related to the ordering of transitions, but unfortunately it introduced several problems of its own, so I decided to revert it now and address the original problems later in a more robust way. 10) Intel Haswell CPU models for intel_pstate from Nell Hardcastle. 11) cpufreq fixes related to system suspend/resume The recent cpufreq changes that made it preserve CPU sysfs attributes over suspend/resume cycles introduced a possible NULL pointer dereference that caused it to crash during the second attempt to suspend. Three commits from Srivatsa S Bhat fix that problem and a couple of related issues. 12) cpufreq locking fix cpufreq_policy_restore() should acquire the lock for reading, but it acquires it for writing. Fix from Lan Tianyu" * tag 'pm+acpi-fixes-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (25 commits) cpufreq: Acquire the lock in cpufreq_policy_restore() for reading cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpu cpufreq: Restructure if/else block to avoid unintended behavior cpufreq: Fix crash in cpufreq-stats during suspend/resume intel_pstate: Add Haswell CPU models Revert "cpufreq: make sure frequency transitions are serialized" cpufreq: Use signed type for 'ret' variable, to store negative error values cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock cpufreq: Split __cpufreq_remove_dev() into two parts cpufreq: Fix wrong time unit conversion cpufreq: serialize calls to __cpufreq_governor() cpufreq: don't allow governor limits to be changed when it is disabled ACPI / bind: Prefer device objects with _STA to those without it ACPI / hotplug / PCI: Avoid parent bus rescans on spurious device checks ACPI / hotplug / PCI: Use _OST to notify firmware about notify status ACPI / hotplug / PCI: Avoid doing too much for spurious notifies ACPICA: Fix for a Store->ArgX when ArgX contains a reference to a field. ACPI / hotplug / PCI: Don't trim devices before scanning the namespace ...	2013-09-12 11:22:45 -07:00
Naoya Horiguchi	c8721bbbdd	mm: memory-hotplug: enable memory hotplug to handle hugepage Until now we can't offline memory blocks which contain hugepages because a hugepage is considered as an unmovable page. But now with this patch series, a hugepage has become movable, so by using hugepage migration we can offline such memory blocks. What's different from other users of hugepage migration is that we need to decompose all the hugepages inside the target memory block into free buddy pages after hugepage migration, because otherwise free hugepages remaining in the memory block intervene the memory offlining. For this reason we introduce new functions dissolve_free_huge_page() and dissolve_free_huge_pages(). Other than that, what this patch does is straightforwardly to add hugepage migration code, that is, adding hugepage code to the functions which scan over pfn and collect hugepages to be migrated, and adding a hugepage allocation function to alloc_migrate_target(). As for larger hugepages (1GB for x86_64), it's not easy to do hotremove over them because it's larger than memory block. So we now simply leave it to fail as it is. [yongjun_wei@trendmicro.com.cn: remove duplicated include] Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-11 15:57:48 -07:00
Toshi Kani	0f1cfe9d0d	mm/hotplug: remove stop_machine() from try_offline_node() lock_device_hotplug() serializes hotplug & online/offline operations. The lock is held in common sysfs online/offline interfaces and ACPI hotplug code paths. And here are the code paths: - CPU & Mem online/offline via sysfs online store_online()->lock_device_hotplug() - Mem online via sysfs state: store_mem_state()->lock_device_hotplug() - ACPI CPU & Mem hot-add: acpi_scan_bus_device_check()->lock_device_hotplug() - ACPI CPU & Mem hot-delete: acpi_scan_hot_remove()->lock_device_hotplug() try_offline_node() off-lines a node if all memory sections and cpus are removed on the node. It is called from acpi_processor_remove() and acpi_memory_remove_memory()->remove_memory() paths, both of which are in the ACPI hotplug code. try_offline_node() calls stop_machine() to stop all cpus while checking all cpu status with the assumption that the caller is not protected from CPU hotplug or CPU online/offline operations. However, the caller is always serialized with lock_device_hotplug(). Also, the code needs to be properly serialized with a lock, not by stopping all cpus at a random place with stop_machine(). This patch removes the use of stop_machine() in try_offline_node() and adds comments to try_offline_node() and remove_memory() that lock_device_hotplug() is required. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-11 15:57:41 -07:00
Toshi Kani	27356f54c8	mm/hotplug: verify hotplug memory range add_memory() and remove_memory() can only handle a memory range aligned with section. There are problems when an unaligned range is added and then deleted as follows: - add_memory() with an unaligned range succeeds, but __add_pages() called from add_memory() adds a whole section of pages even though a given memory range is less than the section size. - remove_memory() to the added unaligned range hits BUG_ON() in __remove_pages(). This patch changes add_memory() and remove_memory() to check if a given memory range is aligned with section at the beginning. As the result, add_memory() fails with -EINVAL when a given range is unaligned, and does not add such memory range. This prevents remove_memory() to be called with an unaligned range as well. Note that remove_memory() has to use BUG_ON() since this function cannot fail. [akpm@linux-foundation.org: avoid printk warnings] Signed-off-by: Toshi Kani <toshi.kani@hp.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-11 15:57:39 -07:00

1 2 3 4 5