If the caller has set __GFP_NOWARN don't print the following message:
vmap allocation for size 15736832 failed: use vmalloc=<size> to increase
size.
This can happen with the ARM/Linux or ARM64/Linux module loader built
with CONFIG_ARM{,64}_MODULE_PLTS=y which does a first attempt at loading
a large module from module space, then falls back to vmalloc space.
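
The check described above can be sketched in user-space C. This is a
minimal model under stated assumptions, not the kernel code:
DEMO_GFP_NOWARN (and its bit value) and demo_alloc_failed() are
hypothetical stand-ins for __GFP_NOWARN and the allocator's failure path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's __GFP_NOWARN bit. */
#define DEMO_GFP_NOWARN 0x1u

/*
 * Model of the allocation-failure path: report the failure unless the
 * caller asked for silence. Returns true if a message was printed.
 */
static bool demo_alloc_failed(unsigned int gfp_mask, unsigned long size)
{
	if (gfp_mask & DEMO_GFP_NOWARN)
		return false;	/* caller has its own fallback; stay quiet */

	fprintf(stderr,
		"vmap allocation for size %lu failed: use vmalloc=<size> to increase size.\n",
		size);
	return true;
}
```

A caller that retries elsewhere, such as a module loader falling back to
vmalloc space, would pass the no-warn flag on its first attempt only.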
Change-Id: Ib907156055959e22a419b79fb424772baea556d0
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Git-Commit: 03497d761c55438144fd63534d4223418fdfd345
Git-Repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
This size is the maximum amount of virtual address space we gather
up before attempting to purge with a TLB flush. It is 128M in most cases.
Repeated, large vmalloc operations can easily leave many fragments behind,
which wastes the limited vmalloc area on 32-bit systems.
So make the multiplier configurable, with a default of 8, on 32-bit only.
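
The arithmetic behind these numbers can be sketched as follows, assuming
the threshold scales as fls(number of online CPUs) times a per-step size
in megabytes, which is how upstream lazy_max_pages() behaves with a fixed
32 MB step. demo_fls(), demo_lazy_max_bytes() and the multiplier_mb
parameter are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Position of the highest set bit, 1-based; 0 for x == 0 (like fls()). */
static int demo_fls(unsigned int x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/*
 * Lazy-purge threshold in bytes for a given CPU count and per-step
 * multiplier in MB (hypothetical parameter name).
 */
static unsigned long long demo_lazy_max_bytes(unsigned int ncpus,
					       unsigned int multiplier_mb)
{
	return (unsigned long long)demo_fls(ncpus) * multiplier_mb *
	       1024ULL * 1024ULL;
}
```

Under that assumption, 8 CPUs with a 32 MB step give the 128M mentioned
above, while a multiplier of 8 lowers the threshold to 32M.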
Change-Id: I68a75acb16d3cff05f8b13c05ae78922269e219f
Signed-off-by: Zhenhua Huang <zhenhuah@codeaurora.org>
* refs/heads/tmp-13962260
Linux 4.4.146
scsi: sg: fix minor memory leak in error path
crypto: padlock-aes - Fix Nano workaround data corruption
kvm: x86: vmx: fix vpid leak
virtio_balloon: fix another race between migration and ballooning
net: socket: fix potential spectre v1 gadget in socketcall
can: ems_usb: Fix memory leak on ems_usb_disconnect()
squashfs: more metadata hardenings
squashfs: more metadata hardening
netlink: Fix spectre v1 gadget in netlink_create()
net: dsa: Do not suspend/resume closed slave_dev
inet: frag: enforce memory limits earlier
tcp: add one more quick ack after after ECN events
tcp: refactor tcp_ecn_check_ce to remove sk type cast
tcp: do not aggressively quick ack after ECN events
tcp: add max_quickacks param to tcp_incr_quickack and tcp_enter_quickack_mode
tcp: do not force quickack when receiving out-of-order packets
NET: stmmac: align DMA stuff to largest cache line length
xen-netfront: wait xenbus state change when load module manually
net: lan78xx: fix rx handling before first packet is send
net: fix amd-xgbe flow-control issue
ipv4: remove BUG_ON() from fib_compute_spec_dst
ASoC: pxa: Fix module autoload for platform drivers
dmaengine: pxa_dma: remove duplicate const qualifier
ext4: check for allocation block validity with block group locked
ext4: fix inline data updates with checksums enabled
squashfs: be more careful about metadata corruption
random: mix rdrand with entropy sent in from userspace
drm: Add DP PSR2 sink enable bit
media: si470x: fix __be16 annotations
scsi: megaraid_sas: Increase timeout by 1 sec for non-RAID fastpath IOs
scsi: scsi_dh: replace too broad "TP9" string with the exact models
media: omap3isp: fix unbalanced dma_iommu_mapping
crypto: authenc - don't leak pointers to authenc keys
crypto: authencesn - don't leak pointers to authenc keys
usb: hub: Don't wait for connect state at resume for powered-off ports
microblaze: Fix simpleImage format generation
audit: allow not equal op for audit by executable
rsi: Fix 'invalid vdd' warning in mmc
ipconfig: Correctly initialise ic_nameservers
drm/gma500: fix psb_intel_lvds_mode_valid()'s return type
memory: tegra: Apply interrupts mask per SoC
memory: tegra: Do not handle spurious interrupts
ALSA: hda/ca0132: fix build failure when a local macro is defined
drm/atomic: Handling the case when setting old crtc for plane
media: siano: get rid of __le32/__le16 cast warnings
bpf: fix references to free_bpf_prog_info() in comments
thermal: exynos: fix setting rising_threshold for Exynos5433
scsi: megaraid: silence a static checker bug
scsi: 3w-xxxx: fix a missing-check bug
scsi: 3w-9xxx: fix a missing-check bug
perf: fix invalid bit in diagnostic entry
s390/cpum_sf: Add data entry sizes to sampling trailer entry
brcmfmac: Add support for bcm43364 wireless chipset
mtd: rawnand: fsl_ifc: fix FSL NAND driver to read all ONFI parameter pages
media: saa7164: Fix driver name in debug output
libata: Fix command retry decision
media: rcar_jpu: Add missing clk_disable_unprepare() on error in jpu_open()
dma-iommu: Fix compilation when !CONFIG_IOMMU_DMA
tty: Fix data race in tty_insert_flip_string_fixed_flag
HID: i2c-hid: check if device is there before really probing
powerpc/embedded6xx/hlwd-pic: Prevent interrupts from being handled by Starlet
drm/radeon: fix mode_valid's return type
HID: hid-plantronics: Re-resend Update to map button for PTT products
ALSA: usb-audio: Apply rate limit to warning messages in URB complete callback
media: smiapp: fix timeout checking in smiapp_read_nvm
md: fix NULL dereference of mddev->pers in remove_and_add_spares()
regulator: pfuze100: add .is_enable() for pfuze100_swb_regulator_ops
ALSA: emu10k1: Rate-limit error messages about page errors
scsi: ufs: fix exception event handling
mwifiex: correct histogram data with appropriate index
PCI: pciehp: Request control of native hotplug only if supported
pinctrl: at91-pio4: add missing of_node_put
powerpc/8xx: fix invalid register expression in head_8xx.S
powerpc/powermac: Mark variable x as unused
powerpc/powermac: Add missing prototype for note_bootable_part()
powerpc/chrp/time: Make some functions static, add missing header include
powerpc/32: Add a missing include header
ath: Add regulatory mapping for Bahamas
ath: Add regulatory mapping for Bermuda
ath: Add regulatory mapping for Serbia
ath: Add regulatory mapping for Tanzania
ath: Add regulatory mapping for Uganda
ath: Add regulatory mapping for APL2_FCCA
ath: Add regulatory mapping for APL13_WORLD
ath: Add regulatory mapping for ETSI8_WORLD
ath: Add regulatory mapping for FCC3_ETSIC
PCI: Prevent sysfs disable of device while driver is attached
btrfs: qgroup: Finish rescan when hit the last leaf of extent tree
btrfs: add barriers to btrfs_sync_log before log_commit_wait wakeups
media: videobuf2-core: don't call memop 'finish' when queueing
wlcore: sdio: check for valid platform device data before suspend
mwifiex: handle race during mwifiex_usb_disconnect
mfd: cros_ec: Fail early if we cannot identify the EC
ASoC: dpcm: fix BE dai not hw_free and shutdown
Bluetooth: btusb: Add a new Realtek 8723DE ID 2ff8:b011
Bluetooth: hci_qca: Fix "Sleep inside atomic section" warning
iwlwifi: pcie: fix race in Rx buffer allocator
perf/x86/intel/uncore: Correct fixed counter index check for NHM
perf/x86/intel/uncore: Correct fixed counter index check in generic code
usbip: usbip_detach: Fix memory, udev context and udev leak
f2fs: fix to don't trigger writeback during recovery
disable loading f2fs module on PAGE_SIZE > 4KB
RDMA/mad: Convert BUG_ONs to error flows
powerpc/64s: Fix compiler store ordering to SLB shadow area
hvc_opal: don't set tb_ticks_per_usec in udbg_init_opal_common()
infiniband: fix a possible use-after-free bug
netfilter: ipset: List timing out entries with "timeout 1" instead of zero
rtc: ensure rtc_set_alarm fails when alarms are not supported
mm/slub.c: add __printf verification to slab_err()
mm: vmalloc: avoid racy handling of debugobjects in vunmap
nfsd: fix potential use-after-free in nfsd4_decode_getdeviceinfo
ALSA: fm801: add error handling for snd_ctl_add
ALSA: emu10k1: add error handling for snd_ctl_add
xen/netfront: raise max number of slots in xennet_get_responses()
tracing: Quiet gcc warning about maybe unused link variable
tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure
tracing: Fix possible double free in event_enable_trigger_func()
tracing: Fix double free of event_trigger_data
Input: elan_i2c - add another ACPI ID for Lenovo Ideapad 330-15AST
Input: i8042 - add Lenovo LaVie Z to the i8042 reset list
Input: elan_i2c - add ACPI ID for lenovo ideapad 330
MIPS: Fix off-by-one in pci_resource_to_user()
kernel/sys.c: fix merge error with 4.4.144
Conflicts:
drivers/scsi/ufs/ufshcd.c
include/net/tcp.h
net/socket.c
Change-Id: Ie84fdcf54b0a45508f76ef56330291f54e35ed30
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAltoWioACgkQONu9yGCS
aT6YrQ//d8dWKaNZK08Z/l2ZqRS56wlNTJyHIB81p1uM2PuPHfLjsZzLQ+HnZ3Ha
G+fedEj3sbwJp8i61TRu9Q1p/PyLWsnaryWZaK3gm4Yo8GrdVbXAY47EHwz3fbUK
yxrC0+zQmIlyZgwzbUNGspDuAdNt2MFDug97RFF8BdhJd84Rv0BbicGMwKJQFfFN
g0Tv6yB+8cjmnCMjmLreLyi+puWvXZtZXAi+idl9eTC4ysGDKNvO1ERptv2NC5C6
171cbsS/ngpY5ZIUcmLy0QPPFh/ZCeoft22R3gOxZDkjT4Ro6lY5ubPKDEcn57Hv
FSV5fuQ3cBtmsODn7LMIWqLDKuCRM/gTmvXrWxM91JDLSsuAdZWATj8k4iIoocmk
l/3iOixBMFCGToQ1I2/O33QZOssKoDIz4bpG6+HM/Cj4anSnVZKjouJSTlNZr/3i
ZJOXpu/MpQItc/RGo/PumzJLkXhS+HyGwPbTIOPy29NMqp+xvjZv4DttuJbqyHJ2
Pm/OZcvU7z1wSMhcTknvZLLMQVRIICQjfPJNDefqAdrCdd233cRo37cU8egg4A0l
F3q+ZI/ny01YWQP8KrCJyWB5lHHbEc44wUHCxet0TPZ1qaqvVcXzaWhwxP2H0L3I
7r2u9bDg15ielw3jhPpRWZMvANbQlToNoj6YROqj5ArcIowcBPc=
=7/iL
-----END PGP SIGNATURE-----
Merge 4.4.146 into android-4.4
Changes in 4.4.146
MIPS: Fix off-by-one in pci_resource_to_user()
Input: elan_i2c - add ACPI ID for lenovo ideapad 330
Input: i8042 - add Lenovo LaVie Z to the i8042 reset list
Input: elan_i2c - add another ACPI ID for Lenovo Ideapad 330-15AST
tracing: Fix double free of event_trigger_data
tracing: Fix possible double free in event_enable_trigger_func()
tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure
tracing: Quiet gcc warning about maybe unused link variable
xen/netfront: raise max number of slots in xennet_get_responses()
ALSA: emu10k1: add error handling for snd_ctl_add
ALSA: fm801: add error handling for snd_ctl_add
nfsd: fix potential use-after-free in nfsd4_decode_getdeviceinfo
mm: vmalloc: avoid racy handling of debugobjects in vunmap
mm/slub.c: add __printf verification to slab_err()
rtc: ensure rtc_set_alarm fails when alarms are not supported
netfilter: ipset: List timing out entries with "timeout 1" instead of zero
infiniband: fix a possible use-after-free bug
hvc_opal: don't set tb_ticks_per_usec in udbg_init_opal_common()
powerpc/64s: Fix compiler store ordering to SLB shadow area
RDMA/mad: Convert BUG_ONs to error flows
disable loading f2fs module on PAGE_SIZE > 4KB
f2fs: fix to don't trigger writeback during recovery
usbip: usbip_detach: Fix memory, udev context and udev leak
perf/x86/intel/uncore: Correct fixed counter index check in generic code
perf/x86/intel/uncore: Correct fixed counter index check for NHM
iwlwifi: pcie: fix race in Rx buffer allocator
Bluetooth: hci_qca: Fix "Sleep inside atomic section" warning
Bluetooth: btusb: Add a new Realtek 8723DE ID 2ff8:b011
ASoC: dpcm: fix BE dai not hw_free and shutdown
mfd: cros_ec: Fail early if we cannot identify the EC
mwifiex: handle race during mwifiex_usb_disconnect
wlcore: sdio: check for valid platform device data before suspend
media: videobuf2-core: don't call memop 'finish' when queueing
btrfs: add barriers to btrfs_sync_log before log_commit_wait wakeups
btrfs: qgroup: Finish rescan when hit the last leaf of extent tree
PCI: Prevent sysfs disable of device while driver is attached
ath: Add regulatory mapping for FCC3_ETSIC
ath: Add regulatory mapping for ETSI8_WORLD
ath: Add regulatory mapping for APL13_WORLD
ath: Add regulatory mapping for APL2_FCCA
ath: Add regulatory mapping for Uganda
ath: Add regulatory mapping for Tanzania
ath: Add regulatory mapping for Serbia
ath: Add regulatory mapping for Bermuda
ath: Add regulatory mapping for Bahamas
powerpc/32: Add a missing include header
powerpc/chrp/time: Make some functions static, add missing header include
powerpc/powermac: Add missing prototype for note_bootable_part()
powerpc/powermac: Mark variable x as unused
powerpc/8xx: fix invalid register expression in head_8xx.S
pinctrl: at91-pio4: add missing of_node_put
PCI: pciehp: Request control of native hotplug only if supported
mwifiex: correct histogram data with appropriate index
scsi: ufs: fix exception event handling
ALSA: emu10k1: Rate-limit error messages about page errors
regulator: pfuze100: add .is_enable() for pfuze100_swb_regulator_ops
md: fix NULL dereference of mddev->pers in remove_and_add_spares()
media: smiapp: fix timeout checking in smiapp_read_nvm
ALSA: usb-audio: Apply rate limit to warning messages in URB complete callback
HID: hid-plantronics: Re-resend Update to map button for PTT products
drm/radeon: fix mode_valid's return type
powerpc/embedded6xx/hlwd-pic: Prevent interrupts from being handled by Starlet
HID: i2c-hid: check if device is there before really probing
tty: Fix data race in tty_insert_flip_string_fixed_flag
dma-iommu: Fix compilation when !CONFIG_IOMMU_DMA
media: rcar_jpu: Add missing clk_disable_unprepare() on error in jpu_open()
libata: Fix command retry decision
media: saa7164: Fix driver name in debug output
mtd: rawnand: fsl_ifc: fix FSL NAND driver to read all ONFI parameter pages
brcmfmac: Add support for bcm43364 wireless chipset
s390/cpum_sf: Add data entry sizes to sampling trailer entry
perf: fix invalid bit in diagnostic entry
scsi: 3w-9xxx: fix a missing-check bug
scsi: 3w-xxxx: fix a missing-check bug
scsi: megaraid: silence a static checker bug
thermal: exynos: fix setting rising_threshold for Exynos5433
bpf: fix references to free_bpf_prog_info() in comments
media: siano: get rid of __le32/__le16 cast warnings
drm/atomic: Handling the case when setting old crtc for plane
ALSA: hda/ca0132: fix build failure when a local macro is defined
memory: tegra: Do not handle spurious interrupts
memory: tegra: Apply interrupts mask per SoC
drm/gma500: fix psb_intel_lvds_mode_valid()'s return type
ipconfig: Correctly initialise ic_nameservers
rsi: Fix 'invalid vdd' warning in mmc
audit: allow not equal op for audit by executable
microblaze: Fix simpleImage format generation
usb: hub: Don't wait for connect state at resume for powered-off ports
crypto: authencesn - don't leak pointers to authenc keys
crypto: authenc - don't leak pointers to authenc keys
media: omap3isp: fix unbalanced dma_iommu_mapping
scsi: scsi_dh: replace too broad "TP9" string with the exact models
scsi: megaraid_sas: Increase timeout by 1 sec for non-RAID fastpath IOs
media: si470x: fix __be16 annotations
drm: Add DP PSR2 sink enable bit
random: mix rdrand with entropy sent in from userspace
squashfs: be more careful about metadata corruption
ext4: fix inline data updates with checksums enabled
ext4: check for allocation block validity with block group locked
dmaengine: pxa_dma: remove duplicate const qualifier
ASoC: pxa: Fix module autoload for platform drivers
ipv4: remove BUG_ON() from fib_compute_spec_dst
net: fix amd-xgbe flow-control issue
net: lan78xx: fix rx handling before first packet is send
xen-netfront: wait xenbus state change when load module manually
NET: stmmac: align DMA stuff to largest cache line length
tcp: do not force quickack when receiving out-of-order packets
tcp: add max_quickacks param to tcp_incr_quickack and tcp_enter_quickack_mode
tcp: do not aggressively quick ack after ECN events
tcp: refactor tcp_ecn_check_ce to remove sk type cast
tcp: add one more quick ack after after ECN events
inet: frag: enforce memory limits earlier
net: dsa: Do not suspend/resume closed slave_dev
netlink: Fix spectre v1 gadget in netlink_create()
squashfs: more metadata hardening
squashfs: more metadata hardenings
can: ems_usb: Fix memory leak on ems_usb_disconnect()
net: socket: fix potential spectre v1 gadget in socketcall
virtio_balloon: fix another race between migration and ballooning
kvm: x86: vmx: fix vpid leak
crypto: padlock-aes - Fix Nano workaround data corruption
scsi: sg: fix minor memory leak in error path
Linux 4.4.146
Change-Id: Ia7e43a90d0f5603c741811436b8de41884cb2851
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit f3c01d2f3ade6790db67f80fef60df84424f8964 ]
Currently, the __vunmap flow is:
1) Release the VM area
2) Free the debug objects corresponding to that vm area.
This leaves a race window open:
1) Release the VM area
1.5) Some other client gets the same vm area
1.6) This client allocates new debug objects on the same
vm area
2) Free the debug objects corresponding to this vm area.
Here, we actually free 'other' client's debug objects.
Fix this by freeing the debug objects first and then releasing the VM
area.
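
The two orderings can be replayed in a single-threaded toy model. This is
only a sketch: struct demo_area and the demo_*() helpers are hypothetical
and stand in for the VM area, its debug objects and a competing client;
the real race involves concurrent CPUs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one address range and the debug objects tracked on it. */
struct demo_area {
	bool allocated;		/* range currently owned by some client */
	int debug_objs;		/* debug objects registered on the range */
};

static void demo_release_area(struct demo_area *a) { a->allocated = false; }
static void demo_free_debug_objs(struct demo_area *a) { a->debug_objs = 0; }

/* Another client grabs the range if it is free and registers one object. */
static void demo_other_client_alloc(struct demo_area *a)
{
	if (!a->allocated) {
		a->allocated = true;
		a->debug_objs = 1;
	}
}

/*
 * Replay one unmap with the other client trying to allocate between the
 * two steps. Returns the other client's surviving debug objects.
 */
static int demo_unmap(bool free_debug_first)
{
	struct demo_area a = { .allocated = true, .debug_objs = 1 };

	if (free_debug_first) {
		demo_free_debug_objs(&a);	/* fixed order: step 2 first */
		demo_other_client_alloc(&a);	/* range still owned: no-op */
		demo_release_area(&a);
		demo_other_client_alloc(&a);	/* now it can take the range */
	} else {
		demo_release_area(&a);		/* old order: step 1 */
		demo_other_client_alloc(&a);	/* steps 1.5 and 1.6 */
		demo_free_debug_objs(&a);	/* step 2 frees the wrong objects */
	}
	return a.debug_objs;
}
```

In the old order the new owner ends up with no debug objects; in the
fixed order its object survives.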
Link: http://lkml.kernel.org/r/1523961828-9485-2-git-send-email-cpandya@codeaurora.org
Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* refs/heads/tmp-2fea039
Linux 4.4.106
usb: gadget: ffs: Forbid usb_ep_alloc_request from sleeping
arm: KVM: Fix VTTBR_BADDR_MASK BUG_ON off-by-one
Revert "x86/mm/pat: Ensure cpa->pfn only contains page frame numbers"
Revert "x86/efi: Hoist page table switching code into efi_call_virt()"
Revert "x86/efi: Build our own page table structures"
net/packet: fix a race in packet_bind() and packet_notifier()
packet: fix crash in fanout_demux_rollover()
sit: update frag_off info
rds: Fix NULL pointer dereference in __rds_rdma_map
tipc: fix memory leak in tipc_accept_from_sock()
more bio_map_user_iov() leak fixes
s390: always save and restore all registers on context switch
ipmi: Stop timers before cleaning up the module
audit: ensure that 'audit=1' actually enables audit for PID 1
ipvlan: fix ipv6 outbound device
afs: Connect up the CB.ProbeUuid
IB/mlx5: Assign send CQ and recv CQ of UMR QP
IB/mlx4: Increase maximal message size under UD QP
xfrm: Copy policy family in clone_policy
jump_label: Invoke jump_label_test() via early_initcall()
atm: horizon: Fix irq release error
sctp: use the right sk after waking up from wait_buf sleep
sctp: do not free asoc when it is already dead in sctp_sendmsg
sparc64/mm: set fields in deferred pages
block: wake up all tasks blocked in get_request()
sunrpc: Fix rpc_task_begin trace point
NFS: Fix a typo in nfs_rename()
dynamic-debug-howto: fix optional/omitted ending line number to be LARGE instead of 0
lib/genalloc.c: make the avail variable an atomic_long_t
route: update fnhe_expires for redirect when the fnhe exists
route: also update fnhe_genid when updating a route cache
mac80211_hwsim: Fix memory leak in hwsim_new_radio_nl()
kbuild: pkg: use --transform option to prefix paths in tar
EDAC, i5000, i5400: Fix definition of NRECMEMB register
EDAC, i5000, i5400: Fix use of MTR_DRAM_WIDTH macro
powerpc/powernv/ioda2: Gracefully fail if too many TCE levels requested
drm/amd/amdgpu: fix console deadlock if late init failed
axonram: Fix gendisk handling
netfilter: don't track fragmented packets
zram: set physical queue limits to avoid array out of bounds accesses
i2c: riic: fix restart condition
crypto: s5p-sss - Fix completing crypto request in IRQ handler
ipv6: reorder icmpv6_init() and ip6_mr_init()
bnx2x: do not rollback VF MAC/VLAN filters we did not configure
bnx2x: fix possible overrun of VFPF multicast addresses array
bnx2x: prevent crash when accessing PTP with interface down
spi_ks8995: fix "BUG: key accdaa28 not in .data!"
arm64: KVM: Survive unknown traps from guests
arm: KVM: Survive unknown traps from guests
KVM: nVMX: reset nested_run_pending if the vCPU is going to be reset
irqchip/crossbar: Fix incorrect type of register size
scsi: lpfc: Fix crash during Hardware error recovery on SLI3 adapters
workqueue: trigger WARN if queue_delayed_work() is called with NULL @wq
libata: drop WARN from protocol error in ata_sff_qc_issue()
kvm: nVMX: VMCLEAR should not cause the vCPU to shut down
USB: gadgetfs: Fix a potential memory leak in 'dev_config()'
usb: gadget: configs: plug memory leak
HID: chicony: Add support for another ASUS Zen AiO keyboard
gpio: altera: Use handle_level_irq when configured as a level_high
ARM: OMAP2+: Release device node after it is no longer needed.
ARM: OMAP2+: Fix device node reference counts
module: set __jump_table alignment to 8
selftest/powerpc: Fix false failures for skipped tests
x86/hpet: Prevent might sleep splat on resume
ARM: OMAP2+: gpmc-onenand: propagate error on initialization failure
vti6: Don't report path MTU below IPV6_MIN_MTU.
Revert "s390/kbuild: enable modversions for symbols exported from asm"
Revert "spi: SPI_FSL_DSPI should depend on HAS_DMA"
Revert "drm/armada: Fix compile fail"
mm: drop unused pmdp_huge_get_and_clear_notify()
thp: fix MADV_DONTNEED vs. numa balancing race
thp: reduce indentation level in change_huge_pmd()
scsi: storvsc: Workaround for virtual DVD SCSI version
ARM: avoid faulting on qemu
ARM: BUG if jumping to usermode address in kernel mode
arm64: fpsimd: Prevent registers leaking from dead tasks
KVM: VMX: remove I/O port 0x80 bypass on Intel hosts
arm64: KVM: fix VTTBR_BADDR_MASK BUG_ON off-by-one
media: dvb: i2c transfers over usb cannot be done from stack
drm/exynos: gem: Drop NONCONTIG flag for buffers allocated without IOMMU
drm: extra printk() wrapper macros
kdb: Fix handling of kallsyms_symbol_next() return value
s390: fix compat system call table
iommu/vt-d: Fix scatterlist offset handling
ALSA: usb-audio: Add check return value for usb_string()
ALSA: usb-audio: Fix out-of-bound error
ALSA: seq: Remove spurious WARN_ON() at timer check
ALSA: pcm: prevent UAF in snd_pcm_info
x86/PCI: Make broadcom_postcore_init() check acpi_disabled
X.509: reject invalid BIT STRING for subjectPublicKey
ASN.1: check for error from ASN1_OP_END__ACT actions
ASN.1: fix out-of-bounds read when parsing indefinite length item
efi: Move some sysfs files to be read-only by root
scsi: libsas: align sata_device's rps_resp on a cacheline
isa: Prevent NULL dereference in isa_bus driver callbacks
hv: kvp: Avoid reading past allocated blocks from KVP file
virtio: release virtio index when fail to device_register
can: usb_8dev: cancel urb on -EPIPE and -EPROTO
can: esd_usb2: cancel urb on -EPIPE and -EPROTO
can: ems_usb: cancel urb on -EPIPE and -EPROTO
can: kvaser_usb: cancel urb on -EPIPE and -EPROTO
can: kvaser_usb: ratelimit errors if incomplete messages are received
can: kvaser_usb: Fix comparison bug in kvaser_usb_read_bulk_callback()
can: kvaser_usb: free buf in error paths
can: ti_hecc: Fix napi poll return value for repoll
BACKPORT: irq: Make the irqentry text section unconditional
UPSTREAM: arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections
UPSTREAM: x86, kasan, ftrace: Put APIC interrupt handlers into .irqentry.text
UPSTREAM: kasan: make get_wild_bug_type() static
UPSTREAM: kasan: separate report parts by empty lines
UPSTREAM: kasan: improve double-free report format
UPSTREAM: kasan: print page description after stacks
UPSTREAM: kasan: improve slab object description
UPSTREAM: kasan: change report header
UPSTREAM: kasan: simplify address description logic
UPSTREAM: kasan: change allocation and freeing stack traces headers
UPSTREAM: kasan: unify report headers
UPSTREAM: kasan: introduce helper functions for determining bug type
BACKPORT: kasan: report only the first error by default
UPSTREAM: kasan: fix races in quarantine_remove_cache()
UPSTREAM: kasan: resched in quarantine_remove_cache()
BACKPORT: kasan, sched/headers: Uninline kasan_enable/disable_current()
BACKPORT: kasan: drain quarantine of memcg slab objects
UPSTREAM: kasan: eliminate long stalls during quarantine reduction
UPSTREAM: kasan: support panic_on_warn
UPSTREAM: x86/suspend: fix false positive KASAN warning on suspend/resume
UPSTREAM: kasan: support use-after-scope detection
UPSTREAM: kasan/tests: add tests for user memory access functions
UPSTREAM: mm, kasan: add a ksize() test
UPSTREAM: kasan: test fix: warn if the UAF could not be detected in kmalloc_uaf2
UPSTREAM: kasan: modify kmalloc_large_oob_right(), add kmalloc_pagealloc_oob_right()
UPSTREAM: lib/stackdepot: export save/fetch stack for drivers
UPSTREAM: lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB
BACKPORT: kprobes: Unpoison stack in jprobe_return() for KASAN
UPSTREAM: kasan: remove the unnecessary WARN_ONCE from quarantine.c
UPSTREAM: kasan: avoid overflowing quarantine size on low memory systems
UPSTREAM: kasan: improve double-free reports
BACKPORT: mm: coalesce split strings
BACKPORT: mm/kasan: get rid of ->state in struct kasan_alloc_meta
UPSTREAM: mm/kasan: get rid of ->alloc_size in struct kasan_alloc_meta
UPSTREAM: mm: kasan: remove unused 'reserved' field from struct kasan_alloc_meta
UPSTREAM: mm/kasan, slub: don't disable interrupts when object leaves quarantine
UPSTREAM: mm/kasan: don't reduce quarantine in atomic contexts
UPSTREAM: mm/kasan: fix corruptions and false positive reports
UPSTREAM: lib/stackdepot.c: use __GFP_NOWARN for stack allocations
BACKPORT: mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB
UPSTREAM: kasan/quarantine: fix bugs on qlist_move_cache()
UPSTREAM: mm: mempool: kasan: don't poot mempool objects in quarantine
UPSTREAM: kasan: change memory hot-add error messages to info messages
BACKPORT: mm/kasan: add API to check memory regions
UPSTREAM: mm/kasan: print name of mem[set,cpy,move]() caller in report
UPSTREAM: mm: kasan: initial memory quarantine implementation
UPSTREAM: lib/stackdepot: avoid to return 0 handle
UPSTREAM: lib/stackdepot.c: allow the stack trace hash to be zero
UPSTREAM: mm, kasan: fix compilation for CONFIG_SLAB
BACKPORT: mm, kasan: stackdepot implementation. Enable stackdepot for SLAB
BACKPORT: mm, kasan: add GFP flags to KASAN API
UPSTREAM: mm, kasan: SLAB support
UPSTREAM: mm/slab: align cache size first before determination of OFF_SLAB candidate
UPSTREAM: mm/slab: use more appropriate condition check for debug_pagealloc
UPSTREAM: mm/slab: factor out debugging initialization in cache_init_objs()
UPSTREAM: mm/slab: remove object status buffer for DEBUG_SLAB_LEAK
UPSTREAM: mm/slab: alternative implementation for DEBUG_SLAB_LEAK
UPSTREAM: mm/slab: clean up DEBUG_PAGEALLOC processing code
UPSTREAM: mm/slab: activate debug_pagealloc in SLAB when it is actually enabled
sched: EAS/WALT: Don't take into account of running task's util
BACKPORT: schedutil: Reset cached freq if it is not in sync with next_freq
UPSTREAM: kasan: add functions to clear stack poison
Conflicts:
arch/arm/include/asm/kvm_arm.h
arch/arm64/kernel/vmlinux.lds.S
include/linux/kasan.h
kernel/softirq.c
lib/Kconfig
lib/Kconfig.kasan
lib/Makefile
lib/stackdepot.c
mm/kasan/kasan.c
sound/usb/mixer.c
Change-Id: If70ced6da5f19be3dd92d10a8d8cd4d5841e5870
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
Kernel style prefers a single string over split strings when the string is
'user-visible'.
Miscellanea:
- Add a missing newline
- Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Tejun Heo <tj@kernel.org> [percpu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 64145065
(cherry-picked from 756a025f00091918d9d09ca3229defb160b409c0)
Change-Id: I377fb1542980c15d2f306924656227ad17b02b5e
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Use cond_resched_lock to avoid holding the vmap_area_lock for a
potentially long time and thus creating bad latencies for various
workloads.
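
The idea of breaking up a long lock hold can be sketched in user space.
This is an analogy, not the kernel change: the toy spinlock, demo_purge()
and its batch parameter are hypothetical, and the lock is dropped
unconditionally every batch items, whereas cond_resched_lock() only drops
the lock when a reschedule is needed or the lock is contended.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_flag demo_lock = ATOMIC_FLAG_INIT;

static void demo_spin_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&demo_lock,
						 memory_order_acquire))
		sched_yield();
}

static void demo_spin_unlock(void)
{
	atomic_flag_clear_explicit(&demo_lock, memory_order_release);
}

/*
 * Process n items under the lock, dropping it every `batch` items so
 * that waiters and other runnable tasks get a chance. Returns how many
 * times the lock was dropped and retaken.
 */
static size_t demo_purge(size_t n, size_t batch)
{
	size_t breaks = 0;

	assert(batch > 0);
	demo_spin_lock();
	for (size_t i = 0; i < n; i++) {
		/* ... free one lazily-unmapped area here ... */
		if ((i + 1) % batch == 0 && i + 1 < n) {
			demo_spin_unlock();
			sched_yield();
			demo_spin_lock();
			breaks++;
		}
	}
	demo_spin_unlock();
	return breaks;
}
```

The longest uninterrupted hold is then bounded by the batch size rather
than by the total number of areas to purge.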
Change-Id: I36eb4d8dbd6604f52e5c463373a9754847a44bc2
[hch: split from a larger patch by Joel, wrote the crappy changelog]
Link: http://lkml.kernel.org/r/1479474236-4139-11-git-send-email-hch@lst.de
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jisheng Zhang <jszhang@marvell.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Dias <joaodias@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 763b218ddfaf56761c19923beb7e16656f66ec62
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
The purge_lock spinlock causes high latencies on non-RT kernels. This
has been reported multiple times on lkml [1] [2] and affects
applications like audio.
This patch replaces it with a mutex to allow preemption while holding
the lock.
Thanks to Joel Fernandes for the detailed report and analysis as well as
an earlier attempt at fixing this issue.
[1] http://lists.openwall.net/linux-kernel/2016/03/23/29
[2] https://lkml.org/lkml/2016/10/9/59
Change-Id: I57d4e9c7ce5aeb3273574026da2a8b737ef0b809
Link: http://lkml.kernel.org/r/1479474236-4139-10-git-send-email-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jisheng Zhang <jszhang@marvell.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Dias <joaodias@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: f9e09977671b618aeb25ddc0d4c9a84d5b5cde9d
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
We will take a sleeping lock later in this series, so this adds the
proper safeguards.
Change-Id: Iba7efcb690ad584a30ac31cfb7937889bab44e2e
Link: http://lkml.kernel.org/r/1479474236-4139-9-git-send-email-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jisheng Zhang <jszhang@marvell.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Dias <joaodias@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 5803ed292e63a1bf00722d6655d0229794607183
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
We are going to use a sleeping lock for freeing vmap. However, some
vfree() users want to free memory from atomic (but not from interrupt)
context. For this we add vfree_atomic(), a deferred variation of vfree()
which can be used in any atomic context (except NMIs).
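
The deferral pattern can be sketched in user-space C with a lock-free
list. This is a model under stated assumptions, not the kernel
implementation: demo_free_deferred() and demo_drain_deferred() are
hypothetical names, free() stands in for the real release, and each buffer
is assumed to be at least sizeof(struct demo_node) bytes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* The node lives inside the buffer being freed, so queuing allocates nothing. */
struct demo_node {
	struct demo_node *next;
};

static _Atomic(struct demo_node *) demo_deferred = NULL;

/* Safe from a context that must not sleep: one lock-free list push. */
static void demo_free_deferred(void *buf)
{
	struct demo_node *node = buf;

	node->next = atomic_load_explicit(&demo_deferred, memory_order_relaxed);
	while (!atomic_compare_exchange_weak_explicit(&demo_deferred,
						      &node->next, node,
						      memory_order_release,
						      memory_order_relaxed))
		;	/* node->next was refreshed by the failed CAS */
}

/* Runs later from a context that may sleep; returns the number freed. */
static size_t demo_drain_deferred(void)
{
	struct demo_node *node = atomic_exchange_explicit(&demo_deferred, NULL,
							  memory_order_acquire);
	size_t freed = 0;

	while (node) {
		struct demo_node *next = node->next;

		free(node);
		node = next;
		freed++;
	}
	return freed;
}
```

The caller in atomic context only pays for the list push; the actual
freeing, which may need a sleeping lock, happens when the list is drained.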
[akpm@linux-foundation.org: tweak comment grammar]
[aryabinin@virtuozzo.com: use raw_cpu_ptr() instead of this_cpu_ptr()]
Link: http://lkml.kernel.org/r/1481553981-3856-1-git-send-email-aryabinin@virtuozzo.com
Link: http://lkml.kernel.org/r/1479474236-4139-5-git-send-email-hch@lst.de
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Jisheng Zhang <jszhang@marvell.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Dias <joaodias@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: bf22e37a641327e34681b7b6959d9646e3886770
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Change-Id: I5f67e939774da6e811f3a5180a6b0f5d31fbe32b
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Move the purge_lock synchronization to the callers, move the call to
purge_fragmented_blocks_allcpus at the beginning of the function to the
callers that need it, move the force_flush behavior to the caller that
needs it, and pass start and end by value instead of by reference.
No change in behavior.
Change-Id: I6344f3c1de50e6fe939e886edeca610d6b539365
Link: http://lkml.kernel.org/r/1479474236-4139-4-git-send-email-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jisheng Zhang <jszhang@marvell.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Dias <joaodias@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 0574ecd141df28d573d4364adec59766ddf5f38d
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Just inline it into the only caller.
Change-Id: I5691805a6cec3e9e160b653551a99c6c998ff087
Link: http://lkml.kernel.org/r/1479474236-4139-3-git-send-email-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jisheng Zhang <jszhang@marvell.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Dias <joaodias@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 9c3acf6043ac437ae0a45de4657ee700c3dc8850
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Patch series "reduce latency in __purge_vmap_area_lazy", v2.
This patch (of 10):
Sort out the long lock hold times in __purge_vmap_area_lazy. It is
based on a patch from Joel.
Inline free_unmap_vmap_area_noflush() into its only caller.
Change-Id: I1cb90a5f4e14bae7b513da9cc672f2f8d06bfcfd
Link: http://lkml.kernel.org/r/1479474236-4139-2-git-send-email-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jisheng Zhang <jszhang@marvell.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Dias <joaodias@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: c8eef01e2f98e09a6733f2acdc675b4cf87a22a1
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
When mixing lots of vmallocs and set_memory_*() (which calls
vm_unmap_aliases()) I encountered situations where the performance
degraded severely due to the walking of the entire vmap_area list on
each invocation.
One simple improvement is to add the lazily freed vmap_area to a
separate lockless free list, such that we then avoid having to walk the
full list on each purge.
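The separate lockless list can be pictured with a small userspace model. This is only a sketch using C11 atomics; struct lazy_area, struct lazy_list and the three helpers are hypothetical names and are not the kernel's llist API, although they follow the same push / detach-all idea.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a lazily freed vmap_area. */
struct lazy_area {
	unsigned long start, end;
	struct lazy_area *next;
};

/* Head of the lockless purge list; NULL means empty. */
struct lazy_list {
	_Atomic(struct lazy_area *) first;
};

/* Push one area. Returns 1 if the list was empty before the push. */
static int lazy_list_add(struct lazy_list *l, struct lazy_area *a)
{
	struct lazy_area *old = atomic_load(&l->first);

	do {
		a->next = old;
	} while (!atomic_compare_exchange_weak(&l->first, &old, a));
	return old == NULL;
}

/* Detach the whole list in one atomic step and return its first entry. */
static struct lazy_area *lazy_list_del_all(struct lazy_list *l)
{
	return atomic_exchange(&l->first, (struct lazy_area *)NULL);
}

/*
 * Purge: visit only the detached entries and widen [*start, *end) to
 * cover them. Returns the number of entries visited.
 */
static int purge(struct lazy_list *l, unsigned long *start, unsigned long *end)
{
	int n = 0;

	for (struct lazy_area *a = lazy_list_del_all(l); a; a = a->next) {
		if (a->start < *start)
			*start = a->start;
		if (a->end > *end)
			*end = a->end;
		n++;
	}
	return n;
}
```

The cost of a purge in this model depends only on how many areas were queued since the last purge, not on the total number of live areas.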
Change-Id: I489700962fc86d539a68b5af489dfa9da04dfaad
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Roman Pen <r.peniaev@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Roman Pen <r.peniaev@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 80c4bd7a5e4368b680e0aeb57050a1b06eb573d8
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
is_vmalloc_addr currently assumes that all vmalloc addresses
lie between VMALLOC_START and VMALLOC_END. This may not be
the case when interleaving vmalloc and lowmem. Update
is_vmalloc_addr to check for this properly.
Correspondingly we need to ensure that VMALLOC_TOTAL accounts
for all the vmalloc regions when CONFIG_ENABLE_VMALLOC_SAVING
is enabled.
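As a rough illustration of the check, assume the vmalloc-capable regions are kept as a small table of ranges. struct va_range and both helpers are hypothetical and only model the idea; the real implementation may look quite different.

```c
#include <stdbool.h>
#include <stddef.h>

/* One virtual address range [start, end) that vmalloc may use. */
struct va_range {
	unsigned long start, end;
};

/* True if addr falls into any of the n ranges. */
static bool addr_in_ranges(const struct va_range *r, size_t n,
			   unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= r[i].start && addr < r[i].end)
			return true;
	return false;
}

/* Total size of all ranges, the analogue of VMALLOC_TOTAL in this model. */
static unsigned long ranges_total(const struct va_range *r, size_t n)
{
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++)
		total += r[i].end - r[i].start;
	return total;
}
```

With a single range this degenerates to the classic VMALLOC_START/VMALLOC_END comparison.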
Change-Id: I5def3d6ae1a4de59ea36f095b8c73649a37b1f36
Signed-off-by: Susheel Khiani <skhiani@codeaurora.org>
Currently on 32-bit systems, the virtual space above
PAGE_OFFSET is reserved for direct-mapped lowmem,
and part of the virtual address space is reserved for
vmalloc. We want to have as much direct-mapped memory
as possible, since there is a penalty for
mapping/unmapping highmem. Now, we may have an image
that is expected to live for the lifetime of the
entire system and is reserved in a physical region
that would be part of direct-mapped lowmem. The
physical memory thus reserved is never used by Linux.
This means that even though the system never accesses
the virtual memory corresponding to the reserved
physical memory, we still lose that portion of
direct-mapped lowmem space.
So by allowing lowmem to be non-contiguous we can
give the unused virtual address space of the reserved
region back for use in vmalloc.
Change-Id: I980b3dfafac71884dcdcb8cd2e4a6363cde5746a
Signed-off-by: Susheel Khiani <skhiani@codeaurora.org>
Even though lowmem is accounted for in vmalloc space,
allocation comes only from the region bounded by
VMALLOC_START and VMALLOC_END. The kernel virtual area
can now allocate from any unmapped region starting
from PAGE_OFFSET.
Change-Id: I291b9eb443d3f7445fd979bd7b09e9241ff22ba3
Signed-off-by: Neeti Desai <neetid@codeaurora.org>
Signed-off-by: Susheel Khiani <skhiani@codeaurora.org>
Commit 71394fe501 ("mm: vmalloc: add flag preventing guard hole
allocation") missed a spot. Currently remove_vm_area() decreases vm->size
to "remove" the guard hole page, even when it isn't present. All but one
user just free the vm_struct right away and never access vm->size anyway.
Don't touch the size in remove_vm_area() and have __vunmap() use the
proper get_vm_area_size() helper.
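For illustration, a userspace model of what such a helper computes. The struct, the page size and the flag value below are assumptions made for this sketch, not copies of the kernel definitions.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE   4096UL	/* assumed page size */
#define MODEL_VM_NO_GUARD 0x40UL	/* illustrative flag value */

/* Minimal stand-in for the two vm_struct fields that matter here. */
struct vm_area_model {
	size_t size;		/* includes the guard page, if any */
	unsigned long flags;
};

/* Usable size of the area: strip the guard page only if one exists. */
static size_t area_size_without_guard(const struct vm_area_model *area)
{
	if (!(area->flags & MODEL_VM_NO_GUARD))
		return area->size - MODEL_PAGE_SIZE;
	return area->size;
}
```

Callers that need the mapped size go through the helper, so the stored size never has to be adjusted when the area is removed.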
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew stated the following
We have quite a history of remote parts of the kernel using
weird/wrong/inexplicable combinations of __GFP_ flags. I tend
to think that this is because we didn't adequately explain the
interface.
And I don't think that gfp.h really improved much in this area as
a result of this patchset. Could you go through it some time and
decide if we've adequately documented all this stuff?
This patch first moves some GFP flag combinations that are part of the MM
internals to mm/internal.h. The rest of the patch documents the __GFP_FOO
bits under various headings and then documents the flag combinations. It
will not help callers that are brain damaged but the clarity might motivate
some fixes and avoid future mistakes.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access to one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".
Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimistic allocation with a fallback option can access atomic
reserves.
This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM identifies
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.
This patch then converts a number of sites
o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.
o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.
o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.
o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.
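The flag semantics listed above can be modelled in a few lines. The bit values and the M_ prefixed names are made up for this sketch; only the relationships between the flags follow the description in this message.

```c
#include <stdbool.h>

typedef unsigned int gfp_model_t;

/* Illustrative bit values, not the kernel's. */
#define M_GFP_HIGH		0x01u
#define M_GFP_ATOMIC		0x02u
#define M_GFP_DIRECT_RECLAIM	0x04u
#define M_GFP_KSWAPD_RECLAIM	0x08u
#define M_GFP_IO		0x10u
#define M_GFP_FS		0x20u

/* Combinations as described above: atomic, non-blocking and normal. */
#define M_ATOMIC (M_GFP_HIGH | M_GFP_ATOMIC | M_GFP_KSWAPD_RECLAIM)
#define M_NOWAIT (M_GFP_KSWAPD_RECLAIM)
#define M_KERNEL (M_GFP_DIRECT_RECLAIM | M_GFP_KSWAPD_RECLAIM | \
		  M_GFP_IO | M_GFP_FS)

/* May the caller sleep in direct reclaim? */
static bool model_allow_blocking(gfp_model_t gfp)
{
	return (gfp & M_GFP_DIRECT_RECLAIM) != 0;
}

/* Will the allocation wake kswapd for background reclaim? */
static bool model_wakes_kswapd(gfp_model_t gfp)
{
	return (gfp & M_GFP_KSWAPD_RECLAIM) != 0;
}

/* Is the caller entitled to dip into the atomic reserve? */
static bool model_uses_atomic_reserve(gfp_model_t gfp)
{
	return (gfp & M_GFP_ATOMIC) != 0;
}
```

A mempool-backed caller in this model starts from M_KERNEL and clears M_GFP_DIRECT_RECLAIM: it cannot block, still wakes kswapd, and does not touch the atomic reserve.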
The first key hazard to watch out for is callers that removed __GFP_WAIT
and were depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.
The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It turns out that at least some versions of glibc end up reading
/proc/meminfo at every single startup, because glibc wants to know the
amount of memory the machine has. And while that's arguably insane,
it's just how things are.
And it turns out that it's not all that expensive most of the time, but
the vmalloc information statistics (amount of virtual memory used in the
vmalloc space, and the biggest remaining chunk) can be rather expensive
to compute.
The 'get_vmalloc_info()' function actually showed up on my profiles as
4% of the CPU usage of "make test" in the git source repository, because
the git tests are lots of very short-lived shell-scripts etc.
It turns out that apparently this same silly vmalloc info gathering
shows up on the facebook servers too, according to Dave Jones. So it's
not just "make test" for git.
We had two patches to just cache the information (one by me, one by
Ingo) to mitigate this issue, but the whole vmalloc information is of
rather dubious value to begin with, and people who *actually* want to
know what the situation is wrt the vmalloc area should just look at the
much more complete /proc/vmallocinfo instead.
In fact, according to my testing - and perhaps more importantly,
according to that big search engine in the sky: Google - there is
nothing out there that actually cares about those two expensive fields:
VmallocUsed and VmallocChunk.
So let's try to just remove them entirely. Actually, this just removes
the computation and reports the numbers as zero for now, just to try to
be minimally intrusive.
If this breaks anything, we'll obviously have to re-introduce the code
to compute this all and add the caching patches on top. But if given
the option, I'd really prefer to just remove this bad idea entirely
rather than add even more code to work around our historical mistake
that likely nobody really cares about.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the original implementation of vm_map_ram made by Nick Piggin there were
two bitmaps: alloc_map and dirty_map. Neither of them was used as intended,
i.e. for finding a suitable free hole for the next allocation in a block.
vm_map_ram allocates space sequentially in a block and on free marks the
pages as dirty, so freed space can't be reused anymore.
It would be very interesting to know the real purpose of those
bitmaps; maybe the implementation was incomplete.
But a long time ago Zhang Yanfei removed alloc_map with these two commits:
mm/vmalloc.c: remove dead code in vb_alloc
3fcd76e802
mm/vmalloc.c: remove alloc_map from vmap_block
b8e748b6c3
In this patch I replaced dirty_map with two range variables: dirty min and
max. These variables store minimum and maximum position of dirty space in
a block, since we need only to know the dirty range, not exact position of
dirty pages.
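A minimal model of the two range variables, assuming a block of 256 slots. The names below are hypothetical; the point is only that each free widens a [min, max) window instead of setting individual bits.

```c
#define MODEL_BLOCK_SLOTS 256UL	/* hypothetical stand-in for VMAP_BBMAP_BITS */

/* Dirty window of one block, in page slots: [min, max). */
struct dirty_range {
	unsigned long min, max;
};

/* Start with an empty window: min above max. */
static void dirty_range_init(struct dirty_range *d)
{
	d->min = MODEL_BLOCK_SLOTS;
	d->max = 0;
}

/* Record that (1 << order) slots starting at offset became dirty. */
static void dirty_range_mark(struct dirty_range *d, unsigned long offset,
			     unsigned int order)
{
	unsigned long end = offset + (1UL << order);

	if (offset < d->min)
		d->min = offset;
	if (end > d->max)
		d->max = end;
}

/* Nothing was marked dirty yet. */
static int dirty_range_empty(const struct dirty_range *d)
{
	return d->min >= d->max;
}
```

The window may cover slots that are not dirty, which is acceptable when it is only used to bound a flush range.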
Why was it done? Several reasons. First, at first glance it seems that
the vm_map_ram allocator cares about fragmentation and thus uses bitmaps
for finding a free hole, but that is not true. To avoid complexity it is
better to use something simple, like min and max range values. Secondly,
the code also becomes simpler: no iteration over a bitmap, just a comparison
of values in the min and max macros. Thirdly, the bitmap occupies up to
1024 bits (4MB is the max size of a block). Here I replaced the whole
bitmap with two longs.
Finally, vm_unmap_aliases should be slightly faster and the whole
vmap_block structure occupies less memory.
Signed-off-by: Roman Pen <r.peniaev@gmail.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Christoph Lameter <cl@linux.com>
Cc: Gioh Kim <gioh.kim@lge.com>
Cc: Rob Jones <rob.jones@codethink.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The previous implementation allocates a new vmap block and then repeats the
search for a free block from the very beginning, iterating over the CPU free
list.
Why is the new approach better?
1. Allocation can happen on one CPU, but the search can be done on another
CPU. In the worst case we preallocate a number of vmap blocks equal to the
number of CPUs on the system.
2. In the previous patch I added the newly allocated block to the tail of the
free list to avoid early exhaustion of virtual space and to give a chance to
occupy blocks which were allocated a long time ago. Thus, to find the newly
allocated block the whole search sequence has to be repeated, which is not
efficient.
In this patch the newly allocated block is occupied right away and the
address of the virtual space is returned to the caller, so there is no need
to repeat the search sequence: the allocation job is done.
Signed-off-by: Roman Pen <r.peniaev@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Christoph Lameter <cl@linux.com>
Cc: Gioh Kim <gioh.kim@lge.com>
Cc: Rob Jones <rob.jones@codethink.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recently I came across high fragmentation of the vm_map_ram allocator:
a vmap_block has free space, but still new blocks continue to appear.
Further investigation showed that certain mapping/unmapping sequences
can exhaust vmalloc space. On small 32-bit systems that's not a big
problem, because purging will be called soon, on the first allocation
failure (alloc_vmap_area), but on 64-bit machines, e.g. x86_64 has 45 bits
of vmalloc space, that can be a disaster.
1) I came up with a simple allocation sequence, which exhausts virtual
space very quickly:
    while (iters--) {
        /* Map/unmap big chunk */
        vaddr = vm_map_ram(pages, 16, -1, PAGE_KERNEL);
        vm_unmap_ram(vaddr, 16);
        /* Map/unmap small chunks.
         *
         * -1 for hole, which should be left at the end of each block
         * to keep it partially used, with some free space available */
        for (i = 0; i < (VMAP_BBMAP_BITS - 16) / 8 - 1; i++) {
            vaddr = vm_map_ram(pages, 8, -1, PAGE_KERNEL);
            vm_unmap_ram(vaddr, 8);
        }
    }
The idea behind it is simple:
1. We have to map a big chunk, e.g. 16 pages.
2. Then we have to occupy the remaining space with smaller chunks, i.e.
8 pages. At the end a small hole should remain to keep the block in the free
list, but not let the big chunk occupy the remaining space.
3. Goto 1 - the allocation request of 16 pages can't be completed (only 8
slots are left free in the block in step #2), a new block will be allocated,
and all further requests will land in the newly allocated block.
To have some measurement numbers for all further tests I set up ftrace and
enabled 4 basic calls in a function profile:
echo vm_map_ram > /sys/kernel/debug/tracing/set_ftrace_filter;
echo alloc_vmap_area >> /sys/kernel/debug/tracing/set_ftrace_filter;
echo vm_unmap_ram >> /sys/kernel/debug/tracing/set_ftrace_filter;
echo free_vmap_block >> /sys/kernel/debug/tracing/set_ftrace_filter;
So for this scenario I got these results:
BEFORE (all new blocks are put to the head of a free list)
# cat /sys/kernel/debug/tracing/trace_stat/function0
  Function                Hit    Time            Avg             s^2
  --------                ---    ----            ---             ---
  vm_map_ram           126000    30683.30 us     0.243 us        30819.36 us
  vm_unmap_ram         126000    22003.24 us     0.174 us        340.886 us
  alloc_vmap_area        1000    4132.065 us     4.132 us        0.903 us
AFTER (all new blocks are put to the tail of a free list)
# cat /sys/kernel/debug/tracing/trace_stat/function0
  Function                Hit    Time            Avg             s^2
  --------                ---    ----            ---             ---
  vm_map_ram           126000    28713.13 us     0.227 us        24944.70 us
  vm_unmap_ram         126000    20403.96 us     0.161 us        1429.872 us
  alloc_vmap_area         993    3916.795 us     3.944 us        29.370 us
  free_vmap_block         992    654.157 us      0.659 us        1.273 us
SUMMARY:
The most interesting numbers in those tables are the numbers of block
allocations and deallocations: the alloc_vmap_area and free_vmap_block
calls, which show that before the change blocks were not freed, and
virtual space and physical memory (vmap_block structure allocations,
etc) were consumed.
The average time spent in vm_map_ram/vm_unmap_ram became slightly
better. That can be explained by the reasonable number of blocks in the
free list which we need to iterate to find a suitable free block.
2) Another scenario is a random allocation:
    while (iters--) {
        /* Randomly take number from a range [1..32/64] */
        nr = rand(1, VMAP_MAX_ALLOC);
        vaddr = vm_map_ram(pages, nr, -1, PAGE_KERNEL);
        vm_unmap_ram(vaddr, nr);
    }
I chose the Mersenne Twister PRNG to generate a persistent random state to
guarantee that both runs have the same random sequence. For each
vm_map_ram call a random number from [1..32/64] was taken to represent
the number of pages to map.
I did 10'000 vm_map_ram calls and got these two tables:
BEFORE (all new blocks are put to the head of a free list)
# cat /sys/kernel/debug/tracing/trace_stat/function0
  Function                Hit    Time            Avg             s^2
  --------                ---    ----            ---             ---
  vm_map_ram            10000    10170.01 us     1.017 us        993.609 us
  vm_unmap_ram          10000    5321.823 us     0.532 us        59.789 us
  alloc_vmap_area         420    2150.239 us     5.119 us        3.307 us
  free_vmap_block          37    159.587 us      4.313 us        134.344 us
AFTER (all new blocks are put to the tail of a free list)
# cat /sys/kernel/debug/tracing/trace_stat/function0
  Function                Hit    Time            Avg             s^2
  --------                ---    ----            ---             ---
  vm_map_ram            10000    7745.637 us     0.774 us        395.229 us
  vm_unmap_ram          10000    5460.573 us     0.546 us        67.187 us
  alloc_vmap_area         414    2201.650 us     5.317 us        5.591 us
  free_vmap_block         412    574.421 us      1.394 us        15.138 us
SUMMARY:
The 'BEFORE' table shows that 420 blocks were allocated and only 37 were
freed. The remaining 383 blocks are still in a free list, consuming virtual
space and physical memory.
The 'AFTER' table shows that 414 blocks were allocated and 412 were really
freed. 2 blocks remained in a free list.
So fragmentation was dramatically reduced. Why? Because when we put a
newly allocated block at the head, all further requests will occupy the new
block, regardless of the remaining space in other blocks. In this scenario
all requests come randomly. Eventually the remaining free space will be less
than the requested size, the free list will be iterated and it is possible
that nothing will be found there, so finally a new block will be created. So
exhaustion in the random scenario happens for the maximum possible
allocation size: 32 pages for a 32-bit system and 64 pages for a 64-bit
system.
Also the average cost of vm_map_ram was reduced from 1.017 us to 0.774 us.
Again this can be explained by iteration through a smaller list of free
blocks.
3) Next simple scenario is a sequential allocation, when the allocation
order is increased for each block. This scenario forces allocator to
reach maximum amount of partially free blocks in a free list:
    while (iters--) {
        /* Populate free list with blocks with remaining space */
        for (order = 0; order <= ilog2(VMAP_MAX_ALLOC); order++) {
            nr = VMAP_BBMAP_BITS / (1 << order);
            /* Leave a hole */
            nr -= 1;
            for (i = 0; i < nr; i++) {
                vaddr = vm_map_ram(pages, (1 << order), -1, PAGE_KERNEL);
                vm_unmap_ram(vaddr, (1 << order));
            }
        }
        /* Completely occupy blocks from a free list */
        for (order = 0; order <= ilog2(VMAP_MAX_ALLOC); order++) {
            vaddr = vm_map_ram(pages, (1 << order), -1, PAGE_KERNEL);
            vm_unmap_ram(vaddr, (1 << order));
        }
    }
Results which I got:
BEFORE (all new blocks are put to the head of a free list)
# cat /sys/kernel/debug/tracing/trace_stat/function0
  Function                Hit    Time            Avg             s^2
  --------                ---    ----            ---             ---
  vm_map_ram          2032000    399545.2 us     0.196 us        467123.7 us
  vm_unmap_ram        2032000    363225.7 us     0.178 us        111405.9 us
  alloc_vmap_area        7001    30627.76 us     4.374 us        495.755 us
  free_vmap_block        6993    7011.685 us     1.002 us        159.090 us
AFTER (all new blocks are put to the tail of a free list)
# cat /sys/kernel/debug/tracing/trace_stat/function0
  Function                Hit    Time            Avg             s^2
  --------                ---    ----            ---             ---
  vm_map_ram          2032000    394259.7 us     0.194 us        589395.9 us
  vm_unmap_ram        2032000    292500.7 us     0.143 us        94181.08 us
  alloc_vmap_area        7000    31103.11 us     4.443 us        703.225 us
  free_vmap_block        7000    6750.844 us     0.964 us        119.112 us
SUMMARY:
No surprises here, almost all numbers are the same.
While fixing this fragmentation problem I also made some improvements to the
allocation logic of a new vmap block: occupy the block immediately and get
rid of the extra search in a free list.
Also I replaced the dirty bitmap with min/max dirty range values to make the
logic simpler and slightly faster, since comparing two longs costs
less than looping through a bitmap.
This patchset raises several questions:
Q: Think the problem you comments is already known so that I wrote comments
about it as "it could consume lots of address space through fragmentation".
Could you tell me about your situation and reason why it should be avoided?
Gioh Kim
A: Indeed, there was a commit 364376383 which adds an explicit comment about
fragmentation. But the fragmentation described in that comment is caused
by mixing long-lived and short-lived objects, when a whole block is pinned
in memory because some page slots are still in use. Here I am talking
about blocks which are free, which nobody uses, and which the allocator keeps
alive forever while continuously allocating new blocks.
Q: I think that if you put newly allocated block to the tail of a free
list, below example would results in enormous performance degradation.
new block: 1MB (256 pages)
while (iters--) {
vm_map_ram(3 or something else not dividable for 256) * 85
vm_unmap_ram(3) * 85
}
On every iteration, it needs newly allocated block and it is put to the
tail of a free list so finding it consumes large amount of time.
Joonsoo Kim
A: The second patch in the current patchset gets rid of the extra search in
a free list, so the new block will be immediately occupied.
Also, the scenario above is impossible, because vm_map_ram allocates virtual
ranges in orders, i.e. 2^n. I.e. passing 3 to vm_map_ram you will allocate
4 slots in a block, and 256 slots (the capacity of a block) is of course
divisible by 4, so the block will be completely occupied.
But there is a worst case which we can achieve: each free block has a hole
equal to order size.
The maximum size of allocation is 64 pages for a 64-bit system
(if you try to map more, the original alloc_vmap_area will be called).
So the maximum order is 6. That means that the worst case, before the
allocator decides to allocate a new block, is to iterate 7 blocks:
HEAD
1st block - has 1 page slot free (order 0)
2nd block - has 2 page slots free (order 1)
3rd block - has 4 page slots free (order 2)
4th block - has 8 page slots free (order 3)
5th block - has 16 page slots free (order 4)
6th block - has 32 page slots free (order 5)
7th block - has 64 page slots free (order 6)
TAIL
So the worst scenario on 64-bit system is that each CPU queue can have 7
blocks in a free list.
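The arithmetic in this answer can be checked with a few lines of C. The helper names are hypothetical; the rounding mirrors the "2^n slots per request" rule stated above and is not taken from the kernel source.

```c
/* Smallest order such that (1 << order) >= nr_pages; nr_pages must be >= 1. */
static unsigned int pages_to_order(unsigned long nr_pages)
{
	unsigned int order = 0;

	while ((1UL << order) < nr_pages)
		order++;
	return order;
}

/* Number of block slots actually consumed by a request of nr_pages. */
static unsigned long slots_for(unsigned long nr_pages)
{
	return 1UL << pages_to_order(nr_pages);
}

/*
 * Worst-case number of partially free blocks per CPU queue: one block
 * per order from 0 up to the order of the largest allowed request.
 */
static unsigned int worst_case_blocks(unsigned long max_alloc_pages)
{
	return pages_to_order(max_alloc_pages) + 1;
}
```

For a request of 3 pages this gives 4 slots, and for a maximum request of 64 pages it gives the 7 blocks listed above.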
This can happen if and only if you allocate blocks in increasing order
(as I did in the function written in the comment of the first patch).
This is a weird and rare case, but it is still possible. Afterwards you will
get 7 blocks in a list.
All further requests are either placed in a newly allocated block or
served from free slots found in the free list.
That does not look dramatically awful.
This patch (of 3):
If suitable block can't be found, new block is allocated and put into a
head of a free list, so on next iteration this new block will be found
first.
That's bad, because old blocks in a free list will not get a chance to be
fully used, thus fragmentation will grow.
Let's consider this simple example:
#1 We have one block in a free list which is partially used, and where only
one page is free:
  HEAD |xxxxxxxxx-| TAIL
                 ^
                 free space for 1 page, order 0
#2 New allocation request of order 1 (2 pages) comes, new block is allocated
since we do not have free space to complete this request. New block is put
into a head of a free list:
HEAD |----------|xxxxxxxxx-| TAIL
#3 Two pages were occupied in a new found block:
  HEAD |xx--------|xxxxxxxxx-| TAIL
        ^
        two pages mapped here
#4 New allocation request of order 0 (1 page) comes. Block, which was created
on #2 step, is located at the beginning of a free list, so it will be found
first:
  HEAD |xxX-------|xxxxxxxxx-| TAIL
          ^                 ^
          page mapped here, but better to use this hole
Obviously it is better to complete the request of step #4 using the
old block, where free space is left, because otherwise fragmentation
will increase considerably.
But fragmentation is not the only problem. The worst thing is that I can
easily create a scenario in which the whole vmalloc space is exhausted by
blocks which are not used, but are already dirty and have several free pages.
Let's consider this function, whose execution should be pinned to one CPU:
static void exhaust_virtual_space(struct page *pages[16], int iters)
{
    /* Firstly we have to map a big chunk, e.g. 16 pages.
     * Then we have to occupy the remaining space with smaller
     * chunks, i.e. 8 pages. At the end small hole should remain.
     * So at the end of our allocation sequence block looks like
     * this:
     *                XX  big chunk
     * |XXxxxxxxx-|    x  small chunk
     *                 -  hole, which is enough for a small chunk,
     *                    but is not enough for a big chunk
     */
    while (iters--) {
        int i;
        void *vaddr;
        /* Map/unmap big chunk */
        vaddr = vm_map_ram(pages, 16, -1, PAGE_KERNEL);
        vm_unmap_ram(vaddr, 16);
        /* Map/unmap small chunks.
         *
         * -1 for hole, which should be left at the end of each block
         * to keep it partially used, with some free space available */
        for (i = 0; i < (VMAP_BBMAP_BITS - 16) / 8 - 1; i++) {
            vaddr = vm_map_ram(pages, 8, -1, PAGE_KERNEL);
            vm_unmap_ram(vaddr, 8);
        }
    }
}
On every iteration a new block (1MB of vm area in my case) will be
allocated and then occupied, without any attempt to resolve the small
allocation requests using previously allocated blocks in the free list.
In the case of random allocation (the size is randomly taken from the
range [1..64] in the 64-bit case or [1..32] in the 32-bit case) the
situation is the same: new blocks continue to appear if the maximum possible
allocation size (32 or 64) is passed to the allocator, because all remaining
blocks in the free list do not have enough free space to complete this
allocation request.
In summary, if new blocks are put at the head of a free list, virtual
space will eventually be exhausted.
In the current patch I simply put the newly allocated block at the tail of
the free list, thus reducing fragmentation and giving a chance to resolve
allocation requests using older blocks with possible holes left.
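The difference between the two policies can be reproduced with a small userspace model. This is only a sketch under simplifying assumptions: a block has 256 slots, every map is immediately followed by its unmap, slots are never reused, a new block is occupied right away, and a block is released once its last free slot has been consumed. All names are hypothetical.

```c
#include <string.h>

#define BLOCK_SLOTS 256		/* hypothetical stand-in for VMAP_BBMAP_BITS */
#define MAX_BLOCKS  1024	/* model limit, not a kernel constant */

/* Free list model: free_slots[0] is HEAD, free_slots[count - 1] is TAIL. */
struct free_list_model {
	int free_slots[MAX_BLOCKS];
	int count;
	int add_to_tail;
};

/* Map and unmap nr slots (nr <= BLOCK_SLOTS). Returns -1 if the model is full. */
static int map_unmap(struct free_list_model *fl, int nr)
{
	int i;

	/* First fit, searching from HEAD. */
	for (i = 0; i < fl->count; i++)
		if (fl->free_slots[i] >= nr)
			break;

	if (i == fl->count) {		/* nothing fits: add a new block */
		if (fl->count == MAX_BLOCKS)
			return -1;
		if (fl->add_to_tail) {
			i = fl->count;
		} else {
			memmove(&fl->free_slots[1], &fl->free_slots[0],
				fl->count * sizeof(int));
			i = 0;
		}
		fl->free_slots[i] = BLOCK_SLOTS;
		fl->count++;
	}

	fl->free_slots[i] -= nr;
	if (fl->free_slots[i] == 0) {	/* fully dirty: release the block */
		memmove(&fl->free_slots[i], &fl->free_slots[i + 1],
			(fl->count - i - 1) * sizeof(int));
		fl->count--;
	}
	return 0;
}

/* Run the 16-page/8-page sequence and return the number of blocks left. */
static int blocks_left_after(int add_to_tail, int iters)
{
	struct free_list_model fl;
	int i;

	memset(&fl, 0, sizeof(fl));
	fl.add_to_tail = add_to_tail;
	while (iters--) {
		if (map_unmap(&fl, 16))
			return -1;
		for (i = 0; i < (BLOCK_SLOTS - 16) / 8 - 1; i++)
			if (map_unmap(&fl, 8))
				return -1;
	}
	return fl.count;
}
```

In this model blocks_left_after(0, 100) leaves 100 blocks behind, one per iteration, while blocks_left_after(1, 100) leaves at most one.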
Signed-off-by: Roman Pen <r.peniaev@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Christoph Lameter <cl@linux.com>
Cc: Gioh Kim <gioh.kim@lge.com>
Cc: Rob Jones <rob.jones@codethink.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change vunmap_pmd_range() and vunmap_pud_range() to tear down huge KVA
mappings when they are set. pud_clear_huge() and pmd_clear_huge() return
zero when no-operation is performed, i.e. huge page mapping was not used.
These changes are only enabled when CONFIG_HAVE_ARCH_HUGE_VMAP is defined
on the architecture.
[akpm@linux-foundation.org: use consistent code layout]
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Robert Elliott <Elliott@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ioremap() and its related interfaces are used to create I/O mappings to
memory-mapped I/O devices. The mapping sizes of traditional I/O
devices are relatively small. Non-volatile memory (NVM), however, has
many GB and is going to have TB soon. It is not very efficient to create
large I/O mappings with 4KB pages.
This patchset extends the ioremap() interfaces to transparently create I/O
mappings with huge pages whenever possible. ioremap() continues to use
4KB mappings when a huge page does not fit into a requested range. There
is no change necessary to the drivers using ioremap(). A requested
physical address must be aligned by a huge page size (1GB or 2MB on x86)
for using huge page mapping, though. The kernel huge I/O mapping will
improve performance of NVM and other devices with large memory, and reduce
the time to create their mappings as well.
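To illustrate the size selection, here is a simplified model. It looks only at the physical address and the remaining length, assumes the virtual address is aligned the same way, uses the x86 sizes mentioned above, and ignores the MTRR checks. The function names are hypothetical.

```c
#include <stdint.h>

#define MODEL_SZ_4K (1ULL << 12)
#define MODEL_SZ_2M (1ULL << 21)
#define MODEL_SZ_1G (1ULL << 30)

/* Largest mapping size usable at phys with `remaining` bytes left to map. */
static uint64_t pick_mapping_size(uint64_t phys, uint64_t remaining)
{
	if (remaining >= MODEL_SZ_1G && (phys % MODEL_SZ_1G) == 0)
		return MODEL_SZ_1G;
	if (remaining >= MODEL_SZ_2M && (phys % MODEL_SZ_2M) == 0)
		return MODEL_SZ_2M;
	return MODEL_SZ_4K;
}

/*
 * Number of page table entries needed to map [phys, phys + size).
 * Both phys and size are assumed to be multiples of 4KB.
 */
static uint64_t count_mappings(uint64_t phys, uint64_t size)
{
	uint64_t n = 0;

	while (size) {
		uint64_t step = pick_mapping_size(phys, size);

		phys += step;
		size -= step;
		n++;
	}
	return n;
}
```

A 1GB-aligned 1GB range needs a single entry in this model, while the same amount mapped with 4KB pages needs 262144.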
On x86, MTRRs can override PAT memory types with a 4KB granularity. When
using a huge page, MTRRs can override the memory type of the huge page,
which may lead to a performance penalty. The processor can also behave in an
undefined manner if a huge page is mapped to a memory range that MTRRs
have mapped with multiple different memory types. Therefore, the mapping
code falls back to use a smaller page size toward 4KB when a mapping range
is covered by non-WB type of MTRRs. The WB type of MTRRs has no effect on
the PAT memory types.
The patchset introduces HAVE_ARCH_HUGE_VMAP, which indicates that the arch
supports huge KVA mappings for ioremap(). Users may specify a new kernel
option "nohugeiomap" to disable the huge I/O mapping capability of
ioremap() when necessary.
Patches 1-4 change common files to support huge I/O mappings. There is no
change in functionality unless HAVE_ARCH_HUGE_VMAP is defined on the
architecture of the system.
Patches 5-6 implement the HAVE_ARCH_HUGE_VMAP funcs on x86, and set
HAVE_ARCH_HUGE_VMAP on x86.
This patch (of 6):
__get_vm_area_node() takes unsigned long size, which is a 64-bit value on
a 64-bit kernel. However, fls(size) simply ignores the upper 32 bits.
Change to use fls_long() to handle the size properly.
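A minimal user-space sketch of the truncation, assuming a 64-bit size. sketch_fls() and sketch_fls_long() are hypothetical stand-ins for the kernel helpers, written as plain loops rather than the kernel's optimized implementations.

```c
#include <stdint.h>

/*
 * Stand-in for fls(): 1-based index of the highest set bit of a
 * 32-bit value, 0 if no bit is set. A wider argument is truncated
 * by the parameter type, as happens when passing an unsigned long.
 */
static int sketch_fls(uint32_t x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* Stand-in for fls_long() on a 64-bit kernel: same, but 64 bits wide. */
static int sketch_fls_long(uint64_t x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}
```

For a size of 4GB (1ULL << 32), the 32-bit variant sees only the zeroed lower half and reports no bit set, while the 64-bit variant reports bit 33.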
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Robert Elliott <Elliott@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current approach to handling shadow memory for modules is broken.
Shadow memory may be freed only after the memory it shadows is no
longer in use. vfree() called from interrupt context may use the memory it
is freeing to store a 'struct llist_node' in it:
	void vfree(const void *addr)
	{
	...
		if (unlikely(in_interrupt())) {
			struct vfree_deferred *p = this_cpu_ptr(&vfree_deferred);
			if (llist_add((struct llist_node *)addr, &p->list))
				schedule_work(&p->wq);
Later this list node is used in free_work(), which actually frees the memory.
Currently module_memfree() called in interrupt context will free the shadow
before freeing the module's memory, which could provoke a kernel crash.
So shadow memory should be freed after the module's memory. However, such a
deallocation order could race with kasan_module_alloc() in module_alloc().
Free the shadow right before releasing the vm area. At this point the vfree()'d
memory is no longer used and is not yet available for other allocations.
A new VM_KASAN flag indicates that a vm area has dynamically allocated
shadow memory, so kasan frees the shadow only if it was previously allocated.
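Why the ordering matters can be illustrated with a user-space model of a deferred free. This is only a sketch: the sketch_* names are hypothetical, a plain singly linked list replaces llist, and malloc()/free() replace the vmalloc machinery. The point is that the list node lives inside the buffer being freed, so the buffer is still read and written after the caller has asked for it to be freed.

```c
#include <stddef.h>
#include <stdlib.h>

struct sketch_node {
	struct sketch_node *next;
};

/* Head of the list of buffers waiting to be released. */
static struct sketch_node *sketch_deferred;

/*
 * Queue a buffer for later freeing. The node is stored in the buffer
 * itself, so this WRITES to memory the caller considers freed.
 */
static void sketch_defer_free(void *buf)
{
	struct sketch_node *n = buf;

	n->next = sketch_deferred;
	sketch_deferred = n;
}

/* Later, from a safe context: release everything that was queued. */
static int sketch_drain(void)
{
	int freed = 0;

	while (sketch_deferred) {
		struct sketch_node *n = sketch_deferred;

		sketch_deferred = n->next;	/* last access to the buffer */
		free(n);
		freed++;
	}
	return freed;
}
```

Any bookkeeping that makes accesses to the buffer invalid (the shadow, in the KASan case) therefore has to stay in place until the drain step, not just until the deferral.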
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To instrument global variables, KASan needs shadow memory backing the memory
used by modules. So on module loading we need to allocate memory for the
shadow and map it at the shadow address that corresponds to the address
allocated in module_alloc().
__vmalloc_node_range() could be used for this purpose, except that it puts a
guard hole after the allocated area. A guard hole in shadow memory would be a
problem because at some future point we might need shadow memory at the
address occupied by the guard hole. So we could fail to allocate shadow
for module_alloc().
Now we have the VM_NO_GUARD flag disabling the guard page, so we need to pass
it into __vmalloc_node_range(). Add a new parameter 'vm_flags' to the
__vmalloc_node_range() function.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To instrument global variables, KASan needs shadow memory backing the memory
used by modules. So on module loading we need to allocate memory for the
shadow and map it at the shadow address that corresponds to the address
allocated in module_alloc().
__vmalloc_node_range() could be used for this purpose, except that it puts a
guard hole after the allocated area. A guard hole in shadow memory would be a
problem because at some future point we might need shadow memory at the
address occupied by the guard hole. So we could fail to allocate shadow
for module_alloc().
Add a new vm_struct flag 'VM_NO_GUARD' indicating that a vm area doesn't
have a guard hole.
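The effect of such a flag on the reserved address range can be sketched as below. The constants and the function name are illustrative assumptions, not the kernel's definitions.

```c
#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_VM_NO_GUARD	0x40UL	/* illustrative flag value */

/*
 * Address space reserved for an area of 'size' bytes: one extra
 * guard page is appended unless the caller opted out.
 */
static unsigned long sketch_reserved_size(unsigned long size,
					  unsigned long flags)
{
	if (!(flags & SKETCH_VM_NO_GUARD))
		size += SKETCH_PAGE_SIZE;
	return size;
}
```

With the flag set, two shadow areas can sit directly next to each other, which is what adjacent module allocations would require.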
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch replaces printk(KERN_WARNING..) with pr_warn.
It also saves one line that was needed only for formatting.
Signed-off-by: Pintu Kumar <pintu.k@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Using seq_open_private() removes boilerplate code from vmalloc_open().
The resultant code is shorter and easier to follow.
However, please note that seq_open_private() calls kzalloc() rather than
kmalloc(), which may affect timing due to the memory initialisation
overhead.
Signed-off-by: Rob Jones <rob.jones@codethink.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently map_vm_area() takes (struct page ***pages) as its third argument,
and after mapping, it moves (*pages) to point to (*pages +
nr_mapped_pages).
It looks like this kind of increment is useless to its caller these
days. The callers don't care about the increments and actually they're
trying to avoid this by passing another copy to map_vm_area().
The caller can always guarantee all the pages can be mapped into vm_area
as specified in first argument and the caller only cares about whether
map_vm_area() fails or not.
This patch cleans up the pointer movement in map_vm_area() and updates
its callers accordingly.
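The interface change can be illustrated with a small user-space model. The sketch_* names and the NULL check standing in for the actual mapping work are assumptions made for the example.

```c
#include <stddef.h>

struct sketch_page {
	int id;
};

/*
 * Old style: takes the address of the caller's cursor and advances it
 * past every page it handled, whether or not the caller wants that.
 */
static int sketch_map_old(struct sketch_page ***pages, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (!**pages)
			return -1;
		(*pages)++;
	}
	return 0;
}

/* New style: reads the array and only reports success or failure. */
static int sketch_map_new(struct sketch_page **pages, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!pages[i])
			return -1;
	return 0;
}
```

With the old form a caller that wants to keep its array pointer has to pass a throwaway copy; with the new form it passes the array directly.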
Signed-off-by: WANG Chao <chaowang@redhat.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
tmp_mask in the __vmalloc_area_node() iteration never changes so it can
be moved into function scope and marked with const. This causes the
movl and orl to only be done once per call rather than area->nr_pages
times.
nested_gfp can also be marked const.
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is not uncommon on busy servers to get stuck for hundreds of ms in
vmalloc() calls (like file descriptor expansions).
Add a cond_resched() to __vmalloc_area_node() to be gentle to
other tasks.
[akpm@linux-foundation.org: only do it for __GFP_WAIT, per David]
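A rough user-space analogue of the pattern, under stated assumptions: sched_yield() stands in for cond_resched() (which, unlike sched_yield(), only reschedules when needed), SKETCH_GFP_WAIT is an illustrative flag value, and the per-page allocation is elided.

```c
#include <sched.h>
#include <stddef.h>

#define SKETCH_GFP_WAIT	0x10u	/* illustrative: caller may sleep */

/*
 * Walk nr_pages allocation steps and offer the CPU to other tasks
 * after each one, but only when the caller is allowed to wait.
 * Returns how many yield points were taken.
 */
static size_t sketch_alloc_pages(size_t nr_pages, unsigned int gfp_mask)
{
	size_t i, yields = 0;

	for (i = 0; i < nr_pages; i++) {
		/* ... allocate page i here ... */
		if (gfp_mask & SKETCH_GFP_WAIT) {
			sched_yield();
			yields++;
		}
	}
	return yields;
}
```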
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hugh Dickins <hughd@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Yao reported a month ago that his system has trouble with
vmap_area_lock contention during performance analysis via /proc/meminfo.
Andrew asked why his analysis reads /proc/meminfo so heavily, but he
didn't answer.
https://lkml.org/lkml/2014/4/10/416
Although I'm not sure whether this is the right usage, there is a
solution that reduces vmap_area_lock contention with no side effects:
use the RCU list iterator in get_vmalloc_info().
RCU can be used in this function because the RCU protocol is already
respected by writers, since Nick Piggin's commit db64fe0225 ("mm:
rewrite vmap layer") back in linux-2.6.28.
Specifically:
insertions use list_add_rcu(),
deletions use list_del_rcu() and kfree_rcu().
Note that the rb tree is not used from RCU readers (it would not be safe);
only the vmap_area_list has full RCU protection.
Note that __purge_vmap_area_lazy() already uses this RCU protection.
	rcu_read_lock();
	list_for_each_entry_rcu(va, &vmap_area_list, list) {
		if (va->flags & VM_LAZY_FREE) {
			if (va->va_start < *start)
				*start = va->va_start;
			if (va->va_end > *end)
				*end = va->va_end;
			nr += (va->va_end - va->va_start) >> PAGE_SHIFT;
			list_add_tail(&va->purge_list, &valist);
			va->flags |= VM_LAZY_FREEING;
			va->flags &= ~VM_LAZY_FREE;
		}
	}
	rcu_read_unlock();
Peter:
: While rcu list traversal over the vmap_area_list is safe, this may
: arrive at different results than the spinlocked version. The rcu list
: traversal version will not be a 'snapshot' of a single, valid instant
: of the entire vmap_area_list, but rather a potential amalgam of
: different list states.
Joonsoo:
: Yes, you are right, but I don't think that we should be strict here.
: Meminfo is already not a 'snapshot' at specific time. While we try to get
: certain stats, the other stats can change. And, although we may arrive at
: different results than the spinlocked version, the difference would not be
: large and would not make serious side-effect.
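The lock-free read side can be approximated in user space with C11 atomics. This is explicitly not RCU: it models only the publication half (a fully initialised node is made visible with a release store and readers walk the list without a lock), it assumes a single writer, and it has no deletion or grace period. All sketch_* names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct sketch_area {
	unsigned long start;
	unsigned long end;
	struct sketch_area *_Atomic next;
};

static struct sketch_area *_Atomic sketch_head;

/*
 * Writer (assumed to be serialized externally, as the kernel writers
 * are by vmap_area_lock): link the node first, then publish it.
 */
static void sketch_publish(struct sketch_area *a)
{
	struct sketch_area *old =
		atomic_load_explicit(&sketch_head, memory_order_relaxed);

	atomic_store_explicit(&a->next, old, memory_order_relaxed);
	atomic_store_explicit(&sketch_head, a, memory_order_release);
}

/*
 * Reader: walks the list without taking any lock. The sum reflects
 * whichever nodes were visible during the walk, not one atomic
 * snapshot of the whole list.
 */
static unsigned long sketch_used_bytes(void)
{
	unsigned long used = 0;
	struct sketch_area *a =
		atomic_load_explicit(&sketch_head, memory_order_acquire);

	while (a) {
		used += a->end - a->start;
		a = atomic_load_explicit(&a->next, memory_order_acquire);
	}
	return used;
}
```

As in the discussion above, a reader racing with a writer may or may not see the newest area, which is acceptable for statistics that are not a snapshot to begin with.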
[edumazet@google.com: add more commit description]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reported-by: Richard Yao <ryao@gentoo.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Zhang Yanfei <zhangyanfei.yes@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zsmalloc needs exported unmap_kernel_range for building as a module. See
https://lkml.org/lkml/2013/1/18/487
I didn't send a patch to make unmap_kernel_range exportable at that time
because zram was staging stuff and I thought VM function exporting for
staging stuff makes no sense.
Now zsmalloc has been promoted. If we can't build zsmalloc as a module, it
means we can't build zram as a module, either. Additionally, its buddy
map_vm_area is already exported, so let's export unmap_kernel_range as well.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace seq_printf with seq_puts where possible
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace places where __get_cpu_var() is used for an address calculation
with this_cpu_ptr().
Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
vm_map_ram() has a fragmentation problem: it cannot purge a
chunk (i.e., 4M of address space) if there is a pinning object in that
address space. So it could easily consume all VMALLOC address space.
We can fix the fragmentation problem by using vmap() instead of
vm_map_ram(), but vmap() is known to be slow compared to vm_map_ram().
Minchan said vm_map_ram is 5 times faster than vmap in his tests. So I
thought we should fix the fragmentation problem of vm_map_ram because our
proprietary GPU driver has used it heavily.
On second thought, it's not easy because we would have to reuse freed space
to solve the problem, and that could require more IPIs and bitmap operations
to search for a hole. It could undermine the API's goal, which is very fast
mapping. And the fragmentation problem wouldn't even show up on a 64-bit
machine.
Another option is that the user should separate long-lived and short-lived
objects and use vmap for long-lived but vm_map_ram for short-lived ones. If
we inform the user about the characteristics of vm_map_ram, the user can
choose one according to the page lifetime.
Let's add a notice for users.
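The pinning effect can be modelled with a toy chunk allocator. This is a simplification under stated assumptions: the sketch_* names are hypothetical, a chunk has only four slots, and freed slots are not reused until the whole chunk is released.

```c
#include <assert.h>

#define SKETCH_SLOTS_PER_CHUNK	4

struct sketch_chunk {
	int handed_out;	/* slots given out so far (never reused) */
	int live;	/* slots still mapped */
};

/* Take the next unused slot; fails once the chunk is used up. */
static int sketch_chunk_alloc(struct sketch_chunk *c)
{
	if (c->handed_out == SKETCH_SLOTS_PER_CHUNK)
		return -1;
	c->handed_out++;
	c->live++;
	return 0;
}

/* Unmap one object; its slot stays unusable until the chunk is purged. */
static void sketch_chunk_free(struct sketch_chunk *c)
{
	assert(c->live > 0);
	c->live--;
}

/* The chunk's address space can be returned only when nothing is live. */
static int sketch_chunk_reclaimable(const struct sketch_chunk *c)
{
	return c->handed_out == SKETCH_SLOTS_PER_CHUNK && c->live == 0;
}
```

In this model one long-lived object keeps a whole chunk of address space tied up even after every other object in it has been unmapped.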
[akpm@linux-foundation.org: tweak comment text]
Signed-off-by: Gioh Kim <gioh.kim@lge.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To increase compiler portability there is <linux/compiler.h>, which
provides convenience macros for various gcc constructs, e.g. __weak for
__attribute__((weak)). I've replaced all instances of gcc attributes with
the right macro in the memory management (/mm) subsystem.
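The kind of wrapper meant here looks roughly like the following user-space sketch. It assumes a GCC-compatible compiler; sketch_weak and sketch_arch_hook() are hypothetical names chosen to avoid clashing with real definitions.

```c
/* One place to change if a compiler spells the attribute differently. */
#define sketch_weak __attribute__((weak))

/*
 * Default implementation; a non-weak definition of the same symbol
 * in another object file would override it at link time.
 */
sketch_weak int sketch_arch_hook(void)
{
	return 0;
}
```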
[akpm@linux-foundation.org: while-we're-there consistency tweaks]
Signed-off-by: Gideon Israel Dsouza <gidisrael@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert commit ece86e222d, which was intended as a small performance
improvement.
Despite the claim that the patch doesn't introduce any functional
changes, in fact it does.
The "no page" path behaves differently now. Originally, vmalloc_to_page
might return NULL under some conditions; with the new implementation it
returns pfn_to_page(0), which is not the same as NULL.
A simple test shows the difference.
test.c
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/vmalloc.h>
#include <linux/mm.h>
int __init myi(void)
{
	struct page *p;
	void *v;
	v = vmalloc(PAGE_SIZE);
	/* trigger the "no page" path in vmalloc_to_page */
	vfree(v);
	p = vmalloc_to_page(v);
	pr_err("expected val = NULL, returned val = %p", p);
	return -EBUSY;
}
void __exit mye(void)
{
}
module_init(myi)
module_exit(mye)
Before interchange:
expected val = NULL, returned val = (null)
After interchange:
expected val = NULL, returned val = c7ebe000
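The difference can be reproduced in a user-space model, assuming a flat page array where pfn_to_page() is plain array indexing. The sketch_* names are hypothetical, and a negative input stands for "no page is mapped at this address".

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_PAGES	8

struct sketch_page {
	int dummy;
};

static struct sketch_page sketch_mem_map[SKETCH_NR_PAGES];

static struct sketch_page *sketch_pfn_to_page(unsigned long pfn)
{
	assert(pfn < SKETCH_NR_PAGES);
	return &sketch_mem_map[pfn];
}

/* Original: the page lookup itself reports "no page" as NULL. */
static struct sketch_page *sketch_to_page_orig(long mapped_pfn)
{
	if (mapped_pfn < 0)
		return NULL;
	return sketch_pfn_to_page((unsigned long)mapped_pfn);
}

/* Interchanged: the pfn lookup reports "no page" as pfn 0 ... */
static unsigned long sketch_to_pfn_new(long mapped_pfn)
{
	return mapped_pfn < 0 ? 0 : (unsigned long)mapped_pfn;
}

/* ... and the wrapper turns that into a valid, non-NULL page pointer. */
static struct sketch_page *sketch_to_page_new(long mapped_pfn)
{
	return sketch_pfn_to_page(sketch_to_pfn_new(mapped_pfn));
}
```

The two versions agree for mapped pages and differ only in the "no page" case, which is exactly the case callers test for with a NULL check.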
Signed-off-by: Vladimir Murzin <murzin.v@gmail.com>
Cc: Jianyu Zhan <nasa4836@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we are implementing vmalloc_to_pfn() as a wrapper around
vmalloc_to_page(), which is implemented as follows:
1. walks the page tables to generate the corresponding pfn,
2. then converts the pfn to struct page,
3. returns it.
And vmalloc_to_pfn() re-wraps vmalloc_to_page() to get the pfn.
This seems too circuitous, so this patch reverses the way: implement
vmalloc_to_page() as a wrapper around vmalloc_to_pfn(). This makes
vmalloc_to_pfn() and vmalloc_to_page() slightly more efficient.
No functional change.
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
Cc: Vladimir Murzin <murzin.v@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 248ac0e194 ("mm/vmalloc: remove guard page from between vmap
blocks") had the side effect of making vmap_area.va_end member point to
the next vmap_area.va_start. This was creating an artificial reference
to vmalloc'ed objects and kmemleak was rarely reporting vmalloc() leaks.
This patch marks the vmap_area containing pointers explicitly and
reduces the min ref_count to 2 as vm_struct still contains a reference
to the vmalloc'ed object. The kmemleak add_scan_area() function has
been improved to allow a SIZE_MAX argument covering the rest of the
object (for simpler calling sites).
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The VM_UNINITIALIZED/VM_UNLIST flag introduced by f5252e009d ("mm:
avoid null pointer access in vm_struct via /proc/vmallocinfo") is used
to avoid accessing the pages field while its pages are still unallocated
when show_numa_info() is called.
This patch moves the check to just before show_numa_info() so that some
information can still be dumped via /proc/vmallocinfo. This patch reverts
commit d157a55815 ("mm/vmalloc.c: check VM_UNINITIALIZED flag in
s_show instead of show_numa_info").
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a race window between vmap_area teardown and showing vmap_area
information.
        A                                B

remove_vm_area
spin_lock(&vmap_area_lock);
va->vm = NULL;
va->flags &= ~VM_VM_AREA;
spin_unlock(&vmap_area_lock);
                                 spin_lock(&vmap_area_lock);
                                 if (va->flags & (VM_LAZY_FREE | VM_LAZY_FREEING))
                                         return 0;
                                 if (!(va->flags & VM_VM_AREA)) {
                                         seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n",
                                                 (void *)va->va_start, (void *)va->va_end,
                                                 va->va_end - va->va_start);
                                         return 0;
                                 }
free_unmap_vmap_area(va);
  flush_cache_vunmap
  free_unmap_vmap_area_noflush
    unmap_vmap_area
    free_vmap_area_noflush
      va->flags |= VM_LAZY_FREE
The assumption that !VM_VM_AREA represents a vm_map_ram allocation was
introduced by d4033afdf8 ("mm, vmalloc: iterate vmap_area_list,
instead of vmlist, in vmallocinfo()").
However, !VM_VM_AREA also represents a vmap_area that is being torn down in
the race window mentioned above. This patch fixes it by not dumping any
information for the !VM_VM_AREA case, and also removes the (VM_LAZY_FREE |
VM_LAZY_FREEING) check since those flags are not possible in the
!VM_VM_AREA case.
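The ambiguity can be shown with a small model of the decision made when printing an entry. The flag values, enum and function names are illustrative assumptions; only the branching mirrors the description above.

```c
#define SKETCH_VM_LAZY_FREE	0x01u
#define SKETCH_VM_LAZY_FREEING	0x02u
#define SKETCH_VM_VM_AREA	0x04u

enum sketch_show {
	SKETCH_SHOW_NOTHING,
	SKETCH_SHOW_VM_MAP_RAM,
	SKETCH_SHOW_VM_STRUCT,
};

/* Before: anything without VM_VM_AREA is reported as vm_map_ram. */
static enum sketch_show sketch_show_old(unsigned int flags)
{
	if (flags & (SKETCH_VM_LAZY_FREE | SKETCH_VM_LAZY_FREEING))
		return SKETCH_SHOW_NOTHING;
	if (!(flags & SKETCH_VM_VM_AREA))
		return SKETCH_SHOW_VM_MAP_RAM;
	return SKETCH_SHOW_VM_STRUCT;
}

/* After: entries without VM_VM_AREA are skipped entirely. */
static enum sketch_show sketch_show_new(unsigned int flags)
{
	if (!(flags & SKETCH_VM_VM_AREA))
		return SKETCH_SHOW_NOTHING;
	return SKETCH_SHOW_VM_STRUCT;
}
```

An area caught between clearing VM_VM_AREA and setting VM_LAZY_FREE has no flags set at all; the old logic mislabels it as vm_map_ram, while the new logic prints nothing for it.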
Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The caller address has already been set in set_vmalloc_vm(), so there's no
need to set it again in __vmalloc_area_node.
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>