-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlu9oZ4ACgkQONu9yGCS
aT5wmw/6As7cB5ufEFIVzCU3xJdf2yrD/+iaAY4fJUFWrgsqvImvwTeGyGm05AK2
/7VHaIW3ATmfLbgE4Qsq+eP/rfNPqkfDd7rVCIfrP3r51XhmP/e6/Mnfd3NN9K+O
FbRDc5U9kirzItAUsm1z9ntCuZDRfMdbazDAHB7eFlO2DgmV+u+o5KbzoeGM4mRk
IIDbdROW3sRmoPhubHBYZmGKFL+WNMxG/V1x+3iVnM1TNeGFgfR0NXaQ4s2lqdz8
tiJ0SNxcfEy/rAa1BgyuaKCcIXrD3OjaWOLYTB8Lr2PDn3WIyvpTw3sD2puCYWB9
zKLzKL/zPo4VK4wFAXZwbEhJuYrxRv4EsqyKKIdVzHeKtyMfHzMZg2uhnT1luLd8
yFiagE66H/Nn4SUznkD/bZNn1Zvyz7ME1AXq/L5go8HfuF2qVxaq/tczTJSCKsmH
M195RmR6JJ9ZF63mvyfopdyErcPXmBjnOgVb7TNXRa3yNyjZBFXvAUQQg/ZPkidl
81WsNVRyOr2LKpHmhceEcrXICqLmederLW/ZYc3+Ti8GnCf0AVL1bcnwAFygqvfp
Liq1YTWfqZl3/LHTCn1Jp3PduCgUAIREjP4g/YaHHJs+HfnZuvZcSa5maf1TieVk
IYbVtzkeKW8nTMGQnDazMl/LVmjV0bsA8tLakDW4ClUKRxX4nNI=
=99U3
-----END PGP SIGNATURE-----
Merge 4.4.160 into android-4.4-p
Changes in 4.4.160
crypto: skcipher - Fix -Wstringop-truncation warnings
tsl2550: fix lux1_input error in low light
vmci: type promotion bug in qp_host_get_user_memory()
x86/numa_emulation: Fix emulated-to-physical node mapping
staging: rts5208: fix missing error check on call to rtsx_write_register
uwb: hwa-rc: fix memory leak at probe
power: vexpress: fix corruption in notifier registration
Bluetooth: Add a new Realtek 8723DE ID 0bda:b009
USB: serial: kobil_sct: fix modem-status error handling
6lowpan: iphc: reset mac_header after decompress to fix panic
md-cluster: clear another node's suspend_area after the copy is finished
media: exynos4-is: Prevent NULL pointer dereference in __isp_video_try_fmt()
powerpc/kdump: Handle crashkernel memory reservation failure
media: fsl-viu: fix error handling in viu_of_probe()
x86/tsc: Add missing header to tsc_msr.c
x86/entry/64: Add two more instruction suffixes
scsi: target/iscsi: Make iscsit_ta_authentication() respect the output buffer size
scsi: klist: Make it safe to use klists in atomic context
scsi: ibmvscsi: Improve strings handling
usb: wusbcore: security: cast sizeof to int for comparison
powerpc/powernv/ioda2: Reduce upper limit for DMA window size
alarmtimer: Prevent overflow for relative nanosleep
s390/extmem: fix gcc 8 stringop-overflow warning
ALSA: snd-aoa: add of_node_put() in error path
media: s3c-camif: ignore -ENOIOCTLCMD from v4l2_subdev_call for s_power
media: soc_camera: ov772x: correct setting of banding filter
media: omap3isp: zero-initialize the isp cam_xclk{a,b} initial data
staging: android: ashmem: Fix mmap size validation
drivers/tty: add error handling for pcmcia_loop_config
media: tm6000: add error handling for dvb_register_adapter
ALSA: hda: Add AZX_DCAPS_PM_RUNTIME for AMD Raven Ridge
ath10k: protect ath10k_htt_rx_ring_free with rx_ring.lock
rndis_wlan: potential buffer overflow in rndis_wlan_auth_indication()
wlcore: Add missing PM call for wlcore_cmd_wait_for_event_or_timeout()
ARM: mvebu: declare asm symbols as character arrays in pmsu.c
HID: hid-ntrig: add error handling for sysfs_create_group
scsi: bnx2i: add error handling for ioremap_nocache
EDAC, i7core: Fix memleaks and use-after-free on probe and remove
ASoC: dapm: Fix potential DAI widget pointer deref when linking DAIs
module: exclude SHN_UNDEF symbols from kallsyms api
nfsd: fix corrupted reply to badly ordered compound
ARM: dts: dra7: fix DCAN node addresses
floppy: Do not copy a kernel pointer to user memory in FDGETPRM ioctl
serial: cpm_uart: return immediately from console poll
spi: tegra20-slink: explicitly enable/disable clock
spi: sh-msiof: Fix invalid SPI use during system suspend
spi: sh-msiof: Fix handling of write value for SISTR register
spi: rspi: Fix invalid SPI use during system suspend
spi: rspi: Fix interrupted DMA transfers
USB: fix error handling in usb_driver_claim_interface()
USB: handle NULL config in usb_find_alt_setting()
slub: make ->cpu_partial unsigned int
media: uvcvideo: Support realtek's UVC 1.5 device
USB: usbdevfs: sanitize flags more
USB: usbdevfs: restore warning for nonsensical flags
Revert "usb: cdc-wdm: Fix a sleep-in-atomic-context bug in service_outstanding_interrupt()"
USB: remove LPM management from usb_driver_claim_interface()
Input: elantech - enable middle button of touchpad on ThinkPad P72
IB/srp: Avoid that sg_reset -d ${srp_device} triggers an infinite loop
scsi: target: iscsi: Use bin2hex instead of a re-implementation
serial: imx: restore handshaking irq for imx1
arm64: KVM: Tighten guest core register access from userspace
ext4: never move the system.data xattr out of the inode body
thermal: of-thermal: disable passive polling when thermal zone is disabled
net: hns: fix length and page_offset overflow when CONFIG_ARM64_64K_PAGES
e1000: check on netif_running() before calling e1000_up()
e1000: ensure to free old tx/rx rings in set_ringparam()
hwmon: (ina2xx) fix sysfs shunt resistor read access
hwmon: (adt7475) Make adt7475_read_word() return errors
i2c: i801: Allow ACPI AML access I/O ports not reserved for SMBus
arm64: cpufeature: Track 32bit EL0 support
arm64: KVM: Sanitize PSTATE.M when being set from userspace
media: v4l: event: Prevent freeing event subscriptions while accessed
KVM: PPC: Book3S HV: Don't truncate HPTE index in xlate function
mac80211: correct use of IEEE80211_VHT_CAP_RXSTBC_X
mac80211_hwsim: correct use of IEEE80211_VHT_CAP_RXSTBC_X
gpio: adp5588: Fix sleep-in-atomic-context bug
mac80211: mesh: fix HWMP sequence numbering to follow standard
cfg80211: nl80211_update_ft_ies() to validate NL80211_ATTR_IE
RAID10 BUG_ON in raise_barrier when force is true and conf->barrier is 0
i2c: uniphier: issue STOP only for last message or I2C_M_STOP
i2c: uniphier-f: issue STOP only for last message or I2C_M_STOP
net: cadence: Fix a sleep-in-atomic-context bug in macb_halt_tx()
fs/cifs: don't translate SFM_SLASH (U+F026) to backslash
cfg80211: fix a type issue in ieee80211_chandef_to_operating_class()
mac80211: fix a race between restart and CSA flows
mac80211: Fix station bandwidth setting after channel switch
mac80211: shorten the IBSS debug messages
tools/vm/slabinfo.c: fix sign-compare warning
tools/vm/page-types.c: fix "defined but not used" warning
mm: madvise(MADV_DODUMP): allow hugetlbfs pages
usb: gadget: fotg210-udc: Fix memory leak of fotg210->ep[i]
perf probe powerpc: Ignore SyS symbols irrespective of endianness
RDMA/ucma: check fd type in ucma_migrate_id()
USB: yurex: Check for truncation in yurex_read()
drm/nouveau/TBDdevinit: don't fail when PMU/PRE_OS is missing from VBIOS
fs/cifs: suppress a string overflow warning
dm thin metadata: try to avoid ever aborting transactions
arch/hexagon: fix kernel/dma.c build warning
hexagon: modify ffs() and fls() to return int
arm64: jump_label.h: use asm_volatile_goto macro instead of "asm goto"
r8169: Clear RTL_FLAG_TASK_*_PENDING when clearing RTL_FLAG_TASK_ENABLED
s390/qeth: don't dump past end of unknown HW header
cifs: read overflow in is_valid_oplock_break()
xen/manage: don't complain about an empty value in control/sysrq node
xen: avoid crash in disable_hotplug_cpu
xen: fix GCC warning and remove duplicate EVTCHN_ROW/EVTCHN_COL usage
smb2: fix missing files in root share directory listing
ALSA: hda/realtek - Cannot adjust speaker's volume on Dell XPS 27 7760
crypto: mxs-dcp - Fix wait logic on chan threads
proc: restrict kernel stack dumps to root
ocfs2: fix locking for res->tracking and dlm->tracking_list
dm thin metadata: fix __udivdi3 undefined on 32-bit
Linux 4.4.160
Change-Id: I38a5feb4700b1b5947ff469b28b0894968750172
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit e5d9998f3e09359b372a037a6ac55ba235d95d57 upstream.
/*
* cpu_partial determined the maximum number of objects
* kept in the per cpu partial lists of a processor.
*/
Can't be negative.
Link: http://lkml.kernel.org/r/20180305200730.15812-15-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAltoWioACgkQONu9yGCS
aT6YrQ//d8dWKaNZK08Z/l2ZqRS56wlNTJyHIB81p1uM2PuPHfLjsZzLQ+HnZ3Ha
G+fedEj3sbwJp8i61TRu9Q1p/PyLWsnaryWZaK3gm4Yo8GrdVbXAY47EHwz3fbUK
yxrC0+zQmIlyZgwzbUNGspDuAdNt2MFDug97RFF8BdhJd84Rv0BbicGMwKJQFfFN
g0Tv6yB+8cjmnCMjmLreLyi+puWvXZtZXAi+idl9eTC4ysGDKNvO1ERptv2NC5C6
171cbsS/ngpY5ZIUcmLy0QPPFh/ZCeoft22R3gOxZDkjT4Ro6lY5ubPKDEcn57Hv
FSV5fuQ3cBtmsODn7LMIWqLDKuCRM/gTmvXrWxM91JDLSsuAdZWATj8k4iIoocmk
l/3iOixBMFCGToQ1I2/O33QZOssKoDIz4bpG6+HM/Cj4anSnVZKjouJSTlNZr/3i
ZJOXpu/MpQItc/RGo/PumzJLkXhS+HyGwPbTIOPy29NMqp+xvjZv4DttuJbqyHJ2
Pm/OZcvU7z1wSMhcTknvZLLMQVRIICQjfPJNDefqAdrCdd233cRo37cU8egg4A0l
F3q+ZI/ny01YWQP8KrCJyWB5lHHbEc44wUHCxet0TPZ1qaqvVcXzaWhwxP2H0L3I
7r2u9bDg15ielw3jhPpRWZMvANbQlToNoj6YROqj5ArcIowcBPc=
=7/iL
-----END PGP SIGNATURE-----
Merge 4.4.146 into android-4.4-p
Changes in 4.4.146
MIPS: Fix off-by-one in pci_resource_to_user()
Input: elan_i2c - add ACPI ID for lenovo ideapad 330
Input: i8042 - add Lenovo LaVie Z to the i8042 reset list
Input: elan_i2c - add another ACPI ID for Lenovo Ideapad 330-15AST
tracing: Fix double free of event_trigger_data
tracing: Fix possible double free in event_enable_trigger_func()
tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure
tracing: Quiet gcc warning about maybe unused link variable
xen/netfront: raise max number of slots in xennet_get_responses()
ALSA: emu10k1: add error handling for snd_ctl_add
ALSA: fm801: add error handling for snd_ctl_add
nfsd: fix potential use-after-free in nfsd4_decode_getdeviceinfo
mm: vmalloc: avoid racy handling of debugobjects in vunmap
mm/slub.c: add __printf verification to slab_err()
rtc: ensure rtc_set_alarm fails when alarms are not supported
netfilter: ipset: List timing out entries with "timeout 1" instead of zero
infiniband: fix a possible use-after-free bug
hvc_opal: don't set tb_ticks_per_usec in udbg_init_opal_common()
powerpc/64s: Fix compiler store ordering to SLB shadow area
RDMA/mad: Convert BUG_ONs to error flows
disable loading f2fs module on PAGE_SIZE > 4KB
f2fs: fix to don't trigger writeback during recovery
usbip: usbip_detach: Fix memory, udev context and udev leak
perf/x86/intel/uncore: Correct fixed counter index check in generic code
perf/x86/intel/uncore: Correct fixed counter index check for NHM
iwlwifi: pcie: fix race in Rx buffer allocator
Bluetooth: hci_qca: Fix "Sleep inside atomic section" warning
Bluetooth: btusb: Add a new Realtek 8723DE ID 2ff8:b011
ASoC: dpcm: fix BE dai not hw_free and shutdown
mfd: cros_ec: Fail early if we cannot identify the EC
mwifiex: handle race during mwifiex_usb_disconnect
wlcore: sdio: check for valid platform device data before suspend
media: videobuf2-core: don't call memop 'finish' when queueing
btrfs: add barriers to btrfs_sync_log before log_commit_wait wakeups
btrfs: qgroup: Finish rescan when hit the last leaf of extent tree
PCI: Prevent sysfs disable of device while driver is attached
ath: Add regulatory mapping for FCC3_ETSIC
ath: Add regulatory mapping for ETSI8_WORLD
ath: Add regulatory mapping for APL13_WORLD
ath: Add regulatory mapping for APL2_FCCA
ath: Add regulatory mapping for Uganda
ath: Add regulatory mapping for Tanzania
ath: Add regulatory mapping for Serbia
ath: Add regulatory mapping for Bermuda
ath: Add regulatory mapping for Bahamas
powerpc/32: Add a missing include header
powerpc/chrp/time: Make some functions static, add missing header include
powerpc/powermac: Add missing prototype for note_bootable_part()
powerpc/powermac: Mark variable x as unused
powerpc/8xx: fix invalid register expression in head_8xx.S
pinctrl: at91-pio4: add missing of_node_put
PCI: pciehp: Request control of native hotplug only if supported
mwifiex: correct histogram data with appropriate index
scsi: ufs: fix exception event handling
ALSA: emu10k1: Rate-limit error messages about page errors
regulator: pfuze100: add .is_enable() for pfuze100_swb_regulator_ops
md: fix NULL dereference of mddev->pers in remove_and_add_spares()
media: smiapp: fix timeout checking in smiapp_read_nvm
ALSA: usb-audio: Apply rate limit to warning messages in URB complete callback
HID: hid-plantronics: Re-resend Update to map button for PTT products
drm/radeon: fix mode_valid's return type
powerpc/embedded6xx/hlwd-pic: Prevent interrupts from being handled by Starlet
HID: i2c-hid: check if device is there before really probing
tty: Fix data race in tty_insert_flip_string_fixed_flag
dma-iommu: Fix compilation when !CONFIG_IOMMU_DMA
media: rcar_jpu: Add missing clk_disable_unprepare() on error in jpu_open()
libata: Fix command retry decision
media: saa7164: Fix driver name in debug output
mtd: rawnand: fsl_ifc: fix FSL NAND driver to read all ONFI parameter pages
brcmfmac: Add support for bcm43364 wireless chipset
s390/cpum_sf: Add data entry sizes to sampling trailer entry
perf: fix invalid bit in diagnostic entry
scsi: 3w-9xxx: fix a missing-check bug
scsi: 3w-xxxx: fix a missing-check bug
scsi: megaraid: silence a static checker bug
thermal: exynos: fix setting rising_threshold for Exynos5433
bpf: fix references to free_bpf_prog_info() in comments
media: siano: get rid of __le32/__le16 cast warnings
drm/atomic: Handling the case when setting old crtc for plane
ALSA: hda/ca0132: fix build failure when a local macro is defined
memory: tegra: Do not handle spurious interrupts
memory: tegra: Apply interrupts mask per SoC
drm/gma500: fix psb_intel_lvds_mode_valid()'s return type
ipconfig: Correctly initialise ic_nameservers
rsi: Fix 'invalid vdd' warning in mmc
audit: allow not equal op for audit by executable
microblaze: Fix simpleImage format generation
usb: hub: Don't wait for connect state at resume for powered-off ports
crypto: authencesn - don't leak pointers to authenc keys
crypto: authenc - don't leak pointers to authenc keys
media: omap3isp: fix unbalanced dma_iommu_mapping
scsi: scsi_dh: replace too broad "TP9" string with the exact models
scsi: megaraid_sas: Increase timeout by 1 sec for non-RAID fastpath IOs
media: si470x: fix __be16 annotations
drm: Add DP PSR2 sink enable bit
random: mix rdrand with entropy sent in from userspace
squashfs: be more careful about metadata corruption
ext4: fix inline data updates with checksums enabled
ext4: check for allocation block validity with block group locked
dmaengine: pxa_dma: remove duplicate const qualifier
ASoC: pxa: Fix module autoload for platform drivers
ipv4: remove BUG_ON() from fib_compute_spec_dst
net: fix amd-xgbe flow-control issue
net: lan78xx: fix rx handling before first packet is send
xen-netfront: wait xenbus state change when load module manually
NET: stmmac: align DMA stuff to largest cache line length
tcp: do not force quickack when receiving out-of-order packets
tcp: add max_quickacks param to tcp_incr_quickack and tcp_enter_quickack_mode
tcp: do not aggressively quick ack after ECN events
tcp: refactor tcp_ecn_check_ce to remove sk type cast
tcp: add one more quick ack after after ECN events
inet: frag: enforce memory limits earlier
net: dsa: Do not suspend/resume closed slave_dev
netlink: Fix spectre v1 gadget in netlink_create()
squashfs: more metadata hardening
squashfs: more metadata hardenings
can: ems_usb: Fix memory leak on ems_usb_disconnect()
net: socket: fix potential spectre v1 gadget in socketcall
virtio_balloon: fix another race between migration and ballooning
kvm: x86: vmx: fix vpid leak
crypto: padlock-aes - Fix Nano workaround data corruption
scsi: sg: fix minor memory leak in error path
Linux 4.4.146
Change-Id: I7b8ad5e297804f92b3e3a8c5daf8a26ba684029b
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit a38965bf941b7c2af50de09c96bc5f03e136caef ]
__printf is useful to verify format and arguments. Remove the following
warning (with W=1):
mm/slub.c:721:2: warning: function might be possible candidate for `gnu_printf' format attribute [-Wsuggest-attribute=format]
Link: http://lkml.kernel.org/r/20180505200706.19986-1-malat@debian.org
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kernel style prefers a single string over split strings when the string is
'user-visible'.
Miscellanea:
- Add a missing newline
- Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Tejun Heo <tj@kernel.org> [percpu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 64145065
(cherry-picked from 756a025f00091918d9d09ca3229defb160b409c0)
Change-Id: I377fb1542980c15d2f306924656227ad17b02b5e
Signed-off-by: Paul Lawrence <paullawrence@google.com>
The state of object currently tracked in two places - shadow memory, and
the ->state field in struct kasan_alloc_meta. We can get rid of the
latter. The will save us a little bit of memory. Also, this allow us
to move free stack into struct kasan_alloc_meta, without increasing
memory consumption. So now we should always know when the last time the
object was freed. This may be useful for long delayed use-after-free
bugs.
As a side effect this fixes following UBSAN warning:
UBSAN: Undefined behaviour in mm/kasan/quarantine.c:102:13
member access within misaligned address ffff88000d1efebc for type 'struct qlist_node'
which requires 8 byte alignment
Link: http://lkml.kernel.org/r/1470062715-14077-5-git-send-email-aryabinin@virtuozzo.com
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 64145065
(cherry-picked from b3cbd9bf77cd1888114dbee1653e79aa23fd4068)
Change-Id: Iaa4959a78ffd2e49f9060099df1fb32483df3085
Signed-off-by: Paul Lawrence <paullawrence@google.com>
For KASAN builds:
- switch SLUB allocator to using stackdepot instead of storing the
allocation/deallocation stacks in the objects;
- change the freelist hook so that parts of the freelist can be put
into the quarantine.
[aryabinin@virtuozzo.com: fixes]
Link: http://lkml.kernel.org/r/1468601423-28676-1-git-send-email-aryabinin@virtuozzo.com
Link: http://lkml.kernel.org/r/1468347165-41906-3-git-send-email-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Steven Rostedt (Red Hat) <rostedt@goodmis.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Kuthonuzo Luruo <kuthonuzo.luruo@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 64145065
(cherry-picked from 80a9201a5965f4715d5c09790862e0df84ce0614)
Change-Id: I2b59c6d50d0db62d3609edfdc7be54e48f8afa5c
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Add GFP flags to KASAN hooks for future patches to use.
This patch is based on the "mm: kasan: unified support for SLUB and SLAB
allocators" patch originally prepared by Dmitry Chernenkov.
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 64145065
(cherry-picked from 505f5dcb1c419e55a9621a01f83eb5745d8d7398)
Change-Id: I7c5539f59e6969e484a6ff4f104dce2390669cfd
Signed-off-by: Paul Lawrence <paullawrence@google.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlk30BwACgkQONu9yGCS
aT5cmhAAh3etTuZ3xRw2eGW/Y/C8L2F2CjJjmR4vp1ms8P55uZg3xA20r5jNj7Ho
pwag3WTNzHpVfKFApavfEzToqDszRAtXcvYPPW9uXUPeu8LWyBJyvmN7lSQVKgDc
M9SWsd+8EGceopaj8KHjLMxNsV2n8j2ckxNf/BL/KgiMtJlgp/1TCDKUVS1k0cA7
CsuxDhxpRYpQofsIVww1hdrwCxVuntAY7u+/3B19ozXGFSRe/h5GO6xYRcG8pqfT
lvIgD6btdQJwI55QoSpJCpL96a534zc+akO0dtyaMJ3Q8UWQXD3JF8ZxMiPPrAe8
CLW390ATranIafmLi9g9DU1vQeEPNFXpeiYfxe65YL7igeAj/uPtVzKp0MvRcKG7
IBVNxbtsTQa73ig7gKSJ323CnpEfrr/XG73JNVtUQLxHa2poY7SUonRI587MFW2T
sONl9Pk3TxRC7Rc45si4RFsIj4jEF8ubUDXOPb2CrmDMB7MrM0PHfOW9lLCP92FD
pn0fM4vwNvm2ILsblqNcBumgeIBQ8ld2TBTbhRbh2FK4Rzxd2TSlWh4KqkcWcXCt
Lz8conU06AwTvDob1xoht3m6Gj32maopKZKGn5/Wq0YlfjOB/70CXOvPO3ChhKTh
QGNgA66bYdm+xn55wf7ty7Bq8yO6kcSNPQCXOb9S61nfCLA4KHM=
=U7IH
-----END PGP SIGNATURE-----
Merge 4.4.71 into android-4.4
Changes in 4.4.71
sparc: Fix -Wstringop-overflow warning
dccp/tcp: do not inherit mc_list from parent
ipv6/dccp: do not inherit ipv6_mc_list from parent
s390/qeth: handle sysfs error during initialization
s390/qeth: unbreak OSM and OSN support
s390/qeth: avoid null pointer dereference on OSN
tcp: avoid fragmenting peculiar skbs in SACK
sctp: fix src address selection if using secondary addresses for ipv6
sctp: do not inherit ipv6_{mc|ac|fl}_list from parent
tcp: eliminate negative reordering in tcp_clean_rtx_queue
net: Improve handling of failures on link and route dumps
ipv6: Prevent overrun when parsing v6 header options
ipv6: Check ip6_find_1stfragopt() return value properly.
bridge: netlink: check vlan_default_pvid range
qmi_wwan: add another Lenovo EM74xx device ID
bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
ipv6: fix out of bound writes in __ip6_append_data()
be2net: Fix offload features for Q-in-Q packets
virtio-net: enable TSO/checksum offloads for Q-in-Q vlans
tcp: avoid fastopen API to be used on AF_UNSPEC
sctp: fix ICMP processing if skb is non-linear
ipv4: add reference counting to metrics
netem: fix skb_orphan_partial()
net: phy: marvell: Limit errata to 88m1101
vlan: Fix tcp checksum offloads in Q-in-Q vlans
i2c: i2c-tiny-usb: fix buffer not being DMA capable
mmc: sdhci-iproc: suppress spurious interrupt with Multiblock read
HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference
scsi: mpt3sas: Force request partial completion alignment
drm/radeon/ci: disable mclk switching for high refresh rates (v2)
drm/radeon: Unbreak HPD handling for r600+
pcmcia: remove left-over %Z format
ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430
slub/memcg: cure the brainless abuse of sysfs attributes
drm/gma500/psb: Actually use VBT mode when it is found
mm/migrate: fix refcount handling when !hugepage_migration_supported()
mlock: fix mlock count can not decrease in race condition
xfs: Fix missed holes in SEEK_HOLE implementation
xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff()
xfs: fix over-copying of getbmap parameters from userspace
xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
xfs: prevent multi-fsb dir readahead from reading random blocks
xfs: fix up quotacheck buffer list error handling
xfs: support ability to wait on new inodes
xfs: update ag iterator to support wait on new inodes
xfs: wait on new inodes during quotaoff dquot release
xfs: fix indlen accounting error on partial delalloc conversion
xfs: bad assertion for delalloc an extent that start at i_size
xfs: fix unaligned access in xfs_btree_visit_blocks
xfs: in _attrlist_by_handle, copy the cursor back to userspace
xfs: only return -errno or success from attr ->put_listent
Linux 4.4.71
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 478fe3037b2278d276d4cd9cd0ab06c4cb2e9b32 upstream.
memcg_propagate_slab_attrs() abuses the sysfs attribute file functions
to propagate settings from the root kmem_cache to a newly created
kmem_cache. It does that with:
attr->show(root, buf);
attr->store(new, buf, strlen(bug);
Aside of being a lazy and absurd hackery this is broken because it does
not check the return value of the show() function.
Some of the show() functions return 0 w/o touching the buffer. That
means in such a case the store function is called with the stale content
of the previous show(). That causes nonsense like invoking
kmem_cache_shrink() on a newly created kmem_cache. In the worst case it
would cause handing in an uninitialized buffer.
This should be rewritten proper by adding a propagate() callback to
those slub_attributes which must be propagated and avoid that insane
conversion to and from ASCII, but that's too large for a hot fix.
Check at least the return value of the show() function, so calling
store() with stale content is prevented.
Steven said:
"It can cause a deadlock with get_online_cpus() that has been uncovered
by recent cpu hotplug and lockdep changes that Thomas and Peter have
been doing.
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(cpu_hotplug.lock);
lock(slab_mutex);
lock(cpu_hotplug.lock);
lock(slab_mutex);
*** DEADLOCK ***"
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705201244540.2255@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
SLUB already has a redzone debugging feature. But it is only positioned
at the end of object (aka right redzone) so it cannot catch left oob.
Although current object's right redzone acts as left redzone of next
object, first object in a slab cannot take advantage of this effect.
This patch explicitly adds a left red zone to each object to detect left
oob more precisely.
Background:
Someone complained to me that left OOB doesn't catch even if KASAN is
enabled which does page allocation debugging. That page is out of our
control so it would be allocated when left OOB happens and, in this
case, we can't find OOB. Moreover, SLUB debugging feature can be
enabled without page allocator debugging and, in this case, we will miss
that OOB.
Before trying to implement, I expected that changes would be too
complex, but, it doesn't look that complex to me now. Almost changes
are applied to debug specific functions so I feel okay.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ib893a17ecabd692e6c402e864196bf89cd6781a5
(cherry picked from commit d86bd1bece6fc41d59253002db5441fe960a37f6)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Under CONFIG_HARDENED_USERCOPY, this adds object size checking to the
SLUB allocator to catch any copies that may span objects. Includes a
redzone handling fix discovered by Michael Ellerman.
Based on code from PaX and grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Reviwed-by: Laura Abbott <labbott@redhat.com>
Change-Id: I52dc6fb3a3492b937d52b5cf9c046bf03dc40a3a
(cherry picked from commit ed18adc1cdd00a5c55a20fbdaed4804660772281)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
commit 376bf125ac781d32e202760ed7deb1ae4ed35d31 upstream.
This change is primarily an attempt to make it easier to realize the
optimizations the compiler performs in-case CONFIG_MEMCG_KMEM is not
enabled.
Performance wise, even when CONFIG_MEMCG_KMEM is compiled in, the
overhead is zero. This is because, as long as no process have enabled
kmem cgroups accounting, the assignment is replaced by asm-NOP
operations. This is possible because memcg_kmem_enabled() uses a
static_key_false() construct.
It also helps readability as it avoid accessing the p[] array like:
p[size - 1] which "expose" that the array is processed backwards inside
helper function build_detached_freelist().
Lastly this also makes the code more robust, in error case like passing
NULL pointers in the array. Which were previously handled before commit
033745189b ("slub: add missing kmem cgroup support to
kmem_cache_free_bulk").
Fixes: 033745189b ("slub: add missing kmem cgroup support to kmem_cache_free_bulk")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Adjust kmem_cache_alloc_bulk API before we have any real users.
Adjust API to return type 'int' instead of previously type 'bool'. This
is done to allow future extension of the bulk alloc API.
A future extension could be to allow SLUB to stop at a page boundary, when
specified by a flag, and then return the number of objects.
The advantage of this approach, would make it easier to make bulk alloc
run without local IRQs disabled. With an approach of cmpxchg "stealing"
the entire c->freelist or page->freelist. To avoid overshooting we would
stop processing at a slab-page boundary. Else we always end up returning
some objects at the cost of another cmpxchg.
To keep compatible with future users of this API linking against an older
kernel when using the new flag, we need to return the number of allocated
objects with this API change.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Initial implementation missed support for kmem cgroup support in
kmem_cache_free_bulk() call, add this.
If CONFIG_MEMCG_KMEM is not enabled, the compiler should be smart enough
to not add any asm code.
Incoming bulk free objects can belong to different kmem cgroups, and
object free call can happen at a later point outside memcg context. Thus,
we need to keep the orig kmem_cache, to correctly verify if a memcg object
match against its "root_cache" (s->memcg_params.root_cache).
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The call slab_pre_alloc_hook() interacts with kmemgc and is not allowed to
be called several times inside the bulk alloc for loop, due to the call to
memcg_kmem_get_cache().
This would result in hitting the VM_BUG_ON in __memcg_kmem_get_cache.
As suggested by Vladimir Davydov, change slab_post_alloc_hook() to be able
to handle an array of objects.
A subtle detail is, loop iterator "i" in slab_post_alloc_hook() must have
same type (size_t) as size argument. This helps the compiler to easier
realize that it can remove the loop, when all debug statements inside loop
evaluates to nothing. Note, this is only an issue because the kernel is
compiled with GCC option: -fno-strict-overflow
In slab_alloc_node() the compiler inlines and optimizes the invocation of
slab_post_alloc_hook(s, flags, 1, &object) by removing the loop and access
object directly.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reported-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Suggested-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make it possible to free a freelist with several objects by adjusting API
of slab_free() and __slab_free() to have head, tail and an objects counter
(cnt).
Tail being NULL indicate single object free of head object. This allow
compiler inline constant propagation in slab_free() and
slab_free_freelist_hook() to avoid adding any overhead in case of single
object free.
This allows a freelist with several objects (all within the same
slab-page) to be free'ed using a single locked cmpxchg_double in
__slab_free() and with an unlocked cmpxchg_double in slab_free().
Object debugging on the free path is also extended to handle these
freelists. When CONFIG_SLUB_DEBUG is enabled it will also detect if
objects don't belong to the same slab-page.
These changes are needed for the next patch to bulk free the detached
freelists it introduces and constructs.
Micro benchmarking showed no performance reduction due to this change,
when debugging is turned off (compiled with CONFIG_SLUB_DEBUG).
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The #ifdef of CONFIG_SLUB_DEBUG is located very far from the associated
#else. For readability mark it with a comment.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the new function that can do allocation while interrupts are disabled.
Avoids irq on/off sequences.
Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bulk alloc needs a function like that because it enables interrupts before
calling __slab_alloc which promptly disables them again using the expensive
local_irq_save().
Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".
Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.
This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.
This patch then converts a number of sites
o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.
o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.
o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.
o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.
The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.
The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's recommended to have slub's user tracking enabled with CONFIG_KASAN,
because:
a) User tracking disables slab merging which improves
detecting out-of-bounds accesses.
b) User tracking metadata acts as redzone which also improves
detecting out-of-bounds accesses.
c) User tracking provides additional information about object.
This information helps to understand bugs.
Currently it is not enabled by default. Besides recompiling the kernel
with KASAN and reinstalling it, user also have to change the boot cmdline,
which is not very handy.
Enable slub user tracking by default with KASAN=y, since there is no good
reason to not do this.
[akpm@linux-foundation.org: little fixes, per David]
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have memcg_kmem_charge and memcg_kmem_uncharge methods for charging and
uncharging kmem pages to memcg, but currently they are not used for
charging slab pages (i.e. they are only used for charging pages allocated
with alloc_kmem_pages). The only reason why the slab subsystem uses
special helpers, memcg_charge_slab and memcg_uncharge_slab, is that it
needs to charge to the memcg of kmem cache while memcg_charge_kmem charges
to the memcg that the current task belongs to.
To remove this diversity, this patch adds an extra argument to
__memcg_kmem_charge that can be a pointer to a memcg or NULL. If it is
not NULL, the function tries to charge to the memcg it points to,
otherwise it charge to the current context. Next, it makes the slab
subsystem use this function to charge slab pages.
Since memcg_charge_kmem and memcg_uncharge_kmem helpers are now used only
in __memcg_kmem_charge and __memcg_kmem_uncharge, they are inlined. Since
__memcg_kmem_charge stores a pointer to the memcg in the page struct, we
don't need memcg_uncharge_slab anymore and can use free_kmem_pages.
Besides, one can now detect which memcg a slab page belongs to by reading
/proc/kpagecgroup.
Note, this patch switches slab to charge-after-alloc design. Since this
design is already used for all other memcg charges, it should not make any
difference.
[hannes@cmpxchg.org: better to have an outer function than a magic parameter for the memcg lookup]
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In slub_order(), the order starts from max(min_order,
get_order(min_objects * size)). When (min_objects * size) has different
order from (min_objects * size + reserved), it will skip this order via a
check in the loop.
This patch optimizes this a little by calculating the start order with
`reserved' in consideration and removing the check in loop.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
get_order() is more easy to understand.
This patch just replaces it.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In calculate_order(), it tries to calculate the best order by adjusting
the fraction and min_objects. On each iteration on min_objects, fraction
iterates on 16, 8, 4. Which means the acceptable waste increases with
1/16, 1/8, 1/4.
This patch corrects the comment according to the code.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
alloc_pages_exact_node() was introduced in commit 6484eb3e2a ("page
allocator: do not check NUMA node ID when the caller knows the node is
valid") as an optimized variant of alloc_pages_node(), that doesn't
fallback to current node for nid == NUMA_NO_NODE. Unfortunately the
name of the function can easily suggest that the allocation is
restricted to the given node and fails otherwise. In truth, the node is
only preferred, unless __GFP_THISNODE is passed among the gfp flags.
The misleading name has lead to mistakes in the past, see for example
commits 5265047ac3 ("mm, thp: really limit transparent hugepage
allocation to local node") and b360edb43f ("mm, mempolicy:
migrate_to_node should only migrate to node").
Another issue with the name is that there's a family of
alloc_pages_exact*() functions where 'exact' means exact size (instead
of page order), which leads to more confusion.
To prevent further mistakes, this patch effectively renames
alloc_pages_exact_node() to __alloc_pages_node() to better convey that
it's an optimized variant of alloc_pages_node() not intended for general
usage. Both functions get described in comments.
It has been also considered to really provide a convenience function for
allocations restricted to a node, but the major opinion seems to be that
__GFP_THISNODE already provides that functionality and we shouldn't
duplicate the API needlessly. The number of users would be small
anyway.
Existing callers of alloc_pages_exact_node() are simply converted to
call __alloc_pages_node(), with the exception of sba_alloc_coherent()
which open-codes the check for NUMA_NO_NODE, so it is converted to use
alloc_pages_node() instead. This means it no longer performs some
VM_BUG_ON checks, and since the current check for nid in
alloc_pages_node() uses a 'nid < 0' comparison (which includes
NUMA_NO_NODE), it may hide wrong values which would be previously
exposed.
Both differences will be rectified by the next patch.
To sum up, this patch makes no functional changes, except temporarily
hiding potentially buggy callers. Restricting the checks in
alloc_pages_node() is left for the next patch which can in turn expose
more existing buggy callers.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Robin Holt <robinmholt@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Cliff Whickman <cpw@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Description is almost copied from commit fb05e7a89f ("net: don't wait
for order-3 page allocation").
I saw excessive direct memory reclaim/compaction triggered by slub. This
causes performance issues and add latency. Slub uses high-order
allocation to reduce internal fragmentation and management overhead. But,
direct memory reclaim/compaction has high overhead and the benefit of
high-order allocation can't compensate the overhead of both work.
This patch makes auxiliary high-order allocation atomic. If there is no
memory pressure and memory isn't fragmented, the alloction will still
success, so we don't sacrifice high-order allocation's benefit here. If
the atomic allocation fails, direct memory reclaim/compaction will not be
triggered, allocation fallback to low-order immediately, hence the direct
memory reclaim/compaction overhead is avoided. In the allocation failure
case, kswapd is waken up and trying to make high-order freepages, so
allocation could success next time.
Following is the test to measure effect of this patch.
System: QEMU, CPU 8, 512 MB
Mem: 25% memory is allocated at random position to make fragmentation.
Memory-hogger occupies 150 MB memory.
Workload: hackbench -g 20 -l 1000
Average result by 10 runs (Base va Patched)
elapsed_time(s): 4.3468 vs 2.9838
compact_stall: 461.7 vs 73.6
pgmigrate_success: 28315.9 vs 7256.1
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sysfs_slab_add() shouldn't call kobject_put at error path: this puts last
reference of kmem-cache kobject and frees it. Kmem cache will be freed
second time at error path in kmem_cache_create().
For example this happens when slub debug was enabled in runtime and
somebody creates new kmem cache:
# echo 1 | tee /sys/kernel/slab/*/sanity_checks
# modprobe configfs
"configfs_dir_cache" cannot be merged because existing slab have debug and
cannot create new slab because unique name ":t-0000096" already taken.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Initializing a new slab can introduce rather large latencies because most
of the initialization runs always with interrupts disabled.
There is no point in doing so. The newly allocated slab is not visible
yet, so there is no reason to protect it against concurrent alloc/free.
Move the expensive parts of the initialization into allocate_slab(), so
for all allocations with GFP_WAIT set, interrupts are enabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
First piece: acceleration of retrieval of per cpu objects
If we are allocating lots of objects then it is advantageous to disable
interrupts and avoid the this_cpu_cmpxchg() operation to get these objects
faster.
Note that we cannot do the fast operation if debugging is enabled, because
we would have to add extra code to do all the debugging checks. And it
would not be fast anyway.
Note also that the requirement of having interrupts disabled avoids having
to do processor flag operations.
Allocate as many objects as possible in the fast way and then fall back to
the generic implementation for the rest of the objects.
Measurements on CPU CPU i7-4790K @ 4.00GHz
Baseline normal fastpath (alloc+free cost): 42 cycles(tsc) 10.554 ns
Bulk- fallback - this-patch
1 - 57 cycles(tsc) 14.432 ns - 48 cycles(tsc) 12.155 ns improved 15.8%
2 - 50 cycles(tsc) 12.746 ns - 37 cycles(tsc) 9.390 ns improved 26.0%
3 - 48 cycles(tsc) 12.180 ns - 33 cycles(tsc) 8.417 ns improved 31.2%
4 - 48 cycles(tsc) 12.015 ns - 32 cycles(tsc) 8.045 ns improved 33.3%
8 - 46 cycles(tsc) 11.526 ns - 30 cycles(tsc) 7.699 ns improved 34.8%
16 - 45 cycles(tsc) 11.418 ns - 32 cycles(tsc) 8.205 ns improved 28.9%
30 - 80 cycles(tsc) 20.246 ns - 73 cycles(tsc) 18.328 ns improved 8.8%
32 - 79 cycles(tsc) 19.946 ns - 72 cycles(tsc) 18.208 ns improved 8.9%
34 - 78 cycles(tsc) 19.659 ns - 71 cycles(tsc) 17.987 ns improved 9.0%
48 - 86 cycles(tsc) 21.516 ns - 82 cycles(tsc) 20.566 ns improved 4.7%
64 - 93 cycles(tsc) 23.423 ns - 89 cycles(tsc) 22.480 ns improved 4.3%
128 - 100 cycles(tsc) 25.170 ns - 99 cycles(tsc) 24.871 ns improved 1.0%
158 - 102 cycles(tsc) 25.549 ns - 101 cycles(tsc) 25.375 ns improved 1.0%
250 - 101 cycles(tsc) 25.344 ns - 100 cycles(tsc) 25.182 ns improved 1.0%
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add the basic infrastructure for alloc/free operations on pointer arrays.
It includes a generic function in the common slab code that is used in
this infrastructure patch to create the unoptimized functionality for slab
bulk operations.
Allocators can then provide optimized allocation functions for situations
in which large numbers of objects are needed. These optimization may
avoid taking locks repeatedly and bypass metadata creation if all objects
in slab pages can be used to provide the objects required.
Allocators can extend the skeletons provided and add their own code to the
bulk alloc and free functions. They can keep the generic allocation and
freeing and just fall back to those if optimizations would not work (like
for example when debugging is on).
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With this patchset the SLUB allocator now has both bulk alloc and free
implemented.
This patchset mostly optimizes the "fastpath" where objects are available
on the per CPU fastpath page. This mostly amortize the less-heavy
none-locked cmpxchg_double used on fastpath.
The "fallback" bulking (e.g __kmem_cache_free_bulk) provides a good basis
for comparison. Measurements[1] of the fallback functions
__kmem_cache_{free,alloc}_bulk have been copied from slab_common.c and
forced "noinline" to force a function call like slab_common.c.
Measurements on CPU CPU i7-4790K @ 4.00GHz
Baseline normal fastpath (alloc+free cost): 42 cycles(tsc) 10.601 ns
Measurements last-patch with disabled debugging:
Bulk- fallback - this-patch
1 - 57 cycles(tsc) 14.448 ns - 44 cycles(tsc) 11.236 ns improved 22.8%
2 - 51 cycles(tsc) 12.768 ns - 28 cycles(tsc) 7.019 ns improved 45.1%
3 - 48 cycles(tsc) 12.232 ns - 22 cycles(tsc) 5.526 ns improved 54.2%
4 - 48 cycles(tsc) 12.025 ns - 19 cycles(tsc) 4.786 ns improved 60.4%
8 - 46 cycles(tsc) 11.558 ns - 18 cycles(tsc) 4.572 ns improved 60.9%
16 - 45 cycles(tsc) 11.458 ns - 18 cycles(tsc) 4.658 ns improved 60.0%
30 - 45 cycles(tsc) 11.499 ns - 18 cycles(tsc) 4.568 ns improved 60.0%
32 - 79 cycles(tsc) 19.917 ns - 65 cycles(tsc) 16.454 ns improved 17.7%
34 - 78 cycles(tsc) 19.655 ns - 63 cycles(tsc) 15.932 ns improved 19.2%
48 - 68 cycles(tsc) 17.049 ns - 50 cycles(tsc) 12.506 ns improved 26.5%
64 - 80 cycles(tsc) 20.009 ns - 63 cycles(tsc) 15.929 ns improved 21.3%
128 - 94 cycles(tsc) 23.749 ns - 86 cycles(tsc) 21.583 ns improved 8.5%
158 - 97 cycles(tsc) 24.299 ns - 90 cycles(tsc) 22.552 ns improved 7.2%
250 - 102 cycles(tsc) 25.681 ns - 98 cycles(tsc) 24.589 ns improved 3.9%
Benchmarking shows impressive improvements in the "fastpath" with a small
number of objects in the working set. Once the working set increases,
resulting in activating the "slowpath" (that contains the heavier locked
cmpxchg_double) the improvement decreases.
I'm currently working on also optimizing the "slowpath" (as network stack
use-case hits this), but this patchset should provide a good foundation
for further improvements. Rest of my patch queue in this area needs some
more work, but preliminary results are good. I'm attending Netfilter
Workshop[2] next week, and I'll hopefully return working on further
improvements in this area.
This patch (of 6):
s/succedd/succeed/
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit c48a11c7ad ("netvm: propagate page->pfmemalloc to skb") added
checks for page->pfmemalloc to __skb_fill_page_desc():
if (page->pfmemalloc && !page->mapping)
skb->pfmemalloc = true;
It assumes page->mapping == NULL implies that page->pfmemalloc can be
trusted. However, __delete_from_page_cache() can set set page->mapping
to NULL and leave page->index value alone. Due to being in union, a
non-zero page->index will be interpreted as true page->pfmemalloc.
So the assumption is invalid if the networking code can see such a page.
And it seems it can. We have encountered this with a NFS over loopback
setup when such a page is attached to a new skbuf. There is no copying
going on in this case so the page confuses __skb_fill_page_desc which
interprets the index as pfmemalloc flag and the network stack drops
packets that have been allocated using the reserves unless they are to
be queued on sockets handling the swapping which is the case here and
that leads to hangs when the nfs client waits for a response from the
server which has been dropped and thus never arrive.
The struct page is already heavily packed so rather than finding another
hole to put it in, let's do a trick instead. We can reuse the index
again but define it to an impossible value (-1UL). This is the page
index so it should never see the value that large. Replace all direct
users of page->pfmemalloc by page_is_pfmemalloc which will hide this
nastiness from unspoiled eyes.
The information will get lost if somebody wants to use page->index
obviously but that was the case before and the original code expected
that the information should be persisted somewhere else if that is
really needed (e.g. what SLAB and SLUB do).
[akpm@linux-foundation.org: fix blooper in slub]
Fixes: c48a11c7ad ("netvm: propagate page->pfmemalloc to skb")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Debugged-by: Vlastimil Babka <vbabka@suse.com>
Debugged-by: Jiri Bohac <jbohac@suse.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org> [3.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch moves the initialization of the size_index table slightly
earlier so that the first few kmem_cache_node's can be safely allocated
when KMALLOC_MIN_SIZE is large.
There are currently two ways to generate indices into kmalloc_caches (via
kmalloc_index() and via the size_index table in slab_common.c) and on some
arches (possibly only MIPS) they potentially disagree with each other
until create_kmalloc_caches() has been called. It seems that the
intention is that the size_index table is a fast equivalent to
kmalloc_index() and that create_kmalloc_caches() patches the table to
return the correct value for the cases where kmalloc_index()'s
if-statements apply.
The failing sequence was:
* kmalloc_caches contains NULL elements
* kmem_cache_init initialises the element that 'struct
kmem_cache_node' will be allocated to. For 32-bit Mips, this is a
56-byte struct and kmalloc_index returns KMALLOC_SHIFT_LOW (7).
* init_list is called which calls kmalloc_node to allocate a 'struct
kmem_cache_node'.
* kmalloc_slab selects the kmem_caches element using
size_index[size_index_elem(size)]. For MIPS, size is 56, and the
expression returns 6.
* This element of kmalloc_caches is NULL and allocation fails.
* If it had not already failed, it would have called
create_kmalloc_caches() at this point which would have changed
size_index[size_index_elem(size)] to 7.
I don't believe the bug to be LLVM specific but GCC doesn't normally
encounter the problem. I haven't been able to identify exactly what GCC
is doing better (probably inlining) but it seems that GCC is managing to
optimize to the point that it eliminates the problematic allocations.
This theory is supported by the fact that GCC can be made to fail in the
same way by changing inline, __inline, __inline__, and __always_inline in
include/linux/compiler-gcc.h such that they don't actually inline things.
Signed-off-by: Daniel Sanders <daniel.sanders@imgtec.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We converted some of the usages of ACCESS_ONCE to READ_ONCE in the mm/
tree since it doesn't work reliably on non-scalar types.
This patch removes the rest of the usages of ACCESS_ONCE, and use the new
READ_ONCE API for the read accesses. This makes things cleaner, instead
of using separate/multiple sets of APIs.
Signed-off-by: Jason Low <jason.low2@hp.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the normal return values for bool functions
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
By moving the O option detection into the switch statement, we allow this
parameter to be combined with other options correctly. Previously options
like slub_debug=OFZ would only detect the 'o' and use DEBUG_DEFAULT_FLAGS
to fill in the rest of the flags.
Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 9aabf810a6 ("mm/slub: optimize alloc/free fastpath by removing
preemption on/off") introduced an occasional hang for kernels built with
CONFIG_PREEMPT && !CONFIG_SMP.
The problem is the following loop the patch introduced to
slab_alloc_node and slab_free:
do {
tid = this_cpu_read(s->cpu_slab->tid);
c = raw_cpu_ptr(s->cpu_slab);
} while (IS_ENABLED(CONFIG_PREEMPT) && unlikely(tid != c->tid));
GCC 4.9 has been observed to hoist the load of c and c->tid above the
loop for !SMP kernels (as in this case raw_cpu_ptr(x) is compile-time
constant and does not force a reload). On arm64 the generated assembly
looks like:
ldr x4, [x0,#8]
loop:
ldr x1, [x0,#8]
cmp x1, x4
b.ne loop
If the thread is preempted between the load of c->tid (into x1) and tid
(into x4), and an allocation or free occurs in another thread (bumping
the cpu_slab's tid), the thread will be stuck in the loop until
s->cpu_slab->tid wraps, which may be forever in the absence of
allocations/frees on the same CPU.
This patch changes the loop condition to access c->tid with READ_ONCE.
This ensures that the value is reloaded even when the compiler would
otherwise assume it could cache the value, and also ensures that the
load will not be torn.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With this patch kasan will be able to catch bugs in memory allocated by
slub. Initially all objects in newly allocated slab page, marked as
redzone. Later, when allocation of slub object happens, requested by
caller number of bytes marked as accessible, and the rest of the object
(including slub's metadata) marked as redzone (inaccessible).
We also mark object as accessible if ksize was called for this object.
There is some places in kernel where ksize function is called to inquire
size of really allocated area. Such callers could validly access whole
allocated memory, so it should be marked as accessible.
Code in slub.c and slab_common.c files could validly access to object's
metadata, so instrumentation for this files are disabled.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Signed-off-by: Dmitry Chernenkov <dmitryc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's ok for slub to access memory that marked by kasan as inaccessible
(object's metadata). Kasan shouldn't print report in that case because
these accesses are valid. Disabling instrumentation of slub.c code is not
enough to achieve this because slub passes pointer to object's metadata
into external functions like memchr_inv().
We don't want to disable instrumentation for memchr_inv() because this is
quite generic function, and we don't want to miss bugs.
metadata_access_enable/metadata_access_disable used to tell KASan where
accesses to metadata starts/end, so we could temporarily disable KASan
reports.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove static and add function declarations to linux/slub_def.h so it
could be used by kernel address sanitizer.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
printk and friends can now format bitmaps using '%*pb[l]'. cpumask
and nodemask also provide cpumask_pr_args() and nodemask_pr_args()
respectively which can be used to generate the two printf arguments
necessary to format the specified cpu/nodemask.
* This is an equivalent conversion but the whole function should be
converted to use scnprinf famiily of functions rather than
performing custom output length predictions in multiple places.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To speed up further allocations SLUB may store empty slabs in per cpu/node
partial lists instead of freeing them immediately. This prevents per
memcg caches destruction, because kmem caches created for a memory cgroup
are only destroyed after the last page charged to the cgroup is freed.
To fix this issue, this patch resurrects approach first proposed in [1].
It forbids SLUB to cache empty slabs after the memory cgroup that the
cache belongs to was destroyed. It is achieved by setting kmem_cache's
cpu_partial and min_partial constants to 0 and tuning put_cpu_partial() so
that it would drop frozen empty slabs immediately if cpu_partial = 0.
The runtime overhead is minimal. From all the hot functions, we only
touch relatively cold put_cpu_partial(): we make it call
unfreeze_partials() after freezing a slab that belongs to an offline
memory cgroup. Since slab freezing exists to avoid moving slabs from/to a
partial list on free/alloc, and there can't be allocations from dead
caches, it shouldn't cause any overhead. We do have to disable preemption
for put_cpu_partial() to achieve that though.
The original patch was accepted well and even merged to the mm tree.
However, I decided to withdraw it due to changes happening to the memcg
core at that time. I had an idea of introducing per-memcg shrinkers for
kmem caches, but now, as memcg has finally settled down, I do not see it
as an option, because SLUB shrinker would be too costly to call since SLUB
does not keep free slabs on a separate list. Besides, we currently do not
even call per-memcg shrinkers for offline memcgs. Overall, it would
introduce much more complexity to both SLUB and memcg than this small
patch.
Regarding to SLAB, there's no problem with it, because it shrinks
per-cpu/node caches periodically. Thanks to list_lru reparenting, we no
longer keep entries for offline cgroups in per-memcg arrays (such as
memcg_cache_params->memcg_caches), so we do not have to bother if a
per-memcg cache will be shrunk a bit later than it could be.
[1] http://thread.gmane.org/gmane.linux.kernel.mm/118649/focus=118650
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>