* refs/heads/tmp-5e24b4e
Linux 4.4.153
ovl: warn instead of error if d_type is not supported
ovl: Do d_type check only if work dir creation was successful
ovl: Ensure upper filesystem supports d_type
x86/mm: Fix use-after-free of ldt_struct
x86/mm/pat: Fix L1TF stable backport for CPA
ANDROID: x86_64_cuttlefish_defconfig: Enable lz4 compression for zram
UPSTREAM: drivers/block/zram/zram_drv.c: fix bug storing backing_dev
BACKPORT: zram: introduce zram memory tracking
BACKPORT: zram: record accessed second
BACKPORT: zram: mark incompressible page as ZRAM_HUGE
UPSTREAM: zram: correct flag name of ZRAM_ACCESS
UPSTREAM: zram: Delete gendisk before cleaning up the request queue
UPSTREAM: drivers/block/zram/zram_drv.c: make zram_page_end_io() static
BACKPORT: zram: set BDI_CAP_STABLE_WRITES once
UPSTREAM: zram: fix null dereference of handle
UPSTREAM: zram: add config and doc file for writeback feature
BACKPORT: zram: read page from backing device
BACKPORT: zram: write incompressible pages to backing device
BACKPORT: zram: identify asynchronous IO's return value
BACKPORT: zram: add free space management in backing device
UPSTREAM: zram: add interface to specif backing device
UPSTREAM: zram: rename zram_decompress_page to __zram_bvec_read
UPSTREAM: zram: inline zram_compress
UPSTREAM: zram: clean up duplicated codes in __zram_bvec_write
Linux 4.4.152
reiserfs: fix broken xattr handling (heap corruption, bad retval)
i2c: imx: Fix race condition in dma read
PCI: pciehp: Fix use-after-free on unplug
PCI: Skip MPS logic for Virtual Functions (VFs)
PCI: hotplug: Don't leak pci_slot on registration failure
parisc: Remove unnecessary barriers from spinlock.h
bridge: Propagate vlan add failure to user
packet: refine ring v3 block size test to hold one frame
netfilter: conntrack: dccp: treat SYNC/SYNCACK as invalid if no prior state
xfrm_user: prevent leaking 2 bytes of kernel memory
parisc: Remove ordered stores from syscall.S
ext4: fix spectre gadget in ext4_mb_regular_allocator()
KVM: irqfd: fix race between EPOLLHUP and irq_bypass_register_consumer
staging: android: ion: check for kref overflow
tcp: identify cryptic messages as TCP seq # bugs
net: qca_spi: Fix log level if probe fails
net: qca_spi: Make sure the QCA7000 reset is triggered
net: qca_spi: Avoid packet drop during initial sync
net: usb: rtl8150: demote allmulti message to dev_dbg()
net/ethernet/freescale/fman: fix cross-build error
drm/nouveau/gem: off by one bugs in nouveau_gem_pushbuf_reloc_apply()
tcp: remove DELAYED ACK events in DCTCP
qlogic: check kstrtoul() for errors
packet: reset network header if packet shorter than ll reserved space
ixgbe: Be more careful when modifying MAC filters
ARM: dts: am3517.dtsi: Disable reference to OMAP3 OTG controller
ARM: 8780/1: ftrace: Only set kernel memory back to read-only after boot
perf llvm-utils: Remove bashism from kernel include fetch script
bnxt_en: Fix for system hang if request_irq fails
drm/armada: fix colorkey mode property
ieee802154: fakelb: switch from BUG_ON() to WARN_ON() on problem
ieee802154: at86rf230: use __func__ macro for debug messages
ieee802154: at86rf230: switch from BUG_ON() to WARN_ON() on problem
ARM: pxa: irq: fix handling of ICMR registers in suspend/resume
netfilter: x_tables: set module owner for icmp(6) matches
smsc75xx: Add workaround for gigabit link up hardware errata.
kasan: fix shadow_size calculation error in kasan_module_alloc
tracing: Use __printf markup to silence compiler
ARM: imx_v4_v5_defconfig: Select ULPI support
ARM: imx_v6_v7_defconfig: Select ULPI support
HID: wacom: Correct touch maximum XY of 2nd-gen Intuos
m68k: fix "bad page state" oops on ColdFire boot
bnx2x: Fix receiving tx-timeout in error or recovery state.
drm/exynos: decon5433: Fix WINCONx reset value
drm/exynos: decon5433: Fix per-plane global alpha for XRGB modes
drm/exynos: gsc: Fix support for NV16/61, YUV420/YVU420 and YUV422 modes
md/raid10: fix that replacement cannot complete recovery after reassemble
dmaengine: k3dma: Off by one in k3_of_dma_simple_xlate()
ARM: dts: da850: Fix interrups property for gpio
selftests/x86/sigreturn/64: Fix spurious failures on AMD CPUs
perf report powerpc: Fix crash if callchain is empty
perf test session topology: Fix test on s390
usb: xhci: increase CRS timeout value
ARM: dts: am437x: make edt-ft5x06 a wakeup source
brcmfmac: stop watchdog before detach and free everything
cxgb4: when disabling dcb set txq dcb priority to 0
Smack: Mark inode instant in smack_task_to_inode
ipv6: mcast: fix unsolicited report interval after receiving querys
locking/lockdep: Do not record IRQ state within lockdep code
net: davinci_emac: match the mdio device against its compatible if possible
ARC: Enable machine_desc->init_per_cpu for !CONFIG_SMP
net: propagate dev_get_valid_name return code
net: hamradio: use eth_broadcast_addr
enic: initialize enic->rfs_h.lock in enic_probe
qed: Add sanity check for SIMD fastpath handler.
arm64: make secondary_start_kernel() notrace
scsi: xen-scsifront: add error handling for xenbus_printf
usb: gadget: dwc2: fix memory leak in gadget_init()
usb: gadget: composite: fix delayed_status race condition when set_interface
usb: dwc2: fix isoc split in transfer with no data
ARM: dts: Cygnus: Fix I2C controller interrupt type
selftests: sync: add config fragment for testing sync framework
selftests: zram: return Kselftest Skip code for skipped tests
selftests: user: return Kselftest Skip code for skipped tests
selftests: static_keys: return Kselftest Skip code for skipped tests
selftests: pstore: return Kselftest Skip code for skipped tests
netfilter: ipv6: nf_defrag: reduce struct net memory waste
ARC: Explicitly add -mmedium-calls to CFLAGS
ANDROID: x86_64_cuttlefish_defconfig: Enable zram and zstd
BACKPORT: crypto: zstd - Add zstd support
UPSTREAM: zram: add zstd to the supported algorithms list
UPSTREAM: lib: Add zstd modules
UPSTREAM: lib: Add xxhash module
UPSTREAM: zram: rework copy of compressor name in comp_algorithm_store()
UPSTREAM: zram: constify attribute_group structures.
UPSTREAM: zram: count same page write as page_stored
UPSTREAM: zram: reduce load operation in page_same_filled
UPSTREAM: zram: use zram_free_page instead of open-coded
UPSTREAM: zram: introduce zram data accessor
UPSTREAM: zram: remove zram_meta structure
UPSTREAM: zram: use zram_slot_lock instead of raw bit_spin_lock op
BACKPORT: zram: partial IO refactoring
BACKPORT: zram: handle multiple pages attached bio's bvec
UPSTREAM: zram: fix operator precedence to get offset
BACKPORT: zram: extend zero pages to same element pages
BACKPORT: zram: remove waitqueue for IO done
UPSTREAM: zram: remove obsolete sysfs attrs
UPSTREAM: zram: support BDI_CAP_STABLE_WRITES
UPSTREAM: zram: revalidate disk under init_lock
BACKPORT: mm: support anonymous stable page
UPSTREAM: zram: use __GFP_MOVABLE for memory allocation
UPSTREAM: zram: drop gfp_t from zcomp_strm_alloc()
UPSTREAM: zram: add more compression algorithms
UPSTREAM: zram: delete custom lzo/lz4
UPSTREAM: zram: cosmetic: cleanup documentation
UPSTREAM: zram: use crypto api to check alg availability
BACKPORT: zram: switch to crypto compress API
UPSTREAM: zram: rename zstrm find-release functions
UPSTREAM: zram: introduce per-device debug_stat sysfs node
UPSTREAM: zram: remove max_comp_streams internals
UPSTREAM: zram: user per-cpu compression streams
BACKPORT: zsmalloc: require GFP in zs_malloc()
UPSTREAM: zram/zcomp: do not zero out zcomp private pages
UPSTREAM: zram: pass gfp from zcomp frontend to backend
UPSTREAM: socket: close race condition between sock_close() and sockfs_setattr()
ANDROID: Refresh x86_64_cuttlefish_defconfig
Linux 4.4.151
isdn: Disable IIOCDBGVAR
Bluetooth: avoid killing an already killed socket
x86/mm: Simplify p[g4um]d_page() macros
serial: 8250_dw: always set baud rate in dw8250_set_termios
ACPI / PM: save NVS memory for ASUS 1025C laptop
ACPI: save NVS memory for Lenovo G50-45
USB: option: add support for DW5821e
USB: serial: sierra: fix potential deadlock at close
ALSA: vxpocket: Fix invalid endian conversions
ALSA: memalloc: Don't exceed over the requested size
ALSA: hda: Correct Asrock B85M-ITX power_save blacklist entry
ALSA: cs5535audio: Fix invalid endian conversion
ALSA: virmidi: Fix too long output trigger loop
ALSA: vx222: Fix invalid endian conversions
ALSA: hda - Turn CX8200 into D3 as well upon reboot
ALSA: hda - Sleep for 10ms after entering D3 on Conexant codecs
net_sched: fix NULL pointer dereference when delete tcindex filter
vsock: split dwork to avoid reinitializations
net_sched: Fix missing res info when create new tc_index filter
llc: use refcount_inc_not_zero() for llc_sap_find()
l2tp: use sk_dst_check() to avoid race on sk->sk_dst_cache
dccp: fix undefined behavior with 'cwnd' shift in ccid2_cwnd_restart()
Conflicts:
drivers/block/zram/zram_drv.c
drivers/staging/android/ion/ion.c
include/linux/swap.h
mm/zsmalloc.c
Change-Id: I1c437ac5133503a939d06d51ec778b65371df6d1
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
The call to strlcpy in backing_dev_store is incorrect. It should take
the size of the destination buffer instead of the size of the source
buffer. Additionally, ignore the newline character (\n) when reading
the new file_name buffer. This makes it possible to set the backing_dev
as follows:
echo /dev/sdX > /sys/block/zram0/backing_dev
The reason it worked before was the fact that strlcpy() copies 'len - 1'
bytes, which is strlen(buf) - 1 in our case, so it accidentally didn't
copy the trailing new line symbol. Which also means that "echo -n
/dev/sdX" most likely was broken.
Signed-off-by: Peter Kalauskas <peskal@google.com>
Link: http://lkml.kernel.org/r/20180813061623.GC64836@rodete-desktop-imager.corp.google.com
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit c8bd134a4bddafe5917d163eea73873932c15e83)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I0a0d602b61169ae9adc8f89914ce4e30cc10e191
zRam as swap is useful for small memory device. However, swap means
those pages on zram are mostly cold pages due to VM's LRU algorithm.
Especially, once init data for application are touched for launching,
they tend to be not accessed any more and finally swapped out. zRAM can
store such cold pages as compressed form but it's pointless to keep in
memory. Better idea is app developers free them directly rather than
remaining them on heap.
This patch tell us last access time of each block of zram via "cat
/sys/kernel/debug/zram/zram0/block_state".
The output is as follows,
300 75.033841 .wh
301 63.806904 s..
302 63.806919 ..h
First column is zram's block index and 3rh one represents symbol (s:
same page w: written page to backing store h: huge page) of the block
state. Second column represents usec time unit of the block was last
accessed. So above example means the 300th block is accessed at
75.033851 second and it was huge so it was written to the backing store.
Admin can leverage this information to catch cold|incompressible pages
of process with *pagemap* once part of heaps are swapped out.
I used the feature a few years ago to find memory hoggers in userspace
to notify them what memory they have wasted without touch for a long
time. With it, they could reduce unnecessary memory space. However, at
that time, I hacked up zram for the feature but now I need the feature
again so I decided it would be better to upstream rather than keeping it
alone. I hope I submit the userspace tool to use the feature soon.
[akpm@linux-foundation.org: fix i386 printk warning]
[minchan@kernel.org: use ktime_get_boottime() instead of sched_clock()]
Link: http://lkml.kernel.org/r/20180420063525.GA253739@rodete-desktop-imager.corp.google.com
[akpm@linux-foundation.org: documentation tweak]
[akpm@linux-foundation.org: fix i386 printk warning]
[minchan@kernel.org: fix compile warning]
Link: http://lkml.kernel.org/r/20180508104849.GA8209@rodete-desktop-imager.corp.google.com
[rdunlap@infradead.org: fix printk formats]
Link: http://lkml.kernel.org/r/3652ccb1-96ef-0b0b-05d1-f661d7733dcc@infradead.org
Link: http://lkml.kernel.org/r/20180416090946.63057-5-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit c0265342bff4fcaa2cdf13f4596244c18d4a7ae5)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I932447d33d1b6af78ae6463b494006c725e5e38c
zRam as swap is useful for small memory device. However, swap means
those pages on zram are mostly cold pages due to VM's LRU algorithm.
Especially, once init data for application are touched for launching,
they tend to be not accessed any more and finally swapped out. zRAM can
store such cold pages as compressed form but it's pointless to keep in
memory. Better idea is app developers free them directly rather than
remaining them on heap.
This patch records last access time of each block of zram so that With
upcoming zram memory tracking, it could help userspace developers to
reduce memory footprint.
Link: http://lkml.kernel.org/r/20180416090946.63057-4-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit d7eac6b6e1838ef1a1400df4ec55daa34bbc855e)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I5b217d3cd4da57e548196658e0824d65a0cad631
Mark incompressible pages so that we could investigate who is the owner
of the incompressible pages once the page is swapped out via using
upcoming zram memory tracker feature.
With it, we could prevent such pages to be swapped out by using mlock.
Otherwise we might remove them.
This patch exposes new stat for huge pages via mm_stat.
Link: http://lkml.kernel.org/r/20180416090946.63057-3-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 89e85bce4b02edb7408aebf69d5d1a6692a05f4f)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: If1b7b2d6ea6672a575ffc3d70c2c8b58ecafd0d7
Patch series "zram memory tracking", v5.
zRam as swap is useful for small memory device. However, swap means
those pages on zram are mostly cold pages due to VM's LRU algorithm.
Especially, once init data for application are touched for launching,
they tend to be not accessed any more and finally swapped out. zRAM can
store such cold pages as compressed form but it's pointless to keep in
memory. As well, it's pointless to store incompressible pages to zram
so better idea is app developers manages them directly like free or
mlock rather than remaining them on heap.
This patch provides a debugfs /sys/kernel/debug/zram/zram0/block_state
to represent each block's state so admin can investigate what memory is
cold|incompressible|same page with using pagemap once the pages are
swapped out.
The output is as follows:
300 75.033841 .wh
301 63.806904 s..
302 63.806919 ..h
First column is zram's block index and 3rh one represents symbol (s:
same page w: written page to backing store h: huge page) of the block
state. Second column represents usec time unit of the block was last
accessed. So above example means the 300th block is accessed at
75.033851 second and it was huge so it was written to the backing store.
This patch (of 4):
ZRAM_ACCESS is used for locking a slot of zram so correct the name. It
is also not a common flag to indicate status of the block so move the
declare position on top of the flag. Lastly, let's move the function to
the top of source code to be able to use it easily without forward
declaration.
Link: http://lkml.kernel.org/r/20180416090946.63057-2-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit c4d6c4cc7bfd5ecc18548420b7fb9440cf8416ae)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I037a22a739fb4005918eb668d10e8be354a1524f
Remove the disk, partition and bdi sysfs attributes before cleaning up
the request queue associated with the disk.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 392db38058eb47250a9d0cc737af37e78a7e443d)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: Ifbcb6e03fee764054dc9a371c00b95547e4de745
zram_page_end_io() is local to the source and does not need to be in
global scope, so make it static.
Cleans up sparse warning:
symbol 'zram_page_end_io' was not declared. Should it be static?
Link: http://lkml.kernel.org/r/20171016173336.20320-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 384bc41fc064bd8b12b7081aa3e81d26f3407045)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: Ie0f250e580bc1dd16e963b5dbe5bdc429fb4cd65
With fast swap storage, the platform wants to use swap more aggressively
and swap-in is crucial to application latency.
The rw_page() based synchronous devices like zram, pmem and btt are such
fast storage. When I profile swapin performance with zram lz4
decompress test, S/W overhead is more than 70%. Maybe, it would be
bigger in nvdimm.
This patchset reduces swap-in latency by skipping swapcache if the swap
device is a synchronous device like a rw_page() based device.
It enhances by 45% my swapin test (5G sequential swapin, no readahead)
from 2.41sec to 1.64sec.
This patch (of 4):
Commit 19b7ccf8651d ("block: get rid of blk_integrity_revalidate()")
fixed a weird thing (i.e., reset BDI_CAP_STABLE_WRITES flag
unconditionally whenever revalidat_disk is called) so zram doesn't need
to reset the flag any more when revalidating the bdev. Instead, set the
flag just once when the zram device is created.
It shouldn't change any behavior.
Link: http://lkml.kernel.org/r/1505886205-9671-2-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Hugh Dickins <hughd@google.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit e447a0151f7ce8dd884fea48279274bd64434c29)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: If41edc4871ed470f050bbf4d51a24fe5c0e18738
In testing I found handle passed to zs_map_object in __zram_bvec_read is
NULL so eh kernel goes oops in pin_object().
The reason is there is no routine to check the slot's freeing after
getting the slot's lock. This patch fixes it.
[minchan@kernel.org: v2]
Link: http://lkml.kernel.org/r/1505887347-10881-1-git-send-email-minchan@kernel.org
Link: http://lkml.kernel.org/r/1505788488-26723-1-git-send-email-minchan@kernel.org
Fixes: 1f7319c74275 ("zram: partial IO refactoring")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit ae94264ed4b0cf7cd887947650db4c69acb62072)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I0ff4a8c2f1fcd0ee39511985809b58bf94b2d44c
This patch enables read IO from backing device. For the feature, it
implements two IO read functions to transfer data from backing storage.
One is asynchronous IO function and other is synchronous one.
A reason I need synchrnous IO is due to partial write which need to
complete read IO before the overwriting partial data.
We can make the partial IO's case asynchronous, too but at the moment, I
don't feel adding more complexity to support such rare use cases so want
to go with simple.
[xieyisheng1@huawei.com: read_from_bdev_async(): return 1 to avoid call page_endio() in zram_rw_page()]
Link: http://lkml.kernel.org/r/1502707447-6944-1-git-send-email-xieyisheng1@huawei.com
Link: http://lkml.kernel.org/r/1498459987-24562-9-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 8e654f8fbff52ac483fb69957222853d7e2fc588)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: Ia82f5fc4697aacc723a336e4dad4e7bc56a1bdb9
This patch enables write IO to transfer data to backing device. For
that, it implements write_to_bdev function which creates new bio and
chaining with parent bio to make the parent bio asynchrnous.
For rw_page which don't have parent bio, it submit owned bio and handle
IO completion by zram_page_end_io.
Also, this patch defines new flag ZRAM_WB to mark written page for later
read IO.
[xieyisheng1@huawei.com: fix typo in comment]
Link: http://lkml.kernel.org/r/1502707447-6944-2-git-send-email-xieyisheng1@huawei.com
Link: http://lkml.kernel.org/r/1498459987-24562-8-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit db8ffbd4e7634cc537c8d32e73e7ce0f06248645)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: Ie675efd6c3ec04a151443f1cd0bf798d4847710f
For upcoming asynchronous IO like writeback, zram_rw_page should be
aware of that whether requested IO was completed or submitted
successfully, otherwise error.
For the goal, zram_bvec_rw has three return values.
-errno: returns error number
0: IO request is done synchronously
1: IO request is issued successfully.
Link: http://lkml.kernel.org/r/1498459987-24562-7-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit ae85a8075c5b025b9d503554ddc480a346a24536)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: Id6e764b3eacfebdca2f46050648a49fc5f276b2c
With backing device, zram needs management of free space of backing
device.
This patch adds bitmap logic to manage free space which is very naive.
However, it would be simple enough as considering uncompressible pages's
frequenty in zram.
Link: http://lkml.kernel.org/r/1498459987-24562-6-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 1363d4662a0d28dfdb81ef426c88c9a8dbf7c338)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I37dc98b40bfddceb9eb6d989ca30683dbf89210c
For writeback feature, user should set up backing device before the zram
working.
This patch enables the interface via /sys/block/zramX/backing_dev.
Currently, it supports block device only but it could be enhanced for
file as well.
Link: http://lkml.kernel.org/r/1498459987-24562-5-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 013bf95a83ec760a2afc37fabd6bf13a9cdae205)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I4bbf12ed7496d476bddd574e756bac5c8a838089
zram_decompress_page naming is not proper because it doesn't decompress
if page was dedup hit or stored with compression.
Use more abstract term and consistent with write path function
__zram_bvec_write.
Link: http://lkml.kernel.org/r/1498459987-24562-4-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 693dc1ce25b8c8fa33f930d47cd8f926eeb90812)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: Ia7c948f4b78601458b7ebc23ab345d4bc0a8d4a8
zram_compress does several things, compress, entry alloc and check
limitation. I did for just readbility but it hurts modulization.:(
So this patch removes zram_compress functions and inline it in
__zram_bvec_write for upcoming patches.
Link: http://lkml.kernel.org/r/1498459987-24562-3-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 97ec7c8bd5d029b2c3e40355c1204197094e9ba1)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: Ibb37d77168edd0b01d0b9820e431c73cc3c2ff20
Patch series "writeback incompressible pages to storage", v1.
zRam is useful for memory saving with compressible pages but sometime,
workload can be changed and system has lots of incompressible pages
which is very harmful for zram.
This patch supports writeback feature of zram so admin can set up a
block device and with it, zram can save the memory via writing out the
incompressile pages once it found it's incompressible pages (1/4 comp
ratio) instead of keeping the page in memory.
[1-3] is just clean up and [4-8] is step by step feature enablement.
[4-8] is logically not bisectable(ie, logical unit separation)
although I tried to compiled out without breaking but I think it would
be better to review.
This patch (of 9):
__zram_bvec_write has some of duplicated logic for zram meta data
handling of same_page|compressed_page. This patch aims to clean it up
without behavior change.
[xieyisheng1@huawei.com: fix compr_data_size stat]
Link: http://lkml.kernel.org/r/1502707447-6944-1-git-send-email-xieyisheng1@huawei.com
Link: http://lkml.kernel.org/r/1496019048-27016-1-git-send-email-minchan@kernel.org
Link: http://lkml.kernel.org/r/1498459987-24562-2-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Juneho Choi <juno.choi@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 4ebbe7f7fc99260afd51759e35dbfdd6010dc697)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I3fa150c869a66ff289712b956924ecb361864a2e
comp_algorithm_store() passes the size of the source buffer to strlcpy()
instead of the destination buffer size. Make it explicit that the two
buffers have the same size and use strcpy() instead of strlcpy(). The
latter can be done safely since the function ensures that the string in
the source buffer is terminated.
Link: http://lkml.kernel.org/r/20170803163350.45245-1-mka@chromium.org
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit f357e345eef7863da037e0243f2d3df4ba6df986)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: Ic9667b215ce5e0717bc6829d65e43e9b79602362
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with
const attribute_group. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
8293 841 4 9138 23b2 drivers/block/zram/zram_drv.o
File size After adding 'const':
text data bss dec hex filename
8357 777 4 9138 23b2 drivers/block/zram/zram_drv.o
Link: http://lkml.kernel.org/r/65680c1c4d85818f7094cbfa31c91bf28185ba1b.1499061182.git.arvind.yadav.cs@gmail.com
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit bc1bb362334ebc4c65dd4301f10fb70902b3db7d)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: Ic0765dea8c2fadb18623605ba48748a9b33df3fa
Regardless of whether it is same page or not, it's surely write and
stored to zram so we should increase pages_stored stat. Otherwise, user
can see zero value via mm_stats although he writes a lot of pages to
zram.
Link: http://lkml.kernel.org/r/1494834068-27004-1-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 51f9f82c855d65ef14c2af10e0d2c86ec332a182)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I006d80df413a0fe0fd7dd58e535c6a2c03ab2c9d
In page_same_filled function, all elements in the page is compared with
next index value. The current comparison routine compares the (i)th and
(i+1)th values of the page.
In this case, two load operaions occur for each comparison. But if we
store first value of the page stores at 'val' variable and using it to
compare with others, the load opearation is reduced. It reduce load
operation per page by up to 64times.
Link: http://lkml.kernel.org/r/1488428104-7257-1-git-send-email-sangwoo2.park@lge.com
Signed-off-by: Sangwoo Park <sangwoo2.park@lge.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit f0fe9984656604ea8effd5ff82709ff8ce1f954b)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I6b58b583e83139eee9f0540da12850c43510cb8e
The zram_free_page already handles NULL handle case and same page so use
it to reduce error probability. (Acutaully, I made a mistake when I
handled same page feature)
Link: http://lkml.kernel.org/r/1492052365-16169-7-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 302128dce142d780417aa548bfd7ef4dfb89fa80)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: Ie38c52dfb1959377936b7cd9158ad1b5a02219bd
With element, sometime I got confused handle and element access. It
might be my bad but I think it's time to introduce accessor to prevent
future idiot like me. This patch is just clean-up patch so it shouldn't
change any behavior.
Link: http://lkml.kernel.org/r/1492052365-16169-6-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 643ae61d0f41c48aa7179921fe15ba4b4d8ddfec)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I3916d5561ab9fb2917455cac74bee431fbe84b5d
With this clean-up phase, I want to use zram's wrapper function to lock
table access which is more consistent with other zram's functions.
Link: http://lkml.kernel.org/r/1492052365-16169-4-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 86c49814d449ebc51c7d455ac8e3d17b9fa702eb)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I6afee89dce63dff6d759c78e25926814fc016107
For architecture(PAGE_SIZE > 4K), zram have supported partial IO.
However, the mixed code for handling normal/partial IO is too mess,
error-prone to modify IO handler functions with upcoming feature so this
patch aims for cleaning up zram's IO handling functions.
Link: http://lkml.kernel.org/r/1492052365-16169-3-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 1f7319c7427503abe2d365683588827b80f5714e)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I0f023d646d17f8156130cd0507b65f2223768adf
Patch series "zram clean up", v2.
This patchset aims to clean up zram .
[1] clean up multiple pages's bvec handling.
[2] clean up partial IO handling
[3-6] clean up zram via using accessor and removing pointless structure.
With [2-6] applied, we can get a few hundred bytes as well as huge
readibility enhance.
x86: 708 byte save
add/remove: 1/1 grow/shrink: 0/11 up/down: 478/-1186 (-708)
function old new delta
zram_special_page_read - 478 +478
zram_reset_device 317 314 -3
mem_used_max_store 131 128 -3
compact_store 96 93 -3
mm_stat_show 203 197 -6
zram_add 719 712 -7
zram_slot_free_notify 229 214 -15
zram_make_request 819 803 -16
zram_meta_free 128 111 -17
zram_free_page 180 151 -29
disksize_store 432 361 -71
zram_decompress_page.isra 504 - -504
zram_bvec_rw 2592 2080 -512
Total: Before=25350773, After=25350065, chg -0.00%
ppc64: 231 byte save
add/remove: 2/0 grow/shrink: 1/9 up/down: 681/-912 (-231)
function old new delta
zram_special_page_read - 480 +480
zram_slot_lock - 200 +200
vermagic 39 40 +1
mm_stat_show 256 248 -8
zram_meta_free 200 184 -16
zram_add 944 912 -32
zram_free_page 348 308 -40
disksize_store 572 492 -80
zram_decompress_page 664 564 -100
zram_slot_free_notify 292 160 -132
zram_make_request 1132 1000 -132
zram_bvec_rw 2768 2396 -372
Total: Before=17565825, After=17565594, chg -0.00%
This patch (of 6):
Johannes Thumshirn reported system goes the panic when using NVMe over
Fabrics loopback target with zram.
The reason is zram expects each bvec in bio contains a single page
but nvme can attach a huge bulk of pages attached to the bio's bvec
so that zram's index arithmetic could be wrong so that out-of-bound
access makes system panic.
[1] in mainline solved solved the problem by limiting max_sectors with
SECTORS_PER_PAGE but it makes zram slow because bio should split with
each pages so this patch makes zram aware of multiple pages in a bvec
so it could solve without any regression(ie, bio split).
[1] 0bc315381fe9, zram: set physical queue limits to avoid array out of
bounds accesses
Link: http://lkml.kernel.org/r/20170413134057.GA27499@bbox
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit e86942c7b6c1e1dd5e539f3bf3cfb63799163048)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: Ibedc9e8163fc16a0c2569e8c3e33dd81bb325ee5
In zram_rw_page, the logic to get offset is wrong by operator precedence
(i.e., "<<" is higher than "&"). With wrong offset, zram can corrupt
the user's data. This patch fixes it.
Fixes: 8c7f01025 ("zram: implement rw_page operation of zram")
Link: http://lkml.kernel.org/r/1492042622-12074-1-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 4ca82dabc9fbf7bc5322aa54d802cb3cb7b125c5)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I6abb2aef381463976aea1fa8e7f5ca07367190e9
The idea is that without doing more calculations we extend zero pages to
same element pages for zram. zero page is special case of same element
page with zero element.
1. the test is done under android 7.0
2. startup too many applications circularly
3. sample the zero pages, same pages (none-zero element)
and total pages in function page_zero_filled
the result is listed as below:
ZERO SAME TOTAL
36214 17842 598196
ZERO/TOTAL SAME/TOTAL (ZERO+SAME)/TOTAL ZERO/SAME
AVERAGE 0.060631909 0.024990816 0.085622726 2.663825038
STDEV 0.00674612 0.005887625 0.009707034 2.115881328
MAX 0.069698422 0.030046087 0.094975336 7.56043956
MIN 0.03959586 0.007332205 0.056055193 1.928985507
from the above data, the benefit is about 2.5% and up to 3% of total
swapout pages.
The defect of the patch is that when we recovery a page from non-zero
element the operations are low efficient for partial read.
This patch extends zero_page to same_page so if there is any user to
have monitored zero_pages, he will be surprised if the number is
increased but it's not harmful, I believe.
[minchan@kernel.org: do not free same element pages in zram_meta_free]
Link: http://lkml.kernel.org/r/20170207065741.GA2567@bbox
Link: http://lkml.kernel.org/r/1483692145-75357-1-git-send-email-zhouxianrong@huawei.com
Link: http://lkml.kernel.org/r/1486307804-27903-1-git-send-email-minchan@kernel.org
Signed-off-by: zhouxianrong <zhouxianrong@huawei.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 8e19d540d107ee897eb9a874844060c94e2376c0)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I92ebb07a6ad96be82443d6e0c0d4f25cbe936915
zram_reset_device() waits for ongoing writepage pages to be completed by
zram->refcount logic. However, it's pointless because before the reset,
we prevent further opening of zram by zram->claim and flush all of
pending IO by fsync_bdev so there should be no pending IO at the
zram_reset_device().
So let's remove that code which is even broken due to the lack of
wake_up elsewhere.
Link: http://lkml.kernel.org/r/1485145031-11661-1-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit a09759acaacf6cf738e1bc6c66d41485c87fd371)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I97170fb576be7baae63f82334af0dd5e91b16763
We had a deprecated_attr_warn() warning for 2 years and now the time has
come and we finally can do the cleanup.
The plan was as follows:
: per-stat sysfs attributes are considered to be deprecated.
: The basic strategy is:
: -- the existing RW nodes will be downgraded to WO nodes (in linux 4.11)
: -- deprecated RO sysfs nodes will eventually be removed (in linux 4.11)
:
: The list of deprecated attributes can be found here:
: Documentation/ABI/obsolete/sysfs-block-zram
:
: Basically, every attribute that has its own read accessible sysfs
: node (e.g. num_reads) *AND* is accessible via one of the stat files
: (zram<id>/stat or zram<id>/io_stat or zram<id>/mm_stat) is considered
: to be deprecated.
The patch also removes `obsolete/sysfs-block-zram', clean ups
`testing/sysfs-block-zram' and tweaks zram.txt files.
Link: http://lkml.kernel.org/r/20170118035838.11090-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit c87d1655c29500b459fb135258a93f8309ada9c7)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: Idd86259230d6a4bf0feeee53b5b69f3f3df774d4
zram has used per-cpu stream feature from v4.7. It aims for increasing
cache hit ratio of scratch buffer for compressing. Downside of that
approach is that zram should ask memory space for compressed page in
per-cpu context which requires stricted gfp flag which could be failed.
If so, it retries to allocate memory space out of per-cpu context so it
could get memory this time and compress the data again, copies it to the
memory space.
In this scenario, zram assumes the data should never be changed but it is
not true without stable page support. So, If the data is changed under
us, zram can make buffer overrun so that zsmalloc free object chain is
broken so system goes crash like below
https://bugzilla.suse.com/show_bug.cgi?id=997574
This patch adds BDI_CAP_STABLE_WRITES to zram for declaring "I am block
device needing *stable write*".
Fixes: da9556a2367c ("zram: user per-cpu compression streams")
Link: http://lkml.kernel.org/r/1482366980-3782-4-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Hyeoncheol Lee <cheol.lee@lge.com>
Cc: <yjay.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: <stable@vger.kernel.org> [4.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit b09ab054b69b07077bd3292f67e777861ac796e5)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I3134a5882c1939792ffa71b8f31f7ab642a0e9a3
Commit b4c5c60920 ("zram: avoid lockdep splat by revalidate_disk")
moved revalidate_disk call out of init_lock to avoid lockdep
false-positive splat. However, commit 08eee69fcf ("zram: remove
init_lock in zram_make_request") removed init_lock in IO path so there
is no worry about lockdep splat. So, let's restore it.
This patch is needed to set BDI_CAP_STABLE_WRITES atomically in next
patch.
Fixes: da9556a2367c ("zram: user per-cpu compression streams")
Link: http://lkml.kernel.org/r/1482366980-3782-3-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Hyeoncheol Lee <cheol.lee@lge.com>
Cc: <yjay.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: <stable@vger.kernel.org> [4.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit e7ccfc4ccb703e0f033bd4617580039898e912dd)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: Iebb6f694e46a797f8ce34029857c01c0c71086c7
We now allocate streams from CPU_UP hot-plug path, there are no
context-dependent stream allocations anymore and we can schedule from
zcomp_strm_alloc(). Use GFP_KERNEL directly and drop a gfp_t parameter.
Link: http://lkml.kernel.org/r/20160531122017.2878-9-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 16d37725a042cc66f9ee95889dd40e734264508e)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: If09c4a97f3d3e45ad578d2b1d64b26f65617774d
Add "deflate", "lz4hc", "842" algorithms to the list of known
compression backends. The real availability of those algorithms,
however, depends on the corresponding CONFIG_CRYPTO_FOO config options.
[sergey.senozhatsky@gmail.com: zram-add-more-compression-algorithms-v3]
Link: http://lkml.kernel.org/r/20160604024902.11778-7-sergey.senozhatsky@gmail.com
Link: http://lkml.kernel.org/r/20160531122017.2878-8-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit eb9f56d82547db407779967a2251ea28969245b0)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: Ie46c7676363ef13c559b45dab4968e2cc48a6cbe
There is no way to get a string with all the crypto comp algorithms
supported by the crypto comp engine, so we need to maintain our own
backends list. At the same time we additionally need to use
crypto_has_comp() to make sure that the user has requested a compression
algorithm that is recognized by the crypto comp engine. Relying on
/proc/crypto is not an options here, because it does not show
not-yet-inserted compression modules.
Example:
modprobe zram
cat /proc/crypto | grep -i lz4
modprobe lz4
cat /proc/crypto | grep -i lz4
name : lz4
driver : lz4-generic
module : lz4
So the user can't tell exactly if the lz4 is really supported from
/proc/crypto output, unless someone or something has loaded it.
This patch also adds crypto_has_comp() to zcomp_available_show(). We
store all the compression algorithms names in zcomp's `backends' array,
regardless the CONFIG_CRYPTO_FOO configuration, but show only those that
are also supported by crypto engine. This helps user to know the exact
list of compression algorithms that can be used.
Example:
module lz4 is not loaded yet, but is supported by the crypto
engine. /proc/crypto has no information on this module, while
zram's `comp_algorithm' lists it:
cat /proc/crypto | grep -i lz4
cat /sys/block/zram0/comp_algorithm
[lzo] lz4 deflate lz4hc 842
We still use the `backends' array to determine if the requested
compression backend is known to crypto api. This array, however, may not
contain some entries, therefore as the last step we call crypto_has_comp()
function which attempts to insmod the requested compression algorithm to
determine if crypto api supports it. The advantage of this method is that
now we permit the usage of out-of-tree crypto compression modules
(implementing S/W or H/W compression).
[sergey.senozhatsky@gmail.com: zram-use-crypto-api-to-check-alg-availability-v3]
Link: http://lkml.kernel.org/r/20160604024902.11778-4-sergey.senozhatsky@gmail.com
Link: http://lkml.kernel.org/r/20160531122017.2878-5-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 415403be37e204632b17bdb6857890fe5a220cea)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I7c823238329bd6e5180386507d16123228804cc5
This has started as a 'add zlib support' work, but after some thinking I
saw no blockers for a bigger change -- a switch to crypto API.
We don't have an idle zstreams list anymore and our write path now works
absolutely differently, preventing preemption during compression. This
removes possibilities of read paths preempting writes at wrong places
and opens the door for a move from custom LZO/LZ4 compression backends
implementation to a more generic one, using crypto compress API.
This patch set also eliminates the need of a new context-less crypto API
interface, which was quite hard to sell, so we can move along faster.
benchmarks:
(x86_64, 4GB, zram-perf script)
perf reported run-time fio (max jobs=3). I performed fio test with the
increasing number of parallel jobs (max to 3) on a 3G zram device, using
`static' data and the following crypto comp algorithms:
842, deflate, lz4, lz4hc, lzo
the output was:
- test running time (which can tell us what algorithms performs faster)
and
- zram mm_stat (which tells the compressed memory size, max used memory, etc).
It's just for information. for example, LZ4HC has twice the running
time of LZO, but the compressed memory size is: 23592960 vs 34603008
bytes.
test-fio-zram-842
197.907655282 seconds time elapsed
201.623142884 seconds time elapsed
226.854291345 seconds time elapsed
test-fio-zram-DEFLATE
253.259516155 seconds time elapsed
258.148563401 seconds time elapsed
290.251909365 seconds time elapsed
test-fio-zram-LZ4
27.022598717 seconds time elapsed
29.580522717 seconds time elapsed
33.293463430 seconds time elapsed
test-fio-zram-LZ4HC
56.393954615 seconds time elapsed
74.904659747 seconds time elapsed
101.940998564 seconds time elapsed
test-fio-zram-LZO
28.155948075 seconds time elapsed
30.390036330 seconds time elapsed
34.455773159 seconds time elapsed
zram mm_stat-s (max fio jobs=3)
test-fio-zram-842
mm_stat (jobs1): 3221225472 673185792 690266112 0 690266112 0 0
mm_stat (jobs2): 3221225472 673185792 690266112 0 690266112 0 0
mm_stat (jobs3): 3221225472 673185792 690266112 0 690266112 0 0
test-fio-zram-DEFLATE
mm_stat (jobs1): 3221225472 24379392 37761024 0 37761024 0 0
mm_stat (jobs2): 3221225472 24379392 37761024 0 37761024 0 0
mm_stat (jobs3): 3221225472 24379392 37761024 0 37761024 0 0
test-fio-zram-LZ4
mm_stat (jobs1): 3221225472 23592960 37761024 0 37761024 0 0
mm_stat (jobs2): 3221225472 23592960 37761024 0 37761024 0 0
mm_stat (jobs3): 3221225472 23592960 37761024 0 37761024 0 0
test-fio-zram-LZ4HC
mm_stat (jobs1): 3221225472 23592960 37761024 0 37761024 0 0
mm_stat (jobs2): 3221225472 23592960 37761024 0 37761024 0 0
mm_stat (jobs3): 3221225472 23592960 37761024 0 37761024 0 0
test-fio-zram-LZO
mm_stat (jobs1): 3221225472 34603008 50335744 0 50335744 0 0
mm_stat (jobs2): 3221225472 34603008 50335744 0 50335744 0 0
mm_stat (jobs3): 3221225472 34603008 50335744 0 50339840 0 0
This patch (of 8):
We don't perform any zstream idle list lookup anymore, so
zcomp_strm_find()/zcomp_strm_release() names are not representative.
Rename to zcomp_stream_get()/zcomp_stream_put().
Link: http://lkml.kernel.org/r/20160531122017.2878-2-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 2aea8493d326bdf15446768333e1d2c91b040b5c)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I2f4c9e215bca73ba5adb1354aaec6e32e420920d
debug_stat sysfs is read-only and represents various debugging data that
zram developers may need. This file is not meant to be used by anyone
else: its content is not documented and will change any time w/o any
notice. Therefore, the output of debug_stat file contains a version
string. To avoid any confusion, we will increase the version number
every time we modify the output.
At the moment this file exports only one value -- the number of
re-compressions, IOW, the number of times compression fast path has
failed. This stat is temporary any will be useful in case if any
per-cpu compression streams regressions will be reported.
Link: http://lkml.kernel.org/r/20160513230834.GB26763@bbox
Link: http://lkml.kernel.org/r/20160511134553.12655-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 623e47fc64f8de480b322b7ed68855f97137e2a5)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: Ie0ef61db7aa0b2c713de1d8bf48e8a545b4276e9
Remove the internal part of max_comp_streams interface, since we
switched to per-cpu streams. We will keep RW max_comp_streams attr
around, because:
a) we may (silently) switch back to idle compression streams list and
don't want to disturb user space
b) max_comp_streams attr must wait for the next 'lay off cycle'; we
give user space 2 years to adjust before we remove/downgrade the attr,
and there are already several attrs scheduled for removal in 4.11, so
it's too late for max_comp_streams.
This slightly change a user visible behaviour:
- First, reading from max_comp_stream file now will always return the
number of online CPUs.
- Second, writing to max_comp_stream will not take any effect.
Link: http://lkml.kernel.org/r/20160503165546.25201-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 43209ea2d17aae1540d4e28274e36404f72702f2)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I1902e741b4d3b83c5bd0d66bf1bae021dbfe2056
Pass GFP flags to zs_malloc() instead of using a fixed mask supplied to
zs_create_pool(), so we can be more flexible, but, more importantly, we
need this to switch zram to per-cpu compression streams -- zram will try
to allocate handle with preemption disabled in a fast path and switch to
a slow path (using different gfp mask) if the fast one has failed.
Apart from that, this also align zs_malloc() interface with zspool/zbud.
[sergey.senozhatsky@gmail.com: pass GFP flags to zs_malloc() instead of using a fixed mask]
Link: http://lkml.kernel.org/r/20160429150942.GA637@swordfish
Link: http://lkml.kernel.org/r/20160429150942.GA637@swordfish
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit d0d8da2dc49dfdfe1d788eaf4d55eb5d4964d926)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I31276c9351be21a4ed588681b332e98142b76526
Do not __GFP_ZERO allocated zcomp ->private pages. We keep allocated
streams around and use them for read/write requests, so we supply a
zeroed out ->private to compression algorithm as a scratch buffer only
once -- the first time we use that stream. For the rest of IO requests
served by this stream ->private usually contains some temporarily data
from the previous requests.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit e02d238c9852a91b30da9ea32ce36d1416cdc683)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I911832da703f596998a4139d6033ef1564848c9e
Each zcomp backend uses own gfp flag but it's pointless because the
context they could be called is driven by upper layer(ie, zcomp
frontend). As well, zcomp frondend could call them in different
context. One context(ie, zram init part) is it should be better to make
sure successful allocation other context(ie, further stream allocation
part for accelarating I/O speed) is just optional so let's pass gfp down
from driver (ie, zcomp frontend) like normal MM convention.
[sergey.senozhatsky@gmail.com: add missing __vmalloc zero and highmem gfps]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 75d8947a36d0c9aedd69118d1f14bf424005c7c2)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I572d0565de5aff94ebe0782eba9d34f9c9862060
* refs/heads/tmp-b1bad9e
Linux 4.4.141
loop: remember whether sysfs_create_group() was done
RDMA/ucm: Mark UCM interface as BROKEN
PM / hibernate: Fix oops at snapshot_write()
loop: add recursion validation to LOOP_CHANGE_FD
netfilter: x_tables: initialise match/target check parameter struct
netfilter: nf_queue: augment nfqa_cfg_policy
uprobes/x86: Remove incorrect WARN_ON() in uprobe_init_insn()
x86/cpufeature: Add helper macro for mask check macros
x86/cpufeature: Make sure DISABLED/REQUIRED macros are updated
x86/cpufeature: Update cpufeaure macros
x86/cpufeature, x86/mm/pkeys: Fix broken compile-time disabling of pkeys
x86/cpu: Add detection of AMD RAS Capabilities
x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
x86/cpufeature, x86/mm/pkeys: Add protection keys related CPUID definitions
x86/cpufeature: Speed up cpu_feature_enabled()
x86/boot: Simplify kernel load address alignment check
x86/vdso: Use static_cpu_has()
x86/alternatives: Discard dynamic check after init
x86/alternatives: Add an auxilary section
x86/cpufeature: Get rid of the non-asm goto variant
x86/cpufeature: Replace the old static_cpu_has() with safe variant
x86/cpufeature: Carve out X86_FEATURE_*
x86/headers: Don't include asm/processor.h in asm/atomic.h
x86/fpu: Get rid of xstate_fault()
x86/fpu: Add an XSTATE_OP() macro
x86/cpu: Provide a config option to disable static_cpu_has
x86/cpufeature: Cleanup get_cpu_cap()
x86/cpufeature: Move some of the scattered feature bits to x86_capability
iw_cxgb4: correctly enforce the max reg_mr depth
tools build: fix # escaping in .cmd files for future Make
Fix up non-directory creation in SGID directories
HID: usbhid: add quirk for innomedia INNEX GENESIS/ATARI adapter
xhci: xhci-mem: off by one in xhci_stream_id_to_ring()
usb: quirks: add delay quirks for Corsair Strafe
USB: serial: mos7840: fix status-register error handling
USB: yurex: fix out-of-bounds uaccess in read handler
USB: serial: keyspan_pda: fix modem-status error handling
USB: serial: cp210x: add another USB ID for Qivicon ZigBee stick
USB: serial: ch341: fix type promotion bug in ch341_control_in()
ahci: Disable LPM on Lenovo 50 series laptops with a too old BIOS
vmw_balloon: fix inflation with batching
ibmasm: don't write out of bounds in read handler
MIPS: Fix ioremap() RAM check
cpufreq: Kconfig: Remove CPU_FREQ_DEFAULT_GOV_SCHED
Change-Id: I0909a2917621f2384cdfe27078577cc2c06b9612
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>