This patch supports to recognize hot file extension in f2fs, so that we
can allocate proper hot segment location for its data, which can lead to
better hot/cold seperation in filesystem.
In addition, we changes a bit on query/add/del operation method for
extension_list sysfs entry as below:
- Query: cat /sys/fs/f2fs/<disk>/extension_list
- Add: echo 'extension' > /sys/fs/f2fs/<disk>/extension_list
- Del: echo '!extension' > /sys/fs/f2fs/<disk>/extension_list
- Add: echo '[h/c]extension' > /sys/fs/f2fs/<disk>/extension_list
- Del: echo '[h/c]!extension' > /sys/fs/f2fs/<disk>/extension_list
- [h] means add/del hot file extension
- [c] means add/del cold file extension
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Sqlite user Background GC
- move_data_block
: move page #1
- f2fs_is_atomic_file
- f2fs_ioc_start_atomic_write
- f2fs_ioc_commit_atomic_write
- commit_inmem_pages
: commit page #1 & set node #2 dirty
- f2fs_submit_page_write
- f2fs_update_data_blkaddr
- set_page_dirty
: set node #2 dirty
- f2fs_do_sync_file
- fsync_node_pages
: commit node #1 & node #2, then sudden power-cut
In a race case, we may check FI_ATOMIC_FILE flag before starting atomic
write flow, then we will commit meta data before data with reversed
order, after a sudden pow-cut, database transaction will be inconsistent.
So we'd better to exclude gc/atomic_write to each other by using lock
instead of flag checking.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Otherwise, f2fs conducts GC on 8GB range only based on slow cost-benefit.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch avoids to skip discard commands when user sets gc_urgent mode.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If f2fs is running on top of very small devices, it's worth to avoid abusing
free LBAs. In order to achieve that, this patch introduces some parameter
tuning.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds an mount option, "alloc_mode=%s" having two options, "default"
and "reuse".
In "alloc_mode=reuse" case, f2fs starts to allocate segments from 0'th segment
all the time to reassign segments. It'd be useful for small-sized eMMC parts.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Jayashree Mohan reported:
A simple workload to reproduce this would be :
1. create foo
2. Write (8K - 16K) // foo size = 16K now
3. fsync()
4. falloc zero_range , keep_size (4202496 - 4210688) // foo size must be 16K
5. fdatasync()
Crash now
On recovery, we see that the file size is 4210688 and not 16K, which
violates the semantics of keep_size flag. We have a test case to
reproduce this using CrashMonkey on 4.15 kernel. Try this out by
simply running :
./c_harness -f /dev/sda -d /dev/cow_ram0 -t f2fs -e 102400 -P -v
tests/generic_468_zero.so
The root cause is that we miss to set KEEP_SIZE bit correctly in zero_range
when zeroing block cross EOF with FALLOC_FL_KEEP_SIZE, let's fix this
missing case.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs_super_block.encrypt_pw_salt can be udpated and persisted
concurrently, result in getting different pwsalt in separated
threads, so let's introduce sb_lock to exclude concurrent
accessers.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Pointer p is initialized with a value that is never read and is later
re-assigned a new value, hence the initialization is redundant and can
be removed.
Cleans up clang warning:
fs/f2fs/extent_cache.c:463:19: warning: Value stored to 'p' during
its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Previously, we attempt to flush the whole cp pack in a single bio,
however, when suddenly powering off at this time, we could get into
an extreme scenario that cp pack 1 page and cp pack 2 page are updated
and latest, but payload or current summaries are still partially
outdated. (see reliable write in the UFS specification)
This patch submits the whole cp pack except cp pack 2 page at first,
and then writes the cp pack 2 page with an extra independent
bio with pre-io barrier.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch introduces F2FS_FEATURE_FUNCS to clean up the definitions of
different f2fs_sb_has_xxx functions.
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch removes redundant check of page type when submit bio to
make the logic more clear.
Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
There is no checksum in node block now, so bit-transition from hardware
can make node_footer.next_blkaddr being corrupted w/o any detection,
result in node chain becoming looped one.
For this condition, during recovery, in order to avoid running into dead
loop, let's detect it and just skip out.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This is to detect dquot_initialize errors early from evict_inode
for orphan inodes.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Add the 'whint_mode' mount option that controls which write
hints are passed down to block layer. There are "off" and
"user-based" mode. The default mode is "off".
1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.
2) whint_mode=user-based. F2FS tries to pass down hints given
by users.
User F2FS Block
---- ---- -----
META WRITE_LIFE_NOT_SET
HOT_NODE "
WARM_NODE "
COLD_NODE "
ioctl(COLD) COLD_DATA WRITE_LIFE_EXTREME
extension list " "
-- buffered io
WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE " "
WRITE_LIFE_MEDIUM " "
WRITE_LIFE_LONG " "
-- direct io
WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE " WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM " WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG " WRITE_LIFE_LONG
Many thanks to Chao Yu and Jaegeuk Kim for comments to
implement this patch.
Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid build warning]
[Chao Yu: fix to restore whint_mode in ->remount_fs]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Once CP_TRIMMED_FLAG is set, after a reboot, we will never issue discard
before LBA becomes invalid again, fix it by clearing the flag in
checkpoint without CP_TRIMMED reason.
Fixes: 1f43e2ad7bff ("f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Previously, we will store all nat version bitmap in checkpoint pack block,
so our total node entry number has a limitation which caused total node
number can not exceed (3900 * 8) block * 455 node/block = 14196000. So
that once user wants to create more nodes in large size image, it becomes
a bottleneck, that's unreasonable.
This patch detects the new layout of nat/sit version bitmap in image in
order to enable supporting large nat bitmap.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If noextent_cache mount option is on, we will never initialize extent tree
in inode, but still we're going to access it in f2fs_drop_extent_tree,
result in kernel panic as below:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: _raw_write_lock+0xc/0x30
Call Trace:
? f2fs_drop_extent_tree+0x41/0x70 [f2fs]
f2fs_fallocate+0x5a0/0xdd0 [f2fs]
? common_file_perm+0x47/0xc0
? apparmor_file_permission+0x1a/0x20
vfs_fallocate+0x15b/0x290
SyS_fallocate+0x44/0x70
do_syscall_64+0x6e/0x160
entry_SYSCALL64_slow_path+0x25/0x25
This patch fixes to check extent cache status before using in
f2fs_drop_extent_tree.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch limits to enable inline_xattr_size mount option only if
both extra_attr and flexible_inline_xattr feature is on in current
image.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Commit 7a20b8a61eff81bdb7097a578752a74860e9d142 ("f2fs: allocate node
and hot data in the beginning of partition") introduces another mount
option, heap, to reset it back. But it does not do anything for heap
mode, so fix it.
Cc: stable@vger.kernel.org
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
sb_getblk does not guarantee the buffer head is uptodate. If bh is not
uptodate, the data (may be used as boot code) in area before
F2FS_SUPER_OFFSET may get corrupted when super block is committed.
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
gcc versions prior to 4.6 require an extra level of braces when using a
designated initializer for a member in an anonymous struct or union.
This caused a compile error with the 'struct qstr' initialization in
__fscrypt_encrypt_symlink().
Fix it by using QSTR_INIT().
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Fixes: 76e81d6d5048 ("fscrypt: new helper functions for ->symlink()")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
commit b18cb64ead400c01bf1580eeba330ace51f8087d upstream.
This reverts more of:
b76437579d ("procfs: mark thread stack correctly in proc/<pid>/maps")
... which was partially reverted by:
65376df58217 ("proc: revert /proc/<pid>/maps [stack:TID] annotation")
Originally, /proc/PID/task/TID/maps was the same as /proc/TID/maps.
In current kernels, /proc/PID/maps (or /proc/TID/maps even for
threads) shows "[stack]" for VMAs in the mm's stack address range.
In contrast, /proc/PID/task/TID/maps uses KSTK_ESP to guess the
target thread's stack's VMA. This is racy, probably returns garbage
and, on arches with CONFIG_TASK_INFO_IN_THREAD=y, is also crash-prone:
KSTK_ESP is not safe to use on tasks that aren't known to be running
ordinary process-context kernel code.
This patch removes the difference and just shows "[stack]" for VMAs
in the mm's stack range. This is IMO much more sensible -- the
actual "stack" address really is treated specially by the VM code,
and the current thread stack isn't even well-defined for programs
that frequently switch stacks on their own.
Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux API <linux-api@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tycho Andersen <tycho.andersen@canonical.com>
Link: http://lkml.kernel.org/r/3e678474ec14e0a0ec34c611016753eea2e1b8ba.1475257877.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9280cdd6fe5b8287a726d24cc1d558b96c8491d7 upstream.
cmd in COMPATIBLE_IOCTL is always a u32, so cast it so there isn't a
warning about an overflow in XFORM.
From: Mark Charlebois <charlebm@gmail.com>
Signed-off-by: Mark Charlebois <charlebm@gmail.com>
Signed-off-by: Behan Webster <behanw@converseincode.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Filesystems don't need fscrypt_fname_encrypted_size() anymore, so
unexport it and move it to fscrypt_private.h.
We also never calculate the encrypted size of a filename without having
the fscrypt_info present since it is needed to know the amount of
NUL-padding which is determined by the encryption policy, and also we
will always truncate the NUL-padding to the maximum filename length.
Therefore, also make fscrypt_fname_encrypted_size() assume that the
fscrypt_info is present, and make it truncate the returned length to the
specified max_len.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Previously fscrypt_fname_alloc_buffer() was used to allocate buffers for
both presented (decrypted or encoded) and encrypted filenames. That was
confusing, because it had to allocate the worst-case size for either,
e.g. including NUL-padding even when it was meaningless.
But now that fscrypt_setup_filename() no longer calls it, it is only
used in the ->get_link() and ->readdir() paths, which specifically want
a buffer for presented filenames. Therefore, switch the behavior over
to allocating the buffer for presented filenames only.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Currently, when encrypting a filename (either a real filename or a
symlink target) we calculate the amount of NUL-padding twice: once
before encryption and once during encryption in fname_encrypt(). It is
needed before encryption to allocate the needed buffer size as well as
calculate the size the symlink target will take up on-disk before
creating the symlink inode. Calculating the size during encryption as
well is redundant.
Remove this redundancy by always calculating the exact size beforehand,
and making fname_encrypt() just add as much NUL padding as is needed to
fill the output buffer.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Now that all filesystems have been converted to use the symlink helper
functions, they no longer need the declaration of 'struct
fscrypt_symlink_data'. Move it from fscrypt.h to fscrypt_private.h.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
fscrypt_fname_usr_to_disk() sounded very generic but was actually only
used to encrypt symlinks. Remove it now that all filesystems have been
switched over to fscrypt_encrypt_symlink().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Filesystems also have duplicate code to support ->get_link() on
encrypted symlinks. Factor it out into a new function
fscrypt_get_symlink(). It takes in the contents of the encrypted
symlink on-disk and provides the target (decrypted or encoded) that
should be returned from ->get_link().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Currently, filesystems supporting fscrypt need to implement some tricky
logic when creating encrypted symlinks, including handling a peculiar
on-disk format (struct fscrypt_symlink_data) and correctly calculating
the size of the encrypted symlink. Introduce helper functions to make
things a bit easier:
- fscrypt_prepare_symlink() computes and validates the size the symlink
target will require on-disk.
- fscrypt_encrypt_symlink() creates the encrypted target if needed.
The new helpers actually fix some subtle bugs. First, when checking
whether the symlink target was too long, filesystems didn't account for
the fact that the NUL padding is meant to be truncated if it would cause
the maximum length to be exceeded, as is done for filenames in
directories. Consequently users would receive ENAMETOOLONG when
creating symlinks close to what is supposed to be the maximum length.
For example, with EXT4 with a 4K block size, the maximum symlink target
length in an encrypted directory is supposed to be 4093 bytes (in
comparison to 4095 in an unencrypted directory), but in
FS_POLICY_FLAGS_PAD_32-mode only up to 4064 bytes were accepted.
Second, symlink targets of "." and ".." were not being encrypted, even
though they should be, as these names are special in *directory entries*
but not in symlink targets. Fortunately, we can fix this simply by
starting to encrypt them, as old kernels already accept them in
encrypted form.
Third, the output string length the filesystems were providing when
doing the actual encryption was incorrect, as it was forgotten to
exclude 'sizeof(struct fscrypt_symlink_data)'. Fortunately though, this
bug didn't make a difference.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
fscrypt.h included way too many other headers, given that it is included
by filesystems both with and without encryption support. Trim down the
includes list by moving the needed includes into more appropriate
places, and removing the unneeded ones.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Only fs/crypto/fname.c cares about treating the "." and ".." filenames
specially with regards to encryption, so move fscrypt_is_dot_dotdot()
from fscrypt.h to there.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The encryption modes are validated by fs/crypto/, not by individual
filesystems. Therefore, move fscrypt_valid_enc_modes() from fscrypt.h
to fscrypt_private.h.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The fscrypt_info kmem_cache is internal to fscrypt; filesystems don't
need to access it. So move its declaration into fscrypt_private.h.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
page allocated in fuse_dentry_canonical_path to be handled in
fuse_dev_do_write is allocated using __get_free_pages(GFP_KERNEL).
This may not return a page with data filled with 0. Now this
page may not have a null terminator at all.
If this happens and userspace fuse daemon screws up by passing a string
to kernel which is not NULL terminated (or did not fill anything),
then inside fuse driver in kernel when we try to do
strlen(fuse_dev_write->kern_path->getname_kernel)
on that page data -> it may give us issue with kernel paging request.
Unable to handle kernel paging request at virtual address
------------[ cut here ]------------
<..>
PC is at strlen+0x10/0x90
LR is at getname_kernel+0x2c/0xf4
<..>
strlen+0x10/0x90
kern_path+0x28/0x4c
fuse_dev_do_write+0x5b8/0x694
fuse_dev_write+0x74/0x94
do_iter_readv_writev+0x80/0xb8
do_readv_writev+0xec/0x1cc
vfs_writev+0x54/0x64
SyS_writev+0x64/0xe4
el0_svc_naked+0x24/0x28
To avoid this we should ensure in case of FUSE_CANONICAL_PATH,
the page is null terminated.
Change-Id: I33ca7cc76b4472eaa982c67bb20685df451121f5
Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
Bug: 75984715
[Daniel - small edit, using args size ]
Signed-off-by: Daniel Rosenberg <drosen@google.com>
sdcardfs_name_match gets a 'name' argument from the underlying FS.
This need not be null terminated string.
So in sdcardfs_name_match -> qstr_case_eq -> we should use
str_n_case_eq.
This happens because few of the entries in lower level FS may not be
NULL terminated and may have some garbage characters passed while
doing sdcardfs_name_match.
For e.g.
# dmesg |grep Download
[ 103.646386] sdcardfs_name_match: q1->name=.nomedia, q1->len=8,
q2->name=Download\x17\x80\x03, q2->len=8
[ 104.021340] sdcardfs_name_match: q1->name=.nomedia, q1->len=8,
q2->name=Download\x17\x80\x03, q2->len=8
[ 105.196864] sdcardfs_name_match: q1->name=.nomedia, q1->len=8,
q2->name=Download\x17\x80\x03, q2->len=8
[ 109.113521] sdcardfs_name_match: q1->name=logs, q1->len=4,
q2->name=Download\x17\x80\x03, q2->len=8
Now when we try to create a directory with different case for a such
files. SDCARDFS creates a entry if it could not find the underlying
entry in it's dcache.
To reproduce:-
1. bootup the device wait for some time after sdcardfs mounting to
complete.
2. cd /storage/emulated/0
3. echo 3 > /proc/sys/vm/drop_caches
4. mkdir download
We now start seeing two entries with name.
Download & download.
Change-Id: I976d92a220a607dd8cdb96c01c2041c5c2bc3326
Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
bug: 75987238
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlq7xXcACgkQONu9yGCS
aT5Syw/+K3OYEdVvooPrlPKHlCey8PUP8BuX7dYJ9eiRD+oxbyPKewNEIJyx0eUo
3OkdCArHt3I2+XdN8wEmhpOWVVElE6lg81Gyn+gWxprKac60A5O924lsXx8uULrQ
VSDE2+BQ6V45vzGUoqzSocA3Au8tqiP9Py9V7BSj0ADDI5LQLL+J9vp7+8sRlJWq
F2nQxV3vKc036Da7XbPpnAxds/p9B+CoglThzJMhhqMYxfsvvMHTnV28hZqGQvHa
/yuUBAAx/9Q0lXFimyu9Gj1pFl5KA0mLHZo0UZWg1wR/3edaB9Dor/NfY9zENWtw
W1+vyK5zq3zku25SjVgpz8W19rcB9T4jL7capNHB7lJTpM9LncX3UdsyuC1WY+gW
lg/sS0ESRWMJrgO+4JmXWehTYBLCNn9pqC7jxZSo/Fq3zMZWrkBx/53Mrku4LBQa
f0sUzwvGQ9vmgWVgdbIA123A2b444tLvG2BY3d2Pq6naBE3/Xgkr58XJhNpZ8E6K
WSIuxi60Vh9xhaXseYZBQpEGriWtwCljPaCP93k2DmksbwIP7r1UhkQ8lk3iTh2V
1+VYX9iv2+ceL3LPc6zmwbSYXP8czkB8F5okRH7lb9ykREtHXqleeHSX5fwdLik1
Y5rCrdFVw9avKjfNIg/gPQRFYN3KQbgUt/E4bum3w5Oo9VTdeTk=
=AHKD
-----END PGP SIGNATURE-----
Merge 4.4.125 into android-4.4
Changes in 4.4.125
MIPS: ralink: Remove ralink_halt()
iio: st_pressure: st_accel: pass correct platform data to init
ALSA: usb-audio: Fix parsing descriptor of UAC2 processing unit
ALSA: aloop: Sync stale timer before release
ALSA: aloop: Fix access to not-yet-ready substream via cable
ALSA: hda/realtek - Always immediately update mute LED with pin VREF
mmc: dw_mmc: fix falling from idmac to PIO mode when dw_mci_reset occurs
PCI: Add function 1 DMA alias quirk for Highpoint RocketRAID 644L
ahci: Add PCI-id for the Highpoint Rocketraid 644L card
clk: bcm2835: Protect sections updating shared registers
Bluetooth: btusb: Fix quirk for Atheros 1525/QCA6174
libata: fix length validation of ATAPI-relayed SCSI commands
libata: remove WARN() for DMA or PIO command without data
libata: Apply NOLPM quirk to Crucial MX100 512GB SSDs
libata: disable LPM for Crucial BX100 SSD 500GB drive
libata: Enable queued TRIM for Samsung SSD 860
libata: Apply NOLPM quirk to Crucial M500 480 and 960GB SSDs
libata: Make Crucial BX100 500GB LPM quirk apply to all firmware versions
libata: Modify quirks for MX100 to limit NCQ_TRIM quirk to MU01 version
mm/vmalloc: add interfaces to free unmapped page table
x86/mm: implement free pmd/pte page interfaces
drm/vmwgfx: Fix a destoy-while-held mutex problem.
drm/radeon: Don't turn off DP sink when disconnected
drm: udl: Properly check framebuffer mmap offsets
acpi, numa: fix pxm to online numa node associations
brcmfmac: fix P2P_DEVICE ethernet address generation
rtlwifi: rtl8723be: Fix loss of signal
tracing: probeevent: Fix to support minus offset from symbol
mtd: nand: fsl_ifc: Fix nand waitfunc return value
staging: ncpfs: memory corruption in ncp_read_kernel()
can: cc770: Fix stalls on rt-linux, remove redundant IRQ ack
can: cc770: Fix queue stall & dropped RTR reply
can: cc770: Fix use after free in cc770_tx_interrupt()
tty: vt: fix up tabstops properly
kvm/x86: fix icebp instruction handling
x86/build/64: Force the linker to use 2MB page size
x86/boot/64: Verify alignment of the LOAD segment
x86/entry/64: Don't use IST entry for #BP stack
perf/x86/intel: Don't accidentally clear high bits in bdw_limit_period()
staging: lustre: ptlrpc: kfree used instead of kvfree
kbuild: disable clang's default use of -fmerge-all-constants
bpf: skip unnecessary capability check
bpf, x64: increase number of passes
Linux 4.4.125
Change-Id: I14b307cd27ff088800174c74819a3ff1790b41ce
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 4c41aa24baa4ed338241d05494f2c595c885af8f upstream.
If the server is malicious then *bytes_read could be larger than the
size of the "target" buffer. It would lead to memory corruption when we
do the memcpy().
Reported-by: Dr Silvio Cesare of InfoSect <Silvio Cesare <silvio.cesare@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlq2IVkACgkQONu9yGCS
aT5SGQ//eDH59qGQJFy8GwDQULnYV8JEeBZT2tIzx2LDVJvKKz/8BkJmcGZrQ0gH
9EnMCiPkbxRH6HV6Cu1SHcyLFK+iVXGEw28Uk/sEcesQSZvrRxUKIgRLvyWhANgP
E1FiPbI6V1tXTROmUk/lrGZYgHMVUMAa+kYRndpbijtB++7qO25mVngP0Yg6OveO
oyw0MRx+v9mwQS/SID2EtlXnZOVGM+7kd3gg5fLY5w0qJT1jHjhrtZNWyKH2SMrS
QnzkFDMGDIWk5TrHJY0LEvpZI/toKNtPrG/Mt5gOvtSgjmQ4+EEVndrlutPAOa3K
xF6ERjtyBIRRG2ftk122fm6brjpxyCoqycD8JgCm1BxalNN7Bg+1ogQCGwE0EWNp
yYEsxLmSbLllX/4rkZtx+WsgiG4rwNWG1/IsPsEo80La3M53WzA3My8E0aVLpbIj
aqkzR62lZ1TSRgtbFjm6Bl4DtpJH4f9uELbO0VBi7b8LRTUI99Qcz4bOoKH+WKOC
umvHsuyMwDm8wc0KhQ4hdyGb1le0nS4Y1xydp1p0pcASLfeA2VOIg20ZvxQVBTZA
rlHo7zEQVkVYvphoPONUodOUVZ8/AGxoVvim3HlI6s5VvtdF6k/hQUWaOuWDh59J
bjw2vnMWOLWSqmmkpK9HmU6ohIV310TJewPu8BShzAuZRSNDxl8=
=OCjx
-----END PGP SIGNATURE-----
Merge 4.4.124 into android-4.4
Changes in 4.4.124
tpm: fix potential buffer overruns caused by bit glitches on the bus
tpm_tis: fix potential buffer overruns caused by bit glitches on the bus
SMB3: Validate negotiate request must always be signed
CIFS: Enable encryption during session setup phase
staging: android: ashmem: Fix possible deadlock in ashmem_ioctl
platform/x86: asus-nb-wmi: Add wapf4 quirk for the X302UA
regulator: anatop: set default voltage selector for pcie
x86: i8259: export legacy_pic symbol
rtc: cmos: Do not assume irq 8 for rtc when there are no legacy irqs
Input: ar1021_i2c - fix too long name in driver's device table
time: Change posix clocks ops interfaces to use timespec64
ACPI/processor: Fix error handling in __acpi_processor_start()
ACPI/processor: Replace racy task affinity logic
cpufreq/sh: Replace racy task affinity logic
genirq: Use irqd_get_trigger_type to compare the trigger type for shared IRQs
i2c: i2c-scmi: add a MS HID
net: ipv6: send unsolicited NA on admin up
media/dvb-core: Race condition when writing to CAM
spi: dw: Disable clock after unregistering the host
ath: Fix updating radar flags for coutry code India
clk: ns2: Correct SDIO bits
scsi: virtio_scsi: Always try to read VPD pages
KVM: PPC: Book3S PR: Exit KVM on failed mapping
ARM: 8668/1: ftrace: Fix dynamic ftrace with DEBUG_RODATA and !FRAME_POINTER
iommu/omap: Register driver before setting IOMMU ops
md/raid10: wait up frozen array in handle_write_completed
NFS: Fix missing pg_cleanup after nfs_pageio_cond_complete()
tcp: remove poll() flakes with FastOpen
e1000e: fix timing for 82579 Gigabit Ethernet controller
ALSA: hda - Fix headset microphone detection for ASUS N551 and N751
IB/ipoib: Fix deadlock between ipoib_stop and mcast join flow
IB/ipoib: Update broadcast object if PKey value was changed in index 0
HSI: ssi_protocol: double free in ssip_pn_xmit()
IB/mlx4: Take write semaphore when changing the vma struct
IB/mlx4: Change vma from shared to private
ASoC: Intel: Skylake: Uninitialized variable in probe_codec()
Fix driver usage of 128B WQEs when WQ_CREATE is V1.
netfilter: xt_CT: fix refcnt leak on error path
openvswitch: Delete conntrack entry clashing with an expectation.
mmc: host: omap_hsmmc: checking for NULL instead of IS_ERR()
wan: pc300too: abort path on failure
qlcnic: fix unchecked return value
scsi: mac_esp: Replace bogus memory barrier with spinlock
infiniband/uverbs: Fix integer overflows
NFS: don't try to cross a mountpount when there isn't one there.
iio: st_pressure: st_accel: Initialise sensor platform data properly
mt7601u: check return value of alloc_skb
rndis_wlan: add return value validation
Btrfs: send, fix file hole not being preserved due to inline extent
mac80211: don't parse encrypted management frames in ieee80211_frame_acked
mfd: palmas: Reset the POWERHOLD mux during power off
mtip32xx: use runtime tag to initialize command header
staging: unisys: visorhba: fix s-Par to boot with option CONFIG_VMAP_STACK set to y
staging: wilc1000: fix unchecked return value
mmc: sdhci-of-esdhc: limit SD clock for ls1012a/ls1046a
ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP
ipmi/watchdog: fix wdog hang on panic waiting for ipmi response
ACPI / PMIC: xpower: Fix power_table addresses
drm/nouveau/kms: Increase max retries in scanout position queries.
bnx2x: Align RX buffers
power: supply: pda_power: move from timer to delayed_work
Input: twl4030-pwrbutton - use correct device for irq request
md/raid10: skip spare disk as 'first' disk
ia64: fix module loading for gcc-5.4
tcm_fileio: Prevent information leak for short reads
video: fbdev: udlfb: Fix buffer on stack
sm501fb: don't return zero on failure path in sm501fb_start()
net: hns: fix ethtool_get_strings overflow in hns driver
cifs: small underflow in cnvrtDosUnixTm()
rtc: ds1374: wdt: Fix issue with timeout scaling from secs to wdt ticks
rtc: ds1374: wdt: Fix stop/start ioctl always returning -EINVAL
perf tests kmod-path: Don't fail if compressed modules aren't supported
Bluetooth: hci_qca: Avoid setup failure on missing rampatch
media: c8sectpfe: fix potential NULL pointer dereference in c8sectpfe_timer_interrupt
drm/msm: fix leak in failed get_pages
RDMA/iwpm: Fix uninitialized error code in iwpm_send_mapinfo()
rtlwifi: rtl_pci: Fix the bug when inactiveps is enabled.
media: bt8xx: Fix err 'bt878_probe()'
media: [RESEND] media: dvb-frontends: Add delay to Si2168 restart
cros_ec: fix nul-termination for firmware build info
platform/chrome: Use proper protocol transfer function
mmc: avoid removing non-removable hosts during suspend
IB/ipoib: Avoid memory leak if the SA returns a different DGID
RDMA/cma: Use correct size when writing netlink stats
IB/umem: Fix use of npages/nmap fields
vgacon: Set VGA struct resource types
drm/omap: DMM: Check for DMM readiness after successful transaction commit
pty: cancel pty slave port buf's work in tty_release
coresight: Fix disabling of CoreSight TPIU
pinctrl: Really force states during suspend/resume
iommu/vt-d: clean up pr_irq if request_threaded_irq fails
ip6_vti: adjust vti mtu according to mtu of lower device
RDMA/ocrdma: Fix permissions for OCRDMA_RESET_STATS
nfsd4: permit layoutget of executable-only files
clk: si5351: Rename internal plls to avoid name collisions
dmaengine: ti-dma-crossbar: Fix event mapping for TPCC_EVT_MUX_60_63
RDMA/ucma: Fix access to non-initialized CM_ID object
Linux 4.4.124
Change-Id: Iac6f5bda7941f032c5b1f58750e084140b0e3f23
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 66282ec1cf004c09083c29cb5e49019037937bbd ]
Clients must be able to read a file in order to execute it, and for pNFS
that means the client needs to be able to perform a LAYOUTGET on the file.
This behavior for executable-only files was added for OPEN in commit
a043226bc1 "nfsd4: permit read opens of executable-only files".
This fixes up xfstests generic/126 on block/scsi layouts.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 564277eceeca01e02b1ef3e141cfb939184601b4 ]
January is month 1. There is no zero-th month. If someone passes a
zero month then it means we read from one space before the start of the
total_days_of_prev_months[] array.
We may as well also be strict about days as well.
Fixes: 1bd5bbcb65 ("[CIFS] Legacy time handling for Win9x and OS/2 part 1")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e1cbfd7bf6dabdac561c75d08357571f44040a45 ]
Normally we don't have inline extents followed by regular extents, but
there's currently at least one harmless case where this happens. For
example, when the page size is 4Kb and compression is enabled:
$ mkfs.btrfs -f /dev/sdb
$ mount -o compress /dev/sdb /mnt
$ xfs_io -f -c "pwrite -S 0xaa 0 4K" -c "fsync" /mnt/foobar
$ xfs_io -c "pwrite -S 0xbb 8K 4K" -c "fsync" /mnt/foobar
In this case we get a compressed inline extent, representing 4Kb of
data, followed by a hole extent and then a regular data extent. The
inline extent was not expanded/converted to a regular extent exactly
because it represents 4Kb of data. This does not cause any apparent
problem (such as the issue solved by commit e1699d2d7bf6
("btrfs: add missing memset while reading compressed inline extents"))
except trigger an unexpected case in the incremental send code path
that makes us issue an operation to write a hole when it's not needed,
resulting in more writes at the receiver and wasting space at the
receiver.
So teach the incremental send code to deal with this particular case.
The issue can be currently triggered by running fstests btrfs/137 with
compression enabled (MOUNT_OPTIONS="-o compress" ./check btrfs/137).
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 99bbf6ecc694dfe0b026e15359c5aa2a60b97a93 ]
consider the sequence of commands:
mkdir -p /import/nfs /import/bind /import/etc
mount --bind / /import/bind
mount --make-private /import/bind
mount --bind /import/etc /import/bind/etc
exportfs -o rw,no_root_squash,crossmnt,async,no_subtree_check localhost:/
mount -o vers=4 localhost:/ /import/nfs
ls -l /import/nfs/etc
You would not expect this to report a stale file handle.
Yet it does.
The manipulations under /import/bind cause the dentry for
/etc to get the DCACHE_MOUNTED flag set, even though nothing
is mounted on /etc. This causes nfsd to call
nfsd_cross_mnt() even though there is no mountpoint. So an
upcall to mountd for "/etc" is performed.
The 'crossmnt' flag on the export of / causes mountd to
report that /etc is exported as it is a descendant of /. It
assumes the kernel wouldn't ask about something that wasn't
a mountpoint. The filehandle returned identifies the
filesystem and the inode number of /etc.
When this filehandle is presented to rpc.mountd, via
"nfsd.fh", the inode cannot be found associated with any
name in /etc/exports, or with any mountpoint listed by
getmntent(). So rpc.mountd says the filehandle doesn't
exist. Hence ESTALE.
This is fixed by teaching nfsd not to trust DCACHE_MOUNTED
too much. It is just a hint, not a guarantee.
Change nfsd_mountpoint() to return '1' for a certain mountpoint,
'2' for a possible mountpoint, and 0 otherwise.
Then change nfsd_crossmnt() to check if follow_down()
actually found a mountpount and, if not, to avoid performing
a lookup if the location is not known to certainly require
an export-point.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>