This patch moves error handling from commit_inmem_pages() into
__commit_inmem_page() as a cleanup; there is no logic change.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Only a directory may have the F2FS_INLINE_DOTS flag, so there is no need
to check the flag in the recovery flow.
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch removes the duplicated dquot_initialize() call in
recover_orphan_inode(), and fixes the error handling if
dquot_initialize() fails.
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
v4->v5: move the data corruption check into __submit_discard_cmd() to
control submitted discard IO more accurately; also increase the async
thread wait time if data corruption is detected.
This patch stops the async thread and the umount process from issuing
discard if something is wrong with f2fs, similar to fstrim.
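Roughly, the guard can look like the sketch below (the helper name and
exact condition are assumptions based on this description, not the
literal patch; SBI_NEED_FSCK and f2fs_cp_error() are existing f2fs
helpers):

	/* Sketch only: skip issuing discard when the fs looks corrupted. */
	static bool f2fs_discard_allowed(struct f2fs_sb_info *sbi)
	{
		if (is_sbi_flag_set(sbi, SBI_NEED_FSCK) || f2fs_cp_error(sbi))
			return false;
		return true;
	}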
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In f2fs_ioc_commit_atomic_write, if the file is volatile, return -EINVAL
to indicate the commit failure.
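A minimal sketch of the intended behavior (surrounding locking and
cleanup simplified; not the literal diff):

	/* Sketch: committing a volatile file is treated as an invalid request */
	if (f2fs_is_volatile_file(inode)) {
		ret = -EINVAL;
		goto err_out;
	}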
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If a file is not typed as hot and has more dirty pages than the
threshold (64) before starting an atomic write, it may lose the hot
flag.
v1->v2: also move setting the FI_ATOMIC_FILE flag to after the dirty
pages are flushed, so that dirty pages generated before starting the
atomic write are not written back in atomic mode.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
`cur' will never be NULL; we should check the inmem_pages list instead.
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Thread					GC thread
- f2fs_ioc_start_atomic_write
 - get_dirty_pages
 - filemap_write_and_wait_range
					- f2fs_gc
					 - do_garbage_collect
					  - gc_data_segment
					   - move_data_page
					    - f2fs_is_atomic_file
					    - set_page_dirty
 - set_inode_flag(, FI_ATOMIC_FILE)
A dirty data page can still be generated by GC under the race condition
shown in the call stack above.
This patch adds fi->dio_rwsem[WRITE] in f2fs_ioc_start_atomic_write() to
avoid such a race.
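A minimal sketch of that serialization, assuming the dio_rwsem[WRITE]
field named above (error handling and unrelated setup omitted):

	/* Hold the write side of dio_rwsem so GC cannot dirty data pages
	 * in between the flush and setting FI_ATOMIC_FILE.
	 */
	down_write(&F2FS_I(inode)->dio_rwsem[WRITE]);

	ret = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX);
	if (!ret)
		set_inode_flag(inode, FI_ATOMIC_FILE);

	up_write(&F2FS_I(inode)->dio_rwsem[WRITE]);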
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In the structure of f2fs_inode, i_extra_isize's type is __le16, so we
should keep the type consistent when using it.
Fixes: 704956ecf5bc ("f2fs: support inode checksum")
Signed-off-by: Zhikang Zhang <zhangzhikang1@huawei.com>
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We should check valid_map_mir and block count to ensure
the flushed raw_sit is correct.
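A rough sketch of such a check (field and macro names follow f2fs's
CONFIG_F2FS_CHECK_FS mirror bitmap; whether the patch checks exactly
this is an assumption):

	/* Sketch: the mirror bitmap must match, and the valid block count
	 * recorded in raw_sit must agree with the in-memory segment entry.
	 */
	if (memcmp(se->cur_valid_map, se->cur_valid_map_mir, SIT_VBLOCK_MAP_SIZE))
		f2fs_bug_on(sbi, 1);
	if (GET_SIT_VBLOCKS(raw_sit) != se->valid_blocks)
		f2fs_bug_on(sbi, 1);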
Signed-off-by: Zhikang Zhang <zhangzhikang1@huawei.com>
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Correct the return value in two cases:
- return -EINVAL if the end boundary is out-of-range.
- return -EIO if the fs needs an off-line check.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch fixes FS_IOC_GETFLAGS to show the missing encrypt/inline_data
flags, like ext4 does.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Now F2FS_FL_USER_VISIBLE and F2FS_FL_USER_MODIFIABLE include
F2FS_PROJINHERIT_FL, so remove the unneeded F2FS_PROJINHERIT_FL when
using the visible/modifiable flag macros.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Related to https://lkml.org/lkml/2018/4/8/661
Sometimes we need to write metadata to a newly allocated block address.
In that case we allocate a zeroed page in an inner inode's address
space, fill part of it with data, and leave the rest zeroed to mean
that those fields are in their initial state.
There are two inner inodes (the meta inode and the node inode) that set
__GFP_ZERO. I have checked both of them, and for both we can avoid
using __GFP_ZERO and do the initialization ourselves, avoiding
unneeded/redundant zeroing from mm.
Cc: <stable@vger.kernel.org>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch changes max_requests to UINT_MAX so that all big-range
discards are issued during umount.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
For buffered IO we don't need a block plug to batch bios, and for
direct IO the generic f2fs_direct_IO path already adds a block plug, so
let's remove the redundant one in .write_iter.
As Yunlei described in his patch:
-f2fs_file_write_iter
 -blk_start_plug
 -__generic_file_write_iter
  ...
  -do_blockdev_direct_IO
   -blk_start_plug
   ...
   -blk_finish_plug
   ...
 -blk_finish_plug
which may cause a performance decrease on our platform.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Since the layout of a regular dentry block is different from that of an
inline dentry block, zero_user_segment() starting from
MAX_INLINE_DATA(dir) is not correct for a regular dentry block.
Besides, the bitmap is already copied and used, so there is no need to
zero the page at all; just remove the zero_user_segment() call.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Previously, we used the generic FS_*_FL flags defined by VFS to
indicate inode status for each bit of i_flags, so f2fs's flag
definitions were tied to the VFS ones, making it hard for f2fs to reuse
bits it never used to indicate new status.
In order to solve this issue, we introduce a private inode flag
mapping. Note that for bits which have already been persisted to disk,
we must never change their definition; the other bits can be remapped
for new status added later.
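A minimal sketch of what the private mapping looks like; the bit values
shown mirror the corresponding generic FS_*_FL bits that are already on
disk (treat the exact list as illustrative):

	/* Bits already persisted on disk keep their historical values and
	 * must never change; bits f2fs never used can be remapped later.
	 */
	#define F2FS_SYNC_FL		0x00000008 /* Synchronous updates */
	#define F2FS_IMMUTABLE_FL	0x00000010 /* Immutable file */
	#define F2FS_APPEND_FL		0x00000020 /* Append-only writes */
	#define F2FS_NODUMP_FL		0x00000040 /* Do not dump file */
	#define F2FS_NOATIME_FL		0x00000080 /* Do not update atime */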
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Merge 4.4.139 into android-4.4
Changes in 4.4.139
xfrm6: avoid potential infinite loop in _decode_session6()
netfilter: ebtables: handle string from userspace with care
ipvs: fix buffer overflow with sync daemon and service
atm: zatm: fix memcmp casting
net: qmi_wwan: Add Netgear Aircard 779S
net/sonic: Use dma_mapping_error()
Revert "Btrfs: fix scrub to repair raid6 corruption"
tcp: do not overshoot window_clamp in tcp_rcv_space_adjust()
Btrfs: make raid6 rebuild retry more
usb: musb: fix remote wakeup racing with suspend
bonding: re-evaluate force_primary when the primary slave name changes
tcp: verify the checksum of the first data segment in a new connection
ext4: update mtime in ext4_punch_hole even if no blocks are released
ext4: fix fencepost error in check for inode count overflow during resize
driver core: Don't ignore class_dir_create_and_add() failure.
btrfs: scrub: Don't use inode pages for device replace
ALSA: hda - Handle kzalloc() failure in snd_hda_attach_pcm_stream()
ALSA: hda: add dock and led support for HP EliteBook 830 G5
ALSA: hda: add dock and led support for HP ProBook 640 G4
cpufreq: Fix new policy initialization during limits updates via sysfs
libata: zpodd: make arrays cdb static, reduces object code size
libata: zpodd: small read overflow in eject_tray()
libata: Drop SanDisk SD7UB3Q*G1001 NOLPM quirk
w1: mxc_w1: Enable clock before calling clk_get_rate() on it
fs/binfmt_misc.c: do not allow offset overflow
x86/spectre_v1: Disable compiler optimizations over array_index_mask_nospec()
m68k/mm: Adjust VM area to be unmapped by gap size for __iounmap()
serial: sh-sci: Use spin_{try}lock_irqsave instead of open coding version
signal/xtensa: Consistenly use SIGBUS in do_unaligned_user
usb: do not reset if a low-speed or full-speed device timed out
1wire: family module autoload fails because of upper/lower case mismatch.
ASoC: dapm: delete dapm_kcontrol_data paths list before freeing it
ASoC: cirrus: i2s: Fix LRCLK configuration
ASoC: cirrus: i2s: Fix {TX|RX}LinCtrlData setup
lib/vsprintf: Remove atomic-unsafe support for %pCr
mips: ftrace: fix static function graph tracing
branch-check: fix long->int truncation when profiling branches
ipmi:bt: Set the timeout before doing a capabilities check
Bluetooth: hci_qca: Avoid missing rampatch failure with userspace fw loader
fuse: atomic_o_trunc should truncate pagecache
fuse: don't keep dead fuse_conn at fuse_fill_super().
fuse: fix control dir setup and teardown
powerpc/mm/hash: Add missing isync prior to kernel stack SLB switch
powerpc/ptrace: Fix setting 512B aligned breakpoints with PTRACE_SET_DEBUGREG
powerpc/ptrace: Fix enforcement of DAWR constraints
cpuidle: powernv: Fix promotion from snooze if next state disabled
powerpc/fadump: Unregister fadump on kexec down path.
ARM: 8764/1: kgdb: fix NUMREGBYTES so that gdb_regs[] is the correct size
of: unittest: for strings, account for trailing \0 in property length field
IB/qib: Fix DMA api warning with debug kernel
RDMA/mlx4: Discard unknown SQP work requests
mtd: cfi_cmdset_0002: Change write buffer to check correct value
mtd: cfi_cmdset_0002: Use right chip in do_ppb_xxlock()
mtd: cfi_cmdset_0002: fix SEGV unlocking multiple chips
mtd: cfi_cmdset_0002: Fix unlocking requests crossing a chip boudary
mtd: cfi_cmdset_0002: Avoid walking all chips when unlocking.
MIPS: BCM47XX: Enable 74K Core ExternalSync for PCIe erratum
PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume
MIPS: io: Add barrier after register read in inX()
time: Make sure jiffies_to_msecs() preserves non-zero time periods
Btrfs: fix clone vs chattr NODATASUM race
iio:buffer: make length types match kfifo types
scsi: qla2xxx: Fix setting lower transfer speed if GPSC fails
scsi: zfcp: fix missing SCSI trace for result of eh_host_reset_handler
scsi: zfcp: fix missing SCSI trace for retry of abort / scsi_eh TMF
scsi: zfcp: fix misleading REC trigger trace where erp_action setup failed
scsi: zfcp: fix missing REC trigger trace on terminate_rport_io early return
scsi: zfcp: fix missing REC trigger trace on terminate_rport_io for ERP_FAILED
scsi: zfcp: fix missing REC trigger trace for all objects in ERP_FAILED
scsi: zfcp: fix missing REC trigger trace on enqueue without ERP thread
linvdimm, pmem: Preserve read-only setting for pmem devices
md: fix two problems with setting the "re-add" device state.
ubi: fastmap: Cancel work upon detach
UBIFS: Fix potential integer overflow in allocation
xfrm: Ignore socket policies when rebuilding hash tables
xfrm: skip policies marked as dead while rehashing
backlight: as3711_bl: Fix Device Tree node lookup
backlight: max8925_bl: Fix Device Tree node lookup
backlight: tps65217_bl: Fix Device Tree node lookup
mfd: intel-lpss: Program REMAP register in PIO mode
perf tools: Fix symbol and object code resolution for vdso32 and vdsox32
perf intel-pt: Fix sync_switch INTEL_PT_SS_NOT_TRACING
perf intel-pt: Fix decoding to accept CBR between FUP and corresponding TIP
perf intel-pt: Fix MTC timing after overflow
perf intel-pt: Fix "Unexpected indirect branch" error
perf intel-pt: Fix packet decoding of CYC packets
media: v4l2-compat-ioctl32: prevent go past max size
media: cx231xx: Add support for AverMedia DVD EZMaker 7
media: dvb_frontend: fix locking issues at dvb_frontend_get_event()
nfsd: restrict rd_maxcount to svc_max_payload in nfsd_encode_readdir
NFSv4: Fix possible 1-byte stack overflow in nfs_idmap_read_and_verify_message
video: uvesafb: Fix integer overflow in allocation
Input: elan_i2c - add ELAN0618 (Lenovo v330 15IKB) ACPI ID
xen: Remove unnecessary BUG_ON from __unbind_from_irq()
udf: Detect incorrect directory size
Input: elan_i2c_smbus - fix more potential stack buffer overflows
Input: elantech - enable middle button of touchpads on ThinkPad P52
Input: elantech - fix V4 report decoding for module with middle key
ALSA: hda/realtek - Add a quirk for FSC ESPRIMO U9210
Btrfs: fix unexpected cow in run_delalloc_nocow
spi: Fix scatterlist elements size in spi_map_buf
block: Fix transfer when chunk sectors exceeds max
dm thin: handle running out of data space vs concurrent discard
cdc_ncm: avoid padding beyond end of skb
Bluetooth: Fix connection if directed advertising and privacy is used
Linux 4.4.139
Change-Id: I93013bedf2ebe3e6a8718972d8854723609963cc
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 5811375325420052fcadd944792a416a43072b7f upstream.
Fstests generic/475 provides a way to fail metadata reads while
checking if checksum exists for the inode inside run_delalloc_nocow(),
and csum_exist_in_range() interprets error (-EIO) as inode having
checksum and makes its caller enter the cow path.
In case of free space inode, this ends up with a warning in
cow_file_range().
The same problem applies to btrfs_cross_ref_exist() since it may also
read metadata in between.
With this, run_delalloc_nocow() bails out when errors occur at the two
places.
cc: <stable@vger.kernel.org> v2.6.28+
Fixes: 17d217fe97 ("Btrfs: fix nodatasum handling in balancing code")
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fa65653e575fbd958bdf5fb9c4a71a324e39510d upstream.
Detect when a directory entry is (possibly partially) beyond directory
size and return EIO in that case since it means the filesystem is
corrupted. Otherwise directory operations can further corrupt the
directory and possibly also oops the kernel.
CC: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
CC: stable@vger.kernel.org
Reported-and-tested-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d68894800ec5712d7ddf042356f11e36f87d7f78 upstream.
In nfs_idmap_read_and_verify_message there is an incorrect sprintf '%d'
that converts the __u32 'im_id' from struct idmap_msg to 'id_str', which
is a stack char array variable of length NFS_UINT_MAXLEN == 11.
If a uid or gid value is > 2147483647 = 0x7fffffff, the conversion
overflows into a negative value, for example:
crash> p (unsigned) (0x80000000)
$1 = 2147483648
crash> p (signed) (0x80000000)
$2 = -2147483648
The '-' sign is written to the buffer and this causes a 1 byte overflow
when the NULL byte is written, which corrupts kernel stack memory. If
CONFIG_CC_STACKPROTECTOR_STRONG is set we see a stack-protector panic:
[11558053.616565] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffa05b8a8c
[11558053.639063] CPU: 6 PID: 9423 Comm: rpc.idmapd Tainted: G W ------------ T 3.10.0-514.el7.x86_64 #1
[11558053.641990] Hardware name: Red Hat OpenStack Compute, BIOS 1.10.2-3.el7_4.1 04/01/2014
[11558053.644462] ffffffff818c7bc0 00000000b1f3aec1 ffff880de0f9bd48 ffffffff81685eac
[11558053.646430] ffff880de0f9bdc8 ffffffff8167f2b3 ffffffff00000010 ffff880de0f9bdd8
[11558053.648313] ffff880de0f9bd78 00000000b1f3aec1 ffffffff811dcb03 ffffffffa05b8a8c
[11558053.650107] Call Trace:
[11558053.651347] [<ffffffff81685eac>] dump_stack+0x19/0x1b
[11558053.653013] [<ffffffff8167f2b3>] panic+0xe3/0x1f2
[11558053.666240] [<ffffffff811dcb03>] ? kfree+0x103/0x140
[11558053.682589] [<ffffffffa05b8a8c>] ? idmap_pipe_downcall+0x1cc/0x1e0 [nfsv4]
[11558053.689710] [<ffffffff810855db>] __stack_chk_fail+0x1b/0x30
[11558053.691619] [<ffffffffa05b8a8c>] idmap_pipe_downcall+0x1cc/0x1e0 [nfsv4]
[11558053.693867] [<ffffffffa00209d6>] rpc_pipe_write+0x56/0x70 [sunrpc]
[11558053.695763] [<ffffffff811fe12d>] vfs_write+0xbd/0x1e0
[11558053.702236] [<ffffffff810acccc>] ? task_work_run+0xac/0xe0
[11558053.704215] [<ffffffff811fec4f>] SyS_write+0x7f/0xe0
[11558053.709674] [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b
Fix this by calling the internally defined nfs_map_numeric_to_string()
function which properly uses '%u' to convert this __u32. For consistency,
also replace the one other place where snprintf is called.
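The arithmetic of the overflow is easy to reproduce in plain userspace
C (illustration only, not kernel code):

	#include <stdio.h>

	int main(void)
	{
		unsigned int im_id = 0x80000000u;  /* uid/gid > 0x7fffffff */

		/* The kernel buffer is char id_str[11] (NFS_UINT_MAXLEN == 11).
		 * '%d' reinterprets the value as -2147483648: 11 characters,
		 * so the terminating NUL lands one byte past the buffer.
		 */
		int as_d = snprintf(NULL, 0, "%d", (int)im_id); /* 11 */
		int as_u = snprintf(NULL, 0, "%u", im_id);      /* 10 */

		printf("'%%d' needs %d+1 bytes, '%%u' needs %d+1 bytes, buffer is 11\n",
		       as_d, as_u);
		return 0;
	}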
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reported-by: Stephen Johnston <sjohnsto@redhat.com>
Fixes: cf4ab538f1 ("NFSv4: Fix the string length returned by the idmapper")
Cc: stable@vger.kernel.org # v3.4+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9c2ece6ef67e9d376f32823086169b489c422ed0 upstream.
nfsd4_readdir_rsize restricts rd_maxcount to svc_max_payload when
estimating the size of the readdir reply, but nfsd_encode_readdir
restricts it to INT_MAX when encoding the reply. This can result in log
messages like "kernel: RPC request reserved 32896 but used 1049444".
Restrict rd_dircount similarly (no reason it should be larger than
svc_max_payload).
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 353748a359f1821ee934afc579cf04572406b420 upstream.
There is potential for the size and len fields in ubifs_data_node to be
too large causing either a negative value for the length fields or an
integer overflow leading to an incorrect memory allocation. Likewise,
when the len field is small, an integer underflow may occur.
Signed-off-by: Silvio Cesare <silvio.cesare@gmail.com>
Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b5c40d598f5408bd0ca22dfffa82f03cd9433f23 upstream.
In btrfs_clone_files(), we must check the NODATASUM flag while the
inodes are locked. Otherwise, it's possible that btrfs_ioctl_setflags()
will change the flags after we check and we can end up with a partly
checksummed file.
The race window is only a few instructions in size, between the if and
the locks which is:
3834 if (S_ISDIR(src->i_mode) || S_ISDIR(inode->i_mode))
3835 return -EISDIR;
where the setflags must be run and toggle the NODATASUM flag (provided
the file size is 0). The clone will block on the inode lock, setflags
takes the inode lock, changes the flags, releases the lock, and the
clone continues.
Not impossible but still needs a lot of bad luck to hit unintentionally.
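The shape of the fix is sketched below: do the NODATASUM comparison
only while both inodes are locked (simplified, not the literal diff):

	/* With both inodes locked, setflags can no longer race with this
	 * check, so a half-checksummed clone result is ruled out.
	 */
	lock_two_nondirectories(src, inode);

	if ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
	    (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)) {
		ret = -EINVAL;
		goto out_unlock;
	}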
Fixes: 0e7b824c4e ("Btrfs: don't make a file partly checksummed through file clone")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ adjusted for 4.4 ]
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
commit 6becdb601bae2a043d7fb9762c4d48699528ea6e upstream.
syzbot is reporting NULL pointer dereference at fuse_ctl_remove_conn() [1].
Since fc->ctl_ndents is incremented by fuse_ctl_add_conn() when new_inode()
failed, fuse_ctl_remove_conn() reaches an inode-less dentry and tries to
clear d_inode(dentry)->i_private field.
Fix by only adding the dentry to the array after being fully set up.
When tearing down the control directory, do d_invalidate() on it to get rid
of any mounts that might have been added.
[1] https://syzkaller.appspot.com/bug?id=f396d863067238959c91c0b7cfc10b163638cac6
Reported-by: syzbot <syzbot+32c236387d66c4516827@syzkaller.appspotmail.com>
Fixes: bafa96541b ("[PATCH] fuse: add control filesystem")
Cc: <stable@vger.kernel.org> # v2.6.18
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 543b8f8662fe6d21f19958b666ab0051af9db21a upstream.
syzbot is reporting use-after-free at fuse_kill_sb_blk() [1].
Since sb->s_fs_info field is not cleared after fc was released by
fuse_conn_put() when initialization failed, fuse_kill_sb_blk() finds
already released fc and tries to hold the lock. Fix this by clearing
sb->s_fs_info field after calling fuse_conn_put().
[1] https://syzkaller.appspot.com/bug?id=a07a680ed0a9290585ca424546860464dd9658db
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+ec3986119086fe4eec97@syzkaller.appspotmail.com>
Fixes: 3b463ae0c6 ("fuse: invalidation reverse calls")
Cc: John Muir <john@jmuir.com>
Cc: Csaba Henk <csaba@gluster.com>
Cc: Anand Avati <avati@redhat.com>
Cc: <stable@vger.kernel.org> # v2.6.31
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit df0e91d488276086bc07da2e389986cae0048c37 upstream.
Fuse has an "atomic_o_trunc" mode, where userspace filesystem uses the
O_TRUNC flag in the OPEN request to truncate the file atomically with the
open.
In this mode there's no need to send a SETATTR request to userspace after
the open, so fuse_do_setattr() checks this mode and returns. But this
misses the important step of truncating the pagecache.
Add the missing parts of truncation to the ATTR_OPEN branch.
Reported-by: Chad Austin <chadaustin@fb.com>
Fixes: 6ff958edbf ("fuse: add atomic open+truncate support")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5cc41e099504b77014358b58567c5ea6293dd220 upstream.
When registering a new binfmt_misc handler, it is possible to overflow
the offset to get a negative value, which might crash the system, or
possibly leak kernel data.
Here is a crash log when 2500000000 was used as an offset:
BUG: unable to handle kernel paging request at ffff989cfd6edca0
IP: load_misc_binary+0x22b/0x470 [binfmt_misc]
PGD 1ef3e067 P4D 1ef3e067 PUD 0
Oops: 0000 [#1] SMP NOPTI
Modules linked in: binfmt_misc kvm_intel ppdev kvm irqbypass joydev input_leds serio_raw mac_hid parport_pc qemu_fw_cfg parpy
CPU: 0 PID: 2499 Comm: bash Not tainted 4.15.0-22-generic #24-Ubuntu
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
RIP: 0010:load_misc_binary+0x22b/0x470 [binfmt_misc]
Call Trace:
search_binary_handler+0x97/0x1d0
do_execveat_common.isra.34+0x667/0x810
SyS_execve+0x31/0x40
do_syscall_64+0x73/0x130
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Use kstrtoint instead of simple_strtoul. It will work as the code
already sets the delimiter byte to '\0' and we only do it when the field
is not empty.
Tested with offsets -1, 2500000000, UINT_MAX and INT_MAX. Also tested
with examples documented at Documentation/admin-guide/binfmt-misc.rst
and other registrations from packages on Ubuntu.
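The difference in range handling can be sketched in userspace C, where
strtol() plus an explicit range check plays the role of kstrtoint()
(illustration only, not the kernel parser):

	#include <errno.h>
	#include <limits.h>
	#include <stdio.h>
	#include <stdlib.h>

	int main(void)
	{
		const char *field = "2500000000";
		char *end;

		/* simple_strtoul-style parse: no check against 'int' range */
		unsigned long ul = strtoul(field, &end, 10);
		int offset = (int)ul;              /* silently becomes negative */

		/* kstrtoint-style parse: out-of-range input is rejected */
		errno = 0;
		long l = strtol(field, &end, 10);
		int ok = (errno == 0 && l >= INT_MIN && l <= INT_MAX);

		printf("unchecked offset = %d, range-checked parse %s\n",
		       offset, ok ? "accepted" : "rejected");
		return 0;
	}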
Link: http://lkml.kernel.org/r/20180529135648.14254-1-cascardo@canonical.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac0b4145d662a3b9e34085dea460fb06ede9b69b upstream.
[BUG]
Btrfs can create compressed extent without checksum (even though it
shouldn't), and if we then try to replace device containing such extent,
the result device will contain all the uncompressed data instead of the
compressed one.
Test case already submitted to fstests:
https://patchwork.kernel.org/patch/10442353/
[CAUSE]
When handling a compressed extent without checksum, device replace will
go into the copy_nocow_pages() function.
In that function, btrfs gets all inodes referring to the data extent
and then uses find_or_create_page() to get pages directly from those
inodes.
The problem here is that pages taken directly from an inode are always
uncompressed, so for a compressed data extent they mismatch the on-disk
data.
Thus this leads to a corrupted compressed data extent being written to
the replace device.
[FIX]
In this attempt, we just remove the "optimization" branch and let the
unified scrub_pages() handle it.
Although scrub_pages() won't bother reusing the page cache and will be
a little slower, it does the correct csum checking and won't cause the
data corruption introduced by that "optimization".
Note about the fix: this is the minimal fix that can be backported to
older stable trees without conflicts. The whole callchain from
copy_nocow_pages() can be deleted, and will be in followup patches.
Fixes: ff023aac31 ("Btrfs: add code to scrub to copy read data to another disk")
CC: stable@vger.kernel.org # 4.4+
Reported-by: James Harvey <jamespharvey20@gmail.com>
Reviewed-by: James Harvey <jamespharvey20@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ remove code removal, add note why ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4f2f76f751433908364ccff82f437a57d0e6e9b7 upstream.
ext4_resize_fs() has an off-by-one bug when checking whether growing of
a filesystem will not overflow inode count. As a result it allows a
filesystem with 8192 inodes per group to grow to 64TB which overflows
inode count to 0 and makes filesystem unusable. Fix it.
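The arithmetic behind the overflow, assuming the default 128 MiB block
groups (32768 blocks of 4 KiB), can be checked quickly in C:

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		uint64_t fs_bytes         = 64ULL << 40;      /* 64 TB         */
		uint64_t group_bytes      = 32768ULL * 4096;  /* 128 MiB/group */
		uint64_t groups           = fs_bytes / group_bytes; /* 524288  */
		uint64_t inodes_per_group = 8192;

		uint64_t total = groups * inodes_per_group;   /* 2^32          */
		uint32_t s_inodes_count = (uint32_t)total;    /* wraps to 0    */

		printf("total inodes = %llu, 32-bit on-disk counter = %u\n",
		       (unsigned long long)total, (unsigned)s_inodes_count);
		return 0;
	}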
Cc: stable@vger.kernel.org
Fixes: 3f8a6411fb
Reported-by: Jaco Kroon <jaco@uls.co.za>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eee597ac931305eff3d3fd1d61d6aae553bc0984 upstream.
Currently in ext4_punch_hole we're going to skip the mtime update if
there are no actual blocks to release. However we've actually modified
the file by zeroing the partial block so the mtime should be updated.
Moreover the sync and datasync handling is skipped as well, which is
also wrong. Fix it.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Joe Habermann <joe.habermann@quantum.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8810f7517a3bc4ca2d41d022446d3f5fd6b77c09 ]
There is a scenario that can end up with rebuild process failing to
return good content, i.e.
suppose that all disks can be read without problems and if the content
that was read out doesn't match its checksum, currently for raid6
btrfs at most retries twice,
- the 1st retry is to rebuild with all other stripes, it'll eventually
be a raid5 xor rebuild,
- if the 1st fails, the 2nd retry will deliberately fail parity p so
that it will do raid6 style rebuild,
however, the chances are that another non-parity stripe content also
has something corrupted, so that the above retries are not able to
return correct content, and users will think of this as data loss.
More seriously, if the loss happens on some important internal btree
roots, it could refuse to mount.
This extends btrfs to do more retries and each retry fails only one
stripe. Since raid6 can tolerate 2 disk failures, if there is one
more failure besides the failure on which we're recovering, this can
always work.
The worst case is to retry as many times as the number of raid6 disks,
but given the fact that such a scenario is really rare in practice,
it's still acceptable.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 95b286daf7.
This commit used an incorrect log message.
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Log the crypto algorithm driver name for each fscrypt encryption mode on
its first use, also showing a friendly name for the mode.
This will help people determine whether the expected implementations are
being used. In some cases we've seen people do benchmarks and reject
using encryption for performance reasons, when in fact they used a much
slower implementation of AES-XTS than was possible on the hardware. It
can make an enormous difference; e.g., AES-XTS on ARM is about 10x
faster with the crypto extensions (AES instructions) than without.
This also makes it more obvious which modes are being used, now that
fscrypt supports multiple combinations of modes.
Example messages (with default modes, on x86_64):
[ 35.492057] fscrypt: AES-256-CTS-CBC using implementation "cts(cbc-aes-aesni)"
[ 35.492171] fscrypt: AES-256-XTS using implementation "xts-aes-aesni"
Note: algorithms can be dynamically added to the crypto API, which can
result in different implementations being used at different times. But
this is rare; for most users, showing the first will be good enough.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
fscrypt currently only supports AES encryption. However, many low-end
mobile devices have older CPUs that don't have AES instructions, e.g.
the ARMv8 Cryptography Extensions. Currently, user data on such devices
is not encrypted at rest because AES is too slow, even when the NEON
bit-sliced implementation of AES is used. Unfortunately, it is
infeasible to encrypt these devices at all when AES is the only option.
Therefore, this patch updates fscrypt to support the Speck block cipher,
which was recently added to the crypto API. The C implementation of
Speck is not especially fast, but Speck can be implemented very
efficiently with general-purpose vector instructions, e.g. ARM NEON.
For example, on an ARMv7 processor, we measured the NEON-accelerated
Speck128/256-XTS at 69 MB/s for both encryption and decryption, while
AES-256-XTS with the NEON bit-sliced implementation was only 22 MB/s
encryption and 19 MB/s decryption.
There are multiple variants of Speck. This patch only adds support for
Speck128/256, which is the variant with a 128-bit block size and 256-bit
key size -- the same as AES-256. This is believed to be the most secure
variant of Speck, and it's only about 6% slower than Speck128/128.
Speck64/128 would be at least 20% faster because it has 20% fewer
rounds, and
it can be even faster on CPUs that can't efficiently do the 64-bit
operations needed for Speck128. However, Speck64's 64-bit block size is
not preferred security-wise. ARM NEON also supports the needed 64-bit
operations even on 32-bit CPUs, resulting in Speck128 being fast enough
for our targeted use cases so far.
The chosen modes of operation are XTS for contents and CTS-CBC for
filenames. These are the same modes of operation that fscrypt defaults
to for AES. Note that as with the other fscrypt modes, Speck will not
be used unless userspace chooses to use it. Nor are any of the existing
modes (which are all AES-based) being removed, of course.
We intentionally don't make CONFIG_FS_ENCRYPTION select
CONFIG_CRYPTO_SPECK, so people will have to enable Speck support
themselves if they need it. This is because we shouldn't bloat the
FS_ENCRYPTION dependencies with every new cipher, especially ones that
aren't recommended for most users. Moreover, CRYPTO_SPECK is just the
generic implementation, which won't be fast enough for many users; in
practice, they'll need to enable CRYPTO_SPECK_NEON to get acceptable
performance.
More details about our choice of Speck can be found in our patches that
added Speck to the crypto API, and the follow-on discussion threads.
We're planning a publication that explains the choice in more detail.
But briefly, we can't use ChaCha20 as we previously proposed, since it
would be insecure to use a stream cipher in this context, with potential
IV reuse during writes on f2fs and/or on wear-leveling flash storage.
We also evaluated many other lightweight and/or ARX-based block ciphers
such as Chaskey-LTS, RC5, LEA, CHAM, Threefish, RC6, NOEKEON, SPARX, and
XTEA. However, all had disadvantages vs. Speck, such as insufficient
performance with NEON, much less published cryptanalysis, or an
insufficient security level. Various design choices in Speck make it
perform better with NEON than competing ciphers while still having a
security margin similar to AES, and in the case of Speck128 also the
same available security levels. Unfortunately, Speck does have some
political baggage attached -- it's an NSA designed cipher, and was
rejected from an ISO standard (though for context, as far as I know none
of the above-mentioned alternatives are ISO standards either).
Nevertheless, we believe it is a good solution to the problem from a
technical perspective.
Certain algorithms constructed from ChaCha or the ChaCha permutation,
such as MEM (Masked Even-Mansour) or HPolyC, may also meet our
performance requirements. However, these are new constructions that
need more time to receive the cryptographic review and acceptance needed
to be confident in their security. HPolyC hasn't been published yet,
and we are concerned that MEM makes stronger assumptions about the
underlying permutation than the ChaCha stream cipher does. In contrast,
the XTS mode of operation is relatively well accepted, and Speck has
over 70 cryptanalysis papers. Of course, these ChaCha-based algorithms
can still be added later if they become ready.
The best known attack on Speck128/256 is a differential cryptanalysis
attack on 25 of 34 rounds with 2^253 time complexity and 2^125 chosen
plaintexts, i.e. only marginally faster than brute force. There is no
known attack on the full 34 rounds.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Currently the key derivation function in fscrypt uses the master key
length as the amount of output key material to derive. This works, but
it means we can waste time deriving more key material than is actually
used, e.g. most commonly, deriving 64 bytes for directories which only
take a 32-byte AES-256-CTS-CBC key. It also forces us to validate that
the master key length is a multiple of AES_BLOCK_SIZE, which wouldn't
otherwise be necessary.
Fix it to derive only the needed key length.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Refactor the confusingly-named function 'validate_user_key()' into a new
function 'find_and_derive_key()' which first finds the keyring key, then
does the key derivation. Among other benefits this avoids the strange
behavior we had previously where if key derivation failed for some
reason, then we would fall back to the alternate key prefix. Now, we'll
only fall back to the alternate key prefix if a valid key isn't found.
This patch also improves the warning messages that are logged when the
keyring key's payload is invalid.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Use a common function for fscrypt warning and error messages so that all
the messages are consistently ratelimited, include the "fscrypt:"
prefix, and include the filesystem name if applicable.
Also fix up a few of the log messages to be more descriptive.
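A minimal sketch of what such a helper can look like (the name and
exact format are assumptions based on this description, not necessarily
the function as merged):

	/* One ratelimited entry point for all fscrypt messages, optionally
	 * prefixed with the filesystem name.
	 */
	__printf(3, 4)
	static void fscrypt_msg(struct super_block *sb, const char *level,
				const char *fmt, ...)
	{
		static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
					      DEFAULT_RATELIMIT_BURST);
		struct va_format vaf;
		va_list args;

		if (!__ratelimit(&rs))
			return;

		va_start(args, fmt);
		vaf.fmt = fmt;
		vaf.va = &args;
		if (sb)
			printk("%sfscrypt (%s): %pV\n", level, sb->s_id, &vaf);
		else
			printk("%sfscrypt: %pV\n", level, &vaf);
		va_end(args);
	}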
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
With one exception, the internal key size constants such as
FS_AES_256_XTS_KEY_SIZE are only used for the 'available_modes' array,
where they really only serve to obfuscate what the values are. Also
some of the constants are unused, and the key sizes tend to be in the
names of the algorithms anyway. In the past these values were also
misused, e.g. we used to have FS_AES_256_XTS_KEY_SIZE in places that
technically should have been FS_MAX_KEY_SIZE.
The exception is that FS_AES_128_ECB_KEY_SIZE is used for key
derivation. But it's more appropriate to use
FS_KEY_DERIVATION_NONCE_SIZE for that instead.
Thus, just put the sizes directly in the 'available_modes' array.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
We're passing 'key_type_logon' to request_key(), so the found key is
guaranteed to be of type "logon". Thus, there is no reason to check
later that the key is really a "logon" key.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Now ->max_namelen() is only called to limit the filename length when
adding NUL padding, and only for real filenames -- not symlink targets.
It also didn't give the correct length for symlink targets anyway since
it forgot to subtract 'sizeof(struct fscrypt_symlink_data)'.
Thus, change ->max_namelen from a function to a simple 'unsigned int'
that gives the filesystem's maximum filename length.
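A minimal sketch of the resulting interface (other fscrypt_operations
members omitted):

	/* ->max_namelen is now a constant limit instead of a callback */
	struct fscrypt_operations {
		/* ... other members omitted ... */
		unsigned int max_namelen;  /* maximum filename length, in bytes */
	};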
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
fname_decrypt() is validating that the encrypted filename is nonempty.
However, earlier a stronger precondition was already enforced: the
encrypted filename must be at least 16 (FS_CRYPTO_BLOCK_SIZE) bytes.
Drop the redundant check for an empty filename.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
fname_decrypt() returns an error if the input filename is longer than
the inode's ->max_namelen() as given by the filesystem. But, this
doesn't actually make sense because the filesystem provided the input
filename in the first place, where it was subject to the filesystem's
limits. And fname_decrypt() has no internal limit itself.
Thus, remove this unnecessary check.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
In fscrypt_setup_filename(), remove the unnecessary check for
fscrypt_get_encryption_info() returning EOPNOTSUPP. There's no reason
to handle this error differently from any other. I think there may have
been some confusion because the "notsupp" version of
fscrypt_get_encryption_info() returns EOPNOTSUPP -- but that's not
applicable from inside fs/crypto/.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
fscrypt is clearing the flags on the crypto_skcipher it allocates for
each inode. But, this is unnecessary and may cause problems in the
future because it will even clear flags that are meant to be internal to
the crypto API, e.g. CRYPTO_TFM_NEED_KEY.
Remove the unnecessary flag clearing.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>