Currently the code takes care to ensure that the prefixpath has a
leading '/' delimiter. What if someone passes us a prefixpath with a
leading '\\' instead? The code doesn't properly handle that currently
AFAICS.
Let's just change the code to skip over any leading delimiter character
when copying the prepath. Then, fix up the users of the prepath option
to prefix it with the correct delimiter when they use it.
Also, there's no need to limit the length of the prefixpath to 1k. If
the server can handle it, why bother forbidding it?
Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Make sure we free any existing memory allocated for vol->UNC, just in
case someone passes in multiple unc= options.
Get rid of the check for too long a UNC. The check for >300 bytes seems
arbitrary. We later copy this into the tcon->treeName, for instance and
it's a lot shorter than 300 bytes.
Eliminate an extra kmalloc and copy as well. Just set the vol->UNC
directly with the contents of match_strdup.
Establish that the UNC should be stored with '\\' delimiters. Use
convert_delimiter to change it in place in the vol->UNC.
Finally, move the check for a malformed UNC into
cifs_parse_mount_options so we can catch that situation earlier.
Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
The authority fields are supposed to be represented by a single 48-bit
value. It's also supposed to represent the value as hex if it's equal to
or greater than 2^32. This is documented in MS-DTYP, section 2.4.2.1.
Also, fix up the max string length to account for this fix.
Acked-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
If the server sends us a target that looks like an outlier, but
is lower than the existing target, then respect it anyway.
However defer actually updating the generation counter until
we get a target that doesn't look like an outlier.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
if ore_write() fails, we would unlock the pages of pcol, which is now
empty, rather than pcol_copy which owns the pages when ore_write() is
called. this means that no pages will actually be unlocked
(pcol.nr_pages == 0) and the writing process (more accurately, the
syncing process) will hang waiting for a writeback notification that
never comes.
moreover, if ore_write() fails, pcol_free() is called for pcol, whereas
pcol_copy is the object owning the ore_io_state, thus leaking the
ore_io_state.
[Boaz]
I have simplified Idan's original patch a bit, everything else still
holds
Signed-off-by: Idan Kedar <idank@tonian.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Most (all) NFS4ERR_BADSLOT errors are due to the client failing to
respect the server's sr_highest_slotid limit. This mainly happens
due to reordered RPC requests.
The way to handle it is simply to drop the slot that we're using,
and retry using the new highest_slotid limits.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Jian reported that the following sequence would leave "testfile" with
corrupt data:
# mount localhost:/export /mnt/nfs/ -o vers=3
# echo abc > /mnt/nfs/testfile; echo def >> /export/testfile; echo ghi >> /mnt/nfs/testfile
# cat -v /export/testfile
abc
^@^@^@^@ghi
While there's no locking involved here, the operations are serialized,
so CTO should prevent corruption.
The first write to the file is fine and writes 4 bytes. The file is then
extended on the server. When it's reopened a GETATTR is issued and the
size change is noticed. This causes NFS_INO_INVALID_DATA to be set on
the file. Because the file is opened for write only,
nfs_want_read_modify_write() returns 0 to nfs_write_begin().
nfs_updatepage then calls nfs_write_pageuptodate() to see if it should
extend the nfs_page to cover the whole page. NFS_INO_INVALID_DATA is
still set on the file at that point, but that flag is ignored and
nfs_pageuptodate erroneously extends the write to cover the whole page,
with the write done on the server side filled in with zeroes.
This patch just has that function check for NFS_INO_INVALID_DATA in
addition to NFS_INO_REVAL_PAGECACHE. This fixes the bug, but looking
over the code, I wonder if we might have a similar bug in
nfs_revalidate_size(). The difference between those two flags is very
subtle, so it seems like we ought to be checking for
NFS_INO_INVALID_DATA in most of the places that we look for
NFS_INO_REVAL_PAGECACHE.
I believe this is regression introduced by commit 8d197a568. The code
did check for NFS_INO_INVALID_DATA prior to that patch.
Original bug report is here:
https://bugzilla.redhat.com/show_bug.cgi?id=885743
Cc: <stable@vger.kernel.org> # 3.5+
Reported-by: Jian Li <jiali@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Commit 1f1ea6c "NFSv4: Fix buffer overflow checking in
__nfs4_get_acl_uncached" accidently dropped the checking for too small
result buffer length.
If someone uses getxattr on "system.nfs4_acl" on an NFSv4 mount
supporting ACLs, the ACL has not been cached and the buffer suplied is
too short, we still copy the complete ACL, resulting in kernel and user
space memory corruption.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Not all architectures (in particular, sparc64) have empty_zero_page.
So instead of copying from empty_zero_page, use memset to clear the
inline data by signalling to ext4_xattr_set_entry() via a magic
pointer value, EXT4_ZERO_ATTR_VALUE, which is defined by casting -1 to
a pointer.
This fixes a build failure on sparc64, and the memset() should be more
efficient than using memcpy() anyway.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Flags being used by atomic operations in inode flags (e.g.
ext4_test_inode_flag(), should be consistent with that actually stored
in inodes, i.e.: EXT4_XXX_FL.
It ensures that this consistency is checked at build-time, not at
run-time.
Currently, the flags consistency are being checked at run-time, but,
there is no real reason to not do a build-time check instead of a
run-time check. The code is comparing macro defined values with enum
type variables, where both are constants, so, there is no problem in
comparing constants at build-time.
enum variables are treated as constants by the C compiler, according
to the C99 specs (see www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
sec. 6.2.5, item 16), so, there is no real problem in comparing an
enumeration type at build time
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Ted has sent out a RFC about removing this feature. Eric and Jan
confirmed that both RedHat and SUSE enable this feature in all their
product. David also said that "As far as I know, it's enabled in all
Android kernels that use ext4." So it seems OK for us.
And what's more, as inline data depends its implementation on xattr,
and to be frank, I don't run any test again inline data enabled while
xattr disabled. So I think we should add inline data and remove this
config option in the same release.
[ The savings if you disable CONFIG_EXT4_FS_XATTR is only 27k, which
isn't much in the grand scheme of things. Since no one seems to be
testing this configuration except for some automated compile farms, on
balance we are better removing this config option, and so that it is
effectively always enabled. -- tytso ]
Cc: David Brown <davidb@codeaurora.org>
Cc: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
We use kzalloc() to allocate sbi, no need to zero its field.
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
inode_init_always() will initialize inode->i_data.writeback_index
anyway, no need to do this in ext4_alloc_inode().
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
We have a dedicated interface to sync inode metadata. Use it to
simplify ext4's code some.
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
If we are punching hole in a file, we will return ENOTSUPP.
As for the fallocation of some extents, we will convert the
inline data to a normal extent based file first.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Now we that store data in the inode, in case we need to store some
xattrs and inode doesn't have enough space, Andreas suggested that we
should keep the xattr(metadata) in and data should be pushed out. So
this patch does the work.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
fiemap is used to find the disk layout of a file, as for inline data,
let us just pretend like a file with just one extent.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In case we rename a directory, ext4_rename has to read the dir block
and change its dotdot's information. The old ext4_rename encapsulated
the dir_block read into itself. So this patch adds a new function
ext4_get_first_dir_block() which gets the dir buffer information so
the ext4_rename can handle it properly. As it will also change the
parent inode number, we return the parent_de so that ext4_rename() can
handle it more easily.
ext4_find_entry is also changed so that the caller(rename) can tell
whether the found entry is an inlined one or not and journaling the
corresponding buffer head.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
empty_dir is used when deleting a dir. So it should handle inline dir
properly.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Currently ext4_delete_entry() is used only for dir entry removing from
a dir block. So let us create a new function
ext4_generic_delete_entry and this function takes a entry_buf and a
buf_size so that it can be used for inline data.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Create a new function ext4_find_inline_entry() to handle the case of
inline data.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
search_dirblock is used to search a dir block, but the code is almost
the same for searching an inline dir.
So create a new fuction search_dir and let search_dirblock call it.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
For "." and "..", we just call filldir by ourselves
instead of iterating the real dir entry.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This patch let add_dir_entry handle the inline data case. So the
dir is initialized as inline dir first and then we can try to add
some files to it, when the inline space can't hold all the entries,
a dir block will be created and the dir entry will be moved to it.
Also for an inlined dir, "." and ".." are removed and we only use
4 bytes to store the parent inode number. These 2 entries will be
added when we convert an inline dir to a block-based one.
[ Folded in patch from Dan Carpenter to remove an unused variable. ]
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The old add_dirent_to_buf handles all the work related to the
work of adding dir entry to a dir block. Now we have inline data,
so create 2 new function __ext4_find_dest_de and __ext4_insert_dentry
that do the real work and let add_dirent_to_buf call them.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The __ext4_check_dir_entry() function() is used to check whether the
de is over the block boundary. Now with inline data, it could be
within the block boundary while exceeds the inode size. So check this
function to check the overflow more precisely.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Currently, the initialization of dot and dotdot are encapsulated in
ext4_mkdir and also bond with dir_block. So create a new function
named ext4_init_new_dir and the initialization is moved to
ext4_init_dot_dotdot. Now it will called either in the normal non-inline
case(rec_len of ".." will cover the whole block) or when we converting an
inline dir to a block(rec len of ".." will be the real length). The start
of the next entry is also returned for inline dir usage.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
For delayed allocation mode, we write to inline data if the file
is small enough. And in case of we write to some offset larger
than the inline size, the 1st page is dirtied, so that
ext4_da_writepages can handle the conversion. When the 1st page
is initialized with blocks, the inline part is removed.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
For a normal write case (not journalled write, not delayed
allocation), we write to the inline if the file is small and convert
it to an extent based file when the write is larger than the max
inline size.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Let readpage and readpages handle the case when we want to read an
inlined file.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Implement inline data with xattr.
Now we use "system.data" to store xattr, and the xattr will
be extended if the i_size is increased while we don't release
the space during truncate.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
SMB2 and later will return only 1 credit for session setup (phase 1)
not just for the negotiate protocol response. Do not disable
echoes and oplocks on session setup (we only need one credit
for tree connection anyway) as a resonse with only 1 credit
on phase 1 of sessionsetup is expected.
Fixes the "CIFS VFS: disabling echoes and oplocks" message
logged to dmesg.
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Jeff Layton <jlayton@samba.org>
Restructure code to make SMB2 vs. SMB3 signing a protocol
specific op. SMB3 signing (AES_CMAC) is not enabled yet,
but this restructuring at least makes sure we don't send
an smb2 signature on an smb3 signed connection. A followon
patch will add AES_CMAC and enable smb3 signing.
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Jeff Layton <jlayton@samba.org>
A SID could potentially be embedded inside of payload.value if there are
no subauthorities, and the arch has 8 byte pointers. Allow for that
possibility there.
While we're at it, rephrase the "embedding" check in terms of
key->payload to allow for the possibility that the union might change
size in the future.
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
It was hardcoded to 192 bytes, which was not enough when the max number
of subauthorities went to 15. Redefine this constant in terms of sizeof
the structs involved, and rename it for better clarity.
While we're at it, remove a couple more unused constants from cifsacl.h.
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Now that we aren't so rigid about the length of the key being passed
in, we need to be a bit more rigorous about checking the length of
the actual data against the claimed length (a'la num_subauths field).
Check for the case where userspace sends us a seemingly valid key
with a num_subauths field that goes beyond the end of the array. If
that happens, return -EIO and invalidate the key.
Also change the other places where we check for malformed keys in this
code to invalidate the key as well.
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
The cifs.idmap keytype always allocates memory to hold the payload from
userspace. In the common case where we're translating a SID to a UID or
GID, we're allocating memory to hold something that's less than or equal
to the size of a pointer.
When the payload is the same size as a pointer or smaller, just store
it in the payload.value union member instead. That saves us an extra
allocation on the sid_to_id upcall.
Note that we have to take extra care to check the datalen when we
go to dereference the .data pointer in the union, but the callers
now check that as a matter of course anyway.
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
The cifs.idmap handling code currently causes the kernel to cache the
data from userspace twice. It first looks in a rbtree to see if there is
a matching entry for the given id. If there isn't then it calls
request_key which then checks its cache and then calls out to userland
if it doesn't have one. If the userland program establishes a mapping
and downcalls with that info, it then gets cached in the keyring and in
this rbtree.
Aside from the double memory usage and the performance penalty in doing
all of these extra copies, there are some nasty bugs in here too. The
code declares four rbtrees and spinlocks to protect them, but only seems
to use two of them. The upshot is that the same tree is used to hold
(eg) uid:sid and sid:uid mappings. The comparitors aren't equipped to
deal with that.
I think we'd be best off to remove a layer of caching in this code. If
this was originally done for performance reasons, then that really seems
like a premature optimization.
This patch does that -- it removes the rbtrees and the locks that
protect them and simply has the code do a request_key call on each call
into sid_to_id and id_to_sid. This greatly simplifies this code and
should roughly halve the memory utilization from using the idmapping
code.
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
The direct-IO write path already had the i_size checks in mm/filemap.c,
but it turns out the read path did not, and removing the block size
checks in fs/block_dev.c (commit bbec0270bd: "blkdev_max_block: make
private to fs/buffer.c") removed the magic "shrink IO to past the end of
the device" code there.
Fix it by truncating the IO to the size of the block device, like the
write path already does.
NOTE! I suspect the write path would be *much* better off doing it this
way in fs/block_dev.c, rather than hidden deep in mm/filemap.c. The
mm/filemap.c code is extremely hard to follow, and has various
conditionals on the target being a block device (ie the flag passed in
to 'generic_write_checks()', along with a conditional update of the
inode timestamp etc).
It is also quite possible that we should treat this whole block device
size as a "s_maxbytes" issue, and try to make the logic even more
generic. However, in the meantime this is the fairly minimal targeted
fix.
Noted by Milan Broz thanks to a regression test for the cryptsetup
reencrypt tool.
Reported-and-tested-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
by using cifs_invalidate_mapping rather than invalidate_remote_inode
in cifs_oplock_break - this invalidates all inode pages and resets
fscache cookies.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
We don't need to permit a write to the area locked with a read lock
by any process including the process that issues the write.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
I've legally changed my name with New York State, the US Social Security
Administration, et al. This patch propagates the name change and change
in initials and login to comments in the kernel source as well.
Signed-off-by: Nadia Yvette Chambers <nyc@holomorphy.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>