We do not want to dirty the dentry->d_flags cacheline in dput() just to
set the DCACHE_REFERENCED flag when it is already set in the common case
anyway. This way the first cacheline of the dentry (which contains the
RCU lookup information etc) can stay shared among multiple CPU's.
This finishes off some of the details of all the scalability patches
merged during the merge window.
Also don't mark dentry_kill() for inlining, since it's the uncommon path
and inlining it just makes the common path slower due to extra function
entry/exit overhead.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
At xfs_iext_add(), if extent(s) are being appended to the last page in
the indirection array and the new extent(s) don't fit in the page, the
number of extents(erp->er_extcount) in a new allocated entry should be
the minimum value between count and XFS_LINEAR_EXTS, instead of count.
For now, there is no existing test case can demonstrates a problem with
the er_extcount being set incorrectly here, but it obviously like a bug.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
This reverts commit 05713082ab. The idea
to remove __GFP_NOFAIL was opposed by Andrew Morton. Although mm guys do
want to get rid of __GFP_NOFAIL users, opencoding the allocation retry
is even worse.
See emails following
http://www.gossamer-threads.com/lists/linux/kernel/1809153#1809153
Signed-off-by: Jan Kara <jack@suse.cz>
commit 6686390bab (NFS: remove incorrect "Lock reclaim failed!"
warning.) added a test for a delegation before checking to see if any
reclaimed locks failed. The test however is backward and is only doing
that check when a delegation is held instead of when one isn't.
Cc: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Fixes: 6686390bab: NFS: remove incorrect "Lock reclaim failed!" warning.
Cc: stable@vger.kernel.org # 3.12
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
A bg that's been flagged "corrupt" by definition has no free blocks,
so that the allocator won't be tempted to use the damaged bg.
Therefore, we shouldn't count the clusters in the damaged group when
calculating free counts.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Previously, f2fs_sync_file() waits for all the node blocks to be written.
But, we don't need to do that, but wait only the inode-related node blocks.
This patch adds wait_on_node_pages_writeback() in which waits inode-related
node blocks that are on writeback.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
The server does allow NFS over v4.2, even if it doesn't add any new
operations yet.
I also switch to using constants to represent the last operation for
each minor version since this makes the code cleaner and easier to
understand at a quick glance.
Signed-off-by: Anna Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Today, if xfs_sb_read_verify encounters a v4 superblock
with junk past v4 fields which includes data in sb_crc,
it will be treated as a failing checksum and a significant
corruption.
There are known prior bugs which leave junk at the end
of the V4 superblock; we don't need to actually fail the
verification in this case if other checks pan out ok.
So if this is a secondary superblock, and the primary
superblock doesn't indicate that this is a V5 filesystem,
don't treat this as an actual checksum failure.
We should probably check the garbage condition as
we do in xfs_repair, and possibly warn about it
or self-heal, but that's a different scope of work.
Stable folks: This can go back to v3.10, which is what
introduced the sb CRC checking that is tripped up by old,
stale, incorrect V4 superblocks w/ unzeroed bits.
Cc: stable@vger.kernel.org
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
In xlog_verify_iclog a debug check of the incore log buffers prints an
error if icptr is null and then goes on to dereference the pointer
regardless. Convert this to an assert so that the intention is clear.
This was reported by Coverty.
Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
ASSERT on args takes place after args dereference.
This assertion is redundant since we are going to panic anyway.
Found by Linux Driver Verification project (linuxtesting.org) -
PVS-Studio analyzer.
Signed-off-by: Denis Efremov <yefremov.denis@gmail.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Page cache allocation doesn't always go through ->begin_write and
hence we don't always get the opportunity to set the allocation
context to GFP_NOFS. Failing to do this means we open up the direct
relcaim stack to recurse into the filesystem and consume a
significant amount of stack.
On RHEL6.4 kernels we are seeing ra_submit() and
generic_file_splice_read() from an nfsd context recursing into the
filesystem via the inode cache shrinker and evicting inodes. This is
causing truncation to be run (e.g EOF block freeing) and causing
bmap btree block merges and free space btree block splits to occur.
These btree manipulations are occurring with the call chain already
30 functions deep and hence there is not enough stack space to
complete such operations.
To avoid these specific overruns, we need to prevent the page cache
allocation from recursing via direct reclaim. We can do that because
the allocation functions take the allocation context from that which
is stored in the mapping for the inode. We don't set that right now,
so the default is GFP_HIGHUSER_MOVABLE, which is effectively a
GFP_KERNEL context. We need it to be the equivalent of GFP_NOFS, so
when we initialise an inode, set the mapping gfp mask appropriately.
This makes the use of AOP_FLAG_NOFS redundant from other parts of
the XFS IO path, so get rid of it.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
The kbuild test robot indicated that there were some new sparse
warnings in fs/xfs/xfs_dquot_buf.c. Actually, there were a lot more
that is wasn't warning about, so fix them all up.
Reported-by: kbuild test robot
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
The directory block format verifier fails to check that the leaf
entry count is in a valid range, and so if it is corrupted then it
can lead to derefencing a pointer outside the block buffer. While we
can't exactly validate the count without first walking the directory
block, we can ensure the count lands in the valid area within the
directory block and hence avoid out-of-block references.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Rather than hiding the ftype field size accounting inside the dirent
padding for the ".." and first entry offset functions for v2
directory formats, add explicit functions that calculate it
correctly.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Many of the vectorised function calls now take no parameters and
return a constant value. There is no reason for these to be vectored
functions, so convert them to constants
Binary sizes:
text data bss dec hex filename
794490 96802 1096 892388 d9de4 fs/xfs/xfs.o.orig
792986 96802 1096 890884 d9804 fs/xfs/xfs.o.p1
792350 96802 1096 890248 d9588 fs/xfs/xfs.o.p2
789293 96802 1096 887191 d8997 fs/xfs/xfs.o.p3
789005 96802 1096 886903 d8997 fs/xfs/xfs.o.p4
789061 96802 1096 886959 d88af fs/xfs/xfs.o.p5
789733 96802 1096 887631 d8b4f fs/xfs/xfs.o.p6
791421 96802 1096 889319 d91e7 fs/xfs/xfs.o.p7
791701 96802 1096 889599 d92ff fs/xfs/xfs.o.p8
791205 96802 1096 889103 d91cf fs/xfs/xfs.o.p9
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Next step in the vectorisation process is the directory free block
encode/decode operations. There are relatively few of these, though
there are quite a number of calls to them.
Binary sizes:
text data bss dec hex filename
794490 96802 1096 892388 d9de4 fs/xfs/xfs.o.orig
792986 96802 1096 890884 d9804 fs/xfs/xfs.o.p1
792350 96802 1096 890248 d9588 fs/xfs/xfs.o.p2
789293 96802 1096 887191 d8997 fs/xfs/xfs.o.p3
789005 96802 1096 886903 d8997 fs/xfs/xfs.o.p4
789061 96802 1096 886959 d88af fs/xfs/xfs.o.p5
789733 96802 1096 887631 d8b4f fs/xfs/xfs.o.p6
791421 96802 1096 889319 d91e7 fs/xfs/xfs.o.p7
791701 96802 1096 889599 d92ff fs/xfs/xfs.o.p8
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Conversion from on-disk structures to in-core header structures
currently relies on magic number checks. If the magic number is
wrong, but one of the supported values, we do the wrong thing with
the encode/decode operation. Split these functions so that there are
discrete operations for the specific directory format we are
handling.
In doing this, move all the header encode/decode functions to
xfs_da_format.c as they are directly manipulating the on-disk
format. It should be noted that all the growth in binary size is
from xfs_da_format.c - the rest of the code actaully shrinks.
text data bss dec hex filename
794490 96802 1096 892388 d9de4 fs/xfs/xfs.o.orig
792986 96802 1096 890884 d9804 fs/xfs/xfs.o.p1
792350 96802 1096 890248 d9588 fs/xfs/xfs.o.p2
789293 96802 1096 887191 d8997 fs/xfs/xfs.o.p3
789005 96802 1096 886903 d8997 fs/xfs/xfs.o.p4
789061 96802 1096 886959 d88af fs/xfs/xfs.o.p5
789733 96802 1096 887631 d8b4f fs/xfs/xfs.o.p6
791421 96802 1096 889319 d91e7 fs/xfs/xfs.o.p7
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
The remaining non-vectorised code for the directory structure is the
node format blocks. This is shared with the attribute tree, and so
is slightly more complex to vectorise.
Introduce a "non-directory" directory ops structure that is attached
to all non-directory inodes so that attribute operations can be
vectorised for all inodes.
Once we do this, we can vectorise all the da btree operations.
Because this patch adds more infrastructure than it removes the
binary size does not decrease:
text data bss dec hex filename
794490 96802 1096 892388 d9de4 fs/xfs/xfs.o.orig
792986 96802 1096 890884 d9804 fs/xfs/xfs.o.p1
792350 96802 1096 890248 d9588 fs/xfs/xfs.o.p2
789293 96802 1096 887191 d8997 fs/xfs/xfs.o.p3
789005 96802 1096 886903 d8997 fs/xfs/xfs.o.p4
789061 96802 1096 886959 d88af fs/xfs/xfs.o.p5
789733 96802 1096 887631 d8b4f fs/xfs/xfs.o.p6
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Next step in the vectorisation process is the leaf block
encode/decode operations. Most of the operations on leaves are
handled by the data block vectors, so there are relatively few of
them here.
Because of all the shuffling of code and having to pass more state
to some functions, this patch doesn't directly reduce the size of
the binary. It does open up many more opportunities for factoring
and optimisation, however.
text data bss dec hex filename
794490 96802 1096 892388 d9de4 fs/xfs/xfs.o.orig
792986 96802 1096 890884 d9804 fs/xfs/xfs.o.p1
792350 96802 1096 890248 d9588 fs/xfs/xfs.o.p2
789293 96802 1096 887191 d8997 fs/xfs/xfs.o.p3
789005 96802 1096 886903 d8997 fs/xfs/xfs.o.p4
789061 96802 1096 886959 d88af fs/xfs/xfs.o.p5
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
Convert the rest of the directory data block encode/decode
operations to vector format.
This further reduces the size of the built binary:
text data bss dec hex filename
794490 96802 1096 892388 d9de4 fs/xfs/xfs.o.orig
792986 96802 1096 890884 d9804 fs/xfs/xfs.o.p1
792350 96802 1096 890248 d9588 fs/xfs/xfs.o.p2
789293 96802 1096 887191 d8997 fs/xfs/xfs.o.p3
789005 96802 1096 886903 d8997 fs/xfs/xfs.o.p4
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
Following from the initial patches to vectorise the shortform
directory encode/decode operations, convert half the data block
operations to use the vector. The rest will be done in a second
patch.
This further reduces the size of the built binary:
text data bss dec hex filename
794490 96802 1096 892388 d9de4 fs/xfs/xfs.o.orig
792986 96802 1096 890884 d9804 fs/xfs/xfs.o.p1
792350 96802 1096 890248 d9588 fs/xfs/xfs.o.p2
789293 96802 1096 887191 d8997 fs/xfs/xfs.o.p3
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
Following from the initial patch to introduce the directory
operations vector, convert the rest of the shortform directory
operations to use vectored ops rather than superblock feature
checks. This further reduces the size of the built binary:
text data bss dec hex filename
794490 96802 1096 892388 d9de4 fs/xfs/xfs.o.orig
792986 96802 1096 890884 d9804 fs/xfs/xfs.o.p1
792350 96802 1096 890248 d9588 fs/xfs/xfs.o.p2
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
Lots of the dir code now goes through switches to determine what is
the correct on-disk format to parse. It generally involves a
"xfs_sbversion_hasfoo" check, deferencing the superblock version and
feature fields and hence touching several cache lines per operation
in the process. Some operations do multiple checks because they nest
conditional operations and they don't pass the information in a
direct fashion between each other.
Hence, add an ops vector to the xfs_inode structure that is
configured when the inode is initialised to point to all the correct
decode and encoding operations. This will significantly reduce the
branchiness and cacheline footprint of the directory object decoding
and encoding.
This is the first patch in a series of conversion patches. It will
introduce the ops structure, the setup of it and add the first
operation to the vector. Subsequent patches will convert directory
ops one at a time to keep the changes simple and obvious.
Just this patch shows the benefit of such an approach on code size.
Just converting the two shortform dir operations as this patch does
decreases the built binary size by ~1500 bytes:
$ size fs/xfs/xfs.o.orig fs/xfs/xfs.o.p1
text data bss dec hex filename
794490 96802 1096 892388 d9de4 fs/xfs/xfs.o.orig
792986 96802 1096 890884 d9804 fs/xfs/xfs.o.p1
$
That's a significant decrease in the instruction cache footprint of
the directory code for such a simple change, and indicates that this
approach is definitely worth pursuing further.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
Both POSIX.1-2008 and Linux Programmer's Manual have a dedicated return
error code for a case, when a file doesn't support mmap(), it's ENODEV.
This change replaces overloaded EINVAL with ENODEV in a situation
described above for sysfs binary files.
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When using FITRIM ioctl on a file system without journal it will
only trim the block group once, no matter how many times you invoke
FITRIM ioctl and how many block you release from the block group.
It is because we only clear EXT4_GROUP_INFO_WAS_TRIMMED_BIT in journal
callback. Fix this by clearing the bit in no journal mode as well.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Jorge Fábregas <jorge.fabregas@gmail.com>
A comment claims the caller should take it, but that's not being done.
Note we don't want it around the cancel_delayed_work_sync since that may
wait on work which holds the client lock.
Reported-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This reverts commit 9745cdb36d (select: use freezable blocking call)
that triggers problems during resume from suspend to RAM on Paul Bolle's
32-bit x86 machines. Paul says:
Ever since I tried running (release candidates of) v3.11 on the two
working i686s I still have lying around I ran into issues on resuming
from suspend. Reverting 9745cdb36d (select: use freezable blocking
call) resolves those issues.
Resuming from suspend on i686 on (release candidates of) v3.11 and
later triggers issues like:
traps: systemd[1] general protection ip:b738e490 sp:bf882fc0 error:0 in libc-2.16.so[b731c000+1b0000]
and
traps: rtkit-daemon[552] general protection ip:804d6e5 sp:b6cb32f0 error:0 in rtkit-daemon[8048000+d000]
Once I hit the systemd error I can only get out of the mess that the
system is at that point by power cycling it.
Since we are reverting another freezer-related change causing similar
problems to happen, this one should be reverted as well.
References: https://lkml.org/lkml/2013/10/29/583
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Fixes: 9745cdb36d (select: use freezable blocking call)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 3.11+ <stable@vger.kernel.org> # 3.11+
This reverts commit 1c441e9212 (epoll: use freezable blocking call)
which is reported to cause user space memory corruption to happen
after suspend to RAM.
Since it appears to be extremely difficult to root cause this
problem, it is best to revert the offending commit and try to address
the original issue in a better way later.
References: https://bugzilla.kernel.org/show_bug.cgi?id=61781
Reported-by: Natrio <natrio@list.ru>
Reported-by: Jeff Pohlmeyer <yetanothergeek@gmail.com>
Bisected-by: Leo Wolf <jclw@ymail.com>
Fixes: 1c441e9212 (epoll: use freezable blocking call)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 3.11+ <stable@vger.kernel.org> # 3.11+
We were using a different array of function pointers to represent each
minor version. This makes adding a new minor version tedious, since it
needs a step to copy, paste and modify a new version of the same
functions.
This patch combines the v4 and v4.1 arrays into a single instance and
will check minor version support inside each decoder function.
Signed-off-by: Anna Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In ext4_read_inline_dir(), if there is inline data, the successful
return value is the return value of ext4_read_inline_data(). Howewer,
this is used by ext4_readdir(), and while it seems harmless to return
a positive value on success, it's inconsistent, since historically
we've always return 0 on success.
Signed-off-by: BoxiLiu <lewis.liulei@huawei.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Tao Ma <boyu.mt@taobao.com>
Pair the two trace events to make troubeshooting writepages
easier, and it should be more convinient to write a simple script
to parse the traces.
Cc: linux-ext4@vger.kernel.org
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Previously, check_block_count check valid_map with bit data type in common
scenario that sit has all ones or zeros bitmap, it makes low mount performance.
So let's check the special bitmap with integer data type instead of the bit one.
v1-->v2:
o use find_next_{zero_}bit_le for better performance and readable as Jaegeuk
suggested.
o use neat logogram in comment as Gu Zheng suggested.
o search continuous ones or zeros for better performance when checking mixed
bitmap.
Suggested-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Signed-off-by: Shu Tan <shu.tan@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
npages_for_summary_flush uses (SUMMARY_SIZE + 1) as the size of a f2fs_summary
while its actual size is SUMMARY_SIZE. So the result sometimes is bigger than
actual number by one, which causes checkpoint can't be written into disk
contiguously, and sometimes summary blocks can't be compacted like they should.
Besides, when writing summary blocks into pages, if remain space in a page
isn't big enough for one f2fs_summary, it will be left unused, current code
seems not to take it into account.
Signed-off-by: Fan Li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Separate out sysfs_warn_dup() out of sysfs_add_one(). This will help
separating out the core sysfs functionalities into kernfs so that it
can be used by non-sysfs users too.
This doesn't make any functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Most removal related logic is implemented in fs/sysfs/dir.c. Move
sysfs_hash_and_remove() to fs/sysfs/dir.c so that __sysfs_remove()
doesn't have to be public.
This is pure relocation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs_get_dentry() has been gone for years now. Remove the left-over
prototype.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ignore_lockdep is currently honored only for regular files. There's
no reason to ignore it for bin files. Update sysfs_ignore_lockdep()
so that bin_attr.attr.ignore_lockdep works too.
While this doesn't have any in-kernel user, this unifies the behaviors
between regular and bin files and will help later changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3124eb1679 ("sysfs: merge regular and bin file handling") folded bin
file handling into regular file handling. Among other things, bin
file now shares the same open path including sysfs_open_dirent
association using sysfs_dirent->s_attr.open. This is buggy because
->s_bin_attr lives in the same union and doesn't have the field. This
bug doesn't trigger because sysfs_elem_bin_attr doesn't have an active
field at the conflicting position. It does have a field "buffers" but
it isn't used anymore.
This patch collapses sysfs_elem_bin_attr into sysfs_elem_attr so that
the bin_attr is accessed through ->s_attr.bin_attr which lives with
->s_attr.attr in an anonymous union. The code paths already assume
bin_attr contains attr as the first element, so this doesn't add any
more assumptions while making it explicit that the two types are
handled together.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we're going to refuse to accept these it would be polite of us to at
least say so....
This introduces a slight complication since we need to grandfather in
exportfs's ill-advised use of -1 uid and gid on its test_export.
If it turns out there are other users passing down -1 we may need to
do something else.
Best might be to drop the checks entirely, but I'm not sure if other
parts of the kernel might assume that a task can't run as uid or gid -1.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Someone noticed exportfs happily accepted exports that would later be
rejected when mountd tried to give them to the kernel. Fix this.
This is a regression from 4c1e1b34d5
"nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids".
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: stable@vger.kernel.org
Reported-by: Yin.JianHong <jiyin@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The reporter saw a NULL dereference when a filesystem's ->mknod returned
success but left the dentry negative, and then nfsd tried to dereference
d_inode (in this case because the CREATE was followed by a GETATTR in
the same nfsv4 compound).
fh_update already checks for this and another broken case, but for some
reason it returns success and leaves nfsd trying to soldier on. If it
failed we'd avoid the crash. There's only so much we can do with a
buggy filesystem, but it's easy enough to bail out here, so let's do
that.
Reported-by: Antti Tönkyrä <daedalus@pingtimeout.net>
Tested-by: Antti Tönkyrä <daedalus@pingtimeout.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[use list_splice_init]
Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
[bfields: no need for recall_lock here]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
idr_remove is about to be called before kmem_cache_free so unhashing it
is redundant
Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
During xattr updating, free size should be corrected to remainder free size
+ old entry size.
It can avoid ENOSPC error when we update old entry with the same size new
entry at fully filled xattr.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
If you want to remove unnecessary BUG_ONs, you can just turn off F2FS_CHECK_FS
in your kernel config.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This config will support an option to remove so many BUG_ONs that degrade
the performance potentially.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Pull fs-cache fixes from David Howells:
Can you pull these commits to fix an issue with NFS whereby caching can be
enabled on a file that is open for writing by subsequently opening it for
reading. This can be made to crash by opening it for writing again if you're
quick enough.
The gist of the patchset is that the cookie should be acquired at inode
creation only and subsequently enabled and disabled as appropriate (which
dispenses with the backing objects when they're not needed).
The extra synchronisation that NFS does can then be dispensed with as it is
thenceforth managed by FS-Cache.
Could you send these on to Linus?
This likely will need fixing also in CIFS and 9P also once the FS-Cache
changes are upstream. AFS and Ceph are probably safe.
* 'fscache' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
NFS: Use i_writecount to control whether to get an fscache cookie in nfs_open()
FS-Cache: Provide the ability to enable/disable cookies
FS-Cache: Add use/unuse/wake cookie wrappers
This check was added by Al Viro with
d9e80b7de9 "nfs d_revalidate() is too
trigger-happy with d_drop()", with the explanation that we don't want to
remove the root of a disconnected tree, which will still be included on
the s_anon list.
But DCACHE_DISCONNECTED does *not* actually identify dentries that are
disconnected from the dentry tree or hashed on s_anon. IS_ROOT() is the
way to do that.
Also add a comment from Al's commit to remind us why this check is
there.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>