Convert w_dq_id to be a struct kquid and remove the now unncessary
w_dq_type.
This is a simple conversion and enough other places have already
been converted that this actually reduces the code complexity
by a little bit, when removing now unnecessary type conversions.
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Change struct dquot dq_id to a struct kqid and remove the now
unecessary dq_type.
Make minimal changes to dquot, quota_tree, quota_v1, quota_v2, ext3,
ext4, and ocfs2 to deal with the change in quota structures and
signatures. The ocfs2 changes are larger than most because of the
extensive tracing throughout the ocfs2 quota code that prints out
dq_id.
quota_tree.c:get_index is modified to take a struct kqid instead of a
qid_t because all of it's callers pass in dquot->dq_id and it allows
me to introduce only a single conversion.
The rest of the changes are either just replacing dq_type with dq_id.type,
adding conversions to deal with the change in type and occassionally
adding qid_eq to allow quota id comparisons in a user namespace safe way.
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Theodore Tso <tytso@mit.edu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Modify dqget to take struct kqid instead of a type and an identifier
pair.
Modify the callers of dqget in ocfs2 and dquot to take generate
a struct kqid so they can continue to call dqget. The conversion
to create struct kqid should all be the final conversions that
are needed in those code paths.
Cc: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Modify quota_send_warning to take struct kqid instead a type and
identifier pair.
When sending netlink broadcasts always convert uids and quota
identifiers into the intial user namespace. There is as yet no way to
send a netlink broadcast message with different contents to receivers
in different namespaces, so for the time being just map all of the
identifiers into the initial user namespace which preserves the
current behavior.
Change the callers of quota_send_warning in gfs2, xfs and dquot
to generate a struct kqid to pass to quota send warning. When
all of the user namespaces convesions are complete a struct kqid
values will be availbe without need for conversion, but a conversion
is needed now to avoid needing to convert everything at once.
Cc: Ben Myers <bpm@sgi.com>
Cc: Alex Elder <elder@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Update the quotactl user space interface to successfull compile with
user namespaces support enabled and to hand off quota identifiers to
lower layers of the kernel in struct kqid instead of type and qid
pairs.
The quota on function is not converted because while it takes a quota
type and an id. The id is the on disk quota format to use, which
is something completely different.
The signature of two struct quotactl_ops methods were changed to take
struct kqid argumetns get_dqblk and set_dqblk.
The dquot, xfs, and ocfs2 implementations of get_dqblk and set_dqblk
are minimally changed so that the code continues to work with
the change in parameter type.
This is the first in a series of changes to always store quota
identifiers in the kernel in struct kqid and only use raw type and qid
values when interacting with on disk structures or userspace. Always
using struct kqid internally makes it hard to miss places that need
conversion to or from the kernel internal values.
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Ben Myers <bpm@sgi.com>
Cc: Alex Elder <elder@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Add the data type struct kqid which holds the kernel internal form of
the owning identifier of a quota. struct kqid is a replacement for
the implicit union of uid, gid and project id stored in an unsigned
int and the quota type field that is was used in the quota data
structures. Making the data type explicit allows the kuid_t and
kgid_t type safety to propogate more thoroughly through the code,
revealing more places where uid/gid conversions need be made.
Along with the data type struct kqid comes the helper functions
qid_eq, qid_lt, from_kqid, from_kqid_munged, qid_valid, make_kqid,
make_kqid_invalid, make_kqid_uid, make_kqid_gid.
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Implement kprojid_t a cousin of the kuid_t and kgid_t.
The per user namespace mapping of project id values can be set with
/proc/<pid>/projid_map.
A full compliment of helpers is provided: make_kprojid, from_kprojid,
from_kprojid_munged, kporjid_has_mapping, projid_valid, projid_eq,
projid_eq, projid_lt.
Project identifiers are part of the generic disk quota interface,
although it appears only xfs implements project identifiers currently.
The xfs code allows anyone who has permission to set the project
identifier on a file to use any project identifier so when
setting up the user namespace project identifier mappings I do
not require a capability.
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Convert ext2, ext3, and ext4 to fully support the posix acl changes,
using e_uid e_gid instead e_id.
Enabled building with posix acls enabled, all filesystems supporting
user namespaces, now also support posix acls when user namespaces are enabled.
Cc: Theodore Tso <tytso@mit.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
- Pass the user namespace the uid and gid values in the xattr are stored
in into posix_acl_from_xattr.
- Pass the user namespace kuid and kgid values should be converted into
when storing uid and gid values in an xattr in posix_acl_to_xattr.
- Modify all callers of posix_acl_from_xattr and posix_acl_to_xattr to
pass in &init_user_ns.
In the short term this change is not strictly needed but it makes the
code clearer. In the longer term this change is necessary to be able to
mount filesystems outside of the initial user namespace that natively
store posix acls in the linux xattr format.
Cc: Theodore Tso <tytso@mit.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
- In setxattr if we are setting a posix acl convert uids and gids from
the current user namespace into the initial user namespace, before
the xattrs are passed to the underlying filesystem.
Untranslatable uids and gids are represented as -1 which
posix_acl_from_xattr will represent as INVALID_UID or INVALID_GID.
posix_acl_valid will fail if an acl from userspace has any
INVALID_UID or INVALID_GID values. In net this guarantees that
untranslatable posix acls will not be stored by filesystems.
- In getxattr if we are reading a posix acl convert uids and gids from
the initial user namespace into the current user namespace.
Uids and gids that can not be tranlsated into the current user namespace
will be represented as -1.
- Replace e_id in struct posix_acl_entry with an anymouns union of
e_uid and e_gid. For the short term retain the e_id field
until all of the users are converted.
- Don't set struct posix_acl.e_id in the cases where the acl type
does not use e_id. Greatly reducing the use of ACL_UNDEFINED_ID.
- Rework the ordering checks in posix_acl_valid so that I use kuid_t
and kgid_t types throughout the code, and so that I don't need
arithmetic on uid and gid types.
Cc: Theodore Tso <tytso@mit.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
htree_dirblock_to_tree() declares a non-initialized 'err' variable,
which is passed as a reference to another functions expecting them to
set this variable with their error codes.
It's passed to ext4_bread(), which then passes it to ext4_getblk(). If
ext4_map_blocks() returns 0 due to a lookup failure, leaving the
ext4_getblk() buffer_head uninitialized, it will make ext4_getblk()
return to ext4_bread() without initialize the 'err' variable, and
ext4_bread() will return to htree_dirblock_to_tree() with this variable
still uninitialized. htree_dirblock_to_tree() will pass this variable
with garbage back to ext4_htree_fill_tree(), which expects a number of
directory entries added to the rb-tree. which, in case, might return a
fake non-zero value due the garbage left in the 'err' variable, leading
the kernel to an Oops in ext4_dx_readdir(), once this is expecting a
filled rb-tree node, when in turn it will have a NULL-ed one, causing an
invalid page request when trying to get a fname struct from this NULL-ed
rb-tree node in this line:
fname = rb_entry(info->curr_node, struct fname, rb_hash);
The patch itself initializes the err variable in
htree_dirblock_to_tree() to avoid usage mistakes by the called
functions, and also fix ext4_getblk() to return a initialized 'err'
variable when ext4_map_blocks() fails a lookup.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Always store audit loginuids in type kuid_t.
Print loginuids by converting them into uids in the appropriate user
namespace, and then printing the resulting uid.
Modify audit_get_loginuid to return a kuid_t.
Modify audit_set_loginuid to take a kuid_t.
Modify /proc/<pid>/loginuid on read to convert the loginuid into the
user namespace of the opener of the file.
Modify /proc/<pid>/loginud on write to convert the loginuid
rom the user namespace of the opener of the file.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Paul Moore <paul@paul-moore.com> ?
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
These are generally very small strings. We don't need an entire 4k
allocation for each. Instead, just free and reallocate them on an
as-needed basis.
Note: This patch is untested since I don't have a 9p server available at
the moment. It's mainly something I noticed while doing some
getname/putname cleanup work.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
The unregister_sysctl_table() function hangs if all references to its
ctl_table_header structure are not dropped.
This can happen sometimes because of a leak in proc_sys_lookup():
proc_sys_lookup() gets a reference to the table via lookup_entry(), but
it does not release it when a subsequent call to sysctl_follow_link()
fails.
This patch fixes this leak by making sure the reference is always
dropped on return.
See also commit 076c3eed2c ("sysctl: Rewrite proc_sys_lookup
introducing find_entry and lookup_entry") which reorganized this code in
3.4.
Tested in Linux 3.4.4.
Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cancel work of the xfs_sync_worker before teardown of the log in
xfs_unmountfs. This prevents occasional crashes on unmount like so:
PID: 21602 TASK: ee9df060 CPU: 0 COMMAND: "kworker/0:3"
#0 [c5377d28] crash_kexec at c0292c94
#1 [c5377d80] oops_end at c07090c2
#2 [c5377d98] no_context at c06f614e
#3 [c5377dbc] __bad_area_nosemaphore at c06f6281
#4 [c5377df4] bad_area_nosemaphore at c06f629b
#5 [c5377e00] do_page_fault at c070b0cb
#6 [c5377e7c] error_code (via page_fault) at c070892c
EAX: f300c6a8 EBX: f300c6a8 ECX: 000000c0 EDX: 000000c0 EBP: c5377ed0
DS: 007b ESI: 00000000 ES: 007b EDI: 00000001 GS: ffffad20
CS: 0060 EIP: c0481ad0 ERR: ffffffff EFLAGS: 00010246
#7 [c5377eb0] atomic64_read_cx8 at c0481ad0
#8 [c5377ebc] xlog_assign_tail_lsn_locked at f7cc7c6e [xfs]
#9 [c5377ed4] xfs_trans_ail_delete_bulk at f7ccd520 [xfs]
#10 [c5377f0c] xfs_buf_iodone at f7ccb602 [xfs]
#11 [c5377f24] xfs_buf_do_callbacks at f7cca524 [xfs]
#12 [c5377f30] xfs_buf_iodone_callbacks at f7cca5da [xfs]
#13 [c5377f4c] xfs_buf_iodone_work at f7c718d0 [xfs]
#14 [c5377f58] process_one_work at c024ee4c
#15 [c5377f98] worker_thread at c024f43d
#16 [c5377fbc] kthread at c025326b
#17 [c5377fe8] kernel_thread_helper at c070e834
PID: 26653 TASK: e79143b0 CPU: 3 COMMAND: "umount"
#0 [cde0fda0] __schedule at c0706595
#1 [cde0fe28] schedule at c0706b89
#2 [cde0fe30] schedule_timeout at c0705600
#3 [cde0fe94] __down_common at c0706098
#4 [cde0fec8] __down at c0706122
#5 [cde0fed0] down at c025936f
#6 [cde0fee0] xfs_buf_lock at f7c7131d [xfs]
#7 [cde0ff00] xfs_freesb at f7cc2236 [xfs]
#8 [cde0ff10] xfs_fs_put_super at f7c80f21 [xfs]
#9 [cde0ff1c] generic_shutdown_super at c0333d7a
#10 [cde0ff38] kill_block_super at c0333e0f
#11 [cde0ff48] deactivate_locked_super at c0334218
#12 [cde0ff58] deactivate_super at c033495d
#13 [cde0ff68] mntput_no_expire at c034bc13
#14 [cde0ff7c] sys_umount at c034cc69
#15 [cde0ffa0] sys_oldumount at c034ccd4
#16 [cde0ffb0] system_call at c0707e66
commit 11159a05 added this to xfs_log_unmount and needs to be cleaned up
at a later date.
Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Using list_move() instead of list_del() + list_add().
spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
This patch adds support for the two linux interfaces of the discard/TRIM
command for SSD devices and sparse/thinly-provisioned LUNs.
JFS will support batched discard via FITRIM ioctl and online discard
with the discard mount option.
Signed-off-by: Tino Reichardt <list-jfs@mcmilk.de>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Pull a btrfs revert from Chris Mason:
"My for-linus branch has one revert in the new quota code.
We're building up more fixes at etc for the next merge window, but I'm
keeping them out unless they are bigger regressions or have a huge
impact."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Revert "Btrfs: fix some error codes in btrfs_qgroup_inherit()"
Conflicts:
net/netfilter/nfnetlink_log.c
net/netfilter/xt_LOG.c
Rather easy conflict resolution, the 'net' tree had bug fixes to make
sure we checked if a socket is a time-wait one or not and elide the
logging code if so.
Whereas on the 'net-next' side we are calculating the UID and GID from
the creds using different interfaces due to the user namespace changes
from Eric Biederman.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull GFS2 fixes from Steven Whitehouse:
"Here are three GFS2 fixes for the current kernel tree. These are all
related to the block reservation code which was added at the merge
window. That code will be getting an update at the forthcoming merge
window too. In the mean time though there are a few smaller issues
which should be fixed.
The first patch resolves an issue with write sizes of greater than 32
bits with the size hinting code. The second ensures that the
allocation data structure is initialised when using xattrs and the
third takes into account allocations which may have been made by other
nodes which affect a reservation on the local node."
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
GFS2: Take account of blockages when using reserved blocks
GFS2: Fix missing allocation data for set/remove xattr
GFS2: Make write size hinting code common
shared memory mapping is dirtied and unmapped. The lower file was being
released when the eCryptfs file was closed and the dirtied pages could not be
written out.
- Adds a call to the lower filesystem's ->flush() from ecryptfs_flush().
- Fixes a regression, introduced in 2.6.39, when a file is renamed on top of
another file. The target file's inode was not being evicted and the space
taken by the file was not reclaimed until eCryptfs was unmounted.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCgAGBQJQU14UAAoJENaSAD2qAscKrpwQAK+Xu3RAHPLO16bEjhrFZaIr
FY2z8yK5geFzHfmhDxUT7jovw8d2ghAjrE3XHf+PgrqfUqIZ6vo57GFmR8ZNS+30
exNOb1DffaRqIejStrFkz+Ek4/FbnaPD/FvJYgC8D3Bv04YllyNI9Euq49VkkOiy
v6jxAy4H+/MCGzYvR4K0jmKlOLuC47TtJ322jXTm0iNeJJH2+kNXm0WvE8dY7dYV
2QMvv11M1HMI39S4yIY0eBPwbloP3AoO++H3Id7/XWcIlaaeZrVA7HmV8z04DEWq
Nt2vceSbX4FVBxmLab5TzTftorskJRDLCtMaSM8opve+3Sxgx+wooLskgrHqJjZH
8AtGfs/ZzvUkKcYMNv+iMax0/KqkGKOhdJMmqc37aI8qkBo5IYXpIPU1ZTy/S2T8
t3fM8jEI4AOKFn7GlG0kZg7EXo0wWPw3A8X73tCQx8W6sJy5STvv07hFZspX/4nn
+qhlM7Qzpr1XXPEv9cKo4oXzqOmlSi1Jd2ueGANNGJvche90uCWgmSZCfNNxGnPq
mLiiiV6KJ94xjjyf/ZWiKLryk5rv0fWiYXzutaORADn6WWwIVKX9IyvJfFI8y+M6
phREWNdOnyDFFA++h5QotWh3L7ZCL/HNeTpLDSD1ONxosVDhViviA2HrRlyl8LSc
soDzN7w+yOh+QjlN4sO/
=UC2V
-----END PGP SIGNATURE-----
Merge tag 'ecryptfs-3.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull ecryptfs fixes from Tyler Hicks:
- Fixes a regression, introduced in 3.6-rc1, when a file is closed
before its shared memory mapping is dirtied and unmapped. The lower
file was being released when the eCryptfs file was closed and the
dirtied pages could not be written out.
- Adds a call to the lower filesystem's ->flush() from
ecryptfs_flush().
- Fixes a regression, introduced in 2.6.39, when a file is renamed on
top of another file. The target file's inode was not being evicted
and the space taken by the file was not reclaimed until eCryptfs was
unmounted.
* tag 'ecryptfs-3.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: Copy up attributes of the lower target inode after rename
eCryptfs: Call lower ->flush() from ecryptfs_flush()
eCryptfs: Write out all dirty pages just before releasing the lower file
This reverts commit 5986802c2f.
Both paths are not error paths but regular cases where non-qgroup
subvols are involved.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
We already use them for openat() and friends, but fstat() also wants to
be able to use O_PATH file descriptors. This should make it more
directly comparable to the O_SEARCH of Solaris.
Note that you could already do the same thing with "fstatat()" and an
empty path, but just doing "fstat()" directly is simpler and faster, so
there is no reason not to just allow it directly.
See also commit 332a2e1244, which did the same thing for fchdir, for
the same reasons.
Reported-by: ольга крыжановская <olga.kryzhanovska@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org # O_PATH introduced in 3.0+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After calling into the lower filesystem to do a rename, the lower target
inode's attributes were not copied up to the eCryptfs target inode. This
resulted in the eCryptfs target inode staying around, rather than being
evicted, because i_nlink was not updated for the eCryptfs inode. This
also meant that eCryptfs didn't do the final iput() on the lower target
inode so it stayed around, as well. This would result in a failure to
free up space occupied by the target file in the rename() operation.
Both target inodes would eventually be evicted when the eCryptfs
filesystem was unmounted.
This patch calls fsstack_copy_attr_all() after the lower filesystem
does its ->rename() so that important inode attributes, such as i_nlink,
are updated at the eCryptfs layer. ecryptfs_evict_inode() is now called
and eCryptfs can drop its final reference on the lower inode.
http://launchpad.net/bugs/561129
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Tested-by: Colin Ian King <colin.king@canonical.com>
Cc: <stable@vger.kernel.org> [2.6.39+]
Since eCryptfs only calls fput() on the lower file in
ecryptfs_release(), eCryptfs should call the lower filesystem's
->flush() from ecryptfs_flush().
If the lower filesystem implements ->flush(), then eCryptfs should try
to flush out any dirty pages prior to calling the lower ->flush(). If
the lower filesystem does not implement ->flush(), then eCryptfs has no
need to do anything in ecryptfs_flush() since dirty pages are now
written out to the lower filesystem in ecryptfs_release().
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
On AArch64 Linux, we want the sys_stat64() and related functions for
compat support but do not need the generic struct stat64, enabled
automatically if __ARCH_WANT_STAT64.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Fixes a regression caused by:
821f749 eCryptfs: Revert to a writethrough cache model
That patch reverted some code (specifically, 32001d6f) that was
necessary to properly handle open() -> mmap() -> close() -> dirty pages
-> munmap(), because the lower file could be closed before the dirty
pages are written out.
Rather than reapplying 32001d6f, this approach is a better way of
ensuring that the lower file is still open in order to handle writing
out the dirty pages. It is called from ecryptfs_release(), while we have
a lock on the lower file pointer, just before the lower file gets the
final fput() and we overwrite the pointer.
https://launchpad.net/bugs/1047261
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Artemy Tregubenko <me@arty.name>
Tested-by: Artemy Tregubenko <me@arty.name>
Tested-by: Colin Ian King <colin.king@canonical.com>
new_xattr in __simple_xattr_set() is only initialized with a valid
pointer if value is not NULL, which only happens if this function is
called directly with the intention to remove an existing extended
attribute. Even being safe to be this way, smatch warns about possible
NULL dereference. Dan Carpenter suggested using uninitialized_var()
which will make both gcc and smatch happy.
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
v2: add function documentation instead of adding a separate file under
Documentation/
tj: Updated comment a bit and rolled in Randy's suggestions.
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
For very long online resizes, a periodic update to the console log is
helpful for debugging and for progress reporting.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If we have run out of reserved gdt blocks, then clear the resize_inode
feature and enable the meta_bg feature, so that we can continue
resizing the file system seamlessly.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The claim_reserved_blks() function was not taking account of
the possibility of "blockages" while performing allocation.
This can be caused by another node allocating something in
the same extent which has been reserved locally.
This patch tests for this condition and then skips the remainder
of the reservation in this case. This is a relatively rare event,
so that it should not affect the general performance improvement
which the block reservations provide.
The claim_reserved_blks() function also appears not to be able
to deal with reservations which cross bitmap boundaries, but
that can be dealt with in a future patch since we don't generate
boundary crossing reservations currently.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: David Teigland <teigland@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
This collects up the write size hinting code which is used by the
block reservation subsystem into a single function. At the same
time this also corrects the rounding for this calculation.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
- Final (hopefully) fix for the range checking code in NFSv4 getacl. This
should fix the Oopses being seen when the acl size is close to PAGE_SIZE.
- Fix a regression with the legacy binary mount code
- Fix a regression in the readdir cookieverf initialisation
- Fix an RPC over UDP regression
- Ensure that we report all errors in the NFSv4 open code
- Ensure that fsync() reports all relevant synchronisation errors.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQUN+KAAoJEGcL54qWCgDyHGcQAKj7MYVDIjhdmsVGGNWXUCnf
X0LVg/ajh+vjusK+hmquzcJikZqgce5IU5DW4vcFr1X8BgP+R51UVvU0KksByD5H
ourV2JVCztAQzQ4WWOsZAGqN0tooJUjyEjl4lEiDsQCF4Nk1HWbuCHeYuX74OToZ
jrgedj0EZ6zb7TOizvbgU/7lI+FKu3Hlw6+u27M9phtSuefJdYSHZHYVMOX81qPh
k0zgZ4tuLIaDuBB84iCrPwNt9icnevq6cIc+AGluI6xhDw+foPvUaUR+OUI420IZ
tunNzP2So+nNoyjEiyMVENaCdEyA75XAmmGHTUUdBiVOsMV4HF/TqvTtSsjk2mN1
FbZVvtjD6srjsQaKdVmqMIZBdhY9LSMLIQVqb4H2rYP6Mwq06WTuyCxf5YhzFfoy
2tai7JuqBkTAWfKB8ESWywV6Qk/MkUWRAOBO6ksS66gAwpcFDj6nfeAdwaEmoYKc
uzLUIRZaclPMZf661cs1fWeFV5XOnCL7je4owgTRGs7MHooWHPcC3273fEJqnhFz
5MkC7nfmUiGcdO1v0mfYTEtMj9Pp9icBoZcVTGn4eZIHzvhhZOx//8LhyBfS+jll
bKjaLZ1rErvIqwnSGcB7PK2yBYY9P6ZaxWjOrAAncZmiOxfhN0hvCo54jNOr/VZ+
atsDEAuqSTeK7ouBqyO4
=e5yE
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.6-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- Final (hopefully) fix for the range checking code in NFSv4 getacl.
This should fix the Oopses being seen when the acl size is close to
PAGE_SIZE.
- Fix a regression with the legacy binary mount code
- Fix a regression in the readdir cookieverf initialisation
- Fix an RPC over UDP regression
- Ensure that we report all errors in the NFSv4 open code
- Ensure that fsync() reports all relevant synchronisation errors.
* tag 'nfs-for-3.6-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: fsync() must exit with an error if page writeback failed
SUNRPC: Fix a UDP transport regression
NFS: return error from decode_getfh in decode open
NFSv4: Fix buffer overflow checking in __nfs4_get_acl_uncached
NFSv4: Fix range checking in __nfs4_get_acl_uncached and __nfs4_proc_set_acl
NFS: Fix a problem with the legacy binary mount code
NFS: Fix the initialisation of the readdir 'cookieverf' array
Set bg_itable_unused for file systems that have uninit_bg enabled.
This will speed up the first e2fsck run after the file system is
resized.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
ext3 users of data=journal mode with blocksize < pagesize were occasionally
hitting assertion failure in journal_commit_transaction() checking whether the
transaction has at least as many credits reserved as buffers attached. The
core of the problem is that when a file gets truncated, buffers that still need
checkpointing or that are attached to the committing transaction are left with
buffer_mapped set. When this happens to buffers beyond i_size attached to a
page stradding i_size, subsequent write extending the file will see these
buffers and as they are mapped (but underlying blocks were freed) things go
awry from here.
The assertion failure just coincidentally (and in this case luckily as we would
start corrupting filesystem) triggers due to journal_head not being properly
cleaned up as well.
Under some rare circumstances this bug could even hit data=ordered mode users.
There the assertion won't trigger and we would end up corrupting the
filesystem.
We fix the problem by unmapping buffers if possible (in lots of cases we just
need a buffer attached to a transaction as a place holder but it must not be
written out anyway). And in one case, we just have to bite the bullet and wait
for transaction commit to finish.
Reviewed-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Jan Kara <jack@suse.cz>
We need to ensure that if the call to filemap_write_and_wait_range()
fails, then we report that error back to the application.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Pull FUSE fixes from Miklos Szeredi:
"This contains bugfixes for FUSE and CUSE and a compile warning fix."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: fix retrieve length
fuse: mark variables uninitialized
cuse: kill connection on initialization error
cuse: fix fuse_conn_kill()
The function scans @delaying_queue and stops at the first inode
whose dirtied_when is after *work->older_than_this. So the expired
ones being moved are those before *work->older_than_this. Correct
the comment here.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Pull CIFS fixes from Steve French.
* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Fix endianness conversion
CIFS: Fix error handling in cifs_push_mandatory_locks
Pull UDF and ext3 fixes from Jan Kara:
"One UDF data corruption fix and one ext3 fix where we didn't write
everything to disk on fsync in one corner case."
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix data corruption for files in ICB
ext3: Fix fdatasync() for files with only i_size changes
It is a frequent mistake to confuse the netlink port identifier with a
process identifier. Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.
I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.
I have successfully built an allyesconfig kernel with this change.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
device_write only checks whether the request size is big enough, but it doesn't
check if the size is too big.
At that point, it also tries to allocate as much memory as the user has requested
even if it's too much. This can lead to OOM killer kicking in, or memory corruption
if (count + 1) overflows.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Previously, there was bio_clone() but it only allocated from the fs bio
set; as a result various users were open coding it and using
__bio_clone().
This changes bio_clone() to become bio_clone_bioset(), and then we add
bio_clone() and bio_clone_kmalloc() as wrappers around it, making use of
the functionality the last patch adedd.
This will also help in a later patch changing how bio cloning works.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>
CC: Alasdair Kergon <agk@redhat.com>
CC: Boaz Harrosh <bharrosh@panasas.com>
CC: Jeff Garzik <jeff@garzik.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Previously, bio_kmalloc() and bio_alloc_bioset() behaved slightly
different because there was some almost-duplicated code - this fixes
some of that.
The important change is that previously bio_kmalloc() always set
bi_io_vec = bi_inline_vecs, even if nr_iovecs == 0 - unlike
bio_alloc_bioset(). This would cause bio_has_data() to return true; I
don't know if this resulted in any actual bugs but it was certainly
wrong.
bio_kmalloc() and bio_alloc_bioset() also have different arbitrary
limits on nr_iovecs - 1024 (UIO_MAXIOV) for bio_kmalloc(), 256
(BIO_MAX_PAGES) for bio_alloc_bioset(). This patch doesn't fix that, but
at least they're enforced closer together and hopefully they will be
fixed in a later patch.
This'll also help with some future cleanups - there are a fair number of
functions that allocate bios (e.g. bio_clone()), and now they don't have
to be duplicated for bio_alloc(), bio_alloc_bioset(), and bio_kmalloc().
Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
v7: Re-add dropped comments, improv patch description
Signed-off-by: Jens Axboe <axboe@kernel.dk>