evie/android_kernel_oneplus_msm8998 - Gay Catgirls Forgejo: gay catgirls having sex

evie/android_kernel_oneplus_msm8998

Author	SHA1	Message	Date
Chuck Lever	be05c860d7	NFS: Add nfs4_sequence calls for OPEN_CONFIRM Ensure OPEN_CONFIRM is not emitted while the transport is plugged. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-03 15:26:37 -04:00
Chuck Lever	fbd4bfd1d9	NFS: Add nfs4_sequence calls for RELEASE_LOCKOWNER Ensure RELEASE_LOCKOWNER is not emitted while the transport is plugged. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-03 15:26:37 -04:00
Chuck Lever	160881e33d	NFS: Enable nfs4_setup_sequence() for DELEGRETURN When CONFIG_NFS_V4_1 is disabled, the calls to nfs4_setup_sequence() and nfs4_sequence_done() are compiled out for the DELEGRETURN operation. To allow NFSv4.0 transport blocking to work for DELEGRETURN, these call sites have to be present all the time. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-03 15:26:36 -04:00
Chuck Lever	3bd2384a77	NFS: NFSv4.0 transport blocking Plumb in a mechanism for plugging an NFSv4.0 mount, using the same infrastructure as NFSv4.1 sessions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-03 15:26:35 -04:00
Chuck Lever	abf79bb341	NFS: Add a slot table to struct nfs_client for NFSv4.0 transport blocking Anchor an nfs4_slot_table in the nfs_client for use with NFSv4.0 transport blocking. It is initialized only for NFSv4.0 nfs_client's. Introduce appropriate minor version ops to handle nfs_client initialization and shutdown requirements that differ for each minor version. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-03 15:26:35 -04:00
Chuck Lever	eb2a1cd3c9	NFS: Add global helper for releasing slot table resources The nfs4_destroy_slot_tables() function is renamed to avoid confusion with the new helper. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-03 15:26:34 -04:00
Chuck Lever	744aa52530	NFS: Add global helper to set up a stand-along nfs4_slot_table Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-03 15:26:34 -04:00
Chuck Lever	9d33059c1b	NFS: Enable slot table helpers for NFSv4.0 I'd like to re-use NFSv4.1's slot table machinery for NFSv4.0 transport blocking. Re-organize some of nfs4session.c so the slot table code is built even when NFS_V4_1 is disabled. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-03 15:26:33 -04:00
Chuck Lever	220e09ccd3	NFS: Remove unused call_sync minor version op Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-03 15:26:33 -04:00
Chuck Lever	9915ea7e0a	NFS: Add RPC callouts to start NFSv4.0 synchronous requests Refactor nfs4_call_sync_sequence() so it is used for NFSv4.0 now. The RPC callouts will house transport blocking logic similar to NFSv4.1 sessions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-03 15:26:32 -04:00
Chuck Lever	a9c92d6b85	NFS: Common versions of sequence helper functions NFSv4.0 will have need for this functionality when I add the ability to block NFSv4.0 traffic before migration recovery. I'm not really clear on why nfs4_set_sequence_privileged() gets a generic name, but nfs41_init_sequence() gets a minor version-specific name. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-03 15:26:32 -04:00
Chuck Lever	5a580e0ae2	NFS: Clean up nfs4_setup_sequence() Clean up: Both the NFSv4.0 and NFSv4.1 version of nfs4_setup_sequence() are used only in fs/nfs/nfs4proc.c. No need to keep global header declarations for either version. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-03 15:26:31 -04:00
Chuck Lever	2a3eb2b97b	NFS: Rename nfs41_call_sync_data as a common data structure Clean up: rename nfs41_call_sync_data for use as a data structure common to all NFSv4 minor versions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-03 15:26:31 -04:00
Chuck Lever	e8d92382dd	NFS: When displaying session slot numbers, use "%u" consistently Clean up, since slot and sequence numbers are all unsigned anyway. Among other things, squelch compiler warnings: linux/fs/nfs/nfs4proc.c: In function ‘nfs4_setup_sequence’: linux/fs/nfs/nfs4proc.c:703:2: warning: signed and unsigned type in conditional expression [-Wsign-compare] and linux/fs/nfs/nfs4session.c: In function ‘nfs4_alloc_slot’: linux/fs/nfs/nfs4session.c:151:31: warning: signed and unsigned type in conditional expression [-Wsign-compare] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-03 15:26:30 -04:00
Trond Myklebust	ba6c05928d	NFS: Ensure that rmdir() waits for sillyrenames to complete If an NFS client does mkdir("dir"); fd = open("dir/file"); unlink("dir/file"); close(fd); rmdir("dir"); then the asynchronous nature of the sillyrename operation means that we can end up getting EBUSY for the rmdir() in the above test. Fix that by ensuring that we wait for any in-progress sillyrenames before sending the rmdir() to the server. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-03 15:26:29 -04:00
Weston Andros Adamson	a5250def7c	NFSv4: use the mach cred for SECINFO w/ integrity Commit `5ec16a8500` introduced a regression that causes SECINFO to fail without actualy sending an RPC if: 1) the nfs_client's rpc_client was using KRB5i/p (now tried by default) 2) the current user doesn't have valid kerberos credentials This situation is quite common - as of now a sec=sys mount would use krb5i for the nfs_client's rpc_client and a user would hardly be faulted for not having run kinit. The solution is to use the machine cred when trying to use an integrity protected auth flavor for SECINFO. Older servers may not support using the machine cred or an integrity protected auth flavor for SECINFO in every circumstance, so we fall back to using the user's cred and the filesystem's auth flavor in this case. We run into another problem when running against linux nfs servers - they return NFS4ERR_WRONGSEC when using integrity auth flavor (unless the mount is also that flavor) even though that is not a valid error for SECINFO*. Even though it's against spec, handle WRONGSEC errors on SECINFO by falling back to using the user cred and the filesystem's auth flavor. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-03 15:25:10 -04:00
Andy Adamson	dc24826bfc	NFS avoid expired credential keys for buffered writes We must avoid buffering a WRITE that is using a credential key (e.g. a GSS context key) that is about to expire or has expired. We currently will paint ourselves into a corner by returning success to the applciation for such a buffered WRITE, only to discover that we do not have permission when we attempt to flush the WRITE (and potentially associated COMMIT) to disk. Use the RPC layer credential key timeout and expire routines which use a a watermark, gss_key_expire_timeo. We test the key in nfs_file_write. If a WRITE is using a credential with a key that will expire within watermark seconds, flush the inode in nfs_write_end and send only NFS_FILE_SYNC WRITEs by adding nfs_ctx_key_to_expire to nfs_need_sync_write. Note that this results in single page NFS_FILE_SYNC WRITEs. Signed-off-by: Andy Adamson <andros@netapp.com> [Trond: removed a pr_warn_ratelimited() for now] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-03 15:25:09 -04:00
Linus Torvalds	542a086ac7	Driver core patches for 3.12-rc1 Here's the big driver core pull request for 3.12-rc1. Lots of tiny changes here fixing up the way sysfs attributes are created, to try to make drivers simpler, and fix a whole class race conditions with creations of device attributes after the device was announced to userspace. All the various pieces are acked by the different subsystem maintainers. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.21 (GNU/Linux) iEYEABECAAYFAlIlIPcACgkQMUfUDdst+ynUMwCaAnITsxyDXYQ4DqEsz8EcOtMk 718AoLrgnUZs3B+70AT34DVktg4HSThk =USl9 -----END PGP SIGNATURE----- Merge tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core patches from Greg KH: "Here's the big driver core pull request for 3.12-rc1. Lots of tiny changes here fixing up the way sysfs attributes are created, to try to make drivers simpler, and fix a whole class race conditions with creations of device attributes after the device was announced to userspace. All the various pieces are acked by the different subsystem maintainers" * tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (119 commits) firmware loader: fix pending_fw_head list corruption drivers/base/memory.c: introduce help macro to_memory_block dynamic debug: line queries failing due to uninitialized local variable sysfs: sysfs_create_groups returns a value. debugfs: provide debugfs_create_x64() when disabled rbd: convert bus code to use bus_groups firmware: dcdbas: use binary attribute groups sysfs: add sysfs_create/remove_groups for when SYSFS is not enabled driver core: add #include <linux/sysfs.h> to core files. HID: convert bus code to use dev_groups Input: serio: convert bus code to use drv_groups Input: gameport: convert bus code to use drv_groups driver core: firmware: use __ATTR_RW() driver core: core: use DEVICE_ATTR_RO driver core: bus: use DRIVER_ATTR_WO() driver core: create write-only attribute macros for devices and drivers sysfs: create __ATTR_WO() driver-core: platform: convert bus code to use dev_groups workqueue: convert bus code to use dev_groups MEI: convert bus code to use dev_groups ...	2013-09-03 11:37:15 -07:00
Linus Torvalds	fc6d0b0376	Merge branch 'lockref' (locked reference counts) Merge lockref infrastructure code by me and Waiman Long. I already merged some of the preparatory patches that didn't actually do any semantic changes earlier, but this merges the actual _reason_ for those preparatory patches. The "lockref" structure is a combination "spinlock and reference count" that allows optimized reference count accesses. In particular, it guarantees that the reference count will be updated AS IF the spinlock was held, but using atomic accesses that cover both the reference count and the spinlock words, we can often do the update without actually having to take the lock. This allows us to avoid the nastiest cases of spinlock contention on large machines under heavy pathname lookup loads. When updating the dentry reference counts on a large system, we'll still end up with the cache line bouncing around, but that's much less noticeable than actually having to spin waiting for the lock. * lockref: lockref: implement lockless reference count updates using cmpxchg() lockref: uninline lockref helper functions vfs: reimplement d_rcu_to_refcount() using lockref_get_or_lock() vfs: use lockref_get_not_zero() for optimistic lockless dget_parent() lockref: add 'lockref_get_or_lock() helper	2013-09-03 08:08:21 -07:00
Miklos Szeredi	efeb9e60d4	fuse: readdir: check for slash in names Userspace can add names containing a slash character to the directory listing. Don't allow this as it could cause all sorts of trouble. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org	2013-09-03 14:28:38 +02:00
Maxim Patlasov	06a7c3c278	fuse: hotfix truncate_pagecache() issue The way how fuse calls truncate_pagecache() from fuse_change_attributes() is completely wrong. Because, w/o i_mutex held, we never sure whether 'oldsize' and 'attr->size' are valid by the time of execution of truncate_pagecache(inode, oldsize, attr->size). In fact, as soon as we released fc->lock in the middle of fuse_change_attributes(), we completely loose control of actions which may happen with given inode until we reach truncate_pagecache. The list of potentially dangerous actions includes mmap-ed reads and writes, ftruncate(2) and write(2) extending file size. The typical outcome of doing truncate_pagecache() with outdated arguments is data corruption from user point of view. This is (in some sense) acceptable in cases when the issue is triggered by a change of the file on the server (i.e. externally wrt fuse operation), but it is absolutely intolerable in scenarios when a single fuse client modifies a file without any external intervention. A real life case I discovered by fsx-linux looked like this: 1. Shrinking ftruncate(2) comes to fuse_do_setattr(). The latter sends FUSE_SETATTR to the server synchronously, but before getting fc->lock ... 2. fuse_dentry_revalidate() is asynchronously called. It sends FUSE_LOOKUP to the server synchronously, then calls fuse_change_attributes(). The latter updates i_size, releases fc->lock, but before comparing oldsize vs attr->size.. 3. fuse_do_setattr() from the first step proceeds by acquiring fc->lock and updating attributes and i_size, but now oldsize is equal to outarg.attr.size because i_size has just been updated (step 2). Hence, fuse_do_setattr() returns w/o calling truncate_pagecache(). 4. As soon as ftruncate(2) completes, the user extends file size by write(2) making a hole in the middle of file, then reads data from the hole either by read(2) or mmap-ed read. The user expects to get zero data from the hole, but gets stale data because truncate_pagecache() is not executed yet. The scenario above illustrates one side of the problem: not truncating the page cache even though we should. Another side corresponds to truncating page cache too late, when the state of inode changed significantly. Theoretically, the following is possible: 1. As in the previous scenario fuse_dentry_revalidate() discovered that i_size changed (due to our own fuse_do_setattr()) and is going to call truncate_pagecache() for some 'new_size' it believes valid right now. But by the time that particular truncate_pagecache() is called ... 2. fuse_do_setattr() returns (either having called truncate_pagecache() or not -- it doesn't matter). 3. The file is extended either by write(2) or ftruncate(2) or fallocate(2). 4. mmap-ed write makes a page in the extended region dirty. The result will be the lost of data user wrote on the fourth step. The patch is a hotfix resolving the issue in a simplistic way: let's skip dangerous i_size update and truncate_pagecache if an operation changing file size is in progress. This simplistic approach looks correct for the cases w/o external changes. And to handle them properly, more sophisticated and intrusive techniques (e.g. NFS-like one) would be required. I'd like to postpone it until the issue is well discussed on the mailing list(s). Changed in v2: - improved patch description to cover both sides of the issue. Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org	2013-09-03 13:41:58 +02:00
Anand Avati	d331a415ae	fuse: invalidate inode attributes on xattr modification Calls like setxattr and removexattr result in updation of ctime. Therefore invalidate inode attributes to force a refresh. Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org	2013-09-03 13:41:58 +02:00
Maxim Patlasov	4a4ac4eba1	fuse: postpone end_page_writeback() in fuse_writepage_locked() The patch fixes a race between ftruncate(2), mmap-ed write and write(2): 1) An user makes a page dirty via mmap-ed write. 2) The user performs shrinking truncate(2) intended to purge the page. 3) Before fuse_do_setattr calls truncate_pagecache, the page goes to writeback. fuse_writepage_locked fills FUSE_WRITE request and releases the original page by end_page_writeback. 4) fuse_do_setattr() completes and successfully returns. Since now, i_mutex is free. 5) Ordinary write(2) extends i_size back to cover the page. Note that fuse_send_write_pages do wait for fuse writeback, but for another page->index. 6) fuse_writepage_locked proceeds by queueing FUSE_WRITE request. fuse_send_writepage is supposed to crop inarg->size of the request, but it doesn't because i_size has already been extended back. Moving end_page_writeback to the end of fuse_writepage_locked fixes the race because now the fact that truncate_pagecache is successfully returned infers that fuse_writepage_locked has already called end_page_writeback. And this, in turn, infers that fuse_flush_writepages has already called fuse_send_writepage, and the latter used valid (shrunk) i_size. write(2) could not extend it because of i_mutex held by ftruncate(2). Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org	2013-09-03 13:41:57 +02:00
Jaegeuk Kim	222cbdc483	f2fs: avoid an overflow during utilization calculation The current f2fs uses all the block counts with 32 bit numbers, which is able to cover about 15TB volume. But in calculation of utilization, f2fs multiplies the count by 100 which can induce overflow. This patch fixes this. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-09-03 13:41:37 +09:00
Jaegeuk Kim	c34e333fd5	f2fs: trigger GC when there are prefree segments Previously, f2fs conducts SSR when free_sections() < overprovision_sections. But, even though there are a lot of prefree segments, it can consider SSR only. So, let's consider the number of prefree segments too for triggering SSR. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-09-03 10:11:20 +09:00
Linus Torvalds	15570086b5	vfs: reimplement d_rcu_to_refcount() using lockref_get_or_lock() This moves __d_rcu_to_refcount() from <linux/dcache.h> into fs/namei.c and re-implements it using the lockref infrastructure instead. It also adds a lot of comments about what is actually going on, because turning a dentry that was looked up using RCU into a long-lived reference counted entry is one of the more subtle parts of the rcu walk. We also used to be _particularly_ subtle in unlazy_walk() where we re-validate both the dentry and its parent using the same sequence count. We used to do it by nesting the locks and then verifying the sequence count just once. That was silly, because nested locking is expensive, but the sequence count check is not. So this just re-validates the dentry and the parent separately, avoiding the nested locking, and making the lockref lookup possible. Acked-by: Waiman Long <waiman.long@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-02 11:38:06 -07:00
Waiman Long	df3d0bbcdb	vfs: use lockref_get_not_zero() for optimistic lockless dget_parent() A valid parent pointer is always going to have a non-zero reference count, but if we look up the parent optimistically without locking, we have to protect against the (very unlikely) race against renaming changing the parent from under us. We do that by using lockref_get_not_zero(), and then re-checking the parent pointer after getting a valid reference. [ This is a re-implementation of a chunk from the original patch by Waiman Long: "dcache: Enable lockless update of dentry's refcount". I've completely rewritten the patch-series and split it up, but I'm attributing this part to Waiman as it's close enough to his earlier patch - Linus ] Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-02 11:29:22 -07:00
Trond Myklebust	2127d82af3	NFSv4: Convert idmapper to use the new framework for pipefs dentries Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-01 11:12:42 -04:00
Eric W. Biederman	c7b96acf14	userns: Kill nsown_capable it makes the wrong thing easy nsown_capable is a special case of ns_capable essentially for just CAP_SETUID and CAP_SETGID. For the existing users it doesn't noticably simplify things and from the suggested patches I have seen it encourages people to do the wrong thing. So remove nsown_capable. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-08-30 23:44:11 -07:00
Maxime Bizon	3bd11cf56e	pstore/ram: (really) fix undefined usage of rounddown_pow_of_two Previous attempt to fix was `b042e47491` Suggested use of is_power_of_2() was bogus because is_power_of_2(0) is false (documented behaviour). Signed-off-by: Maxime Bizon <mbizon@freebox.fr> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Luck <tony.luck@intel.com>	2013-08-30 15:57:01 -07:00
J. Bruce Fields	248f807b47	nfsd4: nfsd4_create_clid_dir prints uninitialized data Take the easy way out and just remove the printk. Reported-by: David Howells <dhowells@redhat.com>	2013-08-30 17:30:52 -04:00
J. Bruce Fields	bf7bd3e98b	nfsd4: fix leak of inode reference on delegation failure This fixes a regression from `68a3396178` "nfsd4: shut down more of delegation earlier". After that commit, nfs4_set_delegation() failures result in nfs4_put_delegation being called, but nfs4_put_delegation doesn't free the nfs4_file that has already been set by alloc_init_deleg(). This can result in an oops on later unmounting the exported filesystem. Note also delaying the fi_had_conflict check we're able to return a better error (hence give 4.1 clients a better idea why the delegation failed; though note CONFLICT isn't an exact match here, as that's supposed to indicate a current conflict, but all we know here is that there was one recently). Reported-by: Toralf Förster <toralf.foerster@gmx.de> Tested-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2013-08-30 17:30:52 -04:00
J. Bruce Fields	3477565e6a	Revert "nfsd: nfs4_file_get_access: need to be more careful with O_RDWR" This reverts commit `df66e75395`. nfsd4_lock can get a read-only or write-only reference when only a read-write open is available. This is normal. Cc: Harshula Jayasuriya <harshula@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2013-08-30 17:30:45 -04:00
J. Bruce Fields	b8297cec2d	Linux 3.11-rc5 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQEcBAABAgAGBQJSCDSjAAoJEHm+PkMAQRiGDXMIAI7Loae0Oqb1eoeJkvjyZsBS OJDeeEcn+k58VbxVHyRdc7hGo4yI4tUZm172SpnOaM8sZ/ehPU7zBrwJK2lzX334 /jAM3uvVPfxA2nu0I4paNpkED/NQ8NRRsYE1iTE8dzHXOH6dA3mgp5qfco50rQvx rvseXpME4KIAJEq4jnyFZF5+nuHiPueM9JftPmSSmJJ3/KY9kY1LESovyWd7ttg1 jYSVPFal9J0E+tl2UQY5g9H16GqhhjYn+39Iei6Q5P4bL4ZubQgTRQTN9nyDc06Z ezQtGoqZ8kEz/2SyRlkda6PzjSEhgXlc8mCL5J7AW+dMhTHHx2IrosjiCA80kG8= =c0rK -----END PGP SIGNATURE----- Merge tag 'v3.11-rc5' into for-3.12 branch For testing purposes I want some nfs and nfsd bugfixes (specifically, `58cd57bfd9` and previous nfsd patches, and Trond's `4f3cc4809a`).	2013-08-30 16:42:49 -04:00
Eric Sandeen	914ed44b17	Fix wrong flag ASSERT in xfs_attr_shortform_getvalue This ASSERT is testing an if_flags flag value against a di_aformat enum value. di_aformat is never assigned XFS_IFINLINE. This happens to work for now, because XFS_IFINLINE has the same value as XFS_DINODE_FMT_LOCAL, and that's tested just before we call this function. However, I think the intention is to assert that we have read in the data, i.e. XFS_IFINLINE on if_flags, before we use if_data. This is done in other places through the code as well. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-08-30 15:20:50 -05:00
Dave Chinner	904c17e683	xfs: finish removing IOP_* macros. In optimising the CIL operations, some of the IOP_* macros for calling log item operations were removed. Remove the rest of them as Christoph requested. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Geoffrey Wehrman <gwehrman@sgi.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-08-30 14:14:35 -05:00
Dave Chinner	239567033c	xfs: inode log reservations are too small We've been seeing occasional problems with log space leaks and transaction underruns such as this for some time: XFS (dm-0): xlog_write: reservation summary: trans type = FSYNC_TS (36) unit res = 2740 bytes current res = -4 bytes total reg = 0 bytes (o/flow = 0 bytes) ophdrs = 0 (ophdr space = 0 bytes) ophdr + reg = 0 bytes num regions = 0 Turns out that xfstests generic/311 is reliably reproducing this problem with the test it runs at sequence 16 of it execution. It is a 100% reliable reproducer with the mkfs configuration of "-b size=1024 -m crc=1" on a 10GB scratch device. The problem? Inode forks in btree format are logged in memory format, not disk format (i.e. bmbt format, not bmdr format). That means there is a btree block header being logged, when such a structure is never written to the inode fork in bmdr format. The bmdr header in the inode is only 4 bytes, while the bmbt header is 24 bytes for v4 filesystems and 72 bytes for v5 filesystems. We currently reserve the inode size plus the rounded up overhead of a logging a buffer, which is 128 bytes. That means the reservation for a 512 byte inode is 640 bytes. What we can actually log is: inode core, data and attr fork = 512 bytes inode log format + log op header = 56 + 12 = 68 bytes data fork bmbt hdr = 24/72 bytes attr fork bmbt hdr = 24/72 bytes So, for a v2 inodes we can log at least 628 bytes, but if we split that inode over the end of the log across log buffers, we need to also another log op header, which takes us to 640 bytes. If there's another reservation taken out of this that I haven't taken into account (perhaps multiple iclog splits?) or I haven't corectly calculated the bmbt format space used (entirely possible), then we will overun it. For v3 inodes the maximum is actually 724 bytes, and even a single maximally sized btree format fork can blow it (652 bytes). And that's exactly what is happening with the FSYNC_TS transaction in the above output - it's consumed 644 bytes of space after the CIL context took the space reserved for it (2100 bytes). This problem has always been present in the XFS code - the btree format inode forks have always been logged in this manner. Hence there has always been the possibility of an overrun with such a transaction. The CRC code has just exposed it frequently enough to be able to debug and understand the root cause.... So, let's fix all the inode log space reservations. [ I'm so glad we spent the effort to clean up the transaction reservation code. This is an easy fix now. ] Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-08-30 13:59:30 -05:00
Brian Foster	b121099d84	xfs: check correct status variable for xfs_inobt_get_rec() call The call to xfs_inobt_get_rec() in xfs_dialloc_ag() passes 'j' as the output status variable. The immediately following XFS_WANT_CORRUPTED_GOTO() checks the value of 'i,' which is from the previous lookup call and has already been checked. Fix the corruption check to use 'j.' Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-08-30 13:48:35 -05:00
Dave Chinner	d8914002a0	xfs: inode buffers may not be valid during recovery readahead CRC enabled filesystems fail log recovery with 100% reliability on xfstests xfs/085 with the following failure: XFS (vdb): Mounting Filesystem XFS (vdb): Starting recovery (logdev: internal) XFS (vdb): Corruption detected. Unmount and run xfs_repair XFS (vdb): bad inode magic/vsn daddr 144 #0 (magic=0) XFS: Assertion failed: 0, file: fs/xfs/xfs_inode_buf.c, line: 95 The problem is that the inode buffer has not been recovered before the readahead on the inode buffer is issued. The checkpoint being recovered actually allocates the inode chunk we are doing readahead from, so what comes from disk during readahead is essentially random and the verifier barfs on it. This inode buffer readahead problem affects non-crc filesystems, too, but xfstests does not trigger it at all on such configurations.... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-08-30 13:45:49 -05:00
Dave Chinner	50d5c8d8e9	xfs: check LSN ordering for v5 superblocks during recovery Log recovery has some strict ordering requirements which unordered or reordered metadata writeback can defeat. This can occur when an item is logged in a transaction, written back to disk, and then logged in a new transaction before the tail of the log is moved past the original modification. The result of this is that when we read an object off disk for recovery purposes, the buffer that we read may not contain the object type that recovery is expecting and hence at the end of the checkpoint being recovered we have an invalid object in memory. This isn't usually a problem, as recovery will then replay all the other checkpoints and that brings the object back to a valid and correct state, but the issue is that while the object is in the invalid state it can be flushed to disk. This results in the object verifier failing and triggering a corruption shutdown of log recover. This is correct behaviour for the verifiers - the problem is that we are not detecting that the object we've read off disk is newer than the transaction we are replaying. All metadata in v5 filesystems has the LSN of it's last modification stamped in it. This enabled log recover to read that field and determine the age of the object on disk correctly. If the LSN of the object on disk is older than the transaction being replayed, then we replay the modification. If the LSN of the object matches or is more recent than the transaction's LSN, then we should avoid overwriting the object as that is what leads to the transient corrupt state. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-08-30 13:44:53 -05:00
Dave Chinner	b58fa554e9	xfs: btree block LSN escaping to disk uninitialised When testing LSN ordering code for v5 superblocks, it was discovered that the the LSN embedded in the generic btree blocks was occasionally uninitialised. These values didn't get written to disk by metadata writeback - they got written by previous transactions in log recovery. The issue is here that the when the block is first allocated and initialised, the LSN field was not initialised - it gets overwritten before IO is issued on the buffer - but the value that is logged by transactions that modify the header before it is written to disk (and initialised) contain garbage. Hence the first recovery of the buffer will stamp garbage into the LSN field, and that can cause subsequent transactions to not replay correctly. The fix is simply to initialise the bb_lsn field to zero when we initialise the block for the first time. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-08-30 13:43:34 -05:00
Dave Chinner	3780437612	XFS: Assertion failed: first <= last && last < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568 The calculation doesn't take into account the size of the dir v3 header, so overestimates the hash entries in a node. This causes directory buffer overruns when splitting and merging nodes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-08-30 09:48:59 -05:00
Trond Myklebust	d7631250b2	NFSv4: Fix a potentially Oopsable condition in __nfs_idmap_unregister Ensure that __nfs_idmap_unregister can be called twice without consequences. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-08-30 09:19:38 -04:00
Trond Myklebust	c219066103	SUNRPC: Replace clnt->cl_principal The clnt->cl_principal is being used exclusively to store the service target name for RPCSEC_GSS/krb5 callbacks. Replace it with something that is stored only in the RPCSEC_GSS-specific code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-08-30 09:19:36 -04:00
Trond Myklebust	2d9db75005	NFS: Fix up two use-after-free issues with the new tracing code We don't want to pass the context argument to trace_nfs_atomic_open_exit() after it has been released. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-08-30 09:19:34 -04:00
Dave Chinner	0f0d334595	xfs: fix bad dquot buffer size in log recovery readahead xfstests xfs/087 fails 100% reliably with this assert: XFS (vdb): Mounting Filesystem XFS (vdb): Starting recovery (logdev: internal) XFS: Assertion failed: bp->b_flags & XBF_STALE, file: fs/xfs/xfs_buf.c, line: 548 while trying to read a dquot buffer in xlog_recover_dquot_ra_pass2(). The issue is that the buffer length to read that is passed to xfs_buf_readahead is in units of filesystem blocks, not disk blocks. (i.e. FSB, not daddr). Fix it but putting the correct conversion in place. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-08-29 10:51:35 -05:00
Dave Chinner	84a5b7300c	xfs: don't account buffer cancellation during log recovery readahead When doing readhaead in log recovery, we check to see if buffers are cancelled before doing readahead. If we find a cancelled buffer, however, we always decrement the reference count we have on it, and that means that readahead is causing a double decrement of the cancelled buffer reference count. This results in log recovery replaying cancelled buffers as the actual recovery pass does not find the cancelled buffer entry in the commit phase of the second pass across a transaction. On debug kernels, this results in an ASSERT failure like so: XFS: Assertion failed: !(flags & XFS_BLF_CANCEL), file: fs/xfs/xfs_log_recover.c, line: 1815 xfstests generic/311 reproduces this ASSERT failure with 100% reproducability. Fix it by making readahead only peek at the buffer cancelled state rather than the full accounting that xlog_check_buffer_cancelled() does. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-08-29 10:37:06 -05:00
Eric W. Biederman	7dc5dbc879	sysfs: Restrict mounting sysfs Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights over the net namespace. The principle here is if you create or have capabilities over it you can mount it, otherwise you get to live with what other people have mounted. Instead of testing this with a straight forward ns_capable call, perform this check the long and torturous way with kobject helpers, this keeps direct knowledge of namespaces out of sysfs, and preserves the existing sysfs abstractions. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-08-28 21:35:14 -07:00
Linus Torvalds	c95389b4cd	Merge branch 'akpm' (patches from Andrew Morton) Merge fixes from Andrew Morton: "Five fixes. err, make that six. let me try again" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: fs/ocfs2/super.c: Use bigger nodestr to accomodate 32-bit node numbers memcg: check that kmem_cache has memcg_params before accessing it drivers/base/memory.c: fix show_mem_removable() to handle missing sections IPC: bugfix for msgrcv with msgtyp < 0 Omnikey Cardman 4000: pull in ioctl.h in user header timer_list: correct the iterator for timer_list	2013-08-28 19:31:33 -07:00
Goldwyn Rodrigues	49fa8140e4	fs/ocfs2/super.c: Use bigger nodestr to accomodate 32-bit node numbers While using pacemaker/corosync, the node numbers are generated using IP address as opposed to serial node number generation. This may not fit in a 8-byte string. Use a bigger string to print the complete node number. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-08-28 19:26:38 -07:00

1 2 3 4 5 ...