evie/android_kernel_oneplus_msm8998 - Gay Catgirls Forgejo: gay catgirls having sex

evie/android_kernel_oneplus_msm8998

Author	SHA1	Message	Date
Andy Lutomirski	23adbe12ef	fs,userns: Change inode_capable to capable_wrt_inode_uidgid The kernel has no concept of capabilities with respect to inodes; inodes exist independently of namespaces. For example, inode_capable(inode, CAP_LINUX_IMMUTABLE) would be nonsense. This patch changes inode_capable to check for uid and gid mappings and renames it to capable_wrt_inode_uidgid, which should make it more obvious what it does. Fixes CVE-2014-4014. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Chinner <david@fromorbit.com> Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-10 13:57:22 -07:00
Chris Mason	c7548af69d	Btrfs: convert smp_mb__{before,after}_clear_bit The new call is smp_mb__{before,after}_atomic. The __ gives us extra protection from the atomic rays. Signed-off-by: Chris Mason <clm@fb.com>	2014-06-10 13:10:47 -07:00
Linus Torvalds	5b174fd647	Merge branch 'for-3.16' of git://linux-nfs.org/~bfields/linux Pull nfsd updates from Bruce Fields: "The largest piece is a long-overdue rewrite of the xdr code to remove some annoying limitations: for example, there was no way to return ACLs larger than 4K, and readdir results were returned only in 4k chunks, limiting performance on large directories. Also: - part of Neil Brown's work to make NFS work reliably over the loopback interface (so client and server can run on the same machine without deadlocks). The rest of it is coming through other trees. - cleanup and bugfixes for some of the server RDMA code, from Steve Wise. - Various cleanup of NFSv4 state code in preparation for an overhaul of the locking, from Jeff, Trond, and Benny. - smaller bugfixes and cleanup from Christoph Hellwig and Kinglong Mee. Thanks to everyone! This summer looks likely to be busier than usual for knfsd. Hopefully we won't break it too badly; testing definitely welcomed" * 'for-3.16' of git://linux-nfs.org/~bfields/linux: (100 commits) nfsd4: fix FREE_STATEID lockowner leak svcrdma: Fence LOCAL_INV work requests svcrdma: refactor marshalling logic nfsd: don't halt scanning the DRC LRU list when there's an RC_INPROG entry nfs4: remove unused CHANGE_SECURITY_LABEL nfsd4: kill READ64 nfsd4: kill READ32 nfsd4: simplify server xdr->next_page use nfsd4: hash deleg stateid only on successful nfs4_set_delegation nfsd4: rename recall_lock to state_lock nfsd: remove unneeded zeroing of fields in nfsd4_proc_compound nfsd: fix setting of NFS4_OO_CONFIRMED in nfsd4_open nfsd4: use recall_lock for delegation hashing nfsd: fix laundromat next-run-time calculation nfsd: make nfsd4_encode_fattr static SUNRPC/NFSD: Remove using of dprintk with KERN_WARNING nfsd: remove unused function nfsd_read_file nfsd: getattr for FATTR4_WORD0_FILES_AVAIL needs the statfs buffer NFSD: Error out when getting more than one fsloc/secinfo/uuid NFSD: Using type of uint32_t for ex_nflavors instead of int ...	2014-06-10 11:50:57 -07:00
Jeff Layton	0c27362998	locks: set fl_owner for leases back to current->files This fixes a regression due to commit `130d1f956a` (locks: ensure that fl_owner is always initialized properly in flock and lease codepaths). I had mistakenly thought that the fl_owner wasn't used in the lease code, but I missed the place in __break_lease that does use it. The i_have_this_lease check in generic_add_lease uses it. While I'm not sure that check is terribly helpful [1], reset it back to using current->files in order to ensure that there's no behavior change here. [1]: leases are owned by the file description. It's possible that this is a threaded program, and the lease breaker and the task that would handle the signal are different, even if they have the same file table. So, there is the potential for false positives with this check. Fixes: `130d1f956a` (locks: ensure that fl_owner is always initialized properly in flock and lease codepaths) Signed-off-by: Jeff Layton <jlayton@primarydata.com>	2014-06-10 12:29:05 -04:00
Linus Torvalds	d53b47c08d	This pull request contains several UBIFS fixes. One of them fixes a race condition between the mmap page fault path and fsync. Another just removes a bogus assertion from the UBIFS memory shrinker. UBIFS also started honoring the MS_SILENT mount flag, so now it won't print many I/O errors when user-space just tries to probe for the FS. Rest of the changes are rather minor UBI/UBIFS fixes, improvements, and clean-ups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTlqqmAAoJECmIfjd9wqK0cYAP+QGoB8+DOGv4+rTtUegNaWaG QAeAkvOBxDa72NrTYMtDFBKiPPYcz0BUnPgsCyvYIlYknRuZ4xMsu1BucLF3y8E3 P8aHpnNas0dA3el/M+M0qwpEG0pb1lQvFT1dJVP6/D4q4VexLlgt7PlQjP+98Dua jp9jBDMu3sEDsqA9NiO2hfuFuIqF6DFnBpglkNcGcAyOtzQ/Ps49nivhb1mFKuzI 2kSm2kWmZCTnLlGG1uj9+diCpTKWZoES2Jmo6gcZjQIjlejSzQw4Fo8LqdmhfwOg 0ba1D9kJuwPHmBeM0UW4fLtrJHkvs2F66YaRcbA7GUo5lNw6+yxTqi3jkGevsPOD 7eQB8FVCco14PFKu8Vx4lbkS2ekZN6aQuYYw/EeY2f+w8MpQTfcWINP5OVNHHGVA QDeRiu9mRMbm3JARRe9NeL0+sryGN402jCHtESAhONUDsCmZKX9k8HUMY1iaPAIp kGJWOjbyzYRMJiOfQiklv4N6rusmnECiRWGCliKDCU6TER8U6m9ZCUHUKmtX+56c 2yRtd6y6LVequ/p3wC3x66jF64wjh2PlTM5jrWSOIg42ihIGSMAbmB6MKA+4RxIv EejZ4lhCxUg6Xp7zd/qgdu3aJ/P2OM8yFL+BngGKFac54EnTDEUcc7d8c2DZZv8s 7YYjpiKK1NQGnih0FABA =CP7Z -----END PGP SIGNATURE----- Merge tag 'upstream-3.16-rc1-v2' of git://git.infradead.org/linux-ubifs Pull UBIFS updates from Artem Bityutskiy: "This contains several UBIFS fixes. One of them fixes a race condition between the mmap page fault path and fsync. Another just removes a bogus assertion from the UBIFS memory shrinker. UBIFS also started honoring the MS_SILENT mount flag, so now it won't print many I/O errors when user-space just tries to probe for the FS. Rest of the changes are rather minor UBI/UBIFS fixes, improvements, and clean-ups" * tag 'upstream-3.16-rc1-v2' of git://git.infradead.org/linux-ubifs: UBIFS: Add an assertion for clean_zn_cnt UBIFS: respect MS_SILENT mount flag UBIFS: Remove incorrect assertion in shrink_tnc() UBIFS: fix debugging check UBIFS: add missing ui pointer in debugging code UBI: block: Fix error path on alloc_workqueue failure UBIFS: Fix dump messages in ubifs_dump_lprops UBI: fix rb_tree node comparison in add_map UBIFS: Remove unused variables in ubifs_budget_space UBI: weaken the 'exclusive' constraint when opening volumes to rename UBIFS: fix an mmap and fsync race condition	2014-06-10 08:55:46 -07:00
Mateusz Guzik	a914722f33	NFS: populate ->net in mount data when remounting Otherwise the kernel oopses when remounting with IPv6 server because net is dereferenced in dev_get_by_name. Use net ns of current thread so that dev_get_by_name does not operate on foreign ns. Changing the address is prohibited anyway so this should not affect anything. Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Cc: linux-nfs@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org # 3.4+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-06-10 11:10:29 -04:00
Weston Andros Adamson	c5e20cb700	pnfs: fix lockup caused by pnfs_generic_pg_test end_offset and req_offset both return u64 - avoid casting to u32 until it's needed, when it's less than the (u32) size returned by nfs_generic_pg_test. Also, fix the comments in pnfs_generic_pg_test. Running the cthon04 special tests caused this lockup in the "write/read at 2GB, 4GB edges" test when running against a file layout server: BUG: soft lockup - CPU#0 stuck for 22s! [bigfile2:823] Modules linked in: nfs_layout_nfsv41_files rpcsec_gss_krb5 nfsv4 nfs fscache ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle ip6table_filter ip6_tables iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ppdev crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd serio_raw e1000 shpchp i2c_piix4 i2c_core parport_pc parport nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd sunrpc btrfs xor zlib_deflate raid6_pq mptspi scsi_transport_spi mptscsih mptbase ata_generic floppy autofs4 irq event stamp: 205958 hardirqs last enabled at (205957): [<ffffffff814a62dc>] restore_args+0x0/0x30 hardirqs last disabled at (205958): [<ffffffff814ad96a>] apic_timer_interrupt+0x6a/0x80 softirqs last enabled at (205956): [<ffffffff8103ffb2>] __do_softirq+0x1ea/0x2ab softirqs last disabled at (205951): [<ffffffff8104026d>] irq_exit+0x44/0x9a CPU: 0 PID: 823 Comm: bigfile2 Not tainted 3.15.0-rc1-branch-pgio_plus+ #3 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 task: ffff8800792ec480 ti: ffff880078c4e000 task.ti: ffff880078c4e000 RIP: 0010:[<ffffffffa02ce51f>] [<ffffffffa02ce51f>] nfs_page_group_unlock+0x3e/0x4b [nfs] RSP: 0018:ffff880078c4fab0 EFLAGS: 00000202 RAX: 0000000000000fff RBX: ffff88006bf83300 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88006bf83300 RBP: ffff880078c4fab8 R08: 0000000000000001 R09: 0000000000000000 R10: ffffffff8249840c R11: 0000000000000000 R12: 0000000000000035 R13: ffff88007ffc72d8 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f45f11b7740(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3a8cb632d0 CR3: 000000007931c000 CR4: 00000000001407f0 Stack: ffff88006bf832c0 ffff880078c4fb00 ffffffffa02cec22 ffff880078c4fad8 00000fff810f9d99 ffff880078c4fca0 ffff88006bf832c0 ffff88006bf832c0 ffff880078c4fca0 ffff880078c4fd60 ffff880078c4fb28 ffffffffa02cee34 Call Trace: [<ffffffffa02cec22>] __nfs_pageio_add_request+0x298/0x34f [nfs] [<ffffffffa02cee34>] nfs_pageio_add_request+0x1f/0x42 [nfs] [<ffffffffa02d1722>] nfs_do_writepage+0x1b5/0x1e4 [nfs] [<ffffffffa02d1764>] nfs_writepages_callback+0x13/0x25 [nfs] [<ffffffffa02d1751>] ? nfs_do_writepage+0x1e4/0x1e4 [nfs] [<ffffffff810eb32d>] write_cache_pages+0x254/0x37f [<ffffffffa02d1751>] ? nfs_do_writepage+0x1e4/0x1e4 [nfs] [<ffffffff8149cf9e>] ? printk+0x54/0x56 [<ffffffff810eacca>] ? __set_page_dirty_nobuffers+0x22/0xe9 [<ffffffffa016d864>] ? put_rpccred+0x38/0x101 [sunrpc] [<ffffffffa02d1ae1>] nfs_writepages+0xb4/0xf8 [nfs] [<ffffffff810ec59c>] do_writepages+0x21/0x2f [<ffffffff810e36e8>] __filemap_fdatawrite_range+0x55/0x57 [<ffffffff810e374a>] filemap_write_and_wait_range+0x2d/0x5b [<ffffffffa030ba0a>] nfs4_file_fsync+0x3a/0x98 [nfsv4] [<ffffffff8114ee3c>] vfs_fsync_range+0x18/0x20 [<ffffffff810e40c2>] generic_file_aio_write+0xa7/0xbd [<ffffffffa02c5c6b>] nfs_file_write+0xf0/0x170 [nfs] [<ffffffff81129215>] do_sync_write+0x59/0x78 [<ffffffff8112956c>] vfs_write+0xab/0x107 [<ffffffff81129c8b>] SyS_write+0x49/0x7f [<ffffffff814acd12>] system_call_fastpath+0x16/0x1b Reported-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-06-10 11:07:56 -04:00
Linus Torvalds	64b2d1fbbf	f2fs updates for v3.16 This patch-set includes the following major enhancement patches. o enhance wait_on_page_writeback o support SEEK_DATA and SEEK_HOLE o enhance readahead flows o enhance IO flushes o support fiemap o add some tracepoints The other bug fixes are as follows. o fix to support a large volume > 2TB correctly o recovery bug fix wrt fallocated space o fix recursive lock on xattr operations o fix some cases on the remount flow And, there are a bunch of cleanups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTleLYAAoJEEAUqH6CSFDSdhEP/iY5hTZ02jH4ZiFPP/Nee4hS l0BHeZvrMjoccaWUDu0ZvIPC8BiJ7kLOgK+/VWS7LAfY1PY11ALEtYQOrW+RM47+ jMfULegTod/F8WS2B6dk31QMhAZldtnsYvA5PS1VV3S0rht+qbOz+PDejZFgSMc3 VVQ7OZzq30gMmtsw7oh3FHfeTu4xe/bxygdMRXgljQQD2MvorJiOb4MA+ovEDd9z CZMMTQvRKjc0d8LPf0gOkZEvG63GR6klCgFMuiappUsua0O8IPIEhCyEGFrE66vS fObVKpqNAsR2ABhS2grgn6Q7UUvn4xrF6jOwDH5tuw2yzmkQiMAWINwBdgnbEy1c D5S89PQ8TkQK9KBSoU0v8WKWC4HzWZF4ZEd6eq9VxVTS8iT0w8DtNHXK518FVC34 82iqrLc0EhrcGNiW/7Nrc6WzHrWqorylCFN7atB3ruhVqeMh3MZsDU4Gq0WgB2oh pF9XVBEpJQpV35DYSAPzLkm+2+mwHVNqwdY3HcvUs7IP90+wZlrWSRG2FEfFt/e8 6nwvbORrHMTI5VfdYlEPgpjuesFmYPZqEvRGdaDa1YrHqhvvgStEPT9qiq2qVn9+ tr0HjpNRDD/etkaE6ujYOYqdxuk3mm6RY68h+KSbNcY1/VTvt2bN2telwWuDsxIq 8yOsxV2x3JB/euDAJsSU =xqsO -----END PGP SIGNATURE----- Merge tag 'for-f2fs-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, there is no special interesting feature, but we've investigated a couple of tuning points with respect to the I/O flow. Several major bug fixes and a bunch of clean-ups also have been made. This patch-set includes the following major enhancement patches: - enhance wait_on_page_writeback - support SEEK_DATA and SEEK_HOLE - enhance readahead flows - enhance IO flushes - support fiemap - add some tracepoints The other bug fixes are as follows: - fix to support a large volume > 2TB correctly - recovery bug fix wrt fallocated space - fix recursive lock on xattr operations - fix some cases on the remount flow And, there are a bunch of cleanups" * tag 'for-f2fs-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (52 commits) f2fs: support f2fs_fiemap f2fs: avoid not to call remove_dirty_inode f2fs: recover fallocated space f2fs: fix to recover data written by dio f2fs: large volume support f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages f2fs: avoid overflow when large directory feathure is enabled f2fs: fix recursive lock by f2fs_setxattr MAINTAINERS: add a co-maintainer from samsung for F2FS MAINTAINERS: change the email address for f2fs f2fs: use inode_init_owner() to simplify codes f2fs: avoid to use slab memory in f2fs_issue_flush for efficiency f2fs: add a tracepoint for f2fs_read_data_page f2fs: add a tracepoint for f2fs_write_{meta,node,data}_pages f2fs: add a tracepoint for f2fs_write_{meta,node,data}_page f2fs: add a tracepoint for f2fs_write_end f2fs: add a tracepoint for f2fs_write_begin f2fs: fix checkpatch warning f2fs: deactivate inode page if the inode is evicted f2fs: decrease the lock granularity during write_begin ...	2014-06-09 19:11:44 -07:00
Linus Torvalds	b1cce8032f	Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 Pull CIFS fixes from Steve French. * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: CIFS: Fix memory leaks in SMB2_open cifs: ensure that vol->username is not NULL before running strlen on it Clarify SMB2/SMB3 create context and add missing ones Do not send ClientGUID on SMB2.02 dialect cifs: Set client guid on per connection basis fs/cifs/netmisc.c: convert printk to pr_foo() fs/cifs/cifs.c: replace seq_printf by seq_puts Update cifs version number to 2.03 fs: cifs: new helper: file_inode(file) cifs: fix potential races in cifs_revalidate_mapping cifs: new helper function: cifs_revalidate_mapping cifs: convert booleans in cifsInodeInfo to a flags field cifs: fix cifs_uniqueid_to_ino_t not to ever return 0	2014-06-09 19:08:43 -07:00
Liu Bo	6eda71d0c0	Btrfs: fix scrub_print_warning to handle skinny metadata extents The skinny extents are intepreted incorrectly in scrub_print_warning(), and end up hitting the BUG() in btrfs_extent_inline_ref_size. Reported-by: Konstantinos Skarlatos <k.skarlatos@gmail.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:17 -07:00
Filipe Manana	7ffbb598a0	Btrfs: make fsync work after cloning into a file When cloning into a file, we were correctly replacing the extent items in the target range and removing the extent maps. However we weren't replacing the extent maps with new ones that point to the new extents - as a consequence, an incremental fsync (when the inode doesn't have the full sync flag) was a NOOP, since it relies on the existence of extent maps in the modified list of the inode's extent map tree, which was empty. Therefore add new extent maps to reflect the target clone range. A test case for xfstests follows. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:16 -07:00
Liu Bo	cd857dd6bc	Btrfs: use right type to get real comparison We want to make sure the point is still within the extent item, not to verify the memory it's pointing to. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:15 -07:00
Josef Bacik	8a56457f5f	Btrfs: don't check nodes for extent items The backref code was looking at nodes as well as leaves when we tried to populate extent item entries. This is not good, and although we go away with it for the most part because we'd skip where disk_bytenr != random_memory, sometimes random_memory would match and suddenly boom. This fixes that problem. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:14 -07:00
Filipe Manana	6fdef6d43c	Btrfs: don't release invalid page in btrfs_page_exists_in_range() In inode.c:btrfs_page_exists_in_range(), if the page we got from the radix tree is an exception entry, which can't be retried, we exit the loop with a non-NULL page and then call page_cache_release against it, which is not ok since it's not a valid page. This could also make us return true when we shouldn't. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:14 -07:00
Filipe Manana	809f901625	Btrfs: make sure we retry if page is a retriable exception In inode.c:btrfs_page_exists_in_range(), if the page we get from the radix tree is an exception which should make us retry, set page to NULL in order to really retry, because otherwise we don't get another loop iteration executed (page != NULL makes the while loop exit). This also was making us call page_cache_release after exiting the loop, which isn't correct because page doesn't point to a valid page, and possibly return true from the function when we shouldn't. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:13 -07:00
Filipe Manana	91405151eb	Btrfs: make sure we retry if we couldn't get the page In inode.c:btrfs_page_exists_in_range(), if we can't get the page we need to retry. However we weren't retrying because we weren't setting page to NULL, which makes the while loop exit immediately and will make us call page_cache_release after exiting the loop which is incorrect because our page get didn't succeed. This could also make us return true when we shouldn't. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:12 -07:00
Gui Hecheng	c81d57679e	btrfs: replace EINVAL with EOPNOTSUPP for dev_replace raid56 To return EOPNOTSUPP is more user friendly than to return EINVAL, and then user-space tool will show that the dev_replace operation for raid56 is not currently supported rather than showing that there is an invalid argument. Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:12 -07:00
Antonio Ospite	9391558411	trivial: fs/btrfs/ioctl.c: fix typo s/substract/subtract/ Signed-off-by: Antonio Ospite <ao2@ao2.it> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: linux-btrfs@vger.kernel.org Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:11 -07:00
Liu Bo	0b43e04f70	Btrfs: fix leaf corruption after __btrfs_drop_extents Several reports about leaf corruption has been floating on the list, one of them points to __btrfs_drop_extents(), and we find that the leaf becomes corrupted after __btrfs_drop_extents(), it's really a rare case but it does exist. The problem turns out to be btrfs_next_leaf() called in __btrfs_drop_extents(). So in btrfs_next_leaf(), we release the current path to re-search the last key of the leaf for locating next leaf, and we've taken it into account that there might be balance operations between leafs during this 'unlock and re-lock' dance, so we check the path again and advance it if there are now more items available. But things are a bit different if that last key happens to be removed and balance gets a bigger key as the last one, and btrfs_search_slot will return it with ret > 0, IOW, nothing change in this leaf except the new last key, then we think we're okay because there is no more item balanced in, fine, we thinks we can go to the next leaf. However, we should return that bigger key, otherwise we deserve leaf corruption, for example, in endio, skipping that key means that __btrfs_drop_extents() thinks it has dropped all extent matched the required range and finish_ordered_io can safely insert a new extent, but it actually doesn't and ends up a leaf corruption. One may be asking that why our locking on extent io tree doesn't work as expected, ie. it should avoid this kind of race situation. But in __btrfs_drop_extents(), we don't always find extents which are included within our locking range, IOW, extents can start before our searching start, in this case locking on extent io tree doesn't protect us from the race. This takes the special case into account. Reviewed-by: Filipe Manana <fdmanana@gmail.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:10 -07:00
Filipe Manana	337c6f6830	Btrfs: ensure btrfs_prev_leaf doesn't miss 1 item We might have had an item with the previous key in the tree right before we released our path. And after we released our path, that item might have been pushed to the first slot (0) of the leaf we were holding due to a tree balance. Alternatively, an item with the previous key can exist as the only element of a leaf (big fat item). Therefore account for these 2 cases, so that our callers (like btrfs_previous_item) don't miss an existing item with a key matching the previous key we computed above. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:09 -07:00
Filipe Manana	f82a9901b0	Btrfs: fix clone to deal with holes when NO_HOLES feature is enabled If the NO_HOLES feature is enabled holes don't have file extent items in the btree that represent them anymore. This made the clone operation ignore the gaps that exist between consecutive file extent items and therefore not create the holes at the destination. When not using the NO_HOLES feature, the holes were created at the destination. A test case for xfstests follows. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:09 -07:00
Jeff Mahoney	964930312a	btrfs: free delayed node outside of root->inode_lock On heavy workloads, we're seeing soft lockup warnings on root->inode_lock in __btrfs_release_delayed_node. The low hanging fruit is to reduce the size of the critical section. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:08 -07:00
Gui Hecheng	902c68a4da	btrfs: replace EINVAL with ERANGE for resize when ULLONG_MAX To be accurate about the error case, if the new size is beyond ULLONG_MAX, return ERANGE instead of EINVAL. Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:07 -07:00
Filipe Manana	b05fd8742f	Btrfs: fix transaction leak during fsync call If btrfs_log_dentry_safe() returns an error, we set ret to 1 and fall through with the goal of committing the transaction. However, in the case where the inode doesn't need a full sync, we would call btrfs_wait_ordered_range() against the target range for our inode, and if it returned an error, we would return without commiting or ending the transaction. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:06 -07:00
Qu Wenruo	d77815461f	btrfs: Avoid trucating page or punching hole in a already existed hole. btrfs_punch_hole() will truncate unaligned pages or punch hole on a already existed hole. This will cause unneeded zero page or holes splitting the original huge hole. This patch will skip already existed holes before any page truncating or hole punching. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:06 -07:00
Filipe Manana	3821f34888	Btrfs: update commit root on snapshot creation after orphan cleanup On snapshot creation (either writable or read-only), we do orphan cleanup against the root of the snapshot. If the cleanup did remove any orphans, then the current root node will be different from the commit root node until the next transaction commit happens. A send operation always uses the commit root of a snapshot - this means it will see the orphans if it starts computing the send stream before the next transaction commit happens (triggered by a timer or sync() for .e.g), which is when the commit root gets assigned a reference to current root, where the orphans are not visible anymore. The consequence of send seeing the orphans is explained below. For example: mkfs.btrfs -f /dev/sdd mount -o commit=999 /dev/sdd /mnt # open a file with O_TMPFILE and leave it open # write some data to the file btrfs subvolume snapshot -r /mnt /mnt/snap1 btrfs send /mnt/snap1 -f /tmp/send.data The send operation will fail with the following error: ERROR: send ioctl failed with -116: Stale file handle What happens here is that our snapshot has an orphan inode still visible through the commit root, that corresponds to the tmpfile. However send will attempt to call inode.c:btrfs_iget(), with the goal of reading the file's data, which will return -ESTALE because it will use the current root (and not the commit root) of the snapshot. Of course, there are other cases where we can get orphans, but this example using a tmpfile makes it much easier to reproduce the issue. Therefore on snapshot creation, after calling btrfs_orphan_cleanup, if the commit root is different from the current root, just commit the transaction associated with the snapshot's root (if it exists), so that a send will not see any orphans that don't exist anymore. This also guarantees a send will always see the same content regardless of whether a transaction commit happened already before the send was requested and after the orphan cleanup (meaning the commit root and current roots are the same) or it hasn't happened yet (commit and current roots are different). Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:05 -07:00
Filipe Manana	ff5df9b884	Btrfs: ioctl, don't re-lock extent range when not necessary In ioctl.c:lock_extent_range(), after locking our target range, the ordered extent that btrfs_lookup_first_ordered_extent() returns us may not overlap our target range at all. In this case we would just unlock our target range, wait for any new ordered extents that overlap the range to complete, lock again the range and repeat all these steps until we don't get any ordered extent and the delalloc flag isn't set in the io tree for our target range. Therefore just stop if we get an ordered extent that doesn't overlap our target range and the dealalloc flag isn't set for the range in the inode's io tree. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:04 -07:00
Filipe Manana	2c463823cb	Btrfs: avoid visiting all extent items when cloning a range When cloning a range of a file, we were visiting all the extent items in the btree that belong to our source inode. We don't need to visit those extent items that don't overlap the range we are cloning, as doing so only makes us waste time and do unnecessary btree navigations (btrfs_next_leaf) for inodes that have a large number of file extent items in the btree. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:04 -07:00
Filipe Manana	c55bfa67e9	Btrfs: set dead flag on the right root when destroying snapshot We were setting the BTRFS_ROOT_SUBVOL_DEAD flag on the root of the parent of our target snapshot, instead of setting it in the target snapshot's root. This is easy to observe by running the following scenario: mkfs.btrfs -f /dev/sdd mount /dev/sdd /mnt btrfs subvolume create /mnt/first_subvol btrfs subvolume snapshot -r /mnt /mnt/mysnap1 btrfs subvolume delete /mnt/first_subvol btrfs subvolume snapshot -r /mnt /mnt/mysnap2 btrfs send -p /mnt/mysnap1 /mnt/mysnap2 -f /tmp/send.data The send command failed because the send ioctl returned -EPERM. A test case for xfstests follows. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:03 -07:00
Filipe Manana	c125b8bff1	Btrfs: ensure readers see new data after a clone operation We were cleaning the clone target file range from the page cache before we did replace the file extent items in the fs tree. This was racy, as right after cleaning the relevant range from the page cache and before replacing the file extent items, a read against that range could be performed by another task and populate again the page cache with stale data (stale after the cloning finishes). This would result in reads after the clone operation successfully finishes to get old data (and potentially for a very long time). Therefore evict the pages after replacing the file extent items, so that subsequent reads will always get the new data. Similarly, we were prone to races while cloning the file extent items because we weren't locking the target range and wait for any existing ordered extents against that range to complete. It was possible that after cloning the extent items, a write operation that was performed before the clone operation and overlaps the same range, would end up undoing all or part of the work the clone operation did (a worker task running inode.c:btrfs_finish_ordered_io). Therefore lock the target range in the io tree, wait for all pending ordered extents against that range to finish and then safely perform the cloning. The issue of reading stale data after the clone operation is easy to reproduce by running the following C program in a loop until it exits with return value 1. #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <errno.h> #include <pthread.h> #include <fcntl.h> #include <assert.h> #include <asm/types.h> #include <linux/ioctl.h> #include <sys/stat.h> #include <sys/types.h> #include <sys/ioctl.h> #define SRC_FILE "/mnt/sdd/foo" #define DST_FILE "/mnt/sdd/bar" #define FILE_SIZE (16 * 1024) #define PATTERN_SRC 'X' #define PATTERN_DST 'Y' struct btrfs_ioctl_clone_range_args { __s64 src_fd; __u64 src_offset, src_length; __u64 dest_offset; }; #define BTRFS_IOCTL_MAGIC 0x94 #define BTRFS_IOC_CLONE_RANGE _IOW(BTRFS_IOCTL_MAGIC, 13, \ struct btrfs_ioctl_clone_range_args) static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; static int clone_done = 0; static int reader_ready = 0; static int stale_data = 0; static void reader_loop(void arg) { char buf[4096], want_buf[4096]; memset(want_buf, PATTERN_SRC, 4096); pthread_mutex_lock(&mutex); reader_ready = 1; pthread_mutex_unlock(&mutex); while (1) { int done, fd, ret; fd = open(DST_FILE, O_RDONLY); assert(fd != -1); pthread_mutex_lock(&mutex); done = clone_done; pthread_mutex_unlock(&mutex); ret = read(fd, buf, 4096); assert(ret == 4096); close(fd); if (done) { ret = memcmp(buf, want_buf, 4096); if (ret == 0) { printf("Found new content\n"); } else { printf("Found old content\n"); pthread_mutex_lock(&mutex); stale_data = 1; pthread_mutex_unlock(&mutex); } break; } } return NULL; } int main(int argc, char *argv[]) { pthread_t reader; int ret, i, fd; struct btrfs_ioctl_clone_range_args clone_args; int fd1, fd2; ret = remove(SRC_FILE); if (ret == -1 && errno != ENOENT) { fprintf(stderr, "Error deleting src file: %s\n", strerror(errno)); return 1; } ret = remove(DST_FILE); if (ret == -1 && errno != ENOENT) { fprintf(stderr, "Error deleting dst file: %s\n", strerror(errno)); return 1; } fd = open(SRC_FILE, O_CREAT \| O_WRONLY \| O_TRUNC, S_IRWXU); assert(fd != -1); for (i = 0; i < FILE_SIZE; i++) { char c = PATTERN_SRC; ret = write(fd, &c, 1); assert(ret == 1); } close(fd); fd = open(DST_FILE, O_CREAT \| O_WRONLY \| O_TRUNC, S_IRWXU); assert(fd != -1); for (i = 0; i < FILE_SIZE; i++) { char c = PATTERN_DST; ret = write(fd, &c, 1); assert(ret == 1); } close(fd); sync(); ret = pthread_create(&reader, NULL, reader_loop, NULL); assert(ret == 0); while (1) { int r; pthread_mutex_lock(&mutex); r = reader_ready; pthread_mutex_unlock(&mutex); if (r) break; } fd1 = open(SRC_FILE, O_RDONLY); if (fd1 < 0) { fprintf(stderr, "Error open src file: %s\n", strerror(errno)); return 1; } fd2 = open(DST_FILE, O_RDWR); if (fd2 < 0) { fprintf(stderr, "Error open dst file: %s\n", strerror(errno)); return 1; } clone_args.src_fd = fd1; clone_args.src_offset = 0; clone_args.src_length = 4096; clone_args.dest_offset = 0; ret = ioctl(fd2, BTRFS_IOC_CLONE_RANGE, &clone_args); assert(ret == 0); close(fd1); close(fd2); pthread_mutex_lock(&mutex); clone_done = 1; pthread_mutex_unlock(&mutex); ret = pthread_join(reader, NULL); assert(ret == 0); pthread_mutex_lock(&mutex); ret = stale_data ? 1 : 0; pthread_mutex_unlock(&mutex); return ret; } Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:02 -07:00
Rickard Strandqvist	8321cf2596	fs: btrfs: volumes.c: Fix for possible null pointer dereference There is otherwise a risk of a possible null pointer dereference. Was largely found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:01 -07:00
Jeff Mahoney	c1895442be	btrfs: allocate raid type kobjects dynamically We are currently allocating space_info objects in an array when we allocate space_info. When a user does something like: # btrfs balance start -mconvert=raid1 -dconvert=raid1 /mnt # btrfs balance start -mconvert=single -dconvert=single /mnt -f # btrfs balance start -mconvert=raid1 -dconvert=raid1 / We can end up with memory corruption since the kobject hasn't been reinitialized properly and the name pointer was left set. The rationale behind allocating them statically was to avoid creating a separate kobject container that just contained the raid type. It used the index in the array to determine the index. Ultimately, though, this wastes more memory than it saves in all but the most complex scenarios and introduces kobject lifetime questions. This patch allocates the kobjects dynamically instead. Note that we also remove the kobject_get/put of the parent kobject since kobject_add and kobject_del do that internally. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:01 -07:00
Filipe Manana	7e3ae33efa	Btrfs: send, use the right limits for xattr names and values We were limiting the sum of the xattr name and value lengths to PATH_MAX, which is not correct, specially on filesystems created with btrfs-progs v3.12 or higher, where the default leaf size is max(16384, PAGE_SIZE), or systems with page sizes larger than 4096 bytes. Xattrs have their own specific maximum name and value lengths, which depend on the leaf size, therefore use these limits to be able to send xattrs with sizes larger than PATH_MAX. A test case for xfstests follows. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:00 -07:00
Filipe Manana	1af56070e3	Btrfs: send, don't error in the presence of subvols/snapshots If we are doing an incremental send and the base snapshot has a directory with name X that doesn't exist anymore in the second snapshot and a new subvolume/snapshot exists in the second snapshot that has the same name as the directory (name X), the incremental send would fail with -ENOENT error. This is because it attempts to lookup for an inode with a number matching the objectid of a root, which doesn't exist. Steps to reproduce: mkfs.btrfs -f /dev/sdd mount /dev/sdd /mnt mkdir /mnt/testdir btrfs subvolume snapshot -r /mnt /mnt/mysnap1 rmdir /mnt/testdir btrfs subvolume create /mnt/testdir btrfs subvolume snapshot -r /mnt /mnt/mysnap2 btrfs send -p /mnt/mysnap1 /mnt/mysnap2 -f /tmp/send.data A test case for xfstests follows. Reported-by: Robert White <rwhite@pobox.com> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:59 -07:00
Chris Mason	a79b7d4b3e	Btrfs: async delayed refs Delayed extent operations are triggered during transaction commits. The goal is to queue up a healthly batch of changes to the extent allocation tree and run through them in bulk. This farms them off to async helper threads. The goal is to have the bulk of the delayed operations being done in the background, but this is also important to limit our stack footprint. Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:58 -07:00
Chris Mason	40f765805f	Btrfs: split up __extent_writepage to lower stack usage __extent_writepage has two unrelated parts. First it does the delayed allocation dance and second it does the mapping and IO for the page we're actually writing. This splits it up into those two parts so the stack from one doesn't impact the stack from the other. Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:58 -07:00
Alex Gartrell	fc4adbff82	btrfs: Drop EXTENT_UPTODATE check in hole punching and direct locking In these instances, we are trying to determine if a page has been accessed since we began the operation for the sake of retry. This is easily accomplished by doing a gang lookup in the page mapping radix tree, and it saves us the dependency on the flag (so that we might eventually delete it). btrfs_page_exists_in_range borrows heavily from find_get_page, replacing the radix tree look up with a gang lookup of 1, so that we can find the next highest page >= index and see if it falls into our lock range. Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Alex Gartrell <agartrell@fb.com>	2014-06-09 17:20:57 -07:00
Chris Mason	0e378df15c	Btrfs: cut down stack usage in btree_write_cache_pages This adds noinline_for_stack to two helpers used by btree_write_cache_pages. It shaves us down from 424 bytes on the stack to 280. Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:56 -07:00
Chris Mason	d4452bc526	Btrfs: break up __btrfs_write_out_cache to cut down stack usage __btrfs_write_out_cache was one of our stack pigs. This breaks it up into helper functions and slims it down to 194 bytes. Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:55 -07:00
Josef Bacik	2a10840945	Btrfs: free tmp ulist for qgroup rescan Memory leaks are bad mmkay? Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:55 -07:00
Anand Jain	402a0f4759	btrfs: usage error should not be logged into system log I have an opinion that system logs /var/log/messages are valuable info to investigate the real system issues at the data center. People handling data center issues do spend a lot time and efforts analyzing messages files. Having usage error logged into /var/log/messages is something we should avoid. Signed-off-by: Anand Jain <Anand.Jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:54 -07:00
David Sterba	67a77eb147	btrfs: remove newline from inode cache kthread name Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:53 -07:00
David Sterba	351fd35321	btrfs: remove stale newlines from log messages I've noticed an extra line after "use no compression", but search revealed much more in messages of more critical levels and rare errors. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:53 -07:00
Chris Mason	7d78874273	Btrfs: fix double free in find_lock_delalloc_range We need to NULL the cached_state after freeing it, otherwise we might free it again if find_delalloc_range doesn't find anything. Signed-off-by: Chris Mason <clm@fb.com> cc: stable@vger.kernel.org	2014-06-09 17:20:52 -07:00
ZhangZhen	58dfae6365	btrfs: replace simple_strtoull() with kstrtoull() use the newer and more pleasant kstrtoull() to replace simple_strtoull(), because simple_strtoull() is marked for obsoletion. Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:51 -07:00
Wang Shilong	298658414a	Btrfs: set right total device count for seeding support Seeding device support allows us to create a new filesystem based on existed filesystem. However newly created filesystem's @total_devices should include seed devices. This patch fix the following problem: # mkfs.btrfs -f /dev/sdb # btrfstune -S 1 /dev/sdb # mount /dev/sdb /mnt # btrfs device add -f /dev/sdc /mnt --->fs_devices->total_devices = 1 # umount /mnt # mount /dev/sdc /mnt --->fs_devices->total_devices = 2 This is because we record right @total_devices in superblock, but @fs_devices->total_devices is reset to be 0 in btrfs_prepare_sprout(). Fix this problem by not resetting @fs_devices->total_devices. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:50 -07:00
Guangliang Zhao	45ff35d6b9	Btrfs: remove OPT_acl parse when acl disabled Even CONFIG_BTRFS_FS_POSIX_ACL is not defined, the acl still could been enabled using a mount option, and now fs/btrfs/acl.o is not built, so the mount options will appear to be supported but will be silently ignored. Signed-off-by: Guangliang Zhao <lucienchao@gmail.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:50 -07:00
Josef Bacik	faa2dbf004	Btrfs: add sanity tests for new qgroup accounting code This exercises the various parts of the new qgroup accounting code. We do some basic stuff and do some things with the shared refs to make sure all that code works. I had to add a bunch of infrastructure because I needed to be able to insert items into a fake tree without having to do all the hard work myself, hopefully this will be usefull in the future. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:49 -07:00
Josef Bacik	fcebe4562d	Btrfs: rework qgroup accounting Currently qgroups account for space by intercepting delayed ref updates to fs trees. It does this by adding sequence numbers to delayed ref updates so that it can figure out how the tree looked before the update so we can adjust the counters properly. The problem with this is that it does not allow delayed refs to be merged, so if you say are defragging an extent with 5k snapshots pointing to it we will thrash the delayed ref lock because we need to go back and manually merge these things together. Instead we want to process quota changes when we know they are going to happen, like when we first allocate an extent, we free a reference for an extent, we add new references etc. This patch accomplishes this by only adding qgroup operations for real ref changes. We only modify the sequence number when we need to lookup roots for bytenrs, this reduces the amount of churn on the sequence number and allows us to merge delayed refs as we add them most of the time. This patch encompasses a bunch of architectural changes 1) qgroup ref operations: instead of tracking qgroup operations through the delayed refs we simply add new ref operations whenever we notice that we need to when we've modified the refs themselves. 2) tree mod seq: we no longer have this separation of major/minor counters. this makes the sequence number stuff much more sane and we can remove some locking that was needed to protect the counter. 3) delayed ref seq: we now read the tree mod seq number and use that as our sequence. This means each new delayed ref doesn't have it's own unique sequence number, rather whenever we go to lookup backrefs we inc the sequence number so we can make sure to keep any new operations from screwing up our world view at that given point. This allows us to merge delayed refs during runtime. With all of these changes the delayed ref stuff is a little saner and the qgroup accounting stuff no longer goes negative in some cases like it was before. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:48 -07:00
Liu Bo	5dca6eea91	Btrfs: mark mapping with error flag to report errors to userspace According to commit `865ffef379` (fs: fix fsync() error reporting), it's not stable to just check error pages because pages can be truncated or invalidated, we should also mark mapping with error flag so that a later fsync can catch the error. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:47 -07:00

1 2 3 4 5 ...