Commit graph

41400 commits

Author SHA1 Message Date
Yan, Zheng
a2971c8ccb ceph: send TID of the oldest pending caps flush to MDS
According to this information, MDS can trim its completed caps flush
list (which is used to detect duplicated cap flush).

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:31 +03:00
Yan, Zheng
8310b08913 ceph: track pending caps flushing globally
So we know TID of the oldest pending caps flushing. Later patch will
send this information to MDS, so that MDS can trim its completed caps
flush list.

Tracking pending caps flushing globally also simplifies syncfs code.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:31 +03:00
Yan, Zheng
553adfd941 ceph: track pending caps flushing accurately
Previously we do not trace accurate TID for flushing caps. when
MDS failovers, we have no choice but to re-send all flushing caps
with a new TID. This can cause problem because MDS can has already
flushed some caps and has issued the same caps to other client.
The re-sent cap flush has a new TID, which makes MDS unable to
detect if it has already processed the cap flush.

This patch adds code to track pending caps flushing accurately.
When re-sending cap flush is needed, we use its original flush
TID.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:30 +03:00
Yan, Zheng
da819c8150 ceph: fix directory fsync
fsync() on directory should flush dirty caps and wait for any
uncommitted directory opertions to commit. But ceph_dir_fsync()
only waits for uncommitted directory opertions.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:30 +03:00
Yan, Zheng
89b52fe14d ceph: fix flushing caps
Current ceph_fsync() only flushes dirty caps and wait for them to be
flushed. It doesn't wait for caps that has already been flushing.
This patch makes ceph_fsync() wait for pending flushing caps too.
Besides, this patch also makes caps_are_flushed() peroperly handle
tid wrapping.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:30 +03:00
Yan, Zheng
41445999ae ceph: don't include used caps in cap_wanted
when copying files to cephfs, file data may stay in page cache after
corresponding file is closed. Cached data use Fc capability. If we
include Fc capability in cap_wanted, MDS will treat files with cached
data as open files, and journal them in an EOpen event when trimming
log segment.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:30 +03:00
Yan, Zheng
3e0708b990 ceph: ratelimit warn messages for MDS closes session
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:30 +03:00
Ilya Dryomov
5be7303477 ceph: simplify two mount_timeout sites
No need to bifurcate wait now that we've got ceph_timeout_jiffies().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:29 +03:00
Ilya Dryomov
a319bf56a6 libceph: store timeouts in jiffies, verify user input
There are currently three libceph-level timeouts that the user can
specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive.  All of
these are in seconds and no checking is done on user input: negative
values are accepted, we multiply them all by HZ which may or may not
overflow, arbitrarily large jiffies then get added together, etc.

There is also a bug in the way mount_timeout=0 is handled.  It's
supposed to mean "infinite timeout", but that's not how wait.h APIs
treat it and so __ceph_open_session() for example will busy loop
without much chance of being interrupted if none of ceph-mons are
there.

Fix all this by verifying user input, storing timeouts capped by
msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies()
helper for all user-specified waits to handle infinite timeouts
correctly.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2015-06-25 11:49:29 +03:00
Yan, Zheng
e8a7b8b12b ceph: exclude setfilelock requests when calculating oldest tid
setfilelock requests can block for a long time, which can prevent
client from advancing its oldest tid.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:29 +03:00
Yan, Zheng
745a8e3bcc ceph: don't pre-allocate space for cap release messages
Previously we pre-allocate cap release messages for each caps. This
wastes lots of memory when there are large amount of caps. This patch
make the code not pre-allocate the cap release messages. Instead,
we add the corresponding ceph_cap struct to a list when releasing a
cap. Later when flush cap releases is needed, we allocate the cap
release messages dynamically.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:29 +03:00
Yan, Zheng
affbc19a68 ceph: make sure syncfs flushes all cap snaps
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:29 +03:00
Yan, Zheng
622f3e250f ceph: don't trim auth cap when there are cap snaps
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:28 +03:00
Yan, Zheng
604d1b0245 ceph: take snap_rwsem when accessing snap realm's cached_context
When ceph inode's i_head_snapc is NULL, __ceph_mark_dirty_caps()
accesses snap realm's cached_context. So we need take read lock
of snap_rwsem.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:28 +03:00
Yan, Zheng
8605609049 ceph: avoid sending unnessesary FLUSHSNAP message
when a snap notification contains no new snapshot, we can avoid
sending FLUSHSNAP message to MDS. But we still need to create
cap_snap in some case because it's required by write path and
page writeback path

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:28 +03:00
Yan, Zheng
5dda377cf0 ceph: set i_head_snapc when getting CEPH_CAP_FILE_WR reference
In most cases that snap context is needed, we are holding
reference of CEPH_CAP_FILE_WR. So we can set ceph inode's
i_head_snapc when getting the CEPH_CAP_FILE_WR reference,
and make codes get snap context from i_head_snapc. This makes
the code simpler.

Another benefit of this change is that we can handle snap
notification more elegantly. Especially when snap context
is updated while someone else is doing write. The old queue
cap_snap code may set cap_snap's context to ether the old
context or the new snap context, depending on if i_head_snapc
is set. The new queue capp_snap code always set cap_snap's
context to the old snap context.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:28 +03:00
Yan, Zheng
7b06a826e7 ceph: use empty snap context for uninline_data and get_pool_perm
Cached_context in ceph_snap_realm is directly accessed by
uninline_data() and get_pool_perm(). This is racy in theory.
both uninline_data() and get_pool_perm() do not modify existing
object, they only create new object. So we can pass the empty
snap context to them.  Unlike cached_context in ceph_snap_realm,
we do not need to protect the empty snap context.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:28 +03:00
Yan, Zheng
10183a6955 ceph: check OSD caps before read/write
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:28 +03:00
Yan, Zheng
144cba1493 libceph: allow setting osd_req_op's flags
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2015-06-25 11:49:27 +03:00
Linus Torvalds
aefbef10e3 Merge branch 'akpm' (patches from Andrew)
Merge first patchbomb from Andrew Morton:

 - a few misc things

 - ocfs2 udpates

 - kernel/watchdog.c feature work (took ages to get right)

 - most of MM.  A few tricky bits are held up and probably won't make 4.2.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (91 commits)
  mm: kmemleak_alloc_percpu() should follow the gfp from per_alloc()
  mm, thp: respect MPOL_PREFERRED policy with non-local node
  tmpfs: truncate prealloc blocks past i_size
  mm/memory hotplug: print the last vmemmap region at the end of hot add memory
  mm/mmap.c: optimization of do_mmap_pgoff function
  mm: kmemleak: optimise kmemleak_lock acquiring during kmemleak_scan
  mm: kmemleak: avoid deadlock on the kmemleak object insertion error path
  mm: kmemleak: do not acquire scan_mutex in kmemleak_do_cleanup()
  mm: kmemleak: fix delete_object_*() race when called on the same memory block
  mm: kmemleak: allow safe memory scanning during kmemleak disabling
  memcg: convert mem_cgroup->under_oom from atomic_t to int
  memcg: remove unused mem_cgroup->oom_wakeups
  frontswap: allow multiple backends
  x86, mirror: x86 enabling - find mirrored memory ranges
  mm/memblock: allocate boot time data structures from mirrored memory
  mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute
  mm: do not ignore mapping_gfp_mask in page cache allocation paths
  mm/cma.c: fix typos in comments
  mm/oom_kill.c: print points as unsigned int
  mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages
  ...
2015-06-24 20:47:21 -07:00
Linus Torvalds
266da6f142 Miscellaneous pstore improvements
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJViyiCAAoJEKurIx+X31iB3acP/1kPYKClGZLoqH6wqHNM/djq
 ROsPm9PDt9g7WZ/yw5HpKuUzC+XhOg4odYpZIy+6PogW6BUCumxPCLVq/qbVSPhT
 Q7Pv0mlmjyJS+kj6FncWuJrJG0xQfKYE6OeYrnkyHfrHYmJunf1bS6K71CDGHJAa
 O2bQr4E67twuU8yQR8BZ+YlZu4NTzPYZ4JWmb9Wepm3seIM2GEJsNRZ7WJXH7BOv
 BHOPi8FDr8fMkA+2WitE853gYvcTYcuxlsDgRumtGzWDhRIUH8Q5yS9QLAFd0Rly
 BW7YOHYCY1L75RJxnVTWd04GNrepxe4LY1bbtx+mqI6FrdMw0dK0M5BKuohV+BT0
 tBC/anSHBOqua/aDA6m+8c+p8I7qp1wXNHtmm15lqKQg1YHvh0Rs7FlP3HBdLDxQ
 rmUQHTcVQGaf00GCTUgsEn80kW8FYYtOnh4FJcbSkLdU8/mkr1+1rE/3i8ob/AL9
 EsJgzqptT9/VBX3j6tZgk8tt6xstLMGVw/DmScjxeLqA2WoaINh5XRPgGCGWQW6T
 wa9nFGBuJFtJv9NfVlMEe8xekxyDa6xPIgoCheWBcV9NFRmfBrUmbG4yF127nqTG
 TXYBzFPSxdBn0FqGIaz+6RPbcGd7tZuz317sIYHTgHBWQHeoWG9TJGHeGt8L5uql
 eKMoHAvVRZIyBuZruUeB
 =248K
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull pstore updates from Tony Luck:
 "Miscellaneous pstore improvements"

* tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  ramoops: make it possible to change mem_type param.
  pstore/ram: verify ramoops header before saving record
  fs/pstore: Optimization function ramoops_init_przs
  fs/pstore: update the backend parameter in pstore module
  pstore: do not use message compression without lock
2015-06-24 20:42:21 -07:00
Linus Torvalds
cfcc0ad47f Merge tag 'for-f2fs-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "New features:
   - per-file encryption (e.g., ext4)
   - FALLOC_FL_ZERO_RANGE
   - FALLOC_FL_COLLAPSE_RANGE
   - RENAME_WHITEOUT

  Major enhancement/fixes:
   - recovery broken superblocks
   - enhance f2fs_trim_fs with a discard_map
   - fix a race condition on dentry block allocation
   - fix a deadlock during summary operation
   - fix a missing fiemap result

  .. and many minor bug fixes and clean-ups were done"

* tag 'for-f2fs-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (83 commits)
  f2fs: do not trim preallocated blocks when truncating after i_size
  f2fs crypto: add alloc_bounce_page
  f2fs crypto: fix to handle errors likewise ext4
  f2fs: drop the volatile_write flag only
  f2fs: skip committing valid superblock
  f2fs: setting discard option in parse_options()
  f2fs: fix to return exact trimmed size
  f2fs: support FALLOC_FL_INSERT_RANGE
  f2fs: hide common code in f2fs_replace_block
  f2fs: disable the discard option when device doesn't support
  f2fs crypto: remove alloc_page for bounce_page
  f2fs: fix a deadlock for summary page lock vs. sentry_lock
  f2fs crypto: clean up error handling in f2fs_fname_setup_filename
  f2fs crypto: avoid f2fs_inherit_context for symlink
  f2fs crypto: do not set encryption policy for non-directory by ioctl
  f2fs crypto: allow setting encryption policy once
  f2fs crypto: check context consistent for rename2
  f2fs: avoid duplicated code by reusing f2fs_read_end_io
  f2fs crypto: use per-inode tfm structure
  f2fs: recovering broken superblock during mount
  ...
2015-06-24 20:38:29 -07:00
Linus Torvalds
a7296b49fb Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF fixes and cleanups from Jan Kara:
 "The contains some small fixes and improvements in error handling for
  UDF.

  Bundled is also one ext3 coding style fix and a fix in quota
  documentation"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: fix udf_load_pvoldesc()
  udf: remove double err declaration in udf_file_write_iter()
  UDF: support NFSv2 export
  fs: ext3: super: fixed a space coding style issue
  quota: Update documentation
  udf: Return error from udf_find_entry()
  udf: Make udf_get_filename() return error instead of 0 length file name
  udf: bug on exotic flag in udf_get_filename()
  udf: improve error management in udf_CS0toNLS()
  udf: improve error management in udf_CS0toUTF8()
  udf: unicode: update function name in comments
  udf: remove unnecessary test in udf_build_ustr_exact()
  udf: Return -ENOMEM when allocation fails in udf_get_filename()
2015-06-24 20:07:10 -07:00
Michal Hocko
6afdb859b7 mm: do not ignore mapping_gfp_mask in page cache allocation paths
page_cache_read, do_generic_file_read, __generic_file_splice_read and
__ntfs_grab_cache_pages currently ignore mapping_gfp_mask when calling
add_to_page_cache_lru which might cause recursion into fs down in the
direct reclaim path if the mapping really relies on GFP_NOFS semantic.

This doesn't seem to be the case now because page_cache_read (page fault
path) doesn't seem to suffer from the reclaim recursion issues and
do_generic_file_read and __generic_file_splice_read also shouldn't be
called under fs locks which would deadlock in the reclaim path.  Anyway it
is better to obey mapping gfp mask and prevent from later breakage.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:44 -07:00
Zhang Zhen
a67a31fa30 mm/hugetlb: reduce arch dependent code about hugetlb_prefault_arch_hook
Currently we have many duplicates in definitions of
hugetlb_prefault_arch_hook.  In all architectures this function is empty.

Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:41 -07:00
Chris Metcalf
f51c0eaee3 procfs: treat parked tasks as sleeping for task state
Allowing watchdog threads to be parked means that we now have the
opportunity of actually seeing persistent parked threads in the output
of /proc/<pid>/stat and /proc/<pid>/status.  The existing code reported
such threads as "Running", which is kind-of true if you think of the
case where we park them as part of taking cpus offline.  But if we allow
parking them indefinitely, "Running" is pretty misleading, so we report
them as "Sleeping" instead.

We could simply report them with a new string, "Parked", but it feels
like it's a bit risky for userspace to see unexpected new values; the
output is already documented in Documentation/filesystems/proc.txt, and
it seems like a mistake to change that lightly.

The scheduler does report parked tasks with a "P" in debugging output
from sched_show_task() or dump_cpu_task(), but that's a different API.
Similarly, the trace_ctxwake_* routines report a "P" for parked tasks,
but again, different API.

This change seemed slightly cleaner than updating the task_state_array
to have additional rows.  TASK_DEAD should be subsumed by the exit_state
bits; TASK_WAKEKILL is just a modifier; and TASK_WAKING can very
reasonably be reported as "Running" (as it is now).  Only TASK_PARKED
shows up with unreasonable output here.

Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:40 -07:00
Joseph Qi
b519ea6d9a ocfs2: mark local functions as static
Some functions are only used locally, so mark them as static.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:40 -07:00
Fabian Frederick
ab1ba02181 ocfs2: use swap() in ocfs2_double_lock()
Use kernel.h macro definition.

Thanks to Julia Lawall for Coccinelle scripting support.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:40 -07:00
Fabian Frederick
a612543fd1 ocfs2: use swap() in swap_refcount_rec()
Use kernel.h macro definition.

Thanks to Julia Lawall for Coccinelle scripting support.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:40 -07:00
Fabian Frederick
2a28f98c49 ocfs2: use swap() in dx_leaf_sort_swap()
Use kernel.h macro definition.

Thanks to Julia Lawall for Coccinelle scripting support.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:40 -07:00
Joseph Qi
ae1f081467 ocfs2: fix wrong check in ocfs2_direct_IO_get_blocks
contig_blocks gotten from ocfs2_extent_map_get_blocks cannot be compared
with clusters_to_alloc. So convert it to clusters first.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Weiwei Wang <wangww631@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:40 -07:00
Xue jiufei
74e364ad1b ocfs2: fix NULL pointer dereference in function ocfs2_abort_trigger()
ocfs2_abort_trigger() use bh->b_assoc_map to get sb.  But there's no
function to set bh->b_assoc_map in ocfs2, it will trigger NULL pointer
dereference while calling this function.  We can get sb from
bh->b_bdev->bd_super instead of b_assoc_map.

[akpm@linux-foundation.org: update comment, per Joseph]
Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
alex chen
fce56d841e ocfs2: o2net: should remove debugfs in o2net_init() out branch
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
WeiWei Wang
fa5a0eb3b0 ocfs2: remove OCFS2_IOCB_SEM lock type in direct io
In ocfs2 direct read/write, OCFS2_IOCB_SEM lock type is used to protect
inode->i_alloc_sem rw semaphore lock in the earlier kernel version.
However, in the latest kernel, inode->i_alloc_sem rw semaphore lock is not
used at all, so OCFS2_IOCB_SEM lock type needs to be removed.

Signed-off-by: Weiwei Wang <wangww631@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
Joseph Qi
e272e7f0fb ocfs2: do not BUG if jbd2_journal_dirty_metadata fails
jbd2_journal_dirty_metadata may fail.  Currently it cannot take care of
non zero return value and just BUG in ocfs2_journal_dirty.  This patch is
aborting the handle and journal instead of BUG.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: joyce.xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
Xue jiufei
099768b0c6 ocfs2: remove BUG_ON(!empty_extent) in __ocfs2_rotate_tree_left()
ocfs2_rotate_tree_left() calls __ocfs2_rotate_tree_left() for left
rotation while non-rightmost path containing an empty extent in the leaf
block.  __ocfs2_rotate_tree_left() returns -EAGAIN if right subtree having an
empty extent and pass the empty_extent_path to caller.  The caller
ocfs2_rotate_tree_left() will restart rotation from the returned path.

It will trigger the BUG_ON(!ocfs2_is_empty_extent) when the et on disk
is as follows:

eb0 is the leaf block of path(say path_a) passed to
ocfs2_rotate_tree_left, which has an empty rec[0].

eb1 is the leaf block of path(say path_b) that just right to path_a, which
has no empty record.

eb2 is the leaf block of path(say path_c) that just right to path_b, which
has an empty rec[0].  And path_c is also the rightmost path.

Now we want to remove the empty rec[0] in eb0:

ocfs2_rotate_tree_left:
  -> call __ocfs2_rotate_tree_left with path_a as its input *path*
    -> call ocfs2_rotate_subtree_left with path_a as its input
       *left_path* and path_b as its input *right_path*. it will move
       rec[0] in eb1 to eb0, and rec[0] in eb0 is not empty now.
    -> continue to call ocfs2_rotate_subtree_left with path_b as its
       input *left_path* and path_c as its input *right_path*, and
       return -EAGAIN because eb2 has an empty rec[0]
  -> call __ocfs2_rotate_tree_left with path_c as it input, rotate all
     records in eb2 to left and return 0.
  -> call __ocfs2_rotate_tree_left with path_a as its input, and triggers
     the BUG_ON(!ocfs2_is_empty_extent) as the rec[0] in eb0 is not empty.

So the BUG_ON() should be removed and return 0 if rec[0] is no longer an
empty extent.

Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
Xue jiufei
9f99ad0861 ocfs2: return error when ocfs2_figure_merge_contig_type() fails
ocfs2_figure_merge_contig_type() still returns CONTIG_NONE when some error
occurs which will cause an unpredictable error.  So return a proper errno
when ocfs2_figure_merge_contig_type() fails.

Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
Joseph Qi
345dc681bd ocfs2/dlm: cleanup unused function __dlm_wait_on_lockres_flags_set
__dlm_wait_on_lockres_flags_set() is declared but not implemented and
used.  So remove it.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
Daeseok Youn
2e17315242 ocfs2: use retval instead of status for checking error
The use of 'status' in __ocfs2_add_entry() can return wrong value.

Some functions' return value in __ocfs2_add_entry(), i.e
ocfs2_journal_access_di() is saved to 'status'.  But 'status' is not
used in 'bail' label for returning result of __ocfs2_add_entry().

So use retval instead of status.

Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
Joseph Qi
cf1776a9e8 ocfs2: fix a tiny race when truncate dio orohaned entry
Once dio crashed it will leave an entry in orphan dir.  And orphan scan
will take care of the clean up.  There is a tiny race case that the same
entry will be truncated twice and then trigger the BUG in
ocfs2_del_inode_from_orphan.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
Andrew Morton
e327284abb ocfs2: remove __mlog_cpu_guess
raw_smp_processor_id() is the means of avoiding the runtime preemptibility
check.

[akpm@linux-foundation.org: fix printk warning]
Cc: Joe Perches <joe@perches.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
Joe Perches
7c2bd2f930 ocfs2: reduce object size of mlog uses
Using a function for __mlog_printk instead of a macro reduces the object
size of built-in.o by about 190KB, or ~18% overall (x86-64 defconfig
with all ocfs2 options)

  $ size fs/ocfs2/built-in.o*
     text    data     bss     dec     hex filename
   870954  118471  134408 1123833  1125f9 fs/ocfs2/built-in.o,new
  1064081  118071  134408 1316560  1416d0 fs/ocfs2/built-in.o.old

Miscellanea:

 - Move the used-once __mlog_cpu_guess statement expression macro to the
   masklog.c file above the use in __mlog_printk function

 - Simplify the mlog macro moving the and/or logic and level code into
   __mlog_printk

[akpm@linux-foundation.org: export __mlog_printk() to other ocfs2 modules]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
Fabian Frederick
5286d20c4e configfs: unexport/make static config_item_init()
config_item_init() is only used in item.c

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
Pekka Enberg
b0cbeee72f NTFS: use kvfree() in ntfs_free()
Use kvfree() instead of open-coding it.

Signed-off-by: Pekka Enberg <penberg@kernel.org>
Cc: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:38 -07:00
Linus Torvalds
e0456717e4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Add TX fast path in mac80211, from Johannes Berg.

 2) Add TSO/GRO support to ibmveth, from Thomas Falcon

 3) Move away from cached routes in ipv6, just like ipv4, from Martin
    KaFai Lau.

 4) Lots of new rhashtable tests, from Thomas Graf.

 5) Run ingress qdisc lockless, from Alexei Starovoitov.

 6) Allow servers to fetch TCP packet headers for SYN packets of new
    connections, for fingerprinting.  From Eric Dumazet.

 7) Add mode parameter to pktgen, for testing receive.  From Alexei
    Starovoitov.

 8) Cache access optimizations via simplifications of build_skb(), from
    Alexander Duyck.

 9) Move page frag allocator under mm/, also from Alexander.

10) Add xmit_more support to hv_netvsc, from KY Srinivasan.

11) Add a counter guard in case we try to perform endless reclassify
    loops in the packet scheduler.

12) Extern flow dissector to be programmable and use it in new "Flower"
    classifier.  From Jiri Pirko.

13) AF_PACKET fanout rollover fixes, performance improvements, and new
    statistics.  From Willem de Bruijn.

14) Add netdev driver for GENEVE tunnels, from John W Linville.

15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso.

16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet.

17) Add an ECN retry fallback for the initial TCP handshake, from Daniel
    Borkmann.

18) Add tail call support to BPF, from Alexei Starovoitov.

19) Add several pktgen helper scripts, from Jesper Dangaard Brouer.

20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa.

21) Favor even port numbers for allocation to connect() requests, and
    odd port numbers for bind(0), in an effort to help avoid
    ip_local_port_range exhaustion.  From Eric Dumazet.

22) Add Cavium ThunderX driver, from Sunil Goutham.

23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata,
    from Alexei Starovoitov.

24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai.

25) Double TCP Small Queues default to 256K to accomodate situations
    like the XEN driver and wireless aggregation.  From Wei Liu.

26) Add more entropy inputs to flow dissector, from Tom Herbert.

27) Add CDG congestion control algorithm to TCP, from Kenneth Klette
    Jonassen.

28) Convert ipset over to RCU locking, from Jozsef Kadlecsik.

29) Track and act upon link status of ipv4 route nexthops, from Andy
    Gospodarek.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits)
  bridge: vlan: flush the dynamically learned entries on port vlan delete
  bridge: multicast: add a comment to br_port_state_selection about blocking state
  net: inet_diag: export IPV6_V6ONLY sockopt
  stmmac: troubleshoot unexpected bits in des0 & des1
  net: ipv4 sysctl option to ignore routes when nexthop link is down
  net: track link-status of ipv4 nexthops
  net: switchdev: ignore unsupported bridge flags
  net: Cavium: Fix MAC address setting in shutdown state
  drivers: net: xgene: fix for ACPI support without ACPI
  ip: report the original address of ICMP messages
  net/mlx5e: Prefetch skb data on RX
  net/mlx5e: Pop cq outside mlx5e_get_cqe
  net/mlx5e: Remove mlx5e_cq.sqrq back-pointer
  net/mlx5e: Remove extra spaces
  net/mlx5e: Avoid TX CQE generation if more xmit packets expected
  net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion
  net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq()
  net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them
  net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues
  net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device
  ...
2015-06-24 16:49:49 -07:00
Dan Carpenter
5a5003df98 btrfs: delayed-ref: double free in btrfs_add_delayed_tree_ref()
There is a cut and paste error so instead of freeing "head_ref", we free
"ref" twice.

Fixes: 3368d001ba ('btrfs: qgroup: Record possible quota-related extent for qgroup.')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-06-24 12:28:03 -07:00
Peng Tao
97ba375b5d pnfs/flexfiles: report layoutstat regularly
As a simple scheme, report every minute if IO is still going on.

Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-24 10:54:23 -04:00
Peng Tao
1bfe3b259f nfs42: serialize LAYOUTSTATS calls of the same file
There is no need to report concurrently.

Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-24 10:53:11 -04:00
Peng Tao
27c4306443 pnfs/flexfiles: encode LAYOUTSTATS flexfiles specific data
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-24 10:53:11 -04:00
Peng Tao
ad4dc53e64 pnfs/flexfiles: add ff_layout_prepare_layoutstats
It fills in the generic part of LAYOUTSTATS call. One thing to note
is that we don't really track if IO is continuous or not. So just fake
to use the completed bytes for it.

Still missing flexfiles specific part, which will be included in the next patch.

Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-24 10:53:11 -04:00