Document give_up_on_write argument of mpage_map_and_submit_extent().
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
It doesn't make sense to require io_end->handle when we are in
nojournal mode. So update the assertion accordingly to avoid false
warnings from ext4_add_complete_io().
Reported-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Currently glocks have an atomic reference count and also a spinlock
which covers various internal fields, such as the state. This intent of
this patch is to replace the spinlock and the atomic reference count
with a lockref structure. This contains a spinlock which we can continue
to use as before, and a reference counter which is used in conjuction
with the spinlock to replace the previous atomic counter.
As a result of this there are some new rules for reference counting on
glocks. We need to distinguish between reference count changes under
gl_spin (which are now just increment or decrement of the new counter,
provided the count cannot hit zero) and those which are outside of
gl_spin, but which now take gl_spin internally.
The conversion is relatively straight forward. There is probably some
further clean up which can be done, but the priority at this stage is to
make the change in as simple a manner as possible.
A consequence of this change is that the reference count is being
decoupled from the lru list processing. This should allow future
adoption of the lru_list code with glocks in due course.
The reason for using the "dead" state and not just relying on 0 being
the "invalid state" is so that in due course 0 ref counts can be
allowable. The intent is to eventually be able to remove the ref count
changes which are currently hidden away in state_change().
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Functions that walk the ntstatus_to_dos_map[] array could
run off the end. For example, ntstatus_to_dos() loops
while ntstatus_to_dos_map[].ntstatus is not 0. Granted,
this is mostly theoretical, but could be used as a DOS attack
if the error code in the SMB header is bogus.
[Might consider adding to stable, as this patch is low risk - Steve]
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Steve French <smfrench@gmail.com>
While looking at the code, I noticed that bin_attribute read() and write()
ops copy the inode size into an int for futher comparisons.
Some bin_attributes can be fairly large. For example, pci creates some for
BARs set to the BAR size and giant BARs are around the corner, so this is
going to break something somewhere eventually.
Let's use the right type.
[adjust for seqfile conversions, only needed for bin_read() - gkh]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
375b611e60 ("sysfs: remove sysfs_buffer->ops") introduced
sysfs_file_ops() which determines the associated file operation of a
given sysfs_dirent. As file ops access should be protected by an
active reference, the new function includes a lockdep assertion on the
sysfs_dirent; unfortunately, I forgot to take attr->ignore_lockdep
flag into account and the lockdep assertion trips spuriously for files
which opt out from active reference lockdep checking.
# cat /sys/devices/pci0000:00/0000:00:01.2/usb1/authorized
------------[ cut here ]------------
WARNING: CPU: 1 PID: 540 at /work/os/work/fs/sysfs/file.c:79 sysfs_file_ops+0x4e/0x60()
Modules linked in:
CPU: 1 PID: 540 Comm: cat Not tainted 3.11.0-work+ #3
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
0000000000000009 ffff880016205c08 ffffffff81ca0131 0000000000000000
ffff880016205c40 ffffffff81096d0d ffff8800166cb898 ffff8800166f6f60
ffffffff8125a220 ffff880011ab1ec0 ffff88000aff0c78 ffff880016205c50
Call Trace:
[<ffffffff81ca0131>] dump_stack+0x4e/0x82
[<ffffffff81096d0d>] warn_slowpath_common+0x7d/0xa0
[<ffffffff81096dea>] warn_slowpath_null+0x1a/0x20
[<ffffffff8125994e>] sysfs_file_ops+0x4e/0x60
[<ffffffff8125a274>] sysfs_open_file+0x54/0x300
[<ffffffff811df612>] do_dentry_open.isra.17+0x182/0x280
[<ffffffff811df820>] finish_open+0x30/0x40
[<ffffffff811f0623>] do_last+0x503/0xd90
[<ffffffff811f0f6b>] path_openat+0xbb/0x6d0
[<ffffffff811f23ba>] do_filp_open+0x3a/0x90
[<ffffffff811e09a9>] do_sys_open+0x129/0x220
[<ffffffff811e0abe>] SyS_open+0x1e/0x20
[<ffffffff81caf3c2>] system_call_fastpath+0x16/0x1b
---[ end trace aa48096b111dafdb ]---
Rename fs/sysfs/dir.c::ignore_lockdep() to sysfs_ignore_lockdep() and
move it to fs/sysfs/sysfs.h and make sysfs_file_ops() skip lockdep
assertion if sysfs_ignore_lockdep() is true.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Olga reported that file descriptors opened with O_PATH do not work with
fstatfs(), found during further development of ksh93's thread support.
There is no reason to not allow O_PATH file descriptors here (fstatfs is
very much a path operation), so use "fdget_raw()". See commit
55815f7014 ("vfs: make O_PATH file descriptors usable for 'fstat()'")
for a very similar issue reported for fstat() by the same team.
Reported-and-tested-by: ольга крыжановская <olga.kryzhanovska@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org # O_PATH introduced in 3.0+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull btrfs fixes from Chris Mason:
"We've got more bug fixes in my for-linus branch:
One of these fixes another corner of the compression oops from last
time. Miao nailed down some problems with concurrent snapshot
deletion and drive balancing.
I kept out one of his patches for more testing, but these are all
stable"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix oops caused by the space balance and dead roots
Btrfs: insert orphan roots into fs radix tree
Btrfs: limit delalloc pages outside of find_delalloc_range
Btrfs: use right root when checking for hash collision
If we take the 2nd retry path in ext4_expand_extra_isize_ea, we
potentionally return from the function without having freed these
allocations. If we don't do the return, we over-write the previous
allocation pointers, so we leak either way.
Spotted with Coverity.
[ Fixed by tytso to set is and bs to NULL after freeing these
pointers, in case in the retry loop we later end up triggering an
error causing a jump to cleanup, at which point we could have a double
free bug. -- Ted ]
Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Cc: stable@vger.kernel.org
aio kiocb refcounting was broken - it was relying on keeping track of
the number of available ring buffer entries, which it needs to do
anyways; then at shutdown time it'd wait for completions to be delivered
until the # of available ring buffer entries equalled what it was
initialized to.
Problem with that is that the ring buffer is mapped writable into
userspace, so userspace could futz with the head and tail pointers to
cause the kernel to see extra completions, and cause free_ioctx() to
return while there were still outstanding kiocbs. Which would be bad.
Fix is just to directly refcount the kiocbs - which is more
straightforward, and with the new percpu refcounting code doesn't cost
us any cacheline bouncing which was the whole point of the original
scheme.
Also clean up ioctx_alloc()'s error path and fix a bug where it wasn't
subtracting from aio_nr if ioctx_add_table() failed.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
When doing space balance and subvolume destroy at the same time, we met
the following oops:
kernel BUG at fs/btrfs/relocation.c:2247!
RIP: 0010: [<ffffffffa04cec16>] prepare_to_merge+0x154/0x1f0 [btrfs]
Call Trace:
[<ffffffffa04b5ab7>] relocate_block_group+0x466/0x4e6 [btrfs]
[<ffffffffa04b5c7a>] btrfs_relocate_block_group+0x143/0x275 [btrfs]
[<ffffffffa0495c56>] btrfs_relocate_chunk.isra.27+0x5c/0x5a2 [btrfs]
[<ffffffffa0459871>] ? btrfs_item_key_to_cpu+0x15/0x31 [btrfs]
[<ffffffffa048b46a>] ? btrfs_get_token_64+0x7e/0xcd [btrfs]
[<ffffffffa04a3467>] ? btrfs_tree_read_unlock_blocking+0xb2/0xb7 [btrfs]
[<ffffffffa049907d>] btrfs_balance+0x9c7/0xb6f [btrfs]
[<ffffffffa049ef84>] btrfs_ioctl_balance+0x234/0x2ac [btrfs]
[<ffffffffa04a1e8e>] btrfs_ioctl+0xd87/0x1ef9 [btrfs]
[<ffffffff81122f53>] ? path_openat+0x234/0x4db
[<ffffffff813c3b78>] ? __do_page_fault+0x31d/0x391
[<ffffffff810f8ab6>] ? vma_link+0x74/0x94
[<ffffffff811250f5>] vfs_ioctl+0x1d/0x39
[<ffffffff811258c8>] do_vfs_ioctl+0x32d/0x3e2
[<ffffffff811259d4>] SyS_ioctl+0x57/0x83
[<ffffffff813c3bfa>] ? do_page_fault+0xe/0x10
[<ffffffff813c73c2>] system_call_fastpath+0x16/0x1b
It is because we returned the error number if the reference of the root was 0
when doing space relocation. It was not right here, because though the root
was dead(refs == 0), but the space it held still need be relocated, or we
could not remove the block group. So in this case, we should return the root
no matter it is dead or not.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Now we don't drop all the deleted snapshots/subvolumes before the space
balance. It means we have to relocate the space which is held by the dead
snapshots/subvolumes. So we must into them into fs radix tree, or we would
forget to commit the change of them when doing transaction commit, and it
would corrupt the metadata.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Liu fixed part of this problem and unfortunately I steered him in slightly the
wrong direction and so didn't completely fix the problem. The problem is we
limit the size of the delalloc range we are looking for to max bytes and then we
try to lock that range. If we fail to lock the pages in that range we will
shrink the max bytes to a single page and re loop. However if our first page is
inside of the delalloc range then we will end up limiting the end of the range
to a period before our first page. This is illustrated below
[0 -------- delalloc range --------- 256mb]
[page]
So find_delalloc_range will return with delalloc_start as 0 and end as 128mb,
and then we will notice that delalloc_start < *start and adjust it up, but not
adjust delalloc_end up, so things go sideways. To fix this we need to not limit
the max bytes in find_delalloc_range, but in find_lock_delalloc_range and that
way we don't end up with this confusion. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
btrfs_rename was using the root of the old dir instead of the root of the new
dir when checking for a hash collision, so if you tried to move a file into a
subvol it would freak out because it would see the file you are trying to move
in its current root. This fixes the bug where this would fail
btrfs subvol create test1
btrfs subvol create test2
mv test1 test2.
Thanks to Chris Murphy for catching this,
Cc: stable@vger.kernel.org
Reported-by: Chris Murphy <lists@colorremedies.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
HAVE_ARCH_DEVTREE_FIXUPS appears to always be needed except for sparc,
but it is only used for /proc/device-teee and sparc does not enable
/proc/device-tree. So this option is redundant. Remove the option and
always enable it. This has the side effect of fixing /proc/device-tree
on arches such as arm64 which failed to define this option.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Grant Likely <grant.likely@linaro.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
It is possible for a task in a numa group to call exec, and
have the new (unrelated) executable inherit the numa group
association from its former self.
This has the potential to break numa grouping, and is trivial
to fix.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-51-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It is desirable to model from userspace how the scheduler groups tasks
over time. This patch adds an ID to the numa_group and reports it via
/proc/PID/status.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-45-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The xfs_inactive() return value is meaningless. Turn xfs_inactive()
into a void function and clean up the error handling appropriately.
Kill the VN_INACTIVE_[NO]CACHE directives as they are not relevant
to Linux.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Push the inode free work performed during xfs_inactive() down into
a new xfs_inactive_ifree() helper. This clears xfs_inactive() from
all inode locking and transaction management more directly
associated with freeing the inode xattrs, extents and the inode
itself.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Create the new xfs_inactive_truncate() function to handle the
truncate portion of xfs_inactive(). Push the locking and
transaction management into the new function.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Push down the transaction management for remote symlinks from
xfs_inactive() down to xfs_inactive_symlink_rmt(). The latter is
cleaned up to avoid transaction management intended for the
calling context (i.e., trans duplication, reservation, item
attachment).
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Add the inode type directory type support to XFS_IOC_FSGEOM
so that xfs_repair/xfs_info knows if the superblock v4 filesystem
enabled the feature.
Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Previously, there was a erroneous scenario like below.
thread 1: thread 2:
f2fs_unlink
- acquire_orphan_inode
: sbi->n_orphans++ write_checkpoint
- block_operations
: f2fs_lock_all
- do_checkpoint
: write orphan blocks with sbi->n_orphans
- unblock_operations
- f2fs_lock_op
- release_orphan_inode
- f2fs_unlock_op
During the checkpoint by thread 2, f2fs stores a wrong orphan block according
to the wrong sbi->n_orphans.
To avoid this, simply we should make cover acquire_orphan_inode too with
f2fs_lock_op.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
During the f2fs_put_super procedure, we don't need to conduct checkpoint all
the time, since we don't need to do that if superblock is clean.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This allows users to use LANMAN authentication on servers which support
unencapsulated authentication.
The patch fixes a regression where users using plaintext authentication
were no longer able to do so because of changed bought in by patch
3f618223dchttps://bugzilla.redhat.com/show_bug.cgi?id=1011621
Reported-by: Panos Kavalagios <Panagiotis.Kavalagios@eurodyn.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
When connecting to SMB2/3 shares, maximum file size is set to non-LFS maximum in superblock. This is due to cap_large_files bit being different for SMB1 and SMB2/3 (where it is just an internal flag that is not negotiated and the SMB1 one corresponds to multichannel capability, so maybe LFS works correctly if server sends 0x08 flag) while capabilities are checked always for the SMB1 bit in cifs_read_super().
The patch fixes this by checking for the correct bit according to the protocol version.
CC: Stable <stable@kernel.org>
Signed-off-by: Jan Klos <honza.klos@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
The current f2fs code errors if the xattr or acl options are passed when
remounting. This is important in a typical scenario where f2fs is mounted
as a "ro" root file-system by the boot loader and then the init process wants
to remount it "rw" with the "remount,rw" option.
Signed-off-by: Kelly Anderson <kelly@xilka.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
The fs_locks is used to block other ops(ex, recovery) when doing checkpoint.
And each other operate routine(besides checkpoint) needs to acquire a fs_lock,
there is a terrible problem here, if these are too many concurrency threads acquiring
fs_lock, so that they will block each other and may lead to some performance problem,
but this is not the phenomenon we want to see.
Though there are some optimization patches introduced to enhance the usage of fs_lock,
but the thorough solution is using a *rw_sem* to replace the fs_lock.
Checkpoint routine takes write_sem, and other ops take read_sem, so that we can block
other ops(ex, recovery) when doing checkpoint, and other ops will not disturb each other,
this can avoid the problem described above completely.
Because of the weakness of rw_sem, the above change may introduce a potential problem
that the checkpoint thread might get starved if other threads are intensively locking
the read semaphore for I/O.(Pointed out by Xu Jin)
In order to avoid this, a wait_list is introduced, the appending read semaphore ops
will be dropped into the wait_list if checkpoint thread is waiting for write semaphore,
and will be waked up when checkpoint thread gives up write semaphore.
Thanks to Kim's previous review and test, and will be very glad to see other guys'
performance tests about this patch.
V2:
-fix the potential starvation problem.
-use more suitable func name suggested by Xu Jin.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
[Jaegeuk Kim: adjust minor coding standard]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Do not send SMB2 Logoff command when reconnecting, the way smb1
code base works.
Also, no need to wait for a credit for an echo command when one is already
in flight.
Without these changes, umount command hangs if the server is unresponsive
e.g. hibernating.
Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@us.ibm.com>
Windows 8 and later can create NFS symlinks (within reparse points)
which we were assuming were normal NTFS symlinks and thus reporting
corrupt paths for. Add check for reparse points to make sure that
they really are normal symlinks before we try to parse the pathname.
We also should not be parsing other types of reparse points (DFS
junctions etc) as if they were a symlink so return EOPNOTSUPP
on those. Also fix endian errors (we were not parsing symlink
lengths as little endian).
This fixes commit d244bf2dfb
which implemented follow link for non-Unix CIFS mounts
CC: Stable <stable@kernel.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
With the previous changes, sysfs regular file code is ready to handle
bin files too. This patch makes bin files share the regular file
path.
* sysfs_create/remove_bin_file() are moved to fs/sysfs/file.c.
* sysfs_init_inode() is updated to use the new sysfs_bin_operations
instead of bin_fops for bin files.
* fs/sysfs/bin.c and the related pieces are removed.
This patch shouldn't introduce any behavior difference to bin file
accesses.
Overall, this unification reduces the amount of duplicate logic, makes
behaviors more consistent and paves the road for building simpler and
more versatile interface which will allow other subsystems to make use
of sysfs for their pseudo filesystems.
v2: Stale fs/sysfs/bin.c reference dropped from
Documentation/DocBook/filesystems.tmpl. Reported by kbuild test
robot.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay@vrfy.org>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs bin file handling will be merged into the regular file support.
This patch prepares the open path.
This patch updates sysfs_open_file() such that it can handle both
regular and bin files.
This is a preparation and the new bin file path isn't used yet.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs bin file handling will be merged into the regular file support.
This patch copies mmap support from bin so that fs/sysfs/file.c can
handle mmapping bin files.
The code is copied mostly verbatim with the following updates.
* ->mmapped and ->vm_ops are added to sysfs_open_file and bin_buffer
references are replaced with sysfs_open_file ones.
* Symbols are prefixed with sysfs_.
* sysfs_unmap_bin_file() grabs sysfs_open_dirent and traverses
->files. Invocation of this function is added to
sysfs_addrm_finish().
* sysfs_bin_mmap() is added to sysfs_bin_operations.
This is a preparation and the new mmap path isn't used yet.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs bin file handling will be merged into the regular file support.
This patch prepares the read path.
Copy fs/sysfs/bin.c::read() to fs/sysfs/file.c and make it use
sysfs_open_file instead of bin_buffer. The function is identical copy
except for the use of sysfs_open_file.
The new function is added to sysfs_bin_operations. This isn't used
yet but will eventually replace fs/sysfs/bin.c.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs bin file handling will be merged into the regular file support.
This patch prepares the write path.
bin file write is almost identical to regular file write except that
the write length is capped by the inode size and @off is passed to the
write method. This patch adds bin file handling to sysfs_write_file()
so that it can handle both regular and bin files.
A new file_operations struct sysfs_bin_operations is added, which
currently only hosts sysfs_write_file() and generic_file_llseek().
This isn't used yet but will eventually replace fs/sysfs/bin.c.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
read() is simple enough and fill_read() being in a separate function
doesn't add anything. Let's collapse it into read(). This will make
merging bin file handling with regular file.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After b31ca3f5df ("sysfs: fix deadlock"), bin read() first writes
data to bb->buffer and bounces it to a transient kernel buffer which
is then copied out to userland. The double bouncing doesn't add
anything. Let's just use the transient buffer directly.
While at it, rename @temp to @buf for clarity.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs read path implements its own buffering scheme between userland
and kernel callbacks, which essentially is a degenerate duplicate of
seq_file. This patch replaces the custom read buffering
implementation in sysfs with seq_file.
While the amount of code reduction is small, this reduces low level
hairiness and enables future development of a new versatile API based
on seq_file so that sysfs features can be shared with other
subsystems.
As write path was already converted to not use sysfs_open_file->page,
this patch makes ->page and ->count unused and removes them.
Userland behavior remains the same except for some extreme corner
cases - e.g. sysfs will now regenerate the content each time a file is
read after a non-contiguous seek whereas the original code would keep
using the same content. While this is a userland visible behavior
change, it is extremely unlikely to be noticeable and brings sysfs
behavior closer to that of procfs.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There isn't much to be gained by keeping around kernel buffer while a
file is open especially as the read path planned to be converted to
use seq_file and won't use the buffer. This patch makes
sysfs_write_file() use per-write transient buffer instead of
sysfs_open_file->page.
This simplifies the write path, enables removing sysfs_open_file->page
once read path is updated and will help merging bin file write path
which already requires the use of a transient buffer due to a locking
order issue.
As the function comments of flush_write_buffer() and
sysfs_write_buffer() are being updated anyway, reformat them so that
they're more conventional.
v2: Use min_t() instead of min() in sysfs_write_file() to avoid build
warning on arm. Reported by build test robot.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs will be converted to use seq_file for read path, which will make
it difficult to pass around multiple pointers directly. This patch
adds sysfs_open_file->sd and ->file so that we can reach all the
necessary data structures from sysfs_open_file.
flush_write_buffer() is updated to drop @dentry which was used to
discover the sysfs_dirent as it's now available through
sysfs_open_file->sd.
This patch doesn't cause any behavior difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs read path will be converted to use seq_file which will handle
buffering making sysfs_buffer a misnomer. Rename sysfs_buffer to
sysfs_open_file, and sysfs_open_dirent->buffers to ->files.
This path is pure rename.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a separate mutex to protect sysfs_open_dirent->buffers list. This
will allow performing sleepable operations while traversing
sysfs_buffers, which will be renamed to sysfs_open_file.
Note that currently sysfs_open_dirent->buffers list isn't being used
for anything and this patch doesn't make any functional difference.
It will be used to merge regular and bin file supports.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, sysfs_ops is fetched during sysfs_open_file() and cached in
sysfs_buffer->ops to be used while the file is open. This patch
removes the caching and makes each operation directly fetch sysfs_ops.
This patch doesn't introduce any behavior difference and is to prepare
for merging regular and bin file supports.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull btrfs fixes from Chris Mason:
"This is a small collection of fixes, including a regression fix from
Liu Bo that solves rare crashes with compression on.
I've merged my for-linus up to 3.12-rc3 because the top commit is only
meant for 3.12. The rest of the fixes are also available in my master
branch on top of my last 3.11 based pull"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: Fix crash due to not allocating integrity data for a bioset
Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishing
Btrfs: eliminate races in worker stopping code
Btrfs: fix crash of compressed writes
Btrfs: fix transid verify errors when recovering log tree
->needs_read_fill is used to implement the following behaviors.
1. Ensure buffer filling on the first read.
2. Force buffer filling after a write.
3. Force buffer filling after a successful poll.
However, #2 and #3 don't really work as sysfs doesn't reset file
position. While the read buffer would be refilled, the next read
would continue from the position after the last read or write,
requiring an explicit seek to the start for it to be useful, which
makes ->needs_read_fill superflous as read buffer is always refilled
if f_pos == 0.
Update sysfs_read_file() to test buffer->page for #1 instead and
remove ->needs_read_fill. While this changes behavior in extreme
corner cases - e.g. re-reading a sysfs file after seeking to non-zero
position after a write or poll, it's highly unlikely to lead to actual
breakage. This change is to prepare for using seq_file in the read
path.
While at it, reformat a comment in fill_write_buffer().
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>