41877 commits

Author SHA1 Message Date
Eric W. Biederman
90f8572b0f vfs: Commit to never having executables on proc and sysfs.
Today proc and sysfs do not contain any executable files.  Several
applications today mount proc or sysfs without noexec and nosuid and
then depend on there being no executable files on proc or sysfs.
Having any executable files show up on proc or sysfs would cause
a user-space-visible regression, and most likely security problems.

Therefore commit to never allowing executables on proc and sysfs by
adding a new flag to mark them as filesystems without executables and
enforce that flag.

Test the flag where MNT_NOEXEC is tested today, so that the only
user-visible effect will be that executables are treated as if the
execute bit is cleared.

The filesystems proc and sysfs do not currently incorporate any
executable files, so this does not result in any user-visible effects.

This makes it unnecessary to vet changes to proc and sysfs tightly for
added executable files or for chattr changes that would modify
existing files, as no matter what an individual file says, it will
not be treated as an executable file by the VFS.

Not having to vet changes too closely is important: without this we
are only one proc_create call (or another goof-up in the
implementation of notify_change) away from having problematic
executables on proc.  Those mistakes are all too easy to make and
would create a situation where there are security issues, or where
some program's assumptions have to be broken (causing userspace
regressions).

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-07-10 10:39:25 -05:00
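
A minimal userspace model of the check that 90f8572b0f describes, assuming it reduces to an OR of the per-mount MNT_NOEXEC flag and a per-filesystem "no executables" flag. Every name below (model_sb, model_mount, MODEL_MNT_NOEXEC, MODEL_SB_NOEXEC, model_may_exec) is hypothetical and not the kernel's definition; the sketch only shows that a forbidden exec is indistinguishable from a cleared execute bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; the real kernel constants differ. */
#define MODEL_MNT_NOEXEC 0x1u   /* per-mount: mounted with noexec */
#define MODEL_SB_NOEXEC  0x1u   /* per-filesystem: never has executables */

struct model_sb {
    unsigned int iflags;        /* set once by the filesystem (proc, sysfs) */
};

struct model_mount {
    unsigned int mnt_flags;     /* set from mount options */
    const struct model_sb *sb;
};

/* True if nothing reachable through this mount may be executed. */
static bool model_path_noexec(const struct model_mount *mnt)
{
    return (mnt->mnt_flags & MODEL_MNT_NOEXEC) ||
           (mnt->sb->iflags & MODEL_SB_NOEXEC);
}

/* Exec is allowed only if some execute bit is set AND neither the mount
 * nor the filesystem forbids execution. When it is forbidden, the result
 * is the same as if the execute bits were cleared. */
static bool model_may_exec(const struct model_mount *mnt, unsigned int mode)
{
    assert(mnt != NULL && mnt->sb != NULL);
    if (model_path_noexec(mnt))
        return false;
    return (mode & 0111u) != 0;
}
```

For instance, a proc-like filesystem mounted without noexec still refuses model_may_exec(&mnt, 0755), because the filesystem-level flag is set regardless of mount options.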
Joe Perches
a28e4b2b18 hpfs: hpfs_error: Remove static buffer, use vsprintf extension %pV instead
Removing unnecessary static buffers is good.
Use the vsprintf %pV extension instead.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mikulas Patocka <mikulas@twibright.com>
Cc: stable@vger.kernel.org      # v2.6.36+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-09 13:35:31 -07:00
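
In the kernel, the %pV printk extension takes a pointer to a struct va_format (a format string plus a pointer to a va_list) and expands it in place, so hpfs_error() no longer needs a static buffer to pre-format the caller's message. Userspace printf has no %pV, so the sketch below, with a hypothetical report_fs_error(), uses the closest analog: forwarding the va_list straight to vfprintf(). A shared static buffer could also be clobbered by concurrent callers, which this shape avoids.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical reporter shaped like hpfs_error(): the prefix and the
 * caller's message go straight to the stream, with no static scratch
 * buffer. Returns the number of characters written, or -1 on error. */
static int report_fs_error(FILE *out, const char *fmt, ...)
{
    va_list args;
    int prefix, body;

    assert(out != NULL && fmt != NULL);
    prefix = fprintf(out, "HPFS: filesystem error: ");

    va_start(args, fmt);
    body = vfprintf(out, fmt, args);   /* format in place, like %pV in printk */
    va_end(args);

    if (prefix < 0 || body < 0 || fputc('\n', out) == EOF)
        return -1;
    return prefix + body + 1;
}
```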
Sanidhya Kashyap
ce657611ba hpfs: kstrdup() out of memory handling
kstrdup() can fail to allocate new_opts under memory pressure, so
return -ENOMEM in that case.

Signed-off-by: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Signed-off-by: Mikulas Patocka <mikulas@twibright.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-09 13:35:31 -07:00
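
The shape of the fix in ce657611ba, as a userspace sketch: duplicate the option string and bail out with -ENOMEM before changing any state if the copy failed. save_mount_opts(), copy_string() and failing_dup() are hypothetical; malloc() stands in for kstrdup(..., GFP_KERNEL), and the duplicator is passed in so that the failure path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

typedef char *(*dup_fn)(const char *);

/* Plain C stand-in for kstrdup(): returns NULL if allocation fails. */
static char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);

    if (p)
        memcpy(p, s, len);
    return p;
}

/* Simulates the allocator failing under memory pressure. */
static char *failing_dup(const char *s)
{
    (void)s;
    return NULL;
}

/* Hypothetical remount helper: keep a private copy of the options.
 * Returns 0 on success, or -ENOMEM with *saved left untouched. */
static int save_mount_opts(const char *data, char **saved, dup_fn dup)
{
    char *new_opts;

    assert(data != NULL && saved != NULL);
    new_opts = dup(data);
    if (!new_opts)
        return -ENOMEM;         /* the check the commit adds */
    free(*saved);
    *saved = new_opts;
    return 0;
}
```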
Firo Yang
d7b04097c2 hpfs: Remove unnecessary cast
Avoid a pointless kmem_cache_alloc() return value cast in
fs/hpfs/super.c::hpfs_alloc_inode()

Signed-off-by: Firo Yang <firogm@gmail.com>
Signed-off-by: Mikulas Patocka <mikulas@twibright.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-09 13:35:31 -07:00
Mikulas Patocka
a27b5b97d6 hpfs: add fstrim support
This patch adds support for fstrim to the HPFS filesystem.

Signed-off-by: Mikulas Patocka <mikulas@twibright.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-09 13:35:30 -07:00
Mikulas Patocka
9abea2d64c ioctl_compat: handle FITRIM
The FITRIM ioctl has the same arguments on 32-bit and 64-bit
architectures, so we can add it to the list of compatible ioctls and
drop it from compat_ioctl method of various filesystems.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ted Ts'o <tytso@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-09 11:42:21 -07:00
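
Why FITRIM needs no compat translation (9abea2d64c): struct fstrim_range in <linux/fs.h> consists of three __u64 fields (start, len, minlen), so its size and field offsets are the same on 32-bit and 64-bit ABIs. The sketch below checks that with a local mirror of the layout (trim_range_mirror and long_range are hypothetical names) and contrasts it with a struct built from long, whose size does change between ILP32 and LP64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the layout of struct fstrim_range. */
struct trim_range_mirror {
    uint64_t start;   /* start of the range, in bytes */
    uint64_t len;     /* length of the range, in bytes */
    uint64_t minlen;  /* minimum free extent length worth discarding */
};

/* Fixed-width fields: identical size and offsets on ILP32 and LP64. */
static_assert(sizeof(struct trim_range_mirror) == 24, "size");
static_assert(offsetof(struct trim_range_mirror, len) == 8, "len offset");
static_assert(offsetof(struct trim_range_mirror, minlen) == 16, "minlen offset");

/* By contrast, a struct of longs is 12 bytes on ILP32 and 24 on LP64,
 * so an ioctl taking it would need a compat handler. */
struct long_range {
    long start, len, minlen;
};

static size_t long_range_size(void)
{
    return sizeof(struct long_range);
}
```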
Steven J. Magnani
70f19f5869 udf: Don't corrupt unalloc spacetable when writing it
For a UDF filesystem configured with an Unallocated Space Table,
a filesystem operation that triggers an update to the table results
in on-disk corruption that prevents remounting:

  udf_read_tagged: tag version 0x0000 != 0x0002 || 0x0003, block 274

For example:
  1. Create a filesystem
      $ mkudffs --media-type=hd --blocksize=512 --lvid=BUGTEST \
              --vid=BUGTEST --fsid=BUGTEST --space=unalloctable \
              /dev/mmcblk0

  2. Mount it
      # mount /dev/mmcblk0 /mnt

  3. Create a file
      $ echo "No corruption, please" > /mnt/new.file

  4. Umount
      # umount /mnt

  5. Attempt remount
      # mount /dev/mmcblk0 /mnt

This appears to be a longstanding bug caused by zero-initialization of
the Unallocated Space Entry block buffer and only partial repopulation
of required fields before writing to disk.

Commit 0adfb339fd64 ("udf: Fix unalloc space handling in udf_update_inode")
addressed one such field, but several others are required.

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Jan Kara <jack@suse.com>
2015-07-09 16:38:57 +02:00
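
The failure mode in 70f19f5869 can be modeled generically: a descriptor buffer is zeroed, only some fields are refilled, and a reader that validates the header rejects it. The model below is heavily simplified and hypothetical (struct model_tag is not the on-disk UDF tag, which has more fields, including a checksum and a CRC); it only shows why a zero version field trips a "version must be 2 or 3" check like the one in the quoted error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical descriptor header (not the on-disk layout). */
struct model_tag {
    uint16_t ident;      /* descriptor type */
    uint16_t version;    /* reader accepts only 2 or 3 */
    uint32_t location;   /* block the descriptor claims to live in */
};

/* Reader-side validation, in the spirit of the quoted error message. */
static bool model_tag_valid(const struct model_tag *t, uint32_t block)
{
    return (t->version == 2 || t->version == 3) && t->location == block;
}

/* Buggy writer: zero the buffer, then refill only the identifier. */
static void write_partial(struct model_tag *t, uint16_t ident)
{
    memset(t, 0, sizeof *t);
    t->ident = ident;
}

/* Fixed writer: refill every field the reader checks. */
static void write_full(struct model_tag *t, uint16_t ident,
                       uint16_t version, uint32_t block)
{
    assert(version == 2 || version == 3);
    memset(t, 0, sizeof *t);
    t->ident = ident;
    t->version = version;
    t->location = block;
}
```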
Trond Myklebust
690edcfad0 NFSv4.2/flexfiles: Fix a typo in the flexfiles layoutstats code
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-08 20:25:41 +02:00
Al Viro
4e317ce73a ufs_inode_get{frag,block}(): get rid of 'phys' argument
Just pass NULL as locked_page for the first block in the indirect
chain.  Old calling conventions aside, a reason for having 'phys'
was that ufs_inode_getfrag() used to be able to do _two_ allocations:
an indirect block and extending/reallocating a tail.  We needed
locked_page for the latter (it's data), but we also needed to
figure out that the indirect block is metadata.  So we used to pass
non-NULL locked_page in all cases *and* used NULL phys as an
indication of being asked to allocate an indirect block.

With tail unpacking taken into a separate function we don't need
those convolutions anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:40:05 -04:00
Al Viro
0385f1f9e3 ufs_getfrag_block(): tidy up a bit
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:40:04 -04:00
Al Viro
5fbfb238f7 ufs_inode_getblock(): failure to read an indirect block is -EIO
... and not "write to beginning of the disk", TYVM...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:40:03 -04:00
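
The bug class fixed by 5fbfb238f7: if a lookup returns 0 both for "hole" and for "could not read the indirect block", a failed read silently becomes block 0, and a later write lands at the start of the device. A sketch of the safer convention with hypothetical names (model_disk, model_read_ptr), returning a negative errno separately from the block number:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory "disk": blocks[n] is the pointer array stored
 * in indirect block n; a NULL entry models an unreadable block. */
struct model_disk {
    const uint64_t **blocks;
    size_t nblocks;
};

/* Read slot 'index' (assumed in range) of indirect block 'ind'.
 * On success store the pointer (0 means a hole) in *out and return 0.
 * If the indirect block cannot be read, return -EIO instead of
 * pretending that it held 0. */
static int model_read_ptr(const struct model_disk *d, uint64_t ind,
                          unsigned int index, uint64_t *out)
{
    const uint64_t *blk;

    assert(d != NULL && out != NULL);
    if (ind >= d->nblocks || (blk = d->blocks[ind]) == NULL)
        return -EIO;
    *out = blk[index];
    return 0;
}
```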
Al Viro
4eeff4c932 ufs_getfrag_block(): turn following indirects into a loop
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:40:02 -04:00
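
A loop over an offsets[] path, in the spirit of 4eeff4c932, might look like the sketch below: offsets[0] selects a slot in the inode's own pointer array, and each further offset selects a slot in the indirect block found so far. It runs on a tiny hypothetical in-memory model (model_fs, model_walk, 4 pointers per block, 0 meaning a hole) and is not the kernel's ufs_getfrag_block().

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NBLOCKS 16   /* blocks in the toy "disk" */
#define MODEL_NPTRS   4    /* pointers per indirect block (tiny on purpose) */

/* Hypothetical model: the pointer array of block n is disk[n]. */
struct model_fs {
    uint64_t disk[MODEL_NBLOCKS][MODEL_NPTRS];
};

/* Follow offsets[0..depth-1] and return the final block number,
 * or 0 as soon as any level turns out to be a hole. */
static uint64_t model_walk(const struct model_fs *fs,
                           const uint64_t *inode_ptrs,
                           const unsigned int *offsets, int depth)
{
    uint64_t blk;

    assert(depth >= 1);
    blk = inode_ptrs[offsets[0]];
    for (int i = 1; i < depth && blk != 0; i++) {
        assert(blk < MODEL_NBLOCKS && offsets[i] < MODEL_NPTRS);
        blk = fs->disk[blk][offsets[i]];
    }
    return blk;
}
```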
Al Viro
5336970be0 ufs_inode_getfrag(): pass index instead of 'fragment'
same story as with ufs_inode_getblock()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:40:01 -04:00
Al Viro
0f3c1294be ufs_inode_getfrag(): split extending the partial blocks off
ufs_extend_tail() is handling that now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:40:00 -04:00
Al Viro
619cfac091 ufs_inode_getblock(): pass indirect block number and full index
... instead of messing with buffer_head.  We can bloody well do
sb_bread() in there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:59 -04:00
Al Viro
721435a767 ufs_inode_getblock(): pass index instead of 'fragment'
The value passed to ufs_inode_getblock() as the 3rd argument
had its lower bits ignored; the upper bits were shifted down
and used, and they actually make sense: they are the _lower_ bits
of the index in the indirect block (i.e. they form the index within
a fragment within an indirect block).

Pass those as the argument.  The upper bits of the index (i.e. the number
of the fragment within the indirect block) will join them shortly.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:58 -04:00
Al Viro
177848a018 ufs_inode_get{frag,block}(): leave sb_getblk() to caller
just return the damn block number

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:57 -04:00
Al Viro
8d9dcf1436 ufs_getfrag_block(): get rid of macro jungles
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:56 -04:00
Al Viro
bbb3eb9d34 ufs_inode_get{frag,block}(): consolidate success exits
These calling conventions are rudiments of pre-2.3 times; they
really need to be sanitized.  This is the first step; next
will be _always_ returning a block number, instead of this
"return a pointer to buffer_head, except when we get to the
actual data" crap.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:55 -04:00
Al Viro
71dd42846f ufs: use the branch depth in ufs_getfrag_block()
we'd already calculated it...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:54 -04:00
Al Viro
4b7068c8b1 ufs: move calculation of offsets into ufs_getfrag_block()
... and massage ufs_frag_map() to take those instead of fragment number.

As it is, we duplicate the damn thing on the write side, open-coded and
bloody hard to follow.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:53 -04:00
Al Viro
5a39c25562 ufs_inode_get{frag,block}(): get rid of retries
We are holding ->truncate_mutex, so nobody else can alter our
block pointers.  Rechecks/retries were needed back when we
only held BKL there, and had to cope with write_begin/writepage
and writepage/truncate races.  Can't happen anymore...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:52 -04:00
Al Viro
f53bd1421b __ufs_truncate_blocks(): avoid excessive dirtying of indirect blocks
There's a case when an indirect block gets dirtied for no good
reason: when there's a hole starting in the middle of the area
covered by it and spanning past its end, and truncate() is done
precisely to the beginning of the hole.

The block is obviously not modified at all - all removals happen
beyond it.  However, existing code ends up dirtying it just in
case.  It's trivial to fix, and while it's not a real bug by any
stretch of the imagination, it makes the damn thing harder to follow.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:51 -04:00
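
One way to express the idea behind f53bd1421b: clear the tail of an indirect block's pointer array and report whether anything actually changed, so the caller dirties the block only when needed. clear_tail() is a hypothetical helper, not a function from the UFS code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clear ptrs[from..n-1]. Return true only if some pointer was nonzero,
 * i.e. only if the block really changed and must be written back. */
static bool clear_tail(uint64_t *ptrs, unsigned int n, unsigned int from)
{
    bool changed = false;

    assert(from <= n);
    for (unsigned int i = from; i < n; i++) {
        if (ptrs[i] != 0) {
            ptrs[i] = 0;
            changed = true;
        }
    }
    return changed;
}
```

A kernel-side caller would then do something like "if (clear_tail(ptrs, n, from)) mark_buffer_dirty(bh);", so truncating exactly to the start of a hole that already covers the tail leaves the buffer clean.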
Al Viro
cc7231e309 free_full_branch(): don't bother modifying the block we are going to free
Note that it has already been made unreachable from the inode, so we don't have
to worry about ufs_frag_map() walking into something already freed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:50 -04:00
Al Viro
b6eede0ec6 move marking inode dirty to the end of __ufs_truncate_blocks()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:49 -04:00
Al Viro
163073db51 free_full_branch(): saner calling conventions
Have caller fetch the block number *and* remove it from wherever
it was.  Pass the block number instead.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:48 -04:00
Al Viro
7b4e4f7f81 ufs_trunc_branch(): kill recursion
turn recursion into a pair of loops

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:47 -04:00
Al Viro
6aab6dd379 ufs_trunc_branch(): massage towards killing recursion
We always have 0 < depth2 <= depth in there, so
if (--depth) {
	if (--depth2)
		A
	B
} else {
	C // not using depth2
}
D // not using depth2

is equivalent to

if (--depth2)
	A with s/depth/depth - 1/
if (--depth)
	B
else
	C
D

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:46 -04:00
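
The equivalence claimed in 6aab6dd379 can be checked mechanically. The sketch below models A, B, C and D as steps that only record the values of depth and depth2 they observe (C and D record -1 for depth2, since they do not use it), then compares the two shapes for every 0 < depth2 <= depth up to a bound. It assumes A, B, C and D have no other effect on depth or depth2; all names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TRACE_LEN 128

/* Append "tag(depth,depth2) " to a trace; -1 marks "depth2 not used". */
static void record_step(char *trace, const char *tag, int depth, int depth2)
{
    size_t used = strlen(trace);

    snprintf(trace + used, TRACE_LEN - used, "%s(%d,%d) ", tag, depth, depth2);
}

/* Before: A is nested inside the --depth branch. */
static void shape_before(char *trace, int depth, int depth2)
{
    trace[0] = '\0';
    if (--depth) {
        if (--depth2)
            record_step(trace, "A", depth, depth2);
        record_step(trace, "B", depth, depth2);
    } else {
        record_step(trace, "C", depth, -1);
    }
    record_step(trace, "D", depth, -1);
}

/* After: A is hoisted out and sees depth - 1 in place of depth. */
static void shape_after(char *trace, int depth, int depth2)
{
    trace[0] = '\0';
    if (--depth2)
        record_step(trace, "A", depth - 1, depth2);
    if (--depth)
        record_step(trace, "B", depth, depth2);
    else
        record_step(trace, "C", depth, -1);
    record_step(trace, "D", depth, -1);
}

/* Count the (depth, depth2) pairs, 0 < depth2 <= depth <= max_depth,
 * for which the two shapes produce different traces. */
static int count_mismatches(int max_depth)
{
    char a[TRACE_LEN], b[TRACE_LEN];
    int mismatches = 0;

    assert(max_depth >= 1);
    for (int depth = 1; depth <= max_depth; depth++) {
        for (int depth2 = 1; depth2 <= depth; depth2++) {
            shape_before(a, depth, depth2);
            shape_after(b, depth, depth2);
            if (strcmp(a, b) != 0)
                mismatches++;
        }
    }
    return mismatches;
}
```

count_mismatches(8) returns 0, i.e. the traces agree for every pair checked.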
Al Viro
6d1ebbca2b split ufs_truncate_branch() into full- and partial-branch variants
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:45 -04:00
Al Viro
a138b4b688 ufs: unify the logics for collecting adjacent data blocks to free
open-coded in several places...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:44 -04:00
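
The logic that a138b4b688 unifies is, roughly: walk the block numbers in order, extend the current run while each block immediately follows the previous one, and free each run with a single call. A sketch with hypothetical names (free_in_runs, free_run_fn, tally_run), assuming "adjacent" means "previous + 1" and that 0 marks a hole; the real code's notion of adjacency may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Called once per run of consecutive blocks. */
typedef void (*free_run_fn)(void *ctx, uint64_t first, unsigned int count);

/* Free every nonzero block in ptrs[0..n-1], batching runs in which each
 * block number is the previous one plus 1. A zero entry is a hole and
 * ends the current run. Returns the number of free_run() calls made. */
static unsigned int free_in_runs(const uint64_t *ptrs, size_t n,
                                 free_run_fn free_run, void *ctx)
{
    uint64_t first = 0;
    unsigned int count = 0, calls = 0;

    assert(free_run != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t b = ptrs[i];

        if (count && b == first + count) {
            count++;                    /* extends the current run */
            continue;
        }
        if (count) {                    /* flush the previous run */
            free_run(ctx, first, count);
            calls++;
            count = 0;
        }
        if (b) {                        /* start a new run */
            first = b;
            count = 1;
        }
    }
    if (count) {
        free_run(ctx, first, count);
        calls++;
    }
    return calls;
}

/* Example callback: adds up how many blocks were freed. */
static void tally_run(void *ctx, uint64_t first, unsigned int count)
{
    (void)first;
    *(unsigned long *)ctx += count;
}
```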
Al Viro
a96574233c ufs_trunc_branch(): separate the calls with non-NULL offsets
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:43 -04:00
Al Viro
97e0f8f87c ufs_trunc_branch(): never call with offsets != NULL && depth2 == 0
For calls in __ufs_truncate_blocks() it's just a matter of not
incrementing offsets[0] and not making that call - the immediately
following loop will be executed one extra time and we'll be just
fine.  For the recursive call in ufs_trunc_branch() itself, just
assign NULL to offsets if we would be about to make such a call.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:42 -04:00
Al Viro
42432739b5 __ufs_trunc_blocks(): turn the part after switch into a loop
... and turn the switch into if (), since all cases with
depth != 1 have just become identical.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:41 -04:00
Al Viro
ef3a315d4c __ufs_truncate_blocks(): unify freeing the full branches
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:40 -04:00
Al Viro
9e0fbbde27 unify ufs_trunc_..indirect()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:39 -04:00
Al Viro
6775e24d9c ufs_trunc_..indirect(): more massage towards unifying
Instead of manually checking that the array contains only zeroes,
find the position of the last non-zero (in __ufs_truncate(), where
we can conveniently do that) and use that to tell if there's
any non-zero in the array tail passed to ufs_trunc_...indirect().

The goal of all that clumsiness is to fold these functions
together.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:38 -04:00
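
The trick in 6775e24d9c, in isolation: compute the position of the last nonzero element once, and every later "is the rest of the array all zeroes?" question becomes a single comparison. last_nonzero() and tail_has_nonzero() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Return the index of the last nonzero element of a[0..n-1],
 * or -1 if every element is zero. Computed once by the caller. */
static int last_nonzero(const unsigned int *a, int n)
{
    int i;

    assert(n >= 0);
    for (i = n - 1; i >= 0; i--)
        if (a[i] != 0)
            break;
    return i;
}

/* Does the tail a[from..n-1] contain any nonzero element?
 * With 'last' precomputed, no loop is needed. */
static bool tail_has_nonzero(int last, int from)
{
    return last >= from;
}
```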
Al Viro
85416288bf ufs_trunc_...indirect(): pass the array of indices instead of offsets
rather than bitslicing the offset just formed as a sum of shifted indices,
pass the array of those indices itself.  NULL is used as equivalent
of "all zeroes" (== free the entire branch).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:37 -04:00
Al Viro
7a4fdda724 __ufs_truncate(); find cutoff distances into branches by offsets[] array
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:36 -04:00
Al Viro
7bad5939fc ufs_trunc_dindirect(): pass the number of blocks to keep
same as the previous two.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:35 -04:00
Al Viro
6ac36b8777 ufs_trunc_indirect(): pass the index of the first pointer to free
... instead of file offset.  Same cleanups as in the tindirect
conversion in previous commit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:34 -04:00
Al Viro
18ca51d821 ufs_trunc_tindirect(): pass the number of blocks to keep
IOW, the distance of the cutoff from the beginning of the branch
(in blocks).

That (and the fact that the block just prior to the cutoff is guaranteed to
be present) allows us to tell whether to free the triple indirect block
just by looking at the offset.

While we are at it, using u64 for indices within the block is wrong -
those should be unsigned int.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:33 -04:00
Al Viro
31cd043e1a ufs: beginning of __ufs_truncate_block() massage
Use ufs_block_to_path() to find the cutoff path in the block pointers' tree.
For now just use the information about the depth (to bypass the fully
preserved subtrees); subsequent commits will use the information about actual
path.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:32 -04:00
Al Viro
4e3911f3d7 ufs: the offsets ufs_block_to_path() puts into array are not sector_t
The type makes no sense - those are indices in block number arrays, not
block numbers.  And no, UFS is not likely to grow indirect blocks with
4G pointers in them...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:31 -04:00
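
For reference, a sketch of the kind of mapping ufs_block_to_path() performs, written in the ext2 style and assuming 12 direct pointers in the inode, then single, double and triple indirect subtrees rooted in slots 12, 13 and 14, with 2^ptrs_bits pointers per indirect block. The MODEL_* constants and model_block_to_path() are hypothetical; note that the offsets are unsigned int indices, which is the point of 4e3911f3d7.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NDIR      12   /* direct pointers in the inode */
#define MODEL_IND_SLOT  12   /* slot of the single-indirect root */
#define MODEL_DIND_SLOT 13   /* slot of the double-indirect root */
#define MODEL_TIND_SLOT 14   /* slot of the triple-indirect root */

/* Fill offsets[] with the path to logical block i and return the depth
 * (1..4), or 0 if i lies beyond the triple-indirect range. Each offset
 * indexes a pointer array, so it is an unsigned int, not a block number. */
static int model_block_to_path(uint64_t i, unsigned int ptrs_bits,
                               unsigned int offsets[4])
{
    const uint64_t ptrs = (uint64_t)1 << ptrs_bits;
    const uint64_t mask = ptrs - 1;

    assert(ptrs_bits > 0 && ptrs_bits < 21);    /* keeps ptrs^3 in range */
    if (i < MODEL_NDIR) {
        offsets[0] = (unsigned int)i;
        return 1;
    }
    i -= MODEL_NDIR;
    if (i < ptrs) {
        offsets[0] = MODEL_IND_SLOT;
        offsets[1] = (unsigned int)i;
        return 2;
    }
    i -= ptrs;
    if (i < ptrs * ptrs) {
        offsets[0] = MODEL_DIND_SLOT;
        offsets[1] = (unsigned int)(i >> ptrs_bits);
        offsets[2] = (unsigned int)(i & mask);
        return 3;
    }
    i -= ptrs * ptrs;
    if (i < ptrs * ptrs * ptrs) {
        offsets[0] = MODEL_TIND_SLOT;
        offsets[1] = (unsigned int)(i >> (2 * ptrs_bits));
        offsets[2] = (unsigned int)((i >> ptrs_bits) & mask);
        offsets[3] = (unsigned int)(i & mask);
        return 4;
    }
    return 0;
}
```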
Al Viro
010d331fc3 ufs: move truncate code into inode.c
It is closely tied to block pointers handling there, can benefit
from existing helpers, etc. - no point keeping them apart.

Trimmed the trailing whitespaces in inode.c at the same time.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:30 -04:00
Al Viro
0d23cf7616 ufs: no retries are needed on truncate
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:29 -04:00
Al Viro
687857930d ufs: ufs_trunc_...() has exclusion with everything that might cause allocations
Currently via lock_ufs(); eventually via a per-inode mutex.
lock_ufs() used to be mere BKL, which is much weaker, so it needed
those rechecks.  BKL doesn't provide any exclusion once we lose CPU;
its blind replacement, OTOH, _does_.  Making that per-filesystem was
an atrocity, but at least we can simplify life here.  And yes, we
certainly need to make that sucker per-inode - these days inode.c and
truncate.c uses are needed only to protect the block pointers.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:28 -04:00
Al Viro
6a799d3514 ufs: ufs_trunc_direct() always returns 0
make it return void

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:27 -04:00
Al Viro
dff7cfd36e ufs: kill lock_ufs()
There were 3 remaining users; in two of them we took ->s_lock immediately
after lock_ufs() and held it until just before unlock_ufs(); the third
one (statfs) could not be called from itself or from the other two (remount
and sync_fs).  Just use ->s_lock in statfs and don't bother with lock_ufs
at all.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:26 -04:00
Al Viro
724bb09fdc ufs: don't use lock_ufs() for block pointers tree protection
* stores to block pointers are under per-inode seqlock (meta_lock) and
mutex (truncate_mutex)
* fetches of block pointers are either under truncate_mutex, or wrapped
into seqretry loop on meta_lock
* all changes of ->i_size are under truncate_mutex and i_mutex
* all changes of ->i_lastfrag are under truncate_mutex

It's similar to what ext2 is doing; the main difference is that unlike
ext2 we can't rely upon the atomicity of stores into block pointers -
on UFS2 they are 64bit.  So we can't cut the corner when switching
a pointer from NULL to non-NULL as we could in ext2_splice_branch()
and need to use meta_lock on all modifications.

We use seqlock where ext2 uses rwlock; ext2 could probably also benefit
from such change...

Another non-trivial difference is that with UFS we *cannot* have a reader
grab truncate_mutex in case of race - it has to keep retrying.  That
might be possible to change, but not until we lift tail unpacking
several levels up in call chain.

After that commit we do *NOT* hold fs-wide serialization on accesses
to block pointers anymore.  Moreover, lock_ufs() can become a normal
mutex now - it's only used on statfs, remount and sync_fs and none
of those uses are recursive.  As a matter of fact, *now* it can be
collapsed with ->s_lock, and be eventually replaced with saner
per-cylinder-group spinlocks, but that's a separate story.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:25 -04:00
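
The protocol in 724bb09fdc is a sequence lock: in the kernel, seqlock_t with write_seqlock()/write_sequnlock() on the writer side and a read_seqbegin()/read_seqretry() loop on the reader side. Since those are kernel-only, here is a userspace sketch in C11 atomics that models a 64-bit block pointer stored as two 32-bit halves, as a 32-bit CPU would have to store it. The model_blkptr names are hypothetical, and writers are assumed to be serialized externally, as truncate_mutex does in the commit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 64-bit block pointer that can only be stored as two
 * 32-bit halves, so an unprotected reader could see a torn value. */
struct model_blkptr {
    atomic_uint seq;            /* even: stable; odd: store in progress */
    _Atomic uint32_t lo, hi;
};

static void model_blkptr_init(struct model_blkptr *p)
{
    atomic_init(&p->seq, 0u);
    atomic_init(&p->lo, 0u);
    atomic_init(&p->hi, 0u);
}

/* Writer. Callers must already hold a mutex (truncate_mutex in the
 * commit); the sequence counter only protects lockless readers. */
static void model_blkptr_store(struct model_blkptr *p, uint64_t v)
{
    unsigned int s = atomic_load_explicit(&p->seq, memory_order_relaxed);

    assert((s & 1u) == 0);                       /* no writer in progress */
    atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);   /* odd seq before data */
    atomic_store_explicit(&p->lo, (uint32_t)v, memory_order_relaxed);
    atomic_store_explicit(&p->hi, (uint32_t)(v >> 32), memory_order_relaxed);
    atomic_store_explicit(&p->seq, s + 2, memory_order_release);
}

/* Reader: retry until both halves come from one completed store. */
static uint64_t model_blkptr_load(struct model_blkptr *p)
{
    unsigned int s1, s2;
    uint32_t lo, hi;

    do {
        s1 = atomic_load_explicit(&p->seq, memory_order_acquire);
        lo = atomic_load_explicit(&p->lo, memory_order_relaxed);
        hi = atomic_load_explicit(&p->hi, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire); /* data before re-check */
        s2 = atomic_load_explicit(&p->seq, memory_order_relaxed);
    } while ((s1 & 1u) || s1 != s2);

    return ((uint64_t)hi << 32) | lo;
}
```

Readers never block writers here; a reader that overlaps a store simply loops again, which matches the commit's note that a UFS reader has to keep retrying rather than grab truncate_mutex.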
Al Viro
4af7b2c080 ufs: bforget() indirect blocks before freeing them
right now it doesn't matter (lock_ufs() serializes everything),
but when we switch to per-inode locking, it will be needed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:24 -04:00