In normal cases, we would not be allowed to do balance in RO mode.
However, when we're using a seeding device and adding another device to sprout,
things will change:
$ mkfs.btrfs /dev/sdb7
$ btrfstune -S 1 /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs -o ro
$ btrfs fi bal /mnt/btrfs -----------------------> fail.
$ btrfs dev add /dev/sdb8 /mnt/btrfs
$ btrfs fi bal /mnt/btrfs -----------------------> works!
It should not be designed as an exception, and we'd better add another check for
mnt flags.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
Fully utilize our extent state's new helper functions to use
fastpath as much as possible.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
Reproduce:
$ mkfs.btrfs /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs -o ro
$ btrfs dev add /dev/sdb8 /mnt/btrfs
ERROR: error adding the device '/dev/sdb8' - Invalid argument
Since we mount with readonly options, and /dev/sdb7 is not a seeding one,
a readonly notification is preferred.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
We noticed that the ordered extent completion doesn't really rely on having
a page and that it could be done independantly of ending the writeback on a
page. This patch makes us not do the threaded endio stuff for normal
buffered writes and direct writes so we can end page writeback as soon as
possible (in irq context) and only start threads to do the ordered work when
it is actually done. Compression needs to be reworked some to take
advantage of this as well, but atm it has to do a find_get_page in its endio
handler so it must be done in its own thread. This makes direct writes
quite a bit faster. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
We are checking delalloc to see if it is ok to update the i_size. There are
2 cases it stops us from updating
1) If there is delalloc between our current disk_i_size and this ordered
extent
2) If there is delalloc between our current ordered extent and the next
ordered extent
These tests are racy however since we can set delalloc for these ranges at
any time. Also for the first case if we notice there is delalloc between
disk_i_size and our ordered extent we will not update disk_i_size and assume
that when that delalloc bit gets written out it will update everything
properly. However if we crash before that we will have file extents outside
of our i_size, which is not good, so this test is dangerous as well as racy.
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
There is an off-by-one error: allocating room for a maximal result
string but without room for a trailing NUL. That, can lead to
returning a transformed string that is not NUL-terminated, and
then to a caller reading beyond end of the malloc'd buffer.
Rewrite to s/kzalloc/kmalloc/, remove unwarranted use of strncpy
(the result is guaranteed to fit), remove dead strlen at end, and
change a few variable names and comments.
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Jim Meyering <meyering@redhat.com>
A device with name of length BTRFS_DEVICE_PATH_NAME_MAX or longer
would not be NUL-terminated in the DEV_INFO ioctl result buffer.
Signed-off-by: Jim Meyering <meyering@redhat.com>
The buffer read-overrun would be triggered by a printk format
starting with <N>, where N is a single digit. NUL-terminate
after strncpy. Use memcpy, not strncpy, since we know the
string we're copying fits in the destination buffer and
contains no NUL byte.
Signed-off-by: Jim Meyering <meyering@redhat.com>
Changing 'mount -oremount,thread_pool=2 /' didn't make any effect:
maximum amount of worker threads is specified in 2 places:
- in 'strict btrfs_fs_info::thread_pool_size'
- in each worker struct: 'struct btrfs_workers::max_workers'
'mount -oremount' updated only 'btrfs_fs_info::thread_pool_size'.
Fix it by pushing new maximum value to all created worker structures
as well.
Cc: Josef Bacik <josef@redhat.com>
Cc: Chris Mason <chris.mason@oracle.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
We already do the btrfs_wait_ordered_range which will do this for us, so
just remove this call so we don't call it twice. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
In btrfs_wait_ordered_range we have been calling filemap_fdata_write() twice
because compression does strange things and then waiting. Then we look up
ordered extents and if we find any we will always schedule_timeout(); once
and then loop back around and do it all again. We will even check to see if
there is delalloc pages on this range and loop again. So this patch gets
rid of the multipe fdata_write() calls and just does
filemap_write_and_wait(). In the case of compression we will still find the
ordered extents and start those individually if we need to so that is ok,
but in the normal buffered case we avoid all this weird overhead.
Then in the case of the schedule_timeout(1), we don't need it. All callers
either 1) don't care, they just want to make sure what they just wrote maeks
it to disk or 2) are doing the lock()->lookup ordered->unlock->flush thing
in which case it will lock and check for ordered extents _anyway_ so get
back to them as quickly as possible. The delaloc check is simply not
needed, this only catches the case where we write to the file again since
doing the filemap_write_and_wait() and if the caller truly cares about that
it will take care of everything itself. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
These warnings are bogus since we will always have at least one page in an
eb, but to make the compiler happy just set ret = 0 in these two cases.
Thanks,
Btrfs: fix compile warnings in extent_io.c
These warnings are bogus since we will always have at least one page in an
eb, but to make the compiler happy just set ret = 0 in these two cases.
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
When running compilebench I noticed we were spending some time looking up
acls on new inodes, which shouldn't be happening since there were no acls.
This is because when we init acls on the inode after creating them we don't
cache the fact there are no acls if there aren't any. Doing this adds a
little bit of a bump to my compilebench runs. Thanks,
Btrfs: cache no acl on new inodes
Signed-off-by: Josef Bacik <josef@redhat.com>
We've been keeping around the inode sequence number in hopes that somebody
would use it, but nobody uses it and people actually use i_version which
serves the same purpose, so use i_version where we used the incore inode's
sequence number and that way the sequence is updated properly across the
board, and not just in file write. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
When a fresh transaction begins, the tree mod log must be clean. Users of
the tree modification log must ensure they never span across transaction
boundaries.
We reset the sequence to 0 in this safe situation to make absolutely sure
overflow can't happen.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
This enables backref resolving on life trees while they are changing. This
is a prerequisite for quota groups and just nice to have for everything
else.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
The tree modification log together with the current state of the tree gives
a consistent, old version of the tree. btrfs_search_old_slot is used to
search through this old version and return old (dummy!) extent buffers.
Naturally, this function cannot do any tree modifications.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Record all relevant modifications to block pointers in the tree mod log so
that we can rewind them later on for backref walking.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
When running functions that can make changes to the internal trees
(e.g. btrfs_search_slot), we check if somebody may be interested in the
block we're currently modifying. If so, we record our modification to be
able to rewind it later on.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
The tree mod log will log modifications made fs-tree nodes. Most
modifications are done by autobalance of the tree. Such changes are recorded
as long as a block entry exists. When released, the log is cleaned.
With the tree modification log, it's possible to reconstruct a consistent
old state of the tree. This is required to do backref walking on a busy
file system.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
FSI DMAEngine has to be stopped certainly at the start/stop time.
Without this patch, it will include noise on playback.
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
FSI driver is using dma_sync_single_xxx(),
but the dma area was not correct.
This patch fix it up.
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Commit c751085943
Author: Rafael J. Wysocki <rjw@sisk.pl>
Date: Sun Apr 12 20:06:56 2009 +0200
PM/Hibernate: Wait for SCSI devices scan to complete during resume
Broke the scsi_wait_scan module in 2.6.30. Apparently debian still uses it so
fix it and backport to stable before removing it in 3.6.
The breakage is caused because the function template in
include/scsi/scsi_scan.h is defined to be a nop unless SCSI is built in.
That means that in the modular case (which is every distro), the
scsi_wait_scan module does a simple async_synchronize_full() instead of
waiting for scans.
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
DMA stream handler didn't care about master clock.
This patch fixes it up.
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Commit a7a20d1 "[SCSI] sd: limit the scope of the async probe domain"
moved sd probe work out of reach of wait_for_device_probe(). Allow it
to be synced via scsi_complete_async_scans().
Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
This patch should go into 3.5 fixes. The bug was added in the
patches for the 3.5 feature window.
As you can see from the patch I made a mistake. During
development I switched from passing a struct to the size of
the struct, but left the sizeof. This results in us allocating
4 bytes (sizeof(int)) but then calling pci_free_consistent
with the size of the struct.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Added support to capture dump (Minidump) which allows us to
catpure a snapshot of the firmware/hardware states at the time
of firmware failure
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>
Signed-off-by: Shyam Sundar <shyam.sundar@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
change_queue_depth will adjust device queuedepth upon receiving
"SAM_STAT_TASK_SET_FULL" scsi status from the target.
Also added ql4xqfulltracking command line param to enable or disable
queuefull tracking. One can disabling queuefull tracking to ensure
user set scsi device queuedepth is not altered.
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Allow ddb state to change to DDB_DS_NO_CONNECTION_ACTIVE or
DDB_DS_SESSION_FAILED before issuing clear ddb mailbox cmd,
because clear ddb mailbox cmd fails if the ddb state is not
equal to DDB_DS_NO_CONNECTION_ACTIVE or DDB_DS_SESSION_FAILED.
Signed-off-by: Manish Rangankar <manish.rangankar@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Function pat_pagerange_is_ram() scales poorly to large address
ranges, because it probes the resource tree for each page.
On a 2.6 GHz Opteron, this function consumes 34 ms for a 1 GB range.
It is called twice during untrack_pfn_vma(), slowing process
cleanup and handicapping the OOM killer.
This replacement consumes less than 1ms, under the same conditions.
Signed-off-by: John Dykstra <jdykstra@cray.com> on behalf of Cray Inc.
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1337980366.1979.6.camel@redwood
[ Small stylistic cleanups and renames ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Update the session and connection parameter before sending
connection logged in event to iscsiadm because in some
scenario logout may come in just after we send the logged
in event to user, which free up session, connection and ddb,
but DPC is still updating session and connect parameter
which can lead to panic.
Signed-off-by: Manish Rangankar <manish.rangankar@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Since there are uses for I2C_M_NOSTART which are much more sensible and
standard than most of the protocol mangling functionality (the main one
being gather writes to devices where something like a register address
needs to be inserted before a block of data) create a new I2C_FUNC_NOSTART
for this feature and update all the users to use it.
Also strengthen the disrecommendation of the protocol mangling while we're
at it.
In the case of regmap-i2c we remove the requirement for mangling as
I2C_M_NOSTART is the only mangling feature which is being used.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
As the bus driver side implementation of I2C_M_RECV_LEN is heavily
tied to SMBus, we can't support received length over 32 bytes, but
let's at least support that.
In practice, the caller will have to setup a buffer large enough to
cover the case where received length byte has value 32, so minimum
32 + 1 = 33 bytes, possibly more if there is a fixed number of bytes
added for the specific slave (for example a checksum.)
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Check for Firmware Hang (AF_FW_RECOVERY) after mailbox command
has gained access to ensure that the mailbox command does not
wait un-necessarily during a firmware recovery and prevent
premature mailbox timeout which will lead to back to back reset's.
Signed-off-by: Lalit Chandivade <lalit.chandivade@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
On s390 access_ok is a macro which discards all parameters and always
returns 1. This can result in compile warnings which warn about unused
variables like this:
fs/read_write.c: In function 'rw_copy_check_uvector':
fs/read_write.c:684:16: warning: unused variable 'buf' [-Wunused-variable]
Fix this by adding a __range_ok() function which consumes all parameters
but still always returns 1.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Now that hopefully all cmpxchg/xchg bugs have been fixed select
HAVE_CMPXCHG_LOCAL option which uncovered a couple of bugs on s390.
The only call site which is affected seems to be within mm/vmstat.c.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For 1 and 2 byte operands for xchg and cmpxchg the old and new values
get or'ed into the larger 4 byte old value before the compare and swap
instruction gets executed. This is done without using the proper byte
mask before or'ing the values.
If the caller passed in negative old or new values these got sign
extended by the caller. Which in turn means that either the old value
never matches, or, even worse, unrelated bytes would be changed in memory.
Luckily there don't seem to be any callers around yet, since that would
have resulted in the specification exception fixed in an earlies patch.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When accessing a 1 or 2 byte memory operand we cannot use the
passed address since the compare and swap instruction only works
for 4 byte aligned memory operands.
Hence we calculate an aligned address so that compare and swap works
correctly. However we don't pass the calculated address to the inline
assembly. This results in incorrect memory accesses and in a
specification exception if used on non 4 byte aligned memory operands.
Since this didn't happen until now, there don't seem to be
too many users of cmpxchg on unaligned addresses.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The cmpxchg macros and functions are a bit different than on other
architectures. In particular the macros do not store the return
value of a __cmpxchg function call in a variable before returning the
value.
This causes compile warnings that only occur on s390 like this one:
net/ipv4/af_inet.c: In function 'build_ehash_secret':
net/ipv4/af_inet.c:241:2: warning: value computed is not used [-Wunused-value]
To get rid of these warnings use the same construct that we already use
for the xchg macro, which was introduced for the same reason.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
All cmpxchg functions imply a memory barrier.
cmpxch64 did not have one for 31 bit code, so add it.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
It has been a big mistage to add the capabilities attribute to the
cpus in sysfs:
First the attribute only contains the cpu capability of primary cpus,
which however is not necessarily (or better: unlikely) the type of
cpu the kernel runs on, which is typically an IFL.
In addition all information that is necessary is available in
/proc/sysinfo already. So this attribute partially duplicated
informations.
So programs should look into the sysinfo file to retrieve all
informations they are interested in.
Since with this kernel release also the powersavings cpu attributes
are removed this seems to be a good opportunity to remove another
broken interface.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If the IPL CPU is offline, currently the pcpu_delegate() function
used by smp_call_ipl_cpu() does not work because pcpu_delegate()
modifies the lowcore of the target CPU. In case of an offline
IPL CPU currently the prefix register is zero but pcpu->lowcore
still points to the old prefix page. Therefore the lowcore changes
done by pcpu_delegate() have no effect.
With this fix pcpu_delegate() now uses memcpy_absolute() and therefore
also prepares the absolute zero lowcore if the target CPU has prefix
register zero.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch introduces the new function memcpy_absolute() that allows to
copy memory using absolute addressing. This means that the prefix swap
does not apply when this function is used.
With this patch also all s390 kernel code that accesses absolute zero
now uses the new memcpy_absolute() function. The old and less generic
copy_to_absolute_zero() function is removed.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
. Make the annotatation toggles (hide_src_code, jump_arrows, use_offset, etc)
global so that navigation doesn't resets them on new annotations.
. Introduce an '[annotate]' config file section to allow permanent changes
to the annotate browser defaults.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJPxXylAAoJENZQFvNTUqpAMgEP/jdTZgrhOmccx5Jcc1790ELe
efw0C/ggtFVovyz2/UWBvOQLdcYZEVbjFbeB6Vq9yCRgAiHCmnVgsVG/IERoAhuQ
eQoWOkGolSYTI+BOW9/TEsLfUrsNlKXhK8Hu8Efs3vaVk+MCAMT6eetpF8txKhtg
KI4leQmR+Z4XFdwWmoUDP9ROS3MwvHTESBaenxM63w+sF+mCod70dsK9tA3+SfMF
z65aidY694o3Fs2HgPSF0dFbNd9lfnS2tCnZXyD9Ckg9ZF8Dbd5qvTlFaTiZi/0h
zj36mmsb9Q3WPXBmiQtXOVzzfYKk9dSz3KCH/IoUm6firQQVCn101hrsief7e9nU
/QvKGHsbcChzULr/SJrnFMCsaBt4zUh06RTLjaOBU5ITfR5X6OLrFNiHxGtZJrTC
jCHZZ1oO622v0V0JbF6Fzd4VOZ9IOsfFiJwRXt8rKQjNjKBF41H4xsXAlXVAJljb
txdNk9wUFl7TtVB6BEORKS4kdXeMGkr2VVaLE8A8PrFd0K3C7HL1Sf2nrPX7snfQ
USEBJ3uPfE7NgsH9r9UMSOOao4mbuOTMqZoqrjG59zk4mB0scySJHcC5JpvX8wPG
Chi4jYZYuWzxqfwzihq8hu8z6mKQkRJoqpf1XjgUoAUtBQZe+lWN7YDySPlMPREx
5Ndq47mP1uhq+unYYICl
=2R//
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Annotation fixes/improvements from Arnaldo Carvalho de Melo:
. Make the annotatation toggles (hide_src_code, jump_arrows, use_offset, etc)
global so that navigation doesn't resets them on new annotations.
. Introduce an '[annotate]' config file section to allow permanent changes
to the annotate browser defaults.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>