This patch adds two macros for transition between byte and block offsets.
Currently, f2fs only supports 4KB blocks, so use the default size for now.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch fixes wrong handling of buffer_new flag in get_block.
If f2fs allocates new blocks and mapped buffer_head, it needs to set buffer_new
for the bh_result.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In get_node_page, if the page is up-to-date, we assumed that the page was not
reclaimed at all.
But, sometimes it was reported that its contents was missing.
So, just for sure, let's check its mapping and contents.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
xfstest generic/285 complains our issue in lseeking huge file.
Here is the detail output of generic/285:
"./check -f2fs tests/generic/285
Ran: generic/285
Failures: generic/285
Failed 1 of 1 tests
10. Test a huge file for offset overflow
10.01 SEEK_HOLE expected 65536 or 8589934592, got 65536. succ
10.02 SEEK_HOLE expected 65536 or 8589934592, got 65536. succ
10.03 SEEK_DATA expected 0 or 0, got 0. succ
10.04 SEEK_DATA expected 1 or 1, got 1. succ
10.05 SEEK_HOLE expected 8589934592 or 8589934592, got 0. FAIL
10.06 SEEK_DATA expected 8589869056 or 8589869056, got 8589869056. succ
10.07 SEEK_DATA expected 8589869057 or 8589869057, got 8589869057. succ
10.08 SEEK_DATA expected 8589869056 or 8589869056, got 4294901760. FAIL"
The reason of this issue is:
We will calculate current offset through left shifting page-offset with
PAGE_CACHE_SHIFT bits, but our page-offset is a type of unsigned long, its size
is 4 bytes in 32-bits machine.
So if our page-offset is bigger than (1 << 32 / pagesize - 1), result of left
shifting will overflow.
Let's fix this issue by casting type of page-offset to type of current offset:
loff_t.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In commit a78186ebe5 ("f2fs: use highmem for directory pages"), we have set
__GFP_HIGHMEM into dir mapping's gfp flag in f2fs_iget, so high address memory
could be used for these existing dir's page.
But we forgot to set flag for newly created dir, due to this reason, our newly
created dir pages could not be allocated from high address memory. Fix it.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch introduces a batched trimming feature, which submits split discard
commands.
This is to avoid long latency due to huge trim commands.
If fstrim was triggered ranging from 0 to the end of device, we should lock
all the checkpoint-related mutexes, resulting in very long latency.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch merges ->{invalidate,release}page function for meta/node/data pages.
After this, duplication of codes could be removed.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If device is read-only, we should not proceed data recovery.
But, if the previous checkpoint was done by normal clean shutdown, it's safe to
proceed the recovery, since there will be no data to be recovered.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds FASTBOOT flag into checkpoint as follows.
- CP_UMOUNT_FLAG is set when system is umounted.
- CP_FASTBOOT_FLAG is set when intermediate checkpoint having node summaries
was done.
So, if you get CP_UMOUNT_FLAG from checkpoint, the system was umounted cleanly.
Instead, if there was sudden-power-off, you can get CP_FASTBOOT_FLAG or nothing.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Do not change any partition when f2fs is changed to readonly mode.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds a mount option, norecovery, which is mostly same as
disable_roll_forward. The only difference is that norecovery should be activated
with read-only mount option.
This can be used when user wants to check whether f2fs is mountable or not
without any recovery process. (e.g., xfstests/200)
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If wrong mount option was requested, f2fs tries to fill_super again.
But, during the next trial, f2fs has no valid mount options, since
parse_options deleted all the separators in the original string.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Currently, there are several variables with Boolean type as below:
struct f2fs_sb_info {
...
int s_dirty;
bool need_fsck;
bool s_closing;
...
bool por_doing;
...
}
For this there are some issues:
1. there are some space of f2fs_sb_info is wasted due to aligning after Boolean
type variables by compiler.
2. if we continuously add new flag into f2fs_sb_info, structure will be messed
up.
So in this patch, we try to:
1. switch s_dirty to Boolean type variable since it has two status 0/1.
2. merge s_dirty/need_fsck/s_closing/por_doing variables into s_flag.
3. introduce an enum type which can indicate different states of sbi.
4. use new introduced universal interfaces is_sbi_flag_set/{set,clear}_sbi_flag
to operate flags for sbi.
After that, above issues will be fixed.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Use pointer parameter @wait to pass result in {in,de}create_sleep_time for
cleanup.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
1. make truncate_inline_date static;
2. remove parameter @from of truncate_inline_date as callers only pass zero.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
No modification in functionality, just clean codes with f2fs_radix_tree_insert.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In this patch we add the FS_IOC_GETVERSION ioctl for getting i_generation from
inode, after that, users can list file's generation number by using "lsattr -v".
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
During the recovery, any xattr blocks should not be found, since they are
written into cold log, not the warm node chain.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We will encounter deadloop in below scenario:
1. increase page count for F2FS_DIRTY_META type in following path:
->recover_fsync_data
->recover_data
->do_recover_data
->recover_data_page
->change_curseg
->write_sum_page
->set_page_dirty
2. fail in recover_data()
3. invalidate meta pages in truncate_inode_pages_final without decreasing page
count.
4. deadloop when sync_meta_pages as page count will always be non-zero.
message:
NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
[<c1129a37>] pagevec_lookup_tag+0x27/0x30
[<f0e774c7>] sync_meta_pages+0x87/0x160 [f2fs]
[<f0e86dd9>] recover_fsync_data+0xeb9/0xf10 [f2fs]
[<f0e75398>] f2fs_fill_super+0x888/0x980 [f2fs]
[<c11733ca>] mount_bdev+0x16a/0x1a0
[<f0e7180f>] f2fs_mount+0x1f/0x30 [f2fs]
[<c1173da6>] mount_fs+0x36/0x170
[<c118b6f5>] vfs_kern_mount+0x55/0xe0
[<c118d63f>] do_mount+0x1df/0x9f0
[<c118e110>] SyS_mount+0x70/0xb0
[<c15a0c48>] sysenter_do_call+0x12/0x12
To avoid page count leak, let's add ->invalidatepage and ->releasepage in
f2fs_meta_aops as f2fs_node_aops to release meta page count correctly.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If the previous checkpoint was done without CP_UMOUNT flag, it needs to do
checkpoint with CP_UMOUNT for the next fast boot.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch fixes to trigger checkpoint with umount flag when kill_sb was called.
In kill_sb, f2fs_sync_fs was finally called, but at this time, f2fs can't do
checkpoint with CP_UMOUNT.
After then, f2fs_put_super is not doing checkpoint, since it is not dirty.
So, this patch adds a flag to indicate f2fs_sync_fs is called during umount.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Our value of memory footprint statistics showed in debugfs is not calculated
correctly. Fix it in this patch.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If cp_error is set, we should avoid all the infinite loop.
In f2fs_sync_file, there is a hole, and this patch fixes that.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Pull networking updates from David Miller:
1) More iov_iter conversion work from Al Viro.
[ The "crypto: switch af_alg_make_sg() to iov_iter" commit was
wrong, and this pull actually adds an extra commit on top of the
branch I'm pulling to fix that up, so that the pre-merge state is
ok. - Linus ]
2) Various optimizations to the ipv4 forwarding information base trie
lookup implementation. From Alexander Duyck.
3) Remove sock_iocb altogether, from CHristoph Hellwig.
4) Allow congestion control algorithm selection via routing metrics.
From Daniel Borkmann.
5) Make ipv4 uncached route list per-cpu, from Eric Dumazet.
6) Handle rfs hash collisions more gracefully, also from Eric Dumazet.
7) Add xmit_more support to r8169, e1000, and e1000e drivers. From
Florian Westphal.
8) Transparent Ethernet Bridging support for GRO, from Jesse Gross.
9) Add BPF packet actions to packet scheduler, from Jiri Pirko.
10) Add support for uniqu flow IDs to openvswitch, from Joe Stringer.
11) New NetCP ethernet driver, from Muralidharan Karicheri and Wingman
Kwok.
12) More sanely handle out-of-window dupacks, which can result in
serious ACK storms. From Neal Cardwell.
13) Various rhashtable bug fixes and enhancements, from Herbert Xu,
Patrick McHardy, and Thomas Graf.
14) Support xmit_more in be2net, from Sathya Perla.
15) Group Policy extensions for vxlan, from Thomas Graf.
16) Remove Checksum Offload support for vxlan, from Tom Herbert.
17) Like ipv4, support lockless transmit over ipv6 UDP sockets. From
Vlad Yasevich.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1494+1 commits)
crypto: fix af_alg_make_sg() conversion to iov_iter
ipv4: Namespecify TCP PMTU mechanism
i40e: Fix for stats init function call in Rx setup
tcp: don't include Fast Open option in SYN-ACK on pure SYN-data
openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set
ipv6: Make __ipv6_select_ident static
ipv6: Fix fragment id assignment on LE arches.
bridge: Fix inability to add non-vlan fdb entry
net: Mellanox: Delete unnecessary checks before the function call "vunmap"
cxgb4: Add support in cxgb4 to get expansion rom version via ethtool
ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version
net: dsa: Remove redundant phy_attach()
IB/mlx4: Reset flow support for IB kernel ULPs
IB/mlx4: Always use the correct port for mirrored multicast attachments
net/bonding: Fix potential bad memory access during bonding events
tipc: remove tipc_snprintf
tipc: nl compat add noop and remove legacy nl framework
tipc: convert legacy nl stats show to nl compat
tipc: convert legacy nl net id get to nl compat
tipc: convert legacy nl net id set to nl compat
...
Merge misc updates from Andrew Morton:
"Bite-sized chunks this time, to avoid the MTA ratelimiting woes.
- fs/notify updates
- ocfs2
- some of MM"
That laconic "some MM" is mainly the removal of remap_file_pages(),
which is a big simplification of the VM, and which gets rid of a *lot*
of random cruft and special cases because we no longer support the
non-linear mappings that it used.
From a user interface perspective, nothing has changed, because the
remap_file_pages() syscall still exists, it's just done by emulating the
old behavior by creating a lot of individual small mappings instead of
one non-linear one.
The emulation is slower than the old "native" non-linear mappings, but
nobody really uses or cares about remap_file_pages(), and simplifying
the VM is a big advantage.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (78 commits)
memcg: zap memcg_slab_caches and memcg_slab_mutex
memcg: zap memcg_name argument of memcg_create_kmem_cache
memcg: zap __memcg_{charge,uncharge}_slab
mm/page_alloc.c: place zone_id check before VM_BUG_ON_PAGE check
mm: hugetlb: fix type of hugetlb_treat_as_movable variable
mm, hugetlb: remove unnecessary lower bound on sysctl handlers"?
mm: memory: merge shared-writable dirtying branches in do_wp_page()
mm: memory: remove ->vm_file check on shared writable vmas
xtensa: drop _PAGE_FILE and pte_file()-related helpers
x86: drop _PAGE_FILE and pte_file()-related helpers
unicore32: drop pte_file()-related helpers
um: drop _PAGE_FILE and pte_file()-related helpers
tile: drop pte_file()-related helpers
sparc: drop pte_file()-related helpers
sh: drop _PAGE_FILE and pte_file()-related helpers
score: drop _PAGE_FILE and pte_file()-related helpers
s390: drop pte_file()-related helpers
parisc: drop _PAGE_FILE and pte_file()-related helpers
openrisc: drop _PAGE_FILE and pte_file()-related helpers
nios2: drop _PAGE_FILE and pte_file()-related helpers
...
relating to ACLs, and another which improves (but does not fix entirely) an
allocation fall-back code path. The other three patches are small clean
ups.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)
iQIcBAABAgAGBQJU2dvJAAoJEMrg3m4a/8jSQJYQAKJXz9l0MDHPpm4qcjEO1p8T
JLSEa8X24taugcbSavC15g0v3HOCveleMymNo2wrfTFqkcpy/YTlqj0LjZywXOZc
KJMZWL4ZYZXmdf2DDhZ6WFb6Ouw+0l2MbtIWb/L2NQiv1K6nSdOVsgXiRvyehvxD
YEQBbdQxsa8uFr1QbBVNvaM6X6iHOLLvkLHHpw84N1dMcijt3BjQqqXQphkD6aM8
uSEYSY4aMl2nZv/fEItjqLqLZvkhQX/dbN5aUnixHd65bSYkdsWMjn4DInJujbNd
aT/iLNFhCKN3FqAcl+FAfnOhanJsj0jDSS4puyufuLOv2vgK9AmF8TiAW4TEZ0ny
IAqdurn7+1dAINxKTuo/ktBdkdwWI35cwP3PUrvKJifgKxX/EvmGDC3U2EoeItP1
/TbwE2PVdQspCiXUGIAx40BhmEEszrQHIdM+p+cUxxmmt74Hm9582pby1aMOPnHQ
FZoPhtFRKD8JttxRu+MyEwC4K0/JRK7R496fFPoY225+DSqeNibXrhU3USznZXYc
0h2kRoXTMO74ptdZr+I91RKUWZo17Ujm96mRZgnCSGgEX3IwD/6dxgd0cjU7075r
mSVEy6fngwM6cGF0qQsnqhf2bktCPg+8zhSilbxwuFfWIa66NUjoupqB3wzpdlyC
sd+3tpZijhStqjPYTKmM
=CiRw
-----END PGP SIGNATURE-----
Merge tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw
Pull gfs2 updates from Steven Whitehouse:
"This time we have mostly clean ups. There is a bug fix for a NULL
dereference relating to ACLs, and another which improves (but does not
fix entirely) an allocation fall-back code path. The other three
patches are small clean ups"
* tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
GFS2: Fix crash during ACL deletion in acl max entry check in gfs2_set_acl()
GFS2: use __vmalloc GFP_NOFS for fs-related allocations.
GFS2: Eliminate a nonsense goto
GFS2: fix sprintf format specifier
GFS2: Eliminate __gfs2_glock_remove_from_lru
This update contains:
o RENAME_EXCHANGE support
o Rework of the superblock logging infrastructure
o Rework of the XFS_IOCTL_SETXATTR implementation
- enables use inside user namespaces
- fixes inconsistencies setting extent size hints
o fixes for missing buffer type annotations used in log recovery
o more consolidation of libxfs headers
o preparation patches for block based PNFS support
o miscellaneous bug fixes and cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJU2UQ6AAoJEK3oKUf0dfodNAwQAKD2T154VZCXZtjC6LobdEL4
tuk72eDEyduNQZ1yL8MPNf090lsFizoLmXocawQEKPHZnSnZu1E6VQLughI5wSge
N658/5mNvuRQkvUTw13onPhNdOzSoqigiy+6u0qiZbhab72ZExfuloqAAgZ908Ak
DWZRQNXDspG8olGD1EKyEOKAoWDgRljiBEUw7+wVAN2thdQRXs6pyyTcaIXst3j6
bk9EAeWZOK78OJUjtWlvIFv7RqjjUQL/8Z4Arr3lMuv8Y1fxS3oL8V9GjKae0MUn
ZFPJzRHE9kfcO4AYAwYDUDW0Ia4ccq3sYu8FDzX6FTf3AqziQXKW/ovw83So5oEE
vpT/ltsugWfu2feShvfa9FTYxqmecXUqoITZ4Sczm/8F9tDz0Q4HZ/GxleAmJyH9
89IbTlSgm6qy1nmSbg21x0CVuOqJqHFAG83nnMKv8X4gmaMg9SzdNlU1iL2ugBQ7
l2XKS2DKDbjgBnSdW+bOdBoFiMyCUjrwMAp23vYELuTIFSdHkTFsBQBvqtpjkJ0X
KEutMaZdy8uo5DdPRN9k78OANbzOc4qYKJHTjiO5sQsMybsPNkciLBX49CgYFmw4
xoVQ+EYrebpBKi4WHyLKWFOWb5CP/AUtLgxsX0clQVBuuaEBhA306qic1ZZixtIm
ea2Q8ODEgpzs8QWZ7j6+
=eZEU
-----END PGP SIGNATURE-----
Merge tag 'xfs-for-linus-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
Pull xfs update from Dave Chinner:
"This update contains:
- RENAME_EXCHANGE support
- Rework of the superblock logging infrastructure
- Rework of the XFS_IOCTL_SETXATTR implementation
* enables use inside user namespaces
* fixes inconsistencies setting extent size hints
- fixes for missing buffer type annotations used in log recovery
- more consolidation of libxfs headers
- preparation patches for block based PNFS support
- miscellaneous bug fixes and cleanups"
* tag 'xfs-for-linus-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (37 commits)
xfs: only trace buffer items if they exist
xfs: report proper f_files in statfs if we overshoot imaxpct
xfs: fix panic_mask documentation
xfs: xfs_ioctl_setattr_check_projid can be static
xfs: growfs should use synchronous transactions
xfs: fix behaviour of XFS_IOC_FSSETXATTR on directories
xfs: factor projid hint checking out of xfs_ioctl_setattr
xfs: factor extsize hint checking out of xfs_ioctl_setattr
xfs: XFS_IOCTL_SETXATTR can run in user namespaces
xfs: kill xfs_ioctl_setattr behaviour mask
xfs: disaggregate xfs_ioctl_setattr
xfs: factor out xfs_ioctl_setattr transaciton preamble
xfs: separate xflags from xfs_ioctl_setattr
xfs: FSX_NONBLOCK is not used
xfs: don't allocate an ioend for direct I/O completions
xfs: change kmem_free to use generic kvfree()
xfs: factor out a xfs_update_prealloc_flags() helper
xfs: remove incorrect error negation in attr_multi ioctl
xfs: set superblock buffer type correctly
xfs: set buf types when converting extent formats
...
Pull quota interface unification and misc cleanups from Jan Kara:
"The first part of the series unifying XFS and VFS quota interfaces.
This part unifies turning quotas on and off so quota-tools and
xfs_quota can be used to manage any filesystem. This is useful so
that userspace doesn't have to distinguish which filesystem it is
working with. As a result we can then easily reuse tests for project
quotas in XFS for ext4.
This also contains minor cleanups and fixes for udf, isofs, and ext3"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (23 commits)
udf: remove bool assignment to 0/1
udf: use bool for done
quota: Store maximum space limit in bytes
quota: Remove quota_on_meta callback
ocfs2: Use generic helpers for quotaon and quotaoff
ext4: Use generic helpers for quotaon and quotaoff
quota: Add ->quota_{enable,disable} callbacks for VFS quotas
quota: Wire up ->quota_{enable,disable} callbacks into Q_QUOTA{ON,OFF}
quota: Split ->set_xstate callback into two
xfs: Remove some pointless quota checks
xfs: Remove some useless flags tests
xfs: Remove useless test
quota: Verify flags passed to Q_SETINFO
quota: Cleanup flags definitions
ocfs2: Move OLQF_CLEAN flag out of generic quota flags
quota: Don't store flags for v2 quota format
jbd: drop jbd_ENOSYS debug
udf: destroy sbi mutex in put_super
udf: Check length of extended attributes and allocation descriptors
udf: Remove repeated loads blocksize
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU1MYmAAoJEAAOaEEZVoIV/rAQAKoHj/PCOATTy05lF/NDhJlS
6NbNjupnC8HrbNPv6Z/cQ902eC1YRVH96gf6we4FeAm9Tjctpje6uEqvPQCUxpot
2jWgCG+g95OeEaQEjXQvR3x5ZfXvPUtwKVOnMF423L1p5Xfbj3kJfGi+dv2k8XOi
GArsUB7uCwqLyyz+L47RJ2Cz7s47M9O25HkVRfWlgYOv+4afq5OpADGKQAhMLL/s
CPhYgqw/7r1p+pLkjUE/x+5BAliDzUinFtDatgD4CeHOdq0RKlxzQ1rFg6uJVg/k
3ZttGOxWUtGIeGM4v5cosDFReLPCESax/TUzn58jxxFR702MjHAA+lHRgjZoWvW/
9EnShl0XlznQX1ns6f0rI1seWe4M5R3CWus8AcG0kDmdbTp8nARo+pBLFhCME/kZ
15GHLz4tDSRt5SNow6aqJdlYJR7p3WrsceKyM5aH9M7odM3eaB5vJxIJ0fljsZbS
Qtz4t+Ua1oVSYD7TX3y7EUiQVPVo8VKS3o6Ua73wCHIXNbSH7hZLOvPLFs6V1Psi
RKqRiad5iO3+iavVGuDDcs12zXZ5hmksE8oMh0NkjFZ6wJlO4Hf5iOt5thABNDmT
Km+40IBq1DYwclPTofaRpB+ytDOnWedMxdWfWdEWQ710zuuNY3cfi/XMXEX34kBY
fLhUMabqcyfUegpA6S0R
=6+UV
-----END PGP SIGNATURE-----
Merge tag 'locks-v3.20-1' of git://git.samba.org/jlayton/linux
Pull file locking related changes #1 from Jeff Layton:
"This patchset contains a fairly major overhaul of how file locks are
tracked within the inode. Rather than a single list, we now create a
per-inode "lock context" that contains individual lists for the file
locks, and a new dedicated spinlock for them.
There are changes in other trees that are based on top of this set so
it may be easiest to pull this in early"
* tag 'locks-v3.20-1' of git://git.samba.org/jlayton/linux:
locks: update comments that refer to inode->i_flock
locks: consolidate NULL i_flctx checks in locks_remove_file
locks: keep a count of locks on the flctx lists
locks: clean up the lm_change prototype
locks: add a dedicated spinlock to protect i_flctx lists
locks: remove i_flock field from struct inode
locks: convert lease handling to file_lock_context
locks: convert posix locks to file_lock_context
locks: move flock locks to file_lock_context
ceph: move spinlocking into ceph_encode_locks_to_buffer and ceph_count_locks
locks: add a new struct file_locking_context pointer to struct inode
locks: have locks_release_file use flock_lock_file to release generic flock locks
locks: add new struct list_head to struct file_lock
We don't create non-linear mappings anymore. Let's drop code which
handles them in rmap.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have to handle non-linear mappings for /proc/PID/{smaps,clear_refs}
which is unused now. Let's drop it.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__generic_block_fiemap may spin very long time for large sparse files.
Without this patch an unprivileged user may abuse system resources simply
by spawning a vast number of unkilable busyloops (works on ext2/ext3):
truncate --size 1T test
for ((i=0;i<1024;i++))
do
filefrag test > /dev/null &
done
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A tiny race between BAST and unlock message causes the NULL dereference.
A node sends an unlock request to master and receives a response. Before
processing the response it receives a BAST from the master. Since both
requests are processed by different threads it creates a race. While the
BAST is being processed, lock can get freed by unlock code.
This patch makes bast to return immediately if lock is found but unlock is
pending. The code should handle this race. We also have to fix master
node to skip sending BAST after receiving unlock message.
Below is the crash stack
BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
IP: o2dlm_blocking_ast_wrapper+0xd/0x16
dlm_do_local_bast+0x8e/0x97 [ocfs2_dlm]
dlm_proxy_ast_handler+0x838/0x87e [ocfs2_dlm]
o2net_process_message+0x395/0x5b8 [ocfs2_nodemanager]
o2net_rx_until_empty+0x762/0x90d [ocfs2_nodemanager]
worker_thread+0x14d/0x1ed
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In ocfs2_dentry_convert_worker, we should prune the dcache before deleting
the dentry of directory, otherwise, in the following cases the inode of
directory will still remain in orphan directory until the device being
umounted.
Mount point: /mnt/ocfs2
Node A Node B
mkdir /mnt/ocfs2/testdir
ocfs2_mkdir
->ocfs2_mknod
->ocfs2_dentry_attach_lock
->ocfs2_dentry_lock(dentry, 0)
... ...
touch /mnt/ocfs2/testdir/testfile
unlink /mnt/test/testdir/testfile
rmdir /mnt/ocfs2/testdir
ocfs2_unlink
->ocfs2_remote_dentry_delete
->ocfs2_dentry_lock(dentry, 1)
... ...
... ...
ocfs2_downconvert_thread
->ocfs2_unblock_lock
->ocfs2_dentry_convert_worker
->ocfs2_find_local_alias
->dget_dlock
->d_delete
Here the dentry can not be
released because the children's
dentry is negative but still exist.
Finally, this inode will still remain
in orphan directory until its children
are destroyed.
So before deleting dentry of directory, we should prune the dcache to
remove unused children of the parent dentry by shrink_dcache_parent().
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: joyce.xue <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
resv_lock is only used in reservations.c
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mlog_errno() is called twice when some functions are failed.
Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Smatch complains that, if o2net_tx_can_proceed() returns false, then "sc"
and "ret" are uninialized or maybe we are re-using the data from previous
iteration. I do not know if we can hit this bug in real life but checking
the return value is harmless and we may as well silence the static checker
warning.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The assigned value is never used.
Coverity-id 1226847.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a mount option to support JBD2 feature:
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT. When this feature is opened, journal
commit block can be written to disk without waiting for descriptor blocks,
which can improve journal commit performance. This option will enable
'journal_checksum' internally.
Using the fs_mark benchmark, using journal_async_commit shows a 50%
improvement, the files per second go up from 215.2 to 317.5.
test script:
fs_mark -d /mnt/ocfs2/ -s 10240 -n 1000
default:
FSUse% Count Size Files/sec App Overhead
0 1000 10240 215.2 17878
with journal_async_commit option:
FSUse% Count Size Files/sec App Overhead
0 1000 10240 317.5 17881
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Signed-off-by: Weiwei Wang <wangww631@huawei.comm>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Similar to ocfs2_write_end_nolock() which is metioned at commit
136f49b917 ("ocfs2: fix journal commit deadlock"), we should unlock
pages before ocfs2_commit_trans() in ocfs2_convert_inline_data_to_extents.
Otherwise, it will cause a deadlock with journal commit threads.
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove dlm_joined() that is not used anywhere.
This was partially found by using a static code analysis program called
cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove ol_dqblk_file_block() that is not used anywhere.
This was partially found by using a static code analysis program called
cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove ocfs2_xattr_bucket_get_val() that is not used anywhere.
This was partially found by using a static code analysis program called
cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>