569678 commits

Author SHA1 Message Date
Damien Le Moal
018fc18e28 f2fs: Cache zoned block devices zone type
commit 178053e2f1f9ccdb61ff6c2bd8644b53fc98e72e upstream.

With the zoned block device feature enabled, section discard
needs to do a zone reset for sections contained in sequential
zones, and a regular discard (if supported) for sections
stored in conventional zones. Avoid the need for a costly
report zones command to obtain a section zone type when discarding it
by caching the types of the device zones in the super block
information. This cache is initialized at mount time for mounts
with the zoned block device feature enabled.
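A minimal userspace sketch of the caching idea, assuming one type byte per zone, filled from a single zone report at mount time and indexed by the block address shifted by log2(blocks per zone). All names here (struct zone_cache, zone_cache_init, section_needs_reset) are hypothetical and are not the f2fs code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical zone types, modeled on the ZBC/ZAC distinction. */
enum zone_type {
	ZONE_CONVENTIONAL,
	ZONE_SEQ_WRITE_REQUIRED,
	ZONE_SEQ_WRITE_PREFERRED,
};

/* Hypothetical cache: one type byte per zone. */
struct zone_cache {
	unsigned int nr_zones;
	unsigned int log_blocks_per_zone;
	uint8_t *type;
};

/* Mount time: copy the result of one zone report into the cache. */
static int zone_cache_init(struct zone_cache *zc, const enum zone_type *report,
			   unsigned int nr_zones, unsigned int log_blocks_per_zone)
{
	zc->type = malloc(nr_zones ? nr_zones : 1);
	if (!zc->type)
		return -1;
	for (unsigned int i = 0; i < nr_zones; i++)
		zc->type[i] = (uint8_t)report[i];
	zc->nr_zones = nr_zones;
	zc->log_blocks_per_zone = log_blocks_per_zone;
	return 0;
}

static void zone_cache_free(struct zone_cache *zc)
{
	free(zc->type);
	zc->type = NULL;
	zc->nr_zones = 0;
}

/*
 * Discard time: true if the section at blkaddr sits in a sequential zone
 * (needs a zone reset), false for a conventional zone (regular discard).
 * A shift and an array index replace the costly zone report.
 */
static bool section_needs_reset(const struct zone_cache *zc, uint64_t blkaddr)
{
	uint64_t zone = blkaddr >> zc->log_blocks_per_zone;

	assert(zc->type != NULL);
	if (zone >= zc->nr_zones)
		return false; /* out of range: treat as conventional */
	return zc->type[zone] != ZONE_CONVENTIONAL;
}
```

The per-zone lookup costs a shift and a load, which is the point of paying for one report at mount time.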

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:11:49 -07:00
Damien Le Moal
d9d8c376e4 f2fs: Do not allow adaptive mode for host-managed zoned block devices
commit 3adc57e97792e4ac9f228bde802829e2e9840afe upstream.

The LFS mode is mandatory for host-managed zoned block devices, as
update-in-place optimizations are not possible for segments in
sequential zones.
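A minimal sketch of the mount-time check, assuming hypothetical enums for the zoned model and the allocation mode; check_alloc_mode and allows_in_place_update are illustrative names, not the f2fs option parser.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical zoned device models and allocation modes. */
enum zoned_model { ZONED_NONE, ZONED_HOST_AWARE, ZONED_HOST_MANAGED };
enum alloc_mode { MODE_ADAPTIVE, MODE_LFS };

/* In-place updates are impossible in sequential-write-required zones. */
static bool allows_in_place_update(enum zoned_model model)
{
	return model != ZONED_HOST_MANAGED;
}

/*
 * Return 0 if the requested mode is usable, -1 (think -EINVAL) if the
 * adaptive mode was requested on a host-managed device.
 */
static int check_alloc_mode(enum zoned_model model, enum alloc_mode mode)
{
	if (mode == MODE_ADAPTIVE && !allows_in_place_update(model))
		return -1;
	assert(mode == MODE_LFS || allows_in_place_update(model));
	return 0;
}
```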

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:11:44 -07:00
Damien Le Moal
4b1d4ef0b7 f2fs: Always enable discard for zoned blocks devices
commit 96ba2decb4241aa2c6b61cfc8489d648769eff99 upstream.

Zone write pointer reset acts as discard for zoned block
devices. So if the zoned block device feature is enabled,
always declare that discard is enabled, even if the device
does not actually support the command.
For the same reason, prevent the use of the "nodiscard" mount
option.
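A sketch of the two rules above, assuming a hypothetical struct mount_opts and a toy option parser (apply_option, finalize_opts); the real f2fs parse_options() is structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical mount state, not the f2fs superblock info. */
struct mount_opts {
	bool blkzoned; /* zoned block device feature enabled */
	bool discard;  /* discard declared as enabled */
};

/*
 * Parse one option. Returns 0 on success, -1 if the option is refused.
 * A zone write pointer reset acts as discard, so "nodiscard" is
 * rejected on a zoned device.
 */
static int apply_option(struct mount_opts *o, const char *opt)
{
	if (strcmp(opt, "discard") == 0) {
		o->discard = true;
	} else if (strcmp(opt, "nodiscard") == 0) {
		if (o->blkzoned)
			return -1;
		o->discard = false;
	} else {
		return -1; /* unknown option in this sketch */
	}
	return 0;
}

/* After parsing: zoned devices always declare discard enabled. */
static void finalize_opts(struct mount_opts *o)
{
	if (o->blkzoned)
		o->discard = true;
	assert(!o->blkzoned || o->discard);
}
```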

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:11:41 -07:00
Damien Le Moal
ecc252e7a4 f2fs: Suppress discard warning message for zoned block devices
commit 0ab0299835738cd407569401da1fef4c97b4419c upstream.

For zoned block devices, discard is replaced by zone reset. So
do not warn if the device does not support discard.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:11:39 -07:00
Damien Le Moal
1529b8f943 f2fs: Check zoned block feature for host-managed zoned block devices
commit d1b959c8770260b611b9a1f0c5e8b12b7cb5b9d2 upstream.

The F2FS_FEATURE_BLKZONED feature indicates that the drive was formatted
with zone alignment optimization. This is optional for host-aware
devices, but mandatory for host-managed zoned block devices.
So check that the feature is set in this latter case.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:11:36 -07:00
Damien Le Moal
22bbc1efdb f2fs: Use generic zoned block device terminology
commit 0bfd7a091c19132489a0f977b8dbf9f6b5ae0a1c upstream.

SMR stands for "Shingled Magnetic Recording" which makes sense
only for hard disk drives (spinning rust). The ZBC/ZAC standards
enable management of SMR disks, but solid state drives may also
support those standards. So rename the HMSMR feature to BLKZONED
to avoid HDD-centric terminology. For the same reason, rename
f2fs_sb_mounted_hmsmr to f2fs_sb_mounted_blkzoned.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:11:34 -07:00
Damien Le Moal
97df49a0c3 f2fs: Add missing break in switch-case
commit 487df616dec33231c99294b906d720d256a2de16 upstream.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:11:28 -07:00
Jaegeuk Kim
a91b9fe273 f2fs: avoid infinite loop in the EIO case on recover_orphan_inodes
commit 099228000eff6b25e0f76b276043cd65cd4eba5a upstream.

This patch should fix the infinite loop case shown below.

F2FS-fs : inject IO error in f2fs_read_end_io+0xf3/0x120 [f2fs]
F2FS-fs (nvme0n1p1): recover_orphan_inode: orphan failed (ino=39ac1a), run fsck to fix.
...
[<ffffffffc0b11ede>] sync_meta_pages+0xae/0x270 [f2fs]
[<ffffffffc0b288dd>] ? flush_sit_entries+0x8d/0x960 [f2fs]
[<ffffffffc0b13801>] write_checkpoint+0x361/0xf20 [f2fs]
[<ffffffffb40e979d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc0b0a199>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
[<ffffffffc0b0a1a5>] f2fs_sync_fs+0x85/0x190 [f2fs]
[<ffffffffc0b2560e>] f2fs_balance_fs_bg+0x7e/0x1c0 [f2fs]
[<ffffffffc0b216c4>] f2fs_write_node_pages+0x34/0x320 [f2fs]
[<ffffffffb41dff21>] do_writepages+0x21/0x30
[<ffffffffb429edb1>] __writeback_single_inode+0x61/0x760
[<ffffffffb490a937>] ? _raw_spin_unlock+0x27/0x40
[<ffffffffb42a0805>] writeback_single_inode+0xd5/0x190
[<ffffffffb42a0959>] write_inode_now+0x99/0xc0
[<ffffffffb4289a16>] iput+0x1f6/0x2c0
[<ffffffffc0b0e3be>] f2fs_fill_super+0xe0e/0x1300 [f2fs]
[<ffffffffb426c394>] ? sget_userns+0x4f4/0x530
[<ffffffffb426c692>] mount_bdev+0x182/0x1b0
[<ffffffffc0b0d5b0>] ? f2fs_commit_super+0x100/0x100 [f2fs]
[<ffffffffc0b0a375>] f2fs_mount+0x15/0x20 [f2fs]
[<ffffffffb426d038>] mount_fs+0x38/0x170
[<ffffffffb428ec9b>] vfs_kern_mount+0x6b/0x160
[<ffffffffb4291d9e>] do_mount+0x1be/0xd60
[<ffffffffb4291a57>] ? copy_mount_options+0xb7/0x220
[<ffffffffb4292c54>] SyS_mount+0x94/0xd0
[<ffffffffb490b345>] entry_SYSCALL_64_fastpath+0x23/0xc6

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:11:27 -07:00
Chao Yu
7d2eab1921 f2fs: report error of f2fs_fill_dentries
commit ed6bd4b146527e7c6934e3582c47d7b857802676 upstream.

Report the error of f2fs_fill_dentries to ->iterate_shared; otherwise, when
an error occurs, the user may list only part of the dirents in the target
directory without any hint.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:11:27 -07:00
Jaegeuk Kim
d4ec990d25 fs/crypto: catch up 4.9-rc6
commit d117b9acaeada0b243f31e0fe83e111fcc9a6644 upstream.

Merge tag 'ext4_for_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
"A security fix (so a maliciously corrupted file system image won't
panic the kernel) and some fixes for CONFIG_VMAP_STACK"

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:11:02 -07:00
Arnd Bergmann
b3441f8c71 f2fs: hide a maybe-uninitialized warning
commit 230436b3ef3fd7d4a1da19edf5e87bb2d74e0fc2 upstream.

gcc is unsure about the use of last_ofs_in_node, which might happen
without a prior initialization:

fs/f2fs//git/arm-soc/fs/f2fs/data.c: In function ‘f2fs_map_blocks’:
fs/f2fs/data.c:799:54: warning: ‘last_ofs_in_node’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   if (prealloc && dn.ofs_in_node != last_ofs_in_node + 1) {

As pointed out by Chao Yu, the code is actually correct, as 'prealloc'
is only set if last_ofs_in_node has been set; the two always
get updated together.

This initializes last_ofs_in_node to dn.ofs_in_node for each
new dnode at the start of the 'next_block' loop, which at that
point is a correct initialization as well. I assume that compilers
that correctly track the contents of the variables and do not
warn about the condition also figure out that they can eliminate
the extra assignment here.
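A simplified, single-dnode sketch of the bookkeeping described above, assuming a hypothetical count_prealloc_runs() that counts contiguous runs of preallocated offsets. The explicit initialization (here from the first offset, standing in for dn.ofs_in_node) does not change the result, because prealloc and last_ofs_in_node are always updated together; this is not the real f2fs_map_blocks().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count contiguous runs in a list of preallocated offsets. A pending
 * batch is flushed when the next offset does not follow last_ofs_in_node.
 */
static int count_prealloc_runs(const unsigned int *ofs, size_t n)
{
	unsigned int prealloc = 0;
	/* Explicit initialization, as in the fix; keeps gcc quiet. */
	unsigned int last_ofs_in_node = n ? ofs[0] : 0;
	int runs = 0;

	assert(ofs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* A gap ends the pending batch. */
		if (prealloc && ofs[i] != last_ofs_in_node + 1) {
			runs++;
			prealloc = 0;
		}
		/* prealloc and last_ofs_in_node always change together. */
		prealloc++;
		last_ofs_in_node = ofs[i];
	}
	if (prealloc)
		runs++;
	return runs;
}
```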

Fixes: 46008c6d4232 ("f2fs: support in batch multi blocks preallocation")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:08:29 -07:00
Jaegeuk Kim
3f137dda70 f2fs: remove percpu_count due to performance regression
commit 35782b233f37e48ecc469d9c7232f3f6a7fad41a upstream.

This patch removes percpu_count usage due to a performance regression in iozone.

Fixes: 523be8a6b3 ("f2fs: use percpu_counter for page counters")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:08:25 -07:00
Jaegeuk Kim
a15f017e8a f2fs: make clean inodes when flushing inode page
commit 18340edc8da20b0d399eb25ba4bb631b27652f46 upstream.

This patch tries to make more inodes clean when flushing dirty inodes in
checkpoint.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:08:22 -07:00
Jaegeuk Kim
0ef31c7bfa f2fs: keep dirty inodes selectively for checkpoint
commit 7c45729a4d6d1c90879e6c5c2df325c2f6db7191 upstream.

This avoids the "no free segment" bug during checkpoint caused by a large
number of dirty inodes.

Chao reported the case as follows:
1. mount with lazytime option
2. fill 4k file until disk is full
3. sync filesystem
4. read all files in the image
5. umount

In this case, we actually don't need to flush dirty inodes to inode pages during
checkpoint.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:08:18 -07:00
Jaegeuk Kim
dafac77e8d f2fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
commit 02027d42c3f747945f19111d3da2092ed2148ac8 upstream.

This is for backport only.

fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:08:05 -07:00
Jaegeuk Kim
04030d21a7 f2fs: use BIO_MAX_PAGES for bio allocation
commit 664ba972df9b96942191db3068274cc1db899774 upstream.

To maximize sequential writes, don't allocate bios partially; allocate them with BIO_MAX_PAGES.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:06:10 -07:00
Jaegeuk Kim
c01ce254c7 f2fs: declare static function for __build_free_nids
commit 3e7b5bbbef7f5eb8a19aa61b611c704bf8230937 upstream.

This patch avoids a build warning.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:06:05 -07:00
Jaegeuk Kim
92c9dec342 f2fs: call f2fs_balance_fs for setattr
commit 15d04354555fdfe8005e1365009e349148fb5f90 upstream.

If an inode becomes dirty, we need to check the number of dirty inodes to
decide whether a further checkpoint is required.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:05:49 -07:00
Jaegeuk Kim
beb74f7757 f2fs: count dirty inodes to flush node pages during checkpoint
commit b9610bdfcbdbb6017802ec6d1e073f445c98157d upstream.

If there are a lot of dirty inodes, we need to flush all of them when doing
checkpoint, so we need to count them to make sure there is enough free space.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:05:40 -07:00
Chao Yu
9f99694bb7 f2fs: avoid casted negative value as shrink count
commit 02110a4fd53164db7cce3bb2780dce4d6c4e058f upstream.

This patch makes sure a non-negative value is returned as the shrink count,
instead of a negative value that may have been cast to a huge unsigned one.
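A sketch of the pitfall, assuming hypothetical int counters for cached entries and a reserve that must be kept; shrink_count and shrink_count_buggy are illustrative names, not the f2fs shrinker.

```c
#include <assert.h>
#include <limits.h>

/*
 * Buggy form: a shortfall computed in signed arithmetic and returned as
 * unsigned long becomes a huge count (-1 converts to ULONG_MAX).
 */
static unsigned long shrink_count_buggy(int cached, int reserve)
{
	return (unsigned long)(cached - reserve);
}

/* Clamp instead: never report a negative amount as reclaimable. */
static unsigned long shrink_count(int cached, int reserve)
{
	assert(cached >= 0 && reserve >= 0);
	return cached > reserve ? (unsigned long)(cached - reserve) : 0;
}
```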

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:05:35 -07:00
Chao Yu
e07f457ef7 f2fs: don't interrupt free nids building during nid allocation
commit 3a2ad5672bb36ee9c07bab97dadc8b0f70d391f4 upstream.

Let build_free_nids support sync and async methods. In the nid allocation
flow, we use the synchronous method, so that we avoid looping in alloc_nid
when free memory is low; in unblock_operations and f2fs_balance_fs_bg, we
use the asynchronous method, where a low memory condition can interrupt us.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:05:30 -07:00
Jaegeuk Kim
6a248819a2 f2fs: clean up free nid list operations
commit eb0aa4b80784b8551bd5be577024e067bc83ef94 upstream.

This patch cleans up the code to use consistent free nid list operations.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:05:24 -07:00
Chao Yu
e18c262450 f2fs: split free nid list
commit b8559dc242d1d47dcf99660a4d6afded727e0cc0 upstream.

During free nid allocation, in order to do preallocation, we tag a free
nid entry as allocated but still leave it in the free nid list, so other
allocators who want to grab free nids need to traverse the free nid
list to look one up. This becomes an overhead when many threads
allocate free nids intensively.

This patch splits the free nid list into two lists, {free,alloc}_nid_list,
to keep free nids and preallocated free nids separately, which removes the
traversal latency. It also splits nid_cnt for separate statistics.

Additionally, introduce __insert_nid_to_list and __remove_nid_from_list for
cleanup.
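A userspace sketch of the split, assuming hypothetical structures: two circular doubly linked lists with sentinels (in the spirit of list_head) and per-list counters. Allocation takes the head of the free list in O(1), since preallocated entries no longer sit in that list. The names mirror the description but are not the f2fs definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical free nid entry: a nid on a doubly linked list. */
struct nid_entry {
	unsigned int nid;
	struct nid_entry *prev, *next;
};

enum { FREE_NID_LIST, ALLOC_NID_LIST, MAX_NID_LIST };

struct nid_mgr {
	struct nid_entry head[MAX_NID_LIST]; /* list sentinels */
	unsigned int nid_cnt[MAX_NID_LIST];  /* separate statistics */
};

static void nid_mgr_init(struct nid_mgr *nm)
{
	for (int t = 0; t < MAX_NID_LIST; t++) {
		nm->head[t].prev = nm->head[t].next = &nm->head[t];
		nm->nid_cnt[t] = 0;
	}
}

static void insert_nid_to_list(struct nid_mgr *nm, struct nid_entry *e, int type)
{
	struct nid_entry *h = &nm->head[type];

	e->prev = h->prev; /* add at tail */
	e->next = h;
	h->prev->next = e;
	h->prev = e;
	nm->nid_cnt[type]++;
}

static void remove_nid_from_list(struct nid_mgr *nm, struct nid_entry *e, int type)
{
	assert(nm->nid_cnt[type] > 0);
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->prev = e->next = NULL;
	nm->nid_cnt[type]--;
}

/* O(1): the first free entry is always usable. */
static struct nid_entry *alloc_nid(struct nid_mgr *nm)
{
	struct nid_entry *h = &nm->head[FREE_NID_LIST];
	struct nid_entry *e = h->next;

	if (e == h)
		return NULL; /* free list empty */
	remove_nid_from_list(nm, e, FREE_NID_LIST);
	insert_nid_to_list(nm, e, ALLOC_NID_LIST);
	return e;
}

/* Allocation failed: move the nid back to the free list. */
static void alloc_nid_failed(struct nid_mgr *nm, struct nid_entry *e)
{
	remove_nid_from_list(nm, e, ALLOC_NID_LIST);
	insert_nid_to_list(nm, e, FREE_NID_LIST);
}

/* Allocation committed: drop the entry from the alloc list. */
static void alloc_nid_done(struct nid_mgr *nm, struct nid_entry *e)
{
	remove_nid_from_list(nm, e, ALLOC_NID_LIST);
}
```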

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: modify f2fs_bug_on to avoid needless branches]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:05:19 -07:00
Chao Yu
8db338877d f2fs: clear nlink if fail to add_link
commit a11b9f65eae766b17ec3451a6a1766f0a9d1dbff upstream.

We don't need to keep an incompletely created inode in cache, so if we fail
to add a link into the directory during new inode creation, it's better to
set the inode's nlink to zero so that we can evict the inode immediately.
Otherwise, the release of the nid belonging to the inode will be delayed
until the inode cache is shrunk, which may cause a seemingly endless loop
while allocating free nids when testing generic/269 of the fstests suite.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: add update_inode_page to fix kernel panic]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:05:11 -07:00
Eric Biggers
fbeee49e06 f2fs: fix sparse warnings
commit 0c0b471e43e7acf0747c6eb410863bf78c14750d upstream.

f2fs contained a number of endianness conversion bugs.

Also, one function should have been 'static'.

Found with sparse by running 'make C=2 CF=-D__CHECK_ENDIAN__ fs/f2fs/'

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:04:55 -07:00
Chao Yu
c675400f4a f2fs: fix error handling in fsync_node_pages
commit 9de69279750e9740bc7221c7051a40c0516a58fb upstream.

In fsync_node_pages, if f2fs is tagged with CP_ERROR_FLAG, make sure the bio
cache is flushed before returning.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:04:55 -07:00
Chao Yu
1158df42b2 f2fs: fix to update largest extent under lock
commit b691d98fdd4cc2514c60fd6975e6016da203e64f upstream.

To avoid a race, make sure the largest extent cache is updated
under the lock.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:04:54 -07:00
Chao Yu
9e3d0bf6d3 f2fs: be aware of extent beyond EOF in fiemap
commit 58736fa60f6ae659ac72da8b1580c308b47e8edd upstream.

f2fs can support fallocating blocks beyond the file size without changing
the size, but f2fs's ->fiemap was restricted and could not detect these
extents fallocated past EOF; now relieve the restriction.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:04:54 -07:00
Chao Yu
332f40b43f f2fs: don't miss any f2fs_balance_fs cases
commit 6f2d8ed654bfa391854df4de854953f772a16a9d upstream.

In f2fs_map_blocks, let f2fs_balance_fs detect node page modification
with dn.node_changed to avoid missing some corner cases.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:04:53 -07:00
Chao Yu
4c7eae1fef f2fs: add missing f2fs_balance_fs in f2fs_zero_range
commit 9434fcde1fa0f48e1a29fbdd9d436fa279aeb909 upstream.

f2fs_balance_fs should be called between node page updates; otherwise
the node page count can exceed the watermark for triggering foreground
garbage collection by far, resulting in a high risk of hitting LFS
allocation failure.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:04:53 -07:00
Chao Yu
75bb19d8b7 f2fs: give a chance to detach from dirty list
commit 933439c8f3474e329709b715b43b0b8168bbecf8 upstream.

If there are no dirty pages in an inode, we should give it a chance to
detach from the global dirty list; otherwise, another unnecessary
.writepages call is needed for detaching.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:04:52 -07:00
Chao Yu
ab38818bdd f2fs: fix to release discard entries during checkpoint
commit 2dd15654ac0abe587a245a09a7823bbbd588bfb7 upstream.

In f2fs_fill_super, if any IO error occurs during recovery, cached
discard entries will be leaked. To avoid this, make write_checkpoint()
handle the memory release by itself; besides, move
clear_prefree_segments to write_checkpoint for readability.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:04:47 -07:00
Chao Yu
311aa690ef f2fs: exclude free nids building and allocation
commit 2411cf5befa5804e4ced4c45a3212d7653869286 upstream.

During nid allocation, the building and allocating flows of free nids
need to be mutually exclusive. This is because building the free nid
cache takes two steps: a) load free nids from unused nat entries in NAT
pages, b) update the free nid cache by checking the nat journal. The two
steps should be atomic; otherwise a used nid can be allocated as a free
one after a) and before b).

This patch adds the missing lock, which covers build_free_nids in
unblock_operations and f2fs_balance_fs_bg, to avoid that.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:02:02 -07:00
Jaegeuk Kim
6b266c3a99 f2fs: fix to determine start_cp_addr by sbi->cur_cp_pack
commit 8508e44ae98622f841f5ef29d0bf3d5db4e0c1cc upstream.

We don't guarantee cp_addr is fixed by cp_version.
This is to sync with f2fs-tools.

Cc: stable@vger.kernel.org
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:01:57 -07:00
Jaegeuk Kim
ab6f3626a8 f2fs: fix overflow due to condition check order
commit e87f7329bbd6760c2acc4f1eb423362b08851a71 upstream.

In the last ilen case, i had already been incremented, resulting in an
out-of-bounds access to the do_replace and blkaddr entries.
Fix this by checking ilen first to exit the loop.
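A minimal sketch of why the order matters, assuming a hypothetical count_leading_replace() over a do_replace array of length ilen. With && evaluated left to right, putting the bound check first guarantees do_replace[i] is never read once i reaches ilen; this is not the actual f2fs loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Count how many leading entries are marked for replacement. */
static unsigned int count_leading_replace(const bool *do_replace, unsigned int ilen)
{
	unsigned int i = 0;

	/*
	 * Buggy order: while (do_replace[i] && i < ilen) reads
	 * do_replace[ilen] when every entry is set.
	 */
	while (i < ilen && do_replace[i])
		i++;
	assert(i <= ilen);
	return i;
}
```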

Fixes: 2aa8fbb9693020 ("f2fs: refactor __exchange_data_block for speed up")
Cc: stable@vger.kernel.org # 4.8+
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:01:52 -07:00
Jaegeuk Kim
91d38ba841 posix_acl: Clear SGID bit when setting file permissions
commit 073931017b49d9458aa351605b43a7e34598caef upstream.

Cherry-pick to f2fs only for generic/375 from:

(073931017: posix_acl: Clear SGID bit when setting file permissions)

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:01:36 -07:00
Jaegeuk Kim
ae81ccb3bd f2fs: fix wrong sum_page pointer in f2fs_gc
commit de0dcc40f6e24d6bac6b60e36eac4659bbbd3f00 upstream.

This patch fixes using a wrong pointer for sum_page in f2fs_gc.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:01:24 -07:00
Jaegeuk Kim
c1286ff41c f2fs: backport from (4c1fad64 - Merge tag 'for-f2fs-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs)
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 14:27:55 -07:00
Vikram Mulukutla
650b6a5c41 Revert "ANDROID: sched/tune: Initialize raw_spin_lock in boosted_groups"
This reverts commit c5616f2f874faa20b59b116177b99bf3948586df.

If we re-init the per-cpu boostgroup spinlock every time that
we add a new boosted cgroup, we can easily wipe out (reinit)
a spinlock struct while in a critical section. We should only
be setting up the per-cpu boostgroup data, and the spin_lock
initialization need only happen once - which we're already
doing in a postcore_initcall.

For example:

     -------- CPU 0 --------   | -------- CPU1 --------
cgroupX boost group added      |
schedtune_enqueue_task         |
  acquires(bg->lock)           | cgroupY boost group added
                               |  for_each_cpu()
                               |    raw_spin_lock_init(bg->lock)
  releases(bg->lock)           |
      BUG (already unlocked)   |
                               |

This results in the following BUG from the debug spinlock code:
	BUG: spinlock already unlocked on CPU#5, rcuop/6/68
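A deterministic, single-threaded replay of the interleaving in the diagram, assuming a hypothetical debug lock that only tracks whether it is held (struct dbg_lock, add_group_buggy and add_group_fixed are illustrative, not schedtune code). For brevity, the data update on CPU1 is shown without taking the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical debug lock: remembers whether it is held. */
struct dbg_lock {
	bool locked;
};

struct boost_group {
	struct dbg_lock lock;
	int nr_groups;
};

static void dbg_lock_init(struct dbg_lock *l) { l->locked = false; }

static void dbg_lock_acquire(struct dbg_lock *l)
{
	assert(!l->locked); /* single-threaded replay: never contended */
	l->locked = true;
}

/* Returns -1 where the kernel would print "spinlock already unlocked". */
static int dbg_lock_release(struct dbg_lock *l)
{
	if (!l->locked)
		return -1;
	l->locked = false;
	return 0;
}

/* Buggy: adding a boost group also re-initializes the lock. */
static void add_group_buggy(struct boost_group *bg)
{
	dbg_lock_init(&bg->lock);
	bg->nr_groups++;
}

/* Fixed: the lock was initialized once elsewhere; only data changes. */
static void add_group_fixed(struct boost_group *bg)
{
	bg->nr_groups++;
}

/* CPU0 holds the lock, CPU1 adds a group, CPU0 releases. */
static int replay(void (*add_group)(struct boost_group *))
{
	struct boost_group bg;

	dbg_lock_init(&bg.lock);           /* one-time init (initcall) */
	bg.nr_groups = 0;
	dbg_lock_acquire(&bg.lock);        /* CPU0: acquires(bg->lock) */
	add_group(&bg);                    /* CPU1: boost group added */
	return dbg_lock_release(&bg.lock); /* CPU0: releases(bg->lock) */
}
```

replay(add_group_buggy) reproduces the "already unlocked" condition, while replay(add_group_fixed) releases cleanly.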

Change-Id: I3016702780b461a0cd95e26c538cd18df27d6316
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2017-09-23 01:25:03 +00:00
Michal Hocko
047200481e BACKPORT: partial: mm, oom_reaper: do not mmput synchronously from the oom reaper context
(cherry picked from commit ec8d7c14ea14922fe21945b458a75e39f11dd832)

Tetsuo has properly noted that mmput slow path might get blocked waiting
for another party (e.g.  exit_aio waits for an IO).  If that happens the
oom_reaper would be put out of the way and will not be able to process
next oom victim.  We should strive to make this context as reliable
and independent of other subsystems as much as possible.

Introduce mmput_async which will perform the slow path from an async
(WQ) context.  This will delay the operation but that shouldn't be a
problem because the oom_reaper has reclaimed the victim's address space
for most cases as much as possible and the remaining context shouldn't
bind too much memory anymore.  The only exception is when mmap_sem
trylock has failed which shouldn't happen too often.

The issue is only theoretical but not impossible.
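A single-threaded sketch of the idea, assuming a hypothetical refcounted struct mm and a plain list standing in for the workqueue; mm_put, mm_put_async and run_deferred_work are illustrative names, not the kernel's mmput()/mmput_async().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical address space object with a user count. */
struct mm {
	int users;
	bool torn_down;
	struct mm *next_deferred;
};

/* Hypothetical list of deferred teardowns, drained by a worker. */
static struct mm *deferred_head;

/* Slow path: in the kernel this may block, e.g. waiting for IO. */
static void mm_teardown(struct mm *mm)
{
	mm->torn_down = true;
}

/* Synchronous put: the caller runs the slow path itself. */
static void mm_put(struct mm *mm)
{
	assert(mm->users > 0);
	if (--mm->users == 0)
		mm_teardown(mm);
}

/* Async put: the last reference hands the slow path to a worker. */
static void mm_put_async(struct mm *mm)
{
	assert(mm->users > 0);
	if (--mm->users == 0) {
		mm->next_deferred = deferred_head;
		deferred_head = mm;
	}
}

/* Worker context: drain the deferred teardowns. */
static void run_deferred_work(void)
{
	while (deferred_head) {
		struct mm *mm = deferred_head;

		deferred_head = mm->next_deferred;
		mm_teardown(mm);
	}
}
```

The caller of mm_put_async (think oom_reaper) returns immediately; the possibly blocking teardown runs later in the worker.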

Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Only backports mmput_async.

Change-Id: I5fe54abcc629e7d9eab9fe03908903d1174177f1
Signed-off-by: Arve Hjønnevåg <arve@android.com>
2017-09-21 17:45:15 +00:00
Sherry Yang
9b9d7cf191 FROMLIST: android: binder: Don't get mm from task
(from https://patchwork.kernel.org/patch/9954125/)

Use binder_alloc struct's mm_struct rather than getting
a reference to the mm struct through get_task_mm to
avoid a potential deadlock between lru lock, task lock and
dentry lock, since a thread can be holding the task lock
and the dentry lock while trying to acquire the lru lock.

Test: ran binderLibTest, throughputtest, interfacetest and
mempressure w/lockdep
Bug: 63926541
Change-Id: Icc661404eb7a4a2ecc5234b1bf8f0104665f9b45
Acked-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Sherry Yang <sherryy@android.com>
2017-09-20 22:27:11 -04:00
Sherry Yang
e6fa28a9a9 FROMLIST: android: binder: Remove unused vma argument
(from https://patchwork.kernel.org/patch/9954123/)

The vma argument in update_page_range is no longer
used after 74310e06 ("android: binder: Move buffer
out of area shared with user space"), since mmap_handler
no longer calls update_page_range with a vma.

Test: ran binderLibTest, throughputtest, interfacetest and mempressure
Bug: 36007193
Change-Id: Ibd6f24c11750f8f7e6ed56e40dd18c08e02ace25
Acked-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Sherry Yang <sherryy@android.com>
2017-09-20 22:27:03 -04:00
Sherry Yang
849c7764d8 FROMLIST: android: binder: Drop lru lock in isolate callback
(from https://patchwork.kernel.org/patch/9945123/)

Drop the global lru lock in isolate callback
before calling zap_page_range which calls
cond_resched, and re-acquire the global lru
lock before returning. Also change return
code to LRU_REMOVED_RETRY.

Use mmput_async when failing to acquire mmap_sem
in an atomic context.

Fix "BUG: sleeping function called from invalid context"
errors when CONFIG_DEBUG_ATOMIC_SLEEP is enabled.

Bug: 63926541
Change-Id: I45dbada421b715abed9a66d03d30ae2285671ca1
Fixes: f2517eb76f1f2 ("android: binder: Add global lru shrinker to binder")
Reported-by: Kyle Yan <kyan@codeaurora.org>
Acked-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Sherry Yang <sherryy@android.com>
2017-09-20 20:37:31 -04:00
Steve Muckle
9cfefbcfaa ANDROID: configs: remove config fragments
The kernel config fragments for Android have moved into
their own repository located at

https://android.googlesource.com/kernel/configs/

Bug: 63994171
Change-Id: I837bac54cb5c90e6a6eb0f6f0ad5c90588c1a46a
Signed-off-by: Steve Muckle <smuckle@google.com>
2017-09-19 15:40:14 +00:00
gaurav jindal
2876169271 drivers: cpufreq_interactive: handle error for module load fail
If cpufreq_register_governor fails, the resources for the thread
speedchange_task should be released.
Currently, these resources are released in module_exit,
but if module loading fails, exit will not be called
and the resources will remain acquired. This may leave the kernel
in an unstable state.
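A sketch of the usual goto-based unwind in an init path, assuming hypothetical stubs (start_speedchange_task, stop_speedchange_task, register_governor, interactive_init) in place of the driver's kthread and cpufreq calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state standing in for the speedchange task. */
static bool task_running;
static bool governor_registered;

/* Hypothetical stubs; the real driver uses kthread and cpufreq APIs. */
static int start_speedchange_task(void) { task_running = true; return 0; }
static void stop_speedchange_task(void) { task_running = false; }
static int register_governor(bool fail)
{
	if (fail)
		return -1;
	governor_registered = true;
	return 0;
}

/*
 * Init: undo earlier steps when a later one fails, because module_exit()
 * is never called for a module that failed to load.
 */
static int interactive_init(bool fail_register)
{
	int ret = start_speedchange_task();

	if (ret)
		return ret;
	ret = register_governor(fail_register);
	if (ret)
		goto err_stop_task;
	return 0;

err_stop_task:
	assert(!governor_registered); /* nothing to unregister */
	stop_speedchange_task();
	return ret;
}
```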

Change-Id: Ic33f058c069d30bfd114fa1c1380325c8e00b51c
Signed-off-by: gaurav jindal <gauravjindal1104@gmail.com>
2017-09-17 19:40:29 +00:00
Michael Ellerman
2e26e045de UPSTREAM: Fix build break in fork.c when THREAD_SIZE < PAGE_SIZE
Commit b235beea9e99 ("Clarify naming of thread info/stack allocators")
breaks the build on some powerpc configs, where THREAD_SIZE < PAGE_SIZE:

  kernel/fork.c:235:2: error: implicit declaration of function 'free_thread_stack'
  kernel/fork.c:355:8: error: assignment from incompatible pointer type
    stack = alloc_thread_stack_node(tsk, node);
    ^

Fix it by renaming free_stack() to free_thread_stack(), and updating the
return type of alloc_thread_stack_node().

Fixes: b235beea9e99 ("Clarify naming of thread info/stack allocators")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 38331309
Change-Id: I5b7f920b459fb84adf5fc75f83bb488b855c4deb
(cherry picked from commit 9521d39976db20f8ef9b56af66661482a17d5364)
Signed-off-by: Zubin Mithra <zsm@google.com>
2017-09-15 10:44:27 +01:00
Greg Kroah-Hartman
29d0b657c3 This is the 4.4.88 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlm5nrQACgkQONu9yGCS
 aT7YkBAAuOKsiNi1UcZQY7MTr9BYM8hDi6wpLYrOltGlRGyJnlkkP5of0ukpulO6
 Cfp3RlLjhJ8a/ZPm+bEudnqISR7GsIyW40QiNOHLCoLUwbz0qImSCBCP1OREg5B8
 +KTsJ6UVJ5VXuqFaHAZLFtJlqmZVo9PpH0CPmL8bZylOx56dOZ8f/KkhXexBOZR3
 /CCrcCqiRs/bqJ3PAcEGcMcZYKh20SlmdNgj/GxSotvJ+xKFgBaqtHI2e9ftoMWZ
 RC1+h0plq7onjz2WMNe+hSbyODITGmJuti3TeJaZGtRpYRHv7S0Yuqs0QTvJCyjV
 iUcT0Z5tC2a1xIhiIhABZ9sveVRiop24d7qBdxqZhqLDn/jmCETZpsUaxkHs0Nk2
 bKPMT7guopS/e5xxJb0Acl8StPfv/EAogWw5XNeBlwtG1ZxsvHg2/g8jUV6k3yEc
 QH+vZUtGRp/aGBmxlTHyiI3gUSUOyqBD+kG8yCq1ySfHWFFT03D6qIsZThh2GB6B
 eiq4kHzhXsOI3IL8BjXmAWRa0KJydELMr+ofgQWNkFiIVnNRedS39a8t9Aulnxoc
 1T6vz9+laYiHdXkaIxsWNM2WPKzvdJfiEf2MKLyxQ5jWgqh6jSemx5b3BH6z2c9J
 0RZMMVNm9BH5JBTiL01/PE6m+e+EaeuB21HgmkzHENWiFlQnphE=
 =SSJQ
 -----END PGP SIGNATURE-----

Merge 4.4.88 into android-4.4

Changes in 4.4.88
	usb: quirks: add delay init quirk for Corsair Strafe RGB keyboard
	USB: serial: option: add support for D-Link DWM-157 C1
	usb: Add device quirk for Logitech HD Pro Webcam C920-C
	usb:xhci:Fix regression when ATI chipsets detected
	USB: core: Avoid race of async_completed() w/ usbdev_release()
	staging/rts5208: fix incorrect shift to extract upper nybble
	driver core: bus: Fix a potential double free
	intel_th: pci: Add Cannon Lake PCH-H support
	intel_th: pci: Add Cannon Lake PCH-LP support
	ath10k: fix memory leak in rx ring buffer allocation
	Input: trackpoint - assume 3 buttons when buttons detection fails
	rtlwifi: rtl_pci_probe: Fix fail path of _rtl_pci_find_adapter
	Bluetooth: Add support of 13d3:3494 RTL8723BE device
	dlm: avoid double-free on error path in dlm_device_{register,unregister}
	mwifiex: correct channel stat buffer overflows
	drm/nouveau/pci/msi: disable MSI on big-endian platforms by default
	workqueue: Fix flag collision
	cs5536: add support for IDE controller variant
	scsi: sg: protect against races between mmap() and SG_SET_RESERVED_SIZE
	scsi: sg: recheck MMAP_IO request length with lock held
	drm: adv7511: really enable interrupts for EDID detection
	drm/bridge: adv7511: Fix mutex deadlock when interrupts are disabled
	drm/bridge: adv7511: Use work_struct to defer hotplug handing to out of irq context
	drm/bridge: adv7511: Switch to using drm_kms_helper_hotplug_event()
	drm/bridge: adv7511: Re-write the i2c address before EDID probing
	btrfs: resume qgroup rescan on rw remount
	locktorture: Fix potential memory leak with rw lock test
	ALSA: msnd: Optimize / harden DSP and MIDI loops
	Bluetooth: Properly check L2CAP config option output buffer length
	ARM: 8692/1: mm: abort uaccess retries upon fatal signal
	NFS: Fix 2 use after free issues in the I/O code
	xfs: XFS_IS_REALTIME_INODE() should be false if no rt device present
	Linux 4.4.88

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-09-14 09:48:29 -07:00
Greg Kroah-Hartman
b52c9082f2 Linux 4.4.88
2017-09-13 14:10:05 -07:00
Richard Wareing
ad39034341 xfs: XFS_IS_REALTIME_INODE() should be false if no rt device present
commit b31ff3cdf540110da4572e3e29bd172087af65cc upstream.

If using a kernel with CONFIG_XFS_RT=y and we set the RTINHERIT flag on
a directory in a filesystem that does not have a realtime device and
create a new file in that directory, it gets marked as a real time file.
When data is written and a fsync is issued, the filesystem attempts to
flush a non-existent rt device during the fsync process.

This results in a crash dereferencing a null buftarg pointer in
xfs_blkdev_issue_flush():

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  IP: xfs_blkdev_issue_flush+0xd/0x20
  .....
  Call Trace:
    xfs_file_fsync+0x188/0x1c0
    vfs_fsync_range+0x3b/0xa0
    do_fsync+0x3d/0x70
    SyS_fsync+0x10/0x20
    do_syscall_64+0x4d/0xb0
    entry_SYSCALL64_slow_path+0x25/0x25

Setting RT inode flags does not require special privileges, so any
unprivileged user can cause this oops to occur.  To reproduce, confirm the
kernel is compiled with CONFIG_XFS_RT=y and run:

  # mkfs.xfs -f /dev/pmem0
  # mount /dev/pmem0 /mnt/test
  # mkdir /mnt/test/foo
  # xfs_io -c 'chattr +t' /mnt/test/foo
  # xfs_io -f -c 'pwrite 0 5m' -c fsync /mnt/test/foo/bar

Or just run xfstests with MKFS_OPTIONS="-d rtinherit=1" and wait.

Kernels built with CONFIG_XFS_RT=n are not exposed to this bug.
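A sketch of the check named in the subject line, assuming hypothetical stand-in structs (struct mount_info, struct inode_info) and a hypothetical DIFLAG_REALTIME value rather than the XFS definitions: an inode only counts as realtime if the flag is set and a realtime device is present, so fsync never picks a NULL flush target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the XFS structures. */
struct buftarg { int dev; };
struct mount_info { struct buftarg *rtdev_targp; /* NULL: no rt device */ };
struct inode_info { unsigned int flags; struct mount_info *mount; };

#define DIFLAG_REALTIME 0x1u /* hypothetical flag value */

/* Flag alone is not enough: the rt device must also exist. */
static bool is_realtime_inode(const struct inode_info *ip)
{
	return (ip->flags & DIFLAG_REALTIME) && ip->mount->rtdev_targp != NULL;
}

/* Choose the device to flush on fsync; never NULL with the check above. */
static struct buftarg *flush_target(const struct inode_info *ip,
				    struct buftarg *data_dev)
{
	struct buftarg *t = is_realtime_inode(ip) ? ip->mount->rtdev_targp
						  : data_dev;

	assert(t != NULL);
	return t;
}
```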

Fixes: f538d4da8d ("[XFS] write barrier support")
Signed-off-by: Richard Wareing <rwareing@fb.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-13 14:09:46 -07:00
Trond Myklebust
677a803640 NFS: Fix 2 use after free issues in the I/O code
commit 196639ebbe63a037fe9a80669140bd292d8bcd80 upstream.

The writeback code wants to send a commit after processing the pages,
which is why we want to delay releasing the struct path until after
that's done.

Also, the layout code expects that we do not free the inode before
we've put the layout segments in pnfs_writehdr_free() and
pnfs_readhdr_free().

Fixes: 919e3bd9a875 ("NFS: Ensure we commit after writeback is complete")
Fixes: 4714fb51fd ("nfs: remove pgio_header refcount, related cleanup")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-13 14:09:46 -07:00