* refs/heads/tmp-0c3b8c4
Linux 4.4.177
KVM: X86: Fix residual mmio emulation request to userspace
KVM: nVMX: Ignore limit checks on VMX instructions using flat segments
KVM: nVMX: Sign extend displacements of VMX instr's mem operands
drm/radeon/evergreen_cs: fix missing break in switch statement
media: uvcvideo: Avoid NULL pointer dereference at the end of streaming
rcu: Do RCU GP kthread self-wakeup from softirq and interrupt
PM / wakeup: Rework wakeup source timer cancellation
nfsd: fix wrong check in write_v4_end_grace()
nfsd: fix memory corruption caused by readdir
NFS: Don't recoalesce on error in nfs_pageio_complete_mirror()
NFS: Fix an I/O request leakage in nfs_do_recoalesce
md: Fix failed allocation of md_register_thread
perf intel-pt: Fix overlap calculation for padding
perf auxtrace: Define auxtrace record alignment
perf intel-pt: Fix CYC timestamp calculation after OVF
NFS41: pop some layoutget errors to application
dm: fix to_sector() for 32bit
ARM: s3c24xx: Fix boolean expressions in osiris_dvs_notify
powerpc/83xx: Also save/restore SPRG4-7 during suspend
powerpc/powernv: Make opal log only readable by root
powerpc/wii: properly disable use of BATs when requested.
powerpc/32: Clear on-stack exception marker upon exception return
jbd2: fix compile warning when using JBUFFER_TRACE
jbd2: clear dirty flag when revoking a buffer from an older transaction
serial: 8250_pci: Have ACCES cards that use the four port Pericom PI7C9X7954 chip use the pci_pericom_setup()
serial: 8250_pci: Fix number of ports for ACCES serial cards
perf bench: Copy kernel files needed to build mem{cpy,set} x86_64 benchmarks
i2c: tegra: fix maximum transfer size
parport_pc: fix find_superio io compare code, should use equal test.
intel_th: Don't reference unassigned outputs
kernel/sysctl.c: add missing range check in do_proc_dointvec_minmax_conv
mm/vmalloc: fix size check for remap_vmalloc_range_partial()
dmaengine: usb-dmac: Make DMAC system sleep callbacks explicit
clk: ingenic: Fix round_rate misbehaving with non-integer dividers
ext2: Fix underflow in ext2_max_size()
ext4: fix crash during online resizing
cpufreq: pxa2xx: remove incorrect __init annotation
cpufreq: tegra124: add missing of_node_put()
crypto: pcbc - remove bogus memcpy()s with src == dest
Btrfs: fix corruption reading shared and compressed extents after hole punching
btrfs: ensure that a DUP or RAID1 block group has exactly two stripes
m68k: Add -ffreestanding to CFLAGS
scsi: target/iscsi: Avoid iscsit_release_commands_from_conn() deadlock
scsi: virtio_scsi: don't send sc payload with tmfs
s390/virtio: handle find on invalid queue gracefully
clocksource/drivers/exynos_mct: Clear timer interrupt when shutdown
clocksource/drivers/exynos_mct: Move one-shot check from tick clear to ISR
regulator: s2mpa01: Fix step values for some LDOs
regulator: s2mps11: Fix steps for buck7, buck8 and LDO35
ACPI / device_sysfs: Avoid OF modalias creation for removed device
tracing: Do not free iter->trace in fail path of tracing_open_pipe()
CIFS: Fix read after write for files with read caching
crypto: arm64/aes-ccm - fix logical bug in AAD MAC handling
stm class: Prevent division by zero
tmpfs: fix uninitialized return value in shmem_link
net: set static variable an initial value in atl2_probe()
mac80211_hwsim: propagate genlmsg_reply return code
phonet: fix building with clang
ARC: uacces: remove lp_start, lp_end from clobber list
tmpfs: fix link accounting when a tmpfile is linked in
arm64: Relax GIC version check during early boot
ASoC: topology: free created components in tplg load error
net: mv643xx_eth: disable clk on error path in mv643xx_eth_shared_probe()
pinctrl: meson: meson8b: fix the sdxc_a data 1..3 pins
net: systemport: Fix reception of BPDUs
scsi: libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_task
assoc_array: Fix shortcut creation
ARM: 8824/1: fix a migrating irq bug when hotplug cpu
Input: st-keyscan - fix potential zalloc NULL dereference
i2c: cadence: Fix the hold bit setting
Input: matrix_keypad - use flush_delayed_work()
ARM: OMAP2+: Variable "reg" in function omap4_dsi_mux_pads() could be uninitialized
s390/dasd: fix using offset into zero size array error
gpu: ipu-v3: Fix CSI offsets for imx53
gpu: ipu-v3: Fix i.MX51 CSI control registers offset
crypto: ahash - fix another early termination in hash walk
crypto: caam - fixed handling of sg list
stm class: Fix an endless loop in channel allocation
ASoC: fsl_esai: fix register setting issue in RIGHT_J mode
9p/net: fix memory leak in p9_client_create
9p: use inode->i_lock to protect i_size_write() under 32-bit
media: videobuf2-v4l2: drop WARN_ON in vb2_warn_zero_bytesused()
It's wrong to add len to sector_nr in raid10 reshape twice
fs/9p: use fscache mutex rather than spinlock
ALSA: bebob: use more identical mod_alias for Saffire Pro 10 I/O against Liquid Saffire 56
tcp/dccp: remove reqsk_put() from inet_child_forget()
gro_cells: make sure device is up in gro_cells_receive()
net/hsr: fix possible crash in add_timer()
vxlan: Fix GRO cells race condition between receive and link delete
vxlan: test dev->flags & IFF_UP before calling gro_cells_receive()
ipvlan: disallow userns cap_net_admin to change global mode/flags
missing barriers in some of unix_sock ->addr and ->path accesses
net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255
mdio_bus: Fix use-after-free on device_register fails
net/x25: fix a race in x25_bind()
net/mlx4_core: Fix qp mtt size calculation
net/mlx4_core: Fix reset flow when in command polling mode
tcp: handle inet_csk_reqsk_queue_add() failures
route: set the deleted fnhe fnhe_daddr to 0 in ip_del_fnhe to fix a race
ravb: Decrease TxFIFO depth of Q3 and Q2 to one
pptp: dst_release sk_dst_cache in pptp_sock_destruct
net/x25: reset state in x25_connect()
net/x25: fix use-after-free in x25_device_event()
net: sit: fix UBSAN Undefined behaviour in check_6rd
net: hsr: fix memory leak in hsr_dev_finalize()
l2tp: fix infoleak in l2tp_ip6_recvmsg()
KEYS: restrict /proc/keys by credentials at open time
netfilter: nf_conntrack_tcp: Fix stack out of bounds when parsing TCP options
netfilter: nfnetlink_acct: validate NFACCT_FILTER parameters
netfilter: nfnetlink_log: just returns error for unknown command
netfilter: x_tables: enforce nul-terminated table name from getsockopt GET_ENTRIES
udplite: call proper backlog handlers
ARM: dts: exynos: Do not ignore real-world fuse values for thermal zone 0 on Exynos5420
Revert "x86/platform/UV: Use efi_runtime_lock to serialise BIOS calls"
ARM: dts: exynos: Add minimal clkout parameters to Exynos3250 PMU
futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock()
iscsi_ibft: Fix missing break in switch statement
Input: elan_i2c - add id for touchpad found in Lenovo s21e-20
Input: wacom_serial4 - add support for Wacom ArtPad II tablet
MIPS: Remove function size check in get_frame_info()
perf symbols: Filter out hidden symbols from labels
s390/qeth: fix use-after-free in error path
dmaengine: dmatest: Abort test in case of mapping error
dmaengine: at_xdmac: Fix wrongfull report of a channel as in use
irqchip/mmp: Only touch the PJ4 IRQ & FIQ bits on enable/disable
ARM: pxa: ssp: unneeded to free devm_ allocated data
autofs: fix error return in autofs_fill_super()
autofs: drop dentry reference only when it is never used
fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone
mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone
x86_64: increase stack size for KASAN_EXTRA
x86/kexec: Don't setup EFI info if EFI runtime is not enabled
cifs: fix computation for MAX_SMB2_HDR_SIZE
platform/x86: Fix unmet dependency warning for SAMSUNG_Q10
scsi: libfc: free skb when receiving invalid flogi resp
nfs: Fix NULL pointer dereference of dev_name
gpio: vf610: Mask all GPIO interrupts
net: stmmac: dwmac-rk: fix error handling in rk_gmac_powerup()
net: hns: Fix wrong read accesses via Clause 45 MDIO protocol
net: altera_tse: fix msgdma_tx_completion on non-zero fill_level case
xtensa: SMP: limit number of possible CPUs by NR_CPUS
xtensa: SMP: mark each possible CPU as present
xtensa: smp_lx200_defconfig: fix vectors clash
xtensa: SMP: fix secondary CPU initialization
xtensa: SMP: fix ccount_timer_shutdown
iommu/amd: Fix IOMMU page flush when detach device from a domain
ipvs: Fix signed integer overflow when setsockopt timeout
IB/{hfi1, qib}: Fix WC.byte_len calculation for UD_SEND_WITH_IMM
perf tools: Handle TOPOLOGY headers with no CPU
vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel
media: uvcvideo: Fix 'type' check leading to overflow
ip6mr: Do not call __IP6_INC_STATS() from preemptible context
net: dsa: mv88e6xxx: Fix u64 statistics
netlabel: fix out-of-bounds memory accesses
hugetlbfs: fix races and page leaks during migration
MIPS: irq: Allocate accurate order pages for irq stack
applicom: Fix potential Spectre v1 vulnerabilities
x86/CPU/AMD: Set the CPB bit unconditionally on F17h
net: phy: Micrel KSZ8061: link failure after cable connect
net: avoid use IPCB in cipso_v4_error
net: Add __icmp_send helper.
xen-netback: fix occasional leak of grant ref mappings under memory pressure
net: nfc: Fix NULL dereference on nfc_llcp_build_tlv fails
bnxt_en: Drop oversize TX packets to prevent errors.
team: Free BPF filter when unregistering netdev
sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79
net-sysfs: Fix mem leak in netdev_register_kobject
staging: lustre: fix buffer overflow of string buffer
isdn: isdn_tty: fix build warning of strncpy
ncpfs: fix build warning of strncpy
sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
cpufreq: Use struct kobj_attribute instead of struct global_attr
USB: serial: ftdi_sio: add ID for Hjelmslund Electronics USB485
USB: serial: cp210x: add ID for Ingenico 3070
USB: serial: option: add Telit ME910 ECM composition
x86/uaccess: Don't leak the AC flag into __put_user() value evaluation
mm: enforce min addr even if capable() in expand_downwards()
mmc: spi: Fix card detection during probe
powerpc: Always initialize input array when calling epapr_hypercall()
KVM: arm/arm64: Fix MMIO emulation data handling
arm/arm64: KVM: Feed initialized memory to MMIO accesses
KVM: nSVM: clear events pending from svm_complete_interrupts() when exiting to L1
cfg80211: extend range deviation for DMG
mac80211: don't initiate TDLS connection if station is not associated to AP
ibmveth: Do not process frames after calling napi_reschedule
net: altera_tse: fix connect_local_phy error path
scsi: csiostor: fix NULL pointer dereference in csio_vport_set_state()
serial: fsl_lpuart: fix maximum acceptable baud rate with over-sampling
mac80211: fix miscounting of ttl-dropped frames
ARC: fix __ffs return value to avoid build warnings
ASoC: imx-audmux: change snprintf to scnprintf for possible overflow
ASoC: dapm: change snprintf to scnprintf for possible overflow
usb: gadget: Potential NULL dereference on allocation error
usb: dwc3: gadget: Fix the uninitialized link_state when udc starts
thermal: int340x_thermal: Fix a NULL vs IS_ERR() check
ALSA: compress: prevent potential divide by zero bugs
ASoC: Intel: Haswell/Broadwell: fix setting for .dynamic field
drm/msm: Unblock writer if reader closes file
scsi: libsas: Fix rphy phy_identifier for PHYs with end devices attached
libceph: handle an empty authorize reply
Revert "bridge: do not add port to router list when receives query with source 0.0.0.0"
ARCv2: Enable unaligned access in early ASM code
net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames
sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
team: avoid complex list operations in team_nl_cmd_options_set()
net/packet: fix 4gb buffer limit due to overflow check
batman-adv: fix uninit-value in batadv_interface_tx()
KEYS: always initialize keyring_index_key::desc_len
KEYS: user: Align the payload buffer
RDMA/srp: Rework SCSI device reset handling
isdn: avm: Fix string plus integer warning from Clang
leds: lp5523: fix a missing check of return value of lp55xx_read
atm: he: fix sign-extension overflow on large shift
isdn: i4l: isdn_tty: Fix some concurrency double-free bugs
MIPS: jazz: fix 64bit build
scsi: isci: initialize shost fully before calling scsi_add_host()
scsi: qla4xxx: check return code of qla4xxx_copy_from_fwddb_param
MIPS: ath79: Enable OF serial ports in the default config
net: hns: Fix use after free identified by SLUB debug
mfd: mc13xxx: Fix a missing check of a register-read failure
mfd: wm5110: Add missing ASRC rate register
mfd: qcom_rpm: write fw_version to CTRL_REG
mfd: ab8500-core: Return zero in get_register_interruptible()
mfd: db8500-prcmu: Fix some section annotations
mfd: twl-core: Fix section annotations on {,un}protect_pm_master
mfd: ti_am335x_tscadc: Use PLATFORM_DEVID_AUTO while registering mfd cells
KEYS: allow reaching the keys quotas exactly
numa: change get_mempolicy() to use nr_node_ids instead of MAX_NUMNODES
ceph: avoid repeatedly adding inode to mdsc->snap_flush_list
Revert "ANDROID: arm: process: Add display of memory around registers when displaying regs."
ANDROID: mnt: Propagate remount correctly
ANDROID: cuttlefish_defconfig: Add support for AC97 audio
ANDROID: overlayfs: override_creds=off option bypass creator_cred
FROMGIT: binder: create node flag to request sender's security context
Conflicts:
arch/arm/kernel/irq.c
drivers/media/v4l2-core/videobuf2-v4l2.c
sound/core/compress_offload.c
Change-Id: I998f8d53b0c5b8a7102816034452b1779a3b69a3
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
[ Upstream commit c27d82f52f75fc9d8d9d40d120d2a96fdeeada5e ]
When superblock has lots of inodes without any pagecache (like is the
case for /proc), drop_pagecache_sb() will iterate through all of them
without dropping sb->s_inode_list_lock which can lead to softlockups
(one of our customers hit this).
Fix the problem by going to the slow path and doing cond_resched() in
case the process needs rescheduling.
Link: http://lkml.kernel.org/r/20190114085343.15011-1-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Exposes drop_pagecache_sb (required by eCryptfs cache wiping)
Adds truncate_inode_pages_fill_zero (required by eCryptfs cache wiping),
which not only truncates pages but also fills them with 0, so that the
cached data can no longer be retrieved.
Change-Id: Icfc18a2c8cdc922e71ee17add6459a1355e77ba6
Signed-off-by: Andrey Markovytch <andreym@codeaurora.org>
[gbroner@codeaurora.org: fix merge conflict]
Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
The process of reducing contention on per-superblock inode lists
starts with moving the locking to match the per-superblock inode
list. This takes the global lock out of the picture and reduces the
contention problems to within a single filesystem. This doesn't get
rid of contention as the locks still have global CPU scope, but it
does isolate operations on different superblocks form each other.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Dave Chinner <dchinner@redhat.com>
This patch adds SHRINKER_MEMCG_AWARE flag. If a shrinker has this flag
set, it will be called per memory cgroup. The memory cgroup to scan
objects from is passed in shrink_control->memcg. If the memory cgroup
is NULL, a memcg aware shrinker is supposed to scan objects from the
global list. Unaware shrinkers are only called on global pressure with
memcg=NULL.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Greg Thelen <gthelen@google.com>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The slab shrinkers are currently invoked from the zonelist walkers in
kswapd, direct reclaim, and zone reclaim, all of which roughly gauge the
eligible LRU pages and assemble a nodemask to pass to NUMA-aware
shrinkers, which then again have to walk over the nodemask. This is
redundant code, extra runtime work, and fairly inaccurate when it comes to
the estimation of actually scannable LRU pages. The code duplication will
only get worse when making the shrinkers cgroup-aware and requiring them
to have out-of-band cgroup hierarchy walks as well.
Instead, invoke the shrinkers from shrink_zone(), which is where all
reclaimers end up, to avoid this duplication.
Take the count for eligible LRU pages out of get_scan_count(), which
considers many more factors than just the availability of swap space, like
zone_reclaimable_pages() currently does. Accumulate the number over all
visited lruvecs to get the per-zone value.
Some nodes have multiple zones due to memory addressing restrictions. To
avoid putting too much pressure on the shrinkers, only invoke them once
for each such node, using the class zone of the allocation as the pivot
zone.
For now, this integrates the slab shrinking better into the reclaim logic
and gets rid of duplicative invocations from kswapd, direct reclaim, and
zone reclaim. It also prepares for cgroup-awareness, allowing
memcg-capable shrinkers to be added at the lruvec level without much
duplication of both code and runtime work.
This changes kswapd behavior, which used to invoke the shrinkers for each
zone, but with scan ratios gathered from the entire node, resulting in
meaningless pressure quantities on multi-zone nodes.
Zone reclaim behavior also changes. It used to shrink slabs until the
same amount of pages were shrunk as were reclaimed from the LRUs. Now it
merely invokes the shrinkers once with the zone's scan ratio, which makes
the shrinkers go easier on caches that implement aging and would prefer
feeding back pressure from recently used slab objects to unused LRU pages.
[vdavydov@parallels.com: assure class zone is populated]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This typedef is unnecessary and should just be removed.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is plenty of anecdotal evidence and a load of blog posts
suggesting that using "drop_caches" periodically keeps your system
running in "tip top shape". Perhaps adding some kernel documentation
will increase the amount of accurate data on its use.
If we are not shrinking caches effectively, then we have real bugs.
Using drop_caches will simply mask the bugs and make them harder to
find, but certainly does not fix them, nor is it an appropriate
"workaround" to limit the size of the caches. On the contrary, there
have been bug reports on issues that turned out to be misguided use of
cache dropping.
Dropping caches is a very drastic and disruptive operation that is good
for debugging and running tests, but if it creates bug reports from
production use, kernel developers should be aware of its use.
Add a bit more documentation about it, a syslog message to track down
abusers, and vmstat drop counters to help analyze problem reports.
[akpm@linux-foundation.org: checkpatch fixes]
[hannes@cmpxchg.org: add runtime suppression control]
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pass the node of the current zone being reclaimed to shrink_slab(),
allowing the shrinker control nodemask to be set appropriately for node
aware shrinkers.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change each shrinker's API by consolidating the existing parameters into
shrink_control struct. This will simplify any further features added w/o
touching each file of shrinker.
[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: fix warning]
[kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
[akpm@linux-foundation.org: fix xfs warning]
[akpm@linux-foundation.org: update gfs2]
Signed-off-by: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Consolidate the existing parameters to shrink_slab() into a new
shrink_control struct. This is needed later to pass the same struct to
shrinkers.
Signed-off-by: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Protect the per-sb inode list with a new global lock
inode_sb_list_lock and use it to protect the list manipulations and
traversals. This lock replaces the inode_lock as the inodes on the
list can be validity checked while holding the inode->i_lock and
hence the inode_lock is no longer needed to protect the list.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Protect inode state transitions and validity checks with the
inode->i_lock. This enables us to make inode state transitions
independently of the inode_lock and is the first step to peeling
away the inode_lock from the code.
This requires that __iget() is done atomically with i_state checks
during list traversals so that we don't race with another thread
marking the inode I_FREEING between the state check and grabbing the
reference.
Also remove the unlock_new_inode() memory barrier optimisation
required to avoid taking the inode_lock when clearing I_NEW.
Simplify the code by simply taking the inode->i_lock around the
state change and wakeup. Because the wakeup is no longer tricky,
remove the wake_up_inode() function and open code the wakeup where
necessary.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
add I_CLEAR instead of replacing I_FREEING with it. I_CLEAR is
equivalent to I_FREEING for almost all code looking at either;
it's there to keep track of having called clear_inode() exactly
once per inode lifetime, at some point after having set I_FREEING.
I_CLEAR and I_FREEING never get set at the same time with the
current code, so we can switch to setting i_flags to I_FREEING | I_CLEAR
instead of I_CLEAR without loss of information. As the result of
such change, checks become simpler and the amount of code that needs
to know about I_CLEAR shrinks a lot.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We used to remove from s_list and s_instances at the same
time. So let's *not* do the former and skip superblocks
that have empty s_instances in the loops over s_list.
The next step, of course, will be to get rid of rescan logics
in those loops.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
It's unused.
It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.
It _was_ used in two places at arch/frv for some reason.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove __invalidate_mapping_pages atomic variant now that its sole caller
can sleep (fixed in eccb95cee4 ("vfs: fix
lock inversion in drop_pagecache_sb()")).
This fixes softlockups that can occur while in the drop_caches path.
Signed-off-by: Mike Waychison <mikew@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
clear_inode() will switch inode state from I_FREEING to I_CLEAR, and do so
_outside_ of inode_lock. So any I_FREEING testing is incomplete without a
coupled testing of I_CLEAR.
So add I_CLEAR tests to drop_pagecache_sb(), generic_sync_sb_inodes() and
add_dquot_ref().
Masayoshi MIZUMA discovered the bug in drop_pagecache_sb() and Jan Kara
reminds fixing the other two cases.
Masayoshi MIZUMA has a nice panic flow:
=====================================================================
[process A] | [process B]
| |
| prune_icache() | drop_pagecache()
| spin_lock(&inode_lock) | drop_pagecache_sb()
| inode->i_state |= I_FREEING; | |
| spin_unlock(&inode_lock) | V
| | | spin_lock(&inode_lock)
| V | |
| dispose_list() | |
| list_del() | |
| clear_inode() | |
| inode->i_state = I_CLEAR | |
| | | V
| | | if (inode->i_state & (I_FREEING|I_WILL_FREE))
| | | continue; <==== NOT MATCH
| | |
| | | (DANGER from here on! Accessing disposing inode!)
| | |
| | | __iget()
| | | list_move() <===== PANIC on poisoned list !!
V V |
(time)
=====================================================================
Reported-by: Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To be on the safe side, it should be less fragile to exclude I_NEW inodes
from inode list scans by default (unless there is an important reason to
have them).
Normally they will get excluded (eg. by zero refcount or writecount etc),
however it is a bit fragile for list walkers to know exactly what parts of
the inode state is set up and valid to test when in I_NEW. So along these
lines, move I_NEW checks upward as well (sometimes taking I_FREEING etc
checks with them too -- this shouldn't be a problem should it?)
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Many inodes have no pagecache, so we can avoid lots of lock-takings.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix longstanding lock inversion in drop_pagecache_sb by dropping inode_lock
before calling __invalidate_mapping_pages(). We just have to make sure inode
won't go away from under us by keeping reference to it and putting the
reference only after we have safely resumed the scan of the inode list. A bit
tricky but not too bad...
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: David Chinner <dgc@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make the following needlessly global functions static:
- drop_pagecache()
- drop_slab()
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
invalidate_mapping_pages() can sometimes take a long time (millions of pages
to free). Long enough for the softlockup detector to trigger.
We used to have a cond_resched() in there but I took it out because the
drop_caches code calls invalidate_mapping_pages() under inode_lock.
The patch adds a nasty flag and puts the cond_resched() back.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert all calls to invalidate_inode_pages() into open-coded calls to
invalidate_mapping_pages().
Leave the invalidate_inode_pages() wrapper in place for now, marked as
deprecated.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add /proc/sys/vm/drop_caches. When written to, this will cause the kernel to
discard as much pagecache and/or reclaimable slab objects as it can. THis
operation requires root permissions.
It won't drop dirty data, so the user should run `sync' first.
Caveats:
a) Holds inode_lock for exorbitant amounts of time.
b) Needs to be taught about NUMA nodes: propagate these all the way through
so the discarding can be controlled on a per-node basis.
This is a debugging feature: useful for getting consistent results between
filesystem benchmarks. We could possibly put it under a config option, but
it's less than 300 bytes.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>