Commit graph

29 commits

Author SHA1 Message Date
Srinivasarao P
19342ee004 Merge android-4.4.177 (0c3b8c4) into msm-4.4
* refs/heads/tmp-0c3b8c4
  Linux 4.4.177
  KVM: X86: Fix residual mmio emulation request to userspace
  KVM: nVMX: Ignore limit checks on VMX instructions using flat segments
  KVM: nVMX: Sign extend displacements of VMX instr's mem operands
  drm/radeon/evergreen_cs: fix missing break in switch statement
  media: uvcvideo: Avoid NULL pointer dereference at the end of streaming
  rcu: Do RCU GP kthread self-wakeup from softirq and interrupt
  PM / wakeup: Rework wakeup source timer cancellation
  nfsd: fix wrong check in write_v4_end_grace()
  nfsd: fix memory corruption caused by readdir
  NFS: Don't recoalesce on error in nfs_pageio_complete_mirror()
  NFS: Fix an I/O request leakage in nfs_do_recoalesce
  md: Fix failed allocation of md_register_thread
  perf intel-pt: Fix overlap calculation for padding
  perf auxtrace: Define auxtrace record alignment
  perf intel-pt: Fix CYC timestamp calculation after OVF
  NFS41: pop some layoutget errors to application
  dm: fix to_sector() for 32bit
  ARM: s3c24xx: Fix boolean expressions in osiris_dvs_notify
  powerpc/83xx: Also save/restore SPRG4-7 during suspend
  powerpc/powernv: Make opal log only readable by root
  powerpc/wii: properly disable use of BATs when requested.
  powerpc/32: Clear on-stack exception marker upon exception return
  jbd2: fix compile warning when using JBUFFER_TRACE
  jbd2: clear dirty flag when revoking a buffer from an older transaction
  serial: 8250_pci: Have ACCES cards that use the four port Pericom PI7C9X7954 chip use the pci_pericom_setup()
  serial: 8250_pci: Fix number of ports for ACCES serial cards
  perf bench: Copy kernel files needed to build mem{cpy,set} x86_64 benchmarks
  i2c: tegra: fix maximum transfer size
  parport_pc: fix find_superio io compare code, should use equal test.
  intel_th: Don't reference unassigned outputs
  kernel/sysctl.c: add missing range check in do_proc_dointvec_minmax_conv
  mm/vmalloc: fix size check for remap_vmalloc_range_partial()
  dmaengine: usb-dmac: Make DMAC system sleep callbacks explicit
  clk: ingenic: Fix round_rate misbehaving with non-integer dividers
  ext2: Fix underflow in ext2_max_size()
  ext4: fix crash during online resizing
  cpufreq: pxa2xx: remove incorrect __init annotation
  cpufreq: tegra124: add missing of_node_put()
  crypto: pcbc - remove bogus memcpy()s with src == dest
  Btrfs: fix corruption reading shared and compressed extents after hole punching
  btrfs: ensure that a DUP or RAID1 block group has exactly two stripes
  m68k: Add -ffreestanding to CFLAGS
  scsi: target/iscsi: Avoid iscsit_release_commands_from_conn() deadlock
  scsi: virtio_scsi: don't send sc payload with tmfs
  s390/virtio: handle find on invalid queue gracefully
  clocksource/drivers/exynos_mct: Clear timer interrupt when shutdown
  clocksource/drivers/exynos_mct: Move one-shot check from tick clear to ISR
  regulator: s2mpa01: Fix step values for some LDOs
  regulator: s2mps11: Fix steps for buck7, buck8 and LDO35
  ACPI / device_sysfs: Avoid OF modalias creation for removed device
  tracing: Do not free iter->trace in fail path of tracing_open_pipe()
  CIFS: Fix read after write for files with read caching
  crypto: arm64/aes-ccm - fix logical bug in AAD MAC handling
  stm class: Prevent division by zero
  tmpfs: fix uninitialized return value in shmem_link
  net: set static variable an initial value in atl2_probe()
  mac80211_hwsim: propagate genlmsg_reply return code
  phonet: fix building with clang
  ARC: uacces: remove lp_start, lp_end from clobber list
  tmpfs: fix link accounting when a tmpfile is linked in
  arm64: Relax GIC version check during early boot
  ASoC: topology: free created components in tplg load error
  net: mv643xx_eth: disable clk on error path in mv643xx_eth_shared_probe()
  pinctrl: meson: meson8b: fix the sdxc_a data 1..3 pins
  net: systemport: Fix reception of BPDUs
  scsi: libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_task
  assoc_array: Fix shortcut creation
  ARM: 8824/1: fix a migrating irq bug when hotplug cpu
  Input: st-keyscan - fix potential zalloc NULL dereference
  i2c: cadence: Fix the hold bit setting
  Input: matrix_keypad - use flush_delayed_work()
  ARM: OMAP2+: Variable "reg" in function omap4_dsi_mux_pads() could be uninitialized
  s390/dasd: fix using offset into zero size array error
  gpu: ipu-v3: Fix CSI offsets for imx53
  gpu: ipu-v3: Fix i.MX51 CSI control registers offset
  crypto: ahash - fix another early termination in hash walk
  crypto: caam - fixed handling of sg list
  stm class: Fix an endless loop in channel allocation
  ASoC: fsl_esai: fix register setting issue in RIGHT_J mode
  9p/net: fix memory leak in p9_client_create
  9p: use inode->i_lock to protect i_size_write() under 32-bit
  media: videobuf2-v4l2: drop WARN_ON in vb2_warn_zero_bytesused()
  It's wrong to add len to sector_nr in raid10 reshape twice
  fs/9p: use fscache mutex rather than spinlock
  ALSA: bebob: use more identical mod_alias for Saffire Pro 10 I/O against Liquid Saffire 56
  tcp/dccp: remove reqsk_put() from inet_child_forget()
  gro_cells: make sure device is up in gro_cells_receive()
  net/hsr: fix possible crash in add_timer()
  vxlan: Fix GRO cells race condition between receive and link delete
  vxlan: test dev->flags & IFF_UP before calling gro_cells_receive()
  ipvlan: disallow userns cap_net_admin to change global mode/flags
  missing barriers in some of unix_sock ->addr and ->path accesses
  net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255
  mdio_bus: Fix use-after-free on device_register fails
  net/x25: fix a race in x25_bind()
  net/mlx4_core: Fix qp mtt size calculation
  net/mlx4_core: Fix reset flow when in command polling mode
  tcp: handle inet_csk_reqsk_queue_add() failures
  route: set the deleted fnhe fnhe_daddr to 0 in ip_del_fnhe to fix a race
  ravb: Decrease TxFIFO depth of Q3 and Q2 to one
  pptp: dst_release sk_dst_cache in pptp_sock_destruct
  net/x25: reset state in x25_connect()
  net/x25: fix use-after-free in x25_device_event()
  net: sit: fix UBSAN Undefined behaviour in check_6rd
  net: hsr: fix memory leak in hsr_dev_finalize()
  l2tp: fix infoleak in l2tp_ip6_recvmsg()
  KEYS: restrict /proc/keys by credentials at open time
  netfilter: nf_conntrack_tcp: Fix stack out of bounds when parsing TCP options
  netfilter: nfnetlink_acct: validate NFACCT_FILTER parameters
  netfilter: nfnetlink_log: just returns error for unknown command
  netfilter: x_tables: enforce nul-terminated table name from getsockopt GET_ENTRIES
  udplite: call proper backlog handlers
  ARM: dts: exynos: Do not ignore real-world fuse values for thermal zone 0 on Exynos5420
  Revert "x86/platform/UV: Use efi_runtime_lock to serialise BIOS calls"
  ARM: dts: exynos: Add minimal clkout parameters to Exynos3250 PMU
  futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock()
  iscsi_ibft: Fix missing break in switch statement
  Input: elan_i2c - add id for touchpad found in Lenovo s21e-20
  Input: wacom_serial4 - add support for Wacom ArtPad II tablet
  MIPS: Remove function size check in get_frame_info()
  perf symbols: Filter out hidden symbols from labels
  s390/qeth: fix use-after-free in error path
  dmaengine: dmatest: Abort test in case of mapping error
  dmaengine: at_xdmac: Fix wrongfull report of a channel as in use
  irqchip/mmp: Only touch the PJ4 IRQ & FIQ bits on enable/disable
  ARM: pxa: ssp: unneeded to free devm_ allocated data
  autofs: fix error return in autofs_fill_super()
  autofs: drop dentry reference only when it is never used
  fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
  mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone
  mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone
  x86_64: increase stack size for KASAN_EXTRA
  x86/kexec: Don't setup EFI info if EFI runtime is not enabled
  cifs: fix computation for MAX_SMB2_HDR_SIZE
  platform/x86: Fix unmet dependency warning for SAMSUNG_Q10
  scsi: libfc: free skb when receiving invalid flogi resp
  nfs: Fix NULL pointer dereference of dev_name
  gpio: vf610: Mask all GPIO interrupts
  net: stmmac: dwmac-rk: fix error handling in rk_gmac_powerup()
  net: hns: Fix wrong read accesses via Clause 45 MDIO protocol
  net: altera_tse: fix msgdma_tx_completion on non-zero fill_level case
  xtensa: SMP: limit number of possible CPUs by NR_CPUS
  xtensa: SMP: mark each possible CPU as present
  xtensa: smp_lx200_defconfig: fix vectors clash
  xtensa: SMP: fix secondary CPU initialization
  xtensa: SMP: fix ccount_timer_shutdown
  iommu/amd: Fix IOMMU page flush when detach device from a domain
  ipvs: Fix signed integer overflow when setsockopt timeout
  IB/{hfi1, qib}: Fix WC.byte_len calculation for UD_SEND_WITH_IMM
  perf tools: Handle TOPOLOGY headers with no CPU
  vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel
  media: uvcvideo: Fix 'type' check leading to overflow
  ip6mr: Do not call __IP6_INC_STATS() from preemptible context
  net: dsa: mv88e6xxx: Fix u64 statistics
  netlabel: fix out-of-bounds memory accesses
  hugetlbfs: fix races and page leaks during migration
  MIPS: irq: Allocate accurate order pages for irq stack
  applicom: Fix potential Spectre v1 vulnerabilities
  x86/CPU/AMD: Set the CPB bit unconditionally on F17h
  net: phy: Micrel KSZ8061: link failure after cable connect
  net: avoid use IPCB in cipso_v4_error
  net: Add __icmp_send helper.
  xen-netback: fix occasional leak of grant ref mappings under memory pressure
  net: nfc: Fix NULL dereference on nfc_llcp_build_tlv fails
  bnxt_en: Drop oversize TX packets to prevent errors.
  team: Free BPF filter when unregistering netdev
  sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79
  net-sysfs: Fix mem leak in netdev_register_kobject
  staging: lustre: fix buffer overflow of string buffer
  isdn: isdn_tty: fix build warning of strncpy
  ncpfs: fix build warning of strncpy
  sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
  cpufreq: Use struct kobj_attribute instead of struct global_attr
  USB: serial: ftdi_sio: add ID for Hjelmslund Electronics USB485
  USB: serial: cp210x: add ID for Ingenico 3070
  USB: serial: option: add Telit ME910 ECM composition
  x86/uaccess: Don't leak the AC flag into __put_user() value evaluation
  mm: enforce min addr even if capable() in expand_downwards()
  mmc: spi: Fix card detection during probe
  powerpc: Always initialize input array when calling epapr_hypercall()
  KVM: arm/arm64: Fix MMIO emulation data handling
  arm/arm64: KVM: Feed initialized memory to MMIO accesses
  KVM: nSVM: clear events pending from svm_complete_interrupts() when exiting to L1
  cfg80211: extend range deviation for DMG
  mac80211: don't initiate TDLS connection if station is not associated to AP
  ibmveth: Do not process frames after calling napi_reschedule
  net: altera_tse: fix connect_local_phy error path
  scsi: csiostor: fix NULL pointer dereference in csio_vport_set_state()
  serial: fsl_lpuart: fix maximum acceptable baud rate with over-sampling
  mac80211: fix miscounting of ttl-dropped frames
  ARC: fix __ffs return value to avoid build warnings
  ASoC: imx-audmux: change snprintf to scnprintf for possible overflow
  ASoC: dapm: change snprintf to scnprintf for possible overflow
  usb: gadget: Potential NULL dereference on allocation error
  usb: dwc3: gadget: Fix the uninitialized link_state when udc starts
  thermal: int340x_thermal: Fix a NULL vs IS_ERR() check
  ALSA: compress: prevent potential divide by zero bugs
  ASoC: Intel: Haswell/Broadwell: fix setting for .dynamic field
  drm/msm: Unblock writer if reader closes file
  scsi: libsas: Fix rphy phy_identifier for PHYs with end devices attached
  libceph: handle an empty authorize reply
  Revert "bridge: do not add port to router list when receives query with source 0.0.0.0"
  ARCv2: Enable unaligned access in early ASM code
  net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames
  sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
  team: avoid complex list operations in team_nl_cmd_options_set()
  net/packet: fix 4gb buffer limit due to overflow check
  batman-adv: fix uninit-value in batadv_interface_tx()
  KEYS: always initialize keyring_index_key::desc_len
  KEYS: user: Align the payload buffer
  RDMA/srp: Rework SCSI device reset handling
  isdn: avm: Fix string plus integer warning from Clang
  leds: lp5523: fix a missing check of return value of lp55xx_read
  atm: he: fix sign-extension overflow on large shift
  isdn: i4l: isdn_tty: Fix some concurrency double-free bugs
  MIPS: jazz: fix 64bit build
  scsi: isci: initialize shost fully before calling scsi_add_host()
  scsi: qla4xxx: check return code of qla4xxx_copy_from_fwddb_param
  MIPS: ath79: Enable OF serial ports in the default config
  net: hns: Fix use after free identified by SLUB debug
  mfd: mc13xxx: Fix a missing check of a register-read failure
  mfd: wm5110: Add missing ASRC rate register
  mfd: qcom_rpm: write fw_version to CTRL_REG
  mfd: ab8500-core: Return zero in get_register_interruptible()
  mfd: db8500-prcmu: Fix some section annotations
  mfd: twl-core: Fix section annotations on {,un}protect_pm_master
  mfd: ti_am335x_tscadc: Use PLATFORM_DEVID_AUTO while registering mfd cells
  KEYS: allow reaching the keys quotas exactly
  numa: change get_mempolicy() to use nr_node_ids instead of MAX_NUMNODES
  ceph: avoid repeatedly adding inode to mdsc->snap_flush_list
  Revert "ANDROID: arm: process: Add display of memory around registers when displaying regs."
  ANDROID: mnt: Propagate remount correctly
  ANDROID: cuttlefish_defconfig: Add support for AC97 audio
  ANDROID: overlayfs: override_creds=off option bypass creator_cred
  FROMGIT: binder: create node flag to request sender's security context

Conflicts:
	arch/arm/kernel/irq.c
	drivers/media/v4l2-core/videobuf2-v4l2.c
	sound/core/compress_offload.c

Change-Id: I998f8d53b0c5b8a7102816034452b1779a3b69a3
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
2019-03-25 12:49:05 +05:30
Jan Kara
5d5a802ec2 fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
[ Upstream commit c27d82f52f75fc9d8d9d40d120d2a96fdeeada5e ]

When superblock has lots of inodes without any pagecache (like is the
case for /proc), drop_pagecache_sb() will iterate through all of them
without dropping sb->s_inode_list_lock which can lead to softlockups
(one of our customers hit this).

Fix the problem by going to the slow path and doing cond_resched() in
case the process needs rescheduling.

Link: http://lkml.kernel.org/r/20190114085343.15011-1-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-23 08:44:26 +01:00
Andrey Markovytch
b61ac21fb0 mm + fs: extends support for cache dropping
Exposes drop_pagecache_sb (required by eCryptfs cache wiping)
Adds truncate_inode_pages_fill_zero (required by eCryptfs cache wiping),
which not only truncates pages but also fills them with 0, so that the
cached data can no longer be retrieved.

Change-Id: Icfc18a2c8cdc922e71ee17add6459a1355e77ba6
Signed-off-by: Andrey Markovytch <andreym@codeaurora.org>
[gbroner@codeaurora.org: fix merge conflict]
Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
2016-03-23 21:24:12 -07:00
Dave Chinner
74278da9f7 inode: convert inode_sb_list_lock to per-sb
The process of reducing contention on per-superblock inode lists
starts with moving the locking to match the per-superblock inode
list. This takes the global lock out of the picture and reduces the
contention problems to within a single filesystem. This doesn't get
rid of contention as the locks still have global CPU scope, but it
does isolate operations on different superblocks form each other.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Dave Chinner <dchinner@redhat.com>
2015-08-17 18:39:46 -04:00
Vladimir Davydov
cb731d6c62 vmscan: per memory cgroup slab shrinkers
This patch adds SHRINKER_MEMCG_AWARE flag.  If a shrinker has this flag
set, it will be called per memory cgroup.  The memory cgroup to scan
objects from is passed in shrink_control->memcg.  If the memory cgroup
is NULL, a memcg aware shrinker is supposed to scan objects from the
global list.  Unaware shrinkers are only called on global pressure with
memcg=NULL.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Greg Thelen <gthelen@google.com>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:09 -08:00
Johannes Weiner
6b4f7799c6 mm: vmscan: invoke slab shrinkers from shrink_zone()
The slab shrinkers are currently invoked from the zonelist walkers in
kswapd, direct reclaim, and zone reclaim, all of which roughly gauge the
eligible LRU pages and assemble a nodemask to pass to NUMA-aware
shrinkers, which then again have to walk over the nodemask.  This is
redundant code, extra runtime work, and fairly inaccurate when it comes to
the estimation of actually scannable LRU pages.  The code duplication will
only get worse when making the shrinkers cgroup-aware and requiring them
to have out-of-band cgroup hierarchy walks as well.

Instead, invoke the shrinkers from shrink_zone(), which is where all
reclaimers end up, to avoid this duplication.

Take the count for eligible LRU pages out of get_scan_count(), which
considers many more factors than just the availability of swap space, like
zone_reclaimable_pages() currently does.  Accumulate the number over all
visited lruvecs to get the per-zone value.

Some nodes have multiple zones due to memory addressing restrictions.  To
avoid putting too much pressure on the shrinkers, only invoke them once
for each such node, using the class zone of the allocation as the pivot
zone.

For now, this integrates the slab shrinking better into the reclaim logic
and gets rid of duplicative invocations from kswapd, direct reclaim, and
zone reclaim.  It also prepares for cgroup-awareness, allowing
memcg-capable shrinkers to be added at the lruvec level without much
duplication of both code and runtime work.

This changes kswapd behavior, which used to invoke the shrinkers for each
zone, but with scan ratios gathered from the entire node, resulting in
meaningless pressure quantities on multi-zone nodes.

Zone reclaim behavior also changes.  It used to shrink slabs until the
same amount of pages were shrunk as were reclaimed from the LRUs.  Now it
merely invokes the shrinkers once with the zone's scan ratio, which makes
the shrinkers go easier on caches that implement aging and would prefer
feeding back pressure from recently used slab objects to unused LRU pages.

[vdavydov@parallels.com: assure class zone is populated]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:48 -08:00
Joe Perches
1f7e0616cd fs: convert use of typedef ctl_table to struct ctl_table
This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:16 -07:00
Dave Hansen
5509a5d27b drop_caches: add some documentation and info message
There is plenty of anecdotal evidence and a load of blog posts
suggesting that using "drop_caches" periodically keeps your system
running in "tip top shape".  Perhaps adding some kernel documentation
will increase the amount of accurate data on its use.

If we are not shrinking caches effectively, then we have real bugs.
Using drop_caches will simply mask the bugs and make them harder to
find, but certainly does not fix them, nor is it an appropriate
"workaround" to limit the size of the caches.  On the contrary, there
have been bug reports on issues that turned out to be misguided use of
cache dropping.

Dropping caches is a very drastic and disruptive operation that is good
for debugging and running tests, but if it creates bug reports from
production use, kernel developers should be aware of its use.

Add a bit more documentation about it, a syslog message to track down
abusers, and vmstat drop counters to help analyze problem reports.

[akpm@linux-foundation.org: checkpatch fixes]
[hannes@cmpxchg.org: add runtime suppression control]
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:04 -07:00
Dave Chinner
0ce3d74450 shrinker: add node awareness
Pass the node of the current zone being reclaimed to shrink_slab(),
allowing the shrinker control nodemask to be set appropriately for node
aware shrinkers.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:31 -04:00
Ying Han
1495f230fa vmscan: change shrinker API by passing shrink_control struct
Change each shrinker's API by consolidating the existing parameters into
shrink_control struct.  This will simplify any further features added w/o
touching each file of shrinker.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: fix warning]
[kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
[akpm@linux-foundation.org: fix xfs warning]
[akpm@linux-foundation.org: update gfs2]
Signed-off-by: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:26 -07:00
Ying Han
a09ed5e000 vmscan: change shrink_slab() interfaces by passing shrink_control
Consolidate the existing parameters to shrink_slab() into a new
shrink_control struct.  This is needed later to pass the same struct to
shrinkers.

Signed-off-by: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:25 -07:00
Dave Chinner
55fa6091d8 fs: move i_sb_list out from under inode_lock
Protect the per-sb inode list with a new global lock
inode_sb_list_lock and use it to protect the list manipulations and
traversals. This lock replaces the inode_lock as the inodes on the
list can be validity checked while holding the inode->i_lock and
hence the inode_lock is no longer needed to protect the list.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-24 21:16:32 -04:00
Dave Chinner
250df6ed27 fs: protect inode->i_state with inode->i_lock
Protect inode state transitions and validity checks with the
inode->i_lock. This enables us to make inode state transitions
independently of the inode_lock and is the first step to peeling
away the inode_lock from the code.

This requires that __iget() is done atomically with i_state checks
during list traversals so that we don't race with another thread
marking the inode I_FREEING between the state check and grabbing the
reference.

Also remove the unlock_new_inode() memory barrier optimisation
required to avoid taking the inode_lock when clearing I_NEW.
Simplify the code by simply taking the inode->i_lock around the
state change and wakeup. Because the wakeup is no longer tricky,
remove the wake_up_inode() function and open code the wakeup where
necessary.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-24 21:16:31 -04:00
Petr Holasek
cb16e95fa2 sysctl: add some missing input constraint checks
Add boundaries of allowed input ranges for: dirty_expire_centisecs,
drop_caches, overcommit_memory, page-cluster and panic_on_oom.

Signed-off-by: Petr Holasek <pholasek@redhat.com>
Acked-by: Dave Young <hidave.darkstar@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-23 19:46:51 -07:00
Al Viro
a4ffdde6e5 simplify checks for I_CLEAR/I_FREEING
add I_CLEAR instead of replacing I_FREEING with it.  I_CLEAR is
equivalent to I_FREEING for almost all code looking at either;
it's there to keep track of having called clear_inode() exactly
once per inode lifetime, at some point after having set I_FREEING.
I_CLEAR and I_FREEING never get set at the same time with the
current code, so we can switch to setting i_flags to I_FREEING | I_CLEAR
instead of I_CLEAR without loss of information.  As the result of
such change, checks become simpler and the amount of code that needs
to know about I_CLEAR shrinks a lot.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:44 -04:00
Al Viro
01a05b337a new helper: iterate_supers()
... and switch the simple "loop over superblocks and do something"
loops to it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21 18:31:16 -04:00
Al Viro
35cf7ba0b4 Bury __put_super_and_need_restart()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21 18:31:16 -04:00
Al Viro
6754af6464 Convert simple loops over superblocks to list_for_each_entry_safe
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21 18:31:15 -04:00
Al Viro
551de6f34d Leave superblocks on s_list until the end
We used to remove from s_list and s_instances at the same
time.  So let's *not* do the former and skip superblocks
that have empty s_instances in the loops over s_list.

The next step, of course, will be to get rid of rescan logics
in those loops.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21 18:31:14 -04:00
Alexey Dobriyan
8d65af789f sysctl: remove "struct file *" argument of ->proc_handler
It's unused.

It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.

It _was_ used in two places at arch/frv for some reason.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:04 -07:00
Mike Waychison
286973552f mm: remove __invalidate_mapping_pages variant
Remove __invalidate_mapping_pages atomic variant now that its sole caller
can sleep (fixed in eccb95cee4 ("vfs: fix
lock inversion in drop_pagecache_sb()")).

This fixes softlockups that can occur while in the drop_caches path.

Signed-off-by: Mike Waychison <mikew@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16 19:47:43 -07:00
Wu Fengguang
b6fac63cc1 vfs: skip I_CLEAR state inodes
clear_inode() will switch inode state from I_FREEING to I_CLEAR, and do so
_outside_ of inode_lock.  So any I_FREEING testing is incomplete without a
coupled testing of I_CLEAR.

So add I_CLEAR tests to drop_pagecache_sb(), generic_sync_sb_inodes() and
add_dquot_ref().

Masayoshi MIZUMA discovered the bug in drop_pagecache_sb() and Jan Kara
reminds fixing the other two cases.

Masayoshi MIZUMA has a nice panic flow:

=====================================================================
            [process A]               |        [process B]
 |                                    |
 |    prune_icache()                  | drop_pagecache()
 |      spin_lock(&inode_lock)        |   drop_pagecache_sb()
 |      inode->i_state |= I_FREEING;  |       |
 |      spin_unlock(&inode_lock)      |       V
 |          |                         |     spin_lock(&inode_lock)
 |          V                         |         |
 |      dispose_list()                |         |
 |        list_del()                  |         |
 |        clear_inode()               |         |
 |          inode->i_state = I_CLEAR  |         |
 |            |                       |         V
 |            |                       |      if (inode->i_state & (I_FREEING|I_WILL_FREE))
 |            |                       |              continue;           <==== NOT MATCH
 |            |                       |
 |            |                       | (DANGER from here on! Accessing disposing inode!)
 |            |                       |
 |            |                       |      __iget()
 |            |                       |        list_move() <===== PANIC on poisoned list !!
 V            V                       |
(time)
=====================================================================

Reported-by: Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02 19:04:48 -07:00
Nick Piggin
aabb8fdb41 fs: avoid I_NEW inodes
To be on the safe side, it should be less fragile to exclude I_NEW inodes
from inode list scans by default (unless there is an important reason to
have them).

Normally they will get excluded (eg.  by zero refcount or writecount etc),
however it is a bit fragile for list walkers to know exactly what parts of
the inode state is set up and valid to test when in I_NEW.  So along these
lines, move I_NEW checks upward as well (sometimes taking I_FREEING etc
checks with them too -- this shouldn't be a problem should it?)

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-03-27 14:44:05 -04:00
Jan Kara
af065b8a19 vfs: skip inodes without pages to free in drop_pagecache_sb()
Many inodes have no pagecache, so we can avoid lots of lock-takings.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:05 -07:00
Jan Kara
eccb95cee4 vfs: fix lock inversion in drop_pagecache_sb()
Fix longstanding lock inversion in drop_pagecache_sb by dropping inode_lock
before calling __invalidate_mapping_pages().  We just have to make sure inode
won't go away from under us by keeping reference to it and putting the
reference only after we have safely resumed the scan of the inode list.  A bit
tricky but not too bad...

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: David Chinner <dgc@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:05 -07:00
Adrian Bunk
07d45da616 fs/drop_caches.c: make 2 functions static
Make the following needlessly global functions static:

- drop_pagecache()
- drop_slab()

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:00 -07:00
Andrew Morton
fc9a07e7bf invalidate_mapping_pages(): add cond_resched
invalidate_mapping_pages() can sometimes take a long time (millions of pages
to free).  Long enough for the softlockup detector to trigger.

We used to have a cond_resched() in there but I took it out because the
drop_caches code calls invalidate_mapping_pages() under inode_lock.

The patch adds a nasty flag and puts the cond_resched() back.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:36 -07:00
Andrew Morton
fc0ecff698 [PATCH] remove invalidate_inode_pages()
Convert all calls to invalidate_inode_pages() into open-coded calls to
invalidate_mapping_pages().

Leave the invalidate_inode_pages() wrapper in place for now, marked as
deprecated.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11 10:51:31 -08:00
Andrew Morton
9d0243bca3 [PATCH] drop-pagecache
Add /proc/sys/vm/drop_caches.  When written to, this will cause the kernel to
discard as much pagecache and/or reclaimable slab objects as it can.  THis
operation requires root permissions.

It won't drop dirty data, so the user should run `sync' first.

Caveats:

a) Holds inode_lock for exorbitant amounts of time.

b) Needs to be taught about NUMA nodes: propagate these all the way through
   so the discarding can be controlled on a per-node basis.

This is a debugging feature: useful for getting consistent results between
filesystem benchmarks.  We could possibly put it under a config option, but
it's less than 300 bytes.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:12:40 -08:00