306 commits

Author SHA1 Message Date
Srinivasarao P
f77d7178aa Merge android-4.4.141 (b1bad9e) into msm-4.4
* refs/heads/tmp-b1bad9e
  Linux 4.4.141
  loop: remember whether sysfs_create_group() was done
  RDMA/ucm: Mark UCM interface as BROKEN
  PM / hibernate: Fix oops at snapshot_write()
  loop: add recursion validation to LOOP_CHANGE_FD
  netfilter: x_tables: initialise match/target check parameter struct
  netfilter: nf_queue: augment nfqa_cfg_policy
  uprobes/x86: Remove incorrect WARN_ON() in uprobe_init_insn()
  x86/cpufeature: Add helper macro for mask check macros
  x86/cpufeature: Make sure DISABLED/REQUIRED macros are updated
  x86/cpufeature: Update cpufeaure macros
  x86/cpufeature, x86/mm/pkeys: Fix broken compile-time disabling of pkeys
  x86/cpu: Add detection of AMD RAS Capabilities
  x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
  x86/cpufeature, x86/mm/pkeys: Add protection keys related CPUID definitions
  x86/cpufeature: Speed up cpu_feature_enabled()
  x86/boot: Simplify kernel load address alignment check
  x86/vdso: Use static_cpu_has()
  x86/alternatives: Discard dynamic check after init
  x86/alternatives: Add an auxilary section
  x86/cpufeature: Get rid of the non-asm goto variant
  x86/cpufeature: Replace the old static_cpu_has() with safe variant
  x86/cpufeature: Carve out X86_FEATURE_*
  x86/headers: Don't include asm/processor.h in asm/atomic.h
  x86/fpu: Get rid of xstate_fault()
  x86/fpu: Add an XSTATE_OP() macro
  x86/cpu: Provide a config option to disable static_cpu_has
  x86/cpufeature: Cleanup get_cpu_cap()
  x86/cpufeature: Move some of the scattered feature bits to x86_capability
  iw_cxgb4: correctly enforce the max reg_mr depth
  tools build: fix # escaping in .cmd files for future Make
  Fix up non-directory creation in SGID directories
  HID: usbhid: add quirk for innomedia INNEX GENESIS/ATARI adapter
  xhci: xhci-mem: off by one in xhci_stream_id_to_ring()
  usb: quirks: add delay quirks for Corsair Strafe
  USB: serial: mos7840: fix status-register error handling
  USB: yurex: fix out-of-bounds uaccess in read handler
  USB: serial: keyspan_pda: fix modem-status error handling
  USB: serial: cp210x: add another USB ID for Qivicon ZigBee stick
  USB: serial: ch341: fix type promotion bug in ch341_control_in()
  ahci: Disable LPM on Lenovo 50 series laptops with a too old BIOS
  vmw_balloon: fix inflation with batching
  ibmasm: don't write out of bounds in read handler
  MIPS: Fix ioremap() RAM check
  cpufreq: Kconfig: Remove CPU_FREQ_DEFAULT_GOV_SCHED

Change-Id: I0909a2917621f2384cdfe27078577cc2c06b9612
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
2018-07-24 12:10:47 +05:30
Greg Kroah-Hartman
b1bad9e232 This is the 4.4.141 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAltNt4MACgkQONu9yGCS
 aT4Otg//e7FAfNGllvjx+53RBbpRUoa4ltdKNrdKa94ZGgVbGCdctKa9BntDkHSb
 Vw6tfvdonuJSs3e9KBSt4vOiTWkJ0eOnajdRYEQUg/jtufIULWgHNEl1dk0JB2Oj
 +8GAfXzlZ7NRfjEV0l0m44aU/qHaWVBBPQcmqLlxnLEr+0idWfSAGALEBnK6W+nH
 5yNU8X1pxVb1qSnL2YVM03+B9cfrFlpiPv46+hrHaQ6r87e+veD6f1tE1o8BvVy6
 f8CxWGvYisKJZ+OOQLH95xVahzcsGG5RKcarXzjsq30XJM1QZj8hBSWlzj0aBZmW
 OAiJ2dJccZaThxBSPJWLm6jzrUpjmQOtQMRK6TnlGxhG03eA8noxffTE03RUzL7Q
 jog6oxGgnrM+h08kmNHQEWP8EMgc6GTextKY2v9LQL51L+IBkvX8YOJwZS8YltOI
 XcoriH/lrNq5O7gSEQ4WoZWYlDlVYNc8r5EqI8lYeeShdGJqps6/wOZa1zqBFtbE
 BD0UxIDOs4zmcqPBebVUqGoPklLsGW5QfZi1dgBTiGNnopokMxia3DlPnQeq/euM
 b7+DBzL0ce2EamIh///HS+HF2uAM5N7w+BdEbYpIUCoSTKB0hUuKIM+T6rgXEvzD
 y0wJhH4SmjBH8w/Hc57VYVqOMAG+cUPDlhrw5XBkZ9HXy1ns1HM=
 =780A
 -----END PGP SIGNATURE-----

Merge 4.4.141 into android-4.4

Changes in 4.4.141
	MIPS: Fix ioremap() RAM check
	ibmasm: don't write out of bounds in read handler
	vmw_balloon: fix inflation with batching
	ahci: Disable LPM on Lenovo 50 series laptops with a too old BIOS
	USB: serial: ch341: fix type promotion bug in ch341_control_in()
	USB: serial: cp210x: add another USB ID for Qivicon ZigBee stick
	USB: serial: keyspan_pda: fix modem-status error handling
	USB: yurex: fix out-of-bounds uaccess in read handler
	USB: serial: mos7840: fix status-register error handling
	usb: quirks: add delay quirks for Corsair Strafe
	xhci: xhci-mem: off by one in xhci_stream_id_to_ring()
	HID: usbhid: add quirk for innomedia INNEX GENESIS/ATARI adapter
	Fix up non-directory creation in SGID directories
	tools build: fix # escaping in .cmd files for future Make
	iw_cxgb4: correctly enforce the max reg_mr depth
	x86/cpufeature: Move some of the scattered feature bits to x86_capability
	x86/cpufeature: Cleanup get_cpu_cap()
	x86/cpu: Provide a config option to disable static_cpu_has
	x86/fpu: Add an XSTATE_OP() macro
	x86/fpu: Get rid of xstate_fault()
	x86/headers: Don't include asm/processor.h in asm/atomic.h
	x86/cpufeature: Carve out X86_FEATURE_*
	x86/cpufeature: Replace the old static_cpu_has() with safe variant
	x86/cpufeature: Get rid of the non-asm goto variant
	x86/alternatives: Add an auxilary section
	x86/alternatives: Discard dynamic check after init
	x86/vdso: Use static_cpu_has()
	x86/boot: Simplify kernel load address alignment check
	x86/cpufeature: Speed up cpu_feature_enabled()
	x86/cpufeature, x86/mm/pkeys: Add protection keys related CPUID definitions
	x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
	x86/cpu: Add detection of AMD RAS Capabilities
	x86/cpufeature, x86/mm/pkeys: Fix broken compile-time disabling of pkeys
	x86/cpufeature: Update cpufeaure macros
	x86/cpufeature: Make sure DISABLED/REQUIRED macros are updated
	x86/cpufeature: Add helper macro for mask check macros
	uprobes/x86: Remove incorrect WARN_ON() in uprobe_init_insn()
	netfilter: nf_queue: augment nfqa_cfg_policy
	netfilter: x_tables: initialise match/target check parameter struct
	loop: add recursion validation to LOOP_CHANGE_FD
	PM / hibernate: Fix oops at snapshot_write()
	RDMA/ucm: Mark UCM interface as BROKEN
	loop: remember whether sysfs_create_group() was done
	Linux 4.4.141

Change-Id: I777b39a0ede95b58638add97756d6beaf4a9d154
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-07-17 12:15:52 +02:00
Linus Torvalds
e71dbad756 Fix up non-directory creation in SGID directories
commit 0fa3ecd87848c9c93c2c828ef4c3a8ca36ce46c7 upstream.

sgid directories have special semantics, making newly created files in
the directory belong to the group of the directory, and newly created
subdirectories will also become sgid.  This is historically used for
group-shared directories.

But group directories writable by non-group members should not imply
that such non-group members can magically join the group, so make sure
to clear the sgid bit on non-directories for non-members (but remember
that sgid without group execute means "mandatory locking", just to
confuse things even more).

Reported-by: Jann Horn <jannh@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-17 11:31:43 +02:00
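
Note on e71dbad756: the rule described in that commit message can be sketched as a small user-space model. This is an illustration only, not the kernel code: the function name `new_inode_mode` and its parameters are hypothetical, and the model ignores capability checks and everything else the kernel does when it creates an inode. It only shows how the sgid bit is propagated to subdirectories, cleared on group-executable files created by non-members, and left alone when group execute is absent (the "mandatory locking" case the message mentions).

```python
import stat


def new_inode_mode(dir_mode, requested_mode, creator_in_group):
    """Simplified model of the mode a new inode gets inside a directory.

    dir_mode         -- mode of the parent directory
    requested_mode   -- mode requested for the new file or directory
    creator_in_group -- True if the creator belongs to the directory's group

    Hypothetical helper for illustration; capability checks are omitted.
    """
    mode = requested_mode
    if dir_mode & stat.S_ISGID:
        if stat.S_ISDIR(mode):
            # Subdirectories of an sgid directory become sgid themselves.
            mode |= stat.S_ISGID
        elif (
            (mode & (stat.S_ISGID | stat.S_IXGRP)) == (stat.S_ISGID | stat.S_IXGRP)
            and not creator_in_group
        ):
            # sgid + group execute on a non-directory would let a non-member
            # run code as the group, so the sgid bit is dropped.
            mode &= ~stat.S_ISGID
        # sgid without group execute means "mandatory locking": left as-is.
    return mode


if __name__ == "__main__":
    shared_dir = stat.S_IFDIR | stat.S_ISGID | 0o777
    wanted = stat.S_IFREG | stat.S_ISGID | 0o755
    print(oct(new_inode_mode(shared_dir, wanted, creator_in_group=False)))
    print(oct(new_inode_mode(shared_dir, wanted, creator_in_group=True)))
```

In this model the group ownership of the new inode is not tracked at all; only the mode bits are, because the mode is what the commit message says changes.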
Srinivasarao P
863577dd59 Merge android-4.4.116 (20ddb25) into msm-4.4
* refs/heads/tmp-20ddb25
  Linux 4.4.116
  ftrace: Remove incorrect setting of glob search field
  mn10300/misalignment: Use SIGSEGV SEGV_MAPERR to report a failed user copy
  ovl: fix failure to fsync lower dir
  ACPI: sbshc: remove raw pointer from printk() message
  nvme: Fix managing degraded controllers
  btrfs: Handle btrfs_set_extent_delalloc failure in fixup worker
  pktcdvd: Fix pkt_setup_dev() error path
  EDAC, octeon: Fix an uninitialized variable warning
  xtensa: fix futex_atomic_cmpxchg_inatomic
  alpha: fix reboot on Avanti platform
  alpha: fix crash if pthread_create races with signal delivery
  signal/sh: Ensure si_signo is initialized in do_divide_error
  signal/openrisc: Fix do_unaligned_access to send the proper signal
  Bluetooth: btusb: Restore QCA Rome suspend/resume fix with a "rewritten" version
  Revert "Bluetooth: btusb: fix QCA Rome suspend/resume"
  Bluetooth: btsdio: Do not bind to non-removable BCM43341
  HID: quirks: Fix keyboard + touchpad on Toshiba Click Mini not working
  kernel/async.c: revert "async: simplify lowest_in_progress()"
  media: cxusb, dib0700: ignore XC2028_I2C_FLUSH
  media: ts2020: avoid integer overflows on 32 bit machines
  watchdog: imx2_wdt: restore previous timeout after suspend+resume
  KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2
  arm: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
  crypto: caam - fix endless loop when DECO acquire fails
  media: v4l2-compat-ioctl32.c: refactor compat ioctl32 logic
  media: v4l2-compat-ioctl32.c: don't copy back the result for certain errors
  media: v4l2-compat-ioctl32.c: drop pr_info for unknown buffer type
  media: v4l2-compat-ioctl32.c: copy clip list in put_v4l2_window32
  media: v4l2-compat-ioctl32: Copy v4l2_window->global_alpha
  media: v4l2-compat-ioctl32.c: make ctrl_is_pointer work for subdevs
  media: v4l2-compat-ioctl32.c: fix ctrl_is_pointer
  media: v4l2-compat-ioctl32.c: copy m.userptr in put_v4l2_plane32
  media: v4l2-compat-ioctl32.c: avoid sizeof(type)
  media: v4l2-compat-ioctl32.c: move 'helper' functions to __get/put_v4l2_format32
  media: v4l2-compat-ioctl32.c: fix the indentation
  media: v4l2-compat-ioctl32.c: add missing VIDIOC_PREPARE_BUF
  vb2: V4L2_BUF_FLAG_DONE is set after DQBUF
  media: v4l2-ioctl.c: don't copy back the result for -ENOTTY
  nsfs: mark dentry with DCACHE_RCUACCESS
  crypto: poly1305 - remove ->setkey() method
  crypto: cryptd - pass through absence of ->setkey()
  crypto: hash - introduce crypto_hash_alg_has_setkey()
  ahci: Add Intel Cannon Lake PCH-H PCI ID
  ahci: Add PCI ids for Intel Bay Trail, Cherry Trail and Apollo Lake AHCI
  ahci: Annotate PCI ids for mobile Intel chipsets as such
  kernfs: fix regression in kernfs_fop_write caused by wrong type
  NFS: reject request for id_legacy key without auxdata
  NFS: commit direct writes even if they fail partially
  NFS: Add a cond_resched() to nfs_commit_release_pages()
  nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the mds
  ubi: block: Fix locking for idr_alloc/idr_remove
  mtd: nand: sunxi: Fix ECC strength choice
  mtd: nand: Fix nand_do_read_oob() return value
  mtd: nand: brcmnand: Disable prefetch by default
  mtd: cfi: convert inline functions to macros
  media: dvb-usb-v2: lmedm04: move ts2020 attach to dm04_lme2510_tuner
  media: dvb-usb-v2: lmedm04: Improve logic checking of warm start
  dccp: CVE-2017-8824: use-after-free in DCCP code
  sched/rt: Up the root domain ref count when passing it around via IPIs
  sched/rt: Use container_of() to get root domain in rto_push_irq_work_func()
  usb: gadget: uvc: Missing files for configfs interface
  posix-timer: Properly check sigevent->sigev_notify
  netfilter: nf_queue: Make the queue_handler pernet
  kaiser: fix compile error without vsyscall
  x86/kaiser: fix build error with KASAN && !FUNCTION_GRAPH_TRACER
  dmaengine: dmatest: fix container_of member in dmatest_callback
  CIFS: zero sensitive data when freeing
  cifs: Fix autonegotiate security settings mismatch
  cifs: Fix missing put_xid in cifs_file_strict_mmap
  powerpc/pseries: include linux/types.h in asm/hvcall.h
  x86/microcode: Do the family check first
  x86/microcode/AMD: Do not load when running on a hypervisor
  crypto: tcrypt - fix S/G table for test_aead_speed()
  don't put symlink bodies in pagecache into highmem
  KEYS: encrypted: fix buffer overread in valid_master_desc()
  media: soc_camera: soc_scale_crop: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
  vhost_net: stop device during reset owner
  tcp: release sk_frag.page in tcp_disconnect
  r8169: fix RTL8168EP take too long to complete driver initialization.
  qlcnic: fix deadlock bug
  net: igmp: add a missing rcu locking section
  ip6mr: fix stale iterator
  x86/asm: Fix inline asm call constraints for GCC 4.4
  drm: rcar-du: Fix race condition when disabling planes at CRTC stop
  drm: rcar-du: Use the VBK interrupt for vblank events
  ASoC: rsnd: avoid duplicate free_irq()
  ASoC: rsnd: don't call free_irq() on Parent SSI
  ASoC: simple-card: Fix misleading error message
  net: cdc_ncm: initialize drvflags before usage
  usbip: fix 3eee23c3ec14 tcp_socket address still in the status file
  usbip: vhci_hcd: clear just the USB_PORT_STAT_POWER bit
  ASoC: pcm512x: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
  powerpc/64s: Allow control of RFI flush via debugfs
  powerpc/64s: Wire up cpu_show_meltdown()
  powerpc/powernv: Check device-tree for RFI flush settings
  powerpc/pseries: Query hypervisor for RFI flush settings
  powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti
  powerpc/64s: Add support for RFI flush of L1-D cache
  powerpc/64s: Convert slb_miss_common to use RFI_TO_USER/KERNEL
  powerpc/64: Convert the syscall exit path to use RFI_TO_USER/KERNEL
  powerpc/64: Convert fast_exception_return to use RFI_TO_USER/KERNEL
  powerpc/64s: Simple RFI macro conversions
  powerpc/64: Add macros for annotating the destination of rfid/hrfid
  powerpc/pseries: Add H_GET_CPU_CHARACTERISTICS flags & wrapper
  powerpc: Simplify module TOC handling
  powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC
  powerpc/64: Fix flush_(d|i)cache_range() called from modules
  powerpc/bpf/jit: Disable classic BPF JIT on ppc64le
  BACKPORT: xfrm: Fix return value check of copy_sec_ctx.
  time: Fix ktime_get_raw() incorrect base accumulation
  sched/fair: prevent possible infinite loop in sched_group_energy
  UPSTREAM: MIPS: Fix build of compressed image
  ANDROID: qtaguid: Fix the UAF probelm with tag_ref_tree
  UPSTREAM: ANDROID: binder: remove waitqueue when thread exits.
  UPSTREAM: arm64/efi: Make strnlen() available to the EFI namespace
  UPSTREAM: ARM: boot: Add an implementation of strnlen for libfdt
  ANDROID: MIPS: Add ranchu[32r5|32r6|64]_defconfig
  FROMLIST: tty: goldfish: Enable 'earlycon' only if built-in
  FROMLIST: MIPS: ranchu: Add Ranchu as a new generic-based board
  FROMLIST: MIPS: Add noexec=on|off kernel parameter
  FROMLIST: MIPS: CPC: Map registers using DT in mips_cpc_default_phys_base()
  FROMLIST: dt-bindings: Document mti,mips-cpc binding
  FROMLIST: MIPS: math-emu: Mark fall throughs in switch statements with a comment
  FROMLIST: MIPS: math-emu: Avoid multiple assignment
  FROMLIST: MIPS: math-emu: Avoid an assignment within if statement condition
  FROMLIST: MIPS: math-emu: Declare function srl128() as static
  FROMLIST: MIPS: math-emu: Avoid definition duplication for macro DPXMULT()
  FROMLIST: MIPS: math-emu: Remove an unnecessary header inclusion
  UPSTREAM: scripts/dtc: Update to upstream version 0931cea3ba20
  UPSTREAM: scripts/dtc: dt_to_config - kernel config options for a devicetree
  UPSTREAM: scripts/dtc: Update to upstream version 53bf130b1cdd
  UPSTREAM: scripts/dtc: Update to upstream commit b06e55c88b9b
  UPSTREAM: scripts/dtc: dtx_diff - add info to error message
  UPSTREAM: dtc: create tool to diff device trees
  UPSTREAM: config: android-base: disable CONFIG_NFSD and CONFIG_NFS_FS
  UPSTREAM: config: android-base: add CGROUP_BPF
  UPSTREAM: config: android-base: add CONFIG_MODULES option
  UPSTREAM: config: android-base: add CONFIG_IKCONFIG option
  UPSTREAM: config: android-base: disable CONFIG_USELIB and CONFIG_FHANDLE
  UPSTREAM: config: android-base: enable hardened usercopy and kernel ASLR
  UPSTREAM: config: android: enable CONFIG_SECCOMP
  UPSTREAM: config: android: set SELinux as default security mode
  UPSTREAM: config: android: move device mapper options to recommended
  UPSTREAM: config/android: Remove CONFIG_IPV6_PRIVACY
  UPSTREAM: config: add android config fragments
  BACKPORT: MIPS: generic: Add a MAINTAINERS entry
  BACKPORT: irqchip/irq-goldfish-pic: Add Goldfish PIC driver
  UPSTREAM: dt-bindings/goldfish-pic: Add device tree binding for Goldfish PIC driver
  UPSTREAM: MIPS: Allow storing pgd in C0_CONTEXT for MIPSr6
  UPSTREAM: MIPS: CPS: Handle spurious VP starts more gracefully
  UPSTREAM: MIPS: CPS: Handle cores not powering down more gracefully
  UPSTREAM: MIPS: CPS: Prevent multi-core with dcache aliasing
  UPSTREAM: MIPS: CPS: Select CONFIG_SYS_SUPPORTS_SCHED_SMT for MIPSr6
  UPSTREAM: MIPS: CM: WARN on attempt to lock invalid VP, not BUG
  UPSTREAM: MIPS: CM: Avoid per-core locking with CM3 & higher
  UPSTREAM: MIPS: smp-cps: Avoid BUG() when offlining pre-r6 CPUs
  UPSTREAM: MIPS: smp-cps: Add support for CPU hotplug of MIPSr6 processors
  UPSTREAM: MIPS: generic: Bump default NR_CPUS to 16
  UPSTREAM: MIPS: pm-cps: Change FSB workaround to CPU blacklist
  UPSTREAM: MIPS: Fix early CM probing
  UPSTREAM: MIPS: smp-cps: Stop printing EJTAG exceptions to UART
  UPSTREAM: MIPS: smp-cps: Add nothreads kernel parameter
  UPSTREAM: MIPS: smp-cps: Support MIPSr6 Virtual Processors
  UPSTREAM: MIPS: smp-cps: Skip core setup if coherent
  UPSTREAM: MIPS: smp-cps: Pull boot config retrieval out of mips_cps_boot_vpes
  UPSTREAM: MIPS: smp-cps: Pull cache init into a function
  UPSTREAM: MIPS: smp-cps: Ensure our VP ident calculation is correct
  UPSTREAM: irqchip: mips-gic: Provide VP ID accessor
  UPSTREAM: irqchip: mips-gic: Use HW IDs for VPE_OTHER_ADDR
  UPSTREAM: MIPS: CM: Fix mips_cm_max_vp_width for UP kernels
  UPSTREAM: MIPS: CM: Add CM GCR_BEV_BASE accessors
  UPSTREAM: MIPS: CPC: Add start, stop and running CM3 CPC registers
  UPSTREAM: MIPS: pm-cps: Avoid offset overflow on MIPSr6
  UPSTREAM: MIPS: traps: Make sure secondary cores have a sane ebase register
  UPSTREAM: MIPS: Detect MIPSr6 Virtual Processor support
  UPSTREAM: Documentation: Add device tree binding for Goldfish FB driver
  UPSTREAM: MIPS: math-emu: Use preferred flavor of unsigned integer declarations
  UPSTREAM: MIPS: math-emu: <MADDF|MSUBF>.D: Fix accuracy (64-bit case)
  UPSTREAM: MIPS: math-emu: <MADDF|MSUBF>.S: Fix accuracy (32-bit case)
  UPSTREAM: MIPS: Update Goldfish RTC driver maintainer email address
  UPSTREAM: MIPS: Update RINT emulation maintainer email address
  UPSTREAM: MIPS: math-emu: do not use bools for arithmetic
  UPSTREAM: rtc: goldfish: Add RTC driver for Android emulator
  BACKPORT: dt-bindings: Add device tree binding for Goldfish RTC driver
  UPSTREAM: tty: goldfish: Implement support for kernel 'earlycon' parameter
  UPSTREAM: tty: goldfish: Use streaming DMA for r/w operations on Ranchu platforms
  UPSTREAM: tty: goldfish: Refactor constants to better reflect their nature
  UPSTREAM: MIPS: math-emu: Add FP emu debugfs stats for individual instructions
  UPSTREAM: MIPS: math-emu: Add FP emu debugfs clear functionality
  UPSTREAM: MIPS: math-emu: Add FP emu debugfs statistics for branches
  BACKPORT: MIPS: math-emu: CLASS.D: Zero bits 32-63 of the result
  BACKPORT: MIPS: math-emu: RINT.<D|S>: Fix several problems by reimplementation
  UPSTREAM: MIPS: math-emu: CMP.Sxxx.<D|S>: Prevent occurrences of SIGILL crashes
  UPSTREAM: MIPS: math-emu: <MADDF|MSUBF>.<D|S>: Clean up "maddf_flags" enumeration
  UPSTREAM: MIPS: math-emu: <MADDF|MSUBF>.<D|S>: Fix some cases of zero inputs
  UPSTREAM: MIPS: math-emu: <MADDF|MSUBF>.<D|S>: Fix some cases of infinite inputs
  UPSTREAM: MIPS: math-emu: <MADDF|MSUBF>.<D|S>: Fix NaN propagation
  UPSTREAM: tty: goldfish: Fix a parameter of a call to free_irq
  UPSTREAM: MIPS: VDSO: Fix clobber lists in fallback code paths
  UPSTREAM: MIPS: VDSO: Fix a mismatch between comment and preprocessor constant
  UPSTREAM: MIPS: VDSO: Add implementation of gettimeofday() fallback
  UPSTREAM: MIPS: VDSO: Add implementation of clock_gettime() fallback
  UPSTREAM: MIPS: VDSO: Fix conversions in do_monotonic()/do_monotonic_coarse()
  UPSTREAM: MIPS: unaligned: Add DSP lwx & lhx missaligned access support
  UPSTREAM: MIPS: build: Fix "-modd-spreg" switch usage when compiling for mips32r6
  UPSTREAM: MIPS: cmdline: Add support for 'memmap' parameter
  UPSTREAM: MIPS: math-emu: Handle zero accumulator case in MADDF and MSUBF separately
  UPSTREAM: MIPS: Support per-device DMA coherence
  UPSTREAM: MIPS: dma-default: Don't check hw_coherentio if device is non-coherent
  UPSTREAM: MIPS: Sanitise coherentio semantics
  UPSTREAM: MIPS: CPC: Provide default mips_cpc_default_phys_base to ignore CPC
  UPSTREAM: MIPS: generic: Introduce generic DT-based board support
  UPSTREAM: MIPS: Support generating Flattened Image Trees (.itb)
  UPSTREAM: MIPS: Allow emulation for unaligned [LS]DXC1 instructions
  UPSTREAM: MIPS: math-emu: Fix BC1EQZ and BC1NEZ condition handling
  UPSTREAM: MIPS: r2-on-r6-emu: Clear BLTZALL and BGEZALL debugfs counters
  UPSTREAM: MIPS: r2-on-r6-emu: Fix BLEZL and BGTZL identification
  UPSTREAM: MIPS: remove aliasing alignment if HW has antialising support
  BACKPORT: MIPS: store the appended dtb address in a variable
  UPSTREAM: MIPS: Fix FCSR Cause bit handling for correct SIGFPE issue
  UPSTREAM: MIPS: kernel: Audit and remove any unnecessary uses of module.h
  UPSTREAM: MIPS: c-r4k: Fix sigtramp SMP call to use kmap
  UPSTREAM: MIPS: c-r4k: Fix protected_writeback_scache_line for EVA
  UPSTREAM: MIPS: Spelling fix lets -> let's
  UPSTREAM: MIPS: R6: Fix typo
  UPSTREAM: MIPS: traps: Correct the SIGTRAP debug ABI in `do_watch' and `do_trap_or_bp'
  UPSTREAM: MIPS: inst.h: Rename cbcond{0,1}_op to pop{1,3}0_op
  UPSTREAM: MIPS: inst.h: Rename b{eq,ne}zcji[al]c_op to pop{6,7}6_op
  UPSTREAM: MIPS: math-emu: Fix m{add,sub}.s shifts
  UPSTREAM: MIPS: inst: Declare fsel_op for sel.fmt instruction
  UPSTREAM: MIPS: math-emu: Fix code indentation
  UPSTREAM: MIPS: math-emu: Fix bit-width in ieee754dp_{mul, maddf, msubf} comments
  UPSTREAM: MIPS: math-emu: Add z argument macros
  UPSTREAM: MIPS: math-emu: Unify ieee754dp_m{add,sub}f
  UPSTREAM: MIPS: math-emu: Unify ieee754sp_m{add,sub}f
  UPSTREAM: MIPS: math-emu: Emulate MIPSr6 sel.fmt instruction
  UPSTREAM: MIPS: math-emu: Fix BC1{EQ,NE}Z emulation
  UPSTREAM: MIPS: math-emu: Always propagate sNaN payload in quieting
  UPSTREAM: MIPS: Fix misspellings in comments.
  UPSTREAM: MIPS: math-emu: Add IEEE Std 754-2008 NaN encoding emulation
  UPSTREAM: MIPS: math-emu: Add IEEE Std 754-2008 ABS.fmt and NEG.fmt emulation
  UPSTREAM: MIPS: non-exec stack & heap when non-exec PT_GNU_STACK is present
  UPSTREAM: MIPS: Add IEEE Std 754 conformance mode selection
  UPSTREAM: MIPS: Determine the presence of IEEE Std 754-2008 features
  UPSTREAM: MIPS: Define the legacy-NaN and 2008-NaN features
  UPSTREAM: MIPS: ELF: Interpret the NAN2008 file header flag
  UPSTREAM: ELF: Also pass any interpreter's file header to `arch_check_elf'
  UPSTREAM: MIPS: Use a union to access the ELF file header
  UPSTREAM: MIPS: Fix delay slot emulation count in debugfs
  BACKPORT: exit_thread: accept a task parameter to be exited
  UPSTREAM: mn10300: let exit_fpu accept a task
  UPSTREAM: MIPS: Use per-mm page to execute branch delay slot instructions
  BACKPORT: s390: get rid of exit_thread()
  BACKPORT: exit_thread: remove empty bodies
  UPSTREAM: MIPS: Make flush_thread
  UPSTREAM: MIPS: Properly disable FPU in start_thread()
  UPSTREAM: MIPS: Select CONFIG_HANDLE_DOMAIN_IRQ and make it work.
  UPSTREAM: MIPS: math-emu: Fix typo
  UPSTREAM: MIPS: math-emu: dsemul: Remove an unused bit in ADDIUPC emulation
  UPSTREAM: MIPS: math-emu: dsemul: Reduce `get_isa16_mode' clutter
  UPSTREAM: MIPS: math-emu: dsemul: Correct description of the emulation frame
  UPSTREAM: MIPS: math-emu: Correct the emulation of microMIPS ADDIUPC instruction
  UPSTREAM: MIPS: math-emu: Make microMIPS branch delay slot emulation work
  UPSTREAM: MIPS: math-emu: dsemul: Fix ill formatting of microMIPS part
  UPSTREAM: MIPS: math-emu: Correctly handle NOP emulation

Conflicts:
	drivers/irqchip/Kconfig
	drivers/irqchip/Makefile
	drivers/media/v4l2-core/v4l2-compat-ioctl32.c

Change-Id: I98374358ab24ce80dba3afa2f4562c71f45b7aab
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
2018-03-01 17:18:47 +05:30
Greg Kroah-Hartman
20ddb25b3e This is the 4.4.116 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlqHLN8ACgkQONu9yGCS
 aT7eyQ/+NGK3/MPgoqRtg8sEvr1CVk8VhH1BiBfiQPGXe/D4nqPrKQzQBBzsW8QX
 6Z9PY7wDz9RgFkw+FoOyG0eLuYdgNYOelASdQ4kJzteVH8pB2GxxTbX0drttzV+F
 liNy0w39YLYxbjR4FavOSuDekd46dNQsHBvzTawaFKh0BEtQO+1uUGMg1LjMKVPn
 F9ry0mEPrOoC2+nRvU6QXIUZy6y4+Pgdda0sfGcO3yXwQev9HoW5h9qMCnGah30J
 D3Glt86dtpQcuqeIaXrfX+HnkvAOxTHjP8uRn3O7A7h8+WYBWq5Xms6A7EE9duNV
 0UA8OZpvq0r0YSTmBFzrDexAcf/cXW8ajd/VKseI/d53iIauLV5FUaGldLJ3IQMc
 gYZ2uNxGTI4z3V+nIiVQ0NCm4kmqogVY8PvMlgUwiFVG2B088iYGZ7iTOQ9b7wBO
 VgDo0ouC/yDA8Lmz/A0l3SuvkJDNIPJit5lWzqCGRjk1F8WdPpI5C3ONfp8R3Lko
 sTllldOo982KW5up/fg5HfuMg1OjgXZtzO+/NlTtyTpSr9bb1OoniSROG8eEcMqO
 lKI1MB8Xx/pqqW1E8OOtb7A/8JPCBFzVV9xVGKwI0uZa2XOQeAwGruOe8Ub6nEpU
 8w30DlSgy8MB1BPL6UGC6k+001k8jkohdl/qjpYb6aK55CfbhlA=
 =a3k5
 -----END PGP SIGNATURE-----

Merge 4.4.116 into android-4.4

Changes in 4.4.116
	powerpc/bpf/jit: Disable classic BPF JIT on ppc64le
	powerpc/64: Fix flush_(d|i)cache_range() called from modules
	powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC
	powerpc: Simplify module TOC handling
	powerpc/pseries: Add H_GET_CPU_CHARACTERISTICS flags & wrapper
	powerpc/64: Add macros for annotating the destination of rfid/hrfid
	powerpc/64s: Simple RFI macro conversions
	powerpc/64: Convert fast_exception_return to use RFI_TO_USER/KERNEL
	powerpc/64: Convert the syscall exit path to use RFI_TO_USER/KERNEL
	powerpc/64s: Convert slb_miss_common to use RFI_TO_USER/KERNEL
	powerpc/64s: Add support for RFI flush of L1-D cache
	powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti
	powerpc/pseries: Query hypervisor for RFI flush settings
	powerpc/powernv: Check device-tree for RFI flush settings
	powerpc/64s: Wire up cpu_show_meltdown()
	powerpc/64s: Allow control of RFI flush via debugfs
	ASoC: pcm512x: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
	usbip: vhci_hcd: clear just the USB_PORT_STAT_POWER bit
	usbip: fix 3eee23c3ec14 tcp_socket address still in the status file
	net: cdc_ncm: initialize drvflags before usage
	ASoC: simple-card: Fix misleading error message
	ASoC: rsnd: don't call free_irq() on Parent SSI
	ASoC: rsnd: avoid duplicate free_irq()
	drm: rcar-du: Use the VBK interrupt for vblank events
	drm: rcar-du: Fix race condition when disabling planes at CRTC stop
	x86/asm: Fix inline asm call constraints for GCC 4.4
	ip6mr: fix stale iterator
	net: igmp: add a missing rcu locking section
	qlcnic: fix deadlock bug
	r8169: fix RTL8168EP take too long to complete driver initialization.
	tcp: release sk_frag.page in tcp_disconnect
	vhost_net: stop device during reset owner
	media: soc_camera: soc_scale_crop: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
	KEYS: encrypted: fix buffer overread in valid_master_desc()
	don't put symlink bodies in pagecache into highmem
	crypto: tcrypt - fix S/G table for test_aead_speed()
	x86/microcode/AMD: Do not load when running on a hypervisor
	x86/microcode: Do the family check first
	powerpc/pseries: include linux/types.h in asm/hvcall.h
	cifs: Fix missing put_xid in cifs_file_strict_mmap
	cifs: Fix autonegotiate security settings mismatch
	CIFS: zero sensitive data when freeing
	dmaengine: dmatest: fix container_of member in dmatest_callback
	x86/kaiser: fix build error with KASAN && !FUNCTION_GRAPH_TRACER
	kaiser: fix compile error without vsyscall
	netfilter: nf_queue: Make the queue_handler pernet
	posix-timer: Properly check sigevent->sigev_notify
	usb: gadget: uvc: Missing files for configfs interface
	sched/rt: Use container_of() to get root domain in rto_push_irq_work_func()
	sched/rt: Up the root domain ref count when passing it around via IPIs
	dccp: CVE-2017-8824: use-after-free in DCCP code
	media: dvb-usb-v2: lmedm04: Improve logic checking of warm start
	media: dvb-usb-v2: lmedm04: move ts2020 attach to dm04_lme2510_tuner
	mtd: cfi: convert inline functions to macros
	mtd: nand: brcmnand: Disable prefetch by default
	mtd: nand: Fix nand_do_read_oob() return value
	mtd: nand: sunxi: Fix ECC strength choice
	ubi: block: Fix locking for idr_alloc/idr_remove
	nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the mds
	NFS: Add a cond_resched() to nfs_commit_release_pages()
	NFS: commit direct writes even if they fail partially
	NFS: reject request for id_legacy key without auxdata
	kernfs: fix regression in kernfs_fop_write caused by wrong type
	ahci: Annotate PCI ids for mobile Intel chipsets as such
	ahci: Add PCI ids for Intel Bay Trail, Cherry Trail and Apollo Lake AHCI
	ahci: Add Intel Cannon Lake PCH-H PCI ID
	crypto: hash - introduce crypto_hash_alg_has_setkey()
	crypto: cryptd - pass through absence of ->setkey()
	crypto: poly1305 - remove ->setkey() method
	nsfs: mark dentry with DCACHE_RCUACCESS
	media: v4l2-ioctl.c: don't copy back the result for -ENOTTY
	vb2: V4L2_BUF_FLAG_DONE is set after DQBUF
	media: v4l2-compat-ioctl32.c: add missing VIDIOC_PREPARE_BUF
	media: v4l2-compat-ioctl32.c: fix the indentation
	media: v4l2-compat-ioctl32.c: move 'helper' functions to __get/put_v4l2_format32
	media: v4l2-compat-ioctl32.c: avoid sizeof(type)
	media: v4l2-compat-ioctl32.c: copy m.userptr in put_v4l2_plane32
	media: v4l2-compat-ioctl32.c: fix ctrl_is_pointer
	media: v4l2-compat-ioctl32.c: make ctrl_is_pointer work for subdevs
	media: v4l2-compat-ioctl32: Copy v4l2_window->global_alpha
	media: v4l2-compat-ioctl32.c: copy clip list in put_v4l2_window32
	media: v4l2-compat-ioctl32.c: drop pr_info for unknown buffer type
	media: v4l2-compat-ioctl32.c: don't copy back the result for certain errors
	media: v4l2-compat-ioctl32.c: refactor compat ioctl32 logic
	crypto: caam - fix endless loop when DECO acquire fails
	arm: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
	KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2
	watchdog: imx2_wdt: restore previous timeout after suspend+resume
	media: ts2020: avoid integer overflows on 32 bit machines
	media: cxusb, dib0700: ignore XC2028_I2C_FLUSH
	kernel/async.c: revert "async: simplify lowest_in_progress()"
	HID: quirks: Fix keyboard + touchpad on Toshiba Click Mini not working
	Bluetooth: btsdio: Do not bind to non-removable BCM43341
	Revert "Bluetooth: btusb: fix QCA Rome suspend/resume"
	Bluetooth: btusb: Restore QCA Rome suspend/resume fix with a "rewritten" version
	signal/openrisc: Fix do_unaligned_access to send the proper signal
	signal/sh: Ensure si_signo is initialized in do_divide_error
	alpha: fix crash if pthread_create races with signal delivery
	alpha: fix reboot on Avanti platform
	xtensa: fix futex_atomic_cmpxchg_inatomic
	EDAC, octeon: Fix an uninitialized variable warning
	pktcdvd: Fix pkt_setup_dev() error path
	btrfs: Handle btrfs_set_extent_delalloc failure in fixup worker
	nvme: Fix managing degraded controllers
	ACPI: sbshc: remove raw pointer from printk() message
	ovl: fix failure to fsync lower dir
	mn10300/misalignment: Use SIGSEGV SEGV_MAPERR to report a failed user copy
	ftrace: Remove incorrect setting of glob search field
	Linux 4.4.116

Change-Id: Id000cb8d59b74de063902e9ad24dd07fe1b1694b
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-02-20 16:23:06 +01:00
Al Viro
076e4ab327 don't put symlink bodies in pagecache into highmem
commit 21fc61c73c3903c4c312d0802da01ec2b323d174 upstream.

kmap() in page_follow_link_light() needed to go - allowing to hold
an arbitrary number of kmaps for long is a great way to deadlocking
the system.

new helper (inode_nohighmem(inode)) needs to be used for pagecache
symlinks inodes; done for all in-tree cases.  page_follow_link_light()
instrumented to yell about anything missed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jin Qian <jinqian@google.com>
Signed-off-by: Jin Qian <jinqian@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16 20:09:38 +01:00
Dmitry Shmidt
232c28fe23 Merge remote-tracking branch 'common/android-4.4' into android-4.4.y
Change-Id: Icf907f5067fb6da5935ab0d3271df54b8d5df405
2017-02-15 18:02:55 -08:00
Daniel Rosenberg
e5eeaaf5f7 ANDROID: vfs: Add setattr2 for filesystems with per mount permissions
This allows filesystems to use their mount private data to
influence the permissions they use in setattr2. It has
been separated into a new call to avoid disrupting current
setattr users.

Change-Id: I19959038309284448f1b7f232d579674ef546385
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-02-03 15:04:29 +05:30
Daniel Rosenberg
1cbf8e31e3 ANDROID: vfs: Add setattr2 for filesystems with per mount permissions
This allows filesystems to use their mount private data to
influence the permissions they use in setattr2. It has
been separated into a new call to avoid disrupting current
setattr users.

Change-Id: I19959038309284448f1b7f232d579674ef546385
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-01-26 15:53:30 -08:00
Tahsin Erdogan
1f565de67d writeback: initialize inode members that track writeback history
inode struct members that track cgroup writeback information
should be reinitialized when inode gets allocated from
kmem_cache. Otherwise, their values remain and get used by the
new inode.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Fixes: d10c809552 ("writeback: implement foreign cgroup inode bdi_writeback switching")
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 3d65ae4634ed8350aee98a4e6f4e41fe40c7d282)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
2016-11-29 15:25:11 +08:00
Miklos Szeredi
8e510cd921 vfs: fix deadlock in file_remove_privs() on overlayfs
commit c1892c37769cf89c7e7ba57528ae2ccb5d153c9b upstream.

file_remove_privs() is called with inode lock on file_inode(), which
proceeds to calling notify_change() on file->f_path.dentry.  Which triggers
the WARN_ON_ONCE(!inode_is_locked(inode)) in addition to deadlocking later
when ovl_setattr tries to lock the underlying inode again.

Fix this mess by not mixing the layers, but doing everything on underlying
dentry/inode.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 07a2daab49c5 ("ovl: Copy up underlying inode's ->i_mode to overlay inode")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10 11:49:30 +02:00
Randy Dunlap
30fdc8ee0e fs/inode.c: fix kernel-doc warning
Fix kernel-doc warning in fs/inode.c:

  ../fs/inode.c:1606: warning: No description found for parameter 'inode'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Josef Bacik
ac05fbb400 inode: don't softlockup when evicting inodes
On a box with a lot of ram (148gb) I can make the box softlockup after running
an fs_mark job that creates hundreds of millions of empty files.  This is
because we never generate enough memory pressure to keep the number of inodes on
our unused list low, so when we go to unmount we have to evict ~100 million
inodes.  This makes one processor a very unhappy person, so add a cond_resched()
in dispose_list() and if we need a resched when processing the s_inodes list do
that and run dispose_list() on what we've currently culled.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
2015-08-18 10:20:09 -07:00
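To illustrate the idea in ac05fbb400 above, here is a minimal userspace sketch of disposing a very long list in chunks and yielding between chunks. It is not the kernel code: the names sketch_stats, sketch_yield and sketch_evict_all are hypothetical, and a fixed batch size stands in for the kernel's need_resched() check, which is an assumption made only to keep the example deterministic.

```c
#include <stddef.h>

/* Hypothetical bookkeeping so the behaviour can be observed. */
struct sketch_stats {
    size_t evicted; /* items disposed so far */
    size_t yields;  /* times we gave up the CPU */
};

/* Stand-in for cond_resched(): here it only counts the yield. */
void sketch_yield(struct sketch_stats *st)
{
    st->yields++;
}

/*
 * Walk `total` items. Each time `batch` items have been culled,
 * dispose of that batch and yield before continuing, instead of
 * collecting everything and disposing of it in one long run.
 */
void sketch_evict_all(size_t total, size_t batch, struct sketch_stats *st)
{
    size_t culled = 0;

    for (size_t i = 0; i < total; i++) {
        culled++;
        if (culled == batch) {
            st->evicted += culled; /* dispose what was culled so far */
            culled = 0;
            sketch_yield(st);
        }
    }
    st->evicted += culled;         /* dispose the final partial batch */
}
```

Calling sketch_evict_all(10, 4, &st) on a zeroed sketch_stats disposes all ten items and yields twice along the way.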
Dave Chinner
c7f5408493 inode: rename i_wb_list to i_io_list
There's a small consistency problem between the inode and writeback
naming. Writeback calls the "for IO" inode queues b_io and
b_more_io, but the inode calls these the "writeback list" or
i_wb_list. This makes it hard to add a new "under writeback" list to
the inode, or call it an "under IO" list on the bdi because either
way we'll have writeback on IO and IO on writeback and it'll just be
confusing. I'm getting confused just writing this!

So, rename the inode "for IO" list variable to i_io_list so we can
add a new "writeback list" in a subsequent patch.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Dave Chinner <dchinner@redhat.com>
2015-08-17 23:38:10 -04:00
Dave Chinner
74278da9f7 inode: convert inode_sb_list_lock to per-sb
The process of reducing contention on per-superblock inode lists
starts with moving the locking to match the per-superblock inode
list. This takes the global lock out of the picture and reduces the
contention problems to within a single filesystem. This doesn't get
rid of contention as the locks still have global CPU scope, but it
does isolate operations on different superblocks from each other.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Dave Chinner <dchinner@redhat.com>
2015-08-17 18:39:46 -04:00
Linus Torvalds
1dc51b8288 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
 "Assorted VFS fixes and related cleanups (IMO the most interesting in
  that part are f_path-related things and Eric's descriptor-related
  stuff).  UFS regression fixes (it got broken last cycle).  9P fixes.
  fs-cache series, DAX patches, Jan's file_remove_suid() work"

[ I'd say this is much more than "fixes and related cleanups".  The
  file_table locking rule change by Eric Dumazet is a rather big and
  fundamental update even if the patch isn't huge.   - Linus ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
  9p: cope with bogus responses from server in p9_client_{read,write}
  p9_client_write(): avoid double p9_free_req()
  9p: forgetting to cancel request on interrupted zero-copy RPC
  dax: bdev_direct_access() may sleep
  block: Add support for DAX reads/writes to block devices
  dax: Use copy_from_iter_nocache
  dax: Add block size note to documentation
  fs/file.c: __fget() and dup2() atomicity rules
  fs/file.c: don't acquire files->file_lock in fd_install()
  fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
  vfs: avoid creation of inode number 0 in get_next_ino
  namei: make set_root_rcu() return void
  make simple_positive() public
  ufs: use dir_pages instead of ufs_dir_pages()
  pagemap.h: move dir_pages() over there
  remove the pointless include of lglock.h
  fs: cleanup slight list_entry abuse
  xfs: Correctly lock inode when removing suid and file capabilities
  fs: Call security_ops->inode_killpriv on truncate
  fs: Provide function telling whether file_remove_privs() will do anything
  ...
2015-07-04 19:36:06 -07:00
Carlos Maiolino
2adc376c55 vfs: avoid creation of inode number 0 in get_next_ino
Currently, get_next_ino() is able to create inodes with inode number = 0.
This has a bad impact on the filesystems relying on this function to generate
inode numbers.

While there is no problem at all in having inodes with number 0, userspace tools
which handle file management tasks can have problems handling these files, like
for example, the impossibility of users to delete these files, since glibc will
ignore them. So, I believe the best way is for the kernel to avoid creating them.

This problem has been raised previously, but the old thread didn't have any
other update for a year+, and I've seen too many users hitting the same issue
regarding the impossibility to delete files while using filesystems relying on
this function. So, I'm starting the thread again, with the same patch
that I believe is enough to address this problem.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-30 23:59:49 -04:00
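As a rough illustration of the approach in 2adc376c55 above, the following single-threaded sketch shows a 32-bit counter that skips 0 when it wraps. It is a simplification under stated assumptions: the kernel version uses a per-CPU counter with batching, which is omitted here, and the names last_ino_sketch, seed_ino_sketch and next_ino_sketch are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's per-CPU counter. */
static uint32_t last_ino_sketch;

/* Test/demo helper: set the counter to a chosen value. */
void seed_ino_sketch(uint32_t value)
{
    last_ino_sketch = value;
}

/* Return the next inode number, never handing out 0. */
uint32_t next_ino_sketch(void)
{
    uint32_t res = last_ino_sketch + 1; /* wraps to 0 after UINT32_MAX */

    if (res == 0) /* skip the value userspace tools cannot handle */
        res++;
    last_ino_sketch = res;
    return res;
}
```

Seeding the counter with UINT32_MAX and calling next_ino_sketch() returns 1 rather than 0.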
Linus Torvalds
e4bc13adfd Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-block
Pull cgroup writeback support from Jens Axboe:
 "This is the big pull request for adding cgroup writeback support.

  This code has been in development for a long time, and it has been
  simmering in for-next for a good chunk of this cycle too.  This is one
  of those problems that has been talked about for at least half a
  decade, finally there's a solution and code to go with it.

  Also see last week's writeup on LWN:

        http://lwn.net/Articles/648292/"

* 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits)
  writeback, blkio: add documentation for cgroup writeback support
  vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB
  writeback: do foreign inode detection iff cgroup writeback is enabled
  v9fs: fix error handling in v9fs_session_init()
  bdi: fix wrong error return value in cgwb_create()
  buffer: remove unusued 'ret' variable
  writeback: disassociate inodes from dying bdi_writebacks
  writeback: implement foreign cgroup inode bdi_writeback switching
  writeback: add lockdep annotation to inode_to_wb()
  writeback: use unlocked_inode_to_wb transaction in inode_congested()
  writeback: implement unlocked_inode_to_wb transaction and use it for stat updates
  writeback: implement [locked_]inode_to_wb_and_lock_list()
  writeback: implement foreign cgroup inode detection
  writeback: make writeback_control track the inode being written back
  writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()
  mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use
  writeback: implement memcg writeback domain based throttling
  writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes
  writeback: implement memcg wb_domain
  writeback: update wb_over_bg_thresh() to use wb_domain aware operations
  ...
2015-06-25 16:00:17 -07:00
Jan Kara
45f147a1bc fs: Call security_ops->inode_killpriv on truncate
Comment in include/linux/security.h says that ->inode_killpriv() should
be called when setuid bit is being removed and that similar security
labels (in fact this applies only to file capabilities) should be
removed at this time as well. However we don't call ->inode_killpriv()
when we remove suid bit on truncate.

We fix the problem by calling ->inode_need_killpriv() and subsequently
->inode_killpriv() on truncate the same way as we do it on file write.

After this patch there's only one user of should_remove_suid() - ocfs2 -
and indeed it's buggy because it doesn't call ->inode_killpriv() on
write. However fixing it is difficult because of special locking
constraints.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:01:09 -04:00
Jan Kara
dbfae0cdcd fs: Provide function telling whether file_remove_privs() will do anything
Provide function telling whether file_remove_privs() will do anything.
Currently we only have should_remove_suid() and that does something
slightly different.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:01:09 -04:00
Jan Kara
5fa8e0a1c6 fs: Rename file_remove_suid() to file_remove_privs()
file_remove_suid() is a misnomer since it removes also file capabilities
stored in xattrs and sets S_NOSEC flag. Also should_remove_suid() tells
something else than whether file_remove_suid() call is necessary which
leads to bugs.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:01:08 -04:00
Jan Kara
2426f39100 fs: Fix S_NOSEC handling
file_remove_suid() could mistakenly set S_NOSEC inode bit when root was
modifying the file. As a result following writes to the file by ordinary
user would avoid clearing suid or sgid bits.

Fix the bug by checking actual mode bits before setting S_NOSEC.

CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:01:08 -04:00
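The fix in 2426f39100 above ("checking actual mode bits before setting S_NOSEC") can be sketched roughly as follows. This is a userspace approximation, not the kernel source: struct sketch_inode, the SKETCH_* constants and both function names are hypothetical, and the mode-bit test mirrors what I understand the kernel's is_sxid() helper to check (setuid, or setgid together with group-execute).

```c
#include <stdbool.h>

/* Conventional octal values for the mode bits; names are hypothetical. */
#define SKETCH_S_ISUID 04000u
#define SKETCH_S_ISGID 02000u
#define SKETCH_S_IXGRP 00010u

/* Hypothetical flag meaning "no security-relevant bits to strip". */
#define SKETCH_S_NOSEC 0x1000u

struct sketch_inode {
    unsigned int mode;  /* permission bits */
    unsigned int flags; /* in-memory inode flags */
};

/* True if the mode still carries bits a write would have to clear. */
bool sketch_is_sxid(unsigned int mode)
{
    return (mode & SKETCH_S_ISUID) ||
           ((mode & SKETCH_S_ISGID) && (mode & SKETCH_S_IXGRP));
}

/*
 * Only cache the "nothing to strip" state when the mode bits really
 * are clear. Setting it unconditionally (for example after a write by
 * root that skipped the stripping) would let later unprivileged writes
 * bypass the suid/sgid removal.
 */
void sketch_mark_nosec(struct sketch_inode *inode)
{
    if (!sketch_is_sxid(inode->mode))
        inode->flags |= SKETCH_S_NOSEC;
}
```

With mode 04755 the flag stays clear after sketch_mark_nosec(); with mode 0644 it is set.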
Tejun Heo
52ebea749a writeback: make backing_dev_info host cgroup-specific bdi_writebacks
For the planned cgroup writeback support, on each bdi
(backing_dev_info), each memcg will be served by a separate wb
(bdi_writeback).  This patch updates bdi so that a bdi can host
multiple wbs (bdi_writebacks).

On the default hierarchy, blkcg implicitly enables memcg.  This allows
using memcg's page ownership for attributing writeback IOs, and every
memcg - blkcg combination can be served by its own wb by assigning a
dedicated wb to each memcg.  This means that there may be multiple
wb's of a bdi mapped to the same blkcg.  As congested state is per
blkcg - bdi combination, those wb's should share the same congested
state.  This is achieved by tracking congested state via
bdi_writeback_congested structs which are keyed by blkcg.

bdi->wb remains unchanged and will keep serving the root cgroup.
cgwb's (cgroup wb's) for non-root cgroups are created on-demand or
looked up while dirtying an inode according to the memcg of the page
being dirtied or current task.  Each cgwb is indexed on bdi->cgwb_tree
by its memcg id.  Once an inode is associated with its wb, it can be
retrieved using inode_to_wb().

Currently, none of the filesystems has FS_CGROUP_WRITEBACK and all
pages will keep being associated with bdi->wb.

v3: inode_attach_wb() in account_page_dirtied() moved inside
    mapping_cap_account_dirty() block where it's known to be !NULL.
    Also, an unnecessary NULL check before kfree() removed.  Both
    detected by the kbuild bot.

v2: Updated so that wb association is per inode and wb is per memcg
    rather than blkcg.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: kbuild test robot <fengguang.wu@intel.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 08:33:35 -06:00
NeilBrown
8fa9dd2466 VFS/namei: make the use of touch_atime() in get_link() RCU-safe.
touch_atime is not RCU-safe, and so cannot be called on an RCU walk.
However, in situations where RCU-walk makes a difference, the symlink
will likely be accessed much more often than it is useful to update
the atime.

So split out the test of "Does the atime actually need to be updated"
into  atime_needs_update(), and have get_link() unlazy if it finds that
it will need to do that update.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:06:27 -04:00
Al Viro
61ba64fc07 libfs: simple_follow_link()
let "fast" symlinks store the pointer to the body into ->i_link and
use simple_follow_link for ->follow_link()

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-10 22:18:20 -04:00
Jens Axboe
fe0f07d08e direct-io: only inc/dec inode->i_dio_count for file systems
do_blockdev_direct_IO() increments and decrements the inode
->i_dio_count for each IO operation. It does this to protect against
truncate of a file. Block devices don't need this sort of protection.

For a capable multiqueue setup, this atomic int is the only shared
state between applications accessing the device for O_DIRECT, and it
presents a scaling wall for that. In my testing, as much as 30% of
system time is spent incrementing and decrementing this value. A mixed
read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
better latencies too. Before:

clat percentiles (usec):
 |  1.00th=[   33],  5.00th=[   34], 10.00th=[   34], 20.00th=[   34],
 | 30.00th=[   34], 40.00th=[   34], 50.00th=[   35], 60.00th=[   35],
 | 70.00th=[   35], 80.00th=[   35], 90.00th=[   37], 95.00th=[   80],
 | 99.00th=[   98], 99.50th=[  151], 99.90th=[  155], 99.95th=[  155],
 | 99.99th=[  165]

After:

clat percentiles (usec):
 |  1.00th=[   95],  5.00th=[  108], 10.00th=[  129], 20.00th=[  149],
 | 30.00th=[  155], 40.00th=[  161], 50.00th=[  167], 60.00th=[  171],
 | 70.00th=[  177], 80.00th=[  185], 90.00th=[  201], 95.00th=[  270],
 | 99.00th=[  390], 99.50th=[  398], 99.90th=[  418], 99.95th=[  422],
 | 99.99th=[  438]

In other setups, Robert Elliott reported seeing good performance
improvements:

https://lkml.org/lkml/2015/4/3/557

The more applications accessing the device, the worse it gets.

Add a new direct-io flag, DIO_SKIP_DIO_COUNT, which tells
do_blockdev_direct_IO() that it need not worry about incrementing
or decrementing the inode i_dio_count for this caller.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Elliott, Robert (Server Storage) <elliott@hp.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-24 15:45:28 -04:00
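The flag described in fe0f07d08e above can be pictured with a small C11 sketch: a caller that passes the flag bypasses the shared atomic counter entirely. Everything here is illustrative. struct sk_inode, sk_direct_io and the numeric value of SK_DIO_SKIP_DIO_COUNT are assumptions, and the actual I/O submission is elided.

```c
#include <stdatomic.h>

/* Hypothetical flag value, chosen only for this sketch. */
#define SK_DIO_SKIP_DIO_COUNT 0x08

struct sk_inode {
    atomic_int i_dio_count; /* in-flight direct I/O operations */
};

/*
 * Perform one (elided) direct I/O. Returns the counter value observed
 * while the I/O is "in flight" so the effect of the flag is visible.
 */
int sk_direct_io(struct sk_inode *inode, int flags)
{
    int counted = !(flags & SK_DIO_SKIP_DIO_COUNT);
    int in_flight;

    if (counted) /* truncate protection: only needed for files */
        atomic_fetch_add(&inode->i_dio_count, 1);

    /* ... submit and wait for the I/O here ... */
    in_flight = atomic_load(&inode->i_dio_count);

    if (counted)
        atomic_fetch_sub(&inode->i_dio_count, 1);
    return in_flight;
}
```

A block-device style caller would pass SK_DIO_SKIP_DIO_COUNT and never touch the shared counter, which is the contended cache line the commit message identifies as the scaling wall.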
David Howells
df2b1afde1 VFS: fs/inode.c helpers: d_inode() annotations
these should be used on objects already in top layer

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-15 15:06:59 -04:00
Linus Torvalds
038911597e Merge branch 'lazytime' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull lazytime mount option support from Al Viro:
 "Lazytime stuff from tytso"

* 'lazytime' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ext4: add optimization for the lazytime mount option
  vfs: add find_inode_nowait() function
  vfs: add support for a lazytime mount option
2015-02-17 16:12:34 -08:00
Linus Torvalds
818099574b Merge branch 'akpm' (patches from Andrew)
Merge third set of updates from Andrew Morton:

 - the rest of MM

   [ This includes getting rid of the numa hinting bits, in favor of
     just generic protnone logic.  Yay.     - Linus ]

 - core kernel

 - procfs

 - some of lib/ (lots of lib/ material this time)

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (104 commits)
  lib/lcm.c: replace include
  lib/percpu_ida.c: remove redundant includes
  lib/strncpy_from_user.c: replace module.h include
  lib/stmp_device.c: replace module.h include
  lib/sort.c: move include inside #if 0
  lib/show_mem.c: remove redundant include
  lib/radix-tree.c: change to simpler include
  lib/plist.c: remove redundant include
  lib/nlattr.c: remove redundant include
  lib/kobject_uevent.c: remove redundant include
  lib/llist.c: remove redundant include
  lib/md5.c: simplify include
  lib/list_sort.c: rearrange includes
  lib/genalloc.c: remove redundant include
  lib/idr.c: remove redundant include
  lib/halfmd4.c: simplify includes
  lib/dynamic_queue_limits.c: simplify includes
  lib/sort.c: use simpler includes
  lib/interval_tree.c: simplify includes
  hexdump: make it return number of bytes placed in buffer
  ...
2015-02-12 18:54:28 -08:00
Vladimir Davydov
3f97b16320 list_lru: add helpers to isolate items
Currently, the isolate callback passed to the list_lru_walk family of
functions is supposed to just delete an item from the list upon returning
LRU_REMOVED or LRU_REMOVED_RETRY, while nr_items counter is fixed by
__list_lru_walk_one after the callback returns.  Since the callback is
allowed to drop the lock after removing an item (it has to return
LRU_REMOVED_RETRY then), the nr_items can be less than the actual number
of elements on the list even if we check them under the lock.  This makes
it difficult to move items from one list_lru_one to another, which is
required for per-memcg list_lru reparenting - we can't just splice the
lists, we have to move entries one by one.

This patch therefore introduces helpers that must be used by callback
functions to isolate items instead of raw list_del/list_move.  These are
list_lru_isolate and list_lru_isolate_move.  They not only remove the
entry from the list, but also fix the nr_items counter, making sure
nr_items always reflects the actual number of elements on the list if
checked under the appropriate lock.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:10 -08:00
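A minimal sketch of the two helpers described in 3f97b16320 above: each removes (or moves) the entry and fixes the item counter in the same step, so the counter cannot drift from the list contents. The list implementation and all sk_* names are hypothetical userspace stand-ins for the kernel's list_head and list_lru_one, and locking is omitted.

```c
#include <stddef.h>

/* Minimal circular doubly linked list node (stand-in for list_head). */
struct sk_node {
    struct sk_node *prev, *next;
};

/* Stand-in for one LRU list with its item counter. */
struct sk_lru {
    struct sk_node head;
    long nr_items;
};

void sk_node_init(struct sk_node *n)
{
    n->prev = n->next = n;
}

/* Unlink n from whatever list it is on and leave it self-linked. */
void sk_node_unlink(struct sk_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    sk_node_init(n);
}

/* Insert n right after head. */
void sk_node_link_after(struct sk_node *n, struct sk_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

void sk_lru_init(struct sk_lru *lru)
{
    sk_node_init(&lru->head);
    lru->nr_items = 0;
}

void sk_lru_add(struct sk_lru *lru, struct sk_node *item)
{
    sk_node_link_after(item, &lru->head);
    lru->nr_items++;
}

/* Remove the item and fix the counter together. */
void sk_lru_isolate(struct sk_lru *lru, struct sk_node *item)
{
    sk_node_unlink(item);
    lru->nr_items--;
}

/* Move the item to another list head and fix the counter together. */
void sk_lru_isolate_move(struct sk_lru *lru, struct sk_node *item,
                         struct sk_node *dst)
{
    sk_node_unlink(item);
    sk_node_link_after(item, dst);
    lru->nr_items--;
}
```

A walk callback would call sk_lru_isolate() or sk_lru_isolate_move() instead of touching the list directly, which keeps nr_items accurate even if the callback drops the lock afterwards.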
Vladimir Davydov
503c358cf1 list_lru: introduce list_lru_shrink_{count,walk}
Kmem accounting of memcg is unusable now, because it lacks slab shrinker
support.  That means when we hit the limit we will get ENOMEM w/o any
chance to recover.  What we should do then is to call shrink_slab, which
would reclaim old inode/dentry caches from this cgroup.  This is what
this patch set is intended to do.

Basically, it does two things.  First, it introduces the notion of
per-memcg slab shrinker.  A shrinker that wants to reclaim objects per
cgroup should mark itself as SHRINKER_MEMCG_AWARE.  Then it will be
passed the memory cgroup to scan from in shrink_control->memcg.  For
such shrinkers shrink_slab iterates over the whole cgroup subtree under
the target cgroup and calls the shrinker for each kmem-active memory
cgroup.

Secondly, this patch set makes the list_lru structure per-memcg.  It's
done transparently to list_lru users - everything they have to do is to
tell list_lru_init that they want memcg-aware list_lru.  Then the
list_lru will automatically distribute objects among per-memcg lists
based on which cgroup the object is accounted to.  This way to make FS
shrinkers (icache, dcache) memcg-aware we only need to make them use
memcg-aware list_lru, and this is what this patch set does.

As before, this patch set only enables per-memcg kmem reclaim when the
pressure goes from memory.limit, not from memory.kmem.limit.  Handling
memory.kmem.limit is going to be tricky due to GFP_NOFS allocations, and
it is still unclear whether we will have this knob in the unified
hierarchy.

This patch (of 9):

NUMA aware slab shrinkers use the list_lru structure to distribute
objects coming from different NUMA nodes to different lists.  Whenever
such a shrinker needs to count or scan objects from a particular node,
it issues commands like this:

        count = list_lru_count_node(lru, sc->nid);
        freed = list_lru_walk_node(lru, sc->nid, isolate_func,
                                   isolate_arg, &sc->nr_to_scan);

where sc is an instance of the shrink_control structure passed to it
from vmscan.

To simplify this, let's add special list_lru functions to be used by
shrinkers, list_lru_shrink_count() and list_lru_shrink_walk(), which
consolidate the nid and nr_to_scan arguments in the shrink_control
structure.

This will also allow us to avoid patching shrinkers that use list_lru
when we make shrink_slab() per-memcg - all we will have to do is extend
the shrink_control structure to include the target memcg and make
list_lru_shrink_{count,walk} handle this appropriately.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Greg Thelen <gthelen@google.com>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:08 -08:00
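The consolidation described in 503c358cf1 above can be sketched as thin wrappers that pull the node id and scan budget out of the shrink_control structure. The types below are simplified stand-ins (a per-node item count instead of real lists, no isolate callback), and every sk_* name is hypothetical.

```c
#define SK_MAX_NODES 4

/* Stand-in for shrink_control: which node, and how much to scan. */
struct sk_shrink_control {
    int nid;
    unsigned long nr_to_scan;
};

/* Stand-in for list_lru: only a per-node item count. */
struct sk_list_lru {
    unsigned long per_node[SK_MAX_NODES];
};

/* Per-node primitives, taking nid and the budget as separate arguments. */
unsigned long sk_count_node(struct sk_list_lru *lru, int nid)
{
    return lru->per_node[nid];
}

unsigned long sk_walk_node(struct sk_list_lru *lru, int nid,
                           unsigned long *nr_to_walk)
{
    unsigned long freed = 0;

    /* "Free" items until the node is empty or the budget runs out. */
    while (*nr_to_walk > 0 && lru->per_node[nid] > 0) {
        lru->per_node[nid]--;
        (*nr_to_walk)--;
        freed++;
    }
    return freed;
}

/* Shrinker-facing wrappers: everything comes from shrink_control. */
unsigned long sk_shrink_count(struct sk_list_lru *lru,
                              struct sk_shrink_control *sc)
{
    return sk_count_node(lru, sc->nid);
}

unsigned long sk_shrink_walk(struct sk_list_lru *lru,
                             struct sk_shrink_control *sc)
{
    return sk_walk_node(lru, sc->nid, &sc->nr_to_scan);
}
```

Because callers only pass the control structure, a later field such as a target memcg could be added to it and handled inside the wrappers without changing any shrinker, which is the motivation the commit message gives.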
Linus Torvalds
6bec003528 Merge branch 'for-3.20/bdi' of git://git.kernel.dk/linux-block
Pull backing device changes from Jens Axboe:
 "This contains a cleanup of how the backing device is handled, in
  preparation for a rework of the life time rules.  In this part, the
  most important change is to split the unrelated nommu mmap flags from
  it, but also removing a backing_dev_info pointer from the
  address_space (and inode), and a cleanup of other various minor bits.

  Christoph did all the work here, I just fixed an oops with pages that
  have a swap backing.  Arnd fixed a missing export, and Oleg killed the
  lustre backing_dev_info from staging.  Last patch was from Al,
  unexporting parts that are now no longer needed outside"

* 'for-3.20/bdi' of git://git.kernel.dk/linux-block:
  Make super_blocks and sb_lock static
  mtd: export new mtd_mmap_capabilities
  fs: make inode_to_bdi() handle NULL inode
  staging/lustre/llite: get rid of backing_dev_info
  fs: remove default_backing_dev_info
  fs: don't reassign dirty inodes to default_backing_dev_info
  nfs: don't call bdi_unregister
  ceph: remove call to bdi_unregister
  fs: remove mapping->backing_dev_info
  fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info
  nilfs2: set up s_bdi like the generic mount_bdev code
  block_dev: get bdev inode bdi directly from the block device
  block_dev: only write bdev inode on close
  fs: introduce f_op->mmap_capabilities for nommu mmap support
  fs: kill BDI_CAP_SWAP_BACKED
  fs: deduplicate noop_backing_dev_info
2015-02-12 13:50:21 -08:00
Linus Torvalds
992de5a8ec Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "Bite-sized chunks this time, to avoid the MTA ratelimiting woes.

   - fs/notify updates

   - ocfs2

   - some of MM"

That laconic "some MM" is mainly the removal of remap_file_pages(),
which is a big simplification of the VM, and which gets rid of a *lot*
of random cruft and special cases because we no longer support the
non-linear mappings that it used.

From a user interface perspective, nothing has changed, because the
remap_file_pages() syscall still exists, it's just done by emulating the
old behavior by creating a lot of individual small mappings instead of
one non-linear one.

The emulation is slower than the old "native" non-linear mappings, but
nobody really uses or cares about remap_file_pages(), and simplifying
the VM is a big advantage.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (78 commits)
  memcg: zap memcg_slab_caches and memcg_slab_mutex
  memcg: zap memcg_name argument of memcg_create_kmem_cache
  memcg: zap __memcg_{charge,uncharge}_slab
  mm/page_alloc.c: place zone_id check before VM_BUG_ON_PAGE check
  mm: hugetlb: fix type of hugetlb_treat_as_movable variable
  mm, hugetlb: remove unnecessary lower bound on sysctl handlers"?
  mm: memory: merge shared-writable dirtying branches in do_wp_page()
  mm: memory: remove ->vm_file check on shared writable vmas
  xtensa: drop _PAGE_FILE and pte_file()-related helpers
  x86: drop _PAGE_FILE and pte_file()-related helpers
  unicore32: drop pte_file()-related helpers
  um: drop _PAGE_FILE and pte_file()-related helpers
  tile: drop pte_file()-related helpers
  sparc: drop pte_file()-related helpers
  sh: drop _PAGE_FILE and pte_file()-related helpers
  score: drop _PAGE_FILE and pte_file()-related helpers
  s390: drop pte_file()-related helpers
  parisc: drop _PAGE_FILE and pte_file()-related helpers
  openrisc: drop _PAGE_FILE and pte_file()-related helpers
  nios2: drop _PAGE_FILE and pte_file()-related helpers
  ...
2015-02-10 16:45:56 -08:00
Kirill A. Shutemov
27ba0644ea rmap: drop support of non-linear mappings
We don't create non-linear mappings anymore.  Let's drop code which
handles them in rmap.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10 14:30:31 -08:00
Theodore Ts'o
fe032c422c vfs: add find_inode_nowait() function
Add a new function find_inode_nowait() which is an even more general
version of ilookup5_nowait().  It is designed for callers which need
very fine grained control over when the function is allowed to block
or increment the inode's reference count.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-05 02:45:00 -05:00
Theodore Ts'o
0ae45f63d4 vfs: add support for a lazytime mount option
Add a new mount option which enables a new "lazytime" mode.  This mode
causes atime, mtime, and ctime updates to only be made to the
in-memory version of the inode.  The on-disk times will only get
updated when (a) if the inode needs to be updated for some non-time
related change, (b) if userspace calls fsync(), syncfs() or sync(), or
(c) just before an undeleted inode is evicted from memory.

This is OK according to POSIX because there are no guarantees after a
crash unless userspace explicitly requests via a fsync(2) call.

For workloads which feature a large number of random writes to a
preallocated file, the lazytime mount option significantly reduces
writes to the inode table.  The repeated 4k writes to a single block
will result in undesirable stress on flash devices and SMR disk
drives.  Even on conventional HDD's, the repeated writes to the inode
table block will trigger Adjacent Track Interference (ATI) remediation
latencies, which very negatively impact long tail latencies --- which
is a very big deal for web serving tiers (for example).

Google-Bug-Id: 18297052

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-05 02:45:00 -05:00
Christoph Hellwig
b83ae6d421 fs: remove mapping->backing_dev_info
Now that we never use the backing_dev_info pointer in struct address_space
we can simply remove it and save 4 to 8 bytes in every inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-20 14:03:05 -07:00
Jeff Layton
4a075e39c8 locks: add a new struct file_locking_context pointer to struct inode
The current scheme of using the i_flock list is really difficult to
manage. There is also a legitimate desire for a per-inode spinlock to
manage these lists that isn't the i_lock.

Start conversion to a new scheme to eventually replace the old i_flock
list with a new "file_lock_context" object.

We start by adding a new i_flctx to struct inode. For now, it lives in
parallel with i_flock list, but will eventually replace it. The idea is
to allocate a structure to sit in that pointer and act as a locus for
all things file locking.

We allocate a file_lock_context for an inode when the first lock is
added to it, and it's only freed when the inode is freed. We use the
i_lock to protect the assignment, but afterward it should mostly be
accessed locklessly.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16 15:05:54 -05:00
Linus Torvalds
603ba7e41b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile #2 from Al Viro:
 "Next pile (and there'll be one or two more).

  The large piece in this one is getting rid of /proc/*/ns/* weirdness;
  among other things, it allows to (finally) make nameidata completely
  opaque outside of fs/namei.c, making for easier further cleanups in
  there"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  coda_venus_readdir(): use file_inode()
  fs/namei.c: fold link_path_walk() call into path_init()
  path_init(): don't bother with LOOKUP_PARENT in argument
  fs/namei.c: new helper (path_cleanup())
  path_init(): store the "base" pointer to file in nameidata itself
  make default ->i_fop have ->open() fail with ENXIO
  make nameidata completely opaque outside of fs/namei.c
  kill proc_ns completely
  take the targets of /proc/*/ns/* symlinks to separate fs
  bury struct proc_ns in fs/proc
  copy address of proc_ns_ops into ns_common
  new helpers: ns_alloc_inum/ns_free_inum
  make proc_ns_operations work with struct ns_common * instead of void *
  switch the rest of proc_ns_operations to working with &...->ns
  netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns
  make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
  common object embedded into various struct ....ns
2014-12-16 15:53:03 -08:00
Davidlohr Bueso
c8c06efa8b mm: convert i_mmap_mutex to rwsem
The i_mmap_mutex is a close cousin of the anon vma lock, both protecting
similar data, one for file backed pages and the other for anon memory.  To
this end, this lock can also be a rwsem.  In addition, there are some
important opportunities to share the lock when there are no tree
modifications.

This conversion is straightforward.  For now, all users take the write
lock.

[sfr@canb.auug.org.au: update fremap.c]
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:45 -08:00
Al Viro
bd9b51e79c make default ->i_fop have ->open() fail with ENXIO
As it is, default ->i_fop has NULL ->open() (along with all other methods).
The only case where it matters is reopening (via procfs symlink) a file that
didn't get its ->f_op from ->i_fop - anything else will have ->i_fop assigned
to something sane (default would fail on read/write/ioctl/etc.).

	Unfortunately, such a case exists - alloc_file() users, especially
anon_get_file() ones.  There we have tons of opened files of very different
kinds sharing the same inode.  As a result, attempts to reopen those via
procfs succeed and you get a descriptor you can't do anything with.

	Moreover, in case of sockets we set ->i_fop that will only be used
on such reopen attempts - and put a failing ->open() into it to make sure
those do not succeed.

	It would be simpler to put such ->open() into default ->i_fop and leave
it unchanged both for anon inodes (as we do anyway) and for socket ones.  Result:
	* everything going through do_dentry_open() works as it used to
	* sock_no_open() kludge is gone
	* attempts to reopen anon-inode files fail as they really ought to
	* ditto for aio_private_file()
	* ditto for perfmon - this one actually tried to imitate the sock_no_open()
trick, but failed to set ->i_fop, so in the current tree reopens succeed and
yield a completely useless descriptor.  The intent clearly had been to fail with
-ENXIO on such reopens; now it actually does.
	* everything else that used alloc_file() keeps working - it has ->i_fop
set for its inodes anyway
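
The reopen failure described above can be observed from userspace. The
following is a small sketch, assuming Linux with procfs mounted at /proc;
the helper name reopen_socket_errno() is made up for this example. It uses a
socket, which per the text above already refused such reopens; on such a
system the returned value is expected to be ENXIO.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Try to reopen a socket through its /proc/self/fd symlink.
 * Returns 0 if the reopen unexpectedly succeeded, the errno of the
 * failed open() otherwise, or -1 if the socket could not be created.
 */
int reopen_socket_errno(void)
{
    char path[64];
    int sock, fd, err;

    sock = socket(AF_UNIX, SOCK_STREAM, 0);
    if (sock < 0)
        return -1;

    snprintf(path, sizeof(path), "/proc/self/fd/%d", sock);
    fd = open(path, O_RDONLY);
    err = (fd < 0) ? errno : 0;   /* capture errno before any close() */

    if (fd >= 0)
        close(fd);
    close(sock);
    return err;
}
```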

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-10 21:32:15 -05:00
Jan Kara
75cbe701a4 vfs: Remove i_dquot field from inode
All filesystems using VFS quotas are now converted to use their private
i_dquot fields. Remove the i_dquot field from generic inode structure.

Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2014-11-10 10:06:18 +01:00
David Herrmann
4bb5f5d939 mm: allow drivers to prevent new writable mappings
This patch (of 6):

The i_mmap_writable field counts existing writable mappings of an
address_space.  To allow drivers to prevent new writable mappings, make
this counter signed and prevent new writable mappings if it is negative.
This is modelled after i_writecount and DENYWRITE.

This will be required by the shmem-sealing infrastructure to prevent any
new writable mappings after the WRITE seal has been set.  In case there
exists a writable mapping, this operation will fail with EBUSY.

Note that we rely on the fact that if you already own a writable mapping,
you can increase the counter without using the helpers.  This is the same
as what we do for i_writecount.
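
As an illustration of the signed-counter scheme, here is a userspace model
built on C11 atomics. It is a sketch, not the kernel helpers: the names
mapping_model and model_*() are hypothetical, and the -EPERM returned when
mappings are denied is an assumption of this example (the text above only
specifies EBUSY for the deny path).

```c
#include <errno.h>
#include <stdatomic.h>

/*
 * Hypothetical model of the counter:
 *   > 0   number of existing writable mappings
 *   == 0  no writable mappings, new ones allowed
 *   < 0   new writable mappings are denied
 */
struct mapping_model {
    atomic_int i_mmap_writable;
};

/* Account a new writable mapping: increment unless negative. */
int model_map_writable(struct mapping_model *m)
{
    int v = atomic_load(&m->i_mmap_writable);

    do {
        if (v < 0)
            return -EPERM;   /* assumed error code for this sketch */
    } while (!atomic_compare_exchange_weak(&m->i_mmap_writable, &v, v + 1));
    return 0;
}

/* Drop a writable mapping previously accounted with model_map_writable(). */
void model_unmap_writable(struct mapping_model *m)
{
    atomic_fetch_sub(&m->i_mmap_writable, 1);
}

/* Deny new writable mappings: decrement unless positive. */
int model_deny_writable(struct mapping_model *m)
{
    int v = atomic_load(&m->i_mmap_writable);

    do {
        if (v > 0)
            return -EBUSY;   /* a writable mapping already exists */
    } while (!atomic_compare_exchange_weak(&m->i_mmap_writable, &v, v - 1));
    return 0;
}

/* Undo a successful model_deny_writable(). */
void model_allow_writable(struct mapping_model *m)
{
    atomic_fetch_add(&m->i_mmap_writable, 1);
}
```

A caller that already holds a writable mapping knows the counter is
positive, which is why it could increment it directly without going through
model_map_writable().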

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:31 -07:00
NeilBrown
743162013d sched: Remove proliferation of wait_on_bit() action functions
The current "wait_on_bit" interface requires an 'action'
function to be provided which does the actual waiting.
There are over 20 such functions, many of them identical.
Most cases can be satisfied by one of just two functions, one
which uses io_schedule() and one which just uses schedule().

So:
 Rename wait_on_bit and        wait_on_bit_lock to
        wait_on_bit_action and wait_on_bit_lock_action
 to make it explicit that they need an action function.

 Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
 which are *not* given an action function but implicitly use
 a standard one.
 The decision to error-out if a signal is pending is now made
 based on the 'mode' argument rather than being encoded in the action
 function.

 All instances of the old wait_on_bit and wait_on_bit_lock which
 can use the new version have been changed accordingly and their
 action functions have been discarded.
 wait_on_bit{,_lock} does not return any specific error code in the
 event of a signal so the caller must check for non-zero and
 substitute their own error code as appropriate.

The wait_on_bit() call in __fscache_wait_on_invalidate() was
ambiguous as it specified TASK_UNINTERRUPTIBLE but used
fscache_wait_bit_interruptible as an action function.
David Howells confirms this should be uniformly
"uninterruptible".

The main remaining user of wait_on_bit{,_lock}_action is NFS
which needs to use a freezer-aware schedule() call.

A comment in fs/gfs2/glock.c notes that having multiple 'action'
functions is useful as they display differently in the 'wchan'
field of 'ps' (and /proc/$PID/wchan).
As the new bit_wait{,_io} functions are tagged "__sched", they
will not show up at all, but something higher in the stack.  So
the distinction will still be visible, only with different
function names (gfs2_glock_wait versus gfs2_glock_dq_wait in the
gfs2/glock.c case).

Since the first version of this patch (against 3.15) two new action
functions appeared, one in NFS and one in CIFS.  CIFS also now
uses an action function that makes the same freezer-aware
schedule call as NFS.

Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steve French <sfrench@samba.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-16 15:10:39 +02:00
Andy Lutomirski
23adbe12ef fs,userns: Change inode_capable to capable_wrt_inode_uidgid
The kernel has no concept of capabilities with respect to inodes; inodes
exist independently of namespaces.  For example, inode_capable(inode,
CAP_LINUX_IMMUTABLE) would be nonsense.

This patch changes inode_capable to check for uid and gid mappings and
renames it to capable_wrt_inode_uidgid, which should make it more
obvious what it does.
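
The check described above can be modelled in a few lines. This is a toy
sketch under stated assumptions: userns_model and its single-range id maps
are invented for the example, and has_cap stands in for whatever capability
test the caller would otherwise perform. It only illustrates that the
capability alone is not enough; both the inode's uid and gid must be mapped
in the caller's namespace.

```c
#include <stdbool.h>

/* Hypothetical single-extent id map: ids [first, first + count) are mapped. */
struct id_range {
    unsigned int first;
    unsigned int count;
};

/* Hypothetical stand-in for the caller's user namespace. */
struct userns_model {
    bool has_cap;             /* caller holds the capability in this ns */
    struct id_range uid_map;
    struct id_range gid_map;
};

static bool id_has_mapping(const struct id_range *r, unsigned int id)
{
    return id >= r->first && id - r->first < r->count;
}

/* Capable with respect to an inode only if its uid AND gid are mapped. */
bool capable_wrt_inode_uidgid_model(const struct userns_model *ns,
                                    unsigned int i_uid, unsigned int i_gid)
{
    return ns->has_cap &&
           id_has_mapping(&ns->uid_map, i_uid) &&
           id_has_mapping(&ns->gid_map, i_gid);
}
```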

Fixes CVE-2014-4014.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-10 13:57:22 -07:00
Joe Perches
1f7e0616cd fs: convert use of typedef ctl_table to struct ctl_table
This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:16 -07:00
Linus Torvalds
24e7ea3bea Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Major changes for 3.14 include support for the newly added ZERO_RANGE
  and COLLAPSE_RANGE fallocate operations, and scalability improvements
  in the jbd2 layer and in xattr handling when the extended attributes
  spill over into an external block.

  Other than that, the usual clean ups and minor bug fixes"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits)
  ext4: fix premature freeing of partial clusters split across leaf blocks
  ext4: remove unneeded test of ret variable
  ext4: fix comment typo
  ext4: make ext4_block_zero_page_range static
  ext4: atomically set inode->i_flags in ext4_set_inode_flags()
  ext4: optimize Hurd tests when reading/writing inodes
  ext4: kill i_version support for Hurd-castrated file systems
  ext4: each filesystem creates and uses its own mb_cache
  fs/mbcache.c: doucple the locking of local from global data
  fs/mbcache.c: change block and index hash chain to hlist_bl_node
  ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate
  ext4: refactor ext4_fallocate code
  ext4: Update inode i_size after the preallocation
  ext4: fix partial cluster handling for bigalloc file systems
  ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents
  ext4: only call sync_filesystm() when remounting read-only
  fs: push sync_filesystem() down to the file system's remount_fs()
  jbd2: improve error messages for inconsistent journal heads
  jbd2: minimize region locked by j_list_lock in jbd2_journal_forget()
  jbd2: minimize region locked by j_list_lock in journal_get_create_access()
  ...
2014-04-04 15:39:39 -07:00
Linus Torvalds
7df934526c Merge branch 'cross-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull renameat2 system call from Miklos Szeredi:
 "This adds a new syscall, renameat2(), which is the same as renameat()
  but with a flags argument.

  The purpose of extending rename is to add cross-rename, a symmetric
  variant of rename, which exchanges the two files.  This allows
  interesting things, which were not possible before, for example
  atomically replacing a directory tree with a symlink, etc...  This
  also allows overlayfs and friends to operate on whiteouts atomically.

  Andy Lutomirski also suggested a "noreplace" flag, which disables the
  overwriting behavior of rename.

  These two flags, RENAME_EXCHANGE and RENAME_NOREPLACE are only
  implemented for ext4 as an example and for testing"
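
For readers who want to try the two flags from userspace, a hedged sketch
follows. It calls the raw syscall rather than a libc wrapper, since wrapper
availability depends on the C library version. The helper names
rename_with_flags(), exchange_paths() and rename_noreplace() are invented
for this example, and the fallback flag values are the ones from the kernel
UAPI headers. Whether a given filesystem accepts the flags depends on the
kernel and filesystem in use.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Define the flags only if the libc headers did not already provide them. */
#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE (1 << 0)
#endif
#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE (1 << 1)
#endif

/* Thin wrapper around the raw syscall; returns 0, or -1 with errno set. */
int rename_with_flags(const char *oldpath, const char *newpath,
                      unsigned int flags)
{
#ifdef SYS_renameat2
    return (int)syscall(SYS_renameat2, AT_FDCWD, oldpath,
                        AT_FDCWD, newpath, flags);
#else
    (void)oldpath;
    (void)newpath;
    (void)flags;
    errno = ENOSYS;   /* headers too old to know the syscall number */
    return -1;
#endif
}

/* Atomically swap two existing paths (cross-rename). */
int exchange_paths(const char *a, const char *b)
{
    return rename_with_flags(a, b, RENAME_EXCHANGE);
}

/* Rename, but fail instead of overwriting an existing target. */
int rename_noreplace(const char *from, const char *to)
{
    return rename_with_flags(from, to, RENAME_NOREPLACE);
}
```

Both paths must exist for exchange_paths(); with rename_noreplace() an
existing target makes the call fail rather than being replaced.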

* 'cross-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ext4: add cross rename support
  ext4: rename: split out helper functions
  ext4: rename: move EMLINK check up
  ext4: rename: create ext4_renament structure for local vars
  vfs: add cross-rename
  vfs: lock_two_nondirectories: allow directory args
  security: add flags to rename hooks
  vfs: add RENAME_NOREPLACE flag
  vfs: add renameat2 syscall
  vfs: rename: use common code for dir and non-dir
  vfs: rename: move d_move() up
  vfs: add d_is_dir()
2014-04-04 14:03:05 -07:00
Johannes Weiner
91b0abe36a mm + fs: store shadow entries in page cache
Reclaim will be leaving shadow entries in the page cache radix tree upon
evicting the real page.  As those pages are found from the LRU, an
iput() can lead to the inode being freed concurrently.  At this point,
reclaim must no longer install shadow pages because the inode freeing
code needs to ensure the page tree is really empty.

Add an address_space flag, AS_EXITING, that the inode freeing code sets
under the tree lock before doing the final truncate.  Reclaim will check
for this flag before installing shadow pages.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:01 -07:00
J. Bruce Fields
4fd699ae3f vfs: lock_two_nondirectories: allow directory args
lock_two_nondirectories warned if either of its args was a directory.
Instead just ignore the directory args.  This is needed for locking in
cross rename.
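
A userspace model of the new behaviour, for illustration only. The
node_model type and the *_model() functions are hypothetical, pthread
mutexes stand in for the per-inode lock, and taking the locks in address
order (plus tolerating NULL and identical arguments) is an assumption of
this sketch rather than something stated above. Directory arguments are
skipped silently.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode with its per-inode lock. */
struct node_model {
    bool is_dir;
    pthread_mutex_t lock;
};

/*
 * Lock up to two objects, ignoring directories.
 * Returns the number of locks actually taken.
 */
int lock_two_nondirs_model(struct node_model *n1, struct node_model *n2)
{
    int taken = 0;

    /* Consistent ordering so two callers cannot deadlock (ABBA). */
    if ((uintptr_t)n1 > (uintptr_t)n2) {
        struct node_model *tmp = n1;
        n1 = n2;
        n2 = tmp;
    }

    if (n1 && !n1->is_dir) {
        pthread_mutex_lock(&n1->lock);
        taken++;
    }
    if (n2 && !n2->is_dir && n2 != n1) {
        pthread_mutex_lock(&n2->lock);
        taken++;
    }
    return taken;
}

/* Release whatever lock_two_nondirs_model() took for the same arguments. */
void unlock_two_nondirs_model(struct node_model *n1, struct node_model *n2)
{
    if (n1 && !n1->is_dir)
        pthread_mutex_unlock(&n1->lock);
    if (n2 && !n2->is_dir && n2 != n1)
        pthread_mutex_unlock(&n2->lock);
}
```

With this shape a cross rename can pass both of its targets unconditionally,
whether they are files or directories.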

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-01 17:08:43 +02:00