We don't need to wait for whole bunch of discard candidates in fstrim, since
runtime discard will issue them in idle time.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* refs/heads/tmp-3f51ea2
Linux 4.4.133
x86/kexec: Avoid double free_page() upon do_kexec_load() failure
hfsplus: stop workqueue when fill_super() failed
cfg80211: limit wiphy names to 128 bytes
gpio: rcar: Add Runtime PM handling for interrupts
time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
dmaengine: ensure dmaengine helpers check valid callback
scsi: zfcp: fix infinite iteration on ERP ready list
scsi: sg: allocate with __GFP_ZERO in sg_build_indirect()
scsi: libsas: defer ata device eh commands to libata
s390: use expoline thunks in the BPF JIT
s390: extend expoline to BC instructions
s390: move spectre sysfs attribute code
s390/kernel: use expoline for indirect branches
s390/lib: use expoline for indirect branches
s390: move expoline assembler macros to a header
s390: add assembler macros for CPU alternatives
ext2: fix a block leak
tcp: purge write queue in tcp_connect_init()
sock_diag: fix use-after-free read in __sk_free
packet: in packet_snd start writing at link layer allocation
net: test tailroom before appending to linear skb
btrfs: fix reading stale metadata blocks after degraded raid1 mounts
btrfs: fix crash when trying to resume balance without the resume flag
Btrfs: fix xattr loss after power failure
ARM: 8772/1: kprobes: Prohibit kprobes on get_user functions
ARM: 8770/1: kprobes: Prohibit probing on optimized_callback
ARM: 8769/1: kprobes: Fix to use get_kprobe_ctlblk after irq-disabed
tick/broadcast: Use for_each_cpu() specially on UP kernels
ARM: 8771/1: kprobes: Prohibit kprobes on do_undefinstr
efi: Avoid potential crashes, fix the 'struct efi_pci_io_protocol_32' definition for mixed mode
s390: remove indirect branch from do_softirq_own_stack
s390/qdio: don't release memory in qdio_setup_irq()
s390/cpum_sf: ensure sample frequency of perf event attributes is non-zero
s390/qdio: fix access to uninitialized qdio_q fields
mm: don't allow deferred pages with NEED_PER_CPU_KM
powerpc/powernv: Fix NVRAM sleep in invalid context when crashing
procfs: fix pthread cross-thread naming if !PR_DUMPABLE
proc read mm's {arg,env}_{start,end} with mmap semaphore taken.
tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all}
cpufreq: intel_pstate: Enable HWP by default
signals: avoid unnecessary taking of sighand->siglock
mm: filemap: avoid unnecessary calls to lock_page when waiting for IO to complete during a read
mm: filemap: remove redundant code in do_read_cache_page
proc: meminfo: estimate available memory more conservatively
vmscan: do not force-scan file lru if its absolute size is small
powerpc: Don't preempt_disable() in show_cpuinfo()
cpuidle: coupled: remove unused define cpuidle_coupled_lock
powerpc/powernv: remove FW_FEATURE_OPALv3 and just use FW_FEATURE_OPAL
powerpc/powernv: Remove OPALv2 firmware define and references
powerpc/powernv: panic() on OPAL < V3
spi: pxa2xx: Allow 64-bit DMA
ALSA: control: fix a redundant-copy issue
ALSA: hda: Add Lenovo C50 All in one to the power_save blacklist
ALSA: usb: mixer: volume quirk for CM102-A+/102S+
usbip: usbip_host: fix bad unlock balance during stub_probe()
usbip: usbip_host: fix NULL-ptr deref and use-after-free errors
usbip: usbip_host: run rebind from exit when module is removed
usbip: usbip_host: delete device from busid_table after rebind
usbip: usbip_host: refine probe and disconnect debug msgs to be useful
kernel/exit.c: avoid undefined behaviour when calling wait4()
futex: futex_wake_op, fix sign_extend32 sign bits
pipe: cap initial pipe capacity according to pipe-max-size limit
l2tp: revert "l2tp: fix missing print session offset info"
Revert "ARM: dts: imx6qdl-wandboard: Fix audio channel swap"
lockd: lost rollback of set_grace_period() in lockd_down_net()
xfrm: fix xfrm_do_migrate() with AEAD e.g(AES-GCM)
futex: Remove duplicated code and fix undefined behaviour
futex: Remove unnecessary warning from get_futex_key
arm64: Add work around for Arm Cortex-A55 Erratum 1024718
arm64: introduce mov_q macro to move a constant into a 64-bit register
audit: move calcs after alloc and check when logging set loginuid
ALSA: timer: Call notifier in the same spinlock
sctp: delay the authentication for the duplicated cookie-echo chunk
sctp: fix the issue that the cookie-ack with auth can't get processed
tcp: ignore Fast Open on repair mode
bonding: do not allow rlb updates to invalid mac
tg3: Fix vunmap() BUG_ON() triggered from tg3_free_consistent().
sctp: use the old asoc when making the cookie-ack chunk in dupcook_d
sctp: handle two v4 addrs comparison in sctp_inet6_cmp_addr
r8169: fix powering up RTL8168h
qmi_wwan: do not steal interfaces from class drivers
openvswitch: Don't swap table in nlattr_set() after OVS_ATTR_NESTED is found
net: support compat 64-bit time in {s,g}etsockopt
net_sched: fq: take care of throttled flows before reuse
net/mlx4_en: Verify coalescing parameters are in range
net: ethernet: sun: niu set correct packet size in skb
llc: better deal with too small mtu
ipv4: fix memory leaks in udp_sendmsg, ping_v4_sendmsg
dccp: fix tasklet usage
bridge: check iface upper dev when setting master via ioctl
8139too: Use disable_irq_nosync() in rtl8139_poll_controller()
BACKPORT, FROMLIST: fscrypt: add Speck128/256 support
cgroup: Disable IRQs while holding css_set_lock
Revert "cgroup: Disable IRQs while holding css_set_lock"
cgroup: Disable IRQs while holding css_set_lock
ANDROID: proc: fix undefined behavior in proc_uid_base_readdir
x86: vdso: Fix leaky vdso linker with CC=clang.
ANDROID: build: cuttlefish: Upgrade clang to newer version.
ANDROID: build: cuttlefish: Upgrade clang to newer version.
ANDROID: build: cuttlefish: Fix path to clang.
UPSTREAM: dm bufio: avoid sleeping while holding the dm_bufio lock
ANDROID: sdcardfs: Don't d_drop in d_revalidate
Conflicts:
arch/arm64/include/asm/cputype.h
fs/ext4/crypto.c
fs/ext4/ext4.h
kernel/cgroup.c
mm/vmscan.c
Change-Id: Ic10c5722b6439af1cf423fd949c493f786764d7e
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
In order to avoid interfering normal r/w IO, let's turn down IO
priority of discard issued from background.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Now, we issue discard asynchronously in separated thread instead of in
checkpoint, after that, we won't encounter long latency in checkpoint
due to huge number of synchronous discard command handling, so, we don't
need to split checkpoint to do trim in batch, merge it and obsolete
related sysfs entry.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In the high utilization like over 80%, we don't expect huge # of large discard
commands, but do many small pending discards which affects FTL GCs a lot.
Let's issue them in that case.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
For non-atomic files, this patch adds an option to give nobarrier which
doesn't issue flush commands to the device.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The fstrim gathers huge number of large discard commands, and tries to issue
without IO awareness, which results in long user-perceive IO latencies on
READ, WRITE, and FLUSH in UFS. We've observed some of commands take several
seconds due to long discard latency.
This patch limits the maximum size to 2MB per candidate, and check IO congestion
when issuing them to disk.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlsOO14ACgkQONu9yGCS
aT4ulhAAhMVYSRa/cOFm0BHxSL/59WmJTa3Na8TJqkTrJy+LRluBiKCywyiMZknp
4rIffv4jcxcFNCpqYTjNTSStGLWCCkBLNSzxuzFv5M89Jdx4Gz1Ww1hzMESP3gxK
puHUewSJQm7qtVOiC2l4YcW3Q6nFK0kqbCWpSkHoGVfZoX9JS2P1V8n+KFZpUH1a
UyhVW48ainUpXfhSKJZ5xABiWYM2hcSq52RW1edNZvwuKwulZ+2EME26HgGCK7ff
WHzGHECE6Lem+iunR26J/QtbTo8LKEyU0F039X21E7FIxf33S0xyPx+MGjJfWBOo
Q6A23mAEWwEhlMomNKzdd/iUzSVlWSzKe8LJa7GI5G6BxftN8Z0TGTnKzIDkw++M
T6RfK03CP6c9rQ756d0fTPxdZh6ae9EN8WSot/Sbbc9SvGSfy6o4I8Y/uJygShmF
j13JfMweC+t7/6fyUqc5dcgY0Xy7LUFiWqfPxQj6axDiT82Mx2AvQaczrPUAKr1K
KQsetmyhHC+Cpy7ILrhUGYjEWlvQm11ZiFoX8BkocFLFWk736QA63iB7mOUpCOQR
SKLK00dF163GJdQC6nb4wCtyBxnCg4pSoP/72Z1foPtaSd3ccJ4CLsIE6GY5sP/I
sDlPnIlnzEDfDPIxtVfKC8e1JINP6awXwtoJJo6MnuCuP3LDb58=
=ogZQ
-----END PGP SIGNATURE-----
Merge 4.4.134 into android-4.4
Changes in 4.4.134
MIPS: ptrace: Expose FIR register through FP regset
MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs
KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"
affs_lookup(): close a race with affs_remove_link()
aio: fix io_destroy(2) vs. lookup_ioctx() race
ALSA: timer: Fix pause event notification
mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
libata: Blacklist some Sandisk SSDs for NCQ
libata: blacklist Micron 500IT SSD with MU01 firmware
xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
Revert "ipc/shm: Fix shmat mmap nil-page protection"
ipc/shm: fix shmat() nil address after round-down when remapping
kasan: fix memory hotplug during boot
kernel/sys.c: fix potential Spectre v1 issue
kernel/signal.c: avoid undefined behaviour in kill_something_info
xfs: remove racy hasattr check from attr ops
do d_instantiate/unlock_new_inode combinations safely
firewire-ohci: work around oversized DMA reads on JMicron controllers
NFSv4: always set NFS_LOCK_LOST when a lock is lost.
ALSA: hda - Use IS_REACHABLE() for dependency on input
ASoC: au1x: Fix timeout tests in au1xac97c_ac97_read()
kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
tracing/hrtimer: Fix tracing bugs by taking all clock bases and modes into account
PCI: Add function 1 DMA alias quirk for Marvell 9128
tools lib traceevent: Simplify pointer print logic and fix %pF
perf callchain: Fix attr.sample_max_stack setting
tools lib traceevent: Fix get_field_str() for dynamic strings
dm thin: fix documentation relative to low water mark threshold
nfs: Do not convert nfs_idmap_cache_timeout to jiffies
watchdog: sp5100_tco: Fix watchdog disable bit
kconfig: Don't leak main menus during parsing
kconfig: Fix automatic menu creation mem leak
kconfig: Fix expr_free() E_NOT leak
mac80211_hwsim: fix possible memory leak in hwsim_new_radio_nl()
ipmi/powernv: Fix error return code in ipmi_powernv_probe()
Btrfs: set plug for fsync
btrfs: Fix out of bounds access in btrfs_search_slot
Btrfs: fix scrub to repair raid6 corruption
scsi: fas216: fix sense buffer initialization
HID: roccat: prevent an out of bounds read in kovaplus_profile_activated()
jffs2: Fix use-after-free bug in jffs2_iget()'s error handling path
powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes
powerpc/numa: Ensure nodes initialized for hotplug
RDMA/mlx5: Avoid memory leak in case of XRCD dealloc failure
ntb_transport: Fix bug with max_mw_size parameter
ocfs2: return -EROFS to mount.ocfs2 if inode block is invalid
ocfs2/acl: use 'ip_xattr_sem' to protect getting extended attribute
ocfs2: return error when we attempt to access a dirty bh in jbd2
mm/mempolicy: fix the check of nodemask from user
mm/mempolicy: add nodes_empty check in SYSC_migrate_pages
asm-generic: provide generic_pmdp_establish()
mm: pin address_space before dereferencing it while isolating an LRU page
IB/ipoib: Fix for potential no-carrier state
x86/power: Fix swsusp_arch_resume prototype
firmware: dmi_scan: Fix handling of empty DMI strings
ACPI: processor_perflib: Do not send _PPC change notification if not ready
bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y
MIPS: TXx9: use IS_BUILTIN() for CONFIG_LEDS_CLASS
xen-netfront: Fix race between device setup and open
xen/grant-table: Use put_page instead of free_page
RDS: IB: Fix null pointer issue
arm64: spinlock: Fix theoretical trylock() A-B-A with LSE atomics
proc: fix /proc/*/map_files lookup
cifs: silence compiler warnings showing up with gcc-8.0.0
bcache: properly set task state in bch_writeback_thread()
bcache: fix for allocator and register thread race
bcache: fix for data collapse after re-attaching an attached device
bcache: return attach error when no cache set exist
tools/libbpf: handle issues with bpf ELF objects containing .eh_frames
locking/qspinlock: Ensure node->count is updated before initialising node
irqchip/gic-v3: Change pr_debug message to pr_devel
scsi: ufs: Enable quirk to ignore sending WRITE_SAME command
scsi: bnx2fc: Fix check in SCSI completion handler for timed out request
scsi: sym53c8xx_2: iterator underflow in sym_getsync()
scsi: mptfusion: Add bounds check in mptctl_hp_targetinfo()
scsi: qla2xxx: Avoid triggering undefined behavior in qla2x00_mbx_completion()
ARC: Fix malformed ARC_EMUL_UNALIGNED default
usb: gadget: f_uac2: fix bFirstInterface in composite gadget
usb: gadget: fsl_udc_core: fix ep valid checks
usb: dwc2: Fix dwc2_hsotg_core_init_disconnected()
selftests: memfd: add config fragment for fuse
scsi: storvsc: Increase cmd_per_lun for higher speed devices
scsi: aacraid: fix shutdown crash when init fails
scsi: qla4xxx: skip error recovery in case of register disconnect.
ARM: OMAP2+: timer: fix a kmemleak caused in omap_get_timer_dt
ARM: OMAP3: Fix prm wake interrupt for resume
ARM: OMAP1: clock: Fix debugfs_create_*() usage
NFC: llcp: Limit size of SDP URI
mac80211: round IEEE80211_TX_STATUS_HEADROOM up to multiple of 4
md raid10: fix NULL deference in handle_write_completed()
drm/exynos: fix comparison to bitshift when dealing with a mask
usb: musb: fix enumeration after resume
locking/xchg/alpha: Add unconditional memory barrier to cmpxchg()
md: raid5: avoid string overflow warning
kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE
powerpc/bpf/jit: Fix 32-bit JIT for seccomp_data access
s390/cio: fix return code after missing interrupt
s390/cio: clear timer when terminating driver I/O
ARM: OMAP: Fix dmtimer init for omap1
smsc75xx: fix smsc75xx_set_features()
regulatory: add NUL to request alpha2
locking/xchg/alpha: Fix xchg() and cmpxchg() memory ordering bugs
x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across CPU hotplug operations
media: dmxdev: fix error code for invalid ioctls
md/raid1: fix NULL pointer dereference
batman-adv: fix packet checksum in receive path
batman-adv: invalidate checksum on fragment reassembly
netfilter: ebtables: convert BUG_ONs to WARN_ONs
nvme-pci: Fix nvme queue cleanup if IRQ setup fails
clocksource/drivers/fsl_ftm_timer: Fix error return checking
r8152: fix tx packets accounting
virtio-gpu: fix ioctl and expose the fixed status to userspace.
dmaengine: rcar-dmac: fix max_chunk_size for R-Car Gen3
bcache: fix kcrashes with fio in RAID5 backend dev
sit: fix IFLA_MTU ignored on NEWLINK
gianfar: Fix Rx byte accounting for ndev stats
net/tcp/illinois: replace broken algorithm reference link
xen/pirq: fix error path cleanup when binding MSIs
Btrfs: send, fix issuing write op when processing hole in no data mode
selftests/powerpc: Skip the subpage_prot tests if the syscall is unavailable
KVM: PPC: Book3S HV: Fix VRMA initialization with 2MB or 1GB memory backing
watchdog: f71808e_wdt: Fix magic close handling
e1000e: Fix check_for_link return value with autoneg off
e1000e: allocate ring descriptors with dma_zalloc_coherent
usb: musb: call pm_runtime_{get,put}_sync before reading vbus registers
scsi: mpt3sas: Do not mark fw_event workqueue as WQ_MEM_RECLAIM
scsi: sd: Keep disk read-only when re-reading partition
fbdev: Fixing arbitrary kernel leak in case FBIOGETCMAP_SPARC in sbusfb_ioctl_helper().
xen: xenbus: use put_device() instead of kfree()
USB: OHCI: Fix NULL dereference in HCDs using HCD_LOCAL_MEM
netfilter: ebtables: fix erroneous reject of last rule
bnxt_en: Check valid VNIC ID in bnxt_hwrm_vnic_set_tpa().
workqueue: use put_device() instead of kfree()
ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu
sunvnet: does not support GSO for sctp
net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off
batman-adv: fix header size check in batadv_dbg_arp()
vti4: Don't count header length twice on tunnel setup
vti4: Don't override MTU passed on link creation via IFLA_MTU
perf/cgroup: Fix child event counting bug
RDMA/ucma: Correct option size check using optlen
mm/mempolicy.c: avoid use uninitialized preferred_node
selftests: ftrace: Add probe event argument syntax testcase
selftests: ftrace: Add a testcase for string type with kprobe_event
selftests: ftrace: Add a testcase for probepoint
batman-adv: fix multicast-via-unicast transmission with AP isolation
batman-adv: fix packet loss for broadcasted DHCP packets to a server
ARM: 8748/1: mm: Define vdso_start, vdso_end as array
net: qmi_wwan: add BroadMobi BM806U 2020:2033
net/usb/qmi_wwan.c: Add USB id for lt4120 modem
net-usb: add qmi_wwan if on lte modem wistron neweb d18q1
llc: properly handle dev_queue_xmit() return value
mm/kmemleak.c: wait for scan completion before disabling free
net: Fix untag for vlan packets without ethernet header
net: mvneta: fix enable of all initialized RXQs
sh: fix debug trap failure to process signals before return to user
x86/pgtable: Don't set huge PUD/PMD on non-leaf entries
fs/proc/proc_sysctl.c: fix potential page fault while unregistering sysctl table
swap: divide-by-zero when zero length swap file on ssd
sr: get/drop reference to device in revalidate and check_events
Force log to disk before reading the AGF during a fstrim
cpufreq: CPPC: Initialize shared perf capabilities of CPUs
scsi: aacraid: Insure command thread is not recursively stopped
dp83640: Ensure against premature access to PHY registers after reset
mm/ksm: fix interaction with THP
mm: fix races between address_space dereference and free in page_evicatable
Btrfs: bail out on error during replay_dir_deletes
Btrfs: fix NULL pointer dereference in log_dir_items
btrfs: Fix possible softlock on single core machines
ocfs2/dlm: don't handle migrate lockres if already in shutdown
sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning
KVM: VMX: raise internal error for exception during invalid protected mode state
fscache: Fix hanging wait on page discarded by writeback
sparc64: Make atomic_xchg() an inline function rather than a macro.
rtc: snvs: Fix usage of snvs_rtc_enable
net: bgmac: Fix endian access in bgmac_dma_tx_ring_free()
Bluetooth: btusb: Add USB ID 7392:a611 for Edimax EW-7611ULB
btrfs: tests/qgroup: Fix wrong tree backref level
Btrfs: fix copy_items() return value when logging an inode
btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers
xen/acpi: off by one in read_acpi_id()
ACPI: acpi_pad: Fix memory leak in power saving threads
powerpc/mpic: Check if cpu_possible() in mpic_physmask()
m68k: set dma and coherent masks for platform FEC ethernets
parisc/pci: Switch LBA PCI bus from Hard Fail to Soft Fail mode
hwmon: (nct6775) Fix writing pwmX_mode
rtc: hctosys: Ensure system time doesn't overflow time_t
powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer
powerpc/perf: Fix kernel address leak via sampling registers
tools/thermal: tmon: fix for segfault
selftests: Print the test we're running to /dev/kmsg
net/mlx5: Protect from command bit overflow
ath10k: Fix kernel panic while using worker (ath10k_sta_rc_update_wk)
ima: Fix Kconfig to select TPM 2.0 CRB interface
ima: Fallback to the builtin hash algorithm
virtio-net: Fix operstate for virtio when no VIRTIO_NET_F_STATUS
arm: dts: socfpga: fix GIC PPI warning
usb: dwc3: Update DWC_usb31 GTXFIFOSIZ reg fields
cpufreq: cppc_cpufreq: Fix cppc_cpufreq_init() failure path
clk: Don't show the incorrect clock phase
zorro: Set up z->dev.dma_mask for the DMA API
bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set
ACPICA: Events: add a return on failure from acpi_hw_register_read
ACPICA: acpi: acpica: fix acpi operand cache leak in nseval.c
i2c: mv64xxx: Apply errata delay only in standard mode
KVM: lapic: stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use
xhci: zero usb device slot_id member when disabling and freeing a xhci slot
MIPS: ath79: Fix AR724X_PLL_REG_PCIE_CONFIG offset
PCI: Restore config space on runtime resume despite being unbound
ipmi_ssif: Fix kernel panic at msg_done_handler
usb: dwc2: Fix interval type issue
usb: gadget: ffs: Let setup() return USB_GADGET_DELAYED_STATUS
usb: gadget: ffs: Execute copy_to_user() with USER_DS set
powerpc: Add missing prototype for arch_irq_work_raise()
ASoC: topology: create TLV data for dapm widgets
perf/core: Fix perf_output_read_group()
hwmon: (pmbus/max8688) Accept negative page register values
hwmon: (pmbus/adm1275) Accept negative page register values
cdrom: do not call check_disk_change() inside cdrom_open()
gfs2: Fix fallocate chunk size
usb: gadget: udc: change comparison to bitshift when dealing with a mask
usb: gadget: composite: fix incorrect handling of OS desc requests
x86/devicetree: Initialize device tree before using it
x86/devicetree: Fix device IRQ settings in DT
ALSA: vmaster: Propagate slave error
media: cx23885: Override 888 ImpactVCBe crystal frequency
media: cx23885: Set subdev host data to clk_freq pointer
media: s3c-camif: fix out-of-bounds array access
dmaengine: pl330: fix a race condition in case of threaded irqs
media: em28xx: USB bulk packet size fix
clk: rockchip: Prevent calculating mmc phase if clock rate is zero
enic: enable rq before updating rq descriptors
hwrng: stm32 - add reset during probe
staging: rtl8192u: return -ENOMEM on failed allocation of priv->oldaddr
rtc: tx4939: avoid unintended sign extension on a 24 bit shift
serial: xuartps: Fix out-of-bounds access through DT alias
serial: samsung: Fix out-of-bounds access through serial port index
serial: mxs-auart: Fix out-of-bounds access through serial port index
serial: imx: Fix out-of-bounds access through serial port index
serial: fsl_lpuart: Fix out-of-bounds access through DT alias
serial: arc_uart: Fix out-of-bounds access through DT alias
PCI: Add function 1 DMA alias quirk for Marvell 88SE9220
udf: Provide saner default for invalid uid / gid
media: cx25821: prevent out-of-bounds read on array card
clk: samsung: s3c2410: Fix PLL rates
clk: samsung: exynos5260: Fix PLL rates
clk: samsung: exynos5433: Fix PLL rates
clk: samsung: exynos5250: Fix PLL rates
clk: samsung: exynos3250: Fix PLL rates
crypto: sunxi-ss - Add MODULE_ALIAS to sun4i-ss
audit: return on memory error to avoid null pointer dereference
MIPS: Octeon: Fix logging messages with spurious periods after newlines
drm/rockchip: Respect page offset for PRIME mmap calls
x86/apic: Set up through-local-APIC mode on the boot CPU if 'noapic' specified
perf tests: Use arch__compare_symbol_names to compare symbols
perf report: Fix memory corruption in --branch-history mode --branch-history
selftests/net: fixes psock_fanout eBPF test case
netlabel: If PF_INET6, check sk_buff ip header version
scsi: lpfc: Fix issue_lip if link is disabled
scsi: lpfc: Fix soft lockup in lpfc worker thread during LIP testing
scsi: lpfc: Fix frequency of Release WQE CQEs
regulator: of: Add a missing 'of_node_put()' in an error handling path of 'of_regulator_match()'
ASoC: samsung: i2s: Ensure the RCLK rate is properly determined
Bluetooth: btusb: Add device ID for RTL8822BE
kdb: make "mdr" command repeat
s390/ftrace: use expoline for indirect branches
Linux 4.4.134
Change-Id: Iababaf9b89bc8d0437b95e1368d8b0a9126a178c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 116e5258e4115aca0c64ac0bf40ded3b353ed626 ]
Currently when UDF filesystem is recorded without uid / gid (ids are set
to -1), we will assign INVALID_[UG]ID to vfs inode unless user uses uid=
and gid= mount options. In such case filesystem could not be modified in
any way as VFS refuses to modify files with invalid ids (even by root).
This is confusing to users and not very useful default since such media
mode is generally used for removable media. Use overflow[ug]id instead
so that at least root can modify the filesystem.
Reported-by: Steve Kenton <skenton@ou.edu>
Reviewed-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 174d1232ebc84fcde8f5889d1171c9c7e74a10a7 ]
The chunk size of allocations in __gfs2_fallocate is calculated
incorrectly. The size can collapse, causing __gfs2_fallocate to
allocate one block at a time, which is very inefficient. This needs
fixing in two places:
In gfs2_quota_lock_check, always set ap->allowed to UINT_MAX to indicate
that there is no quota limit. This fixes callers that rely on
ap->allowed to be set even when quotas are off.
In __gfs2_fallocate, reset max_blks to UINT_MAX in each iteration of the
loop to make sure that allocation limits from one resource group won't
spill over into another resource group.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8434ec46c6e3232cebc25a910363b29f5c617820 ]
When logging an inode, at tree-log.c:copy_items(), if we call
btrfs_next_leaf() at the loop which checks for the need to log holes, we
need to make sure copy_items() returns the value 1 to its caller and
not 0 (on success). This is because the path the caller passed was
released and is now different from what is was before, and the caller
expects a return value of 0 to mean both success and that the path
has not changed, while a return value of 1 means both success and
signals the caller that it can not reuse the path, it has to perform
another tree search.
Even though this is a case that should not be triggered on normal
circumstances or very rare at least, its consequences can be very
unpredictable (especially when replaying a log tree).
Fixes: 16e7549f04 ("Btrfs: incompatible format change to remove hole extents")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3c0efdf03b2d127f0e40e30db4e7aa0429b1b79a ]
The extent tree of the test fs is like the following:
BTRFS info (device (null)): leaf 16327509003777336587 total ptrs 1 free space 3919
item 0 key (4096 168 4096) itemoff 3944 itemsize 51
extent refs 1 gen 1 flags 2
tree block key (68719476736 0 0) level 1
^^^^^^^
ref#0: tree block backref root 5
And it's using an empty tree for fs tree, so there is no way that its
level can be 1.
For REAL (created by mkfs) fs tree backref with no skinny metadata, the
result should look like:
item 3 key (30408704 EXTENT_ITEM 4096) itemoff 3845 itemsize 51
refs 1 gen 4 flags TREE_BLOCK
tree block key (256 INODE_ITEM 0) level 0
^^^^^^^
tree block backref root 5
Fix the level to 0, so it won't break later tree level checker.
Fixes: faa2dbf004 ("Btrfs: add sanity tests for new qgroup accounting code")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2c98425720233ae3e135add0c7e869b32913502f ]
If the fscache asynchronous write operation elects to discard a page that's
pending storage to the cache because the page would be over the store limit
then it needs to wake the page as someone may be waiting on completion of
the write.
The problem is that the store limit may be updated by a different
asynchronous operation - and so may miss the write - and that the store
limit may not even get updated until later by the netfs.
Fix the kernel hang by making fscache_write_op() mark as written any pages
that are over the limit.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bb34f24c7d2c98d0c81838a7700e6068325b17a0 ]
We should not handle migrate lockres if we are already in
'DLM_CTXT_IN_SHUTDOWN', as that will cause lockres remains after leaving
dlm domain. At last other nodes will get stuck into infinite loop when
requsting lock from us.
The problem is caused by concurrency umount between nodes. Before
receiveing N1's DLM_BEGIN_EXIT_DOMAIN_MSG, N2 has picked up N1 as the
migrate target. So N2 will continue sending lockres to N1 even though
N1 has left domain.
N1 N2 (owner)
touch file
access the file,
and get pr lock
begin leave domain and
pick up N1 as new owner
begin leave domain and
migrate all lockres done
begin migrate lockres to N1
end leave domain, but
the lockres left
unexpectedly, because
migrate task has passed
[piaojun@huawei.com: v3]
Link: http://lkml.kernel.org/r/5A9CBD19.5020107@huawei.com
Link: http://lkml.kernel.org/r/5A99F028.2090902@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1e1c50a929bc9e49bc3f9935b92450d9e69f8158 ]
do_chunk_alloc implements a loop checking whether there is a pending
chunk allocation and if so causes the caller do loop. Generally this
loop is executed only once, however testing with btrfs/072 on a single
core vm machines uncovered an extreme case where the system could loop
indefinitely. This is due to a missing cond_resched when loop which
doesn't give a chance to the previous chunk allocator finish its job.
The fix is to simply add the missing cond_resched.
Fixes: 6d74119f1a ("Btrfs: avoid taking the chunk_mutex in do_chunk_alloc")
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 80c0b4210a963e31529e15bf90519708ec947596 ]
0, 1 and <0 can be returned by btrfs_next_leaf(), and when <0 is
returned, path->nodes[0] could be NULL, log_dir_items lacks such a
check for <0 and we may run into a null pointer dereference panic.
Fixes: e02119d5a7 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b98def7ca6e152ee55e36863dddf6f41f12d1dc6 ]
If errors were returned by btrfs_next_leaf(), replay_dir_deletes needs
to bail out, otherwise @ret would be forced to be 0 after 'break;' and
the caller won't be aware of it.
Fixes: e02119d5a7 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8c81dd46ef3c416b3b95e3020fb90dbd44e6140b ]
Forcing the log to disk after reading the agf is wrong, we might be
calling xfs_log_force with XFS_LOG_SYNC with a metadata lock held.
This can cause a deadlock when racing a fstrim with a filesystem
shutdown.
The deadlock has been identified due a miscalculation bug in device-mapper
dm-thin, which returns lack of space to its users earlier than the device itself
really runs out of space, changing the device-mapper volume into an error state.
The problem happened while filling the filesystem with a single file,
triggering the bug in device-mapper, consequently causing an IO error
and shutting down the filesystem.
If such file is removed, and fstrim executed before the XFS finishes the
shut down process, the fstrim process will end up holding the buffer
lock, and going to sleep on the cil wait queue.
At this point, the shut down process will try to wake up all the threads
waiting on the cil wait queue, but for this, it will try to hold the
same buffer log already held my the fstrim, locking up the filesystem.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a0b0d1c345d0317efe594df268feb5ccc99f651e ]
proc_sys_link_fill_cache() does not take currently unregistering sysctl
tables into account, which might result into a page fault in
sysctl_follow_link() - add a check to fix it.
This bug has been present since v3.4.
Link: http://lkml.kernel.org/r/20180228013506.4915-1-danilokrummrich@dk-develop.de
Fixes: 0e47c99d7f ("sysctl: Replace root_list with links between sysctl_table_sets")
Signed-off-by: Danilo Krummrich <danilokrummrich@dk-develop.de>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Luis R . Rodriguez" <mcgrof@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ade7db991b47ab3016a414468164f4966bd08202 ]
This bug was fixed before, but came up again with the latest
compiler in another function:
fs/cifs/cifssmb.c: In function 'CIFSSMBSetEA':
fs/cifs/cifssmb.c:6362:3: error: 'strncpy' offset 8 is out of the bounds [0, 4] [-Werror=array-bounds]
strncpy(parm_data->list[0].name, ea_name, name_len);
Let's apply the same fix that was used for the other instances.
Fixes: b2a3ad9ca5 ("cifs: silence compiler warnings showing up with gcc-4.7.0")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ac7f1061c2c11bb8936b1b6a94cdb48de732f7a4 ]
Current code does:
if (sscanf(dentry->d_name.name, "%lx-%lx", start, end) != 2)
However sscanf() is broken garbage.
It silently accepts whitespace between format specifiers
(did you know that?).
It silently accepts valid strings which result in integer overflow.
Do not use sscanf() for any even remotely reliable parsing code.
OK
# readlink '/proc/1/map_files/55a23af39000-55a23b05b000'
/lib/systemd/systemd
broken
# readlink '/proc/1/map_files/ 55a23af39000-55a23b05b000'
/lib/systemd/systemd
broken
# readlink '/proc/1/map_files/55a23af39000-55a23b05b000 '
/lib/systemd/systemd
very broken
# readlink '/proc/1/map_files/1000000000000000055a23af39000-55a23b05b000'
/lib/systemd/systemd
Andrei said:
: This patch breaks criu. It was a bug in criu. And this bug is on a minor
: path, which works when memfd_create() isn't available. It is a reason why
: I ask to not backport this patch to stable kernels.
:
: In CRIU this bug can be triggered, only if this patch will be backported
: to a kernel which version is lower than v3.16.
Link: http://lkml.kernel.org/r/20171120212706.GA14325@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d984187e3a1ad7d12447a7ab2c43ce3717a2b5b3 ]
We should not reuse the dirty bh in jbd2 directly due to the following
situation:
1. When removing extent rec, we will dirty the bhs of extent rec and
truncate log at the same time, and hand them over to jbd2.
2. The bhs are submitted to jbd2 area successfully.
3. The write-back thread of device help flush the bhs to disk but
encounter write error due to abnormal storage link.
4. After a while the storage link become normal. Truncate log flush
worker triggered by the next space reclaiming found the dirty bh of
truncate log and clear its 'BH_Write_EIO' and then set it uptodate in
__ocfs2_journal_access():
ocfs2_truncate_log_worker
ocfs2_flush_truncate_log
__ocfs2_flush_truncate_log
ocfs2_replay_truncate_records
ocfs2_journal_access_di
__ocfs2_journal_access // here we clear io_error and set 'tl_bh' uptodata.
5. Then jbd2 will flush the bh of truncate log to disk, but the bh of
extent rec is still in error state, and unfortunately nobody will
take care of it.
6. At last the space of extent rec was not reduced, but truncate log
flush worker have given it back to globalalloc. That will cause
duplicate cluster problem which could be identified by fsck.ocfs2.
Sadly we can hardly revert this but set fs read-only in case of ruining
atomicity and consistency of space reclaim.
Link: http://lkml.kernel.org/r/5A6E8092.8090701@huawei.com
Fixes: acf8fdbe6a ("ocfs2: do not BUG if buffer not uptodate in __ocfs2_journal_access")
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 16c8d569f5704a84164f30ff01b29879f3438065 ]
The race between *set_acl and *get_acl will cause getting incomplete
xattr data as below:
processA processB
ocfs2_set_acl
ocfs2_xattr_set
__ocfs2_xattr_set_handle
ocfs2_get_acl_nolock
ocfs2_xattr_get_nolock:
processB may get incomplete xattr data if processA hasn't set_acl done.
So we should use 'ip_xattr_sem' to protect getting extended attribute in
ocfs2_get_acl_nolock(), as other processes could be changing it
concurrently.
Link: http://lkml.kernel.org/r/5A5DDCFF.7030001@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 025bcbde3634b2c9b316f227fed13ad6ad6817fb ]
If metadata is corrupted such as 'invalid inode block', we will get
failed by calling 'mount()' and then set filesystem readonly as below:
ocfs2_mount
ocfs2_initialize_super
ocfs2_init_global_system_inodes
ocfs2_iget
ocfs2_read_locked_inode
ocfs2_validate_inode_block
ocfs2_error
ocfs2_handle_error
ocfs2_set_ro_flag(osb, 0); // set readonly
In this situation we need return -EROFS to 'mount.ocfs2', so that user
can fix it by fsck. And then mount again. In addition, 'mount.ocfs2'
should be updated correspondingly as it only return 1 for all errno.
And I will post a patch for 'mount.ocfs2' too.
Link: http://lkml.kernel.org/r/5A4302FA.2010606@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Reviewed-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 762221f095e3932669093466aaf4b85ed9ad2ac1 ]
The raid6 corruption is that,
suppose that all disks can be read without problems and if the content
that was read out doesn't match its checksum, currently for raid6
btrfs at most retries twice,
- the 1st retry is to rebuild with all other stripes, it'll eventually
be a raid5 xor rebuild,
- if the 1st fails, the 2nd retry will deliberately fail parity p so
that it will do raid6 style rebuild,
however, the chances are that another non-parity stripe content also
has something corrupted, so that the above retries are not able to
return correct content.
We've fixed normal reads to rebuild raid6 correctly with more retries
in Patch "Btrfs: make raid6 rebuild retry more"[1], this is to fix
scrub to do the exactly same rebuild process.
[1]: https://patchwork.kernel.org/patch/10091755/
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9ea2c7c9da13c9073e371c046cbbc45481ecb459 ]
When modifying a tree where the root is at BTRFS_MAX_LEVEL - 1 then
the level variable is going to be 7 (this is the max height of the
tree). On the other hand btrfs_cow_block is always called with
"level + 1" as an index into the nodes and slots arrays. This leads to
an out of bounds access. Admittdely this will be benign since an OOB
access of the nodes array will likely read the 0th element from the
slots array, which in this case is going to be 0 (since we start CoW at
the top of the tree). The OOB access into the slots array in turn will
read the 0th and 1st values of the locks array, which would both be 0
at the time. However, this benign behavior relies on the fact that the
path being passed hasn't been initialised, if it has already been used to
query a btree then it could potentially have populated the nodes/slots arrays.
Fix it by explicitly checking if we are at level 7 (the maximum allowed
index in nodes/slots arrays) and explicitly call the CoW routine with
NULL for parent's node/slot.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Fixes-coverity-id: 711515
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 343e4fc1c60971b0734de26dbbd475d433950982 ]
Setting plug can merge adjacent IOs before dispatching IOs to the disk
driver.
Without plug, it'd not be a problem for single disk usecases, but for
multiple disks using raid profile, a large IO can be split to several
IOs of stripe length, and plug can be helpful to bring them together
for each disk so that we can save several disk access.
Moreover, fsync issues synchronous writes, so plug can really take
effect.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cbebc6ef4fc830f4040d4140bf53484812d5d5d9 ]
Since commit 57e62324e4 ("NFS: Store the legacy idmapper result in the
keyring") nfs_idmap_cache_timeout changed units from jiffies to seconds.
Unfortunately sysctl interface was not updated accordingly.
As a effect updating /proc/sys/fs/nfs/idmap_cache_timeout with some
value will incorrectly multiply this value by HZ.
Also reading /proc/sys/fs/nfs/idmap_cache_timeout will show real value
divided by HZ.
Fixes: 57e62324e4 ("NFS: Store the legacy idmapper result in the keyring")
Signed-off-by: Jan Chochol <jan@chochol.info>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit dce2630c7da73b0634686bca557cc8945cc450c8 ]
There are 2 comments in the NFSv4 code which suggest that
SIGLOST should possibly be sent to a process. In these
cases a lock has been lost.
The current practice is to set NFS_LOCK_LOST so that
read/write returns EIO when a lock is lost.
So change these comments to code when sets NFS_LOCK_LOST.
One case is when lock recovery after apparent server restart
fails with NFS4ERR_DENIED, NFS4ERR_RECLAIM_BAD, or
NFS4ERRO_RECLAIM_CONFLICT. The other case is when a lock
attempt as part of lease recovery fails with NFS4ERR_DENIED.
In an ideal world, these should not happen. However I have
a packet trace showing an NFSv4.1 session getting
NFS4ERR_BADSESSION after an extended network parition. The
NFSv4.1 client treats this like server reboot until/unless
it get NFS4ERR_NO_GRACE, in which case it switches over to
"nograce" recovery mode. In this network trace, the client
attempts to recover a lock and the server (incorrectly)
reports NFS4ERR_DENIED rather than NFS4ERR_NO_GRACE. This
leads to the ineffective comment and the client then
continues to write using the OPEN stateid.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1e2e547a93a00ebc21582c06ca3c6cfea2a309ee upstream.
For anything NFS-exported we do _not_ want to unlock new inode
before it has grown an alias; original set of fixes got the
ordering right, but missed the nasty complication in case of
lockdep being enabled - unlock_new_inode() does
lockdep_annotate_inode_mutex_key(inode)
which can only be done before anyone gets a chance to touch
->i_mutex. Unfortunately, flipping the order and doing
unlock_new_inode() before d_instantiate() opens a window when
mkdir can race with open-by-fhandle on a guessed fhandle, leading
to multiple aliases for a directory inode and all the breakage
that follows from that.
Correct solution: a new primitive (d_instantiate_new())
combining these two in the right order - lockdep annotate, then
d_instantiate(), then the rest of unlock_new_inode(). All
combinations of d_instantiate() with unlock_new_inode() should
be converted to that.
Cc: stable@kernel.org # 2.6.29 and later
Tested-by: Mike Marshall <hubcap@omnibond.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5a93790d4e2df73e30c965ec6e49be82fc3ccfce upstream.
xfs_attr_[get|remove]() have unlocked attribute fork checks to optimize
away a lock cycle in cases where the fork does not exist or is otherwise
empty. This check is not safe, however, because an attribute fork short
form to extent format conversion includes a transient state that causes
the xfs_inode_hasattr() check to fail. Specifically,
xfs_attr_shortform_to_leaf() creates an empty extent format attribute
fork and then adds the existing shortform attributes to it.
This means that lookup of an existing xattr can spuriously return
-ENOATTR when racing against a setxattr that causes the associated
format conversion. This was originally reproduced by an untar on a
particularly configured glusterfs volume, but can also be reproduced on
demand with properly crafted xattr requests.
The format conversion occurs under the exclusive ilock. xfs_attr_get()
and xfs_attr_remove() already have the proper locking and checks further
down in the functions to handle this situation correctly. Drop the
unlocked checks to avoid the spurious failure and rely on the existing
logic.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit baf10564fbb66ea222cae66fbff11c444590ffd9 upstream.
kill_ioctx() used to have an explicit RCU delay between removing the
reference from ->ioctx_table and percpu_ref_kill() dropping the refcount.
At some point that delay had been removed, on the theory that
percpu_ref_kill() itself contained an RCU delay. Unfortunately, that was
the wrong kind of RCU delay and it didn't care about rcu_read_lock() used
by lookup_ioctx(). As the result, we could get ctx freed right under
lookup_ioctx(). Tejun has fixed that in a6d7cff472e ("fs/aio: Add explicit
RCU grace period when freeing kioctx"); however, that fix is not enough.
Suppose io_destroy() from one thread races with e.g. io_setup() from another;
CPU1 removes the reference from current->mm->ioctx_table[...] just as CPU2
has picked it (under rcu_read_lock()). Then CPU1 proceeds to drop the
refcount, getting it to 0 and triggering a call of free_ioctx_users(),
which proceeds to drop the secondary refcount and once that reaches zero
calls free_ioctx_reqs(). That does
INIT_RCU_WORK(&ctx->free_rwork, free_ioctx);
queue_rcu_work(system_wq, &ctx->free_rwork);
and schedules freeing the whole thing after RCU delay.
In the meanwhile CPU2 has gotten around to percpu_ref_get(), bumping the
refcount from 0 to 1 and returned the reference to io_setup().
Tejun's fix (that queue_rcu_work() in there) guarantees that ctx won't get
freed until after percpu_ref_get(). Sure, we'd increment the counter before
ctx can be freed. Now we are out of rcu_read_lock() and there's nothing to
stop freeing of the whole thing. Unfortunately, CPU2 assumes that since it
has grabbed the reference, ctx is *NOT* going away until it gets around to
dropping that reference.
The fix is obvious - use percpu_ref_tryget_live() and treat failure as miss.
It's not costlier than what we currently do in normal case, it's safe to
call since freeing *is* delayed and it closes the race window - either
lookup_ioctx() comes before percpu_ref_kill() (in which case ctx->users
won't reach 0 until the caller of lookup_ioctx() drops it) or lookup_ioctx()
fails, ctx->users is unaffected and caller of lookup_ioctx() doesn't see
the object in question at all.
Cc: stable@kernel.org
Fixes: a6d7cff472e "fs/aio: Add explicit RCU grace period when freeing kioctx"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 30da870ce4a4e007c901858a96e9e394a1daa74a upstream.
we unlock the directory hash too early - if we are looking at secondary
link and primary (in another directory) gets removed just as we unlock,
we could have the old primary moved in place of the secondary, leaving
us to look into freed entry (and leaving our dentry with ->d_fsdata
pointing to a freed entry).
Cc: stable@vger.kernel.org # 2.4.4+
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlsJA10ACgkQONu9yGCS
aT4NqRAAr+4+KwFqbdUDDAdYMLgomybjLVNxbI80CvOTF24NIKfKIKUn+Q3e2qCE
11y2Q+PixE9qbujYPg+qoC3Xux+S6DAj9QOJPJpuJVQhBRRmnFugKlAq630kaoxx
VOPJx1x+244Q1OsAJMRDqEJEtMEFew/r0VGQ1yrXd9APYgc0KvDKHfjt8rXzGGuA
sdf5GsbxSxptMCF6nnUAGcyfuRBVIBW0v6NOEnj5m/K6f4oESQb+uKk7R8MO7m3U
kc2ggTALxA1u/0iAsfxScfaFkT865+2IxCz4i4N13PUmxuJJTDF0xshAOSdlrSxV
j8x7B+YmVaPgs63m2EyClQpVitqkcgyfiPZ0byWEcaKtuYXavcOO77aGB7W/QUSw
ZfGJeDhz0hkjOCSGD2LCx062clMSpqqZn20MUDyF32HiRl1mIf6prac/LBXphNHh
l+arXyzRk9rVTgtfbqcKBgi8h5n0LKzqbfD4f+8hrhv8q0i+9tNoM1lW8R+GL4RC
nXfCuhCEIEXbsfQIJeSkEp6AH8N9guMcbw9jOiji9HvNFQZj3RpfkuCHGGggBlwa
EiD3GzMhwFyJmIzWqdYCSGfCh6YI6FA7KpspOKhUKZKkHVDfJ7M+A8lBQmOZGRBQ
G44XJJvaKB7l/I0ux2S0C5CdcyBb7EMjD8tXXLnRjMEGjLoKpqM=
=s+Ms
-----END PGP SIGNATURE-----
Merge 4.4.133 into android-4.4
Changes in 4.4.133
8139too: Use disable_irq_nosync() in rtl8139_poll_controller()
bridge: check iface upper dev when setting master via ioctl
dccp: fix tasklet usage
ipv4: fix memory leaks in udp_sendmsg, ping_v4_sendmsg
llc: better deal with too small mtu
net: ethernet: sun: niu set correct packet size in skb
net/mlx4_en: Verify coalescing parameters are in range
net_sched: fq: take care of throttled flows before reuse
net: support compat 64-bit time in {s,g}etsockopt
openvswitch: Don't swap table in nlattr_set() after OVS_ATTR_NESTED is found
qmi_wwan: do not steal interfaces from class drivers
r8169: fix powering up RTL8168h
sctp: handle two v4 addrs comparison in sctp_inet6_cmp_addr
sctp: use the old asoc when making the cookie-ack chunk in dupcook_d
tg3: Fix vunmap() BUG_ON() triggered from tg3_free_consistent().
bonding: do not allow rlb updates to invalid mac
tcp: ignore Fast Open on repair mode
sctp: fix the issue that the cookie-ack with auth can't get processed
sctp: delay the authentication for the duplicated cookie-echo chunk
ALSA: timer: Call notifier in the same spinlock
audit: move calcs after alloc and check when logging set loginuid
arm64: introduce mov_q macro to move a constant into a 64-bit register
arm64: Add work around for Arm Cortex-A55 Erratum 1024718
futex: Remove unnecessary warning from get_futex_key
futex: Remove duplicated code and fix undefined behaviour
xfrm: fix xfrm_do_migrate() with AEAD e.g(AES-GCM)
lockd: lost rollback of set_grace_period() in lockd_down_net()
Revert "ARM: dts: imx6qdl-wandboard: Fix audio channel swap"
l2tp: revert "l2tp: fix missing print session offset info"
pipe: cap initial pipe capacity according to pipe-max-size limit
futex: futex_wake_op, fix sign_extend32 sign bits
kernel/exit.c: avoid undefined behaviour when calling wait4()
usbip: usbip_host: refine probe and disconnect debug msgs to be useful
usbip: usbip_host: delete device from busid_table after rebind
usbip: usbip_host: run rebind from exit when module is removed
usbip: usbip_host: fix NULL-ptr deref and use-after-free errors
usbip: usbip_host: fix bad unlock balance during stub_probe()
ALSA: usb: mixer: volume quirk for CM102-A+/102S+
ALSA: hda: Add Lenovo C50 All in one to the power_save blacklist
ALSA: control: fix a redundant-copy issue
spi: pxa2xx: Allow 64-bit DMA
powerpc/powernv: panic() on OPAL < V3
powerpc/powernv: Remove OPALv2 firmware define and references
powerpc/powernv: remove FW_FEATURE_OPALv3 and just use FW_FEATURE_OPAL
cpuidle: coupled: remove unused define cpuidle_coupled_lock
powerpc: Don't preempt_disable() in show_cpuinfo()
vmscan: do not force-scan file lru if its absolute size is small
proc: meminfo: estimate available memory more conservatively
mm: filemap: remove redundant code in do_read_cache_page
mm: filemap: avoid unnecessary calls to lock_page when waiting for IO to complete during a read
signals: avoid unnecessary taking of sighand->siglock
cpufreq: intel_pstate: Enable HWP by default
tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all}
proc read mm's {arg,env}_{start,end} with mmap semaphore taken.
procfs: fix pthread cross-thread naming if !PR_DUMPABLE
powerpc/powernv: Fix NVRAM sleep in invalid context when crashing
mm: don't allow deferred pages with NEED_PER_CPU_KM
s390/qdio: fix access to uninitialized qdio_q fields
s390/cpum_sf: ensure sample frequency of perf event attributes is non-zero
s390/qdio: don't release memory in qdio_setup_irq()
s390: remove indirect branch from do_softirq_own_stack
efi: Avoid potential crashes, fix the 'struct efi_pci_io_protocol_32' definition for mixed mode
ARM: 8771/1: kprobes: Prohibit kprobes on do_undefinstr
tick/broadcast: Use for_each_cpu() specially on UP kernels
ARM: 8769/1: kprobes: Fix to use get_kprobe_ctlblk after irq-disabed
ARM: 8770/1: kprobes: Prohibit probing on optimized_callback
ARM: 8772/1: kprobes: Prohibit kprobes on get_user functions
Btrfs: fix xattr loss after power failure
btrfs: fix crash when trying to resume balance without the resume flag
btrfs: fix reading stale metadata blocks after degraded raid1 mounts
net: test tailroom before appending to linear skb
packet: in packet_snd start writing at link layer allocation
sock_diag: fix use-after-free read in __sk_free
tcp: purge write queue in tcp_connect_init()
ext2: fix a block leak
s390: add assembler macros for CPU alternatives
s390: move expoline assembler macros to a header
s390/lib: use expoline for indirect branches
s390/kernel: use expoline for indirect branches
s390: move spectre sysfs attribute code
s390: extend expoline to BC instructions
s390: use expoline thunks in the BPF JIT
scsi: libsas: defer ata device eh commands to libata
scsi: sg: allocate with __GFP_ZERO in sg_build_indirect()
scsi: zfcp: fix infinite iteration on ERP ready list
dmaengine: ensure dmaengine helpers check valid callback
time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
gpio: rcar: Add Runtime PM handling for interrupts
cfg80211: limit wiphy names to 128 bytes
hfsplus: stop workqueue when fill_super() failed
x86/kexec: Avoid double free_page() upon do_kexec_load() failure
Linux 4.4.133
Change-Id: I0554b12889bc91add2a444da95f18d59c6fb9cdb
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 66072c29328717072fd84aaff3e070e3f008ba77 upstream.
syzbot is reporting ODEBUG messages at hfsplus_fill_super() [1]. This
is because hfsplus_fill_super() forgot to call cancel_delayed_work_sync().
As far as I can see, it is hfsplus_mark_mdb_dirty() from
hfsplus_new_inode() in hfsplus_fill_super() that calls
queue_delayed_work(). Therefore, I assume that hfsplus_new_inode() does
not fail if queue_delayed_work() was called, and the out_put_hidden_dir
label is the appropriate location to call cancel_delayed_work_sync().
[1] https://syzkaller.appspot.com/bug?id=a66f45e96fdbeb76b796bf46eb25ea878c42a6c9
Link: http://lkml.kernel.org/r/964a8b27-cd69-357c-fe78-76b066056201@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+4f2e5f086147d543ab03@syzkaller.appspotmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5aa1437d2d9a068c0334bd7c9dafa8ec4f97f13b upstream.
open file, unlink it, then use ioctl(2) to make it immutable or
append only. Now close it and watch the blocks *not* freed...
Immutable/append-only checks belong in ->setattr().
Note: the bug is old and backport to anything prior to 737f2e93b9
("ext2: convert to use the new truncate convention") will need
these checks lifted into ext2_setattr().
Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 02a3307aa9c20b4f6626255b028f07f6cfa16feb upstream.
If a btree block, aka. extent buffer, is not available in the extent
buffer cache, it'll be read out from the disk instead, i.e.
btrfs_search_slot()
read_block_for_search() # hold parent and its lock, go to read child
btrfs_release_path()
read_tree_block() # read child
Unfortunately, the parent lock got released before reading child, so
commit 5bdd3536cb ("Btrfs: Fix block generation verification race") had
used 0 as parent transid to read the child block. It forces
read_tree_block() not to check if parent transid is different with the
generation id of the child that it reads out from disk.
A simple PoC is included in btrfs/124,
0. A two-disk raid1 btrfs,
1. Right after mkfs.btrfs, block A is allocated to be device tree's root.
2. Mount this filesystem and put it in use, after a while, device tree's
root got COW but block A hasn't been allocated/overwritten yet.
3. Umount it and reload the btrfs module to remove both disks from the
global @fs_devices list.
4. mount -odegraded dev1 and write some data, so now block A is allocated
to be a leaf in checksum tree. Note that only dev1 has the latest
metadata of this filesystem.
5. Umount it and mount it again normally (with both disks), since raid1
can pick up one disk by the writer task's pid, if btrfs_search_slot()
needs to read block A, dev2 which does NOT have the latest metadata
might be read for block A, then we got a stale block A.
6. As parent transid is not checked, block A is marked as uptodate and
put into the extent buffer cache, so the future search won't bother
to read disk again, which means it'll make changes on this stale
one and make it dirty and flush it onto disk.
To avoid the problem, parent transid needs to be passed to
read_tree_block().
In order to get a valid parent transid, we need to hold the parent's
lock until finishing reading child.
This patch needs to be slightly adapted for stable kernels, the
&first_key parameter added to read_tree_block() is from 4.16+
(581c1760415c4). The fix is to replace 0 by 'gen'.
Fixes: 5bdd3536cb ("Btrfs: Fix block generation verification race")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 02ee654d3a04563c67bfe658a05384548b9bb105 upstream.
We set the BTRFS_BALANCE_RESUME flag in the btrfs_recover_balance()
only, which isn't called during the remount. So when resuming from
the paused balance we hit the bug:
kernel: kernel BUG at fs/btrfs/volumes.c:3890!
::
kernel: balance_kthread+0x51/0x60 [btrfs]
kernel: kthread+0x111/0x130
::
kernel: RIP: btrfs_balance+0x12e1/0x1570 [btrfs] RSP: ffffba7d0090bde8
Reproducer:
On a mounted filesystem:
btrfs balance start --full-balance /btrfs
btrfs balance pause /btrfs
mount -o remount,ro /dev/sdb /btrfs
mount -o remount,rw /dev/sdb /btrfs
To fix this set the BTRFS_BALANCE_RESUME flag in
btrfs_resume_balance_async().
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9a8fca62aacc1599fea8e813d01e1955513e4fad upstream.
If a file has xattrs, we fsync it, to ensure we clear the flags
BTRFS_INODE_NEEDS_FULL_SYNC and BTRFS_INODE_COPY_EVERYTHING from its
inode, the current transaction commits and then we fsync it (without
either of those bits being set in its inode), we end up not logging
all its xattrs. This results in deleting all xattrs when replying the
log after a power failure.
Trivial reproducer
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ touch /mnt/foobar
$ setfattr -n user.xa -v qwerty /mnt/foobar
$ xfs_io -c "fsync" /mnt/foobar
$ sync
$ xfs_io -c "pwrite -S 0xab 0 64K" /mnt/foobar
$ xfs_io -c "fsync" /mnt/foobar
<power failure>
$ mount /dev/sdb /mnt
$ getfattr --absolute-names --dump /mnt/foobar
<empty output>
$
So fix this by making sure all xattrs are logged if we log a file's inode
item and neither the flags BTRFS_INODE_NEEDS_FULL_SYNC nor
BTRFS_INODE_COPY_EVERYTHING were set in the inode.
Fixes: 36283bf777 ("Btrfs: fix fsync xattr loss in the fast fsync path")
Cc: <stable@vger.kernel.org> # 4.2+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1b3044e39a89cb1d4d5313da477e8dfea2b5232d upstream.
The PR_DUMPABLE flag causes the pid related paths of the proc file
system to be owned by ROOT.
The implementation of pthread_set/getname_np however needs access to
/proc/<pid>/task/<tid>/comm. If PR_DUMPABLE is false this
implementation is locked out.
This patch installs a special permission function for the file "comm"
that grants read and write access to all threads of the same group
regardless of the ownership of the inode. For all other threads the
function falls back to the generic inode permission check.
[akpm@linux-foundation.org: fix spello in comment]
Signed-off-by: Janis Danisevskis <jdanis@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Minfei Huang <mnfhuang@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Calvin Owens <calvinowens@fb.com>
Cc: Jann Horn <jann@thejh.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a3b609ef9f8b1dbfe97034ccad6cd3fe71fbe7ab upstream.
Only functions doing more than one read are modified. Consumeres
happened to deal with possibly changing data, but it does not seem like
a good thing to rely on.
Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jarod Wilson <jarod@redhat.com>
Cc: Jan Stancek <jstancek@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anshuman Khandual <anshuman.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 84ad5802a33a4964a49b8f7d24d80a214a096b19 upstream.
The MemAvailable item in /proc/meminfo is to give users a hint of how
much memory is allocatable without causing swapping, so it excludes the
zones' low watermarks as unavailable to userspace.
However, for a userspace allocation, kswapd will actually reclaim until
the free pages hit a combination of the high watermark and the page
allocator's lowmem protection that keeps a certain amount of DMA and
DMA32 memory from userspace as well.
Subtract the full amount we know to be unavailable to userspace from the
number of free pages when calculating MemAvailable.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 086e774a57fba4695f14383c0818994c0b31da7c upstream.
This is a patch that provides behavior that is more consistent, and
probably less surprising to users. I consider the change optional, and
welcome opinions about whether it should be applied.
By default, pipes are created with a capacity of 64 kiB. However,
/proc/sys/fs/pipe-max-size may be set smaller than this value. In this
scenario, an unprivileged user could thus create a pipe whose initial
capacity exceeds the limit. Therefore, it seems logical to cap the
initial pipe capacity according to the value of pipe-max-size.
The test program shown earlier in this patch series can be used to
demonstrate the effect of the change brought about with this patch:
# cat /proc/sys/fs/pipe-max-size
1048576
# sudo -u mtk ./test_F_SETPIPE_SZ 1
Initial pipe capacity: 65536
# echo 10000 > /proc/sys/fs/pipe-max-size
# cat /proc/sys/fs/pipe-max-size
16384
# sudo -u mtk ./test_F_SETPIPE_SZ 1
Initial pipe capacity: 16384
# ./test_F_SETPIPE_SZ 1
Initial pipe capacity: 65536
The last two executions of 'test_F_SETPIPE_SZ' show that pipe-max-size
caps the initial allocation for a new pipe for unprivileged users, but
not for privileged users.
Link: http://lkml.kernel.org/r/31dc7064-2a17-9c5b-1df1-4e3012ee992c@gmail.com
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: <socketpair@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Jens Axboe <axboe@fb.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3a2b19d1ee5633f76ae8a88da7bc039a5d1732aa upstream.
Commit efda760fe95ea ("lockd: fix lockd shutdown race") is incorrect,
it removes lockd_manager and disarm grace_period_end for init_net only.
If nfsd was started from another net namespace lockd_up_net() calls
set_grace_period() that adds lockd_manager into per-netns list
and queues grace_period_end delayed work.
These action should be reverted in lockd_down_net().
Otherwise it can lead to double list_add on after restart nfsd in netns,
and to use-after-free if non-disarmed delayed work will be executed after netns destroy.
Fixes: efda760fe95e ("lockd: fix lockd shutdown race")
Cc: stable@vger.kernel.org
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fscrypt currently only supports AES encryption. However, many low-end
mobile devices have older CPUs that don't have AES instructions, e.g.
the ARMv8 Cryptography Extensions. Currently, user data on such devices
is not encrypted at rest because AES is too slow, even when the NEON
bit-sliced implementation of AES is used. Unfortunately, it is
infeasible to encrypt these devices at all when AES is the only option.
Therefore, this patch updates fscrypt to support the Speck block cipher,
which was recently added to the crypto API. The C implementation of
Speck is not especially fast, but Speck can be implemented very
efficiently with general-purpose vector instructions, e.g. ARM NEON.
For example, on an ARMv7 processor, we measured the NEON-accelerated
Speck128/256-XTS at 69 MB/s for both encryption and decryption, while
AES-256-XTS with the NEON bit-sliced implementation was only 22 MB/s
encryption and 19 MB/s decryption.
There are multiple variants of Speck. This patch only adds support for
Speck128/256, which is the variant with a 128-bit block size and 256-bit
key size -- the same as AES-256. This is believed to be the most secure
variant of Speck, and it's only about 6% slower than Speck128/128.
Speck64/128 would be at least 20% faster because it has 20% rounds, and
it can be even faster on CPUs that can't efficiently do the 64-bit
operations needed for Speck128. However, Speck64's 64-bit block size is
not preferred security-wise. ARM NEON also supports the needed 64-bit
operations even on 32-bit CPUs, resulting in Speck128 being fast enough
for our targeted use cases so far.
The chosen modes of operation are XTS for contents and CTS-CBC for
filenames. These are the same modes of operation that fscrypt defaults
to for AES. Note that as with the other fscrypt modes, Speck will not
be used unless userspace chooses to use it. Nor are any of the existing
modes (which are all AES-based) being removed, of course.
We intentionally don't make CONFIG_FS_ENCRYPTION select
CONFIG_CRYPTO_SPECK, so people will have to enable Speck support
themselves if they need it. This is because we shouldn't bloat the
FS_ENCRYPTION dependencies with every new cipher, especially ones that
aren't recommended for most users. Moreover, CRYPTO_SPECK is just the
generic implementation, which won't be fast enough for many users; in
practice, they'll need to enable CRYPTO_SPECK_NEON to get acceptable
performance.
More details about our choice of Speck can be found in our patches that
added Speck to the crypto API, and the follow-on discussion threads.
We're planning a publication that explains the choice in more detail.
But briefly, we can't use ChaCha20 as we previously proposed, since it
would be insecure to use a stream cipher in this context, with potential
IV reuse during writes on f2fs and/or on wear-leveling flash storage.
We also evaluated many other lightweight and/or ARX-based block ciphers
such as Chaskey-LTS, RC5, LEA, CHAM, Threefish, RC6, NOEKEON, SPARX, and
XTEA. However, all had disadvantages vs. Speck, such as insufficient
performance with NEON, much less published cryptanalysis, or an
insufficient security level. Various design choices in Speck make it
perform better with NEON than competing ciphers while still having a
security margin similar to AES, and in the case of Speck128 also the
same available security levels. Unfortunately, Speck does have some
political baggage attached -- it's an NSA designed cipher, and was
rejected from an ISO standard (though for context, as far as I know none
of the above-mentioned alternatives are ISO standards either).
Nevertheless, we believe it is a good solution to the problem from a
technical perspective.
Certain algorithms constructed from ChaCha or the ChaCha permutation,
such as MEM (Masked Even-Mansour) or HPolyC, may also meet our
performance requirements. However, these are new constructions that
need more time to receive the cryptographic review and acceptance needed
to be confident in their security. HPolyC hasn't been published yet,
and we are concerned that MEM makes stronger assumptions about the
underlying permutation than the ChaCha stream cipher does. In contrast,
the XTS mode of operation is relatively well accepted, and Speck has
over 70 cryptanalysis papers. Of course, these ChaCha-based algorithms
can still be added later if they become ready.
The best known attack on Speck128/256 is a differential cryptanalysis
attack on 25 of 34 rounds with 2^253 time complexity and 2^125 chosen
plaintexts, i.e. only marginally faster than brute force. There is no
known attack on the full 34 rounds.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry-picked from commit 12d28f79558f2e987c5f3817f89e1ccc0f11a7b5
https://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt.git master)
(dropped Documentation/filesystems/fscrypt.rst change)
(fixed merge conflict in fs/crypto/keyinfo.c)
(also ported change to fs/ext4/, which isn't using fs/crypto/ in this
kernel version)
Change-Id: I62c632044dfd06a2c5b74c2fb058f9c3b8af0add
Signed-off-by: Eric Biggers <ebiggers@google.com>
When uid_base_stuff has no entries, proc_uid_base_readdir tries to
compute an address before the start of the array. Revise this check to
use uid_base_stuff + nents instead, which makes the code valid
regardless of array size.
Bug: 80158484
Test: No more compiler warning with CONFIG_CPU_FREQ_TIMES=n
Change-Id: I6e55b27c3ba8210cee194f6d27bbd62c0b263796
Signed-off-by: Connor O'Brien <connoro@google.com>