Commit graph

2621 commits

Author SHA1 Message Date
David Keitel
f2b1fed1bd Merge remote-tracking branch 'lsk-44/linux-linaro-lsk-v4.4' into 44rc2
* lsk-44/linux-linaro-lsk-v4.4:
  Linux 4.4.3
  modules: fix modparam async_probe request
  module: wrapper for symbol name.
  itimers: Handle relative timers with CONFIG_TIME_LOW_RES proper
  posix-timers: Handle relative timers with CONFIG_TIME_LOW_RES proper
  timerfd: Handle relative timers with CONFIG_TIME_LOW_RES proper
  prctl: take mmap sem for writing to protect against others
  xfs: log mount failures don't wait for buffers to be released
  Revert "xfs: clear PF_NOFREEZE for xfsaild kthread"
  xfs: inode recovery readahead can race with inode buffer creation
  libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct
  ovl: setattr: check permissions before copy-up
  ovl: root: copy attr
  ovl: check dentry positiveness in ovl_cleanup_whiteouts()
  ovl: use a minimal buffer in ovl_copy_xattr
  ovl: allow zero size xattr
  futex: Drop refcount if requeue_pi() acquired the rtmutex
  devm_memremap_release(): fix memremap'd addr handling
  ipc/shm: handle removed segments gracefully in shm_mmap()
  intel_scu_ipcutil: underflow in scu_reg_access()
  mm,thp: khugepaged: call pte flush at the time of collapse
  dump_stack: avoid potential deadlocks
  radix-tree: fix oops after radix_tree_iter_retry
  drivers/hwspinlock: fix race between radix tree insertion and lookup
  radix-tree: fix race in gang lookup
  MAINTAINERS: return arch/sh to maintained state, with new maintainers
  memcg: only free spare array when readers are done
  numa: fix /proc/<pid>/numa_maps for hugetlbfs on s390
  fs/hugetlbfs/inode.c: fix bugs in hugetlb_vmtruncate_list()
  scripts/bloat-o-meter: fix python3 syntax error
  dma-debug: switch check from _text to _stext
  m32r: fix m32104ut_defconfig build fail
  xhci: Fix list corruption in urb dequeue at host removal
  Revert "xhci: don't finish a TD if we get a short-transfer event mid TD"
  iommu/vt-d: Clear PPR bit to ensure we get more page request interrupts
  iommu/vt-d: Fix 64-bit accesses to 32-bit DMAR_GSTS_REG
  iommu/vt-d: Fix mm refcounting to hold mm_count not mm_users
  iommu/amd: Correct the wrong setting of alias DTE in do_attach
  iommu/vt-d: Don't skip PCI devices when disabling IOTLB
  Input: vmmouse - fix absolute device registration
  string_helpers: fix precision loss for some inputs
  Input: i8042 - add Fujitsu Lifebook U745 to the nomux list
  Input: elantech - mark protocols v2 and v3 as semi-mt
  mm: fix regression in remap_file_pages() emulation
  mm: replace vma_lock_anon_vma with anon_vma_lock_read/write
  mm: fix mlock accouting
  libnvdimm: fix namespace object confusion in is_uuid_busy()
  mm: soft-offline: check return value in second __get_any_page() call
  perf kvm record/report: 'unprocessable sample' error while recording/reporting guest data
  KVM: PPC: Fix ONE_REG AltiVec support
  KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8
  KVM: arm/arm64: Fix reference to uninitialised VGIC
  arm64: dma-mapping: fix handling of devices registered before arch_initcall
  ARM: OMAP2+: Fix ppa_zero_params and ppa_por_params for rodata
  ARM: OMAP2+: Fix save_secure_ram_context for rodata
  ARM: OMAP2+: Fix l2dis_3630 for rodata
  ARM: OMAP2+: Fix l2_inv_api_params for rodata
  ARM: OMAP2+: Fix wait_dll_lock_timed for rodata
  ARM: dts: at91: sama5d4ek: add phy address and IRQ for macb0
  ARM: dts: at91: sama5d4 xplained: fix phy0 IRQ type
  ARM: dts: at91: sama5d4: fix instance id of DBGU
  ARM: dts: at91: sama5d4 xplained: properly mux phy interrupt
  ARM: dts: omap5-board-common: enable rtc and charging of backup battery
  ARM: dts: Fix omap5 PMIC control lines for RTC writes
  ARM: dts: Fix wl12xx missing clocks that cause hangs
  ARM: nomadik: fix up SD/MMC DT settings
  ARM: 8517/1: ICST: avoid arithmetic overflow in icst_hz()
  ARM: 8519/1: ICST: try other dividends than 1
  arm64: mm: avoid calling apply_to_page_range on empty range
  ARM: mvebu: remove duplicated regulator definition in Armada 388 GP
  powerpc/ioda: Set "read" permission when "write" is set
  powerpc/powernv: Fix stale PE primary bus
  powerpc/eeh: Fix stale cached primary bus
  powerpc/eeh: Fix PE location code
  SUNRPC: Fixup socket wait for memory
  udf: Check output buffer length when converting name to CS0
  udf: Prevent buffer overrun with multi-byte characters
  udf: limit the maximum number of indirect extents in a row
  pNFS/flexfiles: Fix an XDR encoding bug in layoutreturn
  nfs: Fix race in __update_open_stateid()
  pNFS/flexfiles: Fix an Oopsable typo in ff_mirror_match_fh()
  NFS: Fix attribute cache revalidation
  cifs: fix erroneous return value
  cifs_dbg() outputs an uninitialized buffer in cifs_readdir()
  cifs: fix race between call_async() and reconnect()
  cifs: Ratelimit kernel log messages
  iio: inkern: fix a NULL dereference on error
  iio: pressure: mpl115: fix temperature offset sign
  iio: light: acpi-als: Report data as processed
  iio: dac: mcp4725: set iio name property in sysfs
  iio: add IIO_TRIGGER dependency to STK8BA50
  iio: add HAS_IOMEM dependency to VF610_ADC
  iio-light: Use a signed return type for ltr501_match_samp_freq()
  iio:adc:ti_am335x_adc Fix buffered mode by identifying as software buffer.
  iio: adis_buffer: Fix out-of-bounds memory access
  scsi: fix soft lockup in scsi_remove_target() on module removal
  SCSI: Add Marvell Console to VPD blacklist
  scsi_dh_rdac: always retry MODE SELECT on command lock violation
  drivers/scsi/sg.c: mark VMA as VM_IO to prevent migration
  SCSI: fix crashes in sd and sr runtime PM
  iscsi-target: Fix potential dead-lock during node acl delete
  scsi: add Synology to 1024 sector blacklist
  klist: fix starting point removed bug in klist iterators
  tracepoints: Do not trace when cpu is offline
  tracing: Fix freak link error caused by branch tracer
  perf tools: tracepoint_error() can receive e=NULL, robustify it
  tools lib traceevent: Fix output of %llu for 64 bit values read on 32 bit machines
  ptrace: use fsuid, fsgid, effective creds for fs access checks
  Btrfs: fix direct IO requests not reporting IO error to user space
  Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl
  Btrfs: fix page reading in extent_same ioctl leading to csum errors
  Btrfs: fix invalid page accesses in extent_same (dedup) ioctl
  btrfs: properly set the termination value of ctx->pos in readdir
  Revert "btrfs: clear PF_NOFREEZE in cleaner_kthread()"
  Btrfs: fix fitrim discarding device area reserved for boot loader's use
  btrfs: handle invalid num_stripes in sys_array
  ext4: don't read blocks from disk after extents being swapped
  ext4: fix potential integer overflow
  ext4: fix scheduling in atomic on group checksum failure
  serial: omap: Prevent DoS using unprivileged ioctl(TIOCSRS485)
  serial: 8250_pci: Add Intel Broadwell ports
  tty: Add support for PCIe WCH382 2S multi-IO card
  pty: make sure super_block is still valid in final /dev/tty close
  pty: fix possible use after free of tty->driver_data
  staging/speakup: Use tty_ldisc_ref() for paste kworker
  phy: twl4030-usb: Fix unbalanced pm_runtime_enable on module reload
  phy: twl4030-usb: Relase usb phy on unload
  ALSA: seq: Fix double port list deletion
  ALSA: seq: Fix leak of pool buffer at concurrent writes
  ALSA: pcm: Fix rwsem deadlock for non-atomic PCM stream
  ALSA: hda - Cancel probe work instead of flush at remove
  x86/mm: Fix vmalloc_fault() to handle large pages properly
  x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache()
  x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable
  x86/mm/pat: Avoid truncation when converting cpa->numpages to address
  x86/mm: Fix types used in pgprot cacheability flags translations
  Linux 4.4.2
  HID: multitouch: fix input mode switching on some Elan panels
  mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress
  zsmalloc: fix migrate_zspage-zs_free race condition
  zram: don't call idr_remove() from zram_remove()
  zram: try vmalloc() after kmalloc()
  zram/zcomp: use GFP_NOIO to allocate streams
  rtlwifi: rtl8821ae: Fix 5G failure when EEPROM is incorrectly encoded
  rtlwifi: rtl8821ae: Fix errors in parameter initialization
  crypto: marvell/cesa - fix test in mv_cesa_dev_dma_init()
  crypto: atmel-sha - remove calls of clk_prepare() from atomic contexts
  crypto: atmel-sha - fix atmel_sha_remove()
  crypto: algif_skcipher - Do not set MAY_BACKLOG on the async path
  crypto: algif_skcipher - Do not dereference ctx without socket lock
  crypto: algif_skcipher - Do not assume that req is unchanged
  crypto: user - lock crypto_alg_list on alg dump
  EVM: Use crypto_memneq() for digest comparisons
  crypto: algif_hash - wait for crypto_ahash_init() to complete
  crypto: shash - Fix has_key setting
  crypto: chacha20-ssse3 - Align stack pointer to 64 bytes
  crypto: caam - make write transactions bufferable on PPC platforms
  crypto: algif_skcipher - sendmsg SG marking is off by one
  crypto: algif_skcipher - Load TX SG list after waiting
  crypto: crc32c - Fix crc32c soft dependency
  crypto: algif_skcipher - Fix race condition in skcipher_check_key
  crypto: algif_hash - Fix race condition in hash_check_key
  crypto: af_alg - Forbid bind(2) when nokey child sockets are present
  crypto: algif_skcipher - Remove custom release parent function
  crypto: algif_hash - Remove custom release parent function
  crypto: af_alg - Allow af_af_alg_release_parent to be called on nokey path
  ahci: Intel DNV device IDs SATA
  libata: disable forced PORTS_IMPL for >= AHCI 1.3
  crypto: algif_skcipher - Add key check exception for cipher_null
  crypto: skcipher - Add crypto_skcipher_has_setkey
  crypto: algif_hash - Require setkey before accept(2)
  crypto: hash - Add crypto_ahash_has_setkey
  crypto: algif_skcipher - Add nokey compatibility path
  crypto: af_alg - Add nokey compatibility path
  crypto: af_alg - Fix socket double-free when accept fails
  crypto: af_alg - Disallow bind/setkey/... after accept(2)
  crypto: algif_skcipher - Require setkey before accept(2)
  sched: Fix crash in sched_init_numa()
  ext4 crypto: add missing locking for keyring_key access
  iommu/io-pgtable-arm: Ensure we free the final level on teardown
  tty: Fix unsafe ldisc reference via ioctl(TIOCGETD)
  tty: Retry failed reopen if tty teardown in-progress
  tty: Wait interruptibly for tty lock on reopen
  n_tty: Fix unsafe reference to "other" ldisc
  usb: xhci: apply XHCI_PME_STUCK_QUIRK to Intel Broxton-M platforms
  usb: xhci: handle both SSIC ports in PME stuck quirk
  usb: phy: msm: fix error handling in probe.
  usb: cdc-acm: send zero packet for intel 7260 modem
  usb: cdc-acm: handle unlinked urb in acm read callback
  USB: option: fix Cinterion AHxx enumeration
  USB: serial: option: Adding support for Telit LE922
  USB: cp210x: add ID for IAI USB to RS485 adaptor
  USB: serial: ftdi_sio: add support for Yaesu SCU-18 cable
  usb: hub: do not clear BOS field during reset device
  USB: visor: fix null-deref at probe
  USB: serial: visor: fix crash on detecting device without write_urbs
  ASoC: rt5645: fix the shift bit of IN1 boost
  saa7134-alsa: Only frees registered sound cards
  ALSA: dummy: Implement timer backend switching more safely
  ALSA: hda - Fix bad dereference of jack object
  ALSA: hda - Fix speaker output from VAIO AiO machines
  Revert "ALSA: hda - Fix noise on Gigabyte Z170X mobo"
  ALSA: hda - Fix static checker warning in patch_hdmi.c
  ALSA: hda - Add fixup for Mac Mini 7,1 model
  ALSA: timer: Fix race between stop and interrupt
  ALSA: timer: Fix wrong instance passed to slave callbacks
  ALSA: timer: Fix race at concurrent reads
  ALSA: timer: Fix link corruption due to double start or stop
  ALSA: timer: Fix leftover link at closing
  ALSA: timer: Code cleanup
  ALSA: seq: Fix lockdep warnings due to double mutex locks
  ALSA: seq: Fix race at closing in virmidi driver
  ALSA: seq: Fix yet another races among ALSA timer accesses
  ASoC: dpcm: fix the BE state on hw_free
  ALSA: pcm: Fix potential deadlock in OSS emulation
  ALSA: hda/realtek - Support Dell headset mode for ALC225
  ALSA: hda/realtek - Support headset mode for ALC225
  ALSA: hda/realtek - New codec support of ALC225
  ALSA: rawmidi: Fix race at copying & updating the position
  ALSA: rawmidi: Remove kernel WARNING for NULL user-space buffer check
  ALSA: rawmidi: Make snd_rawmidi_transmit() race-free
  ALSA: seq: Degrade the error message for too many opens
  ALSA: seq: Fix incorrect sanity check at snd_seq_oss_synth_cleanup()
  ALSA: dummy: Disable switching timer backend via sysfs
  ALSA: compress: Disable GET_CODEC_CAPS ioctl for some architectures
  ALSA: hda - disable dynamic clock gating on Broxton before reset
  ALSA: Add missing dependency on CONFIG_SND_TIMER
  ALSA: bebob: Use a signed return type for get_formation_index
  ALSA: usb-audio: avoid freeing umidi object twice
  ALSA: usb-audio: Add native DSD support for PS Audio NuWave DAC
  ALSA: usb-audio: Fix OPPO HA-1 vendor ID
  ALSA: usb-audio: Add quirk for Microsoft LifeCam HD-6000
  ALSA: usb-audio: Fix TEAC UD-501/UD-503/NT-503 usb delay
  hrtimer: Handle remaining time proper for TIME_LOW_RES
  md/raid: only permit hot-add of compatible integrity profiles
  media: i2c: Don't export ir-kbd-i2c module alias
  parisc: Fix __ARCH_SI_PREAMBLE_SIZE
  parisc: Protect huge page pte changes with spinlocks
  printk: do cond_resched() between lines while outputting to consoles
  tracing/stacktrace: Show entire trace if passed in function not found
  tracing: Fix stacktrace skip depth in trace_buffer_unlock_commit_regs()
  PCI: Fix minimum allocation address overwrite
  PCI: host: Mark PCIe/PCI (MSI) IRQ cascade handlers as IRQF_NO_THREAD
  mtd: nand: assign reasonable default name for NAND drivers
  wlcore/wl12xx: spi: fix NULL pointer dereference (Oops)
  wlcore/wl12xx: spi: fix oops on firmware load
  ocfs2/dlm: clear refmap bit of recovery lock while doing local recovery cleanup
  ocfs2/dlm: ignore cleaning the migration mle that is inuse
  ALSA: hda - Implement loopback control switch for Realtek and other codecs
  block: fix bio splitting on max sectors
  base/platform: Fix platform drivers with no probe callback
  HID: usbhid: fix recursive deadlock
  ocfs2: NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lock
  block: split bios to max possible length
  NFSv4.1/pnfs: Fixup an lo->plh_block_lgets imbalance in layoutreturn
  crypto: sun4i-ss - add missing statesize
  Linux 4.4.1
  arm64: kernel: fix architected PMU registers unconditional access
  arm64: kernel: enforce pmuserenr_el0 initialization and restore
  arm64: mm: ensure that the zero page is visible to the page table walker
  arm64: Clear out any singlestep state on a ptrace detach operation
  powerpc/module: Handle R_PPC64_ENTRY relocations
  scripts/recordmcount.pl: support data in text section on powerpc
  powerpc: Make {cmp}xchg* and their atomic_ versions fully ordered
  powerpc: Make value-returning atomics fully ordered
  powerpc/tm: Check for already reclaimed tasks
  batman-adv: Drop immediate orig_node free function
  batman-adv: Drop immediate batadv_hard_iface free function
  batman-adv: Drop immediate neigh_ifinfo free function
  batman-adv: Drop immediate batadv_neigh_node free function
  batman-adv: Drop immediate batadv_orig_ifinfo free function
  batman-adv: Avoid recursive call_rcu for batadv_nc_node
  batman-adv: Avoid recursive call_rcu for batadv_bla_claim
  team: Replace rcu_read_lock with a mutex in team_vlan_rx_kill_vid
  net/mlx5_core: Fix trimming down IRQ number
  bridge: fix lockdep addr_list_lock false positive splat
  ipv6: update skb->csum when CE mark is propagated
  net: bpf: reject invalid shifts
  phonet: properly unshare skbs in phonet_rcv()
  dwc_eth_qos: Fix dma address for multi-fragment skbs
  bonding: Prevent IPv6 link local address on enslaved devices
  net: preserve IP control block during GSO segmentation
  udp: disallow UFO for sockets with SO_NO_CHECK option
  net: pktgen: fix null ptr deref in skb allocation
  sched,cls_flower: set key address type when present
  tcp_yeah: don't set ssthresh below 2
  ipv6: tcp: add rcu locking in tcp_v6_send_synack()
  net: sctp: prevent writes to cookie_hmac_alg from accessing invalid memory
  vxlan: fix test which detect duplicate vxlan iface
  unix: properly account for FDs passed over unix sockets
  xhci: refuse loading if nousb is used
  usb: core: lpm: fix usb3_hardware_lpm sysfs node
  USB: cp210x: add ID for ELV Marble Sound Board 1
  rtlwifi: fix memory leak for USB device
  ASoC: compress: Fix compress device direction check
  ASoC: wm5110: Fix PGA clear when disabling DRE
  ALSA: timer: Handle disconnection more safely
  ALSA: hda - Flush the pending probe work at remove
  ALSA: hda - Fix missing module loading with model=generic option
  ALSA: hda - Fix bass pin fixup for ASUS N550JX
  ALSA: control: Avoid kernel warnings from tlv ioctl with numid 0
  ALSA: hrtimer: Fix stall by hrtimer_cancel()
  ALSA: pcm: Fix snd_pcm_hw_params struct copy in compat mode
  ALSA: seq: Fix snd_seq_call_port_info_ioctl in compat mode
  ALSA: hda - Add fixup for Dell Latitidue E6540
  ALSA: timer: Fix double unlink of active_list
  ALSA: timer: Fix race among timer ioctls
  ALSA: hda - fix the headset mic detection problem for a Dell laptop
  ALSA: timer: Harden slave timer list handling
  ALSA: usb-audio: Fix mixer ctl regression of Native Instrument devices
  ALSA: hda - Fix white noise on Dell Latitude E5550
  ALSA: seq: Fix race at timer setup and close
  ALSA: usb-audio: Avoid calling usb_autopm_put_interface() at disconnect
  ALSA: seq: Fix missing NULL check at remove_events ioctl
  ALSA: hda - Fixup inverted internal mic for Lenovo E50-80
  ALSA: usb: Add native DSD support for Oppo HA-1
  x86/mm: Improve switch_mm() barrier comments
  x86/mm: Add barriers and document switch_mm()-vs-flush synchronization
  x86/boot: Double BOOT_HEAP_SIZE to 64KB
  x86/reboot/quirks: Add iMac10,1 to pci_reboot_dmi_table[]
  kvm: x86: Fix vmwrite to SECONDARY_VM_EXEC_CONTROL
  KVM: x86: correctly print #AC in traces
  KVM: x86: expose MSR_TSC_AUX to userspace
  x86/xen: don't reset vcpu_info on a cancelled suspend
  KEYS: Fix keyring ref leak in join_session_keyring()

Conflicts:
	arch/arm64/kernel/perf_event.c
	drivers/scsi/sd.c
	sound/core/compress_offload.c

Change-Id: I9f77fe42aaae249c24cd6e170202110ab1426878
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
2016-03-23 20:51:00 -07:00
Eryu Guan
bbfe21c87b ext4: don't read blocks from disk after extents being swapped
commit bcff24887d00bce102e0857d7b0a8c44a40f53d1 upstream.

I notice ext4/307 fails occasionally on ppc64 host, reporting md5
checksum mismatch after moving data from original file to donor file.

The reason is that move_extent_per_page() calls __block_write_begin()
and block_commit_write() to write saved data from original inode blocks
to donor inode blocks, but __block_write_begin() not only maps buffer
heads but also reads block content from disk if the size is not block
size aligned.  At this time the physical block number in mapped buffer
head is pointing to the donor file not the original file, and that
results in reading wrong data to page, which get written to disk in
following block_commit_write call.

This also can be reproduced by the following script on 1k block size ext4
on x86_64 host:

    mnt=/mnt/ext4
    donorfile=$mnt/donor
    testfile=$mnt/testfile
    e4compact=~/xfstests/src/e4compact

    rm -f $donorfile $testfile

    # reserve space for donor file, written by 0xaa and sync to disk to
    # avoid EBUSY on EXT4_IOC_MOVE_EXT
    xfs_io -fc "pwrite -S 0xaa 0 1m" -c "fsync" $donorfile

    # create test file written by 0xbb
    xfs_io -fc "pwrite -S 0xbb 0 1023" -c "fsync" $testfile

    # compute initial md5sum
    md5sum $testfile | tee md5sum.txt
    # drop cache, force e4compact to read data from disk
    echo 3 > /proc/sys/vm/drop_caches

    # test defrag
    echo "$testfile" | $e4compact -i -v -f $donorfile
    # check md5sum
    md5sum -c md5sum.txt

Fix it by creating & mapping buffer heads only but not reading blocks
from disk, because all the data in page is guaranteed to be up-to-date
in mext_page_mkuptodate().

Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:15 -08:00
Insu Yun
600d41f4ec ext4: fix potential integer overflow
commit 46901760b46064964b41015d00c140c83aa05bcf upstream.

Since sizeof(ext_new_group_data) > sizeof(ext_new_flex_group_data),
integer overflow could be happened.
Therefore, need to fix integer overflow sanitization.

Signed-off-by: Insu Yun <wuninsu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:15 -08:00
Jan Kara
33f48f8ab0 ext4: fix scheduling in atomic on group checksum failure
commit 05145bd799e498ce4e3b5145894174ee881f02b0 upstream.

When block group checksum is wrong, we call ext4_error() while holding
group spinlock from ext4_init_block_bitmap() or
ext4_init_inode_bitmap() which results in scheduling while in atomic.
Fix the issue by calling ext4_error() later after dropping the spinlock.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:14 -08:00
Theodore Ts'o
b80b70ef40 ext4 crypto: add missing locking for keyring_key access
commit db7730e3091a52c2fcd8fcc952b964d88998e675 upstream.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-17 12:31:02 -08:00
JP Abgrall
9a878d1d77 ext4: Add support for FIDTRIM, a best-effort ioctl for deep discard trim
* What
This provides an interface for issuing an FITRIM which uses the
secure discard instead of just a discard.
Only the eMMC command is "secure", and not how the FS uses it:
due to the fact that the FS might reassign a region somewhere else,
the original deleted data will not be affected by the "trim" which only
handles un-used regions.
So we'll just call it "deep discard", and note that this is a
"best effort" cleanup.

* Why
Once in a while, We want to be able to cleanup most of the unused blocks
after erasing a bunch of files.
We don't want to constantly secure-discard via a mount option.

From an eMMC spec perspective, it tells the device to really get rid of
all the data for the specified blocks and not just put them back into the
pool of free ones (unlike the normal TRIM). The eMMC spec says the
secure trim handling must make sure the data (and metadata) is not available
anymore. A simple TRIM doesn't clear the data, it just puts blocks in the
free pool.
JEDEC Standard No. 84-A441
  7.6.9 Secure Erase
  7.6.10 Secure Trim

From an FS perspective, it is acceptable to leave some data behind.
 - directory entries related to deleted files
 - databases entries related to deleted files
 - small-file data stored in inode extents
 - blocks held by the FS waiting to be re-used (mitigated by sync).
 - blocks reassigned by the FS prior to FIDTRIM.

Change-Id: I676a1404a80130d93930c84898360f2e6fb2f81e
Signed-off-by: Geremy Condra <gcondra@google.com>
Signed-off-by: JP Abgrall <jpa@google.com>
2016-02-16 13:54:19 -08:00
Linus Torvalds
f41683a204 Ext4 bug fixes for v4.4, including fixes for post-2038 time encodings,
some endian conversion problems with ext4 encryption, potential memory
 leaks after truncate in data=journal mode, and an ocfs2 regression
 caused by a jbd2 performance improvement.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJWZP5/AAoJEPL5WVaVDYGjLUwH+wZUMghNAiR+AUEDW5MwbRMd
 XGvPFr1ohs9ep6vrayRLAVU+BDbcSvL7GRddFpUabTfDqH6tyk43IqOFaE2UMefj
 +k61vUnEaD7hTOGtgnfkntAcrE8hngTA1UkEGRTRQjRSqKgt1ku2LB/GfGbgv4Yq
 1grwLbq/MRZ6gSZIv8TwuLC7BOAt6mLtxtJB8ozRuhVJVo1gzy1IvXkqn2mgf/h4
 nIq6i720NxZS9HqncA/o1rxfWb2bwEpLj5MYqdgscwHBGql0riHM0HOebAbgvVqV
 FgLM81p4qVry8N4ACNLIJsqjWu3eMcNUZFW3vaObGTg2tODTj8WSEClb56Z6fpE=
 =iOzG
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Ext4 bug fixes for v4.4, including fixes for post-2038 time encodings,
  some endian conversion problems with ext4 encryption, potential memory
  leaks after truncate in data=journal mode, and an ocfs2 regression
  caused by a jbd2 performance improvement"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd2: fix null committed data return in undo_access
  ext4: add "static" to ext4_seq_##name##_fops struct
  ext4: fix an endianness bug in ext4_encrypted_follow_link()
  ext4: fix an endianness bug in ext4_encrypted_zeroout()
  jbd2: Fix unreclaimed pages after truncate in data=journal mode
  ext4: Fix handling of extended tv_sec
2015-12-07 10:25:00 -08:00
Xu Cang
681c46b164 ext4: add "static" to ext4_seq_##name##_fops struct
to fix sparse warning, add static to ext4_seq_##name##_fops struct.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-11-26 15:52:24 -05:00
Al Viro
5a1c7f47da ext4: fix an endianness bug in ext4_encrypted_follow_link()
applying le32_to_cpu() to 16bit value is a bad idea...
    
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-11-26 15:20:50 -05:00
Al Viro
e2c9e0b28e ext4: fix an endianness bug in ext4_encrypted_zeroout()
ex->ee_block is not host-endian (note that accesses of other fields
of *ex right next to that line go through the helpers that do proper
conversion from little-endian to host-endian; it might make sense
to add similar for ->ee_block to avoid reintroducing that kind of
bugs...)

Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-11-26 15:20:19 -05:00
David Turner
a4dad1ae24 ext4: Fix handling of extended tv_sec
In ext4, the bottom two bits of {a,c,m}time_extra are used to extend
the {a,c,m}time fields, deferring the year 2038 problem to the year
2446.

When decoding these extended fields, for times whose bottom 32 bits
would represent a negative number, sign extension causes the 64-bit
extended timestamp to be negative as well, which is not what's
intended.  This patch corrects that issue, so that the only negative
{a,c,m}times are those between 1901 and 1970 (as per 32-bit signed
timestamps).

Some older kernels might have written pre-1970 dates with 1,1 in the
extra bits.  This patch treats those incorrectly-encoded dates as
pre-1970, instead of post-2311, until kernel 4.20 is released.
Hopefully by then e2fsck will have fixed up the bad data.

Also add a comment explaining the encoding of ext4's extra {a,c,m}time
bits.

Signed-off-by: David Turner <novalis@novalis.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Mark Harris <mh8928@yahoo.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23732
Cc: stable@vger.kernel.org
2015-11-24 14:34:37 -05:00
Dan Williams
ef83b6e8f4 ext2, ext4: warn when mounting with dax enabled
Similar to XFS warn when mounting DAX while it is still considered under
development.  Also, aspects of the DAX implementation, for example
synchronization against multiple faults and faults causing block
allocation, depend on the correct implementation in the filesystem.  The
maturity of a given DAX implementation is filesystem specific.

Cc: <stable@vger.kernel.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: linux-ext4@vger.kernel.org
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dave Chinner <david@fromorbit.com>
Acked-by: Jan Kara <jack@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-11-16 09:43:54 -08:00
Linus Torvalds
5d2eb548b3 Merge branch 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs xattr cleanups from Al Viro.

* 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  f2fs: xattr simplifications
  squashfs: xattr simplifications
  9p: xattr simplifications
  xattr handlers: Pass handler to operations instead of flags
  jffs2: Add missing capability check for listing trusted xattrs
  hfsplus: Remove unused xattr handler list operations
  ubifs: Remove unused security xattr handler
  vfs: Fix the posix_acl_xattr_list return value
  vfs: Check attribute names in posix acl xattr handers
2015-11-13 18:02:30 -08:00
Andreas Gruenbacher
d9a82a0403 xattr handlers: Pass handler to operations instead of flags
The xattr_handler operations are currently all passed a file system
specific flags value which the operations can use to disambiguate between
different handlers; some file systems use that to distinguish the xattr
namespace, for example.  In some oprations, it would be useful to also have
access to the handler prefix.  To allow that, pass a pointer to the handler
to operations instead of the flags value alone.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-13 20:34:32 -05:00
Linus Torvalds
842cf0b952 Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs update from Al Viro:

 - misc stable fixes

 - trivial kernel-doc and comment fixups

 - remove never-used block_page_mkwrite() wrapper function, and rename
   the function that is _actually_ used to not have double underscores.

* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: 9p: cache.h: Add #define of include guard
  vfs: remove stale comment in inode_operations
  vfs: remove unused wrapper block_page_mkwrite()
  binfmt_elf: Correct `arch_check_elf's description
  fs: fix writeback.c kernel-doc warnings
  fs: fix inode.c kernel-doc warning
  fs/pipe.c: return error code rather than 0 in pipe_write()
  fs/pipe.c: preserve alloc_file() error code
  binfmt_elf: Don't clobber passed executable's file header
  FS-Cache: Handle a write to the page immediately beyond the EOF marker
  cachefiles: perform test on s_blocksize when opening cache file.
  FS-Cache: Don't override netfs's primary_index if registering failed
  FS-Cache: Increase reference of parent after registering, netfs success
  debugfs: fix refcount imbalance in start_creating
2015-11-11 09:45:24 -08:00
Ross Zwisler
5c50002963 vfs: remove unused wrapper block_page_mkwrite()
The function currently called "__block_page_mkwrite()" used to be called
"block_page_mkwrite()" until a wrapper for this function was added by:

commit 24da4fab5a ("vfs: Create __block_page_mkwrite() helper passing
	error values back")

This wrapper, the current "block_page_mkwrite()", is currently unused.
__block_page_mkwrite() is used directly by ext4, nilfs2 and xfs.

Remove the unused wrapper, rename __block_page_mkwrite() back to
block_page_mkwrite() and update the comment above block_page_mkwrite().

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.com>
Cc: Jan Kara <jack@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11 02:19:33 -05:00
Andrew Morton
79211c8ed1 remove abs64()
Switch everything to the new and more capable implementation of abs().
Mainly to give the new abs() a bit of a workout.

Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Linus Torvalds
ad804a0b2a Merge branch 'akpm' (patches from Andrew)
Merge second patch-bomb from Andrew Morton:

 - most of the rest of MM

 - procfs

 - lib/ updates

 - printk updates

 - bitops infrastructure tweaks

 - checkpatch updates

 - nilfs2 update

 - signals

 - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
   dma-debug, dma-mapping, ...

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
  ipc,msg: drop dst nil validation in copy_msg
  include/linux/zutil.h: fix usage example of zlib_adler32()
  panic: release stale console lock to always get the logbuf printed out
  dma-debug: check nents in dma_sync_sg*
  dma-mapping: tidy up dma_parms default handling
  pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
  kexec: use file name as the output message prefix
  fs, seqfile: always allow oom killer
  seq_file: reuse string_escape_str()
  fs/seq_file: use seq_* helpers in seq_hex_dump()
  coredump: change zap_threads() and zap_process() to use for_each_thread()
  coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
  signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
  signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
  signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
  signals: kill block_all_signals() and unblock_all_signals()
  nilfs2: fix gcc uninitialized-variable warnings in powerpc build
  nilfs2: fix gcc unused-but-set-variable warnings
  MAINTAINERS: nilfs2: add header file for tracing
  nilfs2: add tracepoints for analyzing reading and writing metadata files
  ...
2015-11-07 14:32:45 -08:00
Linus Torvalds
75021d2859 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial updates from Jiri Kosina:
 "Trivial stuff from trivial tree that can be trivially summed up as:

   - treewide drop of spurious unlikely() before IS_ERR() from Viresh
     Kumar

   - cosmetic fixes (that don't really affect basic functionality of the
     driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek

   - various comment / printk fixes and updates all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  bcache: Really show state of work pending bit
  hwmon: applesmc: fix comment typos
  Kconfig: remove comment about scsi_wait_scan module
  class_find_device: fix reference to argument "match"
  debugfs: document that debugfs_remove*() accepts NULL and error values
  net: Drop unlikely before IS_ERR(_OR_NULL)
  mm: Drop unlikely before IS_ERR(_OR_NULL)
  fs: Drop unlikely before IS_ERR(_OR_NULL)
  drivers: net: Drop unlikely before IS_ERR(_OR_NULL)
  drivers: misc: Drop unlikely before IS_ERR(_OR_NULL)
  UBI: Update comments to reflect UBI_METAONLY flag
  pktcdvd: drop null test before destroy functions
2015-11-07 13:05:44 -08:00
Michal Hocko
c62d25556b mm, fs: introduce mapping_gfp_constraint()
There are many places which use mapping_gfp_mask to restrict a more
generic gfp mask which would be used for allocations which are not
directly related to the page cache but they are performed in the same
context.

Let's introduce a helper function which makes the restriction explicit and
easier to track.  This patch doesn't introduce any functional changes.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Mel Gorman
d0164adc89 mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts.  They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve".  __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".

Over time, callers had a requirement to not block when fallback options
were available.  Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.

This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative.  High priority users continue to use
__GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.

This patch then converts a number of sites

o __GFP_ATOMIC is used by callers that are high priority and have memory
  pools for those requests. GFP_ATOMIC uses this flag.

o Callers that have a limited mempool to guarantee forward progress clear
  __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
  into this category where kswapd will still be woken but atomic reserves
  are not used as there is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
  helper gfpflags_allow_blocking() where possible. This is because
  checking for __GFP_WAIT as was done historically now can trigger false
  positives. Some exceptions like dm-crypt.c exist where the code intent
  is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
  flag manipulations.

o Callers that built their own GFP flags instead of starting with GFP_KERNEL
  and friends now also need to specify __GFP_KSWAPD_RECLAIM.

The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL.  They may
now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Linus Torvalds
7130098096 Add support for the CSUM_SEED feature which will allow future
userspace utilities to change the file system's UUID without rewriting
 all of the file system metadata.
 
 A number of miscellaneous fixes, the most significant of which are in
 the ext4 encryption support.  Anyone wishing to use the encryption
 feature should backport all of the ext4 crypto patches up to 4.4 to
 get fixes to a memory leak and file system corruption bug.
 
 There are also cleanups in ext4's feature test macros and in ext4's
 sysfs support code.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJWPMOAAAoJEPL5WVaVDYGjDYAH/jVIz+WpPkZn/J1Oepy1WPDm
 Azxv+tUvKffXQ7sRwGZEU7snLaeamKxtrvHUXKLT4PIQB6m6o4/kk8o71ovv5PRT
 FvuMz1j7qsqEk5ZFQnqI7xosao/RovjfkvL5g+WsAN//C+vxO35bg4fVvZmf73GQ
 vloe6igM/qxiUAuMX0jlXjUI1Zyo+H3bXOC6LjnUCOPwZMRcQ9zMtoYjkWryTzo1
 4udepGRSfcWeZkWXqt9KIe6slYRmq3BtXWJ0+Zvx6gKWrXVIisINLDHYAEVBXAf4
 6VUiDKL6wIytkwt3vGwYSY11wNQC5ky3qV/tJlPnpbYfUP0vEvT4UXljoaR/YQc=
 =m3Q9
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Add support for the CSUM_SEED feature which will allow future
  userspace utilities to change the file system's UUID without rewriting
  all of the file system metadata.

  A number of miscellaneous fixes, the most significant of which are in
  the ext4 encryption support.  Anyone wishing to use the encryption
  feature should backport all of the ext4 crypto patches up to 4.4 to
  get fixes to a memory leak and file system corruption bug.

  There are also cleanups in ext4's feature test macros and in ext4's
  sysfs support code"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (26 commits)
  fs/ext4: remove unnecessary new_valid_dev check
  ext4: fix abs() usage in ext4_mb_check_group_pa
  ext4: do not allow journal_opts for fs w/o journal
  ext4: explicit mount options parsing cleanup
  ext4, jbd2: ensure entering into panic after recording an error in superblock
  [PATCH] fix calculation of meta_bg descriptor backups
  ext4: fix potential use after free in __ext4_journal_stop
  jbd2: fix checkpoint list cleanup
  ext4: fix xfstest generic/269 double revoked buffer bug with bigalloc
  ext4: make the bitmap read routines return real error codes
  jbd2: clean up feature test macros with predicate functions
  ext4: clean up feature test macros with predicate functions
  ext4: call out CRC and corruption errors with specific error codes
  ext4: store checksum seed in superblock
  ext4: reserve code points for the project quota feature
  ext4: promote ext4 over ext2 in the default probe order
  jbd2: gate checksum calculations on crc driver presence, not sb flags
  ext4: use private version of page_zero_new_buffers() for data=journal mode
  ext4 crypto: fix bugs in ext4_encrypted_zeroout()
  ext4 crypto: replace some BUG_ON()'s with error checks
  ...
2015-11-06 16:23:27 -08:00
Linus Torvalds
1873499e13 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem update from James Morris:
 "This is mostly maintenance updates across the subsystem, with a
  notable update for TPM 2.0, and addition of Jarkko Sakkinen as a
  maintainer of that"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (40 commits)
  apparmor: clarify CRYPTO dependency
  selinux: Use a kmem_cache for allocation struct file_security_struct
  selinux: ioctl_has_perm should be static
  selinux: use sprintf return value
  selinux: use kstrdup() in security_get_bools()
  selinux: use kmemdup in security_sid_to_context_core()
  selinux: remove pointless cast in selinux_inode_setsecurity()
  selinux: introduce security_context_str_to_sid
  selinux: do not check open perm on ftruncate call
  selinux: change CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default
  KEYS: Merge the type-specific data with the payload data
  KEYS: Provide a script to extract a module signature
  KEYS: Provide a script to extract the sys cert list from a vmlinux file
  keys: Be more consistent in selection of union members used
  certs: add .gitignore to stop git nagging about x509_certificate_list
  KEYS: use kvfree() in add_key
  Smack: limited capability for changing process label
  TPM: remove unnecessary little endian conversion
  vTPM: support little endian guests
  char: Drop owner assignment from i2c_driver
  ...
2015-11-05 15:32:38 -08:00
Yaowei Bai
be69e1c19f fs/ext4: remove unnecessary new_valid_dev check
As new_valid_dev always returns 1, so !new_valid_dev check is not
needed, remove it.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-10-29 14:18:13 -04:00
David Howells
146aa8b145 KEYS: Merge the type-specific data with the payload data
Merge the type-specific data with the payload data into one four-word chunk
as it seems pointless to keep them separate.

Use user_key_payload() for accessing the payloads of overloaded
user-defined keys.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cifs@vger.kernel.org
cc: ecryptfs@vger.kernel.org
cc: linux-ext4@vger.kernel.org
cc: linux-f2fs-devel@lists.sourceforge.net
cc: linux-nfs@vger.kernel.org
cc: ceph-devel@vger.kernel.org
cc: linux-ima-devel@lists.sourceforge.net
2015-10-21 15:18:36 +01:00
John Stultz
16175039e6 ext4: fix abs() usage in ext4_mb_check_group_pa
The ext4_fsblk_t type is a long long, which should not be used
with abs(), as is done in ext4_mb_check_group_pa().

This patch modifies ext4_mb_check_group_pa() to use abs64()
instead.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-10-19 00:01:05 -04:00
Dmitry Monakhov
1e381f60da ext4: do not allow journal_opts for fs w/o journal
It is appeared that we can pass journal related mount options and such options
be shown in /proc/mounts

Example:
#mkfs.ext4 -F /dev/vdb
#tune2fs -O ^has_journal /dev/vdb
#mount /dev/vdb /mnt/  -ocommit=20,journal_async_commit
#cat /proc/mounts  | grep /mnt
 /dev/vdb /mnt ext4 rw,relatime,journal_checksum,journal_async_commit,commit=20,data=ordered 0 0

But options:"journal_checksum,journal_async_commit,commit=20,data=ordered" has
nothing with reality because there is no journal at all.

This patch disallow following options for journalless configurations:
 - journal_checksum
 - journal_async_commit
 - commit=%ld
 - data={writeback,ordered,journal}

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2015-10-18 23:50:26 -04:00
Dmitry Monakhov
c93cf2d757 ext4: explicit mount options parsing cleanup
Currently MOPT_EXPLICIT treated as EXPLICIT_DELALLOC which may be changed
in future. Let's fix it now.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-10-18 23:35:32 -04:00
Daeho Jeong
4327ba52af ext4, jbd2: ensure entering into panic after recording an error in superblock
If a EXT4 filesystem utilizes JBD2 journaling and an error occurs, the
journaling will be aborted first and the error number will be recorded
into JBD2 superblock and, finally, the system will enter into the
panic state in "errors=panic" option.  But, in the rare case, this
sequence is little twisted like the below figure and it will happen
that the system enters into panic state, which means the system reset
in mobile environment, before completion of recording an error in the
journal superblock. In this case, e2fsck cannot recognize that the
filesystem failure occurred in the previous run and the corruption
wouldn't be fixed.

Task A                        Task B
ext4_handle_error()
-> jbd2_journal_abort()
  -> __journal_abort_soft()
    -> __jbd2_journal_abort_hard()
    | -> journal->j_flags |= JBD2_ABORT;
    |
    |                         __ext4_abort()
    |                         -> jbd2_journal_abort()
    |                         | -> __journal_abort_soft()
    |                         |   -> if (journal->j_flags & JBD2_ABORT)
    |                         |           return;
    |                         -> panic()
    |
    -> jbd2_journal_update_sb_errno()

Tested-by: Hobin Woo <hobin.woo@samsung.com>
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2015-10-18 17:02:56 -04:00
Andy Leiserson
904dad4742 [PATCH] fix calculation of meta_bg descriptor backups
"group" is the group where the backup will be placed, and is
initialized to zero in the declaration. This meant that backups for
meta_bg descriptors were erroneously written to the backup block group
descriptors in groups 1 and (desc_per_block-1).

Reproduction information:
  mke2fs -Fq -t ext4 -b 1024 -O ^resize_inode /tmp/foo.img 16G
  truncate -s 24G /tmp/foo.img
  losetup /dev/loop0 /tmp/foo.img
  mount /dev/loop0 /mnt
  resize2fs /dev/loop0
  umount /dev/loop0
  dd if=/dev/zero of=/dev/loop0 bs=1024 count=2
  e2fsck -fy /dev/loop0
  losetup -d /dev/loop0

Signed-off-by: Andy Leiserson <andy@leiserson.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2015-10-18 00:36:29 -04:00
Lukas Czerner
6934da9238 ext4: fix potential use after free in __ext4_journal_stop
There is a use-after-free possibility in __ext4_journal_stop() in the
case that we free the handle in the first jbd2_journal_stop() because
we're referencing handle->h_err afterwards. This was introduced in
9705acd63b and it is wrong. Fix it by
storing the handle->h_err value beforehand and avoid referencing
potentially freed handle.

Fixes: 9705acd63b
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Cc: stable@vger.kernel.org
2015-10-17 22:57:06 -04:00
Daeho Jeong
9c02ac9798 ext4: fix xfstest generic/269 double revoked buffer bug with bigalloc
When you repeatly execute xfstest generic/269 with bigalloc_1k option
enabled using the below command:

"./kvm-xfstests -c bigalloc_1k -m nodelalloc -C 1000 generic/269"

you can easily see the below bug message.

"JBD2 unexpected failure: jbd2_journal_revoke: !buffer_revoked(bh);"

This means that an already revoked buffer is erroneously revoked again
and it is caused by doing revoke for the buffer at the wrong position
in ext4_free_blocks(). We need to re-position the buffer revoke
procedure for an unspecified buffer after checking the cluster boundary
for bigalloc option. If not, some part of the cluster can be doubly
revoked.

Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
2015-10-17 22:28:21 -04:00
Darrick J. Wong
9008a58e5d ext4: make the bitmap read routines return real error codes
Make the bitmap reaading routines return real error codes (EIO,
EFSCORRUPTED, EFSBADCRC) which can then be reflected back to
userspace for more precise diagnosis work.

In particular, this means that mballoc no longer claims that we're out
of memory if the block bitmaps become corrupt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-10-17 21:33:24 -04:00
Darrick J. Wong
e2b911c535 ext4: clean up feature test macros with predicate functions
Create separate predicate functions to test/set/clear feature flags,
thereby replacing the wordy old macros.  Furthermore, clean out the
places where we open-coded feature tests.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2015-10-17 16:18:43 -04:00
Darrick J. Wong
6a797d2737 ext4: call out CRC and corruption errors with specific error codes
Instead of overloading EIO for CRC errors and corrupt structures,
return the same error codes that XFS returns for the same issues.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-10-17 16:16:04 -04:00
Darrick J. Wong
8c81bd8f58 ext4: store checksum seed in superblock
Allow the filesystem to store the metadata checksum seed in the
superblock and add an incompat feature to say that we're using it.
This enables tune2fs to change the UUID on a mounted metadata_csum
FS without having to (racy!) rewrite all disk metadata.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-10-17 16:16:02 -04:00
Theodore Ts'o
8b4953e13f ext4: reserve code points for the project quota feature
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-10-17 16:15:18 -04:00
Linus Torvalds
3d875182d7 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "6 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  sh: add copy_user_page() alias for __copy_user()
  lib/Kconfig: ZLIB_DEFLATE must select BITREVERSE
  mm, dax: fix DAX deadlocks
  memcg: convert threshold to bytes
  builddeb: remove debian/files before build
  mm, fs: obey gfp_mapping for add_to_page_cache()
2015-10-16 11:42:37 -07:00
Michal Hocko
063d99b4fa mm, fs: obey gfp_mapping for add_to_page_cache()
Commit 6afdb859b7 ("mm: do not ignore mapping_gfp_mask in page cache
allocation paths") has caught some users of hardcoded GFP_KERNEL used in
the page cache allocation paths.  This, however, wasn't complete and
there were others which went unnoticed.

Dave Chinner has reported the following deadlock for xfs on loop device:
: With the recent merge of the loop device changes, I'm now seeing
: XFS deadlock on my single CPU, 1GB RAM VM running xfs/073.
:
: The deadlocked is as follows:
:
: kloopd1: loop_queue_read_work
:       xfs_file_iter_read
:       lock XFS inode XFS_IOLOCK_SHARED (on image file)
:       page cache read (GFP_KERNEL)
:       radix tree alloc
:       memory reclaim
:       reclaim XFS inodes
:       log force to unpin inodes
:       <wait for log IO completion>
:
: xfs-cil/loop1: <does log force IO work>
:       xlog_cil_push
:       xlog_write
:       <loop issuing log writes>
:               xlog_state_get_iclog_space()
:               <blocks due to all log buffers under write io>
:               <waits for IO completion>
:
: kloopd1: loop_queue_write_work
:       xfs_file_write_iter
:       lock XFS inode XFS_IOLOCK_EXCL (on image file)
:       <wait for inode to be unlocked>
:
: i.e. the kloopd, with it's split read and write work queues, has
: introduced a dependency through memory reclaim. i.e. that writes
: need to be able to progress for reads make progress.
:
: The problem, fundamentally, is that mpage_readpages() does a
: GFP_KERNEL allocation, rather than paying attention to the inode's
: mapping gfp mask, which is set to GFP_NOFS.
:
: The didn't used to happen, because the loop device used to issue
: reads through the splice path and that does:
:
:       error = add_to_page_cache_lru(page, mapping, index,
:                       GFP_KERNEL & mapping_gfp_mask(mapping));

This has changed by commit aa4d86163e ("block: loop: switch to VFS
ITER_BVEC").

This patch changes mpage_readpage{s} to follow gfp mask set for the
mapping.  There are, however, other places which are doing basically the
same.

lustre:ll_dir_filler is doing GFP_KERNEL from the function which
apparently uses GFP_NOFS for other allocations so let's make this
consistent.

cifs:readpages_get_pages is called from cifs_readpages and
__cifs_readpages_from_fscache called from the same path obeys mapping
gfp.

ramfs_nommu_expand_for_mapping is hardcoding GFP_KERNEL as well
regardless it uses mapping_gfp_mask for the page allocation.

ext4_mpage_readpages is the called from the page cache allocation path
same as read_pages and read_cache_pages

As I've noticed in my previous post I cannot say I would be happy about
sprinkling mapping_gfp_mask all over the place and it sounds like we
should drop gfp_mask argument altogether and use it internally in
__add_to_page_cache_locked that would require all the filesystems to use
mapping gfp consistently which I am not sure is the case here.  From a
quick glance it seems that some file system use it all the time while
others are selective.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Dave Chinner <david@fromorbit.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-16 11:42:28 -07:00
Theodore Ts'o
b90197b655 ext4: use private version of page_zero_new_buffers() for data=journal mode
If there is a error while copying data from userspace into the page
cache during a write(2) system call, in data=journal mode, in
ext4_journalled_write_end() were using page_zero_new_buffers() from
fs/buffer.c.  Unfortunately, this sets the buffer dirty flag, which is
no good if journalling is enabled.  This is a long-standing bug that
goes back for years and years in ext3, but a combination of (a)
data=journal not being very common, (b) in many case it only results
in a warning message. and (c) only very rarely causes the kernel hang,
means that we only really noticed this as a problem when commit
998ef75ddb caused this failure to happen frequently enough to cause
generic/208 to fail when run in data=journal mode.

The fix is to have our own version of this function that doesn't call
mark_dirty_buffer(), since we will end up calling
ext4_handle_dirty_metadata() on the buffer head(s) in questions very
shortly afterwards in ext4_journalled_write_end().

Thanks to Dave Hansen and Linus Torvalds for helping to identify the
root cause of the problem.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.com>
2015-10-15 10:29:05 -04:00
Theodore Ts'o
36086d43f6 ext4 crypto: fix bugs in ext4_encrypted_zeroout()
Fix multiple bugs in ext4_encrypted_zeroout(), including one that
could cause us to write an encrypted zero page to the wrong location
on disk, potentially causing data and file system corruption.
Fortunately, this tends to only show up in stress tests, but even with
these fixes, we are seeing some test failures with generic/127 --- but
these are now caused by data failures instead of metadata corruption.

Since ext4_encrypted_zeroout() is only used for some optimizations to
keep the extent tree from being too fragmented, and
ext4_encrypted_zeroout() itself isn't all that optimized from a time
or IOPS perspective, disable the extent tree optimization for
encrypted inodes for now.  This prevents the data corruption issues
reported by generic/127 until we can figure out what's going wrong.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2015-10-03 10:49:29 -04:00
Theodore Ts'o
687c3c36e7 ext4 crypto: replace some BUG_ON()'s with error checks
Buggy (or hostile) userspace should not be able to cause the kernel to
crash.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2015-10-03 10:49:27 -04:00
Theodore Ts'o
3684de8ca2 ext4 crypto: ext4_page_crypto() doesn't need a encryption context
Since ext4_page_crypto() doesn't need an encryption context (at least
not any more), this allows us to simplify a number function signature
and also allows us to avoid needing to allocate a context in
ext4_block_write_begin().  It also means we no longer need a separate
ext4_decrypt_one() function.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-10-03 10:49:26 -04:00
Theodore Ts'o
cccd147a57 ext4: optimize ext4_writepage() for attempted 4k delalloc writes
In cases where the file system block size is the same as the page
size, and ext4_writepage() is asked to write out a page which is
either has the unwritten bit set in the extent tree, or which does not
yet have a block assigned due to delayed allocation, we can bail out
early and, unlocking the page earlier and avoiding a round trip
through ext4_bio_write_page() with the attendant calls to
set_page_writeback() and redirty_page_for_writeback().

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-10-03 10:49:23 -04:00
Theodore Ts'o
937d7b84dc ext4 crypto: fix memory leak in ext4_bio_write_page()
There are times when ext4_bio_write_page() is called even though we
don't actually need to do any I/O.  This happens when ext4_writepage()
gets called by the jbd2 commit path when an inode needs to force its
pages written out in order to provide data=ordered guarantees --- and
a page is backed by an unwritten (e.g., uninitialized) block on disk,
or if delayed allocation means the page's backing store hasn't been
allocated yet.  In that case, we need to skip the call to
ext4_encrypt_page(), since in addition to wasting CPU, it leads to a
bounce page and an ext4 crypto context getting leaked.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2015-10-02 23:54:58 -04:00
Viresh Kumar
a1c83681d5 fs: Drop unlikely before IS_ERR(_OR_NULL)
IS_ERR(_OR_NULL) already contain an 'unlikely' compiler flag and there
is no need to do that again from its callers. Drop it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve French <smfrench@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-09-29 15:13:58 +02:00
Jean Delvare
d4eb6dee47 ext4: Update EXT4_USE_FOR_EXT2 description
Configuration option EXT4_USE_FOR_EXT2 has no effect on ext3 support.
Support for ext3 is always included now.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes: c290ea01ab ("fs: Remove ext3 filesystem driver")
Cc: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.com>
2015-09-24 13:27:47 +02:00
Theodore Ts'o
ebd173beb8 ext4: move procfs registration code to fs/ext4/sysfs.c
This allows us to refactor the procfs code, which saves a bit of
compiled space.  More importantly it isolates most of the procfs
support code into a single file, so it's easier to #ifdef it out if
the proc file system has been disabled.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-09-23 12:46:17 -04:00
Theodore Ts'o
76d33bca55 ext4: refactor sysfs support code
Make the code more easily extensible as well as taking up less
compiled space.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-09-23 12:45:17 -04:00
Theodore Ts'o
b579901882 ext4: move sysfs code from super.c to fs/ext4/sysfs.c
Also statically allocate the ext4_kset and ext4_feat objects, since we
only need exactly one of each, and it's simpler and less code if we
drop the dynamic allocation and deallocation when it's not needed.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-09-23 12:44:17 -04:00