Commit graph

42686 commits

Author SHA1 Message Date
Srivatsa Vaddagiri
3004236139 sched: colocate related threads
Provide userspace interface for tasks to be grouped together as
"related" threads. For example, all threads involved in updating
display buffer could be tagged as related.

Scheduler will attempt to provide special treatment for group of
related threads such as:

1) Colocation of related threads in same "preferred" cluster
2) Aggregation of demand towards determination of cluster frequency

This patch extends scheduler to provide best-effort colocation support
for a group of related threads.

Change-Id: Ic2cd769faf5da4d03a8f3cb0ada6224d0101a5f5
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor merge conflicts.  removed ifdefry
 for CONFIG_SCHED_QHMP.]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>

Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 21:25:12 -07:00
Jeevan Shriram
643a137249 net: initialize variables to avoid UML compilation failure
While compiling for usermode linux for x86 architecture, observed
compilation issues with probable usage of uninitialized variables.
This change initializes the variables.

Signed-off-by: Jeevan Shriram <jshriram@codeaurora.org>
2016-03-23 21:24:21 -07:00
Andrey Markovytch
03bad71331 eCryptfs: fixes issue where files sometimes got corrupted upon close
Fixes implementation of cache cleaning, it was implemented using outdated
code which cleaned valid pages as well by mistake. The fix uses the
original implementation from mm layer which was tweaked to zero pages
after releasing them. Also it is now invoked only when there is HW
encryption (meaning that cache is unencrypted)

Change-Id: If713fd17a89be6d998c1a7ba86bee4c17d2bca0f
Signed-off-by: Andrey Markovytch <andreym@codeaurora.org>

Conflicts:
	fs/ecryptfs/caches_utils.c
2016-03-23 21:24:13 -07:00
Andrey Markovytch
b61ac21fb0 mm + fs: extends support for cache dropping
Exposes drop_pagecache_sb (required by eCryptfs cache wiping)
Adds truncate_inode_pages_fill_zero (required by eCryptfs cache wiping),
which not only truncates pages but also fills them with 0, so that the
cached data can no longer be retrieved.

Change-Id: Icfc18a2c8cdc922e71ee17add6459a1355e77ba6
Signed-off-by: Andrey Markovytch <andreym@codeaurora.org>
[gbroner@codeaurora.org: fix merge conflict]
Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
2016-03-23 21:24:12 -07:00
Andrey Markovytch
36f1ad69fd eCryptfs: fixed some major bugs
1. Fixed bug which didn't allow several threads to work simultaneously
on files in eCryptfs mounted folder

2. Fixed bug where PFK close callback was invoked multiple times when
files was opened and closed multiple times. Now it is invoked just once
when files is closed for the last time

Change-Id: Iaa3ada03500e5a12752918b5d2bb4a852ddca5f0
Signed-off-by: Andrey Markovytch <andreym@codeaurora.org>
2016-03-23 21:24:11 -07:00
Andrey Markovytch
8928f8683b PFK: fixed issue where key in TZ was not set properly
When key is set in ICE via TZ, HLOS should send two parts, SALT and
the KEY itself according to AES standards. KEY was used for both parts.

Change-Id: I453dea289b01bdf49352d5209255966052f5dc1b
Signed-off-by: Andrey Markovytch <andreym@codeaurora.org>
2016-03-23 21:24:07 -07:00
Andrey Markovytch
5eebf86343 ecryptfs: enhancing ecryptfs to be configurable with encryption type
enabled eCryptfs for qcom targets. In addition to the usual
options, a special mode 'aes-xts' was added for qcom ICE hw
encryption

Change-Id: I20c01adc46c977b4a5db0be9ff93384cda14bc56
Signed-off-by: Lina Zarivach <linaz@codeaurora.org>
Signed-off-by: Andrey Markovytch <andreym@codeaurora.org>
[gbroner@codeaurora.org: fix merge conflict]
Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
2016-03-23 21:24:05 -07:00
Andrey Markovytch
ecc052ba4e platform: msm: add Per-File-Tagger (PFT) driver
Integrated from msm-3.14. Additional fixes were made to compile with the
new kernel and various new warnings and checkpatch issues were fixed

Change-Id: I073db1041e41eac9066e37ee099f1da9e4eed6c0
Signed-off-by: Andrey Markovytch <andreym@codeaurora.org>
[gbroner@codeaurora.org: fixed merge conflict and adapted the LSM
security hooks]
Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
2016-03-23 21:24:03 -07:00
Richard Weinberger
44cd578eeb ubifs: Use XATTR_*_PREFIX_LEN
...instead of open coding it.

CRs-Fixed: 975289
Change-Id: I2d2b2fecfd6724baddb1d93ec5b4aeb2587b577f
Signed-off-by: Richard Weinberger <richard@nod.at>
Git-commit: 4fdd1d51ad5d059548c6539ac9d281f74d24bcbe
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Nikhilesh Reddy <reddyn@codeaurora.org>
2016-03-23 21:20:20 -07:00
Dongsheng Yang
819f5bcd53 UBIFS: add a comment in key.h for unused parameter
Add a comment in key.h to explain why we keep an unused
parameter in key helpers.

CRs-Fixed: 975289
Change-Id: I88633f34f7f49b5b895f2295c1ce4ad250900fc0
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Git-commit: 170eb55f7d4ba9564736ba298a7d4985422db4cc
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Nikhilesh Reddy <reddyn@codeaurora.org>
2016-03-23 21:20:19 -07:00
David Keitel
f2b1fed1bd Merge remote-tracking branch 'lsk-44/linux-linaro-lsk-v4.4' into 44rc2
* lsk-44/linux-linaro-lsk-v4.4:
  Linux 4.4.3
  modules: fix modparam async_probe request
  module: wrapper for symbol name.
  itimers: Handle relative timers with CONFIG_TIME_LOW_RES proper
  posix-timers: Handle relative timers with CONFIG_TIME_LOW_RES proper
  timerfd: Handle relative timers with CONFIG_TIME_LOW_RES proper
  prctl: take mmap sem for writing to protect against others
  xfs: log mount failures don't wait for buffers to be released
  Revert "xfs: clear PF_NOFREEZE for xfsaild kthread"
  xfs: inode recovery readahead can race with inode buffer creation
  libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct
  ovl: setattr: check permissions before copy-up
  ovl: root: copy attr
  ovl: check dentry positiveness in ovl_cleanup_whiteouts()
  ovl: use a minimal buffer in ovl_copy_xattr
  ovl: allow zero size xattr
  futex: Drop refcount if requeue_pi() acquired the rtmutex
  devm_memremap_release(): fix memremap'd addr handling
  ipc/shm: handle removed segments gracefully in shm_mmap()
  intel_scu_ipcutil: underflow in scu_reg_access()
  mm,thp: khugepaged: call pte flush at the time of collapse
  dump_stack: avoid potential deadlocks
  radix-tree: fix oops after radix_tree_iter_retry
  drivers/hwspinlock: fix race between radix tree insertion and lookup
  radix-tree: fix race in gang lookup
  MAINTAINERS: return arch/sh to maintained state, with new maintainers
  memcg: only free spare array when readers are done
  numa: fix /proc/<pid>/numa_maps for hugetlbfs on s390
  fs/hugetlbfs/inode.c: fix bugs in hugetlb_vmtruncate_list()
  scripts/bloat-o-meter: fix python3 syntax error
  dma-debug: switch check from _text to _stext
  m32r: fix m32104ut_defconfig build fail
  xhci: Fix list corruption in urb dequeue at host removal
  Revert "xhci: don't finish a TD if we get a short-transfer event mid TD"
  iommu/vt-d: Clear PPR bit to ensure we get more page request interrupts
  iommu/vt-d: Fix 64-bit accesses to 32-bit DMAR_GSTS_REG
  iommu/vt-d: Fix mm refcounting to hold mm_count not mm_users
  iommu/amd: Correct the wrong setting of alias DTE in do_attach
  iommu/vt-d: Don't skip PCI devices when disabling IOTLB
  Input: vmmouse - fix absolute device registration
  string_helpers: fix precision loss for some inputs
  Input: i8042 - add Fujitsu Lifebook U745 to the nomux list
  Input: elantech - mark protocols v2 and v3 as semi-mt
  mm: fix regression in remap_file_pages() emulation
  mm: replace vma_lock_anon_vma with anon_vma_lock_read/write
  mm: fix mlock accouting
  libnvdimm: fix namespace object confusion in is_uuid_busy()
  mm: soft-offline: check return value in second __get_any_page() call
  perf kvm record/report: 'unprocessable sample' error while recording/reporting guest data
  KVM: PPC: Fix ONE_REG AltiVec support
  KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8
  KVM: arm/arm64: Fix reference to uninitialised VGIC
  arm64: dma-mapping: fix handling of devices registered before arch_initcall
  ARM: OMAP2+: Fix ppa_zero_params and ppa_por_params for rodata
  ARM: OMAP2+: Fix save_secure_ram_context for rodata
  ARM: OMAP2+: Fix l2dis_3630 for rodata
  ARM: OMAP2+: Fix l2_inv_api_params for rodata
  ARM: OMAP2+: Fix wait_dll_lock_timed for rodata
  ARM: dts: at91: sama5d4ek: add phy address and IRQ for macb0
  ARM: dts: at91: sama5d4 xplained: fix phy0 IRQ type
  ARM: dts: at91: sama5d4: fix instance id of DBGU
  ARM: dts: at91: sama5d4 xplained: properly mux phy interrupt
  ARM: dts: omap5-board-common: enable rtc and charging of backup battery
  ARM: dts: Fix omap5 PMIC control lines for RTC writes
  ARM: dts: Fix wl12xx missing clocks that cause hangs
  ARM: nomadik: fix up SD/MMC DT settings
  ARM: 8517/1: ICST: avoid arithmetic overflow in icst_hz()
  ARM: 8519/1: ICST: try other dividends than 1
  arm64: mm: avoid calling apply_to_page_range on empty range
  ARM: mvebu: remove duplicated regulator definition in Armada 388 GP
  powerpc/ioda: Set "read" permission when "write" is set
  powerpc/powernv: Fix stale PE primary bus
  powerpc/eeh: Fix stale cached primary bus
  powerpc/eeh: Fix PE location code
  SUNRPC: Fixup socket wait for memory
  udf: Check output buffer length when converting name to CS0
  udf: Prevent buffer overrun with multi-byte characters
  udf: limit the maximum number of indirect extents in a row
  pNFS/flexfiles: Fix an XDR encoding bug in layoutreturn
  nfs: Fix race in __update_open_stateid()
  pNFS/flexfiles: Fix an Oopsable typo in ff_mirror_match_fh()
  NFS: Fix attribute cache revalidation
  cifs: fix erroneous return value
  cifs_dbg() outputs an uninitialized buffer in cifs_readdir()
  cifs: fix race between call_async() and reconnect()
  cifs: Ratelimit kernel log messages
  iio: inkern: fix a NULL dereference on error
  iio: pressure: mpl115: fix temperature offset sign
  iio: light: acpi-als: Report data as processed
  iio: dac: mcp4725: set iio name property in sysfs
  iio: add IIO_TRIGGER dependency to STK8BA50
  iio: add HAS_IOMEM dependency to VF610_ADC
  iio-light: Use a signed return type for ltr501_match_samp_freq()
  iio:adc:ti_am335x_adc Fix buffered mode by identifying as software buffer.
  iio: adis_buffer: Fix out-of-bounds memory access
  scsi: fix soft lockup in scsi_remove_target() on module removal
  SCSI: Add Marvell Console to VPD blacklist
  scsi_dh_rdac: always retry MODE SELECT on command lock violation
  drivers/scsi/sg.c: mark VMA as VM_IO to prevent migration
  SCSI: fix crashes in sd and sr runtime PM
  iscsi-target: Fix potential dead-lock during node acl delete
  scsi: add Synology to 1024 sector blacklist
  klist: fix starting point removed bug in klist iterators
  tracepoints: Do not trace when cpu is offline
  tracing: Fix freak link error caused by branch tracer
  perf tools: tracepoint_error() can receive e=NULL, robustify it
  tools lib traceevent: Fix output of %llu for 64 bit values read on 32 bit machines
  ptrace: use fsuid, fsgid, effective creds for fs access checks
  Btrfs: fix direct IO requests not reporting IO error to user space
  Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl
  Btrfs: fix page reading in extent_same ioctl leading to csum errors
  Btrfs: fix invalid page accesses in extent_same (dedup) ioctl
  btrfs: properly set the termination value of ctx->pos in readdir
  Revert "btrfs: clear PF_NOFREEZE in cleaner_kthread()"
  Btrfs: fix fitrim discarding device area reserved for boot loader's use
  btrfs: handle invalid num_stripes in sys_array
  ext4: don't read blocks from disk after extents being swapped
  ext4: fix potential integer overflow
  ext4: fix scheduling in atomic on group checksum failure
  serial: omap: Prevent DoS using unprivileged ioctl(TIOCSRS485)
  serial: 8250_pci: Add Intel Broadwell ports
  tty: Add support for PCIe WCH382 2S multi-IO card
  pty: make sure super_block is still valid in final /dev/tty close
  pty: fix possible use after free of tty->driver_data
  staging/speakup: Use tty_ldisc_ref() for paste kworker
  phy: twl4030-usb: Fix unbalanced pm_runtime_enable on module reload
  phy: twl4030-usb: Relase usb phy on unload
  ALSA: seq: Fix double port list deletion
  ALSA: seq: Fix leak of pool buffer at concurrent writes
  ALSA: pcm: Fix rwsem deadlock for non-atomic PCM stream
  ALSA: hda - Cancel probe work instead of flush at remove
  x86/mm: Fix vmalloc_fault() to handle large pages properly
  x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache()
  x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable
  x86/mm/pat: Avoid truncation when converting cpa->numpages to address
  x86/mm: Fix types used in pgprot cacheability flags translations
  Linux 4.4.2
  HID: multitouch: fix input mode switching on some Elan panels
  mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress
  zsmalloc: fix migrate_zspage-zs_free race condition
  zram: don't call idr_remove() from zram_remove()
  zram: try vmalloc() after kmalloc()
  zram/zcomp: use GFP_NOIO to allocate streams
  rtlwifi: rtl8821ae: Fix 5G failure when EEPROM is incorrectly encoded
  rtlwifi: rtl8821ae: Fix errors in parameter initialization
  crypto: marvell/cesa - fix test in mv_cesa_dev_dma_init()
  crypto: atmel-sha - remove calls of clk_prepare() from atomic contexts
  crypto: atmel-sha - fix atmel_sha_remove()
  crypto: algif_skcipher - Do not set MAY_BACKLOG on the async path
  crypto: algif_skcipher - Do not dereference ctx without socket lock
  crypto: algif_skcipher - Do not assume that req is unchanged
  crypto: user - lock crypto_alg_list on alg dump
  EVM: Use crypto_memneq() for digest comparisons
  crypto: algif_hash - wait for crypto_ahash_init() to complete
  crypto: shash - Fix has_key setting
  crypto: chacha20-ssse3 - Align stack pointer to 64 bytes
  crypto: caam - make write transactions bufferable on PPC platforms
  crypto: algif_skcipher - sendmsg SG marking is off by one
  crypto: algif_skcipher - Load TX SG list after waiting
  crypto: crc32c - Fix crc32c soft dependency
  crypto: algif_skcipher - Fix race condition in skcipher_check_key
  crypto: algif_hash - Fix race condition in hash_check_key
  crypto: af_alg - Forbid bind(2) when nokey child sockets are present
  crypto: algif_skcipher - Remove custom release parent function
  crypto: algif_hash - Remove custom release parent function
  crypto: af_alg - Allow af_af_alg_release_parent to be called on nokey path
  ahci: Intel DNV device IDs SATA
  libata: disable forced PORTS_IMPL for >= AHCI 1.3
  crypto: algif_skcipher - Add key check exception for cipher_null
  crypto: skcipher - Add crypto_skcipher_has_setkey
  crypto: algif_hash - Require setkey before accept(2)
  crypto: hash - Add crypto_ahash_has_setkey
  crypto: algif_skcipher - Add nokey compatibility path
  crypto: af_alg - Add nokey compatibility path
  crypto: af_alg - Fix socket double-free when accept fails
  crypto: af_alg - Disallow bind/setkey/... after accept(2)
  crypto: algif_skcipher - Require setkey before accept(2)
  sched: Fix crash in sched_init_numa()
  ext4 crypto: add missing locking for keyring_key access
  iommu/io-pgtable-arm: Ensure we free the final level on teardown
  tty: Fix unsafe ldisc reference via ioctl(TIOCGETD)
  tty: Retry failed reopen if tty teardown in-progress
  tty: Wait interruptibly for tty lock on reopen
  n_tty: Fix unsafe reference to "other" ldisc
  usb: xhci: apply XHCI_PME_STUCK_QUIRK to Intel Broxton-M platforms
  usb: xhci: handle both SSIC ports in PME stuck quirk
  usb: phy: msm: fix error handling in probe.
  usb: cdc-acm: send zero packet for intel 7260 modem
  usb: cdc-acm: handle unlinked urb in acm read callback
  USB: option: fix Cinterion AHxx enumeration
  USB: serial: option: Adding support for Telit LE922
  USB: cp210x: add ID for IAI USB to RS485 adaptor
  USB: serial: ftdi_sio: add support for Yaesu SCU-18 cable
  usb: hub: do not clear BOS field during reset device
  USB: visor: fix null-deref at probe
  USB: serial: visor: fix crash on detecting device without write_urbs
  ASoC: rt5645: fix the shift bit of IN1 boost
  saa7134-alsa: Only frees registered sound cards
  ALSA: dummy: Implement timer backend switching more safely
  ALSA: hda - Fix bad dereference of jack object
  ALSA: hda - Fix speaker output from VAIO AiO machines
  Revert "ALSA: hda - Fix noise on Gigabyte Z170X mobo"
  ALSA: hda - Fix static checker warning in patch_hdmi.c
  ALSA: hda - Add fixup for Mac Mini 7,1 model
  ALSA: timer: Fix race between stop and interrupt
  ALSA: timer: Fix wrong instance passed to slave callbacks
  ALSA: timer: Fix race at concurrent reads
  ALSA: timer: Fix link corruption due to double start or stop
  ALSA: timer: Fix leftover link at closing
  ALSA: timer: Code cleanup
  ALSA: seq: Fix lockdep warnings due to double mutex locks
  ALSA: seq: Fix race at closing in virmidi driver
  ALSA: seq: Fix yet another races among ALSA timer accesses
  ASoC: dpcm: fix the BE state on hw_free
  ALSA: pcm: Fix potential deadlock in OSS emulation
  ALSA: hda/realtek - Support Dell headset mode for ALC225
  ALSA: hda/realtek - Support headset mode for ALC225
  ALSA: hda/realtek - New codec support of ALC225
  ALSA: rawmidi: Fix race at copying & updating the position
  ALSA: rawmidi: Remove kernel WARNING for NULL user-space buffer check
  ALSA: rawmidi: Make snd_rawmidi_transmit() race-free
  ALSA: seq: Degrade the error message for too many opens
  ALSA: seq: Fix incorrect sanity check at snd_seq_oss_synth_cleanup()
  ALSA: dummy: Disable switching timer backend via sysfs
  ALSA: compress: Disable GET_CODEC_CAPS ioctl for some architectures
  ALSA: hda - disable dynamic clock gating on Broxton before reset
  ALSA: Add missing dependency on CONFIG_SND_TIMER
  ALSA: bebob: Use a signed return type for get_formation_index
  ALSA: usb-audio: avoid freeing umidi object twice
  ALSA: usb-audio: Add native DSD support for PS Audio NuWave DAC
  ALSA: usb-audio: Fix OPPO HA-1 vendor ID
  ALSA: usb-audio: Add quirk for Microsoft LifeCam HD-6000
  ALSA: usb-audio: Fix TEAC UD-501/UD-503/NT-503 usb delay
  hrtimer: Handle remaining time proper for TIME_LOW_RES
  md/raid: only permit hot-add of compatible integrity profiles
  media: i2c: Don't export ir-kbd-i2c module alias
  parisc: Fix __ARCH_SI_PREAMBLE_SIZE
  parisc: Protect huge page pte changes with spinlocks
  printk: do cond_resched() between lines while outputting to consoles
  tracing/stacktrace: Show entire trace if passed in function not found
  tracing: Fix stacktrace skip depth in trace_buffer_unlock_commit_regs()
  PCI: Fix minimum allocation address overwrite
  PCI: host: Mark PCIe/PCI (MSI) IRQ cascade handlers as IRQF_NO_THREAD
  mtd: nand: assign reasonable default name for NAND drivers
  wlcore/wl12xx: spi: fix NULL pointer dereference (Oops)
  wlcore/wl12xx: spi: fix oops on firmware load
  ocfs2/dlm: clear refmap bit of recovery lock while doing local recovery cleanup
  ocfs2/dlm: ignore cleaning the migration mle that is inuse
  ALSA: hda - Implement loopback control switch for Realtek and other codecs
  block: fix bio splitting on max sectors
  base/platform: Fix platform drivers with no probe callback
  HID: usbhid: fix recursive deadlock
  ocfs2: NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lock
  block: split bios to max possible length
  NFSv4.1/pnfs: Fixup an lo->plh_block_lgets imbalance in layoutreturn
  crypto: sun4i-ss - add missing statesize
  Linux 4.4.1
  arm64: kernel: fix architected PMU registers unconditional access
  arm64: kernel: enforce pmuserenr_el0 initialization and restore
  arm64: mm: ensure that the zero page is visible to the page table walker
  arm64: Clear out any singlestep state on a ptrace detach operation
  powerpc/module: Handle R_PPC64_ENTRY relocations
  scripts/recordmcount.pl: support data in text section on powerpc
  powerpc: Make {cmp}xchg* and their atomic_ versions fully ordered
  powerpc: Make value-returning atomics fully ordered
  powerpc/tm: Check for already reclaimed tasks
  batman-adv: Drop immediate orig_node free function
  batman-adv: Drop immediate batadv_hard_iface free function
  batman-adv: Drop immediate neigh_ifinfo free function
  batman-adv: Drop immediate batadv_neigh_node free function
  batman-adv: Drop immediate batadv_orig_ifinfo free function
  batman-adv: Avoid recursive call_rcu for batadv_nc_node
  batman-adv: Avoid recursive call_rcu for batadv_bla_claim
  team: Replace rcu_read_lock with a mutex in team_vlan_rx_kill_vid
  net/mlx5_core: Fix trimming down IRQ number
  bridge: fix lockdep addr_list_lock false positive splat
  ipv6: update skb->csum when CE mark is propagated
  net: bpf: reject invalid shifts
  phonet: properly unshare skbs in phonet_rcv()
  dwc_eth_qos: Fix dma address for multi-fragment skbs
  bonding: Prevent IPv6 link local address on enslaved devices
  net: preserve IP control block during GSO segmentation
  udp: disallow UFO for sockets with SO_NO_CHECK option
  net: pktgen: fix null ptr deref in skb allocation
  sched,cls_flower: set key address type when present
  tcp_yeah: don't set ssthresh below 2
  ipv6: tcp: add rcu locking in tcp_v6_send_synack()
  net: sctp: prevent writes to cookie_hmac_alg from accessing invalid memory
  vxlan: fix test which detect duplicate vxlan iface
  unix: properly account for FDs passed over unix sockets
  xhci: refuse loading if nousb is used
  usb: core: lpm: fix usb3_hardware_lpm sysfs node
  USB: cp210x: add ID for ELV Marble Sound Board 1
  rtlwifi: fix memory leak for USB device
  ASoC: compress: Fix compress device direction check
  ASoC: wm5110: Fix PGA clear when disabling DRE
  ALSA: timer: Handle disconnection more safely
  ALSA: hda - Flush the pending probe work at remove
  ALSA: hda - Fix missing module loading with model=generic option
  ALSA: hda - Fix bass pin fixup for ASUS N550JX
  ALSA: control: Avoid kernel warnings from tlv ioctl with numid 0
  ALSA: hrtimer: Fix stall by hrtimer_cancel()
  ALSA: pcm: Fix snd_pcm_hw_params struct copy in compat mode
  ALSA: seq: Fix snd_seq_call_port_info_ioctl in compat mode
  ALSA: hda - Add fixup for Dell Latitidue E6540
  ALSA: timer: Fix double unlink of active_list
  ALSA: timer: Fix race among timer ioctls
  ALSA: hda - fix the headset mic detection problem for a Dell laptop
  ALSA: timer: Harden slave timer list handling
  ALSA: usb-audio: Fix mixer ctl regression of Native Instrument devices
  ALSA: hda - Fix white noise on Dell Latitude E5550
  ALSA: seq: Fix race at timer setup and close
  ALSA: usb-audio: Avoid calling usb_autopm_put_interface() at disconnect
  ALSA: seq: Fix missing NULL check at remove_events ioctl
  ALSA: hda - Fixup inverted internal mic for Lenovo E50-80
  ALSA: usb: Add native DSD support for Oppo HA-1
  x86/mm: Improve switch_mm() barrier comments
  x86/mm: Add barriers and document switch_mm()-vs-flush synchronization
  x86/boot: Double BOOT_HEAP_SIZE to 64KB
  x86/reboot/quirks: Add iMac10,1 to pci_reboot_dmi_table[]
  kvm: x86: Fix vmwrite to SECONDARY_VM_EXEC_CONTROL
  KVM: x86: correctly print #AC in traces
  KVM: x86: expose MSR_TSC_AUX to userspace
  x86/xen: don't reset vcpu_info on a cancelled suspend
  KEYS: Fix keyring ref leak in join_session_keyring()

Conflicts:
	arch/arm64/kernel/perf_event.c
	drivers/scsi/sd.c
	sound/core/compress_offload.c

Change-Id: I9f77fe42aaae249c24cd6e170202110ab1426878
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
2016-03-23 20:51:00 -07:00
Srivatsa Vaddagiri
207d78dd26 sched: Add userspace interface to set PF_WAKE_UP_IDLE
sched_prefer_idle flag controls whether tasks can be woken to any
available idle cpu. It may be desirable to set sched_prefer_idle to 0
so that most tasks wake up to non-idle cpus under mostly_idle
threshold and have specialized tasks override this behavior through
other means. PF_WAKE_UP_IDLE flag per task provides exactly that. It
lets tasks with PF_WAKE_UP_IDLE flag set be woken up to any available
idle cpu independent of sched_prefer_idle flag setting. Currently
only kernel-space API exists to set PF_WAKE_UP_IDLE flag for a task.
This patch adds a user-space API (in /proc filesystem) to set
PF_WAKE_UP_IDLE flag for a given task. /proc/[pid]/sched_wake_up_idle
file can be written to set or clear PF_WAKE_UP_IDLE flag for a given
task.

Change-Id: I13a37e740195e503f457ebe291d54e83b230fbeb
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in kernel/sched/fair.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:33 -07:00
Srivatsa Vaddagiri
33af11b6f4 sched: Add API to set task's initial task load
Add a per-task attribute, init_load_pct, that is used to initialize
newly created children's initial task load. This helps important
applications launch their child tasks on cpus with highest capacity.

Change-Id: Ie9665fd2aeb15203f95fd7f211c50bebbaa18727
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict int init_new_task_load.
 se.avg.runnable_avg_sum has deprecated.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:00:58 -07:00
Mona Hossain
e0f32de9cf qmp: Add support for QSSP Enhancements
update seemp_api header file
update seemp_parm_id header file
update seemp_core to log data with blk header set to 64B
remove logging from fs proc base and net socket modules

Change-Id: I583e3129d62651b155b0372e173564d5a17e3153
Signed-off-by: Mona Hossain <mhossain@codeaurora.org>
2016-03-23 19:58:14 -07:00
William Clark
0990b37505 seemp: enhance support for malware detection
Improves the ability of a malware protection program
to detect anomalies in various activities. It records
task activities in a log and rates the actions
according to how a typical user would use the tools.

Change-Id: I976bc97f57215f173b046326b5f905522d785288
Signed-off-by: Mona Hossain <mhossain@codeaurora.org>
Signed-off-by: William Clark <wclark@codeaurora.org>
2016-03-23 19:58:11 -07:00
Nikhilesh Reddy
5a9fde57cf fuse: Add support for passthrough read/write
Add support for filesystem passthrough read/write of files
when enabled in userspace through the option FUSE_PASSTHROUGH.

There are many FUSE based filesystems that perform checks or
enforce policy or perform some kind of decision making in certain
functions like the "open" call but simply act as a "passthrough"
when performing operations such as read or write.

When FUSE_PASSTHROUGH is enabled all the reads and writes
to the fuse mount point go directly to the passthrough filesystem
i.e a native filesystem that actually hosts the files rather than
through the fuse daemon. All requests that aren't read/write still
go thought the userspace code.

This allows for significantly better performance on read and writes.
The difference in performance between fuse and the native lower
filesystem is negligible.

There is also a significant cpu/power savings that is achieved which
is really important on embedded systems that use fuse for I/O.

Changelog:

v5:
Fix the check when setting the passthrough file
[Found when testing by Mike Shal]

v3 and v4:
Use the fs_stack_depth to prevent further stacking and a minor fix
[Fix suggested by Jann Horn]

v2:
Changed the feature name to passthrough from stacked_io
[Proposed by Linus Torvalds]

Signed-off-by: Nikhilesh Reddy <reddyn@codeaurora.org>
2016-03-22 11:15:47 -07:00
Girish Mahadevan
d5baa81480 tty: serial: msm_hs: Add snapshot of msm_hs uart drivers
Snapshot of msm_hs uart driver from msm-3.18.
Baseline:
e70ad0cd5efdd9dc91a77dcdac31d6132e1315c1

Change-Id: I977d80142c2cf8df89810f1c38d523d53371cc46
Signed-off-by: Girish Mahadevan <girishm@codeaurora.org>
2016-03-22 11:10:33 -07:00
Mao Jinlong
f0aafc9926 rtc: alarm: Add power-on alarm feature
Android does not support powering-up the phone through alarm.
Set rtc alarm in timerfd to power-up the phone after alarm
expiration.

Change-Id: I781389c658fb00ba7f0ce089d706c10f202a7dc6
Signed-off-by: Mao Jinlong <c_jmao@codeaurora.org>
2016-03-22 11:08:58 -07:00
Venkat Gopalakrishnan
a627c73479 block/fs: make tracking dirty task debug only
Adding a new element "tsk_dirty" to struct page increases the size
of mem_map/vmemmap, restrict this to a debug only functionality to
save few MB of memory.

Considering a system with 1G of RAM, there will be nearly 262144
pages and thus that many number of page structures in mem_map/vmemmap.
With pointer size of 8 bytes on a 64 bit system, adding this
pointer to "struct page" means an increase of "2MB" for mem_map.

CRs-Fixed: 738692
Change-Id: Idf3217dcbe17cf1ab4d462d2aa8d39da1ffd8b13
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
[venkatg@codeaurora.org: Fixed trivial merge conflict]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 11:02:01 -07:00
Venkat Gopalakrishnan
014929f975 block/fs: keep track of the task that dirtied the page
Background writes happen in the context of a background thread.
It is very useful to identify the actual task that generated the
request instead of background task that submited the request.
Hence keep track of the task when a page gets dirtied and dump
this task info while tracing. Not all the pages in the bio are
dirtied by the same task but most likely it will be, since the
sectors accessed on the device must be adjacent.

Change-Id: I6afba85a2063dd3350a0141ba87cf8440ce9f777
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
[venkatg@codeaurora.org: Fixed trivial merge conflicts]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 11:02:00 -07:00
Thomas Gleixner
565f222968 timerfd: Handle relative timers with CONFIG_TIME_LOW_RES proper
commit b62526ed11a1fe3861ab98d40b7fdab8981d788a upstream.

Helge reported that a relative timer can return a remaining time larger than
the programmed relative time on parisc and other architectures which have
CONFIG_TIME_LOW_RES set. This happens because we add a jiffie to the resulting
expiry time to prevent short timeouts.

Use the new function hrtimer_expires_remaining_adjusted() to calculate the
remaining time. It takes that extra added time into account for relative
timers.

Reported-and-tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: linux-m68k@lists.linux-m68k.org
Cc: dhowells@redhat.com
Link: http://lkml.kernel.org/r/20160114164159.354500742@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:25 -08:00
Dave Chinner
f86701c4f3 xfs: log mount failures don't wait for buffers to be released
commit 85bec5460ad8e05e0a8d70fb0f6750eb719ad092 upstream.

Recently I've been seeing xfs/051 fail on 1k block size filesystems.
Trying to trace the events during the test lead to the problem going
away, indicating that it was a race condition that lead to this
ASSERT failure:

XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 156
.....
[<ffffffff814e1257>] xfs_free_perag+0x87/0xb0
[<ffffffff814e21b9>] xfs_mountfs+0x4d9/0x900
[<ffffffff814e5dff>] xfs_fs_fill_super+0x3bf/0x4d0
[<ffffffff811d8800>] mount_bdev+0x180/0x1b0
[<ffffffff814e3ff5>] xfs_fs_mount+0x15/0x20
[<ffffffff811d90a8>] mount_fs+0x38/0x170
[<ffffffff811f4347>] vfs_kern_mount+0x67/0x120
[<ffffffff811f7018>] do_mount+0x218/0xd60
[<ffffffff811f7e5b>] SyS_mount+0x8b/0xd0

When I finally caught it with tracing enabled, I saw that AG 2 had
an elevated reference count and a buffer was responsible for it. I
tracked down the specific buffer, and found that it was missing the
final reference count release that would put it back on the LRU and
hence be found by xfs_wait_buftarg() calls in the log mount failure
handling.

The last four traces for the buffer before the assert were (trimmed
for relevance)

kworker/0:1-5259   xfs_buf_iodone:        hold 2  lock 0 flags ASYNC
kworker/0:1-5259   xfs_buf_ioerror:       hold 2  lock 0 error -5
mount-7163	   xfs_buf_lock_done:     hold 2  lock 0 flags ASYNC
mount-7163	   xfs_buf_unlock:        hold 2  lock 1 flags ASYNC

This is an async write that is completing, so there's nobody waiting
for it directly.  Hence we call xfs_buf_relse() once all the
processing is complete. That does:

static inline void xfs_buf_relse(xfs_buf_t *bp)
{
	xfs_buf_unlock(bp);
	xfs_buf_rele(bp);
}

Now, it's clear that mount is waiting on the buffer lock, and that
it has been released by xfs_buf_relse() and gained by mount. This is
expected, because at this point the mount process is in
xfs_buf_delwri_submit() waiting for all the IO it submitted to
complete.

The mount process, however, is waiting on the lock for the buffer
because it is in xfs_buf_delwri_submit(). This waits for IO
completion, but it doesn't wait for the buffer reference owned by
the IO to go away. The mount process collects all the completions,
fails the log recovery, and the higher level code then calls
xfs_wait_buftarg() to free all the remaining buffers in the
filesystem.

The issue is that on unlocking the buffer, the scheduler has decided
that the mount process has higher priority than the the kworker
thread that is running the IO completion, and so immediately
switched contexts to the mount process from the semaphore unlock
code, hence preventing the kworker thread from finishing the IO
completion and releasing the IO reference to the buffer.

Hence by the time that xfs_wait_buftarg() is run, the buffer still
has an active reference and so isn't on the LRU list that the
function walks to free the remaining buffers. Hence we miss that
buffer and continue onwards to tear down the mount structures,
at which time we get find a stray reference count on the perag
structure. On a non-debug kernel, this will be ignored and the
structure torn down and freed. Hence when the kworker thread is then
rescheduled and the buffer released and freed, it will access a
freed perag structure.

The problem here is that when the log mount fails, we still need to
quiesce the log to ensure that the IO workqueues have returned to
idle before we run xfs_wait_buftarg(). By synchronising the
workqueues, we ensure that all IO completions are fully processed,
not just to the point where buffers have been unlocked. This ensures
we don't end up in the situation above.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:24 -08:00
Dave Chinner
16f14a28f6 Revert "xfs: clear PF_NOFREEZE for xfsaild kthread"
commit 3e85286e75224fa3f08bdad20e78c8327742634e upstream.

This reverts commit 24ba16bb3d as it
prevents machines from suspending. This regression occurs when the
xfsaild is idle on entry to suspend, and so there s no activity to
wake it from it's idle sleep and hence see that it is supposed to
freeze. Hence the freezer times out waiting for it and suspend is
cancelled.

There is no obvious fix for this short of freezing the filesystem
properly, so revert this change for now.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:24 -08:00
Dave Chinner
7530e6fdd9 xfs: inode recovery readahead can race with inode buffer creation
commit b79f4a1c68bb99152d0785ee4ea3ab4396cdacc6 upstream.

When we do inode readahead in log recovery, we do can do the
readahead before we've replayed the icreate transaction that stamps
the buffer with inode cores. The inode readahead verifier catches
this and marks the buffer as !done to indicate that it doesn't yet
contain valid inodes.

In adding buffer error notification  (i.e. setting b_error = -EIO at
the same time as as we clear the done flag) to such a readahead
verifier failure, we can then get subsequent inode recovery failing
with this error:

XFS (dm-0): metadata I/O error: block 0xa00060 ("xlog_recover_do..(read#2)") error 5 numblks 32

This occurs when readahead completion races with icreate item replay
such as:

	inode readahead
		find buffer
		lock buffer
		submit RA io
	....
	icreate recovery
	    xfs_trans_get_buffer
		find buffer
		lock buffer
		<blocks on RA completion>
	.....
	<ra completion>
		fails verifier
		clear XBF_DONE
		set bp->b_error = -EIO
		release and unlock buffer
	<icreate gains lock>
	icreate initialises buffer
	marks buffer as done
	adds buffer to delayed write queue
	releases buffer

At this point, we have an initialised inode buffer that is up to
date but has an -EIO state registered against it. When we finally
get to recovering an inode in that buffer:

	inode item recovery
	    xfs_trans_read_buffer
		find buffer
		lock buffer
		sees XBF_DONE is set, returns buffer
	    sees bp->b_error is set
		fail log recovery!

Essentially, we need xfs_trans_get_buf_map() to clear the error status of
the buffer when doing a lookup. This function returns uninitialised
buffers, so the buffer returned can not be in an error state and
none of the code that uses this function expects b_error to be set
on return. Indeed, there is an ASSERT(!bp->b_error); in the
transaction case in xfs_trans_get_buf_map() that would have caught
this if log recovery used transactions....

This patch firstly changes the inode readahead failure to set -EIO
on the buffer, and secondly changes xfs_buf_get_map() to never
return a buffer with an error state set so this first change doesn't
cause unexpected log recovery failures.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:24 -08:00
Darrick J. Wong
888959f2fd libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct
commit 96f859d52bcb1c6ea6f3388d39862bf7143e2f30 upstream.

Because struct xfs_agfl is 36 bytes long and has a 64-bit integer
inside it, gcc will quietly round the structure size up to the nearest
64 bits -- in this case, 40 bytes.  This results in the XFS_AGFL_SIZE
macro returning incorrect results for v5 filesystems on 64-bit
machines (118 items instead of 119).  As a result, a 32-bit xfs_repair
will see garbage in AGFL item 119 and complain.

Therefore, tell gcc not to pad the structure so that the AGFL size
calculation is correct.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:24 -08:00
Miklos Szeredi
8373f6590f ovl: setattr: check permissions before copy-up
commit cf9a6784f7c1b5ee2b9159a1246e327c331c5697 upstream.

Without this copy-up of a file can be forced, even without actually being
allowed to do anything on the file.

[Arnd Bergmann] include <linux/pagemap.h> for PAGE_CACHE_SIZE (used by
MAX_LFS_FILESIZE definition).

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:24 -08:00
Miklos Szeredi
7193e80296 ovl: root: copy attr
commit ed06e069775ad9236087594a1c1667367e983fb5 upstream.

We copy i_uid and i_gid of underlying inode into overlayfs inode.  Except
for the root inode.

Fix this omission.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:24 -08:00
Konstantin Khlebnikov
367e439dbc ovl: check dentry positiveness in ovl_cleanup_whiteouts()
commit 84889d49335627bc770b32787c1ef9ebad1da232 upstream.

This patch fixes kernel crash at removing directory which contains
whiteouts from lower layers.

Cache of directory content passed as "list" contains entries from all
layers, including whiteouts from lower layers. So, lookup in upper dir
(moved into work at this stage) will return negative entry. Plus this
cache is filled long before and we can race with external removal.

Example:
 mkdir -p lower0/dir lower1/dir upper work overlay
 touch lower0/dir/a lower0/dir/b
 mknod lower1/dir/a c 0 0
 mount -t overlay none overlay -o lowerdir=lower1:lower0,upperdir=upper,workdir=work
 rm -fr overlay/dir

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:24 -08:00
Vito Caputo
fa932190a5 ovl: use a minimal buffer in ovl_copy_xattr
commit e4ad29fa0d224d05e08b2858e65f112fd8edd4fe upstream.

Rather than always allocating the high-order XATTR_SIZE_MAX buffer
which is costly and prone to failure, only allocate what is needed and
realloc if necessary.

Fixes https://github.com/coreos/bugs/issues/489

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:24 -08:00
Miklos Szeredi
85a7ed329a ovl: allow zero size xattr
commit 97daf8b97ad6f913a34c82515be64dc9ac08d63e upstream.

When ovl_copy_xattr() encountered a zero size xattr no more xattrs were
copied and the function returned success.  This is clearly not the desired
behavior.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:23 -08:00
Michael Holzheu
4b20545910 numa: fix /proc/<pid>/numa_maps for hugetlbfs on s390
commit 5c2ff95e41c9290d16556cd02e35b25d81be8fe0 upstream.

When working with hugetlbfs ptes (which are actually pmds) is not valid to
directly use pte functions like pte_present() because the hardware bit
layout of pmds and ptes can be different.  This is the case on s390.
Therefore we have to convert the hugetlbfs ptes first into a valid pte
encoding with huge_ptep_get().

Currently the /proc/<pid>/numa_maps code uses hugetlbfs ptes without
huge_ptep_get().  On s390 this leads to the following two problems:

1) The pte_present() function returns false (instead of true) for
   PROT_NONE hugetlb ptes. Therefore PROT_NONE vmas are missing
   completely in the "numa_maps" output.

2) The pte_dirty() function always returns false for all hugetlb ptes.
   Therefore these pages are reported as "mapped=xxx" instead of
   "dirty=xxx".

Therefore use huge_ptep_get() to correctly convert the hugetlb ptes.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:22 -08:00
Mike Kravetz
db33368ca3 fs/hugetlbfs/inode.c: fix bugs in hugetlb_vmtruncate_list()
commit 9aacdd354d197ad64685941b36d28ea20ab88757 upstream.

Hillf Danton noticed bugs in the hugetlb_vmtruncate_list routine.  The
argument end is of type pgoff_t.  It was being converted to a vaddr
offset and passed to unmap_hugepage_range.  However, end was also being
used as an argument to the vma_interval_tree_foreach controlling loop.
In addition, the conversion of end to vaddr offset was incorrect.

hugetlb_vmtruncate_list is called as part of a file truncate or
fallocate hole punch operation.

When truncating a hugetlbfs file, this bug could prevent some pages from
being unmapped.  This is possible if there are multiple vmas mapping the
file, and there is a sufficiently sized hole between the mappings.  The
size of the hole between two vmas (A,B) must be such that the starting
virtual address of B is greater than (ending virtual address of A <<
PAGE_SHIFT).  In this case, the pages in B would not be unmapped.  If
pages are not properly unmapped during truncate, the following BUG is
hit:

	kernel BUG at fs/hugetlbfs/inode.c:428!

In the fallocate hole punch case, this bug could prevent pages from
being unmapped as in the truncate case.  However, for hole punch the
result is that unmapped pages will not be removed during the operation.
For hole punch, it is also possible that more pages than desired will be
unmapped.  This unnecessary unmapping will cause page faults to
reestablish the mappings on subsequent page access.

Fixes: 1bfad99ab (" hugetlbfs: hugetlb_vmtruncate_list() needs to take a range")Reported-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:22 -08:00
Andrew Gabbasov
d0452554b9 udf: Check output buffer length when converting name to CS0
commit bb00c898ad1ce40c4bb422a8207ae562e9aea7ae upstream.

If a name contains at least some characters with Unicode values
exceeding single byte, the CS0 output should have 2 bytes per character.
And if other input characters have single byte Unicode values, then
the single input byte is converted to 2 output bytes, and the length
of output becomes larger than the length of input. And if the input
name is long enough, the output length may exceed the allocated buffer
length.

All this means that conversion from UTF8 or NLS to CS0 requires
checking of output length in order to stop when it exceeds the given
output buffer size.

[JK: Make code return -ENAMETOOLONG instead of silently truncating the
name]

Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:18 -08:00
Andrew Gabbasov
eec1445767 udf: Prevent buffer overrun with multi-byte characters
commit ad402b265ecf6fa22d04043b41444cdfcdf4f52d upstream.

udf_CS0toUTF8 function stops the conversion when the output buffer
length reaches UDF_NAME_LEN-2, which is correct maximum name length,
but, when checking, it leaves the space for a single byte only,
while multi-bytes output characters can take more space, causing
buffer overflow.

Similar error exists in udf_CS0toNLS function, that restricts
the output length to UDF_NAME_LEN, while actual maximum allowed
length is UDF_NAME_LEN-2.

In these cases the output can override not only the current buffer
length field, causing corruption of the name buffer itself, but also
following allocation structures, causing kernel crash.

Adjust the output length checks in both functions to prevent buffer
overruns in case of multi-bytes UTF8 or NLS characters.

Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:18 -08:00
Vegard Nossum
aef22a3d69 udf: limit the maximum number of indirect extents in a row
commit b0918d9f476a8434b055e362b83fa4fd1d462c3f upstream.

udf_next_aext() just follows extent pointers while extents are marked as
indirect. This can loop forever for corrupted filesystem. Limit number
the of indirect extents we are willing to follow in a row.

[JK: Updated changelog, limit, style]

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Jan Kara <jack@suse.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:18 -08:00
Trond Myklebust
66b8812e87 pNFS/flexfiles: Fix an XDR encoding bug in layoutreturn
commit 082fa37d1351a41afc491d44a1d095cb8d919aa2 upstream.

We must not skip encoding the statistics, or the server will see an
XDR encoding error.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:18 -08:00
Andrew Elble
d65eb5b3df nfs: Fix race in __update_open_stateid()
commit 361cad3c89070aeb37560860ea8bfc092d545adc upstream.

We've seen this in a packet capture - I've intermixed what I
think was going on. The fix here is to grab the so_lock sooner.

1964379 -> #1 open (for write) reply seqid=1
1964393 -> #2 open (for read) reply seqid=2

  __nfs4_close(), state->n_wronly--
  nfs4_state_set_mode_locked(), changes state->state = [R]
  state->flags is [RW]
  state->state is [R], state->n_wronly == 0, state->n_rdonly == 1

1964398 -> #3 open (for write) call -> because close is already running
1964399 -> downgrade (to read) call seqid=2 (close of #1)
1964402 -> #3 open (for write) reply seqid=3

 __update_open_stateid()
   nfs_set_open_stateid_locked(), changes state->flags
   state->flags is [RW]
   state->state is [R], state->n_wronly == 0, state->n_rdonly == 1
   new sequence number is exposed now via nfs4_stateid_copy()

   next step would be update_open_stateflags(), pending so_lock

1964403 -> downgrade reply seqid=2, fails with OLD_STATEID (close of #1)

   nfs4_close_prepare() gets so_lock and recalcs flags -> send close

1964405 -> downgrade (to read) call seqid=3 (close of #1 retry)

   __update_open_stateid() gets so_lock
 * update_open_stateflags() updates state->n_wronly.
   nfs4_state_set_mode_locked() updates state->state

   state->flags is [RW]
   state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1

 * should have suppressed the preceding nfs4_close_prepare() from
   sending open_downgrade

1964406 -> write call
1964408 -> downgrade (to read) reply seqid=4 (close of #1 retry)

   nfs_clear_open_stateid_locked()
   state->flags is [R]
   state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1

1964409 -> write reply (fails, openmode)

Signed-off-by: Andrew Elble <aweits@rit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:18 -08:00
Trond Myklebust
c8841e15d6 pNFS/flexfiles: Fix an Oopsable typo in ff_mirror_match_fh()
commit 86fb449b07b8215443a30782dca5755d5b8b0577 upstream.

Jeff reports seeing an Oops in ff_layout_alloc_lseg. Turns out
copy+paste has played cruel tricks on a nested loop.

Reported-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:18 -08:00
Trond Myklebust
1873e6f486 NFS: Fix attribute cache revalidation
commit ade14a7df796d4e86bd9d181193c883a57b13db0 upstream.

If a NFSv4 client uses the cache_consistency_bitmask in order to
request only information about the change attribute, timestamps and
size, then it has not revalidated all attributes, and hence the
attribute timeout timestamp should not be updated.

Reported-by: Donald Buczek <buczek@molgen.mpg.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:18 -08:00
Anton Protopopov
dadfe92207 cifs: fix erroneous return value
commit 4b550af519854421dfec9f7732cdddeb057134b2 upstream.

The setup_ntlmv2_rsp() function may return positive value ENOMEM instead
of -ENOMEM in case of kmalloc failure.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:18 -08:00
Vasily Averin
7e30995b26 cifs_dbg() outputs an uninitialized buffer in cifs_readdir()
commit 01b9b0b28626db4a47d7f48744d70abca9914ef1 upstream.

In some cases tmp_bug can be not filled in cifs_filldir and stay uninitialized,
therefore its printk with "%s" modifier can leak content of kernelspace memory.
If old content of this buffer does not contain '\0' access bejond end of
allocated object can crash the host.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Steve French <sfrench@localhost.localdomain>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:17 -08:00
Rabin Vincent
5d80673404 cifs: fix race between call_async() and reconnect()
commit 820962dc700598ffe8cd21b967e30e7520c34748 upstream.

cifs_call_async() queues the MID to the pending list and calls
smb_send_rqst().  If smb_send_rqst() performs a partial send, it sets
the tcpStatus to CifsNeedReconnect and returns an error code to
cifs_call_async().  In this case, cifs_call_async() removes the MID
from the list and returns to the caller.

However, cifs_call_async() releases the server mutex _before_ removing
the MID.  This means that a cifs_reconnect() can race with this function
and manage to remove the MID from the list and delete the entry before
cifs_call_async() calls cifs_delete_mid().  This leads to various
crashes due to the use after free in cifs_delete_mid().

Task1				Task2

cifs_call_async():
 - rc = -EAGAIN
 - mutex_unlock(srv_mutex)

				cifs_reconnect():
				 - mutex_lock(srv_mutex)
				 - mutex_unlock(srv_mutex)
				 - list_delete(mid)
				 - mid->callback()
				 	cifs_writev_callback():
				 		- mutex_lock(srv_mutex)
						- delete(mid)
				 		- mutex_unlock(srv_mutex)

 - cifs_delete_mid(mid) <---- use after free

Fix this by removing the MID in cifs_call_async() before releasing the
srv_mutex.  Also hold the srv_mutex in cifs_reconnect() until the MIDs
are moved out of the pending list.

Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <sfrench@localhost.localdomain>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:17 -08:00
Jamie Bainbridge
88413fceab cifs: Ratelimit kernel log messages
commit ec7147a99e33a9e4abad6fc6e1b40d15df045d53 upstream.

Under some conditions, CIFS can repeatedly call the cifs_dbg() logging
wrapper. If done rapidly enough, the console framebuffer can softlockup
or "rcu_sched self-detected stall". Apply the built-in log ratelimiters
to prevent such hangs.

Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:17 -08:00
Jann Horn
969624b7c1 ptrace: use fsuid, fsgid, effective creds for fs access checks
commit caaee6234d05a58c5b4d05e7bf766131b810a657 upstream.

By checking the effective credentials instead of the real UID / permitted
capabilities, ensure that the calling process actually intended to use its
credentials.

To ensure that all ptrace checks use the correct caller credentials (e.g.
in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
flag), use two new flags and require one of them to be set.

The problem was that when a privileged task had temporarily dropped its
privileges, e.g.  by calling setreuid(0, user_uid), with the intent to
perform following syscalls with the credentials of a user, it still passed
ptrace access checks that the user would not be able to pass.

While an attacker should not be able to convince the privileged task to
perform a ptrace() syscall, this is a problem because the ptrace access
check is reused for things in procfs.

In particular, the following somewhat interesting procfs entries only rely
on ptrace access checks:

 /proc/$pid/stat - uses the check for determining whether pointers
     should be visible, useful for bypassing ASLR
 /proc/$pid/maps - also useful for bypassing ASLR
 /proc/$pid/cwd - useful for gaining access to restricted
     directories that contain files with lax permissions, e.g. in
     this scenario:
     lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
     drwx------ root root /root
     drwxr-xr-x root root /root/foobar
     -rw-r--r-- root root /root/foobar/secret

Therefore, on a system where a root-owned mode 6755 binary changes its
effective credentials as described and then dumps a user-specified file,
this could be used by an attacker to reveal the memory layout of root's
processes or reveal the contents of files he is not allowed to access
(through /proc/$pid/cwd).

[akpm@linux-foundation.org: fix warning]
Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:16 -08:00
Filipe Manana
ba6d92801b Btrfs: fix direct IO requests not reporting IO error to user space
commit 1636d1d77ef4e01e57f706a4cae3371463896136 upstream.

If a bio for a direct IO request fails, we were not setting the error in
the parent bio (the main DIO bio), making us not return the error to
user space in btrfs_direct_IO(), that is, it made __blockdev_direct_IO()
return the number of bytes issued for IO and not the error a bio created
and submitted by btrfs_submit_direct() got from the block layer.
This essentially happens because when we call:

   dio_end_io(dio_bio, bio->bi_error);

It does not set dio_bio->bi_error to the value of the second argument.
So just add this missing assignment in endio callbacks, just as we do in
the error path at btrfs_submit_direct() when we fail to clone the dio bio
or allocate its private object. This follows the convention of what is
done with other similar APIs such as bio_endio() where the caller is
responsible for setting the bi_error field in the bio it passes as an
argument to bio_endio().

This was detected by the new generic test cases in xfstests: 271, 272,
276 and 278. Which essentially setup a dm error target, then load the
error table, do a direct IO write and unload the error table. They
expect the write to fail with -EIO, which was not getting reported
when testing against btrfs.

Fixes: 4246a0b63b ("block: add a bi_error field to struct bio")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:15 -08:00
Filipe Manana
e8eced78e0 Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl
commit 0c0fe3b0fa45082cd752553fdb3a4b42503a118e upstream.

While doing some tests I ran into an hang on an extent buffer's rwlock
that produced the following trace:

[39389.800012] NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! [fdm-stress:32166]
[39389.800016] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [fdm-stress:32165]
[39389.800016] Modules linked in: btrfs dm_mod ppdev xor sha256_generic hmac raid6_pq drbg ansi_cprng aesni_intel i2c_piix4 acpi_cpufreq aes_x86_64 ablk_helper tpm_tis parport_pc i2c_core sg cryptd evdev psmouse lrw tpm parport gf128mul serio_raw pcspkr glue_helper processor button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs]
[39389.800016] irq event stamp: 0
[39389.800016] hardirqs last  enabled at (0): [<          (null)>]           (null)
[39389.800016] hardirqs last disabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
[39389.800016] softirqs last  enabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
[39389.800016] softirqs last disabled at (0): [<          (null)>]           (null)
[39389.800016] CPU: 14 PID: 32165 Comm: fdm-stress Not tainted 4.4.0-rc6-btrfs-next-18+ #1
[39389.800016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[39389.800016] task: ffff880175b1ca40 ti: ffff8800a185c000 task.ti: ffff8800a185c000
[39389.800016] RIP: 0010:[<ffffffff810902af>]  [<ffffffff810902af>] queued_spin_lock_slowpath+0x57/0x158
[39389.800016] RSP: 0018:ffff8800a185fb80  EFLAGS: 00000202
[39389.800016] RAX: 0000000000000101 RBX: ffff8801710c4e9c RCX: 0000000000000101
[39389.800016] RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000001
[39389.800016] RBP: ffff8800a185fb98 R08: 0000000000000001 R09: 0000000000000000
[39389.800016] R10: ffff8800a185fb68 R11: 6db6db6db6db6db7 R12: ffff8801710c4e98
[39389.800016] R13: ffff880175b1ca40 R14: ffff8800a185fc10 R15: ffff880175b1ca40
[39389.800016] FS:  00007f6d37fff700(0000) GS:ffff8802be9c0000(0000) knlGS:0000000000000000
[39389.800016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[39389.800016] CR2: 00007f6d300019b8 CR3: 0000000037c93000 CR4: 00000000001406e0
[39389.800016] Stack:
[39389.800016]  ffff8801710c4e98 ffff8801710c4e98 ffff880175b1ca40 ffff8800a185fbb0
[39389.800016]  ffffffff81091e11 ffff8801710c4e98 ffff8800a185fbc8 ffffffff81091895
[39389.800016]  ffff8801710c4e98 ffff8800a185fbe8 ffffffff81486c5c ffffffffa067288c
[39389.800016] Call Trace:
[39389.800016]  [<ffffffff81091e11>] queued_read_lock_slowpath+0x46/0x60
[39389.800016]  [<ffffffff81091895>] do_raw_read_lock+0x3e/0x41
[39389.800016]  [<ffffffff81486c5c>] _raw_read_lock+0x3d/0x44
[39389.800016]  [<ffffffffa067288c>] ? btrfs_tree_read_lock+0x54/0x125 [btrfs]
[39389.800016]  [<ffffffffa067288c>] btrfs_tree_read_lock+0x54/0x125 [btrfs]
[39389.800016]  [<ffffffffa0622ced>] ? btrfs_find_item+0xa7/0xd2 [btrfs]
[39389.800016]  [<ffffffffa069363f>] btrfs_ref_to_path+0xd6/0x174 [btrfs]
[39389.800016]  [<ffffffffa0693730>] inode_to_path+0x53/0xa2 [btrfs]
[39389.800016]  [<ffffffffa0693e2e>] paths_from_inode+0x117/0x2ec [btrfs]
[39389.800016]  [<ffffffffa0670cff>] btrfs_ioctl+0xd5b/0x2793 [btrfs]
[39389.800016]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
[39389.800016]  [<ffffffff81276727>] ? __this_cpu_preempt_check+0x13/0x15
[39389.800016]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
[39389.800016]  [<ffffffff8118b3d4>] ? rcu_read_unlock+0x3e/0x5d
[39389.800016]  [<ffffffff811822f8>] do_vfs_ioctl+0x42b/0x4ea
[39389.800016]  [<ffffffff8118b4f3>] ? __fget_light+0x62/0x71
[39389.800016]  [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
[39389.800016]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
[39389.800016] Code: b9 01 01 00 00 f7 c6 00 ff ff ff 75 32 83 fe 01 89 ca 89 f0 0f 45 d7 f0 0f b1 13 39 f0 74 04 89 c6 eb e2 ff ca 0f 84 fa 00 00 00 <8b> 03 84 c0 74 04 f3 90 eb f6 66 c7 03 01 00 e9 e6 00 00 00 e8
[39389.800012] Modules linked in: btrfs dm_mod ppdev xor sha256_generic hmac raid6_pq drbg ansi_cprng aesni_intel i2c_piix4 acpi_cpufreq aes_x86_64 ablk_helper tpm_tis parport_pc i2c_core sg cryptd evdev psmouse lrw tpm parport gf128mul serio_raw pcspkr glue_helper processor button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs]
[39389.800012] irq event stamp: 0
[39389.800012] hardirqs last  enabled at (0): [<          (null)>]           (null)
[39389.800012] hardirqs last disabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
[39389.800012] softirqs last  enabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
[39389.800012] softirqs last disabled at (0): [<          (null)>]           (null)
[39389.800012] CPU: 15 PID: 32166 Comm: fdm-stress Tainted: G             L  4.4.0-rc6-btrfs-next-18+ #1
[39389.800012] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[39389.800012] task: ffff880179294380 ti: ffff880034a60000 task.ti: ffff880034a60000
[39389.800012] RIP: 0010:[<ffffffff81091e8d>]  [<ffffffff81091e8d>] queued_write_lock_slowpath+0x62/0x72
[39389.800012] RSP: 0018:ffff880034a639f0  EFLAGS: 00000206
[39389.800012] RAX: 0000000000000101 RBX: ffff8801710c4e98 RCX: 0000000000000000
[39389.800012] RDX: 00000000000000ff RSI: 0000000000000000 RDI: ffff8801710c4e9c
[39389.800012] RBP: ffff880034a639f8 R08: 0000000000000001 R09: 0000000000000000
[39389.800012] R10: ffff880034a639b0 R11: 0000000000001000 R12: ffff8801710c4e98
[39389.800012] R13: 0000000000000001 R14: ffff880172cbc000 R15: ffff8801710c4e00
[39389.800012] FS:  00007f6d377fe700(0000) GS:ffff8802be9e0000(0000) knlGS:0000000000000000
[39389.800012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[39389.800012] CR2: 00007f6d3d3c1000 CR3: 0000000037c93000 CR4: 00000000001406e0
[39389.800012] Stack:
[39389.800012]  ffff8801710c4e98 ffff880034a63a10 ffffffff81091963 ffff8801710c4e98
[39389.800012]  ffff880034a63a30 ffffffff81486f1b ffffffffa0672cb3 ffff8801710c4e00
[39389.800012]  ffff880034a63a78 ffffffffa0672cb3 ffff8801710c4e00 ffff880034a63a58
[39389.800012] Call Trace:
[39389.800012]  [<ffffffff81091963>] do_raw_write_lock+0x72/0x8c
[39389.800012]  [<ffffffff81486f1b>] _raw_write_lock+0x3a/0x41
[39389.800012]  [<ffffffffa0672cb3>] ? btrfs_tree_lock+0x119/0x251 [btrfs]
[39389.800012]  [<ffffffffa0672cb3>] btrfs_tree_lock+0x119/0x251 [btrfs]
[39389.800012]  [<ffffffffa061aeba>] ? rcu_read_unlock+0x5b/0x5d [btrfs]
[39389.800012]  [<ffffffffa061ce13>] ? btrfs_root_node+0xda/0xe6 [btrfs]
[39389.800012]  [<ffffffffa061ce83>] btrfs_lock_root_node+0x22/0x42 [btrfs]
[39389.800012]  [<ffffffffa062046b>] btrfs_search_slot+0x1b8/0x758 [btrfs]
[39389.800012]  [<ffffffff810fc6b0>] ? time_hardirqs_on+0x15/0x28
[39389.800012]  [<ffffffffa06365db>] btrfs_lookup_inode+0x31/0x95 [btrfs]
[39389.800012]  [<ffffffff8108d62f>] ? trace_hardirqs_on+0xd/0xf
[39389.800012]  [<ffffffff8148482b>] ? mutex_lock_nested+0x397/0x3bc
[39389.800012]  [<ffffffffa068821b>] __btrfs_update_delayed_inode+0x59/0x1c0 [btrfs]
[39389.800012]  [<ffffffffa068858e>] __btrfs_commit_inode_delayed_items+0x194/0x5aa [btrfs]
[39389.800012]  [<ffffffff81486ab7>] ? _raw_spin_unlock+0x31/0x44
[39389.800012]  [<ffffffffa0688a48>] __btrfs_run_delayed_items+0xa4/0x15c [btrfs]
[39389.800012]  [<ffffffffa0688d62>] btrfs_run_delayed_items+0x11/0x13 [btrfs]
[39389.800012]  [<ffffffffa064048e>] btrfs_commit_transaction+0x234/0x96e [btrfs]
[39389.800012]  [<ffffffffa0618d10>] btrfs_sync_fs+0x145/0x1ad [btrfs]
[39389.800012]  [<ffffffffa0671176>] btrfs_ioctl+0x11d2/0x2793 [btrfs]
[39389.800012]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
[39389.800012]  [<ffffffff81140261>] ? __might_fault+0x4c/0xa7
[39389.800012]  [<ffffffff81140261>] ? __might_fault+0x4c/0xa7
[39389.800012]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
[39389.800012]  [<ffffffff8118b3d4>] ? rcu_read_unlock+0x3e/0x5d
[39389.800012]  [<ffffffff811822f8>] do_vfs_ioctl+0x42b/0x4ea
[39389.800012]  [<ffffffff8118b4f3>] ? __fget_light+0x62/0x71
[39389.800012]  [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
[39389.800012]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
[39389.800012] Code: f0 0f b1 13 85 c0 75 ef eb 2a f3 90 8a 03 84 c0 75 f8 f0 0f b0 13 84 c0 75 f0 ba ff 00 00 00 eb 0a f0 0f b1 13 ff c8 74 0b f3 90 <8b> 03 83 f8 01 75 f7 eb ed c6 43 04 00 5b 5d c3 0f 1f 44 00 00

This happens because in the code path executed by the inode_paths ioctl we
end up nesting two calls to read lock a leaf's rwlock when after the first
call to read_lock() and before the second call to read_lock(), another
task (running the delayed items as part of a transaction commit) has
already called write_lock() against the leaf's rwlock. This situation is
illustrated by the following diagram:

         Task A                       Task B

  btrfs_ref_to_path()               btrfs_commit_transaction()
    read_lock(&eb->lock);

                                      btrfs_run_delayed_items()
                                        __btrfs_commit_inode_delayed_items()
                                          __btrfs_update_delayed_inode()
                                            btrfs_lookup_inode()

                                              write_lock(&eb->lock);
                                                --> task waits for lock

    read_lock(&eb->lock);
    --> makes this task hang
        forever (and task B too
	of course)

So fix this by avoiding doing the nested read lock, which is easily
avoidable. This issue does not happen if task B calls write_lock() after
task A does the second call to read_lock(), however there does not seem
to exist anything in the documentation that mentions what is the expected
behaviour for recursive locking of rwlocks (leaving the idea that doing
so is not a good usage of rwlocks).

Also, as a side effect necessary for this fix, make sure we do not
needlessly read lock extent buffers when the input path has skip_locking
set (used when called from send).

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:15 -08:00
Filipe Manana
be1232bcea Btrfs: fix page reading in extent_same ioctl leading to csum errors
commit 313140023026ae542ad76e7e268c56a1eaa2c28e upstream.

In the extent_same ioctl, we were grabbing the pages (locked) and
attempting to read them without bothering about any concurrent IO
against them. That is, we were not checking for any ongoing ordered
extents nor waiting for them to complete, which leads to a race where
the extent_same() code gets a checksum verification error when it
reads the pages, producing a message like the following in dmesg
and making the operation fail to user space with -ENOMEM:

[18990.161265] BTRFS warning (device sdc): csum failed ino 259 off 495616 csum 685204116 expected csum 1515870868

Fix this by using btrfs_readpage() for reading the pages instead of
extent_read_full_page_nolock(), which waits for any concurrent ordered
extents to complete and locks the io range. Also do better error handling
and don't treat all failures as -ENOMEM, as that's clearly misleasing,
becoming identical to the checks and operation of prepare_uptodate_page().

The use of extent_read_full_page_nolock() was required before
commit f441460202 ("btrfs: fix deadlock with extent-same and readpage"),
as we had the range locked in an inode's io tree before attempting to
read the pages.

Fixes: f441460202 ("btrfs: fix deadlock with extent-same and readpage")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:15 -08:00
Filipe Manana
df567e6dcd Btrfs: fix invalid page accesses in extent_same (dedup) ioctl
commit e0bd70c67bf996b360f706b6c643000f2e384681 upstream.

In the extent_same ioctl we are getting the pages for the source and
target ranges and unlocking them immediately after, which is incorrect
because later we attempt to map them (with kmap_atomic) and access their
contents at btrfs_cmp_data(). When we do such access the pages might have
been relocated or removed from memory, which leads to an invalid memory
access. This issue is detected on a kernel with CONFIG_DEBUG_PAGEALLOC=y
which produces a trace like the following:

186736.677437] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[186736.680382] Modules linked in: btrfs dm_flakey dm_mod ppdev xor raid6_pq sha256_generic hmac drbg ansi_cprng acpi_cpufreq evdev sg aesni_intel aes_x86_64
parport_pc ablk_helper tpm_tis psmouse parport i2c_piix4 tpm cryptd i2c_core lrw processor button serio_raw pcspkr gf128mul glue_helper loop autofs4 ext4
crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last
unloaded: btrfs]
[186736.681319] CPU: 13 PID: 10222 Comm: duperemove Tainted: G        W       4.4.0-rc6-btrfs-next-18+ #1
[186736.681319] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[186736.681319] task: ffff880132600400 ti: ffff880362284000 task.ti: ffff880362284000
[186736.681319] RIP: 0010:[<ffffffff81264d00>]  [<ffffffff81264d00>] memcmp+0xb/0x22
[186736.681319] RSP: 0018:ffff880362287d70  EFLAGS: 00010287
[186736.681319] RAX: 000002c002468acf RBX: 0000000012345678 RCX: 0000000000000000
[186736.681319] RDX: 0000000000001000 RSI: 0005d129c5cf9000 RDI: 0005d129c5cf9000
[186736.681319] RBP: ffff880362287d70 R08: 0000000000000000 R09: 0000000000001000
[186736.681319] R10: ffff880000000000 R11: 0000000000000476 R12: 0000000000001000
[186736.681319] R13: ffff8802f91d4c88 R14: ffff8801f2a77830 R15: ffff880352e83e40
[186736.681319] FS:  00007f27b37fe700(0000) GS:ffff88043dda0000(0000) knlGS:0000000000000000
[186736.681319] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[186736.681319] CR2: 00007f27a406a000 CR3: 0000000217421000 CR4: 00000000001406e0
[186736.681319] Stack:
[186736.681319]  ffff880362287ea0 ffffffffa048d0bd 000000000009f000 0000000000001000
[186736.681319]  0100000000000000 ffff8801f2a77850 ffff8802f91d49b0 ffff880132600400
[186736.681319]  00000000000004f8 ffff8801c1efbe41 0000000000000000 0000000000000038
[186736.681319] Call Trace:
[186736.681319]  [<ffffffffa048d0bd>] btrfs_ioctl+0x24cb/0x2731 [btrfs]
[186736.681319]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
[186736.681319]  [<ffffffff8118b3d4>] ? rcu_read_unlock+0x3e/0x5d
[186736.681319]  [<ffffffff811822f8>] do_vfs_ioctl+0x42b/0x4ea
[186736.681319]  [<ffffffff8118b4f3>] ? __fget_light+0x62/0x71
[186736.681319]  [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
[186736.681319]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
[186736.681319] Code: 0a 3c 6e 74 0d 3c 79 74 04 3c 59 75 0c c6 06 01 eb 03 c6 06 00 31 c0 eb 05 b8 ea ff ff ff 5d c3 55 31 c9 48 89 e5 48 39 d1 74 13 <0f> b6
04 0f 44 0f b6 04 0e 48 ff c1 44 29 c0 74 ea eb 02 31 c0

(gdb) list *(btrfs_ioctl+0x24cb)
0x5e0e1 is in btrfs_ioctl (fs/btrfs/ioctl.c:2972).
2967                    dst_addr = kmap_atomic(dst_page);
2968
2969                    flush_dcache_page(src_page);
2970                    flush_dcache_page(dst_page);
2971
2972                    if (memcmp(addr, dst_addr, cmp_len))
2973                            ret = BTRFS_SAME_DATA_DIFFERS;
2974
2975                    kunmap_atomic(addr);
2976                    kunmap_atomic(dst_addr);

So fix this by making sure we keep the pages locked and respect the same
locking order as everywhere else: get and lock the pages first and then
lock the range in the inode's io tree (like for example at
__btrfs_buffered_write() and extent_readpages()). If an ordered extent
is found after locking the range in the io tree, unlock the range,
unlock the pages, wait for the ordered extent to complete and repeat the
entire locking process until no overlapping ordered extents are found.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:15 -08:00
David Sterba
b58081d430 btrfs: properly set the termination value of ctx->pos in readdir
commit bc4ef7592f657ae81b017207a1098817126ad4cb upstream.

The value of ctx->pos in the last readdir call is supposed to be set to
INT_MAX due to 32bit compatibility, unless 'pos' is intentially set to a
larger value, then it's LLONG_MAX.

There's a report from PaX SIZE_OVERFLOW plugin that "ctx->pos++"
overflows (https://forums.grsecurity.net/viewtopic.php?f=1&t=4284), on a
64bit arch, where the value is 0x7fffffffffffffff ie. LLONG_MAX before
the increment.

We can get to that situation like that:

* emit all regular readdir entries
* still in the same call to readdir, bump the last pos to INT_MAX
* next call to readdir will not emit any entries, but will reach the
  bump code again, finds pos to be INT_MAX and sets it to LLONG_MAX

Normally this is not a problem, but if we call readdir again, we'll find
'pos' set to LLONG_MAX and the unconditional increment will overflow.

The report from Victor at
(http://thread.gmane.org/gmane.comp.file-systems.btrfs/49500) with debugging
print shows that pattern:

 Overflow: e
 Overflow: 7fffffff
 Overflow: 7fffffffffffffff
 PAX: size overflow detected in function btrfs_real_readdir
   fs/btrfs/inode.c:5760 cicus.935_282 max, count: 9, decl: pos; num: 0;
   context: dir_context;
 CPU: 0 PID: 2630 Comm: polkitd Not tainted 4.2.3-grsec #1
 Hardware name: Gigabyte Technology Co., Ltd. H81ND2H/H81ND2H, BIOS F3 08/11/2015
  ffffffff81901608 0000000000000000 ffffffff819015e6 ffffc90004973d48
  ffffffff81742f0f 0000000000000007 ffffffff81901608 ffffc90004973d78
  ffffffff811cb706 0000000000000000 ffff8800d47359e0 ffffc90004973ed8
 Call Trace:
  [<ffffffff81742f0f>] dump_stack+0x4c/0x7f
  [<ffffffff811cb706>] report_size_overflow+0x36/0x40
  [<ffffffff812ef0bc>] btrfs_real_readdir+0x69c/0x6d0
  [<ffffffff811dafc8>] iterate_dir+0xa8/0x150
  [<ffffffff811e6d8d>] ? __fget_light+0x2d/0x70
  [<ffffffff811dba3a>] SyS_getdents+0xba/0x1c0
 Overflow: 1a
  [<ffffffff811db070>] ? iterate_dir+0x150/0x150
  [<ffffffff81749b69>] entry_SYSCALL_64_fastpath+0x12/0x83

The jump from 7fffffff to 7fffffffffffffff happens when new dir entries
are not yet synced and are processed from the delayed list. Then the code
could go to the bump section again even though it might not emit any new
dir entries from the delayed list.

The fix avoids entering the "bump" section again once we've finished
emitting the entries, both for synced and delayed entries.

References: https://forums.grsecurity.net/viewtopic.php?f=1&t=4284
Reported-by: Victor <services@swwu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Tested-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:15 -08:00
David Sterba
dfd2961ab6 Revert "btrfs: clear PF_NOFREEZE in cleaner_kthread()"
commit 80ad623edd2d0ccb47d85357ee31c97e6c684e82 upstream.

This reverts commit 6962491321. The
cleaner thread can block freezing when there's a snapshot cleaning in
progress and the other threads get suspended first. From the logs
provided by Martin we're waiting for reading extent pages:

kernel: PM: Syncing filesystems ... done.
kernel: Freezing user space processes ... (elapsed 0.015 seconds) done.
kernel: Freezing remaining freezable tasks ...
kernel: Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
kernel: btrfs-cleaner   D ffff88033dd13bc0     0   152      2 0x00000000
kernel: ffff88032ebc2e00 ffff88032e750000 ffff88032e74fa50 7fffffffffffffff
kernel: ffffffff814a58df 0000000000000002 ffffea000934d580 ffffffff814a5451
kernel: 7fffffffffffffff ffffffff814a6e8f 0000000000000000 0000000000000020
kernel: Call Trace:
kernel: [<ffffffff814a58df>] ? bit_wait+0x2c/0x2c
kernel: [<ffffffff814a5451>] ? schedule+0x6f/0x7c
kernel: [<ffffffff814a6e8f>] ? schedule_timeout+0x2f/0xd8
kernel: [<ffffffff81076f94>] ? timekeeping_get_ns+0xa/0x2e
kernel: [<ffffffff81077603>] ? ktime_get+0x36/0x44
kernel: [<ffffffff814a4f6c>] ? io_schedule_timeout+0x94/0xf2
kernel: [<ffffffff814a4f6c>] ? io_schedule_timeout+0x94/0xf2
kernel: [<ffffffff814a590b>] ? bit_wait_io+0x2c/0x30
kernel: [<ffffffff814a5694>] ? __wait_on_bit+0x41/0x73
kernel: [<ffffffff8109eba8>] ? wait_on_page_bit+0x6d/0x72
kernel: [<ffffffff8105d718>] ? autoremove_wake_function+0x2a/0x2a
kernel: [<ffffffff811a02d7>] ? read_extent_buffer_pages+0x1bd/0x203
kernel: [<ffffffff8117d9e9>] ? free_root_pointers+0x4c/0x4c
kernel: [<ffffffff8117e831>] ? btree_read_extent_buffer_pages.constprop.57+0x5a/0xe9
kernel: [<ffffffff8117f4f3>] ? read_tree_block+0x2d/0x45
kernel: [<ffffffff8116782a>] ? read_block_for_search.isra.34+0x22a/0x26b
kernel: [<ffffffff811656c3>] ? btrfs_set_path_blocking+0x1e/0x4a
kernel: [<ffffffff8116919b>] ? btrfs_search_slot+0x648/0x736
kernel: [<ffffffff81170559>] ? btrfs_lookup_extent_info+0xb7/0x2c7
kernel: [<ffffffff81170ee5>] ? walk_down_proc+0x9c/0x1ae
kernel: [<ffffffff81171c9d>] ? walk_down_tree+0x40/0xa4
kernel: [<ffffffff8117375f>] ? btrfs_drop_snapshot+0x2da/0x664
kernel: [<ffffffff8104ff21>] ? finish_task_switch+0x126/0x167
kernel: [<ffffffff811850f8>] ? btrfs_clean_one_deleted_snapshot+0xa6/0xb0
kernel: [<ffffffff8117eaba>] ? cleaner_kthread+0x13e/0x17b
kernel: [<ffffffff8117e97c>] ? btrfs_item_end+0x33/0x33
kernel: [<ffffffff8104d256>] ? kthread+0x95/0x9d
kernel: [<ffffffff8104d1c1>] ? kthread_parkme+0x16/0x16
kernel: [<ffffffff814a7b5f>] ? ret_from_fork+0x3f/0x70
kernel: [<ffffffff8104d1c1>] ? kthread_parkme+0x16/0x16

As this affects a released kernel (4.4) we need a minimal fix for
stable kernels.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=108361
Reported-by: Martin Ziegler <ziegler@uni-freiburg.de>
CC: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:15 -08:00