Provide userspace interface for tasks to be grouped together as
"related" threads. For example, all threads involved in updating
display buffer could be tagged as related.
Scheduler will attempt to provide special treatment for group of
related threads such as:
1) Colocation of related threads in same "preferred" cluster
2) Aggregation of demand towards determination of cluster frequency
This patch extends scheduler to provide best-effort colocation support
for a group of related threads.
Change-Id: Ic2cd769faf5da4d03a8f3cb0ada6224d0101a5f5
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor merge conflicts. removed ifdefry
for CONFIG_SCHED_QHMP.]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
While compiling for usermode linux for x86 architecture, observed
compilation issues with probable usage of uninitialized variables.
This change initializes the variables.
Signed-off-by: Jeevan Shriram <jshriram@codeaurora.org>
Fixes implementation of cache cleaning, it was implemented using outdated
code which cleaned valid pages as well by mistake. The fix uses the
original implementation from mm layer which was tweaked to zero pages
after releasing them. Also it is now invoked only when there is HW
encryption (meaning that cache is unencrypted)
Change-Id: If713fd17a89be6d998c1a7ba86bee4c17d2bca0f
Signed-off-by: Andrey Markovytch <andreym@codeaurora.org>
Conflicts:
fs/ecryptfs/caches_utils.c
Exposes drop_pagecache_sb (required by eCryptfs cache wiping)
Adds truncate_inode_pages_fill_zero (required by eCryptfs cache wiping),
which not only truncates pages but also fills them with 0, so that the
cached data can no longer be retrieved.
Change-Id: Icfc18a2c8cdc922e71ee17add6459a1355e77ba6
Signed-off-by: Andrey Markovytch <andreym@codeaurora.org>
[gbroner@codeaurora.org: fix merge conflict]
Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
1. Fixed bug which didn't allow several threads to work simultaneously
on files in eCryptfs mounted folder
2. Fixed bug where PFK close callback was invoked multiple times when
files was opened and closed multiple times. Now it is invoked just once
when files is closed for the last time
Change-Id: Iaa3ada03500e5a12752918b5d2bb4a852ddca5f0
Signed-off-by: Andrey Markovytch <andreym@codeaurora.org>
When key is set in ICE via TZ, HLOS should send two parts, SALT and
the KEY itself according to AES standards. KEY was used for both parts.
Change-Id: I453dea289b01bdf49352d5209255966052f5dc1b
Signed-off-by: Andrey Markovytch <andreym@codeaurora.org>
enabled eCryptfs for qcom targets. In addition to the usual
options, a special mode 'aes-xts' was added for qcom ICE hw
encryption
Change-Id: I20c01adc46c977b4a5db0be9ff93384cda14bc56
Signed-off-by: Lina Zarivach <linaz@codeaurora.org>
Signed-off-by: Andrey Markovytch <andreym@codeaurora.org>
[gbroner@codeaurora.org: fix merge conflict]
Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
Integrated from msm-3.14. Additional fixes were made to compile with the
new kernel and various new warnings and checkpatch issues were fixed
Change-Id: I073db1041e41eac9066e37ee099f1da9e4eed6c0
Signed-off-by: Andrey Markovytch <andreym@codeaurora.org>
[gbroner@codeaurora.org: fixed merge conflict and adapted the LSM
security hooks]
Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
Add a comment in key.h to explain why we keep an unused
parameter in key helpers.
CRs-Fixed: 975289
Change-Id: I88633f34f7f49b5b895f2295c1ce4ad250900fc0
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Git-commit: 170eb55f7d4ba9564736ba298a7d4985422db4cc
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Nikhilesh Reddy <reddyn@codeaurora.org>
* lsk-44/linux-linaro-lsk-v4.4:
Linux 4.4.3
modules: fix modparam async_probe request
module: wrapper for symbol name.
itimers: Handle relative timers with CONFIG_TIME_LOW_RES proper
posix-timers: Handle relative timers with CONFIG_TIME_LOW_RES proper
timerfd: Handle relative timers with CONFIG_TIME_LOW_RES proper
prctl: take mmap sem for writing to protect against others
xfs: log mount failures don't wait for buffers to be released
Revert "xfs: clear PF_NOFREEZE for xfsaild kthread"
xfs: inode recovery readahead can race with inode buffer creation
libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct
ovl: setattr: check permissions before copy-up
ovl: root: copy attr
ovl: check dentry positiveness in ovl_cleanup_whiteouts()
ovl: use a minimal buffer in ovl_copy_xattr
ovl: allow zero size xattr
futex: Drop refcount if requeue_pi() acquired the rtmutex
devm_memremap_release(): fix memremap'd addr handling
ipc/shm: handle removed segments gracefully in shm_mmap()
intel_scu_ipcutil: underflow in scu_reg_access()
mm,thp: khugepaged: call pte flush at the time of collapse
dump_stack: avoid potential deadlocks
radix-tree: fix oops after radix_tree_iter_retry
drivers/hwspinlock: fix race between radix tree insertion and lookup
radix-tree: fix race in gang lookup
MAINTAINERS: return arch/sh to maintained state, with new maintainers
memcg: only free spare array when readers are done
numa: fix /proc/<pid>/numa_maps for hugetlbfs on s390
fs/hugetlbfs/inode.c: fix bugs in hugetlb_vmtruncate_list()
scripts/bloat-o-meter: fix python3 syntax error
dma-debug: switch check from _text to _stext
m32r: fix m32104ut_defconfig build fail
xhci: Fix list corruption in urb dequeue at host removal
Revert "xhci: don't finish a TD if we get a short-transfer event mid TD"
iommu/vt-d: Clear PPR bit to ensure we get more page request interrupts
iommu/vt-d: Fix 64-bit accesses to 32-bit DMAR_GSTS_REG
iommu/vt-d: Fix mm refcounting to hold mm_count not mm_users
iommu/amd: Correct the wrong setting of alias DTE in do_attach
iommu/vt-d: Don't skip PCI devices when disabling IOTLB
Input: vmmouse - fix absolute device registration
string_helpers: fix precision loss for some inputs
Input: i8042 - add Fujitsu Lifebook U745 to the nomux list
Input: elantech - mark protocols v2 and v3 as semi-mt
mm: fix regression in remap_file_pages() emulation
mm: replace vma_lock_anon_vma with anon_vma_lock_read/write
mm: fix mlock accouting
libnvdimm: fix namespace object confusion in is_uuid_busy()
mm: soft-offline: check return value in second __get_any_page() call
perf kvm record/report: 'unprocessable sample' error while recording/reporting guest data
KVM: PPC: Fix ONE_REG AltiVec support
KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8
KVM: arm/arm64: Fix reference to uninitialised VGIC
arm64: dma-mapping: fix handling of devices registered before arch_initcall
ARM: OMAP2+: Fix ppa_zero_params and ppa_por_params for rodata
ARM: OMAP2+: Fix save_secure_ram_context for rodata
ARM: OMAP2+: Fix l2dis_3630 for rodata
ARM: OMAP2+: Fix l2_inv_api_params for rodata
ARM: OMAP2+: Fix wait_dll_lock_timed for rodata
ARM: dts: at91: sama5d4ek: add phy address and IRQ for macb0
ARM: dts: at91: sama5d4 xplained: fix phy0 IRQ type
ARM: dts: at91: sama5d4: fix instance id of DBGU
ARM: dts: at91: sama5d4 xplained: properly mux phy interrupt
ARM: dts: omap5-board-common: enable rtc and charging of backup battery
ARM: dts: Fix omap5 PMIC control lines for RTC writes
ARM: dts: Fix wl12xx missing clocks that cause hangs
ARM: nomadik: fix up SD/MMC DT settings
ARM: 8517/1: ICST: avoid arithmetic overflow in icst_hz()
ARM: 8519/1: ICST: try other dividends than 1
arm64: mm: avoid calling apply_to_page_range on empty range
ARM: mvebu: remove duplicated regulator definition in Armada 388 GP
powerpc/ioda: Set "read" permission when "write" is set
powerpc/powernv: Fix stale PE primary bus
powerpc/eeh: Fix stale cached primary bus
powerpc/eeh: Fix PE location code
SUNRPC: Fixup socket wait for memory
udf: Check output buffer length when converting name to CS0
udf: Prevent buffer overrun with multi-byte characters
udf: limit the maximum number of indirect extents in a row
pNFS/flexfiles: Fix an XDR encoding bug in layoutreturn
nfs: Fix race in __update_open_stateid()
pNFS/flexfiles: Fix an Oopsable typo in ff_mirror_match_fh()
NFS: Fix attribute cache revalidation
cifs: fix erroneous return value
cifs_dbg() outputs an uninitialized buffer in cifs_readdir()
cifs: fix race between call_async() and reconnect()
cifs: Ratelimit kernel log messages
iio: inkern: fix a NULL dereference on error
iio: pressure: mpl115: fix temperature offset sign
iio: light: acpi-als: Report data as processed
iio: dac: mcp4725: set iio name property in sysfs
iio: add IIO_TRIGGER dependency to STK8BA50
iio: add HAS_IOMEM dependency to VF610_ADC
iio-light: Use a signed return type for ltr501_match_samp_freq()
iio:adc:ti_am335x_adc Fix buffered mode by identifying as software buffer.
iio: adis_buffer: Fix out-of-bounds memory access
scsi: fix soft lockup in scsi_remove_target() on module removal
SCSI: Add Marvell Console to VPD blacklist
scsi_dh_rdac: always retry MODE SELECT on command lock violation
drivers/scsi/sg.c: mark VMA as VM_IO to prevent migration
SCSI: fix crashes in sd and sr runtime PM
iscsi-target: Fix potential dead-lock during node acl delete
scsi: add Synology to 1024 sector blacklist
klist: fix starting point removed bug in klist iterators
tracepoints: Do not trace when cpu is offline
tracing: Fix freak link error caused by branch tracer
perf tools: tracepoint_error() can receive e=NULL, robustify it
tools lib traceevent: Fix output of %llu for 64 bit values read on 32 bit machines
ptrace: use fsuid, fsgid, effective creds for fs access checks
Btrfs: fix direct IO requests not reporting IO error to user space
Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl
Btrfs: fix page reading in extent_same ioctl leading to csum errors
Btrfs: fix invalid page accesses in extent_same (dedup) ioctl
btrfs: properly set the termination value of ctx->pos in readdir
Revert "btrfs: clear PF_NOFREEZE in cleaner_kthread()"
Btrfs: fix fitrim discarding device area reserved for boot loader's use
btrfs: handle invalid num_stripes in sys_array
ext4: don't read blocks from disk after extents being swapped
ext4: fix potential integer overflow
ext4: fix scheduling in atomic on group checksum failure
serial: omap: Prevent DoS using unprivileged ioctl(TIOCSRS485)
serial: 8250_pci: Add Intel Broadwell ports
tty: Add support for PCIe WCH382 2S multi-IO card
pty: make sure super_block is still valid in final /dev/tty close
pty: fix possible use after free of tty->driver_data
staging/speakup: Use tty_ldisc_ref() for paste kworker
phy: twl4030-usb: Fix unbalanced pm_runtime_enable on module reload
phy: twl4030-usb: Relase usb phy on unload
ALSA: seq: Fix double port list deletion
ALSA: seq: Fix leak of pool buffer at concurrent writes
ALSA: pcm: Fix rwsem deadlock for non-atomic PCM stream
ALSA: hda - Cancel probe work instead of flush at remove
x86/mm: Fix vmalloc_fault() to handle large pages properly
x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache()
x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable
x86/mm/pat: Avoid truncation when converting cpa->numpages to address
x86/mm: Fix types used in pgprot cacheability flags translations
Linux 4.4.2
HID: multitouch: fix input mode switching on some Elan panels
mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress
zsmalloc: fix migrate_zspage-zs_free race condition
zram: don't call idr_remove() from zram_remove()
zram: try vmalloc() after kmalloc()
zram/zcomp: use GFP_NOIO to allocate streams
rtlwifi: rtl8821ae: Fix 5G failure when EEPROM is incorrectly encoded
rtlwifi: rtl8821ae: Fix errors in parameter initialization
crypto: marvell/cesa - fix test in mv_cesa_dev_dma_init()
crypto: atmel-sha - remove calls of clk_prepare() from atomic contexts
crypto: atmel-sha - fix atmel_sha_remove()
crypto: algif_skcipher - Do not set MAY_BACKLOG on the async path
crypto: algif_skcipher - Do not dereference ctx without socket lock
crypto: algif_skcipher - Do not assume that req is unchanged
crypto: user - lock crypto_alg_list on alg dump
EVM: Use crypto_memneq() for digest comparisons
crypto: algif_hash - wait for crypto_ahash_init() to complete
crypto: shash - Fix has_key setting
crypto: chacha20-ssse3 - Align stack pointer to 64 bytes
crypto: caam - make write transactions bufferable on PPC platforms
crypto: algif_skcipher - sendmsg SG marking is off by one
crypto: algif_skcipher - Load TX SG list after waiting
crypto: crc32c - Fix crc32c soft dependency
crypto: algif_skcipher - Fix race condition in skcipher_check_key
crypto: algif_hash - Fix race condition in hash_check_key
crypto: af_alg - Forbid bind(2) when nokey child sockets are present
crypto: algif_skcipher - Remove custom release parent function
crypto: algif_hash - Remove custom release parent function
crypto: af_alg - Allow af_af_alg_release_parent to be called on nokey path
ahci: Intel DNV device IDs SATA
libata: disable forced PORTS_IMPL for >= AHCI 1.3
crypto: algif_skcipher - Add key check exception for cipher_null
crypto: skcipher - Add crypto_skcipher_has_setkey
crypto: algif_hash - Require setkey before accept(2)
crypto: hash - Add crypto_ahash_has_setkey
crypto: algif_skcipher - Add nokey compatibility path
crypto: af_alg - Add nokey compatibility path
crypto: af_alg - Fix socket double-free when accept fails
crypto: af_alg - Disallow bind/setkey/... after accept(2)
crypto: algif_skcipher - Require setkey before accept(2)
sched: Fix crash in sched_init_numa()
ext4 crypto: add missing locking for keyring_key access
iommu/io-pgtable-arm: Ensure we free the final level on teardown
tty: Fix unsafe ldisc reference via ioctl(TIOCGETD)
tty: Retry failed reopen if tty teardown in-progress
tty: Wait interruptibly for tty lock on reopen
n_tty: Fix unsafe reference to "other" ldisc
usb: xhci: apply XHCI_PME_STUCK_QUIRK to Intel Broxton-M platforms
usb: xhci: handle both SSIC ports in PME stuck quirk
usb: phy: msm: fix error handling in probe.
usb: cdc-acm: send zero packet for intel 7260 modem
usb: cdc-acm: handle unlinked urb in acm read callback
USB: option: fix Cinterion AHxx enumeration
USB: serial: option: Adding support for Telit LE922
USB: cp210x: add ID for IAI USB to RS485 adaptor
USB: serial: ftdi_sio: add support for Yaesu SCU-18 cable
usb: hub: do not clear BOS field during reset device
USB: visor: fix null-deref at probe
USB: serial: visor: fix crash on detecting device without write_urbs
ASoC: rt5645: fix the shift bit of IN1 boost
saa7134-alsa: Only frees registered sound cards
ALSA: dummy: Implement timer backend switching more safely
ALSA: hda - Fix bad dereference of jack object
ALSA: hda - Fix speaker output from VAIO AiO machines
Revert "ALSA: hda - Fix noise on Gigabyte Z170X mobo"
ALSA: hda - Fix static checker warning in patch_hdmi.c
ALSA: hda - Add fixup for Mac Mini 7,1 model
ALSA: timer: Fix race between stop and interrupt
ALSA: timer: Fix wrong instance passed to slave callbacks
ALSA: timer: Fix race at concurrent reads
ALSA: timer: Fix link corruption due to double start or stop
ALSA: timer: Fix leftover link at closing
ALSA: timer: Code cleanup
ALSA: seq: Fix lockdep warnings due to double mutex locks
ALSA: seq: Fix race at closing in virmidi driver
ALSA: seq: Fix yet another races among ALSA timer accesses
ASoC: dpcm: fix the BE state on hw_free
ALSA: pcm: Fix potential deadlock in OSS emulation
ALSA: hda/realtek - Support Dell headset mode for ALC225
ALSA: hda/realtek - Support headset mode for ALC225
ALSA: hda/realtek - New codec support of ALC225
ALSA: rawmidi: Fix race at copying & updating the position
ALSA: rawmidi: Remove kernel WARNING for NULL user-space buffer check
ALSA: rawmidi: Make snd_rawmidi_transmit() race-free
ALSA: seq: Degrade the error message for too many opens
ALSA: seq: Fix incorrect sanity check at snd_seq_oss_synth_cleanup()
ALSA: dummy: Disable switching timer backend via sysfs
ALSA: compress: Disable GET_CODEC_CAPS ioctl for some architectures
ALSA: hda - disable dynamic clock gating on Broxton before reset
ALSA: Add missing dependency on CONFIG_SND_TIMER
ALSA: bebob: Use a signed return type for get_formation_index
ALSA: usb-audio: avoid freeing umidi object twice
ALSA: usb-audio: Add native DSD support for PS Audio NuWave DAC
ALSA: usb-audio: Fix OPPO HA-1 vendor ID
ALSA: usb-audio: Add quirk for Microsoft LifeCam HD-6000
ALSA: usb-audio: Fix TEAC UD-501/UD-503/NT-503 usb delay
hrtimer: Handle remaining time proper for TIME_LOW_RES
md/raid: only permit hot-add of compatible integrity profiles
media: i2c: Don't export ir-kbd-i2c module alias
parisc: Fix __ARCH_SI_PREAMBLE_SIZE
parisc: Protect huge page pte changes with spinlocks
printk: do cond_resched() between lines while outputting to consoles
tracing/stacktrace: Show entire trace if passed in function not found
tracing: Fix stacktrace skip depth in trace_buffer_unlock_commit_regs()
PCI: Fix minimum allocation address overwrite
PCI: host: Mark PCIe/PCI (MSI) IRQ cascade handlers as IRQF_NO_THREAD
mtd: nand: assign reasonable default name for NAND drivers
wlcore/wl12xx: spi: fix NULL pointer dereference (Oops)
wlcore/wl12xx: spi: fix oops on firmware load
ocfs2/dlm: clear refmap bit of recovery lock while doing local recovery cleanup
ocfs2/dlm: ignore cleaning the migration mle that is inuse
ALSA: hda - Implement loopback control switch for Realtek and other codecs
block: fix bio splitting on max sectors
base/platform: Fix platform drivers with no probe callback
HID: usbhid: fix recursive deadlock
ocfs2: NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lock
block: split bios to max possible length
NFSv4.1/pnfs: Fixup an lo->plh_block_lgets imbalance in layoutreturn
crypto: sun4i-ss - add missing statesize
Linux 4.4.1
arm64: kernel: fix architected PMU registers unconditional access
arm64: kernel: enforce pmuserenr_el0 initialization and restore
arm64: mm: ensure that the zero page is visible to the page table walker
arm64: Clear out any singlestep state on a ptrace detach operation
powerpc/module: Handle R_PPC64_ENTRY relocations
scripts/recordmcount.pl: support data in text section on powerpc
powerpc: Make {cmp}xchg* and their atomic_ versions fully ordered
powerpc: Make value-returning atomics fully ordered
powerpc/tm: Check for already reclaimed tasks
batman-adv: Drop immediate orig_node free function
batman-adv: Drop immediate batadv_hard_iface free function
batman-adv: Drop immediate neigh_ifinfo free function
batman-adv: Drop immediate batadv_neigh_node free function
batman-adv: Drop immediate batadv_orig_ifinfo free function
batman-adv: Avoid recursive call_rcu for batadv_nc_node
batman-adv: Avoid recursive call_rcu for batadv_bla_claim
team: Replace rcu_read_lock with a mutex in team_vlan_rx_kill_vid
net/mlx5_core: Fix trimming down IRQ number
bridge: fix lockdep addr_list_lock false positive splat
ipv6: update skb->csum when CE mark is propagated
net: bpf: reject invalid shifts
phonet: properly unshare skbs in phonet_rcv()
dwc_eth_qos: Fix dma address for multi-fragment skbs
bonding: Prevent IPv6 link local address on enslaved devices
net: preserve IP control block during GSO segmentation
udp: disallow UFO for sockets with SO_NO_CHECK option
net: pktgen: fix null ptr deref in skb allocation
sched,cls_flower: set key address type when present
tcp_yeah: don't set ssthresh below 2
ipv6: tcp: add rcu locking in tcp_v6_send_synack()
net: sctp: prevent writes to cookie_hmac_alg from accessing invalid memory
vxlan: fix test which detect duplicate vxlan iface
unix: properly account for FDs passed over unix sockets
xhci: refuse loading if nousb is used
usb: core: lpm: fix usb3_hardware_lpm sysfs node
USB: cp210x: add ID for ELV Marble Sound Board 1
rtlwifi: fix memory leak for USB device
ASoC: compress: Fix compress device direction check
ASoC: wm5110: Fix PGA clear when disabling DRE
ALSA: timer: Handle disconnection more safely
ALSA: hda - Flush the pending probe work at remove
ALSA: hda - Fix missing module loading with model=generic option
ALSA: hda - Fix bass pin fixup for ASUS N550JX
ALSA: control: Avoid kernel warnings from tlv ioctl with numid 0
ALSA: hrtimer: Fix stall by hrtimer_cancel()
ALSA: pcm: Fix snd_pcm_hw_params struct copy in compat mode
ALSA: seq: Fix snd_seq_call_port_info_ioctl in compat mode
ALSA: hda - Add fixup for Dell Latitidue E6540
ALSA: timer: Fix double unlink of active_list
ALSA: timer: Fix race among timer ioctls
ALSA: hda - fix the headset mic detection problem for a Dell laptop
ALSA: timer: Harden slave timer list handling
ALSA: usb-audio: Fix mixer ctl regression of Native Instrument devices
ALSA: hda - Fix white noise on Dell Latitude E5550
ALSA: seq: Fix race at timer setup and close
ALSA: usb-audio: Avoid calling usb_autopm_put_interface() at disconnect
ALSA: seq: Fix missing NULL check at remove_events ioctl
ALSA: hda - Fixup inverted internal mic for Lenovo E50-80
ALSA: usb: Add native DSD support for Oppo HA-1
x86/mm: Improve switch_mm() barrier comments
x86/mm: Add barriers and document switch_mm()-vs-flush synchronization
x86/boot: Double BOOT_HEAP_SIZE to 64KB
x86/reboot/quirks: Add iMac10,1 to pci_reboot_dmi_table[]
kvm: x86: Fix vmwrite to SECONDARY_VM_EXEC_CONTROL
KVM: x86: correctly print #AC in traces
KVM: x86: expose MSR_TSC_AUX to userspace
x86/xen: don't reset vcpu_info on a cancelled suspend
KEYS: Fix keyring ref leak in join_session_keyring()
Conflicts:
arch/arm64/kernel/perf_event.c
drivers/scsi/sd.c
sound/core/compress_offload.c
Change-Id: I9f77fe42aaae249c24cd6e170202110ab1426878
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
sched_prefer_idle flag controls whether tasks can be woken to any
available idle cpu. It may be desirable to set sched_prefer_idle to 0
so that most tasks wake up to non-idle cpus under mostly_idle
threshold and have specialized tasks override this behavior through
other means. PF_WAKE_UP_IDLE flag per task provides exactly that. It
lets tasks with PF_WAKE_UP_IDLE flag set be woken up to any available
idle cpu independent of sched_prefer_idle flag setting. Currently
only kernel-space API exists to set PF_WAKE_UP_IDLE flag for a task.
This patch adds a user-space API (in /proc filesystem) to set
PF_WAKE_UP_IDLE flag for a given task. /proc/[pid]/sched_wake_up_idle
file can be written to set or clear PF_WAKE_UP_IDLE flag for a given
task.
Change-Id: I13a37e740195e503f457ebe291d54e83b230fbeb
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in kernel/sched/fair.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Add a per-task attribute, init_load_pct, that is used to initialize
newly created children's initial task load. This helps important
applications launch their child tasks on cpus with highest capacity.
Change-Id: Ie9665fd2aeb15203f95fd7f211c50bebbaa18727
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict int init_new_task_load.
se.avg.runnable_avg_sum has deprecated.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
update seemp_api header file
update seemp_parm_id header file
update seemp_core to log data with blk header set to 64B
remove logging from fs proc base and net socket modules
Change-Id: I583e3129d62651b155b0372e173564d5a17e3153
Signed-off-by: Mona Hossain <mhossain@codeaurora.org>
Improves the ability of a malware protection program
to detect anomalies in various activities. It records
task activities in a log and rates the actions
according to how a typical user would use the tools.
Change-Id: I976bc97f57215f173b046326b5f905522d785288
Signed-off-by: Mona Hossain <mhossain@codeaurora.org>
Signed-off-by: William Clark <wclark@codeaurora.org>
Add support for filesystem passthrough read/write of files
when enabled in userspace through the option FUSE_PASSTHROUGH.
There are many FUSE based filesystems that perform checks or
enforce policy or perform some kind of decision making in certain
functions like the "open" call but simply act as a "passthrough"
when performing operations such as read or write.
When FUSE_PASSTHROUGH is enabled all the reads and writes
to the fuse mount point go directly to the passthrough filesystem
i.e a native filesystem that actually hosts the files rather than
through the fuse daemon. All requests that aren't read/write still
go thought the userspace code.
This allows for significantly better performance on read and writes.
The difference in performance between fuse and the native lower
filesystem is negligible.
There is also a significant cpu/power savings that is achieved which
is really important on embedded systems that use fuse for I/O.
Changelog:
v5:
Fix the check when setting the passthrough file
[Found when testing by Mike Shal]
v3 and v4:
Use the fs_stack_depth to prevent further stacking and a minor fix
[Fix suggested by Jann Horn]
v2:
Changed the feature name to passthrough from stacked_io
[Proposed by Linus Torvalds]
Signed-off-by: Nikhilesh Reddy <reddyn@codeaurora.org>
Android does not support powering-up the phone through alarm.
Set rtc alarm in timerfd to power-up the phone after alarm
expiration.
Change-Id: I781389c658fb00ba7f0ce089d706c10f202a7dc6
Signed-off-by: Mao Jinlong <c_jmao@codeaurora.org>
Adding a new element "tsk_dirty" to struct page increases the size
of mem_map/vmemmap, restrict this to a debug only functionality to
save few MB of memory.
Considering a system with 1G of RAM, there will be nearly 262144
pages and thus that many number of page structures in mem_map/vmemmap.
With pointer size of 8 bytes on a 64 bit system, adding this
pointer to "struct page" means an increase of "2MB" for mem_map.
CRs-Fixed: 738692
Change-Id: Idf3217dcbe17cf1ab4d462d2aa8d39da1ffd8b13
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
[venkatg@codeaurora.org: Fixed trivial merge conflict]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
Background writes happen in the context of a background thread.
It is very useful to identify the actual task that generated the
request instead of background task that submited the request.
Hence keep track of the task when a page gets dirtied and dump
this task info while tracing. Not all the pages in the bio are
dirtied by the same task but most likely it will be, since the
sectors accessed on the device must be adjacent.
Change-Id: I6afba85a2063dd3350a0141ba87cf8440ce9f777
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
[venkatg@codeaurora.org: Fixed trivial merge conflicts]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
commit b62526ed11a1fe3861ab98d40b7fdab8981d788a upstream.
Helge reported that a relative timer can return a remaining time larger than
the programmed relative time on parisc and other architectures which have
CONFIG_TIME_LOW_RES set. This happens because we add a jiffie to the resulting
expiry time to prevent short timeouts.
Use the new function hrtimer_expires_remaining_adjusted() to calculate the
remaining time. It takes that extra added time into account for relative
timers.
Reported-and-tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: linux-m68k@lists.linux-m68k.org
Cc: dhowells@redhat.com
Link: http://lkml.kernel.org/r/20160114164159.354500742@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 85bec5460ad8e05e0a8d70fb0f6750eb719ad092 upstream.
Recently I've been seeing xfs/051 fail on 1k block size filesystems.
Trying to trace the events during the test lead to the problem going
away, indicating that it was a race condition that lead to this
ASSERT failure:
XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 156
.....
[<ffffffff814e1257>] xfs_free_perag+0x87/0xb0
[<ffffffff814e21b9>] xfs_mountfs+0x4d9/0x900
[<ffffffff814e5dff>] xfs_fs_fill_super+0x3bf/0x4d0
[<ffffffff811d8800>] mount_bdev+0x180/0x1b0
[<ffffffff814e3ff5>] xfs_fs_mount+0x15/0x20
[<ffffffff811d90a8>] mount_fs+0x38/0x170
[<ffffffff811f4347>] vfs_kern_mount+0x67/0x120
[<ffffffff811f7018>] do_mount+0x218/0xd60
[<ffffffff811f7e5b>] SyS_mount+0x8b/0xd0
When I finally caught it with tracing enabled, I saw that AG 2 had
an elevated reference count and a buffer was responsible for it. I
tracked down the specific buffer, and found that it was missing the
final reference count release that would put it back on the LRU and
hence be found by xfs_wait_buftarg() calls in the log mount failure
handling.
The last four traces for the buffer before the assert were (trimmed
for relevance)
kworker/0:1-5259 xfs_buf_iodone: hold 2 lock 0 flags ASYNC
kworker/0:1-5259 xfs_buf_ioerror: hold 2 lock 0 error -5
mount-7163 xfs_buf_lock_done: hold 2 lock 0 flags ASYNC
mount-7163 xfs_buf_unlock: hold 2 lock 1 flags ASYNC
This is an async write that is completing, so there's nobody waiting
for it directly. Hence we call xfs_buf_relse() once all the
processing is complete. That does:
static inline void xfs_buf_relse(xfs_buf_t *bp)
{
xfs_buf_unlock(bp);
xfs_buf_rele(bp);
}
Now, it's clear that mount is waiting on the buffer lock, and that
it has been released by xfs_buf_relse() and gained by mount. This is
expected, because at this point the mount process is in
xfs_buf_delwri_submit() waiting for all the IO it submitted to
complete.
The mount process, however, is waiting on the lock for the buffer
because it is in xfs_buf_delwri_submit(). This waits for IO
completion, but it doesn't wait for the buffer reference owned by
the IO to go away. The mount process collects all the completions,
fails the log recovery, and the higher level code then calls
xfs_wait_buftarg() to free all the remaining buffers in the
filesystem.
The issue is that on unlocking the buffer, the scheduler has decided
that the mount process has higher priority than the the kworker
thread that is running the IO completion, and so immediately
switched contexts to the mount process from the semaphore unlock
code, hence preventing the kworker thread from finishing the IO
completion and releasing the IO reference to the buffer.
Hence by the time that xfs_wait_buftarg() is run, the buffer still
has an active reference and so isn't on the LRU list that the
function walks to free the remaining buffers. Hence we miss that
buffer and continue onwards to tear down the mount structures,
at which time we get find a stray reference count on the perag
structure. On a non-debug kernel, this will be ignored and the
structure torn down and freed. Hence when the kworker thread is then
rescheduled and the buffer released and freed, it will access a
freed perag structure.
The problem here is that when the log mount fails, we still need to
quiesce the log to ensure that the IO workqueues have returned to
idle before we run xfs_wait_buftarg(). By synchronising the
workqueues, we ensure that all IO completions are fully processed,
not just to the point where buffers have been unlocked. This ensures
we don't end up in the situation above.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3e85286e75224fa3f08bdad20e78c8327742634e upstream.
This reverts commit 24ba16bb3d as it
prevents machines from suspending. This regression occurs when the
xfsaild is idle on entry to suspend, and so there s no activity to
wake it from it's idle sleep and hence see that it is supposed to
freeze. Hence the freezer times out waiting for it and suspend is
cancelled.
There is no obvious fix for this short of freezing the filesystem
properly, so revert this change for now.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b79f4a1c68bb99152d0785ee4ea3ab4396cdacc6 upstream.
When we do inode readahead in log recovery, we do can do the
readahead before we've replayed the icreate transaction that stamps
the buffer with inode cores. The inode readahead verifier catches
this and marks the buffer as !done to indicate that it doesn't yet
contain valid inodes.
In adding buffer error notification (i.e. setting b_error = -EIO at
the same time as as we clear the done flag) to such a readahead
verifier failure, we can then get subsequent inode recovery failing
with this error:
XFS (dm-0): metadata I/O error: block 0xa00060 ("xlog_recover_do..(read#2)") error 5 numblks 32
This occurs when readahead completion races with icreate item replay
such as:
inode readahead
find buffer
lock buffer
submit RA io
....
icreate recovery
xfs_trans_get_buffer
find buffer
lock buffer
<blocks on RA completion>
.....
<ra completion>
fails verifier
clear XBF_DONE
set bp->b_error = -EIO
release and unlock buffer
<icreate gains lock>
icreate initialises buffer
marks buffer as done
adds buffer to delayed write queue
releases buffer
At this point, we have an initialised inode buffer that is up to
date but has an -EIO state registered against it. When we finally
get to recovering an inode in that buffer:
inode item recovery
xfs_trans_read_buffer
find buffer
lock buffer
sees XBF_DONE is set, returns buffer
sees bp->b_error is set
fail log recovery!
Essentially, we need xfs_trans_get_buf_map() to clear the error status of
the buffer when doing a lookup. This function returns uninitialised
buffers, so the buffer returned can not be in an error state and
none of the code that uses this function expects b_error to be set
on return. Indeed, there is an ASSERT(!bp->b_error); in the
transaction case in xfs_trans_get_buf_map() that would have caught
this if log recovery used transactions....
This patch firstly changes the inode readahead failure to set -EIO
on the buffer, and secondly changes xfs_buf_get_map() to never
return a buffer with an error state set so this first change doesn't
cause unexpected log recovery failures.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 96f859d52bcb1c6ea6f3388d39862bf7143e2f30 upstream.
Because struct xfs_agfl is 36 bytes long and has a 64-bit integer
inside it, gcc will quietly round the structure size up to the nearest
64 bits -- in this case, 40 bytes. This results in the XFS_AGFL_SIZE
macro returning incorrect results for v5 filesystems on 64-bit
machines (118 items instead of 119). As a result, a 32-bit xfs_repair
will see garbage in AGFL item 119 and complain.
Therefore, tell gcc not to pad the structure so that the AGFL size
calculation is correct.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cf9a6784f7c1b5ee2b9159a1246e327c331c5697 upstream.
Without this copy-up of a file can be forced, even without actually being
allowed to do anything on the file.
[Arnd Bergmann] include <linux/pagemap.h> for PAGE_CACHE_SIZE (used by
MAX_LFS_FILESIZE definition).
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed06e069775ad9236087594a1c1667367e983fb5 upstream.
We copy i_uid and i_gid of underlying inode into overlayfs inode. Except
for the root inode.
Fix this omission.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 84889d49335627bc770b32787c1ef9ebad1da232 upstream.
This patch fixes kernel crash at removing directory which contains
whiteouts from lower layers.
Cache of directory content passed as "list" contains entries from all
layers, including whiteouts from lower layers. So, lookup in upper dir
(moved into work at this stage) will return negative entry. Plus this
cache is filled long before and we can race with external removal.
Example:
mkdir -p lower0/dir lower1/dir upper work overlay
touch lower0/dir/a lower0/dir/b
mknod lower1/dir/a c 0 0
mount -t overlay none overlay -o lowerdir=lower1:lower0,upperdir=upper,workdir=work
rm -fr overlay/dir
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e4ad29fa0d224d05e08b2858e65f112fd8edd4fe upstream.
Rather than always allocating the high-order XATTR_SIZE_MAX buffer
which is costly and prone to failure, only allocate what is needed and
realloc if necessary.
Fixes https://github.com/coreos/bugs/issues/489
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 97daf8b97ad6f913a34c82515be64dc9ac08d63e upstream.
When ovl_copy_xattr() encountered a zero size xattr no more xattrs were
copied and the function returned success. This is clearly not the desired
behavior.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5c2ff95e41c9290d16556cd02e35b25d81be8fe0 upstream.
When working with hugetlbfs ptes (which are actually pmds) is not valid to
directly use pte functions like pte_present() because the hardware bit
layout of pmds and ptes can be different. This is the case on s390.
Therefore we have to convert the hugetlbfs ptes first into a valid pte
encoding with huge_ptep_get().
Currently the /proc/<pid>/numa_maps code uses hugetlbfs ptes without
huge_ptep_get(). On s390 this leads to the following two problems:
1) The pte_present() function returns false (instead of true) for
PROT_NONE hugetlb ptes. Therefore PROT_NONE vmas are missing
completely in the "numa_maps" output.
2) The pte_dirty() function always returns false for all hugetlb ptes.
Therefore these pages are reported as "mapped=xxx" instead of
"dirty=xxx".
Therefore use huge_ptep_get() to correctly convert the hugetlb ptes.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9aacdd354d197ad64685941b36d28ea20ab88757 upstream.
Hillf Danton noticed bugs in the hugetlb_vmtruncate_list routine. The
argument end is of type pgoff_t. It was being converted to a vaddr
offset and passed to unmap_hugepage_range. However, end was also being
used as an argument to the vma_interval_tree_foreach controlling loop.
In addition, the conversion of end to vaddr offset was incorrect.
hugetlb_vmtruncate_list is called as part of a file truncate or
fallocate hole punch operation.
When truncating a hugetlbfs file, this bug could prevent some pages from
being unmapped. This is possible if there are multiple vmas mapping the
file, and there is a sufficiently sized hole between the mappings. The
size of the hole between two vmas (A,B) must be such that the starting
virtual address of B is greater than (ending virtual address of A <<
PAGE_SHIFT). In this case, the pages in B would not be unmapped. If
pages are not properly unmapped during truncate, the following BUG is
hit:
kernel BUG at fs/hugetlbfs/inode.c:428!
In the fallocate hole punch case, this bug could prevent pages from
being unmapped as in the truncate case. However, for hole punch the
result is that unmapped pages will not be removed during the operation.
For hole punch, it is also possible that more pages than desired will be
unmapped. This unnecessary unmapping will cause page faults to
reestablish the mappings on subsequent page access.
Fixes: 1bfad99ab (" hugetlbfs: hugetlb_vmtruncate_list() needs to take a range")Reported-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bb00c898ad1ce40c4bb422a8207ae562e9aea7ae upstream.
If a name contains at least some characters with Unicode values
exceeding single byte, the CS0 output should have 2 bytes per character.
And if other input characters have single byte Unicode values, then
the single input byte is converted to 2 output bytes, and the length
of output becomes larger than the length of input. And if the input
name is long enough, the output length may exceed the allocated buffer
length.
All this means that conversion from UTF8 or NLS to CS0 requires
checking of output length in order to stop when it exceeds the given
output buffer size.
[JK: Make code return -ENAMETOOLONG instead of silently truncating the
name]
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ad402b265ecf6fa22d04043b41444cdfcdf4f52d upstream.
udf_CS0toUTF8 function stops the conversion when the output buffer
length reaches UDF_NAME_LEN-2, which is correct maximum name length,
but, when checking, it leaves the space for a single byte only,
while multi-bytes output characters can take more space, causing
buffer overflow.
Similar error exists in udf_CS0toNLS function, that restricts
the output length to UDF_NAME_LEN, while actual maximum allowed
length is UDF_NAME_LEN-2.
In these cases the output can override not only the current buffer
length field, causing corruption of the name buffer itself, but also
following allocation structures, causing kernel crash.
Adjust the output length checks in both functions to prevent buffer
overruns in case of multi-bytes UTF8 or NLS characters.
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b0918d9f476a8434b055e362b83fa4fd1d462c3f upstream.
udf_next_aext() just follows extent pointers while extents are marked as
indirect. This can loop forever for corrupted filesystem. Limit number
the of indirect extents we are willing to follow in a row.
[JK: Updated changelog, limit, style]
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Jan Kara <jack@suse.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 082fa37d1351a41afc491d44a1d095cb8d919aa2 upstream.
We must not skip encoding the statistics, or the server will see an
XDR encoding error.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 361cad3c89070aeb37560860ea8bfc092d545adc upstream.
We've seen this in a packet capture - I've intermixed what I
think was going on. The fix here is to grab the so_lock sooner.
1964379 -> #1 open (for write) reply seqid=1
1964393 -> #2 open (for read) reply seqid=2
__nfs4_close(), state->n_wronly--
nfs4_state_set_mode_locked(), changes state->state = [R]
state->flags is [RW]
state->state is [R], state->n_wronly == 0, state->n_rdonly == 1
1964398 -> #3 open (for write) call -> because close is already running
1964399 -> downgrade (to read) call seqid=2 (close of #1)
1964402 -> #3 open (for write) reply seqid=3
__update_open_stateid()
nfs_set_open_stateid_locked(), changes state->flags
state->flags is [RW]
state->state is [R], state->n_wronly == 0, state->n_rdonly == 1
new sequence number is exposed now via nfs4_stateid_copy()
next step would be update_open_stateflags(), pending so_lock
1964403 -> downgrade reply seqid=2, fails with OLD_STATEID (close of #1)
nfs4_close_prepare() gets so_lock and recalcs flags -> send close
1964405 -> downgrade (to read) call seqid=3 (close of #1 retry)
__update_open_stateid() gets so_lock
* update_open_stateflags() updates state->n_wronly.
nfs4_state_set_mode_locked() updates state->state
state->flags is [RW]
state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1
* should have suppressed the preceding nfs4_close_prepare() from
sending open_downgrade
1964406 -> write call
1964408 -> downgrade (to read) reply seqid=4 (close of #1 retry)
nfs_clear_open_stateid_locked()
state->flags is [R]
state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1
1964409 -> write reply (fails, openmode)
Signed-off-by: Andrew Elble <aweits@rit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 86fb449b07b8215443a30782dca5755d5b8b0577 upstream.
Jeff reports seeing an Oops in ff_layout_alloc_lseg. Turns out
copy+paste has played cruel tricks on a nested loop.
Reported-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ade14a7df796d4e86bd9d181193c883a57b13db0 upstream.
If a NFSv4 client uses the cache_consistency_bitmask in order to
request only information about the change attribute, timestamps and
size, then it has not revalidated all attributes, and hence the
attribute timeout timestamp should not be updated.
Reported-by: Donald Buczek <buczek@molgen.mpg.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4b550af519854421dfec9f7732cdddeb057134b2 upstream.
The setup_ntlmv2_rsp() function may return positive value ENOMEM instead
of -ENOMEM in case of kmalloc failure.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 01b9b0b28626db4a47d7f48744d70abca9914ef1 upstream.
In some cases tmp_bug can be not filled in cifs_filldir and stay uninitialized,
therefore its printk with "%s" modifier can leak content of kernelspace memory.
If old content of this buffer does not contain '\0' access bejond end of
allocated object can crash the host.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Steve French <sfrench@localhost.localdomain>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 820962dc700598ffe8cd21b967e30e7520c34748 upstream.
cifs_call_async() queues the MID to the pending list and calls
smb_send_rqst(). If smb_send_rqst() performs a partial send, it sets
the tcpStatus to CifsNeedReconnect and returns an error code to
cifs_call_async(). In this case, cifs_call_async() removes the MID
from the list and returns to the caller.
However, cifs_call_async() releases the server mutex _before_ removing
the MID. This means that a cifs_reconnect() can race with this function
and manage to remove the MID from the list and delete the entry before
cifs_call_async() calls cifs_delete_mid(). This leads to various
crashes due to the use after free in cifs_delete_mid().
Task1 Task2
cifs_call_async():
- rc = -EAGAIN
- mutex_unlock(srv_mutex)
cifs_reconnect():
- mutex_lock(srv_mutex)
- mutex_unlock(srv_mutex)
- list_delete(mid)
- mid->callback()
cifs_writev_callback():
- mutex_lock(srv_mutex)
- delete(mid)
- mutex_unlock(srv_mutex)
- cifs_delete_mid(mid) <---- use after free
Fix this by removing the MID in cifs_call_async() before releasing the
srv_mutex. Also hold the srv_mutex in cifs_reconnect() until the MIDs
are moved out of the pending list.
Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <sfrench@localhost.localdomain>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ec7147a99e33a9e4abad6fc6e1b40d15df045d53 upstream.
Under some conditions, CIFS can repeatedly call the cifs_dbg() logging
wrapper. If done rapidly enough, the console framebuffer can softlockup
or "rcu_sched self-detected stall". Apply the built-in log ratelimiters
to prevent such hangs.
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit caaee6234d05a58c5b4d05e7bf766131b810a657 upstream.
By checking the effective credentials instead of the real UID / permitted
capabilities, ensure that the calling process actually intended to use its
credentials.
To ensure that all ptrace checks use the correct caller credentials (e.g.
in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
flag), use two new flags and require one of them to be set.
The problem was that when a privileged task had temporarily dropped its
privileges, e.g. by calling setreuid(0, user_uid), with the intent to
perform following syscalls with the credentials of a user, it still passed
ptrace access checks that the user would not be able to pass.
While an attacker should not be able to convince the privileged task to
perform a ptrace() syscall, this is a problem because the ptrace access
check is reused for things in procfs.
In particular, the following somewhat interesting procfs entries only rely
on ptrace access checks:
/proc/$pid/stat - uses the check for determining whether pointers
should be visible, useful for bypassing ASLR
/proc/$pid/maps - also useful for bypassing ASLR
/proc/$pid/cwd - useful for gaining access to restricted
directories that contain files with lax permissions, e.g. in
this scenario:
lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
drwx------ root root /root
drwxr-xr-x root root /root/foobar
-rw-r--r-- root root /root/foobar/secret
Therefore, on a system where a root-owned mode 6755 binary changes its
effective credentials as described and then dumps a user-specified file,
this could be used by an attacker to reveal the memory layout of root's
processes or reveal the contents of files he is not allowed to access
(through /proc/$pid/cwd).
[akpm@linux-foundation.org: fix warning]
Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1636d1d77ef4e01e57f706a4cae3371463896136 upstream.
If a bio for a direct IO request fails, we were not setting the error in
the parent bio (the main DIO bio), making us not return the error to
user space in btrfs_direct_IO(), that is, it made __blockdev_direct_IO()
return the number of bytes issued for IO and not the error a bio created
and submitted by btrfs_submit_direct() got from the block layer.
This essentially happens because when we call:
dio_end_io(dio_bio, bio->bi_error);
It does not set dio_bio->bi_error to the value of the second argument.
So just add this missing assignment in endio callbacks, just as we do in
the error path at btrfs_submit_direct() when we fail to clone the dio bio
or allocate its private object. This follows the convention of what is
done with other similar APIs such as bio_endio() where the caller is
responsible for setting the bi_error field in the bio it passes as an
argument to bio_endio().
This was detected by the new generic test cases in xfstests: 271, 272,
276 and 278. Which essentially setup a dm error target, then load the
error table, do a direct IO write and unload the error table. They
expect the write to fail with -EIO, which was not getting reported
when testing against btrfs.
Fixes: 4246a0b63b ("block: add a bi_error field to struct bio")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 313140023026ae542ad76e7e268c56a1eaa2c28e upstream.
In the extent_same ioctl, we were grabbing the pages (locked) and
attempting to read them without bothering about any concurrent IO
against them. That is, we were not checking for any ongoing ordered
extents nor waiting for them to complete, which leads to a race where
the extent_same() code gets a checksum verification error when it
reads the pages, producing a message like the following in dmesg
and making the operation fail to user space with -ENOMEM:
[18990.161265] BTRFS warning (device sdc): csum failed ino 259 off 495616 csum 685204116 expected csum 1515870868
Fix this by using btrfs_readpage() for reading the pages instead of
extent_read_full_page_nolock(), which waits for any concurrent ordered
extents to complete and locks the io range. Also do better error handling
and don't treat all failures as -ENOMEM, as that's clearly misleasing,
becoming identical to the checks and operation of prepare_uptodate_page().
The use of extent_read_full_page_nolock() was required before
commit f441460202 ("btrfs: fix deadlock with extent-same and readpage"),
as we had the range locked in an inode's io tree before attempting to
read the pages.
Fixes: f441460202 ("btrfs: fix deadlock with extent-same and readpage")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e0bd70c67bf996b360f706b6c643000f2e384681 upstream.
In the extent_same ioctl we are getting the pages for the source and
target ranges and unlocking them immediately after, which is incorrect
because later we attempt to map them (with kmap_atomic) and access their
contents at btrfs_cmp_data(). When we do such access the pages might have
been relocated or removed from memory, which leads to an invalid memory
access. This issue is detected on a kernel with CONFIG_DEBUG_PAGEALLOC=y
which produces a trace like the following:
186736.677437] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[186736.680382] Modules linked in: btrfs dm_flakey dm_mod ppdev xor raid6_pq sha256_generic hmac drbg ansi_cprng acpi_cpufreq evdev sg aesni_intel aes_x86_64
parport_pc ablk_helper tpm_tis psmouse parport i2c_piix4 tpm cryptd i2c_core lrw processor button serio_raw pcspkr gf128mul glue_helper loop autofs4 ext4
crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last
unloaded: btrfs]
[186736.681319] CPU: 13 PID: 10222 Comm: duperemove Tainted: G W 4.4.0-rc6-btrfs-next-18+ #1
[186736.681319] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[186736.681319] task: ffff880132600400 ti: ffff880362284000 task.ti: ffff880362284000
[186736.681319] RIP: 0010:[<ffffffff81264d00>] [<ffffffff81264d00>] memcmp+0xb/0x22
[186736.681319] RSP: 0018:ffff880362287d70 EFLAGS: 00010287
[186736.681319] RAX: 000002c002468acf RBX: 0000000012345678 RCX: 0000000000000000
[186736.681319] RDX: 0000000000001000 RSI: 0005d129c5cf9000 RDI: 0005d129c5cf9000
[186736.681319] RBP: ffff880362287d70 R08: 0000000000000000 R09: 0000000000001000
[186736.681319] R10: ffff880000000000 R11: 0000000000000476 R12: 0000000000001000
[186736.681319] R13: ffff8802f91d4c88 R14: ffff8801f2a77830 R15: ffff880352e83e40
[186736.681319] FS: 00007f27b37fe700(0000) GS:ffff88043dda0000(0000) knlGS:0000000000000000
[186736.681319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[186736.681319] CR2: 00007f27a406a000 CR3: 0000000217421000 CR4: 00000000001406e0
[186736.681319] Stack:
[186736.681319] ffff880362287ea0 ffffffffa048d0bd 000000000009f000 0000000000001000
[186736.681319] 0100000000000000 ffff8801f2a77850 ffff8802f91d49b0 ffff880132600400
[186736.681319] 00000000000004f8 ffff8801c1efbe41 0000000000000000 0000000000000038
[186736.681319] Call Trace:
[186736.681319] [<ffffffffa048d0bd>] btrfs_ioctl+0x24cb/0x2731 [btrfs]
[186736.681319] [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
[186736.681319] [<ffffffff8118b3d4>] ? rcu_read_unlock+0x3e/0x5d
[186736.681319] [<ffffffff811822f8>] do_vfs_ioctl+0x42b/0x4ea
[186736.681319] [<ffffffff8118b4f3>] ? __fget_light+0x62/0x71
[186736.681319] [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
[186736.681319] [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
[186736.681319] Code: 0a 3c 6e 74 0d 3c 79 74 04 3c 59 75 0c c6 06 01 eb 03 c6 06 00 31 c0 eb 05 b8 ea ff ff ff 5d c3 55 31 c9 48 89 e5 48 39 d1 74 13 <0f> b6
04 0f 44 0f b6 04 0e 48 ff c1 44 29 c0 74 ea eb 02 31 c0
(gdb) list *(btrfs_ioctl+0x24cb)
0x5e0e1 is in btrfs_ioctl (fs/btrfs/ioctl.c:2972).
2967 dst_addr = kmap_atomic(dst_page);
2968
2969 flush_dcache_page(src_page);
2970 flush_dcache_page(dst_page);
2971
2972 if (memcmp(addr, dst_addr, cmp_len))
2973 ret = BTRFS_SAME_DATA_DIFFERS;
2974
2975 kunmap_atomic(addr);
2976 kunmap_atomic(dst_addr);
So fix this by making sure we keep the pages locked and respect the same
locking order as everywhere else: get and lock the pages first and then
lock the range in the inode's io tree (like for example at
__btrfs_buffered_write() and extent_readpages()). If an ordered extent
is found after locking the range in the io tree, unlock the range,
unlock the pages, wait for the ordered extent to complete and repeat the
entire locking process until no overlapping ordered extents are found.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bc4ef7592f657ae81b017207a1098817126ad4cb upstream.
The value of ctx->pos in the last readdir call is supposed to be set to
INT_MAX due to 32bit compatibility, unless 'pos' is intentially set to a
larger value, then it's LLONG_MAX.
There's a report from PaX SIZE_OVERFLOW plugin that "ctx->pos++"
overflows (https://forums.grsecurity.net/viewtopic.php?f=1&t=4284), on a
64bit arch, where the value is 0x7fffffffffffffff ie. LLONG_MAX before
the increment.
We can get to that situation like that:
* emit all regular readdir entries
* still in the same call to readdir, bump the last pos to INT_MAX
* next call to readdir will not emit any entries, but will reach the
bump code again, finds pos to be INT_MAX and sets it to LLONG_MAX
Normally this is not a problem, but if we call readdir again, we'll find
'pos' set to LLONG_MAX and the unconditional increment will overflow.
The report from Victor at
(http://thread.gmane.org/gmane.comp.file-systems.btrfs/49500) with debugging
print shows that pattern:
Overflow: e
Overflow: 7fffffff
Overflow: 7fffffffffffffff
PAX: size overflow detected in function btrfs_real_readdir
fs/btrfs/inode.c:5760 cicus.935_282 max, count: 9, decl: pos; num: 0;
context: dir_context;
CPU: 0 PID: 2630 Comm: polkitd Not tainted 4.2.3-grsec #1
Hardware name: Gigabyte Technology Co., Ltd. H81ND2H/H81ND2H, BIOS F3 08/11/2015
ffffffff81901608 0000000000000000 ffffffff819015e6 ffffc90004973d48
ffffffff81742f0f 0000000000000007 ffffffff81901608 ffffc90004973d78
ffffffff811cb706 0000000000000000 ffff8800d47359e0 ffffc90004973ed8
Call Trace:
[<ffffffff81742f0f>] dump_stack+0x4c/0x7f
[<ffffffff811cb706>] report_size_overflow+0x36/0x40
[<ffffffff812ef0bc>] btrfs_real_readdir+0x69c/0x6d0
[<ffffffff811dafc8>] iterate_dir+0xa8/0x150
[<ffffffff811e6d8d>] ? __fget_light+0x2d/0x70
[<ffffffff811dba3a>] SyS_getdents+0xba/0x1c0
Overflow: 1a
[<ffffffff811db070>] ? iterate_dir+0x150/0x150
[<ffffffff81749b69>] entry_SYSCALL_64_fastpath+0x12/0x83
The jump from 7fffffff to 7fffffffffffffff happens when new dir entries
are not yet synced and are processed from the delayed list. Then the code
could go to the bump section again even though it might not emit any new
dir entries from the delayed list.
The fix avoids entering the "bump" section again once we've finished
emitting the entries, both for synced and delayed entries.
References: https://forums.grsecurity.net/viewtopic.php?f=1&t=4284
Reported-by: Victor <services@swwu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Tested-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>