Add per-uid files that report the data in binary format rather than
text, to allow faster reading & parsing by userspace.
Signed-off-by: Connor O'Brien <connoro@google.com>
Bug: 72339335
Test: compare values to those reported in /proc/uid_time_in_state
Change-Id: I463039ea7f17b842be4c70024fe772539fe2ce02
Add support for reporting per-uid information through procfs, roughly
following the approach used for per-tid and per-tgid directories in
fs/proc/base.c.
This also entails some new tracking of which uids have been used, to
avoid losing information when the last task with a given uid exits.
Signed-off-by: Connor O'Brien <connoro@google.com>
Bug: 72339335
Test: ls /proc/uid/; compare with UIDs in /proc/uid_time_in_state
Change-Id: I0908f0c04438b11ceb673d860e58441bf503d478
Add time in state data to task structs, and create
/proc/<pid>/time_in_state files to show how long each individual task
has run at each frequency.
Create a CONFIG_CPU_FREQ_TIMES option to enable/disable this tracking.
Signed-off-by: Connor O'Brien <connoro@google.com>
Bug: 72339335
Test: Read /proc/<pid>/time_in_state
Change-Id: Ia6456754f4cb1e83b2bc35efa8fbe9f8696febc8
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlo6J9YACgkQONu9yGCS
aT7Cdg/8D7+btjAJPjk/suKUBSOpfSkYIoakaVSGf7r7gFv7SkMF023SikLUK+vN
xa0FL1bYASzXuKgcxY9vB7ZkCDShrglqTCIpbWwHJhwS0fRGTGrMN2MM+opVgeoG
4ngnEPLue5TqZs3LVrpTySQFODlxnY3C4lpKopN7QNrcr1M5iiMELXCJu/qy6JhC
ZBsRGUY8GHbouqC0YSqNlrv+C7zbfAlaawIBDSmYm0R4F+TuqoKZlBGAJ9lbALcZ
pM8OaOXx9v471RhE7Tcsl3Eiz3vKHFKWxG/ZujkSqB21wPq6gd4VuP/wMuelX0GC
rDTb/nn9Zhmv7UCOn62htlRLrAnSaJ9FlEK+u3TJ+XBGE9gmanH9IjIljCehEZeI
55Vm7q6IwQT2WvgTzqUoco4AYI37T9pqJ++I1E3jY/zk+bfCIQ1ZMpMXmwAUx738
m7boO38eRnyXMxqf4hfVQ4BFPkwaxdW/I3LDanE6U85Hw2nI2uZIPHRbVrC6gAPS
aY9EUFEancxu4mW92mWWKrEnEWs5Jsb313ISAKSU75WwWmZ/tgsUtSzvY7XNPwYr
G//HbPA5zNFdM1zSO4HQiLYjOqAiwNNAbDEWKoYr8MQpgYbTB4/SgT15mJzw6uTo
WtKZsIWMYiXVW8cNhQiWAVUJVf66GMlc6kHcyHU7YHH3obZAYXU=
=DNsf
-----END PGP SIGNATURE-----
Merge 4.4.107 into android-4.4
Changes in 4.4.107
crypto: hmac - require that the underlying hash algorithm is unkeyed
crypto: salsa20 - fix blkcipher_walk API usage
autofs: fix careless error in recent commit
tracing: Allocate mask_str buffer dynamically
USB: uas and storage: Add US_FL_BROKEN_FUA for another JMicron JMS567 ID
USB: core: prevent malicious bNumInterfaces overflow
usbip: fix stub_send_ret_submit() vulnerability to null transfer_buffer
ceph: drop negative child dentries before try pruning inode's alias
Bluetooth: btusb: driver to enable the usb-wakeup feature
xhci: Don't add a virt_dev to the devs array before it's fully allocated
sched/rt: Do not pull from current CPU if only one CPU to pull
dmaengine: dmatest: move callback wait queue to thread context
ext4: fix fdatasync(2) after fallocate(2) operation
ext4: fix crash when a directory's i_size is too small
KEYS: add missing permission check for request_key() destination
mac80211: Fix addition of mesh configuration element
usb: phy: isp1301: Add OF device ID table
md-cluster: free md_cluster_info if node leave cluster
userfaultfd: shmem: __do_fault requires VM_FAULT_NOPAGE
userfaultfd: selftest: vm: allow to build in vm/ directory
net: initialize msg.msg_flags in recvfrom
net: bcmgenet: correct the RBUF_OVFL_CNT and RBUF_ERR_CNT MIB values
net: bcmgenet: correct MIB access of UniMAC RUNT counters
net: bcmgenet: reserved phy revisions must be checked first
net: bcmgenet: power down internal phy if open or resume fails
net: bcmgenet: Power up the internal PHY before probing the MII
NFSD: fix nfsd_minorversion(.., NFSD_AVAIL)
NFSD: fix nfsd_reset_versions for NFSv4.
Input: i8042 - add TUXEDO BU1406 (N24_25BU) to the nomux list
drm/omap: fix dmabuf mmap for dma_alloc'ed buffers
netfilter: bridge: honor frag_max_size when refragmenting
writeback: fix memory leak in wb_queue_work()
net: wimax/i2400m: fix NULL-deref at probe
dmaengine: Fix array index out of bounds warning in __get_unmap_pool()
net: Resend IGMP memberships upon peer notification.
mlxsw: reg: Fix SPVM max record count
mlxsw: reg: Fix SPVMLR max record count
intel_th: pci: Add Gemini Lake support
openrisc: fix issue handling 8 byte get_user calls
scsi: hpsa: update check for logical volume status
scsi: hpsa: limit outstanding rescans
fjes: Fix wrong netdevice feature flags
drm/radeon/si: add dpm quirk for Oland
sched/deadline: Make sure the replenishment timer fires in the next period
sched/deadline: Throttle a constrained deadline task activated after the deadline
sched/deadline: Use deadline instead of period when calculating overflow
mmc: mediatek: Fixed bug where clock frequency could be set wrong
drm/radeon: reinstate oland workaround for sclk
afs: Fix missing put_page()
afs: Populate group ID from vnode status
afs: Adjust mode bits processing
afs: Flush outstanding writes when an fd is closed
afs: Migrate vlocation fields to 64-bit
afs: Prevent callback expiry timer overflow
afs: Fix the maths in afs_fs_store_data()
afs: Populate and use client modification time
afs: Fix page leak in afs_write_begin()
afs: Fix afs_kill_pages()
net/mlx4_core: Avoid delays during VF driver device shutdown
perf symbols: Fix symbols__fixup_end heuristic for corner cases
efi/esrt: Cleanup bad memory map log messages
NFSv4.1 respect server's max size in CREATE_SESSION
btrfs: add missing memset while reading compressed inline extents
target: Use system workqueue for ALUA transitions
target: fix ALUA transition timeout handling
target: fix race during implicit transition work flushes
sfc: don't warn on successful change of MAC
fbdev: controlfb: Add missing modes to fix out of bounds access
video: udlfb: Fix read EDID timeout
video: fbdev: au1200fb: Release some resources if a memory allocation fails
video: fbdev: au1200fb: Return an error code if a memory allocation fails
rtc: pcf8563: fix output clock rate
dmaengine: ti-dma-crossbar: Correct am335x/am43xx mux value type
PCI/PME: Handle invalid data when reading Root Status
powerpc/powernv/cpufreq: Fix the frequency read by /proc/cpuinfo
netfilter: ipvs: Fix inappropriate output of procfs
powerpc/opal: Fix EBUSY bug in acquiring tokens
powerpc/ipic: Fix status get and status clear
target/iscsi: Fix a race condition in iscsit_add_reject_from_cmd()
iscsi-target: fix memory leak in lio_target_tiqn_addtpg()
target:fix condition return in core_pr_dump_initiator_port()
target/file: Do not return error for UNMAP if length is zero
arm-ccn: perf: Prevent module unload while PMU is in use
crypto: tcrypt - fix buffer lengths in test_aead_speed()
mm: Handle 0 flags in _calc_vm_trans() macro
clk: mediatek: add the option for determining PLL source clock
clk: imx6: refine hdmi_isfr's parent to make HDMI work on i.MX6 SoCs w/o VPU
clk: tegra: Fix cclk_lp divisor register
ppp: Destroy the mutex when cleanup
thermal/drivers/step_wise: Fix temperature regulation misbehavior
GFS2: Take inode off order_write list when setting jdata flag
bcache: explicitly destroy mutex while exiting
bcache: fix wrong cache_misses statistics
l2tp: cleanup l2tp_tunnel_delete calls
xfs: fix log block underflow during recovery cycle verification
xfs: fix incorrect extent state in xfs_bmap_add_extent_unwritten_real
PCI: Detach driver before procfs & sysfs teardown on device remove
scsi: hpsa: cleanup sas_phy structures in sysfs when unloading
scsi: hpsa: destroy sas transport properties before scsi_host
powerpc/perf/hv-24x7: Fix incorrect comparison in memord
tty fix oops when rmmod 8250
usb: musb: da8xx: fix babble condition handling
pinctrl: adi2: Fix Kconfig build problem
raid5: Set R5_Expanded on parity devices as well as data.
scsi: scsi_devinfo: Add REPORTLUN2 to EMC SYMMETRIX blacklist entry
vt6655: Fix a possible sleep-in-atomic bug in vt6655_suspend
scsi: sd: change manage_start_stop to bool in sysfs interface
scsi: sd: change allow_restart to bool in sysfs interface
scsi: bfa: integer overflow in debugfs
udf: Avoid overflow when session starts at large offset
macvlan: Only deliver one copy of the frame to the macvlan interface
RDMA/cma: Avoid triggering undefined behavior
IB/ipoib: Grab rtnl lock on heavy flush when calling ndo_open/stop
ath9k: fix tx99 potential info leak
Linux 4.4.107
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllQl/sACgkQONu9yGCS
aT5zMRAAuDBpWjQ1IFtgmzQnKGyjS3fm5X/EgPmT81PFKXay5/TH6Hc85TvorChk
mCC7qybadCFPjieBfUeCGhTposiGkbOZdYIzduzLeHPe7Eda88NKJw5ZS3x+RDro
if6BZNtQPwPk9jQ95zpBu/p6eCuIGFzQObif8XHga9eEVP+TPGDKFn5EdLM8j99t
ErKYyTLFEiZYa52hpCBbVz/4mX8bJOoAlZaitcbvaFbG0OodA5SL24sKlr7tAPrM
ajnuqv+ghOUjbXrUlrTGxCjJ7vCJjdBqNzuxVFNj5P1xDucpBW8uuWGob0XWTMbB
hj/ToAIQXQXrZKFpASWW74B4QZDcjo7dbhDWOurBaAsyLuBzAi26pI+q6TqgCQUO
k17ilfk9LVEvvFhiQ7xpJPNnkh6tCEk7Jdblru6ZL5fHCAYe+qUDj56TbqjFJCQK
+bDzPi0QXkEGQNKxo7zDu5iGQ0Gb0zD2Z3MrGD+3pCkM5yG0PXjzZ7lOlboyPzwY
88dxuuTRmm8yGEEm81BKmDYqAA1l4FCrap8u9FLoNyoZyMnK7B+SHHuPRBRhL3F2
I3L/v8BbJhXTsDNPXEsXtpZZpn2wxJp4x4gKWmCcOb5MM1nbFrFtwdj0cKobu6Xe
ygNMEkjlW2uUrZoDXthj1ICda/cEw/R0gMWzBeNNVfErOZEmFxM=
=zl9i
-----END PGP SIGNATURE-----
Merge 4.4.74 into android-4.4
Changes in 4.4.74
configfs: Fix race between create_link and configfs_rmdir
can: gs_usb: fix memory leak in gs_cmd_reset()
cpufreq: conservative: Allow down_threshold to take values from 1 to 10
vb2: Fix an off by one error in 'vb2_plane_vaddr'
mac80211: don't look at the PM bit of BAR frames
mac80211/wpa: use constant time memory comparison for MACs
mac80211: fix CSA in IBSS mode
mac80211: fix IBSS presp allocation size
serial: efm32: Fix parity management in 'efm32_uart_console_get_options()'
x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init()
mfd: omap-usb-tll: Fix inverted bit use for USB TLL mode
staging: rtl8188eu: prevent an underflow in rtw_check_beacon_data()
iio: proximity: as3935: recalibrate RCO after resume
USB: hub: fix SS max number of ports
usb: core: fix potential memory leak in error path during hcd creation
pvrusb2: reduce stack usage pvr2_eeprom_analyze()
USB: gadget: dummy_hcd: fix hub-descriptor removable fields
usb: r8a66597-hcd: select a different endpoint on timeout
usb: r8a66597-hcd: decrease timeout
drivers/misc/c2port/c2port-duramar2150.c: checking for NULL instead of IS_ERR()
usb: xhci: ASMedia ASM1042A chipset need shorts TX quirk
USB: gadgetfs, dummy-hcd, net2280: fix locking for callbacks
mm/memory-failure.c: use compound_head() flags for huge pages
swap: cond_resched in swap_cgroup_prepare()
genirq: Release resources in __setup_irq() error path
alarmtimer: Prevent overflow of relative timers
usb: dwc3: exynos fix axius clock error path to do cleanup
MIPS: Fix bnezc/jialc return address calculation
alarmtimer: Rate limit periodic intervals
mm: larger stack guard gap, between vmas
Allow stack to grow up to address space limit
mm: fix new crash in unmapped_area_topdown()
Linux 4.4.74
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllEst4ACgkQONu9yGCS
aT5RJQ//UBkwmDInzy8BbfZk7pXY4IXzbXYfZ/nS5QbmPQNcQArKeIsOH+C1ZTKq
suo4wR4yWUzATr8BBBxbUKyyrAsA85oDq/fl3EyPVLYOHdChnM/sS9H/eFrQhJIN
7PM9j1YiqyTohkagB2tkWSb8tO1r/wgTSTLmIE+RzsWn6rvMCPMPVY+3OGgC1fuT
LJrgtXKGBl9zNxmrns3VlSou1MhJvh/y8ESIrhdYHdZKos0zsgQaISXJjhZtyQcV
OnEzsuz9NMXWE9XGjNpyAm88Nh8T41Ey/vwOjt6mkvWac3r77IgI5NWaLV/QDyqm
d3jpVWK5BSPcLsmeN4LwQC5aYayvHlh8CfP8ZlBx1xkB5TpclnqXGgQA9BYpXAKw
XoeFl8n8xLaPrgX8gp3kw/f6C6443OC2JVeRvgnH/0ZM7M0+rZxE6DstRcUHGqf6
K8PN+AssQpBLIjXSGHnzDVHME/1xWUSmJZfLd5bd6NJ1zSZqZOy1gkf5dx77p0Ka
UaGVOg6UzOojr3GeUTE62bRO2ZuAno0QO1NQJJkUK1CbNMYmE61vYLM8i0pLKWZJ
3mDlhcoGK8aJH8chNLU3mgkXECU/9zOVKveWZFoghhMlv8ImgeTuiqZhvztzzT38
42DxdXPfMzxCwBF02zYu4qn+WDJbNyOqMQrlMEHwwb88wnKiOUg=
=Ic1J
-----END PGP SIGNATURE-----
Merge 4.4.73 into android-4.4
Changes in 4.4.73
s390/vmem: fix identity mapping
partitions/msdos: FreeBSD UFS2 file systems are not recognized
ARM: dts: imx6dl: Fix the VDD_ARM_CAP voltage for 396MHz operation
staging: rtl8192e: rtl92e_fill_tx_desc fix write to mapped out memory.
Call echo service immediately after socket reconnect
net: xilinx_emaclite: fix freezes due to unordered I/O
net: xilinx_emaclite: fix receive buffer overflow
ipv6: Handle IPv4-mapped src to in6addr_any dst.
ipv6: Inhibit IPv4-mapped src address on the wire.
NET: Fix /proc/net/arp for AX.25
NET: mkiss: Fix panic
net: hns: Fix the device being used for dma mapping during TX
sierra_net: Skip validating irrelevant fields for IDLE LSIs
sierra_net: Add support for IPv6 and Dual-Stack Link Sense Indications
i2c: piix4: Fix request_region size
ipv6: Fix IPv6 packet loss in scenarios involving roaming + snooping switches
PM / runtime: Avoid false-positive warnings from might_sleep_if()
jump label: pass kbuild_cflags when checking for asm goto support
kasan: respect /proc/sys/kernel/traceoff_on_warning
log2: make order_base_2() behave correctly on const input value zero
ethtool: do not vzalloc(0) on registers dump
fscache: Fix dead object requeue
fscache: Clear outstanding writes when disabling a cookie
FS-Cache: Initialise stores_lock in netfs cookie
ipv6: fix flow labels when the traffic class is non-0
drm/nouveau: prevent userspace from deleting client object
drm/nouveau/fence/g84-: protect against concurrent access to semaphore buffers
net/mlx4_core: Avoid command timeouts during VF driver device shutdown
gianfar: synchronize DMA API usage by free_skb_rx_queue w/ gfar_new_page
pinctrl: berlin-bg4ct: fix the value for "sd1a" of pin SCRD0_CRD_PRES
net: adaptec: starfire: add checks for dma mapping errors
parisc, parport_gsc: Fixes for printk continuation lines
drm/nouveau: Don't enabling polling twice on runtime resume
drm/ast: Fixed system hanged if disable P2A
ravb: unmap descriptors when freeing rings
nfs: Fix "Don't increment lock sequence ID after NFS4ERR_MOVED"
r8152: re-schedule napi for tx
r8152: fix rtl8152_post_reset function
r8152: avoid start_xmit to schedule napi when napi is disabled
sctp: sctp_addr_id2transport should verify the addr before looking up assoc
romfs: use different way to generate fsid for BLOCK or MTD
proc: add a schedule point in proc_pid_readdir()
tipc: ignore requests when the connection state is not CONNECTED
xtensa: don't use linux IRQ #0
s390/kvm: do not rely on the ILC on kvm host protection fauls
sparc64: make string buffers large enough
Linux 4.4.73
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 1be7107fbe18eed3e319a6c3e83c78254b693acb upstream.
Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.
This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.
Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.
One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications. For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).
Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.
Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.
Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[wt: backport to 4.11: adjust context]
[wt: backport to 4.9: adjust context ; kernel doc was not in admin-guide]
[wt: backport to 4.4: adjust context ; drop ppc hugetlb_radix changes]
Signed-off-by: Willy Tarreau <w@1wt.eu>
[gkh: minor build fixes for 4.4]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3ba4bceef23206349d4130ddf140819b365de7c8 ]
We have seen proc_pid_readdir() invocations holding cpu for more than 50
ms. Add a cond_resched() to be gentle with other tasks.
[akpm@linux-foundation.org: coding style fix]
Link: http://lkml.kernel.org/r/1484238380.15816.42.camel@edumazet-glaptop3.roam.corp.google.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlkm0zAACgkQONu9yGCS
aT5QnxAAh9uZYFJtQ7wYngD7cQcDH1KVztqEYxCP5OtxzAZBrSNBufLdhKBbc1ZP
C04Mo+FzzNiJtBwkmlOqYaEPYUSx/uwCEk9mNX85VtchIhKBrwWF7GxkeXCPs6e5
yP5TUXmxbbSp3qM4q2Z4XSW8eEPZ2l3zoy0fkjz2kS02e4RW0yQ34dvzw0BG2urr
+9ocyVjDBoU3QNKyVw3fd1AltKesSZK0fa2vEO+TOTW6Bm3xD4egCJdOzu9saUwK
hfSKXsJ0/pf1r1iyfz2foR/Hi3i4j6vRqnneyqozT7nxEJEuBQ3B5WhnsbDfzrXu
+CY23KBkDkQ1RBngmtTQd3ABHEN1E2StpBImG5RUr+5giV6/e4rdz0/HWGMvCvAz
iWqXdgZNdCnc96HPEWaDGUKxndCxsiaJOhgZwW2zm/0drVWRE+vjsOmFLyUp2Ky1
1vnKfwlvTFU4xjQ5H44AuuSHQsv+GNEtPPIHrbBv/wg90/2VuF0aYuNYjHSsc4Ca
3YM53S6/sjQqmsKixWboax8Kh2wRrEuFbqSFQV64JjFpGau61JQFMtRNl4+FFXzm
Cm+26Fan4Wtyo5zB9xnBZbDwCOXqwTXQYUP2SejtObq+Uk2tXxF05emeta9pURF3
vdgv6N0cTPm4K3VZyBZvj8JitEr2OEaIxoUqE2BXkA1MPmbqOoI=
=Z1no
-----END PGP SIGNATURE-----
Merge 4.4.70 into android-4.4
Changes in 4.4.70
usb: misc: legousbtower: Fix buffers on stack
usb: misc: legousbtower: Fix memory leak
USB: ene_usb6250: fix DMA to the stack
watchdog: pcwd_usb: fix NULL-deref at probe
char: lp: fix possible integer overflow in lp_setup()
USB: core: replace %p with %pK
ARM: tegra: paz00: Mark panel regulator as enabled on boot
tpm_crb: check for bad response size
infiniband: call ipv6 route lookup via the stub interface
dm btree: fix for dm_btree_find_lowest_key()
dm raid: select the Kconfig option CONFIG_MD_RAID0
dm bufio: avoid a possible ABBA deadlock
dm bufio: check new buffer allocation watermark every 30 seconds
dm cache metadata: fail operations if fail_io mode has been established
dm bufio: make the parameter "retain_bytes" unsigned long
dm thin metadata: call precommit before saving the roots
dm space map disk: fix some book keeping in the disk space map
md: update slab_cache before releasing new stripes when stripes resizing
rtlwifi: rtl8821ae: setup 8812ae RFE according to device type
mwifiex: pcie: fix cmd_buf use-after-free in remove/reset
ima: accept previously set IMA_NEW_FILE
KVM: x86: Fix load damaged SSEx MXCSR register
KVM: X86: Fix read out-of-bounds vulnerability in kvm pio emulation
regulator: tps65023: Fix inverted core enable logic.
s390/kdump: Add final note
s390/cputime: fix incorrect system time
ath9k_htc: Add support of AirTies 1eda:2315 AR9271 device
ath9k_htc: fix NULL-deref at probe
drm/amdgpu: Avoid overflows/divide-by-zero in latency_watermark calculations.
drm/amdgpu: Make display watermark calculations more accurate
drm/nouveau/therm: remove ineffective workarounds for alarm bugs
drm/nouveau/tmr: ack interrupt before processing alarms
drm/nouveau/tmr: fix corruption of the pending list when rescheduling an alarm
drm/nouveau/tmr: avoid processing completed alarms when adding a new one
drm/nouveau/tmr: handle races with hw when updating the next alarm time
cdc-acm: fix possible invalid access when processing notification
proc: Fix unbalanced hard link numbers
of: fix sparse warning in of_pci_range_parser_one
iio: dac: ad7303: fix channel description
pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes
pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes()
USB: serial: ftdi_sio: fix setting latency for unprivileged users
USB: serial: ftdi_sio: add Olimex ARM-USB-TINY(H) PIDs
ext4 crypto: don't let data integrity writebacks fail with ENOMEM
ext4 crypto: fix some error handling
net: qmi_wwan: Add SIMCom 7230E
fscrypt: fix context consistency check when key(s) unavailable
f2fs: check entire encrypted bigname when finding a dentry
fscrypt: avoid collisions when presenting long encrypted filenames
sched/fair: Do not announce throttled next buddy in dequeue_task_fair()
sched/fair: Initialize throttle_count for new task-groups lazily
usb: host: xhci-plat: propagate return value of platform_get_irq()
xhci: apply PME_STUCK_QUIRK and MISSING_CAS quirk for Denverton
usb: host: xhci-mem: allocate zeroed Scratchpad Buffer
net: irda: irda-usb: fix firmware name on big-endian hosts
usbvision: fix NULL-deref at probe
mceusb: fix NULL-deref at probe
ttusb2: limit messages to buffer size
usb: musb: tusb6010_omap: Do not reset the other direction's packet size
USB: iowarrior: fix info ioctl on big-endian hosts
usb: serial: option: add Telit ME910 support
USB: serial: qcserial: add more Lenovo EM74xx device IDs
USB: serial: mct_u232: fix big-endian baud-rate handling
USB: serial: io_ti: fix div-by-zero in set_termios
USB: hub: fix SS hub-descriptor handling
USB: hub: fix non-SS hub-descriptor handling
ipx: call ipxitf_put() in ioctl error path
iio: proximity: as3935: fix as3935_write
ceph: fix recursion between ceph_set_acl() and __ceph_setattr()
gspca: konica: add missing endpoint sanity check
s5p-mfc: Fix unbalanced call to clock management
dib0700: fix NULL-deref at probe
zr364xx: enforce minimum size when reading header
dvb-frontends/cxd2841er: define symbol_rate_min/max in T/C fe-ops
cx231xx-audio: fix init error path
cx231xx-audio: fix NULL-deref at probe
cx231xx-cards: fix NULL-deref at probe
powerpc/book3s/mce: Move add_taint() later in virtual mode
powerpc/pseries: Fix of_node_put() underflow during DLPAR remove
powerpc/64e: Fix hang when debugging programs with relocated kernel
ARM: dts: at91: sama5d3_xplained: fix ADC vref
ARM: dts: at91: sama5d3_xplained: not all ADC channels are available
arm64: xchg: hazard against entire exchange variable
arm64: uaccess: ensure extension of access_ok() addr
arm64: documentation: document tagged pointer stack constraints
xc2028: Fix use-after-free bug properly
mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
staging: rtl8192e: fix 2 byte alignment of register BSSIDR.
staging: rtl8192e: rtl92e_get_eeprom_size Fix read size of EPROM_CMD.
iommu/vt-d: Flush the IOTLB to get rid of the initial kdump mappings
metag/uaccess: Fix access_ok()
metag/uaccess: Check access_ok in strncpy_from_user
uwb: fix device quirk on big-endian hosts
genirq: Fix chained interrupt data ordering
osf_wait4(): fix infoleak
tracing/kprobes: Enforce kprobes teardown after testing
PCI: Fix pci_mmap_fits() for HAVE_PCI_RESOURCE_TO_USER platforms
PCI: Freeze PME scan before suspending devices
drm/edid: Add 10 bpc quirk for LGD 764 panel in HP zBook 17 G2
nfsd: encoders mustn't use unitialized values in error cases
drivers: char: mem: Check for address space wraparound with mmap()
Linux 4.4.70
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit d66bb1607e2d8d384e53f3d93db5c18483c8c4f7 upstream.
proc_create_mount_point() forgot to increase the parent's nlink, and
it resulted in unbalanced hard link numbers, e.g. /proc/fs shows one
less than expected.
Fixes: eb6d38d542 ("proc: Allow creating permanently empty directories...")
Reported-by: Tristan Ye <tristan.ye@suse.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlj5tRkACgkQONu9yGCS
aT5zFxAAouq2kxBFxxJIQ3255yy/7B6oBYrhilQZPrETC800PUaIqZtuQZPpaoqb
3gG0+12ve0CMHK+PidEwsQlMlAHNI1xbzmUHm2UIrLYYCV817DTkEsc7JXGUvYVA
/YA71GASKmLVi9DnsawRb0ELhTeQHec76LrPlgvyWH/OMEtNcMOv/8oWfTq9bKV2
HsHC6MOwT2R86ukhYYmcfFHomTnJSpW7KtGXwNC/LhohzIfsKQKGQWb1f1j1aHGC
u5yQ5Qc9T+DhPMHAEY+xuURz/3ohpUL8aSQXk7pua/bTD0X0klNQcf/BXVJXsaeI
s4g78q+YdTcPL81rkEW+7yUvAlb3u+FdVr+wjsl/s6ih4iL0EgBsoClqUjGUUoz+
jvCXHiMP7lHi50eIkppQf/yZSVKSobKn5YYf9AA+y6tQ9R9GguDS/IQSRe2HnHeR
OymCBXa6BSmQGGyPiMUBiNTix6roJ8Vr4dK9lbsQXZ+YZICXWs1rpMOy5HK9EJWf
M6YF6l9lHwQ38AN+MhsjUXIyKLp9zCk7syeFaeK6k/IA2kcm7dL/momiZ1QIBnhq
OHB3iwEPZ5Rr4CVjk5j7Ue22ubdrtpc8IfTYV95N7nv+g3nBwe22k+RDi70NiDwk
2pnBqhO/vtPRE9Ry3QBS73VEeXgNb9IIVwQ7hi9Rk7KUgmdEOOo=
=iS0x
-----END PGP SIGNATURE-----
Merge 4.4.63 into android-4.4
Changes in 4.4.63:
cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups
thp: fix MADV_DONTNEED vs clear soft dirty race
drm/nouveau/mpeg: mthd returns true on success now
drm/nouveau/mmu/nv4a: use nv04 mmu rather than the nv44 one
CIFS: store results of cifs_reopen_file to avoid infinite wait
Input: xpad - add support for Razer Wildcat gamepad
perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32()
x86/vdso: Ensure vdso32_enabled gets set to valid values only
x86/vdso: Plug race between mapping and ELF header setup
acpi, nfit, libnvdimm: fix interleave set cookie calculation (64-bit comparison)
iscsi-target: Fix TMR reference leak during session shutdown
iscsi-target: Drop work-around for legacy GlobalSAN initiator
scsi: sr: Sanity check returned mode data
scsi: sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable
scsi: sd: Fix capacity calculation with 32-bit sector_t
xen, fbfront: fix connecting to backend
libnvdimm: fix reconfig_mutex, mmap_sem, and jbd2_handle lockdep splat
irqchip/irq-imx-gpcv2: Fix spinlock initialization
ftrace: Fix removing of second function probe
char: Drop bogus dependency of DEVPORT on !M68K
char: lack of bool string made CONFIG_DEVPORT always on
Revert "MIPS: Lantiq: Fix cascaded IRQ setup"
kvm: fix page struct leak in handle_vmon
zram: do not use copy_page with non-page aligned address
powerpc: Disable HFSCR[TM] if TM is not supported
crypto: ahash - Fix EINPROGRESS notification callback
ath9k: fix NULL pointer dereference
dvb-usb-v2: avoid use-after-free
ext4: fix inode checksum calculation problem if i_extra_size is small
platform/x86: acer-wmi: setup accelerometer when machine has appropriate notify event
rtc: tegra: Implement clock handling
mm: Tighten x86 /dev/mem with zeroing reads
dvb-usb: don't use stack for firmware load
dvb-usb-firmware: don't do DMA on stack
virtio-console: avoid DMA from stack
pegasus: Use heap buffers for all register access
rtl8150: Use heap buffers for all register access
catc: Combine failure cleanup code in catc_probe()
catc: Use heap buffer for memory size test
ibmveth: calculate gso_segs for large packets
SUNRPC: fix refcounting problems with auth_gss messages.
tty/serial: atmel: RS485 half duplex w/DMA: enable RX after TX is done
net: ipv6: check route protocol when deleting routes
sctp: deny peeloff operation on asocs with threads sleeping on it
MIPS: fix Select HAVE_IRQ_EXIT_ON_IRQ_STACK patch.
Linux 4.4.63
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 5b7abeae3af8c08c577e599dd0578b9e3ee6687b upstream.
Yet another instance of the same race.
Fix is identical to change_huge_pmd().
See "thp: fix MADV_DONTNEED vs. numa balancing race" for more details.
Link: http://lkml.kernel.org/r/20170302151034.27829-5-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We hit hardened usercopy feature check for kernel text access by reading
kcore file:
usercopy: kernel memory exposure attempt detected from ffffffff8179a01f (<kernel text>) (4065 bytes)
kernel BUG at mm/usercopy.c:75!
Bypassing this check for kcore by adding bounce buffer for ktext data.
Reported-by: Steve Best <sbest@redhat.com>
Fixes: f5509cc18daa ("mm: Hardened usercopy")
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 31374226
Change-Id: Ic93e6041b67d804a994518bf4950811f828b406e
(cherry picked from commit df04abfd181acc276ba6762c8206891ae10ae00d)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Next patch adds bounce buffer for ktext area, so it's
convenient to have single bounce buffer for both
vmalloc/module and ktext cases.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 31374226
Change-Id: I8f517354e6d12aed75ed4ae6c0a6adef0a1e61da
(cherry picked from commit f5beeb1851ea6f8cfcf2657f26cb24c0582b4945)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
commit cd81a9170e69e018bbaba547c1fd85a585f5697a upstream.
For more convenient access if one has a pointer to the task.
As a minor nit take advantage of the fact that only task lock + rcu are
needed to safely grab ->exe_file. This saves mm refcount dance.
Use the helper in proc_exe_link.
Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 65376df582174ffcec9e6471bf5b0dd79ba05e4a ]
Commit b76437579d ("procfs: mark thread stack correctly in
proc/<pid>/maps") added [stack:TID] annotation to /proc/<pid>/maps.
Finding the task of a stack VMA requires walking the entire thread list,
turning this into quadratic behavior: a thousand threads means a
thousand stacks, so the rendering of /proc/<pid>/maps needs to look at a
million combinations.
The cost is not in proportion to the usefulness as described in the
patch.
Drop the [stack:TID] annotation to make /proc/<pid>/maps (and
/proc/<pid>/numa_maps) usable again for higher thread counts.
The [stack] annotation inside /proc/<pid>/task/<tid>/maps is retained, as
identifying the stack VMA there is an O(1) operation.
Siddesh said:
"The end users needed a way to identify thread stacks programmatically and
there wasn't a way to do that. I'm afraid I no longer remember (or have
access to the resources that would aid my memory since I changed
employers) the details of their requirement. However, I did do this on my
own time because I thought it was an interesting project for me and nobody
really gave any feedback then as to its utility, so as far as I am
concerned you could roll back the main thread maps information since the
information is available in the thread-specific files"
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com>
Cc: Shaohua Li <shli@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8148a73c9901a8794a50f950083c00ccf97d43b3)
If /proc/<PID>/environ gets read before the envp[] array is fully set up
in create_{aout,elf,elf_fdpic,flat}_tables(), we might end up trying to
read more bytes than are actually written, as env_start will already be
set but env_end will still be zero, making the range calculation
underflow, allowing to read beyond the end of what has been written.
Fix this as it is done for /proc/<PID>/cmdline by testing env_end for
zero. It is, apparently, intentionally set last in create_*_tables().
This bug was found by the PaX size_overflow plugin that detected the
arithmetic underflow of 'this_len = env_end - (env_start + src)' when
env_end is still zero.
The expected consequence is that userland trying to access
/proc/<PID>/environ of a not yet fully set up process may get
inconsistent data as we're in the middle of copying in the environment
variables.
Fixes: https://forums.grsecurity.net/viewtopic.php?f=3&t=4363
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=116461
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: Pax Team <pageexec@freemail.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ia2f58d48c15478ed4b6e237b63e704c70ff21e96
Bug: 30951939
In changing from checking ptrace_may_access(p, PTRACE_MODE_ATTACH_FSCREDS)
to capable(CAP_SYS_NICE), I missed that ptrace_my_access succeeds
when p == current, but the CAP_SYS_NICE doesn't.
Thus while the previous commit was intended to loosen the needed
privledges to modify a processes timerslack, it needlessly restricted
a task modifying its own timerslack via the proc/<tid>/timerslack_ns
(which is permitted also via the PR_SET_TIMERSLACK method).
This patch corrects this by checking if p == current before checking
the CAP_SYS_NICE value.
Cc: Kees Cook <keescook@chromium.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
CC: Arjan van de Ven <arjan@linux.intel.com>
Cc: Oren Laadan <orenl@cellrox.com>
Cc: Ruchi Kandoi <kandoiruchi@google.com>
Cc: Rom Lemarchand <romlem@android.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Colin Cross <ccross@android.com>
Cc: Nick Kralevich <nnk@google.com>
Cc: Dmitry Shmidt <dimitrysh@google.com>
Cc: Elliott Hughes <enh@google.com>
Cc: Android Kernel Team <kernel-team@android.com>
Mailing-list-url: http://www.spinics.net/lists/kernel/msg2317488.html
Change-Id: Ia3e8aff07c2d41f55b6617502d33c39b7d781aac
Signed-off-by: John Stultz <john.stultz@linaro.org>
As requested, this patch checks the existing LSM hooks
task_getscheduler/task_setscheduler when reading or modifying
the task's timerslack value.
Previous versions added new get/settimerslack LSM hooks, but
since they checked the same PROCESS__SET/GETSCHED values as
existing hooks, it was suggested we just use the existing ones.
Cc: Kees Cook <keescook@chromium.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
CC: Arjan van de Ven <arjan@linux.intel.com>
Cc: Oren Laadan <orenl@cellrox.com>
Cc: Ruchi Kandoi <kandoiruchi@google.com>
Cc: Rom Lemarchand <romlem@android.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Colin Cross <ccross@android.com>
Cc: Nick Kralevich <nnk@google.com>
Cc: Dmitry Shmidt <dimitrysh@google.com>
Cc: Elliott Hughes <enh@google.com>
Cc: James Morris <jmorris@namei.org>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: linux-security-module@vger.kernel.org
Cc: selinux@tycho.nsa.gov
Signed-off-by: John Stultz <john.stultz@linaro.org>
FROMLIST URL: https://lkml.org/lkml/2016/7/21/523
Change-Id: Id157d10e2fe0b85f1be45035a6117358a42af028
(Cherry picked back to common/android-4.4)
Signed-off-by: John Stultz <john.stultz@linaro.org>
When an interface to allow a task to change another tasks
timerslack was first proposed, it was suggested that something
greater then CAP_SYS_NICE would be needed, as a task could be
delayed further then what normally could be done with nice
adjustments.
So CAP_SYS_PTRACE was adopted instead for what became the
/proc/<tid>/timerslack_ns interface. However, for Android (where
this feature originates), giving the system_server
CAP_SYS_PTRACE would allow it to observe and modify all tasks
memory. This is considered too high a privilege level for only
needing to change the timerslack.
After some discussion, it was realized that a CAP_SYS_NICE
process can set a task as SCHED_FIFO, so they could fork some
spinning processes and set them all SCHED_FIFO 99, in effect
delaying all other tasks for an infinite amount of time.
So as a CAP_SYS_NICE task can already cause trouble for other
tasks, using it as a required capability for accessing and
modifying /proc/<tid>/timerslack_ns seems sufficient.
Thus, this patch loosens the capability requirements to
CAP_SYS_NICE and removes CAP_SYS_PTRACE, simplifying some
of the code flow as well.
This is technically an ABI change, but as the feature just
landed in 4.6, I suspect no one is yet using it.
Cc: Kees Cook <keescook@chromium.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
CC: Arjan van de Ven <arjan@linux.intel.com>
Cc: Oren Laadan <orenl@cellrox.com>
Cc: Ruchi Kandoi <kandoiruchi@google.com>
Cc: Rom Lemarchand <romlem@android.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Colin Cross <ccross@android.com>
Cc: Nick Kralevich <nnk@google.com>
Cc: Dmitry Shmidt <dimitrysh@google.com>
Cc: Elliott Hughes <enh@google.com>
Cc: Android Kernel Team <kernel-team@android.com>
Reviewed-by: Nick Kralevich <nnk@google.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
FROMLIST URL: https://lkml.org/lkml/2016/7/21/522
Change-Id: Ia75481402e3948165a1b7c1551c539530cb25509
(Cherry picked against common/android-4.4)
Signed-off-by: John Stultz <john.stultz@linaro.org>
(cherry picked from commit e54ad7f1ee263ffa5a2de9c609d58dfa27b21cd9)
This prevents stacking filesystems (ecryptfs and overlayfs) from using
procfs as lower filesystem. There is too much magic going on inside
procfs, and there is no good reason to stack stuff on top of procfs.
(For example, procfs does access checks in VFS open handlers, and
ecryptfs by design calls open handlers from a kernel thread that doesn't
drop privileges or so.)
Signed-off-by: Jann Horn <jannh@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ib050ef9dc10e623589d22e3a9e6aee9ee4f0cd5d
Bug: 29444228
This patch backports 969624b (which backports caaee6234d0 upstream),
from the v4.4-stable branch to the common/android-4.4 branch.
This patch is needed to provide the PTRACE_MODE_ATTACH_FSCREDS definition
which was used by the backported version of proc/<tid>/timerslack_ns
in change-id: Ie5799b9a3402a31f88cd46437dcda4a0e46415a7
commit caaee6234d05a58c5b4d05e7bf766131b810a657 upstream.
By checking the effective credentials instead of the real UID / permitted
capabilities, ensure that the calling process actually intended to use its
credentials.
To ensure that all ptrace checks use the correct caller credentials (e.g.
in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
flag), use two new flags and require one of them to be set.
The problem was that when a privileged task had temporarily dropped its
privileges, e.g. by calling setreuid(0, user_uid), with the intent to
perform following syscalls with the credentials of a user, it still passed
ptrace access checks that the user would not be able to pass.
While an attacker should not be able to convince the privileged task to
perform a ptrace() syscall, this is a problem because the ptrace access
check is reused for things in procfs.
In particular, the following somewhat interesting procfs entries only rely
on ptrace access checks:
/proc/$pid/stat - uses the check for determining whether pointers
should be visible, useful for bypassing ASLR
/proc/$pid/maps - also useful for bypassing ASLR
/proc/$pid/cwd - useful for gaining access to restricted
directories that contain files with lax permissions, e.g. in
this scenario:
lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
drwx------ root root /root
drwxr-xr-x root root /root/foobar
-rw-r--r-- root root /root/foobar/secret
Therefore, on a system where a root-owned mode 6755 binary changes its
effective credentials as described and then dumps a user-specified file,
this could be used by an attacker to reveal the memory layout of root's
processes or reveal the contents of files he is not allowed to access
(through /proc/$pid/cwd).
[akpm@linux-foundation.org: fix warning]
Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[jstultz: Cherry-picked for common/android-4.4]
Signed-off-by: John Stultz <john.stultz@linaro.org>
This backports 5de23d435e88996b1efe0e2cebe242074ce67c9e
This patch provides a proc/PID/timerslack_ns interface which exposes a
task's timerslack value in nanoseconds and allows it to be changed.
This allows power/performance management software to set timer slack for
other threads according to its policy for the thread (such as when the
thread is designated foreground vs. background activity)
If the value written is non-zero, slack is set to that value. Otherwise
sets it to the default for the thread.
This interface checks that the calling task has permissions to to use
PTRACE_MODE_ATTACH_FSCREDS on the target task, so that we can ensure
arbitrary apps do not change the timer slack for other apps.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oren Laadan <orenl@cellrox.com>
Cc: Ruchi Kandoi <kandoiruchi@google.com>
Cc: Rom Lemarchand <romlem@android.com>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit e54ad7f1ee263ffa5a2de9c609d58dfa27b21cd9 upstream.
This prevents stacking filesystems (ecryptfs and overlayfs) from using
procfs as lower filesystem. There is too much magic going on inside
procfs, and there is no good reason to stack stuff on top of procfs.
(For example, procfs does access checks in VFS open handlers, and
ecryptfs by design calls open handlers from a kernel thread that doesn't
drop privileges or so.)
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8148a73c9901a8794a50f950083c00ccf97d43b3 upstream.
If /proc/<PID>/environ gets read before the envp[] array is fully set up
in create_{aout,elf,elf_fdpic,flat}_tables(), we might end up trying to
read more bytes than are actually written, as env_start will already be
set but env_end will still be zero, making the range calculation
underflow, allowing to read beyond the end of what has been written.
Fix this as it is done for /proc/<PID>/cmdline by testing env_end for
zero. It is, apparently, intentionally set last in create_*_tables().
This bug was found by the PaX size_overflow plugin that detected the
arithmetic underflow of 'this_len = env_end - (env_start + src)' when
env_end is still zero.
The expected consequence is that userland trying to access
/proc/<PID>/environ of a not yet fully set up process may get
inconsistent data as we're in the middle of copying in the environment
variables.
Fixes: https://forums.grsecurity.net/viewtopic.php?f=3&t=4363
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=116461
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: Pax Team <pageexec@freemail.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The PR_DUMPABLE flag causes the pid related paths of the
proc file system to be owned by ROOT. The implementation
of pthread_set/getname_np however needs access to
/proc/<pid>/task/<tid>/comm.
If PR_DUMPABLE is false this implementation is locked out.
This patch installs a special permission function for
the file "comm" that grants read and write access to
all threads of the same group regardless of the ownership
of the inode. For all other threads the function falls back
to the generic inode permission check.
Signed-off-by: Janis Danisevskis <jdanis@google.com>
commit 28093f9f34cedeaea0f481c58446d9dac6dd620f upstream.
In gather_pte_stats() a THP pmd is cast into a pte, which is wrong
because the layouts may differ depending on the architecture. On s390
this will lead to inaccurate numa_maps accounting in /proc because of
misguided pte_present() and pte_dirty() checks on the fake pte.
On other architectures pte_present() and pte_dirty() may work by chance,
but there may be an issue with direct-access (dax) mappings w/o
underlying struct pages when HAVE_PTE_SPECIAL is set and THP is
available. In vm_normal_page() the fake pte will be checked with
pte_special() and because there is no "special" bit in a pmd, this will
always return false and the VM_PFNMAP | VM_MIXEDMAP checking will be
skipped. On dax mappings w/o struct pages, an invalid struct page
pointer would then be returned that can crash the kernel.
This patch fixes the numa_maps THP handling by introducing new "_pmd"
variants of the can_gather_numa_stats() and vm_normal_page() functions.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5c2ff95e41c9290d16556cd02e35b25d81be8fe0 upstream.
When working with hugetlbfs ptes (which are actually pmds) is not valid to
directly use pte functions like pte_present() because the hardware bit
layout of pmds and ptes can be different. This is the case on s390.
Therefore we have to convert the hugetlbfs ptes first into a valid pte
encoding with huge_ptep_get().
Currently the /proc/<pid>/numa_maps code uses hugetlbfs ptes without
huge_ptep_get(). On s390 this leads to the following two problems:
1) The pte_present() function returns false (instead of true) for
PROT_NONE hugetlb ptes. Therefore PROT_NONE vmas are missing
completely in the "numa_maps" output.
2) The pte_dirty() function always returns false for all hugetlb ptes.
Therefore these pages are reported as "mapped=xxx" instead of
"dirty=xxx".
Therefore use huge_ptep_get() to correctly convert the hugetlb ptes.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit caaee6234d05a58c5b4d05e7bf766131b810a657 upstream.
By checking the effective credentials instead of the real UID / permitted
capabilities, ensure that the calling process actually intended to use its
credentials.
To ensure that all ptrace checks use the correct caller credentials (e.g.
in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
flag), use two new flags and require one of them to be set.
The problem was that when a privileged task had temporarily dropped its
privileges, e.g. by calling setreuid(0, user_uid), with the intent to
perform following syscalls with the credentials of a user, it still passed
ptrace access checks that the user would not be able to pass.
While an attacker should not be able to convince the privileged task to
perform a ptrace() syscall, this is a problem because the ptrace access
check is reused for things in procfs.
In particular, the following somewhat interesting procfs entries only rely
on ptrace access checks:
/proc/$pid/stat - uses the check for determining whether pointers
should be visible, useful for bypassing ASLR
/proc/$pid/maps - also useful for bypassing ASLR
/proc/$pid/cwd - useful for gaining access to restricted
directories that contain files with lax permissions, e.g. in
this scenario:
lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
drwx------ root root /root
drwxr-xr-x root root /root/foobar
-rw-r--r-- root root /root/foobar/secret
Therefore, on a system where a root-owned mode 6755 binary changes its
effective credentials as described and then dumps a user-specified file,
this could be used by an attacker to reveal the memory layout of root's
processes or reveal the contents of files he is not allowed to access
(through /proc/$pid/cwd).
[akpm@linux-foundation.org: fix warning]
Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Userspace processes often have multiple allocators that each do
anonymous mmaps to get memory. When examining memory usage of
individual processes or systems as a whole, it is useful to be
able to break down the various heaps that were allocated by
each layer and examine their size, RSS, and physical memory
usage.
This patch adds a user pointer to the shared union in
vm_area_struct that points to a null terminated string inside
the user process containing a name for the vma. vmas that
point to the same address will be merged, but vmas that
point to equivalent strings at different addresses will
not be merged.
Userspace can set the name for a region of memory by calling
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
Setting the name to NULL clears it.
The names of named anonymous vmas are shown in /proc/pid/maps
as [anon:<name>] and in /proc/pid/smaps in a new "Name" field
that is only present for named vmas. If the userspace pointer
is no longer valid all or part of the name will be replaced
with "<fault>".
The idea to store a userspace pointer to reduce the complexity
within mm (at the expense of the complexity of reading
/proc/pid/mem) came from Dave Hansen. This results in no
runtime overhead in the mm subsystem other than comparing
the anon_name pointers when considering vma merging. The pointer
is stored in a union with fieds that are only used on file-backed
mappings, so it does not increase memory usage.
Includes fix from Jed Davis <jld@mozilla.com> for typo in
prctl_set_vma_anon_name, which could attempt to set the name
across two vmas at the same time due to a typo, which might
corrupt the vma list. Fix it to use tmp instead of end to limit
the name setting to a single vma at a time.
Change-Id: I9aa7b6b5ef536cd780599ba4e2fba8ceebe8b59f
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
Make oom_adj and oom_score_adj user read-only.
Bug: 19636629
Change-Id: I055bb172d5b4d3d856e25918f3c5de8edf31e4a3
Signed-off-by: Rom Lemarchand <romlem@google.com>
Writing to /proc/$pid/coredump_filter always returns -ESRCH because commit
774636e19e ("proc: convert to kstrto*()/kstrto*_from_user()") removed
the setting of ret after the get_proc_task call and incorrectly left it as
-ESRCH. Instead, return 0 when successful.
Example breakage:
echo 0 > /proc/self/coredump_filter
bash: echo: write error: No such process
Fixes: 774636e19e ("proc: convert to kstrto*()/kstrto*_from_user()")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org> [4.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge second patch-bomb from Andrew Morton:
- most of the rest of MM
- procfs
- lib/ updates
- printk updates
- bitops infrastructure tweaks
- checkpatch updates
- nilfs2 update
- signals
- various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
dma-debug, dma-mapping, ...
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
ipc,msg: drop dst nil validation in copy_msg
include/linux/zutil.h: fix usage example of zlib_adler32()
panic: release stale console lock to always get the logbuf printed out
dma-debug: check nents in dma_sync_sg*
dma-mapping: tidy up dma_parms default handling
pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
kexec: use file name as the output message prefix
fs, seqfile: always allow oom killer
seq_file: reuse string_escape_str()
fs/seq_file: use seq_* helpers in seq_hex_dump()
coredump: change zap_threads() and zap_process() to use for_each_thread()
coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
signals: kill block_all_signals() and unblock_all_signals()
nilfs2: fix gcc uninitialized-variable warnings in powerpc build
nilfs2: fix gcc unused-but-set-variable warnings
MAINTAINERS: nilfs2: add header file for tracing
nilfs2: add tracepoints for analyzing reading and writing metadata files
...
Pull trivial updates from Jiri Kosina:
"Trivial stuff from trivial tree that can be trivially summed up as:
- treewide drop of spurious unlikely() before IS_ERR() from Viresh
Kumar
- cosmetic fixes (that don't really affect basic functionality of the
driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek
- various comment / printk fixes and updates all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
bcache: Really show state of work pending bit
hwmon: applesmc: fix comment typos
Kconfig: remove comment about scsi_wait_scan module
class_find_device: fix reference to argument "match"
debugfs: document that debugfs_remove*() accepts NULL and error values
net: Drop unlikely before IS_ERR(_OR_NULL)
mm: Drop unlikely before IS_ERR(_OR_NULL)
fs: Drop unlikely before IS_ERR(_OR_NULL)
drivers: net: Drop unlikely before IS_ERR(_OR_NULL)
drivers: misc: Drop unlikely before IS_ERR(_OR_NULL)
UBI: Update comments to reflect UBI_METAONLY flag
pktcdvd: drop null test before destroy functions
The commit 96d0df79f2 ("proc: make proc_fd_permission() thread-friendly")
fixed the access to /proc/self/fd from sub-threads, but introduced another
problem: a sub-thread can't access /proc/<tid>/fd/ or /proc/thread-self/fd
if generic_permission() fails.
Change proc_fd_permission() to check same_thread_group(pid_task(), current).
Fixes: 96d0df79f2 ("proc: make proc_fd_permission() thread-friendly")
Reported-by: "Jin, Yihua" <yihua.jin@intel.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For now in task_name() we ignore the return code of string_escape_str()
call. This is not good if buffer suddenly becomes not big enough. Do the
proper error handling there.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
/proc/pid/oom_adj exists solely to avoid breaking existing userspace
binaries that write to the tunable.
Add a comment in the only possible location within the kernel tree to
describe the situation and motivation for keeping it around.
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As mentioned in the commit 56eecdb912 ("mm: Use ptep/pmdp_set_numa()
for updating _PAGE_NUMA bit"), architectures like ppc64 don't do tlb
flush in set_pte/pmd functions.
So when dealing with existing pte in clear_soft_dirty, the pte must be
cleared before being modified.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently there's no easy way to get per-process usage of hugetlb pages,
which is inconvenient because userspace applications which use hugetlb
typically want to control their processes on the basis of how much memory
(including hugetlb) they use. So this patch simply provides easy access
to the info via /proc/PID/status.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Joern Engel <joern@logfs.org>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>