Commit graph

10141 commits

Author SHA1 Message Date
Greg Kroah-Hartman
60a02c0d81 This is the 4.4.185 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl0lmj0ACgkQONu9yGCS
 aT67kRAAgoQ2/imz/GJ7bBKhtql6XneFD7fPDnggxHNRGA4VvkV+KcpHIinWZtXp
 QV5sq8u092SlvQHYolLcVyK3OsMzfzjsK0HMXJRxEa4UsI38hEDBqpgDaUIYiKrb
 l1NUsQLy8foFobbKsTkdiN3+RBdjkTFWZ/SxDXg1T6EicbntvpP/E0QXQ4J+H9TQ
 Wdvx0wMS1m8RkBQzoUxyqz10vCDP67e2NPIBotaKOQS3iYa1he6nU5Y7mLpJOw7s
 mEPkwnT9+DnDENhL7YHo37JPIgSKz+kLdnw6xLzKSIEOl7MUylu7ocbxcJlAHyxC
 LkLRA6E3vCBu540ajHhyjIfN4IPln78qnV1ciGmyTE+YNWtvyPqZVERDXqh9thM9
 4lPUm20HAyhopmxdYfoAq933Ki8IH/mTc3vXpcXbVnAOp2uZfJ0nhVOyhqov9B+t
 p6ct9t9/1ARPITmFTGNWFvTTRT+OoPLDi6ND1o+8ukXRmn9+sl0/VpWsJNpdIAiM
 ss93JfdVTFkR84Oc0zL4Cg55q01BvJYiGtj2oeU5cBECXvXAEvR3ro17b17fmCFh
 jk5xcckickpSK1g8iXNsC8EAAL0tIE+vQemxmJDPCcV3rM83addo9Lwqyme6Wa/3
 F702sfcOSvRvoEfd+9WZcKV01GfTEFu4jK7onrvL16Xn60Wq1iM=
 =Qr7D
 -----END PGP SIGNATURE-----

Merge 4.4.185 into android-4.4-p

Changes in 4.4.185
	fs/binfmt_flat.c: make load_flat_shared_library() work
	mm/page_idle.c: fix oops because end_pfn is larger than max_pfn
	scsi: vmw_pscsi: Fix use-after-free in pvscsi_queue_lck()
	tracing: Silence GCC 9 array bounds warning
	gcc-9: silence 'address-of-packed-member' warning
	usb: chipidea: udc: workaround for endpoint conflict issue
	Input: uinput - add compat ioctl number translation for UI_*_FF_UPLOAD
	apparmor: enforce nullbyte at end of tag string
	parport: Fix mem leak in parport_register_dev_model
	parisc: Fix compiler warnings in float emulation code
	IB/hfi1: Insure freeze_work work_struct is canceled on shutdown
	MIPS: uprobes: remove set but not used variable 'epc'
	net: hns: Fix loopback test failed at copper ports
	sparc: perf: fix updated event period in response to PERF_EVENT_IOC_PERIOD
	scripts/checkstack.pl: Fix arm64 wrong or unknown architecture
	scsi: ufs: Check that space was properly alloced in copy_query_response
	s390/qeth: fix VLAN attribute in bridge_hostnotify udev event
	hwmon: (pmbus/core) Treat parameters as paged if on multiple pages
	Btrfs: fix race between readahead and device replace/removal
	btrfs: start readahead also in seed devices
	can: flexcan: fix timeout when set small bitrate
	can: purge socket error queue on sock destruct
	ARM: imx: cpuidle-imx6sx: Restrict the SW2ISO increase to i.MX6SX
	Bluetooth: Align minimum encryption key size for LE and BR/EDR connections
	Bluetooth: Fix regression with minimum encryption key size alignment
	SMB3: retry on STATUS_INSUFFICIENT_RESOURCES instead of failing write
	cfg80211: fix memory leak of wiphy device name
	mac80211: drop robust management frames from unknown TA
	perf ui helpline: Use strlcpy() as a shorter form of strncpy() + explicit set nul
	perf help: Remove needless use of strncpy()
	9p/rdma: do not disconnect on down_interruptible EAGAIN
	9p: acl: fix uninitialized iattr access
	9p/rdma: remove useless check in cm_event_handler
	9p: p9dirent_read: check network-provided name length
	net/9p: include trans_common.h to fix missing prototype warning.
	KVM: X86: Fix scan ioapic use-before-initialization
	ovl: modify ovl_permission() to do checks on two inodes
	x86/speculation: Allow guests to use SSBD even if host does not
	cpu/speculation: Warn on unsupported mitigations= parameter
	sctp: change to hold sk after auth shkey is created successfully
	tipc: change to use register_pernet_device
	tipc: check msg->req data len in tipc_nl_compat_bearer_disable
	team: Always enable vlan tx offload
	ipv4: Use return value of inet_iif() for __raw_v4_lookup in the while loop
	bonding: Always enable vlan tx offload
	net: check before dereferencing netdev_ops during busy poll
	Bluetooth: Fix faulty expression for minimum encryption key size check
	um: Compile with modern headers
	ASoC : cs4265 : readable register too low
	spi: bitbang: Fix NULL pointer dereference in spi_unregister_master
	ASoC: max98090: remove 24-bit format support if RJ is 0
	usb: gadget: fusb300_udc: Fix memory leak of fusb300->ep[i]
	usb: gadget: udc: lpc32xx: allocate descriptor with GFP_ATOMIC
	scsi: hpsa: correct ioaccel2 chaining
	ARC: Assume multiplier is always present
	ARC: fix build warning in elf.h
	MIPS: math-emu: do not use bools for arithmetic
	mfd: omap-usb-tll: Fix register offsets
	swiotlb: Make linux/swiotlb.h standalone includible
	bug.h: work around GCC PR82365 in BUG()
	MIPS: Workaround GCC __builtin_unreachable reordering bug
	ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEME
	crypto: user - prevent operating on larval algorithms
	ALSA: seq: fix incorrect order of dest_client/dest_ports arguments
	ALSA: firewire-lib/fireworks: fix miss detection of received MIDI messages
	ALSA: usb-audio: fix sign unintended sign extension on left shifts
	lib/mpi: Fix karactx leak in mpi_powm
	btrfs: Ensure replaced device doesn't have pending chunk allocation
	tty: rocket: fix incorrect forward declaration of 'rp_init()'
	ARC: handle gcc generated __builtin_trap for older compiler
	arm64, vdso: Define vdso_{start,end} as array
	KVM: x86: degrade WARN to pr_warn_ratelimited
	dmaengine: imx-sdma: remove BD_INTR for channel0
	Linux 4.4.185

Change-Id: If1b1ee0b61d5f6d6fb162dc446c621d6baebfab9
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-07-10 12:57:28 +02:00
Colin Ian King
bd6042e9c3 mm/page_idle.c: fix oops because end_pfn is larger than max_pfn
commit 7298e3b0a149c91323b3205d325e942c3b3b9ef6 upstream.

Currently the calcuation of end_pfn can round up the pfn number to more
than the actual maximum number of pfns, causing an Oops.  Fix this by
ensuring end_pfn is never more than max_pfn.

This can be easily triggered when on systems where the end_pfn gets
rounded up to more than max_pfn using the idle-page stress-ng stress test:

sudo stress-ng --idle-page 0

  BUG: unable to handle kernel paging request at 00000000000020d8
  #PF error: [normal kernel read fault]
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  CPU: 1 PID: 11039 Comm: stress-ng-idle- Not tainted 5.0.0-5-generic #6-Ubuntu
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
  RIP: 0010:page_idle_get_page+0xc8/0x1a0
  Code: 0f b1 0a 75 7d 48 8b 03 48 89 c2 48 c1 e8 33 83 e0 07 48 c1 ea 36 48 8d 0c 40 4c 8d 24 88 49 c1 e4 07 4c 03 24 d5 00 89 c3 be <49> 8b 44 24 58 48 8d b8 80 a1 02 00 e8 07 d5 77 00 48 8b 53 08 48
  RSP: 0018:ffffafd7c672fde8 EFLAGS: 00010202
  RAX: 0000000000000005 RBX: ffffe36341fff700 RCX: 000000000000000f
  RDX: 0000000000000284 RSI: 0000000000000275 RDI: 0000000001fff700
  RBP: ffffafd7c672fe00 R08: ffffa0bc34056410 R09: 0000000000000276
  R10: ffffa0bc754e9b40 R11: ffffa0bc330f6400 R12: 0000000000002080
  R13: ffffe36341fff700 R14: 0000000000080000 R15: ffffa0bc330f6400
  FS: 00007f0ec1ea5740(0000) GS:ffffa0bc7db00000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000000020d8 CR3: 0000000077d68000 CR4: 00000000000006e0
  Call Trace:
    page_idle_bitmap_write+0x8c/0x140
    sysfs_kf_bin_write+0x5c/0x70
    kernfs_fop_write+0x12e/0x1b0
    __vfs_write+0x1b/0x40
    vfs_write+0xab/0x1b0
    ksys_write+0x55/0xc0
    __x64_sys_write+0x1a/0x20
    do_syscall_64+0x5a/0x110
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Link: http://lkml.kernel.org/r/20190618124352.28307-1-colin.king@canonical.com
Fixes: 33c3fc71c8 ("mm: introduce idle page tracking")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10 09:56:30 +02:00
Greg Kroah-Hartman
032bab8306 This is the 4.4.183 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl0NyDMACgkQONu9yGCS
 aT5MEA//XPA94FjtAzSiaMZlf79PAusz1FJIazheoOuhgn+dkLyrmZtO4BoADglK
 3G8M7ArAYYGg4grZgIkUdqjeTAlWu5XQm+pS8N7b2fJFiKM/p/1t4s1RitQwk2xm
 giCcqvCA+5D9FIwPsktAlRjhtLK7chmQAvxHDv8CYI60GPkMxSTLrx+7rMwj31wP
 4MR/Ll8MJAxXWXF7PPJnfDgMCIp5T8FEV+Uu3UUQc5kB0YNQofi9sFyhbCdiHSgN
 g9HIGuUKt5wV+bwzGcshRR86sumfsh0ayuEToZQHTUSWBgPvmh4Q9hyn/cFkOmAq
 tSFB9wC+DIjgGwvrT1573i1pwZBhetz8L8sgo4QGilqQl4QTbGv3PR5dpdxm4VFK
 BbzhWGmZGGd/J8KSIjRpwEEE2FoyUi/kalcYxqcIu8N+FuQVX3HvUdDkFSmwyoLB
 7/xMZb+xNkkmsuw+4JTGQ2CMQKcY4XbLIpa1cVbvDY9z923DGMJCqmMfb2FU4Rn1
 KRUXSV2Dt231IHVbXjqt7jLnMdTIq9Z8gMjH/yoX0Z4AFphvq2pk/xB3W7AkWQE3
 Q4iqSzPzT8Hij/3OJBVQTcwUajdmwjmMf+6ZvKjnECUcYv2nOuyuU6hXMDQuhH/v
 ZEB9WyH/kjUaRwjNensR+NhEmsKRKRW6eWaZ3CSJvzlUZKWMBsU=
 =8C1+
 -----END PGP SIGNATURE-----

Merge 4.4.183 into android-4.4-p

Changes in 4.4.183
	fs/fat/file.c: issue flush after the writeback of FAT
	sysctl: return -EINVAL if val violates minmax
	ipc: prevent lockup on alloc_msg and free_msg
	hugetlbfs: on restore reserve error path retain subpool reservation
	mm/cma.c: fix crash on CMA allocation if bitmap allocation fails
	mm/cma_debug.c: fix the break condition in cma_maxchunk_get()
	kernel/sys.c: prctl: fix false positive in validate_prctl_map()
	mfd: intel-lpss: Set the device in reset state when init
	mfd: twl6040: Fix device init errors for ACCCTL register
	perf/x86/intel: Allow PEBS multi-entry in watermark mode
	drm/bridge: adv7511: Fix low refresh rate selection
	ntp: Allow TAI-UTC offset to be set to zero
	f2fs: fix to avoid panic in do_recover_data()
	f2fs: fix to do sanity check on valid block count of segment
	iommu/vt-d: Set intel_iommu_gfx_mapped correctly
	ALSA: hda - Register irq handler after the chip initialization
	nvmem: core: fix read buffer in place
	fuse: retrieve: cap requested size to negotiated max_write
	nfsd: allow fh_want_write to be called twice
	x86/PCI: Fix PCI IRQ routing table memory leak
	platform/chrome: cros_ec_proto: check for NULL transfer function
	soc: mediatek: pwrap: Zero initialize rdata in pwrap_init_cipher
	clk: rockchip: Turn on "aclk_dmac1" for suspend on rk3288
	ARM: dts: imx6sx: Specify IMX6SX_CLK_IPG as "ahb" clock to SDMA
	ARM: dts: imx6sx: Specify IMX6SX_CLK_IPG as "ipg" clock to SDMA
	ARM: dts: imx6qdl: Specify IMX6QDL_CLK_IPG as "ipg" clock to SDMA
	PCI: rpadlpar: Fix leaked device_node references in add/remove paths
	PCI: rcar: Fix a potential NULL pointer dereference
	video: hgafb: fix potential NULL pointer dereference
	video: imsttfb: fix potential NULL pointer dereferences
	PCI: xilinx: Check for __get_free_pages() failure
	gpio: gpio-omap: add check for off wake capable gpios
	dmaengine: idma64: Use actual device for DMA transfers
	pwm: tiehrpwm: Update shadow register for disabling PWMs
	ARM: dts: exynos: Always enable necessary APIO_1V8 and ABB_1V8 regulators on Arndale Octa
	pwm: Fix deadlock warning when removing PWM device
	ARM: exynos: Fix undefined instruction during Exynos5422 resume
	futex: Fix futex lock the wrong page
	Revert "Bluetooth: Align minimum encryption key size for LE and BR/EDR connections"
	ALSA: seq: Cover unsubscribe_port() in list_mutex
	libata: Extend quirks for the ST1000LM024 drives with NOLPM quirk
	mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node
	fs/ocfs2: fix race in ocfs2_dentry_attach_lock()
	signal/ptrace: Don't leak unitialized kernel memory with PTRACE_PEEK_SIGINFO
	ptrace: restore smp_rmb() in __ptrace_may_access()
	i2c: acorn: fix i2c warning
	bcache: fix stack corruption by PRECEDING_KEY()
	cgroup: Use css_tryget() instead of css_tryget_online() in task_get_css()
	ASoC: cs42xx8: Add regcache mask dirty
	Drivers: misc: fix out-of-bounds access in function param_set_kgdbts_var
	scsi: lpfc: add check for loss of ndlp when sending RRQ
	scsi: bnx2fc: fix incorrect cast to u64 on shift operation
	usbnet: ipheth: fix racing condition
	KVM: x86/pmu: do not mask the value that is written to fixed PMUs
	KVM: s390: fix memory slot handling for KVM_SET_USER_MEMORY_REGION
	drm/vmwgfx: integer underflow in vmw_cmd_dx_set_shader() leading to an invalid read
	drm/vmwgfx: NULL pointer dereference from vmw_cmd_dx_view_define()
	USB: Fix chipmunk-like voice when using Logitech C270 for recording audio.
	USB: usb-storage: Add new ID to ums-realtek
	USB: serial: pl2303: add Allied Telesis VT-Kit3
	USB: serial: option: add support for Simcom SIM7500/SIM7600 RNDIS mode
	USB: serial: option: add Telit 0x1260 and 0x1261 compositions
	ax25: fix inconsistent lock state in ax25_destroy_timer
	be2net: Fix number of Rx queues used for flow hashing
	ipv6: flowlabel: fl6_sock_lookup() must use atomic_inc_not_zero
	lapb: fixed leak of control-blocks.
	neigh: fix use-after-free read in pneigh_get_next
	sunhv: Fix device naming inconsistency between sunhv_console and sunhv_reg
	mISDN: make sure device name is NUL terminated
	x86/CPU/AMD: Don't force the CPB cap when running under a hypervisor
	perf/ring_buffer: Fix exposing a temporarily decreased data_head
	perf/ring_buffer: Add ordering to rb->nest increment
	gpio: fix gpio-adp5588 build errors
	net: tulip: de4x5: Drop redundant MODULE_DEVICE_TABLE()
	i2c: dev: fix potential memory leak in i2cdev_ioctl_rdwr
	configfs: Fix use-after-free when accessing sd->s_dentry
	ia64: fix build errors by exporting paddr_to_nid()
	KVM: PPC: Book3S: Use new mutex to synchronize access to rtas token list
	net: sh_eth: fix mdio access in sh_eth_close() for R-Car Gen2 and RZ/A1 SoCs
	scsi: libcxgbi: add a check for NULL pointer in cxgbi_check_route()
	scsi: libsas: delete sas port if expander discover failed
	Revert "crypto: crypto4xx - properly set IV after de- and encrypt"
	coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
	Abort file_remove_privs() for non-reg. files
	Linux 4.4.183

Change-Id: I26e2772a587b1dcf557adede5bcff66962f72432
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-06-22 09:45:38 +02:00
Andrea Arcangeli
8f6345a11c coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
commit 04f5866e41fb70690e28397487d8bd8eea7d712a upstream.

The core dumping code has always run without holding the mmap_sem for
writing, despite that is the only way to ensure that the entire vma
layout will not change from under it.  Only using some signal
serialization on the processes belonging to the mm is not nearly enough.
This was pointed out earlier.  For example in Hugh's post from Jul 2017:

  https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils

  "Not strictly relevant here, but a related note: I was very surprised
   to discover, only quite recently, how handle_mm_fault() may be called
   without down_read(mmap_sem) - when core dumping. That seems a
   misguided optimization to me, which would also be nice to correct"

In particular because the growsdown and growsup can move the
vm_start/vm_end the various loops the core dump does around the vma will
not be consistent if page faults can happen concurrently.

Pretty much all users calling mmget_not_zero()/get_task_mm() and then
taking the mmap_sem had the potential to introduce unexpected side
effects in the core dumping code.

Adding mmap_sem for writing around the ->core_dump invocation is a
viable long term fix, but it requires removing all copy user and page
faults and to replace them with get_dump_page() for all binary formats
which is not suitable as a short term fix.

For the time being this solution manually covers the places that can
confuse the core dump either by altering the vma layout or the vma flags
while it runs.  Once ->core_dump runs under mmap_sem for writing the
function mmget_still_valid() can be dropped.

Allowing mmap_sem protected sections to run in parallel with the
coredump provides some minor parallelism advantage to the swapoff code
(which seems to be safe enough by never mangling any vma field and can
keep doing swapins in parallel to the core dumping) and to some other
corner case.

In order to facilitate the backporting I added "Fixes: 86039bd3b4e6"
however the side effect of this same race condition in /proc/pid/mem
should be reproducible since before 2.6.12-rc2 so I couldn't add any
other "Fixes:" because there's no hash beyond the git genesis commit.

Because find_extend_vma() is the only location outside of the process
context that could modify the "mm" structures under mmap_sem for
reading, by adding the mmget_still_valid() check to it, all other cases
that take the mmap_sem for reading don't need the new check after
mmget_not_zero()/get_task_mm().  The expand_stack() in page fault
context also doesn't need the new check, because all tasks under core
dumping are frozen.

Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
Fixes: 86039bd3b4 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
[mhocko@suse.com: stable 4.4 backport
 - drop infiniband part because of missing 5f9794dc94f59
 - drop userfaultfd_event_wait_completion hunk because of
   missing 9cd75c3cd4c3d]
 - handle binder_update_page_range because of missing 720c241924046
 - handle mlx5_ib_disassociate_ucontext - akaher@vmware.com
]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-22 08:18:27 +02:00
Shakeel Butt
c05fed5075 mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node
commit 3510955b327176fd4cbab5baa75b449f077722a2 upstream.

Syzbot reported following memory leak:

ffffffffda RBX: 0000000000000003 RCX: 0000000000441f79
BUG: memory leak
unreferenced object 0xffff888114f26040 (size 32):
  comm "syz-executor626", pid 7056, jiffies 4294948701 (age 39.410s)
  hex dump (first 32 bytes):
    40 60 f2 14 81 88 ff ff 40 60 f2 14 81 88 ff ff  @`......@`......
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
     slab_post_alloc_hook mm/slab.h:439 [inline]
     slab_alloc mm/slab.c:3326 [inline]
     kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
     kmalloc include/linux/slab.h:547 [inline]
     __memcg_init_list_lru_node+0x58/0xf0 mm/list_lru.c:352
     memcg_init_list_lru_node mm/list_lru.c:375 [inline]
     memcg_init_list_lru mm/list_lru.c:459 [inline]
     __list_lru_init+0x193/0x2a0 mm/list_lru.c:626
     alloc_super+0x2e0/0x310 fs/super.c:269
     sget_userns+0x94/0x2a0 fs/super.c:609
     sget+0x8d/0xb0 fs/super.c:660
     mount_nodev+0x31/0xb0 fs/super.c:1387
     fuse_mount+0x2d/0x40 fs/fuse/inode.c:1236
     legacy_get_tree+0x27/0x80 fs/fs_context.c:661
     vfs_get_tree+0x2e/0x120 fs/super.c:1476
     do_new_mount fs/namespace.c:2790 [inline]
     do_mount+0x932/0xc50 fs/namespace.c:3110
     ksys_mount+0xab/0x120 fs/namespace.c:3319
     __do_sys_mount fs/namespace.c:3333 [inline]
     __se_sys_mount fs/namespace.c:3330 [inline]
     __x64_sys_mount+0x26/0x30 fs/namespace.c:3330
     do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

This is a simple off by one bug on the error path.

Link: http://lkml.kernel.org/r/20190528043202.99980-1-shakeelb@google.com
Fixes: 60d3fd32a7 ("list_lru: introduce per-memcg lists")
Reported-by: syzbot+f90a420dfe2b1b03cb2c@syzkaller.appspotmail.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: <stable@vger.kernel.org>	[4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-22 08:18:22 +02:00
Yue Hu
937fa1624a mm/cma_debug.c: fix the break condition in cma_maxchunk_get()
[ Upstream commit f0fd50504a54f5548eb666dc16ddf8394e44e4b7 ]

If not find zero bit in find_next_zero_bit(), it will return the size
parameter passed in, so the start bit should be compared with bitmap_maxno
rather than cma->count.  Although getting maxchunk is working fine due to
zero value of order_per_bit currently, the operation will be stuck if
order_per_bit is set as non-zero.

Link: http://lkml.kernel.org/r/20190319092734.276-1-zbestahu@gmail.com
Signed-off-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Joe Perches <joe@perches.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Safonov <d.safonov@partner.samsung.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-22 08:18:18 +02:00
Yue Hu
fceb0be418 mm/cma.c: fix crash on CMA allocation if bitmap allocation fails
[ Upstream commit 1df3a339074e31db95c4790ea9236874b13ccd87 ]

f022d8cb7e ("mm: cma: Don't crash on allocation if CMA area can't be
activated") fixes the crash issue when activation fails via setting
cma->count as 0, same logic exists if bitmap allocation fails.

Link: http://lkml.kernel.org/r/20190325081309.6004-1-zbestahu@gmail.com
Signed-off-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-22 08:18:17 +02:00
Mike Kravetz
9c8d4d77e3 hugetlbfs: on restore reserve error path retain subpool reservation
[ Upstream commit 0919e1b69ab459e06df45d3ba6658d281962db80 ]

When a huge page is allocated, PagePrivate() is set if the allocation
consumed a reservation.  When freeing a huge page, PagePrivate is checked.
If set, it indicates the reservation should be restored.  PagePrivate
being set at free huge page time mostly happens on error paths.

When huge page reservations are created, a check is made to determine if
the mapping is associated with an explicitly mounted filesystem.  If so,
pages are also reserved within the filesystem.  The default action when
freeing a huge page is to decrement the usage count in any associated
explicitly mounted filesystem.  However, if the reservation is to be
restored the reservation/use count within the filesystem should not be
decrementd.  Otherwise, a subsequent page allocation and free for the same
mapping location will cause the file filesystem usage to go 'negative'.

Filesystem                         Size  Used Avail Use% Mounted on
nodev                              4.0G -4.0M  4.1G    - /opt/hugepool

To fix, when freeing a huge page do not adjust filesystem usage if
PagePrivate() is set to indicate the reservation should be restored.

I did not cc stable as the problem has been around since reserves were
added to hugetlbfs and nobody has noticed.

Link: http://lkml.kernel.org/r/20190328234704.27083-2-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-22 08:18:17 +02:00
Greg Kroah-Hartman
f992814dac This is the 4.4.181 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlz/gVcACgkQONu9yGCS
 aT4kBQ//adwq+iNdyEF550hc8tWZny0dSLPRKflzTb4hPXnzGdImCSY6pO1KdXzK
 IhjtgLb8aeFpDSZyyAw+sqFxY/2Nd9GZ5pgetWedm218uX/Hr9ETRUe+QqfmXKfx
 sIeBfhSSCm2T8HV23SOL+MWqLaHLQFEXWjSDxJAPxB7ptzGiYJ4jmje0MBrN1xV8
 22H5ijDR9SweZoR83AFtDAr9hKnpXz2ciQtJ/0xOjnVPGDQgD2uK3mpaO+F2r1hR
 kbLA2Hst3m4C3mtQZnns/SZWCKURkPk1hFYhKZyD0k757sRcSR4iHnqKdBBk29kR
 lFNfjVsAARCIj1ucYwwBbkiRJfBaCpT6TMphdtgT0f91zVMOCDTuVTN2couGSsJl
 6wWmboyM20SKkHJ3VawvtZ4YcTUjut2B1mZC/iFBSQJsMyVPQkhFzSdAXUKO6VZ9
 ZLrMTXNpPwlkYLL7VluIzdUr5crRmYj9sYIH1A/+pyzfM8WZO779jev/i2E4Eipt
 lU7ak2UMgSEZhv3GWmqPkFnJIpZwHyIsl5bGUWJ2b3wd69VasUjroVxRu1CyynXN
 CeDnqmJGLSoOlFD6/SF3MCqgvuavt3hgF+eKT2gbVti9zwLnxCxkQ7pgWMQpiMZs
 uIECSg9f1Zox/E+RpsyWc6Jx7r5yIkYHTlAyIpMuwgT+zwhWXaY=
 =sf4M
 -----END PGP SIGNATURE-----

Merge 4.4.181 into android-4.4-p

Changes in 4.4.181
	x86/speculation/mds: Revert CPU buffer clear on double fault exit
	x86/speculation/mds: Improve CPU buffer clear documentation
	ARM: exynos: Fix a leaked reference by adding missing of_node_put
	crypto: vmx - fix copy-paste error in CTR mode
	crypto: crct10dif-generic - fix use via crypto_shash_digest()
	crypto: x86/crct10dif-pcl - fix use via crypto_shash_digest()
	ALSA: usb-audio: Fix a memory leak bug
	ALSA: hda/hdmi - Consider eld_valid when reporting jack event
	ALSA: hda/realtek - EAPD turn on later
	ASoC: max98090: Fix restore of DAPM Muxes
	ASoC: RT5677-SPI: Disable 16Bit SPI Transfers
	mm/mincore.c: make mincore() more conservative
	ocfs2: fix ocfs2 read inode data panic in ocfs2_iget
	mfd: da9063: Fix OTP control register names to match datasheets for DA9063/63L
	tty/vt: fix write/write race in ioctl(KDSKBSENT) handler
	ext4: actually request zeroing of inode table after grow
	ext4: fix ext4_show_options for file systems w/o journal
	Btrfs: do not start a transaction at iterate_extent_inodes()
	bcache: fix a race between cache register and cacheset unregister
	bcache: never set KEY_PTRS of journal key to 0 in journal_reclaim()
	ipmi:ssif: compare block number correctly for multi-part return messages
	crypto: gcm - Fix error return code in crypto_gcm_create_common()
	crypto: gcm - fix incompatibility between "gcm" and "gcm_base"
	crypto: chacha20poly1305 - set cra_name correctly
	crypto: salsa20 - don't access already-freed walk.iv
	crypto: arm/aes-neonbs - don't access already-freed walk.iv
	writeback: synchronize sync(2) against cgroup writeback membership switches
	fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
	ext4: zero out the unused memory region in the extent tree block
	ALSA: hda/realtek - Fix for Lenovo B50-70 inverted internal microphone bug
	KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes
	net: avoid weird emergency message
	net/mlx4_core: Change the error print to info print
	ppp: deflate: Fix possible crash in deflate_init
	tipc: switch order of device registration to fix a crash
	tipc: fix modprobe tipc failed after switch order of device registration
	stm class: Fix channel free in stm output free path
	md: add mddev->pers to avoid potential NULL pointer dereference
	intel_th: msu: Fix single mode with IOMMU
	of: fix clang -Wunsequenced for be32_to_cpu()
	cifs: fix strcat buffer overflow and reduce raciness in smb21_set_oplock_level()
	media: ov6650: Fix sensor possibly not detected on probe
	NFS4: Fix v4.0 client state corruption when mount
	clk: tegra: Fix PLLM programming on Tegra124+ when PMC overrides divider
	fuse: fix writepages on 32bit
	fuse: honor RLIMIT_FSIZE in fuse_file_fallocate
	iommu/tegra-smmu: Fix invalid ASID bits on Tegra30/114
	ceph: flush dirty inodes before proceeding with remount
	tracing: Fix partial reading of trace event's id file
	memory: tegra: Fix integer overflow on tick value calculation
	perf intel-pt: Fix instructions sampling rate
	perf intel-pt: Fix improved sample timestamp
	perf intel-pt: Fix sample timestamp wrt non-taken branches
	fbdev: sm712fb: fix brightness control on reboot, don't set SR30
	fbdev: sm712fb: fix VRAM detection, don't set SR70/71/74/75
	fbdev: sm712fb: fix white screen of death on reboot, don't set CR3B-CR3F
	fbdev: sm712fb: fix boot screen glitch when sm712fb replaces VGA
	fbdev: sm712fb: fix crashes during framebuffer writes by correctly mapping VRAM
	fbdev: sm712fb: fix support for 1024x768-16 mode
	fbdev: sm712fb: use 1024x768 by default on non-MIPS, fix garbled display
	fbdev: sm712fb: fix crashes and garbled display during DPMS modesetting
	PCI: Mark Atheros AR9462 to avoid bus reset
	dm delay: fix a crash when invalid device is specified
	xfrm: policy: Fix out-of-bound array accesses in __xfrm_policy_unlink
	xfrm6_tunnel: Fix potential panic when unloading xfrm6_tunnel module
	vti4: ipip tunnel deregistration fixes.
	xfrm4: Fix uninitialized memory read in _decode_session4
	KVM: arm/arm64: Ensure vcpu target is unset on reset failure
	power: supply: sysfs: prevent endless uevent loop with CONFIG_POWER_SUPPLY_DEBUG
	ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavour
	perf bench numa: Add define for RUSAGE_THREAD if not present
	Revert "Don't jump to compute_result state from check_result state"
	md/raid: raid5 preserve the writeback action after the parity check
	btrfs: Honour FITRIM range constraints during free space trim
	fbdev: sm712fb: fix memory frequency by avoiding a switch/case fallthrough
	ext4: do not delete unlinked inode from orphan list on failed truncate
	KVM: x86: fix return value for reserved EFER
	bio: fix improper use of smp_mb__before_atomic()
	Revert "scsi: sd: Keep disk read-only when re-reading partition"
	crypto: vmx - CTR: always increment IV as quadword
	gfs2: Fix sign extension bug in gfs2_update_stats
	Btrfs: fix race between ranged fsync and writeback of adjacent ranges
	btrfs: sysfs: don't leak memory when failing add fsid
	fbdev: fix divide error in fb_var_to_videomode
	hugetlb: use same fault hash key for shared and private mappings
	fbdev: fix WARNING in __alloc_pages_nodemask bug
	media: cpia2: Fix use-after-free in cpia2_exit
	media: vivid: use vfree() instead of kfree() for dev->bitmap_cap
	ssb: Fix possible NULL pointer dereference in ssb_host_pcmcia_exit
	at76c50x-usb: Don't register led_trigger if usb_register_driver failed
	perf tools: No need to include bitops.h in util.h
	tools include: Adopt linux/bits.h
	gfs2: Fix lru_count going negative
	cxgb4: Fix error path in cxgb4_init_module
	mmc: core: Verify SD bus width
	powerpc/boot: Fix missing check of lseek() return value
	ASoC: imx: fix fiq dependencies
	spi: pxa2xx: fix SCR (divisor) calculation
	brcm80211: potential NULL dereference in brcmf_cfg80211_vndr_cmds_dcmd_handler()
	rtc: 88pm860x: prevent use-after-free on device remove
	w1: fix the resume command API
	dmaengine: pl330: _stop: clear interrupt status
	mac80211/cfg80211: update bss channel on channel switch
	ASoC: fsl_sai: Update is_slave_mode with correct value
	mwifiex: prevent an array overflow
	net: cw1200: fix a NULL pointer dereference
	bcache: return error immediately in bch_journal_replay()
	bcache: fix failure in journal relplay
	bcache: add failure check to run_cache_set() for journal replay
	bcache: avoid clang -Wunintialized warning
	x86/build: Move _etext to actual end of .text
	smpboot: Place the __percpu annotation correctly
	x86/mm: Remove in_nmi() warning from 64-bit implementation of vmalloc_fault()
	mm/uaccess: Use 'unsigned long' to placate UBSAN warnings on older GCC versions
	HID: logitech-hidpp: use RAP instead of FAP to get the protocol version
	pinctrl: pistachio: fix leaked of_node references
	dmaengine: at_xdmac: remove BUG_ON macro in tasklet
	media: coda: clear error return value before picture run
	media: ov6650: Move v4l2_clk_get() to ov6650_video_probe() helper
	media: au0828: stop video streaming only when last user stops
	media: ov2659: make S_FMT succeed even if requested format doesn't match
	audit: fix a memory leak bug
	media: au0828: Fix NULL pointer dereference in au0828_analog_stream_enable()
	media: pvrusb2: Prevent a buffer overflow
	powerpc/numa: improve control of topology updates
	sched/core: Check quota and period overflow at usec to nsec conversion
	sched/core: Handle overflow in cpu_shares_write_u64
	USB: core: Don't unbind interfaces following device reset failure
	x86/irq/64: Limit IST stack overflow check to #DB stack
	i40e: don't allow changes to HW VLAN stripping on active port VLANs
	RDMA/cxgb4: Fix null pointer dereference on alloc_skb failure
	hwmon: (vt1211) Use request_muxed_region for Super-IO accesses
	hwmon: (smsc47m1) Use request_muxed_region for Super-IO accesses
	hwmon: (smsc47b397) Use request_muxed_region for Super-IO accesses
	hwmon: (pc87427) Use request_muxed_region for Super-IO accesses
	hwmon: (f71805f) Use request_muxed_region for Super-IO accesses
	scsi: libsas: Do discovery on empty PHY to update PHY info
	mmc_spi: add a status check for spi_sync_locked
	mmc: sdhci-of-esdhc: add erratum eSDHC5 support
	mmc: sdhci-of-esdhc: add erratum eSDHC-A001 and A-008358 support
	PM / core: Propagate dev->power.wakeup_path when no callbacks
	extcon: arizona: Disable mic detect if running when driver is removed
	s390: cio: fix cio_irb declaration
	cpufreq: ppc_cbe: fix possible object reference leak
	cpufreq/pasemi: fix possible object reference leak
	cpufreq: pmac32: fix possible object reference leak
	x86/build: Keep local relocations with ld.lld
	iio: ad_sigma_delta: Properly handle SPI bus locking vs CS assertion
	iio: hmc5843: fix potential NULL pointer dereferences
	iio: common: ssp_sensors: Initialize calculated_time in ssp_common_process_data
	rtlwifi: fix a potential NULL pointer dereference
	brcmfmac: fix missing checks for kmemdup
	b43: shut up clang -Wuninitialized variable warning
	brcmfmac: convert dev_init_lock mutex to completion
	brcmfmac: fix race during disconnect when USB completion is in progress
	scsi: ufs: Fix regulator load and icc-level configuration
	scsi: ufs: Avoid configuring regulator with undefined voltage range
	arm64: cpu_ops: fix a leaked reference by adding missing of_node_put
	x86/ia32: Fix ia32_restore_sigcontext() AC leak
	chardev: add additional check for minor range overlap
	HID: core: move Usage Page concatenation to Main item
	ASoC: eukrea-tlv320: fix a leaked reference by adding missing of_node_put
	ASoC: fsl_utils: fix a leaked reference by adding missing of_node_put
	cxgb3/l2t: Fix undefined behaviour
	spi: tegra114: reset controller on probe
	media: wl128x: prevent two potential buffer overflows
	virtio_console: initialize vtermno value for ports
	tty: ipwireless: fix missing checks for ioremap
	rcutorture: Fix cleanup path for invalid torture_type strings
	usb: core: Add PM runtime calls to usb_hcd_platform_shutdown
	scsi: qla4xxx: avoid freeing unallocated dma memory
	media: m88ds3103: serialize reset messages in m88ds3103_set_frontend
	media: go7007: avoid clang frame overflow warning with KASAN
	media: saa7146: avoid high stack usage with clang
	scsi: lpfc: Fix SLI3 commands being issued on SLI4 devices
	spi : spi-topcliff-pch: Fix to handle empty DMA buffers
	spi: rspi: Fix sequencer reset during initialization
	spi: Fix zero length xfer bug
	ASoC: davinci-mcasp: Fix clang warning without CONFIG_PM
	ipv6: Consider sk_bound_dev_if when binding a raw socket to an address
	llc: fix skb leak in llc_build_and_send_ui_pkt()
	net-gro: fix use-after-free read in napi_gro_frags()
	net: stmmac: fix reset gpio free missing
	usbnet: fix kernel crash after disconnect
	tipc: Avoid copying bytes beyond the supplied data
	bnxt_en: Fix aggregation buffer leak under OOM condition.
	net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue value
	crypto: vmx - ghash: do nosimd fallback manually
	xen/pciback: Don't disable PCI_COMMAND on PCI device reset.
	Revert "tipc: fix modprobe tipc failed after switch order of device registration"
	tipc: fix modprobe tipc failed after switch order of device registration -v2
	sparc64: Fix regression in non-hypervisor TLB flush xcall
	include/linux/bitops.h: sanitize rotate primitives
	xhci: Convert xhci_handshake() to use readl_poll_timeout_atomic()
	usb: xhci: avoid null pointer deref when bos field is NULL
	USB: Fix slab-out-of-bounds write in usb_get_bos_descriptor
	USB: sisusbvga: fix oops in error path of sisusb_probe
	USB: Add LPM quirk for Surface Dock GigE adapter
	USB: rio500: refuse more than one device at a time
	USB: rio500: fix memory leak in close after disconnect
	media: usb: siano: Fix general protection fault in smsusb
	media: usb: siano: Fix false-positive "uninitialized variable" warning
	media: smsusb: better handle optional alignment
	scsi: zfcp: fix missing zfcp_port reference put on -EBUSY from port_remove
	scsi: zfcp: fix to prevent port_remove with pure auto scan LUNs (only sdevs)
	Btrfs: fix race updating log root item during fsync
	ALSA: hda/realtek - Set default power save node to 0
	drm/nouveau/i2c: Disable i2c bus access after ->fini()
	tty: serial: msm_serial: Fix XON/XOFF
	tty: max310x: Fix external crystal register setup
	memcg: make it work on sparse non-0-node systems
	kernel/signal.c: trace_signal_deliver when signal_group_exit
	CIFS: cifs_read_allocate_pages: don't iterate through whole page array on ENOMEM
	binder: Replace "%p" with "%pK" for stable
	binder: replace "%p" with "%pK"
	net: create skb_gso_validate_mac_len()
	bnx2x: disable GSO where gso_size is too big for hardware
	brcmfmac: Add length checks on firmware events
	brcmfmac: screening firmware event packet
	brcmfmac: revise handling events in receive path
	brcmfmac: fix incorrect event channel deduction
	brcmfmac: add length checks in scheduled scan result handler
	brcmfmac: add subtype check for event handling in data path
	userfaultfd: don't pin the user memory in userfaultfd_file_create()
	Revert "x86/build: Move _etext to actual end of .text"
	net: cdc_ncm: GetNtbFormat endian fix
	usb: gadget: fix request length error for isoc transfer
	media: uvcvideo: Fix uvc_alloc_entity() allocation alignment
	ethtool: fix potential userspace buffer overflow
	neighbor: Call __ipv4_neigh_lookup_noref in neigh_xmit
	net/mlx4_en: ethtool, Remove unsupported SFP EEPROM high pages query
	net: rds: fix memory leak in rds_ib_flush_mr_pool
	pktgen: do not sleep with the thread lock held.
	rcu: locking and unlocking need to always be at least barriers
	parisc: Use implicit space register selection for loading the coherence index of I/O pdirs
	fuse: fallocate: fix return with locked inode
	MIPS: pistachio: Build uImage.gz by default
	genwqe: Prevent an integer overflow in the ioctl
	drm/gma500/cdv: Check vbt config bits when detecting lvds panels
	fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock
	fuse: Add FOPEN_STREAM to use stream_open()
	ipv4: Define __ipv4_neigh_lookup_noref when CONFIG_INET is disabled
	ethtool: check the return value of get_regs_len
	Linux 4.4.181

Change-Id: I0c9e7effbb6bd5d1978b4ffad3db3b76af6692bc
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-06-11 14:23:58 +02:00
Jiri Slaby
7a47d18731 memcg: make it work on sparse non-0-node systems
commit 3e8589963773a5c23e2f1fe4bcad0e9a90b7f471 upstream.

We have a single node system with node 0 disabled:
  Scanning NUMA topology in Northbridge 24
  Number of physical nodes 2
  Skipping disabled node 0
  Node 1 MemBase 0000000000000000 Limit 00000000fbff0000
  NODE_DATA(1) allocated [mem 0xfbfda000-0xfbfeffff]

This causes crashes in memcg when system boots:
  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  #PF error: [normal kernel read fault]
...
  RIP: 0010:list_lru_add+0x94/0x170
...
  Call Trace:
   d_lru_add+0x44/0x50
   dput.part.34+0xfc/0x110
   __fput+0x108/0x230
   task_work_run+0x9f/0xc0
   exit_to_usermode_loop+0xf5/0x100

It is reproducible as far as 4.12.  I did not try older kernels.  You have
to have a new enough systemd, e.g.  241 (the reason is unknown -- was not
investigated).  Cannot be reproduced with systemd 234.

The system crashes because the size of lru array is never updated in
memcg_update_all_list_lrus and the reads are past the zero-sized array,
causing dereferences of random memory.

The root cause are list_lru_memcg_aware checks in the list_lru code.  The
test in list_lru_memcg_aware is broken: it assumes node 0 is always
present, but it is not true on some systems as can be seen above.

So fix this by avoiding checks on node 0.  Remember the memcg-awareness by
a bool flag in struct list_lru.

Link: http://lkml.kernel.org/r/20190522091940.3615-1-jslaby@suse.cz
Fixes: 60d3fd32a7 ("list_lru: introduce per-memcg lists")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:24:10 +02:00
Mike Kravetz
bf8474c648 hugetlb: use same fault hash key for shared and private mappings
commit 1b426bac66e6cc83c9f2d92b96e4e72acf43419a upstream.

hugetlb uses a fault mutex hash table to prevent page faults of the
same pages concurrently.  The key for shared and private mappings is
different.  Shared keys off address_space and file index.  Private keys
off mm and virtual address.  Consider a private mappings of a populated
hugetlbfs file.  A fault will map the page from the file and if needed
do a COW to map a writable page.

Hugetlbfs hole punch uses the fault mutex to prevent mappings of file
pages.  It uses the address_space file index key.  However, private
mappings will use a different key and could race with this code to map
the file page.  This causes problems (BUG) for the page cache remove
code as it expects the page to be unmapped.  A sample stack is:

page dumped because: VM_BUG_ON_PAGE(page_mapped(page))
kernel BUG at mm/filemap.c:169!
...
RIP: 0010:unaccount_page_cache_page+0x1b8/0x200
...
Call Trace:
__delete_from_page_cache+0x39/0x220
delete_from_page_cache+0x45/0x70
remove_inode_hugepages+0x13c/0x380
? __add_to_page_cache_locked+0x162/0x380
hugetlbfs_fallocate+0x403/0x540
? _cond_resched+0x15/0x30
? __inode_security_revalidate+0x5d/0x70
? selinux_file_permission+0x100/0x130
vfs_fallocate+0x13f/0x270
ksys_fallocate+0x3c/0x80
__x64_sys_fallocate+0x1a/0x20
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x44/0xa9

There seems to be another potential COW issue/race with this approach
of different private and shared keys as noted in commit 8382d914eb
("mm, hugetlb: improve page-fault scalability").

Since every hugetlb mapping (even anon and private) is actually a file
mapping, just use the address_space index key for all mappings.  This
results in potentially more hash collisions.  However, this should not
be the common case.

Link: http://lkml.kernel.org/r/20190328234704.27083-3-mike.kravetz@oracle.com
Link: http://lkml.kernel.org/r/20190412165235.t4sscoujczfhuiyt@linux-r8p5
Fixes: b5cec28d36 ("hugetlbfs: truncate_hugepages() takes a range of pages")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:23:52 +02:00
Tejun Heo
bfce20eaf1 writeback: synchronize sync(2) against cgroup writeback membership switches
commit 7fc5854f8c6efae9e7624970ab49a1eac2faefb1 upstream.

sync_inodes_sb() can race against cgwb (cgroup writeback) membership
switches and fail to writeback some inodes.  For example, if an inode
switches to another wb while sync_inodes_sb() is in progress, the new
wb might not be visible to bdi_split_work_to_wbs() at all or the inode
might jump from a wb which hasn't issued writebacks yet to one which
already has.

This patch adds backing_dev_info->wb_switch_rwsem to synchronize cgwb
switch path against sync_inodes_sb() so that sync_inodes_sb() is
guaranteed to see all the target wbs and inodes can't jump wbs to
escape syncing.

v2: Fixed misplaced rwsem init.  Spotted by Jiufei.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jiufei Xue <xuejiufei@gmail.com>
Link: http://lkml.kernel.org/r/dc694ae2-f07f-61e1-7097-7c8411cee12d@gmail.com
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:23:41 +02:00
Jiri Kosina
b614485b6b mm/mincore.c: make mincore() more conservative
commit 134fca9063ad4851de767d1768180e5dede9a881 upstream.

The semantics of what mincore() considers to be resident is not
completely clear, but Linux has always (since 2.3.52, which is when
mincore() was initially done) treated it as "page is available in page
cache".

That's potentially a problem, as that [in]directly exposes
meta-information about pagecache / memory mapping state even about
memory not strictly belonging to the process executing the syscall,
opening possibilities for sidechannel attacks.

Change the semantics of mincore() so that it only reveals pagecache
information for non-anonymous mappings that belog to files that the
calling process could (if it tried to) successfully open for writing;
otherwise we'd be including shared non-exclusive mappings, which

 - is the sidechannel

 - is not the usecase for mincore(), as that's primarily used for data,
   not (shared) text

[jkosina@suse.cz: v2]
  Link: http://lkml.kernel.org/r/20190312141708.6652-2-vbabka@suse.cz
[mhocko@suse.com: restructure can_do_mincore() conditions]
Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1903062342020.19912@cbobk.fhfr.pm
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Josh Snyder <joshs@netflix.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Originally-by: Linus Torvalds <torvalds@linux-foundation.org>
Originally-by: Dominique Martinet <asmadeus@codewreck.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Kevin Easton <kevin@guarana.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Cyril Hrubis <chrubis@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Daniel Gruss <daniel@gruss.cc>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:23:36 +02:00
Greg Kroah-Hartman
505ad68286 This is the 4.4.179 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlzEBesACgkQONu9yGCS
 aT4KhA//fTPDn1cwSHYX7vCK1gWVfAC+d7SnWcdlyjiCzuUNOHeGtwYl2xIi2pmH
 7cnxCpzKRbxgPCuawj/T4MrZoKtc2fH8sduQGQtiIgtz4rew9vqYcQTIxO0AF8cI
 KcqZefj7L6M04RxSq+F0O2MpXtPCIfDnoYej44Espa5tCt1tWJLxQvaEV4waHmA1
 +6/Sh9ZlWzMF/vcyTJS0RpCSNrHDVjaoQjgQuaDNGaGhPhaWqvPavd4eSzLrOQ4U
 r54X+T3vqQuF3gjSVywGgSvRkpFX+ZTsgPtg9mkm3oJTNoLUa9nnCRl1p6ig+moI
 +7ZPFbk0Duhi+N34GpSL2MXbFkMz49Cvon+kDlQrO0LvETWha39VyEG/GcHSCm+o
 ISNWwEY//rsYl8JF3rZIzfbw973/x0QKZeNQySs184ls4pwnIo3To3fL2Wv9ArA1
 jCIHT3lFwBkNZUMfK9hz8e7fC93QZPIgDLGx/HWBDPRE05D5O29kRSuj6cQ10GFk
 PChQiV3WLiwgwEhq3/tIso3052MheAVaNsz76wYbpakTrupjkapNeN64bOOZO2FD
 BzdLkpSjOIe5kaSMzzVCTt9A25M0t8iz/rj7/6OEPYtaT8T49t1ZtTxjrAzqL2nc
 oyB0U20t67uMoaloZUQF6kYmMirvnYYVwDWSHXQ568bU5wXhI0U=
 =pJ1x
 -----END PGP SIGNATURE-----

Merge 4.4.179 into android-4.4-p

Changes in 4.4.179
	arm64: debug: Don't propagate UNKNOWN FAR into si_code for debug signals
	arm64: debug: Ensure debug handlers check triggering exception level
	ext4: cleanup bh release code in ext4_ind_remove_space()
	lib/int_sqrt: optimize initial value compute
	tty/serial: atmel: Add is_half_duplex helper
	mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
	i2c: core-smbus: prevent stack corruption on read I2C_BLOCK_DATA
	Bluetooth: Fix decrementing reference count twice in releasing socket
	tty/serial: atmel: RS485 HD w/DMA: enable RX after TX is stopped
	CIFS: fix POSIX lock leak and invalid ptr deref
	h8300: use cc-cross-prefix instead of hardcoding h8300-unknown-linux-
	tracing: kdb: Fix ftdump to not sleep
	gpio: gpio-omap: fix level interrupt idling
	sysctl: handle overflow for file-max
	enic: fix build warning without CONFIG_CPUMASK_OFFSTACK
	mm/cma.c: cma_declare_contiguous: correct err handling
	mm/page_ext.c: fix an imbalance with kmemleak
	mm/vmalloc.c: fix kernel BUG at mm/vmalloc.c:512!
	mm/slab.c: kmemleak no scan alien caches
	ocfs2: fix a panic problem caused by o2cb_ctl
	f2fs: do not use mutex lock in atomic context
	fs/file.c: initialize init_files.resize_wait
	cifs: use correct format characters
	dm thin: add sanity checks to thin-pool and external snapshot creation
	cifs: Fix NULL pointer dereference of devname
	fs: fix guard_bio_eod to check for real EOD errors
	tools lib traceevent: Fix buffer overflow in arg_eval
	usb: chipidea: Grab the (legacy) USB PHY by phandle first
	scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c
	coresight: etm4x: Add support to enable ETMv4.2
	ARM: 8840/1: use a raw_spinlock_t in unwind
	mmc: omap: fix the maximum timeout setting
	e1000e: Fix -Wformat-truncation warnings
	IB/mlx4: Increase the timeout for CM cache
	scsi: megaraid_sas: return error when create DMA pool failed
	perf test: Fix failure of 'evsel-tp-sched' test on s390
	SoC: imx-sgtl5000: add missing put_device()
	media: sh_veu: Correct return type for mem2mem buffer helpers
	media: s5p-jpeg: Correct return type for mem2mem buffer helpers
	media: s5p-g2d: Correct return type for mem2mem buffer helpers
	media: mx2_emmaprp: Correct return type for mem2mem buffer helpers
	leds: lp55xx: fix null deref on firmware load failure
	kprobes: Prohibit probing on bsearch()
	ARM: 8833/1: Ensure that NEON code always compiles with Clang
	ALSA: PCM: check if ops are defined before suspending PCM
	bcache: fix input overflow to cache set sysfs file io_error_halflife
	bcache: fix input overflow to sequential_cutoff
	bcache: improve sysfs_strtoul_clamp()
	fbdev: fbmem: fix memory access if logo is bigger than the screen
	cdrom: Fix race condition in cdrom_sysctl_register
	ASoC: fsl-asoc-card: fix object reference leaks in fsl_asoc_card_probe
	soc: qcom: gsbi: Fix error handling in gsbi_probe()
	mt7601u: bump supported EEPROM version
	ARM: avoid Cortex-A9 livelock on tight dmb loops
	tty: increase the default flip buffer limit to 2*640K
	media: mt9m111: set initial frame size other than 0x0
	hwrng: virtio - Avoid repeated init of completion
	soc/tegra: fuse: Fix illegal free of IO base address
	hpet: Fix missing '=' character in the __setup() code of hpet_mmap_enable
	dmaengine: imx-dma: fix warning comparison of distinct pointer types
	netfilter: physdev: relax br_netfilter dependency
	media: s5p-jpeg: Check for fmt_ver_flag when doing fmt enumeration
	regulator: act8865: Fix act8600_sudcdc_voltage_ranges setting
	wlcore: Fix memory leak in case wl12xx_fetch_firmware failure
	x86/build: Mark per-CPU symbols as absolute explicitly for LLD
	dmaengine: tegra: avoid overflow of byte tracking
	drm/dp/mst: Configure no_stop_bit correctly for remote i2c xfers
	binfmt_elf: switch to new creds when switching to new mm
	kbuild: clang: choose GCC_TOOLCHAIN_DIR not on LD
	x86/build: Specify elf_i386 linker emulation explicitly for i386 objects
	x86: vdso: Use $LD instead of $CC to link
	x86/vdso: Drop implicit common-page-size linker flag
	lib/string.c: implement a basic bcmp
	tty: mark Siemens R3964 line discipline as BROKEN
	tty: ldisc: add sysctl to prevent autoloading of ldiscs
	ipv6: Fix dangling pointer when ipv6 fragment
	ipv6: sit: reset ip header pointer in ipip6_rcv
	net: rds: force to destroy connection if t_sock is NULL in rds_tcp_kill_sock().
	openvswitch: fix flow actions reallocation
	qmi_wwan: add Olicard 600
	sctp: initialize _pad of sockaddr_in before copying to user memory
	tcp: Ensure DCTCP reacts to losses
	netns: provide pure entropy for net_hash_mix()
	net: ethtool: not call vzalloc for zero sized memory request
	ip6_tunnel: Match to ARPHRD_TUNNEL6 for dev type
	ALSA: seq: Fix OOB-reads from strlcpy
	include/linux/bitrev.h: fix constant bitrev
	ASoC: fsl_esai: fix channel swap issue when stream starts
	block: do not leak memory in bio_copy_user_iov()
	genirq: Respect IRQCHIP_SKIP_SET_WAKE in irq_chip_set_wake_parent()
	ARM: dts: at91: Fix typo in ISC_D0 on PC9
	arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value
	xen: Prevent buffer overflow in privcmd ioctl
	sched/fair: Do not re-read ->h_load_next during hierarchical load calculation
	xtensa: fix return_address
	PCI: Add function 1 DMA alias quirk for Marvell 9170 SATA controller
	perf/core: Restore mmap record type correctly
	ext4: add missing brelse() in add_new_gdb_meta_bg()
	ext4: report real fs size after failed resize
	ALSA: echoaudio: add a check for ioremap_nocache
	ALSA: sb8: add a check for request_region
	IB/mlx4: Fix race condition between catas error reset and aliasguid flows
	mmc: davinci: remove extraneous __init annotation
	ALSA: opl3: fix mismatch between snd_opl3_drum_switch definition and declaration
	thermal/int340x_thermal: Add additional UUIDs
	thermal/int340x_thermal: fix mode setting
	tools/power turbostat: return the exit status of a command
	perf top: Fix error handling in cmd_top()
	perf evsel: Free evsel->counts in perf_evsel__exit()
	perf tests: Fix a memory leak of cpu_map object in the openat_syscall_event_on_all_cpus test
	perf tests: Fix a memory leak in test__perf_evsel__tp_sched_test()
	x86/hpet: Prevent potential NULL pointer dereference
	x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors
	iommu/vt-d: Check capability before disabling protected memory
	x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error
	fix incorrect error code mapping for OBJECTID_NOT_FOUND
	ext4: prohibit fstrim in norecovery mode
	rsi: improve kernel thread handling to fix kernel panic
	9p: do not trust pdu content for stat item size
	9p locks: add mount option for lock retry interval
	f2fs: fix to do sanity check with current segment number
	serial: uartps: console_setup() can't be placed to init section
	ARM: samsung: Limit SAMSUNG_PM_CHECK config option to non-Exynos platforms
	ACPI / SBS: Fix GPE storm on recent MacBookPro's
	cifs: fallback to older infolevels on findfirst queryinfo retry
	crypto: sha256/arm - fix crash bug in Thumb2 build
	crypto: sha512/arm - fix crash bug in Thumb2 build
	iommu/dmar: Fix buffer overflow during PCI bus notification
	ARM: 8839/1: kprobe: make patch_lock a raw_spinlock_t
	appletalk: Fix use-after-free in atalk_proc_exit
	lib/div64.c: off by one in shift
	include/linux/swap.h: use offsetof() instead of custom __swapoffset macro
	tpm/tpm_crb: Avoid unaligned reads in crb_recv()
	ovl: fix uid/gid when creating over whiteout
	appletalk: Fix compile regression
	bonding: fix event handling for stacked bonds
	net: atm: Fix potential Spectre v1 vulnerabilities
	net: bridge: multicast: use rcu to access port list from br_multicast_start_querier
	net: fou: do not use guehdr after iptunnel_pull_offloads in gue_udp_recv
	tcp: tcp_grow_window() needs to respect tcp_space()
	ipv4: recompile ip options in ipv4_link_failure
	ipv4: ensure rcu_read_lock() in ipv4_link_failure()
	crypto: crypto4xx - properly set IV after de- and encrypt
	modpost: file2alias: go back to simple devtable lookup
	modpost: file2alias: check prototype of handler
	tpm/tpm_i2c_atmel: Return -E2BIG when the transfer is incomplete
	KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU
	iio/gyro/bmg160: Use millidegrees for temperature scale
	iio: ad_sigma_delta: select channel when reading register
	iio: adc: at91: disable adc channel interrupt in timeout case
	io: accel: kxcjk1013: restore the range after resume.
	staging: comedi: vmk80xx: Fix use of uninitialized semaphore
	staging: comedi: vmk80xx: Fix possible double-free of ->usb_rx_buf
	staging: comedi: ni_usb6501: Fix use of uninitialized mutex
	staging: comedi: ni_usb6501: Fix possible double-free of ->usb_rx_buf
	ALSA: core: Fix card races between register and disconnect
	crypto: x86/poly1305 - fix overflow during partial reduction
	arm64: futex: Restore oldval initialization to work around buggy compilers
	x86/kprobes: Verify stack frame on kretprobe
	kprobes: Mark ftrace mcount handler functions nokprobe
	kprobes: Fix error check when reusing optimized probes
	mac80211: do not call driver wake_tx_queue op during reconfig
	Revert "kbuild: use -Oz instead of -Os when using clang"
	sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
	device_cgroup: fix RCU imbalance in error case
	mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n
	ALSA: info: Fix racy addition/deletion of nodes
	Revert "locking/lockdep: Add debug_locks check in __lock_downgrade()"
	kernel/sysctl.c: fix out-of-bounds access when setting file-max
	Linux 4.4.179

Change-Id: Ia88dbd6c37250a682098a4a8540672869c6adf42
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-04-30 13:25:38 +02:00
Konstantin Khlebnikov
0e4d4e0d6b mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n
commit e8277b3b52240ec1caad8e6df278863e4bf42eac upstream.

Commit 58bc4c34d249 ("mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly")
depends on skipping vmstat entries with empty name introduced in
7aaf77272358 ("mm: don't show nr_indirectly_reclaimable in
/proc/vmstat") but reverted in b29940c1abd7 ("mm: rename and change
semantics of nr_indirectly_reclaimable_bytes").

So skipping no longer works and /proc/vmstat has misformatted lines " 0".

This patch simply shows debug counters "nr_tlb_remote_*" for UP.

Link: http://lkml.kernel.org/r/155481488468.467.4295519102880913454.stgit@buzz
Fixes: 58bc4c34d249 ("mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Roman Gushchin <guro@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27 09:34:02 +02:00
Qian Cai
b1399497b7 mm/slab.c: kmemleak no scan alien caches
[ Upstream commit 92d1d07daad65c300c7d0b68bbef8867e9895d54 ]

Kmemleak throws endless warnings during boot due to in
__alloc_alien_cache(),

    alc = kmalloc_node(memsize, gfp, node);
    init_arraycache(&alc->ac, entries, batch);
    kmemleak_no_scan(ac);

Kmemleak does not track the array cache (alc->ac) but the alien cache
(alc) instead, so let it track the latter by lifting kmemleak_no_scan()
out of init_arraycache().

There is another place that calls init_arraycache(), but
alloc_kmem_cache_cpus() uses the percpu allocation where will never be
considered as a leak.

  kmemleak: Found object by alias at 0xffff8007b9aa7e38
  CPU: 190 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc2+ #2
  Call trace:
   dump_backtrace+0x0/0x168
   show_stack+0x24/0x30
   dump_stack+0x88/0xb0
   lookup_object+0x84/0xac
   find_and_get_object+0x84/0xe4
   kmemleak_no_scan+0x74/0xf4
   setup_kmem_cache_node+0x2b4/0x35c
   __do_tune_cpucache+0x250/0x2d4
   do_tune_cpucache+0x4c/0xe4
   enable_cpucache+0xc8/0x110
   setup_cpu_cache+0x40/0x1b8
   __kmem_cache_create+0x240/0x358
   create_cache+0xc0/0x198
   kmem_cache_create_usercopy+0x158/0x20c
   kmem_cache_create+0x50/0x64
   fsnotify_init+0x58/0x6c
   do_one_initcall+0x194/0x388
   kernel_init_freeable+0x668/0x688
   kernel_init+0x18/0x124
   ret_from_fork+0x10/0x18
  kmemleak: Object 0xffff8007b9aa7e00 (size 256):
  kmemleak:   comm "swapper/0", pid 1, jiffies 4294697137
  kmemleak:   min_count = 1
  kmemleak:   count = 0
  kmemleak:   flags = 0x1
  kmemleak:   checksum = 0
  kmemleak:   backtrace:
       kmemleak_alloc+0x84/0xb8
       kmem_cache_alloc_node_trace+0x31c/0x3a0
       __kmalloc_node+0x58/0x78
       setup_kmem_cache_node+0x26c/0x35c
       __do_tune_cpucache+0x250/0x2d4
       do_tune_cpucache+0x4c/0xe4
       enable_cpucache+0xc8/0x110
       setup_cpu_cache+0x40/0x1b8
       __kmem_cache_create+0x240/0x358
       create_cache+0xc0/0x198
       kmem_cache_create_usercopy+0x158/0x20c
       kmem_cache_create+0x50/0x64
       fsnotify_init+0x58/0x6c
       do_one_initcall+0x194/0x388
       kernel_init_freeable+0x668/0x688
       kernel_init+0x18/0x124
  kmemleak: Not scanning unknown object at 0xffff8007b9aa7e38
  CPU: 190 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc2+ #2
  Call trace:
   dump_backtrace+0x0/0x168
   show_stack+0x24/0x30
   dump_stack+0x88/0xb0
   kmemleak_no_scan+0x90/0xf4
   setup_kmem_cache_node+0x2b4/0x35c
   __do_tune_cpucache+0x250/0x2d4
   do_tune_cpucache+0x4c/0xe4
   enable_cpucache+0xc8/0x110
   setup_cpu_cache+0x40/0x1b8
   __kmem_cache_create+0x240/0x358
   create_cache+0xc0/0x198
   kmem_cache_create_usercopy+0x158/0x20c
   kmem_cache_create+0x50/0x64
   fsnotify_init+0x58/0x6c
   do_one_initcall+0x194/0x388
   kernel_init_freeable+0x668/0x688
   kernel_init+0x18/0x124
   ret_from_fork+0x10/0x18

Link: http://lkml.kernel.org/r/20190129184518.39808-1-cai@lca.pw
Fixes: 1fe00d50a9 ("slab: factor out initialization of array cache")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-27 09:33:48 +02:00
Uladzislau Rezki (Sony)
cb4d6cd276 mm/vmalloc.c: fix kernel BUG at mm/vmalloc.c:512!
[ Upstream commit afd07389d3f4933c7f7817a92fb5e053d59a3182 ]

One of the vmalloc stress test case triggers the kernel BUG():

  <snip>
  [60.562151] ------------[ cut here ]------------
  [60.562154] kernel BUG at mm/vmalloc.c:512!
  [60.562206] invalid opcode: 0000 [#1] PREEMPT SMP PTI
  [60.562247] CPU: 0 PID: 430 Comm: vmalloc_test/0 Not tainted 4.20.0+ #161
  [60.562293] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
  [60.562351] RIP: 0010:alloc_vmap_area+0x36f/0x390
  <snip>

it can happen due to big align request resulting in overflowing of
calculated address, i.e.  it becomes 0 after ALIGN()'s fixup.

Fix it by checking if calculated address is within vstart/vend range.

Link: http://lkml.kernel.org/r/20190124115648.9433-2-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-27 09:33:48 +02:00
Qian Cai
2ea83494ce mm/page_ext.c: fix an imbalance with kmemleak
[ Upstream commit 0c81585499601acd1d0e1cbf424cabfaee60628c ]

After offlining a memory block, kmemleak scan will trigger a crash, as
it encounters a page ext address that has already been freed during
memory offlining.  At the beginning in alloc_page_ext(), it calls
kmemleak_alloc(), but it does not call kmemleak_free() in
free_page_ext().

    BUG: unable to handle kernel paging request at ffff888453d00000
    PGD 128a01067 P4D 128a01067 PUD 128a04067 PMD 47e09e067 PTE 800ffffbac2ff060
    Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    CPU: 1 PID: 1594 Comm: bash Not tainted 5.0.0-rc8+ #15
    Hardware name: HP ProLiant DL180 Gen9/ProLiant DL180 Gen9, BIOS U20 10/25/2017
    RIP: 0010:scan_block+0xb5/0x290
    Code: 85 6e 01 00 00 48 b8 00 00 30 f5 81 88 ff ff 48 39 c3 0f 84 5b 01 00 00 48 89 d8 48 c1 e8 03 42 80 3c 20 00 0f 85 87 01 00 00 <4c> 8b 3b e8 f3 0c fa ff 4c 39 3d 0c 6b 4c 01 0f 87 08 01 00 00 4c
    RSP: 0018:ffff8881ec57f8e0 EFLAGS: 00010082
    RAX: 0000000000000000 RBX: ffff888453d00000 RCX: ffffffffa61e5a54
    RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff888453d00000
    RBP: ffff8881ec57f920 R08: fffffbfff4ed588d R09: fffffbfff4ed588c
    R10: fffffbfff4ed588c R11: ffffffffa76ac463 R12: dffffc0000000000
    R13: ffff888453d00ff9 R14: ffff8881f80cef48 R15: ffff8881f80cef48
    FS:  00007f6c0e3f8740(0000) GS:ffff8881f7680000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffff888453d00000 CR3: 00000001c4244003 CR4: 00000000001606a0
    Call Trace:
     scan_gray_list+0x269/0x430
     kmemleak_scan+0x5a8/0x10f0
     kmemleak_write+0x541/0x6ca
     full_proxy_write+0xf8/0x190
     __vfs_write+0xeb/0x980
     vfs_write+0x15a/0x4f0
     ksys_write+0xd2/0x1b0
     __x64_sys_write+0x73/0xb0
     do_syscall_64+0xeb/0xaaa
     entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7f6c0dad73b8
    Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 63 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
    RSP: 002b:00007ffd5b863cb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f6c0dad73b8
    RDX: 0000000000000005 RSI: 000055a9216e1710 RDI: 0000000000000001
    RBP: 000055a9216e1710 R08: 000000000000000a R09: 00007ffd5b863840
    R10: 000000000000000a R11: 0000000000000246 R12: 00007f6c0dda9780
    R13: 0000000000000005 R14: 00007f6c0dda4740 R15: 0000000000000005
    Modules linked in: nls_iso8859_1 nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables x_tables xfs sd_mod ahci libahci igb i2c_algo_bit libata i2c_core dm_mirror dm_region_hash dm_log dm_mod efivarfs
    CR2: ffff888453d00000
    ---[ end trace ccf646c7456717c5 ]---
    Kernel panic - not syncing: Fatal exception
    Shutting down cpus with NMI
    Kernel Offset: 0x24c00000 from 0xffffffff81000000 (relocation range:
    0xffffffff80000000-0xffffffffbfffffff)
    ---[ end Kernel panic - not syncing: Fatal exception ]---

Link: http://lkml.kernel.org/r/20190227173147.75650-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-27 09:33:48 +02:00
Peng Fan
4970a8ba94 mm/cma.c: cma_declare_contiguous: correct err handling
[ Upstream commit 0d3bd18a5efd66097ef58622b898d3139790aa9d ]

In case cma_init_reserved_mem failed, need to free the memblock
allocated by memblock_reserve or memblock_alloc_range.

Quote Catalin's comments:
  https://lkml.org/lkml/2019/2/26/482

Kmemleak is supposed to work with the memblock_{alloc,free} pair and it
ignores the memblock_reserve() as a memblock_alloc() implementation
detail. It is, however, tolerant to memblock_free() being called on
a sub-range or just a different range from a previous memblock_alloc().
So the original patch looks fine to me. FWIW:

Link: http://lkml.kernel.org/r/20190227144631.16708-1-peng.fan@nxp.com
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-27 09:33:48 +02:00
Yang Shi
b3b489eea2 mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
commit a7f40cfe3b7ada57af9b62fd28430eeb4a7cfcb7 upstream.

When MPOL_MF_STRICT was specified and an existing page was already on a
node that does not follow the policy, mbind() should return -EIO.  But
commit 6f4576e368 ("mempolicy: apply page table walker on
queue_pages_range()") broke the rule.

And commit c8633798497c ("mm: mempolicy: mbind and migrate_pages support
thp migration") didn't return the correct value for THP mbind() too.

If MPOL_MF_STRICT is set, ignore vma_migratable() to make sure it
reaches queue_pages_to_pte_range() or queue_pages_pmd() to check if an
existing page was already on a node that does not follow the policy.
And, non-migratable vma may be used, return -EIO too if MPOL_MF_MOVE or
MPOL_MF_MOVE_ALL was specified.

Tested with https://github.com/metan-ucw/ltp/blob/master/testcases/kernel/syscalls/mbind/mbind02.c

[akpm@linux-foundation.org: tweak code comment]
Link: http://lkml.kernel.org/r/1553020556-38583-1-git-send-email-yang.shi@linux.alibaba.com
Fixes: 6f4576e368 ("mempolicy: apply page table walker on queue_pages_range()")
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reported-by: Cyril Hrubis <chrubis@suse.cz>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Rafael Aquini <aquini@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27 09:33:47 +02:00
Greg Kroah-Hartman
254f2a04a8 This is the 4.4.178 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlykNUEACgkQONu9yGCS
 aT6n6A//QT/8UQ8IWU2J1iTtlxX95RWxfgbsip0bBh8PdVOhRAalR6+fa6F/Fh9D
 kM82QHro5R9ZO48mkQ1yF4ooJmVapabS4bvlgLil+/La9gDsF/Z2T/wxsUht2nCm
 aic3ZjLX2mtte75zQAL+lvEjPR6q92PibNOgBvt51ueLK7Hcxga4uiAzpdlZausp
 YKAtqwhaj7AD2xUqPuyB9xHw5tvFbGqiN6rMmxIbQSOUhgtiUxiiLHRM8ppanoHv
 D2fMKKj8Pz5FGgzd7c0b9fZUERFNqHeKSTPgxNENzLS0TCRexP94Ihp5FoWN4tY+
 HPQT291DrWyquSl0c7FrI1BuF41fmKJ+CZHbvXBwT429bJQQ2dehgIUfdGYgrSBt
 J/zbh0OO2fkLCxNDVpA0cNm+tlYUGbc+TCG4R2I3V2dn5yTxru/w+TdG/GyM8h75
 jUAGS3hFKBCFQSLC8M+nRcOsLsV1H4H9/MnQ84+wpXXC/Z5MseYHo07E1xWNViUW
 UHuM6PlGRUPJ0JrC6J6wLkkvHDyjXbaSitligH8K2aW9PtCU814T7+4rwgyaHCVr
 OMizAmy65Y2lutJ4mtMNc05mKlQRlGfWu/EOBgTRzB+V4hadp2NRZ1b9rk3MFRgk
 ckxiYM91MjtuvHV/SrLd3e2PlnvouGw30jaWhScy2Sl5D4g76ok=
 =zBsi
 -----END PGP SIGNATURE-----

Merge 4.4.178 into android-4.4-p

Changes in 4.4.178
	mmc: pxamci: fix enum type confusion
	drm/vmwgfx: Don't double-free the mode stored in par->set_mode
	udf: Fix crash on IO error during truncate
	mips: loongson64: lemote-2f: Add IRQF_NO_SUSPEND to "cascade" irqaction.
	MIPS: Fix kernel crash for R6 in jump label branch function
	futex: Ensure that futex address is aligned in handle_futex_death()
	ext4: fix NULL pointer dereference while journal is aborted
	ext4: fix data corruption caused by unaligned direct AIO
	ext4: brelse all indirect buffer in ext4_ind_remove_space()
	mmc: tmio_mmc_core: don't claim spurious interrupts
	media: v4l2-ctrls.c/uvc: zero v4l2_event
	locking/lockdep: Add debug_locks check in __lock_downgrade()
	ALSA: hda - Record the current power state before suspend/resume calls
	ALSA: hda - Enforces runtime_resume after S3 and S4 for each codec
	mmc: pwrseq_simple: Make reset-gpios optional to match doc
	mmc: debugfs: Add a restriction to mmc debugfs clock setting
	mmc: make MAN_BKOPS_EN message a debug
	mmc: sanitize 'bus width' in debug output
	mmc: core: shut up "voltage-ranges unspecified" pr_info()
	usb: dwc3: gadget: Fix suspend/resume during device mode
	arm64: mm: Add trace_irqflags annotations to do_debug_exception()
	mmc: core: fix using wrong io voltage if mmc_select_hs200 fails
	mm/rmap: replace BUG_ON(anon_vma->degree) with VM_WARN_ON
	extcon: usb-gpio: Don't miss event during suspend/resume
	kbuild: setlocalversion: print error to STDERR
	usb: gadget: composite: fix dereference after null check coverify warning
	usb: gadget: Add the gserial port checking in gs_start_tx()
	tcp/dccp: drop SYN packets if accept queue is full
	serial: sprd: adjust TIMEOUT to a big value
	Hang/soft lockup in d_invalidate with simultaneous calls
	arm64: traps: disable irq in die()
	usb: renesas_usbhs: gadget: fix unused-but-set-variable warning
	serial: sprd: clear timeout interrupt only rather than all interrupts
	lib/int_sqrt: optimize small argument
	USB: core: only clean up what we allocated
	rtc: Fix overflow when converting time64_t to rtc_time
	ath10k: avoid possible string overflow
	Bluetooth: Check L2CAP option sizes returned from l2cap_get_conf_opt
	Bluetooth: Verify that l2cap_get_conf_opt provides large enough buffer
	sched/fair: Fix new task's load avg removed from source CPU in wake_up_new_task()
	mmc: block: Allow more than 8 partitions per card
	arm64: fix COMPAT_SHMLBA definition for large pages
	efi: stub: define DISABLE_BRANCH_PROFILING for all architectures
	ARM: 8458/1: bL_switcher: add GIC dependency
	ARM: 8494/1: mm: Enable PXN when running non-LPAE kernel on LPAE processor
	android: unconditionally remove callbacks in sync_fence_free()
	vmstat: make vmstat_updater deferrable again and shut down on idle
	hid-sensor-hub.c: fix wrong do_div() usage
	arm64: hide __efistub_ aliases from kallsyms
	perf: Synchronously free aux pages in case of allocation failure
	net: diag: support v4mapped sockets in inet_diag_find_one_icsk()
	Revert "mmc: block: don't use parameter prefix if built as module"
	writeback: initialize inode members that track writeback history
	coresight: fixing lockdep error
	coresight: coresight_unregister() function cleanup
	coresight: release reference taken by 'bus_find_device()'
	coresight: remove csdev's link from topology
	stm class: Fix locking in unbinding policy path
	stm class: Fix link list locking
	stm class: Prevent user-controllable allocations
	stm class: Support devices with multiple instances
	stm class: Fix unlocking braino in the error path
	stm class: Guard output assignment against concurrency
	stm class: Fix unbalanced module/device refcounting
	stm class: Fix a race in unlinking
	coresight: "DEVICE_ATTR_RO" should defined as static.
	coresight: etm4x: Check every parameter used by dma_xx_coherent.
	asm-generic: Fix local variable shadow in __set_fixmap_offset
	staging: ashmem: Avoid deadlock with mmap/shrink
	staging: ashmem: Add missing include
	staging: ion: Set minimum carveout heap allocation order to PAGE_SHIFT
	staging: goldfish: audio: fix compiliation on arm
	ARM: 8510/1: rework ARM_CPU_SUSPEND dependencies
	arm64/kernel: fix incorrect EL0 check in inv_entry macro
	mac80211: fix "warning: ‘target_metric’ may be used uninitialized"
	perf/ring_buffer: Refuse to begin AUX transaction after rb->aux_mmap_count drops
	arm64: kernel: Include _AC definition in page.h
	PM / Hibernate: Call flush_icache_range() on pages restored in-place
	stm class: Do not leak the chrdev in error path
	stm class: Fix stm device initialization order
	ipv6: fix endianness error in icmpv6_err
	usb: gadget: configfs: add mutex lock before unregister gadget
	usb: gadget: rndis: free response queue during REMOTE_NDIS_RESET_MSG
	cpu/hotplug: Handle unbalanced hotplug enable/disable
	video: fbdev: Set pixclock = 0 in goldfishfb
	arm64: kconfig: drop CONFIG_RTC_LIB dependency
	mmc: mmc: fix switch timeout issue caused by jiffies precision
	cfg80211: size various nl80211 messages correctly
	stmmac: copy unicast mac address to MAC registers
	dccp: do not use ipv6 header for ipv4 flow
	mISDN: hfcpci: Test both vendor & device ID for Digium HFC4S
	net/packet: Set __GFP_NOWARN upon allocation in alloc_pg_vec
	net: rose: fix a possible stack overflow
	Add hlist_add_tail_rcu() (Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
	packets: Always register packet sk in the same order
	tcp: do not use ipv6 header for ipv4 flow
	vxlan: Don't call gro_cells_destroy() before device is unregistered
	sctp: get sctphdr by offset in sctp_compute_cksum
	mac8390: Fix mmio access size probe
	btrfs: remove WARN_ON in log_dir_items
	btrfs: raid56: properly unmap parity page in finish_parity_scrub()
	ARM: imx6q: cpuidle: fix bug that CPU might not wake up at expected time
	ALSA: compress: add support for 32bit calls in a 64bit kernel
	ALSA: rawmidi: Fix potential Spectre v1 vulnerability
	ALSA: seq: oss: Fix Spectre v1 vulnerability
	ALSA: pcm: Fix possible OOB access in PCM oss plugins
	ALSA: pcm: Don't suspend stream in unrecoverable PCM state
	scsi: sd: Fix a race between closing an sd device and sd I/O
	scsi: zfcp: fix rport unblock if deleted SCSI devices on Scsi_Host
	scsi: zfcp: fix scsi_eh host reset with port_forced ERP for non-NPIV FCP devices
	tty: atmel_serial: fix a potential NULL pointer dereference
	staging: vt6655: Remove vif check from vnt_interrupt
	staging: vt6655: Fix interrupt race condition on device start up.
	serial: max310x: Fix to avoid potential NULL pointer dereference
	serial: sh-sci: Fix setting SCSCR_TIE while transferring data
	USB: serial: cp210x: add new device id
	USB: serial: ftdi_sio: add additional NovaTech products
	USB: serial: mos7720: fix mos_parport refcount imbalance on error path
	USB: serial: option: set driver_info for SIM5218 and compatibles
	USB: serial: option: add Olicard 600
	Disable kgdboc failed by echo space to /sys/module/kgdboc/parameters/kgdboc
	fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links
	gpio: adnp: Fix testing wrong value in adnp_gpio_direction_input
	perf intel-pt: Fix TSC slip
	x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y
	KVM: Reject device ioctls from processes other than the VM's creator
	xhci: Fix port resume done detection for SS ports with LPM enabled
	Revert "USB: core: only clean up what we allocated"
	arm64: support keyctl() system call in 32-bit mode
	coresight: removing bind/unbind options from sysfs
	stm class: Hide STM-specific options if STM is disabled
	Linux 4.4.178

Change-Id: Iac01be124213731798a36b20d80ea3a8e911d025
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-04-03 10:21:44 +02:00
Christoph Lameter
bdf3c006b9 vmstat: make vmstat_updater deferrable again and shut down on idle
[ Upstream commit 0eb77e9880321915322d42913c3b53241739c8aa ]

Currently the vmstat updater is not deferrable as a result of commit
ba4877b9ca ("vmstat: do not use deferrable delayed work for
vmstat_update").  This in turn can cause multiple interruptions of the
applications because the vmstat updater may run at

Make vmstate_update deferrable again and provide a function that folds
the differentials when the processor is going to idle mode thus
addressing the issue of the above commit in a clean way.

Note that the shepherd thread will continue scanning the differentials
from another processor and will reenable the vmstat workers if it
detects any changes.

Fixes: ba4877b9ca ("vmstat: do not use deferrable delayed work for vmstat_update")
Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-03 06:23:21 +02:00
Konstantin Khlebnikov
7f69a980f6 mm/rmap: replace BUG_ON(anon_vma->degree) with VM_WARN_ON
commit e4c5800a3991f0c6a766983535dfc10d51802cf6 upstream.

This check effectively catches anon vma hierarchy inconsistence and some
vma corruptions.  It was effective for catching corner cases in anon vma
reusing logic.  For now this code seems stable so check could be hidden
under CONFIG_DEBUG_VM and replaced with WARN because it's not so fatal.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Vasily Averin <vvs@virtuozzo.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-03 06:23:18 +02:00
Greg Kroah-Hartman
349ac1a59c This is the 4.4.177 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlyV4+kACgkQONu9yGCS
 aT5T2RAAn9hyo4LmxMvxab61d+PSEfn9TKhNjEtF8vFKNiYb+W+vI0ALHYSWcT1Z
 O5T4d1TeSeMrs9G1McL/D80vMJFIzcg0a9QIYuFObFAB21VpDiiGcVc74d+6fHtH
 m6loPE1d2GCpzwJ7VOCvdC9DR8C9SK0IVANyMJApXUL8mkNRo2H6vY/NGt65+5zb
 vioEbGbXZQJl1GvvwquM6cX9ABH4nyAU1yTX9r2CHMFCBQ0JDkpY4yxClY1NBZ02
 1Rc1NpJCR6OJUPvQUpyHuY5rkkPfM12Iz9dxFHARXvtTsmzm3AFdkev5GEMlR5e1
 hNXs6ZPyTADJL/fKO8nmeKwKf30xTaWObgMw9A3d8FOFSmDXAW6FLKAmIz+yZBGc
 27Tta1pGkZscC1iajEX2dcp5Zjkwr4y/HA5EJJ3jCCwrfTPDL5u8N900GbKMx4Lk
 EgPB3byZUAn/9k1m5HEA8RS08LqsNTAEA2Q6nZZhuhmqGJQPRtbBPG7tib9bvhUy
 KBLQdqJ8ubi9T1EopHu8xZdpZbbB/uCS+FB6NIkXuWR1IHkAGdEPheHrv3tuR5rf
 8/2OU970h63ztE5qHFsBci2uC4htiZFY62NULiPbI7HjeEUdym0AGK4JzGnn0lnX
 8McOBeOKwQwR5XuHZcMKWrsstt4mv9zo5QOdCJ1XDxFv628G2dQ=
 =eGAC
 -----END PGP SIGNATURE-----

Merge 4.4.177 into android-4.4-p

Changes in 4.4.177
	ceph: avoid repeatedly adding inode to mdsc->snap_flush_list
	numa: change get_mempolicy() to use nr_node_ids instead of MAX_NUMNODES
	KEYS: allow reaching the keys quotas exactly
	mfd: ti_am335x_tscadc: Use PLATFORM_DEVID_AUTO while registering mfd cells
	mfd: twl-core: Fix section annotations on {,un}protect_pm_master
	mfd: db8500-prcmu: Fix some section annotations
	mfd: ab8500-core: Return zero in get_register_interruptible()
	mfd: qcom_rpm: write fw_version to CTRL_REG
	mfd: wm5110: Add missing ASRC rate register
	mfd: mc13xxx: Fix a missing check of a register-read failure
	net: hns: Fix use after free identified by SLUB debug
	MIPS: ath79: Enable OF serial ports in the default config
	scsi: qla4xxx: check return code of qla4xxx_copy_from_fwddb_param
	scsi: isci: initialize shost fully before calling scsi_add_host()
	MIPS: jazz: fix 64bit build
	isdn: i4l: isdn_tty: Fix some concurrency double-free bugs
	atm: he: fix sign-extension overflow on large shift
	leds: lp5523: fix a missing check of return value of lp55xx_read
	isdn: avm: Fix string plus integer warning from Clang
	RDMA/srp: Rework SCSI device reset handling
	KEYS: user: Align the payload buffer
	KEYS: always initialize keyring_index_key::desc_len
	batman-adv: fix uninit-value in batadv_interface_tx()
	net/packet: fix 4gb buffer limit due to overflow check
	team: avoid complex list operations in team_nl_cmd_options_set()
	sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
	net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames
	ARCv2: Enable unaligned access in early ASM code
	Revert "bridge: do not add port to router list when receives query with source 0.0.0.0"
	libceph: handle an empty authorize reply
	scsi: libsas: Fix rphy phy_identifier for PHYs with end devices attached
	drm/msm: Unblock writer if reader closes file
	ASoC: Intel: Haswell/Broadwell: fix setting for .dynamic field
	ALSA: compress: prevent potential divide by zero bugs
	thermal: int340x_thermal: Fix a NULL vs IS_ERR() check
	usb: dwc3: gadget: Fix the uninitialized link_state when udc starts
	usb: gadget: Potential NULL dereference on allocation error
	ASoC: dapm: change snprintf to scnprintf for possible overflow
	ASoC: imx-audmux: change snprintf to scnprintf for possible overflow
	ARC: fix __ffs return value to avoid build warnings
	mac80211: fix miscounting of ttl-dropped frames
	serial: fsl_lpuart: fix maximum acceptable baud rate with over-sampling
	scsi: csiostor: fix NULL pointer dereference in csio_vport_set_state()
	net: altera_tse: fix connect_local_phy error path
	ibmveth: Do not process frames after calling napi_reschedule
	mac80211: don't initiate TDLS connection if station is not associated to AP
	cfg80211: extend range deviation for DMG
	KVM: nSVM: clear events pending from svm_complete_interrupts() when exiting to L1
	arm/arm64: KVM: Feed initialized memory to MMIO accesses
	KVM: arm/arm64: Fix MMIO emulation data handling
	powerpc: Always initialize input array when calling epapr_hypercall()
	mmc: spi: Fix card detection during probe
	mm: enforce min addr even if capable() in expand_downwards()
	x86/uaccess: Don't leak the AC flag into __put_user() value evaluation
	USB: serial: option: add Telit ME910 ECM composition
	USB: serial: cp210x: add ID for Ingenico 3070
	USB: serial: ftdi_sio: add ID for Hjelmslund Electronics USB485
	cpufreq: Use struct kobj_attribute instead of struct global_attr
	sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
	ncpfs: fix build warning of strncpy
	isdn: isdn_tty: fix build warning of strncpy
	staging: lustre: fix buffer overflow of string buffer
	net-sysfs: Fix mem leak in netdev_register_kobject
	sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79
	team: Free BPF filter when unregistering netdev
	bnxt_en: Drop oversize TX packets to prevent errors.
	net: nfc: Fix NULL dereference on nfc_llcp_build_tlv fails
	xen-netback: fix occasional leak of grant ref mappings under memory pressure
	net: Add __icmp_send helper.
	net: avoid use IPCB in cipso_v4_error
	net: phy: Micrel KSZ8061: link failure after cable connect
	x86/CPU/AMD: Set the CPB bit unconditionally on F17h
	applicom: Fix potential Spectre v1 vulnerabilities
	MIPS: irq: Allocate accurate order pages for irq stack
	hugetlbfs: fix races and page leaks during migration
	netlabel: fix out-of-bounds memory accesses
	net: dsa: mv88e6xxx: Fix u64 statistics
	ip6mr: Do not call __IP6_INC_STATS() from preemptible context
	media: uvcvideo: Fix 'type' check leading to overflow
	vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel
	perf tools: Handle TOPOLOGY headers with no CPU
	IB/{hfi1, qib}: Fix WC.byte_len calculation for UD_SEND_WITH_IMM
	ipvs: Fix signed integer overflow when setsockopt timeout
	iommu/amd: Fix IOMMU page flush when detach device from a domain
	xtensa: SMP: fix ccount_timer_shutdown
	xtensa: SMP: fix secondary CPU initialization
	xtensa: smp_lx200_defconfig: fix vectors clash
	xtensa: SMP: mark each possible CPU as present
	xtensa: SMP: limit number of possible CPUs by NR_CPUS
	net: altera_tse: fix msgdma_tx_completion on non-zero fill_level case
	net: hns: Fix wrong read accesses via Clause 45 MDIO protocol
	net: stmmac: dwmac-rk: fix error handling in rk_gmac_powerup()
	gpio: vf610: Mask all GPIO interrupts
	nfs: Fix NULL pointer dereference of dev_name
	scsi: libfc: free skb when receiving invalid flogi resp
	platform/x86: Fix unmet dependency warning for SAMSUNG_Q10
	cifs: fix computation for MAX_SMB2_HDR_SIZE
	x86/kexec: Don't setup EFI info if EFI runtime is not enabled
	x86_64: increase stack size for KASAN_EXTRA
	mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone
	mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone
	fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
	autofs: drop dentry reference only when it is never used
	autofs: fix error return in autofs_fill_super()
	ARM: pxa: ssp: unneeded to free devm_ allocated data
	irqchip/mmp: Only touch the PJ4 IRQ & FIQ bits on enable/disable
	dmaengine: at_xdmac: Fix wrongfull report of a channel as in use
	dmaengine: dmatest: Abort test in case of mapping error
	s390/qeth: fix use-after-free in error path
	perf symbols: Filter out hidden symbols from labels
	MIPS: Remove function size check in get_frame_info()
	Input: wacom_serial4 - add support for Wacom ArtPad II tablet
	Input: elan_i2c - add id for touchpad found in Lenovo s21e-20
	iscsi_ibft: Fix missing break in switch statement
	futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock()
	ARM: dts: exynos: Add minimal clkout parameters to Exynos3250 PMU
	Revert "x86/platform/UV: Use efi_runtime_lock to serialise BIOS calls"
	ARM: dts: exynos: Do not ignore real-world fuse values for thermal zone 0 on Exynos5420
	udplite: call proper backlog handlers
	netfilter: x_tables: enforce nul-terminated table name from getsockopt GET_ENTRIES
	netfilter: nfnetlink_log: just returns error for unknown command
	netfilter: nfnetlink_acct: validate NFACCT_FILTER parameters
	netfilter: nf_conntrack_tcp: Fix stack out of bounds when parsing TCP options
	KEYS: restrict /proc/keys by credentials at open time
	l2tp: fix infoleak in l2tp_ip6_recvmsg()
	net: hsr: fix memory leak in hsr_dev_finalize()
	net: sit: fix UBSAN Undefined behaviour in check_6rd
	net/x25: fix use-after-free in x25_device_event()
	net/x25: reset state in x25_connect()
	pptp: dst_release sk_dst_cache in pptp_sock_destruct
	ravb: Decrease TxFIFO depth of Q3 and Q2 to one
	route: set the deleted fnhe fnhe_daddr to 0 in ip_del_fnhe to fix a race
	tcp: handle inet_csk_reqsk_queue_add() failures
	net/mlx4_core: Fix reset flow when in command polling mode
	net/mlx4_core: Fix qp mtt size calculation
	net/x25: fix a race in x25_bind()
	mdio_bus: Fix use-after-free on device_register fails
	net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255
	missing barriers in some of unix_sock ->addr and ->path accesses
	ipvlan: disallow userns cap_net_admin to change global mode/flags
	vxlan: test dev->flags & IFF_UP before calling gro_cells_receive()
	vxlan: Fix GRO cells race condition between receive and link delete
	net/hsr: fix possible crash in add_timer()
	gro_cells: make sure device is up in gro_cells_receive()
	tcp/dccp: remove reqsk_put() from inet_child_forget()
	ALSA: bebob: use more identical mod_alias for Saffire Pro 10 I/O against Liquid Saffire 56
	fs/9p: use fscache mutex rather than spinlock
	It's wrong to add len to sector_nr in raid10 reshape twice
	media: videobuf2-v4l2: drop WARN_ON in vb2_warn_zero_bytesused()
	9p: use inode->i_lock to protect i_size_write() under 32-bit
	9p/net: fix memory leak in p9_client_create
	ASoC: fsl_esai: fix register setting issue in RIGHT_J mode
	stm class: Fix an endless loop in channel allocation
	crypto: caam - fixed handling of sg list
	crypto: ahash - fix another early termination in hash walk
	gpu: ipu-v3: Fix i.MX51 CSI control registers offset
	gpu: ipu-v3: Fix CSI offsets for imx53
	s390/dasd: fix using offset into zero size array error
	ARM: OMAP2+: Variable "reg" in function omap4_dsi_mux_pads() could be uninitialized
	Input: matrix_keypad - use flush_delayed_work()
	i2c: cadence: Fix the hold bit setting
	Input: st-keyscan - fix potential zalloc NULL dereference
	ARM: 8824/1: fix a migrating irq bug when hotplug cpu
	assoc_array: Fix shortcut creation
	scsi: libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_task
	net: systemport: Fix reception of BPDUs
	pinctrl: meson: meson8b: fix the sdxc_a data 1..3 pins
	net: mv643xx_eth: disable clk on error path in mv643xx_eth_shared_probe()
	ASoC: topology: free created components in tplg load error
	arm64: Relax GIC version check during early boot
	tmpfs: fix link accounting when a tmpfile is linked in
	ARC: uacces: remove lp_start, lp_end from clobber list
	phonet: fix building with clang
	mac80211_hwsim: propagate genlmsg_reply return code
	net: set static variable an initial value in atl2_probe()
	tmpfs: fix uninitialized return value in shmem_link
	stm class: Prevent division by zero
	crypto: arm64/aes-ccm - fix logical bug in AAD MAC handling
	CIFS: Fix read after write for files with read caching
	tracing: Do not free iter->trace in fail path of tracing_open_pipe()
	ACPI / device_sysfs: Avoid OF modalias creation for removed device
	regulator: s2mps11: Fix steps for buck7, buck8 and LDO35
	regulator: s2mpa01: Fix step values for some LDOs
	clocksource/drivers/exynos_mct: Move one-shot check from tick clear to ISR
	clocksource/drivers/exynos_mct: Clear timer interrupt when shutdown
	s390/virtio: handle find on invalid queue gracefully
	scsi: virtio_scsi: don't send sc payload with tmfs
	scsi: target/iscsi: Avoid iscsit_release_commands_from_conn() deadlock
	m68k: Add -ffreestanding to CFLAGS
	btrfs: ensure that a DUP or RAID1 block group has exactly two stripes
	Btrfs: fix corruption reading shared and compressed extents after hole punching
	crypto: pcbc - remove bogus memcpy()s with src == dest
	cpufreq: tegra124: add missing of_node_put()
	cpufreq: pxa2xx: remove incorrect __init annotation
	ext4: fix crash during online resizing
	ext2: Fix underflow in ext2_max_size()
	clk: ingenic: Fix round_rate misbehaving with non-integer dividers
	dmaengine: usb-dmac: Make DMAC system sleep callbacks explicit
	mm/vmalloc: fix size check for remap_vmalloc_range_partial()
	kernel/sysctl.c: add missing range check in do_proc_dointvec_minmax_conv
	intel_th: Don't reference unassigned outputs
	parport_pc: fix find_superio io compare code, should use equal test.
	i2c: tegra: fix maximum transfer size
	perf bench: Copy kernel files needed to build mem{cpy,set} x86_64 benchmarks
	serial: 8250_pci: Fix number of ports for ACCES serial cards
	serial: 8250_pci: Have ACCES cards that use the four port Pericom PI7C9X7954 chip use the pci_pericom_setup()
	jbd2: clear dirty flag when revoking a buffer from an older transaction
	jbd2: fix compile warning when using JBUFFER_TRACE
	powerpc/32: Clear on-stack exception marker upon exception return
	powerpc/wii: properly disable use of BATs when requested.
	powerpc/powernv: Make opal log only readable by root
	powerpc/83xx: Also save/restore SPRG4-7 during suspend
	ARM: s3c24xx: Fix boolean expressions in osiris_dvs_notify
	dm: fix to_sector() for 32bit
	NFS41: pop some layoutget errors to application
	perf intel-pt: Fix CYC timestamp calculation after OVF
	perf auxtrace: Define auxtrace record alignment
	perf intel-pt: Fix overlap calculation for padding
	md: Fix failed allocation of md_register_thread
	NFS: Fix an I/O request leakage in nfs_do_recoalesce
	NFS: Don't recoalesce on error in nfs_pageio_complete_mirror()
	nfsd: fix memory corruption caused by readdir
	nfsd: fix wrong check in write_v4_end_grace()
	PM / wakeup: Rework wakeup source timer cancellation
	rcu: Do RCU GP kthread self-wakeup from softirq and interrupt
	media: uvcvideo: Avoid NULL pointer dereference at the end of streaming
	drm/radeon/evergreen_cs: fix missing break in switch statement
	KVM: nVMX: Sign extend displacements of VMX instr's mem operands
	KVM: nVMX: Ignore limit checks on VMX instructions using flat segments
	KVM: X86: Fix residual mmio emulation request to userspace
	Linux 4.4.177

Change-Id: Ia33b88c9634e04612874d79ce4cc166e8aa8096a
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-03-23 09:28:32 +01:00
Roman Penyaev
49b3c4a292 mm/vmalloc: fix size check for remap_vmalloc_range_partial()
commit 401592d2e095947344e10ec0623adbcd58934dd4 upstream.

When VM_NO_GUARD is not set area->size includes adjacent guard page,
thus for correct size checking get_vm_area_size() should be used, but
not area->size.

This fixes possible kernel oops when userspace tries to mmap an area on
1 page bigger than was allocated by vmalloc_user() call: the size check
inside remap_vmalloc_range_partial() accounts non-existing guard page
also, so check successfully passes but vmalloc_to_page() returns NULL
(guard page does not physically exist).

The following code pattern example should trigger an oops:

  static int oops_mmap(struct file *file, struct vm_area_struct *vma)
  {
        void *mem;

        mem = vmalloc_user(4096);
        BUG_ON(!mem);
        /* Do not care about mem leak */

        return remap_vmalloc_range(vma, mem, 0);
  }

And userspace simply mmaps size + PAGE_SIZE:

  mmap(NULL, 8192, PROT_WRITE|PROT_READ, MAP_PRIVATE, fd, 0);

Possible candidates for oops which do not have any explicit size
checks:

   *** drivers/media/usb/stkwebcam/stk-webcam.c:
   v4l_stk_mmap[789]   ret = remap_vmalloc_range(vma, sbuf->buffer, 0);

Or the following one:

   *** drivers/video/fbdev/core/fbmem.c
   static int
   fb_mmap(struct file *file, struct vm_area_struct * vma)
        ...
        res = fb->fb_mmap(info, vma);

Where fb_mmap callback calls remap_vmalloc_range() directly without any
explicit checks:

   *** drivers/video/fbdev/vfb.c
   static int vfb_mmap(struct fb_info *info,
             struct vm_area_struct *vma)
   {
       return remap_vmalloc_range(vma, (void *)info->fix.smem_start, vma->vm_pgoff);
   }

Link: http://lkml.kernel.org/r/20190103145954.16942-2-rpenyaev@suse.de
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joe Perches <joe@perches.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23 08:44:37 +01:00
Darrick J. Wong
5f4c9964d1 tmpfs: fix uninitialized return value in shmem_link
[ Upstream commit 29b00e609960ae0fcff382f4c7079dd0874a5311 ]

When we made the shmem_reserve_inode call in shmem_link conditional, we
forgot to update the declaration for ret so that it always has a known
value.  Dan Carpenter pointed out this deficiency in the original patch.

Fixes: 1062af920c07 ("tmpfs: fix link accounting when a tmpfile is linked in")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Matej Kupljen <matej.kupljen@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-23 08:44:34 +01:00
Darrick J. Wong
f8f413336b tmpfs: fix link accounting when a tmpfile is linked in
[ Upstream commit 1062af920c07f5b54cf5060fde3339da6df0cf6b ]

tmpfs has a peculiarity of accounting hard links as if they were
separate inodes: so that when the number of inodes is limited, as it is
by default, a user cannot soak up an unlimited amount of unreclaimable
dcache memory just by repeatedly linking a file.

But when v3.11 added O_TMPFILE, and the ability to use linkat() on the
fd, we missed accommodating this new case in tmpfs: "df -i" shows that
an extra "inode" remains accounted after the file is unlinked and the fd
closed and the actual inode evicted.  If a user repeatedly links
tmpfiles into a tmpfs, the limit will be hit (ENOSPC) even after they
are deleted.

Just skip the extra reservation from shmem_link() in this case: there's
a sense in which this first link of a tmpfile is then cheaper than a
hard link of another file, but the accounting works out, and there's
still good limiting, so no need to do anything more complicated.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1902182134370.7035@eggly.anvils
Fixes: f4e0c30c19 ("allow the temp files created by open() to be linked to")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Matej Kupljen <matej.kupljen@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-23 08:44:34 +01:00
Mikhail Zaslonko
fcd1132557 mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone
[ Upstream commit 24feb47c5fa5b825efb0151f28906dfdad027e61 ]

If memory end is not aligned with the sparse memory section boundary,
the mapping of such a section is only partly initialized.  This may lead
to VM_BUG_ON due to uninitialized struct pages access from
test_pages_in_a_zone() function triggered by memory_hotplug sysfs
handlers.

Here are the the panic examples:
 CONFIG_DEBUG_VM_PGFLAGS=y
 kernel parameter mem=2050M
 --------------------------
 page:000003d082008000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
   test_pages_in_a_zone+0xde/0x160
   show_valid_zones+0x5c/0x190
   dev_attr_show+0x34/0x70
   sysfs_kf_seq_show+0xc8/0x148
   seq_read+0x204/0x480
   __vfs_read+0x32/0x178
   vfs_read+0x82/0x138
   ksys_read+0x5a/0xb0
   system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
   test_pages_in_a_zone+0xde/0x160
 Kernel panic - not syncing: Fatal exception: panic_on_oops

Fix this by checking whether the pfn to check is within the zone.

[mhocko@suse.com: separated this change from http://lkml.kernel.org/r/20181105150401.97287-2-zaslonko@linux.ibm.com]
Link: http://lkml.kernel.org/r/20190128144506.15603-3-mhocko@kernel.org

[mhocko@suse.com: separated this change from
http://lkml.kernel.org/r/20181105150401.97287-2-zaslonko@linux.ibm.com]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-23 08:44:26 +01:00
Michal Hocko
3ba0452668 mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone
[ Upstream commit efad4e475c312456edb3c789d0996d12ed744c13 ]

Patch series "mm, memory_hotplug: fix uninitialized pages fallouts", v2.

Mikhail Zaslonko has posted fixes for the two bugs quite some time ago
[1].  I have pushed back on those fixes because I believed that it is
much better to plug the problem at the initialization time rather than
play whack-a-mole all over the hotplug code and find all the places
which expect the full memory section to be initialized.

We have ended up with commit 2830bf6f05fb ("mm, memory_hotplug:
initialize struct pages for the full memory section") merged and cause a
regression [2][3].  The reason is that there might be memory layouts
when two NUMA nodes share the same memory section so the merged fix is
simply incorrect.

In order to plug this hole we really have to be zone range aware in
those handlers.  I have split up the original patch into two.  One is
unchanged (patch 2) and I took a different approach for `removable'
crash.

[1] http://lkml.kernel.org/r/20181105150401.97287-2-zaslonko@linux.ibm.com
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1666948
[3] http://lkml.kernel.org/r/20190125163938.GA20411@dhcp22.suse.cz

This patch (of 2):

Mikhail has reported the following VM_BUG_ON triggered when reading sysfs
removable state of a memory block:

 page:000003d08300c000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
   is_mem_section_removable+0xb4/0x190
   show_mem_removable+0x9a/0xd8
   dev_attr_show+0x34/0x70
   sysfs_kf_seq_show+0xc8/0x148
   seq_read+0x204/0x480
   __vfs_read+0x32/0x178
   vfs_read+0x82/0x138
   ksys_read+0x5a/0xb0
   system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
   is_mem_section_removable+0xb4/0x190
 Kernel panic - not syncing: Fatal exception: panic_on_oops

The reason is that the memory block spans the zone boundary and we are
stumbling over an unitialized struct page.  Fix this by enforcing zone
range in is_mem_section_removable so that we never run away from a zone.

Link: http://lkml.kernel.org/r/20190128144506.15603-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Debugged-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-23 08:44:26 +01:00
Mike Kravetz
aba029c8e7 hugetlbfs: fix races and page leaks during migration
commit cb6acd01e2e43fd8bad11155752b7699c3d0fb76 upstream.

hugetlb pages should only be migrated if they are 'active'.  The
routines set/clear_page_huge_active() modify the active state of hugetlb
pages.

When a new hugetlb page is allocated at fault time, set_page_huge_active
is called before the page is locked.  Therefore, another thread could
race and migrate the page while it is being added to page table by the
fault code.  This race is somewhat hard to trigger, but can be seen by
strategically adding udelay to simulate worst case scheduling behavior.
Depending on 'how' the code races, various BUG()s could be triggered.

To address this issue, simply delay the set_page_huge_active call until
after the page is successfully added to the page table.

Hugetlb pages can also be leaked at migration time if the pages are
associated with a file in an explicitly mounted hugetlbfs filesystem.
For example, consider a two node system with 4GB worth of huge pages
available.  A program mmaps a 2G file in a hugetlbfs filesystem.  It
then migrates the pages associated with the file from one node to
another.  When the program exits, huge page counts are as follows:

  node0
  1024    free_hugepages
  1024    nr_hugepages

  node1
  0       free_hugepages
  1024    nr_hugepages

  Filesystem                         Size  Used Avail Use% Mounted on
  nodev                              4.0G  2.0G  2.0G  50% /var/opt/hugepool

That is as expected.  2G of huge pages are taken from the free_hugepages
counts, and 2G is the size of the file in the explicitly mounted
filesystem.  If the file is then removed, the counts become:

  node0
  1024    free_hugepages
  1024    nr_hugepages

  node1
  1024    free_hugepages
  1024    nr_hugepages

  Filesystem                         Size  Used Avail Use% Mounted on
  nodev                              4.0G  2.0G  2.0G  50% /var/opt/hugepool

Note that the filesystem still shows 2G of pages used, while there
actually are no huge pages in use.  The only way to 'fix' the filesystem
accounting is to unmount the filesystem

If a hugetlb page is associated with an explicitly mounted filesystem,
this information in contained in the page_private field.  At migration
time, this information is not preserved.  To fix, simply transfer
page_private from old to new page at migration time if necessary.

There is a related race with removing a huge page from a file and
migration.  When a huge page is removed from the pagecache, the
page_mapping() field is cleared, yet page_private remains set until the
page is actually freed by free_huge_page().  A page could be migrated
while in this state.  However, since page_mapping() is not set the
hugetlbfs specific routine to transfer page_private is not called and we
leak the page count in the filesystem.

To fix that, check for this condition before migrating a huge page.  If
the condition is detected, return EBUSY for the page.

Link: http://lkml.kernel.org/r/74510272-7319-7372-9ea6-ec914734c179@oracle.com
Link: http://lkml.kernel.org/r/20190212221400.3512-1-mike.kravetz@oracle.com
Fixes: bcc5422230 ("mm: hugetlb: introduce page_huge_active")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <stable@vger.kernel.org>
[mike.kravetz@oracle.com: v2]
  Link: http://lkml.kernel.org/r/7534d322-d782-8ac6-1c8d-a8dc380eb3ab@oracle.com
[mike.kravetz@oracle.com: update comment and changelog]
  Link: http://lkml.kernel.org/r/420bcfd6-158b-38e4-98da-26d0cd85bd01@oracle.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23 08:44:23 +01:00
Jann Horn
40952b6a64 mm: enforce min addr even if capable() in expand_downwards()
commit 0a1d52994d440e21def1c2174932410b4f2a98a1 upstream.

security_mmap_addr() does a capability check with current_cred(), but
we can reach this code from contexts like a VFS write handler where
current_cred() must not be used.

This can be abused on systems without SMAP to make NULL pointer
dereferences exploitable again.

Fixes: 8869477a49 ("security: protect from stack expansion into low vm addresses")
Cc: stable@kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23 08:44:21 +01:00
Ralph Campbell
d3f2228a22 numa: change get_mempolicy() to use nr_node_ids instead of MAX_NUMNODES
commit 050c17f239fd53adb55aa768d4f41bc76c0fe045 upstream.

The system call, get_mempolicy() [1], passes an unsigned long *nodemask
pointer and an unsigned long maxnode argument which specifies the length
of the user's nodemask array in bits (which is rounded up).  The manual
page says that if the maxnode value is too small, get_mempolicy will
return EINVAL but there is no system call to return this minimum value.
To determine this value, some programs search /proc/<pid>/status for a
line starting with "Mems_allowed:" and use the number of digits in the
mask to determine the minimum value.  A recent change to the way this line
is formatted [2] causes these programs to compute a value less than
MAX_NUMNODES so get_mempolicy() returns EINVAL.

Change get_mempolicy(), the older compat version of get_mempolicy(), and
the copy_nodes_to_user() function to use nr_node_ids instead of
MAX_NUMNODES, thus preserving the defacto method of computing the minimum
size for the nodemask array and the maxnode argument.

[1] http://man7.org/linux/man-pages/man2/get_mempolicy.2.html
[2] https://lore.kernel.org/lkml/1545405631-6808-1-git-send-email-longman@redhat.com

Link: http://lkml.kernel.org/r/20190211180245.22295-1-rcampbell@nvidia.com
Fixes: 4fb8e5b89bcbbbb ("include/linux/nodemask.h: use nr_node_ids (not MAX_NUMNODES) in __nodemask_pr_numnodes()")
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Waiman Long <longman@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23 08:44:15 +01:00
Greg Kroah-Hartman
a95e76973d This is the 4.4.173 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlxbKr0ACgkQONu9yGCS
 aT5TfBAAhlyPx+CrOsKhOi9zCb2ZkLrAwMQ8E1LpiHOnCDgzt75zGempUqwVAKq5
 JmRay3Tt/YDK5+cDuT/3/ahHXcS3xvyJ/8kSanyPfB0KMkNL1nv1fU0oAb4+OLm6
 C63YmUpFQPGyD8R3BLmeIcBIUvEF0l+eZB3lrBjVz+tUKhuiIiBW6NtaHTyOhA9C
 KUXHN53CuZG4p7xdaevH5yt43sJGkb9FNDblaCLS9AVC1LiVOBGz/LSXiAiJfyU1
 u6zl9U9ZL33oU+cRbz2pulfsd+8CZpZEONPDjzDN11ahA+W8HQ81JabO1bZKkY9h
 geshJxrPM06/WS/NxEEPV1/MKPIuSDBxCdOMuGPzXTkpE1YB2EZRU6ONc1I11cYV
 hESoSjSMSbVRHfPANjgTz9DauvT7+CBkjZNAgfjT4gKDeIcQhvQXPOcfNnuCfFww
 eIdFdvxcBA2mCLz5lmkkH5tlN9fY7Bw7Y5eKknIoMSKGfckCUq5idEVDpTMKjFbP
 fcPk5u2MFDmI+EuVQ4FO5bY4cDqRXRpyDYFVl3OKTj5pyU5gfN30WGpNZ4U9sLFb
 kXXvoAwjiOmP/7H1fWB28C1Pnz1GaKImFhNmWKaOAfVWUEFto6otVqYRV1najAIv
 j1Hq44h47iqhDgAYgL2QtYiaGiyUJfd4lvGYzjM5OwJOEQyeErA=
 =bEr6
 -----END PGP SIGNATURE-----

Merge 4.4.173 into android-4.4-p

Changes in 4.4.173
	net: Fix usage of pskb_trim_rcsum
	openvswitch: Avoid OOB read when parsing flow nlattrs
	net: ipv4: Fix memory leak in network namespace dismantle
	net_sched: refetch skb protocol for each filter
	net: bridge: Fix ethernet header pointer before check skb forwardable
	USB: serial: simple: add Motorola Tetra TPG2200 device id
	USB: serial: pl2303: add new PID to support PL2303TB
	ASoC: atom: fix a missing check of snd_pcm_lib_malloc_pages
	ARC: perf: map generic branches to correct hardware condition
	s390/early: improve machine detection
	s390/smp: fix CPU hotplug deadlock with CPU rescan
	char/mwave: fix potential Spectre v1 vulnerability
	staging: rtl8188eu: Add device code for D-Link DWA-121 rev B1
	tty: Handle problem if line discipline does not have receive_buf
	tty/n_hdlc: fix __might_sleep warning
	CIFS: Fix possible hang during async MTU reads and writes
	Input: xpad - add support for SteelSeries Stratus Duo
	KVM: x86: Fix single-step debugging
	x86/kaslr: Fix incorrect i8254 outb() parameters
	can: dev: __can_get_echo_skb(): fix bogous check for non-existing skb by removing it
	can: bcm: check timer values before ktime conversion
	vt: invoke notifier on screen size change
	perf unwind: Unwind with libdw doesn't take symfs into account
	perf unwind: Take pgoff into account when reporting elf to libdwfl
	irqchip/gic-v3-its: Align PCI Multi-MSI allocation on their size
	arm64: mm: remove page_mapping check in __sync_icache_dcache
	f2fs: read page index before freeing
	Revert "loop: Fix double mutex_unlock(&loop_ctl_mutex) in loop_control_ioctl()"
	Revert "loop: Get rid of loop_index_mutex"
	Revert "loop: Fold __loop_release into loop_release"
	s390/smp: Fix calling smp_call_ipl_cpu() from ipl CPU
	fs: add the fsnotify call to vfs_iter_write
	ipv6: Consider sk_bound_dev_if when binding a socket to an address
	l2tp: copy 4 more bytes to linear part if necessary
	net/mlx4_core: Add masking for a few queries on HCA caps
	netrom: switch to sock timer API
	net/rose: fix NULL ax25_cb kernel panic
	ucc_geth: Reset BQL queue when stopping device
	l2tp: remove l2specific_len dependency in l2tp_core
	l2tp: fix reading optional fields of L2TPv3
	CIFS: Do not count -ENODATA as failure for query directory
	fs/dcache: Fix incorrect nr_dentry_unused accounting in shrink_dcache_sb()
	ARM: cns3xxx: Fix writing to wrong PCI config registers after alignment
	arm64: hyp-stub: Forbid kprobing of the hyp-stub
	gfs2: Revert "Fix loop in gfs2_rbm_find"
	platform/x86: asus-nb-wmi: Map 0x35 to KEY_SCREENLOCK
	platform/x86: asus-nb-wmi: Drop mapping of 0x33 and 0x34 scan codes
	mmc: sdhci-iproc: handle mmc_of_parse() errors during probe
	kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
	mm, oom: fix use-after-free in oom_kill_process
	cifs: Always resolve hostname before reconnecting
	drivers: core: Remove glue dirs from sysfs earlier
	mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
	fs: don't scan the inode cache before SB_BORN is set
	Linux 4.4.173

Change-Id: Id606123657ad357fd2cd5f665a725f78b7c3e819
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-02-07 09:39:13 +01:00
David Hildenbrand
57d8138631 mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
commit e0a352fabce61f730341d119fbedf71ffdb8663f upstream.

We had a race in the old balloon compaction code before b1123ea6d3b3
("mm: balloon: use general non-lru movable page feature") refactored it
that became visible after backporting 195a8c43e93d ("virtio-balloon:
deflate via a page list") without the refactoring.

The bug existed from commit d6d86c0a7f ("mm/balloon_compaction:
redesign ballooned pages management") till b1123ea6d3b3 ("mm: balloon:
use general non-lru movable page feature").  d6d86c0a7f
("mm/balloon_compaction: redesign ballooned pages management") was
backported to 3.12, so the broken kernels are stable kernels [3.12 -
4.7].

There was a subtle race between dropping the page lock of the newpage in
__unmap_and_move() and checking for __is_movable_balloon_page(newpage).

Just after dropping this page lock, virtio-balloon could go ahead and
deflate the newpage, effectively dequeueing it and clearing PageBalloon,
in turn making __is_movable_balloon_page(newpage) fail.

This resulted in dropping the reference of the newpage via
putback_lru_page(newpage) instead of put_page(newpage), leading to
page->lru getting modified and a !LRU page ending up in the LRU lists.
With 195a8c43e93d ("virtio-balloon: deflate via a page list")
backported, one would suddenly get corrupted lists in
release_pages_balloon():

- WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
- list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100

Nowadays this race is no longer possible, but it is hidden behind very
ugly handling of __ClearPageMovable() and __PageMovable().

__ClearPageMovable() will not make __PageMovable() fail, only
PageMovable().  So the new check (__PageMovable(newpage)) will still
hold even after newpage was dequeued by virtio-balloon.

If anybody would ever change that special handling, the BUG would be
introduced again.  So instead, make it explicit and use the information
of the original isolated page before migration.

This patch can be backported fairly easy to stable kernels (in contrast
to the refactoring).

Link: http://lkml.kernel.org/r/20190129233217.10747-1-david@redhat.com
Fixes: d6d86c0a7f ("mm/balloon_compaction: redesign ballooned pages management")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Vratislav Bendel <vbendel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vratislav Bendel <vbendel@redhat.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>	[3.12 - 4.7]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-06 19:43:08 +01:00
Shakeel Butt
c3ef8a44e7 mm, oom: fix use-after-free in oom_kill_process
commit cefc7ef3c87d02fc9307835868ff721ea12cc597 upstream.

Syzbot instance running on upstream kernel found a use-after-free bug in
oom_kill_process.  On further inspection it seems like the process
selected to be oom-killed has exited even before reaching
read_lock(&tasklist_lock) in oom_kill_process().  More specifically the
tsk->usage is 1 which is due to get_task_struct() in oom_evaluate_task()
and the put_task_struct within for_each_thread() frees the tsk and
for_each_thread() tries to access the tsk.  The easiest fix is to do
get/put across the for_each_thread() on the selected task.

Now the next question is should we continue with the oom-kill as the
previously selected task has exited? However before adding more
complexity and heuristics, let's answer why we even look at the children
of oom-kill selected task? The select_bad_process() has already selected
the worst process in the system/memcg.  Due to race, the selected
process might not be the worst at the kill time but does that matter?
The userspace can use the oom_score_adj interface to prefer children to
be killed before the parent.  I looked at the history but it seems like
this is there before git history.

Link: http://lkml.kernel.org/r/20190121215850.221745-1-shakeelb@google.com
Reported-by: syzbot+7fbbfa368521945f0e3d@syzkaller.appspotmail.com
Fixes: 6b0c81b3be ("mm, oom: reduce dependency on tasklist_lock")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-06 19:43:07 +01:00
Greg Kroah-Hartman
ad09d8c684 This is the 4.4.172 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlxMHY8ACgkQONu9yGCS
 aT5fqQ/9HVnzHF3hIBQ5Rl0cH4I0pxTCKJMk4BF5JiyWgowEW2gK5qL7a7Lyrovq
 L5h5Vagu6YSFzq3pMtwO2qLJntP2gvOxIBjgPLhyrJDOcfOYyPwPfDojZ7FIkR5d
 U8qpWKjb6q060hDRgCUFF0fCGNyyGV3/2/7tsaiAev7qcpvDwCL26HA8A7rZvFS4
 FySlOAIo3HDwzL/jNCydbdOgT6Tmwggy5Ggccf9GFM2zOxctwbxn5UhUAGOa8DD9
 S+V8+vMYnpl1CdsR4f5W+LzIZ+NF3Xb/JlzyCraP2sysi+28LUUqA+3X5GMnZfVp
 DBDVqvzSfjxioWu+W1AQ/3ww44mds+fRuNkxMc9O7eTGfYB3dvGZsmPeWAscp86G
 xiEIRI5gbTsv8RFJojg9gIzeGghx23LvBXOvVT4/j7vo6yeeDKJfVPz0nc4GgTpU
 Z8YcrRLFCpc90FLDZER8FLOXSi0u1d9SKN0UwVSkKgGqWQFxHroKKoyaev2KZo8c
 nKTa5/dCJU8H5a6p/4frxmkMYPEatEHFlsjjGbM2xzv+FORm16Uo9kR/FXP4w/i3
 Y6nuICM0/KSarSHVpWCmxwE3SWwa0ek9bMxTyYRWWluSjWmd0AtYhIAceGFsbYqv
 4SYCPjHXq2gojvzXeTCML1okpJOGRZlnyzaQqhpvoo7OkV4ZIHE=
 =2AX1
 -----END PGP SIGNATURE-----

Merge 4.4.172 into android-4.4-p

Changes in 4.4.172
	tty/ldsem: Wake up readers after timed out down_write()
	can: gw: ensure DLC boundaries after CAN frame modification
	f2fs: clean up argument of recover_data
	f2fs: cover more area with nat_tree_lock
	f2fs: move sanity checking of cp into get_valid_checkpoint
	f2fs: fix to convert inline directory correctly
	f2fs: give -EINVAL for norecovery and rw mount
	f2fs: remove an obsolete variable
	f2fs: factor out fsync inode entry operations
	f2fs: fix inode cache leak
	f2fs: fix to avoid reading out encrypted data in page cache
	f2fs: not allow to write illegal blkaddr
	f2fs: avoid unneeded loop in build_sit_entries
	f2fs: use crc and cp version to determine roll-forward recovery
	f2fs: introduce get_checkpoint_version for cleanup
	f2fs: put directory inodes before checkpoint in roll-forward recovery
	f2fs: fix to determine start_cp_addr by sbi->cur_cp_pack
	f2fs: detect wrong layout
	f2fs: free meta pages if sanity check for ckpt is failed
	f2fs: fix race condition in between free nid allocator/initializer
	f2fs: return error during fill_super
	f2fs: check blkaddr more accuratly before issue a bio
	f2fs: sanity check on sit entry
	f2fs: enhance sanity_check_raw_super() to avoid potential overflow
	f2fs: clean up with is_valid_blkaddr()
	f2fs: introduce and spread verify_blkaddr
	f2fs: fix to do sanity check with secs_per_zone
	f2fs: fix to do sanity check with user_block_count
	f2fs: Add sanity_check_inode() function
	f2fs: fix to do sanity check with node footer and iblocks
	f2fs: fix to do sanity check with reserved blkaddr of inline inode
	f2fs: fix to do sanity check with block address in main area
	f2fs: fix to do sanity check with block address in main area v2
	f2fs: fix to do sanity check with cp_pack_start_sum
	f2fs: fix invalid memory access
	f2fs: fix missing up_read
	f2fs: fix validation of the block count in sanity_check_raw_super
	media: em28xx: Fix misplaced reset of dev->v4l::field_count
	proc: Remove empty line in /proc/self/status
	arm64/kvm: consistently handle host HCR_EL2 flags
	arm64: Don't trap host pointer auth use to EL2
	ipv6: fix kernel-infoleak in ipv6_local_error()
	net: bridge: fix a bug on using a neighbour cache entry without checking its state
	packet: Do not leak dev refcounts on error exit
	ip: on queued skb use skb_header_pointer instead of pskb_may_pull
	crypto: authencesn - Avoid twice completion call in decrypt path
	crypto: authenc - fix parsing key with misaligned rta_len
	btrfs: wait on ordered extents on abort cleanup
	Yama: Check for pid death before checking ancestry
	scsi: sd: Fix cache_type_store()
	mips: fix n32 compat_ipc_parse_version
	mfd: tps6586x: Handle interrupts on suspend
	Disable MSI also when pcie-octeon.pcie_disable on
	omap2fb: Fix stack memory disclosure
	media: vivid: fix error handling of kthread_run
	media: vivid: set min width/height to a value > 0
	LSM: Check for NULL cred-security on free
	media: vb2: vb2_mmap: move lock up
	sunrpc: handle ENOMEM in rpcb_getport_async
	selinux: fix GPF on invalid policy
	sctp: allocate sctp_sockaddr_entry with kzalloc
	tipc: fix uninit-value in tipc_nl_compat_link_reset_stats
	tipc: fix uninit-value in tipc_nl_compat_bearer_enable
	tipc: fix uninit-value in tipc_nl_compat_link_set
	tipc: fix uninit-value in tipc_nl_compat_name_table_dump
	tipc: fix uninit-value in tipc_nl_compat_doit
	block/loop: Use global lock for ioctl() operation.
	loop: Fold __loop_release into loop_release
	loop: Get rid of loop_index_mutex
	loop: Fix double mutex_unlock(&loop_ctl_mutex) in loop_control_ioctl()
	drm/fb-helper: Ignore the value of fb_var_screeninfo.pixclock
	media: vb2: be sure to unlock mutex on errors
	r8169: Add support for new Realtek Ethernet
	ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address
	ipv6: Take rcu_read_lock in __inet6_bind for mapped addresses
	xfs: don't fail when converting shortform attr to long form during ATTR_REPLACE
	platform/x86: asus-wmi: Tell the EC the OS will handle the display off hotkey
	e1000e: allow non-monotonic SYSTIM readings
	writeback: don't decrement wb->refcnt if !wb->bdi
	MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur
	arm64: perf: set suppress_bind_attrs flag to true
	jffs2: Fix use of uninitialized delayed_work, lockdep breakage
	pstore/ram: Do not treat empty buffers as valid
	powerpc/pseries/cpuidle: Fix preempt warning
	media: firewire: Fix app_info parameter type in avc_ca{,_app}_info
	net: call sk_dst_reset when set SO_DONTROUTE
	scsi: target: use consistent left-aligned ASCII INQUIRY data
	clk: imx6q: reset exclusive gates on init
	kconfig: fix file name and line number of warn_ignored_character()
	kconfig: fix memory leak when EOF is encountered in quotation
	mmc: atmel-mci: do not assume idle after atmci_request_end
	perf intel-pt: Fix error with config term "pt=0"
	perf svghelper: Fix unchecked usage of strncpy()
	perf parse-events: Fix unchecked usage of strncpy()
	dm kcopyd: Fix bug causing workqueue stalls
	dm snapshot: Fix excessive memory usage and workqueue stalls
	ALSA: bebob: fix model-id of unit for Apogee Ensemble
	sysfs: Disable lockdep for driver bind/unbind files
	scsi: megaraid: fix out-of-bound array accesses
	ocfs2: fix panic due to unrecovered local alloc
	mm/page-writeback.c: don't break integrity writeback on ->writepage() error
	mm, proc: be more verbose about unstable VMA flags in /proc/<pid>/smaps
	net: speed up skb_rbtree_purge()
	ipmi:ssif: Fix handling of multi-part return messages
	Linux 4.4.172

Change-Id: Icbea295f7501881279bdb3a111abfc96c6aa67fc
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-01-28 20:42:21 +01:00
Brian Foster
4c0b9a2eae mm/page-writeback.c: don't break integrity writeback on ->writepage() error
[ Upstream commit 3fa750dcf29e8606e3969d13d8e188cc1c0f511d ]

write_cache_pages() is used in both background and integrity writeback
scenarios by various filesystems.  Background writeback is mostly
concerned with cleaning a certain number of dirty pages based on various
mm heuristics.  It may not write the full set of dirty pages or wait for
I/O to complete.  Integrity writeback is responsible for persisting a set
of dirty pages before the writeback job completes.  For example, an
fsync() call must perform integrity writeback to ensure data is on disk
before the call returns.

write_cache_pages() unconditionally breaks out of its processing loop in
the event of a ->writepage() error.  This is fine for background
writeback, which had no strict requirements and will eventually come
around again.  This can cause problems for integrity writeback on
filesystems that might need to clean up state associated with failed page
writeouts.  For example, XFS performs internal delayed allocation
accounting before returning a ->writepage() error, where applicable.  If
the current writeback happens to be associated with an unmount and
write_cache_pages() completes the writeback prematurely due to error, the
filesystem is unmounted in an inconsistent state if dirty+delalloc pages
still exist.

To handle this problem, update write_cache_pages() to always process the
full set of pages for integrity writeback regardless of ->writepage()
errors.  Save the first encountered error and return it to the caller once
complete.  This facilitates XFS (or any other fs that expects integrity
writeback to process the entire set of dirty pages) to clean up its
internal state completely in the event of persistent mapping errors.
Background writeback continues to exit on the first error encountered.

[akpm@linux-foundation.org: fix typo in comment]
Link: http://lkml.kernel.org/r/20181116134304.32440-1-bfoster@redhat.com
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26 09:42:55 +01:00
Greg Kroah-Hartman
5076221ec5 This is the 4.4.171 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlw/nxwACgkQONu9yGCS
 aT4Ewg//RalHHTMCeO3R89/J/6sp4TI1/xplx+Z9pWyGZZpEAlbJG8DGjJkDn9DZ
 Lw9q5nwV7AQXZsoAqDGryMa4II2oS3B9tcUuZR1yu3qAQ2NsEvNrYTFq6IhbUegU
 SHbvtw8i4gVw7boBRskw1Y4Nzss4cBHhZ+GWie2w5rTbOX/yvnb0D36061s02dzk
 pho1MYcSImxdCvyY+WTKkeAnBFgmfKg94qRDL2JQ7FyDe3ZZ/2ANhcSAH7F2NCcL
 9ej5NbSvnSF1J90aoej0oYrqapfZExydcys7Y2o+KuALzEOYLvsFDrauTFJE50sD
 negZKySiePY+J+Cu0ibfekdeHggiB2FmQEKF1dFX6lgTvNZYeA3bUL0BN3BLiOWb
 /h1sbW7mUi9pH4nH51tPUfs0zn8Skqbte9gM3jq4jKW0oFzobuPjmD/UGDA5XYms
 06vNMUpSc2T5FvWUiR5hFb5KXS9HmmBKUnKmK6L/Wj0eyTqk+XBB3jd9eurNUgup
 yvu/ivvMH0UD6xSNL8Ubj13rDlBAaU/p3AU3Y5J4hpVTMvX/1B6m7yh/j8jknJAb
 XscqzGS8jZIxT5Z8cM1PnBYJN7dqFif+sgkqoElufgkPT5bVzA3wuPYasfykZlMF
 mGsy51pBcy9krrK+SHnqq81RZlu5pupyeFVObKGP0L7HJrSvg6U=
 =dyvz
 -----END PGP SIGNATURE-----

Merge 4.4.171 into android-4.4-p

Changes in 4.4.171
	ALSA: hda/realtek - Disable headset Mic VREF for headset mode of ALC225
	btrfs: cleanup, stop casting for extent_map->lookup everywhere
	btrfs: Enhance chunk validation check
	Btrfs: add validadtion checks for chunk loading
	Btrfs: check inconsistence between chunk and block group
	Btrfs: fix em leak in find_first_block_group
	Btrfs: detect corruption when non-root leaf has zero item
	Btrfs: check btree node's nritems
	Btrfs: fix BUG_ON in btrfs_mark_buffer_dirty
	Btrfs: memset to avoid stale content in btree node block
	Btrfs: improve check_node to avoid reading corrupted nodes
	Btrfs: kill BUG_ON in run_delayed_tree_ref
	Btrfs: memset to avoid stale content in btree leaf
	Btrfs: fix emptiness check for dirtied extent buffers at check_leaf()
	btrfs: struct-funcs, constify readers
	btrfs: Refactor check_leaf function for later expansion
	btrfs: Check if item pointer overlaps with the item itself
	btrfs: Add sanity check for EXTENT_DATA when reading out leaf
	btrfs: Add checker for EXTENT_CSUM
	btrfs: Move leaf and node validation checker to tree-checker.c
	btrfs: tree-checker: Enhance btrfs_check_node output
	btrfs: tree-checker: Fix false panic for sanity test
	btrfs: tree-checker: Add checker for dir item
	btrfs: tree-checker: use %zu format string for size_t
	btrfs: tree-check: reduce stack consumption in check_dir_item
	btrfs: tree-checker: Verify block_group_item
	btrfs: tree-checker: Detect invalid and empty essential trees
	btrfs: validate type when reading a chunk
	btrfs: Check that each block group has corresponding chunk at mount time
	btrfs: Verify that every chunk has corresponding block group at mount time
	btrfs: tree-checker: Check level for leaves and nodes
	btrfs: tree-checker: Fix misleading group system information
	CIFS: Do not hide EINTR after sending network packets
	cifs: Fix potential OOB access of lock element array
	usb: cdc-acm: send ZLP for Telit 3G Intel based modems
	USB: storage: don't insert sane sense for SPC3+ when bad sense specified
	USB: storage: add quirk for SMI SM3350
	USB: Add USB_QUIRK_DELAY_CTRL_MSG quirk for Corsair K70 RGB
	slab: alien caches must not be initialized if the allocation of the alien cache failed
	PCI: altera: Fix altera_pcie_link_is_up()
	PCI: altera: Reorder read/write functions
	PCI: altera: Check link status before retrain link
	PCI: altera: Poll for link up status after retraining the link
	PCI: altera: Poll for link training status after retraining the link
	PCI: altera: Rework config accessors for use without a struct pci_bus
	PCI: altera: Move retrain from fixup to altera_pcie_host_init()
	ACPI: power: Skip duplicate power resource references in _PRx
	i2c: dev: prevent adapter retries and timeout being set as minus value
	crypto: cts - fix crash on short inputs
	ext4: fix a potential fiemap/page fault deadlock w/ inline_data
	sunrpc: use-after-free in svc_process_common()
	Linux 4.4.171

Change-Id: I7c22502f517531eab581d07aee5f8b554a597e47
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-01-16 22:55:22 +01:00
Christoph Lameter
271137c038 slab: alien caches must not be initialized if the allocation of the alien cache failed
commit 09c2e76ed734a1d36470d257a778aaba28e86531 upstream.

Callers of __alloc_alien() check for NULL.  We must do the same check in
__alloc_alien_cache to avoid NULL pointer dereferences on allocation
failures.

Link: http://lkml.kernel.org/r/010001680f42f192-82b4e12e-1565-4ee0-ae1f-1e98974906aa-000000@email.amazonses.com
Fixes: 49dfc304ba ("slab: use the lock on alien_cache, instead of the lock on array_cache")
Fixes: c8522a3a58 ("Slab: introduce alloc_alien")
Signed-off-by: Christoph Lameter <cl@linux.com>
Reported-by: syzbot+d6ed4ec679652b4fd4e4@syzkaller.appspotmail.com
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-16 22:16:11 +01:00
Greg Kroah-Hartman
79e1682f69 This is the 4.4.170 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlw6/14ACgkQONu9yGCS
 aT7NGw//UntCx3g48kmf077FwEf+l85rCffBZDknDppUzHA6fsLbMLYOsrDytRo/
 rfhPZzmvIB8B3JtCJW3QZQiPUBy/TBJP4o6CNALbyYVkr9DVV0H6dqwN3R9tK3zm
 EHg1qaKheUgfBSe+wNGUdyQ2MiSeAGAW0AaAPGqsWthJVKU1MdL7xW85uAn5kfci
 rgMW+kFKEkVEQjzlbSGK1IZQpp5mqyZqXYaYcBkqf2TRmwJAVzwUP2EASrKaWyN7
 cdHSvJsy23fd21GhTd9982tqE8cLu95g3uV/NgdvSEoWynSLgK0hMHUL7ec8amO/
 CLaY0eAq/aJYXiUnKOPy15Uzznp97YyghFxxg59uxhXqKBAGxFJrkPi+Ct3GNVck
 uNtOpFNu2h46uRmFrRz7zyNt3gpQtJOXBnMSsJcI9BNYO1kuy3C5q6MwrhlzAltD
 WdXGQRtXrY7gd8KU3YYIkEgyTxSu1QlSdVutdWOoQD9shaF9vcT3jwl1SyAQWO7o
 YXSDeVrglAmSul0CelxJltE16yJGY6yAbzqP6gssH9zG9nszooSFG1L/UsJfOiMg
 iOEh7K2kwMlkd3HrxbnzaKDJ2hmpz/ipoJA3DKvzo1Dose6604szp5qFkWVQFRRs
 nbo7DZLvfQPe10UWObH/0FDyX5fi4GuxN+BEqVtNvNPHgSRmxPM=
 =eSCs
 -----END PGP SIGNATURE-----

Merge 4.4.170 into android-4.4-p

Changes in 4.4.170
	USB: hso: Fix OOB memory access in hso_probe/hso_get_config_data
	xhci: Don't prevent USB2 bus suspend in state check intended for USB3 only
	USB: serial: option: add GosunCn ZTE WeLink ME3630
	USB: serial: option: add HP lt4132
	USB: serial: option: add Simcom SIM7500/SIM7600 (MBIM mode)
	USB: serial: option: add Fibocom NL668 series
	USB: serial: option: add Telit LN940 series
	mmc: core: Reset HPI enabled state during re-init and in case of errors
	mmc: omap_hsmmc: fix DMA API warning
	gpio: max7301: fix driver for use with CONFIG_VMAP_STACK
	Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels
	x86/mtrr: Don't copy uninitialized gentry fields back to userspace
	drm/ioctl: Fix Spectre v1 vulnerabilities
	ip6mr: Fix potential Spectre v1 vulnerability
	ipv4: Fix potential Spectre v1 vulnerability
	ax25: fix a use-after-free in ax25_fillin_cb()
	ibmveth: fix DMA unmap error in ibmveth_xmit_start error path
	ieee802154: lowpan_header_create check must check daddr
	ipv6: explicitly initialize udp6_addr in udp_sock_create6()
	isdn: fix kernel-infoleak in capi_unlocked_ioctl
	netrom: fix locking in nr_find_socket()
	packet: validate address length
	packet: validate address length if non-zero
	sctp: initialize sin6_flowinfo for ipv6 addrs in sctp_inet6addr_event
	vhost: make sure used idx is seen before log in vhost_add_used_n()
	VSOCK: Send reset control packet when socket is partially bound
	xen/netfront: tolerate frags with no data
	gro_cell: add napi_disable in gro_cells_destroy
	sock: Make sock->sk_stamp thread-safe
	ALSA: rme9652: Fix potential Spectre v1 vulnerability
	ALSA: emu10k1: Fix potential Spectre v1 vulnerabilities
	ALSA: pcm: Fix potential Spectre v1 vulnerability
	ALSA: emux: Fix potential Spectre v1 vulnerabilities
	ALSA: hda: add mute LED support for HP EliteBook 840 G4
	ALSA: hda/tegra: clear pending irq handlers
	USB: serial: pl2303: add ids for Hewlett-Packard HP POS pole displays
	USB: serial: option: add Fibocom NL678 series
	usb: r8a66597: Fix a possible concurrency use-after-free bug in r8a66597_endpoint_disable()
	Input: elan_i2c - add ACPI ID for touchpad in ASUS Aspire F5-573G
	KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup
	perf pmu: Suppress potential format-truncation warning
	ext4: fix possible use after free in ext4_quota_enable
	ext4: missing unlock/put_page() in ext4_try_to_write_inline_data()
	ext4: fix EXT4_IOC_GROUP_ADD ioctl
	ext4: force inode writes when nfsd calls commit_metadata()
	spi: bcm2835: Fix race on DMA termination
	spi: bcm2835: Fix book-keeping of DMA termination
	spi: bcm2835: Avoid finishing transfer prematurely in IRQ mode
	cdc-acm: fix abnormal DATA RX issue for Mediatek Preloader.
	media: vivid: free bitmap_cap when updating std/timings/etc.
	MIPS: Ensure pmd_present() returns false after pmd_mknotpresent()
	MIPS: Align kernel load address to 64KB
	CIFS: Fix error mapping for SMB2_LOCK command which caused OFD lock problem
	x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested
	spi: bcm2835: Unbreak the build of esoteric configs
	powerpc: Fix COFF zImage booting on old powermacs
	ARM: imx: update the cpu power up timing setting on i.mx6sx
	Input: restore EV_ABS ABS_RESERVED
	checkstack.pl: fix for aarch64
	xfrm: Fix bucket count reported to userspace
	scsi: bnx2fc: Fix NULL dereference in error handling
	Input: omap-keypad - fix idle configuration to not block SoC idle states
	scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown
	fork: record start_time late
	hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined
	mm, devm_memremap_pages: mark devm_memremap_pages() EXPORT_SYMBOL_GPL
	mm, devm_memremap_pages: kill mapping "System RAM" support
	sunrpc: fix cache_head leak due to queued request
	sunrpc: use SVC_NET() in svcauth_gss_* functions
	crypto: x86/chacha20 - avoid sleeping with preemption disabled
	ALSA: cs46xx: Potential NULL dereference in probe
	ALSA: usb-audio: Avoid access before bLength check in build_audio_procunit()
	ALSA: usb-audio: Fix an out-of-bound read in create_composite_quirks
	dlm: fixed memory leaks after failed ls_remove_names allocation
	dlm: possible memory leak on error path in create_lkb()
	dlm: lost put_lkb on error path in receive_convert() and receive_unlock()
	dlm: memory leaks on error path in dlm_user_request()
	gfs2: Fix loop in gfs2_rbm_find
	b43: Fix error in cordic routine
	9p/net: put a lower bound on msize
	iommu/vt-d: Handle domain agaw being less than iommu agaw
	ceph: don't update importing cap's mseq when handing cap export
	genwqe: Fix size check
	intel_th: msu: Fix an off-by-one in attribute store
	power: supply: olpc_battery: correct the temperature units
	Linux 4.4.170

Change-Id: I33c9750483716a6c44b40fbea8e729f96af41f52
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-01-13 10:36:32 +01:00
Michal Hocko
060853fdd4 hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined
commit b15c87263a69272423771118c653e9a1d0672caa upstream.

We have received a bug report that an injected MCE about faulty memory
prevents memory offline to succeed on 4.4 base kernel.  The underlying
reason was that the HWPoison page has an elevated reference count and the
migration keeps failing.  There are two problems with that.  First of all
it is dubious to migrate the poisoned page because we know that accessing
that memory is possible to fail.  Secondly it doesn't make any sense to
migrate a potentially broken content and preserve the memory corruption
over to a new location.

Oscar has found out that 4.4 and the current upstream kernels behave
slightly differently with his simply testcase

===

int main(void)
{
        int ret;
        int i;
        int fd;
        char *array = malloc(4096);
        char *array_locked = malloc(4096);

        fd = open("/tmp/data", O_RDONLY);
        read(fd, array, 4095);

        for (i = 0; i < 4096; i++)
                array_locked[i] = 'd';

        ret = mlock((void *)PAGE_ALIGN((unsigned long)array_locked), sizeof(array_locked));
        if (ret)
                perror("mlock");

        sleep (20);

        ret = madvise((void *)PAGE_ALIGN((unsigned long)array_locked), 4096, MADV_HWPOISON);
        if (ret)
                perror("madvise");

        for (i = 0; i < 4096; i++)
                array_locked[i] = 'd';

        return 0;
}
===

+ offline this memory.

In 4.4 kernels he saw the hwpoisoned page to be returned back to the LRU
list
kernel:  [<ffffffff81019ac9>] dump_trace+0x59/0x340
kernel:  [<ffffffff81019e9a>] show_stack_log_lvl+0xea/0x170
kernel:  [<ffffffff8101ac71>] show_stack+0x21/0x40
kernel:  [<ffffffff8132bb90>] dump_stack+0x5c/0x7c
kernel:  [<ffffffff810815a1>] warn_slowpath_common+0x81/0xb0
kernel:  [<ffffffff811a275c>] __pagevec_lru_add_fn+0x14c/0x160
kernel:  [<ffffffff811a2eed>] pagevec_lru_move_fn+0xad/0x100
kernel:  [<ffffffff811a334c>] __lru_cache_add+0x6c/0xb0
kernel:  [<ffffffff81195236>] add_to_page_cache_lru+0x46/0x70
kernel:  [<ffffffffa02b4373>] extent_readpages+0xc3/0x1a0 [btrfs]
kernel:  [<ffffffff811a16d7>] __do_page_cache_readahead+0x177/0x200
kernel:  [<ffffffff811a18c8>] ondemand_readahead+0x168/0x2a0
kernel:  [<ffffffff8119673f>] generic_file_read_iter+0x41f/0x660
kernel:  [<ffffffff8120e50d>] __vfs_read+0xcd/0x140
kernel:  [<ffffffff8120e9ea>] vfs_read+0x7a/0x120
kernel:  [<ffffffff8121404b>] kernel_read+0x3b/0x50
kernel:  [<ffffffff81215c80>] do_execveat_common.isra.29+0x490/0x6f0
kernel:  [<ffffffff81215f08>] do_execve+0x28/0x30
kernel:  [<ffffffff81095ddb>] call_usermodehelper_exec_async+0xfb/0x130
kernel:  [<ffffffff8161c045>] ret_from_fork+0x55/0x80

And that latter confuses the hotremove path because an LRU page is
attempted to be migrated and that fails due to an elevated reference
count.  It is quite possible that the reuse of the HWPoisoned page is some
kind of fixed race condition but I am not really sure about that.

With the upstream kernel the failure is slightly different.  The page
doesn't seem to have LRU bit set but isolate_movable_page simply fails and
do_migrate_range simply puts all the isolated pages back to LRU and
therefore no progress is made and scan_movable_pages finds same set of
pages over and over again.

Fix both cases by explicitly checking HWPoisoned pages before we even try
to get reference on the page, try to unmap it if it is still mapped.  As
explained by Naoya:

: Hwpoison code never unmapped those for no big reason because
: Ksm pages never dominate memory, so we simply didn't have strong
: motivation to save the pages.

Also put WARN_ON(PageLRU) in case there is a race and we can hit LRU
HWPoison pages which shouldn't happen but I couldn't convince myself about
that.  Naoya has noted the following:

: Theoretically no such gurantee, because try_to_unmap() doesn't have a
: guarantee of success and then memory_failure() returns immediately
: when hwpoison_user_mappings fails.
: Or the following code (comes after hwpoison_user_mappings block) also impli=
: es
: that the target page can still have PageLRU flag.
:
:         /*
:          * Torn down by someone else?
:          */
:         if (PageLRU(p) && !PageSwapCache(p) && p->mapping =3D=3D NULL) {
:                 action_result(pfn, MF_MSG_TRUNCATED_LRU, MF_IGNORED);
:                 res =3D -EBUSY;
:                 goto out;
:         }
:
: So I think it's OK to keep "if (WARN_ON(PageLRU(page)))" block in
: current version of your patch.

Link: http://lkml.kernel.org/r/20181206120135.14079-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.com>
Debugged-by: Oscar Salvador <osalvador@suse.com>
Tested-by: Oscar Salvador <osalvador@suse.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-13 10:05:32 +01:00
Greg Kroah-Hartman
4074ea50e0 This is the 4.4.168 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlwYDToACgkQONu9yGCS
 aT79dhAAhjCCEjMpcWGXExuCryWYUKJGV6rI1Hk3o5+Jr6tu/dWnhCLQrrSLgyCR
 qbhBPW/MLpedxnoeLD0Kzo5XDvziB7dNrVgaure923N/Urst4JTH+hMBX6HHUPWY
 vGReKg0a6HNaKsitlTPQaZTNE0uJJ1oCO7mEWYkU571zWaiT8/MT/wo42Ruiab1/
 zw4YVlb74fdZRuazAmTIdszC8MxCoqBJQuzl0UvbKcPtosPdZLywi4Rw0LQNgdcf
 nO/FZE9GPYPw2G/yV3XMp3qs+vVtJpZQrwrF2xHHDfe7Hosk5bB9iEcl6iSbYvyw
 Eir1nD8YTD438sAcLgV3EDRguhQbgBcd23YHFPuyfJrZErZnshfp63iLLYIdZ1Mn
 OP47nilY1/FnvxIzJFn0aHlg+9Ix9RepmPWL31xqHb6a0HuYJRJY6ciLln9v+Mld
 jG4TtuxlGdkQzbkiSnNVbMVcsWMwX4OHmwQLteZvlzdj5bro5ko+8SVaio5TWBRB
 bA9Bw82mKw3BLvlhmgM0Rg0pwJpgXl88r6o5iq2zALVPCUOdFOedoHdCiPpwO1Hl
 eFUY2PYx1YZk8qZXX6eh0LhoHM1Lqyd7qSjDbekKGf1oBVUlLe3umhSGivp3j2is
 ei1usTw3uM3n6thSeIKPn565gyr/CwXbspo3Ym/YG+719a+XwNU=
 =2xi1
 -----END PGP SIGNATURE-----

Merge 4.4.168 into android-4.4-p

Changes in 4.4.168
	ipv6: Check available headroom in ip6_xmit() even without options
	net: 8139cp: fix a BUG triggered by changing mtu with network traffic
	net: phy: don't allow __set_phy_supported to add unsupported modes
	net: Prevent invalid access to skb->prev in __qdisc_drop_all
	rtnetlink: ndo_dflt_fdb_dump() only work for ARPHRD_ETHER devices
	tcp: fix NULL ref in tail loss probe
	tun: forbid iface creation with rtnl ops
	neighbour: Avoid writing before skb->head in neigh_hh_output()
	ARM: OMAP2+: prm44xx: Fix section annotation on omap44xx_prm_enable_io_wakeup
	ARM: OMAP1: ams-delta: Fix possible use of uninitialized field
	sysv: return 'err' instead of 0 in __sysv_write_inode
	s390/cpum_cf: Reject request for sampling in event initialization
	hwmon: (ina2xx) Fix current value calculation
	ASoC: dapm: Recalculate audio map forcely when card instantiated
	hwmon: (w83795) temp4_type has writable permission
	Btrfs: send, fix infinite loop due to directory rename dependencies
	ASoC: omap-mcpdm: Add pm_qos handling to avoid under/overruns with CPU_IDLE
	ASoC: omap-dmic: Add pm_qos handling to avoid overruns with CPU_IDLE
	exportfs: do not read dentry after free
	bpf: fix check of allowed specifiers in bpf_trace_printk
	USB: omap_udc: use devm_request_irq()
	USB: omap_udc: fix crashes on probe error and module removal
	USB: omap_udc: fix omap_udc_start() on 15xx machines
	USB: omap_udc: fix USB gadget functionality on Palm Tungsten E
	KVM: x86: fix empty-body warnings
	net: thunderx: fix NULL pointer dereference in nic_remove
	ixgbe: recognize 1000BaseLX SFP modules as 1Gbps
	net: hisilicon: remove unexpected free_netdev
	drm/ast: fixed reading monitor EDID not stable issue
	xen: xlate_mmu: add missing header to fix 'W=1' warning
	fscache: fix race between enablement and dropping of object
	fscache, cachefiles: remove redundant variable 'cache'
	ocfs2: fix deadlock caused by ocfs2_defrag_extent()
	hfs: do not free node before using
	hfsplus: do not free node before using
	debugobjects: avoid recursive calls with kmemleak
	ocfs2: fix potential use after free
	pstore: Convert console write to use ->write_buf
	ALSA: pcm: remove SNDRV_PCM_IOCTL1_INFO internal command
	KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC
	KVM: nVMX: mark vmcs12 pages dirty on L2 exit
	KVM: nVMX: Eliminate vmcs02 pool
	KVM: VMX: introduce alloc_loaded_vmcs
	KVM: VMX: make MSR bitmaps per-VCPU
	KVM/x86: Add IBPB support
	KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
	KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
	KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL
	KVM/x86: Remove indirect MSR op calls from SPEC_CTRL
	x86: reorganize SMAP handling in user space accesses
	x86: fix SMAP in 32-bit environments
	x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec
	x86/usercopy: Replace open coded stac/clac with __uaccess_{begin, end}
	x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec
	x86/bugs, KVM: Support the combination of guest and host IBRS
	x86/KVM/VMX: Expose SPEC_CTRL Bit(2) to the guest
	KVM: SVM: Move spec control call after restore of GS
	x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL
	x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP
	KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD
	bpf: support 8-byte metafield access
	bpf/verifier: Add spi variable to check_stack_write()
	bpf/verifier: Pass instruction index to check_mem_access() and check_xadd()
	bpf: Prevent memory disambiguation attack
	wil6210: missing length check in wmi_set_ie
	posix-timers: Sanitize overrun handling
	mm/hugetlb.c: don't call region_abort if region_chg fails
	hugetlbfs: fix offset overflow in hugetlbfs mmap
	hugetlbfs: check for pgoff value overflow
	hugetlbfs: fix bug in pgoff overflow checking
	swiotlb: clean up reporting
	sr: pass down correctly sized SCSI sense buffer
	mm: remove write/force parameters from __get_user_pages_locked()
	mm: remove write/force parameters from __get_user_pages_unlocked()
	mm/nommu.c: Switch __get_user_pages_unlocked() to use __get_user_pages()
	mm: replace get_user_pages_unlocked() write/force parameters with gup_flags
	mm: replace get_user_pages_locked() write/force parameters with gup_flags
	mm: replace get_vaddr_frames() write/force parameters with gup_flags
	mm: replace get_user_pages() write/force parameters with gup_flags
	mm: replace __access_remote_vm() write parameter with gup_flags
	mm: replace access_remote_vm() write parameter with gup_flags
	proc: don't use FOLL_FORCE for reading cmdline and environment
	proc: do not access cmdline nor environ from file-backed areas
	media: dvb-frontends: fix i2c access helpers for KASAN
	matroxfb: fix size of memcpy
	staging: speakup: Replace strncpy with memcpy
	rocker: fix rocker_tlv_put_* functions for KASAN
	selftests: Move networking/timestamping from Documentation
	Linux 4.4.168

Change-Id: Icd04a723739ae5e38258a2f6b0aee875f306a0bc
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-12-19 19:22:06 +01:00
Willy Tarreau
adc143b97d proc: do not access cmdline nor environ from file-backed areas
commit 7f7ccc2ccc2e70c6054685f5e3522efa81556830 upstream.

proc_pid_cmdline_read() and environ_read() directly access the target
process' VM to retrieve the command line and environment. If this
process remaps these areas onto a file via mmap(), the requesting
process may experience various issues such as extra delays if the
underlying device is slow to respond.

Let's simply refuse to access file-backed areas in these functions.
For this we add a new FOLL_ANON gup flag that is passed to all calls
to access_remote_vm(). The code already takes care of such failures
(including unmapped areas). Accesses via /proc/pid/mem were not
changed though.

This was assigned CVE-2018-1120.

Note for stable backports: the patch may apply to kernels prior to 4.11
but silently miss one location; it must be checked that no call to
access_remote_vm() keeps zero as the last argument.

Reported-by: Qualys Security Advisory <qsa@qualys.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 4.4:
 - Update the extra call to access_remote_vm() from proc_pid_cmdline_read()
 - Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17 21:55:17 +01:00
Lorenzo Stoakes
079d9ea862 mm: replace access_remote_vm() write parameter with gup_flags
commit 6347e8d5bcce33fc36e651901efefbe2c93a43ef upstream.

This removes the 'write' argument from access_remote_vm() and replaces
it with 'gup_flags' as use of this function previously silently implied
FOLL_FORCE, whereas after this patch callers explicitly pass this flag.

We make this explicit as use of FOLL_FORCE can result in surprising
behaviour (and hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17 21:55:17 +01:00
Lorenzo Stoakes
2b8143d687 mm: replace __access_remote_vm() write parameter with gup_flags
commit 442486ec1096781c50227b73f721a63974b0fdda upstream.

This removes the 'write' argument from __access_remote_vm() and replaces
it with 'gup_flags' as use of this function previously silently implied
FOLL_FORCE, whereas after this patch callers explicitly pass this flag.

We make this explicit as use of FOLL_FORCE can result in surprising
behaviour (and hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 4.4: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17 21:55:17 +01:00
Lorenzo Stoakes
8e50b8b07f mm: replace get_user_pages() write/force parameters with gup_flags
commit 768ae309a96103ed02eb1e111e838c87854d8b51 upstream.

This removes the 'write' and 'force' from get_user_pages() and replaces
them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers
as use of this flag can result in surprising behaviour (and hence bugs)
within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 4.4:
 - Drop changes in rapidio, vchiq, goldfish
 - Keep the "write" variable in amdgpu_ttm_tt_pin_userptr() as it's still
   needed
 - Also update calls from various other places that now use
   get_user_pages_remote() upstream, which were updated there by commit
   9beae1ea8930 "mm: replace get_user_pages_remote() write/force ..."
 - Also update calls from hfi1 and ipath
 - Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17 21:55:16 +01:00
Lorenzo Stoakes
3ec22a6bce mm: replace get_vaddr_frames() write/force parameters with gup_flags
commit 7f23b3504a0df63b724180262c5f3f117f21bcae upstream.

This removes the 'write' and 'force' from get_vaddr_frames() and
replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17 21:55:16 +01:00
Lorenzo Stoakes
ffe6376c18 mm: replace get_user_pages_locked() write/force parameters with gup_flags
commit 3b913179c3fa89dd0e304193fa0c746fc0481447 upstream.

This removes the 'write' and 'force' use from get_user_pages_locked()
and replaces them with 'gup_flags' to make the use of FOLL_FORCE
explicit in callers as use of this flag can result in surprising
behaviour (and hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 4.4: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17 21:55:16 +01:00
Lorenzo Stoakes
2b29980eb7 mm: replace get_user_pages_unlocked() write/force parameters with gup_flags
commit c164154f66f0c9b02673f07aa4f044f1d9c70274 upstream.

This removes the 'write' and 'force' use from get_user_pages_unlocked()
and replaces them with 'gup_flags' to make the use of FOLL_FORCE
explicit in callers as use of this flag can result in surprising
behaviour (and hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 4.4:
 - Also update calls from process_vm_rw_single_vec() and async_pf_execute()
 - Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17 21:55:16 +01:00
Ben Hutchings
ff099ed774 mm/nommu.c: Switch __get_user_pages_unlocked() to use __get_user_pages()
Extracted from commit cde70140fed8 "mm/gup: Overload get_user_pages()
functions".  This is needed before picking commit 768ae309a961
"mm: replace get_user_pages() write/force parameters with gup_flags".

Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17 21:55:16 +01:00