commit 0bd50d719b004110e791800450ad204399100a86 upstream.
Due to a silicon issue on the Atom X5-Z8000 "Cherry Trail" processor
series, a common lock must be used to prevent concurrent accesses
across the 4 GPIO controllers managed by this driver.
See Intel Atom Z8000 Processor Series Specification Update
(Rev. 005), errata #CHT34, for further information.
Signed-off-by: Dan O'Donovan <dan@emutex.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 32b9ccbc3522811c0e483637b85ae25f5491296f upstream.
gpiod_get_optional can return either ERR_PTR or NULL pointer.
NULL case is not tested and then dereferenced later in desc_to_gpio.
Fix this by using non optional version which returns ERR_PTR in any
error case (this is not an optional gpio).
Use the same non optional version for the host-wake gpio.
Fixes: 765ea3abd1 ("Bluetooth: hci_intel: Retrieve host-wake IRQ")
Signed-off-by: Loic Poulain <loic.poulain@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3dbd3212f81b2b410a34a922055e2da792864829 upstream.
The commit d56d6b3d7d ("gpio: langwell: add Intel Merrifield support")
doesn't look at all as a proper support for Intel Merrifield and I dare to say
that it distorts the behaviour of the hardware.
The register map is different on Intel Merrifield, i.e. only 6 out of 8
register have the same purpose but none of them has same location in the
address space. The current case potentially harmful to existing hardware since
it's poking registers on wrong offsets and may set some pin to be GPIO output
when connected hardware doesn't expect such.
Besides the above GPIO and pinctrl on Intel Merrifield have been located in
different IP blocks. The functionality has been extended as well, i.e. added
support of level interrupts, special registers for wake capable sources and
thus, in my opinion, requires a completele separate driver.
If someone wondering the existing gpio-intel-mid.c would be converted to actual
pinctrl (which by the fact it is now), though I wouldn't be a volunteer to do
that.
Fixes: d56d6b3d7d ("gpio: langwell: add Intel Merrifield support")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a246b8198f776a16d1d3a3bbfc2d437bad766b29 upstream.
NBANK() macro assumes that ngpios is a multiple of 8(BANK_SZ) and
hence results in 0 banks for PCA9536 which has just 4 gpios. This is
wrong as PCA9356 has 1 bank with 4 gpios. This results in uninitialized
PCA953X_INVERT register. Fix this by using DIV_ROUND_UP macro in
NBANK().
Signed-off-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0058f0871efe7b01c6f2b3046c68196ab73e96da upstream.
When using DMA, half duplex doesn't work properly because rx is not stopped
before starting tx. Ensure we call atmel_stop_rx() in the DMA case.
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e51e4d8a185de90424b03f30181b35f29c46a25a upstream.
When the clk_get() of "uart" clock returns EPROBE_DEFER, the next re-probe
finishes with success but uses invalid (ERR_PTR) values. This leads to
dereferencing of ERR_PTR stored under ourport->clk:
12c30000.serial: Controller clock not found
(...)
12c30000.serial: ttySAC3 at MMIO 0x12c30000 (irq = 61, base_baud = 0) is a S3C6400/10
Unable to handle kernel paging request at virtual address fffffdfb
(clk_prepare) from [<c039f7d0>] (s3c24xx_serial_pm+0x20/0x128)
(s3c24xx_serial_pm) from [<c0395414>] (uart_change_pm+0x38/0x40)
(uart_change_pm) from [<c039689c>] (uart_add_one_port+0x31c/0x44c)
(uart_add_one_port) from [<c03a035c>] (s3c24xx_serial_probe+0x2a8/0x418)
(s3c24xx_serial_probe) from [<c03ee110>] (platform_drv_probe+0x50/0xb0)
(platform_drv_probe) from [<c03ecb44>] (driver_probe_device+0x1f4/0x2b0)
(driver_probe_device) from [<c03eb0c0>] (bus_for_each_drv+0x44/0x8c)
(bus_for_each_drv) from [<c03ec8c8>] (__device_attach+0x9c/0x100)
(__device_attach) from [<c03ebf54>] (bus_probe_device+0x84/0x8c)
(bus_probe_device) from [<c03ec388>] (deferred_probe_work_func+0x60/0x8c)
(deferred_probe_work_func) from [<c012fee4>] (process_one_work+0x120/0x328)
(process_one_work) from [<c0130150>] (worker_thread+0x2c/0x4ac)
(worker_thread) from [<c0135320>] (kthread+0xd8/0xf4)
(kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c)
The first unsuccessful clk_get() causes s3c24xx_serial_init_port() to
exit with failure but the s3c24xx_uart_port is left half-configured
(e.g. port->mapbase is set, clk contains ERR_PTR). On next re-probe,
the function s3c24xx_serial_init_port() will exit early with success
because of configured port->mapbase and driver will use old values,
including the ERR_PTR as clock.
Fix this by cleaning the port->mapbase on error path so each re-probe
will initialize all of the port settings.
Fixes: 60e9357547 ("serial: samsung: enable clock before clearing pending interrupts during init")
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 30acf549ca1e81859a67590ab9ecfce3d1050a0b upstream.
For dm uarts in pio mode tx data is transferred to the fifo register 4
bytes at a time, but care is not taken when these 4 bytes spans the end
of the xmit buffer so the loop might read up to 3 bytes past the buffer
and then skip the actual data at the beginning of the buffer.
Fix this by, analogous to the DMA case, make sure the chunk doesn't
wrap the xmit buffer.
Fixes: 3a878c430f ("tty: serial: msm: Add TX DMA support")
Cc: Ivan Ivanov <iivanov.xz@gmail.com>
Reported-by: Frank Rowand <frowand.list@gmail.com>
Reported-by: Nicolas Dechesne <nicolas.dechesne@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Andy Gross <andy.gross@linaro.org>
Tested-by: Frank Rowand <frank.rowand@am.sony.com>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9113c2aa05e9848cd4f1154abee17d4f265f012d upstream.
In smp_prepare_boot_cpu(), we invoke cpuinfo_store_boot_cpu to store
the cpuinfo in a per-cpu ptr, before initialising the per-cpu offset for
the boot CPU. This patch reorders the sequence to make sure we initialise
the per-cpu offset before accessing the per-cpu area.
Commit 4b998ff188 ("arm64: Delay cpuinfo_store_boot_cpu") fixed the
issue where we modified the per-cpu area even before the kernel initialises
the per-cpu areas, but failed to wait until the boot cpu updated it's
offset.
Fixes: 4b998ff188 ("arm64: Delay cpuinfo_store_boot_cpu")
Cc: <stable@vger.kernel.org> # 4.4+
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2ce39ad15182604beb6c8fa8bed5e46b59fd1082 upstream.
Clearing PSTATE.D is one of the requirements for generating a debug
exception. The arm64 booting protocol requires that PSTATE.D is set,
since many of the debug registers (for example, the hw_breakpoint
registers) are UNKNOWN out of reset and could potentially generate
spurious, fatal debug exceptions in early boot code if PSTATE.D was
clear. Once the debug registers have been safely initialised, PSTATE.D
is cleared, however this is currently broken for two reasons:
(1) The boot CPU clears PSTATE.D in a postcore_initcall and secondary
CPUs clear PSTATE.D in secondary_start_kernel. Since the initcall
runs after SMP (and the scheduler) have been initialised, there is
no guarantee that it is actually running on the boot CPU. In this
case, the boot CPU is left with PSTATE.D set and is not capable of
generating debug exceptions.
(2) In a preemptible kernel, we may explicitly schedule on the IRQ
return path to EL1. If an IRQ occurs with PSTATE.D set in the idle
thread, then we may schedule the kthread_init thread, run the
postcore_initcall to clear PSTATE.D and then context switch back
to the idle thread before returning from the IRQ. The exception
return path will then restore PSTATE.D from the stack, and set it
again.
This patch fixes the problem by moving the clearing of PSTATE.D earlier
to proc.S. This has the desirable effect of clearing it in one place for
all CPUs, long before we have to worry about the scheduler or any
exception handling. We ensure that the previous reset of MDSCR_EL1 has
completed before unmasking the exception, so that any spurious
exceptions resulting from UNKNOWN debug registers are not generated.
Without this patch applied, the kprobes selftests have been seen to fail
under KVM, where we end up attempting to step the OOL instruction buffer
with PSTATE.D set and therefore fail to complete the step.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e19a6ee2460bdd0d0055a6029383422773f9999a upstream.
If we take an exception while at EL1, the exception handler inherits
the original context's addr_limit and PSTATE.UAO values. To be consistent
always reset addr_limit and PSTATE.UAO on (re-)entry to EL1. This
prevents accidental re-use of the original context's addr_limit.
Based on a similar patch for arm from Russell King.
Cc: <stable@vger.kernel.org> # 4.6-
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
[ backport to stop perf misusing inherited addr_limit.
Removed code interacting with UAO and the irqstack ]
Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=822
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 681fef8380eb818c0b845fca5d2ab1dcbab114ee upstream.
The stack object “ci” has a total size of 8 bytes. Its last 3 bytes
are padding bytes which are not initialized and leaked to userland
via “copy_to_user”.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: Chas Williams <ciwillia@brocade.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4fdef698383db07d829da567e0e405fc41ff3a89 upstream.
This patch fixes an issue that the xfer_work() is possible to cause
NULL pointer dereference if the usb cable is disconnected while data
transfer is running.
In such case, a gadget driver may call usb_ep_disable()) before
xfer_work() is actually called. In this case, the usbhs_pkt_pop()
will call usbhsf_fifo_unselect(), and then usbhs_pipe_to_fifo()
in xfer_work() will return NULL.
Fixes: e73a989 ("usb: renesas_usbhs: add DMAEngine support")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3c0415fa08548e3bc63ef741762664497ab187ed upstream.
This patch adds support for 0x1206 PID of Telit LE910.
Since the interfaces positions are the same than the ones for
0x1043 PID of Telit LE922, telit_le922_blacklist_usbcfg3 is used.
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9cad39fe4e4a4fe95d8ea5a7b0692b0a6e89e38b upstream.
commit f3af36511e ("usb: dwc3: gadget: always
enable IOC on bulk/interrupt transfers") ended up
regressing Isochronous endpoints by clearing
DWC3_EP_BUSY flag too early, which resulted in
choppy audio playback over USB.
Fix that by partially reverting original commit and
making sure that we check for isochronous endpoints.
Fixes: f3af36511e ("usb: dwc3: gadget: always enable IOC
on bulk/interrupt transfers")
Signed-off-by: Konrad Leszczynski <konrad.leszczynski@intel.com>
Signed-off-by: Rafal Redzimski <rafal.f.redzimski@intel.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 25b1f9acc452209ae0fcc8c1332be852b5c52f53 upstream.
BugLink: http://bugs.launchpad.net/bugs/1498667
As reported in BugLink, this device has an issue with Linux Power
Management so adding a quirk. This quirk was reccomended by Alan Stern:
http://lkml.iu.edu/hypermail/linux/kernel/1606.2/05590.html
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 15e4292a2d21e9997fdb2b8c014cc461b3f268f0 upstream.
This patch fixes an issue that the CFIFOSEL register value is possible
to be changed by usbhsg_ep_enable() wrongly. And then, a data transfer
using CFIFO may not work correctly.
For example:
# modprobe g_multi file=usb-storage.bin
# ifconfig usb0 192.168.1.1 up
(During the USB host is sending file to the mass storage)
# ifconfig usb0 down
In this case, since the u_ether.c may call usb_ep_enable() in
eth_stop(), if the renesas_usbhs driver is also using CFIFO for
mass storage, the mass storage may not work correctly.
So, this patch adds usbhs_lock() and usbhs_unlock() calling in
usbhsg_ep_enable() to protect CFIFOSEL register. This is because:
- CFIFOSEL.CURPIPE = 0 is also needed for the pipe configuration
- The CFIFOSEL (fifo->sel) is already protected by usbhs_lock()
Fixes: 97664a207b ("usb: renesas_usbhs: shrink spin lock area")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0015f9156092d07b3ec06d37d014328419d5832e upstream.
This loop is supposed to set all the .num[] values to -1 but it's off by
one so it skips the first element and sets one element past the end of
the array.
I've cleaned up the loop a little as well.
Fixes: ddf8abd259 ('USB: f_fs: the FunctionFS driver')
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ffeee83aa0461992e8a99a59db2df31933e60362 upstream.
Function in_rq_cur copies random bytes from the stack.
Zero the memory instead.
Fixes: 132fcb4608 ("usb: gadget: Add Audio Class 2.0 Driver")
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 615d66c37c755c49ce022c9e5ac0875d27d2603d upstream.
Since commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure
after many small jobs") swap entries do not pin memcg->css.refcnt
directly. Instead, they pin memcg->id.ref. So we should adjust the
reference counters accordingly when moving swap charges between cgroups.
Fixes: 73f576c04b941 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
Link: http://lkml.kernel.org/r/9ce297c64954a42dc90b543bc76106c4a94f07e8.1470219853.git.vdavydov@virtuozzo.com
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f47b61fb4077936465dcde872a4e5cc4fe708da upstream.
An offline memory cgroup might have anonymous memory or shmem left
charged to it and no swap. Since only swap entries pin the id of an
offline cgroup, such a cgroup will have no id and so an attempt to
swapout its anon/shmem will not store memory cgroup info in the swap
cgroup map. As a result, memcg->swap or memcg->memsw will never get
uncharged from it and any of its ascendants.
Fix this by always charging swapout to the first ancestor cgroup that
hasn't released its id yet.
[hannes@cmpxchg.org: add comment to mem_cgroup_swapout]
[vdavydov@virtuozzo.com: use WARN_ON_ONCE() in mem_cgroup_id_get_online()]
Link: http://lkml.kernel.org/r/20160803123445.GJ13263@esperanza
Fixes: 73f576c04b941 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
Link: http://lkml.kernel.org/r/5336daa5c9a32e776067773d9da655d2dc126491.1470219853.git.vdavydov@virtuozzo.com
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org> [3.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 73f576c04b9410ed19660f74f97521bee6e1c546 upstream.
The memory controller has quite a bit of state that usually outlives the
cgroup and pins its CSS until said state disappears. At the same time
it imposes a 16-bit limit on the CSS ID space to economically store IDs
in the wild. Consequently, when we use cgroups to contain frequent but
small and short-lived jobs that leave behind some page cache, we quickly
run into the 64k limitations of outstanding CSSs. Creating a new cgroup
fails with -ENOSPC while there are only a few, or even no user-visible
cgroups in existence.
Although pinning CSSs past cgroup removal is common, there are only two
instances that actually need an ID after a cgroup is deleted: cache
shadow entries and swapout records.
Cache shadow entries reference the ID weakly and can deal with the CSS
having disappeared when it's looked up later. They pose no hurdle.
Swap-out records do need to pin the css to hierarchically attribute
swapins after the cgroup has been deleted; though the only pages that
remain swapped out after offlining are tmpfs/shmem pages. And those
references are under the user's control, so they are manageable.
This patch introduces a private 16-bit memcg ID and switches swap and
cache shadow entries over to using that. This ID can then be recycled
after offlining when the CSS remains pinned only by objects that don't
specifically need it.
This script demonstrates the problem by faulting one cache page in a new
cgroup and deleting it again:
set -e
mkdir -p pages
for x in `seq 128000`; do
[ $((x % 1000)) -eq 0 ] && echo $x
mkdir /cgroup/foo
echo $$ >/cgroup/foo/cgroup.procs
echo trex >pages/$x
echo $$ >/cgroup/cgroup.procs
rmdir /cgroup/foo
done
When run on an unpatched kernel, we eventually run out of possible IDs
even though there are no visible cgroups:
[root@ham ~]# ./cssidstress.sh
[...]
65000
mkdir: cannot create directory '/cgroup/foo': No space left on device
After this patch, the IDs get released upon cgroup destruction and the
cache and css objects get released once memory reclaim kicks in.
[hannes@cmpxchg.org: init the IDR]
Link: http://lkml.kernel.org/r/20160621154601.GA22431@cmpxchg.org
Fixes: b2052564e6 ("mm: memcontrol: continue cache reclaim from offlined groups")
Link: http://lkml.kernel.org/r/20160617162516.GD19084@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: John Garcia <john.garcia@mesosphere.io>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 554a5ccc4e4a20c5f3ec859de0842db4b4b9c77e upstream.
If we hit this error when mounted with errors=continue or
errors=remount-ro:
EXT4-fs error (device loop0): ext4_mb_mark_diskspace_used:2940: comm ext4.exe: Allocating blocks 5090-6081 which overlap fs metadata
then ext4_mb_new_blocks() will call ext4_mb_release_context() and try to
continue. However, ext4_mb_release_context() is the wrong thing to call
here since we are still actually using the allocation context.
Instead, just error out. We could retry the allocation, but there is a
possibility of getting stuck in an infinite loop instead, so this seems
safer.
[ Fixed up so we don't return EAGAIN to userspace. --tytso ]
Fixes: 8556e8f3b6 ("ext4: Don't allow new groups to be added during block allocation")
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c65d5c6c81a1f27dec5f627f67840726fcd146de upstream.
If we encounter a filesystem error during orphan cleanup, we should stop.
Otherwise, we may end up in an infinite loop where the same inode is
processed again and again.
EXT4-fs (loop0): warning: checktime reached, running e2fsck is recommended
EXT4-fs error (device loop0): ext4_mb_generate_buddy:758: group 2, block bitmap and bg descriptor inconsistent: 6117 vs 0 free clusters
Aborting journal on device loop0-8.
EXT4-fs (loop0): Remounting filesystem read-only
EXT4-fs error (device loop0) in ext4_free_blocks:4895: Journal has aborted
EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
EXT4-fs error (device loop0) in ext4_ext_remove_space:3068: IO failure
EXT4-fs error (device loop0) in ext4_ext_truncate:4667: Journal has aborted
EXT4-fs error (device loop0) in ext4_orphan_del:2927: Journal has aborted
EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
EXT4-fs (loop0): Inode 16 (00000000618192a0): orphan list check failed!
[...]
EXT4-fs (loop0): Inode 16 (0000000061819748): orphan list check failed!
[...]
EXT4-fs (loop0): Inode 16 (0000000061819bf0): orphan list check failed!
[...]
See-also: c9eb13a9105 ("ext4: fix hang when processing corrupted orphaned inode list")
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5b9554dc5bf008ae7f68a52e3d7e76c0920938a2 upstream.
If s_reserved_gdt_blocks is extremely large, it's possible for
ext4_init_block_bitmap(), which is called when ext4 sets up an
uninitialized block bitmap, to corrupt random kernel memory. Add the
same checks which e2fsck has --- it must never be larger than
blocksize / sizeof(__u32) --- and then add a backup check in
ext4_init_block_bitmap() in case the superblock gets modified after
the file system is mounted.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6a7fd522a7c94cdef0a3b08acf8e6702056e635c upstream.
If ext4_fill_super() fails early, it's possible for ext4_evict_inode()
to call ext4_should_journal_data() before superblock options and flags
are fully set up. In that case, the iput() on the journal inode can
end up causing a BUG().
Work around this problem by reordering the tests so we only call
ext4_should_journal_data() after we know it's not the journal inode.
Fixes: 2d859db3e4 ("ext4: fix data corruption in inodes with journalled data")
Fixes: 2b405bfa84 ("ext4: fix data=journal fast mount/umount hang")
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 646caa9c8e196880b41cd3e3d33a2ebc752bdb85 upstream.
Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
deadlock in ext4_writepages() which was previously much harder to hit.
After this commit xfstest generic/130 reproduces the deadlock on small
filesystems.
The problem happens when ext4_do_update_inode() sets LARGE_FILE feature
and marks current inode handle as synchronous. That subsequently results
in ext4_journal_stop() called from ext4_writepages() to block waiting for
transaction commit while still holding page locks, reference to io_end,
and some prepared bio in mpd structure each of which can possibly block
transaction commit from completing and thus results in deadlock.
Fix the problem by releasing page locks, io_end reference, and
submitting prepared bio before calling ext4_journal_stop().
[ Changed to defer the call to ext4_journal_stop() only if the handle
is synchronous. --tytso ]
Reported-and-tested-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f70749ca42943faa4d4dcce46dfdcaadb1d0c4b6 upstream.
An extent with lblock = 4294967295 and len = 1 will pass the
ext4_valid_extent() test:
ext4_lblk_t last = lblock + len - 1;
if (len == 0 || lblock > last)
return 0;
since last = 4294967295 + 1 - 1 = 4294967295. This would later trigger
the BUG_ON(es->es_lblk + es->es_len < es->es_lblk) in ext4_es_end().
We can simplify it by removing the - 1 altogether and changing the test
to use lblock + len <= lblock, since now if len = 0, then lblock + 0 ==
lblock and it fails, and if len > 0 then lblock + len > lblock in order
to pass (i.e. it doesn't overflow).
Fixes: 5946d0893 ("ext4: check for overlapping extents in ext4_valid_extent_entries()")
Fixes: 2f974865f ("ext4: check for zero length extent explicitly")
Cc: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f070e81bee35f1b7bd1477bb223a873ff657803 upstream.
When there is more data to be processed, the current test in
scatterwalk_done may prevent us from calling pagedone even when
we should.
In particular, if we're on an SG entry spanning multiple pages
where the last page is not a full page, we will incorrectly skip
calling pagedone on the second last page.
This patch fixes this by adding a separate test for whether we've
reached the end of a page.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b30bdfa86431afbafe15284a3ad5ac19b49b88e3 upstream.
As it is if you ask for a sync gcm you may actually end up with
an async one because it does not filter out async implementations
of ghash.
This patch fixes this by adding the necessary filter when looking
for ghash.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 47be61845c775643f1aa4d2a54343549f943c94c upstream.
We triggered soft-lockup under stress test which
open/access/write/close one file concurrently on more than
five different CPUs:
WARN: soft lockup - CPU#0 stuck for 11s! [who:30631]
...
[<ffffffc0003986f8>] dput+0x100/0x298
[<ffffffc00038c2dc>] terminate_walk+0x4c/0x60
[<ffffffc00038f56c>] path_lookupat+0x5cc/0x7a8
[<ffffffc00038f780>] filename_lookup+0x38/0xf0
[<ffffffc000391180>] user_path_at_empty+0x78/0xd0
[<ffffffc0003911f4>] user_path_at+0x1c/0x28
[<ffffffc00037d4fc>] SyS_faccessat+0xb4/0x230
->d_lock trylock may failed many times because of concurrently
operations, and dput() may execute a long time.
Fix this by replacing cpu_relax() with cond_resched().
dput() used to be sleepable, so make it sleepable again
should be safe.
Signed-off-by: Wei Fang <fangwei1@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9446385f05c9af25fed53dbed3cc75763730be52 upstream.
FUSE_HAS_IOCTL_DIR should be assigned to ->flags, it may be a typo.
Signed-off-by: Wei Fang <fangwei1@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 69fe05c90e ("fuse: add missing INIT flags")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9ebce595f63a407c5cec98f98f9da8459b73740a upstream.
fuse_flush() calls write_inode_now() that triggers writeback, but actual
writeback will happen later, on fuse_sync_writes(). If an error happens,
fuse_writepage_end() will set error bit in mapping->flags. So, we have to
check mapping->flags after fuse_sync_writes().
Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 4d99ff8f12 ("fuse: Turn writeback cache on")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac7f052b9e1534c8248f814b6f0068ad8d4a06d2 upstream.
Due to implementation of fuse writeback filemap_write_and_wait_range() does
not catch errors. We have to do this directly after fuse_sync_writes()
Signed-off-by: Alexey Kuznetsov <kuznet@virtuozzo.com>
Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 4d99ff8f12 ("fuse: Turn writeback cache on")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f7d665627e103e82d34306c7d3f6f46f387c0d8b upstream.
x86_64 needs to use compat_sys_keyctl for 32-bit userspace rather than
calling sys_keyctl(). The latter will work in a lot of cases, thereby
hiding the issue.
Reported-by: Stephan Mueller <smueller@chronox.de>
Tested-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keyrings@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Link: http://lkml.kernel.org/r/146961615805.14395.5581949237156769439.stgit@warthog.procyon.org.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1886297ce0c8d563a08c8a8c4c0b97743e06cd37 upstream.
The following BUG_ON() crash was reported on QEMU/i386:
kernel BUG at arch/x86/mm/physaddr.c:79!
Call Trace:
phys_mem_access_prot_allowed
mmap_mem
? mmap_region
mmap_region
do_mmap
vm_mmap_pgoff
SyS_mmap_pgoff
do_int80_syscall_32
entry_INT80_32
after commit:
edfe63ec97ed ("x86/mtrr: Fix Xorg crashes in Qemu sessions")
PAT is now set to disabled state when MTRRs are disabled.
Thus, reactivating the __pa(high_memory) check in
phys_mem_access_prot_allowed().
When CONFIG_DEBUG_VIRTUAL is set, __pa() calls __phys_addr(),
which in turn calls slow_virt_to_phys() for 'high_memory'.
Because 'high_memory' is set to (the max direct mapped virt
addr + 1), it is not a valid virtual address. Hence,
slow_virt_to_phys() returns 0 and hit the BUG_ON. Using
__pa_nodebug() instead of __pa() will fix this BUG_ON.
However, this code block, originally written for Pentiums and
earlier, is no longer adequate since a 32-bit Xen guest has
MTRRs disabled and supports ZONE_HIGHMEM. In this setup,
this code sets UC attribute for accessing RAM in high memory
range.
Delete this code block as it has been unused for a long time.
Reported-by: kernel test robot <ying.huang@linux.intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1460403360-25441-1-git-send-email-toshi.kani@hpe.com
Link: https://lkml.org/lkml/2016/4/1/608
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 88ba281108ed0c25c9d292b48bd3f272fcb90dd0 upstream.
Xen supports PAT without MTRRs for its guests. In order to
enable WC attribute, it was necessary for xen_start_kernel()
to call pat_init_cache_modes() to update PAT table before
starting guest kernel.
Now that the kernel initializes PAT table to the BIOS handoff
state when MTRR is disabled, this Xen-specific PAT init code
is no longer necessary. Delete it from xen_start_kernel().
Also change __init_cache_modes() to a static function since
PAT table should not be tweaked by other modules.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-7-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ad025a73f0e9344ac73ffe1b74c184033e08e7d5 upstream.
get_mtrr_state() calls pat_init() on BSP even if MTRR is disabled.
This results in calling pat_init() on BSP only since APs do not call
pat_init() when MTRR is disabled. This inconsistency between BSP
and APs leads to undefined behavior.
Make BSP's calling condition to pat_init() consistent with AP's,
mtrr_ap_init() and mtrr_aps_init().
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-6-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit edfe63ec97ed8d4496225f7ba54c9ce4207c5431 upstream.
A Xorg failure on qemu32 was reported as a regression [1] caused by
commit 9cd25aac1f ("x86/mm/pat: Emulate PAT when it is disabled").
This patch fixes the Xorg crash.
Negative effects of this regression were the following two failures [2]
in Xorg on QEMU with QEMU CPU model "qemu32" (-cpu qemu32), which were
triggered by the fact that its virtual CPU does not support MTRRs.
#1. copy_process() failed in the check in reserve_pfn_range()
copy_process
copy_mm
dup_mm
dup_mmap
copy_page_range
track_pfn_copy
reserve_pfn_range
A WC map request was tracked as WC in memtype, which set a PTE as
UC (pgprot) per __cachemode2pte_tbl[]. This led to this error in
reserve_pfn_range() called from track_pfn_copy(), which obtained
a pgprot from a PTE. It converts pgprot to page_cache_mode, which
does not necessarily result in the original page_cache_mode since
__cachemode2pte_tbl[] redirects multiple types to UC.
#2. error path in copy_process() then hit WARN_ON_ONCE in
untrack_pfn().
x86/PAT: Xorg:509 map pfn expected mapping type uncached-
minus for [mem 0xfd000000-0xfdffffff], got write-combining
Call Trace:
dump_stack
warn_slowpath_common
? untrack_pfn
? untrack_pfn
warn_slowpath_null
untrack_pfn
? __kunmap_atomic
unmap_single_vma
? pagevec_move_tail_fn
unmap_vmas
exit_mmap
mmput
copy_process.part.47
_do_fork
SyS_clone
do_syscall_32_irqs_on
entry_INT80_32
These negative effects are caused by two separate bugs, but they
can be addressed in separate patches. Fixing the pat_init() issue
described below addresses the root cause, and avoids Xorg to hit
these cases.
When the CPU does not support MTRRs, MTRR does not call pat_init(),
which leaves PAT enabled without initializing PAT. This pat_init()
issue is a long-standing issue, but manifested as issue #1 (and then
hit issue #2) with the above-mentioned commit because the memtype
now tracks cache attribute with 'page_cache_mode'.
This pat_init() issue existed before the commit, but we used pgprot
in memtype. Hence, we did not have issue #1 before. But WC request
resulted in WT in effect because WC pgrot is actually WT when PAT
is not initialized. This is not how it was designed to work. When
PAT is set to disable properly, WC is converted to UC. The use of
WT can result in a system crash if the target range does not support
WT. Fortunately, nobody ran into such issue before.
To fix this pat_init() issue, PAT code has been enhanced to provide
pat_disable() interface. Call this interface when MTRRs are disabled.
By setting PAT to disable properly, PAT bypasses the memtype check,
and avoids issue #1.
[1]: https://lkml.org/lkml/2016/3/3/828
[2]: https://lkml.org/lkml/2016/3/4/775
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-5-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d63dcf49cf5ae5605f4d14229e3888e104f294b1 upstream.
Borislav Petkov suggested:
> Please use on init paths boot_cpu_has(X86_FEATURE_PAT) and on fast
> paths static_cpu_has(X86_FEATURE_PAT). No more of that cpu_has_XXX
> ugliness.
Replace the use of cpu_has_pat on init paths with boot_cpu_has().
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Elliott <elliott@hpe.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-4-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 224bb1e5d67ba0f2872c98002d6a6f991ac6fd4a upstream.
In preparation for fixing a regression caused by:
9cd25aac1f ("x86/mm/pat: Emulate PAT when it is disabled")
... PAT needs to provide an interface that prevents the OS from
initializing the PAT MSR.
PAT MSR initialization must be done on all CPUs using the specific
sequence of operations defined in the Intel SDM. This requires MTRRs
to be enabled since pat_init() is called as part of MTRR init
from mtrr_rendezvous_handler().
Make pat_disable() as the interface that prevents the OS from
initializing the PAT MSR. MTRR will call this interface when it
cannot provide the SDM-defined sequence to initialize PAT.
This also assures that pat_disable() called from pat_bsp_init()
will set the PAT table properly when CPU does not support PAT.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Elliott <elliott@hpe.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-3-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 02f037d641dc6672be5cfe7875a48ab99b95b154 upstream.
In preparation for fixing a regression caused by:
9cd25aac1f ("x86/mm/pat: Emulate PAT when it is disabled")'
... PAT needs to support a case that PAT MSR is initialized with a
non-default value.
When pat_init() is called and PAT is disabled, it initializes the
PAT table with the BIOS default value. Xen, however, sets PAT MSR
with a non-default value to enable WC. This causes inconsistency
between the PAT table and PAT MSR when PAT is set to disable on Xen.
Change pat_init() to handle the PAT disable cases properly. Add
init_cache_modes() to handle two cases when PAT is set to disable.
1. CPU supports PAT: Set PAT table to be consistent with PAT MSR.
2. CPU does not support PAT: Set PAT table to be consistent with
PWT and PCD bits in a PTE.
Note, __init_cache_modes(), renamed from pat_init_cache_modes(),
will be changed to a static function in a later patch.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-2-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 67245ff332064c01b760afa7a384ccda024bfd24 upstream.
This gets rid of the horrible notion of having that
struct inode *ptmx_inode
be the linchpin of the interface between the pty code and devpts.
By de-emphasizing the ptmx inode, a lot of things actually get cleaner,
and we will have a much saner way forward. In particular, this will
allow us to associate with any particular devpts instance at open-time,
and not be artificially tied to one particular ptmx inode.
The patch itself is actually fairly straightforward, and apart from some
locking and return path cleanups it's pretty mechanical:
- the interfaces that devpts exposes all take "struct pts_fs_info *"
instead of "struct inode *ptmx_inode" now.
NOTE! The "struct pts_fs_info" thing is a completely opaque structure
as far as the pty driver is concerned: it's still declared entirely
internally to devpts. So the pty code can't actually access it in any
way, just pass it as a "cookie" to the devpts code.
- the "look up the pts fs info" is now a single clear operation, that
also does the reference count increment on the pts superblock.
So "devpts_add/del_ref()" is gone, and replaced by a "lookup and get
ref" operation (devpts_get_ref(inode)), along with a "put ref" op
(devpts_put_ref()).
- the pty master "tty->driver_data" field now contains the pts_fs_info,
not the ptmx inode.
- because we don't care about the ptmx inode any more as some kind of
base index, the ref counting can now drop the inode games - it just
gets the ref on the superblock.
- the pts_fs_info now has a back-pointer to the super_block. That's so
that we can easily look up the information we actually need. Although
quite often, the pts fs info was actually all we wanted, and not having
to look it up based on some magical inode makes things more
straightforward.
In particular, now that "devpts_get_ref(inode)" operation should really
be the *only* place we need to look up what devpts instance we're
associated with, and we do it exactly once, at ptmx_open() time.
The other side of this is that one ptmx node could now be associated
with multiple different devpts instances - you could have a single
/dev/ptmx node, and then have multiple mount namespaces with their own
instances of devpts mounted on /dev/pts/. And that's all perfectly sane
in a model where we just look up the pts instance at open time.
This will eventually allow us to get rid of our odd single-vs-multiple
pts instance model, but this patch in itself changes no semantics, only
an internal binding model.
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Jann Horn <jann@thejh.net>
Cc: Greg KH <greg@kroah.com>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Florian Weimer <fw@deneb.enyo.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Francesco Ruggeri <fruggeri@arista.com>
Cc: "Herton R. Krzesinski" <herton@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 86a574de4590ffe6fd3f3ca34cdcf655a78e36ec upstream.
Don't allow RNDADDTOENTCNT or RNDADDENTROPY to accept a negative
entropy value. It doesn't make any sense to subtract from the entropy
counter, and it can trigger a warning:
random: negative entropy/overflow: pool input count -40000
------------[ cut here ]------------
WARNING: CPU: 3 PID: 6828 at drivers/char/random.c:670[< none
>] credit_entropy_bits+0x21e/0xad0 drivers/char/random.c:670
Modules linked in:
CPU: 3 PID: 6828 Comm: a.out Not tainted 4.7.0-rc4+ #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
ffffffff880b58e0 ffff88005dd9fcb0 ffffffff82cc838f ffffffff87158b40
fffffbfff1016b1c 0000000000000000 0000000000000000 ffffffff87158b40
ffffffff83283dae 0000000000000009 ffff88005dd9fcf8 ffffffff8136d27f
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff82cc838f>] dump_stack+0x12e/0x18f lib/dump_stack.c:51
[<ffffffff8136d27f>] __warn+0x19f/0x1e0 kernel/panic.c:516
[<ffffffff8136d48c>] warn_slowpath_null+0x2c/0x40 kernel/panic.c:551
[<ffffffff83283dae>] credit_entropy_bits+0x21e/0xad0 drivers/char/random.c:670
[< inline >] credit_entropy_bits_safe drivers/char/random.c:734
[<ffffffff8328785d>] random_ioctl+0x21d/0x250 drivers/char/random.c:1546
[< inline >] vfs_ioctl fs/ioctl.c:43
[<ffffffff8185316c>] do_vfs_ioctl+0x18c/0xff0 fs/ioctl.c:674
[< inline >] SYSC_ioctl fs/ioctl.c:689
[<ffffffff8185405f>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:680
[<ffffffff86a995c0>] entry_SYSCALL_64_fastpath+0x23/0xc1
arch/x86/entry/entry_64.S:207
---[ end trace 5d4902b2ba842f1f ]---
This was triggered using the test program:
// autogenerated by syzkaller (http://github.com/google/syzkaller)
int main() {
int fd = open("/dev/random", O_RDWR);
int val = -5000;
ioctl(fd, RNDADDTOENTCNT, &val);
return 0;
}
It's harmless in that (a) only root can trigger it, and (b) after
complaining the code never does let the entropy count go negative, but
it's better to simply not allow this userspace from passing in a
negative entropy value altogether.
Google-Bug-Id: #29575089
Reported-By: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5419447e2142d6ed68c9f5c1a28630b3a290a845 upstream.
This reverts commit 852ffd0f4e.
There are use cases where an intermediate boot kernel (1) uses kexec
to boot the final production kernel (2). For this scenario we should
provide the original boot information to the production kernel (2).
Therefore clearing the boot information during kexec() should not
be done.
Reported-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>