Commit graph

21915 commits

Author SHA1 Message Date
Morten Rasmussen
68a3b157d9 UPSTREAM: sched/core: Introduce SD_ASYM_CPUCAPACITY sched_domain topology flag
Add a topology flag to the sched_domain hierarchy indicating the lowest
domain level where the full range of CPU capacities is represented by
the domain members for asymmetric capacity topologies (e.g. ARM
big.LITTLE).

The flag is intended to indicate that extra care should be taken when
placing tasks on CPUs, and that this level spans all the different types
of CPUs found in the system (no need to look further up the domain
hierarchy). This information is currently only available by iterating
over the capacities of all the CPUs at parent levels in the
sched_domain hierarchy.

  SD 2      [  0      1      2      3]  SD_ASYM_CPUCAPACITY

  SD 1      [  0      1] [   2      3]  !SD_ASYM_CPUCAPACITY

  CPU:         0      1      2      3
  capacity:  756    756   1024   1024

If the topology in the example above is duplicated to create an eight
CPU example with a third sched_domain level on top (SD 3), this level
should not have the flag set (!SD_ASYM_CPUCAPACITY) as its two groups
would both have all CPU capacities represented within them.
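A minimal C sketch of how the lowest such level could be found. It assumes, hypothetically, that every level is described only by the number of contiguous CPUs per domain and that levels are indexed from 0 upward (index 0 is SD 1 in the diagram). The function name and this flat representation are illustrative assumptions, not the scheduler's data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* True if capacity value v appears among cap[first .. first + len - 1]. */
static bool span_has_capacity(const unsigned long cap[], int first, int len,
                              unsigned long v)
{
    for (int i = first; i < first + len; i++)
        if (cap[i] == v)
            return true;
    return false;
}

/*
 * Hypothetical helper: returns the lowest level at which every domain
 * contains every distinct capacity in the system, or -1 if there is no
 * such level or all CPUs have the same capacity (symmetric topology).
 * span[lvl] is the number of contiguous CPUs per domain at level lvl.
 */
int lowest_asym_level(const unsigned long cap[], int ncpus,
                      const int span[], int nlevels)
{
    bool asym = false;

    for (int i = 1; i < ncpus; i++)
        if (cap[i] != cap[0])
            asym = true;
    if (!asym)
        return -1;

    for (int lvl = 0; lvl < nlevels; lvl++) {
        bool full = true;

        assert(span[lvl] > 0 && ncpus % span[lvl] == 0);
        /* Every domain at this level must see every capacity value. */
        for (int first = 0; first < ncpus && full; first += span[lvl])
            for (int c = 0; c < ncpus && full; c++)
                if (!span_has_capacity(cap, first, span[lvl], cap[c]))
                    full = false;
        if (full)
            return lvl;
    }
    return -1;
}
```

With the four CPUs above (capacities 756, 756, 1024, 1024 and spans of 2 and 4 CPUs) this returns 1, i.e. SD 2. For the eight CPU variant with an extra level spanning all eight CPUs it still returns 1, so SD 3 is not flagged.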

Change-Id: I1526407b90567cac387419719b7d7fdc8b259a85
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: freedom.tan@mediatek.com
Cc: keita.kobayashi.ym@renesas.com
Cc: mgalbraith@suse.de
Cc: sgurrappadi@nvidia.com
Cc: vincent.guittot@linaro.org
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1469453670-2660-6-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 1f6e6c7cb9bcd58abb5ee11243e0eefe6b36fc8e)
[trivial merge conflict]
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:52 -07:00
Morten Rasmussen
3e9cdd5ae9 UPSTREAM: sched/core: Remove unnecessary NULL-pointer check
Checking if the sched_domain pointer returned by sd_init() is NULL seems
pointless as sd_init() neither checks if it is valid to begin with nor
sets it to NULL.

Change-Id: I5e16fd0c2ca7234b097be7c95409ddb15c5e9de9
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: freedom.tan@mediatek.com
Cc: keita.kobayashi.ym@renesas.com
Cc: mgalbraith@suse.de
Cc: sgurrappadi@nvidia.com
Cc: vincent.guittot@linaro.org
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1469453670-2660-5-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 0e6d2a67a41321b3ef650b780a279a37855de08e)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:51 -07:00
Morten Rasmussen
bc7c939b3a UPSTREAM: sched/fair: Optimize find_idlest_cpu() when there is no choice
In the current find_idlest_group()/find_idlest_cpu() search, we end up
calling find_idlest_cpu() on a sched_group containing only one CPU in
the final step. Checking idle states is pointless when there is no
alternative, so bail out instead.
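A simplified C sketch of the early bail-out. It assumes a hypothetical model in which a sched_group is an array of CPU ids plus the exit latency of the idle state each CPU is in; the real function also weighs load and idle-state depth, so only the single-CPU shortcut reflects this change.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for find_idlest_cpu(). */
int find_idlest_cpu_sketch(const int cpus[], const unsigned int exit_latency[],
                           int nr_cpus)
{
    int best = 0;

    assert(nr_cpus > 0);

    /* Only one CPU in the group: there is no choice, so bail out early. */
    if (nr_cpus == 1)
        return cpus[0];

    /* Otherwise prefer the CPU that can leave its idle state fastest. */
    for (int i = 1; i < nr_cpus; i++)
        if (exit_latency[i] < exit_latency[best])
            best = i;
    return cpus[best];
}
```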

Change-Id: Ic62bf09b53a7984143ac2431aaa69c69b204cd56
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: linux-kernel@vger.kernel.org
Cc: mgalbraith@suse.de
Cc: vincent.guittot@linaro.org
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1466615004-3503-4-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit eaecf41f5abf80b63c8e025fcb9ee4aa203c3038)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:51 -07:00
Morten Rasmussen
3bb3d7e7d9 BACKPORT: sched/fair: Make the use of prev_cpu consistent in the wakeup path
In commit:

  ac66f54772 ("sched/numa: Introduce migrate_swap()")

select_task_rq() got a 'cpu' argument to enable overriding of prev_cpu
in special cases (NUMA task swapping).

However, the select_task_rq_fair() helper functions wake_affine() and
select_idle_sibling() still use task_cpu(p) directly to work out
prev_cpu, which leads to inconsistencies.

This patch passes prev_cpu (potentially overridden by NUMA code) into
the helper functions to ensure prev_cpu is indeed the same CPU
everywhere in the wakeup path.

Change-Id: I4951c4eead2e6045e4fb34e89f6cda17d881d4d7
cc: Ingo Molnar <mingo@redhat.com>
cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: linux-kernel@vger.kernel.org
Cc: mgalbraith@suse.de
Cc: vincent.guittot@linaro.org
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1466615004-3503-3-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 772bd008cd9a1d4e8ce566f2edcc61d1c28fcbe5)
[merged with Android/EAS wakeup path]
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:51 -07:00
Dietmar Eggemann
cb88574a68 Partial Revert: "WIP: sched: Add cpu capacity awareness to wakeup balancing"
Revert the changes in find_idlest_cpu() and find_idlest_group().

Keep the infrastructure bits which are used in following EAS patches.

Change-Id: Id516ca5f3e51b9a13db1ebb8de2df3aa25f9679b
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
2017-06-02 08:01:51 -07:00
Dietmar Eggemann
bd6ff3505f Revert "WIP: sched: Consider spare cpu capacity at task wake-up"
This reverts commit 75a9695b619741019363f889c99c97c7bb823797.

Change-Id: I846b21f2bdeb0b0ca30ad65683564ed07a429428
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
[ minor merge changes ]
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:51 -07:00
Viresh Kumar
5c015afebd FROM-LIST: cpufreq: schedutil: Redefine the rate_limit_us tunable
The rate_limit_us tunable is intended to reduce the possible overhead
from running the schedutil governor.  However, that overhead can be
divided into two separate parts: the governor computations and the
invocation of the scaling driver to set the CPU frequency.  The latter
is where the real overhead comes from.  The former is much less
expensive in terms of execution time, and running it every time the
scheduler invokes the governor callback, once the rate_limit_us interval
has passed since the last frequency update, would not be a problem.

For this reason, redefine the rate_limit_us tunable so that it means the
minimum time that has to pass between two consecutive invocations of the
scaling driver by the schedutil governor (to set the CPU frequency).
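A minimal C sketch of the redefined semantics, assuming the governor keeps a single timestamp of the last scaling-driver call and that the caller has already computed the target frequency. All names are hypothetical, not schedutil's. The point illustrated is that the timestamp moves only when the driver is actually invoked, so a callback that computes an unchanged frequency does not restart the rate-limit window.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical governor state, not schedutil's actual struct. */
struct gov_sketch {
    uint64_t last_freq_update_ns; /* time of the last scaling-driver call */
    uint64_t rate_limit_ns;       /* minimum gap between two driver calls */
    unsigned int cur_freq;        /* frequency last requested from the driver */
    unsigned int driver_calls;    /* number of (expensive) driver invocations */
};

/* Called on every scheduler callback with the freshly computed target. */
void gov_update_sketch(struct gov_sketch *g, uint64_t now_ns,
                       unsigned int next_freq)
{
    /* Too soon after the last driver invocation: skip. */
    if (now_ns - g->last_freq_update_ns < g->rate_limit_ns)
        return;

    /*
     * Nothing to change: skip the driver and leave the timestamp alone,
     * so the next callback is evaluated again instead of being throttled.
     */
    if (next_freq == g->cur_freq)
        return;

    g->last_freq_update_ns = now_ns;
    g->cur_freq = next_freq;
    g->driver_calls++; /* stands in for invoking the scaling driver */
}
```

With a 1000 ns limit, an unchanged target at t = 1000 leaves the timestamp at 0, so a new target at t = 1100 reaches the driver immediately instead of waiting another full interval.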

Change-Id: Iced64116b826c25441ef537c27a3dabfcf81919e
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[pulled from linux-pm linux-next https://patchwork.kernel.org/patch/9583949/ ]
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:51 -07:00
Steve Muckle
51b20b214f cpufreq: schedutil: add up/down frequency transition rate limits
The rate-limit tunable in the schedutil governor applies to transitions
to both lower and higher frequencies. On several platforms it is not the
ideal tunable though, as it is difficult to get best power/performance
figures using the same limit in both directions.

On mobile platforms with demanding user interfaces, for example, it is
common to want to increase the frequency rapidly but decrease it slowly.

One example is a case where we have short busy periods followed by
similar or longer idle periods. If we keep the rate limit too high, we
will not go to higher frequencies soon enough. On the other hand, if we
keep it too low, we will have too many frequency transitions, as we will
always reduce the frequency after the busy period.

It would be very useful if we could set a low rate limit while
increasing the frequency (so that we can respond to short busy periods
quickly) and a high rate limit while decreasing the frequency (so that
we don't reduce the frequency immediately after a short busy period,
which may avoid frequency transitions before the next busy period).

Implement separate up/down transition rate limits. Note that the
governor avoids frequency recalculations for a period equal to the
minimum of the up and down rate limits. A global mutex is also defined
to protect updates to min_rate_limit_us via the two separate sysfs files.
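A minimal C sketch of the direction-dependent check, assuming a single timestamp for the last frequency change. The struct and function names are hypothetical; only the up/down split and the min(up, down) recalculation period come from this message. The test uses the 10/40 ms setting from the test description in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state; only the up/down split follows the commit message. */
struct updown_sketch {
    uint64_t last_freq_update_ns;
    uint64_t up_rate_delay_ns;   /* minimum wait before raising the frequency */
    uint64_t down_rate_delay_ns; /* minimum wait before lowering the frequency */
    unsigned int cur_freq;
};

/* May the frequency change from cur_freq to next_freq at now_ns? */
bool updown_allowed(const struct updown_sketch *s, uint64_t now_ns,
                    unsigned int next_freq)
{
    uint64_t delta = now_ns - s->last_freq_update_ns;

    if (next_freq > s->cur_freq)
        return delta >= s->up_rate_delay_ns;
    if (next_freq < s->cur_freq)
        return delta >= s->down_rate_delay_ns;
    return false; /* unchanged frequency: nothing to do */
}

/* Recalculation is skipped for min(up, down), per the commit message. */
uint64_t min_rate_limit_ns(const struct updown_sketch *s)
{
    return s->up_rate_delay_ns < s->down_rate_delay_ns ?
           s->up_rate_delay_ns : s->down_rate_delay_ns;
}
```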

Note that this doesn't change the behavior of the schedutil governor on
platforms that keep the same value for both the up and down rate
limits.

This was tested with rt-app [1] on an ARM Exynos dual-core A15
platform.

Testcase: Run a SCHED_OTHER thread on CPU0 which emulates a workload
with a busy period of X ms out of a total period of Y ms, i.e. an idle
period of Y - X ms. The X/Y values used were 20/40, 20/50 and 20/70,
i.e. idle periods of 20, 30 and 50 ms respectively. These were tested
against up/down rate limits of 10/10 ms and 10/40 ms.

For every test we noticed a performance increase of 5-10% with the
schedutil governor, which was very much expected.

[Viresh]: Simplified user interface and introduced min_rate_limit_us +
	      mutex, rewrote commit log and included test results.

[1] https://github.com/scheduler-tools/rt-app/

Change-Id: I18720a83855b196b8e21dcdc8deae79131635b84
Signed-off-by: Steve Muckle <smuckle.linux@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
(applied from https://marc.info/?l=linux-kernel&m=147936011103832&w=2)
[trivial adaptations]
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2017-06-02 08:01:51 -07:00
Juri Lelli
f71d9f01c6 sched/cpufreq: make schedutil use WALT signal
If WALT is available and enabled, make schedutil governor use its
utilization signal.

Change-Id: I92bc37989447a76616e9bcc4e9e8616774fb9925
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
[we need to use boosted_cpu_util for schedutil, so make it
 not static]
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:51 -07:00
Steve Muckle
e5da6c11b2 sched: cpufreq: use rt_avg as estimate of required RT CPU capacity
A policy of going to fmax on any RT activity will be detrimental
for power on many platforms. Often RT accounts for only a small amount
of CPU activity so sending the CPU frequency to fmax is overkill. Worse
still, some platforms may not be able to even complete the CPU frequency
change before the RT activity has already completed.

Cpufreq governors have not treated RT activity this way in the past so
it is not part of the expected semantics of the RT scheduling class. The
DL class offers guarantees about task completion and could be used for
this purpose.

Modify the schedutil algorithm to instead use rt_avg as an estimate of
RT utilization of the CPU.

Based on previous work by Vincent Guittot <vincent.guittot@linaro.org>.

Change-Id: I1ed605a3e2512a94d34217a8e57c3fd97cca60be
Signed-off-by: Steve Muckle <smuckle@linaro.org>
2017-06-02 08:01:51 -07:00
Viresh Kumar
e2aa75a4c7 cpufreq: schedutil: move slow path from workqueue to SCHED_FIFO task
If slow path frequency changes are conducted in a SCHED_OTHER context
then they may be delayed for some amount of time, including
indefinitely, when real time or deadline activity is taking place.

Move the slow path to a real time kernel thread. In the future the
thread should be made SCHED_DEADLINE. The RT priority is arbitrarily set
to 50 for now.

Hackbench results on an ARM Exynos dual-core A15 platform, over 10
iterations:

$ hackbench -s 100 -l 100 -g 10 -f 20

Before			After
---------------------------------
1.808			1.603
1.847			1.251
2.229			1.590
1.952			1.600
1.947			1.257
1.925			1.627
2.694			1.620
1.258			1.621
1.919			1.632
1.250			1.240

Average:

1.8829			1.5041
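The Average row is the arithmetic mean of each column (lower hackbench times are better). A short C helper reproduces it.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples. */
double mean(const double v[], size_t n)
{
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}
```

Feeding in the ten Before values gives 1.8829 and the ten After values 1.5041, a reduction of (1.8829 - 1.5041) / 1.8829, or about 20.1%.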

Based on initial work by Steve Muckle.

Change-Id: I8f53037e94f353960c6d10abf07822d671631ef7
Signed-off-by: Steve Muckle <smuckle.linux@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from 02a7b1ee3baa)
[adapt to the 3.18 kthread interface]
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2017-06-02 08:01:51 -07:00
Petr Mladek
78213729a7 BACKPORT: kthread: allow to cancel kthread work
We are going to use kthread workers more widely and sometimes we will need
to make sure that the work is neither pending nor running.

This patch implements cancel_*_sync() operations, inspired by
workqueues.  However, we are synchronized against the other operations
via the worker lock, and we use del_timer_sync() and a counter to count
parallel cancel operations.  Therefore the implementation can be simpler.

First, we check if a worker is assigned.  If not, the work has never been
queued after it was initialized.

Second, we take the worker lock.  It must be the right one.  The work must
not be assigned to another worker unless it is initialized in between.

Third, we try to cancel the timer when it exists.  The timer is deleted
synchronously to make sure that the timer callback is not running.  We
need to temporarily release the worker->lock to avoid a possible deadlock
with the callback.  In the meantime, we set the work->canceling counter to
avoid any queuing.

Fourth, we try to remove the work from a worker list. It might be
the list of either normal or delayed works.

Fifth, if the work is running, we call kthread_flush_work().  It might
take an arbitrary amount of time.  We need to release the worker lock
again.  In the meantime, we again block any queuing by the canceling counter.

As already mentioned, the check for a pending kthread work is done under a
lock.  Compared with workqueues, we do not need to fight for a single
PENDING bit to block other operations.  Therefore we do not suffer from
the thundering herd problem, and all parallel canceling jobs might use
kthread_flush_work().  Any queuing is blocked until the counter reaches zero.
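The role of the canceling counter can be illustrated with a minimal single-threaded C model. This is a sketch under stated assumptions, not the kthread API: the struct and function names are hypothetical, the timer and the flush are left out, and the worker lock is only noted in comments.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of one kthread work item. In the kernel these fields
 * are only touched under worker->lock; locking is omitted in this sketch.
 */
struct kwork_sketch {
    bool pending;   /* on the worker list, not yet run */
    int canceling;  /* cancel operations currently in progress */
};

/* Queuing fails while the work is pending or any cancel is in progress. */
bool queue_kwork_sketch(struct kwork_sketch *w)
{
    if (w->pending || w->canceling)
        return false;
    w->pending = true;
    return true;
}

/* First half of a cancel: block queuing and drop the work from the list. */
bool cancel_begin_sketch(struct kwork_sketch *w)
{
    bool was_pending = w->pending;

    w->canceling++;
    w->pending = false;
    return was_pending;
}

/*
 * Second half, after any running instance has been flushed (in the kernel,
 * with the lock released). Queuing is allowed again once the counter is 0.
 */
void cancel_end_sketch(struct kwork_sketch *w)
{
    assert(w->canceling > 0);
    w->canceling--;
}
```

Two overlapping cancels simply stack on the counter; queuing stays blocked until both have finished.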

Change-Id: I8a8ece0f93c828f311d0ad5c88d80db2388e4808
Link: http://lkml.kernel.org/r/1470754545-17632-10-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry-picked from 37be45d49dec2a411e29d50c9597cfe8184b5645)
[major changes to the original patch while cherry-picking; only rebased
the sync variant]
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2017-06-02 08:01:50 -07:00
Steve Muckle
ca7b7d3c99 sched/cpufreq: fix tunables for schedfreq governor
The schedfreq governor does not currently handle cpufreq drivers which
use a global set of tunables (!have_governor_per_policy).

For example, on x86 using the acpi-cpufreq driver, doing this

  cat /sys/devices/system/cpu/cpufreq/sched/up_throttle_nsec

will result in a bad pointer access.

Update the tunable code using the upstream schedutil tunable code by
Rafael Wysocki as a guide.

Includes a partial backport of the reorganized cpufreq tunable
infrastructure.

Change-Id: I7e6f8de1dac297077ad43f37dd2f6ddbfe921c98
Signed-off-by: Steve Muckle <smuckle@linaro.org>
[fixed cherry-pick issue]
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
[fixed cherry-pick issue]
Signed-off-by: Thierry Strudel <tstrudel@google.com>
2017-06-02 08:01:50 -07:00
Steve Muckle
6bc6115c16 BACKPORT: cpufreq: schedutil: New governor based on scheduler utilization data
Add a new cpufreq scaling governor, called "schedutil", that uses
scheduler-provided CPU utilization information as input for making
its decisions.

Doing that is possible after commit 34e2c55 (cpufreq: Add
mechanism for registering utilization update callbacks) that
introduced cpufreq_update_util() called by the scheduler on
utilization changes (from CFS) and RT/DL task status updates.
In particular, CPU frequency scaling decisions may be based on
the utilization data passed to cpufreq_update_util() by CFS.

The new governor is relatively simple.

The frequency selection formula used by it depends on whether or not
the utilization is frequency-invariant.  In the frequency-invariant
case the new CPU frequency is given by

	next_freq = 1.25 * max_freq * util / max

where util and max are the last two arguments of cpufreq_update_util().
In turn, if util is not frequency-invariant, the maximum frequency in
the above formula is replaced with the current frequency of the CPU:

	next_freq = 1.25 * curr_freq * util / max

The coefficient 1.25 corresponds to the frequency tipping point at
(util / max) = 0.8.
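A minimal C sketch of the formula. The helper name is hypothetical, and computing 1.25 * freq as freq + freq / 4 is an integer-arithmetic choice made for this sketch, not something this message specifies. Pass max_freq as freq for frequency-invariant utilization and curr_freq otherwise.

```c
#include <assert.h>

/* next_freq = 1.25 * freq * util / max, in integer arithmetic. */
unsigned long next_freq_sketch(unsigned long freq, unsigned long util,
                               unsigned long max)
{
    assert(max > 0);
    /* freq + freq / 4 == 1.25 * freq */
    return (freq + (freq >> 2)) * util / max;
}
```

At util / max = 0.8 the result equals freq (1.25 * 0.8 = 1): in the non-invariant case a CPU busier than 80% is pushed above its current frequency and a less busy one below it. The sketch does not clamp the result to the policy limits.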

All of the computations are carried out in the utilization update
handlers provided by the new governor.  One of those handlers is
used for cpufreq policies shared between multiple CPUs and the other
one is for policies with one CPU only (and therefore it doesn't need
to use any extra synchronization means).

The governor supports fast frequency switching if that is supported
by the cpufreq driver in use and possible for the given policy.
In the fast switching case, all operations of the governor take
place in its utilization update handlers.  If fast switching cannot
be used, the frequency switch operations are carried out with the
help of a work item which only calls __cpufreq_driver_target()
(under a mutex) to trigger a frequency update (to a value already
computed beforehand in one of the utilization update handlers).

Currently, the governor treats all of the RT and DL tasks as
"unknown utilization" and sets the frequency to the allowed
maximum when updated from the RT or DL sched classes.  That
heavy-handed approach should be replaced with something more
subtle and specifically targeted at RT and DL tasks.

The governor shares some tunables management code with the
"ondemand" and "conservative" governors and uses some common
definitions from cpufreq_governor.h, but apart from that it
is stand-alone.

Change-Id: I03876e622768e4b3ee4dc28682af7cce771f2f4c
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
(cherry-picked from 9bdcb44e391da5c41b98573bf0305a0e0b1c9569)
[ Backport the schedutil cpufreq governor from 4.9. Some cpufreq
  tunable infrastructure as well as the resolve_freq API are also
  backported as those are dependencies]
Signed-off-by: Steve Muckle <smuckle@linaro.org>
[trivial cherry-picking fixes]
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
[fixed default governor machinery]
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:50 -07:00
Steve Muckle
f02702dcf2 sched: backport cpufreq hooks from 4.9-rc4
The scheduler cpufreq hooks are required by the schedutil cpufreq
governor.

Change-Id: Ied6c46262bb33b7e81bbb3d3d2761124e0c676b7
Signed-off-by: Steve Muckle <smuckle@linaro.org>
[trivial cherry-picking fixes]
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-06-02 08:01:50 -07:00
Greg Kroah-Hartman
9bc462220d This is the 4.4.70 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlkm0zAACgkQONu9yGCS
 aT5QnxAAh9uZYFJtQ7wYngD7cQcDH1KVztqEYxCP5OtxzAZBrSNBufLdhKBbc1ZP
 C04Mo+FzzNiJtBwkmlOqYaEPYUSx/uwCEk9mNX85VtchIhKBrwWF7GxkeXCPs6e5
 yP5TUXmxbbSp3qM4q2Z4XSW8eEPZ2l3zoy0fkjz2kS02e4RW0yQ34dvzw0BG2urr
 +9ocyVjDBoU3QNKyVw3fd1AltKesSZK0fa2vEO+TOTW6Bm3xD4egCJdOzu9saUwK
 hfSKXsJ0/pf1r1iyfz2foR/Hi3i4j6vRqnneyqozT7nxEJEuBQ3B5WhnsbDfzrXu
 +CY23KBkDkQ1RBngmtTQd3ABHEN1E2StpBImG5RUr+5giV6/e4rdz0/HWGMvCvAz
 iWqXdgZNdCnc96HPEWaDGUKxndCxsiaJOhgZwW2zm/0drVWRE+vjsOmFLyUp2Ky1
 1vnKfwlvTFU4xjQ5H44AuuSHQsv+GNEtPPIHrbBv/wg90/2VuF0aYuNYjHSsc4Ca
 3YM53S6/sjQqmsKixWboax8Kh2wRrEuFbqSFQV64JjFpGau61JQFMtRNl4+FFXzm
 Cm+26Fan4Wtyo5zB9xnBZbDwCOXqwTXQYUP2SejtObq+Uk2tXxF05emeta9pURF3
 vdgv6N0cTPm4K3VZyBZvj8JitEr2OEaIxoUqE2BXkA1MPmbqOoI=
 =Z1no
 -----END PGP SIGNATURE-----

Merge 4.4.70 into android-4.4

Changes in 4.4.70
	usb: misc: legousbtower: Fix buffers on stack
	usb: misc: legousbtower: Fix memory leak
	USB: ene_usb6250: fix DMA to the stack
	watchdog: pcwd_usb: fix NULL-deref at probe
	char: lp: fix possible integer overflow in lp_setup()
	USB: core: replace %p with %pK
	ARM: tegra: paz00: Mark panel regulator as enabled on boot
	tpm_crb: check for bad response size
	infiniband: call ipv6 route lookup via the stub interface
	dm btree: fix for dm_btree_find_lowest_key()
	dm raid: select the Kconfig option CONFIG_MD_RAID0
	dm bufio: avoid a possible ABBA deadlock
	dm bufio: check new buffer allocation watermark every 30 seconds
	dm cache metadata: fail operations if fail_io mode has been established
	dm bufio: make the parameter "retain_bytes" unsigned long
	dm thin metadata: call precommit before saving the roots
	dm space map disk: fix some book keeping in the disk space map
	md: update slab_cache before releasing new stripes when stripes resizing
	rtlwifi: rtl8821ae: setup 8812ae RFE according to device type
	mwifiex: pcie: fix cmd_buf use-after-free in remove/reset
	ima: accept previously set IMA_NEW_FILE
	KVM: x86: Fix load damaged SSEx MXCSR register
	KVM: X86: Fix read out-of-bounds vulnerability in kvm pio emulation
	regulator: tps65023: Fix inverted core enable logic.
	s390/kdump: Add final note
	s390/cputime: fix incorrect system time
	ath9k_htc: Add support of AirTies 1eda:2315 AR9271 device
	ath9k_htc: fix NULL-deref at probe
	drm/amdgpu: Avoid overflows/divide-by-zero in latency_watermark calculations.
	drm/amdgpu: Make display watermark calculations more accurate
	drm/nouveau/therm: remove ineffective workarounds for alarm bugs
	drm/nouveau/tmr: ack interrupt before processing alarms
	drm/nouveau/tmr: fix corruption of the pending list when rescheduling an alarm
	drm/nouveau/tmr: avoid processing completed alarms when adding a new one
	drm/nouveau/tmr: handle races with hw when updating the next alarm time
	cdc-acm: fix possible invalid access when processing notification
	proc: Fix unbalanced hard link numbers
	of: fix sparse warning in of_pci_range_parser_one
	iio: dac: ad7303: fix channel description
	pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes
	pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes()
	USB: serial: ftdi_sio: fix setting latency for unprivileged users
	USB: serial: ftdi_sio: add Olimex ARM-USB-TINY(H) PIDs
	ext4 crypto: don't let data integrity writebacks fail with ENOMEM
	ext4 crypto: fix some error handling
	net: qmi_wwan: Add SIMCom 7230E
	fscrypt: fix context consistency check when key(s) unavailable
	f2fs: check entire encrypted bigname when finding a dentry
	fscrypt: avoid collisions when presenting long encrypted filenames
	sched/fair: Do not announce throttled next buddy in dequeue_task_fair()
	sched/fair: Initialize throttle_count for new task-groups lazily
	usb: host: xhci-plat: propagate return value of platform_get_irq()
	xhci: apply PME_STUCK_QUIRK and MISSING_CAS quirk for Denverton
	usb: host: xhci-mem: allocate zeroed Scratchpad Buffer
	net: irda: irda-usb: fix firmware name on big-endian hosts
	usbvision: fix NULL-deref at probe
	mceusb: fix NULL-deref at probe
	ttusb2: limit messages to buffer size
	usb: musb: tusb6010_omap: Do not reset the other direction's packet size
	USB: iowarrior: fix info ioctl on big-endian hosts
	usb: serial: option: add Telit ME910 support
	USB: serial: qcserial: add more Lenovo EM74xx device IDs
	USB: serial: mct_u232: fix big-endian baud-rate handling
	USB: serial: io_ti: fix div-by-zero in set_termios
	USB: hub: fix SS hub-descriptor handling
	USB: hub: fix non-SS hub-descriptor handling
	ipx: call ipxitf_put() in ioctl error path
	iio: proximity: as3935: fix as3935_write
	ceph: fix recursion between ceph_set_acl() and __ceph_setattr()
	gspca: konica: add missing endpoint sanity check
	s5p-mfc: Fix unbalanced call to clock management
	dib0700: fix NULL-deref at probe
	zr364xx: enforce minimum size when reading header
	dvb-frontends/cxd2841er: define symbol_rate_min/max in T/C fe-ops
	cx231xx-audio: fix init error path
	cx231xx-audio: fix NULL-deref at probe
	cx231xx-cards: fix NULL-deref at probe
	powerpc/book3s/mce: Move add_taint() later in virtual mode
	powerpc/pseries: Fix of_node_put() underflow during DLPAR remove
	powerpc/64e: Fix hang when debugging programs with relocated kernel
	ARM: dts: at91: sama5d3_xplained: fix ADC vref
	ARM: dts: at91: sama5d3_xplained: not all ADC channels are available
	arm64: xchg: hazard against entire exchange variable
	arm64: uaccess: ensure extension of access_ok() addr
	arm64: documentation: document tagged pointer stack constraints
	xc2028: Fix use-after-free bug properly
	mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
	staging: rtl8192e: fix 2 byte alignment of register BSSIDR.
	staging: rtl8192e: rtl92e_get_eeprom_size Fix read size of EPROM_CMD.
	iommu/vt-d: Flush the IOTLB to get rid of the initial kdump mappings
	metag/uaccess: Fix access_ok()
	metag/uaccess: Check access_ok in strncpy_from_user
	uwb: fix device quirk on big-endian hosts
	genirq: Fix chained interrupt data ordering
	osf_wait4(): fix infoleak
	tracing/kprobes: Enforce kprobes teardown after testing
	PCI: Fix pci_mmap_fits() for HAVE_PCI_RESOURCE_TO_USER platforms
	PCI: Freeze PME scan before suspending devices
	drm/edid: Add 10 bpc quirk for LGD 764 panel in HP zBook 17 G2
	nfsd: encoders mustn't use unitialized values in error cases
	drivers: char: mem: Check for address space wraparound with mmap()
	Linux 4.4.70

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-05-25 17:31:28 +02:00
Thomas Gleixner
6384f782a6 tracing/kprobes: Enforce kprobes teardown after testing
commit 30e7d894c1478c88d50ce94ddcdbd7f9763d9cdd upstream.

Enabling the tracer selftest occasionally triggers the warning in
text_poke(), which warns when the to-be-modified page is not marked
reserved.

The reason is that the tracer selftest installs kprobes on functions marked
__init for testing. These probes are removed after the tests, but that
removal schedules the delayed kprobes_optimizer work, which will do the
actual text poke. If the work is executed after the init text is freed,
then the warning triggers. The bug can be reproduced reliably when the work
delay is increased.

Flush the optimizer work and wait for the optimizing/unoptimizing lists to
become empty before returning from the kprobes tracer selftest. That
ensures that all operations which were queued due to the probes removal
have completed.

Link: http://lkml.kernel.org/r/20170516094802.76a468bb@gandalf.local.home

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 6274de498 ("kprobes: Support delayed unoptimizing")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 14:30:17 +02:00
Thomas Gleixner
e07db0d720 genirq: Fix chained interrupt data ordering
commit 2c4569ca26986d18243f282dd727da27e9adae4c upstream.

irq_set_chained_handler_and_data() sets up the chained interrupt and then
stores the handler data.

That's racy against an immediate interrupt which gets handled before the
handler data has been stored. The handler will dereference a NULL
pointer and crash.

Cure it by storing handler data before installing the chained handler.
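The ordering fix can be modelled in user space with C11 atomics. This is a sketch under assumptions: the struct and function names are hypothetical, and the release/acquire pair stands in for whatever ordering the kernel's descriptor handling provides; the actual fix, as described, only swaps the order of the two stores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef int (*handler_fn)(void *data);

/* Hypothetical descriptor: a handler plus the data it dereferences. */
struct chained_desc_sketch {
    void *handler_data;
    _Atomic(handler_fn) handler;
};

/* Example handler that would crash if it ran before its data was stored. */
int read_int_handler(void *data)
{
    return *(int *)data;
}

/*
 * Fixed ordering: store the data first, then publish the handler with a
 * release store, so an interrupt that observes the handler also observes
 * the data.
 */
void set_chained_handler_and_data_sketch(struct chained_desc_sketch *d,
                                         handler_fn h, void *data)
{
    d->handler_data = data;
    atomic_store_explicit(&d->handler, h, memory_order_release);
}

/* An "interrupt" arriving at any time: runs the handler only once installed. */
int fire_sketch(struct chained_desc_sketch *d)
{
    handler_fn h = atomic_load_explicit(&d->handler, memory_order_acquire);

    return h ? h(d->handler_data) : -1;
}
```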

Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 14:30:17 +02:00
Konstantin Khlebnikov
ada79b5ecd sched/fair: Initialize throttle_count for new task-groups lazily
commit 094f469172e00d6ab0a3130b0e01c83b3cf3a98d upstream.

A cgroup created inside a throttled group must inherit the current
throttle_count. A broken throttle_count allows throttled entries to be
nominated as the next buddy, which later leads to a NULL pointer
dereference in pick_next_task_fair().

This patch initializes cfs_rq->throttle_count at the first enqueue:
laziness allows us to skip locking all rqs at group creation. The lazy
approach also allows skipping the full sub-tree scan when throttling the
hierarchy (not in this patch).
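A minimal C sketch of the lazy initialization, assuming each cfs_rq links to its parent group's cfs_rq on the same CPU. The struct, the throttle_uptodate flag and the function name are hypothetical illustrations of the idea rather than the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down per-CPU cfs_rq of a task group. */
struct cfs_rq_sketch {
    struct cfs_rq_sketch *parent; /* same CPU, parent task group */
    int throttle_count;           /* nonzero while the hierarchy is throttled */
    bool throttle_uptodate;       /* has throttle_count been initialized? */
};

/*
 * Called on enqueue. The first time, inherit the count from the closest
 * ancestor that is already up to date, so group creation needs no locking.
 */
void sync_throttle_count_sketch(struct cfs_rq_sketch *cfs_rq)
{
    struct cfs_rq_sketch *p;

    if (cfs_rq->throttle_uptodate)
        return;
    cfs_rq->throttle_uptodate = true;

    for (p = cfs_rq->parent; p; p = p->parent)
        if (p->throttle_uptodate)
            break;
    if (p)
        cfs_rq->throttle_count = p->throttle_count;
}
```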

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Link: http://lkml.kernel.org/r/146608182119.21870.8439834428248129633.stgit@buzz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Ben Pineau <benjamin.pineau@mirakl.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 14:30:12 +02:00
Konstantin Khlebnikov
f01ae9cb0d sched/fair: Do not announce throttled next buddy in dequeue_task_fair()
commit 754bd598be9bbc953bc709a9e8ed7f3188bfb9d7 upstream.

The hierarchy could already be throttled at this point. A throttled next
buddy could trigger a NULL pointer dereference in pick_next_task_fair().

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/146608183552.21905.15924473394414832071.stgit@buzz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Ben Pineau <benjamin.pineau@mirakl.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 14:30:12 +02:00
Kirill Tkhai
6a70a5833e pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes()
commit 3fd37226216620c1a468afa999739d5016fbc349 upstream.

Imagine we have a pid namespace and a task from its parent's pid_ns,
which called setns() into the pid namespace. The task is doing fork()
while the pid namespace's child reaper is dying. There is a race
between them:

Task from parent pid_ns             Child reaper
copy_process()                      ..
  alloc_pid()                       ..
  ..                                zap_pid_ns_processes()
  ..                                  disable_pid_allocation()
  ..                                  read_lock(&tasklist_lock)
  ..                                  iterate over pids in pid_ns
  ..                                    kill tasks linked to pids
  ..                                  read_unlock(&tasklist_lock)
  write_lock_irq(&tasklist_lock);   ..
  attach_pid(p, PIDTYPE_PID);       ..
  ..                                ..

So the just-created task p won't receive the SIGKILL signal,
and the pid namespace will be in a contradictory state.
Only a manual kill will help there, but does userspace
care about this? I suppose most users just inject
a task into a pid namespace and wait for a SIGCHLD from it.

This patch fixes the problem. It simply checks for
(pid_ns->nr_hashed & PIDNS_HASH_ADDING) in copy_process().
We do it under the tasklist_lock, so we can't miss the
clearing of PIDNS_HASH_ADDING, as noted by Oleg:

"zap_pid_ns_processes() does disable_pid_allocation()
and then takes tasklist_lock to kill the whole namespace.
Given that copy_process() checks PIDNS_HASH_ADDING
under write_lock(tasklist) they can't race;
if copy_process() takes this lock first, the new child will
be killed, otherwise copy_process() can't miss
the change in ->nr_hashed."

If allocation is disabled, we just return -ENOMEM,
as alloc_pid() does in the same situation.

v2: Do not move disable_pid_allocation(), do not
introduce a new variable in copy_process() and simplify
the patch as suggested by Oleg Nesterov.
Account for the double irq-enabling problem
found by Eric W. Biederman.

Fixes: c876ad7682 ("pidns: Stop pid allocation when init dies")
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Ingo Molnar <mingo@kernel.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Mike Rapoport <rppt@linux.vnet.ibm.com>
CC: Michal Hocko <mhocko@suse.com>
CC: Andy Lutomirski <luto@kernel.org>
CC: "Eric W. Biederman" <ebiederm@xmission.com>
CC: Andrei Vagin <avagin@openvz.org>
CC: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Serge Hallyn <serge@hallyn.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 14:30:11 +02:00
Eric W. Biederman
ddf9b92f12 pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes
commit b9a985db98961ae1ba0be169f19df1c567e4ffe0 upstream.

The code can potentially sleep for an indefinite amount of time in
zap_pid_ns_processes(), triggering the hung task timeout and increasing
the system load average.  This is undesirable.  Sleep with a task state of
TASK_INTERRUPTIBLE instead of TASK_UNINTERRUPTIBLE to remove these
undesirable side effects.

Apparently, under heavy load, this has been allowing Chrome to trigger
the hung task timeout error and cause ChromeOS to reboot.
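Here is a toy model of why the sleep state matters. It relies on two simplifying assumptions about the scheduler and the hung task detector: the load average counts running plus uninterruptible tasks, and only uninterruptible sleepers that exceed the timeout are reported as hung. The enum and the helpers are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical task states for the model. */
enum sleep_state { S_RUNNING, S_INTERRUPTIBLE, S_UNINTERRUPTIBLE };

/* Assumed policy: load average = runnable + uninterruptible tasks. */
static bool counts_toward_loadavg(enum sleep_state s)
{
	return s == S_RUNNING || s == S_UNINTERRUPTIBLE;
}

/* Assumed policy: only uninterruptible sleepers can be flagged as hung;
 * a timeout of 0 disables the check. */
static bool hung_task_candidate(enum sleep_state s, unsigned int slept_s,
				unsigned int timeout_s)
{
	return s == S_UNINTERRUPTIBLE && timeout_s != 0 && slept_s > timeout_s;
}
```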

Reported-by: Vovo Yang <vovoy@google.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Fixes: 6347e90091 ("pidns: guarantee that the pidns init will be the last pidns process reaped")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 14:30:11 +02:00
Greg Kroah-Hartman
b2fc10e724 This is the 4.4.69 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlkgNjAACgkQONu9yGCS
 aT5BNhAAvs5FwuKjmq+KLXs2ofB7REnq1xBjcm8Y7gnFo+7+slrgOyrGH4fpWArP
 55pU9YelY/DZzSjZ/hYkp/fI/TCZskgV+T/IIRhPlpAHIDRCnFfVqNpY6Oijo1jw
 ZcuggPUjo6OqV3yB6FRm8OKnaux4bZBi63TGom+0UpGEEzTW0LfwA8mK2yAmlgWm
 huVPuHRxBSHjxaie2s/8wwmbFfJZ+MwtaRFDNFiPayVuRb2zZBfDVUVEoVNlkGNL
 wfnTJ4UpjyBkMiOEoNao7DtmlLttuysAZ4LKqL2VsfcDZ7RzuwZ7okM1rxW1W7F8
 TTHKz9NXfqNEPTYhHHfwnHGhpzuZEYqeXRzCoddfQMuDdTkdbpscLd4gobosQJR7
 NL25MKL4wcI/7366qnq0Fa0J4pmNDd6LO1knOz4OR7sNFJ4C1TUVmzUryJuSA3UO
 8OGJ0qMJzJHUgoNByHdrs9cbxiQmTRcACA9MnizBPtz+ciiyvUUfY4dTEnlQIFOl
 PZhtux5wC/UdhZjfUzwBt2fD/kUHg4OHdPoEWVp0E0U/H7SbSllyeX+qKFZomfzm
 UUqSU823sGe/VQtoiLtH9fSqUmfARmU64pthgOuvGk8qBLyl6mkGApj+XtkBcozG
 lNE0AgWs+NnZyEPfMJIAyxxyko5Dy9I4TpX9/fjCWkQH7NrHqwM=
 =eKGw
 -----END PGP SIGNATURE-----

Merge 4.4.69 into android-4.4

Changes in 4.4.69
	xen: adjust early dom0 p2m handling to xen hypervisor behavior
	target: Fix compare_and_write_callback handling for non GOOD status
	target/fileio: Fix zero-length READ and WRITE handling
	target: Convert ACL change queue_depth se_session reference usage
	iscsi-target: Set session_fall_back_to_erl0 when forcing reinstatement
	usb: host: xhci: print correct command ring address
	USB: serial: ftdi_sio: add device ID for Microsemi/Arrow SF2PLUS Dev Kit
	USB: Proper handling of Race Condition when two USB class drivers try to call init_usb_class simultaneously
	staging: vt6656: use off stack for in buffer USB transfers.
	staging: vt6656: use off stack for out buffer USB transfers.
	staging: gdm724x: gdm_mux: fix use-after-free on module unload
	staging: comedi: jr3_pci: fix possible null pointer dereference
	staging: comedi: jr3_pci: cope with jiffies wraparound
	usb: misc: add missing continue in switch
	usb: Make sure usb/phy/of gets built-in
	usb: hub: Fix error loop seen after hub communication errors
	usb: hub: Do not attempt to autosuspend disconnected devices
	x86/boot: Fix BSS corruption/overwrite bug in early x86 kernel startup
	selftests/x86/ldt_gdt_32: Work around a glibc sigaction() bug
	x86, pmem: Fix cache flushing for iovec write < 8 bytes
	um: Fix PTRACE_POKEUSER on x86_64
	KVM: x86: fix user triggerable warning in kvm_apic_accept_events()
	KVM: arm/arm64: fix races in kvm_psci_vcpu_on
	block: fix blk_integrity_register to use template's interval_exp if not 0
	crypto: algif_aead - Require setkey before accept(2)
	dm era: save spacemap metadata root after the pre-commit
	vfio/type1: Remove locked page accounting workqueue
	IB/core: Fix sysfs registration error flow
	IB/IPoIB: ibX: failed to create mcg debug file
	IB/mlx4: Fix ib device initialization error flow
	IB/mlx4: Reduce SRIOV multicast cleanup warning message to debug level
	ext4: evict inline data when writing to memory map
	fs/xattr.c: zero out memory copied to userspace in getxattr
	ceph: fix memory leak in __ceph_setxattr()
	fs/block_dev: always invalidate cleancache in invalidate_bdev()
	Set unicode flag on cifs echo request to avoid Mac error
	SMB3: Work around mount failure when using SMB3 dialect to Macs
	CIFS: fix mapping of SFM_SPACE and SFM_PERIOD
	cifs: fix CIFS_IOC_GET_MNT_INFO oops
	CIFS: add misssing SFM mapping for doublequote
	padata: free correct variable
	arm64: KVM: Fix decoding of Rt/Rt2 when trapping AArch32 CP accesses
	serial: samsung: Use right device for DMA-mapping calls
	serial: omap: fix runtime-pm handling on unbind
	serial: omap: suspend device on probe errors
	tty: pty: Fix ldisc flush after userspace become aware of the data already
	Bluetooth: Fix user channel for 32bit userspace on 64bit kernel
	Bluetooth: hci_bcm: add missing tty-device sanity check
	Bluetooth: hci_intel: add missing tty-device sanity check
	mac80211: pass RX aggregation window size to driver
	mac80211: pass block ack session timeout to to driver
	mac80211: RX BA support for sta max_rx_aggregation_subframes
	wlcore: Pass win_size taken from ieee80211_sta to FW
	wlcore: Add RX_BA_WIN_SIZE_CHANGE_EVENT event
	ipmi: Fix kernel panic at ipmi_ssif_thread()
	Linux 4.4.69

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-05-21 19:01:22 +02:00
Jason A. Donenfeld
f08bc4d633 padata: free correct variable
commit 07a77929ba672d93642a56dc2255dd21e6e2290b upstream.

The author meant to free the variable that was just allocated, instead
of the one that failed to be allocated, but made a simple typo. This
patch rectifies that.
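The sketch below shows the error-unwinding pattern this fix restores, using plain calloc/free. The struct, the helper names and the allocation counter are hypothetical and are not taken from padata:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object owning two independently allocated buffers. */
struct two_bufs {
	unsigned long *first;
	unsigned long *second;
};

/* Number of buffers currently allocated, so leaks are observable. */
static int live_allocs;

/* Allocates a small buffer; @simulate_failure forces a NULL return. */
static unsigned long *alloc_buf(int simulate_failure)
{
	unsigned long *b;

	if (simulate_failure)
		return NULL;
	b = calloc(4, sizeof(*b));
	if (b)
		live_allocs++;
	return b;
}

static void free_buf(unsigned long *b)
{
	if (b)
		live_allocs--;
	free(b);
}

static int two_bufs_init(struct two_bufs *t, int fail_second)
{
	t->first = alloc_buf(0);
	if (!t->first)
		return -1;

	t->second = alloc_buf(fail_second);
	if (!t->second) {
		/* Free what WAS allocated. Freeing t->second here (the one
		 * that just failed) is the typo class fixed above: it leaks
		 * t->first. */
		free_buf(t->first);
		t->first = NULL;
		return -1;
	}
	return 0;
}
```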

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:27:02 +02:00
Greg Kroah-Hartman
e4528dd775 This is the 4.4.65 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlkFXvsACgkQONu9yGCS
 aT6kPg//QqrRCxSUBYahQ1Jp16AVLiEWjJ3umzBhGGSPn7FfsWF8951R1WBHGlFI
 lEUa3Pfi0U1sh0q4v6pTmQ/AYoa67DcKorxQegH9JoaRp0IvWpSaGMSfbmKP5pDl
 PQyRL6DmOFkf/6X0dvby5ybbt2Kp59zTm8RFeFLRo3LTUK30w/tBTVvouk+UW3KA
 KtjeL70OSOHgWoHXhNWDX1JTTBGFFTI2x0jlFeUtq10t2kRxAMDZpB/IY0VJ3ZTe
 iso6+hC8JyzsXUYP82ZfZ7BAv/hSWBV3ErHyrUmhqWfE/Px7PFEeo3OyG3Bqosu6
 aZW78jwFoqZcAhkVTQepWMHonUT+XLHUgCzc2MqFR4HW6JoQhKDdIqlt1Lqp6y1O
 XsYOrPU1WqHhyoO9E3YwmAIjlYBHxYSUiCnqI9WtvvExJUhXXk/wwzgXUFrZPD01
 berofViH2LJAxde0sqpidpNRg98m+MAK47M03I/tZUUykjGDi8NPTvM4FBbNCEty
 3qaVVCUm7o8YzZg54QF61O+ciceoQdnsQJVy94EV3n2pgdN/7pG0v1KikBRKfsPK
 1Wp+l0tdLkms56ElXyt/lHtF5Pre5i4sE6SdnZa3RHTUV168PFVYqJUCqWRwCD50
 QMs+yLvRHwCFst+ix29Xn+c7KYKcMyqPvCrI8oczfokV/tvMVd8=
 =1GiA
 -----END PGP SIGNATURE-----

Merge 4.4.65 into android-4.4

Changes in 4.4.65:
	tipc: make sure IPv6 header fits in skb headroom
	tipc: make dist queue pernet
	tipc: re-enable compensation for socket receive buffer double counting
	tipc: correct error in node fsm
	tty: nozomi: avoid a harmless gcc warning
	hostap: avoid uninitialized variable use in hfa384x_get_rid
	gfs2: avoid uninitialized variable warning
	tipc: fix random link resets while adding a second bearer
	tipc: fix socket timer deadlock
	mnt: Add a per mount namespace limit on the number of mounts
	xc2028: avoid use after free
	netfilter: nfnetlink: correctly validate length of batch messages
	tipc: check minimum bearer MTU
	vfio/pci: Fix integer overflows, bitmask check
	staging/android/ion : fix a race condition in the ion driver
	ping: implement proper locking
	perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race
	Linux 4.4.65

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-04-30 07:30:52 +02:00
Peter Zijlstra
416bd4a366 perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race
commit 321027c1fe77f892f4ea07846aeae08cefbbb290 upstream.

Di Shen reported a race between two concurrent sys_perf_event_open()
calls where both try to move the same pre-existing software group
into a hardware context.

The problem is exactly that described in commit:

  f63a8daa58 ("perf: Fix event->ctx locking")

... where, while we wait for a ctx->mutex acquisition, the event->ctx
relation can have changed under us.

That very same commit failed to recognise sys_perf_event_open() as an
external access vector to the events and thereby didn't apply the
established locking rules correctly.

So while one sys_perf_event_open() call is stuck waiting on
mutex_lock_double(), the other (which owns said locks) moves the group
about. So by the time the former sys_perf_event_open() acquires the
locks, the context we've acquired is stale (and possibly dead).

Apply the established locking rules as per perf_event_ctx_lock_nested()
to the mutex_lock_double() for the 'move_group' case. This obviously means
we need to validate state after we acquire the locks.
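The following is a simplified userspace model of "take the locks, then validate". It assumes pthread mutexes stand in for ctx->mutex and that locking in address order approximates mutex_lock_double(). The kernel also pins the context with a reference count, which this sketch omits. All names are hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for a perf context and a group leader. */
struct ctx_model {
	pthread_mutex_t mutex;
};

struct event_model {
	_Atomic(struct ctx_model *) ctx;  /* may change while we sleep */
};

/* Lock two mutexes in address order to avoid ABBA deadlocks. */
static void lock_double(pthread_mutex_t *a, pthread_mutex_t *b)
{
	if (a == b) {
		pthread_mutex_lock(a);
		return;
	}
	if ((uintptr_t)a > (uintptr_t)b) {
		pthread_mutex_t *tmp = a;
		a = b;
		b = tmp;
	}
	pthread_mutex_lock(a);
	pthread_mutex_lock(b);
}

static void unlock_double(pthread_mutex_t *a, pthread_mutex_t *b)
{
	pthread_mutex_unlock(a);
	if (a != b)
		pthread_mutex_unlock(b);
}

/* Lock the leader's current context together with @ctx, then re-check
 * that the leader still points at it. If it moved while we waited, the
 * context we hold is stale: drop both locks and try again. */
static struct ctx_model *lock_double_revalidate(struct event_model *leader,
						struct ctx_model *ctx,
						int *retries)
{
	for (;;) {
		struct ctx_model *gctx = atomic_load(&leader->ctx);

		lock_double(&gctx->mutex, &ctx->mutex);
		if (atomic_load(&leader->ctx) == gctx)
			return gctx;	/* caller now holds both mutexes */
		unlock_double(&gctx->mutex, &ctx->mutex);
		(*retries)++;
	}
}

/* A mover follows the same rule before changing leader->ctx. */
static void move_group(struct event_model *leader, struct ctx_model *to)
{
	int retries = 0;
	struct ctx_model *from = lock_double_revalidate(leader, to, &retries);

	atomic_store(&leader->ctx, to);
	unlock_double(&from->mutex, &to->mutex);
}
```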

Reported-by: Di Shen (Keen Lab)
Tested-by: John Dias <joaodias@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Min Chong <mchong@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: f63a8daa58 ("perf: Fix event->ctx locking")
Link: http://lkml.kernel.org/r/20170106131444.GZ3174@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 4.4:
 - Test perf_event::group_flags instead of group_caps
 - Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-30 05:49:29 +02:00
Eric W. Biederman
c50fd34e10 mnt: Add a per mount namespace limit on the number of mounts
commit d29216842a85c7970c536108e093963f02714498 upstream.

CAI Qian <caiqian@redhat.com> pointed out that the semantics
of shared subtrees make it possible to create an exponentially
increasing number of mounts in a mount namespace.

    mkdir /tmp/1 /tmp/2
    mount --make-rshared /
    for i in $(seq 1 20) ; do mount --bind /tmp/1 /tmp/2 ; done

This will create 2^20 (1048576) mounts, which is a practical problem,
as some people have managed to hit this by accident.

As such, CVE-2016-6213 was assigned.

Ian Kent <raven@themaw.net> described the situation for autofs users
as follows:

> The number of mounts for direct mount maps is usually not very large because of
> the way they are implemented, large direct mount maps can have performance
> problems. There can be anywhere from a few (likely case a few hundred) to less
> than 10000, plus mounts that have been triggered and not yet expired.
>
> Indirect mounts have one autofs mount at the root plus the number of mounts that
> have been triggered and not yet expired.
>
> The number of autofs indirect map entries can range from a few to the common
> case of several thousand and in rare cases up to between 30000 and 50000. I've
> not heard of people with maps larger than 50000 entries.
>
> The larger the number of map entries the greater the possibility for a large
> number of active mounts so it's not hard to expect cases of a 1000 or somewhat
> more active mounts.

So I am setting the default number of mounts allowed per mount
namespace at 100,000.  This is more than enough for any use case I
know of, but small enough to quickly stop an exponential increase
in mounts, which should be perfect for catching misconfigurations and
malfunctioning programs.

For anyone who needs a higher limit this can be changed by writing
to the new /proc/sys/fs/mount-max sysctl.
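This sketch of the admission check rests on two assumptions. First, the limit is compared against the current count plus the mounts an operation would add. Second, following the commit's 2^20 arithmetic, each bind in the rshared reproducer doubles the mount count. The -ENOSPC value and all names are illustrative assumptions:

```c
#include <assert.h>
#include <errno.h>

/* Default from the commit; the real knob is /proc/sys/fs/mount-max. */
static unsigned int mount_max = 100000;

/* Hypothetical per-namespace counter. */
struct mnt_ns_model {
	unsigned int mounts;
};

/* Refuse an operation up front if it would exceed the limit. */
static int count_mounts_model(const struct mnt_ns_model *ns,
			      unsigned int new_mounts)
{
	if ((unsigned long long)ns->mounts + new_mounts > mount_max)
		return -ENOSPC;
	return 0;
}

/* Reproducer model: each bind is assumed to double the mount count.
 * Returns how many binds succeeded before the limit stopped them. */
static int bind_loop(struct mnt_ns_model *ns, int iterations)
{
	int done = 0;

	for (int i = 0; i < iterations; i++) {
		unsigned int new_mounts = ns->mounts;

		if (count_mounts_model(ns, new_mounts) < 0)
			break;
		ns->mounts += new_mounts;
		done++;
	}
	return done;
}
```

Starting from one mount, the default limit stops the 20-iteration loop after 16 binds (65536 mounts). With the limit raised to 2000000, all 20 binds succeed, reaching 2^20.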

Tested-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
[bwh: Backported to 4.4: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-30 05:49:28 +02:00
Greg Kroah-Hartman
e9cf0f69b7 This is the 4.4.64 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlkBmUYACgkQONu9yGCS
 aT6uOBAAvOVUjBIwkaYoy1/Pk2ynZXXIoiBUA6Ti3LaUEPT44zVcfG6CwOKxxUsb
 huIxAg8tGDXN0I41YrLZEG/Ju3ommWyjZQ+RWZA/W3an+2y6oz2BXNnBlePTpyts
 9EWknm61cm6rqcA9y0himDdGjtuM/F6g2vTLboCZnc0IYlwh2TG9tvBn5gcHlVyA
 1mlGCzAxBKf6ttIOKtan4LxssW0jO+e0w+W4mPrAsUViJFSnMHAY1csKQiT62r+Y
 aBNrNIFSMKKSz1a2slOgf1GihaCIL9HnrTlBUcIQkxXyjawNms4ENj9lBy4fJZao
 74eU6aVBvKbE2175PI/Ub90OvtbOI83EzmBgqkVgHSBXzCaPOScnDAnMlwlW3vhW
 5lQU1eN4jtL6FuMi565mXQ8G4RP7PzuWrLfT9rrAaR/rqC54tY882FGjL2KCqzpd
 IVLhKSDg5iqB2JrnNS/GEzJd6Y024EMYGytp+jcDkczfbUHguxfmUNkbrh8sOMSi
 leMS/Z+FN6kc4bvF55NsvwW2n8XNn5Om/TWcXNdGtxvBsk6PD2W6+Bo+Tq7NotNf
 aOuJFQHxBLqfA9LO6UjZMQGfTdfweZ+fAMaGH/X55+GCExLuTTkvfHxerleYFSw8
 FNS+wCn1e+RonHUw2tztE4kfPY2kJ6JkILxzGe/1pC6kv0HDzsA=
 =7UnS
 -----END PGP SIGNATURE-----

Merge 4.4.64 into android-4.4

Changes in 4.4.64:
	KEYS: Disallow keyrings beginning with '.' to be joined as session keyrings
	KEYS: Change the name of the dead type to ".dead" to prevent user access
	KEYS: fix keyctl_set_reqkey_keyring() to not leak thread keyrings
	tracing: Allocate the snapshot buffer before enabling probe
	ring-buffer: Have ring_buffer_iter_empty() return true when empty
	cifs: Do not send echoes before Negotiate is complete
	CIFS: remove bad_network_name flag
	s390/mm: fix CMMA vs KSM vs others
	Drivers: hv: don't leak memory in vmbus_establish_gpadl()
	Drivers: hv: get rid of timeout in vmbus_open()
	Drivers: hv: vmbus: Reduce the delay between retries in vmbus_post_msg()
	VSOCK: Detach QP check should filter out non matching QPs.
	Input: elantech - add Fujitsu Lifebook E547 to force crc_enabled
	ACPI / power: Avoid maybe-uninitialized warning
	mmc: sdhci-esdhc-imx: increase the pad I/O drive strength for DDR50 card
	mac80211: reject ToDS broadcast data frames
	ubi/upd: Always flush after prepared for an update
	powerpc/kprobe: Fix oops when kprobed on 'stdu' instruction
	x86/mce/AMD: Give a name to MCA bank 3 when accessed with legacy MSRs
	kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
	Tools: hv: kvp: ensure kvp device fd is closed on exec
	Drivers: hv: balloon: keep track of where ha_region starts
	Drivers: hv: balloon: account for gaps in hot add regions
	hv: don't reset hv_context.tsc_page on crash
	x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
	block: fix del_gendisk() vs blkdev_ioctl crash
	tipc: fix crash during node removal
	Linux 4.4.64

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-04-27 10:07:57 +02:00
Steven Rostedt (VMware)
a2a67e53f9 ring-buffer: Have ring_buffer_iter_empty() return true when empty
commit 78f7a45dac2a2d2002f98a3a95f7979867868d73 upstream.

I noticed that reading the snapshot file when it is empty no longer gives a
status. It is supposed to show the status of the snapshot buffer as well as how
to allocate and use it. For example:

 ># cat snapshot
 # tracer: nop
 #
 #
 # * Snapshot is allocated *
 #
 # Snapshot commands:
 # echo 0 > snapshot : Clears and frees snapshot buffer
 # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
 #                      Takes a snapshot of the main buffer.
 # echo 2 > snapshot : Clears snapshot buffer (but does not allocate or free)
 #                      (Doesn't have to be '2' works with any number that
 #                       is not a '0' or '1')

But instead it just showed an empty buffer:

 ># cat snapshot
 # tracer: nop
 #
 # entries-in-buffer/entries-written: 0/0   #P:4
 #
 #                              _-----=> irqs-off
 #                             / _----=> need-resched
 #                            | / _---=> hardirq/softirq
 #                            || / _--=> preempt-depth
 #                            ||| /     delay
 #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
 #              | |       |   ||||       |         |

What happened was that it used the ring_buffer_iter_empty() function to
see if the buffer was empty, and if it was, it showed the status. But that
function was returning false when the buffer was empty. The reason was that
the iterator's head page was the reader page, and the reader page was empty,
but so was the buffer itself. The check only tested whether the iterator was
on the commit page, but the commit page no longer pointed to the reader page;
since all pages were empty, the buffer was empty too.
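Below is a simplified model of the check described above. The field names echo the kernel's ring buffer, but the structs are hypothetical. The second clause of iter_empty_new() is a reading of the fix, not a copy of it:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page: bytes committed and bytes already read. */
struct page_model {
	unsigned int commit;
	unsigned int read;
};

struct cpu_buffer_model {
	struct page_model *reader_page;
	struct page_model *head_page;
	struct page_model *commit_page;
};

struct iter_model {
	struct cpu_buffer_model *cpu_buffer;
	struct page_model *head_page;  /* page the iterator is on */
	unsigned int head;             /* offset within that page */
};

/* Pre-fix logic: only "iterator sits at the commit point" means empty. */
static bool iter_empty_old(const struct iter_model *it)
{
	const struct page_model *commit_page = it->cpu_buffer->commit_page;

	return it->head_page == commit_page && it->head == commit_page->commit;
}

/* Post-fix logic (modelled): also empty when the iterator has consumed
 * the reader page, the commit page is the head page, and that page has
 * been read up to its commit. */
static bool iter_empty_new(const struct iter_model *it)
{
	const struct cpu_buffer_model *b = it->cpu_buffer;
	unsigned int commit = b->commit_page->commit;

	return iter_empty_old(it) ||
	       (it->head_page == b->reader_page &&
		b->commit_page == b->head_page &&
		b->head_page->read == commit &&
		it->head == b->reader_page->commit);
}
```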

Fixes: 651e22f270 ("ring-buffer: Always reset iterator to reader page")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-27 09:09:31 +02:00
Steven Rostedt (VMware)
1dfb1c7bd6 tracing: Allocate the snapshot buffer before enabling probe
commit df62db5be2e5f070ecd1a5ece5945b590ee112e0 upstream.

Currently the snapshot trigger enables the probe and then allocates the
snapshot. If the probe triggers before the allocation, it could cause the
snapshot to fail and turn tracing off. It's best to allocate the snapshot
buffer first, and then enable the trigger. If something goes wrong in the
enabling of the trigger, the snapshot buffer is still allocated, but it can
also be freed by the user by writing zero into the snapshot buffer file.

Also add a check of the return status of alloc_snapshot().
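This sketch shows the reordering, with hypothetical names throughout. The allocation happens first and its result is checked, so the probe is enabled only once a snapshot buffer exists:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical tracer state; none of these names are kernel APIs. */
struct tracer_model {
	bool alloc_should_fail;   /* simulate an allocation failure */
	bool snapshot_allocated;
	bool probe_enabled;
};

static int alloc_snapshot_model(struct tracer_model *t)
{
	if (t->alloc_should_fail)
		return -ENOMEM;
	t->snapshot_allocated = true;
	return 0;
}

/* A probe hit with no buffer is the failure mode described above. */
static bool probe_hit_would_fail(const struct tracer_model *t)
{
	return t->probe_enabled && !t->snapshot_allocated;
}

/* Fixed ordering: allocate and check first, then enable the probe. */
static int snapshot_trigger_setup(struct tracer_model *t)
{
	int ret = alloc_snapshot_model(t);

	if (ret < 0)
		return ret;       /* probe is never enabled */
	t->probe_enabled = true;
	return 0;
}
```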

Fixes: 77fd5c15e3 ("tracing: Add snapshot trigger to function probes")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-27 09:09:31 +02:00
Todd Kjos
d77312aeb2 Revert "[RFC]cgroup: Change from CAP_SYS_NICE to CAP_SYS_RESOURCE for cgroup migration permissions"
This reverts commit 64f0245f31.

Needs to be reverted since system_server no longer has CAP_SYS_RESOURCE.

Change-Id: Ic500cffb14b80a3e2a5dd41fda0b0b781b5245d6
2017-04-25 18:31:07 +00:00
Greg Kroah-Hartman
29fa724a09 This is the 4.4.63 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlj5tRkACgkQONu9yGCS
 aT5zFxAAouq2kxBFxxJIQ3255yy/7B6oBYrhilQZPrETC800PUaIqZtuQZPpaoqb
 3gG0+12ve0CMHK+PidEwsQlMlAHNI1xbzmUHm2UIrLYYCV817DTkEsc7JXGUvYVA
 /YA71GASKmLVi9DnsawRb0ELhTeQHec76LrPlgvyWH/OMEtNcMOv/8oWfTq9bKV2
 HsHC6MOwT2R86ukhYYmcfFHomTnJSpW7KtGXwNC/LhohzIfsKQKGQWb1f1j1aHGC
 u5yQ5Qc9T+DhPMHAEY+xuURz/3ohpUL8aSQXk7pua/bTD0X0klNQcf/BXVJXsaeI
 s4g78q+YdTcPL81rkEW+7yUvAlb3u+FdVr+wjsl/s6ih4iL0EgBsoClqUjGUUoz+
 jvCXHiMP7lHi50eIkppQf/yZSVKSobKn5YYf9AA+y6tQ9R9GguDS/IQSRe2HnHeR
 OymCBXa6BSmQGGyPiMUBiNTix6roJ8Vr4dK9lbsQXZ+YZICXWs1rpMOy5HK9EJWf
 M6YF6l9lHwQ38AN+MhsjUXIyKLp9zCk7syeFaeK6k/IA2kcm7dL/momiZ1QIBnhq
 OHB3iwEPZ5Rr4CVjk5j7Ue22ubdrtpc8IfTYV95N7nv+g3nBwe22k+RDi70NiDwk
 2pnBqhO/vtPRE9Ry3QBS73VEeXgNb9IIVwQ7hi9Rk7KUgmdEOOo=
 =iS0x
 -----END PGP SIGNATURE-----

Merge 4.4.63 into android-4.4

Changes in 4.4.63:
	cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups
	thp: fix MADV_DONTNEED vs clear soft dirty race
	drm/nouveau/mpeg: mthd returns true on success now
	drm/nouveau/mmu/nv4a: use nv04 mmu rather than the nv44 one
	CIFS: store results of cifs_reopen_file to avoid infinite wait
	Input: xpad - add support for Razer Wildcat gamepad
	perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32()
	x86/vdso: Ensure vdso32_enabled gets set to valid values only
	x86/vdso: Plug race between mapping and ELF header setup
	acpi, nfit, libnvdimm: fix interleave set cookie calculation (64-bit comparison)
	iscsi-target: Fix TMR reference leak during session shutdown
	iscsi-target: Drop work-around for legacy GlobalSAN initiator
	scsi: sr: Sanity check returned mode data
	scsi: sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable
	scsi: sd: Fix capacity calculation with 32-bit sector_t
	xen, fbfront: fix connecting to backend
	libnvdimm: fix reconfig_mutex, mmap_sem, and jbd2_handle lockdep splat
	irqchip/irq-imx-gpcv2: Fix spinlock initialization
	ftrace: Fix removing of second function probe
	char: Drop bogus dependency of DEVPORT on !M68K
	char: lack of bool string made CONFIG_DEVPORT always on
	Revert "MIPS: Lantiq: Fix cascaded IRQ setup"
	kvm: fix page struct leak in handle_vmon
	zram: do not use copy_page with non-page aligned address
	powerpc: Disable HFSCR[TM] if TM is not supported
	crypto: ahash - Fix EINPROGRESS notification callback
	ath9k: fix NULL pointer dereference
	dvb-usb-v2: avoid use-after-free
	ext4: fix inode checksum calculation problem if i_extra_size is small
	platform/x86: acer-wmi: setup accelerometer when machine has appropriate notify event
	rtc: tegra: Implement clock handling
	mm: Tighten x86 /dev/mem with zeroing reads
	dvb-usb: don't use stack for firmware load
	dvb-usb-firmware: don't do DMA on stack
	virtio-console: avoid DMA from stack
	pegasus: Use heap buffers for all register access
	rtl8150: Use heap buffers for all register access
	catc: Combine failure cleanup code in catc_probe()
	catc: Use heap buffer for memory size test
	ibmveth: calculate gso_segs for large packets
	SUNRPC: fix refcounting problems with auth_gss messages.
	tty/serial: atmel: RS485 half duplex w/DMA: enable RX after TX is done
	net: ipv6: check route protocol when deleting routes
	sctp: deny peeloff operation on asocs with threads sleeping on it
	MIPS: fix Select HAVE_IRQ_EXIT_ON_IRQ_STACK patch.
	Linux 4.4.63

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-04-21 09:47:01 +02:00
Steven Rostedt (VMware)
7fe57118a7 ftrace: Fix removing of second function probe
commit 82cc4fc2e70ec5baeff8f776f2773abc8b2cc0ae upstream.

When two function probes are added to set_ftrace_filter, and then one of
them is removed, the update to the function locations is not performed, and
the record keeping of the function states is corrupted, causing an
ftrace_bug() to occur.

This is easily reproducible by adding two probes, removing one, and then
adding it back again.

 # cd /sys/kernel/debug/tracing
 # echo schedule:traceoff > set_ftrace_filter
 # echo do_IRQ:traceoff > set_ftrace_filter
 # echo \!do_IRQ:traceoff > set_ftrace_filter
 # echo do_IRQ:traceoff > set_ftrace_filter

Causes:
 ------------[ cut here ]------------
 WARNING: CPU: 2 PID: 1098 at kernel/trace/ftrace.c:2369 ftrace_get_addr_curr+0x143/0x220
 Modules linked in: [...]
 CPU: 2 PID: 1098 Comm: bash Not tainted 4.10.0-test+ #405
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
 Call Trace:
  dump_stack+0x68/0x9f
  __warn+0x111/0x130
  ? trace_irq_work_interrupt+0xa0/0xa0
  warn_slowpath_null+0x1d/0x20
  ftrace_get_addr_curr+0x143/0x220
  ? __fentry__+0x10/0x10
  ftrace_replace_code+0xe3/0x4f0
  ? ftrace_int3_handler+0x90/0x90
  ? printk+0x99/0xb5
  ? 0xffffffff81000000
  ftrace_modify_all_code+0x97/0x110
  arch_ftrace_update_code+0x10/0x20
  ftrace_run_update_code+0x1c/0x60
  ftrace_run_modify_code.isra.48.constprop.62+0x8e/0xd0
  register_ftrace_function_probe+0x4b6/0x590
  ? ftrace_startup+0x310/0x310
  ? debug_lockdep_rcu_enabled.part.4+0x1a/0x30
  ? update_stack_state+0x88/0x110
  ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320
  ? preempt_count_sub+0x18/0xd0
  ? mutex_lock_nested+0x104/0x800
  ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320
  ? __unwind_start+0x1c0/0x1c0
  ? _mutex_lock_nest_lock+0x800/0x800
  ftrace_trace_probe_callback.isra.3+0xc0/0x130
  ? func_set_flag+0xe0/0xe0
  ? __lock_acquire+0x642/0x1790
  ? __might_fault+0x1e/0x20
  ? trace_get_user+0x398/0x470
  ? strcmp+0x35/0x60
  ftrace_trace_onoff_callback+0x48/0x70
  ftrace_regex_write.isra.43.part.44+0x251/0x320
  ? match_records+0x420/0x420
  ftrace_filter_write+0x2b/0x30
  __vfs_write+0xd7/0x330
  ? do_loop_readv_writev+0x120/0x120
  ? locks_remove_posix+0x90/0x2f0
  ? do_lock_file_wait+0x160/0x160
  ? __lock_is_held+0x93/0x100
  ? rcu_read_lock_sched_held+0x5c/0xb0
  ? preempt_count_sub+0x18/0xd0
  ? __sb_start_write+0x10a/0x230
  ? vfs_write+0x222/0x240
  vfs_write+0xef/0x240
  SyS_write+0xab/0x130
  ? SyS_read+0x130/0x130
  ? trace_hardirqs_on_caller+0x182/0x280
  ? trace_hardirqs_on_thunk+0x1a/0x1c
  entry_SYSCALL_64_fastpath+0x18/0xad
 RIP: 0033:0x7fe61c157c30
 RSP: 002b:00007ffe87890258 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: ffffffff8114a410 RCX: 00007fe61c157c30
 RDX: 0000000000000010 RSI: 000055814798f5e0 RDI: 0000000000000001
 RBP: ffff8800c9027f98 R08: 00007fe61c422740 R09: 00007fe61ca53700
 R10: 0000000000000073 R11: 0000000000000246 R12: 0000558147a36400
 R13: 00007ffe8788f160 R14: 0000000000000024 R15: 00007ffe8788f15c
  ? trace_hardirqs_off_caller+0xc0/0x110
 ---[ end trace 99fa09b3d9869c2c ]---
 Bad trampoline accounting at: ffffffff81cc3b00 (do_IRQ+0x0/0x150)

Fixes: 59df055f19 ("ftrace: trace different functions with a different tracer")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-21 09:30:06 +02:00
Tejun Heo
3144d81a77 cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups
commit 77f88796cee819b9c4562b0b6b44691b3b7755b1 upstream.

Creation of a kthread goes through a couple of interlocked stages between
the kthread itself and its creator.  Once the new kthread starts
running, it initializes itself and wakes up the creator.  The creator
then can further configure the kthread and then let it start doing its
job by waking it up.

In this configuration-by-creator stage, the creator is the only one
that can wake it up but the kthread is visible to userland.  When
altering the kthread's attributes from userland is allowed, this is
fine; however, for cases where CPU affinity is critical,
kthread_bind() is used to first disable affinity changes from userland
and then set the affinity.  This also prevents the kthread from being
migrated into non-root cgroups as that can affect the CPU affinity and
many other things.

Unfortunately, the cgroup side of protection is racy.  While the
PF_NO_SETAFFINITY flag prevents further migrations, userland can win
the race before the creator sets the flag with kthread_bind() and put
the kthread in a non-root cgroup, which can lead to all sorts of
problems including incorrect CPU affinity and starvation.

This bug got triggered by userland which periodically tries to migrate
all processes in the root cpuset cgroup to a non-root one.  Per-cpu
workqueue workers got caught while being created and ended up with
incorrect CPU affinity, breaking concurrency management and sometimes
stalling workqueue execution.

This patch adds task->no_cgroup_migration which disallows the task to
be migrated by userland.  kthreadd starts with the flag set making
every child kthread start in the root cgroup with migration
disallowed.  The flag is cleared after the kthread finishes
initialization by which time PF_NO_SETAFFINITY is set if the kthread
should stay in the root cgroup.

It'd be better to wait for the initialization instead of failing but I
couldn't think of a way of implementing that without adding either a
new PF flag, or sleeping and retrying from the waiting side.  Even if
userland depends on changing cgroup membership of a kthread, it either
has to be synchronized with kthread_create() or periodically repeat,
so it's unlikely that this would break anything.
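Here is a minimal model of the rule described above. task->no_cgroup_migration and PF_NO_SETAFFINITY are the names used in the commit. The struct, the flag value, the helpers and the -EINVAL return are assumptions made for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Real flag name, arbitrary value for this model. */
#define MODEL_PF_NO_SETAFFINITY 0x1u

/* Hypothetical, simplified task. */
struct task_model {
	unsigned int flags;
	bool no_cgroup_migration;
	int cgroup;               /* 0 = root cgroup */
};

/* Children copy kthreadd's bit, so every new kthread starts in the root
 * cgroup with userland migration disallowed. */
static struct task_model kthread_create_model(const struct task_model *kthreadd)
{
	struct task_model t = {
		.flags = 0,
		.no_cgroup_migration = kthreadd->no_cgroup_migration,
		.cgroup = 0,
	};
	return t;
}

/* Creator side: binding sets PF_NO_SETAFFINITY. */
static void kthread_bind_model(struct task_model *t)
{
	t->flags |= MODEL_PF_NO_SETAFFINITY;
}

/* Cleared once the kthread has finished initialization. */
static void kthread_ready_model(struct task_model *t)
{
	t->no_cgroup_migration = false;
}

/* Userland migration request (e.g. a write to cgroup.procs). */
static int cgroup_migrate_model(struct task_model *t, int dst)
{
	if (t->no_cgroup_migration || (t->flags & MODEL_PF_NO_SETAFFINITY))
		return -EINVAL;
	t->cgroup = dst;
	return 0;
}
```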

v2: Switch to a simpler implementation using a new task_struct bit
    field suggested by Oleg.

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-and-debugged-by: Chris Mason <clm@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-21 09:30:04 +02:00
Greg Kroah-Hartman
e3b87b234b This is the 4.4.61 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAljuA8EACgkQONu9yGCS
 aT5smg//fcD0laNCo+dhbbadB2utsxnDRD0diRusmvJfmRYXysW0amxbdvxRI5+t
 bVhGRRaSr+XIpmUYC3p7QHbJ3/ct1Ikee3aK1yyTNwyd8/EGhl++1F7nnQ7FU5nb
 iGV09kDvddsX9SbZqkPyB1yosXfzQbSu5G5eQX+lqHsXU9gCLdmaq73NQBygSUq8
 EVQivUvLlvRz8zQGKA5hUqz71G8V1mLmc2b1s9r6e5mUuPXBM+UdxbvlLA+iOFRT
 WuPTU8xNlFj55CckaGGwLTXSfIYmPl8UCgSdvOTo/TPbBEE2TIaQGn/0jvuqVns7
 sDs9s9c3rNWVMc0KMZPJ6b7WIuGBgiDjSFGu2hqqNvG+X33s6qCvmnq2ZqLSVxs/
 iXqKr8eC1YP9Sr6okhdMbUcS8jqqD99YDvH94ulvfC3nx9WvMS/2JY7SBbdh4nyN
 Jb4j3BeS4C4TXRtWuPo7ks3PbRj8mvrpKdAJ74zoKZNcjXd8PvtZem2P9UzYM5K9
 9PS4T0Ne5eYHbOehWMC4t95Ijl/mYSKYCygltl2Fer29gEMGCJ4dGt3evfyaFfFZ
 2l43A+WSeYdzQRsuPnFN/oMr/Q4o1U1+ZC5HCe/1Qx/FyfSonw5/hagVWzR6IxyJ
 LsbwmxQrZrZRy3vT4gBnoEe7xdwUgenuIoeGMJfjgpLaQiC0osU=
 =00n+
 -----END PGP SIGNATURE-----

Merge 4.4.61 into android-4.4

Changes in 4.4.61:
	drm/vmwgfx: Type-check lookups of fence objects
	drm/vmwgfx: NULL pointer dereference in vmw_surface_define_ioctl()
	drm/vmwgfx: avoid calling vzalloc with a 0 size in vmw_get_cap_3d_ioctl()
	drm/ttm, drm/vmwgfx: Relax permission checking when opening surfaces
	drm/vmwgfx: Remove getparam error message
	drm/vmwgfx: fix integer overflow in vmw_surface_define_ioctl()
	sysfs: be careful of error returns from ops->show()
	staging: android: ashmem: lseek failed due to no FMODE_LSEEK.
	arm/arm64: KVM: Take mmap_sem in stage2_unmap_vm
	arm/arm64: KVM: Take mmap_sem in kvm_arch_prepare_memory_region
	iio: bmg160: reset chip when probing
	Reset TreeId to zero on SMB2 TREE_CONNECT
	ptrace: fix PTRACE_LISTEN race corrupting task->state
	ring-buffer: Fix return value check in test_ringbuffer()
	metag/usercopy: Drop unused macros
	metag/usercopy: Fix alignment error checking
	metag/usercopy: Add early abort to copy_to_user
	metag/usercopy: Zero rest of buffer from copy_from_user
	metag/usercopy: Set flags before ADDZ
	metag/usercopy: Fix src fixup in from user rapf loops
	metag/usercopy: Add missing fixups
	powerpc/mm: Add missing global TLB invalidate if cxl is active
	powerpc: Don't try to fix up misaligned load-with-reservation instructions
	nios2: reserve boot memory for device tree
	s390/decompressor: fix initrd corruption caused by bss clear
	s390/uaccess: get_user() should zero on failure (again)
	MIPS: Force o32 fp64 support on 32bit MIPS64r6 kernels
	MIPS: ralink: Fix typos in rt3883 pinctrl
	MIPS: End spinlocks with .insn
	MIPS: Lantiq: fix missing xbar kernel panic
	MIPS: Flush wrong invalid FTLB entry for huge page
	mm/mempolicy.c: fix error handling in set_mempolicy and mbind.
	Linux 4.4.61

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-04-12 22:18:35 +02:00
Wei Yongjun
5cc244782d ring-buffer: Fix return value check in test_ringbuffer()
commit 62277de758b155dc04b78f195a1cb5208c37b2df upstream.

In case of error, the function kthread_run() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check
should be replaced with IS_ERR().

Link: http://lkml.kernel.org/r/1466184839-14927-1-git-send-email-weiyj_lk@163.com

Fixes: 6c43e554a ("ring-buffer: Add ring buffer startup selftest")
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-12 12:38:33 +02:00
bsegall@google.com
926e1ed2b8 ptrace: fix PTRACE_LISTEN race corrupting task->state
commit 5402e97af667e35e54177af8f6575518bf251d51 upstream.

In PT_SEIZED + LISTEN mode STOP/CONT signals cause a wakeup against
__TASK_TRACED.  If this races with the ptrace_unfreeze_traced at the end
of a PTRACE_LISTEN, this can wake the task /after/ the check against
__TASK_TRACED, but before the reset of state to TASK_TRACED.  This
causes it to instead clobber TASK_WAKING, allowing a subsequent wakeup
against TRACED while the task is still on the rq wake_list, corrupting
it.

Oleg said:
 "The kernel can crash or this can lead to other hard-to-debug problems.
  In short, "task->state = TASK_TRACED" in ptrace_unfreeze_traced()
  assumes that nobody else can wake it up, but PTRACE_LISTEN breaks the
  contract. Obviously it is very wrong to manipulate task->state if this
  task is already running, or WAKING, or it sleeps again"
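The underlying rule, never blindly overwrite a task state that a concurrent waker may already have changed, can be modelled in userspace. This is only a sketch of the invariant using C11 compare-and-swap; it is an assumption for illustration, not how the kernel fix is written (the kernel relies on its own locking), and the ST_* values and *_model functions are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state values; not the kernel's TASK_* constants. */
enum { ST_STOPPED_TRACED = 1, ST_TRACED = 2, ST_WAKING = 3 };

struct task_model {
	_Atomic int state;
};

/*
 * Racy shape (described only):
 *     if (t->state == ST_STOPPED_TRACED)
 *         t->state = ST_TRACED;   // may clobber a concurrent ST_WAKING
 *
 * Safer shape: change the state only if it still holds the checked value.
 */
static bool unfreeze_model(struct task_model *t)
{
	int expected = ST_STOPPED_TRACED;
	return atomic_compare_exchange_strong(&t->state, &expected, ST_TRACED);
}

/* A concurrent waker follows the same rule. */
static bool wake_model(struct task_model *t)
{
	int expected = ST_STOPPED_TRACED;
	return atomic_compare_exchange_strong(&t->state, &expected, ST_WAKING);
}

/* Demo task that starts out stopped under ptrace. */
static struct task_model demo_task = { ST_STOPPED_TRACED };
```

If the wakeup wins, the later unfreeze sees an unexpected state and leaves ST_WAKING intact instead of clobbering it.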

[akpm@linux-foundation.org: coding-style fixes]
Fixes: 9899d11f ("ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL")
Link: http://lkml.kernel.org/r/xm26y3vfhmkp.fsf_-_@bsegall-linux.mtv.corp.google.com
Signed-off-by: Ben Segall <bsegall@google.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-12 12:38:33 +02:00
Xin Li
e497cb596f Merge 4.4.60 into android-4.4
Changes in 4.4.60:
	libceph: force GFP_NOIO for socket allocations
	xen/setup: Don't relocate p2m over existing one
	scsi: mpt3sas: fix hang on ata passthrough commands
	scsi: sg: check length passed to SG_NEXT_CMD_LEN
	scsi: libsas: fix ata xfer length
	ALSA: seq: Fix race during FIFO resize
	ALSA: hda - fix a problem for lineout on a Dell AIO machine
	ASoC: atmel-classd: fix audio clock rate
	ACPI: Fix incompatibility with mcount-based function graph tracing
	ACPI: Do not create a platform_device for IOAPIC/IOxAPIC
	tty/serial: atmel: fix race condition (TX+DMA)
	tty/serial: atmel: fix TX path in atmel_console_write()
	USB: fix linked-list corruption in rh_call_control()
	KVM: x86: clear bus pointer when destroyed
	drm/radeon: Override fpfn for all VRAM placements in radeon_evict_flags
	mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd()
	MIPS: Lantiq: Fix cascaded IRQ setup
	rtc: s35390a: fix reading out alarm
	rtc: s35390a: make sure all members in the output are set
	rtc: s35390a: implement reset routine as suggested by the reference
	rtc: s35390a: improve irq handling
	KVM: kvm_io_bus_unregister_dev() should never fail
	power: reset: at91-poweroff: timely shutdown LPDDR memories
	blk: improve order of bio handling in generic_make_request()
	blk: Ensure users for current->bio_list can see the full list.
	padata: avoid race in reordering
	Linux 4.4.60

Change-Id: I705c78ccae62ca59f922164085e7ca03ad4ecc6b
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-04-11 14:47:16 -07:00
Lianwei Wang
a2d978c2ad UPSTREAM: PM / sleep: make PM notifiers called symmetrically
(cherry picked from commit ea00f4f4f00cc2bc3b63ad512a4e6df3b20832b9)

This makes the PM notifier PREPARE/POST calls symmetrical: if PREPARE
fails, we only undo whatever happened on PREPARE.

It fixes the unbalanced CPU hotplug enable in the CPU PM notifier.
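The symmetric unwind can be sketched as a notifier chain that counts how many callbacks it reached: if PREPARE fails at callback k, POST is sent only to the k - 1 callbacks before it. This is a minimal model under that assumption; call_chain(), prepare_symmetric() and the demo callbacks are hypothetical names, not the kernel's PM API.

```c
#include <stddef.h>

/* Hypothetical notifier callback: returns 0 on success, negative on error. */
typedef int (*notifier_fn)(int event, void *data);

enum { EV_PREPARE = 1, EV_POST = 2 };

/* Call at most nr_to_call callbacks (-1 means all), stopping at the first
 * error. *nr_calls counts callbacks invoked, including the failing one. */
static int call_chain(notifier_fn *chain, size_t n, int event,
		      int nr_to_call, int *nr_calls)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n && nr_to_call != 0; i++, nr_to_call--) {
		ret = chain[i](event, NULL);
		if (nr_calls)
			(*nr_calls)++;
		if (ret < 0)
			break;
	}
	return ret;
}

/* Symmetric prepare: undo only what PREPARE actually did. */
static int prepare_symmetric(notifier_fn *chain, size_t n)
{
	int nr_calls = 0;
	int ret = call_chain(chain, n, EV_PREPARE, -1, &nr_calls);

	if (ret < 0)
		call_chain(chain, n, EV_POST, nr_calls - 1, NULL);
	return ret;
}

/* Demo chain: callback 1 fails PREPARE; counters record what each saw. */
static int prepared[3], posted[3];

static int record(int idx, int event, int fail_prepare)
{
	if (event == EV_PREPARE) {
		prepared[idx]++;
		return fail_prepare ? -1 : 0;
	}
	posted[idx]++;
	return 0;
}

static int cb0(int ev, void *d) { (void)d; return record(0, ev, 0); }
static int cb1(int ev, void *d) { (void)d; return record(1, ev, 1); }
static int cb2(int ev, void *d) { (void)d; return record(2, ev, 0); }

static notifier_fn demo_chain[] = { cb0, cb1, cb2 };
```

Only cb0 receives POST: cb1 failed its own PREPARE and cb2 never saw one.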

Change-Id: I01dce3cc95c5d6b8913b7b6be301f2909258c745
Signed-off-by: Lianwei Wang <lianwei.wang@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-04-10 17:12:04 -07:00
Jason A. Donenfeld
84bd21a708 padata: avoid race in reordering
commit de5540d088fe97ad583cc7d396586437b32149a5 upstream.

Under extremely heavy use of padata, crashes occur, and with list
debugging turned on, this happens instead:

[87487.298728] WARNING: CPU: 1 PID: 882 at lib/list_debug.c:33 __list_add+0xae/0x130
[87487.301868] list_add corruption. prev->next should be next (ffffb17abfc043d0), but was ffff8dba70872c80. (prev=ffff8dba70872b00).
[87487.339011]  [<ffffffff9a53d075>] dump_stack+0x68/0xa3
[87487.342198]  [<ffffffff99e119a1>] ? console_unlock+0x281/0x6d0
[87487.345364]  [<ffffffff99d6b91f>] __warn+0xff/0x140
[87487.348513]  [<ffffffff99d6b9aa>] warn_slowpath_fmt+0x4a/0x50
[87487.351659]  [<ffffffff9a58b5de>] __list_add+0xae/0x130
[87487.354772]  [<ffffffff9add5094>] ? _raw_spin_lock+0x64/0x70
[87487.357915]  [<ffffffff99eefd66>] padata_reorder+0x1e6/0x420
[87487.361084]  [<ffffffff99ef0055>] padata_do_serial+0xa5/0x120

padata_reorder() calls list_add_tail() with the list to which it is adding
locked, which seems correct:

spin_lock(&squeue->serial.lock);
list_add_tail(&padata->list, &squeue->serial.list);
spin_unlock(&squeue->serial.lock);

This therefore leaves only one place where such an inconsistency could
occur: if padata->list is added at the same time on two different threads.
This padata pointer comes from the function call to
padata_get_next(pd), which has in it the following block:

next_queue = per_cpu_ptr(pd->pqueue, cpu);
padata = NULL;
reorder = &next_queue->reorder;
if (!list_empty(&reorder->list)) {
       padata = list_entry(reorder->list.next,
                           struct padata_priv, list);
       spin_lock(&reorder->lock);
       list_del_init(&padata->list);
       atomic_dec(&pd->reorder_objects);
       spin_unlock(&reorder->lock);

       pd->processed++;

       goto out;
}
out:
return padata;

I strongly suspect that the problem here is that two threads can race
on the reorder list. Even though the deletion is locked, the call to
list_entry is not, which means it is feasible that two threads pick up
the same padata object and subsequently call list_add_tail on it at the
same time. The fix is thus to hoist that lock outside of that block.
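The effect of hoisting the lock can be sketched in userspace: the emptiness test, the read of the head entry and its removal all sit in one critical section, so no two callers can receive the same object. This is a minimal model, assuming a toy C11 spinlock in place of spinlock_t and a singly linked list in place of list_head; all toy_* names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Minimal spinlock standing in for the kernel's spinlock_t (assumption). */
typedef struct { atomic_flag flag; } toy_spinlock;

static void toy_spin_lock(toy_spinlock *l)
{
	while (atomic_flag_test_and_set(&l->flag))
		;	/* spin until the holder clears the flag */
}

static void toy_spin_unlock(toy_spinlock *l)
{
	atomic_flag_clear(&l->flag);
}

struct toy_padata { struct toy_padata *next; int seq; };

struct toy_reorder_queue {
	toy_spinlock lock;
	struct toy_padata *head;
};

/*
 * Hoisted lock: the emptiness test, the read of the head entry and its
 * removal happen in one critical section. The racy shape read the head
 * (list_entry) before taking the lock.
 */
static struct toy_padata *toy_get_next(struct toy_reorder_queue *q)
{
	struct toy_padata *p = NULL;

	toy_spin_lock(&q->lock);
	if (q->head != NULL) {
		p = q->head;
		q->head = p->next;
		p->next = NULL;
	}
	toy_spin_unlock(&q->lock);
	return p;
}

/* Demo queue holding two objects, seq 1 then seq 2. */
static struct toy_padata demo_objs[2] = { { &demo_objs[1], 1 }, { NULL, 2 } };
static struct toy_reorder_queue demo_queue = { { ATOMIC_FLAG_INIT }, &demo_objs[0] };
```

Each object is handed out exactly once, after which the queue reports empty.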

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-08 09:53:32 +02:00
Greg Kroah-Hartman
3a75d7a947 Merge 4.4.59 into android-4.4
Changes in 4.4.59:
	xfrm: policy: init locks early
	xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window
	xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder
	virtio_balloon: init 1st buffer in stats vq
	pinctrl: qcom: Don't clear status bit on irq_unmask
	c6x/ptrace: Remove useless PTRACE_SETREGSET implementation
	h8300/ptrace: Fix incorrect register transfer count
	mips/ptrace: Preserve previous registers for short regset write
	sparc/ptrace: Preserve previous registers for short regset write
	metag/ptrace: Preserve previous registers for short regset write
	metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS
	metag/ptrace: Reject partial NT_METAG_RPIPE writes
	fscrypt: remove broken support for detecting keyring key revocation
	sched/rt: Add a missing rescheduling point
	Linux 4.4.59

Change-Id: Ifa35307b133cbf29d0a0084bb78a7b0436182b53
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-04-06 19:01:38 +00:00
Sebastian Andrzej Siewior
2bed598769 sched/rt: Add a missing rescheduling point
commit 619bd4a71874a8fd78eb6ccf9f272c5e98bcc7b7 upstream.

Since the change in commit:

  fd7a4bed18 ("sched, rt: Convert switched_{from, to}_rt() / prio_changed_rt() to balance callbacks")

... we don't reschedule a task under certain circumstances:

Let's say task-A, SCHED_OTHER, is running on CPU0 (and it may run only on
CPU0) and holds a PI lock. This task is removed from the CPU because it
used up its time slice and another SCHED_OTHER task is running. Task-B on
CPU1 runs at RT priority and asks for the lock owned by task-A. This
results in a priority boost for task-A. Task-B goes to sleep until the
lock has been made available. Task-A is already runnable (but not active),
so it receives no wake up.

The reality now is that task-A gets on the CPU once the scheduler decides
to remove the current task despite the fact that a high priority task is
enqueued and waiting. This may take a long time.

The desired behaviour is that CPU0 immediately reschedules after the
priority boost which made task-A the task with the highest priority
(lowest prio value).
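The desired behaviour amounts to a check after the boost: if the boosted task is queued but not running, and its prio value is now lower (more important) than that of the running task, the CPU should be asked to reschedule. This is a toy model under that reading; the struct and function names are hypothetical and it does not reproduce the RT scheduling class code.

```c
#include <stdbool.h>

/* Toy model: as in the kernel, a lower prio value means more important. */
struct task_m { int prio; bool queued; };
struct rq_m { struct task_m *curr; bool need_resched; };

/* Hypothetical boost helper: after changing p's priority, request a
 * reschedule if p is waiting on this runqueue and now outranks curr. */
static bool boost_prio(struct rq_m *rq, struct task_m *p, int new_prio)
{
	p->prio = new_prio;
	if (p->queued && rq->curr != p && p->prio < rq->curr->prio)
		rq->need_resched = true;
	return rq->need_resched;
}

/* Demo: task-A is runnable but not running; another task holds CPU0. */
static struct task_m task_a = { 120, true };
static struct task_m other_task = { 120, true };
static struct rq_m cpu0 = { &other_task, false };
```

Boosting the running task itself requests nothing; boosting task-A to an RT-range value does.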

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: fd7a4bed18 ("sched, rt: Convert switched_{from, to}_rt() / prio_changed_rt() to balance callbacks")
Link: http://lkml.kernel.org/r/20170124144006.29821-1-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-31 09:49:54 +02:00
Greg Kroah-Hartman
373a68ca93 This is the 4.4.57 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAljXlGkACgkQONu9yGCS
 aT6/mw/9G7QpBoLEwnQbw2NVeboOiM0E9iejUkwsZQzlWspREh43qW0x5Nwk9rxl
 y+OAgiYzF6z2hxV6hHNaswEYdIzOBkSjMq2Xbjmjrbj3H8sv5GWT8yD9Cxmaoerx
 oBJ21Pe7tMK5IQnThOLRef8ZVtCLKPlr789ifCzg7iuRUnzCdV2eyrthzgkfmt4y
 rSHjoSGji1RaC9O7/7DmBvQAosfzr/eSopHz0cbLWLS17OfJ+Xa7+6xb42uzENq6
 3mZUCyT0kg8Abz3e9E2wAmKyODkGnX7fPl97Mop5vwflrZTajWMqeCTi75SMIOgj
 TONSTi5NIASjS9AKB/UTphXrGEmQV/tU+GaUB3eYqsJQygFQQgllL2S+nLaSQ2u4
 LguWDltAfz0mY3/zv5bmf3C7LmpkBxJceaEAMYhsLmJsENsbPO1rRt3plSu9dNGv
 f1g3p4xktE2BZMbsKbMZ78CsCe5gYitx/nEzCqpQsqNasw/C99N/I24nAF7g5OOa
 Kwo9mY+hjamiqPdiII5rYiPnta/358xITLoLzemLbgjtfuLC5NGO3SppUZvW5DXW
 bmn1MwChSqdNRGLeOpdlQ7lrE4DFUtIzA78WHdj7jsJgUpJGFKyZSbhAhXPX3ryV
 Jqcngw/eSRtrkU6P7ZpZzFVUun98eLpIfbKgR/UMROjZIGmCrlA=
 =sriX
 -----END PGP SIGNATURE-----

Merge 4.4.57 to android-4.4

Changes in 4.4.57:
	usb: core: hub: hub_port_init lock controller instead of bus
	USB: don't free bandwidth_mutex too early
	crypto: ghash-clmulni - Fix load failure
	crypto: cryptd - Assign statesize properly
	crypto: mcryptd - Fix load failure
	cxlflash: Increase cmd_per_lun for better throughput
	ACPI / video: skip evaluating _DOD when it does not exist
	pinctrl: cherryview: Do not mask all interrupts in probe
	Drivers: hv: balloon: don't crash when memory is added in non-sorted order
	Drivers: hv: avoid vfree() on crash
	xen/qspinlock: Don't kick CPU if IRQ is not initialized
	KVM: PPC: Book3S PR: Fix illegal opcode emulation
	s390/pci: fix use after free in dma_init
	drm/amdgpu: add missing irq.h include
	tpm_tis: Use devm_free_irq not free_irq
	hv_netvsc: use skb_get_hash() instead of a homegrown implementation
	kernel/fork.c: allocate idle task for a CPU always on its local node
	give up on gcc ilog2() constant optimizations
	perf/core: Fix event inheritance on fork()
	cpufreq: Fix and clean up show_cpuinfo_cur_freq()
	powerpc/boot: Fix zImage TOC alignment
	md/raid1/10: fix potential deadlock
	target/pscsi: Fix TYPE_TAPE + TYPE_MEDIUM_CHANGER export
	scsi: lpfc: Add shutdown method for kexec
	scsi: libiscsi: add lock around task lists to fix list corruption regression
	target: Fix VERIFY_16 handling in sbc_parse_cdb
	isdn/gigaset: fix NULL-deref at probe
	gfs2: Avoid alignment hole in struct lm_lockname
	percpu: acquire pcpu_lock when updating pcpu_nr_empty_pop_pages
	ext4: fix fencepost in s_first_meta_bg validation
	Linux 4.4.57

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-03-29 13:53:50 +02:00
Peter Zijlstra
f02729f2ab perf/core: Fix event inheritance on fork()
commit e7cc4865f0f31698ef2f7aac01a50e78968985b7 upstream.

While hunting for clues to a use-after-free, Oleg spotted that
perf_event_init_context() can lose an error value with the result
that fork() can succeed even though we did not fully inherit the perf
event context.
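The commit message does not show the code, so the following is only an assumed illustration of one common way an error value gets lost: a second loop overwrites the error returned by the first, and the caller sees success. The pinned/flexible split mirrors the group lists named in the Fixes: tag; every function here is hypothetical.

```c
#include <errno.h>

/* Hypothetical stand-in for inheriting one event group; group fail_idx fails. */
static int inherit_group_model(int idx, int fail_idx)
{
	return idx == fail_idx ? -ENOMEM : 0;
}

/* Buggy shape: the error from the pinned loop is overwritten by the
 * flexible loop, so the caller (fork) sees success. */
static int init_context_buggy(int npinned, int nflexible, int fail_idx)
{
	int ret = 0, i;

	for (i = 0; i < npinned; i++) {
		ret = inherit_group_model(i, fail_idx);
		if (ret)
			break;
	}
	for (i = 0; i < nflexible; i++) {
		ret = inherit_group_model(npinned + i, fail_idx);
		if (ret)
			break;
	}
	return ret;
}

/* Fixed shape: stop and report the first error. */
static int init_context_fixed(int npinned, int nflexible, int fail_idx)
{
	int ret, i;

	for (i = 0; i < npinned; i++) {
		ret = inherit_group_model(i, fail_idx);
		if (ret)
			return ret;
	}
	for (i = 0; i < nflexible; i++) {
		ret = inherit_group_model(npinned + i, fail_idx);
		if (ret)
			return ret;
	}
	return 0;
}
```

When the first pinned group fails, the buggy shape returns 0 while the fixed shape returns -ENOMEM.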

Spotted-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: oleg@redhat.com
Fixes: 889ff01506 ("perf/core: Split context's event group list into pinned and non-pinned lists")
Link: http://lkml.kernel.org/r/20170316125823.190342547@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26 12:13:18 +02:00
Andi Kleen
6052eb8712 kernel/fork.c: allocate idle task for a CPU always on its local node
commit 725fc629ff2545b061407305ae51016c9f928fce upstream.

Linux preallocates the task structs of the idle tasks for all possible
CPUs.  This currently means they all end up on node 0.  This also
implies that the cache lines used by MWAIT, which are around the flags
field in the task struct, are all located in node 0.

We see a noticeable performance improvement on Knights Landing CPUs when
the cache lines used for MWAIT are located in the local nodes of the
CPUs using them.  I would expect this to give a (likely slight)
improvement on other systems too.

The patch places the idle task on the node of its CPU by passing the
right target node to copy_process().
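The change can be modelled as passing a node hint down to the allocator: callers with no preference pass NUMA_NO_NODE (-1, the same value the kernel defines), while the idle-task path passes the CPU's own node. The 4-CPU, 2-node topology and every helper name below are hypothetical.

```c
#define NUMA_NO_NODE (-1)	/* same value as the kernel's definition */

/* Hypothetical topology: CPUs 0-1 on node 0, CPUs 2-3 on node 1. */
static const int cpu_node_map[] = { 0, 0, 1, 1 };

static int cpu_to_node_model(int cpu)
{
	return cpu_node_map[cpu];
}

/* Hypothetical allocator: NUMA_NO_NODE means "the node the caller runs
 * on", which is node 0 when everything is set up from the boot CPU. */
static int alloc_node_for(int node_hint, int current_node)
{
	return node_hint == NUMA_NO_NODE ? current_node : node_hint;
}

/* Before: no hint, so every idle task lands on the boot CPU's node. */
static int idle_node_before(int cpu)
{
	(void)cpu;
	return alloc_node_for(NUMA_NO_NODE, 0);
}

/* After: the idle task for a CPU is allocated on that CPU's node. */
static int idle_node_after(int cpu)
{
	return alloc_node_for(cpu_to_node_model(cpu), 0);
}
```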

[akpm@linux-foundation.org: use NUMA_NO_NODE, not a bare -1]
Link: http://lkml.kernel.org/r/1463492694-15833-1-git-send-email-andi@firstfloor.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26 12:13:18 +02:00
Peter Zijlstra
c3b48399e0 futex: Add missing error handling to FUTEX_REQUEUE_PI
am: 99d403faba

Change-Id: I7d99bd76a3d1cd329295b157eb179fc194029c5d
2017-03-22 11:32:47 +00:00
Peter Zijlstra
2b31ed1f92 futex: Fix potential use-after-free in FUTEX_REQUEUE_PI
am: 44854c191e

Change-Id: I361f387f34ccb2a497290c3f1f33803cc899b7da
2017-03-22 11:32:38 +00:00
Peter Zijlstra
99d403faba futex: Add missing error handling to FUTEX_REQUEUE_PI
commit 9bbb25afeb182502ca4f2c4f3f88af0681b34cae upstream.

Thomas spotted that fixup_pi_state_owner() can return errors and we
fail to unlock the rt_mutex in that case.
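The missing error handling amounts to releasing the rt_mutex on the error path when the caller owns it. This is a toy model under that assumption: the owner-id mutex, fixup_owner_model() and finish_requeue_pi() are hypothetical and stand in for rt_mutex and fixup_pi_state_owner().

```c
#include <errno.h>
#include <stdbool.h>

/* Toy mutex: owner is 0 when unlocked, otherwise the owner's id. */
struct toy_rt_mutex { int owner; };

/* Hypothetical stand-in for fixup_pi_state_owner(), which can fail. */
static int fixup_owner_model(bool fail)
{
	return fail ? -EFAULT : 0;
}

/* On error, release the mutex if we own it instead of leaking it. */
static int finish_requeue_pi(struct toy_rt_mutex *m, int self, bool fixup_fails)
{
	int ret = fixup_owner_model(fixup_fails);

	if (ret && m->owner == self)
		m->owner = 0;	/* the unlock that was missing */
	return ret;
}

static struct toy_rt_mutex demo_err_mutex = { 7 };
static struct toy_rt_mutex demo_ok_mutex = { 7 };
```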

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170304093558.867401760@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22 12:04:19 +01:00
Peter Zijlstra
44854c191e futex: Fix potential use-after-free in FUTEX_REQUEUE_PI
commit c236c8e95a3d395b0494e7108f0d41cf36ec107c upstream.

While working on the futex code, I stumbled over this potential
use-after-free scenario. Dmitry triggered it later with syzkaller.

pi_mutex is a pointer into pi_state, on which we drop the reference in
unqueue_me_pi(). So any access through that pointer after that is bad.

Since other sites already do rt_mutex_unlock() with hb->lock held, see
for example futex_lock_pi(), simply move the unlock before
unqueue_me_pi().
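The ordering rule can be shown with a toy refcounted object that embeds its lock state: the embedded lock must be released before the reference is dropped, because the drop may free the object. This is a single-threaded sketch; toy_pi_state and its helpers are hypothetical stand-ins for pi_state, rt_mutex_unlock() and unqueue_me_pi().

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy refcounted object embedding a lock flag, standing in for pi_state
 * and its pi_mutex (single-threaded model). */
struct toy_pi_state {
	int refcount;
	bool mutex_held;
};

static int frees;	/* number of objects freed, for the demo */

static void put_toy_pi_state(struct toy_pi_state *ps)
{
	if (--ps->refcount == 0) {
		free(ps);
		frees++;
	}
}

/* Fixed order: touch the embedded mutex first, then drop the reference.
 * Swapping the two lines would dereference ps after it may be freed. */
static void finish_fixed(struct toy_pi_state *ps)
{
	ps->mutex_held = false;	/* models rt_mutex_unlock(pi_mutex) */
	put_toy_pi_state(ps);	/* models the put done by unqueue_me_pi() */
}

/* Allocate an object holding the last reference, run the fixed order,
 * and report how many objects have been freed. */
static int run_uaf_demo(void)
{
	struct toy_pi_state *ps = malloc(sizeof(*ps));

	if (!ps)
		return -1;
	ps->refcount = 1;
	ps->mutex_held = true;
	finish_fixed(ps);
	return frees;
}
```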

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170304093558.801744246@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22 12:04:19 +01:00
Greg Hackmann
a5e2a1ddbc ANDROID: sched: fix duplicate sched_group_energy const specifiers
EAS uses "const struct sched_group_energy * const" fairly consistently.
But a couple of places swap the "*" and second "const", making the
pointer mutable.

In the case of struct sched_group, "* const" would have been an error,
since init_sched_energy() writes to sd->groups->sge.
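The two qualifier placements can be compared in a small sketch. C treats a repeated qualifier in the same specifier list as if it appeared once, so "const T const *p" is just "const T *p" and p stays assignable; "const T * const p" also fixes the pointer. A struct field that an init routine assigns must stay a plain pointer-to-const. The energy_model and group_model types below are hypothetical stand-ins for sched_group_energy and sched_group.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct sched_group_energy. */
struct energy_model { int nr_cap_states; };

static const struct energy_model little_em = { 3 };
static const struct energy_model big_em = { 5 };

struct group_model {
	/* Pointer to const: the pointee is read-only, but the field itself
	 * can be assigned, which an init routine needs. */
	const struct energy_model *sge;
};

/* Init writes the field, so "* const" here would not compile. */
static struct group_model *init_group(struct group_model *g, int is_big)
{
	g->sge = is_big ? &big_em : &little_em;
	return g;
}

/* Const pointer to const: neither the pointee nor the local pointer can
 * change after initialisation. */
static int nr_states(const struct group_model *g)
{
	const struct energy_model * const sge = g->sge;

	return sge->nr_cap_states;
}

static struct group_model demo_group;
```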

Change-Id: Ic6a8fcf99e65c0f25d9cc55c32625ef3ca5c9aca
Signed-off-by: Greg Hackmann <ghackmann@google.com>
2017-03-17 16:26:10 +00:00