Commit graph

22317 commits

Author SHA1 Message Date
Greg Hackmann
9c839d00dc alarmtimer: don't rate limit one-shot timers
Commit ff86bf0c65f1 ("alarmtimer: Rate limit periodic intervals") sets a
minimum bound on the alarm timer interval.  This minimum bound shouldn't
be applied if the interval is 0.  Otherwise, one-shot timers will be
converted into periodic ones.

Fixes: ff86bf0c65f1 ("alarmtimer: Rate limit periodic intervals")
Reported-by: Ben Fennema <fennema@google.com>
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Cc: stable@vger.kernel.org
Cc: John Stultz <john.stultz@linaro.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27 15:06:10 -07:00
Chunyu Hu
bb8109a9ca tracing: Fix kmemleak in instance_rmdir
commit db9108e054700c96322b0f0028546aa4e643cf0b upstream.

Hit the kmemleak when executing instance_rmdir, it forgot releasing
mem of tracing_cpumask. With this fix, the warn does not appear any
more.

unreferenced object 0xffff93a8dfaa7c18 (size 8):
  comm "mkdir", pid 1436, jiffies 4294763622 (age 9134.308s)
  hex dump (first 8 bytes):
    ff ff ff ff ff ff ff ff                          ........
  backtrace:
    [<ffffffff88b6567a>] kmemleak_alloc+0x4a/0xa0
    [<ffffffff8861ea41>] __kmalloc_node+0xf1/0x280
    [<ffffffff88b505d3>] alloc_cpumask_var_node+0x23/0x30
    [<ffffffff88b5060e>] alloc_cpumask_var+0xe/0x10
    [<ffffffff88571ab0>] instance_mkdir+0x90/0x240
    [<ffffffff886e5100>] tracefs_syscall_mkdir+0x40/0x70
    [<ffffffff886565c9>] vfs_mkdir+0x109/0x1b0
    [<ffffffff8865b1d0>] SyS_mkdir+0xd0/0x100
    [<ffffffff88403857>] do_syscall_64+0x67/0x150
    [<ffffffff88b710e7>] return_from_SYSCALL_64+0x0/0x6a
    [<ffffffffffffffff>] 0xffffffffffffffff

Link: http://lkml.kernel.org/r/1500546969-12594-1-git-send-email-chuhu@redhat.com

Fixes: ccfe9e42e4 ("tracing: Make tracing_cpumask available for all instances")
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27 15:06:10 -07:00
Ingo Molnar
45c59e792c Revert "perf/core: Drop kernel samples even though :u is specified"
commit 6a8a75f3235724c5941a33e287b2f98966ad14c5 upstream.

This reverts commit cc1582c231ea041fbc68861dfaf957eaf902b829.

This commit introduced a regression that broke rr-project, which uses sampling
events to receive a signal on overflow (but does not care about the contents
of the sample). These signals are critical to the correct operation of rr.

There's been some back and forth about how to fix it - but to not keep
applications in limbo queue up a revert.

Reported-by: Kyle Huey <me@kylehuey.com>
Acked-by: Kyle Huey <me@kylehuey.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170628105600.GC5981@leverpostej
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27 15:06:09 -07:00
Dan Carpenter
75202d3ffc ftrace: Fix uninitialized variable in match_records()
commit 2e028c4fe12907f226b8221815f16c2486ad3aa7 upstream.

My static checker complains that if "func" is NULL then "clear_filter"
is uninitialized.  This seems like it could be true, although it's
possible something subtle is happening that I haven't seen.

    kernel/trace/ftrace.c:3844 match_records()
    error: uninitialized symbol 'clear_filter'.

Link: http://lkml.kernel.org/r/20170712073556.h6tkpjcdzjaozozs@mwanda

Fixes: f0a3b154bd ("ftrace: Clarify code for mod command")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27 15:06:07 -07:00
Chris Redpath
2e13f308a9 sched/fair: Add a backup_cpu to find_best_target
Sometimes we find a target cpu but then we do not use it as
the energy_diff indicates that we would increase energy usage
or not save anything. To offer an additional option for those
cases, we return a second option which is what we would have
selected if the target CPU had not been found. This gives us
another chance to try to save some energy.

Change-Id: I42c4f20aba10e4cf65b51ac4153e2e00e534c8c7
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2017-07-25 16:31:00 +01:00
Chris Redpath
6cb8fcccb2 sched/fair: Try to estimate possible idle states.
In the current EAS group energy calculations, we only use
the idle state of the group as it is right now. This means
that there are times when EAS cannot see that we are about
to remove all utilization from a group which is likely to
result in us being able to idle that entire group.

This is an attempt to detect that situation and at least
allow the energy calculation to include savings in that
scenario, regardless of what we might be able to actually
achieve in the real world. If a cluster or cpu looks like
it will have some idle time available to it, we try to
map the utilization onto an idle state.

Change-Id: I8fcb1e507f65ae6a2c5647eeef75a4bf28c7a0c0
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-07-25 16:31:00 +01:00
Brendan Jackman
d96e404728 sched/fair: Sync task util before EAS wakeup
Before using a task's util_avg signal in EAS, we need to ensure that
it has been synced up to the last_update_time of prev_cpu's root
cfs_rq.

We previously relied on the side effect of wake_cap to do that,
however that does not happen when the waking CPU has the same
capacity as the prev_cpu. Therefore just explicitly call
sync_entity_load_avg. This may result in calling that function twice
within the same select_task_rq_fair, but since last_update_time
hasn't changed the second call will bail out very quickly.

Change-Id: I91f1fcd71dfeb96b7f5b73418f1cf9ac311d4655
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2017-07-25 16:31:00 +01:00
Brendan Jackman
e76348ec5f Revert "sched/fair: ensure utilization signals are synchronized before use"
This reverts commit 83f462daa3.

Change-Id: I37ba36da61df2beb3a005557d9b673027f446916
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
2017-07-25 16:31:00 +01:00
Leo Yan
ebc28671a5 sched/fair: kick nohz idle balance for misfit task
If there have misfit task on one CPU, current code does not handle this
situation for nohz idle balance. As result, we can see the misfit task
stays run on little core for long time.

So this patch check if the CPU has misfit task or not. If has misfit
task then kick nohz idle balance so finally can execute active balance.

Change-Id: I117d3b7404296f8de11cb960a87a6b9a54a9f348
Signed-off-by: Leo Yan <leo.yan at linaro.org>
[taken from https://lists.linaro.org/pipermail/eas-dev/2016-September/000551.html]
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2017-07-25 16:31:00 +01:00
Chris Redpath
7b63e1ff52 sched/fair: Update signals of nohz cpus if we are going idle
Stale cpu utilization signals can cause havoc for energy-aware systems,
and they are caused by no updates being performed for cpus which have
no tick running. There is open debate about when is the correct time to
update these cpus, and general recognition that something needs to be
done.

This is an attempt to do something useful.

When we are looking for a task to pull for a newly-idle cpu, we have
an opportunity to update the stats for any cpu which has no tick running
without causing too much disturbance to the system or waking it up.

Change-Id: I0280104ea9c53e56c26f1c56a62bacab5d3e951b
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
2017-07-25 16:31:00 +01:00
Patrick Bellasi
bf6cd4d156 events: add tracepoint for find_best_target
Change-Id: I4c245ffacb207d7ea826c5763a426efe5399e0a2
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2017-07-25 16:31:00 +01:00
Patrick Bellasi
5680f23f20 sched/fair: streamline find_best_target heuristics
The find_best_target() code has evolved over time to integrate different
micro-optimizations to the point to be quite difficult now to follow
exactly what it's doing.

This patch rafactors the existing code to make it more readable and easy
to maintain. It does that by properly identifying the three main
use-cases and addressing them in priority order:
 A) latency sensitive tasks
 B) non latency sensitive tasks on IDLE CPUs
 C) non latency sensitive tasks on ACTIVE CPUs

The original behaviors are preserved. Some tests to compare
power/performances before and after this patch have been done using
Jankbench and YouTube and we did not noticed sensible differences.

The only difference with respect of the original code is a small update
to favor lower-capacity idle CPUs in case B. The same preference is not
enforce in case A since this can lead to a selection of a non-reserved
CPU for TOP_APP tasks, which ultimately can lead to non desirable
co-scheduling side-effects.

Change-Id: I871e5d95af89176217e4e239b64d44a420baabe8
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
(removed checkpatch whitespace error)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-07-25 16:31:00 +01:00
Daniel Mentz
362e08d257 Revert "proc: smaps: Allow smaps access for CAP_SYS_RESOURCE"
This reverts commit 9d19f72b43.

This fixes CVE-2017-0710.

SELinux allows more fine grained control: We grant processes that need
access to smaps CAP_SYS_PTRACE but prohibit them from using ptrace
attach().

Bug: 34951864
Bug: 36468447
Change-Id: I8ea67f8771ec212950bc251ee750bd8a7e7c0643
Signed-off-by: Daniel Mentz <danielmentz@google.com>
2017-07-21 11:09:08 -07:00
Greg Kroah-Hartman
59ff2e15be This is the 4.4.78 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllxlOsACgkQONu9yGCS
 aT4naw//VrWIuIf523IP6egcS+iprNM1lt8HZIzTQ0b2x0qv82UFA9FSs3e97dbG
 LJAUEmHW+oBbm6AekAUrNFF62qaMM5HYxVgGYeiniA1dRvLCDG/OxzxPc0hv6Kmm
 VsbZBlPQEj2B5tqUOkpvQNcqbKpIVglBsK4tjeHE/mRziUkIoPXWKS6Tt7VKGtBa
 gIGY7VASyKhPtxGCdbKzHsO/IGi3cmpwAyhqQDRBR/5Dxy3p9xlQ4gTpobeL+x8z
 A7WHqVNbgfdz21LBii5xE8+GUwiRjlYhxeFJTjhM6wo2/XCtw1FJgc7EMmsbVtbZ
 xYu1/tkYaZGoxOQ7sH5jNgVjE8IcyVimDa5eJ7p7fbh3AzsyXzAJIRc8cS7cL40F
 jLDWrYDnm5t7ziITrAf0uoMmRZLHqS2Bv5sqaoCxR3D51r6LVaNdavD6hYN1CLRA
 fjlDvnvwxbLPI4YPr7PZNu4oSJiawKx9jEBOTFSYu9L1XLvdvdcVU6ULAdpShnLn
 80a+YJmYsNg54im7sxFUw6z87AScznzirIpXEJoUO+Hs0SN9Rq/BQx8cKvVeaMEI
 z53c4Hci45Go/ozZVqCTxGbkQ6tKWIKBSo6kl3xchwms4lmijpWuwbAkDUzECNvv
 0RgyQvNx4d2cRJ8hIVcdVFzp+h+kgNqVQQ2nDld7/1QSZHLPhFo=
 =vrgw
 -----END PGP SIGNATURE-----

Merge 4.4.78 into android-4.4

Changes in 4.4.78
	net_sched: fix error recovery at qdisc creation
	net: sched: Fix one possible panic when no destroy callback
	net/phy: micrel: configure intterupts after autoneg workaround
	ipv6: avoid unregistering inet6_dev for loopback
	net: dp83640: Avoid NULL pointer dereference.
	tcp: reset sk_rx_dst in tcp_disconnect()
	net: prevent sign extension in dev_get_stats()
	bpf: prevent leaking pointer via xadd on unpriviledged
	net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish()
	ipv6: dad: don't remove dynamic addresses if link is down
	net: ipv6: Compare lwstate in detecting duplicate nexthops
	vrf: fix bug_on triggered by rx when destroying a vrf
	rds: tcp: use sock_create_lite() to create the accept socket
	brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()
	cfg80211: Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE
	cfg80211: Validate frequencies nested in NL80211_ATTR_SCAN_FREQUENCIES
	cfg80211: Check if PMKID attribute is of expected size
	irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity
	parisc: Report SIGSEGV instead of SIGBUS when running out of stack
	parisc: use compat_sys_keyctl()
	parisc: DMA API: return error instead of BUG_ON for dma ops on non dma devs
	parisc/mm: Ensure IRQs are off in switch_mm()
	tools/lib/lockdep: Reduce MAX_LOCK_DEPTH to avoid overflowing lock_chain/: Depth
	kernel/extable.c: mark core_kernel_text notrace
	mm/list_lru.c: fix list_lru_count_node() to be race free
	fs/dcache.c: fix spin lockup issue on nlru->lock
	checkpatch: silence perl 5.26.0 unescaped left brace warnings
	binfmt_elf: use ELF_ET_DYN_BASE only for PIE
	arm: move ELF_ET_DYN_BASE to 4MB
	arm64: move ELF_ET_DYN_BASE to 4GB / 4MB
	powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB
	s390: reduce ELF_ET_DYN_BASE
	exec: Limit arg stack to at most 75% of _STK_LIM
	vt: fix unchecked __put_user() in tioclinux ioctls
	mnt: In umount propagation reparent in a separate pass
	mnt: In propgate_umount handle visiting mounts in any order
	mnt: Make propagate_umount less slow for overlapping mount propagation trees
	selftests/capabilities: Fix the test_execve test
	tpm: Get rid of chip->pdev
	tpm: Provide strong locking for device removal
	Add "shutdown" to "struct class".
	tpm: Issue a TPM2_Shutdown for TPM2 devices.
	mm: fix overflow check in expand_upwards()
	crypto: talitos - Extend max key length for SHA384/512-HMAC and AEAD
	crypto: atmel - only treat EBUSY as transient if backlog
	crypto: sha1-ssse3 - Disable avx2
	crypto: caam - fix signals handling
	sched/topology: Fix overlapping sched_group_mask
	sched/topology: Optimize build_group_mask()
	PM / wakeirq: Convert to SRCU
	PM / QoS: return -EINVAL for bogus strings
	tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results
	KVM: x86: disable MPX if host did not enable MPX XSAVE features
	kvm: vmx: Do not disable intercepts for BNDCFGS
	kvm: x86: Guest BNDCFGS requires guest MPX support
	kvm: vmx: Check value written to IA32_BNDCFGS
	kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
	Linux 4.4.78

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-07-21 09:14:57 +02:00
Pavankumar Kondeti
999b96b4de tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results
commit c59f29cb144a6a0dfac16ede9dc8eafc02dc56ca upstream.

The 's' flag is supposed to indicate that a softirq is running. This
can be detected by testing the preempt_count with SOFTIRQ_OFFSET.

The current code tests the preempt_count with SOFTIRQ_MASK, which
would be true even when softirqs are disabled but not serving a
softirq.

Link: http://lkml.kernel.org/r/1481300417-3564-1-git-send-email-pkondeti@codeaurora.org

Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21 07:44:59 +02:00
Lauro Ramos Venancio
988067ec96 sched/topology: Optimize build_group_mask()
commit f32d782e31bf079f600dcec126ed117b0577e85c upstream.

The group mask is always used in intersection with the group CPUs. So,
when building the group mask, we don't have to care about CPUs that are
not part of the group.

Signed-off-by: Lauro Ramos Venancio <lvenanci@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: lwang@redhat.com
Cc: riel@redhat.com
Link: http://lkml.kernel.org/r/1492717903-5195-2-git-send-email-lvenanci@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21 07:44:59 +02:00
Peter Zijlstra
5c34f49776 sched/topology: Fix overlapping sched_group_mask
commit 73bb059f9b8a00c5e1bf2f7ca83138c05d05e600 upstream.

The point of sched_group_mask is to select those CPUs from
sched_group_cpus that can actually arrive at this balance domain.

The current code gets it wrong, as can be readily demonstrated with a
topology like:

  node   0   1   2   3
    0:  10  20  30  20
    1:  20  10  20  30
    2:  30  20  10  20
    3:  20  30  20  10

Where (for example) domain 1 on CPU1 ends up with a mask that includes
CPU0:

  [] CPU1 attaching sched-domain:
  []  domain 0: span 0-2 level NUMA
  []   groups: 1 (mask: 1), 2, 0
  []   domain 1: span 0-3 level NUMA
  []    groups: 0-2 (mask: 0-2) (cpu_capacity: 3072), 0,2-3 (cpu_capacity: 3072)

This causes sched_balance_cpu() to compute the wrong CPU and
consequently should_we_balance() will terminate early resulting in
missed load-balance opportunities.

The fixed topology looks like:

  [] CPU1 attaching sched-domain:
  []  domain 0: span 0-2 level NUMA
  []   groups: 1 (mask: 1), 2, 0
  []   domain 1: span 0-3 level NUMA
  []    groups: 0-2 (mask: 1) (cpu_capacity: 3072), 0,2-3 (cpu_capacity: 3072)

(note: this relies on OVERLAP domains to always have children, this is
 true because the regular topology domains are still here -- this is
 before degenerate trimming)

Debugged-by: Lauro Ramos Venancio <lvenanci@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: e3589f6c81 ("sched: Allow for overlapping sched_domain spans")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21 07:44:59 +02:00
Marcin Nowakowski
717ce69e47 kernel/extable.c: mark core_kernel_text notrace
commit c0d80ddab89916273cb97114889d3f337bc370ae upstream.

core_kernel_text is used by MIPS in its function graph trace processing,
so having this method traced leads to an infinite set of recursive calls
such as:

  Call Trace:
     ftrace_return_to_handler+0x50/0x128
     core_kernel_text+0x10/0x1b8
     prepare_ftrace_return+0x6c/0x114
     ftrace_graph_caller+0x20/0x44
     return_to_handler+0x10/0x30
     return_to_handler+0x0/0x30
     return_to_handler+0x0/0x30
     ftrace_ops_no_ops+0x114/0x1bc
     core_kernel_text+0x10/0x1b8
     core_kernel_text+0x10/0x1b8
     core_kernel_text+0x10/0x1b8
     ftrace_ops_no_ops+0x114/0x1bc
     core_kernel_text+0x10/0x1b8
     prepare_ftrace_return+0x6c/0x114
     ftrace_graph_caller+0x20/0x44
     (...)

Mark the function notrace to avoid it being traced.

Link: http://lkml.kernel.org/r/1498028607-6765-1-git-send-email-marcin.nowakowski@imgtec.com
Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Meyer <thomas@m3y3r.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21 07:44:56 +02:00
Daniel Borkmann
1a4f13e0a9 bpf: prevent leaking pointer via xadd on unpriviledged
commit 6bdf6abc56b53103324dfd270a86580306e1a232 upstream.

Leaking kernel addresses on unpriviledged is generally disallowed,
for example, verifier rejects the following:

  0: (b7) r0 = 0
  1: (18) r2 = 0xffff897e82304400
  3: (7b) *(u64 *)(r1 +48) = r2
  R2 leaks addr into ctx

Doing pointer arithmetic on them is also forbidden, so that they
don't turn into unknown value and then get leaked out. However,
there's xadd as a special case, where we don't check the src reg
for being a pointer register, e.g. the following will pass:

  0: (b7) r0 = 0
  1: (7b) *(u64 *)(r1 +48) = r0
  2: (18) r2 = 0xffff897e82304400 ; map
  4: (db) lock *(u64 *)(r1 +48) += r2
  5: (95) exit

We could store the pointer into skb->cb, loose the type context,
and then read it out from there again to leak it eventually out
of a map value. Or more easily in a different variant, too:

   0: (bf) r6 = r1
   1: (7a) *(u64 *)(r10 -8) = 0
   2: (bf) r2 = r10
   3: (07) r2 += -8
   4: (18) r1 = 0x0
   6: (85) call bpf_map_lookup_elem#1
   7: (15) if r0 == 0x0 goto pc+3
   R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R6=ctx R10=fp
   8: (b7) r3 = 0
   9: (7b) *(u64 *)(r0 +0) = r3
  10: (db) lock *(u64 *)(r0 +0) += r6
  11: (b7) r0 = 0
  12: (95) exit

  from 7 to 11: R0=inv,min_value=0,max_value=0 R6=ctx R10=fp
  11: (b7) r0 = 0
  12: (95) exit

Prevent this by checking xadd src reg for pointer types. Also
add a couple of test cases related to this.

Fixes: 1be7f75d16 ("bpf: enable non-root eBPF programs")
Fixes: 17a5267067 ("bpf: verifier (add verifier core)")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21 07:44:55 +02:00
Chris Redpath
2ee9941b0b UPSTREAM: cpufreq: schedutil: Trace frequency only if it has changed
sugov_update_commit() calls trace_cpu_frequency() to record the
current CPU frequency if it has not changed in the fast switch case
to prevent utilities from getting confused (they may report that the
CPU is idle if the frequency has not been recorded for too long, for
example).

However, that may cause the tracepoint to be triggered quite often
for no real reason (if the frequency doesn't change, we will not
modify the last update time stamp and governor computations may
run again shortly when that happens), so don't do that (arguably, it
is done to work around a utilities bug anyway).

That allows code duplication in sugov_update_commit() to be reduced
somewhat too.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
(cherry picked from commit 38d4ea229d25d30be6bf41bcd6cd663a587866ca)
(conflicts with sugov_up_down_rate_limit resolved)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: Ia019dda29b8c1c4cf3553da75c88d066eb5674e9
2017-07-18 18:18:53 +00:00
Chris Redpath
537d19226a UPSTREAM: cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
The way the schedutil governor uses the PELT metric causes it to
underestimate the CPU utilization in some cases.

That can be easily demonstrated by running kernel compilation on
a Sandy Bridge Intel processor, running turbostat in parallel with
it and looking at the values written to the MSR_IA32_PERF_CTL
register.  Namely, the expected result would be that when all CPUs
were 100% busy, all of them would be requested to run in the maximum
P-state, but observation shows that this clearly isn't the case.
The CPUs run in the maximum P-state for a while and then are
requested to run slower and go back to the maximum P-state after
a while again.  That causes the actual frequency of the processor to
visibly oscillate below the sustainable maximum in a jittery fashion
which clearly is not desirable.

That has been attributed to CPU utilization metric updates on task
migration that cause the total utilization value for the CPU to be
reduced by the utilization of the migrated task.  If that happens,
the schedutil governor may see a CPU utilization reduction and will
attempt to reduce the CPU frequency accordingly right away.  That
may be premature, though, for example if the system is generally
busy and there are other runnable tasks waiting to be run on that
CPU already.

This is unlikely to be an issue on systems where cpufreq policies are
shared between multiple CPUs, because in those cases the policy
utilization is computed as the maximum of the CPU utilization values
over the whole policy and if that turns out to be low, reducing the
frequency for the policy most likely is a good idea anyway.  On
systems with one CPU per policy, however, it may affect performance
adversely and even lead to increased energy consumption in some cases.

On those systems it may be addressed by taking another utilization
metric into consideration, like whether or not the CPU whose
frequency is about to be reduced has been idle recently, because if
that's not the case, the CPU is likely to be busy in the near future
and its frequency should not be reduced.

To that end, use the counter of idle calls in the timekeeping code.
Namely, make the schedutil governor look at that counter for the
current CPU every time before its frequency is about to be reduced.
If the counter has not changed since the previous iteration of the
governor computations for that CPU, the CPU has been busy for all
that time and its frequency should not be decreased, so if the new
frequency would be lower than the one set previously, the governor
will skip the frequency update.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Joel Fernandes <joelaf@google.com>
(cherry picked from commit b7eaf1aab9f8bd2e49fceed77ebc66c1b5800718)
(simple CPUFREQ_RT_DL vs CPUFREQ_DL usage conflicts)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: I531ec02c052944ee07a904dc2a25c59948ee762b
2017-07-18 18:18:46 +00:00
Chris Redpath
a8a200d83b UPSTREAM: cpufreq: schedutil: Refactor sugov_next_freq_shared()
The loop in sugov_next_freq_shared() contains an if block to skip the
loop for the current CPU. This turns out to be an unnecessary
conditional in the scheduler's hot-path for every CPU in the policy.

It would be better to drop the conditional and make the loop treat all
the CPUs in the same way. That would eliminate the need of calling
sugov_iowait_boost() at the top of the routine.

To keep the code optimized to return early if the current CPU has RT/DL
flags set, move the flags check to sugov_update_shared() instead in
order to avoid the function call entirely.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit cba1dfb57b94c234728b689d9b00d4267fa1a879)
(modified for SCHED_CPUFREQ_DL vs SCHED_CPUFREQ_RT)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: Ie046fdc8eda46821356750edd0fb6f7d077af363
2017-07-18 18:18:40 +00:00
Chris Redpath
7378c38a80 UPSTREAM: cpufreq: schedutil: Fix per-CPU structure initialization in sugov_start()
sugov_start() only initializes struct sugov_cpu per-CPU structures
for shared policies, but it should do that for single-CPU policies too.

That in particular makes the IO-wait boost mechanism work in the
cases when cpufreq policies correspond to individual CPUs.

Fixes: 21ca6d2c52f8 (cpufreq: schedutil: Add iowait boosting)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
(cherry picked from commit 4296f23ed49a15d36949458adcc66ff993dee2a8)
(we use SCHED_CPUFREQ_DL instead of SCHED_CPUFREQ_RT in cpu->flags)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: I5b837a0ee4432115d85caa1a9808ea61e1e1b07f
2017-07-18 18:18:33 +00:00
Viresh Kumar
cbaccedead UPSTREAM: cpufreq: schedutil: Pass sg_policy to get_next_freq()
get_next_freq() uses sg_cpu only to get sg_policy, which the callers of
get_next_freq() already have. Pass sg_policy instead of sg_cpu to
get_next_freq(), to make it more efficient.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 655cb1ebff4b7918fc560502c3297af2d3c7d114)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: Ia210058da32930a6cdb18258aa679cd1a44a747e
2017-07-18 18:18:27 +00:00
Chris Redpath
0646dd3592 UPSTREAM: cpufreq: schedutil: move cached_raw_freq to struct sugov_policy
cached_raw_freq applies to the entire cpufreq policy and not individual
CPUs. Apart from wasting per-cpu memory, it is actually wrong to keep it
in struct sugov_cpu as we may end up comparing next_freq with a stale
cached_raw_freq of a random CPU.

Move cached_raw_freq to struct sugov_policy.

Fixes: 5cbea46984d6 (cpufreq: schedutil: map raw required frequency to driver frequency)
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry-picked from 6c4f0fa643cb9e775dcc976e3db00d649468ff1d)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: Ie91420f710819b383947f9031da9be1f3bb7f636
2017-07-18 18:18:20 +00:00
Viresh Kumar
69fc75780d UPSTREAM: cpufreq: schedutil: Rectify comment in sugov_irq_work() function
This patch rectifies a comment present in sugov_irq_work() function to
follow proper grammar.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit d06e622d3d9206e6a2cc45a0f9a3256da8773ff4)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: Iaf996445d411725639d511432cc424086892a146
2017-07-18 18:18:13 +00:00
Chris Redpath
d9e7d036e7 UPSTREAM: cpufreq: schedutil: irq-work and mutex are only used in slow path
Execute the irq-work specific initialization/exit code only when the
fast path isn't available.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 21ef57297b15a49b0c4dd4e7135c1a08e9a29a1c)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: Icfd68f455ef71846d799fcd2d8ec6aa1bf59573e
2017-07-18 18:18:03 +00:00
Chris Redpath
ceed1eb2b4 UPSTREAM: cpufreq: schedutil: enable fast switch earlier
The fast_switch_enabled flag will be used by both sugov_policy_alloc()
and sugov_policy_free() with a later patch.

Prepare for that by moving the calls to enable and disable it to the
beginning of sugov_init() and end of sugov_exit().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 4a71ce4348bb61740d411822357061f8bf870f4c)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: Ia174f423ca02d59360657ac2e77a5098ce5cf99c
2017-07-18 18:13:58 +00:00
Chris Redpath
bab9c2fbe4 UPSTREAM: cpufreq: schedutil: Avoid indented labels
Switch to the more common practice of writing labels.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 8e2ddb03643eb9d0bc4926946d7ce0d308eef0a5)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: Ida75c99cf3dff5cae24d3866454c83bcdb3385b9
2017-07-18 18:09:20 +00:00
Greg Kroah-Hartman
cc3d2b7361 This is the 4.4.77 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllp5y8ACgkQONu9yGCS
 aT6g8hAApzYi9TwiaF6wyYXsrp7YvOm4NyMaVBl4t7v/nFql6VsUL+qWaJKB5EL9
 o72ybYPUzbxGTVWCm/wiBO31VWea0ak0pBbyywBiowGgwAcgG/jpqZobale4Y2TE
 15jEpmpA5+3BmXpMkrv/dz4LHZ4jm65/ADhMbkPGRZqUJ3mHmyVoi50l67dpTE5+
 xWQIErycwlVMppJGnXPeFFgeD7Etch7OJ9CishQRNMb3F8H58WiQrMWWe1NfL0DO
 H2g18IBHMsxEYJqnRqxviTOMe8S96Km+lKGX0LOTRYt+2OQpfIF7buU6N+6C96rO
 7V2n2G02m2mOFVUFlDYF1RQ9IBrxHJf9iGkaZBwsaxX7XAK63ZjRxgjnEL7gMPU/
 TMCOWZ53BdZezz2eAmdhySsV+4Xt6MmJJE8rR47AgsM2Le3tgK421zmraunmA0fR
 eoJS99YHcftAHXCD3puGLafEwGVe0G4eQbY4L7mj1Y9VjaAbmmsWq9rlNOQMZRgH
 JTNyYik1C7yGPJX1iKi9hLAKldzBwPuM3GfZMZQIOjA4t2VtSon7in5iKrihRg3N
 BSKXr6+orNw32tsqcC4kpLPbFUFb6zx3EKELwSJwD9ICN7swJEk7gFw7w/F/SOxI
 C1W4Ulm6EcYTWHDePERQ4zHlllHAmyJup61d9HnwA6HhPOLaff4=
 =oeNk
 -----END PGP SIGNATURE-----

Merge 4.4.77 into android-4.4

Changes in 4.4.77
	fs: add a VALID_OPEN_FLAGS
	fs: completely ignore unknown open flags
	driver core: platform: fix race condition with driver_override
	bgmac: reset & enable Ethernet core before using it
	mm: fix classzone_idx underflow in shrink_zones()
	tracing/kprobes: Allow to create probe with a module name starting with a digit
	drm/virtio: don't leak bo on drm_gem_object_init failure
	usb: dwc3: replace %p with %pK
	USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick
	Add USB quirk for HVR-950q to avoid intermittent device resets
	usb: usbip: set buffer pointers to NULL after free
	usb: Fix typo in the definition of Endpoint[out]Request
	mac80211_hwsim: Replace bogus hrtimer clockid
	sysctl: don't print negative flag for proc_douintvec
	sysctl: report EINVAL if value is larger than UINT_MAX for proc_douintvec
	pinctrl: sh-pfc: r8a7791: Fix SCIF2 pinmux data
	pinctrl: meson: meson8b: fix the NAND DQS pins
	pinctrl: sunxi: Fix SPDIF function name for A83T
	pinctrl: mxs: atomically switch mux and drive strength config
	pinctrl: sh-pfc: Update info pointer after SoC-specific init
	USB: serial: option: add two Longcheer device ids
	USB: serial: qcserial: new Sierra Wireless EM7305 device ID
	gfs2: Fix glock rhashtable rcu bug
	x86/tools: Fix gcc-7 warning in relocs.c
	x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings
	ath10k: override CE5 config for QCA9377
	KEYS: Fix an error code in request_master_key()
	RDMA/uverbs: Check port number supplied by user verbs cmds
	mqueue: fix a use-after-free in sys_mq_notify()
	tools include: Add a __fallthrough statement
	tools string: Use __fallthrough in perf_atoll()
	tools strfilter: Use __fallthrough
	perf top: Use __fallthrough
	perf intel-pt: Use __fallthrough
	perf thread_map: Correctly size buffer used with dirent->dt_name
	perf scripting perl: Fix compile error with some perl5 versions
	perf tests: Avoid possible truncation with dirent->d_name + snprintf
	perf bench numa: Avoid possible truncation when using snprintf()
	perf tools: Use readdir() instead of deprecated readdir_r()
	perf thread_map: Use readdir() instead of deprecated readdir_r()
	perf script: Use readdir() instead of deprecated readdir_r()
	perf tools: Remove duplicate const qualifier
	perf annotate browser: Fix behaviour of Shift-Tab with nothing focussed
	perf pmu: Fix misleadingly indented assignment (whitespace)
	perf dwarf: Guard !x86_64 definitions under #ifdef else clause
	perf trace: Do not process PERF_RECORD_LOST twice
	perf tests: Remove wrong semicolon in while loop in CQM test
	perf tools: Use readdir() instead of deprecated readdir_r() again
	md: fix incorrect use of lexx_to_cpu in does_sb_need_changing
	md: fix super_offset endianness in super_1_rdev_size_change
	tcp: fix tcp_mark_head_lost to check skb len before fragmenting
	staging: vt6556: vnt_start Fix missing call to vnt_key_init_table.
	staging: comedi: fix clean-up of comedi_class in comedi_init()
	ext4: check return value of kstrtoull correctly in reserved_clusters_store
	x86/mm/pat: Don't report PAT on CPUs that don't support it
	saa7134: fix warm Medion 7134 EEPROM read
	Linux 4.4.77

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-07-15 13:29:08 +02:00
Liping Zhang
a2148222e3 sysctl: report EINVAL if value is larger than UINT_MAX for proc_douintvec
commit 425fffd886bae3d127a08fa6a17f2e31e24ed7ff upstream.

Currently, inputting the following command will succeed but actually the
value will be truncated:

  # echo 0x12ffffffff > /proc/sys/net/ipv4/tcp_notsent_lowat

This is not friendly to the user, so instead, we should report error
when the value is larger than UINT_MAX.

Fixes: e7d316a02f68 ("sysctl: handle error writing UINT_MAX to u32 fields")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-15 11:57:46 +02:00
Liping Zhang
e8505e6432 sysctl: don't print negative flag for proc_douintvec
commit 5380e5644afbba9e3d229c36771134976f05c91e upstream.

I saw some very confusing sysctl output on my system:
  # cat /proc/sys/net/core/xfrm_aevent_rseqth
  -2
  # cat /proc/sys/net/core/xfrm_aevent_etime
  -10
  # cat /proc/sys/net/ipv4/tcp_notsent_lowat
  -4294967295

Because we forget to set the *negp flag in proc_douintvec, so it will
become a garbage value.

Since the value related to proc_douintvec is always an unsigned integer,
so we can set *negp to false explictily to fix this issue.

Fixes: e7d316a02f68 ("sysctl: handle error writing UINT_MAX to u32 fields")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-15 11:57:46 +02:00
Sabrina Dubroca
fe0bb2ac16 tracing/kprobes: Allow to create probe with a module name starting with a digit
commit 9e52b32567126fe146f198971364f68d3bc5233f upstream.

Always try to parse an address, since kstrtoul() will safely fail when
given a symbol as input. If that fails (which will be the case for a
symbol), try to parse a symbol instead.

This allows creating a probe such as:

    p:probe/vlan_gro_receive 8021q:vlan_gro_receive+0

Which is necessary for this command to work:

    perf probe -m 8021q -a vlan_gro_receive

Link: http://lkml.kernel.org/r/fd72d666f45b114e2c5b9cf7e27b91de1ec966f1.1498122881.git.sd@queasysnail.net

Fixes: 413d37d1e ("tracing: Add kprobe-based event tracer")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-15 11:57:45 +02:00
Joonwoo Park
d368c6faa1 sched: walt: fix window misalignment when HZ=300
Due to rounding error hrtimer tick interval becomes 3333333 ns when HZ=300.
Consequently the tick time stamp nearest to the WALT's default window size
20ms will be also 19999998 (3333333 * 6).

Change-Id: I08f9bd2dbecccbb683e4490d06d8b0da703d3ab2
Suggested-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2017-07-12 21:01:07 +00:00
Greg Kroah-Hartman
64a73ff728 This is the 4.4.76 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllc3f0ACgkQONu9yGCS
 aT4fmA/+OHeYbhpaMRKqrUpsxB3NpROr2Z47ow6vaVjYZzd0irrODLlfIfDQ6EEo
 N3v28povu16VeYXk+4h8bsAP2K2j6/BlRaSi2hB6dmnY8GDMaXEfRojPYAlzVz50
 qnK/6152siDDarUx1h5Zc8GcmX/tEl6h3bOOxDcwLR+RvyIcWxenuR+uqRM/AV6o
 BPEiOuMu7P6LjID7KYgBTFNajVBMLrDXt4SCWdzOZmlNt0QXgKB9yw68vTcc+edC
 ZcXqa0M6nEWSDvwobbwBZhFL8H2dJjzweyjeFBgxnxgmOrRh6kvZG2wsz2c8O3/P
 g8TuMxU7siu+I3lFwKy+dgZ/1REz+6Q3oFBqXsuddrcPYu23rV6mz/GxqWy4cerb
 M4eTWz6L9vA2GoYpvBaWi0tKC9tkNM49g48Y24a6CW1O4dJWlz3RrpTiZmequbNF
 mo8EKomSXn4kYAm1xT03DGljQkK/i2JtyI5sk2hLEqqxKvZ/3q9xxLLKOVx8dPvs
 PIbfpapfYMXXMWgR6e+UKueNLgevfWE12X/OU4SgvSY4n/07/mH40XEd3zd82IsZ
 1Mw0qj3JnqCAFDBBMsDYa+OvABaGD1dHARuiv+aeqW8tqoBglFHxWqF+SQVNXLIE
 qTLiKz78vjQpH0zGpkA3HEOh/h4L7a0y3qRMECsk5SUxXsgu1gg=
 =bwNU
 -----END PGP SIGNATURE-----

Merge 4.4.76 into android-4.4

Changes in 4.4.76
	ipv6: release dst on error in ip6_dst_lookup_tail
	net: don't call strlen on non-terminated string in dev_set_alias()
	decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb
	net: Zero ifla_vf_info in rtnl_fill_vfinfo()
	af_unix: Add sockaddr length checks before accessing sa_family in bind and connect handlers
	Fix an intermittent pr_emerg warning about lo becoming free.
	net: caif: Fix a sleep-in-atomic bug in cfpkt_create_pfx
	igmp: acquire pmc lock for ip_mc_clear_src()
	igmp: add a missing spin_lock_init()
	ipv6: fix calling in6_ifa_hold incorrectly for dad work
	net/mlx5: Wait for FW readiness before initializing command interface
	decnet: always not take dst->__refcnt when inserting dst into hash table
	net: 8021q: Fix one possible panic caused by BUG_ON in free_netdev
	sfc: provide dummy definitions of vswitch functions
	ipv6: Do not leak throw route references
	rtnetlink: add IFLA_GROUP to ifla_policy
	netfilter: xt_TCPMSS: add more sanity tests on tcph->doff
	netfilter: synproxy: fix conntrackd interaction
	NFSv4: fix a reference leak caused WARNING messages
	drm/ast: Handle configuration without P2A bridge
	mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff()
	MIPS: Avoid accidental raw backtrace
	MIPS: pm-cps: Drop manual cache-line alignment of ready_count
	MIPS: Fix IRQ tracing & lockdep when rescheduling
	ALSA: hda - Fix endless loop of codec configure
	ALSA: hda - set input_path bitmap to zero after moving it to new place
	drm/vmwgfx: Free hash table allocated by cmdbuf managed res mgr
	usb: gadget: f_fs: Fix possibe deadlock
	sysctl: enable strict writes
	block: fix module reference leak on put_disk() call for cgroups throttle
	mm: numa: avoid waiting on freed migrated pages
	KVM: x86: fix fixing of hypercalls
	scsi: sd: Fix wrong DPOFUA disable in sd_read_cache_type
	scsi: lpfc: Set elsiocb contexts to NULL after freeing it
	qla2xxx: Fix erroneous invalid handle message
	ARM: dts: BCM5301X: Correct GIC_PPI interrupt flags
	net: mvneta: Fix for_each_present_cpu usage
	MIPS: ath79: fix regression in PCI window initialization
	net: korina: Fix NAPI versus resources freeing
	MIPS: ralink: MT7688 pinmux fixes
	MIPS: ralink: fix USB frequency scaling
	MIPS: ralink: Fix invalid assignment of SoC type
	MIPS: ralink: fix MT7628 pinmux typos
	MIPS: ralink: fix MT7628 wled_an pinmux gpio
	mtd: bcm47xxpart: limit scanned flash area on BCM47XX (MIPS) only
	bgmac: fix a missing check for build_skb
	mtd: bcm47xxpart: don't fail because of bit-flips
	bgmac: Fix reversed test of build_skb() return value.
	net: bgmac: Fix SOF bit checking
	net: bgmac: Start transmit queue in bgmac_open
	net: bgmac: Remove superflous netif_carrier_on()
	powerpc/eeh: Enable IO path on permanent error
	gianfar: Do not reuse pages from emergency reserve
	Btrfs: fix truncate down when no_holes feature is enabled
	virtio_console: fix a crash in config_work_handler
	swiotlb-xen: update dev_addr after swapping pages
	xen-netfront: Fix Rx stall during network stress and OOM
	scsi: virtio_scsi: Reject commands when virtqueue is broken
	platform/x86: ideapad-laptop: handle ACPI event 1
	amd-xgbe: Check xgbe_init() return code
	net: dsa: Check return value of phy_connect_direct()
	drm/amdgpu: check ring being ready before using
	vfio/spapr: fail tce_iommu_attach_group() when iommu_data is null
	virtio_net: fix PAGE_SIZE > 64k
	vxlan: do not age static remote mac entries
	ibmveth: Add a proper check for the availability of the checksum features
	kernel/panic.c: add missing \n
	HID: i2c-hid: Add sleep between POWER ON and RESET
	scsi: lpfc: avoid double free of resource identifiers
	spi: davinci: use dma_mapping_error()
	mac80211: initialize SMPS field in HT capabilities
	x86/mpx: Use compatible types in comparison to fix sparse error
	coredump: Ensure proper size of sparse core files
	swiotlb: ensure that page-sized mappings are page-aligned
	s390/ctl_reg: make __ctl_load a full memory barrier
	be2net: fix status check in be_cmd_pmac_add()
	perf probe: Fix to show correct locations for events on modules
	net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV
	sctp: check af before verify address in sctp_addr_id2transport
	ravb: Fix use-after-free on `ifconfig eth0 down`
	jump label: fix passing kbuild_cflags when checking for asm goto support
	xfrm: fix stack access out of bounds with CONFIG_XFRM_SUB_POLICY
	xfrm: NULL dereference on allocation failure
	xfrm: Oops on error in pfkey_msg2xfrm_state()
	watchdog: bcm281xx: Fix use of uninitialized spinlock.
	sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
	ARM64/ACPI: Fix BAD_MADT_GICC_ENTRY() macro implementation
	ARM: 8685/1: ensure memblock-limit is pmd-aligned
	x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space
	x86/mm: Fix flush_tlb_page() on Xen
	ocfs2: o2hb: revert hb threshold to keep compatible
	iommu/vt-d: Don't over-free page table directories
	iommu: Handle default domain attach failure
	iommu/amd: Fix incorrect error handling in amd_iommu_bind_pasid()
	cpufreq: s3c2416: double free on driver init error path
	KVM: x86: fix emulation of RSM and IRET instructions
	KVM: x86/vPMU: fix undefined shift in intel_pmu_refresh()
	KVM: x86: zero base3 of unusable segments
	KVM: nVMX: Fix exception injection
	Linux 4.4.76

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-07-05 16:16:58 +02:00
Matt Fleming
6ca11db55f sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
commit 6e5f32f7a43f45ee55c401c0b9585eb01f9629a8 upstream.

If we crossed a sample window while in NO_HZ we will add LOAD_FREQ to
the pending sample window time on exit, setting the next update not
one window into the future, but two.

This situation on exiting NO_HZ is described by:

  this_rq->calc_load_update < jiffies < calc_load_update

In this scenario, what we should be doing is:

  this_rq->calc_load_update = calc_load_update		     [ next window ]

But what we actually do is:

  this_rq->calc_load_update = calc_load_update + LOAD_FREQ   [ next+1 window ]

This has the effect of delaying load average updates for potentially
up to ~9seconds.

This can result in huge spikes in the load average values due to
per-cpu uninterruptible task counts being out of sync when accumulated
across all CPUs.

It's safe to update the per-cpu active count if we wake between sample
windows because any load that we left in 'calc_load_idle' will have
been zero'd when the idle load was folded in calc_global_load().

This issue is easy to reproduce before,

  commit 9d89c257df ("sched/fair: Rewrite runnable load and utilization average tracking")

just by forking short-lived process pipelines built from ps(1) and
grep(1) in a loop. I'm unable to reproduce the spikes after that
commit, but the bug still seems to be present from code review.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Fixes: commit 5167e8d ("sched/nohz: Rewrite and fix load-avg computation -- again")
Link: http://lkml.kernel.org/r/20170217120731.11868-2-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-05 14:37:21 +02:00
Jiri Slaby
70f41003b9 kernel/panic.c: add missing \n
[ Upstream commit ff7a28a074ccbea999dadbb58c46212cf90984c6 ]

When a system panics, the "Rebooting in X seconds.." message is never
printed because it lacks a new line.  Fix it.

Link: http://lkml.kernel.org/r/20170119114751.2724-1-jslaby@suse.cz
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-05 14:37:19 +02:00
Kees Cook
2449a71eb9 sysctl: enable strict writes
commit 41662f5cc55335807d39404371cfcbb1909304c4 upstream.

SYSCTL_WRITES_WARN was added in commit f4aacea2f5 ("sysctl: allow for
strict write position handling"), and released in v3.16 in August of
2014.  Since then I can find only 1 instance of non-zero offset
writing[1], and it was fixed immediately in CRIU[2].  As such, it
appears safe to flip this to the strict state now.

[1] https://www.google.com/search?q="when%20file%20position%20was%20not%200"
[2] http://lists.openvz.org/pipermail/criu/2015-April/019819.html

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-05 14:37:16 +02:00
Greg Kroah-Hartman
8c91412c32 This is the 4.4.75 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllU2ygACgkQONu9yGCS
 aT4OwA/9GDjnY24njdA5QwYaX7PLAI3mFItUyDmh8daJJhHnz60Q4lB9gMhNVGxU
 7Ov8D00ivsgfAOnadIiyypScy5Ype6US8rfjlVqki16/ExXLg+4yjri/x/q1MjTI
 UjRcJCfMG6XumQ3774yycO3k+gyKgJg2N21C43ewBDLE787mL1TJ4GNtfYgAOrex
 yoW4ZzevaCoik4m7+9RE/Kc6nXJpUeyJ0mJjIhGagRjGLJRTGcoaV7BsyFRDMLJ8
 FlMhf1zPfj1L2lsUkx6qdmeOTfioFra9S82F8+X36Qs1cS0n2RCTyGZL+DxaWqYc
 gw0iVHc7nChZaULWAcAqDwmodI3uYlAHJFw8VwbwQ35DALJJvqUEoSE5avfvF8TB
 daCroZhTuEA1vG8Ui0ZWkQHHSN0p/153zm+ynn9STFnUCh5UDcsTtzWg7tio6Eih
 /l2xeIQI/E+VFtf2GkxGEdxSnVWPQxXEWpCHIb2Us3UCCh+E+tzMqj5rsV8HLzSl
 2LKofqZpHql+nAb+Gf8y/OglPveEeNM5Wy5d6lnU1A652ZoLN+ErV2C7XiK91yG8
 DEVj5Bsko1yZMpPznDLnbeE20xnscrlU1hFJ1qaMiyD2dYnfb7oDdKujXKONoEzM
 MLSQkxIn4bPdqEEK8uDrZo/JMVC5/uS+auXZXK7h+dEDusvjEzI=
 =Dwzp
 -----END PGP SIGNATURE-----

Merge 4.4.75 into android-4.4

Changes in 4.4.75
	fs/exec.c: account for argv/envp pointers
	autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL
	lib/cmdline.c: fix get_options() overflow while parsing ranges
	KVM: PPC: Book3S HV: Preserve userspace HTM state properly
	CIFS: Improve readdir verbosity
	HID: Add quirk for Dell PIXART OEM mouse
	signal: Only reschedule timers on signals timers have sent
	powerpc/kprobes: Pause function_graph tracing during jprobes handling
	Input: i8042 - add Fujitsu Lifebook AH544 to notimeout list
	time: Fix clock->read(clock) race around clocksource changes
	target: Fix kref->refcount underflow in transport_cmd_finish_abort
	iscsi-target: Reject immediate data underflow larger than SCSI transfer length
	drm/radeon: add a PX quirk for another K53TK variant
	drm/radeon: add a quirk for Toshiba Satellite L20-183
	drm/amdgpu/atom: fix ps allocation size for EnableDispPowerGating
	drm/amdgpu: adjust default display clock
	USB: usbip: fix nonconforming hub descriptor
	rxrpc: Fix several cases where a padded len isn't checked in ticket decode
	of: Add check to of_scan_flat_dt() before accessing initial_boot_params
	mtd: spi-nor: fix spansion quad enable
	powerpc/slb: Force a full SLB flush when we insert for a bad EA
	usb: gadget: f_fs: avoid out of bounds access on comp_desc
	net: phy: Initialize mdio clock at probe function
	net: phy: fix marvell phy status reading
	nvme/quirk: Add a delay before checking for adapter readiness
	nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too
	Linux 4.4.75

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-06-29 14:23:32 +02:00
John Stultz
1fecf3977d time: Fix clock->read(clock) race around clocksource changes
commit ceea5e3771ed2378668455fa21861bead7504df5 upstream.

In tests, which excercise switching of clocksources, a NULL
pointer dereference can be observed on AMR64 platforms in the
clocksource read() function:

u64 clocksource_mmio_readl_down(struct clocksource *c)
{
	return ~(u64)readl_relaxed(to_mmio_clksrc(c)->reg) & c->mask;
}

This is called from the core timekeeping code via:

	cycle_now = tkr->read(tkr->clock);

tkr->read is the cached tkr->clock->read() function pointer.
When the clocksource is changed then tkr->clock and tkr->read
are updated sequentially. The code above results in a sequential
load operation of tkr->read and tkr->clock as well.

If the store to tkr->clock hits between the loads of tkr->read
and tkr->clock, then the old read() function is called with the
new clock pointer. As a consequence the read() function
dereferences a different data structure and the resulting 'reg'
pointer can point anywhere including NULL.

This problem was introduced when the timekeeping code was
switched over to use struct tk_read_base. Before that, it was
theoretically possible as well when the compiler decided to
reload clock in the code sequence:

     now = tk->clock->read(tk->clock);

Add a helper function which avoids the issue by reading
tk_read_base->clock once into a local variable clk and then issue
the read function via clk->read(clk). This guarantees that the
read() function always gets the proper clocksource pointer handed
in.

Since there is now no use for the tkr.read pointer, this patch
also removes it, and to address stopping the fast timekeeper
during suspend/resume, it introduces a dummy clocksource to use
rather then just a dummy read function.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Daniel Mentz <danielmentz@google.com>
Link: http://lkml.kernel.org/r/1496965462-20003-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29 12:48:51 +02:00
Eric W. Biederman
bc7b3e9984 signal: Only reschedule timers on signals timers have sent
commit 57db7e4a2d92c2d3dfbca4ef8057849b2682436b upstream.

Thomas Gleixner  wrote:
> The CRIU support added a 'feature' which allows a user space task to send
> arbitrary (kernel) signals to itself. The changelog says:
>
>   The kernel prevents sending of siginfo with positive si_code, because
>   these codes are reserved for kernel.  I think we can allow a task to
>   send such a siginfo to itself.  This operation should not be dangerous.
>
> Quite contrary to that claim, it turns out that it is outright dangerous
> for signals with info->si_code == SI_TIMER. The following code sequence in
> a user space task allows to crash the kernel:
>
>    id = timer_create(CLOCK_XXX, ..... signo = SIGX);
>    timer_set(id, ....);
>    info->si_signo = SIGX;
>    info->si_code = SI_TIMER:
>    info->_sifields._timer._tid = id;
>    info->_sifields._timer._sys_private = 2;
>    rt_[tg]sigqueueinfo(..., SIGX, info);
>    sigemptyset(&sigset);
>    sigaddset(&sigset, SIGX);
>    rt_sigtimedwait(sigset, info);
>
> For timers based on CLOCK_PROCESS_CPUTIME_ID, CLOCK_THREAD_CPUTIME_ID this
> results in a kernel crash because sigwait() dequeues the signal and the
> dequeue code observes:
>
>   info->si_code == SI_TIMER && info->_sifields._timer._sys_private != 0
>
> which triggers the following callchain:
>
>  do_schedule_next_timer() -> posix_cpu_timer_schedule() -> arm_timer()
>
> arm_timer() executes a list_add() on the timer, which is already armed via
> the timer_set() syscall. That's a double list add which corrupts the posix
> cpu timer list. As a consequence the kernel crashes on the next operation
> touching the posix cpu timer list.
>
> Posix clocks which are internally implemented based on hrtimers are not
> affected by this because hrtimer_start() can handle already armed timers
> nicely, but it's a reliable way to trigger the WARN_ON() in
> hrtimer_forward(), which complains about calling that function on an
> already armed timer.

This problem has existed since the posix timer code was merged into
2.5.63. A few releases earlier in 2.5.60 ptrace gained the ability to
inject not just a signal (which linux has supported since 1.0) but the
full siginfo of a signal.

The core problem is that the code will reschedule in response to
signals getting dequeued not just for signals the timers sent but
for other signals that happen to a si_code of SI_TIMER.

Avoid this confusion by testing to see if the queued signal was
preallocated as all timer signals are preallocated, and so far
only the timer code preallocates signals.

Move the check for if a timer needs to be rescheduled up into
collect_signal where the preallocation check must be performed,
and pass the result back to dequeue_signal where the code reschedules
timers.   This makes it clear why the code cares about preallocated
timers.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Reference: 66dd34ad31 ("signal: allow to send any siginfo to itself")
Reference: 1669ce53e2ff ("Add PTRACE_GETSIGINFO and PTRACE_SETSIGINFO")
Fixes: db8b50ba75f2 ("[PATCH] POSIX clocks & timers")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29 12:48:51 +02:00
Greg Kroah-Hartman
77ddb50929 This is the 4.4.74 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllQl/sACgkQONu9yGCS
 aT5zMRAAuDBpWjQ1IFtgmzQnKGyjS3fm5X/EgPmT81PFKXay5/TH6Hc85TvorChk
 mCC7qybadCFPjieBfUeCGhTposiGkbOZdYIzduzLeHPe7Eda88NKJw5ZS3x+RDro
 if6BZNtQPwPk9jQ95zpBu/p6eCuIGFzQObif8XHga9eEVP+TPGDKFn5EdLM8j99t
 ErKYyTLFEiZYa52hpCBbVz/4mX8bJOoAlZaitcbvaFbG0OodA5SL24sKlr7tAPrM
 ajnuqv+ghOUjbXrUlrTGxCjJ7vCJjdBqNzuxVFNj5P1xDucpBW8uuWGob0XWTMbB
 hj/ToAIQXQXrZKFpASWW74B4QZDcjo7dbhDWOurBaAsyLuBzAi26pI+q6TqgCQUO
 k17ilfk9LVEvvFhiQ7xpJPNnkh6tCEk7Jdblru6ZL5fHCAYe+qUDj56TbqjFJCQK
 +bDzPi0QXkEGQNKxo7zDu5iGQ0Gb0zD2Z3MrGD+3pCkM5yG0PXjzZ7lOlboyPzwY
 88dxuuTRmm8yGEEm81BKmDYqAA1l4FCrap8u9FLoNyoZyMnK7B+SHHuPRBRhL3F2
 I3L/v8BbJhXTsDNPXEsXtpZZpn2wxJp4x4gKWmCcOb5MM1nbFrFtwdj0cKobu6Xe
 ygNMEkjlW2uUrZoDXthj1ICda/cEw/R0gMWzBeNNVfErOZEmFxM=
 =zl9i
 -----END PGP SIGNATURE-----

Merge 4.4.74 into android-4.4

Changes in 4.4.74
	configfs: Fix race between create_link and configfs_rmdir
	can: gs_usb: fix memory leak in gs_cmd_reset()
	cpufreq: conservative: Allow down_threshold to take values from 1 to 10
	vb2: Fix an off by one error in 'vb2_plane_vaddr'
	mac80211: don't look at the PM bit of BAR frames
	mac80211/wpa: use constant time memory comparison for MACs
	mac80211: fix CSA in IBSS mode
	mac80211: fix IBSS presp allocation size
	serial: efm32: Fix parity management in 'efm32_uart_console_get_options()'
	x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init()
	mfd: omap-usb-tll: Fix inverted bit use for USB TLL mode
	staging: rtl8188eu: prevent an underflow in rtw_check_beacon_data()
	iio: proximity: as3935: recalibrate RCO after resume
	USB: hub: fix SS max number of ports
	usb: core: fix potential memory leak in error path during hcd creation
	pvrusb2: reduce stack usage pvr2_eeprom_analyze()
	USB: gadget: dummy_hcd: fix hub-descriptor removable fields
	usb: r8a66597-hcd: select a different endpoint on timeout
	usb: r8a66597-hcd: decrease timeout
	drivers/misc/c2port/c2port-duramar2150.c: checking for NULL instead of IS_ERR()
	usb: xhci: ASMedia ASM1042A chipset need shorts TX quirk
	USB: gadgetfs, dummy-hcd, net2280: fix locking for callbacks
	mm/memory-failure.c: use compound_head() flags for huge pages
	swap: cond_resched in swap_cgroup_prepare()
	genirq: Release resources in __setup_irq() error path
	alarmtimer: Prevent overflow of relative timers
	usb: dwc3: exynos fix axius clock error path to do cleanup
	MIPS: Fix bnezc/jialc return address calculation
	alarmtimer: Rate limit periodic intervals
	mm: larger stack guard gap, between vmas
	Allow stack to grow up to address space limit
	mm: fix new crash in unmapped_area_topdown()
	Linux 4.4.74

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-06-27 09:47:59 +02:00
Thomas Gleixner
26605a06dd alarmtimer: Rate limit periodic intervals
commit ff86bf0c65f14346bf2440534f9ba5ac232c39a0 upstream.

The alarmtimer code has another source of potentially rearming itself too
fast. Interval timers with a very samll interval have a similar CPU hog
effect as the previously fixed overflow issue.

The reason is that alarmtimers do not implement the normal protection
against this kind of problem which the other posix timer use:

  timer expires -> queue signal -> deliver signal -> rearm timer

This scheme brings the rearming under scheduler control and prevents
permanently firing timers which hog the CPU.

Bringing this scheme to the alarm timer code is a major overhaul because it
lacks all the necessary mechanisms completely.

So for a quick fix limit the interval to one jiffie. This is not
problematic in practice as alarmtimers are usually backed by an RTC for
suspend which have 1 second resolution. It could be therefor argued that
the resolution of this clock should be set to 1 second in general, but
that's outside the scope of this fix.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kostya Serebryany <kcc@google.com>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Link: http://lkml.kernel.org/r/20170530211655.896767100@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-26 07:13:11 +02:00
Thomas Gleixner
aac7fa215e alarmtimer: Prevent overflow of relative timers
commit f4781e76f90df7aec400635d73ea4c35ee1d4765 upstream.

Andrey reported a alartimer related RCU stall while fuzzing the kernel with
syzkaller.

The reason for this is an overflow in ktime_add() which brings the
resulting time into negative space and causes immediate expiry of the
timer. The following rearm with a small interval does not bring the timer
back into positive space due to the same issue.

This results in a permanent firing alarmtimer which hogs the CPU.

Use ktime_add_safe() instead which detects the overflow and clamps the
result to KTIME_SEC_MAX.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kostya Serebryany <kcc@google.com>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Link: http://lkml.kernel.org/r/20170530211655.802921648@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-26 07:13:10 +02:00
Heiner Kallweit
4d4d501cd7 genirq: Release resources in __setup_irq() error path
commit fa07ab72cbb0d843429e61bf179308aed6cbe0dd upstream.

In case __irq_set_trigger() fails the resources requested via
irq_request_resources() are not released.

Add the missing release call into the error handling path.

Fixes: c1bacbae81 ("genirq: Provide irq_request/release_resources chip callbacks")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/655538f5-cb20-a892-ff15-fbd2dd1fa4ec@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-26 07:13:10 +02:00
Daniel Borkmann
6bb6b3e686 UPSTREAM: bpf: don't let ldimm64 leak map addresses on unprivileged
[ Upstream commit 0d0e57697f162da4aa218b5feafe614fb666db07 ]

The patch fixes two things at once:

1) It checks the env->allow_ptr_leaks and only prints the map address to
   the log if we have the privileges to do so, otherwise it just dumps 0
   as we would when kptr_restrict is enabled on %pK. Given the latter is
   off by default and not every distro sets it, I don't want to rely on
   this, hence the 0 by default for unprivileged.

2) Printing of ldimm64 in the verifier log is currently broken in that
   we don't print the full immediate, but only the 32 bit part of the
   first insn part for ldimm64. Thus, fix this up as well; it's okay to
   access, since we verified all ldimm64 earlier already (including just
   constants) through replace_map_fd_with_map_ptr().

Fixes: 1be7f75d16 ("bpf: enable non-root eBPF programs")
Fixes: cbd3570086 ("bpf: verifier (add ability to receive verification log)")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bug: 62199770
Change-Id: I62ee47d06ddc669ba2863e8cf24f8f3e7683a461
2017-06-23 13:31:47 -07:00
Greg Kroah-Hartman
e76c0faf11 This is the 4.4.72 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllBIXAACgkQONu9yGCS
 aT6T+w//VjXDZ+MddWJ4UeQDyIANYeFpa4tJNoqR3JsnT6yg1HODRZDR7aP5QJmN
 GIoRWU/2Q2nmYbAO0c8RPxs07w2xtIZzTUn+H+i6sG7bRs5RbLM5AMg4W/A/X88L
 V5c34kCvCf1HRfrdd4rXIZiibFnSZGqUv6o1YyQqCIvx15pyB6elMM714zt8uubk
 iL4/WJ2M4SrmamHWA349ldEtPjQKpwpwdBcCn+M4awbimdc0pm8oZqNkAfwJ+vLO
 HsuClO57I699ESU2Zt5bfEdVsW/gc7WiJOAr1Mrl2suToryrWfs2YT+sC/IQhkfC
 gUsi9Cm/6YMu+tiP4o6aqYvTFoFplFErpEbC3mqAEvHGGHKhrgEDotYJ+FnvI3q7
 Jaxix0B/Q/NIqsJPnqe5ONOCKFmW7rGR2e2j5+45GuiofioNVNF12HWfQkoItPOL
 YeR2JB8K9aywzYM4gaJuy8ScJ1shN8TY1FKgZa5gBT2ym4pDDcQmxz7Jr7agREHe
 F2sJ23zMU+o9guGA4Is2yqWCQ5yM+3kpPPISz+Pcgh8Q95o+ftCSyOeB2F5roW8I
 EO22AlJPlQH0LWDQhOJ5ZuAVe+qB8EdrQqqdLbP4/oHp7MtlR5ge+idRuZc+AUsa
 UoASccPsEwHyBErQmHoWNI4nPRciFrKliOqERmPLcuzewUwSatw=
 =wXRR
 -----END PGP SIGNATURE-----

Merge 4.4.72 into android-4.4

Changes in 4.4.72
	bnx2x: Fix Multi-Cos
	ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt()
	cxgb4: avoid enabling napi twice to the same queue
	tcp: disallow cwnd undo when switching congestion control
	vxlan: fix use-after-free on deletion
	ipv6: Fix leak in ipv6_gso_segment().
	net: ping: do not abuse udp_poll()
	net: ethoc: enable NAPI before poll may be scheduled
	net: bridge: start hello timer only if device is up
	sparc64: mm: fix copy_tsb to correctly copy huge page TSBs
	sparc: Machine description indices can vary
	sparc64: reset mm cpumask after wrap
	sparc64: combine activate_mm and switch_mm
	sparc64: redefine first version
	sparc64: add per-cpu mm of secondary contexts
	sparc64: new context wrap
	sparc64: delete old wrap code
	arch/sparc: support NR_CPUS = 4096
	serial: ifx6x60: fix use-after-free on module unload
	ptrace: Properly initialize ptracer_cred on fork
	KEYS: fix dereferencing NULL payload with nonzero length
	KEYS: fix freeing uninitialized memory in key_update()
	crypto: gcm - wait for crypto op not signal safe
	drm/amdgpu/ci: disable mclk switching for high refresh rates (v2)
	nfsd4: fix null dereference on replay
	nfsd: Fix up the "supattr_exclcreat" attributes
	kvm: async_pf: fix rcu_irq_enter() with irqs enabled
	KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation
	arm: KVM: Allow unaligned accesses at HYP
	KVM: async_pf: avoid async pf injection when in guest mode
	dmaengine: usb-dmac: Fix DMAOR AE bit definition
	dmaengine: ep93xx: Always start from BASE0
	xen/privcmd: Support correctly 64KB page granularity when mapping memory
	xen-netfront: do not cast grant table reference to signed short
	xen-netfront: cast grant table reference first to type int
	ext4: fix SEEK_HOLE
	ext4: keep existing extra fields when inode expands
	ext4: fix fdatasync(2) after extent manipulation operations
	usb: gadget: f_mass_storage: Serialize wake and sleep execution
	usb: chipidea: udc: fix NULL pointer dereference if udc_start failed
	usb: chipidea: debug: check before accessing ci_role
	staging/lustre/lov: remove set_fs() call from lov_getstripe()
	iio: light: ltr501 Fix interchanged als/ps register field
	iio: proximity: as3935: fix AS3935_INT mask
	drivers: char: random: add get_random_long()
	random: properly align get_random_int_hash
	stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms
	cpufreq: cpufreq_register_driver() should return -ENODEV if init fails
	target: Re-add check to reject control WRITEs with overflow data
	drm/msm: Expose our reservation object when exporting a dmabuf.
	Input: elantech - add Fujitsu Lifebook E546/E557 to force crc_enabled
	cpuset: consider dying css as offline
	fs: add i_blocksize()
	ufs: restore proper tail allocation
	fix ufs_isblockset()
	ufs: restore maintaining ->i_blocks
	ufs: set correct ->s_maxsize
	ufs_extend_tail(): fix the braino in calling conventions of ufs_new_fragments()
	ufs_getfrag_block(): we only grab ->truncate_mutex on block creation path
	cxl: Fix error path on bad ioctl
	btrfs: use correct types for page indices in btrfs_page_exists_in_range
	btrfs: fix memory leak in update_space_info failure path
	KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages
	scsi: qla2xxx: don't disable a not previously enabled PCI device
	powerpc/eeh: Avoid use after free in eeh_handle_special_event()
	powerpc/numa: Fix percpu allocations to be NUMA aware
	powerpc/hotplug-mem: Fix missing endian conversion of aa_index
	perf/core: Drop kernel samples even though :u is specified
	drm/vmwgfx: Handle vmalloc() failure in vmw_local_fifo_reserve()
	drm/vmwgfx: limit the number of mip levels in vmw_gb_surface_define_ioctl()
	drm/vmwgfx: Make sure backup_handle is always valid
	drm/nouveau/tmr: fully separate alarm execution/pending lists
	ALSA: timer: Fix race between read and ioctl
	ALSA: timer: Fix missing queue indices reset at SNDRV_TIMER_IOCTL_SELECT
	ASoC: Fix use-after-free at card unregistration
	drivers: char: mem: Fix wraparound check to allow mappings up to the end
	tty: Drop krefs for interrupted tty lock
	serial: sh-sci: Fix panic when serial console and DMA are enabled
	net: better skb->sender_cpu and skb->napi_id cohabitation
	mm: consider memblock reservations for deferred memory initialization sizing
	NFS: Ensure we revalidate attributes before using execute_ok()
	NFSv4: Don't perform cached access checks before we've OPENed the file
	Make __xfs_xattr_put_listen preperly report errors.
	arm64: hw_breakpoint: fix watchpoint matching for tagged pointers
	arm64: entry: improve data abort handling of tagged pointers
	RDMA/qib,hfi1: Fix MR reference count leak on write with immediate
	usercopy: Adjust tests to deal with SMAP/PAN
	arm64: armv8_deprecated: ensure extension of addr
	arm64: ensure extension of smp_store_release value
	Linux 4.4.72

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-06-14 16:33:25 +02:00
Jin Yao
e582b82c16 perf/core: Drop kernel samples even though :u is specified
commit cc1582c231ea041fbc68861dfaf957eaf902b829 upstream.

When doing sampling, for example:

  perf record -e cycles:u ...

On workloads that do a lot of kernel entry/exits we see kernel
samples, even though :u is specified. This is due to skid existing.

This might be a security issue because it can leak kernel addresses even
though kernel sampling support is disabled.

The patch drops the kernel samples if exclude_kernel is specified.

For example, test on Haswell desktop:

  perf record -e cycles:u <mgen>
  perf report --stdio

Before patch applied:

    99.77%  mgen     mgen              [.] buf_read
     0.20%  mgen     mgen              [.] rand_buf_init
     0.01%  mgen     [kernel.vmlinux]  [k] apic_timer_interrupt
     0.00%  mgen     mgen              [.] last_free_elem
     0.00%  mgen     libc-2.23.so      [.] __random_r
     0.00%  mgen     libc-2.23.so      [.] _int_malloc
     0.00%  mgen     mgen              [.] rand_array_init
     0.00%  mgen     [kernel.vmlinux]  [k] page_fault
     0.00%  mgen     libc-2.23.so      [.] __random
     0.00%  mgen     libc-2.23.so      [.] __strcasestr
     0.00%  mgen     ld-2.23.so        [.] strcmp
     0.00%  mgen     ld-2.23.so        [.] _dl_start
     0.00%  mgen     libc-2.23.so      [.] sched_setaffinity@@GLIBC_2.3.4
     0.00%  mgen     ld-2.23.so        [.] _start

We can see kernel symbols apic_timer_interrupt and page_fault.

After patch applied:

    99.79%  mgen     mgen           [.] buf_read
     0.19%  mgen     mgen           [.] rand_buf_init
     0.00%  mgen     libc-2.23.so   [.] __random_r
     0.00%  mgen     mgen           [.] rand_array_init
     0.00%  mgen     mgen           [.] last_free_elem
     0.00%  mgen     libc-2.23.so   [.] vfprintf
     0.00%  mgen     libc-2.23.so   [.] rand
     0.00%  mgen     libc-2.23.so   [.] __random
     0.00%  mgen     libc-2.23.so   [.] _int_malloc
     0.00%  mgen     libc-2.23.so   [.] _IO_doallocbuf
     0.00%  mgen     ld-2.23.so     [.] do_lookup_x
     0.00%  mgen     ld-2.23.so     [.] open_verify.constprop.7
     0.00%  mgen     ld-2.23.so     [.] _dl_important_hwcaps
     0.00%  mgen     libc-2.23.so   [.] sched_setaffinity@@GLIBC_2.3.4
     0.00%  mgen     ld-2.23.so     [.] _start

There are only userspace symbols.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Cc: kan.liang@intel.com
Cc: mark.rutland@arm.com
Cc: will.deacon@arm.com
Cc: yao.jin@intel.com
Link: http://lkml.kernel.org/r/1495706947-3744-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 13:16:25 +02:00
Tejun Heo
c8acec90d9 cpuset: consider dying css as offline
commit 41c25707d21716826e3c1f60967f5550610ec1c9 upstream.

In most cases, a cgroup controller don't care about the liftimes of
cgroups.  For the controller, a css becomes online when ->css_online()
is called on it and offline when ->css_offline() is called.

However, cpuset is special in that the user interface it exposes cares
whether certain cgroups exist or not.  Combined with the RCU delay
between cgroup removal and css offlining, this can lead to user
visible behavior oddities where operations which should succeed after
cgroup removals fail for some time period.  The effects of cgroup
removals are delayed when seen from userland.

This patch adds css_is_dying() which tests whether offline is pending
and updates is_cpuset_online() so that the function returns false also
while offline is pending.  This gets rid of the userland visible
delays.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Link: http://lkml.kernel.org/r/327ca1f5-7957-fbb9-9e5f-9ba149d40ba2@oracle.com
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 13:16:23 +02:00
Daniel Micay
2ff1edbbb2 stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms
commit 5ea30e4e58040cfd6434c2f33dc3ea76e2c15b05 upstream.

The stack canary is an 'unsigned long' and should be fully initialized to
random data rather than only 32 bits of random data.

Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Arjan van Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170504133209.3053-1-danielmicay@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 13:16:23 +02:00