Commit graph

435420 commits

Author SHA1 Message Date
Huacai Chen
c14af233fb MIPS: Hibernate: Flush TLB entries in swsusp_arch_resume()
The original MIPS hibernate code flushes cache and TLB entries in
swsusp_arch_resume(). But they are removed in Commit 44eeab6741
(MIPS: Hibernation: Remove SMP TLB and cacheflushing code.). A cross-
CPU flush is surely unnecessary because all but the local CPU have
already been disabled. But a local flush (at least the TLB flush) is
needed. When we do hibernation on Loongson-3 with an E1000E NIC, it is
very easy to produce a kernel panic (kernel page fault, or unaligned
access). The root cause is E1000E driver use vzalloc_node() to allocate
pages, the stale TLB entries of the booting kernel will be misused by
the resumed target kernel.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
Cc: John Crispin <john@phrozen.org>
Cc: Steven J. Hill <Steven.Hill@imgtec.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: linux-mips@linux-mips.org
Cc: Fuxin Zhang <zhangfx@lemote.com>
Cc: Zhangjin Wu <wuzhangjin@gmail.com>
Cc: stable@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/6643/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-03-31 10:16:53 +02:00
Manuel Lauss
61d3edb862 MIPS: Alchemy: remove duplicate UART register offset definitions
The UART register names are identical to the ones in uapi/linux/serial_reg.h,
which causes build failures in various drivers when they indirectly pull in
the au1000.h header, for example via gpio.h:

In file included from arch/mips/include/asm/mach-au1x00/gpio.h:13:0,
                 from arch/mips/include/asm/gpio.h:4,
                 from include/linux/gpio.h:48,
                 from include/linux/ssb/ssb.h:9,
                 from drivers/ssb/driver_mipscore.c:11:
arch/mips/include/asm/mach-au1x00/au1000.h:1171:0: note: this is the location of the previous definition
 #define UART_LSR 0x1C /* Line Status Register */

Get rid of the altogether, nothing in the core Alchemy code depends
on them any more.

Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Cc: Linux-MIPS <linux-mips@linux-mips.org>
Patchwork: https://patchwork.linux-mips.org/patch/6664/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-03-31 10:16:53 +02:00
Ralf Baechle
e60865572f MIPS: Fix build error due to multiple prom_putchar() definitions.
This can happen if both the generic 8250 and another early console
driver are enable.  Fixed by using an auxilliary kconfig symbol to
restrict that choice.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-03-31 10:16:52 +02:00
Grant Likely
d88cf7d7b4 Merge remote-tracking branch 'robh/for-next' into devicetree/next 2014-03-31 08:10:55 +01:00
Chen Gang
2601566d01 arch/avr32/mm/cache.c: export symbol flush_icache_range() for module using
Need export symbol flush_icache_range() to modules, just like another
platforms have done, or can not pass compiling.

The related error (with allmodconfig under avr32):

  ERROR: "flush_icache_range" [drivers/misc/lkdtm.ko] undefined!
  make[1]: *** [__modpost] Error 1
  make: *** [modules] Error 2

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
2014-03-31 08:41:52 +02:00
Wolfram Sang
8acc8722d3 avr32: remove cpu_data macro to fix compiles
Having cpu_data as a parameterless macro can easily cause build failures
because it can be a variable name like in linux/pm_domain.h [1]. So,
remove the macro and convert its only user. Because this architecture
cannot do SMP, remove the whole SMP block, too. Only compile tested due
to no hardware.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>

[1] https://lists.01.org/pipermail/kbuild-all/2014-February/003252.html
2014-03-31 08:41:36 +02:00
David S. Miller
9109e17f7c Merge branch 'filter-next'
Daniel Borkmann says:

====================
BPF updates

We sat down and have heavily reworked the whole previous patchset
from v10 [1] to address all comments/concerns. This patchset therefore
*replaces* the internal BPF interpreter with the new layout as
discussed in [1], and migrates some exotic callers to properly use the
BPF API for a transparent upgrade. All other callers that already use
the BPF API in a way it should be used, need no further changes to run
the new internals. We also removed the sysctl knob entirely, and do not
expose any structure to userland, so that implementation details only
reside in kernel space. Since we are replacing the interpreter we had
to migrate seccomp in one patch along with the interpreter to not break
anything. When attaching a new filter, the flow can be described as
following: i) test if jit compiler is enabled and can compile the user
BPF, ii) if so, then go for it, iii) if not, then transparently migrate
the filter into the new representation, and run it in the interpreter.
Also, we have scratched the jit flag from the len attribute and made it
as initial patch in this series as Pablo has suggested in the last
feedback, thanks. For details, please refer to the patches themselves.

We did extensive testing of BPF and seccomp on the new interpreter
itself and also on the user ABIs and could not find any issues; new
performance numbers as posted in patch 8 are also still the same.

Please find more details in the patches themselves.

For all the previous history from v1 to v10, see [1]. We have decided
to drop the v11 as we have pedantically reworked the set, but of course,
included all previous feedback.

v3 -> v4:
 - Applied feedback from Dave regarding swap insns
 - Rebased on net-next
v2 -> v3:
 - Rebased to latest net-next (i.e. w/ rxhash->hash rename)
 - Fixed patch 8/9 commit message/doc as suggested by Dave
 - Rest is unchanged
v1 -> v2:
 - Rebased to latest net-next
 - Added static to ptp_filter as suggested by Dave
 - Fixed a typo in patch 8's commit message
 - Rest unchanged

Thanks !

  [1] http://thread.gmane.org/gmane.linux.kernel/1665858
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 00:45:49 -04:00
Alexei Starovoitov
9a985cdc5c doc: filter: extend BPF documentation to document new internals
Further extend the current BPF documentation to document new BPF
engine internals. Joint work with Daniel Borkmann.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 00:45:09 -04:00
Alexei Starovoitov
bd4cf0ed33 net: filter: rework/optimize internal BPF interpreter's instruction set
This patch replaces/reworks the kernel-internal BPF interpreter with
an optimized BPF instruction set format that is modelled closer to
mimic native instruction sets and is designed to be JITed with one to
one mapping. Thus, the new interpreter is noticeably faster than the
current implementation of sk_run_filter(); mainly for two reasons:

1. Fall-through jumps:

  BPF jump instructions are forced to go either 'true' or 'false'
  branch which causes branch-miss penalty. The new BPF jump
  instructions have only one branch and fall-through otherwise,
  which fits the CPU branch predictor logic better. `perf stat`
  shows drastic difference for branch-misses between the old and
  new code.

2. Jump-threaded implementation of interpreter vs switch
   statement:

  Instead of single table-jump at the top of 'switch' statement,
  gcc will now generate multiple table-jump instructions, which
  helps CPU branch predictor logic.

Note that the verification of filters is still being done through
sk_chk_filter() in classical BPF format, so filters from user- or
kernel space are verified in the same way as we do now, and same
restrictions/constraints hold as well.

We reuse current BPF JIT compilers in a way that this upgrade would
even be fine as is, but nevertheless allows for a successive upgrade
of BPF JIT compilers to the new format.

The internal instruction set migration is being done after the
probing for JIT compilation, so in case JIT compilers are able to
create a native opcode image, we're going to use that, and in all
other cases we're doing a follow-up migration of the BPF program's
instruction set, so that it can be transparently run in the new
interpreter.

In short, the *internal* format extends BPF in the following way (more
details can be taken from the appended documentation):

  - Number of registers increase from 2 to 10
  - Register width increases from 32-bit to 64-bit
  - Conditional jt/jf targets replaced with jt/fall-through
  - Adds signed > and >= insns
  - 16 4-byte stack slots for register spill-fill replaced
    with up to 512 bytes of multi-use stack space
  - Introduction of bpf_call insn and register passing convention
    for zero overhead calls from/to other kernel functions
  - Adds arithmetic right shift and endianness conversion insns
  - Adds atomic_add insn
  - Old tax/txa insns are replaced with 'mov dst,src' insn

Performance of two BPF filters generated by libpcap resp. bpf_asm
was measured on x86_64, i386 and arm32 (other libpcap programs
have similar performance differences):

fprog #1 is taken from Documentation/networking/filter.txt:
tcpdump -i eth0 port 22 -dd

fprog #2 is taken from 'man tcpdump':
tcpdump -i eth0 'tcp port 22 and (((ip[2:2] - ((ip[0]&0xf)<<2)) -
   ((tcp[12]&0xf0)>>2)) != 0)' -dd

Raw performance data from BPF micro-benchmark: SK_RUN_FILTER on the
same SKB (cache-hit) or 10k SKBs (cache-miss); time in ns per call,
smaller is better:

--x86_64--
         fprog #1  fprog #1   fprog #2  fprog #2
         cache-hit cache-miss cache-hit cache-miss
old BPF      90       101        192       202
new BPF      31        71         47        97
old BPF jit  12        34         17        44
new BPF jit TBD

--i386--
         fprog #1  fprog #1   fprog #2  fprog #2
         cache-hit cache-miss cache-hit cache-miss
old BPF     107       136        227       252
new BPF      40       119         69       172

--arm32--
         fprog #1  fprog #1   fprog #2  fprog #2
         cache-hit cache-miss cache-hit cache-miss
old BPF     202       300        475       540
new BPF     180       270        330       470
old BPF jit  26       182         37       202
new BPF jit TBD

Thus, without changing any userland BPF filters, applications on
top of AF_PACKET (or other families) such as libpcap/tcpdump, cls_bpf
classifier, netfilter's xt_bpf, team driver's load-balancing mode,
and many more will have better interpreter filtering performance.

While we are replacing the internal BPF interpreter, we also need
to convert seccomp BPF in the same step to make use of the new
internal structure since it makes use of lower-level API details
without being further decoupled through higher-level calls like
sk_unattached_filter_{create,destroy}(), for example.

Just as for normal socket filtering, also seccomp BPF experiences
a time-to-verdict speedup:

05-sim-long_jumps.c of libseccomp was used as micro-benchmark:

  seccomp_rule_add_exact(ctx,...
  seccomp_rule_add_exact(ctx,...

  rc = seccomp_load(ctx);

  for (i = 0; i < 10000000; i++)
     syscall(199, 100);

'short filter' has 2 rules
'large filter' has 200 rules

'short filter' performance is slightly better on x86_64/i386/arm32
'large filter' is much faster on x86_64 and i386 and shows no
               difference on arm32

--x86_64-- short filter
old BPF: 2.7 sec
 39.12%  bench  libc-2.15.so       [.] syscall
  8.10%  bench  [kernel.kallsyms]  [k] sk_run_filter
  6.31%  bench  [kernel.kallsyms]  [k] system_call
  5.59%  bench  [kernel.kallsyms]  [k] trace_hardirqs_on_caller
  4.37%  bench  [kernel.kallsyms]  [k] trace_hardirqs_off_caller
  3.70%  bench  [kernel.kallsyms]  [k] __secure_computing
  3.67%  bench  [kernel.kallsyms]  [k] lock_is_held
  3.03%  bench  [kernel.kallsyms]  [k] seccomp_bpf_load
new BPF: 2.58 sec
 42.05%  bench  libc-2.15.so       [.] syscall
  6.91%  bench  [kernel.kallsyms]  [k] system_call
  6.25%  bench  [kernel.kallsyms]  [k] trace_hardirqs_on_caller
  6.07%  bench  [kernel.kallsyms]  [k] __secure_computing
  5.08%  bench  [kernel.kallsyms]  [k] sk_run_filter_int_seccomp

--arm32-- short filter
old BPF: 4.0 sec
 39.92%  bench  [kernel.kallsyms]  [k] vector_swi
 16.60%  bench  [kernel.kallsyms]  [k] sk_run_filter
 14.66%  bench  libc-2.17.so       [.] syscall
  5.42%  bench  [kernel.kallsyms]  [k] seccomp_bpf_load
  5.10%  bench  [kernel.kallsyms]  [k] __secure_computing
new BPF: 3.7 sec
 35.93%  bench  [kernel.kallsyms]  [k] vector_swi
 21.89%  bench  libc-2.17.so       [.] syscall
 13.45%  bench  [kernel.kallsyms]  [k] sk_run_filter_int_seccomp
  6.25%  bench  [kernel.kallsyms]  [k] __secure_computing
  3.96%  bench  [kernel.kallsyms]  [k] syscall_trace_exit

--x86_64-- large filter
old BPF: 8.6 seconds
    73.38%    bench  [kernel.kallsyms]  [k] sk_run_filter
    10.70%    bench  libc-2.15.so       [.] syscall
     5.09%    bench  [kernel.kallsyms]  [k] seccomp_bpf_load
     1.97%    bench  [kernel.kallsyms]  [k] system_call
new BPF: 5.7 seconds
    66.20%    bench  [kernel.kallsyms]  [k] sk_run_filter_int_seccomp
    16.75%    bench  libc-2.15.so       [.] syscall
     3.31%    bench  [kernel.kallsyms]  [k] system_call
     2.88%    bench  [kernel.kallsyms]  [k] __secure_computing

--i386-- large filter
old BPF: 5.4 sec
new BPF: 3.8 sec

--arm32-- large filter
old BPF: 13.5 sec
 73.88%  bench  [kernel.kallsyms]  [k] sk_run_filter
 10.29%  bench  [kernel.kallsyms]  [k] vector_swi
  6.46%  bench  libc-2.17.so       [.] syscall
  2.94%  bench  [kernel.kallsyms]  [k] seccomp_bpf_load
  1.19%  bench  [kernel.kallsyms]  [k] __secure_computing
  0.87%  bench  [kernel.kallsyms]  [k] sys_getuid
new BPF: 13.5 sec
 76.08%  bench  [kernel.kallsyms]  [k] sk_run_filter_int_seccomp
 10.98%  bench  [kernel.kallsyms]  [k] vector_swi
  5.87%  bench  libc-2.17.so       [.] syscall
  1.77%  bench  [kernel.kallsyms]  [k] __secure_computing
  0.93%  bench  [kernel.kallsyms]  [k] sys_getuid

BPF filters generated by seccomp are very branchy, so the new
internal BPF performance is better than the old one. Performance
gains will be even higher when BPF JIT is committed for the
new structure, which is planned in future work (as successive
JIT migrations).

BPF has also been stress-tested with trinity's BPF fuzzer.

Joint work with Daniel Borkmann.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <pmoore@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: linux-kernel@vger.kernel.org
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 00:45:09 -04:00
Daniel Borkmann
77e0114ae9 net: isdn: use sk_unattached_filter api
Similarly as in ppp, we need to migrate the ISDN/PPP code to make use
of the sk_unattached_filter api in order to decouple having direct
filter structure access. By using sk_unattached_filter_{create,destroy},
we can allow for the possibility to jit compile filters for faster
filter verdicts as well.

Joint work with Alexei Starovoitov.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: isdn4linux@listserv.isdn4linux.de
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 00:45:09 -04:00
Daniel Borkmann
568f194e8b net: ppp: use sk_unattached_filter api
For the ppp driver, there are currently two open-coded BPF filters in use,
that is, pass_filter and active_filter. Migrate both to make proper use
of sk_unattached_filter_{create,destroy} API so that the actual BPF code
is decoupled from direct access, and filters can be jited as a side-effect
by the internal filter compiler.

Joint work with Alexei Starovoitov.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linux-ppp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 00:45:09 -04:00
Daniel Borkmann
164d8c6665 net: ptp: do not reimplement PTP/BPF classifier
There are currently pch_gbe, cpts, and ixp4xx_eth drivers that open-code
and reimplement a BPF classifier for the PTP protocol. Since all of them
effectively do the very same thing and load the very same PTP/BPF filter,
we can just consolidate that code by introducing ptp_classify_raw() in
the time-stamping core framework which can be used in drivers.

As drivers get initialized after bootstrapping the core networking
subsystem, they can make use of ptp_insns wrapped through
ptp_classify_raw(), which allows to simplify and remove PTP classifier
setup code in drivers.

Joint work with Alexei Starovoitov.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Richard Cochran <richard.cochran@omicron.at>
Cc: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 00:45:09 -04:00
Daniel Borkmann
e62d2df084 net: ptp: use sk_unattached_filter_create() for BPF
This patch migrates an open-coded sk_run_filter() implementation with
proper use of the BPF API, that is, sk_unattached_filter_create(). This
migration is needed, as we will be internally transforming the filter
to a different representation, and therefore needs to be decoupled.

It is okay to do so as skb_timestamping_init() is called during
initialization of the network stack in core initcall via sock_init().
This would effectively also allow for PTP filters to be jit compiled if
bpf_jit_enable is set.

For better readability, there are also some newlines introduced, also
ptp_classify.h is only in kernel space.

Joint work with Alexei Starovoitov.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Richard Cochran <richard.cochran@omicron.at>
Cc: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 00:45:09 -04:00
Daniel Borkmann
fbc907f0b1 net: filter: move filter accounting to filter core
This patch basically does two things, i) removes the extern keyword
from the include/linux/filter.h file to be more consistent with the
rest of Joe's changes, and ii) moves filter accounting into the filter
core framework.

Filter accounting mainly done through sk_filter_{un,}charge() take
care of the case when sockets are being cloned through sk_clone_lock()
so that removal of the filter on one socket won't result in eviction
as it's still referenced by the other.

These functions actually belong to net/core/filter.c and not
include/net/sock.h as we want to keep all that in a central place.
It's also not in fast-path so uninlining them is fine and even allows
us to get rd of sk_filter_release_rcu()'s EXPORT_SYMBOL and a forward
declaration.

Joint work with Alexei Starovoitov.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 00:45:09 -04:00
Daniel Borkmann
a3ea269b8b net: filter: keep original BPF program around
In order to open up the possibility to internally transform a BPF program
into an alternative and possibly non-trivial reversible representation, we
need to keep the original BPF program around, so that it can be passed back
to user space w/o the need of a complex decoder.

The reason for that use case resides in commit a8fc927780 ("sk-filter:
Add ability to get socket filter program (v2)"), that is, the ability
to retrieve the currently attached BPF filter from a given socket used
mainly by the checkpoint-restore project, for example.

Therefore, we add two helpers sk_{store,release}_orig_filter for taking
care of that. In the sk_unattached_filter_create() case, there's no such
possibility/requirement to retrieve a loaded BPF program. Therefore, we
can spare us the work in that case.

This approach will simplify and slightly speed up both, sk_get_filter()
and sock_diag_put_filterinfo() handlers as we won't need to successively
decode filters anymore through sk_decode_filter(). As we still need
sk_decode_filter() later on, we're keeping it around.

Joint work with Alexei Starovoitov.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 00:45:09 -04:00
Daniel Borkmann
f8bbbfc3b9 net: filter: add jited flag to indicate jit compiled filters
This patch adds a jited flag into sk_filter struct in order to indicate
whether a filter is currently jited or not. The size of sk_filter is
not being expanded as the 32 bit 'len' member allows upper bits to be
reused since a filter can currently only grow as large as BPF_MAXINSNS.

Therefore, there's enough room also for other in future needed flags to
reuse 'len' field if necessary. The jited flag also allows for having
alternative interpreter functions running as currently, we can only
detect jit compiled filters by testing fp->bpf_func to not equal the
address of sk_run_filter().

Joint work with Alexei Starovoitov.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 00:45:08 -04:00
Linus Torvalds
455c6fdbd2 Linux 3.14 2014-03-30 20:40:15 -07:00
Linus Torvalds
fedc1ed0f1 Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Switch mnt_hash to hlist, turning the races between __lookup_mnt() and
  hash modifications into false negatives from __lookup_mnt() (instead
  of hangs)"

On the false negatives from __lookup_mnt():
 "The *only* thing we care about is not getting stuck in __lookup_mnt().
  If it misses an entry because something in front of it just got moved
  around, etc, we are fine.  We'll notice that mount_lock mismatch and
  that'll be it"

* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  switch mnt_hash to hlist
  don't bother with propagate_mnt() unless the target is shared
  keep shadowed vfsmounts together
  resizable namespace.c hashes
2014-03-30 17:26:08 -07:00
Randy Dunlap
01358e562a MAINTAINERS: resume as Documentation maintainer
I am the new kernel tree Documentation maintainer (except for parts that
are handled by other people, of course).

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Rob Landley <rob@landley.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-30 17:22:28 -07:00
Linus Torvalds
915ac4e26e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
 "Some more updates for the input subsystem.

  You will get a fix for race in mousedev that has been causing quite a
  few oopses lately and a small fixup for force feedback support in
  evdev"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: mousedev - fix race when creating mixed device
  Input: don't modify the id of ioctl-provided ff effect on upload failure
2014-03-30 17:20:40 -07:00
Eric Paris
aa4af831bb AUDIT: Allow login in non-init namespaces
It its possible to configure your PAM stack to refuse login if audit
messages (about the login) were unable to be sent.  This is common in
many distros and thus normal configuration of many containers.  The PAM
modules determine if audit is enabled/disabled in the kernel based on
the return value from sending an audit message on the netlink socket.
If userspace gets back ECONNREFUSED it believes audit is disabled in the
kernel.  If it gets any other error else it refuses to let the login
proceed.

Just about ever since the introduction of namespaces the kernel audit
subsystem has returned EPERM if the task sending a message was not in
the init user or pid namespace.  So many forms of containers have never
worked if audit was enabled in the kernel.

BUT if the container was not in net_init then the kernel network code
would send ECONNREFUSED (instead of the audit code sending EPERM).  Thus
by pure accident/dumb luck/bug if an admin configured the PAM stack to
reject all logins that didn't talk to audit, but then ran the login
untility in the non-init_net namespace, it would work!! Clearly this was
a bug, but it is a bug some people expected.

With the introduction of network namespace support in 3.14-rc1 the two
bugs stopped cancelling each other out.  Now, containers in the
non-init_net namespace refused to let users log in (just like PAM was
configfured!) Obviously some people were not happy that what used to let
users log in, now didn't!

This fix is kinda hacky.  We return ECONNREFUSED for all non-init
relevant namespaces.  That means that not only will the old broken
non-init_net setups continue to work, now the broken non-init_pid or
non-init_user setups will 'work'.  They don't really work, since audit
isn't logging things.  But it's what most users want.

In 3.15 we should have patches to support not only the non-init_net
(3.14) namespace but also the non-init_pid and non-init_user namespace.
So all will be right in the world.  This just opens the doors wide open
on 3.14 and hopefully makes users happy, if not the audit system...

Reported-by: Andre Tomt <andre@tomt.net>
Reported-by: Adam Richter <adam_richter2004@yahoo.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-30 17:02:53 -07:00
Theodore Ts'o
00a1a053eb ext4: atomically set inode->i_flags in ext4_set_inode_flags()
Use cmpxchg() to atomically set i_flags instead of clearing out the
S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race
where an immutable file has the immutable flag cleared for a brief
window of time.

Reported-by: John Sullivan <jsrhbz@kanargh.force9.co.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-30 17:02:06 -07:00
Al Viro
38129a13e6 switch mnt_hash to hlist
fixes RCU bug - walking through hlist is safe in face of element moves,
since it's self-terminating.  Cyclic lists are not - if we end up jumping
to another hash chain, we'll loop infinitely without ever hitting the
original list head.

[fix for dumb braino folded]

Spotted by: Max Kellermann <mk@cm4all.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-03-30 19:18:51 -04:00
Al Viro
0b1b901b5a don't bother with propagate_mnt() unless the target is shared
If the dest_mnt is not shared, propagate_mnt() does nothing -
there's no mounts to propagate to and thus no copies to create.
Might as well don't bother calling it in that case.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-03-30 19:18:50 -04:00
Al Viro
1d6a32acd7 keep shadowed vfsmounts together
preparation to switching mnt_hash to hlist

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-03-30 19:18:50 -04:00
Al Viro
0818bf27c0 resizable namespace.c hashes
* switch allocation to alloc_large_system_hash()
* make sizes overridable by boot parameters (mhash_entries=, mphash_entries=)
* switch mountpoint_hashtable from list_head to hlist_head

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-03-30 19:18:49 -04:00
Andy Lutomirski
37c975545e x86, vdso: Fix the symbol versions on the 32-bit vDSO
The new symbols provide the same API as the 64-bit variants, so they
should have the same symbol version name.  This can't break
userspace, since these symbols are new for 32-bit Linux.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Stefani Seibold <stefani@seibold.net>
Link: http://lkml.kernel.org/r/0a869bce03d25619565b1eee7d69a4fd15fd203a.1396124118.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-03-30 10:08:38 -07:00
Mark Brown
45b15d98a9 Merge remote-tracking branches 'spi/topic/xilinx' and 'spi/topic/xtfpga' into spi-next 2014-03-30 00:51:48 +00:00
Mark Brown
b1ad487c42 Merge remote-tracking branches 'spi/topic/sunxi', 'spi/topic/tegra114', 'spi/topic/ti-qspi', 'spi/topic/ti-ssp', 'spi/topic/topcliff-pch', 'spi/topic/txx9', 'spi/topic/xcomm' and 'spi/topic/xfer' into spi-next 2014-03-30 00:51:41 +00:00
Mark Brown
81235b4ea3 Merge remote-tracking branches 'spi/topic/s3c64xx', 'spi/topic/sc18is602', 'spi/topic/sh-hspi', 'spi/topic/sh-msiof', 'spi/topic/sh-sci', 'spi/topic/sirf' and 'spi/topic/spidev' into spi-next 2014-03-30 00:51:34 +00:00
Mark Brown
1752368064 Merge remote-tracking branches 'spi/topic/omap-uwire', 'spi/topic/omap100k', 'spi/topic/omap2', 'spi/topic/orion', 'spi/topic/pl022', 'spi/topic/qup', 'spi/topic/rspi' and 'spi/topic/s3c24xx' into spi-next 2014-03-30 00:51:27 +00:00
Mark Brown
6e07b9179a Merge remote-tracking branches 'spi/topic/imx', 'spi/topic/init', 'spi/topic/mpc512x-psc', 'spi/topic/mpc52xx', 'spi/topic/mxs', 'spi/topic/nuc900', 'spi/topic/oc-tiny' and 'spi/topic/octeon' into spi-next 2014-03-30 00:51:17 +00:00
Mark Brown
3bcbc14911 Merge remote-tracking branches 'spi/topic/drivers', 'spi/topic/dw', 'spi/topic/efm32', 'spi/topic/ep93xx', 'spi/topic/fsl', 'spi/topic/fsl-dspi', 'spi/topic/fsl-espi' and 'spi/topic/gpio' into spi-next 2014-03-30 00:51:10 +00:00
Mark Brown
9dee279b40 Merge remote-tracking branches 'spi/topic/bus-num', 'spi/topic/cleanup', 'spi/topic/clps711x', 'spi/topic/coldfire', 'spi/topic/completion' and 'spi/topic/davinci' into spi-next 2014-03-30 00:51:03 +00:00
Mark Brown
0f38af451f Merge remote-tracking branches 'spi/topic/altera', 'spi/topic/atmel', 'spi/topic/au1550', 'spi/topic/bcm63xx', 'spi/topic/bcm63xx-hsspi', 'spi/topic/bfin5xx', 'spi/topic/bitbang' and 'spi/topic/bpw' into spi-next 2014-03-30 00:50:57 +00:00
Mark Brown
a78389844e Merge remote-tracking branch 'spi/topic/dma' into spi-next 2014-03-30 00:50:56 +00:00
Mark Brown
5d0eb26ce8 Merge remote-tracking branch 'spi/topic/core' into spi-next 2014-03-30 00:50:55 +00:00
Mark Brown
ff13a9a6a7 Merge remote-tracking branch 'spi/fix/core' into spi-linus 2014-03-30 00:50:53 +00:00
Mark Brown
0b73aa63c1 spi: Fix handling of cs_change in core implementation
The core implementation of cs_change didn't follow the documentation
which says that cs_change in the middle of the transfer means to briefly
deassert chip select, instead it followed buggy drivers which change the
polarity of chip select.  Use a delay of 10us between deassert and
reassert simply from pulling numbers out of a hat.

Reported-by: Gerhard Sittig <gsi@denx.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
2014-03-30 00:48:12 +00:00
David S. Miller
df69491b7d Merge branch 'xen-netback'
Paul Durrant says:

====================
xen-netback: fix rx slot estimation

Sander Eikelenboom reported an issue with ring overflow in netback in
3.14-rc3. This turns outo be be because of a bug in the ring slot estimation
code. This patch series fixes the slot estimation, fixes the BUG_ON() that
was supposed to catch the issue that Sander ran into and also makes a small
fix to start_new_rx_buffer().

v3:
 - Added a cap of MAX_SKB_FRAGS to estimate in patch #2

v2:
 - Added BUG_ON() to patch #1
 - Added more explanation to patch #3
====================

Reported-By: Sander Eikelenboom <linux@eikelenboom.it>
Tested-By: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:52:04 -04:00
Paul Durrant
1425c7a4e8 xen-netback: BUG_ON in xenvif_rx_action() not catching overflow
The BUG_ON to catch ring overflow in xenvif_rx_action() makes the assumption
that meta_slots_used == ring slots used. This is not necessarily the case
for GSO packets, because the non-prefix GSO protocol consumes one more ring
slot than meta-slot for the 'extra_info'. This patch changes the test to
actually check ring slots.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:50:34 -04:00
Paul Durrant
a02eb4732c xen-netback: worse-case estimate in xenvif_rx_action is underestimating
The worse-case estimate for skb ring slot usage in xenvif_rx_action()
fails to take fragment page_offset into account. The page_offset does,
however, affect the number of times the fragmentation code calls
start_new_rx_buffer() (i.e. consume another slot) and the worse-case
should assume that will always return true. This patch adds the page_offset
into the DIV_ROUND_UP for each frag.

Unfortunately some frontends aggressively limit the number of requests
they post into the shared ring so to avoid an estimate that is 'too'
pessimal it is capped at MAX_SKB_FRAGS.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:50:34 -04:00
Paul Durrant
0576eddf24 xen-netback: remove pointless clause from if statement
This patch removes a test in start_new_rx_buffer() that checks whether
a copy operation is less than MAX_BUFFER_OFFSET in length, since
MAX_BUFFER_OFFSET is defined to be PAGE_SIZE and the only caller of
start_new_rx_buffer() already limits copy operations to PAGE_SIZE or less.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Reported-By: Sander Eikelenboom <linux@eikelenboom.it>
Tested-By: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:50:34 -04:00
David S. Miller
64c27237a0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/marvell/mvneta.c

The mvneta.c conflict is a case of overlapping changes,
a conversion to devm_ioremap_resource() vs. a conversion
to netdev_alloc_pcpu_stats.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:48:54 -04:00
Zhao Qiang
77a9939426 phy/at8031: enable at8031 to work on interrupt mode
The at8031 can work on polling mode and interrupt mode.
Add ack_interrupt and config intr funcs to enable
interrupt mode for it.

Signed-off-by: Zhao Qiang <B45475@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:20:30 -04:00
Wang Yufen
437de07ced ipv6: fix checkpatch errors of "foo*" and "foo * bar"
ERROR: "(foo*)" should be "(foo *)"
ERROR: "foo * bar" should be "foo *bar"

Suggested-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:15:52 -04:00
Wang Yufen
49e253e399 ipv6: fix checkpatch errors of brace and trailing statements
ERROR: open brace '{' following enum go on the same line
ERROR: open brace '{' following struct go on the same line
ERROR: trailing statements should be on next line

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:15:52 -04:00
Wang Yufen
8db46f1d4c ipv6: fix checkpatch errors comments and space
WARNING: please, no space before tabs
WARNING: please, no spaces at the start of a line
ERROR: spaces required around that ':' (ctx:VxW)
ERROR: spaces required around that '>' (ctx:VxV)
ERROR: spaces required around that '>=' (ctx:VxV)

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:15:52 -04:00
Linus Torvalds
981e893ed5 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
 "A late breaking fix from John.  (The bug fixed has a hard lockup
  potential, but that was not observed, warnings were)"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: Revert to calling clock_was_set_delayed() while in irq context
2014-03-29 15:01:09 -07:00
David S. Miller
587daaf382 Merge branch 'netpoll-next'
Eric W. Biederman says:

====================
netpoll: Cleanups and fixes

This should be a small set of safe cleanups and fixes to netpoll.

The fixes are vlan headers are now always inserted when needed, and
napi polling is always avoided when network devices are closed.

There are a bunch of little cleanups removing unnecessary code, fixing
function naming, not taking unnecessary locks and removing general
silliness.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:00:37 -04:00