Commit graph

875 commits

Author SHA1 Message Date
Lorenzo Colitti
c9e82cb34c net: xfrm: allow clearing socket xfrm policies.
[ Upstream commit be8f8284cd897af2482d4e54fbc2bdfc15557259 ]

Currently it is possible to add or update socket policies, but
not clear them. Therefore, once a socket policy has been applied,
the socket cannot be used for unencrypted traffic.

This patch allows (privileged) users to clear socket policies by
passing in a NULL pointer and zero length argument to the
{IP,IPV6}_{IPSEC,XFRM}_POLICY setsockopts. This results in both
the incoming and outgoing policies being cleared.

The simple approach taken in this patch cannot clear socket
policies in only one direction. If desired this could be added
in the future, for example by continuing to pass in a length of
zero (which currently is guaranteed to return EMSGSIZE) and
making the policy be a pointer to an integer that contains one
of the XFRM_POLICY_{IN,OUT} enum values.

An alternative would have been to interpret the length as a
signed integer and use XFRM_POLICY_IN (i.e., 0) to clear the
input policy and -XFRM_POLICY_OUT (i.e., -1) to clear the output
policy.

Tested: https://android-review.googlesource.com/539816
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22 09:23:28 +01:00
Steffen Klassert
cce422b32d xfrm: Fix stack-out-of-bounds with misconfigured transport mode policies.
[ Upstream commit 732706afe1cc46ef48493b3d2b69c98f36314ae4 ]

On policies with a transport mode template, we pass the addresses
from the flowi to xfrm_state_find(), assuming that the IP addresses
(and address family) don't change during transformation.

Unfortunately our policy template validation is not strict enough.
It is possible to configure policies with transport mode template
where the address family of the template does not match the selectors
address family. This lead to stack-out-of-bound reads because
we compare arddesses of the wrong family. Fix this by refusing
such a configuration, address family can not change on transport
mode.

We use the assumption that, on transport mode, the first templates
address family must match the address family of the policy selector.
Subsequent transport mode templates must mach the address family of
the previous template.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-25 11:03:41 +01:00
Cong Wang
40cda9b7ba xfrm: check id proto in validate_tmpl()
commit 6a53b7593233ab9e4f96873ebacc0f653a55c3e1 upstream.

syzbot reported a kernel warning in xfrm_state_fini(), which
indicates that we have entries left in the list
net->xfrm.state_all whose proto is zero. And
xfrm_id_proto_match() doesn't consider them as a match with
IPSEC_PROTO_ANY in this case.

Proto with value 0 is probably not a valid value, at least
verify_newsa_info() doesn't consider it valid either.

This patch fixes it by checking the proto value in
validate_tmpl() and rejecting invalid ones, like what iproute2
does in xfrm_xfrmproto_getbyname().

Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-25 11:03:35 +01:00
Steffen Klassert
7800c76fb0 xfrm: Fix stack-out-of-bounds read on socket policy lookup.
commit ddc47e4404b58f03e98345398fb12d38fe291512 upstream.

When we do tunnel or beet mode, we pass saddr and daddr from the
template to xfrm_state_find(), this is ok. On transport mode,
we pass the addresses from the flowi, assuming that the IP
addresses (and address family) don't change during transformation.
This assumption is wrong in the IPv4 mapped IPv6 case, packet
is IPv4 and template is IPv6.

Fix this by catching address family missmatches of the policy
and the flow already before we do the lookup.

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-25 11:03:35 +01:00
Herbert Xu
8bfafc972a xfrm: Copy policy family in clone_policy
[ Upstream commit 0e74aa1d79a5bbc663e03a2804399cae418a0321 ]

The syzbot found an ancient bug in the IPsec code.  When we cloned
a socket policy (for example, for a child TCP socket derived from a
listening socket), we did not copy the family field.  This results
in a live policy with a zero family field.  This triggers a BUG_ON
check in the af_key code when the cloned policy is retrieved.

This patch fixes it by copying the family field over.

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-16 10:33:55 +01:00
Herbert Xu
b377c453b3 ipsec: Fix aborted xfrm policy dump crash
commit 1137b5e2529a8f5ca8ee709288ecba3e68044df2 upstream.

An independent security researcher, Mohamed Ghannam, has reported
this vulnerability to Beyond Security's SecuriTeam Secure Disclosure
program.

The xfrm_dump_policy_done function expects xfrm_dump_policy to
have been called at least once or it will crash.  This can be
triggered if a dump fails because the target socket's receive
buffer is full.

This patch fixes it by using the cb->start mechanism to ensure that
the initialisation is always done regardless of the buffer situation.

Fixes: 12a169e7d8 ("ipsec: Put dumpers on the dump list")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-05 11:22:49 +01:00
Vladis Dronov
9b3dcc98d8 xfrm: policy: check policy direction value
commit 7bab09631c2a303f87a7eb7e3d69e888673b9b7e upstream.

The 'dir' parameter in xfrm_migrate() is a user-controlled byte which is used
as an array index. This can lead to an out-of-bound access, kernel lockup and
DoS. Add a check for the 'dir' value.

This fixes CVE-2017-11600.

References: https://bugzilla.redhat.com/show_bug.cgi?id=1474928
Fixes: 80c9abaabf ("[XFRM]: Extension for dynamic update of endpoint address(es)")
Reported-by: "bo Zhang" <zhangbo5891001@gmail.com>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-07 08:34:09 +02:00
Steffen Klassert
ce9b76665e xfrm: Don't use sk_family for socket policy lookups
commit 4c86d77743a54fb2d8a4d18a037a074c892bb3be upstream.

On IPv4-mapped IPv6 addresses sk_family is AF_INET6,
but the flow informations are created based on AF_INET.
So the routing set up 'struct flowi4' but we try to
access 'struct flowi6' what leads to an out of bounds
access. Fix this by using the family we get with the
dst_entry, like we do it for the standard policy lookup.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-06 19:19:46 -07:00
Sabrina Dubroca
398ac7a19f xfrm: fix stack access out of bounds with CONFIG_XFRM_SUB_POLICY
commit 9b3eb54106cf6acd03f07cf0ab01c13676a226c2 upstream.

When CONFIG_XFRM_SUB_POLICY=y, xfrm_dst stores a copy of the flowi for
that dst. Unfortunately, the code that allocates and fills this copy
doesn't care about what type of flowi (flowi, flowi4, flowi6) gets
passed. In multiple code paths (from raw_sendmsg, from TCP when
replying to a FIN, in vxlan, geneve, and gre), the flowi that gets
passed to xfrm is actually an on-stack flowi4, so we end up reading
stuff from the stack past the end of the flowi4 struct.

Since xfrm_dst->origin isn't used anywhere following commit
ca116922af ("xfrm: Eliminate "fl" and "pol" args to
xfrm_bundle_ok()."), just get rid of it.  xfrm_dst->partner isn't used
either, so get rid of that too.

Fixes: 9d6ec93801 ("ipv4: Use flowi4 in public route lookup interfaces.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-05 14:37:21 +02:00
Andy Whitcroft
22c9e7c092 xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder
commit f843ee6dd019bcece3e74e76ad9df0155655d0df upstream.

Kees Cook has pointed out that xfrm_replay_state_esn_len() is subject to
wrapping issues.  To ensure we are correctly ensuring that the two ESN
structures are the same size compare both the overall size as reported
by xfrm_replay_state_esn_len() and the internal length are the same.

CVE-2017-7184
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-31 09:49:52 +02:00
Andy Whitcroft
cce7e56dd7 xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window
commit 677e806da4d916052585301785d847c3b3e6186a upstream.

When a new xfrm state is created during an XFRM_MSG_NEWSA call we
validate the user supplied replay_esn to ensure that the size is valid
and to ensure that the replay_window size is within the allocated
buffer.  However later it is possible to update this replay_esn via a
XFRM_MSG_NEWAE call.  There we again validate the size of the supplied
buffer matches the existing state and if so inject the contents.  We do
not at this point check that the replay_window is within the allocated
memory.  This leads to out-of-bounds reads and writes triggered by
netlink packets.  This leads to memory corruption and the potential for
priviledge escalation.

We already attempt to validate the incoming replay information in
xfrm_new_ae() via xfrm_replay_verify_len().  This confirms that the user
is not trying to change the size of the replay state buffer which
includes the replay_esn.  It however does not check the replay_window
remains within that buffer.  Add validation of the contained
replay_window.

CVE-2017-7184
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-31 09:49:52 +02:00
Florian Westphal
a9a76a3e31 xfrm: policy: init locks early
commit c282222a45cb9503cbfbebfdb60491f06ae84b49 upstream.

Dmitry reports following splat:
 INFO: trying to register non-static key.
 the code is fine but needs lockdep annotation.
 turning off the locking correctness validator.
 CPU: 0 PID: 13059 Comm: syz-executor1 Not tainted 4.10.0-rc7-next-20170207 #1
[..]
 spin_lock_bh include/linux/spinlock.h:304 [inline]
 xfrm_policy_flush+0x32/0x470 net/xfrm/xfrm_policy.c:963
 xfrm_policy_fini+0xbf/0x560 net/xfrm/xfrm_policy.c:3041
 xfrm_net_init+0x79f/0x9e0 net/xfrm/xfrm_policy.c:3091
 ops_init+0x10a/0x530 net/core/net_namespace.c:115
 setup_net+0x2ed/0x690 net/core/net_namespace.c:291
 copy_net_ns+0x26c/0x530 net/core/net_namespace.c:396
 create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
 unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
 SYSC_unshare kernel/fork.c:2281 [inline]

Problem is that when we get error during xfrm_net_init we will call
xfrm_policy_fini which will acquire xfrm_policy_lock before it was
initialized.  Just move it around so locks get set up first.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: 283bc9f35b ("xfrm: Namespacify xfrm state/policy locks")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-31 09:49:52 +02:00
subashab@codeaurora.org
759e8f3896 xfrm: Fix crash observed during device unregistration and decryption
[ Upstream commit 071d36bf21bcc837be00cea55bcef8d129e7f609 ]

A crash is observed when a decrypted packet is processed in receive
path. get_rps_cpus() tries to dereference the skb->dev fields but it
appears that the device is freed from the poison pattern.

[<ffffffc000af58ec>] get_rps_cpu+0x94/0x2f0
[<ffffffc000af5f94>] netif_rx_internal+0x140/0x1cc
[<ffffffc000af6094>] netif_rx+0x74/0x94
[<ffffffc000bc0b6c>] xfrm_input+0x754/0x7d0
[<ffffffc000bc0bf8>] xfrm_input_resume+0x10/0x1c
[<ffffffc000ba6eb8>] esp_input_done+0x20/0x30
[<ffffffc0000b64c8>] process_one_work+0x244/0x3fc
[<ffffffc0000b7324>] worker_thread+0x2f8/0x418
[<ffffffc0000bb40c>] kthread+0xe0/0xec

-013|get_rps_cpu(
     |    dev = 0xFFFFFFC08B688000,
     |    skb = 0xFFFFFFC0C76AAC00 -> (
     |      dev = 0xFFFFFFC08B688000 -> (
     |        name =
"......................................................
     |        name_hlist = (next = 0xAAAAAAAAAAAAAAAA, pprev =
0xAAAAAAAAAAA

Following are the sequence of events observed -

- Encrypted packet in receive path from netdevice is queued
- Encrypted packet queued for decryption (asynchronous)
- Netdevice brought down and freed
- Packet is decrypted and returned through callback in esp_input_done
- Packet is queued again for process in network stack using netif_rx

Since the device appears to have been freed, the dereference of
skb->dev in get_rps_cpus() leads to an unhandled page fault
exception.

Fix this by holding on to device reference when queueing packets
asynchronously and releasing the reference on call back return.

v2: Make the change generic to xfrm as mentioned by Steffen and
update the title to xfrm

Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jerome Stanislaus <jeromes@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-20 15:42:05 +09:00
Konstantin Khlebnikov
1f0bdf6092 net: preserve IP control block during GSO segmentation
[ Upstream commit 9207f9d45b0ad071baa128e846d7e7ed85016df3 ]

Skb_gso_segment() uses skb control block during segmentation.
This patch adds 32-bytes room for previous control block which
will be copied into all resulting segments.

This patch fixes kernel crash during fragmenting forwarded packets.
Fragmentation requires valid IP CB in skb for clearing ip options.
Also patch removes custom save/restore in ovs code, now it's redundant.

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31 11:29:00 -08:00
David S. Miller
024f35c552 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2015-12-22

Just one patch to fix dst_entries_init with multiple namespaces.
From Dan Streetman.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22 16:26:31 -05:00
Eric Dumazet
d188ba86dd xfrm: add rcu protection to sk->sk_policy[]
XFRM can deal with SYNACK messages, sent while listener socket
is not locked. We add proper rcu protection to __xfrm_sk_clone_policy()
and xfrm_sk_policy_lookup()

This might serve as the first step to remove xfrm.xfrm_policy_lock
use in fast path.

Fixes: fa76ce7328 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-11 19:22:06 -05:00
Eric Dumazet
56f047305d xfrm: add rcu grace period in xfrm_policy_destroy()
We will soon switch sk->sk_policy[] to RCU protection,
as SYNACK packets are sent while listener socket is not locked.

This patch simply adds RCU grace period before struct xfrm_policy
freeing, and the corresponding rcu_head in struct xfrm_policy.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-11 19:22:06 -05:00
Eric Dumazet
bd5eb35f16 xfrm: take care of request sockets
TCP SYNACK messages might now be attached to request sockets.

XFRM needs to get back to a listener socket.

Adds new helpers that might be used elsewhere :
sk_to_full_sk() and sk_const_to_full_sk()

Note: We also need to add RCU protection for xfrm lookups,
now TCP/DCCP have lockless listener processing. This will
be addressed in separate patches.

Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-07 17:07:33 -05:00
Dan Streetman
a8a572a6b5 xfrm: dst_entries_init() per-net dst_ops
Remove the dst_entries_init/destroy calls for xfrm4 and xfrm6 dst_ops
templates; their dst_entries counters will never be used.  Move the
xfrm dst_ops initialization from the common xfrm/xfrm_policy.c to
xfrm4/xfrm4_policy.c and xfrm6/xfrm6_policy.c, and call dst_entries_init
and dst_entries_destroy for each net namespace.

The ipv4 and ipv6 xfrms each create dst_ops template, and perform
dst_entries_init on the templates.  The template values are copied to each
net namespace's xfrm.xfrm*_dst_ops.  The problem there is the dst_ops
pcpuc_entries field is a percpu counter and cannot be used correctly by
simply copying it to another object.

The result of this is a very subtle bug; changes to the dst entries
counter from one net namespace may sometimes get applied to a different
net namespace dst entries counter.  This is because of how the percpu
counter works; it has a main count field as well as a pointer to the
percpu variables.  Each net namespace maintains its own main count
variable, but all point to one set of percpu variables.  When any net
namespace happens to change one of the percpu variables to outside its
small batch range, its count is moved to the net namespace's main count
variable.  So with multiple net namespaces operating concurrently, the
dst_ops entries counter can stray from the actual value that it should
be; if counts are consistently moved from one net namespace to another
(which my testing showed is likely), then one net namespace winds up
with a negative dst_ops count while another winds up with a continually
increasing count, eventually reaching its gc_thresh limit, which causes
all new traffic on the net namespace to fail with -ENOBUFS.

Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-11-03 08:42:57 +01:00
David S. Miller
e7b63ff115 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2015-10-30

1) The flow cache is limited by the flow cache limit which
   depends on the number of cpus and the xfrm garbage collector
   threshold which is independent of the number of cpus. This
   leads to the fact that on systems with more than 16 cpus
   we hit the xfrm garbage collector limit and refuse new
   allocations, so new flows are dropped. On systems with 16
   or less cpus, we hit the flowcache limit. In this case, we
   shrink the flow cache instead of refusing new flows.

   We increase the xfrm garbage collector threshold to INT_MAX
   to get the same behaviour, independent of the number of cpus.

2) Fix some unaligned accesses on sparc systems.
   From Sowmini Varadhan.

3) Fix some header checks in _decode_session4. We may call
   pskb_may_pull with a negative value converted to unsigened
   int from pskb_may_pull. This can lead to incorrect policy
   lookups. We fix this by a check of the data pointer position
   before we call pskb_may_pull.

4) Reload skb header pointers after calling pskb_may_pull
   in _decode_session4 as this may change the pointers into
   the packet.

5) Add a missing statistic counter on inner mode errors.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30 20:51:56 +09:00
David S. Miller
ba3e2084f2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/ipv6/xfrm6_output.c
	net/openvswitch/flow_netlink.c
	net/openvswitch/vport-gre.c
	net/openvswitch/vport-vxlan.c
	net/openvswitch/vport.c
	net/openvswitch/vport.h

The openvswitch conflicts were overlapping changes.  One was
the egress tunnel info fix in 'net' and the other was the
vport ->send() op simplification in 'net-next'.

The xfrm6_output.c conflicts was also a simplification
overlapping a bug fix.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-24 06:54:12 -07:00
Steffen Klassert
cb866e3298 xfrm: Increment statistic counter on inner mode error
Increment the LINUX_MIB_XFRMINSTATEMODEERROR statistic counter
to notify about dropped packets if we fail to fetch a inner mode.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-10-23 07:52:58 +02:00
Sowmini Varadhan
e33d4f13d2 xfrm: Fix unaligned access to stats in copy_to_user_state()
On sparc, deleting established SAs (e.g., by restarting ipsec)
results in unaligned access messages via xfrm_del_sa ->
km_state_notify -> xfrm_send_state_notify().

Even though struct xfrm_usersa_info is aligned on 8-byte boundaries,
netlink attributes are fundamentally only 4 byte aligned, and this
cannot be changed for nla_data() that is passed up to userspace.
As a result, the put_unaligned() macro needs to be used to
set up potentially unaligned fields such as the xfrm_stats in
copy_to_user_state()

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-10-23 06:49:29 +02:00
Eric W. Biederman
ede2059dba dst: Pass net into dst->output
The network namespace is already passed into dst_output pass it into
dst->output lwt->output and friends.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08 04:27:03 -07:00
Eric W. Biederman
cf91a99daa ipv4, ipv6: Pass net into __ip_local_out and __ip6_local_out
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08 04:27:02 -07:00
Eric W. Biederman
4ebdfba73c dst: Pass a sk into .local_out
For consistency with the other similar methods in the kernel pass a
struct sock into the dst_ops .local_out method.

Simplifying the socket passing case is needed a prequel to passing a
struct net reference into .local_out.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08 04:26:55 -07:00
Eric W. Biederman
13206b6bff net: Pass net into dst_output and remove dst_output_okfn
Replace dst_output_okfn with dst_output

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08 04:26:54 -07:00
Eric W. Biederman
3f5312ae62 xfrm: Only compute net once in xfrm_policy_queue_process
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08 04:26:53 -07:00
Michael Rossberg
4e077237cf xfrm: Fix state threshold configuration from userspace
Allow to change the replay threshold (XFRMA_REPLAY_THRESH) and expiry
timer (XFRMA_ETIMER_THRESH) of a state without having to set other
attributes like replay counter and byte lifetime. Changing these other
values while traffic flows will break the state.

Signed-off-by: Michael Rossberg <michael.rossberg@tu-ilmenau.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-09-29 11:45:55 +02:00
Eric Dumazet
6f9c961546 inet: constify ip_route_output_flow() socket argument
Very soon, TCP stack might call inet_csk_route_req(), which
calls inet_csk_route_req() with an unlocked listener socket,
so we need to make sure ip_route_output_flow() is not trying to
change any field from its socket argument.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 13:00:37 -07:00
Eric W. Biederman
be10de0a32 netfilter: Add blank lines in callers of netfilter hooks
In code review it was noticed that I had failed to add some blank lines
in places where they are customarily used.  Taking a second look at the
code I have to agree blank lines would be nice so I have added them
here.

Reported-by:  Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 17:18:37 -07:00
Eric W. Biederman
0c4b51f005 netfilter: Pass net into okfn
This is immediately motivated by the bridge code that chains functions that
call into netfilter.  Without passing net into the okfns the bridge code would
need to guess about the best expression for the network namespace to process
packets in.

As net is frequently one of the first things computed in continuation functions
after netfilter has done it's job passing in the desired network namespace is in
many cases a code simplification.

To support this change the function dst_output_okfn is introduced to
simplify passing dst_output as an okfn.  For the moment dst_output_okfn
just silently drops the struct net.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 17:18:37 -07:00
Eric W. Biederman
29a26a5680 netfilter: Pass struct net into the netfilter hooks
Pass a network namespace parameter into the netfilter hooks.  At the
call site of the netfilter hooks the path a packet is taking through
the network stack is well known which allows the network namespace to
be easily and reliabily.

This allows the replacement of magic code like
"dev_net(state->in?:state->out)" that appears at the start of most
netfilter hooks with "state->net".

In almost all cases the network namespace passed in is derived
from the first network device passed in, guaranteeing those
paths will not see any changes in practice.

The exceptions are:
xfrm/xfrm_output.c:xfrm_output_resume()         xs_net(skb_dst(skb)->xfrm)
ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont()      ip_vs_conn_net(cp)
ipvs/ip_vs_xmit.c:ip_vs_send_or_cont()          ip_vs_conn_net(cp)
ipv4/raw.c:raw_send_hdrinc()                    sock_net(sk)
ipv6/ip6_output.c:ip6_xmit()			sock_net(sk)
ipv6/ndisc.c:ndisc_send_skb()                   dev_net(skb->dev) not dev_net(dst->dev)
ipv6/raw.c:raw6_send_hdrinc()                   sock_net(sk)
br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev

In all cases these exceptions seem to be a better expression for the
network namespace the packet is being processed in then the historic
"dev_net(in?in:out)".  I am documenting them in case something odd
pops up and someone starts trying to track down what happened.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 17:18:37 -07:00
Eric W. Biederman
5a70649e0d net: Merge dst_output and dst_output_sk
Add a sock paramter to dst_output making dst_output_sk superfluous.
Add a skb->sk parameter to all of the callers of dst_output
Have the callers of dst_output_sk call dst_output.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 17:18:32 -07:00
Eric W. Biederman
a6568b2425 xfrm: Remove unused afinfo method init_dst
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 17:18:32 -07:00
Linus Torvalds
dd5cdb48ed Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Another merge window, another set of networking changes.  I've heard
  rumblings that the lightweight tunnels infrastructure has been voted
  networking change of the year.  But what do I know?

   1) Add conntrack support to openvswitch, from Joe Stringer.

   2) Initial support for VRF (Virtual Routing and Forwarding), which
      allows the segmentation of routing paths without using multiple
      devices.  There are some semantic kinks to work out still, but
      this is a reasonably strong foundation.  From David Ahern.

   3) Remove spinlock fro act_bpf fast path, from Alexei Starovoitov.

   4) Ignore route nexthops with a link down state in ipv6, just like
      ipv4.  From Andy Gospodarek.

   5) Remove spinlock from fast path of act_gact and act_mirred, from
      Eric Dumazet.

   6) Document the DSA layer, from Florian Fainelli.

   7) Add netconsole support to bcmgenet, systemport, and DSA.  Also
      from Florian Fainelli.

   8) Add Mellanox Switch Driver and core infrastructure, from Jiri
      Pirko.

   9) Add support for "light weight tunnels", which allow for
      encapsulation and decapsulation without bearing the overhead of a
      full blown netdevice.  From Thomas Graf, Jiri Benc, and a cast of
      others.

  10) Add Identifier Locator Addressing support for ipv6, from Tom
      Herbert.

  11) Support fragmented SKBs in iwlwifi, from Johannes Berg.

  12) Allow perf PMUs to be accessed from eBPF programs, from Kaixu Xia.

  13) Add BQL support to 3c59x driver, from Loganaden Velvindron.

  14) Stop using a zero TX queue length to mean that a device shouldn't
      have a qdisc attached, use an explicit flag instead.  From Phil
      Sutter.

  15) Use generic geneve netdevice infrastructure in openvswitch, from
      Pravin B Shelar.

  16) Add infrastructure to avoid re-forwarding a packet in software
      that was already forwarded by a hardware switch.  From Scott
      Feldman.

  17) Allow AF_PACKET fanout function to be implemented in a bpf
      program, from Willem de Bruijn"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1458 commits)
  netfilter: nf_conntrack: make nf_ct_zone_dflt built-in
  netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabled
  net: fec: clear receive interrupts before processing a packet
  ipv6: fix exthdrs offload registration in out_rt path
  xen-netback: add support for multicast control
  bgmac: Update fixed_phy_register()
  sock, diag: fix panic in sock_diag_put_filterinfo
  flow_dissector: Use 'const' where possible.
  flow_dissector: Fix function argument ordering dependency
  ixgbe: Resolve "initialized field overwritten" warnings
  ixgbe: Remove bimodal SR-IOV disabling
  ixgbe: Add support for reporting 2.5G link speed
  ixgbe: fix bounds checking in ixgbe_setup_tc for 82598
  ixgbe: support for ethtool set_rxfh
  ixgbe: Avoid needless PHY access on copper phys
  ixgbe: cleanup to use cached mask value
  ixgbe: Remove second instance of lan_id variable
  ixgbe: use kzalloc for allocating one thing
  flow: Move __get_hash_from_flowi{4,6} into flow_dissector.c
  ixgbe: Remove unused PCI bus types
  ...
2015-09-03 08:08:17 -07:00
Herbert Xu
de0ded77b9 ipsec: Replace seqniv with seqiv
Now that seqniv is identical with seqiv we no longer need it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-08-17 16:53:42 +08:00
David Ahern
42a7b32b73 xfrm: Add oif to dst lookups
Rules can be installed that direct route lookups to specific tables based
on oif. Plumb the oif through the xfrm lookups so it gets set in the flow
struct and passed to the resolver routines.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-08-11 12:41:35 +02:00
Andrzej Hajda
df367561ff net/xfrm: use kmemdup rather than duplicating its implementation
The patch was generated using fixed coccinelle semantic patch
scripts/coccinelle/api/memdup.cocci [1].

[1]: http://permalink.gmane.org/gmane.linux.kernel/2014320

Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-08-11 12:41:35 +02:00
Jakub Wilk
0c199a903d xfrm: Fix a typo
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21 00:28:36 -07:00
Linus Torvalds
e0456717e4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Add TX fast path in mac80211, from Johannes Berg.

 2) Add TSO/GRO support to ibmveth, from Thomas Falcon

 3) Move away from cached routes in ipv6, just like ipv4, from Martin
    KaFai Lau.

 4) Lots of new rhashtable tests, from Thomas Graf.

 5) Run ingress qdisc lockless, from Alexei Starovoitov.

 6) Allow servers to fetch TCP packet headers for SYN packets of new
    connections, for fingerprinting.  From Eric Dumazet.

 7) Add mode parameter to pktgen, for testing receive.  From Alexei
    Starovoitov.

 8) Cache access optimizations via simplifications of build_skb(), from
    Alexander Duyck.

 9) Move page frag allocator under mm/, also from Alexander.

10) Add xmit_more support to hv_netvsc, from KY Srinivasan.

11) Add a counter guard in case we try to perform endless reclassify
    loops in the packet scheduler.

12) Extern flow dissector to be programmable and use it in new "Flower"
    classifier.  From Jiri Pirko.

13) AF_PACKET fanout rollover fixes, performance improvements, and new
    statistics.  From Willem de Bruijn.

14) Add netdev driver for GENEVE tunnels, from John W Linville.

15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso.

16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet.

17) Add an ECN retry fallback for the initial TCP handshake, from Daniel
    Borkmann.

18) Add tail call support to BPF, from Alexei Starovoitov.

19) Add several pktgen helper scripts, from Jesper Dangaard Brouer.

20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa.

21) Favor even port numbers for allocation to connect() requests, and
    odd port numbers for bind(0), in an effort to help avoid
    ip_local_port_range exhaustion.  From Eric Dumazet.

22) Add Cavium ThunderX driver, from Sunil Goutham.

23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata,
    from Alexei Starovoitov.

24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai.

25) Double TCP Small Queues default to 256K to accomodate situations
    like the XEN driver and wireless aggregation.  From Wei Liu.

26) Add more entropy inputs to flow dissector, from Tom Herbert.

27) Add CDG congestion control algorithm to TCP, from Kenneth Klette
    Jonassen.

28) Convert ipset over to RCU locking, from Jozsef Kadlecsik.

29) Track and act upon link status of ipv4 route nexthops, from Andy
    Gospodarek.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits)
  bridge: vlan: flush the dynamically learned entries on port vlan delete
  bridge: multicast: add a comment to br_port_state_selection about blocking state
  net: inet_diag: export IPV6_V6ONLY sockopt
  stmmac: troubleshoot unexpected bits in des0 & des1
  net: ipv4 sysctl option to ignore routes when nexthop link is down
  net: track link-status of ipv4 nexthops
  net: switchdev: ignore unsupported bridge flags
  net: Cavium: Fix MAC address setting in shutdown state
  drivers: net: xgene: fix for ACPI support without ACPI
  ip: report the original address of ICMP messages
  net/mlx5e: Prefetch skb data on RX
  net/mlx5e: Pop cq outside mlx5e_get_cqe
  net/mlx5e: Remove mlx5e_cq.sqrq back-pointer
  net/mlx5e: Remove extra spaces
  net/mlx5e: Avoid TX CQE generation if more xmit packets expected
  net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion
  net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq()
  net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them
  net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues
  net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device
  ...
2015-06-24 16:49:49 -07:00
Linus Torvalds
44d21c3f3a Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "Here is the crypto update for 4.2:

  API:

   - Convert RNG interface to new style.

   - New AEAD interface with one SG list for AD and plain/cipher text.
     All external AEAD users have been converted.

   - New asymmetric key interface (akcipher).

  Algorithms:

   - Chacha20, Poly1305 and RFC7539 support.

   - New RSA implementation.

   - Jitter RNG.

   - DRBG is now seeded with both /dev/random and Jitter RNG.  If kernel
     pool isn't ready then DRBG will be reseeded when it is.

   - DRBG is now the default crypto API RNG, replacing krng.

   - 842 compression (previously part of powerpc nx driver).

  Drivers:

   - Accelerated SHA-512 for arm64.

   - New Marvell CESA driver that supports DMA and more algorithms.

   - Updated powerpc nx 842 support.

   - Added support for SEC1 hardware to talitos"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (292 commits)
  crypto: marvell/cesa - remove COMPILE_TEST dependency
  crypto: algif_aead - Temporarily disable all AEAD algorithms
  crypto: af_alg - Forbid the use internal algorithms
  crypto: echainiv - Only hold RNG during initialisation
  crypto: seqiv - Add compatibility support without RNG
  crypto: eseqiv - Offer normal cipher functionality without RNG
  crypto: chainiv - Offer normal cipher functionality without RNG
  crypto: user - Add CRYPTO_MSG_DELRNG
  crypto: user - Move cryptouser.h to uapi
  crypto: rng - Do not free default RNG when it becomes unused
  crypto: skcipher - Allow givencrypt to be NULL
  crypto: sahara - propagate the error on clk_disable_unprepare() failure
  crypto: rsa - fix invalid select for AKCIPHER
  crypto: picoxcell - Update to the current clk API
  crypto: nx - Check for bogus firmware properties
  crypto: marvell/cesa - add DT bindings documentation
  crypto: marvell/cesa - add support for Kirkwood and Dove SoCs
  crypto: marvell/cesa - add support for Orion SoCs
  crypto: marvell/cesa - add allhwsupport module parameter
  crypto: marvell/cesa - add support for all armada SoCs
  ...
2015-06-22 21:04:48 -07:00
Martin Willi
b08b6b7791 xfrm: Define ChaCha20-Poly1305 AEAD XFRM algo for IPsec users
Signed-off-by: Martin Willi <martin@strongswan.org>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-04 15:04:55 +08:00
David S. Miller
dda922c831 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/phy/amd-xgbe-phy.c
	drivers/net/wireless/iwlwifi/Kconfig
	include/net/mac80211.h

iwlwifi/Kconfig and mac80211.h were both trivial overlapping
changes.

The drivers/net/phy/amd-xgbe-phy.c file got removed in 'net-next' and
the bug fix that happened on the 'net' side is already integrated
into the rest of the amd-xgbe driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 22:51:30 -07:00
David S. Miller
a74eab639e Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2015-05-28

1) Remove xfrm_queue_purge as this is the same as skb_queue_purge.

2) Optimize policy and state walk.

3) Use a sane return code if afinfo registration fails.

4) Only check fori a acquire state if the state is not valid.

5) Remove a unnecessary NULL check before xfrm_pol_hold
   as it checks the input for NULL.

6) Return directly if the xfrm hold queue is empty, avoid
   to take a lock as it is nothing to do in this case.

7) Optimize the inexact policy search and allow for matching
   of policies with priority ~0U.

All from Li RongQing.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-28 20:23:01 -07:00
Alexander Duyck
049f8e2e28 xfrm: Override skb->mark with tunnel->parm.i_key in xfrm_input
This change makes it so that if a tunnel is defined we just use the mark
from the tunnel instead of the mark from the skb header.  By doing this we
can avoid the need to set skb->mark inside of the tunnel receive functions.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-05-28 06:23:31 +02:00
Herbert Xu
69b0137f61 ipsec: Add IV generator information to xfrm_state
This patch adds IV generator information to xfrm_state.  This
is currently obtained from our own list of algorithm descriptions.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-05-28 11:23:20 +08:00
Herbert Xu
165ecc6373 xfrm: Add IV generator information to xfrm_algo_desc
This patch adds IV generator information for each AEAD and block
cipher to xfrm_algo_desc.  This will be used to access the new
AEAD interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-05-28 11:23:19 +08:00
Herbert Xu
407d34ef29 xfrm: Always zero high-order sequence number bits
As we're now always including the high bits of the sequence number
in the IV generation process we need to ensure that they don't
contain crap.

This patch ensures that the high sequence bits are always zeroed
so that we don't leak random data into the IV.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-05-21 06:56:23 +02:00
Li RongQing
8faf491e64 xfrm: optimise to search the inexact policy list
The policies are organized into list by priority ascent of policy,
so it is unnecessary to continue to loop the policy if the priority
of current looped police is larger than or equal priority which is
from the policy_bydst list.

This allows to match policy with ~0U priority in inexact list too.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-05-18 10:31:56 +02:00