Commit graph

308 commits

Author SHA1 Message Date
Greg Kroah-Hartman
7eab308a49 This is the 4.4.99 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAloQBz0ACgkQONu9yGCS
 aT5nQxAAs/xWKpYLSLvLPYnTOmSmNJ36isgFriVT+wWOLzkrWTWuoQnluDjMQjie
 nH6whZMlOnG+k5GrGF3XymxZ66tDj9TlXnPAHCC8ikcqir2/dBO/gO5v2gmFgF2E
 j52mt09I3acBQJEt+Rz3xJCMa5so61uDYGtqk/URcPEW1nBa1rfA1QIy/9zv2/aw
 2yPSz4NQlv+7yvjguw4Ik5Yt/hGeu1Y8Kuc4bVHG2TB+y0QYwri42bBwQV7llili
 XqwfjFJYGMqWJqHGF/p0hD+/Xylw6GnDzxDZQDMNhsuWcfe3tUhuOVkX30E96fh2
 ipT4wI5DTmql8EN/r/P7VS2BKL4W5HEMeNEd2APkGNGnSrzKGbd0CQ+cWVZbr645
 R03AbqZjhaQKwRi+n82q1mMb4p+3Z/F/T8twHmYg/DLta3kzzRdfVJPNFBFIoSnF
 Bay8KJKqoMv2Bjhla78pHMoqSQ9j/fJc2iPIAABtFlsTjic+/STiS7ANAsmDdJtt
 8XXc6mFQfbulKKlKKqudPLjOpUNu1SrsOcc9gmovbTy7dN6FBOfJwFMCYonNyXAc
 6/ACSxYJlnZ9YEacEmcXmz0GTytyKiTYE3fNsXc/8fHnRZ1+yea9Mo77wWkj7K4V
 IqNIJMCW8K+P97oL6mdUBZwMUi4zrWueakMq8SWBdKYaD5yeV2k=
 =ql4j
 -----END PGP SIGNATURE-----

Merge 4.4.99 into android-4.4

Changes in 4.4.99
	mac80211: accept key reinstall without changing anything
	mac80211: use constant time comparison with keys
	mac80211: don't compare TKIP TX MIC key in reinstall prevention
	usb: usbtest: fix NULL pointer dereference
	Input: ims-psu - check if CDC union descriptor is sane
	ALSA: seq: Cancel pending autoload work at unbinding device
	tun/tap: sanitize TUNSETSNDBUF input
	tcp: fix tcp_mtu_probe() vs highest_sack
	l2tp: check ps->sock before running pppol2tp_session_ioctl()
	tun: call dev_get_valid_name() before register_netdevice()
	sctp: add the missing sock_owned_by_user check in sctp_icmp_redirect
	packet: avoid panic in packet_getsockopt()
	ipv6: flowlabel: do not leave opt->tot_len with garbage
	net/unix: don't show information about sockets from other namespaces
	ip6_gre: only increase err_count for some certain type icmpv6 in ip6gre_err
	tun: allow positive return values on dev_get_valid_name() call
	sctp: reset owner sk for data chunks on out queues when migrating a sock
	ppp: fix race in ppp device destruction
	ipip: only increase err_count for some certain type icmp in ipip_err
	tcp/dccp: fix ireq->opt races
	tcp/dccp: fix lockdep splat in inet_csk_route_req()
	tcp/dccp: fix other lockdep splats accessing ireq_opt
	security/keys: add CONFIG_KEYS_COMPAT to Kconfig
	tipc: fix link attribute propagation bug
	brcmfmac: remove setting IBSS mode when stopping AP
	target/iscsi: Fix iSCSI task reassignment handling
	target: Fix node_acl demo-mode + uncached dynamic shutdown regression
	misc: panel: properly restore atomic counter on error path
	Linux 4.4.99

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-11-18 17:24:24 +01:00
Julien Gomes
bcb3b90cf3 tun: allow positive return values on dev_get_valid_name() call
[ Upstream commit 5c25f65fd1e42685f7ccd80e0621829c105785d9 ]

If the name argument of dev_get_valid_name() contains "%d", it will try
to assign it a unit number in __dev__alloc_name() and return either the
unit number (>= 0) or an error code (< 0).
Considering positive values as error values prevent tun device creations
relying this mechanism, therefor we should only consider negative values
as errors here.

Signed-off-by: Julien Gomes <julien@arista.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-18 11:11:06 +01:00
Cong Wang
4b27fe34a2 tun: call dev_get_valid_name() before register_netdevice()
[ Upstream commit 0ad646c81b2182f7fa67ec0c8c825e0ee165696d ]

register_netdevice() could fail early when we have an invalid
dev name, in which case ->ndo_uninit() is not called. For tun
device, this is a problem because a timer etc. are already
initialized and it expects ->ndo_uninit() to clean them up.

We could move these initializations into a ->ndo_init() so
that register_netdevice() knows better, however this is still
complicated due to the logic in tun_detach().

Therefore, I choose to just call dev_get_valid_name() before
register_netdevice(), which is quicker and much easier to audit.
And for this specific case, it is already enough.

Fixes: 96442e4242 ("tuntap: choose the txq based on rxq")
Reported-by: Dmitry Alexeev <avekceeb@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-18 11:11:05 +01:00
Craig Gallek
735818a8b4 tun/tap: sanitize TUNSETSNDBUF input
[ Upstream commit 93161922c658c714715686cd0cf69b090cb9bf1d ]

Syzkaller found several variants of the lockup below by setting negative
values with the TUNSETSNDBUF ioctl.  This patch adds a sanity check
to both the tun and tap versions of this ioctl.

  watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [repro:2389]
  Modules linked in:
  irq event stamp: 329692056
  hardirqs last  enabled at (329692055): [<ffffffff824b8381>] _raw_spin_unlock_irqrestore+0x31/0x75
  hardirqs last disabled at (329692056): [<ffffffff824b9e58>] apic_timer_interrupt+0x98/0xb0
  softirqs last  enabled at (35659740): [<ffffffff824bc958>] __do_softirq+0x328/0x48c
  softirqs last disabled at (35659731): [<ffffffff811c796c>] irq_exit+0xbc/0xd0
  CPU: 0 PID: 2389 Comm: repro Not tainted 4.14.0-rc7 #23
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  task: ffff880009452140 task.stack: ffff880006a20000
  RIP: 0010:_raw_spin_lock_irqsave+0x11/0x80
  RSP: 0018:ffff880006a27c50 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff10
  RAX: ffff880009ac68d0 RBX: ffff880006a27ce0 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: ffff880006a27ce0 RDI: ffff880009ac6900
  RBP: ffff880006a27c60 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000001 R11: 000000000063ff00 R12: ffff880009ac6900
  R13: ffff880006a27cf8 R14: 0000000000000001 R15: ffff880006a27cf8
  FS:  00007f4be4838700(0000) GS:ffff88000cc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000020101000 CR3: 0000000009616000 CR4: 00000000000006f0
  Call Trace:
   prepare_to_wait+0x26/0xc0
   sock_alloc_send_pskb+0x14e/0x270
   ? remove_wait_queue+0x60/0x60
   tun_get_user+0x2cc/0x19d0
   ? __tun_get+0x60/0x1b0
   tun_chr_write_iter+0x57/0x86
   __vfs_write+0x156/0x1e0
   vfs_write+0xf7/0x230
   SyS_write+0x57/0xd0
   entry_SYSCALL_64_fastpath+0x1f/0xbe
  RIP: 0033:0x7f4be4356df9
  RSP: 002b:00007ffc18101c08 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4be4356df9
  RDX: 0000000000000046 RSI: 0000000020101000 RDI: 0000000000000005
  RBP: 00007ffc18101c40 R08: 0000000000000001 R09: 0000000000000001
  R10: 0000000000000001 R11: 0000000000000293 R12: 0000559c75f64780
  R13: 00007ffc18101d30 R14: 0000000000000000 R15: 0000000000000000

Fixes: 33dccbb050 ("tun: Limit amount of queued packets per device")
Fixes: 20d29d7a91 ("net: macvtap driver")
Signed-off-by: Craig Gallek <kraig@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-18 11:11:05 +01:00
Greg Kroah-Hartman
89074de67a This is the 4.4.94 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlnrYxcACgkQONu9yGCS
 aT6VpQ/9GzBA59FGi6ohxZnrUR35+U5Ehuw0IZo4JTUTrlj28QozeV6dBAdgQHLH
 eGcejtzAsD39m7JjmBzkxiBBlCH9nQkq5IaUrJG6q5dYoTCYMLzHwQLgPSbrhbnS
 hCSeHdJ0fevw9tKQELtWlIiG1iOULrWATf4MtpOCHcRmpxxSMRi22yQ4vKD2Vz4y
 TdKb5c8bYjJoEqbtON4wKIiEK1JfyO80E4eZtNK7FXI+XX1WI65pum9/NBiDqB78
 K0sK1t5pSJHvDgMwtOJ7Nxzcwle1cG3xm7NhZhNCfF9OWedCy+ZCc+e48T+TeoF4
 UDHIhEvhOOOf/W3dRBQQj8VElj0zt92I+ivsWxKmheY9JzJdOvq2pTQoPAtLsBMD
 /mChCvMSNEcHTfLYrm6Bjap0e6D10n1oUHX7jgLtq04EcX9Rh2zgYvL9u9QFLjFx
 sAgTp+kmScgj0fi0XgiXJxj8mPc2MpTVmSUjcwAZD+N9Kuafkqbf3ZddZJiGyPfw
 v4ZiAdUAtACdOaIRVPRUcG2fyLfKYqg2bFsif4Z67/0RmNf3C3rJpS9yX+Q36zCo
 f66xbvysN3pRiME0obenrsxBJ0LvIkSVskxV0+5x0UfP5pOdf7jZqqpkr6IFMtLZ
 /o4DYV+Da/qeYZQnmvF0BEvEnnX8GJFIJ+9RSbz9mAWcCWtWxTU=
 =gevA
 -----END PGP SIGNATURE-----

Merge 4.4.94 into android-4.4

Changes in 4.4.94
	percpu: make this_cpu_generic_read() atomic w.r.t. interrupts
	drm/dp/mst: save vcpi with payloads
	MIPS: Fix minimum alignment requirement of IRQ stack
	sctp: potential read out of bounds in sctp_ulpevent_type_enabled()
	bpf/verifier: reject BPF_ALU64|BPF_END
	udpv6: Fix the checksum computation when HW checksum does not apply
	ip6_gre: skb_push ipv6hdr before packing the header in ip6gre_header
	net: emac: Fix napi poll list corruption
	packet: hold bind lock when rebinding to fanout hook
	bpf: one perf event close won't free bpf program attached by another perf event
	isdn/i4l: fetch the ppp_write buffer in one shot
	vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmit
	l2tp: Avoid schedule while atomic in exit_net
	l2tp: fix race condition in l2tp_tunnel_delete
	tun: bail out from tun_get_user() if the skb is empty
	packet: in packet_do_bind, test fanout with bind_lock held
	packet: only test po->has_vnet_hdr once in packet_snd
	net: Set sk_prot_creator when cloning sockets to the right proto
	tipc: use only positive error codes in messages
	Revert "bsg-lib: don't free job in bsg_prepare_job"
	locking/lockdep: Add nest_lock integrity test
	watchdog: kempld: fix gcc-4.3 build
	irqchip/crossbar: Fix incorrect type of local variables
	mac80211_hwsim: check HWSIM_ATTR_RADIO_NAME length
	mac80211: fix power saving clients handling in iwlwifi
	net/mlx4_en: fix overflow in mlx4_en_init_timestamp()
	netfilter: nf_ct_expect: Change __nf_ct_expect_check() return value.
	iio: adc: xilinx: Fix error handling
	Btrfs: send, fix failure to rename top level inode due to name collision
	f2fs: do not wait for writeback in write_begin
	md/linear: shutup lockdep warnning
	sparc64: Migrate hvcons irq to panicked cpu
	net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new probed PFs
	crypto: xts - Add ECB dependency
	ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
	slub: do not merge cache if slub_debug contains a never-merge flag
	scsi: scsi_dh_emc: return success in clariion_std_inquiry()
	net: mvpp2: release reference to txq_cpu[] entry after unmapping
	i2c: at91: ensure state is restored after suspending
	ceph: clean up unsafe d_parent accesses in build_dentry_path
	uapi: fix linux/rds.h userspace compilation errors
	uapi: fix linux/mroute6.h userspace compilation errors
	target/iscsi: Fix unsolicited data seq_end_offset calculation
	nfsd/callback: Cleanup callback cred on shutdown
	cpufreq: CPPC: add ACPI_PROCESSOR dependency
	Revert "tty: goldfish: Fix a parameter of a call to free_irq"
	Linux 4.4.94

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-10-22 08:09:11 +02:00
Alexander Potapenko
ee534927f0 tun: bail out from tun_get_user() if the skb is empty
[ Upstream commit 2580c4c17aee3ad58e9751012bad278dd074ccae ]

KMSAN (https://github.com/google/kmsan) reported accessing uninitialized
skb->data[0] in the case the skb is empty (i.e. skb->len is 0):

================================================
BUG: KMSAN: use of uninitialized memory in tun_get_user+0x19ba/0x3770
CPU: 0 PID: 3051 Comm: probe Not tainted 4.13.0+ #3140
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
...
 __msan_warning_32+0x66/0xb0 mm/kmsan/kmsan_instr.c:477
 tun_get_user+0x19ba/0x3770 drivers/net/tun.c:1301
 tun_chr_write_iter+0x19f/0x300 drivers/net/tun.c:1365
 call_write_iter ./include/linux/fs.h:1743
 new_sync_write fs/read_write.c:457
 __vfs_write+0x6c3/0x7f0 fs/read_write.c:470
 vfs_write+0x3e4/0x770 fs/read_write.c:518
 SYSC_write+0x12f/0x2b0 fs/read_write.c:565
 SyS_write+0x55/0x80 fs/read_write.c:557
 do_syscall_64+0x242/0x330 arch/x86/entry/common.c:284
 entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:245
...
origin:
...
 kmsan_poison_shadow+0x6e/0xc0 mm/kmsan/kmsan.c:211
 slab_alloc_node mm/slub.c:2732
 __kmalloc_node_track_caller+0x351/0x370 mm/slub.c:4351
 __kmalloc_reserve net/core/skbuff.c:138
 __alloc_skb+0x26a/0x810 net/core/skbuff.c:231
 alloc_skb ./include/linux/skbuff.h:903
 alloc_skb_with_frags+0x1d7/0xc80 net/core/skbuff.c:4756
 sock_alloc_send_pskb+0xabf/0xfe0 net/core/sock.c:2037
 tun_alloc_skb drivers/net/tun.c:1144
 tun_get_user+0x9a8/0x3770 drivers/net/tun.c:1274
 tun_chr_write_iter+0x19f/0x300 drivers/net/tun.c:1365
 call_write_iter ./include/linux/fs.h:1743
 new_sync_write fs/read_write.c:457
 __vfs_write+0x6c3/0x7f0 fs/read_write.c:470
 vfs_write+0x3e4/0x770 fs/read_write.c:518
 SYSC_write+0x12f/0x2b0 fs/read_write.c:565
 SyS_write+0x55/0x80 fs/read_write.c:557
 do_syscall_64+0x242/0x330 arch/x86/entry/common.c:284
 return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:245
================================================

Make sure tun_get_user() doesn't touch skb->data[0] unless there is
actual data.

C reproducer below:
==========================
    // autogenerated by syzkaller (http://github.com/google/syzkaller)

    #define _GNU_SOURCE

    #include <fcntl.h>
    #include <linux/if_tun.h>
    #include <netinet/ip.h>
    #include <net/if.h>
    #include <string.h>
    #include <sys/ioctl.h>

    int main()
    {
      int sock = socket(PF_INET, SOCK_STREAM, IPPROTO_IP);
      int tun_fd = open("/dev/net/tun", O_RDWR);
      struct ifreq req;
      memset(&req, 0, sizeof(struct ifreq));
      strcpy((char*)&req.ifr_name, "gre0");
      req.ifr_flags = IFF_UP | IFF_MULTICAST;
      ioctl(tun_fd, TUNSETIFF, &req);
      ioctl(sock, SIOCSIFFLAGS, "gre0");
      write(tun_fd, "hi", 0);
      return 0;
    }
==========================

Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21 17:09:03 +02:00
Todd Kjos
837de638dc Merge branch 'upstream-linux-4.4.y' into android-4.4 2017-03-02 13:53:48 -08:00
Willem de Bruijn
625bd9e43b tun: read vnet_hdr_sz once
[ Upstream commit e1edab87faf6ca30cd137e0795bc73aa9a9a22ec ]

When IFF_VNET_HDR is enabled, a virtio_net header must precede data.
Data length is verified to be greater than or equal to expected header
length tun->vnet_hdr_sz before copying.

Read this value once and cache locally, as it can be updated between
the test and use (TOCTOU).

Signed-off-by: Willem de Bruijn <willemb@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
CC: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-18 16:39:27 +01:00
Dmitry Shmidt
321249bb2f Merge remote-tracking branch 'common/android-4.4' into android-4.4.y-merge
Change-Id: I049d2e9d238a92d56100e8e317be6688497eb501
2016-09-20 13:12:59 -07:00
Soheil Hassas Yeganeh
17e35cbf18 UPSTREAM: tun: fix transmit timestamp support
Instead of using sock_tx_timestamp, use skb_tx_timestamp to record
software transmit timestamp of a packet.

sock_tx_timestamp resets and overrides the tx_flags of the skb.
The function is intended to be called from within the protocol
layer when creating the skb, not from a device driver. This is
inconsistent with other drivers and will cause issues for TCP.

In TCP, we intend to sample the timestamps for the last byte
for each sendmsg/sendpage. For that reason, tcp_sendmsg calls
tcp_tx_timestamp only with the last skb that it generates.
For example, if a 128KB message is split into two 64KB packets
we want to sample the SND timestamp of the last packet. The current
code in the tun driver, however, will result in sampling the SND
timestamp for both packets.

Also, when the last packet is split into smaller packets for
retranmission (see tcp_fragment), the tun driver will record
timestamps for all of the retransmitted packets and not only the
last packet.

Change-Id: If7458ab31de52aa15a12364b6c1ac2a8f93f17a7
Fixes: eda2977291 (tun: Support software transmit time stamping.)
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Francis Yan <francisyyan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-20 10:38:01 -07:00
Dmitry Shmidt
b558f17a13 This is the 4.4.16 stable release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJXmOXmAAoJEDjbvchgkmk+QYIP/1S8oBZsvjfDzvH8t63HyLeH
 i43MFlYoFAqUIZc002XpluSvZ8uHoG+r7R8Hq3wmv48wxe3M6OBnMdBVTht6mPw+
 t5OLTZr40lWaJm2EIi4aekueMIrCgmL+Et+IFYv7ZVBuYLteVcfny+zdq4EqGmgj
 /a19+L/sTTr4SHtJIhHxWhiVJ9fVMgQk/N3VgQmIiNF2+lVbiFI7QQiDPLbFl0KK
 CM4ETO22HxHCYilGpzhpSMsHCxv12VqNaXNLAsPAepGGW7PqvUmrEWAqgwsbOfRc
 GxTLNk0dUgJqMrfEpQ8ZOMlgzvCAYG2jZuNSuT+nuzrWSUP+WOGRi9TTTxp1CYuZ
 PHlhNTH7ZnqosxJUUZS2d9N5ygpqD48Rhlfl824YzOWCy94VeUnedkVLb20uJwPF
 Y5aQ5WjktBC9why5e4OgGQERvx/U9KTk8E1zRfZZPc2oft9My0YxuemjjKAKZiYN
 ne4WhXbgOJTQkAoZwh2xqny3bWyEaoSrWpQ3R7bBJ9SIRLEOdCKzKpduDbAnbMP7
 QWgQOQC/6qA1mKqjrqF4KPA1Quo9PcUK2Ajh523ewMGCowgY90vyejAgh4Q8g0GC
 fKlx+jJDoKVDbQ8v4hc9PPHMsNNIKT9a1ptwVS3lE+bq1D5Ffm57A4/uOTMYHVab
 gKqu8h1CA0MCVBsH3nNA
 =nY8S
 -----END PGP SIGNATURE-----

Merge tag 'v4.4.16' into android-4.4.y

This is the 4.4.16 stable release

Change-Id: Ibaf7b7e03695e1acebc654a2ca1a4bfcc48fcea4
2016-08-01 15:57:55 -07:00
Jason Wang
bccd56fad0 tuntap: correctly wake up process during uninit
[ Upstream commit addf8fc4acb1cf79492ac64966f07178793cb3d7 ]

We used to check dev->reg_state against NETREG_REGISTERED after each
time we are woke up. But after commit 9e641bdcfa ("net-tun:
restructure tun_do_read for better sleep/wakeup efficiency"), it uses
skb_recv_datagram() which does not check dev->reg_state. This will
result if we delete a tun/tap device after a process is blocked in the
reading. The device will wait for the reference count which was held
by that process for ever.

Fixes this by using RCV_SHUTDOWN which will be checked during
sk_recv_datagram() before trying to wake up the process during uninit.

Fixes: 9e641bdcfa ("net-tun: restructure tun_do_read for better
sleep/wakeup efficiency")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Xi Wang <xii@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24 10:18:16 -07:00
Jason Wang
9daaadbe7b tuntap: restore default qdisc
[ Upstream commit 016adb7260f481168c03e09f785184d6d5278894 ]

After commit f84bb1eac0 ("net: fix IFF_NO_QUEUE for drivers using
alloc_netdev"), default qdisc was changed to noqueue because
tuntap does not set tx_queue_len during .setup(). This patch restores
default qdisc by setting tx_queue_len in tun_setup().

Fixes: f84bb1eac0 ("net: fix IFF_NO_QUEUE for drivers using alloc_netdev")
Cc: Phil Sutter <phil@nwl.cc>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-20 15:42:06 +09:00
Daniel Borkmann
e137eeb38d tun, bpf: fix suspicious RCU usage in tun_{attach, detach}_filter
[ Upstream commit 5a5abb1fa3b05dd6aa821525832644c1e7d2905f ]

Sasha Levin reported a suspicious rcu_dereference_protected() warning
found while fuzzing with trinity that is similar to this one:

  [   52.765684] net/core/filter.c:2262 suspicious rcu_dereference_protected() usage!
  [   52.765688] other info that might help us debug this:
  [   52.765695] rcu_scheduler_active = 1, debug_locks = 1
  [   52.765701] 1 lock held by a.out/1525:
  [   52.765704]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff816a64b7>] rtnl_lock+0x17/0x20
  [   52.765721] stack backtrace:
  [   52.765728] CPU: 1 PID: 1525 Comm: a.out Not tainted 4.5.0+ #264
  [...]
  [   52.765768] Call Trace:
  [   52.765775]  [<ffffffff813e488d>] dump_stack+0x85/0xc8
  [   52.765784]  [<ffffffff810f2fa5>] lockdep_rcu_suspicious+0xd5/0x110
  [   52.765792]  [<ffffffff816afdc2>] sk_detach_filter+0x82/0x90
  [   52.765801]  [<ffffffffa0883425>] tun_detach_filter+0x35/0x90 [tun]
  [   52.765810]  [<ffffffffa0884ed4>] __tun_chr_ioctl+0x354/0x1130 [tun]
  [   52.765818]  [<ffffffff8136fed0>] ? selinux_file_ioctl+0x130/0x210
  [   52.765827]  [<ffffffffa0885ce3>] tun_chr_ioctl+0x13/0x20 [tun]
  [   52.765834]  [<ffffffff81260ea6>] do_vfs_ioctl+0x96/0x690
  [   52.765843]  [<ffffffff81364af3>] ? security_file_ioctl+0x43/0x60
  [   52.765850]  [<ffffffff81261519>] SyS_ioctl+0x79/0x90
  [   52.765858]  [<ffffffff81003ba2>] do_syscall_64+0x62/0x140
  [   52.765866]  [<ffffffff817d563f>] entry_SYSCALL64_slow_path+0x25/0x25

Same can be triggered with PROVE_RCU (+ PROVE_RCU_REPEATEDLY) enabled
from tun_attach_filter() when user space calls ioctl(tun_fd, TUN{ATTACH,
DETACH}FILTER, ...) for adding/removing a BPF filter on tap devices.

Since the fix in f91ff5b9ff ("net: sk_{detach|attach}_filter() rcu
fixes") sk_attach_filter()/sk_detach_filter() now dereferences the
filter with rcu_dereference_protected(), checking whether socket lock
is held in control path.

Since its introduction in 9940516259 ("tun: socket filter support"),
tap filters are managed under RTNL lock from __tun_chr_ioctl(). Thus the
sock_owned_by_user(sk) doesn't apply in this specific case and therefore
triggers the false positive.

Extend the BPF API with __sk_attach_filter()/__sk_detach_filter() pair
that is used by tap filters and pass in lockdep_rtnl_is_held() for the
rcu_dereference_protected() checks instead.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-20 15:42:06 +09:00
Chia-chi Yeh
6eca735acf net: Only NET_ADMIN is allowed to fully control TUN interfaces.
Signed-off-by: Chia-chi Yeh <chiachi@android.com>
2016-02-16 13:51:14 -08:00
Eric Dumazet
9cd3e072b0 net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
This patch is a cleanup to make following patch easier to
review.

Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
from (struct socket)->flags to a (struct socket_wq)->flags
to benefit from RCU protection in sock_wake_async()

To ease backports, we rename both constants.

Two new helpers, sk_set_bit(int nr, struct sock *sk)
and sk_clear_bit(int net, struct sock *sk) are added so that
following patch can change their implementation.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-01 15:45:05 -05:00
Eric Dumazet
5fcd2d8be4 tun: use sk_fullsock() before reading sk->sk_tsflags
timewait or request sockets are small and do not contain sk->sk_tsflags

Without this fix, we might read garbage, and crash later in

__skb_complete_tx_timestamp()
 -> sock_queue_err_skb()

(These pseudo sockets do not have an error queue either)

Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:45:48 -07:00
Toshiaki Makita
5e52796a9a tuntap: Don't segment multiple tagged packets on tap device
Tap devices don't need to segment multiple tagged packets.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-03 14:24:50 -07:00
Linus Torvalds
5fc835284d virtio/vhost: cross endian support
I have just queued some more bugfix patches today but none fix regressions and
 none are related to these ones, so it looks like a good time for a merge for
 -rc1.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVk7JOAAoJECgfDbjSjVRpHgEIAKrgLd7gIQ8lO+LCYqne6WLQ
 Ky8rOUnaxX4gD5N0akhfJFr/m/yIyAfk9+ALZZUo3kfuFiEsT2rn32iK/2Gj8pcu
 HFoAWhS+7b/ZsfpHRPtv/zVD3q4c3nWsWpfWK09J+4t0UJuC8fmGMoBzkS0kjZtd
 dQnHlJi5+1u4ch2x9sYYeVx7GOJ8a1W0q7cWJnWdOffWLEP9/zB8fgRVLFp/7AAd
 uBlza93RU81wS7q5tSUph6ESPqt2yu357e//4jnWjVx5EUXDRBL3A/T1JpC1qYSn
 WV2Gv14x+LVz2G8WgGmwfMq1H9Dvd/OzNToX5R8SIRx6Rh5L6gxFQjqt4dclGj8=
 =nKap
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio/vhost cross endian support from Michael Tsirkin:
 "I have just queued some more bugfix patches today but none fix
  regressions and none are related to these ones, so it looks like a
  good time for a merge for -rc1.

  The motivation for this is support for legacy BE guests on the new LE
  hosts.  There are two redeeming properties that made me merge this:

   - It's a trivial amount of code: since we wrap host/guest accesses
     anyway, almost all of it is well hidden from drivers.

   - Sane platforms would never set flags like VHOST_CROSS_ENDIAN_LEGACY,
     and when it's clear, there's zero overhead (as some point it was
     tested by compiling with and without the patches, got the same
     stripped binary).

  Maybe we could create a Kconfig symbol to enforce the second point:
  prevent people from enabling it eg on x86.  I will look into this"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio-pci: alloc only resources actually used.
  macvtap/tun: cross-endian support for little-endian hosts
  vhost: cross-endian support for legacy devices
  virtio: add explicit big-endian support to memory accessors
  vhost: introduce vhost_is_little_endian() helper
  vringh: introduce vringh_is_little_endian() helper
  macvtap: introduce macvtap_is_little_endian() helper
  tun: add tun_is_little_endian() helper
  virtio: introduce virtio_is_little_endian() helper
2015-07-03 16:02:25 -07:00
Greg Kurz
8b8e658b16 macvtap/tun: cross-endian support for little-endian hosts
The VNET_LE flag was introduced to fix accesses to virtio 1.0 headers
that are always little-endian. It can also be used to handle the special
case of a legacy little-endian device implemented by a big-endian host.

Let's add a flag and ioctls for big-endian devices as well. If both flags
are set, little-endian wins.

Since this is isn't a common usecase, the feature is controlled by a kernel
config option (not set by default).

Both macvtap and tun are covered by this patch since they share the same
API with userland.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2015-06-01 15:48:56 +02:00
Greg Kurz
7d82410950 virtio: add explicit big-endian support to memory accessors
The current memory accessors logic is:
- little endian if little_endian
- native endian (i.e. no byteswap) if !little_endian

If we want to fully support cross-endian vhost, we also need to be
able to convert to big endian.

Instead of changing the little_endian argument to some 3-value enum, this
patch changes the logic to:
- little endian if little_endian
- big endian if !little_endian

The native endian case is handled by all users with a trivial helper. This
patch doesn't change any functionality, nor it does add overhead.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2015-06-01 15:48:54 +02:00
Greg Kurz
25bd55bbab tun: add tun_is_little_endian() helper
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2015-06-01 15:48:50 +02:00
Eric W. Biederman
11aa9c28b4 net: Pass kern from net_proto_family.create to sk_alloc
In preparation for changing how struct net is refcounted
on kernel sockets pass the knowledge that we are creating
a kernel socket from sock_create_kern through to sk_alloc.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11 10:50:17 -04:00
Eric W. Biederman
140e807da1 tun: Utilize the normal socket network namespace refcounting.
There is no need for tun to do the weird network namespace refcounting.
The existing network namespace refcounting in tfile has almost exactly
the same lifetime.  So rewrite the code to use the struct sock network
namespace refcounting and remove the unnecessary hand rolled network
namespace refcounting and the unncesary tfile->net.

This change allows the tun code to directly call sock_put bypassing
sock_release and making SOCK_EXTERNALLY_ALLOCATED unnecessary.

Remove the now unncessary tun_release so that if anything tries to use
the sock_release code path the kernel will oops, and let us know about
the bug.

The macvtap code already uses it's internal socket this way.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11 10:50:16 -04:00
Al Viro
5d5d568975 make new_sync_{read,write}() static
All places outside of core VFS that checked ->read and ->write for being NULL or
called the methods directly are gone now, so NULL {read,write} with non-NULL
{read,write}_iter will do the right thing in all cases.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:29:40 -04:00
Ying Xue
1b78414047 net: Remove iocb argument from sendmsg and recvmsg
After TIPC doesn't depend on iocb argument in its internal
implementations of sendmsg() and recvmsg() hooks defined in proto
structure, no any user is using iocb argument in them at all now.
Then we can drop the redundant iocb argument completely from kinds of
implementations of both sendmsg() and recvmsg() in the entire
networking stack.

Cc: Christoph Hellwig <hch@lst.de>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 13:06:31 -05:00
Eric Dumazet
567e4b7973 net: rfs: add hash collision detection
Receive Flow Steering is a nice solution but suffers from
hash collisions when a mix of connected and unconnected traffic
is received on the host, when flow hash table is populated.

Also, clearing flow in inet_release() makes RFS not very good
for short lived flows, as many packets can follow close().
(FIN , ACK packets, ...)

This patch extends the information stored into global hash table
to not only include cpu number, but upper part of the hash value.

I use a 32bit value, and dynamically split it in two parts.

For host with less than 64 possible cpus, this gives 6 bits for the
cpu number, and 26 (32-6) bits for the upper part of the hash.

Since hash bucket selection use low order bits of the hash, we have
a full hash match, if /proc/sys/net/core/rps_sock_flow_entries is big
enough.

If the hash found in flow table does not match, we fallback to RPS (if
it is enabled for the rxqueue).

This means that a packet for an non connected flow can avoid the
IPI through a unrelated/victim CPU.

This also means we no longer have to clear the table at socket
close time, and this helps short lived flows performance.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 16:53:57 -08:00
David S. Miller
6e03f896b5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/vxlan.c
	drivers/vhost/net.c
	include/linux/if_vlan.h
	net/core/dev.c

The net/core/dev.c conflict was the overlap of one commit marking an
existing function static whilst another was adding a new function.

In the include/linux/if_vlan.h case, the type used for a local
variable was changed in 'net', whereas the function got rewritten
to fix a stacked vlan bug in 'net-next'.

In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next'
overlapped with an endainness fix for VHOST 1.0 in 'net'.

In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter
in 'net-next' whereas in 'net' there was a bug fix to pass in the
correct network namespace pointer in calls to this function.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 14:33:28 -08:00
Takashi Iwai
c4d33e24b6 tun: Use static attribute groups for sysfs entries
Instead of manual calls of device_create_file() and
device_remove_files(), assign the static attribute groups to netdev
groups array.  This simplifies the code and avoids the possible
races.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 00:30:47 -08:00
Vlad Yasevich
e3e3c423f8 Revert "drivers/net: Disable UFO through virtio"
This reverts commit 3d0ad09412.

Now that GSO functionality can correctly track if the fragment
id has been selected and select a fragment id if necessary,
we can re-enable UFO on tap/macvap and virtio devices.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-03 23:06:43 -08:00
Vlad Yasevich
72f6510745 Revert "drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets"
This reverts commit 5188cd44c5.

Now that GSO layer can track if fragment id has been selected
and can allocate one if necessary, we don't need to do this in
tap and macvtap.  This reverts most of the code and only keeps
the new ipv6 fragment id generation function that is still needed.

Fixes: 3d0ad09412 (drivers/net: Disable UFO through virtio)
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-03 23:06:43 -08:00
Jiri Pirko
df8a39defa net: rename vlan_tx_* helpers since "tx" is misleading there
The same macros are used for rx as well. So rename it.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 17:51:08 -05:00
Pankaj Gupta
baf71c5c1f tuntap: Increase the number of queues in tun.
Networking under kvm works best if we allocate a per-vCPU RX and TX
queue in a virtual NIC. This requires a per-vCPU queue on the host side.

It is now safe to increase the maximum number of queues.
Preceding patch: 'net: allow large number of rx queues'
made sure this won't cause failures due to high order memory
allocations. Increase it to 256: this is the max number of vCPUs
KVM supports.

Size of tun_struct changes from 8512 to 10496 after this patch. This keeps
pages allocated for tun_struct before and after the patch to 3.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: David Gibson <dgibson@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 17:05:05 -05:00
Alex Gartrell
957f094f22 tun: return proper error code from tun_do_read
Instead of -1 with EAGAIN, read on a O_NONBLOCK tun fd will return 0.  This
fixes this by properly returning the error code from __skb_recv_datagram.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 14:14:54 -05:00
Alex Gartrell
87897931c8 tun: Fixed unsigned/signed comparison
Validated that this was actually using the unsigned comparison with gdb.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 14:14:54 -05:00
Michael S. Tsirkin
1cf8e410b6 tun: drop broken IFF_VNET_LE
Use TUNSETVNETLE/TUNGETVNETLE instead.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-16 11:19:41 -05:00
Linus Torvalds
70e71ca0af Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) New offloading infrastructure and example 'rocker' driver for
    offloading of switching and routing to hardware.

    This work was done by a large group of dedicated individuals, not
    limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
    Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu

 2) Start making the networking operate on IOV iterators instead of
    modifying iov objects in-situ during transfers.  Thanks to Al Viro
    and Herbert Xu.

 3) A set of new netlink interfaces for the TIPC stack, from Richard
    Alpe.

 4) Remove unnecessary looping during ipv6 routing lookups, from Martin
    KaFai Lau.

 5) Add PAUSE frame generation support to gianfar driver, from Matei
    Pavaluca.

 6) Allow for larger reordering levels in TCP, which are easily
    achievable in the real world right now, from Eric Dumazet.

 7) Add a variable of napi_schedule that doesn't need to disable cpu
    interrupts, from Eric Dumazet.

 8) Use a doubly linked list to optimize neigh_parms_release(), from
    Nicolas Dichtel.

 9) Various enhancements to the kernel BPF verifier, and allow eBPF
    programs to actually be attached to sockets.  From Alexei
    Starovoitov.

10) Support TSO/LSO in sunvnet driver, from David L Stevens.

11) Allow controlling ECN usage via routing metrics, from Florian
    Westphal.

12) Remote checksum offload, from Tom Herbert.

13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
    driver, from Thomas Lendacky.

14) Add MPLS support to openvswitch, from Simon Horman.

15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
    Klassert.

16) Do gro flushes on a per-device basis using a timer, from Eric
    Dumazet.  This tries to resolve the conflicting goals between the
    desired handling of bulk vs.  RPC-like traffic.

17) Allow userspace to ask for the CPU upon what a packet was
    received/steered, via SO_INCOMING_CPU.  From Eric Dumazet.

18) Limit GSO packets to half the current congestion window, from Eric
    Dumazet.

19) Add a generic helper so that all drivers set their RSS keys in a
    consistent way, from Eric Dumazet.

20) Add xmit_more support to enic driver, from Govindarajulu
    Varadarajan.

21) Add VLAN packet scheduler action, from Jiri Pirko.

22) Support configurable RSS hash functions via ethtool, from Eyal
    Perry.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
  Fix race condition between vxlan_sock_add and vxlan_sock_release
  net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
  net/mlx4: Add support for A0 steering
  net/mlx4: Refactor QUERY_PORT
  net/mlx4_core: Add explicit error message when rule doesn't meet configuration
  net/mlx4: Add A0 hybrid steering
  net/mlx4: Add mlx4_bitmap zone allocator
  net/mlx4: Add a check if there are too many reserved QPs
  net/mlx4: Change QP allocation scheme
  net/mlx4_core: Use tasklet for user-space CQ completion events
  net/mlx4_core: Mask out host side virtualization features for guests
  net/mlx4_en: Set csum level for encapsulated packets
  be2net: Export tunnel offloads only when a VxLAN tunnel is created
  gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
  cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
  net: fec: only enable mdio interrupt before phy device link up
  net: fec: clear all interrupt events to support i.MX6SX
  net: fec: reset fep link status in suspend function
  net: sock: fix access via invalid file descriptor
  net: introduce helper macro for_each_cmsghdr
  ...
2014-12-11 14:27:06 -08:00
Linus Torvalds
6b9e2cea42 virtio: virtio 1.0 support, misc patches
This adds a lot of infrastructure for virtio 1.0 support.
 Notable missing pieces: virtio pci, virtio balloon (needs spec extension),
 vhost scsi.
 
 Plus, there are some minor fixes in a couple of places.
 
 Cc: David Miller <davem@davemloft.net>
 Cc: Rusty Russell <rusty@rustcorp.com.au>
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUh1CVAAoJECgfDbjSjVRpWZcH/2+EGPyng7Lca820UHA0cU1U
 u4D8CAAwOGaVdnUUo8ox1eon3LNB2UgRtgsl3rBDR3YTgFfNPrfuYdnHO0dYIDc1
 lS26NuPrVrTX0lA+OBPe2nlKrsrOkn8aw1kxG9Y0gKtNg/+HAGNW5e2eE7R/LrA5
 94XbWZ8g9Yf4GPG1iFmih9vQvvN0E68zcUlojfCnllySgaIEYr8nTiGQBWpRgJat
 fCqFAp1HMDZzGJQO+m1/Vw0OftTRVybyfai59e6uUTa8x1djvzPb/1MvREqQjegM
 ylSuofIVyj7JPu++FbAjd9mikkb53GSc8ql3YmWNZLdr69rnkzP0GdzQvrdheAo=
 =RtrR
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "virtio: virtio 1.0 support, misc patches

  This adds a lot of infrastructure for virtio 1.0 support.  Notable
  missing pieces: virtio pci, virtio balloon (needs spec extension),
  vhost scsi.

  Plus, there are some minor fixes in a couple of places.

  Note: some net drivers are affected by these patches.  David said he's
  fine with merging these patches through my tree.

  Rusty's on vacation, he acked using my tree for these, too"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (70 commits)
  virtio_ccw: finalize_features error handling
  virtio_ccw: future-proof finalize_features
  virtio_pci: rename virtio_pci -> virtio_pci_common
  virtio_pci: update file descriptions and copyright
  virtio_pci: split out legacy device support
  virtio_pci: setup config vector indirectly
  virtio_pci: setup vqs indirectly
  virtio_pci: delete vqs indirectly
  virtio_pci: use priv for vq notification
  virtio_pci: free up vq->priv
  virtio_pci: fix coding style for structs
  virtio_pci: add isr field
  virtio: drop legacy_only driver flag
  virtio_balloon: drop legacy_only driver flag
  virtio_ccw: rev 1 devices set VIRTIO_F_VERSION_1
  virtio: allow finalize_features to fail
  virtio_ccw: legacy: don't negotiate rev 1/features
  virtio: add API to detect legacy devices
  virtio_console: fix sparse warnings
  vhost: remove unnecessary forward declarations in vhost.h
  ...
2014-12-11 12:20:31 -08:00
Al Viro
c0371da604 put iov_iter into msghdr
Note that the code _using_ ->msg_iter at that point will be very
unhappy with anything other than unshifted iovec-backed iov_iter.
We still need to convert users to proper primitives.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-09 16:29:03 -05:00
Michael S. Tsirkin
56f0dcc5aa tun: TUN_VNET_LE support, fix sparse warnings for virtio headers
Pretty straight-forward: convert all fields to/from
virtio endian-ness.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
2014-12-09 12:05:31 +02:00
Michael S. Tsirkin
40630b82c2 tun: drop most type defines
It's just as easy to use IFF_ flags directly,
there's no point in adding our own defines.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09 12:05:30 +02:00
Michael S. Tsirkin
031f5e0338 tun: move internal flag defines out of uapi
TUN_ flags are internal and never exposed
to userspace. Any application using it is almost
certainly buggy.

Move them out to tun.c.

Note: we remove these completely in follow-up patches,
this code movement is split out for ease of review.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09 12:05:30 +02:00
Al Viro
ba00410b81 Merge branch 'iov_iter' into for-next 2014-12-08 20:39:29 -05:00
Jason Wang
f51a5e82ea tun/macvtap: use consume_skb() instead of kfree_skb() when needed
To be more friendly with drop monitor, we should only call kfree_skb() when
the packets were dropped and use consume_skb() in other cases.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:45:09 -08:00
Herbert Xu
d8febb77b5 tun: Fix GSO meta-data handling in tun_get_user
When we write the GSO meta-data in tun_get_user we end up advancing
the IO vector twice, thus exhausting the user buffer before we can
finish writing the packet.

Fixes: f5ff53b4d9 ("{macvtap,tun}_get_user(): switch to iov_iter")
Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:53:47 -08:00
Al Viro
f5ff53b4d9 {macvtap,tun}_get_user(): switch to iov_iter
allows to switch macvtap and tun from ->aio_write() to ->write_iter()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 05:15:41 -05:00
Al Viro
9b067034d0 switch drivers/net/tun.c to ->read_iter()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 04:28:51 -05:00
Jason Wang
baeababb5b tun: return NET_XMIT_DROP for dropped packets
After commit 5d09710925
("tun: only queue packets on device"), NETDEV_TX_OK was returned for
dropped packets. This will confuse pktgen since dropped packets were
counted as sent ones.

Fixing this by returning NET_XMIT_DROP to let pktgen count it as error
packet.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-19 14:43:47 -05:00
Jason Wang
8c847d2541 tun: fix issues of iovec iterators using in tun_put_user()
This patch fixes two issues after using iovec iterators:
- vlan_offset should be initialized to zero, otherwise unexpected offset
  will be used in skb_copy_datagram_iter()
- advance iovec iterator when vnet_hdr_sz is greater than sizeof(gso), this
  is the case when mergeable rx buffer were enabled for a virt guest.

Fixes e0b46d0ee9 ("tun: Use iovec iterators")
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 14:33:22 -05:00
Herbert Xu
e0b46d0ee9 tun: Use iovec iterators
This patch removes the use of skb_copy_datagram_const_iovec in
favour of the iovec iterator-based skb_copy_datagram_iter.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-07 12:13:34 -05:00