android_kernel_oneplus_msm8998/net
Neil Horman 0e3125c755 packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Version 4 of this patch.

Change notes:
1) Removed extra memset.  Didn't think kcalloc added a GFP_ZERO the way kzalloc did :)

Summary:
It was shown to me recently that systems under high load were driven very deep
into swap when tcpdump was run.  The reason this happened was because the
AF_PACKET protocol has a SET_RINGBUFFER socket option that allows the user space
application to specify how many entries an AF_PACKET socket will have and how
large each entry will be.  It seems the default setting for tcpdump is to set
the ring buffer to 32 entries of 64 Kb each, which implies 32 order 5
allocation.  Thats difficult under good circumstances, and horrid under memory
pressure.

I thought it would be good to make that a bit more usable.  I was going to do a
simple conversion of the ring buffer from contigous pages to iovecs, but
unfortunately, the metadata which AF_PACKET places in these buffers can easily
span a page boundary, and given that these buffers get mapped into user space,
and the data layout doesn't easily allow for a change to padding between frames
to avoid that, a simple iovec change is just going to break user space ABI
consistency.

So I've done this, I've added a three tiered mechanism to the af_packet set_ring
socket option.  It attempts to allocate memory in the following order:

1) Using __get_free_pages with GFP_NORETRY set, so as to fail quickly without
digging into swap

2) Using vmalloc

3) Using __get_free_pages with GFP_NORETRY clear, causing us to try as hard as
needed to get the memory

The effect is that we don't disturb the system as much when we're under load,
while still being able to conduct tcpdumps effectively.

Tested successfully by me.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Maciej Żenczykowski <zenczykowski@gmail.com>
Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-16 10:26:47 -08:00
..
9p
802
8021q vlan: Fix build warning in vlandev_seq_show() 2010-11-15 10:37:30 -08:00
appletalk
atm
ax25
bluetooth
bridge bridge: add RCU annotations to bridge port lookup 2010-11-15 11:13:18 -08:00
caif
can
ceph
core net: Export netif_get_vlan_features(). 2010-11-15 20:15:03 -08:00
dcb
dccp
decnet
dns_resolver
dsa
econet
ethernet
ieee802154
ipv4 xfrm: use gre key as flow upper protocol info 2010-11-15 10:44:04 -08:00
ipv6 ipv6: fix missing in6_ifa_put in addrconf 2010-11-16 09:24:58 -08:00
ipx
irda
iucv
key
l2tp
lapb
llc
mac80211 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-11-16 09:17:12 -08:00
netfilter
netlabel
netlink
netrom
packet packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4) 2010-11-16 10:26:47 -08:00
phonet
rds
rfkill
rose
rxrpc
sched
sctp
sunrpc
tipc
unix
wanrouter
wimax
wireless Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-11-16 09:17:12 -08:00
x25
xfrm
compat.c
Kconfig
Makefile
nonet.c
socket.c
sysctl_net.c
TUNABLE