* refs/heads/tmp-c71ad0f:
BACKPORT: arm64: dts: juno: fix cluster sleep state entry latency on all SoC versions
staging: android: ashmem: lseek failed due to no FMODE_LSEEK.
ANDROID: sdcardfs: update module info
ANDROID: sdcardfs: use d_splice_alias
ANDROID: sdcardfs: add read_iter/write_iter opeations
ANDROID: sdcardfs: fix ->llseek to update upper and lower offset
ANDROID: sdcardfs: copy lower inode attributes in ->ioctl
ANDROID: sdcardfs: remove unnecessary call to do_munmap
Merge 4.4.59 into android-4.4
UPSTREAM: ipv6 addrconf: implement RFC7559 router solicitation backoff
android: base-cfg: enable CONFIG_INET_DIAG_DESTROY
ANDROID: android-base.cfg: add CONFIG_MODULES option
ANDROID: android-base.cfg: add CONFIG_IKCONFIG option
ANDROID: android-base.cfg: properly sort the file
ANDROID: binder: add hwbinder,vndbinder to BINDER_DEVICES.
ANDROID: sort android-recommended.cfg
UPSTREAM: config/android: Remove CONFIG_IPV6_PRIVACY
UPSTREAM: config: android: set SELinux as default security mode
config: android: move device mapper options to recommended
ANDROID: ARM64: Allow to choose appended kernel image
UPSTREAM: arm64: vdso: constify vm_special_mapping used for aarch32 vectors page
UPSTREAM: arm64: vdso: add __init section marker to alloc_vectors_page
UPSTREAM: ARM: 8597/1: VDSO: put RO and RO after init objects into proper sections
UPSTREAM: arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime() vDSO
UPSTREAM: arm64: Refactor vDSO time functions
UPSTREAM: arm64: fix vdso-offsets.h dependency
UPSTREAM: kbuild: drop FORCE from PHONY targets
UPSTREAM: mm: add PHYS_PFN, use it in __phys_to_pfn()
UPSTREAM: ARM: 8476/1: VDSO: use PTR_ERR_OR_ZERO for vma check
Linux 4.4.58
crypto: algif_hash - avoid zero-sized array
fbcon: Fix vc attr at deinit
serial: 8250_pci: Detach low-level driver during PCI error recovery
ACPI / blacklist: Make Dell Latitude 3350 ethernet work
ACPI / blacklist: add _REV quirks for Dell Precision 5520 and 3520
uvcvideo: uvc_scan_fallback() for webcams with broken chain
s390/zcrypt: Introduce CEX6 toleration
block: allow WRITE_SAME commands with the SG_IO ioctl
vfio/spapr: Postpone allocation of userspace version of TCE table
PCI: Do any VF BAR updates before enabling the BARs
PCI: Ignore BAR updates on virtual functions
PCI: Update BARs using property bits appropriate for type
PCI: Don't update VF BARs while VF memory space is enabled
PCI: Decouple IORESOURCE_ROM_ENABLE and PCI_ROM_ADDRESS_ENABLE
PCI: Add comments about ROM BAR updating
PCI: Remove pci_resource_bar() and pci_iov_resource_bar()
PCI: Separate VF BAR updates from standard BAR updates
x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
igb: add i211 to i210 PHY workaround
igb: Workaround for igb i210 firmware issue
xen: do not re-use pirq number cached in pci device msi msg data
xfs: clear _XBF_PAGES from buffers when readahead page
USB: usbtmc: add missing endpoint sanity check
nl80211: fix dumpit error path RTNL deadlocks
xfs: fix up xfs_swap_extent_forks inline extent handling
xfs: don't allow di_size with high bit set
libceph: don't set weight to IN when OSD is destroyed
raid10: increment write counter after bio is split
cpufreq: Restore policy min/max limits on CPU online
ARM: dts: at91: sama5d2: add dma properties to UART nodes
ARM: at91: pm: cpu_idle: switch DDR to power-down mode
iommu/vt-d: Fix NULL pointer dereference in device_to_iommu
xen/acpi: upload PM state from init-domain to Xen
mmc: sdhci: Do not disable interrupts while waiting for clock
ext4: mark inode dirty after converting inline directory
parport: fix attempt to write duplicate procfiles
iio: hid-sensor-trigger: Change get poll value function order to avoid sensor properties losing after resume from S3
iio: adc: ti_am335x_adc: fix fifo overrun recovery
mmc: ushc: fix NULL-deref at probe
uwb: hwa-rc: fix NULL-deref at probe
uwb: i1480-dfu: fix NULL-deref at probe
usb: hub: Fix crash after failure to read BOS descriptor
usb: musb: cppi41: don't check early-TX-interrupt for Isoch transfer
USB: wusbcore: fix NULL-deref at probe
USB: idmouse: fix NULL-deref at probe
USB: lvtest: fix NULL-deref at probe
USB: uss720: fix NULL-deref at probe
usb-core: Add LINEAR_FRAME_INTR_BINTERVAL USB quirk
usb: gadget: f_uvc: Fix SuperSpeed companion descriptor's wBytesPerInterval
ACM gadget: fix endianness in notifications
USB: serial: qcserial: add Dell DW5811e
USB: serial: option: add Quectel UC15, UC20, EC21, and EC25 modems
ALSA: hda - Adding a group of pin definition to fix headset problem
ALSA: ctxfi: Fix the incorrect check of dma_set_mask() call
ALSA: seq: Fix racy cell insertions during snd_seq_pool_done()
Input: sur40 - validate number of endpoints before using them
Input: kbtab - validate number of endpoints before using them
Input: cm109 - validate number of endpoints before using them
Input: yealink - validate number of endpoints before using them
Input: hanwang - validate number of endpoints before using them
Input: ims-pcu - validate number of endpoints before using them
Input: iforce - validate number of endpoints before using them
Input: i8042 - add noloop quirk for Dell Embedded Box PC 3000
Input: elan_i2c - add ASUS EeeBook X205TA special touchpad fw
tcp: initialize icsk_ack.lrcvtime at session start time
socket, bpf: fix sk_filter use after free in sk_clone_lock
ipv4: provide stronger user input validation in nl_fib_input()
net: bcmgenet: remove bcmgenet_internal_phy_setup()
net/mlx5e: Count LRO packets correctly
net/mlx5: Increase number of max QPs in default profile
net: unix: properly re-increment inflight counter of GC discarded candidates
amd-xgbe: Fix jumbo MTU processing on newer hardware
net: properly release sk_frag.page
net: bcmgenet: Do not suspend PHY if Wake-on-LAN is enabled
net/openvswitch: Set the ipv6 source tunnel key address attribute correctly
Linux 4.4.57
ext4: fix fencepost in s_first_meta_bg validation
percpu: acquire pcpu_lock when updating pcpu_nr_empty_pop_pages
gfs2: Avoid alignment hole in struct lm_lockname
isdn/gigaset: fix NULL-deref at probe
target: Fix VERIFY_16 handling in sbc_parse_cdb
scsi: libiscsi: add lock around task lists to fix list corruption regression
scsi: lpfc: Add shutdown method for kexec
target/pscsi: Fix TYPE_TAPE + TYPE_MEDIMUM_CHANGER export
md/raid1/10: fix potential deadlock
powerpc/boot: Fix zImage TOC alignment
cpufreq: Fix and clean up show_cpuinfo_cur_freq()
perf/core: Fix event inheritance on fork()
give up on gcc ilog2() constant optimizations
kernek/fork.c: allocate idle task for a CPU always on its local node
hv_netvsc: use skb_get_hash() instead of a homegrown implementation
tpm_tis: Use devm_free_irq not free_irq
drm/amdgpu: add missing irq.h include
s390/pci: fix use after free in dma_init
KVM: PPC: Book3S PR: Fix illegal opcode emulation
xen/qspinlock: Don't kick CPU if IRQ is not initialized
Drivers: hv: avoid vfree() on crash
Drivers: hv: balloon: don't crash when memory is added in non-sorted order
pinctrl: cherryview: Do not mask all interrupts in probe
ACPI / video: skip evaluating _DOD when it does not exist
cxlflash: Increase cmd_per_lun for better throughput
crypto: mcryptd - Fix load failure
crypto: cryptd - Assign statesize properly
crypto: ghash-clmulni - Fix load failure
USB: don't free bandwidth_mutex too early
usb: core: hub: hub_port_init lock controller instead of bus
ANDROID: sdcardfs: Fix style issues in macros
ANDROID: sdcardfs: Use seq_puts over seq_printf
ANDROID: sdcardfs: Use to kstrout
ANDROID: sdcardfs: Use pr_[...] instead of printk
ANDROID: sdcardfs: remove unneeded null check
ANDROID: sdcardfs: Fix style issues with comments
ANDROID: sdcardfs: Fix formatting
ANDROID: sdcardfs: correct order of descriptors
fix the deadlock in xt_qtaguid when enable DDEBUG
net: ipv6: Add sysctl for minimum prefix len acceptable in RIOs.
Linux 4.4.56
futex: Add missing error handling to FUTEX_REQUEUE_PI
futex: Fix potential use-after-free in FUTEX_REQUEUE_PI
x86/perf: Fix CR4.PCE propagation to use active_mm instead of mm
x86/kasan: Fix boot with KASAN=y and PROFILE_ANNOTATED_BRANCHES=y
fscrypto: lock inode while setting encryption policy
fscrypt: fix renaming and linking special files
net sched actions: decrement module reference count after table flush.
dccp: fix memory leak during tear-down of unsuccessful connection request
dccp/tcp: fix routing redirect race
bridge: drop netfilter fake rtable unconditionally
ipv6: avoid write to a possibly cloned skb
ipv6: make ECMP route replacement less greedy
mpls: Send route delete notifications when router module is unloaded
act_connmark: avoid crashing on malformed nlattrs with null parms
uapi: fix linux/packet_diag.h userspace compilation error
vrf: Fix use-after-free in vrf_xmit
dccp: fix use-after-free in dccp_feat_activate_values
net: fix socket refcounting in skb_complete_tx_timestamp()
net: fix socket refcounting in skb_complete_wifi_ack()
tcp: fix various issues for sockets morphing to listen state
dccp: Unlock sock before calling sk_free()
net: net_enable_timestamp() can be called from irq contexts
net: don't call strlen() on the user buffer in packet_bind_spkt()
l2tp: avoid use-after-free caused by l2tp_ip_backlog_recv
ipv4: mask tos for input route
vti6: return GRE_KEY for vti6
vxlan: correctly validate VXLAN ID against VXLAN_N_VID
netlink: remove mmapped netlink support
ANDROID: mmc: core: export emmc revision
BACKPORT: mmc: core: Export device lifetime information through sysfs
ANDROID: android-verity: do not compile as independent module
ANDROID: sched: fix duplicate sched_group_energy const specifiers
config: disable CONFIG_USELIB and CONFIG_FHANDLE
ANDROID: power: align wakeup_sources format
ANDROID: dm: android-verity: allow disable dm-verity for Treble VTS
uid_sys_stats: change to use rt_mutex
ANDROID: vfs: user permission2 in notify_change2
ANDROID: sdcardfs: Fix gid issue
ANDROID: sdcardfs: Use tabs instead of spaces in multiuser.h
ANDROID: sdcardfs: Remove uninformative prints
ANDROID: sdcardfs: move path_put outside of spinlock
ANDROID: sdcardfs: Use case insensitive hash function
ANDROID: sdcardfs: declare MODULE_ALIAS_FS
ANDROID: sdcardfs: Get the blocksize from the lower fs
ANDROID: sdcardfs: Use d_invalidate instead of drop_recurisve
ANDROID: sdcardfs: Switch to internal case insensitive compare
ANDROID: sdcardfs: Use spin_lock_nested
ANDROID: sdcardfs: Replace get/put with d_lock
ANDROID: sdcardfs: rate limit warning print
ANDROID: sdcardfs: Fix case insensitive lookup
ANDROID: uid_sys_stats: account for fsync syscalls
ANDROID: sched: add a counter to track fsync
ANDROID: uid_sys_stats: fix negative write bytes.
ANDROID: uid_sys_stats: allow writing same state
ANDROID: uid_sys_stats: rename uid_cputime.c to uid_sys_stats.c
ANDROID: uid_cputime: add per-uid IO usage accounting
DTB: Add EAS compatible Juno Energy model to 'juno.dts'
arm64: dts: juno: Add idle-states to device tree
ANDROID: Replace spaces by '_' for some android filesystem tracepoints.
usb: gadget: f_accessory: Fix for UsbAccessory clean unbind.
android: binder: move global binder state into context struct.
android: binder: add padding to binder_fd_array_object.
binder: use group leader instead of open thread
nf: IDLETIMER: Use fullsock when querying uid
nf: IDLETIMER: Fix use after free condition during work
ANDROID: dm: android-verity: fix table_make_digest() error handling
ANDROID: usb: gadget: function: Fix commenting style
cpufreq: interactive governor drops bits in time calculation
ANDROID: sdcardfs: support direct-IO (DIO) operations
ANDROID: sdcardfs: implement vm_ops->page_mkwrite
ANDROID: sdcardfs: Don't bother deleting freelist
ANDROID: sdcardfs: Add missing path_put
ANDROID: sdcardfs: Fix incorrect hash
ANDROID: ext4 crypto: Disables zeroing on truncation when there's no key
ANDROID: ext4: add a non-reversible key derivation method
ANDROID: ext4: allow encrypting filenames using HEH algorithm
ANDROID: arm64/crypto: add ARMv8-CE optimized poly_hash algorithm
ANDROID: crypto: heh - factor out poly_hash algorithm
ANDROID: crypto: heh - Add Hash-Encrypt-Hash (HEH) algorithm
ANDROID: crypto: gf128mul - Add ble multiplication functions
ANDROID: crypto: gf128mul - Refactor gf128 overflow macros and tables
UPSTREAM: crypto: gf128mul - Zero memory when freeing multiplication table
ANDROID: crypto: shash - Add crypto_grab_shash() and crypto_spawn_shash_alg()
ANDROID: crypto: allow blkcipher walks over ablkcipher data
UPSTREAM: arm/arm64: crypto: assure that ECB modes don't require an IV
ANDROID: Refactor fs readpage/write tracepoints.
ANDROID: export security_path_chown
Squashfs: optimize reading uncompressed data
Squashfs: implement .readpages()
Squashfs: replace buffer_head with BIO
Squashfs: refactor page_actor
Squashfs: remove the FILE_CACHE option
ANDROID: android-recommended.cfg: CONFIG_CPU_SW_DOMAIN_PAN=y
FROMLIST: 9p: fix a potential acl leak
BACKPORT: posix_acl: Clear SGID bit when setting file permissions
UPSTREAM: udp: properly support MSG_PEEK with truncated buffers
UPSTREAM: arm64: Allow hw watchpoint of length 3,5,6 and 7
BACKPORT: arm64: hw_breakpoint: Handle inexact watchpoint addresses
UPSTREAM: arm64: Allow hw watchpoint at varied offset from base address
BACKPORT: hw_breakpoint: Allow watchpoint of length 3,5,6 and 7
ANDROID: sdcardfs: Switch strcasecmp for internal call
ANDROID: sdcardfs: switch to full_name_hash and qstr
ANDROID: sdcardfs: Add GID Derivation to sdcardfs
ANDROID: sdcardfs: Remove redundant operation
ANDROID: sdcardfs: add support for user permission isolation
ANDROID: sdcardfs: Refactor configfs interface
ANDROID: sdcardfs: Allow non-owners to touch
ANDROID: binder: fix format specifier for type binder_size_t
ANDROID: fs: Export vfs_rmdir2
ANDROID: fs: Export free_fs_struct and set_fs_pwd
BACKPORT: Input: xpad - validate USB endpoint count during probe
BACKPORT: Input: xpad - fix oops when attaching an unknown Xbox One gamepad
ANDROID: mnt: remount should propagate to slaves of slaves
ANDROID: sdcardfs: Switch ->d_inode to d_inode()
ANDROID: sdcardfs: Fix locking issue with permision fix up
ANDROID: sdcardfs: Change magic value
ANDROID: sdcardfs: Use per mount permissions
ANDROID: sdcardfs: Add gid and mask to private mount data
ANDROID: sdcardfs: User new permission2 functions
ANDROID: vfs: Add setattr2 for filesystems with per mount permissions
ANDROID: vfs: Add permission2 for filesystems with per mount permissions
ANDROID: vfs: Allow filesystems to access their private mount data
ANDROID: mnt: Add filesystem private data to mount points
ANDROID: sdcardfs: Move directory unlock before touch
ANDROID: sdcardfs: fix external storage exporting incorrect uid
ANDROID: sdcardfs: Added top to sdcardfs_inode_info
ANDROID: sdcardfs: Switch package list to RCU
ANDROID: sdcardfs: Fix locking for permission fix up
ANDROID: sdcardfs: Check for other cases on path lookup
ANDROID: sdcardfs: override umask on mkdir and create
arm64: kernel: Fix build warning
DEBUG: sched/fair: Fix sched_load_avg_cpu events for task_groups
DEBUG: sched/fair: Fix missing sched_load_avg_cpu events
UPSTREAM: l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind()
UPSTREAM: packet: fix race condition in packet_set_ring
UPSTREAM: netlink: Fix dump skb leak/double free
UPSTREAM: net: avoid signed overflows for SO_{SND|RCV}BUFFORCE
MIPS: Prevent "restoration" of MSA context in non-MSA kernels
net: socket: don't set sk_uid to garbage value in ->setattr()
ANDROID: configs: CONFIG_ARM64_SW_TTBR0_PAN=y
UPSTREAM: arm64: Disable PAN on uaccess_enable()
UPSTREAM: arm64: Enable CONFIG_ARM64_SW_TTBR0_PAN
UPSTREAM: arm64: xen: Enable user access before a privcmd hvc call
UPSTREAM: arm64: Handle faults caused by inadvertent user access with PAN enabled
BACKPORT: arm64: Disable TTBR0_EL1 during normal kernel execution
BACKPORT: arm64: Introduce uaccess_{disable,enable} functionality based on TTBR0_EL1
BACKPORT: arm64: Factor out TTBR0_EL1 post-update workaround into a specific asm macro
BACKPORT: arm64: Factor out PAN enabling/disabling into separate uaccess_* macros
UPSTREAM: arm64: alternative: add auto-nop infrastructure
UPSTREAM: arm64: barriers: introduce nops and __nops macros for NOP sequences
Revert "FROMLIST: arm64: Factor out PAN enabling/disabling into separate uaccess_* macros"
Revert "FROMLIST: arm64: Factor out TTBR0_EL1 post-update workaround into a specific asm macro"
Revert "FROMLIST: arm64: Introduce uaccess_{disable,enable} functionality based on TTBR0_EL1"
Revert "FROMLIST: arm64: Disable TTBR0_EL1 during normal kernel execution"
Revert "FROMLIST: arm64: Handle faults caused by inadvertent user access with PAN enabled"
Revert "FROMLIST: arm64: xen: Enable user access before a privcmd hvc call"
Revert "FROMLIST: arm64: Enable CONFIG_ARM64_SW_TTBR0_PAN"
ANDROID: sched/walt: fix build failure if FAIR_GROUP_SCHED=n
ANDROID: trace: net: use %pK for kernel pointers
ANDROID: android-base: Enable QUOTA related configs
net: ipv4: Don't crash if passing a null sk to ip_rt_update_pmtu.
net: inet: Support UID-based routing in IP protocols.
net: core: add UID to flows, rules, and routes
net: core: Add a UID field to struct sock.
Revert "net: core: Support UID-based routing."
UPSTREAM: efi/arm64: Don't apply MEMBLOCK_NOMAP to UEFI memory map mapping
UPSTREAM: arm64: mm: always take dirty state from new pte in ptep_set_access_flags
UPSTREAM: arm64: Implement pmdp_set_access_flags() for hardware AF/DBM
UPSTREAM: arm64: Fix typo in the pmdp_huge_get_and_clear() definition
UPSTREAM: arm64: enable CONFIG_DEBUG_RODATA by default
goldfish: enable CONFIG_INET_DIAG_DESTROY
sched/walt: kill {min,max}_capacity
sched: fix wrong truncation of walt_avg
build: fix build config kernel_dir
ANDROID: dm verity: add minimum prefetch size
build: add build server configs for goldfish
usb: gadget: Fix compilation problem with tx_qlen field
Conflicts:
android/configs/android-base.cfg
arch/arm64/Makefile
arch/arm64/include/asm/cpufeature.h
arch/arm64/kernel/vdso/gettimeofday.S
arch/arm64/mm/cache.S
drivers/md/Kconfig
drivers/misc/Makefile
drivers/mmc/host/sdhci.c
drivers/usb/core/hcd.c
drivers/usb/gadget/function/u_ether.c
fs/sdcardfs/derived_perm.c
fs/sdcardfs/file.c
fs/sdcardfs/inode.c
fs/sdcardfs/lookup.c
fs/sdcardfs/main.c
fs/sdcardfs/multiuser.h
fs/sdcardfs/packagelist.c
fs/sdcardfs/sdcardfs.h
fs/sdcardfs/super.c
include/linux/mmc/card.h
include/linux/mmc/mmc.h
include/trace/events/android_fs.h
include/trace/events/android_fs_template.h
drivers/android/binder.c
fs/exec.c
fs/ext4/crypto_key.c
fs/ext4/ext4.h
fs/ext4/inline.c
fs/ext4/inode.c
fs/ext4/readpage.c
fs/f2fs/data.c
fs/f2fs/inline.c
fs/mpage.c
include/linux/dcache.h
include/trace/events/sched.h
include/uapi/linux/ipv6.h
net/ipv4/tcp_ipv4.c
net/netfilter/xt_IDLETIMER.c
Change-Id: Ie345db6a14869fe0aa794aef4b71b5d0d503690b
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
This implements:
https://tools.ietf.org/html/rfc7559
Backoff is performed according to RFC3315 section 14:
https://tools.ietf.org/html/rfc3315#section-14
We allow setting /proc/sys/net/ipv6/conf/*/router_solicitations
to a negative value meaning an unlimited number of retransmits,
and we make this the new default (inline with the RFC).
We also add a new setting:
/proc/sys/net/ipv6/conf/*/router_solicitation_max_interval
defaulting to 1 hour (per RFC recommendation).
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Acked-by: Erik Kline <ek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bd11f0741fa5a2c296629898ad07759dd12b35bb in
DaveM's net-next/master, should make Linus' tree in 4.9-rc1)
Change-Id: Ia32cdc5c61481893ef8040734e014bf2229fc39e
This commit adds a new sysctl accept_ra_rt_info_min_plen that
defines the minimum acceptable prefix length of Route Information
Options. The new sysctl is intended to be used together with
accept_ra_rt_info_max_plen to configure a range of acceptable
prefix lengths. It is useful to prevent misconfigurations from
unintentionally blackholing too much of the IPv6 address space
(e.g., home routers announcing RIOs for fc00::/7, which is
incorrect).
[backport of net-next bbea124bc99df968011e76eba105fe964a4eceab]
Bug: 33333670
Test: net_test passes
Signed-off-by: Joel Scherpelz <jscherpelz@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added new procfs flag to toggle the automatic addition of prefix
routes on a per device basis. The new flag is accept_ra_prefix_route.
Defaults to 1 as to not break existing behavior.
CRs-Fixed: 435320
Change-Id: If25493890c7531c27f5b2c4855afebbbbf5d072a
Acked-by: Harout S. Hedeshian <harouth@qti.qualcomm.com>
Signed-off-by: Tianyi Gou <tgou@codeaurora.org>
[subashab@codeaurora.org: resolve trivial merge conflicts]
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Currently, IPv6 router discovery always puts routes into
RT6_TABLE_MAIN. This causes problems for connection managers
that want to support multiple simultaneous network connections
and want control over which one is used by default (e.g., wifi
and wired).
To work around this connection managers typically take the routes
they prefer and copy them to static routes with low metrics in
the main table. This puts the burden on the connection manager
to watch netlink to see if the routes have changed, delete the
routes when their lifetime expires, etc.
Instead, this patch adds a per-interface sysctl to have the
kernel put autoconf routes into different tables. This allows
each interface to have its own autoconf table, and choosing the
default interface (or using different interfaces at the same
time for different types of traffic) can be done using
appropriate ip rules.
The sysctl behaves as follows:
- = 0: default. Put routes into RT6_TABLE_MAIN as before.
- > 0: manual. Put routes into the specified table.
- < 0: automatic. Add the absolute value of the sysctl to the
device's ifindex, and use that table.
The automatic mode is most useful in conjunction with
net.ipv6.conf.default.accept_ra_rt_table. A connection manager
or distribution could set it to, say, -100 on boot, and
thereafter just use IP rules.
Change-Id: I82d16e3737d9cdfa6489e649e247894d0d60cbb1
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
This patch addresses multiple problems :
UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
while socket is not locked : Other threads can change np->opt
concurrently. Dmitry posted a syzkaller
(http://github.com/google/syzkaller) program desmonstrating
use-after-free.
Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
and dccp_v6_request_recv_sock() also need to use RCU protection
to dereference np->opt once (before calling ipv6_dup_options())
This patch adds full RCU protection to np->opt
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
SYN_RECV & TIMEWAIT sockets are not full blown, they do not have a pinet6
pointer.
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like the ipv4 patch with a similar title, this adds a sysctl to allow
the user to change routing behavior based on whether or not the
interface associated with the nexthop was an up or down link. The
default setting preserves the current behavior, but anyone that enables
it will notice that nexthops on down interfaces will no longer be
selected:
net.ipv6.conf.all.ignore_routes_with_linkdown = 0
net.ipv6.conf.default.ignore_routes_with_linkdown = 0
net.ipv6.conf.lo.ignore_routes_with_linkdown = 0
...
When the above sysctls are set, not only will link status be reported to
userspace, but an indication that a nexthop is dead and will not be used
is also reported.
1000::/8 via 7000::2 dev p7p1 metric 1024 dead linkdown pref medium
1000::/8 via 8000::2 dev p8p1 metric 1024 pref medium
7000::/8 dev p7p1 proto kernel metric 256 dead linkdown pref medium
8000::/8 dev p8p1 proto kernel metric 256 pref medium
9000::/8 via 8000::2 dev p8p1 metric 2048 pref medium
9000::/8 via 7000::2 dev p7p1 metric 1024 dead linkdown pref medium
fe80::/64 dev p7p1 proto kernel metric 256 dead linkdown pref medium
fe80::/64 dev p8p1 proto kernel metric 256 pref medium
This also adds devconf support and notification when sysctl values
change.
v2: drop use of rt6i_nhflags since it is not needed right now
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 6fd99094de ("ipv6: Don't reduce hop limit for an interface")
disabled accept hop limit from RA if it is smaller than the current hop
limit for security stuff. But this behavior kind of break the RFC definition.
RFC 4861, 6.3.4. Processing Received Router Advertisements
A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
and Retrans Timer) may contain a value denoting that it is
unspecified. In such cases, the parameter should be ignored and the
host should continue using whatever value it is already using.
If the received Cur Hop Limit value is non-zero, the host SHOULD set
its CurHopLimit variable to the received value.
So add sysctl option accept_ra_min_hop_limit to let user choose the minimum
hop limit value they can accept from RA. And set default to 1 to meet RFC
standards.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Per RFC 6724, section 4, "Candidate Source Addresses":
It is RECOMMENDED that the candidate source addresses be the set
of unicast addresses assigned to the interface that will be used
to send to the destination (the "outgoing" interface).
Add a sysctl to enable this behaviour.
Signed-off-by: Erik Kline <ek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the procfs logic for the stable_address knob:
The secret is formatted as an ipv6 address and will be stored per
interface and per namespace. We track initialized flag and return EIO
errors until the secret is set.
We don't inherit the secret to newly created namespaces.
Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull IPv6 cork initialization into its own function that
can be re-used. IPv6 specific cork data did not have an
explicit data structure. This patch creats eone so that
just ipv6 cork data can be as arguemts. Also, since
IPv6 tries to save the flow label into inet_cork_full
tructure, pass the full cork.
Adjust ip6_cork_release() to take cork data structures.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel forcefully applies MTU values received in router
advertisements provided the new MTU is less than the current. This
behavior is undesirable when the user space is managing the MTU. Instead
a sysctl flag 'accept_ra_mtu' is introduced such that the user space
can control whether or not RA provided MTU updates should be applied. The
default behavior is unchanged; user space must explicitly set this flag
to 0 for RA MTUs to be ignored.
Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is only used in net/ipv6/inet6_hashtables.c.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a sysctl that causes an interface's optimistic addresses
to be considered equivalent to other non-deprecated addresses
for source address selection purposes. Preferred addresses
will still take precedence over optimistic addresses, subject
to other ranking in the source address selection algorithm.
This is useful where different interfaces are connected to
different networks from different ISPs (e.g., a cell network
and a home wifi network).
The current behaviour complies with RFC 3484/6724, and it
makes sense if the host has only one interface, or has
multiple interfaces on the same network (same or cooperating
administrative domain(s), but not in the multiple distinct
networks case.
For example, if a mobile device has an IPv6 address on an LTE
network and then connects to IPv6-enabled wifi, while the wifi
IPv6 address is undergoing DAD, IPv6 connections will try use
the wifi default route with the LTE IPv6 address, and will get
stuck until they time out.
Also, because optimistic nodes can receive frames, issue
an RTM_NEWADDR as soon as DAD starts (with the IFA_F_OPTIMSTIC
flag appropriately set). A second RTM_NEWADDR is sent if DAD
completes (the address flags have changed), otherwise an
RTM_DELADDR is sent.
Also: add an entry in ip-sysctl.txt for optimistic_dad.
Signed-off-by: Erik Kline <ek@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Automatically generate flow labels for IPv6 packets on transmit.
The flow label is computed based on skb_get_hash. The flow label will
only automatically be set when it is zero otherwise (i.e. flow label
manager hasn't set one). This supports the transmit side functionality
of RFC 6438.
Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior
system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this
functionality per socket.
By default, auto flowlabels are disabled to avoid possible conflicts
with flow label manager, however if this feature proves useful we
may want to enable it by default.
It should also be noted that FreeBSD has already implemented automatic
flow labels (including the sysctl and socket option). In FreeBSD,
automatic flow labels default to enabled.
Performance impact:
Running super_netperf with 200 flows for TCP_RR and UDP_RR for
IPv6. Note that in UDP case, __skb_get_hash will be called for
every packet with explains slight regression. In the TCP case
the hash is saved in the socket so there is no regression.
Automatic flow labels disabled:
TCP_RR:
86.53% CPU utilization
127/195/322 90/95/99% latencies
1.40498e+06 tps
UDP_RR:
90.70% CPU utilization
118/168/243 90/95/99% latencies
1.50309e+06 tps
Automatic flow labels enabled:
TCP_RR:
85.90% CPU utilization
128/199/337 90/95/99% latencies
1.40051e+06
UDP_RR
92.61% CPU utilization
115/164/236 90/95/99% latencies
1.4687e+06
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an UDP application switches from AF_INET to AF_INET6 sockets, we
have a small performance degradation for IPv4 communications because of
extra cache line misses to access ipv6only information.
This can also be noticed for TCP listeners, as ipv6_only_sock() is also
used from __inet_lookup_listener()->compute_score()
This is magnified when SO_REUSEPORT is used.
Move ipv6only into struct sock_common so that it is available at
no extra cost in lookups.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This can be used in virtual networking applications, and
may have other uses as well. The option is disabled by
default.
A specific use case is setting up virtual routers, bridges, and
hosts on a single OS without the use of network namespaces or
virtual machines. With proper use of ip rules, routing tables,
veth interface pairs and/or other virtual interfaces,
and applications that can bind to interfaces and/or IP addresses,
it is possibly to create one or more virtual routers with multiple
hosts attached. The host interfaces can act as IPv6 systems,
with radvd running on the ports in the virtual routers. With the
option provided in this patch enabled, those hosts can now properly
obtain IPv6 addresses from the radvd.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since pktops is only used for IPv6 only and opts is used for IPv4
only, we can move these fields into a union and this allows us to drop
the inet6_reqsk_alloc function as after this change it becomes
equivalent with inet_reqsk_alloc.
This patch also fixes a kmemcheck issue in the IPv6 stack: the flags
field was not annotated after a request_sock was allocated.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We currently don't report IPV6_RECVPKTINFO in cmsg access ancillary data
for IPv4 datagrams on IPv6 sockets.
This patch splits the ip6_datagram_recv_ctl into two functions, one
which handles both protocol families, AF_INET and AF_INET6, while the
ip6_datagram_recv_specific_ctl only handles IPv6 cmsg data.
ip6_datagram_recv_*_ctl never reported back any errors, so we can make
them return void. Also provide a helper for protocols which don't offer dual
personality to further use ip6_datagram_recv_ctl, which is exported to
modules.
I needed to shuffle the code for ping around a bit to make it easier to
implement dual personality for ping ipv6 sockets in future.
Reported-by: Gert Doering <gert@space.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this option, the socket will reply with the flow label value read
on received packets.
The goal is to have a connection with the same flow label in both
direction of the communication.
Changelog of V4:
* Do not erase the flow label on the listening socket. Use pktopts to
store the received value
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPV6_PMTU_INTERFACE is the same as IPV6_PMTU_PROBE for ipv6. Add it
nontheless for symmetry with IPv4 sockets. Also drop incoming MTU
information if this mode is enabled.
The additional bit in ipv6_pinfo just eats in the padding behind the
bitfield. There are no changes to the layout of the struct at all.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
tclass information in now already stored in rcv_flowinfo
We do not need to store the same information twice.
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current implementation of IPV6_FLOWINFO only gives a
result if pktoptions is available (thanks to the
ip6_datagram_recv_ctl function).
It gives inconsistent results to user space, sometimes
there is a result for getsockopt(IPV6_FLOWINFO), sometimes
not.
This patch add rcv_flowinfo to store it, and return it to
the userspace in the same way than other pkt_options.
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code to detect fragments in checksum_setup() was missing for IPv4 and
too eager for IPv6. (It transpires that Windows seems to send IPv6 packets
with a fragment header even if they are not a fragment - i.e. offset is zero,
and M bit is not set).
This patch also incorporates a fix to callers of maybe_pull_tail() where
skb->network_header was being erroneously added to the length argument.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
cc: David Miller <davem@davemloft.net>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code for privacy extentions is very mature, and making it
configurable only gives marginal memory/code savings in exchange
for obfuscation and hard to read code via CPP ifdef'ery.
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP listener refactoring, part 5 :
We want to be able to insert request sockets (SYN_RECV) into main
ehash table instead of the per listener hash table to allow RCU
lookups and remove listener lock contention.
This patch includes the needed struct sock_common in front
of struct request_sock
This means there is no more inet6_request_sock IPv6 specific
structure.
Following inet_request_sock fields were renamed as they became
macros to reference fields from struct sock_common.
Prefix ir_ was chosen to avoid name collisions.
loc_port -> ir_loc_port
loc_addr -> ir_loc_addr
rmt_addr -> ir_rmt_addr
rmt_port -> ir_rmt_port
iif -> ir_iif
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP listener refactoring, part 4 :
To speed up inet lookups, we moved IPv4 addresses from inet to struct
sock_common
Now is time to do the same for IPv6, because it permits us to have fast
lookups for all kind of sockets, including upcoming SYN_RECV.
Getting IPv6 addresses in TCP lookups currently requires two extra cache
lines, plus a dereference (and memory stall).
inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6
This patch is way bigger than its IPv4 counter part, because for IPv4,
we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
it's not doable easily.
inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr
And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
at the same offset.
We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
macro.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP listener refactoring, part 2 :
We can use a generic lookup, sockets being in whatever state, if
we are sure all relevant fields are at the same place in all socket
types (ESTABLISH, TIME_WAIT, SYN_RECV)
This patch removes these macros :
inet_addrpair, inet_addrpair, tw_addrpair, tw_portpair
And adds :
sk_portpair, sk_addrpair, sk_daddr, sk_rcv_saddr
Then, INET_TW_MATCH() is really the same than INET_MATCH()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements RFC6980: Drop fragmented ndisc packets by
default. If a fragmented ndisc packet is received the user is informed
that it is possible to disable the check.
Cc: Fernando Gont <fernando@gont.com.ar>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/wireless/iwlwifi/pcie/trans.c
include/linux/inetdevice.h
The inetdevice.h conflict involves moving the IPV4_DEVCONF values
into a UAPI header, overlapping additions of some new entries.
The iwlwifi conflict is a context overlap.
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not allowed for an ipv6 packet to contain multiple fragmentation
headers. So discard packets which were already reassembled by
fragmentation logic and send back a parameter problem icmp.
The updates for RFC 6980 will come in later, I have to do a bit more
research here.
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit cab70040df ("net: igmp:
Reduce Unsolicited report interval to 1s when using IGMPv3") and
2690048c01 ("net: igmp: Allow user-space
configuration of igmp unsolicited report interval") by William Manley made
igmp unsolicited report intervals configurable per interface and corrected
the interval of unsolicited igmpv3 report messages resendings to 1s.
Same needs to be done for IPv6:
MLDv1 (RFC2710 7.10.): 10 seconds
MLDv2 (RFC3810 9.11.): 1 second
Both intervals are configurable via new procfs knobs
mldv1_unsolicited_report_interval and mldv2_unsolicited_report_interval.
(also added .force_mld_version to ipv6_devconf_dflt to bring structs in
line without semantic changes)
v2:
a) Joined documentation update for IPv4 and IPv6 MLD/IGMP
unsolicited_report_interval procfs knobs.
b) incorporate stylistic feedback from William Manley
v3:
a) add new DEVCONF_* values to the end of the enum (thanks to David
Miller)
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: William Manley <william.manley@youview.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Router Alert option is very small and we can store the value
itself in the skb.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 7a3198a8 ("ipv6: helper function to get tclass") introduced
ipv6_tclass(), but similar function is already available as
ipv6_get_dsfield().
We might be able to call ipv6_tclass() from ipv6_get_dsfield(),
but it is confusing to have two versions.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support in the kernel for offloading in the NIC Tx and Rx
checksumming for encapsulated packets (such as VXLAN and IP GRE).
For Tx encapsulation offload, the driver will need to set the right bits
in netdev->hw_enc_features. The protocol driver will have to set the
skb->encapsulation bit and populate the inner headers, so the NIC driver will
use those inner headers to calculate the csum in hardware.
For Rx encapsulation offload, the driver will need to set again the
skb->encapsulation flag and the skb->ip_csum to CHECKSUM_UNNECESSARY.
In that case the protocol driver should push the decapsulated packet up
to the stack, again with CHECKSUM_UNNECESSARY. In ether case, the protocol
driver should set the skb->encapsulation flag back to zero. Finally the
protocol driver should have NETIF_F_RXCSUM flag set in its features.
Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 68835aba4d (net: optimize INET input path further)
moved some fields used for tcp/udp sockets lookup in the first cache
line of struct sock_common.
This patch moves inet_dport/inet_num as well, filling a 32bit hole
on 64 bit arches and reducing number of cache line misses in lookups.
Also change INET_MATCH()/INET_TW_MATCH() to perform the ports match
before addresses match, as this check is more discriminant.
Remove the hash check from MATCH() macros because we dont need to
re validate the hash value after taking a refcount on socket, and
use likely/unlikely compiler hints, as the sk_hash/hash check
makes the following conditional tests 100% predicted by cpu.
Introduce skc_addrpair/skc_portpair pair values to better
document the alignment requirements of the port/addr pairs
used in the various MATCH() macros, and remove some casts.
The namespace check can also be done at last.
This slightly improves TCP/UDP lookup times.
IP/TCP early demux needs inet->rx_dst_ifindex and
TCP needs inet->min_ttl, lets group them together in same cache line.
With help from Ben Hutchings & Joe Perches.
Idea of this patch came after Ling Ma proposal to move skc_hash
to the beginning of struct sock_common, and should allow him
to submit a final version of his patch. My tests show an improvement
doing so.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Joe Perches <joe@perches.com>
Cc: Ling Ma <ling.ma.program@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces a new knob ndisc_notify. If enabled, the kernel
will transmit an unsolicited neighbour advertisement on link-layer address
change to update the neighbour tables of the corresponding hosts more quickly.
This is the equivalent to arp_notify in ipv4 world.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>
The IPv6 conntrack fragmentation currently has a couple of shortcomings.
Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the
defragmented packet is then passed to conntrack, the resulting conntrack
information is attached to each original fragment and the fragments then
continue their way through the stack.
Helper invocation occurs in the POSTROUTING hook, at which point only
the original fragments are available. The result of this is that
fragmented packets are never passed to helpers.
This patch improves the situation in the following way:
- If a reassembled packet belongs to a connection that has a helper
assigned, the reassembled packet is passed through the stack instead
of the original fragments.
- During defragmentation, the largest received fragment size is stored.
On output, the packet is refragmented if required. If the largest
received fragment size exceeds the outgoing MTU, a "packet too big"
message is generated, thus behaving as if the original fragments
were passed through the stack from an outside point of view.
- The ipv6_helper() hook function can't receive fragments anymore for
connections using a helper, so it is switched to use ipv6_skip_exthdr()
instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the
reassembled packets are passed to connection tracking helpers.
The result of this is that we can properly track fragmented packets, but
still generate ICMPv6 Packet too big messages if we would have before.
This patch is also required as a precondition for IPv6 NAT, where NAT
helpers might enlarge packets up to a point that they require
fragmentation. In that case we can't generate Packet too big messages
since the proper MTU can't be calculated in all cases (f.i. when
changing textual representation of a variable amount of addresses),
so the packet is transparently fragmented iff the original packet or
fragments would have fit the outgoing MTU.
IPVS parts by Jesper Dangaard Brouer <brouer@redhat.com>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
IPv6 needs a cookie in dst_check() call.
We need to add rx_dst_cookie and provide a family independent
sk_rx_dst_set(sk, skb) method to properly support IPv6 TCP early demux.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should provide to inet6_csk_route_socket a struct flowi6 pointer,
so that net6_csk_xmit() works correctly instead of sending garbage.
Also add some consts
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, it is not easily possible to get TOS/DSCP value of packets from
an incoming TCP stream. The mechanism is there, IP_PKTOPTIONS getsockopt
with IP_RECVTOS set, the same way as incoming TTL can be queried. This is
not actually implemented for TOS, though.
This patch adds this functionality, both for IPv4 (IP_PKTOPTIONS) and IPv6
(IPV6_2292PKTOPTIONS). For IPv4, like in the IP_RECVTTL case, the value of
the TOS field is stored from the other party's ACK.
This is needed for proxies which require DSCP transparency. One such example
is at http://zph.bratcheda.org/.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement helper inline function to get traffic class from IPv6 header.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IPV6_UNICAST_IF feature is the IPv6 compliment to IP_UNICAST_IF.
Signed-off-by: Erich E. Hoover <ehoover@mines.edu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6_sk_mc_lock rwlock becomes a spinlock.
readers (inet6_mc_check()) now takes rcu_read_lock() instead of read
lock. Writers dont need to disable BH anymore.
struct ipv6_mc_socklist objects are reclaimed after one RCU grace
period.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>