The statistics registers have write-clear behaviour, which means we
will lose any increment between the read and write. Mitigate this by
only clearing when we read a non-zero value, so we will never falsely
report a total of zero. This also saves time as we only handle
error statistics here and they won't often be incremented.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are many different sets of registers implemented by the
different versions of this controller, and we can only expect this to
get more complicated in future. Limit how much ethtool needs to know
by including an explicit bitmap of which registers are included in the
dump, allowing room for future growth in the number of possible
registers.
As I don't have datasheets for all of these, I've only included
registers that are:
- defined in all 5 register type arrays, or
- used by the driver, or
- documented in the datasheet I have
Add one new capability flag so we can tell whether the RTRATE
register is implemented.
Delete the TSU_ADRL0 and TSU_ADR{H,L}31 definitions, as they weren't
used and the address table is already assumed to be contiguous.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we may silently read/write a register at offset 0. Change
this to WARN and then ignore the write or read-back all-ones.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At least on the R8A7790, RFS8 reflects the RINT8 (multicast) MAC
status flag.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
- minor timeout & other fixes on reservation/fence
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU9q1GAAoJEAG+/NWsLn5bzhEQAOOVtIPmM2mF/FChUXROAv/M
AFNGamRi+F1cJH2dWNJCUKpqBDKw/73g84Qj7LH7VcnBZO3nc8fy9cBYIJ3yBrYE
fcUl88yzm7iQSlunlayBT+krItCxzzhVjS31OoEsQKlm1UdllYre0VPX/F50Zz2q
eB+ts6pRj7qRBfrX4bJ/zAoaAvLnHYxLCF/ib6toHCY2q0gnSzKq+LCUheGAKcc9
6Yi8uDd3oqOU9TJbP5kQgUhNgv/CmWeHATssmpbf81WUSavg7+WDQAy+eQz65z93
DIdP97GWEz+I9ClU1u0eG8JYUozvQ/YiY/DMqDcrZxoyTVSMdyRlnTMAXZZDVExJ
Op/n2IGj6Bu7sluDdtGvO8xSypCufLZm29upUWtmLsR5tsTtBcKGidBC0sqfpdSI
pDq4+KpFfxCfCx0iFnDATHnjFANNUZIOt14+eKnQZ58mximYphn/0iLO4q68VvSE
1E4JVMTOGXKcsA0XLqbIz8lsCEOva29svooXrOA7Hybb7bUnWXAfMMthvyAhWWVe
At0ZBJF1Ajx/UadfvhdRMuiZDPCbxrTcU344rxxANHfjBiDX99VqyvCoSFEqAMyL
iDhBK8xkOB/5f9k9KEd3dGU9ABvTmrg9sw2UbS0rDxkU0yZLE2+ajseQTFmJB4Dh
qHwFXLXN91FM4NwgeS73
=k3LO
-----END PGP SIGNATURE-----
Merge tag 'dma-buf-for-4.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf
Pull dma-buf fixes from Sumit Semwal:
"Minor timeout & other fixes on reservation/fence"
* tag 'dma-buf-for-4.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf:
reservation: Remove shadowing local variable 'ret'
dma-buf/fence: don't wait when specified timeout is zero
reservation: wait only with non-zero timeout specified (v3)
In general, if a transaction object is added to the list successfully,
we can rely on the abort path to undo what we've done. This allows us to
simplify the error handling of the rule replacement path in
nf_tables_newrule().
This implicitly fixes an unnecessary removal of the old rule, which
needs to be left in place if we fail to replace.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The NFT_USERDATA_MAXLEN is defined to 256, however we only have a u8
to store its size. Introduce a struct nft_userdata which contains a
length field and indicate its presence using a single bit in the rule.
The length field of struct nft_userdata is also a u8, however we don't
store zero sized data, so the actual length is udata->len + 1.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Check that the space required for the expressions doesn't exceed the
size of the dlen field, which would lead to the iterators crashing.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
A race condition exists in the rule transaction code for rules that
get added and removed within the same transaction.
The new rule starts out as inactive in the current and active in the
next generation and is inserted into the ruleset. When it is deleted,
it is additionally set to inactive in the next generation as well.
On commit the next generation is begun, then the actions are finalized.
For the new rule this would mean clearing out the inactive bit for
the previously current, now next generation.
However nft_rule_clear() clears out the bits for *both* generations,
activating the rule in the current generation, where it should be
deactivated due to being deleted. The rule will thus be active until
the deletion is finalized, removing the rule from the ruleset.
Similarly, when aborting a transaction for the same case, the undo
of insertion will remove it from the RCU protected rule list, the
deletion will clear out all bits. However until the next RCU
synchronization after all operations have been undone, the rule is
active on CPUs which can still see the rule on the list.
Generally, there may never be any modifications of the current
generations' inactive bit since this defeats the entire purpose of
atomicity. Change nft_rule_clear() to only touch the next generations
bit to fix this.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
DMA_BIT_MASK of 64 is not valid dma address mask for OMAPs, it should be
set to 32.
The 64 was introduced by commit (in 2009):
a152ff24b9 ASoC: OMAP: Make DMA 64 aligned
But the dma_mask and coherent_dma_mask can not be used to specify alignment.
Fixes: a152ff24b9 (ASoC: OMAP: Make DMA 64 aligned)
Reported-by: Grygorii Strashko <Grygorii.Strashko@linaro.org>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
- Fix for dynticks.
- Fix for smpboot bug.
- Fix for IOMMU group refcounting.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU9mwsAAoJEFHr6jzI4aWAS28QALF6rTNK0lywqO+eV75BRLco
z8eINZozRJBBXjt5ATp4zJSAjOpnRNFlclfDPEcYav51cyxZRREbD322rMYR5uOL
9ixnOTDGpOFE9Sgt562OEVmRadAVrJf9gKw7/Cjebxb9XgfwpepvwBfN8Xxl1yc+
LnvnkJN327nDaBnWP7eTSkTeiznhDCdXz0HYmO1zUnrico2AZ/9qVdd+XD/QTk6o
t6m+aTowy9+4Ps7B4Y/l5dmKuOruFMxwSWFhUZrC6huvArJrYAMGvMT7++r2JTW4
tyHBCuB3pWHBm3BjlcR23sZoi8jPVkulKKLShGiXwaZxCSOhwMLwwFFVBHw2gJIW
grzzAtkM5cXACmGKiCWzAc8WMM0+u4vz/w1md+Nt4XvtZssRJwbXdJJHh7nXq+RE
M7IDnyvLI5m8Ae7qAnAts2Lf1SRZALY0zOUyS0jsn6jDJUw49BbKD54RuFqNzuhL
70dEsHy+awHEagIO47q+JEFWPaG20HzupVDe33j7KEx20xROvC4R2u+LIEG8YiJH
ve4xDVwTM7rVqvIvb17Rs0LLqKs8P9O7sQFoQf2V/xie/X1QwHDZsC24C4r07M/+
7ZqL9og3/nMo9SIKE1eNgBHq0FmUgOYkoaF+biDuzava9JYNxeT7nOhFfOqNz4RO
M+tgZwHU1NDpVHJerdP3
=8wky
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux
Pull powerpc fixes from Michael Ellerman:
- Fix for dynticks.
- Fix for smpboot bug.
- Fix for IOMMU group refcounting.
* tag 'powerpc-4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
powerpc/iommu: Remove IOMMU device references via bus notifier
powerpc/smp: Wait until secondaries are active & online
powerpc: Re-enable dynticks
In seq_buf_vprintf(), vsnprintf() is used to copy the format into the
buffer remaining in the seq_buf structure. The return of vsnprintf()
is the amount of characters written to the buffer excluding the '\0',
unless the line was truncated!
If the line copied does not fit, it is truncated, and a '\0' is added
to the end of the buffer. But in this case, '\0' is included in the length
of the line written. To know if the buffer had overflowed, the return
length will be the same as the length of the buffer passed in.
The check in seq_buf_vprintf() only checked if the length returned from
vsnprintf() would fit in the buffer, as the seq_buf_vprintf() is only
to be an all or nothing command. It either writes all the string into
the seq_buf, or none of it. If the string is truncated, the pointers
inside the seq_buf must be reset to what they were when the function was
called. This is not the case. On overflow, it copies only part of the string.
The fix is to change the overflow check to see if the length returned from
vsnprintf() is less than the length remaining in the seq_buf buffer, and not
if it is less than or equal to as it currently does. Then seq_buf_vprintf()
will know if the write from vsnpritnf() was truncated or not.
Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Report the actual error code from acpi_bus_register_driver(), it may
help future debugging (typically ENODEV as previously reported, but the
unusual cases are where it may help most).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
i915.ko depends upon the acpi/video.ko module and so refuses to load if
ACPI is disabled at runtime if for example the BIOS is broken beyond
repair. acpi/video provides an optional service for i915.ko and so we
should just allow the modules to load, but do no nothing in order to let
the machines boot correctly.
Reported-by: Bill Augur <bill-auger@programmer.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: All applicable <stable@vger.kernel.org>
Acked-by: Aaron Lu <aaron.lu@intel.com>
[ rjw: Fixed up the new comment in acpi_video_init() ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Not all firmware revisions have a proper
multi-interface client powersaving implementation,
e.g. qca6174 WLAN.RM.2.0-00073.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
qca6184 WLAN.RM.2.0-00073 has a bug in sta
powersave state machine and requires peer param to
be poked to enable the powersave.
Calling this unconditionally should be safe for
other chips/firmwares.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
To keep consisitency with the rest of the file, use 'genpd' as the
name of the 'struct generic_pm_domain' pointer instead of 'gpd'.
This is just a rename, no functional changes.
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If CONFIG_SMP=n, <linux/smp.h> does not include <asm/smp.h>, causing:
drivers/cpufreq/ppc-corenet-cpufreq.c: In function 'corenet_cpufreq_cpu_init':
drivers/cpufreq/ppc-corenet-cpufreq.c:173:3: error: implicit declaration of function 'get_hard_smp_processor_id' [-Werror=implicit-function-declaration]
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some BIOSes report incorrect length for ACPI address space descriptors,
so relax the checks to avoid regressions. This issue has appeared several
times as:
3162b6f0c5 ("PNPACPI: truncate _CRS windows with _LEN > _MAX - _MIN + 1")
d558b483d5 ("x86/PCI: truncate _CRS windows with _LEN > _MAX - _MIN + 1")
f238b414a7 ("PNPACPI: compute Address Space length rather than using _LEN")
48728e0774 ("x86/PCI: compute Address Space length rather than using _LEN")
Please refer to https://bugzilla.kernel.org/show_bug.cgi?id=94221
for more details and example malformed ACPI resource descriptors.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=94221
Fixes: 593669c2ac (x86/PCI/ACPI: Use common ACPI resource interfaces ...)
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Dave Airlie <airlied@redhat.com>
Tested-by: Prakash Punnoor <prakash@punnoor.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When parsing resources for PCI host bridge, we should ignore resources
consumed by host bridge itself and only report window resources available
to child PCI busses.
Fixes: 593669c2ac (x86/PCI/ACPI: Use common ACPI resource interfaces ...)
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Below are the refinements.
1. Set DMA abort bit when disabling dma channel. This will clear
the remaining data in dma FIFO, to fix channel-swap issue.
2. Read DMA HW pointer when updating DMA status. Previously dma
position is calculated by adding one period size in dma interrupt.
This is inaccurate/insufficient for some high-quality audio APP.
Since interrupt bottom half handler has variable schedule delay,
it causes big error when calculating sample delay. Read the actual
HW pointer and feedback can improve the accuracy.
3. Do some minor code clean.
Signed-off-by: Qiao Zhou <zhouqiao@marvell.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
The commit [39f802d6b6: 'regulator: Build sysfs entries with static
attribute groups'] converted the sysfs entry creation to static
attribute groups, but this resulted in a regression due to the NULL
check of rdev->constraints. At the point where the device is
registered, rdev->constraints isn't set, so the attributes depending
on it are missing.
We may fix it by shuffling the code order in regulator_register(), but
a quicker fix is to just remove this NULL check. rdev->constraints is
in anyway always set to non-NULL in set_machine_constraints(), thus
the check there is basically superfluous.
Fixes: 39f802d6b6 ('regulator: Build sysfs entries with static attribute groups')
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reportded-by: Steve Twiss <stwiss.opensource@diasemi.com>
Tested-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
During wmm tests changing wmm parameters did not change anything.
This was because of mismatch in WMM params per vdev command.
WMM params per vdev uses different command structure than wmm params
per pdev command.
Patch concerns qca6174.
Signed-off-by: Marek Puzyniak <marek.puzyniak@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
qca6174 WLAN.RM.2.0-00073 firmware uses full rx
reordering offload and delivers Rx via a new HTT
event. The event however is incorrectly generated
in firmware and becomes overly long (with trailing
garbage). This was hitting defined CE buffer limit
that was programmed to the device and caused
device to crash upon busier Rx traffic.
Increasing the CE buffer limit for HTT Rx pipe to
2KBytes seems to be enough to workaround this
problem.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The release_firmware() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
With patch "include guest facilities in kvm facility test" it is no
longer necessary to have special handling for the non-LPAR case.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Most facility related decisions in KVM have to take into account:
- the facilities offered by the underlying run container (LPAR/VM)
- the facilities supported by the KVM code itself
- the facilities requested by a guest VM
This patch adds the KVM driver requested facilities to the test routine.
It additionally renames struct s390_model_fac to kvm_s390_fac and its field
names to be more meaningful.
The semantics of the facilities stored in the KVM architecture structure
is changed. The address arch.model.fac->list now points to the guest
facility list and arch.model.fac->mask points to the KVM facility mask.
This patch fixes the behaviour of KVM for some facilities for guests
that ignore the guest visible facility bits, e.g. guests could use
transactional memory intructions on hosts supporting them even if the
chosen cpu model would not offer them.
The userspace interface is not affected by this change.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The facility lists were not fully copied.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Under z/VM PQAP might trigger an operation exception if no crypto cards
are defined via APVIRTUAL or APDEDICATED.
[ 386.098666] Kernel BUG at 0000000000135c56 [verbose debug info unavailable]
[ 386.098693] illegal operation: 0001 ilc:2 [#1] SMP
[...]
[ 386.098751] Krnl PSW : 0704c00180000000 0000000000135c56 (kvm_s390_apxa_installed+0x46/0x98)
[...]
[ 386.098804] [<000000000013627c>] kvm_arch_init_vm+0x29c/0x358
[ 386.098806] [<000000000012d008>] kvm_dev_ioctl+0xc0/0x460
[ 386.098809] [<00000000002c639a>] do_vfs_ioctl+0x332/0x508
[ 386.098811] [<00000000002c660e>] SyS_ioctl+0x9e/0xb0
[ 386.098814] [<000000000070476a>] system_call+0xd6/0x258
[ 386.098815] [<000003fffc7400a2>] 0x3fffc7400a2
Lets add an extable entry and provide a zeroed config in that case.
Reported-by: Stefan Zimmermann <stzi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Tested-by: Stefan Zimmermann <stzi@linux.vnet.ibm.com>
Header file was in arch dependent location arch/blackfin/include/asm/bfin_can.h,
Now move and merge the useful contents of header file into driver code, note
the original header file is reserved for full registers set access test by other
code so it survives.
Signed-off-by: Aaron Wu <Aaron.wu@analog.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Blackfin was built without MMU, old driver code access the IO space by
physical address, introduce the ioremap approach to be compitable with
the common style supporting MMU enabled arch.
Signed-off-by: Aaron Wu <Aaron.wu@analog.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Replace the blackfin arch dependent style of bfin_read/bfin_write with
common readw/writew
Signed-off-by: Aaron Wu <Aaron.wu@analog.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Bjørn reported that his machine hang during hibernation and eventually
bisected the problem to the following commit:
commit da2bc1b9db
Author: Imre Deak <imre.deak@intel.com>
Date: Thu Oct 23 19:23:26 2014 +0300
drm/i915: add poweroff_late handler
The problem seems to be that after the kernel puts the device into D3
the BIOS still tries to access it, or otherwise assumes that it's in D0.
This is clearly bogus, since ACPI mandates that devices are put into D3
by the OSPM if they are not wake-up sources. In the future we want to
unify more of the driver's runtime and system suspend paths, for example
by skipping all the system suspend/hibernation hooks if the device is
runtime suspended already. Accordingly for all other platforms the goal
is still to properly power down the device during hibernation.
v2:
- Another GEN4 Lenovo laptop had the same issue, while platforms from
other vendors (including mobile and desktop, GEN4 and non-GEN4) seem
to work fine. Based on this apply the workaround on all GEN4 Lenovo
platforms.
- add code comment about failing platforms (Ville)
Reference: http://lists.freedesktop.org/archives/intel-gfx/2015-February/060633.html
Reported-and-bisected-by: Bjørn Mork <bjorn@mork.no>
Cc: stable@vger.kernel.org # v3.19
Signed-off-by: Imre Deak <imre.deak@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
When we takeover from the BIOS and install our interrupt handler, the
BIOS may have left us a few surprises in the form of spontaneous
interrupts. (This is especially likely on hardware like 965gm where
display fifo underruns are continuous and the GMCH cannot filter that
interrupt souce.) As we enable our IRQ early so that we can use it
during hardware probing, our interrupt handler must be prepared to
handle a few sources prior to being fully configured. As such, we need
to add a simple is-ready check prior to dereferencing our KMS state for
reporting underruns.
Reported-by: Rob Clark <rclark@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1193972
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
[Jani: dropped the extra !]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Eric W. Biederman says:
====================
Basic MPLS support take 2
On top of my two pending neighbour table prep patches here is the mpls
support refactored to use them, and edited to not drop routes when
an interface goes down. Additionally the addition of RTA_LLGATEWAY
has been replaced with the addtion of RTA_VIA. RTA_VIA being an
attribute that includes the address family as well as the address
of the next hop.
MPLS is at it's heart simple and I have endeavoured to maintain that
simplicity in my implemenation.
This is an implementation of a RFC3032 forwarding engine, and basic MPLS
egress logic. Which should make linux sufficient to be a mpls
forwarding node or to be a LSA (Label Switched Router) as it says in all
of the MPLS documents. The ingress support will follow but it deserves
it's own discussion so I am pushing it separately.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike IPv4 this code notifies on all cases where mpls routes
are added or removed and it never automatically removes routes.
Avoiding both the userspace confusion that is caused by omitting
route updates and the possibility of a flood of netlink traffic
when an interface goes doew.
For now reserved labels are handled automatically and userspace
is not notified.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change adds two new netlink routing attributes:
RTA_VIA and RTA_NEWDST.
RTA_VIA specifies the specifies the next machine to send a packet to
like RTA_GATEWAY. RTA_VIA differs from RTA_GATEWAY in that it
includes the address family of the address of the next machine to send
a packet to. Currently the MPLS code supports addresses in AF_INET,
AF_INET6 and AF_PACKET. For AF_INET and AF_INET6 the destination mac
address is acquired from the neighbour table. For AF_PACKET the
destination mac_address is specified in the netlink configuration.
I think raw destination mac address support with the family AF_PACKET
will prove useful. There is MPLS-TP which is defined to operate
on machines that do not support internet packets of any flavor. Further
seem to be corner cases where it can be useful. At this point
I don't care much either way.
RTA_NEWDST specifies the destination address to forward the packet
with. MPLS typically changes it's destination address at every hop.
For a swap operation RTA_NEWDST is specified with a length of one label.
For a push operation RTA_NEWDST is specified with two or more labels.
For a pop operation RTA_NEWDST is not specified or equivalently an emtpy
RTAN_NEWDST is specified.
Those new netlink attributes are used to implement handling of rt-netlink
RTM_NEWROUTE, RTM_DELROUTE, and RTM_GETROUTE messages, to maintain the
MPLS label table.
rtm_to_route_config parses a netlink RTM_NEWROUTE or RTM_DELROUTE message,
verify no unhandled attributes or unhandled values are present and sets
up the data structures for mpls_route_add and mpls_route_del.
I did my best to match up with the existing conventions with the caveats
that MPLS addresses are all destination-specific-addresses, and so
don't properly have a scope.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reading and writing addresses in network byte order in netlink is
traditional and I see no reason to change that. MPLS is interesting
as effectively it has variabely length addresses (the MPLS label
stack). To represent these variable length addresses in netlink
I use a valid MPLS label stack (complete with stop bit).
This achieves two things: a well defined existing format is used,
and the data can be interpreted without looking at it's length.
Not needed to look at the length to decode the variable length
network representation allows existing userspace functions
such as inet_ntop to be used without needed to change their
prototype.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mpls_route_add and mpls_route_del implement the basic logic for adding
and removing Next Hop Label Forwarding Entries from the MPLS input
label map. The addition and subtraction is done in a way that is
consistent with how the existing routing table in Linux are
maintained. Thus all of the work to deal with NLM_F_APPEND,
NLM_F_EXCL, NLM_F_REPLACE, and NLM_F_CREATE.
Cases that are not clearly defined such as changing the interpretation
of the mpls reserved labels is not allowed.
Because it seems like the right thing to do adding an MPLS route without
specifying an input label and allowing the kernel to pick a free label
table entry is supported. The implementation is currently less than optimal
but that can be changed.
As I don't have anything else to test with only ethernet and the loopback
device are the only two device types currently supported for forwarding
MPLS over.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This sysctl gives two benefits. By defaulting the table size to 0
mpls even when compiled in and enabled defaults to not forwarding
any packets. This prevents unpleasant surprises for users.
The other benefit is that as mpls labels are allocated locally a dense
table a small dense label table may be used which saves memory and
is extremely simple and efficient to implement.
This sysctl allows userspace to choose the restrictions on the label
table size userspace applications need to cope with.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change adds a new Kconfig option MPLS_ROUTING.
The core of this change is the code to look at an mpls packet received
from another machine. Look that packet up in a routing table and
forward the packet on.
Support of MPLS over ATM is not considered or attempted here. This
implemntation follows RFC3032 and implements the MPLS shim header that
can pass over essentially any network.
What RFC3021 refers to as the as the Incoming Label Map (ILM) I call
net->mpls.platform_label[]. What RFC3031 refers to as the Next Label
Hop Forwarding Entry (NHLFE) I call mpls_route. Though calling it the
label fordwarding information base (lfib) might also be valid.
Further the implemntation forwards packets as described in RFC3032.
There is no need and given the original motivation for MPLS a strong
discincentive to have a flexible label forwarding path. In essence
the logic is the topmost label is read, looked up, removed, and
replaced by 0 or more new lables and the sent out the specified
interface to it's next hop.
Quite a few optional features are not implemented here. Among them
are generation of ICMP errors when the TTL is exceeded or the packet
is larger than the next hop MTU (those conditions are detected and the
packets are dropped instead of generating an icmp error). The traffic
class field is always set to 0. The implementation focuses on IP over
MPLS and does not handle egress of other kinds of protocols.
Instead of implementing coordination with the neighbour table and
sorting out how to input next hops in a different address family (for
which there is value). I was lazy and implemented a next hop mac
address instead. The code is simpler and there are flavor of MPLS
such as MPLS-TP where neither an IPv4 nor an IPv6 next hop is
appropriate so a next hop by mac address would need to be implemented
at some point.
Two new definitions AF_MPLS and PF_MPLS are exposed to userspace.
Decoding the mpls header must be done by first byeswapping a 32bit bit
endian word into the local cpu endian and then bit shifting to extract
the pieces. There is no C bit-field that can represent a wire format
mpls header on a little endian machine as the low bits of the 20bit
label wind up in the wrong half of third byte. Therefore internally
everything is deal with in cpu native byte order except when writing
to and reading from a packet.
For management simplicity if a label is configured to forward out
an interface that is down the packet is dropped early. Similarly
if an network interface is removed rt_dev is updated to NULL
(so no reference is preserved) and any packets for that label
are dropped. Keeping the label entries in the kernel allows
the kernel label table to function as the definitive source
of which labels are allocated and which are not.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This refactoring is needed to allow more than just mpls gso
support to be built into the mpls moddule.
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman says:
====================
Neighbour table prep for MPLS
In preparation for using the IPv4 and IPv6 neighbour tables in my mpls
code this patchset factors out ___neigh_lookup_noref from
__ipv4_neigh_lookup_noref, __ipv6_lookup_noref and neigh_lookup.
Allowing the lookup logic to be shared between the different
implementations. At what appears to be no cost. (Aka the same assembly
is generated for ip6_finish_output2 and ip_finish_output2).
After that I add a simple function that takes an address family and an
address consults the neighbour table and sends the packet to the
appropriate location. The address family argument decoupls callers
of neigh_xmit from the addresses families the packets are sent over.
(Aka The ipv6 module can be loaded after mpls and a previously
configured ipv6 next hop will start working).
The refactoring in ___neigh_lookup_noref may be a bit overkill but it
feels like the right thing to do. Especially since the same code is
generated.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
For MPLS I am building the code so that either the neighbour mac
address can be specified or we can have a next hop in ipv4 or ipv6.
The kind of next hop we have is indicated by the neighbour table
pointer. A neighbour table pointer of NULL is a link layer address.
A non-NULL neighbour table pointer indicates which neighbour table and
thus which address family the next hop address is in that we need to
look up.
The code either sends a packet directly or looks up the appropriate
neighbour table entry and sends the packet.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While looking at the mpls code I found myself writing yet another
version of neigh_lookup_noref. We currently have __ipv4_lookup_noref
and __ipv6_lookup_noref.
So to make my work a little easier and to make it a smidge easier to
verify/maintain the mpls code in the future I stopped and wrote
___neigh_lookup_noref. Then I rewote __ipv4_lookup_noref and
__ipv6_lookup_noref in terms of this new function. I tested my new
version by verifying that the same code is generated in
ip_finish_output2 and ip6_finish_output2 where these functions are
inlined.
To get to ___neigh_lookup_noref I added a new neighbour cache table
function key_eq. So that the static size of the key would be
available.
I also added __neigh_lookup_noref for people who want to to lookup
a neighbour table entry quickly but don't know which neibhgour table
they are going to look up.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the STP timer fires, it can call br_ifinfo_notify(),
which in turn ends up in the new br_get_link_af_size().
This function is annotated to be using RTNL locking, which
clearly isn't the case here, and thus lockdep warns:
===============================
[ INFO: suspicious RCU usage. ]
3.19.0+ #569 Not tainted
-------------------------------
net/bridge/br_private.h:204 suspicious rcu_dereference_protected() usage!
Fix this by doing RCU locking here.
Fixes: b7853d73e3 ("bridge: add vlan info to bridge setlink and dellink notification messages")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the call to exchange-id returns with the EXCHGID4_FLAG_CONFIRMED_R flag
set, then that means our lease was established by a previous mount instance.
Ensure that we detect this situation, and that we clear the state held by
that mount.
Reported-by: Jorge Mora <Jorge.Mora@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>