2012-09-07 04:17:02 +08:00
|
|
|
#
|
|
|
|
# Arch-specific CryptoAPI modules.
|
|
|
|
#
|
|
|
|
|
|
|
|
obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
|
ARM: add support for bit sliced AES using NEON instructions
Bit sliced AES gives around 45% speedup on Cortex-A15 for encryption
and around 25% for decryption. This implementation of the AES algorithm
does not rely on any lookup tables so it is believed to be invulnerable
to cache timing attacks.
This algorithm processes up to 8 blocks in parallel in constant time. This
means that it is not usable by chaining modes that are strictly sequential
in nature, such as CBC encryption. CBC decryption, however, can benefit from
this implementation and runs about 25% faster. The other chaining modes
implemented in this module, XTS and CTR, can execute fully in parallel in
both directions.
The core code has been adopted from the OpenSSL project (in collaboration
with the original author, on cc). For ease of maintenance, this version is
identical to the upstream OpenSSL code, i.e., all modifications that were
required to make it suitable for inclusion into the kernel have been made
upstream. The original can be found here:
http://git.openssl.org/gitweb/?p=openssl.git;a=commit;h=6f6a6130
Note to integrators:
While this implementation is significantly faster than the existing table
based ones (generic or ARM asm), especially in CTR mode, the effects on
power efficiency are unclear as of yet. This code does fundamentally more
work, by calculating values that the table based code obtains by a simple
lookup; only by doing all of that work in a SIMD fashion, it manages to
perform better.
Cc: Andy Polyakov <appro@openssl.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2013-09-16 18:31:38 +02:00
|
|
|
obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
|
2012-09-07 04:17:02 +08:00
|
|
|
obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
|
2014-07-29 17:14:14 +01:00
|
|
|
obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
|
2015-04-03 18:03:40 +08:00
|
|
|
obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
|
2015-05-08 10:46:21 +02:00
|
|
|
obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
|
BACKPORT, FROMGIT: crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS
Add an ARM NEON-accelerated implementation of Speck-XTS. It operates on
128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for
Speck64. Each 128-byte chunk goes through XTS preprocessing, then is
encrypted/decrypted (doing one cipher round for all the blocks, then the
next round, etc.), then goes through XTS postprocessing.
The performance depends on the processor but can be about 3 times faster
than the generic code. For example, on an ARMv7 processor we observe
the following performance with Speck128/256-XTS:
xts-speck128-neon: Encryption 107.9 MB/s, Decryption 108.1 MB/s
xts(speck128-generic): Encryption 32.1 MB/s, Decryption 36.6 MB/s
In comparison to AES-256-XTS without the Cryptography Extensions:
xts-aes-neonbs: Encryption 41.2 MB/s, Decryption 36.7 MB/s
xts(aes-asm): Encryption 31.7 MB/s, Decryption 30.8 MB/s
xts(aes-generic): Encryption 21.2 MB/s, Decryption 20.9 MB/s
Speck64/128-XTS is even faster:
xts-speck64-neon: Encryption 138.6 MB/s, Decryption 139.1 MB/s
Note that as with the generic code, only the Speck128 and Speck64
variants are supported. Also, for now only the XTS mode of operation is
supported, to target the disk and file encryption use cases. The NEON
code also only handles the portion of the data that is evenly divisible
into 128-byte chunks, with any remainder handled by a C fallback. Of
course, other modes of operation could be added later if needed, and/or
the NEON code could be updated to handle other buffer sizes.
The XTS specification is only defined for AES which has a 128-bit block
size, so for the GF(2^64) math needed for Speck64-XTS we use the
reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX
paper. Of course, when possible users should use Speck128-XTS, but even
that may be too slow on some processors; Speck64-XTS can be faster.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit ede9622162fac42eacde231d64e94c926f4be45d
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master)
(changed speck-neon-glue.c to use blkcipher API instead of skcipher API)
(resolved merge conflict in arch/arm/crypto/Makefile)
(made CONFIG_CRYPTO_SPECK_NEON select CONFIG_CRYPTO_GF128MUL, since
gf128mul_x_ble() is non-inline in older kernels)
Change-Id: I5bbc86cb3c2cbc36636a59a0db725b2ad95ea81b
Signed-off-by: Eric Biggers <ebiggers@google.com>
2018-02-14 10:42:21 -08:00
|
|
|
obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
|
crypto: arm - workaround for building with old binutils
Old versions of binutils (before 2.23) do not yet understand the
crypto-neon-fp-armv8 fpu instructions, and an attempt to build these
files results in a build failure:
arch/arm/crypto/aes-ce-core.S:133: Error: selected processor does not support ARM mode `vld1.8 {q10-q11},[ip]!'
arch/arm/crypto/aes-ce-core.S:133: Error: bad instruction `aese.8 q0,q8'
arch/arm/crypto/aes-ce-core.S:133: Error: bad instruction `aesmc.8 q0,q0'
arch/arm/crypto/aes-ce-core.S:133: Error: bad instruction `aese.8 q0,q9'
arch/arm/crypto/aes-ce-core.S:133: Error: bad instruction `aesmc.8 q0,q0'
Since the affected versions are still in widespread use, and this breaks
'allmodconfig' builds, we should try to at least get a successful kernel
build. Unfortunately, I could not come up with a way to make the Kconfig
symbol depend on the binutils version, which would be the nicest solution.
Instead, this patch uses the 'as-instr' Kbuild macro to find out whether
the support is present in the assembler, and otherwise emits a non-fatal
warning indicating which selected modules could not be built.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: http://storage.kernelci.org/next/next-20150410/arm-allmodconfig/build.log
Fixes: 864cbeed4ab22d ("crypto: arm - add support for SHA1 using ARMv8 Crypto Instructions")
[ard.biesheuvel:
- omit modules entirely instead of building empty ones if binutils is too old
- update commit log accordingly]
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-04-11 15:32:34 +02:00
|
|
|
|
|
|
|
ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
|
|
|
|
ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
|
|
|
|
ce-obj-$(CONFIG_CRYPTO_SHA2_ARM_CE) += sha2-arm-ce.o
|
|
|
|
ce-obj-$(CONFIG_CRYPTO_GHASH_ARM_CE) += ghash-arm-ce.o
|
|
|
|
|
|
|
|
ifneq ($(ce-obj-y)$(ce-obj-m),)
|
|
|
|
ifeq ($(call as-instr,.fpu crypto-neon-fp-armv8,y,n),y)
|
|
|
|
obj-y += $(ce-obj-y)
|
|
|
|
obj-m += $(ce-obj-m)
|
|
|
|
else
|
|
|
|
$(warning These ARMv8 Crypto Extensions modules need binutils 2.23 or higher)
|
|
|
|
$(warning $(ce-obj-y) $(ce-obj-m))
|
|
|
|
endif
|
|
|
|
endif
|
2012-09-07 04:17:02 +08:00
|
|
|
|
ARM: add support for bit sliced AES using NEON instructions
Bit sliced AES gives around 45% speedup on Cortex-A15 for encryption
and around 25% for decryption. This implementation of the AES algorithm
does not rely on any lookup tables so it is believed to be invulnerable
to cache timing attacks.
This algorithm processes up to 8 blocks in parallel in constant time. This
means that it is not usable by chaining modes that are strictly sequential
in nature, such as CBC encryption. CBC decryption, however, can benefit from
this implementation and runs about 25% faster. The other chaining modes
implemented in this module, XTS and CTR, can execute fully in parallel in
both directions.
The core code has been adopted from the OpenSSL project (in collaboration
with the original author, on cc). For ease of maintenance, this version is
identical to the upstream OpenSSL code, i.e., all modifications that were
required to make it suitable for inclusion into the kernel have been made
upstream. The original can be found here:
http://git.openssl.org/gitweb/?p=openssl.git;a=commit;h=6f6a6130
Note to integrators:
While this implementation is significantly faster than the existing table
based ones (generic or ARM asm), especially in CTR mode, the effects on
power efficiency are unclear as of yet. This code does fundamentally more
work, by calculating values that the table based code obtains by a simple
lookup; only by doing all of that work in a SIMD fashion, it manages to
perform better.
Cc: Andy Polyakov <appro@openssl.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2013-09-16 18:31:38 +02:00
|
|
|
aes-arm-y := aes-armv4.o aes_glue.o
|
|
|
|
aes-arm-bs-y := aesbs-core.o aesbs-glue.o
|
|
|
|
sha1-arm-y := sha1-armv4-large.o sha1_glue.o
|
2014-07-29 17:14:14 +01:00
|
|
|
sha1-arm-neon-y := sha1-armv7-neon.o sha1_neon_glue.o
|
2015-04-03 18:03:40 +08:00
|
|
|
sha256-arm-neon-$(CONFIG_KERNEL_MODE_NEON) := sha256_neon_glue.o
|
|
|
|
sha256-arm-y := sha256-core.o sha256_glue.o $(sha256-arm-neon-y)
|
2015-05-08 10:46:21 +02:00
|
|
|
sha512-arm-neon-$(CONFIG_KERNEL_MODE_NEON) := sha512-neon-glue.o
|
|
|
|
sha512-arm-y := sha512-core.o sha512-glue.o $(sha512-arm-neon-y)
|
2015-03-10 09:47:45 +01:00
|
|
|
sha1-arm-ce-y := sha1-ce-core.o sha1-ce-glue.o
|
2015-03-10 09:47:46 +01:00
|
|
|
sha2-arm-ce-y := sha2-ce-core.o sha2-ce-glue.o
|
2015-03-10 09:47:47 +01:00
|
|
|
aes-arm-ce-y := aes-ce-core.o aes-ce-glue.o
|
2015-03-10 09:47:48 +01:00
|
|
|
ghash-arm-ce-y := ghash-ce-core.o ghash-ce-glue.o
|
BACKPORT, FROMGIT: crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS
Add an ARM NEON-accelerated implementation of Speck-XTS. It operates on
128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for
Speck64. Each 128-byte chunk goes through XTS preprocessing, then is
encrypted/decrypted (doing one cipher round for all the blocks, then the
next round, etc.), then goes through XTS postprocessing.
The performance depends on the processor but can be about 3 times faster
than the generic code. For example, on an ARMv7 processor we observe
the following performance with Speck128/256-XTS:
xts-speck128-neon: Encryption 107.9 MB/s, Decryption 108.1 MB/s
xts(speck128-generic): Encryption 32.1 MB/s, Decryption 36.6 MB/s
In comparison to AES-256-XTS without the Cryptography Extensions:
xts-aes-neonbs: Encryption 41.2 MB/s, Decryption 36.7 MB/s
xts(aes-asm): Encryption 31.7 MB/s, Decryption 30.8 MB/s
xts(aes-generic): Encryption 21.2 MB/s, Decryption 20.9 MB/s
Speck64/128-XTS is even faster:
xts-speck64-neon: Encryption 138.6 MB/s, Decryption 139.1 MB/s
Note that as with the generic code, only the Speck128 and Speck64
variants are supported. Also, for now only the XTS mode of operation is
supported, to target the disk and file encryption use cases. The NEON
code also only handles the portion of the data that is evenly divisible
into 128-byte chunks, with any remainder handled by a C fallback. Of
course, other modes of operation could be added later if needed, and/or
the NEON code could be updated to handle other buffer sizes.
The XTS specification is only defined for AES which has a 128-bit block
size, so for the GF(2^64) math needed for Speck64-XTS we use the
reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX
paper. Of course, when possible users should use Speck128-XTS, but even
that may be too slow on some processors; Speck64-XTS can be faster.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit ede9622162fac42eacde231d64e94c926f4be45d
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master)
(changed speck-neon-glue.c to use blkcipher API instead of skcipher API)
(resolved merge conflict in arch/arm/crypto/Makefile)
(made CONFIG_CRYPTO_SPECK_NEON select CONFIG_CRYPTO_GF128MUL, since
gf128mul_x_ble() is non-inline in older kernels)
Change-Id: I5bbc86cb3c2cbc36636a59a0db725b2ad95ea81b
Signed-off-by: Eric Biggers <ebiggers@google.com>
2018-02-14 10:42:21 -08:00
|
|
|
speck-neon-y := speck-neon-core.o speck-neon-glue.o
|
ARM: add support for bit sliced AES using NEON instructions
Bit sliced AES gives around 45% speedup on Cortex-A15 for encryption
and around 25% for decryption. This implementation of the AES algorithm
does not rely on any lookup tables so it is believed to be invulnerable
to cache timing attacks.
This algorithm processes up to 8 blocks in parallel in constant time. This
means that it is not usable by chaining modes that are strictly sequential
in nature, such as CBC encryption. CBC decryption, however, can benefit from
this implementation and runs about 25% faster. The other chaining modes
implemented in this module, XTS and CTR, can execute fully in parallel in
both directions.
The core code has been adopted from the OpenSSL project (in collaboration
with the original author, on cc). For ease of maintenance, this version is
identical to the upstream OpenSSL code, i.e., all modifications that were
required to make it suitable for inclusion into the kernel have been made
upstream. The original can be found here:
http://git.openssl.org/gitweb/?p=openssl.git;a=commit;h=6f6a6130
Note to integrators:
While this implementation is significantly faster than the existing table
based ones (generic or ARM asm), especially in CTR mode, the effects on
power efficiency are unclear as of yet. This code does fundamentally more
work, by calculating values that the table based code obtains by a simple
lookup; only by doing all of that work in a SIMD fashion, it manages to
perform better.
Cc: Andy Polyakov <appro@openssl.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2013-09-16 18:31:38 +02:00
|
|
|
|
|
|
|
quiet_cmd_perl = PERL $@
|
|
|
|
cmd_perl = $(PERL) $(<) > $(@)
|
|
|
|
|
|
|
|
$(src)/aesbs-core.S_shipped: $(src)/bsaes-armv7.pl
|
|
|
|
$(call cmd,perl)
|
|
|
|
|
2015-04-03 18:03:40 +08:00
|
|
|
$(src)/sha256-core.S_shipped: $(src)/sha256-armv4.pl
|
|
|
|
$(call cmd,perl)
|
|
|
|
|
2015-05-08 10:46:21 +02:00
|
|
|
$(src)/sha512-core.S_shipped: $(src)/sha512-armv4.pl
|
|
|
|
$(call cmd,perl)
|
|
|
|
|
|
|
|
.PRECIOUS: $(obj)/aesbs-core.S $(obj)/sha256-core.S $(obj)/sha512-core.S
|