android_kernel_oneplus_msm8998/crypto
Arnd Bergmann e041ad0664 crypto: improve gcc optimization flags for serpent and wp512
commit 7d6e9105026788c497f0ab32fa16c82f4ab5ff61 upstream.

An ancient gcc bug (first reported in 2003) has apparently resurfaced
on MIPS, where kernelci.org reports an overly large stack frame in the
whirlpool hash algorithm:

crypto/wp512.c:987:1: warning: the frame size of 1112 bytes is larger than 1024 bytes [-Wframe-larger-than=]

With some testing in different configurations, I'm seeing large
variations in stack frames size up to 1500 bytes for what should have
around 300 bytes at most. I also checked the reference implementation,
which is essentially the same code but also comes with some test and
benchmarking infrastructure.

It seems that recent compiler versions on at least arm, arm64 and powerpc
have a partial fix for this problem, but enabling "-fsched-pressure", but
even with that fix they suffer from the issue to a certain degree. Some
testing on arm64 shows that the time needed to hash a given amount of
data is roughly proportional to the stack frame size here, which makes
sense given that the wp512 implementation is doing lots of loads for
table lookups, and the problem with the overly large stack is a result
of doing a lot more loads and stores for spilled registers (as seen from
inspecting the object code).

Disabling -fschedule-insns consistently fixes the problem for wp512,
in my collection of cross-compilers, the results are consistently better
or identical when comparing the stack sizes in this function, though
some architectures (notable x86) have schedule-insns disabled by
default.

The four columns are:
default: -O2
press:	 -O2 -fsched-pressure
nopress: -O2 -fschedule-insns -fno-sched-pressure
nosched: -O2 -no-schedule-insns (disables sched-pressure)

				default	press	nopress	nosched
alpha-linux-gcc-4.9.3		1136	848	1136	176
am33_2.0-linux-gcc-4.9.3	2100	2076	2100	2104
arm-linux-gnueabi-gcc-4.9.3	848	848	1048	352
cris-linux-gcc-4.9.3		272	272	272	272
frv-linux-gcc-4.9.3		1128	1000	1128	280
hppa64-linux-gcc-4.9.3		1128	336	1128	184
hppa-linux-gcc-4.9.3		644	308	644	276
i386-linux-gcc-4.9.3		352	352	352	352
m32r-linux-gcc-4.9.3		720	656	720	268
microblaze-linux-gcc-4.9.3	1108	604	1108	256
mips64-linux-gcc-4.9.3		1328	592	1328	208
mips-linux-gcc-4.9.3		1096	624	1096	240
powerpc64-linux-gcc-4.9.3	1088	432	1088	160
powerpc-linux-gcc-4.9.3		1080	584	1080	224
s390-linux-gcc-4.9.3		456	456	624	360
sh3-linux-gcc-4.9.3		292	292	292	292
sparc64-linux-gcc-4.9.3		992	240	992	208
sparc-linux-gcc-4.9.3		680	592	680	312
x86_64-linux-gcc-4.9.3		224	240	272	224
xtensa-linux-gcc-4.9.3		1152	704	1152	304

aarch64-linux-gcc-7.0.0		224	224	1104	208
arm-linux-gnueabi-gcc-7.0.1	824	824	1048	352
mips-linux-gcc-7.0.0		1120	648	1120	272
x86_64-linux-gcc-7.0.1		240	240	304	240

arm-linux-gnueabi-gcc-4.4.7	840			392
arm-linux-gnueabi-gcc-4.5.4	784	728	784	320
arm-linux-gnueabi-gcc-4.6.4	736	728	736	304
arm-linux-gnueabi-gcc-4.7.4	944	784	944	352
arm-linux-gnueabi-gcc-4.8.5	464	464	760	352
arm-linux-gnueabi-gcc-4.9.3	848	848	1048	352
arm-linux-gnueabi-gcc-5.3.1	824	824	1064	336
arm-linux-gnueabi-gcc-6.1.1	808	808	1056	344
arm-linux-gnueabi-gcc-7.0.1	824	824	1048	352

Trying the same test for serpent-generic, the picture is a bit different,
and while -fno-schedule-insns is generally better here than the default,
-fsched-pressure wins overall, so I picked that instead.

				default	press	nopress	nosched
alpha-linux-gcc-4.9.3		1392	864	1392	960
am33_2.0-linux-gcc-4.9.3	536	524	536	528
arm-linux-gnueabi-gcc-4.9.3	552	552	776	536
cris-linux-gcc-4.9.3		528	528	528	528
frv-linux-gcc-4.9.3		536	400	536	504
hppa64-linux-gcc-4.9.3		524	208	524	480
hppa-linux-gcc-4.9.3		768	472	768	508
i386-linux-gcc-4.9.3		564	564	564	564
m32r-linux-gcc-4.9.3		712	576	712	532
microblaze-linux-gcc-4.9.3	724	392	724	512
mips64-linux-gcc-4.9.3		720	384	720	496
mips-linux-gcc-4.9.3		728	384	728	496
powerpc64-linux-gcc-4.9.3	704	304	704	480
powerpc-linux-gcc-4.9.3		704	296	704	480
s390-linux-gcc-4.9.3		560	560	592	536
sh3-linux-gcc-4.9.3		540	540	540	540
sparc64-linux-gcc-4.9.3		544	352	544	496
sparc-linux-gcc-4.9.3		544	344	544	496
x86_64-linux-gcc-4.9.3		528	536	576	528
xtensa-linux-gcc-4.9.3		752	544	752	544

aarch64-linux-gcc-7.0.0		432	432	656	480
arm-linux-gnueabi-gcc-7.0.1	616	616	808	536
mips-linux-gcc-7.0.0		720	464	720	488
x86_64-linux-gcc-7.0.1		536	528	600	536

arm-linux-gnueabi-gcc-4.4.7	592			440
arm-linux-gnueabi-gcc-4.5.4	776	448	776	544
arm-linux-gnueabi-gcc-4.6.4	776	448	776	544
arm-linux-gnueabi-gcc-4.7.4	768	448	768	544
arm-linux-gnueabi-gcc-4.8.5	488	488	776	544
arm-linux-gnueabi-gcc-4.9.3	552	552	776	536
arm-linux-gnueabi-gcc-5.3.1	552	552	776	536
arm-linux-gnueabi-gcc-6.1.1	560	560	776	536
arm-linux-gnueabi-gcc-7.0.1	616	616	808	536

I did not do any runtime tests with serpent, so it is possible that stack
frame size does not directly correlate with runtime performance here and
it actually makes things worse, but it's more likely to help here, and
the reduced stack frame size is probably enough reason to apply the patch,
especially given that the crypto code is often used in deep call chains.

Link: https://kernelci.org/build/id/58797d7559b5149efdf6c3a9/logs/
Link: http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11488
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-18 19:09:56 +08:00
..
asymmetric_keys PKCS#7: Don't require SpcSpOpusInfo in Authenticode pkcs7 signatures 2016-10-28 03:01:33 -04:00
async_tx async_pq_val: fix DMA memory leak 2016-10-22 12:26:55 +02:00
.gitignore crypto: rsa - add .gitignore for crypto/*.-asn1.[ch] files 2015-06-25 23:29:24 +08:00
842.c crypto: 842 - change 842 alg to use software 2015-05-11 15:06:43 +08:00
ablk_helper.c crypto: cryptd - process CRYPTO_ALG_INTERNAL 2015-03-31 21:21:04 +08:00
ablkcipher.c crypto: skcipher - Copy iv from desc even for 0-len walks 2015-12-09 20:16:22 +08:00
aead.c crypto: aead - Remove CRYPTO_ALG_AEAD_NEW flag 2015-08-17 16:53:53 +08:00
aes_generic.c crypto: add missing crypto module aliases 2015-01-13 22:29:11 +11:00
af_alg.c crypto: af_alg - Forbid bind(2) when nokey child sockets are present 2016-02-17 12:31:04 -08:00
ahash.c crypto: hash - Fix page length clamping in hash walk 2016-05-18 17:06:45 -07:00
akcipher.c crypto: akcipher - Don't #include crypto/public_key.h as the contents aren't used 2015-10-20 22:14:01 +08:00
algapi.c crypto: api - Clear CRYPTO_ALG_DEAD bit before registering an alg 2017-02-09 08:02:44 +01:00
algboss.c crypto: algboss - Remove reference to nivaead 2015-08-17 16:53:41 +08:00
algif_aead.c net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA 2015-12-01 15:45:05 -05:00
algif_hash.c crypto: algif_hash - wait for crypto_ahash_init() to complete 2016-02-17 12:31:04 -08:00
algif_rng.c crypto: algif_rng - Remove obsolete const-removal cast 2015-04-22 09:30:21 +08:00
algif_skcipher.c crypto: algif_skcipher - Do not set MAY_BACKLOG on the async path 2016-02-17 12:31:05 -08:00
ansi_cprng.c crypto: ansi_cprng - Convert to new rng interface 2015-04-22 09:30:18 +08:00
anubis.c crypto: prefix module autoloading with "crypto-" 2014-11-24 22:43:57 +08:00
api.c crypto: api - Only abort operations on fatal signal 2015-10-20 21:59:25 +08:00
arc4.c crypto: prefix module autoloading with "crypto-" 2014-11-24 22:43:57 +08:00
authenc.c crypto: aead - Remove CRYPTO_ALG_AEAD_NEW flag 2015-08-17 16:53:53 +08:00
authencesn.c crypto: aead - Remove CRYPTO_ALG_AEAD_NEW flag 2015-08-17 16:53:53 +08:00
blkcipher.c crypto: skcipher - Fix blkcipher walk OOM crash 2016-09-30 10:18:34 +02:00
blowfish_common.c crypto: blowfish - split generic and common c code 2011-09-22 21:25:25 +10:00
blowfish_generic.c crypto: add missing crypto module aliases 2015-01-13 22:29:11 +11:00
camellia_generic.c crypto: add missing crypto module aliases 2015-01-13 22:29:11 +11:00
cast5_generic.c crypto: add missing crypto module aliases 2015-01-13 22:29:11 +11:00
cast6_generic.c crypto: add missing crypto module aliases 2015-01-13 22:29:11 +11:00
cast_common.c crypto: make tables used from assembler __visible 2013-08-14 20:42:03 +10:00
cbc.c crypto: include crypto- module prefix in template 2014-11-26 20:06:30 +08:00
ccm.c crypto: replace scatterwalk_sg_chain with sg_chain 2015-08-17 08:12:54 -06:00
chacha20_generic.c crypto: chacha20 - Export common ChaCha20 helpers 2015-07-17 21:20:21 +08:00
chacha20poly1305.c crypto: aead - Remove CRYPTO_ALG_AEAD_NEW flag 2015-08-17 16:53:53 +08:00
chainiv.c crypto: chainiv - Offer normal cipher functionality without RNG 2015-06-22 15:49:28 +08:00
cipher.c crypto: cipher - Fix checkpatch errors 2010-02-16 20:31:37 +08:00
cmac.c crypto: include crypto- module prefix in template 2014-11-26 20:06:30 +08:00
compress.c crypto: compress - Fix checkpatch errors 2010-02-16 20:31:04 +08:00
crc32.c crypto: prefix module autoloading with "crypto-" 2014-11-24 22:43:57 +08:00
crc32c_generic.c crypto: crc32c - Fix crc32c soft dependency 2016-02-17 12:31:04 -08:00
crct10dif_common.c crypto: crct10dif - Add fallback for broken initrds 2013-09-12 15:31:34 +10:00
crct10dif_generic.c crypto: add missing crypto module aliases 2015-01-13 22:29:11 +11:00
cryptd.c crypto: cryptd - initialize child shash_desc on import 2016-09-24 10:07:41 +02:00
crypto_null.c crypto: null - Add default null skcipher 2015-05-22 11:25:55 +08:00
crypto_user.c crypto: user - re-add size check for CRYPTO_MSG_GETALG 2016-07-11 09:31:12 -07:00
crypto_wq.c crypto: crypto_wq - Fix late crypto work queue initialization 2014-03-21 21:54:28 +08:00
ctr.c crypto: include crypto- module prefix in template 2014-11-26 20:06:30 +08:00
cts.c crypto: cts - Weed out non-CBC algorithms 2015-01-20 14:44:15 +11:00
deflate.c crypto: prefix module autoloading with "crypto-" 2014-11-24 22:43:57 +08:00
des_generic.c crypto: add missing crypto module aliases 2015-01-13 22:29:11 +11:00
drbg.c crypto: drbg - report backend_cra_name when allocation fails 2015-06-11 21:55:28 +08:00
ecb.c crypto: include crypto- module prefix in template 2014-11-26 20:06:30 +08:00
echainiv.c crypto: echainiv - Replace chaining with multiplication 2016-09-30 10:18:34 +02:00
eseqiv.c crypto: eseqiv - Offer normal cipher functionality without RNG 2015-06-22 15:49:28 +08:00
fcrypt.c crypto: prefix module autoloading with "crypto-" 2014-11-24 22:43:57 +08:00
fips.c crypto: fips - Move fips_enabled sysctl into fips.c 2015-04-23 14:18:09 +08:00
gcm.c crypto: gcm - Fix IV buffer size in crypto_gcm_setkey 2016-10-31 04:13:59 -06:00
gf128mul.c crypto: gf128mul - fix call to memset() 2011-07-08 17:21:21 +08:00
ghash-generic.c crypto: ghash-generic - move common definitions to a new header file 2016-10-22 12:26:56 +02:00
hash_info.c crypto: provide single place for hash algo information 2013-10-25 17:14:03 -04:00
hmac.c crypto: include crypto- module prefix in template 2014-11-26 20:06:30 +08:00
internal.h crypto: api - Remove linux/fips.h from internal.h 2015-04-23 14:18:10 +08:00
jitterentropy-kcapi.c crypto: jitterentropy - remove unnecessary information from a comment 2015-10-14 22:23:16 +08:00
jitterentropy.c crypto: jitterentropy - Delete unnecessary checks before the function call "kzfree" 2015-06-25 23:18:33 +08:00
Kconfig crypto: keywrap - enable compilation 2015-10-15 21:05:06 +08:00
keywrap.c crypto: keywrap - memzero the correct memory 2016-04-12 09:08:45 -07:00
khazad.c crypto: prefix module autoloading with "crypto-" 2014-11-24 22:43:57 +08:00
lrw.c crypto: include crypto- module prefix in template 2014-11-26 20:06:30 +08:00
lz4.c crypto: prefix module autoloading with "crypto-" 2014-11-24 22:43:57 +08:00
lz4hc.c crypto: prefix module autoloading with "crypto-" 2014-11-24 22:43:57 +08:00
lzo.c crypto: prefix module autoloading with "crypto-" 2014-11-24 22:43:57 +08:00
Makefile crypto: improve gcc optimization flags for serpent and wp512 2017-03-18 19:09:56 +08:00
mcryptd.c crypto: mcryptd - Check mcryptd algorithm compatibility 2016-12-15 08:49:22 -08:00
md4.c crypto: prefix module autoloading with "crypto-" 2014-11-24 22:43:57 +08:00
md5.c crypto: md5 - use md5 IV MD5_HX instead of their raw value 2015-05-18 12:20:18 +08:00
memneq.c crypto: memneq - fix for archs without efficient unaligned access 2013-12-09 20:09:12 +08:00
michael_mic.c crypto: prefix module autoloading with "crypto-" 2014-11-24 22:43:57 +08:00
pcbc.c crypto: include crypto- module prefix in template 2014-11-26 20:06:30 +08:00
pcompress.c crypto: pcomp - Use crypto_alg_extsize helper 2015-04-21 10:19:55 +08:00
pcrypt.c crypto: aead - Remove CRYPTO_ALG_AEAD_NEW flag 2015-08-17 16:53:53 +08:00
poly1305_generic.c crypto: poly1305 - Export common Poly1305 helpers 2015-07-17 21:20:26 +08:00
proc.c crypto: fips - Move fips_enabled sysctl into fips.c 2015-04-23 14:18:09 +08:00
ripemd.h
rmd128.c crypto: prefix module autoloading with "crypto-" 2014-11-24 22:43:57 +08:00
rmd160.c crypto: prefix module autoloading with "crypto-" 2014-11-24 22:43:57 +08:00
rmd256.c crypto: prefix module autoloading with "crypto-" 2014-11-24 22:43:57 +08:00
rmd320.c crypto: prefix module autoloading with "crypto-" 2014-11-24 22:43:57 +08:00
rng.c crypto: rng - Do not free default RNG when it becomes unused 2015-06-22 15:49:18 +08:00
rsa.c crypto: akcipher - Changes to asymmetric key API 2015-10-14 22:23:16 +08:00
rsa_helper.c crypto: akcipher - Changes to asymmetric key API 2015-10-14 22:23:16 +08:00
rsaprivkey.asn1 crypto: akcipher - Changes to asymmetric key API 2015-10-14 22:23:16 +08:00
rsapubkey.asn1 crypto: akcipher - Changes to asymmetric key API 2015-10-14 22:23:16 +08:00
salsa20_generic.c crypto: add missing crypto module aliases 2015-01-13 22:29:11 +11:00
scatterwalk.c crypto: scatterwalk - Fix test in scatterwalk_done 2016-08-16 09:30:50 +02:00
seed.c crypto: prefix module autoloading with "crypto-" 2014-11-24 22:43:57 +08:00
seqiv.c crypto: seqiv - Use generic geniv init/exit helpers 2015-08-17 16:53:46 +08:00
serpent_generic.c crypto: add missing crypto module aliases 2015-01-13 22:29:11 +11:00
sha1_generic.c crypto: sha1-generic - move to generic glue implementation 2015-04-10 21:39:40 +08:00
sha256_generic.c crypto: sha256-generic - move to generic glue implementation 2015-04-10 21:39:41 +08:00
sha512_generic.c crypto: sha512-generic - move to generic glue implementation 2015-04-10 21:39:41 +08:00
shash.c crypto: shash - Fix has_key setting 2016-02-17 12:31:04 -08:00
skcipher.c crypto: skcipher - Add crypto_skcipher_has_setkey 2016-02-17 12:31:03 -08:00
tcrypt.c crypto: tcrypt - avoid mapping from module image addresses 2015-09-21 22:00:36 +08:00
tcrypt.h crypto: tcrypt - Add ChaCha20/Poly1305 speed tests 2015-07-17 21:20:20 +08:00
tea.c crypto: add missing crypto module aliases 2015-01-13 22:29:11 +11:00
testmgr.c crypto: testmgr - Use kmalloc memory for RSA input 2016-05-18 17:06:45 -07:00
testmgr.h crypto: testmgr - Pad aes_ccm_enc_tv_template vector 2017-03-12 06:37:28 +01:00
tgr192.c crypto: add missing crypto module aliases 2015-01-13 22:29:11 +11:00
twofish_common.c crypto: twofish-x86_64-3way - add lrw support 2011-11-09 11:53:32 +08:00
twofish_generic.c crypto: add missing crypto module aliases 2015-01-13 22:29:11 +11:00
vmac.c crypto: include crypto- module prefix in template 2014-11-26 20:06:30 +08:00
wp512.c crypto: add missing crypto module aliases 2015-01-13 22:29:11 +11:00
xcbc.c crypto: include crypto- module prefix in template 2014-11-26 20:06:30 +08:00
xor.c add further __init annotations to crypto/xor.c 2012-10-11 13:42:32 +11:00
xts.c crypto: include crypto- module prefix in template 2014-11-26 20:06:30 +08:00
zlib.c crypto: pcomp - Constify (de)compression parameters 2015-05-01 11:16:37 +08:00