The following software has not been verified!
Prerequisites: AVX2 (Intel Haswell and newer; AMD Excavator and newer).
Optimization target: Skylake. Also works well on Broadwell, Kaby Lake, Coffee Lake, etc. Somewhat worse on Haswell because of the slower CMOVs.
There are two software packages:
-
inverse25519skylake
provides aninverse25519
function for modulus p=2^255−19. -
inverse256skylake
provides aninverse256_BTC_p
function for the Bitcoin field prime, aninverse256_BTC_n
function for the number of points on the Bitcoin curve, aninverse256_P256_p
function for the NIST P-256 field prime, and aninverse256_P256_n
function for the number of points on the NIST P-256 curve.
The inverse256
code is intended to allow
any prime p between 2^255 and 2^256,
with timings independent of p,
after a small precomputation of a table of p data.
The inverse25519
code is specific to p=2^255−19 and is slightly faster.
How to test inverse25519skylake
:
wget -m https://gcd.cr.yp.to/software/inverse25519skylake-latest-version.txt
version=$(cat gcd.cr.yp.to/software/inverse25519skylake-latest-version.txt)
wget -m https://gcd.cr.yp.to/software/inverse25519skylake-$version.tar.gz
tar -xzf gcd.cr.yp.to/software/inverse25519skylake-$version.tar.gz
cd inverse25519skylake-$version
make
./test < /dev/urandom
How to test inverse256skylake
:
wget -m https://gcd.cr.yp.to/software/inverse256skylake-latest-version.txt
version=$(cat gcd.cr.yp.to/software/inverse256skylake-latest-version.txt)
wget -m https://gcd.cr.yp.to/software/inverse256skylake-$version.tar.gz
tar -xzf gcd.cr.yp.to/software/inverse256skylake-$version.tar.gz
cd inverse256skylake-$version
make
./test < /dev/urandom
The ./test
program reads 32-byte inputs,
interpreted in little-endian form as integers
between 0 and 2^256−1;
uses inverse25519
, inverse256_BTC_p
, etc.
to invert each input x modulo the applicable prime p;
and uses GMP to check the following:
- The output is between 0 and p−1.
- The output times x is 1 modulo p; or the output times x is 0 modulo p, and x is also 0 modulo p. In other words, if x is 0 modulo p then the output is 0 modulo p; otherwise the output is the inverse of x modulo p.
- The output, used as an input, passes the same tests.
- Other integers between 0 and 2^256−1 congruent to x modulo p produce the same output.
The program prints occasional output lines such as
loop 1048576
indicating that it has tested 1048576 inputs. It also specifically checks integers p−1000 through p+999 (and thus also 0 through 999 and 2p−1000 through 2^256−1).
The ./test
program also runs a few benchmarks
showing the cycle counts of successive calls to the inversion function.
Typical output on a 3GHz Skylake (samba
) for inverse25519
:
nothing 85 41 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 43 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42
cycles 5639 4280 4224 4227 4165 3951 3901 3856 3880 3873 3884 3846 3896 3898 3885 3882 3900 3860 3816 3894 3865 3867 3864 3870 3881 3794 3880 3881 3881 3872 3862 3870 3880 3899 3901 3902 3850 3863 3931 3806 3864 3844 3858 3897 3877 3897 3782 3868 3875 3876 3854 3865 3893 3799 3912 3882 3895 3918 3913 3910 3781 3849 3856
sorted 3781 3782 3794 3799 3806 3816 3844 3846 3849 3850 3854 3856 3856 3858 3860 3862 3863 3864 3864 3865 3865 3867 3868 3870 3870 3872 3873 3875 3876 3877 3880 3880 3880 3881 3881 3881 3882 3882 3884 3885 3893 3894 3895 3896 3897 3897 3898 3899 3900 3901 3901 3902 3910 3912 3913 3918 3931 3951 4165 4224 4227 4280 5639
The nothing
line shows the cost of the cycle counter.
The cycles
line shows the cost of the cycle counter
plus inverse25519
.
The first cycle counts are larger
because of cache effects.
Releases:
-
inverse256skylake-20210131.tar.gz
. Intel: 6418hiphop
(Haswell). 4260bolero
(Broadwell). 4028samba
(Skylake). 4070kizomba
(Kaby Lake). AMD: 5366rumba7
(Zen). -
inverse25519skylake-20210110.tar.gz
. Intel: 5908hiphop
(Haswell). 4008bolero
(Broadwell). 3880samba
(Skylake). 3816kizomba
(Kaby Lake). AMD: 5074rumba7
(Zen). -
inverse25519skylake-20201231.tar.gz
. Intel: 7616hiphop
(Haswell), 5192bolero
(Broadwell), 4724samba
(Skylake), 4935kizomba
(Kaby Lake). AMD: 6600rumba7
(Zen).
Version: This is version 2021.01.31 of the "Software" web page.