This repository has been archived by the owner on Jan 26, 2022. It is now read-only.

CUDA Scalar Mul #17

Merged
merged 177 commits into from
Nov 10, 2020
177 commits
a64d7fb
First draft affine batch ops & wnaf
jon-chuang Jul 31, 2020
b7024dd
changes to mutability and lifetimes
jon-chuang Jul 31, 2020
40ef5d7
delete superfluous files
jon-chuang Jul 31, 2020
0fa5eeb
crazy direction: Passing a FnMut to generate an iterator locally
jon-chuang Aug 1, 2020
eebb12b
unsuccessful further attempts
jon-chuang Aug 1, 2020
4d22acf
compile sucess using index approach
jon-chuang Aug 1, 2020
bbbec75
fixes for mutable borrows
jon-chuang Aug 1, 2020
3a6e45c
Successfully passed scalar mul test
jon-chuang Aug 1, 2020
5c65917
benchmarks + prefetching
jon-chuang Aug 3, 2020
3bf2bc1
stash
jon-chuang Aug 6, 2020
4bb5ad5
generic impl of batch arith for all affinecurves
jon-chuang Aug 6, 2020
67da071
batched affine formulas for TE - too expensive
jon-chuang Aug 6, 2020
2e54f67
improved TE affine
jon-chuang Aug 6, 2020
62df27d
cleanup batch inversion
jon-chuang Aug 6, 2020
e6d28b6
fmt...
jon-chuang Aug 6, 2020
74d9bb7
fix minor error
jon-chuang Aug 6, 2020
908fb73
remove debugging scaffolding
jon-chuang Aug 6, 2020
c0a5a07
fmt...
jon-chuang Aug 6, 2020
5c89660
delete batch arith bench as not suitable for criterion or bench
jon-chuang Aug 6, 2020
6359f7c
fix bench removal errors
jon-chuang Aug 6, 2020
56b8181
fmt...
jon-chuang Aug 6, 2020
ec2decd
added missing coeff_a
jon-chuang Aug 6, 2020
bad37bd
refactor BatchGroupArithmetic to be separate trait
jon-chuang Aug 12, 2020
5b9cae9
Batch verification with radix sort
jon-chuang Aug 16, 2020
cbf8e49
Cache-locality & parallelisation
jon-chuang Aug 17, 2020
200f5fa
Successfully impl batch verify
jon-chuang Aug 18, 2020
ed7c4a7
added tests and bench for batch_ver, parallel_random_gen, ^ thread util
jon-chuang Aug 18, 2020
0e612e4
fmt
jon-chuang Aug 18, 2020
8819290
enabled missing test
jon-chuang Aug 18, 2020
a8e9c18
remove voracious_radix_sort
jon-chuang Aug 18, 2020
f6a2392
commented unneeded Instant::now()
jon-chuang Aug 18, 2020
2390243
Fixed batch_ver tests for curves of small or unit cofactor
jon-chuang Aug 18, 2020
cbee6a2
split recursive and non-recursive, tidy up shared functionality
jon-chuang Aug 20, 2020
0811a0f
reduce max_logn
jon-chuang Aug 20, 2020
2cbff4d
adjust max_logn further
jon-chuang Aug 20, 2020
c138904
Batch MSM, speedup only for bw6 due to poor cache performance
jon-chuang Aug 21, 2020
5068e74
fmt...
jon-chuang Aug 21, 2020
e886a38
GLV iBiginteger
jon-chuang Aug 21, 2020
1235117
stash
jon-chuang Aug 22, 2020
a60bedc
stash
jon-chuang Aug 22, 2020
31690ce
Merge branch 'jonch/batch_ver' into jonch/glv
jon-chuang Aug 22, 2020
ae69a9f
GLV with Parameter-based specialisation
jon-chuang Aug 27, 2020
1cb7e65
GLV lattice basis script success
jon-chuang Aug 30, 2020
f68cf6e
Successfully passed tests and benched
jon-chuang Aug 31, 2020
cee0204
Improvments to MSM with and bucketed adds using lightweight index sort
jon-chuang Sep 2, 2020
0c3bde5
changed rng to be external parameter for non-parallel batch veri
jon-chuang Sep 3, 2020
a87db71
remove bench print scaffolding
jon-chuang Sep 3, 2020
1909a4b
remove old batch_bucketed_add using vectors instead of fixed offsets
jon-chuang Sep 3, 2020
9bfd683
retain parallel batch_add_split
jon-chuang Sep 3, 2020
24fcd36
Comments for batch arith
jon-chuang Sep 3, 2020
ed201c0
remove need for hashmap for no std for batch_bucketed_add
jon-chuang Sep 3, 2020
517df11
minor changes
jon-chuang Sep 3, 2020
22a48d3
cleanup
jon-chuang Sep 3, 2020
b5852b4
cleanup
jon-chuang Sep 3, 2020
af70e80
fmt + use no_std Vec
jon-chuang Sep 3, 2020
4421820
removed std::
jon-chuang Sep 3, 2020
7962c8c
add scratch space
jon-chuang Sep 3, 2020
9318e37
Add GLV for non-batched SW mul
jon-chuang Sep 4, 2020
a9c951a
fix for glv_scalar_decomposition when k == MODULUS (subgroup check)
jon-chuang Sep 4, 2020
a90dfa5
Fixed performance BUG: unnecessary table generation
jon-chuang Sep 4, 2020
3a70376
GLV -> has_glv(), bigint slice bd check, refactor batch loops, u32 index
jon-chuang Sep 7, 2020
e9027c0
clean remove of batch_verify
jon-chuang Sep 7, 2020
f65bdef
fix mistake with elems indexing, unused arg for future recursion PR
jon-chuang Sep 7, 2020
e5b1182
trivial errors
jon-chuang Sep 7, 2020
c0a53df
more minor fixes
jon-chuang Sep 7, 2020
344fbd3
fix issues with batch_ver (.is_zero(), TE affine->proj mul)
jon-chuang Sep 7, 2020
646260b
fix issue with batch_bucketed_add_split
jon-chuang Sep 7, 2020
ecdd939
misname
jon-chuang Sep 7, 2020
7ba3688
Success in test and bench \(*v*)/
jon-chuang Sep 7, 2020
9ec6727
tmp commit to cache experimental batch_add_write_shift_..
jon-chuang Sep 8, 2020
1810368
remove batch_add_write_shift..
jon-chuang Sep 8, 2020
58e46b4
optional dep, fmt...
jon-chuang Sep 8, 2020
6a6e2fd
undo accidental deletion of dlsd sort
jon-chuang Sep 8, 2020
9ec0eb7
fmt...
jon-chuang Sep 8, 2020
493626d
cleanup batch bucket add, unify impl
jon-chuang Sep 8, 2020
56bf4f9
no std...
jon-chuang Sep 8, 2020
a5640a4
fixed tests
jon-chuang Sep 8, 2020
6b39608
fixed unimplemented for TE, swapped wnaf table row/col for batchaddwrite
jon-chuang Sep 8, 2020
4cf6c5f
wnaf table generation uses fewer copies, remove timing instrumentation
jon-chuang Sep 8, 2020
1a928b0
Minor Cleanup
jon-chuang Sep 9, 2020
5964b4b
Add feature-activated timing instrumentation, reduce code bloat (wnaf)
jon-chuang Sep 9, 2020
d9de7b6
unused var, no_std
jon-chuang Sep 9, 2020
5b0872f
Make timing macros defined globally, instrument more code
jon-chuang Sep 9, 2020
abad582
instrument w/ tid, better num_rounds est. f64, timing black/whitelisting
jon-chuang Sep 9, 2020
1eacd89
Minor changes
jon-chuang Sep 9, 2020
204ffa5
refactor tests, generic MSM test
jon-chuang Sep 10, 2020
9efaae4
2D test matrix :)
jon-chuang Sep 10, 2020
bd82f31
batchaffine
jon-chuang Sep 10, 2020
e5cb574
tests
jon-chuang Sep 10, 2020
3ed5d9f
additive features
jon-chuang Sep 11, 2020
2fc20e4
big_n feature for test-benching
jon-chuang Sep 11, 2020
f21f40a
prefetch unroll
jon-chuang Sep 11, 2020
c605894
minor adjustments
jon-chuang Sep 11, 2020
6a70b67
extension(s -> "")_fields
jon-chuang Sep 14, 2020
c83b29d
remove artifacts, fix asm
jon-chuang Sep 14, 2020
3a8e853
uncomment subgroup checks, glv param sources
jon-chuang Sep 14, 2020
16f5005
gpu scalar mul
jon-chuang Sep 20, 2020
4ec989b
fix dependency issues
jon-chuang Sep 20, 2020
8469bbb
Extend GPU scalar mul to all curves
jon-chuang Sep 21, 2020
0a9d59b
refactor
jon-chuang Sep 21, 2020
06ea360
CPU + GPU coprocessing
jon-chuang Oct 1, 2020
fb84f7d
With suboptimal BW6 assembly
jon-chuang Oct 1, 2020
1a47280
add static partitioning
jon-chuang Oct 2, 2020
24e2521
profiling-based static partitioining
jon-chuang Oct 3, 2020
9e7ac90
statically partition between multiple gpus
jon-chuang Oct 4, 2020
1cac126
comments
jon-chuang Oct 4, 2020
ff7777d
BBaseField -> BaseFieldForBatch
jon-chuang Oct 5, 2020
13241ec
Outline of basic traits
jon-chuang Oct 5, 2020
71b60de
Remove sw_proj, add gpu support for all sw projective curves
jon-chuang Oct 6, 2020
3d1885e
impl gpu kernels for all curves
jon-chuang Oct 6, 2020
c78beb1
feature-gate with "cuda"
jon-chuang Oct 6, 2020
0514459
Merge branch 'master' into jonch/mongrel
jon-chuang Oct 6, 2020
3d112d0
rename curves/gpu directory to curves/cuda
jon-chuang Oct 6, 2020
9836675
Fix merge errors
jon-chuang Oct 6, 2020
113c621
Use github rather than local jon-chuang/accel
jon-chuang Oct 6, 2020
c3861eb
again
jon-chuang Oct 6, 2020
19c424c
again
jon-chuang Oct 6, 2020
0485533
update README
jon-chuang Oct 6, 2020
60dab2e
feature = "cuda"
jon-chuang Oct 6, 2020
d4bcf87
gpu_standalone (good for non-generic), feature gate under cuda too
jon-chuang Oct 6, 2020
2037c4e
Merge branch 'master' into jonch/mongrel
jon-chuang Oct 6, 2020
3dac0ee
fix merging errors
jon-chuang Oct 6, 2020
f269e6a
make helpers a same-file module
jon-chuang Oct 6, 2020
9ad9faa
remove cancerous --all-features from github yml
jon-chuang Oct 6, 2020
d504482
Use dummy accel_dummy crate for when not compiling as CUDA
jon-chuang Oct 6, 2020
dd204c2
feature gate accel import
jon-chuang Oct 6, 2020
4f05be5
fix no_std
jon-chuang Oct 6, 2020
f693b96
fix gpu-standalone does not depend algebra-core/cuda
jon-chuang Oct 6, 2020
eb37b29
lazy static optional
jon-chuang Oct 6, 2020
5616302
kernel-specific static profile data
jon-chuang Oct 6, 2020
262d140
cuda test, cached profile data (in OS cache dir) for all curves
jon-chuang Oct 6, 2020
03b36b3
rectify omission of NAMESPACE, minor errors
jon-chuang Oct 6, 2020
d94a3aa
fix no_std, group size in bits too large for 2 groups (mnt6, cp6 - Fq3)
jon-chuang Oct 6, 2020
96d2fa5
toml fixes
jon-chuang Oct 6, 2020
a292866
update README
jon-chuang Oct 6, 2020
014a878
remove extraneous file
jon-chuang Oct 6, 2020
986885e
bake in check for oversized group elems
jon-chuang Oct 6, 2020
ca91eba
typo
jon-chuang Oct 6, 2020
45d0e44
remove boilerplate/compactify
jon-chuang Oct 10, 2020
9938870
remove standalone
jon-chuang Oct 12, 2020
f46c436
fmt
jon-chuang Oct 12, 2020
c1a4682
Merge branch 'master' into jonch/mongrel
jon-chuang Oct 12, 2020
91c8bf8
fix println and comments
jon-chuang Oct 12, 2020
de0df85
Merge branch 'master' into jonch/mongrel
jon-chuang Oct 12, 2020
4f10b62
fix: typo
jon-chuang Oct 12, 2020
e88806c
Update README.md
jon-chuang Oct 13, 2020
088d260
Make GPUScalarMulInternal APIs, only expose two APIs
jon-chuang Oct 19, 2020
5aefed1
Merge branch 'master' into jonch/mongrel
jon-chuang Oct 20, 2020
a4963a6
add ci to test cuda compilation/link and cuda scalar mul when no gpu
jon-chuang Nov 6, 2020
47e3e87
Merge branch 'master' into jonch/mongrel
jon-chuang Nov 6, 2020
61b49ae
change kernel accel compile branch to master
jon-chuang Nov 6, 2020
6c45c02
fix ci
kobigurk Nov 6, 2020
850fc56
use unreachable instead of empty implementation
kobigurk Nov 6, 2020
9859cb7
install required toolchain
kobigurk Nov 6, 2020
c60ca93
Empty commit to get CI working
kobigurk Nov 6, 2020
7f7c887
try to fix ci
kobigurk Nov 6, 2020
9e7c407
Merge remote-tracking branch 'origin/master' into jonch/mongrel
kobigurk Nov 6, 2020
22cfcd1
fmt
kobigurk Nov 6, 2020
f9355b8
fix ci
kobigurk Nov 6, 2020
478a526
safer error handling in gpu code
kobigurk Nov 6, 2020
ae0909c
fix ci
kobigurk Nov 6, 2020
16f408f
handle dirs crate not available without cuda
kobigurk Nov 6, 2020
44ac6d9
don't check early intermediate results
kobigurk Nov 6, 2020
0e5f2c4
fix no_std and nightly
kobigurk Nov 7, 2020
06cc547
fix remaining errors
jon-chuang Nov 8, 2020
24bb1f1
No for_tests
jon-chuang Nov 8, 2020
e4fcb04
Feature gate clear profile data
jon-chuang Nov 8, 2020
95902fc
install cuda library to successfully link
kobigurk Nov 8, 2020
5e9c0a0
change the order of CI jobs
jon-chuang Nov 8, 2020
1235667
change the order of CI again
jon-chuang Nov 8, 2020
5b53d60
cd ..
jon-chuang Nov 8, 2020
3b84656
Get rid of cacheing
jon-chuang Nov 8, 2020
c966a57
Never all features
jon-chuang Nov 8, 2020
a0ae36f
Put back cacheing
jon-chuang Nov 8, 2020
152fd36
Remove cuda .deb to save disk space
jon-chuang Nov 9, 2020
51ce96b
Increase max-parallel
jon-chuang Nov 9, 2020
b508064
check examples with all features
kobigurk Nov 9, 2020
47 changes: 33 additions & 14 deletions .github/workflows/ci.yml
@@ -22,7 +22,7 @@ jobs:
toolchain: stable
override: true
components: rustfmt

default: true
- name: cargo fmt --check
uses: actions-rs/cargo@v1
with:
@@ -35,6 +35,7 @@ jobs:
env:
RUSTFLAGS: -Dwarnings
strategy:
max-parallel: 6
matrix:
rust:
- stable
@@ -50,14 +51,38 @@ jobs:
toolchain: ${{ matrix.rust }}
override: true

- uses: actions/cache@v2
with:
path: |
~/.cargo/registry
~/.cargo/git
target
- name: Install CUDA toolchains
run: |
wget -q https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget -q https://developer.download.nvidia.com/compute/cuda/11.1.1/local_installers/cuda-repo-ubuntu1804-11-1-local_11.1.1-455.32.00-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-11-1-local_11.1.1-455.32.00-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1804-11-1-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
rm cuda-repo-ubuntu*
curl -sSL https://github.com/jon-chuang/accel/raw/master/setup_nvptx_toolchain.sh | bash

- uses: actions/cache@v2
with:
path: |
~/.cargo/registry
~/.cargo/git
target
key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}

- name: Test algebra with CUDA
run: |
cd algebra
cargo test --features "all_curves cuda cuda_test"
cd ..

- name: Test algebra
run: |
cd algebra
cargo test --features full
cd ..

- name: Check examples
uses: actions-rs/cargo@v1
with:
@@ -68,7 +93,7 @@ jobs:
uses: actions-rs/cargo@v1
with:
command: check
args: --examples --all-features --all
args: --all-features --examples --all
if: matrix.rust == 'stable'

- name: Check benchmarks on nightly
@@ -88,12 +113,6 @@ jobs:
--exclude ff-fft-benches \
-- --skip dpc --skip integration_test"

- name: Test algebra
run: |
cd algebra
cargo test --features full
cd ..

- name: Test algebra with assembly
run: |
cd algebra
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -15,7 +15,7 @@ members = [
"r1cs-core",
"r1cs-std",
"algebra-core/algebra-core-derive",
"scripts/glv_lattice_basis"
"scripts/glv_lattice_basis",
]

[profile.release]
7 changes: 7 additions & 0 deletions README.md
@@ -87,6 +87,13 @@ To bench `algebra-benches` with greater accuracy, especially for functions with
cargo +nightly bench --features "n_fold bls12_381"
```

CUDA support is available for a limited set of functions. To allow compilation for CUDA on Linux, first run the script
```
curl -sSL https://github.com/jon-chuang/accel/raw/master/setup_nvptx_toolchain.sh | bash
```
or run the equivalent commands for your OS. Then, pass the `cuda` feature to rustc or cargo when compiling, and import the relevant traits (e.g. GPUScalarMulSlice) wherever the functions are called.

When the `cuda` feature is not activated, Zexe will still compile. However, when either the `cuda` feature is not activated during compilation or CUDA is not detected on your system at runtime, Zexe will default to a CPU-only implementation of the same functionality.

## License

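The README above names two fallback conditions: the `cuda` feature being off at compile time, or no CUDA device being found at runtime. The following is a minimal sketch of that dispatch pattern, assuming a toy additive group (integers mod a prime) in place of real curve points. `Backend`, `device_detected`, `select_backend`, `mul` and `scalar_mul_slice` are hypothetical names for illustration only; they are not Zexe's API, which the README refers to through traits such as `GPUScalarMulSlice`.

```rust
// Toy additive group: integers mod P stand in for curve points (assumption).
const P: u64 = 1_000_000_007;

#[derive(Debug, PartialEq)]
enum Backend {
    Gpu,
    Cpu,
}

/// Hypothetical runtime probe. A real build would ask the CUDA driver for
/// devices; this placeholder always reports that none are present.
fn device_detected() -> bool {
    false
}

/// Use the GPU only if compiled with `--features cuda` AND a device exists.
fn select_backend() -> Backend {
    if cfg!(feature = "cuda") && device_detected() {
        Backend::Gpu
    } else {
        Backend::Cpu
    }
}

/// Double-and-add scalar multiplication in the toy group.
fn mul(base: u64, mut k: u64) -> u64 {
    let (mut acc, mut cur) = (0u64, base % P);
    while k > 0 {
        if k & 1 == 1 {
            acc = (acc + cur) % P; // analogue of point addition
        }
        cur = (cur + cur) % P; // analogue of point doubling
        k >>= 1;
    }
    acc
}

/// Multiplies each base by its scalar in place, dispatching on the backend.
fn scalar_mul_slice(bases: &mut [u64], scalars: &[u64]) {
    assert_eq!(bases.len(), scalars.len());
    match select_backend() {
        // This sketch has no kernel; a real GPU path would launch one here.
        Backend::Gpu => unreachable!("sketch has no GPU kernel"),
        Backend::Cpu => {
            for (b, &k) in bases.iter_mut().zip(scalars) {
                *b = mul(*b, k);
            }
        }
    }
}

fn main() {
    let mut bases = vec![3u64, 5, 7];
    scalar_mul_slice(&mut bases, &[10u64, 0, 1u64 << 40]);
    println!("{:?} via {:?}", bases, select_backend());
}
```

Because the compile-time check (`cfg!`) and the runtime check are combined with `&&`, the CPU path is taken whenever either condition fails, which matches the behaviour the README describes.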
3 changes: 2 additions & 1 deletion algebra-benches/Cargo.toml
@@ -31,9 +31,10 @@ rand_xorshift = { version = "0.2" }
paste = "1.0"

[features]
bw6_asm = [ "algebra/bw6_asm"]
asm = [ "algebra/asm"]
prefetch = [ "algebra/prefetch"]
bw6_asm = [ "algebra/bw6_asm"]
cuda = [ "algebra/cuda" ]
n_fold = []
mnt4_298 = [ "algebra/mnt4_298"]
mnt6_298 = [ "algebra/mnt6_298"]
14 changes: 11 additions & 3 deletions algebra-core/Cargo.toml
@@ -27,32 +27,40 @@ algebra-core-derive = { path = "algebra-core-derive", optional = true }
derivative = { version = "2", features = ["use_core"] }
num-traits = { version = "0.2", default-features = false }
rand = { version = "0.7", default-features = false }
rayon = { version = "1", optional = true }
rayon = { version = "1.3.0", optional = true }
unroll = { version = "=0.1.4" }
itertools = { version = "0.9.0", default-features = false }
either = { version = "1.6.0", default-features = false }
thread-id = { version = "3.3.0", optional = true }
backtrace = { version = "0.3", optional = true }
accel = { git = "https://github.com/jon-chuang/accel", package = "accel", optional = true }
peekmore = "0.5.6"
closure = { version = "0.3.0", optional = true }
lazy_static = { version = "1.4.0", optional = true }
serde_json = { version = "1.0.58", optional = true }
dirs = { version = "1.0.5", optional = true }
log = { version = "0.4.11", optional = true }
paste = "0.1"

[build-dependencies]
field-assembly = { path = "./field-assembly", optional = true }
cc = "1.0"
rustc_version = "0.2"
cc = "1.0"

[dev-dependencies]
rand_xorshift = "0.2"

[features]
bw6_asm = []
default = [ "std", "rand/default" ]
std = []
parallel = [ "std", "rayon", "rand/default" ]
derive = [ "algebra-core-derive" ]
prefetch = [ "std" ]
cuda = [ "std", "parallel", "accel", "lazy_static", "serde_json", "dirs", "closure", "log" ]

timing = [ "std", "backtrace" ]
timing_detailed = [ "std", "backtrace" ]
timing_thread_id = [ "thread-id" ]

llvm_asm = [ "field-assembly" ]
bw6_asm = []
2 changes: 1 addition & 1 deletion algebra-core/algebra-core-derive/Cargo.toml
@@ -27,4 +27,4 @@ proc-macro = true
[dependencies]
proc-macro2 = "1.0"
syn = "1.0"
quote = "1.0"
quote = "1.0.7"
2 changes: 1 addition & 1 deletion algebra-core/mince/Cargo.toml
@@ -7,7 +7,7 @@ edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
quote = "1.0"
quote = "1.0.7"
syn = {version = "1.0.17", features = ["full"]}

[lib]
2 changes: 1 addition & 1 deletion algebra-core/src/bytes.rs
@@ -316,7 +316,7 @@ mod test {
fn test_macro_empty() {
let array: Vec<u8> = vec![];
let bytes: Vec<u8> = to_bytes![array].unwrap();
assert_eq!(&bytes, &[]);
assert_eq!(bytes, Vec::<u8>::new());
assert_eq!(bytes.len(), 0);
}

4 changes: 2 additions & 2 deletions algebra-core/src/curves/batch_arith.rs
@@ -25,7 +25,7 @@ pub trait BatchGroupArithmetic
where
Self: Sized + Clone + Copy + Zero + Neg<Output = Self>,
{
type BBaseField: Field;
type BaseFieldForBatch: Field;

// We use the w-NAF method, achieving point density of approximately 1/(w + 1)
// and requiring storage of only 2^(w - 1).
@@ -136,7 +136,7 @@ where
fn batch_double_in_place(
bases: &mut [Self],
index: &[u32],
scratch_space: Option<&mut Vec<Self::BBaseField>>,
scratch_space: Option<&mut Vec<Self::BaseFieldForBatch>>,
);

/// Mutates bases in place and stores result in the first operand.
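The `BatchGroupArithmetic` comment above refers to the w-NAF method. Below is a minimal sketch of width-w NAF recoding for a 64-bit scalar, assuming the common convention in which every nonzero digit is odd with |d| < 2^(w-1). Under that convention any w consecutive digits contain at most one nonzero entry, which is where the roughly 1/(w + 1) density comes from, and negative digits reuse the stored positive odd multiples via point negation. The function names `wnaf` and `wnaf_value` are hypothetical, and the sketch works on plain integers rather than on group elements.

```rust
/// Width-`w` NAF of `k`, least-significant digit first.
/// Nonzero digits are odd and lie strictly between -2^(w-1) and 2^(w-1).
fn wnaf(k: u64, w: u32) -> Vec<i64> {
    assert!((2..=32).contains(&w), "window width out of range");
    let modulus: u128 = 1 << w; // 2^w
    let half: i128 = 1 << (w - 1); // 2^(w-1)
    let mut k = k as u128; // headroom so that k - d (with d < 0) cannot overflow
    let mut digits = Vec::new();
    while k > 0 {
        if k & 1 == 1 {
            // Signed residue of k mod 2^w, centred on zero.
            let mut d = (k % modulus) as i128;
            if d >= half {
                d -= modulus as i128;
            }
            digits.push(d as i64);
            // k - d is divisible by 2^w, so the next w - 1 digits will be zero.
            k = (k as i128 - d) as u128;
        } else {
            digits.push(0);
        }
        k >>= 1;
    }
    digits
}

/// Recombines the digits: the sum of d_i * 2^i.
fn wnaf_value(digits: &[i64]) -> i128 {
    digits.iter().rev().fold(0i128, |acc, &d| 2 * acc + d as i128)
}

fn main() {
    let digits = wnaf(7, 3);
    println!("{:?}", digits); // [-1, 0, 0, 1], i.e. 7 = -1 + 8
    assert_eq!(wnaf_value(&digits), 7);
}
```

In a scalar multiplication each nonzero digit costs one addition of a precomputed odd multiple (negated when the digit is negative), while every digit costs one doubling.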
9 changes: 9 additions & 0 deletions algebra-core/src/curves/cuda/accel_dummy.rs
@@ -0,0 +1,9 @@
#[cfg(not(feature = "std"))]
use alloc::vec::Vec;
pub mod error {
pub type Result<T> = T;
}

pub struct Context {}

pub type DeviceMemory<T> = Vec<T>;
6 changes: 6 additions & 0 deletions algebra-core/src/curves/cuda/mod.rs
@@ -0,0 +1,6 @@
#[macro_use]
pub mod scalar_mul;
pub use scalar_mul::*;

#[cfg(not(feature = "cuda"))]
pub mod accel_dummy;
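`mod.rs` compiles `accel_dummy` only when the `cuda` feature is off. Because the dummy aliases `error::Result<T>` to `T` and `DeviceMemory<T>` to `Vec<T>`, code written against these names still type-checks without the `accel` crate. The following is a minimal, self-contained sketch of that effect with the dummy module inlined. The `Kernel` trait and the `Doubler` type are hypothetical, and the CPU body stands in for work a real build would run on the device.

```rust
// Inlined copy of the dummy module from the diff (std build, so no `alloc` import).
mod accel_dummy {
    pub mod error {
        // Without CUDA, "fallible" results are just plain values.
        pub type Result<T> = T;
    }
    pub struct Context {}
    // Host memory stands in for device memory.
    pub type DeviceMemory<T> = Vec<T>;
}

use accel_dummy::{error::Result, Context, DeviceMemory};

/// Hypothetical kernel interface written only in terms of accel-style names.
trait Kernel {
    fn launch(ctx: &Context, input: &DeviceMemory<u64>) -> Result<DeviceMemory<u64>>;
}

struct Doubler;

impl Kernel for Doubler {
    fn launch(_ctx: &Context, input: &DeviceMemory<u64>) -> Result<DeviceMemory<u64>> {
        // With the dummy aliases, the return type is simply Vec<u64>.
        input.iter().map(|x| x * 2).collect()
    }
}

fn main() {
    let ctx = Context {};
    let out = Doubler::launch(&ctx, &vec![1, 2, 3]);
    println!("{:?}", out); // [2, 4, 6]
}
```

Note the limitation of this approach: against the real crate, `Result<T>` is a genuine fallible result, so the same body would not compile unchanged. Signatures can be shared this way, but bodies that only make sense with a real device still have to be feature-gated or made unreachable.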