These examples show beginners how to write their own high-performance AI operators. We introduce optimization tricks such as shared memory and pipeline rearrangement to maximize throughput, and we provide an example of using CUTLASS to implement a fused FC + ReLU operator.
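To preview the shared-memory trick mentioned above, here is a minimal sketch of a tiled GEMM kernel. All names (`TILE`, `sgemm_tiled`) are illustrative and not taken from this repository; the actual implementations live in the example directories.

```cuda
#define TILE 16  // illustrative tile width, not the repo's tuned value

// Each thread block computes one TILE x TILE tile of C = A * B by
// staging tiles of A and B through shared memory, cutting global
// memory traffic compared with the naive one-load-per-multiply kernel.
__global__ void sgemm_tiled(const float *A, const float *B, float *C,
                            int M, int N, int K) {
  __shared__ float As[TILE][TILE];
  __shared__ float Bs[TILE][TILE];

  int row = blockIdx.y * TILE + threadIdx.y;
  int col = blockIdx.x * TILE + threadIdx.x;
  float acc = 0.f;

  for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
    // Cooperatively load one tile of A and one tile of B.
    int aCol = t * TILE + threadIdx.x;
    int bRow = t * TILE + threadIdx.y;
    As[threadIdx.y][threadIdx.x] =
        (row < M && aCol < K) ? A[row * K + aCol] : 0.f;
    Bs[threadIdx.y][threadIdx.x] =
        (bRow < K && col < N) ? B[bRow * N + col] : 0.f;
    __syncthreads();

    // Multiply the staged tiles entirely out of shared memory.
    for (int k = 0; k < TILE; ++k)
      acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
    __syncthreads();
  }
  if (row < M && col < N)
    C[row * N + col] = acc;
}
```

Pipeline rearrangement builds on this pattern by overlapping the load of the next tile with the computation on the current one (e.g. via double buffering), which is covered by the optimized variants in the repository.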
- Eigen: CPU linear algebra template library
- OpenMP: Enables multi-threaded acceleration on the CPU
- CUDA toolkit: Compiles GPU kernels and analyzes GPU execution
- Gflags: Command-line flags library released by Google
- CUTLASS: GPU GEMM template library
- Eigen: Use a package manager, e.g. `apt install libeigen3-dev`, or download it from the official website and build from source.
- OpenMP: Most compilers ship with OpenMP support. If yours does not, try `apt install libgomp-dev` or `apt install libomp-dev` for GCC or Clang, respectively.
- CUDA toolkit: We recommend installing it by following the official instructions.
- Gflags: Use a package manager, e.g. `apt install libgflags-dev`, or download it from the official website and build from source.
- CUTLASS: It is registered as a git submodule, so you do not have to install it yourself.
Once you have installed the dependencies, you can compile the project as follows:
```shell
git clone [email protected]:openmlsys/openmlsys-cuda.git
cd openmlsys-cuda
git submodule sync && git submodule update --init
mkdir build && cd build
cmake ..
make -j4
```
- `first_attempt`: The naive implementation
- `gemm`: Collection of implementations using different optimization tricks
- `fc_relu`: Example of fusing FC and ReLU using CUTLASS
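To illustrate what the FC + ReLU fusion buys, here is a hypothetical sketch of the idea as a plain kernel: the bias add and ReLU are applied in the same pass that writes the GEMM output, instead of launching a separate element-wise kernel. Names (`bias_relu_epilogue`) are illustrative; the `fc_relu` example expresses the same idea through CUTLASS's epilogue mechanism rather than a hand-written kernel.

```cuda
// Hypothetical fused epilogue: C already holds the GEMM result A * W;
// we add the per-column bias and apply ReLU in one pass, saving a
// full extra read and write of C over global memory.
__global__ void bias_relu_epilogue(float *C, const float *bias,
                                   int M, int N) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < M * N) {
    float v = C[idx] + bias[idx % N];  // bias is broadcast along rows
    C[idx] = v > 0.f ? v : 0.f;        // ReLU fused with the bias add
  }
}
```

With CUTLASS, this fusion is typically expressed by choosing a ReLU-enabled epilogue functor for the GEMM template, so the activation is applied while the accumulators are still in registers.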