This course covers high-level GPU programming for parallel data processing. Topics include parallel CUDA programming on GPUs, covering efficient memory access, threading models, multi-stream programming, and multi-GPU programming. The course focuses on hands-on applications such as big data processing, visualization, and artificial intelligence, using a real-time GPU system.
Recommended preparation: high-level C/C++ programming skills, ECE 15 or equivalent, and CSE 240A or equivalent.
Prerequisites: graduate standing.
This course teaches high-level CUDA programming skills through class labs, class quizzes, and homework labs. All enrolled students are therefore required to be familiar with C/C++ programming and with computer architecture concepts such as SIMD, caches, registers, memory, instructions, and pipelining (see Prerequisites).
In addition, all enrolled students must be familiar with compiling and debugging using CMake and Visual Studio.
John Cheng, Max Grossman, Ty McKercher, "Professional CUDA C Programming", Wiley, 2014. ISBN: 978-1-118-73932-7
Prerequisites:
C/C++ programming skills (hands-on experience required) (ECE-15, ECE-17)
CPU architecture course (ECE-30)
Machine learning and PyTorch
Here are some online tutorials for CMake and Visual Studio.
- CMake for Visual Studio
Basic CMake Hello World: https://www.youtube.com/watch?v=nbNDhC9Tvg4
- Visual Studio Debugging
How to debug C++ in VS: https://www.youtube.com/watch?v=0ebzPwixrJA
- Lab1: No-learning Reinforcement Learning - CUDA memory allocation
- Lab2: Single Agent Reinforcement Learning
- Lab3: Multi-Agent Reinforcement Learning on CUDA
- Midterm: Optimization Strategies for Lab3
- Final Project
- Quiz0: CUDA setup
- Quiz1: CPU and GPU programming
- Quiz2: CUDA boot speed programming