OpenCL tutorial #125

Open
wants to merge 9 commits into
base: main
27 changes: 27 additions & 0 deletions opencl/Installation/Readme.md
@@ -0,0 +1,27 @@
# Installation procedure:

## Visa
Log into https://visa.esrf.fr
Create new instance
Customize the instance settings
Choose 32 cores, 128 GB memory: esrf.gpu.a16.large
Check "I accept"
Create the instance
Connect to the instance

## Install the environment
Open a web browser on https://github.com/conda-forge/miniforge
Open a terminal
Install miniforge from:
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh

mamba install ipykernel pyopencl ocl-icd clinfo clpeak pillow matplotlib ipympl oclgrind
python -m ipykernel install --user --name=opencl

## Clone the working directory
git clone https://github.com/silx-kit/silx-training

## Jupyterlab
Reconnect to the instance using jupyterlab
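
Once the environment is installed, a quick check along these lines can confirm that `pyopencl` sees at least one OpenCL device. This is a minimal sketch and not part of the PR: the helper name `list_opencl_devices` is hypothetical, and it assumes `pyopencl` comes from the `mamba install` step above. It returns an empty list rather than failing when OpenCL is unavailable.

```python
def list_opencl_devices():
    """Return 'platform / device' names, or [] if OpenCL is unavailable."""
    try:
        import pyopencl as cl  # assumed installed by the mamba command above
    except ImportError:
        return []
    found = []
    try:
        for platform in cl.get_platforms():
            for device in platform.get_devices():
                found.append(f"{platform.name} / {device.name}")
    except cl.Error:
        # Raised, for instance, when no OpenCL platform is registered
        pass
    return found


for entry in list_opencl_devices():
    print(entry)
```

An empty result suggests that either the package or the OpenCL driver (ICD) is missing; `clinfo`, installed in the same step, gives a more detailed report.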

91 changes: 61 additions & 30 deletions opencl/OpenCL.ipynb
@@ -3,9 +3,11 @@
{
"cell_type": "markdown",
"metadata": {
+"editable": true,
"slideshow": {
"slide_type": "slide"
-}
+},
+"tags": []
},
"source": [
"# GPU programming with PyOpenCL\n",
@@ -24,14 +26,16 @@
{
"cell_type": "markdown",
"metadata": {
+"editable": true,
"slideshow": {
"slide_type": "slide"
-}
+},
+"tags": []
},
"source": [
"## Comparison of modern CPU and GPU\n",
"<div>\n",
-" <img src=\"cpu_gpu.png\" width=\"1200\">\n",
+" <img src=\"cpu_gpu_2024.png\" width=\"1200\">\n",
"</div>\n",
"They are not that different: both are many-core processors.\n",
"\n",
@@ -42,9 +46,11 @@
{
"cell_type": "markdown",
"metadata": {
+"editable": true,
"slideshow": {
"slide_type": "slide"
-}
+},
+"tags": []
},
"source": [
"## Introduction to OpenCL\n",
@@ -173,13 +179,37 @@
"output_type": "stream",
"text": [
"Choose platform:\n",
-"[0] <pyopencl.Platform 'NVIDIA CUDA' at 0x5653b7405a40>\n",
-"[1] <pyopencl.Platform 'Intel(R) OpenCL' at 0x5653b73fff00>\n",
-"Choice [0]:0\n",
+"[0] <pyopencl.Platform 'NVIDIA CUDA' at 0x55f3c21393b0>\n",
+"[1] <pyopencl.Platform 'Intel(R) OpenCL' at 0x55f3c1fead68>\n"
+]
+},
+{
+"name": "stdin",
+"output_type": "stream",
+"text": [
+"Choice [0]: 0\n"
+]
+},
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
"Choose device(s):\n",
-"[0] <pyopencl.Device 'GeForce GTX TITAN' on 'NVIDIA CUDA' at 0x5653b73dc750>\n",
-"[1] <pyopencl.Device 'Quadro M2000' on 'NVIDIA CUDA' at 0x5653b73dc170>\n",
-"Choice, comma-separated [0]:0\n",
+"[0] <pyopencl.Device 'NVIDIA RTX A5000' on 'NVIDIA CUDA' at 0x55f3c2139420>\n",
+"[1] <pyopencl.Device 'Quadro P2200' on 'NVIDIA CUDA' at 0x55f3c21394b0>\n"
+]
+},
+{
+"name": "stdin",
+"output_type": "stream",
+"text": [
+"Choice, comma-separated [0]: 0\n"
+]
+},
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
"Set the environment variable PYOPENCL_CTX='0:0' to avoid being asked again.\n"
]
}
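
The last line of this output refers to the `PYOPENCL_CTX` environment variable. As a minimal sketch (not part of the PR), it can be set from Python before the context is created so that the platform and device prompts are skipped. The value `"0:0"` is the one printed above and is machine-specific; `make_context` is a hypothetical helper name.

```python
import os

# Platform 0, device 0: value copied from the notebook output, machine-specific.
# It must be set before the context is created.
os.environ.setdefault("PYOPENCL_CTX", "0:0")


def make_context():
    """Create a context without prompting; return None if that is not possible."""
    try:
        import pyopencl as cl
        return cl.create_some_context(interactive=False)
    except Exception:
        # pyopencl missing, no OpenCL platform, or a selection that does not fit
        return None


ctx = make_context()
print("Could not create an OpenCL context" if ctx is None else ctx.devices)
```

Exporting the variable in the shell that launches JupyterLab has the same effect for every notebook of the session.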
@@ -334,7 +364,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
-"<pyopencl._cl.Event object at 0x7f7530da5450>\n"
+"<pyopencl._cl.Event object at 0x7f13681ff7d0>\n"
]
}
],
@@ -381,9 +411,9 @@
"name": "stdout",
"output_type": "stream",
"text": [
-"Execution time on GPU: 0.906 ms\n",
+"Execution time on GPU: 0.346 ms\n",
"Execution time on CPU:\n",
-"18.7 ms ± 126 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
+"9.29 ms ± 6.74 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
@@ -437,7 +467,7 @@
},
{
"cell_type": "code",
-"execution_count": 19,
+"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "fragment"
@@ -479,8 +509,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
-"CPU times: user 242 µs, sys: 100 µs, total: 342 µs\n",
-"Wall time: 348 µs\n"
+"CPU times: user 109 µs, sys: 46 µs, total: 155 µs\n",
+"Wall time: 159 µs\n"
]
},
{
@@ -517,9 +547,9 @@
"One can use `reductions` to perform the scalar product ... \n",
"`a.b = sum(a[i]*b[i])` where `a[i]*b[i]` is the **map** operation and the sum is the **reduction**.\n",
"\n",
-"Reduction are not limited to addition, any commutative operation is usable like `*`, `min`, `max`, ... \n",
+"Reductions are not limited to addition, any associative operation is usable like `*`, `min`, `max`, ... \n",
"\n",
-"**⚠** addition is not strictly commutative for floating point numbers, but that's another story."
+"**⚠** addition is not strictly associative for floating point numbers, but that's another story."
]
},
{
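
The change from "commutative" to "associative" in this hunk can be illustrated with a small pure-Python sketch (not part of the PR; `dot` and `tree_reduce` are hypothetical names). It writes the scalar product as a map followed by a reduction, then reduces pairwise, which is the regrouping a parallel implementation typically relies on and which requires associativity.

```python
from functools import reduce
import operator


def dot(a, b):
    """Scalar product: map (element-wise product), then reduction (sum)."""
    mapped = map(operator.mul, a, b)          # map: a[i] * b[i]
    return reduce(operator.add, mapped, 0.0)  # reduction: sequential sum


def tree_reduce(op, values):
    """Pairwise reduction of a non-empty sequence.

    Matches a sequential reduce only when `op` is associative.
    """
    values = list(values)
    while len(values) > 1:
        paired = [op(values[i], values[i + 1])
                  for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:  # odd length: carry the last element over
            paired.append(values[-1])
        values = paired
    return values[0]


print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))   # 32.0
print(tree_reduce(max, [3, 1, 4, 1, 5]))       # 5
# Subtraction is not associative: the grouping changes the result.
print(reduce(operator.sub, [8, 4, 2, 1]), tree_reduce(operator.sub, [8, 4, 2, 1]))  # 1 3
# Floating-point addition is only approximately associative.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
```

The last line is the "another story" the notebook alludes to: a GPU reduction may return a slightly different floating-point sum than a sequential loop.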
@@ -574,11 +604,11 @@
"output_type": "stream",
"text": [
"Performance evaluation on the GPU:\n",
-"847 µs ± 3.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n",
+"424 µs ± 365 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)\n",
"Performance evaluation on CPU (32 bits):\n",
-"9.25 ms ± 16.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n",
+"4.39 ms ± 1.26 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n",
"Performance evaluation on CPU (64 bits):\n",
-"7.09 ms ± 685 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
+"1.91 ms ± 21.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
@@ -640,13 +670,14 @@
"\n",
" __kernel void reduce_kernel_stage2(\n",
" __global pcl_out_type *pcl_out__base, long pcl_out__offset,\n",
-" __global float *pyopencl_reduction_inp, __global float *a, __global float *b, \n",
+" __global float *pyopencl_reduction_inp, __global float *a__base, long a__offset, __global float *b__base, long b__offset, \n",
" long pcl_start, long pcl_step, long pcl_stop,\n",
" unsigned int pcl_seq_count, long n)\n",
" {\n",
" __global pcl_out_type *pcl_out = (__global pcl_out_type *) (\n",
" (__global char *) pcl_out__base + pcl_out__offset);\n",
-" \n",
+" __global float *a = (__global float *) ((__global char *) a__base + a__offset);\n",
+"__global float *b = (__global float *) ((__global char *) b__base + b__offset);\n",
"\n",
" __local pcl_out_type pcl_ldata[PCL_GROUP_SIZE];\n",
"\n",
@@ -975,7 +1006,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
-"Length of the source code: 313 lines\n"
+"Length of the source code: 314 lines\n"
]
}
],
@@ -997,7 +1028,7 @@
"![Scan kernel](300px-Prefix_sum_16.svg.png)\n",
"Their typical application is *in memory* compaction or in compression algorithms. \n",
"\n",
-"Once again, this is not limited to the `+` operation, it is valid for any commutative operation.\n",
+"Once again, this is not limited to the `+` operation, it is valid for any associative operation.\n",
"\n",
"In `numpy` this is implemented in the `cumsum` function:"
]
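
The scan described in this hunk, and the compaction use case it mentions, can be sketched in pure Python with `itertools.accumulate`. This is an illustration only and not the notebook's PyOpenCL code; `compact` is a hypothetical helper name.

```python
from itertools import accumulate

data = [3, 1, 4, 1, 5]
print(list(accumulate(data)))        # inclusive prefix sum: [3, 4, 8, 9, 14]
print(list(accumulate(data, max)))   # same scan with `max`: [3, 3, 4, 4, 5]


def compact(values, keep):
    """Stream compaction: an exclusive scan of the keep-flags gives
    each kept element its index in the output."""
    flags = [1 if keep(v) else 0 for v in values]
    offsets = [0] + list(accumulate(flags))[:-1]  # exclusive scan of the flags
    out = [None] * sum(flags)
    for value, flag, offset in zip(values, flags, offsets):
        if flag:
            out[offset] = value
    return out


print(compact(data, lambda v: v > 2))  # [3, 4, 5]
```

Because every output index is known from the scan alone, each element can be written independently, which is what makes the pattern attractive on a GPU.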
@@ -1046,11 +1077,11 @@
"output_type": "stream",
"text": [
"Performances of `cumsum` implemented in pyopencl:\n",
-"8.47 ms ± 27 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n",
+"2.21 ms ± 3.17 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n",
"Performances of `cumsum` implemented in numpy (32 bits):\n",
-"44.2 ms ± 109 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n",
+"49.3 ms ± 30.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n",
"Performances of `cumsum` implemented in numpy (64 bits):\n",
-"50.2 ms ± 340 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
+"54.6 ms ± 33 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
]
}
],
@@ -1103,7 +1134,7 @@
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
-"display_name": "Python 3",
+"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -1117,7 +1148,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.8.0"
+"version": "3.11.0"
}
},
"nbformat": 4,
768 changes: 768 additions & 0 deletions opencl/Tutorials/1-Map.ipynb

Large diffs are not rendered by default.

Binary file added opencl/Tutorials/14h33m35-SM-N970U1.jpg
529 changes: 529 additions & 0 deletions opencl/Tutorials/2-Gather.ipynb

Large diffs are not rendered by default.
