silx-kit · kif · Oct 10, 2024 · Oct 15, 2024 · Oct 16, 2024 · Oct 16, 2024
diff --git a/opencl/Installation/Readme.md b/opencl/Installation/Readme.md
@@ -0,0 +1,27 @@
+# Installation procedure:
+
+## Visa
+Log into https://visa.esrf.fr
+Create new instance
+Customize the instance settings
+Choose 32 cores, 128 GB memory: esrf.gpu.a16.large
+Check "I accept"
+Create the instance
+Connect to the instance
+
+## Install the environment
+Open a web browser on https://github.com/conda-forge/miniforge
+Open a terminal
+Install miniforge from:
+wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
+bash Miniforge3-Linux-x86_64.sh
+
+mamba install ipykernel pyopencl ocl-icd clinfo clpeak pillow matplotlib ipympl oclgrind
+python -m ipykernel install --user --name=opencl
+
+## Clone the working directory
+git clone https://github.com/silx-kit/silx-training
+
+## JupyterLab
+Reconnect to the instance using JupyterLab
+
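A quick way to check the result of the installation steps above is to ask Python which of the installed packages are importable. The following is a minimal sketch, not part of the README: `missing_modules` and `EXPECTED_MODULES` are hypothetical names, and the list only covers the Python packages from the `mamba install` line (`ocl-icd`, `clinfo`, `clpeak` and `oclgrind` are not Python modules, so they are not checked here).

```python
from importlib.util import find_spec

# Import names of the Python packages from the mamba command
# (assumption: pillow is imported as PIL, the others under their package name).
EXPECTED_MODULES = ["ipykernel", "pyopencl", "PIL", "matplotlib", "ipympl"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Run it with the Python interpreter of the environment in which the packages were installed; an empty result suggests the Python side of the setup is in place.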
diff --git a/opencl/OpenCL.ipynb b/opencl/OpenCL.ipynb
@@ -3,9 +3,11 @@
  {
  "cell_type": "markdown",
  "metadata": {
+ "editable": true,
  "slideshow": {
  "slide_type": "slide"
- }
+ },
+ "tags": []
  },
  "source": [
  "# GPU programming with PyOpenCL\n",
@@ -24,14 +26,16 @@
  {
  "cell_type": "markdown",
  "metadata": {
+ "editable": true,
  "slideshow": {
  "slide_type": "slide"
- }
+ },
+ "tags": []
  },
  "source": [
  "## Comparison of modern CPU and GPU\n",
  "<div>\n",
- " <img src=\"cpu_gpu.png\" width=\"1200\">\n",
+ " <img src=\"cpu_gpu_2024.png\" width=\"1200\">\n",
  "</div>\n",
  "They are not that different: both are many-core processors.\n",
  "\n",
@@ -42,9 +46,11 @@
  {
  "cell_type": "markdown",
  "metadata": {
+ "editable": true,
  "slideshow": {
  "slide_type": "slide"
- }
+ },
+ "tags": []
  },
  "source": [
  "## Introduction to OpenCL\n",
@@ -173,13 +179,37 @@
  "output_type": "stream",
  "text": [
  "Choose platform:\n",
- "[0] <pyopencl.Platform 'NVIDIA CUDA' at 0x5653b7405a40>\n",
- "[1] <pyopencl.Platform 'Intel(R) OpenCL' at 0x5653b73fff00>\n",
- "Choice [0]:0\n",
+ "[0] <pyopencl.Platform 'NVIDIA CUDA' at 0x55f3c21393b0>\n",
+ "[1] <pyopencl.Platform 'Intel(R) OpenCL' at 0x55f3c1fead68>\n"
+ ]
+ },
+ {
+ "name": "stdin",
+ "output_type": "stream",
+ "text": [
+ "Choice [0]: 0\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
  "Choose device(s):\n",
- "[0] <pyopencl.Device 'GeForce GTX TITAN' on 'NVIDIA CUDA' at 0x5653b73dc750>\n",
- "[1] <pyopencl.Device 'Quadro M2000' on 'NVIDIA CUDA' at 0x5653b73dc170>\n",
- "Choice, comma-separated [0]:0\n",
+ "[0] <pyopencl.Device 'NVIDIA RTX A5000' on 'NVIDIA CUDA' at 0x55f3c2139420>\n",
+ "[1] <pyopencl.Device 'Quadro P2200' on 'NVIDIA CUDA' at 0x55f3c21394b0>\n"
+ ]
+ },
+ {
+ "name": "stdin",
+ "output_type": "stream",
+ "text": [
+ "Choice, comma-separated [0]: 0\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
  "Set the environment variable PYOPENCL_CTX='0:0' to avoid being asked again.\n"
  ]
  }
@@ -334,7 +364,7 @@
  "name": "stdout",
  "output_type": "stream",
  "text": [
- "<pyopencl._cl.Event object at 0x7f7530da5450>\n"
+ "<pyopencl._cl.Event object at 0x7f13681ff7d0>\n"
  ]
  }
  ],
@@ -381,9 +411,9 @@
  "name": "stdout",
  "output_type": "stream",
  "text": [
- "Execution time on GPU: 0.906 ms\n",
+ "Execution time on GPU: 0.346 ms\n",
  "Execution time on CPU:\n",
- "18.7 ms ± 126 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
+ "9.29 ms ± 6.74 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
  ]
  }
  ],
@@ -437,7 +467,7 @@
  },
  {
  "cell_type": "code",
- "execution_count": 19,
+ "execution_count": 10,
  "metadata": {
  "slideshow": {
  "slide_type": "fragment"
@@ -479,8 +509,8 @@
  "name": "stdout",
  "output_type": "stream",
  "text": [
- "CPU times: user 242 µs, sys: 100 µs, total: 342 µs\n",
- "Wall time: 348 µs\n"
+ "CPU times: user 109 µs, sys: 46 µs, total: 155 µs\n",
+ "Wall time: 159 µs\n"
  ]
  },
  {
@@ -517,9 +547,9 @@
  "One can use `reductions` to perform the scalar product ... \n",
  "`a.b = sum(a[i]*b[i])` where `a[i]*b[i]` is the **map** operation and the sum is the **reduction**.\n",
  "\n",
- "Reduction are not limited to addition, any commutative operation is usable like `*`, `min`, `max`, ... \n",
+ "Reductions are not limited to addition: any associative operation is usable, like `*`, `min`, `max`, ... \n",
  "\n",
- "**⚠** addition is not strictly commutative for floating point numbers, but that's another story."
+ "**⚠** Addition is not strictly associative for floating-point numbers, but that's another story."
  ]
  },
  {
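The map and reduce steps described in the cell above, and the reason associativity is the property that matters, can be sketched in plain Python. This is only an illustration and not how PyOpenCL implements reductions; `tree_reduce` is a hypothetical helper that combines neighbours pairwise, which is roughly the grouping a parallel reduction uses.

```python
from functools import reduce
import operator

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

# Map: element-wise products a[i]*b[i]
products = list(map(operator.mul, a, b))

# Reduce: sequential, left-to-right sum of the products
dot_sequential = reduce(operator.add, products)


def tree_reduce(op, values):
    """Combine neighbouring values pairwise until a single value remains."""
    values = list(values)
    while len(values) > 1:
        paired = [op(values[i], values[i + 1]) for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:  # odd length: carry the last value over unchanged
            paired.append(values[-1])
        values = paired
    return values[0]


# Same data, different grouping of the operations
dot_tree = tree_reduce(operator.add, products)

# Any associative operation works, for example max
largest = tree_reduce(max, [3, 1, 4, 1, 5])

# Floating-point addition: regrouping changes the rounded result
left_grouping = (0.1 + 0.2) + 0.3
right_grouping = 0.1 + (0.2 + 0.3)
print(dot_sequential, dot_tree, largest, left_grouping == right_grouping)
```

For the small integers-as-floats above both groupings agree exactly; the last two lines show that this is not guaranteed in general, which is why a GPU reduction and a sequential sum can differ in the last digits.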
@@ -574,11 +604,11 @@
  "output_type": "stream",
  "text": [
  "Performance evaluation on the GPU:\n",
- "847 µs ± 3.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n",
+ "424 µs ± 365 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)\n",
  "Performance evaluation on CPU (32 bits):\n",
- "9.25 ms ± 16.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n",
+ "4.39 ms ± 1.26 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n",
  "Performance evaluation on CPU (64 bits):\n",
- "7.09 ms ± 685 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
+ "1.91 ms ± 21.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
  ]
  }
  ],
@@ -640,13 +670,14 @@
  "\n",
  " __kernel void reduce_kernel_stage2(\n",
  " __global pcl_out_type *pcl_out__base, long pcl_out__offset,\n",
- " __global float *pyopencl_reduction_inp, __global float *a, __global float *b, \n",
+ " __global float *pyopencl_reduction_inp, __global float *a__base, long a__offset, __global float *b__base, long b__offset, \n",
  " long pcl_start, long pcl_step, long pcl_stop,\n",
  " unsigned int pcl_seq_count, long n)\n",
  " {\n",
  " __global pcl_out_type *pcl_out = (__global pcl_out_type *) (\n",
  " (__global char *) pcl_out__base + pcl_out__offset);\n",
- " \n",
+ " __global float *a = (__global float *) ((__global char *) a__base + a__offset);\n",
+ "__global float *b = (__global float *) ((__global char *) b__base + b__offset);\n",
  "\n",
  " __local pcl_out_type pcl_ldata[PCL_GROUP_SIZE];\n",
  "\n",
@@ -975,7 +1006,7 @@
  "name": "stdout",
  "output_type": "stream",
  "text": [
- "Length of the source code: 313 lines\n"
+ "Length of the source code: 314 lines\n"
  ]
  }
  ],
@@ -997,7 +1028,7 @@
  "![Scan kernel](300px-Prefix_sum_16.svg.png)\n",
  "Their typical application is *in memory* compaction or in compression algorithms. \n",
  "\n",
- "Once again, this is not limited to the `+` operation, it is valid for any commutative operation.\n",
+ "Once again, this is not limited to the `+` operation; it is valid for any associative operation.\n",
  "\n",
  "In `numpy` this is implemented in the `cumsum` function:"
  ]
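The scan described in the cell above, and its use for compaction, can be sketched with the standard library. This is a sequential illustration under stated assumptions, not the PyOpenCL scan kernel: `compact` is a hypothetical helper, and `itertools.accumulate` stands in for the parallel prefix sum.

```python
from itertools import accumulate
import operator

data = [3, 1, 4, 1, 5]

# Inclusive scans with different associative operations
running_sum = list(accumulate(data))                # same result as numpy.cumsum
running_max = list(accumulate(data, max))
running_prod = list(accumulate(data, operator.mul))


def compact(values, keep):
    """Keep only the values for which keep(value) is true, using a scan.

    An exclusive prefix sum over the 0/1 flags gives each kept value
    its index in the output, so every write is independent of the others.
    """
    flags = [1 if keep(v) else 0 for v in values]
    positions = [0] + list(accumulate(flags))[:-1]  # exclusive scan of the flags
    out = [None] * sum(flags)
    for value, flag, position in zip(values, flags, positions):
        if flag:
            out[position] = value
    return out


print(running_sum, running_max, running_prod)
print(compact(data, lambda v: v > 2))
```

The loop in `compact` has no dependency between iterations once `positions` is known, which is what makes the scan-based formulation suitable for a GPU.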
@@ -1046,11 +1077,11 @@
  "output_type": "stream",
  "text": [
  "Performances of `cumsum` implemented in pyopencl:\n",
- "8.47 ms ± 27 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n",
+ "2.21 ms ± 3.17 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n",
  "Performances of `cumsum` implemented in numpy (32 bits):\n",
- "44.2 ms ± 109 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n",
+ "49.3 ms ± 30.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n",
  "Performances of `cumsum` implemented in numpy (64 bits):\n",
- "50.2 ms ± 340 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
+ "54.6 ms ± 33 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
  ]
  }
  ],
@@ -1103,7 +1134,7 @@
  "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
- "display_name": "Python 3",
+ "display_name": "Python 3 (ipykernel)",
  "language": "python",
  "name": "python3"
  },
@@ -1117,7 +1148,7 @@
  "name": "python",
  "nbconvert_exporter": "python",
  "pygments_lexer": "ipython3",
- "version": "3.8.0"
+ "version": "3.11.0"
  }
  },
  "nbformat": 4,

diff --git a/opencl/Tutorials/1-Map.ipynb b/opencl/Tutorials/1-Map.ipynb
diff --git a/opencl/Tutorials/14h33m35-SM-N970U1.jpg b/opencl/Tutorials/14h33m35-SM-N970U1.jpg
diff --git a/opencl/Tutorials/2-Gather.ipynb b/opencl/Tutorials/2-Gather.ipynb