CoreNEURON error when running on NVIDIA Grace Hopper GPU and NVHPC 24.9 #3144
Comments
@iraikov: could you turn off unified memory and try?
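A minimal sketch of what that configure change could look like, assuming the CMake option is named CORENRN_ENABLE_CUDA_UNIFIED_MEMORY (an assumption, not confirmed in this thread; keep all other flags as in your original command):
$ cmake .. -DCORENRN_ENABLE_GPU=ON -DCORENRN_ENABLE_CUDA_UNIFIED_MEMORY=OFF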
Did you clean the old build directory? Just in case...! I don't have access to Grace Hopper, but on my local Ubuntu box I configured with:
$ cmake .. -DNRN_ENABLE_INTERVIEWS=OFF -DNRN_ENABLE_MPI=ON -DNRN_ENABLE_RX3D=OFF -DNRN_ENABLE_CORENEURON=ON -DCMAKE_INSTALL_PREFIX=`pwd`/install -DNRN_ENABLE_TESTS=OFF -DCORENRN_ENABLE_GPU=ON
-- The C compiler identification is NVHPC 24.9.0
-- The CXX compiler identification is NVHPC 24.9.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/compilers/bin/nvc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/compilers/bin/nvc++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'RelWithDebInfo' as none was specified.
-- The compiler /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/compilers/bin/nvc++ has no support for OpenMP SIMD construct
-- 3rd party project: using Random123 from "external/Random123"
-- 3rd party project: using eigen from "external/eigen"
-- Sub-project : using fmt from from /home/kumbhar/workarena/repos/bbp/nrn/external/fmt
-- {fmt} version: 11.0.2
-- Build type: RelWithDebInfo
-- No python executable specified. Looking for `python3` in the PATH...
-- Checking if /usr/bin/python3 is a working python
-- Found BISON: /usr/bin/bison (found version "3.8.2")
-- Found FLEX: /usr/bin/flex (found suitable version "2.6.4", minimum required is "2.6")
-- Found Readline: /usr/include
-- Found MPI_C: /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/lib/libmpi.so (found version "3.1")
-- Found MPI_CXX: /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/lib/libmpi.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- Detected OpenMPI 4.1.7
-- Sub-project : using nanobind from from /home/kumbhar/workarena/repos/bbp/nrn/external/nanobind
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- 3rd party project: using CLI11 from "external/CLI11"
-- Building CoreNEURON
-- Found Git: /usr/bin/git (found version "2.34.1")
-- Setting default CUDA architectures to 70;80
-- The CUDA compiler identification is NVIDIA 12.6.20
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/compilers/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/cuda/12.6/include (found suitable version "12.6.20", minimum required is "9.0")
-- Could NOT find nmodl (missing: nmodl_BINARY)
-- Sub-project : using nmodl from from /home/kumbhar/workarena/repos/bbp/nrn/external/nmodl
-- CHECKING FOR FLEX/BISON
-- Found BISON: /usr/bin/bison (found suitable version "3.8.2", minimum required is "3.0")
-- NMODL_TEST_FORMATTING: OFF
-- NMODL_GIT_HOOKS: OFF
-- NMODL_GIT_COMMIT_HOOKS:
-- NMODL_GIT_PUSH_HOOKS: courtesy-msg
-- NMODL_STATIC_ANALYSIS: OFF
-- NMODL_TEST_STATIC_ANALYSIS: OFF
-- 3rd party project: using json from "ext/json"
-- Using the multi-header code from /home/kumbhar/workarena/repos/bbp/nrn/external/nmodl/ext/json/include/
-- 3rd party project: using pybind11 from "ext/pybind11"
-- pybind11 v2.12.0
-- Found PythonInterp: /usr/bin/python3 (found suitable version "3.10.12", minimum required is "3.6")
-- Found PythonLibs: /usr/lib/x86_64-linux-gnu/libpython3.10.so
-- 3rd party project: using spdlog from "ext/spdlog"
-- Build spdlog: 1.13.0
-- Build type: RelWithDebInfo
-- CHECKING FOR PYTHON
-- Found Python: /usr/bin/python3.10 (found suitable version "3.10.12", minimum required is "3.8") found components: Interpreter
--
-- Configured NMODL 0.6 (e6250014d 2024-09-10 09:00:35 -0400)
--
-- You can now build NMODL using:
-- cmake --build . --parallel 8 [--target TARGET]
-- You might want to adjust the number of parallel build jobs for your system.
-- Some non-default targets you might want to build:
-- --------------------+--------------------------------------------------------
-- Target | Description
-- --------------------+--------------------------------------------------------
-- test | Run unit tests
-- install | Will install NMODL to: /home/kumbhar/workarena/repos/bbp/nrn/build_gpu/install
-- --------------------+--------------------------------------------------------
-- Build option | Status
-- --------------------+--------------------------------------------------------
-- CXX COMPILER | /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/compilers/bin/nvc++
-- COMPILE FLAGS | -mp -g -O2 -Wc,--pending_instantiations=0
-- Build Type | RelWithDebInfo
-- Python Bindings | OFF
-- Flex | /usr/bin/flex
-- Bison | /usr/bin/bison
-- Python | /usr/bin/python3
-- Linked against | ON
-- --------------------+--------------------------------------------------------
-- See documentation : https://github.com/BlueBrain/nmodl/
-- --------------------+--------------------------------------------------------
--
--
-- CoreNEURON is enabled with following build configuration:
-- --------------------+--------------------------------------------------------
-- Build option | Status
-- --------------------+--------------------------------------------------------
-- CXX COMPILER | /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/compilers/bin/nvc++
-- COMPILE FLAGS | -mp -g -O2 --c++17 -cuda -gpu=cuda12.6,lineinfo,cc70,cc80 -mp=gpu -Mautoinline -DCORENEURON_CUDA_PROFILING -DCORENEURON_ENABLE_GPU -DCORENEURON_PREFER_OPENMP_OFFLOAD -DCORENEURON_BUILD -DHAVE_MALLOC_H -DCORENRN_BUILD=1 -DEIGEN_DONT_PARALLELIZE -DEIGEN_DONT_VECTORIZE=1 -DNRNMPI=1 -DLAYOUT=0 -DDISABLE_HOC_EXP -DENABLE_SPLAYTREE_QUEUING
-- Build Type | SHARED
-- MPI | ON
-- DYNAMIC | OFF
-- INC | /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/include;/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/include/openmpi;/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/include/openmpi/opal/mca/hwloc/hwloc201/hwloc/include;/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/include/openmpi/opal/mca/event/libevent2022/libevent;/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/include/openmpi/opal/mca/event/libevent2022/libevent/include
-- OpenMP | ON
-- NMODL PATH | /home/kumbhar/workarena/repos/bbp/nrn/build_gpu/bin/nmodl
-- NMODL FLAGS |
-- GPU Support | ON
-- CUDA | /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/cuda/12.6/lib64
-- Offload | OpenMP
-- Unified Memory | OFF
-- Auto Timeout | ON
-- Wrap exp() | OFF
-- SplayTree Queue | ON
-- NetReceive Buffer | ON
-- Caliper | OFF
-- Likwid | OFF
-- Unit Tests | OFF
-- Reporting | OFF
-- --------------------+--------------------------------------------------------
--
Extracting link flags from target 'nrngnu', beware that this can be fragile. Got:
Extracting link flags from target 'sparse13', beware that this can be fragile. Got:
Extracting link flags from target 'fmt::fmt', beware that this can be fragile. Got:
For 'nrnpython' going to see TARGET 'fmt::fmt' recursively.
Extracting link flags from target 'fmt::fmt', beware that this can be fragile. Got: /usr/lib/x86_64-linux-gnu/libpython3.10.so;fmt::fmt;nanobind
For 'nrnpython' going to see TARGET 'nanobind' recursively.
Extracting link flags from target 'nanobind', beware that this can be fragile. Got: /usr/lib/x86_64-linux-gnu/libpython3.10.so;fmt::fmt;nanobind
Extracting link flags from target 'nrnpython', beware that this can be fragile. Got: /usr/lib/x86_64-linux-gnu/libpython3.10.so;fmt::fmt;nanobind
Extracting link flags from target 'Threads::Threads', beware that this can be fragile. Got: /usr/lib/x86_64-linux-gnu/libpython3.10.so;fmt::fmt;nanobind
Generating link flags from path /usr/lib/x86_64-linux-gnu/libreadline.so Got: /usr/lib/x86_64-linux-gnu/libreadline.so -Wl,-rpath,/usr/lib/x86_64-linux-gnu
Generating link flags from path /usr/lib/x86_64-linux-gnu/libcurses.so Got: /usr/lib/x86_64-linux-gnu/libcurses.so -Wl,-rpath,/usr/lib/x86_64-linux-gnu
Generating link flags from path /usr/lib/x86_64-linux-gnu/libform.so Got: /usr/lib/x86_64-linux-gnu/libform.so -Wl,-rpath,/usr/lib/x86_64-linux-gnu
Generating link flags from path /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/lib/libmpi.so Got: /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/lib/libmpi.so -Wl,-rpath,/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/lib
Generating link flags from name 'dl', beware that this can be fragile. Got: -ldl
--
-- Configured NEURON 9.0.0
--
-- You can now build NEURON using:
-- cmake --build . --parallel 8 [--target TARGET]
-- You might want to adjust the number of parallel build jobs for your system.
-- Some non-default targets you might want to build:
-- --------------+--------------------------------------------------------------
-- Target | Description
-- --------------+--------------------------------------------------------------
-- install | Will install NEURON to: /home/kumbhar/workarena/repos/bbp/nrn/build_gpu/install
-- | Change the install location of NEURON using:
-- | cmake <src_path> -DCMAKE_INSTALL_PREFIX=<install_path>
-- docs | Build full docs. Calls targets: doxygen, notebooks, sphinx, notebooks-clean
-- uninstall | Removes files installed by make install (todo)
-- --------------+--------------------------------------------------------------
-- Build option | Status
-- --------------+--------------------------------------------------------------
-- C COMPILER | /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/compilers/bin/nvc
-- CXX COMPILER | /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/compilers/bin/nvc++
-- BUILD_TYPE | RelWithDebInfo (allowed: Custom;Debug;Release;RelWithDebInfo;Fast;FastDebug)
-- COMPILE FLAGS | -g -O2 --diag_suppress=1,47,111,128,170,174,177,186,541,550,816,2465 -noswitcherror
-- Shared | ON
-- MPI | ON
-- DYNAMIC | OFF
-- INC | /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/include;/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/include/openmpi;/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/include/openmpi/opal/mca/hwloc/hwloc201/hwloc/include;/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/include/openmpi/opal/mca/event/libevent2022/libevent;/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/include/openmpi/opal/mca/event/libevent2022/libevent/include
-- LIB | /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/lib/libmpi.so
-- Python | ON
-- DYNAMIC | OFF
-- MODULE | ON
-- python3.10 (default)
-- EXE | /usr/bin/python3
-- INC | /usr/include/python3.10
-- LIB | /usr/lib/x86_64-linux-gnu/libpython3.10.so
-- Readline | /usr/lib/x86_64-linux-gnu/libreadline.so
-- Curses | /usr/lib/x86_64-linux-gnu/libcurses.so;/usr/lib/x86_64-linux-gnu/libform.so
-- RX3D | OFF
-- Interviews | OFF
-- CoreNEURON | ON
-- PATH | /home/kumbhar/workarena/repos/bbp/nrn/src/coreneuron
-- LINK FLAGS | -cuda -gpu=cuda12.6,lineinfo,cc70,cc80 -mp=gpu -lcorenrnmech -lcoreneuron-cuda -Wl,-rpath,/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/lib /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/lib/libmpi.so -ldl
-- Tests | OFF
-- --------------+--------------------------------------------------------------
-- See documentation : https://www.neuron.yale.edu/neuron/
-- --------------+--------------------------------------------------------------
--
-- Configuring done
-- Generating done
-- Build files have been written to: /home/kumbhar/workarena/repos/bbp/nrn/build_gpu
and then:
# nsys profile just for a sanity check
$ nsys nvprof /home/kumbhar/workarena/repos/bbp/nrn/build_gpu/install/bin/nrniv -python test.py
WARNING: nrniv and any of its children processes will be profiled.
Collecting data...
NEURON -- VERSION 9.0.dev-1246-g797e9b0a8+ HEAD (797e9b0a8+) 2023-01-04
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits
Info : 1 GPUs shared by 1 ranks per node
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
Version : 9.0.0 2434192bc (2024-10-17 16:11:44 +0200)
Additional mechanisms from files
exp2syn.mod expsyn.mod hh.mod netstim.mod passive.mod pattern.mod stim.mod svclmp.mod
Memory (MBs) : After mk_mech : Max 691.2656, Min 691.2656, Avg 691.2656
GPU Memory (MiBs) : Used = 245.000000, Free = 5678.812500, Total = 5923.812500
Memory (MBs) : After MPI_Init : Max 691.2656, Min 691.2656, Avg 691.2656
GPU Memory (MiBs) : Used = 245.000000, Free = 5678.812500, Total = 5923.812500
Memory (MBs) : Before nrn_setup : Max 691.2656, Min 691.2656, Avg 691.2656
GPU Memory (MiBs) : Used = 245.000000, Free = 5678.812500, Total = 5923.812500
Setup Done : 0.00 seconds
Model size : 4.56 kB
Memory (MBs) : After nrn_setup : Max 691.7500, Min 691.7500, Avg 691.7500
GPU Memory (MiBs) : Used = 245.000000, Free = 5678.812500, Total = 5923.812500
GENERAL PARAMETERS
--mpi=false
--mpi-lib=
--gpu=true
--dt=0.25
--tstop=500
GPU
--nwarp=65536
--cell-permute=1
--cuda-interface=false
INPUT PARAMETERS
--voltage=1000
--seed=-1
--datpath=.
--filesdat=files.dat
--pattern=
--report-conf=
--restore=
PARALLEL COMPUTATION PARAMETERS
--threading=false
--skip_mpi_finalize=true
SPIKE EXCHANGE
--ms_phases=2
--ms_subintervals=2
--multisend=false
--spk_compress=0
--binqueue=false
CONFIGURATION
--spikebuf=100000
--prcellgid=-1
--forwardskip=0
--celsius=6.3
--mindelay=10
--report-buffer-size=4
OUTPUT PARAMETERS
--dt_io=0.1
--outpath=.
--checkpoint=
Start time (t) = 0
Memory (MBs) : After mk_spikevec_buffer : Max 691.7500, Min 691.7500, Avg 691.7500
GPU Memory (MiBs) : Used = 245.000000, Free = 5678.812500, Total = 5923.812500
Memory (MBs) : After nrn_finitialize : Max 691.9883, Min 691.9883, Avg 691.9883
GPU Memory (MiBs) : Used = 245.000000, Free = 5678.812500, Total = 5923.812500
psolve |=========================================================| t: 500.00 ETA: 0h00m01s
Solver Time : 0.390506
Simulation Statistics
Number of cells: 2
Number of compartments: 10
Number of presyns: 2
Number of input presyns: 0
Number of synapses: 1
Number of point processes: 3
Number of transfer sources: 0
Number of transfer targets: 0
Number of spikes: 0
Number of spikes with non negative gid-s: 0
numprocs = 1
Rank 0: created gid 0; stim delay = 10.00
Rank 0: created gid 1; stim delay = 20.00
created cells
created connections
rank 0: total compute time: 0.63
Generating '/tmp/nsys-report-f933.qdstrm'
[1/7] [========================100%] report2.nsys-rep
[2/7] [========================100%] report2.sqlite
[3/7] Executing 'nvtx_sum' stats report
SKIPPED: /home/kumbhar/workarena/repos/bbp/nrn/ivan/report2.sqlite does not contain NV Tools Extension (NVTX) data.
[4/7] Executing 'cuda_api_sum' stats report
Time (%) Total Time (ns) Num Calls Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
-------- --------------- --------- ---------- ---------- -------- -------- ----------- --------------------------
63.0 238542651 46161 5167.6 4620.0 177 33643 2306.9 cuStreamSynchronize
13.4 50750148 42000 1208.3 1179.0 1041 156736 829.8 cuLaunchKernel
8.4 31962472 2 15981236.0 15981236.0 1369231 30593241 20664495.6 cudaProfilerStop
7.7 29251053 2044 14310.7 14484.0 5204 35330 1533.2 cuMemcpyDtoHAsync_v2
5.4 20595070 1 20595070.0 20595070.0 20595070 20595070 0.0 cuMemAllocManaged
1.1 4288147 4134 1037.3 1014.0 886 6575 169.1 cuMemcpyHtoDAsync_v2
0.4 1644335 2006 819.7 785.0 677 15841 457.1 cuMemsetD32Async
0.2 938734 1 938734.0 938734.0 938734 938734 0.0 cudaGetFuncBySymbol_v11000
0.1 437050 1 437050.0 437050.0 437050 437050 0.0 cuMemAllocHost_v2
0.0 129572 62 2089.9 864.0 710 59470 7438.3 cuMemAlloc_v2
0.0 43339 6 7223.2 5951.5 5033 13387 3212.2 cudaMemGetInfo
0.0 42526 412 103.2 77.0 42 2366 126.2 cuGetProcAddress_v2
0.0 6084 2 3042.0 3042.0 1500 4584 2180.7 cudaProfilerStart
0.0 4742 1 4742.0 4742.0 4742 4742 0.0 cudaFree
0.0 1571 2 785.5 785.5 527 1044 365.6 cuInit
0.0 746 4 186.5 127.5 46 445 188.4 cuCtxSetCurrent
0.0 310 1 310.0 310.0 310 310 0.0 cuFuncGetModule
0.0 82 1 82.0 82.0 82 82 0.0 cuModuleGetLoadingMode
[5/7] Executing 'cuda_gpu_kern_sum' stats report
Time (%) Total Time (ns) Instances Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
-------- --------------- --------- -------- -------- -------- -------- ----------- ----------------------------------------------------------------------------------------------------
16.9 19083822 2000 9541.9 9536.0 7808 13632 795.0 nvkernel__ZN10coreneuron18solve_interleaved1Ei_F580L793_4
10.5 11918583 2000 5959.3 6048.0 4928 6432 294.4 nvkernel__ZN10coreneuron17nrn_state_Exp2SynEPNS_9NrnThreadEPNS_9Memb_listEi_F1L477_23
9.0 10127902 2000 5064.0 5120.0 4160 5856 257.0 nvkernel__ZN10coreneuron14nrn_cur_IClampEPNS_9NrnThreadEPNS_9Memb_listEi_F1L321_9
8.3 9413280 2000 4706.6 4768.0 3872 5344 233.1 nvkernel__ZN10coreneuron15nrn_cur_Exp2SynEPNS_9NrnThreadEPNS_9Memb_listEi_F1L436_16
5.5 6240543 2000 3120.3 3168.0 2560 3744 156.5 nvkernel__ZN10coreneuron11nrn_cur_pasEPNS_9NrnThreadEPNS_9Memb_listEi_F1L305_9
4.6 5204375 4000 1301.1 1312.0 1024 1793 70.8 nvkernel__ZN10coreneuron23net_buf_receive_Exp2SynEPNS_9NrnThreadE_F1L340_2
4.3 4819276 2000 2409.6 2432.0 1952 2688 122.3 nvkernel__ZN95_INTERNAL_73__home_kumbhar_workarena_repos_bbp_nrn_src_coreneuron_sim_treeset_core_cp…
4.1 4650301 2000 2325.2 2368.0 1888 2401 118.4 nvkernel__ZN95_INTERNAL_73__home_kumbhar_workarena_repos_bbp_nrn_src_coreneuron_sim_treeset_core_cp…
3.9 4393077 2000 2196.5 2209.0 1791 2688 112.2 nvkernel__ZN10coreneuron23nrncore2nrn_send_valuesEPNS_9NrnThreadE_F1L295_18
...
Could you copy the output of the cmake configure step?
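For example, the configure output can be captured to a file with something like (placeholder options, adjust to your own):
$ cmake .. <your options> 2>&1 | tee cmake_configure.log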
Hello @pramodk, I did remove the old build directory, but thanks for checking! Below is my cmake configure log. It appears that in your CoreNEURON "GPU" section, the Offload setting is OpenMP, whereas mine is OpenACC. Could this be the culprit?
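For reference, the offload backend is chosen at configure time; a sketch, assuming the option is named CORENRN_ACCELERATOR_OFFLOAD (an assumption on my part, not confirmed in this thread):
$ cmake .. -DCORENRN_ENABLE_GPU=ON -DCORENRN_ACCELERATOR_OFFLOAD=OpenACC    # or OpenMP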
I rebuilt with OpenACC offload as well:
-- CoreNEURON is enabled with following build configuration:
-- --------------------+--------------------------------------------------------
-- Build option | Status
-- --------------------+--------------------------------------------------------
-- CXX COMPILER | /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/compilers/bin/nvc++
-- COMPILE FLAGS | -g -O2 --c++17 -cuda -gpu=cuda12.6,lineinfo,cc70,cc80 -acc -Mautoinline -DCORENEURON_CUDA_PROFILING -DCORENEURON_ENABLE_GPU -DCORENEURON_BUILD -DHAVE_MALLOC_H -DCORENRN_BUILD=1 -DEIGEN_DONT_PARALLELIZE -DEIGEN_DONT_VECTORIZE=1 -DNRNMPI=1 -DLAYOUT=0 -DDISABLE_HOC_EXP -DENABLE_SPLAYTREE_QUEUING
-- Build Type | SHARED
-- MPI | ON
-- DYNAMIC | OFF
-- INC | /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/include;/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/include/openmpi;/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/include/openmpi/opal/mca/hwloc/hwloc201/hwloc/include;/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/include/openmpi/opal/mca/event/libevent2022/libevent;/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/ompi/include/openmpi/opal/mca/event/libevent2022/libevent/include
-- OpenMP | OFF
-- NMODL PATH | /home/kumbhar/workarena/repos/bbp/nrn/build_gpu_acc/bin/nmodl
-- NMODL FLAGS |
-- GPU Support | ON
-- CUDA | /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/cuda/12.6/lib64
-- Offload | OpenACC
-- Unified Memory | OFF
-- Auto Timeout | ON
-- Wrap exp() | OFF
-- SplayTree Queue | ON
-- NetReceive Buffer | ON
-- Caliper | OFF
-- Likwid | OFF
-- Unit Tests | OFF
and it still finished without errors:
$ nsys nvprof /home/kumbhar/workarena/repos/bbp/nrn/build_gpu_acc/install/bin/nrniv -python test.py
WARNING: nrniv and any of its children processes will be profiled.
Collecting data...
NEURON -- VERSION 9.0.dev-1246-g797e9b0a8+ HEAD (797e9b0a8+) 2023-01-04
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits
Info : 1 GPUs shared by 1 ranks per node
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
Version : 9.0.0 2434192bc (2024-10-17 16:11:44 +0200)
Additional mechanisms from files
exp2syn.mod expsyn.mod hh.mod netstim.mod passive.mod pattern.mod stim.mod svclmp.mod
Memory (MBs) : After mk_mech : Max 690.1641, Min 690.1641, Avg 690.1641
GPU Memory (MiBs) : Used = 245.000000, Free = 5678.812500, Total = 5923.812500
Memory (MBs) : After MPI_Init : Max 690.1641, Min 690.1641, Avg 690.1641
GPU Memory (MiBs) : Used = 245.000000, Free = 5678.812500, Total = 5923.812500
Memory (MBs) : Before nrn_setup : Max 690.1641, Min 690.1641, Avg 690.1641
GPU Memory (MiBs) : Used = 245.000000, Free = 5678.812500, Total = 5923.812500
Setup Done : 0.00 seconds
Model size : 4.56 kB
Memory (MBs) : After nrn_setup : Max 690.1641, Min 690.1641, Avg 690.1641
GPU Memory (MiBs) : Used = 245.000000, Free = 5678.812500, Total = 5923.812500
GENERAL PARAMETERS
--mpi=false
--mpi-lib=
--gpu=true
--dt=0.25
--tstop=500
GPU
--nwarp=65536
--cell-permute=1
--cuda-interface=false
INPUT PARAMETERS
--voltage=1000
--seed=-1
--datpath=.
--filesdat=files.dat
--pattern=
--report-conf=
--restore=
PARALLEL COMPUTATION PARAMETERS
--threading=false
--skip_mpi_finalize=true
SPIKE EXCHANGE
--ms_phases=2
--ms_subintervals=2
--multisend=false
--spk_compress=0
--binqueue=false
CONFIGURATION
--spikebuf=100000
--prcellgid=-1
--forwardskip=0
--celsius=6.3
--mindelay=10
--report-buffer-size=4
OUTPUT PARAMETERS
--dt_io=0.1
--outpath=.
--checkpoint=
Start time (t) = 0
Memory (MBs) : After mk_spikevec_buffer : Max 690.1641, Min 690.1641, Avg 690.1641
GPU Memory (MiBs) : Used = 245.000000, Free = 5678.812500, Total = 5923.812500
Memory (MBs) : After nrn_finitialize : Max 690.7969, Min 690.7969, Avg 690.7969
GPU Memory (MiBs) : Used = 245.000000, Free = 5678.812500, Total = 5923.812500
psolve |=========================================================| t: 500.00 ETA: 0h00m00s
Solver Time : 0.263252
Simulation Statistics
Number of cells: 2
Number of compartments: 10
Number of presyns: 2
Number of input presyns: 0
Number of synapses: 1
Number of point processes: 3
Number of transfer sources: 0
Number of transfer targets: 0
Number of spikes: 0
Number of spikes with non negative gid-s: 0
numprocs = 1
Rank 0: created gid 0; stim delay = 10.00
Rank 0: created gid 1; stim delay = 20.00
created cells
created connections
rank 0: total compute time: 0.46
Generating '/tmp/nsys-report-3cb0.qdstrm'
[1/7] [========================100%] report3.nsys-rep
[2/7] [========================100%] report3.sqlite
[3/7] Executing 'nvtx_sum' stats report
SKIPPED: /home/kumbhar/workarena/repos/bbp/nrn/ivan/report3.sqlite does not contain NV Tools Extension (NVTX) data.
[4/7] Executing 'cuda_api_sum' stats report
Time (%) Total Time (ns) Num Calls Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
-------- --------------- --------- ---------- ---------- -------- -------- ----------- --------------------------
55.7 108157789 36162 2990.9 202.0 150 37026 4569.9 cuStreamSynchronize
26.8 52061729 42000 1239.6 1208.0 1028 114231 664.3 cuLaunchKernel
5.8 11304352 1 11304352.0 11304352.0 11304352 11304352 0.0 cuMemHostAlloc
5.4 10557036 2 5278518.0 5278518.0 1372485 9184551 5523964.8 cudaProfilerStop
2.4 4696424 4139 1134.7 1125.0 874 16344 313.9 cuMemcpyHtoDAsync_v2
1.3 2468031 2044 1207.5 1084.0 983 23143 803.0 cuMemcpyDtoHAsync_v2
0.8 1582809 2001 791.0 772.0 679 2974 136.6 cuMemsetD32Async
0.6 1217946 2000 609.0 593.0 543 4273 118.9 cuEventRecord
0.5 1006975 30 33565.8 5287.5 532 783273 141747.0 cudaGetFuncBySymbol_v11000
0.2 420253 2 210126.5 210126.5 2417 417836 293745.6 cuMemAllocHost_v2
0.2 345595 2000 172.8 169.0 156 1514 38.2 cuEventSynchronize
0.1 175334 62 2828.0 825.0 655 64889 9902.4 cuMemAlloc_v2
0.0 45463 6 7577.2 6085.0 3472 13111 4281.6 cudaMemGetInfo
0.0 42937 412 104.2 78.0 41 2569 136.1 cuGetProcAddress_v2
0.0 4679 2 2339.5 2339.5 1257 3422 1530.9 cudaProfilerStart
0.0 3828 4 957.0 464.0 233 2667 1156.9 cuEventCreate
0.0 3050 1 3050.0 3050.0 3050 3050 0.0 cuStreamCreate
0.0 1181 5 236.2 95.0 47 867 353.9 cuCtxSetCurrent
0.0 476 1 476.0 476.0 476 476 0.0 cuInit
0.0 464 1 464.0 464.0 464 464 0.0 cuFuncGetModule
0.0 70 1 70.0 70.0 70 70 0.0 cuModuleGetLoadingMode
[5/7] Executing 'cuda_gpu_kern_sum' stats report
Time (%) Total Time (ns) Instances Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
-------- --------------- --------- -------- -------- -------- -------- ----------- -------------------------------------------------------------------------------------------
16.9 15945974 2000 7973.0 8672.0 5887 12544 1482.4 coreneuron::solve_interleaved1_793(int)
11.1 10420669 2000 5210.3 5760.0 3904 6240 844.6 coreneuron::nrn_state_Exp2Syn_477(coreneuron::NrnThread *, coreneuron::Memb_list *, int)
8.8 8310735 2000 4155.4 4576.0 3103 5280 682.6 coreneuron::nrn_cur_IClamp_321(coreneuron::NrnThread *, coreneuron::Memb_list *, int)
8.3 7871696 2000 3935.8 4352.0 2943 5025 640.4 coreneuron::nrn_cur_Exp2Syn_436(coreneuron::NrnThread *, coreneuron::Memb_list *, int)
5.5 5197502 2000 2598.8 2849.0 1920 3456 425.3 coreneuron::nrn_cur_pas_305(coreneuron::NrnThread *, coreneuron::Memb_list *, int)
5.1 4836565 4000 1209.1 1312.0 863 1920 203.9 coreneuron::net_buf_receive_Exp2Syn_340(coreneuron::NrnThread *)
4.5 4252522 2000 2126.3 2336.0 1567 2656 351.1 coreneuron::nrncore2nrn_send_values_295(coreneuron::NrnThread *)
4.2 3973279 2000 1986.6 2177.0 1471 2848 329.6 coreneuron::NetCvode::check_thresh_536(coreneuron::NrnThread *)
3.7 3472390 2000 1736.2 1920.0 1279 2176 285.9 coreneuron::nrn_rhs_83(coreneuron::NrnThread *)
3.5 3284434 2000 1642.2 1792.0 1215 1888 270.8 coreneuron::nrn_lhs_160(coreneuron::NrnThread *)
3.5 3261122 2000 1630.6 1792.0 1184 2016 269.8 coreneuron::nrn_jacob_capacitance_74(coren
...
Quickly skimming through the log, I don't see anything obvious 😕. I will be off for some time; we can revisit this later.
@pramodk Thank you for checking! I confirmed that I still get the same error with the build settings above. Just for my reference, what NVIDIA platform are you running on?
This is my local development machine with:
I didn't test on our HPC cluster with Volta V100 because we don't have NVHPC 24.9 there yet. We will figure this out; I need to look a bit into the details.
Ok, thank you. I wonder if the ARM architecture of Grace could somehow be the cause.
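A quick way to compare the two platforms, as a sketch (standard tools, nothing NEURON-specific):
$ uname -m                                            # aarch64 on Grace, x86_64 on the Ubuntu box
$ nvidia-smi --query-gpu=name --format=csv,noheader   # reports the GPU model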
@pramodk Would it be helpful if I added you to the allocation on TACC Vista so you can take a look?
@pramodk: It would be helpful to have access. My account name on the TACC portal is (I am off this week and hence will be a bit late with the response).
@pramodk Thank you so much for responding during your time off. We have added you to our TACC allocation. SSH access to Vista from outside is not allowed yet, so you first have to connect to frontera.tacc.utexas.edu and from there to vista.tacc.utexas.edu. Thanks a lot for your help! Just FYI, my module environment on Vista is:
The Grace-Hopper queue is called gh, and the Grace-Grace queue is called gg.
The Vista user guide is here: https://docs.tacc.utexas.edu/hpc/vista/
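A sketch of the two-hop login described above (the username is a placeholder):
$ ssh <your_tacc_username>@frontera.tacc.utexas.edu
$ ssh vista.tacc.utexas.edu
From there, jobs go to the gh (Grace Hopper) or gg (Grace-Grace) queues; see the user guide above for the exact submission commands.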
Context
Overview of the issue
Hello, I am trying to get CoreNEURON to run on Grace Hopper nodes on TACC Vista. I have compiled NEURON from the master branch with NVHPC 24.9. Unfortunately, the following error occurs (detailed log below):
Expected result/behavior
Successful invocation of psolve.
NEURON setup
Minimal working example - MWE
MWE that can be used for reproducing the issue and testing. A couple of examples:
Logs