“CUDA kernel build failed with error code 6” in major-rafactor branch #64

Closed
Gxinhu opened this issue Sep 18, 2024 · 5 comments

Gxinhu commented Sep 18, 2024

Hello there,

I tried to run lid_driven_cavity_2d.py with the Warp backend on the major-rafactor branch, but it fails with the error given in the title. I traced the error to this line:

feq = self.equilibrium.warp_functional(rho, u)

Here is the error log:

Warp 1.3.3 initialized:
CUDA Toolkit 12.5, Driver 12.4
Devices:
"cpu" : "x86_64"
"cuda:0" : "NVIDIA GeForce RTX 4070 Ti" (12 GiB, sm_89, mempool enabled)
Kernel cache:
/home/xhu/.cache/warp/1.3.3
Module xlb.operator.boundary_masker.indices_boundary_masker c934392 load on device 'cuda:0' took 0.20 ms (cached)
Module xlb.operator.equilibrium.quadratic_equilibrium 6b76017 load on device 'cuda:0' took 0.11 ms (cached)
Warp NVRTC compilation error 6: NVRTC_ERROR_COMPILATION (/builds/omniverse/warp/warp/native/warp.cu:2582)
default_program(141): warning #550-D: variable "var_17" was set but never used
bool var_17;
^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

default_program(145): warning #550-D: variable "var_21" was set but never used
bool var_21;
^

default_program(159): warning #550-D: variable "var_35" was set but never used
bool var_35;
^

default_program(163): warning #550-D: variable "var_39" was set but never used
bool var_39;
^

default_program(177): warning #550-D: variable "var_53" was set but never used
bool var_53;
^

default_program(181): warning #550-D: variable "var_57" was set but never used
bool var_57;
^

default_program(195): warning #550-D: variable "var_71" was set but never used
bool var_71;
^

default_program(199): warning #550-D: variable "var_75" was set but never used
bool var_75;
^

default_program(213): warning #550-D: variable "var_89" was set but never used
bool var_89;
^

default_program(217): warning #550-D: variable "var_93" was set but never used
bool var_93;
^

default_program(231): warning #550-D: variable "var_107" was set but never used
bool var_107;
^

default_program(235): warning #550-D: variable "var_111" was set but never used
bool var_111;
^

default_program(249): warning #550-D: variable "var_125" was set but never used
bool var_125;
^

default_program(253): warning #550-D: variable "var_129" was set but never used
bool var_129;
^

default_program(267): warning #550-D: variable "var_143" was set but never used
bool var_143;
^

default_program(271): warning #550-D: variable "var_147" was set but never used
bool var_147;
^

default_program(285): warning #550-D: variable "var_161" was set but never used
bool var_161;
^

default_program(289): warning #550-D: variable "var_165" was set but never used
bool var_165;
^

default_program(3098): warning #177-D: variable "adj_13" was declared but never referenced
bool adj_13 = {};
^

default_program(3111): warning #177-D: variable "adj_26" was declared but never referenced
bool adj_26 = {};
^

default_program(3120): warning #177-D: variable "adj_35" was declared but never referenced
bool adj_35 = {};
^

default_program(3132): warning #177-D: variable "adj_47" was declared but never referenced
bool adj_47 = {};
^

default_program(3149): warning #177-D: variable "adj_64" was declared but never referenced
bool adj_64 = {};
^

default_program(3161): warning #177-D: variable "adj_76" was declared but never referenced
bool adj_76 = {};
^

default_program(3170): warning #177-D: variable "adj_85" was declared but never referenced
bool adj_85 = {};
^

default_program(3182): warning #177-D: variable "adj_97" was declared but never referenced
bool adj_97 = {};
^

default_program(3199): warning #177-D: variable "adj_114" was declared but never referenced
bool adj_114 = {};
^

default_program(3211): warning #177-D: variable "adj_126" was declared but never referenced
bool adj_126 = {};
^

default_program(3220): warning #177-D: variable "adj_135" was declared but never referenced
bool adj_135 = {};
^

default_program(3232): warning #177-D: variable "adj_147" was declared but never referenced
bool adj_147 = {};
^

default_program(3249): warning #177-D: variable "adj_164" was declared but never referenced
bool adj_164 = {};
^

default_program(3261): warning #177-D: variable "adj_176" was declared but never referenced
bool adj_176 = {};
^

default_program(3270): warning #177-D: variable "adj_185" was declared but never referenced
bool adj_185 = {};
^

default_program(3282): warning #177-D: variable "adj_197" was declared but never referenced
bool adj_197 = {};
^

default_program(3299): warning #177-D: variable "adj_214" was declared but never referenced
bool adj_214 = {};
^

default_program(3311): warning #177-D: variable "adj_226" was declared but never referenced
bool adj_226 = {};
^

default_program(3320): warning #177-D: variable "adj_235" was declared but never referenced
bool adj_235 = {};
^

default_program(3332): warning #177-D: variable "adj_247" was declared but never referenced
bool adj_247 = {};
^

default_program(3349): warning #177-D: variable "adj_264" was declared but never referenced
bool adj_264 = {};
^

default_program(3361): warning #177-D: variable "adj_276" was declared but never referenced
bool adj_276 = {};
^

default_program(3370): warning #177-D: variable "adj_285" was declared but never referenced
bool adj_285 = {};
^

default_program(3382): warning #177-D: variable "adj_297" was declared but never referenced
bool adj_297 = {};
^

default_program(3399): warning #177-D: variable "adj_314" was declared but never referenced
bool adj_314 = {};
^

default_program(3411): warning #177-D: variable "adj_326" was declared but never referenced
bool adj_326 = {};
^

default_program(3420): warning #177-D: variable "adj_335" was declared but never referenced
bool adj_335 = {};
^

default_program(3432): warning #177-D: variable "adj_347" was declared but never referenced
bool adj_347 = {};
^

default_program(3449): warning #177-D: variable "adj_364" was declared but never referenced
bool adj_364 = {};
^

default_program(3461): warning #177-D: variable "adj_376" was declared but never referenced
bool adj_376 = {};
^

default_program(3470): warning #177-D: variable "adj_385" was declared but never referenced
bool adj_385 = {};
^

default_program(3482): warning #177-D: variable "adj_397" was declared but never referenced
bool adj_397 = {};
^

default_program(3499): warning #177-D: variable "adj_414" was declared but never referenced
bool adj_414 = {};
^

default_program(3511): warning #177-D: variable "adj_426" was declared but never referenced
bool adj_426 = {};
^

default_program(3520): warning #177-D: variable "adj_435" was declared but never referenced
bool adj_435 = {};
^

default_program(3532): warning #177-D: variable "adj_447" was declared but never referenced
bool adj_447 = {};
^

default_program(5439): warning #177-D: variable "adj_5" was declared but never referenced
bool adj_5 = {};
^

default_program(5447): warning #177-D: variable "adj_13" was declared but never referenced
bool adj_13 = {};
^

default_program(5448): warning #177-D: variable "adj_14" was declared but never referenced
bool adj_14 = {};
^

default_program(5457): warning #177-D: variable "adj_23" was declared but never referenced
bool adj_23 = {};
^

default_program(5463): warning #177-D: variable "adj_29" was declared but never referenced
bool adj_29 = {};
^

default_program(5464): warning #177-D: variable "adj_30" was declared but never referenced
bool adj_30 = {};
^

default_program(5473): warning #177-D: variable "adj_39" was declared but never referenced
bool adj_39 = {};
^

default_program(5479): warning #177-D: variable "adj_45" was declared but never referenced
bool adj_45 = {};
^

default_program(5480): warning #177-D: variable "adj_46" was declared but never referenced
bool adj_46 = {};
^

default_program(5489): warning #177-D: variable "adj_55" was declared but never referenced
bool adj_55 = {};
^

default_program(5495): warning #177-D: variable "adj_61" was declared but never referenced
bool adj_61 = {};
^

default_program(5496): warning #177-D: variable "adj_62" was declared but never referenced
bool adj_62 = {};
^

default_program(5505): warning #177-D: variable "adj_71" was declared but never referenced
bool adj_71 = {};
^

default_program(5511): warning #177-D: variable "adj_77" was declared but never referenced
bool adj_77 = {};
^

default_program(5512): warning #177-D: variable "adj_78" was declared but never referenced
bool adj_78 = {};
^

default_program(5521): warning #177-D: variable "adj_87" was declared but never referenced
bool adj_87 = {};
^

default_program(5527): warning #177-D: variable "adj_93" was declared but never referenced
bool adj_93 = {};
^

default_program(5528): warning #177-D: variable "adj_94" was declared but never referenced
bool adj_94 = {};
^

default_program(5537): warning #177-D: variable "adj_103" was declared but never referenced
bool adj_103 = {};
^

default_program(5543): warning #177-D: variable "adj_109" was declared but never referenced
bool adj_109 = {};
^

default_program(5544): warning #177-D: variable "adj_110" was declared but never referenced
bool adj_110 = {};
^

default_program(5553): warning #177-D: variable "adj_119" was declared but never referenced
bool adj_119 = {};
^

default_program(5559): warning #177-D: variable "adj_125" was declared but never referenced
bool adj_125 = {};
^

default_program(5560): warning #177-D: variable "adj_126" was declared but never referenced
bool adj_126 = {};
^

default_program(5569): warning #177-D: variable "adj_135" was declared but never referenced
bool adj_135 = {};
^

default_program(5575): warning #177-D: variable "adj_141" was declared but never referenced
bool adj_141 = {};
^

default_program(5576): warning #177-D: variable "adj_142" was declared but never referenced
bool adj_142 = {};
^

default_program(6211): warning #550-D: variable "var_266" was set but never used
wp::int32 var_266;
^

default_program(6835): warning #550-D: variable "var_266" was set but never used
wp::int32 var_266;
^

default_program(6840): warning #177-D: variable "adj_2" was declared but never referenced
bool adj_2 = {};
^

default_program(6848): warning #177-D: variable "adj_10" was declared but never referenced
bool adj_10 = {};
^

default_program(11383): warning #177-D: variable "adj_6" was declared but never referenced
bool adj_6 = {};
^

default_program(11392): warning #177-D: variable "adj_15" was declared but never referenced
bool adj_15 = {};
^

default_program(11400): warning #177-D: variable "adj_23" was declared but never referenced
bool adj_23 = {};
^

default_program(11408): warning #177-D: variable "adj_31" was declared but never referenced
bool adj_31 = {};
^

default_program(11416): warning #177-D: variable "adj_39" was declared but never referenced
bool adj_39 = {};
^

default_program(11424): warning #177-D: variable "adj_47" was declared but never referenced
bool adj_47 = {};
^

default_program(11432): warning #177-D: variable "adj_55" was declared but never referenced
bool adj_55 = {};
^

default_program(11440): warning #177-D: variable "adj_63" was declared but never referenced
bool adj_63 = {};
^

default_program(11448): warning #177-D: variable "adj_71" was declared but never referenced
bool adj_71 = {};
^

default_program(13333): warning #177-D: variable "adj_12" was declared but never referenced
bool adj_12 = {};
^

default_program(13340): warning #177-D: variable "adj_19" was declared but never referenced
bool adj_19 = {};
^

default_program(13348): warning #177-D: variable "adj_27" was declared but never referenced
bool adj_27 = {};
^

default_program(13355): warning #177-D: variable "adj_34" was declared but never referenced
bool adj_34 = {};
^

default_program(13367): warning #177-D: variable "adj_46" was declared but never referenced
bool adj_46 = {};
^

default_program(13374): warning #177-D: variable "adj_53" was declared but never referenced
bool adj_53 = {};
^

default_program(13382): warning #177-D: variable "adj_61" was declared but never referenced
bool adj_61 = {};
^

default_program(13389): warning #177-D: variable "adj_68" was declared but never referenced
bool adj_68 = {};
^

default_program(13401): warning #177-D: variable "adj_80" was declared but never referenced
bool adj_80 = {};
^

default_program(13408): warning #177-D: variable "adj_87" was declared but never referenced
bool adj_87 = {};
^

default_program(13416): warning #177-D: variable "adj_95" was declared but never referenced
bool adj_95 = {};
^

default_program(13423): warning #177-D: variable "adj_102" was declared but never referenced
bool adj_102 = {};
^

default_program(13435): warning #177-D: variable "adj_114" was declared but never referenced
bool adj_114 = {};
^

default_program(13442): warning #177-D: variable "adj_121" was declared but never referenced
bool adj_121 = {};
^

default_program(13450): warning #177-D: variable "adj_129" was declared but never referenced
bool adj_129 = {};
^

default_program(13457): warning #177-D: variable "adj_136" was declared but never referenced
bool adj_136 = {};
^

default_program(13469): warning #177-D: variable "adj_148" was declared but never referenced
bool adj_148 = {};
^

default_program(13476): warning #177-D: variable "adj_155" was declared but never referenced
bool adj_155 = {};
^

default_program(13484): warning #177-D: variable "adj_163" was declared but never referenced
bool adj_163 = {};
^

default_program(13491): warning #177-D: variable "adj_170" was declared but never referenced
bool adj_170 = {};
^

default_program(13503): warning #177-D: variable "adj_182" was declared but never referenced
bool adj_182 = {};
^

default_program(13510): warning #177-D: variable "adj_189" was declared but never referenced
bool adj_189 = {};
^

default_program(13518): warning #177-D: variable "adj_197" was declared but never referenced
bool adj_197 = {};
^

default_program(13525): warning #177-D: variable "adj_204" was declared but never referenced
bool adj_204 = {};
^

default_program(13537): warning #177-D: variable "adj_216" was declared but never referenced
bool adj_216 = {};
^

default_program(13544): warning #177-D: variable "adj_223" was declared but never referenced
bool adj_223 = {};
^

default_program(13552): warning #177-D: variable "adj_231" was declared but never referenced
bool adj_231 = {};
^

default_program(13559): warning #177-D: variable "adj_238" was declared but never referenced
bool adj_238 = {};
^

default_program(13571): warning #177-D: variable "adj_250" was declared but never referenced
bool adj_250 = {};
^

default_program(13578): warning #177-D: variable "adj_257" was declared but never referenced
bool adj_257 = {};
^

default_program(13586): warning #177-D: variable "adj_265" was declared but never referenced
bool adj_265 = {};
^

default_program(13593): warning #177-D: variable "adj_272" was declared but never referenced
bool adj_272 = {};
^

default_program(13605): warning #177-D: variable "adj_284" was declared but never referenced
bool adj_284 = {};
^

default_program(13612): warning #177-D: variable "adj_291" was declared but never referenced
bool adj_291 = {};
^

default_program(13620): warning #177-D: variable "adj_299" was declared but never referenced
bool adj_299 = {};
^

default_program(13627): warning #177-D: variable "adj_306" was declared but never referenced
bool adj_306 = {};
^

default_program(14655): error: function "QuadraticEquilibrium___construct_warp__locals__functional" has already been defined
static CUDA_CALLABLE wp::vec_t<9,wp::float32> QuadraticEquilibrium___construct_warp__locals__functional(
^

default_program(15753): error: function "adj_QuadraticEquilibrium___construct_warp__locals__functional" has already been defined
static CUDA_CALLABLE void adj_QuadraticEquilibrium___construct_warp__locals__functional(
^

default_program(68): warning #177-D: function "adj_IncompressibleNavierStokesStepper___construct_warp__locals__BoundaryConditionIDStruct" was declared but never referenced
static CUDA_CALLABLE void adj_IncompressibleNavierStokesStepper___construct_warp__locals__BoundaryConditionIDStruct(wp::uint8 const&,
^

2 errors detected in the compilation of "default_program".
Module xlb.operator.stepper.nse_stepper bbb6463 load on device 'cuda:0' took 343.97 ms (error)
Traceback (most recent call last):
File "/home/xhu/LBM/XLB/examples/cfd/lid_driven_cavity_2d.py", line 114, in <module>
simulation.run(num_steps=5000, post_process_interval=1000)
File "/home/xhu/LBM/XLB/examples/cfd/lid_driven_cavity_2d.py", line 71, in run
self.f_1 = self.stepper(self.f_0, self.f_1, self.bc_mask, self.missing_mask, i)
File "/home/xhu/LBM/XLB/xlb/operator/operator.py", line 74, in __call__
raise Exception(f"Error captured for backend with key {key} for operator {self.__class__.__name__}: {error}\n {traceback_str}")
Exception: Error captured for backend with key ('IncompressibleNavierStokesStepper', <ComputeBackend.WARP: 2>, '(self, f_0, f_1, bc_mask, missing_mask, timestep)') for operator IncompressibleNavierStokesStepper: CUDA kernel build failed with error code 6
Traceback (most recent call last):
File "/home/xhu/LBM/XLB/xlb/operator/operator.py", line 64, in __call__
result = backend_method(self, *args, **kwargs)
File "/home/xhu/LBM/XLB/xlb/operator/stepper/nse_stepper.py", line 404, in warp_implementation
wp.launch(
File "/home/xhu/miniconda3/envs/xlb/lib/python3.10/site-packages/warp/context.py", line 4726, in launch
module_exec = kernel.module.load(device)
File "/home/xhu/miniconda3/envs/xlb/lib/python3.10/site-packages/warp/context.py", line 1878, in load
raise (e)
File "/home/xhu/miniconda3/envs/xlb/lib/python3.10/site-packages/warp/context.py", line 1866, in load
warp.build.build_cuda(
File "/home/xhu/miniconda3/envs/xlb/lib/python3.10/site-packages/warp/build.py", line 30, in build_cuda
raise Exception(f"CUDA kernel build failed with error code {err}")
Exception: CUDA kernel build failed with error code 6

@mehdiataei
Contributor

Hi @Gxinhu. The major refactoring branch currently works with an internal custom build of the Warp library that has not been upstreamed yet (the changes will be publicly available soon). I'll update you on this.

Gxinhu commented Sep 18, 2024

Thank you for your response. As a workaround, I replaced the line feq = self.equilibrium.warp_functional(rho, u) with the implementation from QuadraticEquilibrium.functional. This temporary solution appears to be working correctly for me.

@wangguan1995

Could you please show the code in detail? Thanks~

Gxinhu commented Sep 24, 2024

  1. Replaced Specific Lines in nse_stepper.py:

    Both lines containing feq = self.equilibrium.warp_functional(rho, u) have been replaced with the implementation from QuadraticEquilibrium.functional:

    • Replacement Code:
      # Allocate the equilibrium
      feq = _f_vec()
      # Compute the equilibrium
      for l in range(self.velocity_set.q):
          # Compute cu
          cu = self.compute_dtype(0.0)
          for d in range(self.velocity_set.d):
              if _c[d, l] == 1:
                  cu += u[d]
              elif _c[d, l] == -1:
                  cu -= u[d]
          cu *= self.compute_dtype(3.0)
          # Compute usqr
          usqr = self.compute_dtype(1.5) * wp.dot(u, u)
          # Compute feq
          feq[l] = rho * _w[l] * (self.compute_dtype(1.0) + cu * (self.compute_dtype(1.0) + self.compute_dtype(0.5) * cu) - usqr)
  2. Initialization of _w:

    • Initialization Code: Ensure that _w = self.velocity_set.w is initialized, since the inlined code reads the weights from _w.
    • Placement: You can add the initialization with the other local constants at the top of _construct_warp, as in:
      def _construct_warp(self):
          # Set local constants TODO: This is a hack and should be fixed with warp update
          _c = self.velocity_set.c
          _w = self.velocity_set.w
          _f_vec = wp.vec(self.velocity_set.q, dtype=self.compute_dtype)
          _u_vec = wp.vec(self.velocity_set.d, dtype=self.compute_dtype)

@mehdiataei
Contributor

Installing warp from source will also fix the issue.
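
After reinstalling, it can help to confirm which Warp build Python is actually picking up. The sketch below only reads package metadata; the helper name installed_warp_version is hypothetical, and it assumes the distribution is registered under the PyPI name warp-lang. Note that a version string alone may not distinguish a source build from a release.

```python
from importlib import metadata


def installed_warp_version(dist_name="warp-lang"):
    """Return the installed version string for dist_name, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_warp_version())
```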
