Problem with @jit.trace(symbolic=True) in the train.py of the cascade_emd model #5

stoneMo · 2020-07-17T09:50:00Z

When I run the cascade_emd model, I get the following error. I would appreciate any help. Thank you in advance.

Traceback (most recent call last):
File "train.py", line 167, in <module>
run_train()
File "train.py", line 164, in run_train
train(args)
File "train.py", line 156, in train
worker(0, 1, args)
File "train.py", line 119, in worker
train_one_epoch(model, train_loader, opt, max_steps, rank, epoch_id, gpu_num)
File "train.py", line 58, in train_one_epoch
losses = propagate()
File "/home/xinmiao/anaconda3/envs/CRDET/lib/python3.6/site-packages/megengine/jit/__init__.py", line 424, in __call__
self._compiled_func()
File "/home/xinmiao/anaconda3/envs/CRDET/lib/python3.6/site-packages/megengine/_internal/mgb.py", line 1208, in __call__
self._execute()
File "/home/xinmiao/anaconda3/envs/CRDET/lib/python3.6/site-packages/megengine/_internal/mgb.py", line 1092, in _execute
return _mgb.AsyncExec__execute(self)
megengine._internal.exc.MegBrainError: MegBrain core throws exception: mgb::AssertionError
assertion `begin >= 0 && end >= begin && end <= size_ax' failed at /home/code/src/core/impl/tensor.cpp:151: mgb::SubTensorSpec mgb::Slice::apply(megdnn::TensorLayout, int) const
extra message: index out of bound: layout={511(1),1(1)}; request begin=None end=2 step=None axis=1

bt:/home/xinmiao/anaconda3/envs/CRDET/lib/python3.6/site-packages/megengine/_internal/_mgb.cpython-36m-x86_64-linux-gnu.so{1e36052,1edec06,1fc6782,1fc6fd0}
| Associated operator: id=160315 name=subtensor(argsort[160305]:o0)[160315] type=mgb::opr::Subtensor
| input variables:
| 0: {id:160306, shape:{511,1}, Float32, owner:argsort(MUL[160303])[160305]{ArgsortForward}, name:argsort(MUL[160303])[160305]:o0, slot:0, gpu0:0, d, 8, 1}
| 1: {id:21, shape:{1}, Int32, owner:2[20]{ImmutableTensor}, name:2[20], slot:0, gpu0:0, s, 2, 2}
| output variables:
| 0: {id:160316, shape:{553,2}, Float32, owner:subtensor(argsort[160305]:o0)[160315]{Subtensor}, name:subtensor(argsort[160305]:o0)[160315], slot:0, gpu0:0, d, 8, 8}
|
| Unoptimized equivalent of associated operator: id=10623 name=subtensor(argsort[10615]:o0)[10623] type=mgb::opr::Subtensor
| input variables:
| 0: {id:10616, shape:{}, Float32, owner:argsort(MUL[10611])[10615]{ArgsortForward}, name:argsort(MUL[10611])[10615]:o0, slot:0, gpu0:0, d, 8, 1}
| 1: {id:21, shape:{1}, Int32, owner:2[20]{ImmutableTensor}, name:2[20], slot:0, gpu0:0, s, 2, 2}
| output variables:
| 0: {id:10624, shape:{}, Float32, owner:subtensor(argsort[10615]:o0)[10623]{Subtensor}, name:subtensor(argsort[10615]:o0)[10623], slot:0, gpu0:0, d, 8, 8}
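
For reference, the condition quoted in the assertion can be checked outside MegEngine. The sketch below is plain Python, not MegEngine code: `check_slice` is a hypothetical helper that mirrors `begin >= 0 && end >= begin && end <= size_ax` from the message, and it assumes that a `begin` of None means 0 and an `end` of None means the axis size. It is applied to the reported layout {511, 1} with end=2 on axis 1.

```python
def check_slice(shape, axis, begin=None, end=None):
    """Hypothetical helper: mirrors the assertion text from the error message.

    Assumption: begin=None is treated as 0 and end=None as the axis size.
    """
    size_ax = shape[axis]
    b = 0 if begin is None else begin
    e = size_ax if end is None else end
    return b >= 0 and e >= b and e <= size_ax


# Values taken from the log: layout {511, 1}, request end=2 on axis 1.
reported = check_slice((511, 1), axis=1, end=2)

# The same request against a tensor whose axis 1 has size 2.
fits = check_slice((511, 2), axis=1, end=2)

print(reported, fits)
```

With the logged values the check fails because axis 1 has size 1 while the slice asks for end=2. The same request passes when axis 1 has size 2.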

xg-chu · 2020-08-03T08:36:18Z

Does this error also occur with fpn_baseline or emd_simple?
