Problem with @jit.trace(symbolic=True) in the train.py of the cascade_emd model #5

Open
stoneMo opened this issue Jul 17, 2020 · 1 comment

stoneMo commented Jul 17, 2020

When I run the cascade_emd model, I get the following error. I would appreciate any help. Thank you in advance.

Traceback (most recent call last):
  File "train.py", line 167, in <module>
    run_train()
  File "train.py", line 164, in run_train
    train(args)
  File "train.py", line 156, in train
    worker(0, 1, args)
  File "train.py", line 119, in worker
    train_one_epoch(model, train_loader, opt, max_steps, rank, epoch_id, gpu_num)
  File "train.py", line 58, in train_one_epoch
    losses = propagate()
  File "/home/xinmiao/anaconda3/envs/CRDET/lib/python3.6/site-packages/megengine/jit/__init__.py", line 424, in __call__
    self._compiled_func()
  File "/home/xinmiao/anaconda3/envs/CRDET/lib/python3.6/site-packages/megengine/_internal/mgb.py", line 1208, in __call__
    self._execute()
  File "/home/xinmiao/anaconda3/envs/CRDET/lib/python3.6/site-packages/megengine/_internal/mgb.py", line 1092, in _execute
    return _mgb.AsyncExec__execute(self)
megengine._internal.exc.MegBrainError: MegBrain core throws exception: mgb::AssertionError
assertion `begin >= 0 && end >= begin && end <= size_ax' failed at /home/code/src/core/impl/tensor.cpp:151: mgb::SubTensorSpec mgb::Slice::apply(megdnn::TensorLayout, int) const
extra message: index out of bound: layout={511(1),1(1)}; request begin=None end=2 step=None axis=1

  • bt:/home/xinmiao/anaconda3/envs/CRDET/lib/python3.6/site-packages/megengine/_internal/_mgb.cpython-36m-x86_64-linux-gnu.so{1e36052,1edec06,1fc6782,1fc6fd0}
    | Associated operator: id=160315 name=subtensor(argsort[160305]:o0)[160315] type=mgb::opr::Subtensor
    | input variables:
    | 0: {id:160306, shape:{511,1}, Float32, owner:argsort(MUL[160303])[160305]{ArgsortForward}, name:argsort(MUL[160303])[160305]:o0, slot:0, gpu0:0, d, 8, 1}
    | 1: {id:21, shape:{1}, Int32, owner:2[20]{ImmutableTensor}, name:2[20], slot:0, gpu0:0, s, 2, 2}
    | output variables:
    | 0: {id:160316, shape:{553,2}, Float32, owner:subtensor(argsort[160305]:o0)[160315]{Subtensor}, name:subtensor(argsort[160305]:o0)[160315], slot:0, gpu0:0, d, 8, 8}
    |
    | Unoptimized equivalent of associated operator: id=10623 name=subtensor(argsort[10615]:o0)[10623] type=mgb::opr::Subtensor
    | input variables:
    | 0: {id:10616, shape:{}, Float32, owner:argsort(MUL[10611])[10615]{ArgsortForward}, name:argsort(MUL[10611])[10615]:o0, slot:0, gpu0:0, d, 8, 1}
    | 1: {id:21, shape:{1}, Int32, owner:2[20]{ImmutableTensor}, name:2[20], slot:0, gpu0:0, s, 2, 2}
    | output variables:
    | 0: {id:10624, shape:{}, Float32, owner:subtensor(argsort[10615]:o0)[10623]{Subtensor}, name:subtensor(argsort[10615]:o0)[10623], slot:0, gpu0:0, d, 8, 8}
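
For anyone reading the log above: the failing check can be reproduced on paper from the values in the "extra message" line. The sketch below is plain Python, not MegEngine code. The function name `slice_in_bounds` is hypothetical, and treating `begin=None` as 0 is an assumption about how the slice defaults are resolved. It only mirrors the arithmetic of the quoted assertion `begin >= 0 && end >= begin && end <= size_ax` for the reported layout (511, 1) and a slice ending at 2 along axis 1.

```python
def slice_in_bounds(size_ax, begin=None, end=None):
    """Mirror the bound check quoted in the assertion message.

    Assumption: a missing begin means 0 and a missing end means size_ax.
    Step and negative indices are ignored in this sketch.
    """
    begin = 0 if begin is None else begin
    end = size_ax if end is None else end
    return begin >= 0 and end >= begin and end <= size_ax


# Values copied from the error report.
layout = (511, 1)      # shape of the argsort output (input variable 0)
axis = 1               # axis named in the extra message
requested_end = 2      # "request begin=None end=2"

ok = slice_in_bounds(layout[axis], begin=None, end=requested_end)
print(ok)  # False: end=2 exceeds the axis size of 1

# Hypothetical guard: clamp the slice end to the axis size.
clamped_end = min(requested_end, layout[axis])
print(slice_in_bounds(layout[axis], end=clamped_end))  # True
```

So the slice asks for two columns along axis 1 while the argsort output in this iteration has only one. The log also lists the subtensor output with shape {553,2} while its input has shape {511,1}. Whether clamping is the right fix for the model, and whether it is expressible under `@jit.trace(symbolic=True)`, is not something this sketch can answer.
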
Collaborator

xg-chu commented Aug 3, 2020

Does this error exist in fpn_baseline or emd_simple?
