D2Go Does Not Work as Expected When Using Detectron2 >= v0.5 #524

Open
reganh98 opened this issue Apr 5, 2023 · 0 comments
Labels
bug Something isn't working


reganh98 commented Apr 5, 2023

Instructions To Reproduce the 🐛 Bug:

When Detectron2 >= v0.5 is used, D2Go does not work as expected when trained on the balloon dataset from the beginner tutorial:

  • The total_loss value does not decrease and stays at around 1.4.
  • The average precision is poor.
  • The visualised results detect both balloon and non-balloon objects, all with a similar confidence of only about 50%.

However, when Detectron2 <= v0.4 is used, D2Go works as expected when trained on the same balloon dataset:

  • The total_loss value decreases.
  • The average precision is good.
  • The visualised results detect balloon objects with around 90% confidence.
  1. Full runnable code or full changes you made:
    Original code is based on: https://github.com/TannerGilbert/Object-Detection-and-Image-Segmentation-with-Detectron2/blob/592960ddc4243ff34af89a38124452a75309aa1c/D2Go/D2GO_Introduction.ipynb

    The original code always installs the latest version of Detectron2, so it has been modified to pin a specific commit:

    Older commit, which works well (based on facebookresearch/detectron2@7ce4d12): https://colab.research.google.com/gist/reganh98/5923626a2aa52cd4f1d4f9d05d368d8f/working-detectron2-0-4-5-d2go-introduction.ipynb

    Newer commit, which does not work (based on facebookresearch/detectron2@3755562): https://colab.research.google.com/gist/reganh98/739ba188b0dbca32cfe3828bc81375a4/broken-detectron2-0-4-5-d2go-introduction.ipynb

    Note:

  • The older-commit and newer-commit notebooks contain the same code; only the Detectron2 commit differs.
  • The older commit is the parent of the newer commit.
  • Changes between Detectron2 v0.4 and v0.5: facebookresearch/detectron2@v0.4...v0.5
  • With the latest versions of detectron2, mobile-vision, and d2go, training also does not work and behaves like the newer commit.
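
As a quick sanity check before running either notebook, the version boundaries reported above can be encoded in a small helper. This is only a sketch: `parse_version`, `reported_status`, and `installed_detectron2_status` are hypothetical names introduced here, and the thresholds simply restate this report (>= v0.5 broken, <= v0.4 working). A version string may not distinguish the two commits linked above, so anything between v0.4 and v0.5 is flagged as commit-dependent.

```python
from importlib import metadata


def parse_version(version):
    """Turn '0.4.1' or 'v0.5' into a tuple of ints, ignoring a local suffix such as '+cu113'."""
    core = version.lstrip("v").split("+")[0]
    return tuple(int(part) for part in core.split(".") if part.isdigit())


def reported_status(version):
    """Classify a Detectron2 version according to the behaviour described in this issue."""
    v = parse_version(version)
    if v >= (0, 5):
        return "reported broken"
    if v <= (0, 4):
        return "reported working"
    # Between the two releases the outcome depends on the exact commit.
    return "between v0.4 and v0.5: depends on the commit"


def installed_detectron2_status():
    """Look up the installed detectron2 distribution, if any, and classify it."""
    try:
        return reported_status(metadata.version("detectron2"))
    except metadata.PackageNotFoundError:
        return "detectron2 is not installed"


if __name__ == "__main__":
    print(installed_detectron2_status())
```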
  2. What exact command you run:
    Run the older-commit and newer-commit notebooks. Then observe and compare the results.

  3. Full logs or other relevant observations:
    When using the newer commit:
    The total_loss value has not decreased after 600 iterations:

[04/05 01:10:09 d2.utils.events]:  eta: 0:03:09  iter: 19  total_loss: 1.521  loss_cls: 0.7575  loss_box_reg: 0.615  loss_rpn_cls: 0.08469  loss_rpn_loc: 0.004889  time: 0.2966  data_time: 0.0769  lr: 2.4208e-07  max_mem: 432M
[04/05 01:10:14 d2.utils.events]:  eta: 0:02:01  iter: 39  total_loss: 1.44  loss_cls: 0.7469  loss_box_reg: 0.5574  loss_rpn_cls: 0.1014  loss_rpn_loc: 0.008108  time: 0.2514  data_time: 0.0358  lr: 2.3375e-07  max_mem: 432M
[04/05 01:10:18 d2.utils.events]:  eta: 0:01:55  iter: 59  total_loss: 1.386  loss_cls: 0.748  loss_box_reg: 0.5551  loss_rpn_cls: 0.08697  loss_rpn_loc: 0.005132  time: 0.2371  data_time: 0.0438  lr: 2.2542e-07  max_mem: 432M
[04/05 01:10:25 d2.utils.events]:  eta: 0:01:55  iter: 79  total_loss: 1.425  loss_cls: 0.7425  loss_box_reg: 0.556  loss_rpn_cls: 0.09538  loss_rpn_loc: 0.005735  time: 0.2626  data_time: 0.0631  lr: 2.1708e-07  max_mem: 432M
[04/05 01:10:29 d2.utils.events]:  eta: 0:01:47  iter: 99  total_loss: 1.476  loss_cls: 0.748  loss_box_reg: 0.6392  loss_rpn_cls: 0.07331  loss_rpn_loc: 0.007898  time: 0.2493  data_time: 0.0360  lr: 2.0875e-07  max_mem: 432M
[04/05 01:10:33 d2.utils.events]:  eta: 0:01:41  iter: 119  total_loss: 1.396  loss_cls: 0.7446  loss_box_reg: 0.5497  loss_rpn_cls: 0.09379  loss_rpn_loc: 0.004724  time: 0.2422  data_time: 0.0462  lr: 2.0042e-07  max_mem: 432M
[04/05 01:10:40 d2.utils.events]:  eta: 0:01:39  iter: 139  total_loss: 1.404  loss_cls: 0.7455  loss_box_reg: 0.5565  loss_rpn_cls: 0.07553  loss_rpn_loc: 0.003958  time: 0.2582  data_time: 0.0789  lr: 1.9208e-07  max_mem: 432M
[04/05 01:10:44 d2.utils.events]:  eta: 0:01:34  iter: 159  total_loss: 1.424  loss_cls: 0.7416  loss_box_reg: 0.5527  loss_rpn_cls: 0.0873  loss_rpn_loc: 0.005553  time: 0.2514  data_time: 0.0349  lr: 1.8375e-07  max_mem: 432M
[04/05 01:10:48 d2.utils.events]:  eta: 0:01:29  iter: 179  total_loss: 1.514  loss_cls: 0.7453  loss_box_reg: 0.6301  loss_rpn_cls: 0.08914  loss_rpn_loc: 0.006679  time: 0.2457  data_time: 0.0318  lr: 1.7542e-07  max_mem: 432M
[04/05 01:10:53 d2.utils.events]:  eta: 0:01:25  iter: 199  total_loss: 1.373  loss_cls: 0.7465  loss_box_reg: 0.5454  loss_rpn_cls: 0.07865  loss_rpn_loc: 0.005857  time: 0.2483  data_time: 0.0284  lr: 1.6708e-07  max_mem: 432M
[04/05 01:10:59 d2.utils.events]:  eta: 0:01:21  iter: 219  total_loss: 1.442  loss_cls: 0.7388  loss_box_reg: 0.5893  loss_rpn_cls: 0.09883  loss_rpn_loc: 0.00505  time: 0.2505  data_time: 0.0364  lr: 1.5875e-07  max_mem: 432M
[04/05 01:11:03 d2.utils.events]:  eta: 0:01:16  iter: 239  total_loss: 1.447  loss_cls: 0.7426  loss_box_reg: 0.5966  loss_rpn_cls: 0.09175  loss_rpn_loc: 0.005687  time: 0.2458  data_time: 0.0278  lr: 1.5042e-07  max_mem: 432M
[04/05 01:11:08 d2.utils.events]:  eta: 0:01:12  iter: 259  total_loss: 1.475  loss_cls: 0.737  loss_box_reg: 0.6313  loss_rpn_cls: 0.09042  loss_rpn_loc: 0.006798  time: 0.2466  data_time: 0.0451  lr: 1.4208e-07  max_mem: 432M
[04/05 01:11:13 d2.utils.events]:  eta: 0:01:08  iter: 279  total_loss: 1.47  loss_cls: 0.7409  loss_box_reg: 0.596  loss_rpn_cls: 0.07537  loss_rpn_loc: 0.005352  time: 0.2466  data_time: 0.0348  lr: 1.3375e-07  max_mem: 432M
[04/05 01:11:17 d2.utils.events]:  eta: 0:01:03  iter: 299  total_loss: 1.362  loss_cls: 0.7365  loss_box_reg: 0.5082  loss_rpn_cls: 0.08473  loss_rpn_loc: 0.004225  time: 0.2431  data_time: 0.0280  lr: 1.2542e-07  max_mem: 432M
[04/05 01:11:21 d2.utils.events]:  eta: 0:00:58  iter: 319  total_loss: 1.413  loss_cls: 0.7387  loss_box_reg: 0.5664  loss_rpn_cls: 0.08534  loss_rpn_loc: 0.006374  time: 0.2399  data_time: 0.0253  lr: 1.1708e-07  max_mem: 432M
[04/05 01:11:27 d2.utils.events]:  eta: 0:00:55  iter: 339  total_loss: 1.5  loss_cls: 0.735  loss_box_reg: 0.6373  loss_rpn_cls: 0.07618  loss_rpn_loc: 0.005614  time: 0.2444  data_time: 0.0444  lr: 1.0875e-07  max_mem: 432M
[04/05 01:11:31 d2.utils.events]:  eta: 0:00:50  iter: 359  total_loss: 1.384  loss_cls: 0.736  loss_box_reg: 0.5642  loss_rpn_cls: 0.07379  loss_rpn_loc: 0.003249  time: 0.2423  data_time: 0.0387  lr: 1.0042e-07  max_mem: 432M
[04/05 01:11:35 d2.utils.events]:  eta: 0:00:45  iter: 379  total_loss: 1.451  loss_cls: 0.738  loss_box_reg: 0.6443  loss_rpn_cls: 0.08626  loss_rpn_loc: 0.003971  time: 0.2402  data_time: 0.0275  lr: 9.2083e-08  max_mem: 432M
[04/05 01:11:41 d2.utils.events]:  eta: 0:00:42  iter: 399  total_loss: 1.425  loss_cls: 0.7382  loss_box_reg: 0.5347  loss_rpn_cls: 0.1259  loss_rpn_loc: 0.007587  time: 0.2419  data_time: 0.0359  lr: 8.375e-08  max_mem: 432M
[04/05 01:11:45 d2.utils.events]:  eta: 0:00:38  iter: 419  total_loss: 1.552  loss_cls: 0.7345  loss_box_reg: 0.624  loss_rpn_cls: 0.07093  loss_rpn_loc: 0.005918  time: 0.2409  data_time: 0.0287  lr: 7.5417e-08  max_mem: 432M
[04/05 01:11:50 d2.utils.events]:  eta: 0:00:33  iter: 439  total_loss: 1.379  loss_cls: 0.7342  loss_box_reg: 0.553  loss_rpn_cls: 0.07594  loss_rpn_loc: 0.003523  time: 0.2400  data_time: 0.0312  lr: 6.7083e-08  max_mem: 432M
[04/05 01:11:55 d2.utils.events]:  eta: 0:00:29  iter: 459  total_loss: 1.356  loss_cls: 0.7416  loss_box_reg: 0.5068  loss_rpn_cls: 0.06778  loss_rpn_loc: 0.004962  time: 0.2400  data_time: 0.0285  lr: 5.875e-08  max_mem: 432M
[04/05 01:12:00 d2.utils.events]:  eta: 0:00:25  iter: 479  total_loss: 1.45  loss_cls: 0.7354  loss_box_reg: 0.5846  loss_rpn_cls: 0.08065  loss_rpn_loc: 0.004995  time: 0.2410  data_time: 0.0365  lr: 5.0417e-08  max_mem: 432M
[04/05 01:12:04 d2.utils.events]:  eta: 0:00:21  iter: 499  total_loss: 1.45  loss_cls: 0.7344  loss_box_reg: 0.6091  loss_rpn_cls: 0.1082  loss_rpn_loc: 0.007439  time: 0.2396  data_time: 0.0302  lr: 4.2083e-08  max_mem: 432M
[04/05 01:12:08 d2.utils.events]:  eta: 0:00:16  iter: 519  total_loss: 1.43  loss_cls: 0.7305  loss_box_reg: 0.5831  loss_rpn_cls: 0.08499  loss_rpn_loc: 0.00539  time: 0.2381  data_time: 0.0233  lr: 3.375e-08  max_mem: 432M
[04/05 01:12:14 d2.utils.events]:  eta: 0:00:12  iter: 539  total_loss: 1.384  loss_cls: 0.7382  loss_box_reg: 0.5773  loss_rpn_cls: 0.08638  loss_rpn_loc: 0.004809  time: 0.2406  data_time: 0.0390  lr: 2.5417e-08  max_mem: 432M
[04/05 01:12:18 d2.utils.events]:  eta: 0:00:08  iter: 559  total_loss: 1.473  loss_cls: 0.7336  loss_box_reg: 0.646  loss_rpn_cls: 0.0693  loss_rpn_loc: 0.005401  time: 0.2394  data_time: 0.0266  lr: 1.7083e-08  max_mem: 432M
[04/05 01:12:22 d2.utils.events]:  eta: 0:00:04  iter: 579  total_loss: 1.462  loss_cls: 0.7343  loss_box_reg: 0.6151  loss_rpn_cls: 0.07802  loss_rpn_loc: 0.004659  time: 0.2378  data_time: 0.0229  lr: 8.75e-09  max_mem: 432M
[04/05 01:12:28 d2.utils.events]:  eta: 0:00:00  iter: 599  total_loss: 1.511  loss_cls: 0.7326  loss_box_reg: 0.6743  loss_rpn_cls: 0.07882  loss_rpn_loc: 0.006696  time: 0.2393  data_time: 0.0348  lr: 4.1667e-10  max_mem: 432M

The average precision is poor:

[04/05 01:13:06 d2.evaluation.coco_evaluation]: Evaluation results for bbox: 
|  AP   |  AP50  |  AP75  |  APs  |  APm  |  APl  |
|:-----:|:------:|:------:|:-----:|:-----:|:-----:|
| 1.602 | 4.187  | 1.305  | 0.000 | 0.099 | 3.720 |

The visualised results detect both balloon and non-balloon objects with a similar confidence of about 50%:
[3 images]

When using the older commit: see Expected behavior below.

  4. Related issues:

Expected behavior:

The total_loss value decreases as expected:

[04/05 01:30:24 d2.utils.events]:  eta: 0:03:35  iter: 19  total_loss: 1.36  loss_cls: 0.6522  loss_box_reg: 0.5817  loss_rpn_cls: 0.07138  loss_rpn_loc: 0.003773  time: 0.3636  data_time: 0.1102  lr: 3.8077e-06  max_mem: 433M
[04/05 01:30:31 d2.utils.events]:  eta: 0:03:14  iter: 39  total_loss: 1.353  loss_cls: 0.6354  loss_box_reg: 0.637  loss_rpn_cls: 0.0653  loss_rpn_loc: 0.00412  time: 0.3633  data_time: 0.0641  lr: 7.5527e-06  max_mem: 433M
[04/05 01:30:35 d2.utils.events]:  eta: 0:02:45  iter: 59  total_loss: 1.329  loss_cls: 0.6168  loss_box_reg: 0.5782  loss_rpn_cls: 0.08931  loss_rpn_loc: 0.006219  time: 0.3147  data_time: 0.0555  lr: 1.1298e-05  max_mem: 433M
[04/05 01:30:40 d2.utils.events]:  eta: 0:02:15  iter: 79  total_loss: 1.295  loss_cls: 0.5853  loss_box_reg: 0.5275  loss_rpn_cls: 0.07087  loss_rpn_loc: 0.006107  time: 0.2872  data_time: 0.0395  lr: 1.5043e-05  max_mem: 433M
[04/05 01:30:45 d2.utils.events]:  eta: 0:02:11  iter: 99  total_loss: 1.301  loss_cls: 0.551  loss_box_reg: 0.6606  loss_rpn_cls: 0.08501  loss_rpn_loc: 0.005243  time: 0.2854  data_time: 0.0547  lr: 1.8788e-05  max_mem: 433M
[04/05 01:30:50 d2.utils.events]:  eta: 0:02:02  iter: 119  total_loss: 1.222  loss_cls: 0.5153  loss_box_reg: 0.5809  loss_rpn_cls: 0.06901  loss_rpn_loc: 0.004477  time: 0.2773  data_time: 0.0507  lr: 2.2533e-05  max_mem: 433M
[04/05 01:30:54 d2.utils.events]:  eta: 0:01:44  iter: 139  total_loss: 1.124  loss_cls: 0.4636  loss_box_reg: 0.5349  loss_rpn_cls: 0.06427  loss_rpn_loc: 0.002402  time: 0.2660  data_time: 0.0407  lr: 2.6278e-05  max_mem: 433M
[04/05 01:30:59 d2.utils.events]:  eta: 0:01:41  iter: 159  total_loss: 1.055  loss_cls: 0.42  loss_box_reg: 0.5459  loss_rpn_cls: 0.05461  loss_rpn_loc: 0.004683  time: 0.2649  data_time: 0.0447  lr: 3.0023e-05  max_mem: 433M
[04/05 01:31:04 d2.utils.events]:  eta: 0:01:39  iter: 179  total_loss: 1.093  loss_cls: 0.4007  loss_box_reg: 0.6011  loss_rpn_cls: 0.09434  loss_rpn_loc: 0.007772  time: 0.2638  data_time: 0.0249  lr: 3.3768e-05  max_mem: 433M
[04/05 01:31:08 d2.utils.events]:  eta: 0:01:29  iter: 199  total_loss: 0.9464  loss_cls: 0.3487  loss_box_reg: 0.5435  loss_rpn_cls: 0.07013  loss_rpn_loc: 0.004343  time: 0.2564  data_time: 0.0288  lr: 3.7513e-05  max_mem: 433M
[04/05 01:31:12 d2.utils.events]:  eta: 0:01:21  iter: 219  total_loss: 1.041  loss_cls: 0.3414  loss_box_reg: 0.6349  loss_rpn_cls: 0.0804  loss_rpn_loc: 0.005732  time: 0.2501  data_time: 0.0289  lr: 4.1258e-05  max_mem: 433M
[04/05 01:31:18 d2.utils.events]:  eta: 0:01:20  iter: 239  total_loss: 0.9544  loss_cls: 0.2988  loss_box_reg: 0.5186  loss_rpn_cls: 0.06485  loss_rpn_loc: 0.004111  time: 0.2552  data_time: 0.0423  lr: 4.5003e-05  max_mem: 433M
[04/05 01:31:22 d2.utils.events]:  eta: 0:01:14  iter: 259  total_loss: 0.7627  loss_cls: 0.2579  loss_box_reg: 0.4708  loss_rpn_cls: 0.04615  loss_rpn_loc: 0.003904  time: 0.2502  data_time: 0.0290  lr: 4.8748e-05  max_mem: 433M
[04/05 01:31:26 d2.utils.events]:  eta: 0:01:08  iter: 279  total_loss: 0.8192  loss_cls: 0.239  loss_box_reg: 0.5564  loss_rpn_cls: 0.04034  loss_rpn_loc: 0.003721  time: 0.2458  data_time: 0.0206  lr: 5.2493e-05  max_mem: 433M
[04/05 01:31:31 d2.utils.events]:  eta: 0:01:05  iter: 299  total_loss: 0.8055  loss_cls: 0.2214  loss_box_reg: 0.504  loss_rpn_cls: 0.04697  loss_rpn_loc: 0.003925  time: 0.2471  data_time: 0.0513  lr: 5.6238e-05  max_mem: 433M
[04/05 01:31:37 d2.utils.events]:  eta: 0:01:00  iter: 319  total_loss: 0.7221  loss_cls: 0.215  loss_box_reg: 0.496  loss_rpn_cls: 0.04628  loss_rpn_loc: 0.006306  time: 0.2509  data_time: 0.0392  lr: 5.9983e-05  max_mem: 433M
[04/05 01:31:41 d2.utils.events]:  eta: 0:00:55  iter: 339  total_loss: 0.6474  loss_cls: 0.1822  loss_box_reg: 0.4004  loss_rpn_cls: 0.04719  loss_rpn_loc: 0.002194  time: 0.2474  data_time: 0.0263  lr: 6.3728e-05  max_mem: 433M
[04/05 01:31:46 d2.utils.events]:  eta: 0:00:51  iter: 359  total_loss: 0.6308  loss_cls: 0.1845  loss_box_reg: 0.4051  loss_rpn_cls: 0.03855  loss_rpn_loc: 0.004295  time: 0.2463  data_time: 0.0368  lr: 6.7473e-05  max_mem: 433M
[04/05 01:31:51 d2.utils.events]:  eta: 0:00:47  iter: 379  total_loss: 0.5822  loss_cls: 0.1467  loss_box_reg: 0.3891  loss_rpn_cls: 0.03839  loss_rpn_loc: 0.002698  time: 0.2469  data_time: 0.0334  lr: 7.1218e-05  max_mem: 433M
[04/05 01:31:55 d2.utils.events]:  eta: 0:00:42  iter: 399  total_loss: 0.6388  loss_cls: 0.1751  loss_box_reg: 0.3792  loss_rpn_cls: 0.04692  loss_rpn_loc: 0.005512  time: 0.2440  data_time: 0.0276  lr: 7.4963e-05  max_mem: 433M
[04/05 01:31:59 d2.utils.events]:  eta: 0:00:38  iter: 419  total_loss: 0.4451  loss_cls: 0.1312  loss_box_reg: 0.2895  loss_rpn_cls: 0.03037  loss_rpn_loc: 0.003738  time: 0.2417  data_time: 0.0271  lr: 7.8708e-05  max_mem: 433M
[04/05 01:32:05 d2.utils.events]:  eta: 0:00:34  iter: 439  total_loss: 0.4999  loss_cls: 0.1617  loss_box_reg: 0.3061  loss_rpn_cls: 0.04294  loss_rpn_loc: 0.005447  time: 0.2451  data_time: 0.0464  lr: 8.2453e-05  max_mem: 433M
[04/05 01:32:09 d2.utils.events]:  eta: 0:00:29  iter: 459  total_loss: 0.2906  loss_cls: 0.1105  loss_box_reg: 0.154  loss_rpn_cls: 0.0219  loss_rpn_loc: 0.002998  time: 0.2422  data_time: 0.0179  lr: 8.6198e-05  max_mem: 433M
[04/05 01:32:13 d2.utils.events]:  eta: 0:00:25  iter: 479  total_loss: 0.3995  loss_cls: 0.1338  loss_box_reg: 0.2239  loss_rpn_cls: 0.02851  loss_rpn_loc: 0.004489  time: 0.2400  data_time: 0.0253  lr: 8.9943e-05  max_mem: 433M
[04/05 01:32:18 d2.utils.events]:  eta: 0:00:21  iter: 499  total_loss: 0.2791  loss_cls: 0.09935  loss_box_reg: 0.1678  loss_rpn_cls: 0.0238  loss_rpn_loc: 0.001993  time: 0.2405  data_time: 0.0353  lr: 9.3688e-05  max_mem: 433M
[04/05 01:32:23 d2.utils.events]:  eta: 0:00:17  iter: 519  total_loss: 0.3667  loss_cls: 0.1141  loss_box_reg: 0.2005  loss_rpn_cls: 0.01896  loss_rpn_loc: 0.00434  time: 0.2407  data_time: 0.0272  lr: 9.7433e-05  max_mem: 433M
[04/05 01:32:26 d2.utils.events]:  eta: 0:00:12  iter: 539  total_loss: 0.2748  loss_cls: 0.09409  loss_box_reg: 0.1638  loss_rpn_cls: 0.02734  loss_rpn_loc: 0.004298  time: 0.2387  data_time: 0.0281  lr: 0.00010118  max_mem: 433M
[04/05 01:32:30 d2.utils.events]:  eta: 0:00:08  iter: 559  total_loss: 0.2836  loss_cls: 0.09103  loss_box_reg: 0.1756  loss_rpn_cls: 0.0221  loss_rpn_loc: 0.002208  time: 0.2371  data_time: 0.0321  lr: 0.00010492  max_mem: 433M
[04/05 01:32:37 d2.utils.events]:  eta: 0:00:04  iter: 579  total_loss: 0.2813  loss_cls: 0.09612  loss_box_reg: 0.1542  loss_rpn_cls: 0.02588  loss_rpn_loc: 0.003828  time: 0.2398  data_time: 0.0478  lr: 0.00010867  max_mem: 433M
[04/05 01:32:41 d2.utils.events]:  eta: 0:00:00  iter: 599  total_loss: 0.2875  loss_cls: 0.09217  loss_box_reg: 0.1633  loss_rpn_cls: 0.02244  loss_rpn_loc: 0.004157  time: 0.2380  data_time: 0.0194  lr: 0.00011241  max_mem: 433M
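
The two training logs above can also be compared programmatically. The sketch below parses `iter`, `total_loss`, and `lr` from the first and last lines of each run, copied verbatim from the logs. The helper names `parse_line` and `summarize` are hypothetical and the regular expression is an assumption based only on the line format shown here. One thing the comparison surfaces is that the logged `lr` is much smaller and decreasing in the newer-commit run, while it increases in the older-commit run. Whether that is the cause of the stalled loss is not established by this report.

```python
import re

# Assumed line format, taken from the d2.utils.events lines shown above.
LOG_RE = re.compile(
    r"iter:\s*(?P<iter>\d+)\s+total_loss:\s*(?P<total_loss>[\d.eE+-]+)"
    r".*?lr:\s*(?P<lr>[\d.eE+-]+)"
)

NEWER_COMMIT_LOG = """\
[04/05 01:10:09 d2.utils.events]:  eta: 0:03:09  iter: 19  total_loss: 1.521  loss_cls: 0.7575  loss_box_reg: 0.615  loss_rpn_cls: 0.08469  loss_rpn_loc: 0.004889  time: 0.2966  data_time: 0.0769  lr: 2.4208e-07  max_mem: 432M
[04/05 01:12:28 d2.utils.events]:  eta: 0:00:00  iter: 599  total_loss: 1.511  loss_cls: 0.7326  loss_box_reg: 0.6743  loss_rpn_cls: 0.07882  loss_rpn_loc: 0.006696  time: 0.2393  data_time: 0.0348  lr: 4.1667e-10  max_mem: 432M
"""

OLDER_COMMIT_LOG = """\
[04/05 01:30:24 d2.utils.events]:  eta: 0:03:35  iter: 19  total_loss: 1.36  loss_cls: 0.6522  loss_box_reg: 0.5817  loss_rpn_cls: 0.07138  loss_rpn_loc: 0.003773  time: 0.3636  data_time: 0.1102  lr: 3.8077e-06  max_mem: 433M
[04/05 01:32:41 d2.utils.events]:  eta: 0:00:00  iter: 599  total_loss: 0.2875  loss_cls: 0.09217  loss_box_reg: 0.1633  loss_rpn_cls: 0.02244  loss_rpn_loc: 0.004157  time: 0.2380  data_time: 0.0194  lr: 0.00011241  max_mem: 433M
"""


def parse_line(line):
    """Return iter, total_loss and lr from one log line, or None if it does not match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    return {
        "iter": int(m["iter"]),
        "total_loss": float(m["total_loss"]),
        "lr": float(m["lr"]),
    }


def summarize(log_text):
    """Compare the first and last parsed lines of a training log."""
    rows = [r for r in map(parse_line, log_text.splitlines()) if r is not None]
    first, last = rows[0], rows[-1]
    return {
        "loss_change": last["total_loss"] - first["total_loss"],
        "lr_first": first["lr"],
        "lr_last": last["lr"],
    }


if __name__ == "__main__":
    for name, log in [("newer", NEWER_COMMIT_LOG), ("older", OLDER_COMMIT_LOG)]:
        print(name, summarize(log))
```

Pasting the full logs into the two string constants works the same way, since only the first and last matching lines are used.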

The average precision is good:

[04/05 01:33:18 d2.evaluation.coco_evaluation]: Evaluation results for bbox: 
|   AP   |  AP50  |  AP75  |  APs  |  APm  |  APl   |
|:------:|:------:|:------:|:-----:|:-----:|:------:|
| 48.513 | 65.306 | 54.182 | 0.000 | 8.384 | 74.112 |
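
To put the two bbox evaluation tables side by side, a small parser is enough. This is a minimal sketch: `parse_ap_table` is a hypothetical helper, and it assumes the three-row table layout (header, separator, values) printed by the evaluator in the logs above. The table contents are copied from this report.

```python
NEWER_COMMIT_TABLE = """
|  AP   |  AP50  |  AP75  |  APs  |  APm  |  APl  |
|:-----:|:------:|:------:|:-----:|:-----:|:-----:|
| 1.602 | 4.187  | 1.305  | 0.000 | 0.099 | 3.720 |
"""

OLDER_COMMIT_TABLE = """
|   AP   |  AP50  |  AP75  |  APs  |  APm  |  APl   |
|:------:|:------:|:------:|:-----:|:-----:|:------:|
| 48.513 | 65.306 | 54.182 | 0.000 | 8.384 | 74.112 |
"""


def parse_ap_table(text):
    """Parse a header/separator/values table into a {metric: value} dict."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
    ]
    header, values = rows[0], rows[2]  # rows[1] is the |:---:| separator row
    return dict(zip(header, map(float, values)))


if __name__ == "__main__":
    newer = parse_ap_table(NEWER_COMMIT_TABLE)
    older = parse_ap_table(OLDER_COMMIT_TABLE)
    for metric in older:
        # Difference in AP points between the older and the newer commit.
        print(f"{metric}: older={older[metric]} newer={newer[metric]} "
              f"diff={older[metric] - newer[metric]:.3f}")
```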

The visualised results detect balloon objects with around 90% confidence:
[3 images]

@reganh98 reganh98 added the bug Something isn't working label Apr 5, 2023