Model doesn't learn #51

Open
SilvesterYu opened this issue Sep 20, 2022 · 2 comments

Hi! Thank you for your wonderful project! I tried to train the model on the dataset linked in your README, using this command:

python train.py --phi 0 --weights ../Linemod_and_Occlusion-001/COCO/efficientdet-d0.h5 linemod ../Linemod_and_Occlusion-001/Linemod_preprocessed/ --object-id 4

Every evaluation metric came out as 0 or nan:

mAP: 0.0000
ADD: 0.0000
ADD-S: 0.0000
5cm_5degree: 0.0000
TranslationErrorMean_in_mm: nan
TranslationErrorStd_in_mm: nan
RotationErrorMean_in_degree: nan
RotationErrorStd_in_degree: nan
2D-Projection: 0.0000
Summed_Translation_Rotation_Error: nan
ADD(-S): 0.0000
AveragePointDistanceMean_in_mm: nan
AveragePointDistanceStd_in_mm: nan
AverageSymmetricPointDistanceMean_in_mm: nan
AverageSymmetricPointDistanceStd_in_mm: nan
MixedAveragePointDistanceMean_in_mm: nan
MixedAveragePointDistanceStd_in_mm: nan

Epoch 00001: ADD improved from -inf to 0.00000, saving model to checkpoints/20_09_2022_18_04_29/object_4/phi_0_linemod_best_ADD.h5
1790/1790 [==============================] - 1376s 769ms/step - loss: nan - classification_loss: 13689.5068 - regression_loss: nan - transformation_loss: 0.0000e+00

I followed issue #23, and 2D bounding box detection works fine. Am I missing something? I would really appreciate some help.

SilvesterYu commented Sep 21, 2022

Hi, I figured out that this happens because TensorFlow 1.15 requires CUDA 10.0, but I am using a different CUDA version.

I switched to the CPU-only build instead, and the problem is resolved. Training is much slower, though.
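
The check behind this decision can be sketched in a few lines. This is only an illustration, not part of the repository: the function names are hypothetical, it assumes `nvcc` is on the PATH and reports the toolkit that TensorFlow would actually load, and it assumes `tensorflow-gpu==1.15` is the GPU counterpart of the CPU package used in this thread.

```python
import re
import subprocess
from typing import Optional

# Assumption: the prebuilt TensorFlow 1.15 GPU wheels target CUDA 10.0,
# as stated in this thread.
REQUIRED_CUDA = "10.0"


def parse_cuda_release(nvcc_output: str) -> Optional[str]:
    """Extract 'major.minor' from `nvcc --version` text, or None if absent."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def detect_cuda_release() -> Optional[str]:
    """Run `nvcc --version`; return None when nvcc is missing or fails."""
    try:
        result = subprocess.run(
            ["nvcc", "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except OSError:
        return None
    if result.returncode != 0:
        return None
    return parse_cuda_release(result.stdout)


def suggest_tensorflow_package(cuda_release: Optional[str]) -> str:
    """Suggest the GPU build only when the toolkit matches REQUIRED_CUDA."""
    if cuda_release == REQUIRED_CUDA:
        return "tensorflow-gpu==1.15"
    return "tensorflow-cpu==1.15"
```

Calling `suggest_tensorflow_package(detect_cuda_release())` falls back to the CPU build whenever the toolkit is missing or is not 10.0, which matches what I ended up doing by hand.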

Here is what I did:

(1) Switch to the CPU-only build if your CUDA version is not 10.0

pip install tensorflow-cpu==1.15

(2) Resolve version errors by pinning h5py and numpy

pip install 'h5py==2.10.0' --force-reinstall
pip install numpy==1.19.5

(3) Passing pretrained weights with --weights also helps, according to issue #23. The COCO weights can be downloaded from here: https://drive.google.com/drive/folders/1n3opeOo2ko9GwC9G5_OkVF1sk0NVd3YY. The folder includes efficientdet-d0.h5, which I use in the command below.
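
To confirm that the pins from steps (1) and (2) actually took effect in the active environment, a small check like the following may help. It is a sketch with hypothetical helper names; the pinned versions are the ones from this thread, and the `pkg_resources` fallback is there only because TensorFlow 1.15 runs on Python versions that predate `importlib.metadata`.

```python
# Pins taken from steps (1) and (2) in this comment.
PINNED = {"tensorflow-cpu": "1.15", "h5py": "2.10.0", "numpy": "1.19.5"}


def installed_version(package):
    """Return the installed version of `package`, or None if not installed."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for Python 3.7 and older
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package that is missing or off its pin to (wanted, found)."""
    mismatches = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        # "1.15" should accept "1.15.0" but not "1.150".
        matches = found is not None and (
            found == wanted or found.startswith(wanted + ".")
        )
        if not matches:
            mismatches[package] = (wanted, found)
    return mismatches
```

An empty result from `find_mismatches(PINNED)` means all three packages match. The `lookup` parameter exists only so the comparison can be tried without touching the real environment.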

Train

Using object id 4 as an example:

python3 train.py --phi 0 --weights <path_to_weight>/efficientdet-d0.h5 linemod <path_to_dataset>/Linemod_preprocessed/ --object-id 4

If .decode() errors occur in custom_load_weights.py, comment out the line that calls .decode() and uncomment the line without .decode().
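
A likely cause, though it is not confirmed in this thread, is that newer h5py releases return stored strings as str while older ones return bytes, so calling .decode() fails on the newer ones. Instead of toggling lines by hand, a small helper that accepts both types would work. The name `as_text` is hypothetical and is not something custom_load_weights.py provides.

```python
def as_text(value, encoding="utf-8"):
    """Return `value` as str, decoding only when it is a bytes object."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

Wrapping each name read from the weights file in `as_text(...)` would then behave the same with either h5py version.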

@BhavinPrajapti

@SilvesterYu Hello,
I am on Windows 11 and want to run this repository, but TensorFlow 1.15.0 does not support CUDA 11.7, and CUDA 10.2 is not available for Windows 11. Can you please help me with installing the requirements?
