accuracy issue #190

Open
saifullah27 opened this issue Aug 9, 2019 · 17 comments

Comments

@saifullah27

Hi, I trained a model and the accuracy was 80% on the test dataset, then I saved the model.
When I load the saved model and measure accuracy again, I get 1%. Is this an issue with the random seed, or something else? Looking forward to your response. Thanks.

@jstypka
Collaborator

jstypka commented Aug 9, 2019

@saifullah27 you probably did not save and load the word2vec model and the scaler as well. They're essential for getting reasonable performance.

@saifullah27
Author

Can you please take a look at my code? What am I doing wrong?
model = Magpie()
model.train_word2vec(DATA_DIR, vec_dim=100)
model.fit_scaler(DATA_DIR)
model.init_word_vectors(DATA_DIR, vec_dim=100)
model.train(DATA_DIR, labels, nn_model='cnn', test_ratio=0.2, epochs=30)

model.save_word2vec_model('../models/embeddings/mymodel', overwrite=True)
model.save_scaler('../models/scaler/mymodel', overwrite=True)
model.save_model('../models/model/mymodel.h5')

# load already trained model
model = Magpie(
    keras_model='../models/model/mymodel.h5',
    word2vec_model='../models/embeddings/mymodel',
    scaler='../models/scaler/mymodel',
    labels = labels)

@jstypka
Collaborator

jstypka commented Aug 10, 2019

Looks correct 👍 You don't need to run train_word2vec() and fit_scaler() if you run init_word_vectors(). The latter is just a substitute for the first two. But the code is correct, so it should work. Maybe you're testing on different data than you trained on?

@saifullah27
Author

I extracted some rows from my training set as an unseen dataset.
Step 1: I trained the model on the training set.
Step 2: I saved the model.
Step 3: I loaded the saved model and tested it on the unseen dataset, and achieved good accuracy.

Now I want to run the code again:
so I removed the training code,
loaded the trained model again from the directory,
ran it on the same unseen dataset, and got lower accuracy.

@jstypka
Collaborator

jstypka commented Aug 10, 2019

How do you test the model, i.e. how do you run it on the unseen data?

@saifullah27
Author

First I wrote this code:

model = Magpie()
#model.train_word2vec(DATA_DIR, vec_dim=100)
#model.fit_scaler(DATA_DIR)
model.init_word_vectors(DATA_DIR, vec_dim=100)
model.train(DATA_DIR, labels, nn_model='cnn', test_ratio=.3, epochs=30)

model.save_word2vec_model('../models/embeddings/mymodel', overwrite=True)
model.save_scaler('../models/scaler/mymodel', overwrite=True)
model.save_model('../models/model/mymodel.h5')
# load already trained model
model = Magpie(
    keras_model='../models/model/mymodel.h5',
    word2vec_model='../models/embeddings/mymodel',
    scaler='../models/scaler/mymodel',
    labels = labels)

predictions = model.predict_from_file(unseendata)
I got 86% accuracy.

Now I did this:
# load already trained model
model = Magpie(
    keras_model='../models/model/mymodel.h5',
    word2vec_model='../models/embeddings/mymodel',
    scaler='../models/scaler/mymodel',
    labels = labels)

predictions = model.predict_from_file(unseendata)  # same dataset as above
I got around 1-5% accuracy.

@jstypka
Collaborator

jstypka commented Aug 11, 2019

That looks very weird; Magpie should give you the same answer. What happens if you run model.predict_from_file(unseendata) twice without loading the files back again? Does it fail as well?

@saifullah27
Author

Yes, it fails.

@jstypka
Collaborator

jstypka commented Aug 17, 2019

That doesn't seem possible. How do you compute accuracy? predict_from_file just runs a prediction for one data sample.
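
Since predict_from_file handles one sample at a time, any accuracy figure has to come from a loop over the test files plus a comparison against the true labels. A minimal sketch of one way that comparison could look, assuming each prediction is a ranked list of (label, score) pairs like the outputs quoted elsewhere in this thread. `top1_accuracy` and the toy data are hypothetical and not part of Magpie:

```python
def top1_accuracy(predictions, gold):
    """Fraction of documents whose top-ranked label is a true label.

    predictions: one ranked list of (label, score) pairs per document,
                 highest score first.
    gold:        one set of true labels per document, in the same order.
    """
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equally long")
    hits = sum(
        1 for ranked, truth in zip(predictions, gold)
        if ranked and ranked[0][0] in truth
    )
    return hits / len(gold)


# Toy data, invented for illustration only.
predictions = [
    [("Astrophysics", 0.91), ("Theory-HEP", 0.40)],
    [("Theory-HEP", 0.75), ("Astrophysics", 0.10)],
]
gold = [{"Astrophysics"}, {"Experiment-HEP"}]
print(top1_accuracy(predictions, gold))  # prints 0.5
```

Posting the actual accuracy code would make it easier to tell whether the drop comes from the model or from the evaluation step.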

@saifullah27
Author

I think this is an issue with the weights: the model doesn't save its weights. I have read about this issue in several places.

@hzsolutions

I can also confirm that there is a problem with the saved model.
The prediction for "Black holes are cool!" right after the model is first trained and saved is

[('Theory-HEP', 0.48431695), ('Gravitation and Cosmology', 0.36985886), ('Phenomenology-HEP', 0.33781576), ('Astrophysics', 0.2659044), ('Experiment-HEP', 1.513857e-06)]

After loading the saved model, it returns a different result every time it is reloaded and run.
Here are the results I am getting:

[('Phenomenology-HEP', 0.48431695), ('Astrophysics', 0.36985886), ('Theory-HEP', 0.33781576), ('Experiment-HEP', 0.2659044), ('Gravitation and Cosmology', 1.513857e-06)]

[('Experiment-HEP', 0.48431695), ('Theory-HEP', 0.36985886), ('Phenomenology-HEP', 0.33781576), ('Astrophysics', 0.2659044), ('Gravitation and Cosmology', 1.513857e-06)]

[('Astrophysics', 0.6604629), ('Gravitation and Cosmology', 0.464918), ('Experiment-HEP', 0.22854687), ('Phenomenology-HEP', 0.20457356), ('Theory-HEP', 9.436367e-07)]

Any idea why this is happening?
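
One detail in these outputs may be worth noting: the first three lists carry exactly the same scores, and only the labels attached to them move. A small sketch of how that pattern can arise, assuming (not verified against Magpie's source) that the network's output vector is matched to the `labels` list purely by position. `pair_with_labels` and the two label orders are illustrative:

```python
# Scores copied from the first output above.
scores = [0.48431695, 0.36985886, 0.33781576, 0.2659044, 1.513857e-06]


def pair_with_labels(scores, labels):
    """Attach labels to scores by position, then sort by score, highest first."""
    return sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical label orders: one at training time, one after a reload.
labels_at_training = ['Theory-HEP', 'Gravitation and Cosmology',
                      'Phenomenology-HEP', 'Astrophysics', 'Experiment-HEP']
labels_after_reload = ['Phenomenology-HEP', 'Astrophysics',
                       'Theory-HEP', 'Experiment-HEP', 'Gravitation and Cosmology']

print(pair_with_labels(scores, labels_at_training))
print(pair_with_labels(scores, labels_after_reload))
```

With these two orders the sketch reproduces the first and second lists above. The fourth list has different scores, so label order alone would not explain that one.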

@saifullah27
Author

You have not saved the model weights after training. You need to save the model weights, load them, and compile the model based on the saved weights.

@jstypka
Collaborator

jstypka commented Aug 27, 2019

Are you getting different answers even without saving & reloading? Just train it, keep it in memory and query it 10 times. I'd be curious whether it gives deterministic responses.
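
A rough sketch of that check, written against a generic `predict` callable so it does not depend on any particular Magpie method. `is_deterministic` and the two stand-in predictors are hypothetical:

```python
import itertools


def is_deterministic(predict, sample, runs=10):
    """Call predict(sample) `runs` times and report whether all results match."""
    first = predict(sample)
    return all(predict(sample) == first for _ in range(runs - 1))


# Stand-ins for a real model, invented for illustration.
def stable_predict(text):
    return [("Astrophysics", 0.9), ("Theory-HEP", 0.1)]


_counter = itertools.count()


def drifting_predict(text):
    # Returns a slightly lower score on every call.
    return [("Astrophysics", 0.9 - 0.01 * next(_counter))]


print(is_deterministic(stable_predict, "Black holes are cool!"))    # True
print(is_deterministic(drifting_predict, "Black holes are cool!"))  # False
```

The comparison here is exact. With a real model it may be more useful to compare the label ranking only, or to allow a small tolerance on the scores.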

@saifullah27
Author

No, if I don't save/load the model, it gives almost the same answer. There are only slight changes in the accuracy rate.

@saifullah27
Author

@vnwind please mail me directly at [email protected]

@tapiatellez

Hello, I trained the model and it showed a top_k_accuracy of around 80%. Then I predicted on my test set, computed the mean average precision score, and got 16%. Do you think this is because the two scores measure different things?
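
A toy comparison may help here. Top-k accuracy only asks whether a correct label shows up somewhere in the first k predictions, while average precision also penalizes correct labels that are ranked low, so the two numbers can sit far apart on the same predictions. The sketch below uses simplified definitions of both (they are assumptions and may not match the exact metrics Keras or Magpie compute), and the rankings are invented:

```python
def top_k_hit(ranked, truth, k):
    """True if at least one true label is among the first k ranked labels."""
    return any(label in truth for label in ranked[:k])


def average_precision(ranked, truth):
    """Mean of precision-at-rank over the ranks where a true label appears."""
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked, start=1):
        if label in truth:
            hits += 1
            total += hits / rank
    return total / len(truth) if truth else 0.0


# Same ranking for three toy documents; the true label sits at rank 1, 5 and 4.
ranked = ["A", "B", "C", "D", "E"]
truths = [{"A"}, {"E"}, {"D"}]

top5 = sum(top_k_hit(ranked, t, 5) for t in truths) / len(truths)
mean_ap = sum(average_precision(ranked, t) for t in truths) / len(truths)
print(top5, round(mean_ap, 3))  # 1.0 0.483
```

A gap between the two is therefore expected to some degree. A gap as large as 80% versus 16% could also point to the label-order problem discussed in this thread.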

@saihow1999

@saifullah27 @jstypka @hzsolutions [Fix Found]
I was facing the same issue. What fixed it for me: after saving the word2vec model, the scaler and the .h5 model, I loaded them and passed the labels in the same order as I had passed them during training. It then gave me the same results. For example,
let's say you train your model using
magpie.train('data/hep-categories', labels, test_ratio=0.2, epochs=30)
where labels = ['cow','dog','cat']
Then make sure the labels variable has the same order at inference time.
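
One way to make the order hard to get wrong is to write the label list to disk next to the other saved files and read it back before constructing the model. A minimal sketch using only the standard library; `save_labels`, `load_labels` and the file name `mymodel.labels.json` are hypothetical, not part of Magpie:

```python
import json
import os
import tempfile


def save_labels(labels, path):
    """Write the label list to a JSON file, preserving its order."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(list(labels), f)


def load_labels(path):
    """Read the label list back in the order it was saved."""
    with open(path, encoding='utf-8') as f:
        return json.load(f)


labels = ['cow', 'dog', 'cat']

# A temporary directory stands in for the real model directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'mymodel.labels.json')
    save_labels(labels, path)
    restored = load_labels(path)

print(restored == labels)  # True
```

If the labels are built from a set or a directory listing, their order is not guaranteed to be the same between runs, so sorting them once before training and saving that sorted list is a safer starting point.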
