
Playing with Nano GPT

This repo is my attempt to experiment with a few aspects of GPT and to get hands-on experience with my theoretical learnings. I have used Karpathy's nanoGPT for this experimentation. You can find more info about it here: https://github.com/karpathy/nanoGPT

I focused on the following tasks:

Setup and Reproduction

Setting up the basic flow for our experiment and generating some initial results from it.

Prepare

python data/shakespeare_char/prepare.py
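
For context, prepare.py builds a character-level vocabulary from the raw text and writes the binary token files that train.py reads (the real script also saves a meta.pkl with the vocabulary for sampling). A minimal sketch of the idea, assuming the default nanoGPT data layout:

import numpy as np

# Read the raw Shakespeare text downloaded by prepare.py.
with open('data/shakespeare_char/input.txt', 'r', encoding='utf-8') as f:
    text = f.read()

chars = sorted(set(text))                     # character-level vocabulary
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> integer id

ids = np.array([stoi[ch] for ch in text], dtype=np.uint16)
n = int(0.9 * len(ids))                       # 90/10 train/val split
ids[:n].tofile('data/shakespeare_char/train.bin')
ids[n:].tofile('data/shakespeare_char/val.bin')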

Train

python train.py config/train_shakespeare_char.py --device=mps --compile=False

Sample

python sample.py --out_dir=out-shakespeare-char --device=cpu

Example generated by the model

BRUTUS:
For the devil'd the beast torm:
When could I should be saw the pride
That should be thou not be subject.'

MERCUTIO:
For what, comes the way
Methink of my company?

MERCUTIO:
Tranio, Romeo, go, tyrant, and since to speak.

SIRREY:
Then did your hearts first,
For I make more call them again.

BRUTUS:
Come, sir, my lord.

SICINIUS:
Sir, sir.

CORIOLANUS:
Well, let us murderer?

First Servingman:
Take me to have better.

First Citizen:
I can perfort you are thou wert not the good?

CORIOLAN

Hyperparameter Experimentation

Modifying hyperparameters such as the number of heads and layers to find a setting that produces the lowest validation loss.

Use the following command to train nanoGPT on the Shakespeare data using your Mac's on-chip GPU. The hyperparameters are kept low so that the run takes no more than about 10 minutes. Feel free to play around with these hyperparameters.

python train.py config/train_shakespeare_char.py --device=mps --compile=False --eval_iters=20 --log_interval=1 --block_size=64 --batch_size=12 --n_layer=4 --n_head=4 --n_embd=128 --max_iters=3000 --lr_decay_iters=3000 --dropout=0.0
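
The results below come from re-running this command with different --n_layer and --n_head values. A small sketch of how such a sweep could be scripted (the flags mirror the command above; the script itself is illustrative and not part of the repo):

import subprocess

# Re-run train.py for each layer/head setting and let it print its train/val losses.
for layers, heads in [(4, 4), (8, 8), (16, 16), (32, 32)]:
    subprocess.run([
        'python', 'train.py', 'config/train_shakespeare_char.py',
        '--device=mps', '--compile=False',
        '--eval_iters=20', '--log_interval=1',
        '--block_size=64', '--batch_size=12',
        f'--n_layer={layers}', f'--n_head={heads}', '--n_embd=128',
        '--max_iters=3000', '--lr_decay_iters=3000', '--dropout=0.0',
    ], check=True)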

We get the following losses with different numbers of layers and heads:

Layers | Heads | Train Loss | Validation Loss
4      | 4     | 1.6169     | 1.7946
8      | 8     | 1.6393     | 1.7759
16     | 16    | 1.5978     | 1.7658
32     | 32    | 1.5765     | 1.7662

Evaluation Metrics

Specific Metric:

Meant to capture how close the generated data distribution is to the training distribution.

I have used the BLEU score as my specific metric. The BLEU (Bilingual Evaluation Understudy) score is computed by comparing n-grams (contiguous sequences of n words or characters) between the generated text and reference texts. BLEU typically considers n-grams of different lengths, from unigrams (single words) up to higher-order n-grams such as bigrams and trigrams. Using this kind of evaluation, BLEU measures the similarity between the generated text and the reference text (here, the training text). This seems ideal for our use case of measuring how close the generated data distribution is to the training distribution.

I wrote a script, evaluation_bleu.py, which uses the nltk library to compute the BLEU score.
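
A rough sketch of the kind of computation involved, assuming word-level tokens; the actual evaluation_bleu.py may tokenize and choose references differently, and generated_sample.txt is a hypothetical file holding text sampled from the model:

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Reference: the training text; hypothesis: text sampled from the model.
with open('data/shakespeare_char/input.txt', 'r', encoding='utf-8') as f:
    reference = f.read().split()

with open('generated_sample.txt', 'r', encoding='utf-8') as f:  # hypothetical path
    hypothesis = f.read().split()

smooth = SmoothingFunction().method1  # smoothing avoids zero scores for missing n-grams
score = sentence_bleu([reference], hypothesis, smoothing_function=smooth)
print(f'BLEU Score: {score}')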

python evaluation_bleu.py --data_dir=shakespeare_char

Result: BLEU Score: 0.5455284552845528

General Metric:

Meant to capture how well our model performs at text generation in general, regardless of the data it has been trained on.

I have used a simple spell-check function to test whether our model produces words that actually exist in the English language. This makes sense because we are training a character-level GPT, so if our model is able to assemble characters into meaningful words, we are in a good position.

I wrote a script, evaluation_spell_check.py, which uses the nltk words corpus to spell check the words generated by our model.
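
A rough sketch of the idea; the actual evaluation_spell_check.py may clean and tokenize the text differently, and generated_sample.txt is again a hypothetical file with model output:

import re
import nltk

nltk.download('words', quiet=True)  # English word list used as the dictionary
from nltk.corpus import words

english = set(w.lower() for w in words.words())

with open('generated_sample.txt', 'r', encoding='utf-8') as f:  # hypothetical path
    tokens = re.findall(r'[a-z]+', f.read().lower())

misspelled = [t for t in tokens if t not in english]
print(f'Percentage of words not correctly spelled: {100 * len(misspelled) / len(tokens)} %')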

python evaluation_spell_check.py --data_dir=shakespeare_char

Result: Percentage of words not correctly spelled: 8.292682926829269 %

My Favourite Dataset

Here I experiment with training nanoGPT on my favourite dataset: screenplay scripts from the popular TV series Breaking Bad.

I downloaded this dataset from: https://bulletproofscreenwriting.tv/breaking-bad-tv-script-download/

In particular, I use scripts from the following episodes:

  1. Breaking Bad - 101 Pilot (2008)

  2. Breaking Bad - 301 No Mas (2010)

  3. Breaking Bad - 306 Sunset (2010)

  4. Breaking Bad - 307 One Minute (2010)

  5. Breaking Bad - 309 Kafkaesque (2010)

  6. Breaking Bad - 310 Fly (2010)

  7. Breaking Bad - 311 Abiquiu (2010)

  8. Breaking Bad - 312 Half Measures (2010)

Follow these steps to experiment on this dataset:

Prepare

python data/breaking_bad_char/prepare.py --num_scripts=1

Train

python train.py config/train_breaking_bad_char.py --device=mps --compile=False --eval_iters=20 --log_interval=1 --block_size=64 --batch_size=12 --n_layer=16 --n_head=16 --n_embd=128 --max_iters=2000 --lr_decay_iters=2000 --dropout=0.0 --ckpt_file=breaking_bad_ckpt.pt

Sample

python sample.py --out_dir=out-shakespeare-char --ckpt_file=breaking_bad_ckpt.pt --device=cpu

An interesting experiment I perform here is to vary the number of characters in the input and observe how the above evaluation metrics change. You can vary the input character count using the num_scripts flag in the prepare command above.

To produce the evaluation metric results, use the following commands:

BLEU score:

python evaluation_bleu.py --data_dir=breaking_bad_char

Length of Dataset in Characters (1K scale) | BLEU Score
74  | 0.4557877814
149 | 0.4930498774
221 | 0.4955752212
281 | 0.5
353 | 0.5193415638
415 | 0.5375203915
478 | 0.5366795367
553 | 0.4962593516

Spell Check:

python evaluation_spell_check.py --data_dir=breaking_bad_char

Length of Dataset in Characters (1K scale) | Spell Check Score
74  | 0.813
149 | 0.827
221 | 0.832
281 | 0.843
353 | 0.859
415 | 0.868
478 | 0.864
553 | 0.832

Fine-Tuning

Now we fine-tune the model trained on the Shakespeare dataset on our Breaking Bad dataset. We also carry out some evaluation to see how much data is required to go from Shakespearean output to something that resembles our dataset.

Fine-tune training

python train.py config/finetune_breaking_bad.py --device=mps --compile=False
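
For reference, a nanoGPT-style fine-tuning config usually just overrides a handful of training knobs; the values below are purely illustrative and not necessarily what config/finetune_breaking_bad.py actually sets:

# Illustrative values only; see config/finetune_breaking_bad.py for the real settings.
out_dir = 'out-shakespeare-char'   # directory holding the pre-trained Shakespeare checkpoint
dataset = 'breaking_bad_char'      # switch the training data to the Breaking Bad set
init_from = 'resume'               # start from the existing checkpoint instead of from scratch
eval_interval = 50
eval_iters = 20
always_save_checkpoint = False
batch_size = 12
block_size = 64
learning_rate = 3e-5               # smaller learning rate than pre-training
max_iters = 100                    # varied (20 / 50 / 100) in the experiments below
decay_lr = False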

Comparing the pre-training and fine-tuning data distributions

python compare_similarity.py
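
A sketch of the comparison idea: score the same generated sample against both corpora with BLEU (compare_similarity.py itself may be implemented differently; the file paths are assumptions):

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def read_words(path):
    with open(path, 'r', encoding='utf-8') as f:
        return f.read().split()

hypothesis = read_words('generated_sample.txt')                 # hypothetical sampled output
shakespeare = read_words('data/shakespeare_char/input.txt')
breaking_bad = read_words('data/breaking_bad_char/input.txt')   # assumed location

smooth = SmoothingFunction().method1
print('BLEU vs Shakespeare :', sentence_bleu([shakespeare], hypothesis, smoothing_function=smooth))
print('BLEU vs Breaking Bad:', sentence_bleu([breaking_bad], hypothesis, smoothing_function=smooth))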

Results:

Data in Characters (1K scale) | Fine-Tune Training (max_iters) | BLEU of output vs. Shakespeare | BLEU of output vs. Breaking Bad (our dataset)
74  | 20  | 0.45340501 | 0.3324372
74  | 50  | 0.42239858 | 0.3959435
74  | 100 | 0.3683274  | 0.4092526
149 | 20  | 0.4535809  | 0.3545534
149 | 50  | 0.3669391  | 0.3687556
149 | 100 | 0.38263950 | 0.402125
281 | 20  | 0.4276672  | 0.342676
281 | 50  | 0.39730941 | 0.3757847
281 | 100 | 0.3564266  | 0.4020054
553 | 20  | 0.45182    | 0.3509
553 | 50  | 0.412078   | 0.3925399
553 | 100 | 0.38356    | 0.392694
