This repo is my attempt to experiment with a few aspects of GPT and to get hands-on experience with my theoretical learnings. I used Karpathy's nanoGPT for this experimentation; you can find more about it here: https://github.com/karpathy/nanoGPT
I focused on the following tasks:
Setting up the basic flow for the experiment and generating some initial results.
Prepare:

```sh
python data/shakespeare_char/prepare.py
```

Train:

```sh
python train.py config/train_shakespeare_char.py --device=mps --compile=False
```

Sample:

```sh
python sample.py --out_dir=out-shakespeare-char --device=cpu
```
Example generated by the model:

```
BRUTUS:
For the devil'd the beast torm:
When could I should be saw the pride
That should be thou not be subject.'
MERCUTIO:
For what, comes the way
Methink of my company?
MERCUTIO:
Tranio, Romeo, go, tyrant, and since to speak.
SIRREY:
Then did your hearts first,
For I make more call them again.
BRUTUS:
Come, sir, my lord.
SICINIUS:
Sir, sir.
CORIOLANUS:
Well, let us murderer?
First Servingman:
Take me to have better.
First Citizen:
I can perfort you are thou wert not the good?
CORIOLAN
```
Modifying hyperparameters such as the number of heads and layers to find a setting that produces the lowest validation loss.
Use the following command to train nanoGPT on the Shakespeare data using your Mac's on-chip GPU. The hyperparameters are set low so that the run takes no more than ~10 minutes; feel free to play around with them.

```sh
python train.py config/train_shakespeare_char.py --device=mps --compile=False --eval_iters=20 --log_interval=1 --block_size=64 --batch_size=12 --n_layer=4 --n_head=4 --n_embd=128 --max_iters=3000 --lr_decay_iters=3000 --dropout=0.0
```
We get the following losses with different numbers of heads and layers:
| Layers | Heads | Train Loss | Validation Loss |
|---|---|---|---|
| 4 | 4 | 1.6169 | 1.7946 |
| 8 | 8 | 1.6393 | 1.7759 |
| 16 | 16 | 1.5978 | 1.7658 |
| 32 | 32 | 1.5765 | 1.7662 |
This metric is meant to capture how close the generated data distribution is to the training distribution.
I used the BLEU score as my specific metric. The BLEU (Bilingual Evaluation Understudy) score is computed by comparing n-grams (contiguous sequences of n words or characters) between the generated text and reference texts. BLEU typically considers n-grams of several lengths, from unigrams (single words) up to higher-order n-grams such as bigrams and trigrams. In this way BLEU measures the similarity between the generated text and the reference text (here, the training text), which makes it a good fit for our goal of comparing how close the generated data distribution is to the training distribution.
I wrote a script `evaluation_bleu.py` which uses the nltk library to compute the BLEU score.

```sh
python evaluation_bleu.py --data_dir=shakespeare_char
```

Result: BLEU Score: 0.5455284552845528
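For reference, here is a minimal sketch of how such a BLEU evaluation can be written with nltk. The file paths, whitespace tokenization, and smoothing choice are assumptions on my part; the actual `evaluation_bleu.py` may differ.

```python
# Minimal BLEU sketch (assumptions: file locations, whitespace tokenization,
# default uniform 1- to 4-gram weights; the real evaluation_bleu.py may differ).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical paths: training text as reference, sampled text as hypothesis.
with open("data/shakespeare_char/input.txt") as f:
    reference = f.read().split()
with open("out-shakespeare-char/samples.txt") as f:  # hypothetical samples file
    hypothesis = f.read().split()

# sentence_bleu expects a list of references, each a list of tokens.
smoothing = SmoothingFunction().method1  # avoids zero scores on missing n-grams
score = sentence_bleu([reference], hypothesis, smoothing_function=smoothing)
print(f"BLEU Score: {score}")
```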
This metric is meant to capture how well our model performs at text generation in general, regardless of the data it has been trained on.
I used a simple spell-check function to test whether our model produces words that actually exist in the English language. This is a sensible check because we are training a character-level GPT: if the model learns to assemble characters into meaningful words, that is already a win.

I wrote a script `evaluation_spell_check.py` which uses the nltk words corpus to spell-check the words generated by our model.

```sh
python evaluation_spell_check.py --data_dir=shakespeare_char
```

Result: Percentage of words not correctly spelled: 8.292682926829269 %
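A rough sketch of this kind of spell check using the nltk words corpus might look as follows; the sample-file path and tokenization are assumptions, and the actual `evaluation_spell_check.py` may differ:

```python
# Spell-check sketch (assumption: generated samples live in samples.txt;
# the real evaluation_spell_check.py may tokenize differently).
import re
import nltk

nltk.download("words", quiet=True)
from nltk.corpus import words

vocabulary = set(w.lower() for w in words.words())

with open("out-shakespeare-char/samples.txt") as f:  # hypothetical path
    tokens = re.findall(r"[a-zA-Z']+", f.read())

misspelled = [t for t in tokens if t.lower().strip("'") not in vocabulary]
pct = 100 * len(misspelled) / len(tokens)
print(f"Percentage of words not correctly spelled: {pct} %")
```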
Here I experiment with training nanoGPT on my favourite dataset: the screenplay scripts from the popular TV series Breaking Bad.

I downloaded this dataset from: https://bulletproofscreenwriting.tv/breaking-bad-tv-script-download/
In particular, I use scripts from the following episodes:
Follow these steps to experiment on this dataset:

Prepare:

```sh
python data/breaking_bad_char/prepare.py --num_scripts=1
```

Train:

```sh
python train.py config/train_breaking_bad_char.py --device=mps --compile=False --eval_iters=20 --log_interval=1 --block_size=64 --batch_size=12 --n_layer=16 --n_head=16 --n_embd=128 --max_iters=2000 --lr_decay_iters=2000 --dropout=0.0 --ckpt_file=breaking_bad_ckpt.pt
```

Sample:

```sh
python sample.py --out_dir=out-shakespeare-char --ckpt_file=breaking_bad_ckpt.pt --device=cpu
```
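For context, a character-level prepare script generally follows the pattern of nanoGPT's `data/shakespeare_char/prepare.py`: build a character vocabulary, encode the text, and write train/val binaries plus a `meta.pkl`. A minimal sketch is below; the `input.txt` path and 90/10 split are assumptions, and the actual `data/breaking_bad_char/prepare.py` additionally handles the `--num_scripts` flag:

```python
# Sketch of a character-level prepare script, modeled on nanoGPT's
# data/shakespeare_char/prepare.py (input path and 90/10 split are assumptions).
import os
import pickle
import numpy as np

data_path = os.path.join(os.path.dirname(__file__), "input.txt")
with open(data_path, "r") as f:
    data = f.read()

# Build the character-level vocabulary and encoder/decoder maps.
chars = sorted(set(data))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for i, ch in enumerate(chars)}

def encode(s):
    return [stoi[c] for c in s]

# 90/10 train/val split, exported as uint16 binaries for train.py.
n = len(data)
train_ids = np.array(encode(data[: int(n * 0.9)]), dtype=np.uint16)
val_ids = np.array(encode(data[int(n * 0.9):]), dtype=np.uint16)
train_ids.tofile(os.path.join(os.path.dirname(__file__), "train.bin"))
val_ids.tofile(os.path.join(os.path.dirname(__file__), "val.bin"))

# meta.pkl lets sample.py decode generated ids back to characters.
meta = {"vocab_size": len(chars), "itos": itos, "stoi": stoi}
with open(os.path.join(os.path.dirname(__file__), "meta.pkl"), "wb") as f:
    pickle.dump(meta, f)
```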
An interesting experiment I perform here is to vary the number of characters in the input and observe how the evaluation metrics above change. You can vary the input size using the num_scripts flag in the prepare command above. To produce the evaluation metric results, use the following commands:
```sh
python evaluation_bleu.py --data_dir=breaking_bad_char
```

| Length of Dataset in Characters (1K scale) | BLEU Score |
|---|---|
| 74 | 0.4557877814 |
| 149 | 0.4930498774 |
| 221 | 0.4955752212 |
| 281 | 0.5 |
| 353 | 0.5193415638 |
| 415 | 0.5375203915 |
| 478 | 0.5366795367 |
| 553 | 0.4962593516 |
```sh
python evaluation_spell_check.py --data_dir=breaking_bad_char
```

| Length of Dataset in Characters (1K scale) | Spell Check Score |
|---|---|
| 74 | 0.813 |
| 149 | 0.827 |
| 221 | 0.832 |
| 281 | 0.843 |
| 353 | 0.859 |
| 415 | 0.868 |
| 478 | 0.864 |
| 553 | 0.832 |
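A small driver script can automate this sweep. The sketch below is one way to do it, assuming the exact commands shown above (some training flags are trimmed for brevity, and the 1-to-8 range of `num_scripts` is illustrative):

```python
# Sketch: automate the num_scripts sweep used for the tables above
# (assumes the commands shown earlier; flag list trimmed for brevity).
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

for n in range(1, 9):  # hypothetical: 1..8 scripts
    run(["python", "data/breaking_bad_char/prepare.py", f"--num_scripts={n}"])
    run(["python", "train.py", "config/train_breaking_bad_char.py",
         "--device=mps", "--compile=False", "--max_iters=2000",
         "--lr_decay_iters=2000", "--ckpt_file=breaking_bad_ckpt.pt"])
    run(["python", "sample.py", "--out_dir=out-shakespeare-char",
         "--ckpt_file=breaking_bad_ckpt.pt", "--device=cpu"])
    run(["python", "evaluation_bleu.py", "--data_dir=breaking_bad_char"])
    run(["python", "evaluation_spell_check.py", "--data_dir=breaking_bad_char"])
```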
Now we fine-tune the model trained on the Shakespeare dataset on our Breaking Bad dataset. We will also carry out some evaluation to see how much data is required to go from Shakespearean output to something that resembles our dataset.

Fine-tune training:

```sh
python train.py config/finetune_breaking_bad.py --device=mps --compile=False
```
Comparing the pre-train and fine-tune data distributions:

```sh
python compare_similarity.py
```
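As a sketch of what this comparison can look like, the snippet below computes BLEU of the fine-tuned model's samples against both corpora. The paths and tokenization are my assumptions, not necessarily what `compare_similarity.py` does:

```python
# Sketch: BLEU of fine-tuned samples vs. both corpora (paths are assumptions).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def bleu_against(reference_path, hypothesis_tokens):
    """BLEU of the generated tokens against one reference corpus."""
    with open(reference_path) as f:
        reference = f.read().split()
    smoothing = SmoothingFunction().method1
    return sentence_bleu([reference], hypothesis_tokens,
                         smoothing_function=smoothing)

with open("out-shakespeare-char/samples.txt") as f:  # hypothetical path
    generated = f.read().split()

print("BLEU vs Shakespeare:",
      bleu_against("data/shakespeare_char/input.txt", generated))
print("BLEU vs Breaking Bad:",
      bleu_against("data/breaking_bad_char/input.txt", generated))
```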
Results:

| Data in Characters (1K scale) | Fine-Tune Iterations (max_iters) | BLEU of Output vs. Shakespeare | BLEU of Output vs. Breaking Bad (Our Dataset) |
|---|---|---|---|
| 74 | 20 | 0.45340501 | 0.3324372 |
| 74 | 50 | 0.42239858 | 0.3959435 |
| 74 | 100 | 0.3683274 | 0.4092526 |
| 149 | 20 | 0.4535809 | 0.3545534 |
| 149 | 50 | 0.3669391 | 0.3687556 |
| 149 | 100 | 0.38263950 | 0.402125 |
| 281 | 20 | 0.4276672 | 0.342676 |
| 281 | 50 | 0.39730941 | 0.3757847 |
| 281 | 100 | 0.3564266 | 0.4020054 |
| 553 | 20 | 0.45182 | 0.3509 |
| 553 | 50 | 0.412078 | 0.3925399 |
| 553 | 100 | 0.38356 | 0.392694 |