- The model to be created will be a classifier built from GitHub issues; it has to finish quickly, in about 5 minutes
- Create a semantic search for coding questions
- GitHub Issues
- Add GitHub Discussions
- Add open-source forums
- Find a code snippet that gives a code example for the question
- Add Stack Overflow for the question
- The requirement is to use low abstraction for the project and keep things very basic
- We will use PyTorch, not Lightning. In the future I wish there were a faster way, so we could use Lightning or Composer
- The first part of the project is to get the dataset
- You will have to use python_graphql_client to download the GitHub issues via the GraphQL API.
- Create a new directory called multi_label_classification
- Install Miniconda (if you are on Windows, install WSL 2 and get familiar with Vim)
- Create a new conda environment called aim
conda create -n AIM python=3.7 -y
conda activate AIM
- Now do your first package install: python-graphql-client
pip install python-graphql-client
- To explore the GitHub GraphQL API, go to this link: https://docs.github.com/en/graphql/overview/explorer
- Press the button with the big triangle pointing to the right
- You should see something like
{ "data": { "viewer": { "login": "Rami-Ismael" ## Github Profile } } }
- Click the explore button
- Just play around with it
- Create your first GraphQL query for GitHub that grabs data about issues and labels from one repository (a Python sketch using python_graphql_client follows the example response below)
{ "data": { "repository": { "diskUsage": 101531, "issue": { "id": "MDU6SXNzdWU1MDI4MTY1MzA=", "title": "Add TPU support", "labels": { "edges": [ { "node": { "id": "MDU6TGFiZWwxMjk3MDkwNjg4", "name": "feature" } }, { "node": { "id": "MDU6TGFiZWwxMjk3MDkwNjg5", "name": "help wanted" } } ] } } } } }
- You will download the GitHub issues from these GitHub repositories (see the pagination sketch after the list):
- PyTorch Lightning
- PyTorch
- Optuna
- Pandas
- Numpy
- Zarr
- Hugging Face Transformers
- Pinecone
- Weaviate
- TorchMetrics
- Ray Tune
- Weights & Biases ... (you can add more if you want)
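To download every issue from the repositories above (not just one), the same client can page through the issues connection with a cursor. This is a sketch under the same assumptions (GITHUB_TOKEN environment variable, hypothetical owner/name pairs); rate limiting and error handling are left out.

```python
import os

from python_graphql_client import GraphqlClient

client = GraphqlClient(endpoint="https://api.github.com/graphql")
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

# Fetch issues 100 at a time; the `after` cursor walks through the remaining pages.
ISSUES_QUERY = """
query ($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    issues(first: 100, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes {
        title
        body
        labels(first: 20) { nodes { name } }
      }
    }
  }
}
"""


def download_issues(owner, name):
    """Return every issue (title, body, label names) from one repository."""
    issues, cursor = [], None
    while True:
        page = client.execute(
            query=ISSUES_QUERY,
            variables={"owner": owner, "name": name, "cursor": cursor},
            headers=HEADERS,
        )["data"]["repository"]["issues"]
        issues.extend(page["nodes"])
        if not page["pageInfo"]["hasNextPage"]:
            return issues
        cursor = page["pageInfo"]["endCursor"]


# Hypothetical owner/name pairs for two of the repositories in the list above.
dataset = download_issues("pytorch", "pytorch") + download_issues("optuna", "optuna")
```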
- The dataset will be stored in JSON format. You must find a Python library that can compress the dataset
- The dataset will be stored in Azure (a compression-and-upload sketch follows below).
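A minimal sketch of the storage step: gzip is one option for compressing the JSON file, and the azure-storage-blob package is one way to push it to Azure Blob Storage. The container name, file name, and AZURE_STORAGE_CONNECTION_STRING environment variable are assumptions for illustration.

```python
import gzip
import json
import os

from azure.storage.blob import BlobServiceClient


def save_compressed(issues, path="github_issues.json.gz"):
    """Serialize the downloaded issues to JSON and gzip-compress the file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(issues, f)
    return path


def upload_to_azure(path, container="github-issues"):
    """Upload the compressed dataset to an Azure Blob Storage container."""
    service = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    )
    blob = service.get_blob_client(container=container, blob=os.path.basename(path))
    with open(path, "rb") as f:
        blob.upload_blob(f, overwrite=True)


if __name__ == "__main__":
    toy_dataset = [{"title": "Add TPU support", "labels": ["feature", "help wanted"]}]
    upload_to_azure(save_compressed(toy_dataset))
```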
- The second part will be exploring the dataset using scikit-learn and seaborn (an EDA sketch follows the list below)
- Things to explore in the dataset:
- Target distribution
- The length of the text
- Word Counts
- Word Length
- Most Common Words
- Calculate the TF-IDF of the most common words that are not in common parlance with the other repositories https://www.kaggle.com/code/datafan07/disaster-tweets-nlp-eda-bert-with-transformers/notebook [[Data Science]]
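A sketch of that exploration with pandas, seaborn, and scikit-learn. The file name and the column names ("text" for the issue title/body, "labels" for the list of label names) are assumptions about how the dataset was saved.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.feature_extraction.text import TfidfVectorizer

df = pd.read_json("github_issues.json.gz")  # pandas infers the gzip compression
df["text"] = df["text"].fillna("")

# Target (label) distribution.
label_counts = df["labels"].explode().value_counts()
sns.barplot(x=label_counts.values, y=label_counts.index)
plt.title("Label distribution")
plt.show()

# Text length, word counts, and average word length.
df["char_length"] = df["text"].str.len()
df["word_count"] = df["text"].str.split().str.len()
df["mean_word_length"] = df["text"].str.split().apply(
    lambda words: sum(map(len, words)) / max(len(words), 1)
)
sns.histplot(df["word_count"])
plt.title("Word counts per issue")
plt.show()

# Most distinctive words according to TF-IDF (common English stop words removed).
tfidf = TfidfVectorizer(stop_words="english", max_features=20)
tfidf.fit(df["text"])
print(sorted(tfidf.vocabulary_))
```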
- Explain the dataset to me
- Learn about NLP
- Grokking Deep Learning, Chapter 11 on NLP
- Go through Hugging Face Course
- Create the multi-label classification model with only PyTorch and Hugging Face (a minimal sketch follows this checklist)
- Add infrastructure tooling: Weights & Biases
- Deploy your model to the cloud
- Test your model (https://fullstackdeeplearning.com/)
- Monitor your model
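A minimal sketch of the multi-label model using only PyTorch and Hugging Face Transformers, with two Weights & Biases calls for the infrastructure item above. The encoder name, the number of labels, and the toy batch are placeholders; a real run would loop over a DataLoader built from the dataset.

```python
import torch
import wandb
from torch import nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "distilbert-base-uncased"  # placeholder encoder
NUM_LABELS = 10                         # placeholder label count


class IssueLabelClassifier(nn.Module):
    """Pretrained text encoder plus a linear head; one logit per label."""

    def __init__(self):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(MODEL_NAME)
        self.head = nn.Linear(self.encoder.config.hidden_size, NUM_LABELS)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        return self.head(hidden[:, 0])  # logits from the first ([CLS]) token


wandb.init(project="multi_label_classification")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = IssueLabelClassifier()
criterion = nn.BCEWithLogitsLoss()  # independent sigmoid/BCE per label -> multi-label
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Toy single-example "batch"; the target is a multi-hot vector (labels 0 and 3 apply).
batch = tokenizer(["Add TPU support"], padding=True, truncation=True, return_tensors="pt")
targets = torch.zeros(1, NUM_LABELS)
targets[0, [0, 3]] = 1.0

optimizer.zero_grad()
logits = model(batch["input_ids"], batch["attention_mask"])
loss = criterion(logits, targets)
loss.backward()
optimizer.step()
wandb.log({"train/loss": loss.item()})
```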
We should have a text classification model in 4 weeks.
Create a semantic search function that will produce answers to people's questions about coding documentation
- Go through [[Haystack]] documentation
- Go through [[Pinecone]] documentation
- Go through [[Weaviate]] documentation
- Building the semantic search pipeline
- I want the pipeline to find question and answer pairs (a minimal sketch follows this list)
- Add chat to the model
- Add reinforcement learning to increase the model's performance
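The notes point to Haystack, Pinecone, and Weaviate for this; as a library-agnostic sketch of the core idea (embed the corpus, embed the query, rank by cosine similarity), here is a version with sentence-transformers. The model name and toy corpus are placeholders, and a real pipeline would index the question/answer pairs collected above.

```python
from sentence_transformers import SentenceTransformer, util

# Toy corpus -- in the real pipeline these would be the question/answer pairs
# gathered from GitHub issues, discussions, Stack Overflow, and forums.
corpus = [
    "How do I install PyTorch with CUDA support?",
    "How do I create a conda environment?",
    "How do I upload a file to Azure Blob Storage?",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query = "setting up PyTorch on a GPU"
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank every document by cosine similarity to the query, highest first.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 3))
```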
Codex clone that can produce code from a prompt
Create a semantic search for memes
Generate memes with a diffusion model