Scripts to convert datasets from various sources to Hugging Face datasets
.
Convert Any Kaggle Dataset To a Hugging Face Dataset:
git clone https://github.com/nateraw/huggingface-datasets-converter.git
cd huggingface-datasets-converter
pip install -r requirements.txt
Make sure to authenticate with Hugging Face Hub
huggingface-cli login
Make sure you have your kaggle.json
file in ~/.kaggle
. Then...
Provide the kaggle dataset ID and the Hugging Face Hub repo ID that you'd like to upload to (it will be created if it doesn't exist).
python run_kaggle.py --kaggle_id evangower/airbnb-stock-price --repo_id nateraw/airbnb-stock-price
Provide the record ID and the name of the repo on Hugging Face Hub you'd like to upload to (it will be created if it doesn't exist).
python run_zenodo.py --zenodo_record 6606485 --repo_id nateraw/espeni
For zenodo, you can also pass --workers
flag if you want to do this with multiprocessing.
python run_zenodo.py --zenodo_record 6606485 --repo_id nateraw/espeni --workers 2