Skip to content

Scripts to convert datasets from various sources to Hugging Face Datasets.

Notifications You must be signed in to change notification settings

StarkOsae/huggingface-datasets-converter

 
 

Repository files navigation

Hugging Face Datasets Converter

Scripts to convert datasets from various sources to Hugging Face datasets.

Demo

Convert Any Kaggle Dataset To a Hugging Face Dataset: Open In Colab

Usage

Setup

git clone https://github.com/nateraw/huggingface-datasets-converter.git
cd huggingface-datasets-converter
pip install -r requirements.txt

Make sure to authenticate with Hugging Face Hub

huggingface-cli login

Convert Kaggle Dataset

Make sure you have your kaggle.json file in ~/.kaggle. Then...

Provide the kaggle dataset ID and the Hugging Face Hub repo ID that you'd like to upload to (it will be created if it doesn't exist).

python run_kaggle.py --kaggle_id evangower/airbnb-stock-price --repo_id nateraw/airbnb-stock-price

Convert Zenodo Dataset

Provide the record ID and the name of the repo on Hugging Face Hub you'd like to upload to (it will be created if it doesn't exist).

python run_zenodo.py --zenodo_record 6606485 --repo_id nateraw/espeni

For zenodo, you can also pass --workers flag if you want to do this with multiprocessing.

python run_zenodo.py --zenodo_record 6606485 --repo_id nateraw/espeni --workers 2

About

Scripts to convert datasets from various sources to Hugging Face Datasets.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 72.8%
  • Jupyter Notebook 27.2%