Fitness database

Welcome to the Fitness Database ETL Pipeline! This pipeline is designed to extract workout data files from Intervals.icu, transform the data and load it into a Microsoft SQL database using Python.

Project Structure

FitnessDatabase # Project folder │ ├── etl_pipeline.py # Main ETL script ├── db_connection.py # Database connection script ├── config.py # Configuration for folder paths and database settings ├── data/ # Folder containing downloaded workout data in CSV format │ └── i123456_activities.csv # Example CSV file ├── sql_docs # Folder containing SQL code to create the database ├── README.md # Documentation ├── requirements.txt # Python dependencies

Dependencies

Python 3.x
pandas
sqlalchemy
pyodbc
SQL Server
Git (optional)

Install dependencies via pip:

pip install -r requirements.txt

Explanation of Each Folder/File:

data/ folder:
- This is where you will place your CSV files, downloaded from Intervals.icu or other fitness apps.
- Each file should be named in the format i______activities.csv (e.g., i123456_activities.csv).
- This folder will serve as the source for your ETL script to find and process the data.
etl_pipeline.py:
- This is your main Python script that handles the ETL process (Extract, Transform, Load).
- It will scan the data/ folder for new CSV files, transform the data, and load it into your SQL Server database.
db_connection.py (optional):
- You can move your database connection logic to a separate file to keep things modular.
- This will help separate concerns: one script focuses on database connections, while etl_pipeline.py focuses on the ETL logic.
config.py (optional):
- This can store configuration values like your SQL Server name, database name, and file path constants.
- It’s useful to avoid hardcoding values in your main script.
sql_docs (optional)
- This is where to store the sql documents to create and query the database.
README.md:
- This file explains what your project does, how to set it up, and how to run the ETL pipeline.
- It's helpful for future reference or for others who might work with your project.
requirements.txt:
- List all Python dependencies here so that the environment can be easily replicated.

Configuration

Clone the repository to your local machine.
Make sure you have Python and the required dependencies installed.
Set up a Microsoft SQL database.
Configure the connection details. Make sure to replace the DESKTOP-CFN92PN\MSSQLSERVER2 and myFitnessApp with your own server and database name in the config.py file and replace driver=ODBC+Driver+17+for+SQL+Server with your own driver in the db_connection.py file.

Running the pipeline

Place your fitness activity CSV files in the data/ folder. These files should follow the naming convention iXXXXXX_activities.csv where XXXXXX is a 6-digit number representing the athlete ID.
Run the ETL process by executing etl_pipeline.py.

The pipeline will:

Extract data from CSV files.
Transform the data (convert time, distance, and pace).
Load the data into the FitnessData table in SQL Server.

Troubleshooting

A common error that you may encounter: "Error occurred while inserting data: (pyodbc.ProgrammingError) ...".

Make sure that the data types in your SQL table match the ones used in the CSV file and Python code. For example, ensure that the Pace column in SQL Server is set to store strings if you're saving it as 'min:sec/km'.

License

This project is licensed under the MIT License. See the LICENSE file for more information.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Fitness database

Project Structure

Dependencies

Explanation of Each Folder/File:

Configuration

Running the pipeline

Troubleshooting

License

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
sql_docs		sql_docs
.gitignore		.gitignore
README.md		README.md
config.py		config.py
db_connection.py		db_connection.py
etl_pipeline.py		etl_pipeline.py
requirements.txt		requirements.txt

PanosChatzi/FitnessDatabase

Folders and files

Latest commit

History

Repository files navigation

Fitness database

Project Structure

Dependencies

Explanation of Each Folder/File:

Configuration

Running the pipeline

Troubleshooting

License

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages