This project is driven by the goal of continuous learning across specific topics in data engineering, backend engineering, and machine learning engineering.
Beyond maximizing my own learning, the ambition is to build a personal platform that solves a subset of my everyday problems and simplifies parts of my daily life. Ideally, it will also offer inspiration or practical utility to others.
- Copy `.env.example` to `.env` and fill in the necessary keys.
- Launch the development environment using `docker-compose -f docker-compose-dev.yml up --build`.
- Deployment currently runs on Cloud Run connected to a Cloud SQL database, where a vector database extension is enabled automatically in code. A move to a Kubernetes cluster is planned, to allow more experimentation and greater flexibility.
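
As an illustration of enabling a vector extension from code, here is a minimal sketch. It is not taken from this repository: it assumes the Cloud SQL database is PostgreSQL and that the extension is pgvector (registered in PostgreSQL under the name `vector`). The function name `enable_vector_extension` is hypothetical, and it accepts any DB-API style connection.

```python
from typing import Any

# Assumption: the extension is pgvector, whose PostgreSQL extension name is "vector".
# IF NOT EXISTS makes the statement safe to run on every application start.
ENABLE_VECTOR_SQL = "CREATE EXTENSION IF NOT EXISTS vector"


def enable_vector_extension(connection: Any) -> None:
    """Run the idempotent CREATE EXTENSION statement on a DB-API connection.

    `connection` is expected to expose cursor() and commit(), as DB-API
    drivers such as psycopg2 do.
    """
    cursor = connection.cursor()
    try:
        cursor.execute(ENABLE_VECTOR_SQL)
    finally:
        # Release the cursor even if the statement fails.
        cursor.close()
    connection.commit()
```

With a driver such as psycopg2, this could be called once at startup, for example `enable_vector_extension(psycopg2.connect(dsn))`, where `dsn` is a placeholder for the real connection string. The database role used must be allowed to create extensions.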
The guiding principles of this project are as follows:
- In many contexts, the real competitive edge will come from access to high-quality, up-to-date data rather than from superior models, since model development usually starts from pre-trained models released by leading companies. Techniques such as RAG and fine-tuning still matter, but the lasting advantage lies in data quality and the ability to deploy these models effectively.
- The aim is to develop a cost-effective, production-robust system that enables individuals to organize and maintain their data for various projects. This system should serve as a guide for scaling these projects in the future.
- Ideally, this project will result in a "template" complete with documentation and best practices. It will focus on scaling while keeping costs lower than profits, and will document all learning experiences and decision-making rationales.
- This endeavor serves as a platform for studying, experimenting, practicing, and enhancing skills in data engineering, backend engineering, and machine learning. It aims to build a methodology and system that ensures continuous access to historical data, ready for any current or future idea or project.
- Keeping the documentation on the theory, practice, and specific choices for this repository as current as possible is a priority. This means starting with the basics and then exploring various tools and frameworks.
- Until now, having at least a static website has been practically mandatory. I predict that within the next 20 years the same will be true of having one's own SaaS, platform, or web application.
We welcome contributions! Feel free to open an issue or submit a pull request.
- fastapi-alembic-sqlmodel-async: Integrating FastAPI with Alembic and SQLModel for asynchronous database operations.
- fastcrud: Streamlining CRUD operations in FastAPI.
- agentkit: A toolkit for building intelligent agents.
- fastapi-best-practices
- JumpStart: A starter template for new projects.
- instagraph-nextjs-fastapi: A Next.js and FastAPI version of InstaGraph, which generates knowledge graphs from text.
- video-2-text: Focused on converting video content to text.