This project is a demonstration of an automated invoice processing system. It provides the minimal features needed for automated invoice processing and can serve as a starting point for those looking to automate their own invoice workflows.
- Invoice ingestion from email attachments and manual uploads
- OCR (Optical Character Recognition) for extracting text from PDF and image invoices
- Data extraction and validation for key invoice fields
- Basic error handling and reporting
- Simple search functionality using ElasticSearch
- Web interface for uploading invoices and viewing processed results
- Backend: Python 3.x with Django
- Database: SQLite (for simplicity, can be easily switched to PostgreSQL for production)
- OCR: Tesseract (via pytesseract)
- PDF Processing: pdf2image
- Task Queue: Celery with Redis as the message broker
- Search: ElasticSearch with django-elasticsearch-dsl
- Frontend: Django templates (basic HTML/CSS)
Before you begin, ensure you have the following installed:
- Python 3.x
- pip (Python package manager)
- Redis server
- Tesseract OCR
- ElasticSearch
- (Optional) MacPorts (if you're using macOS and prefer it over Homebrew)
-
Clone the repository:
git clone https://github.com/fuchen2624/automated-invoice-processor.git cd automated-invoice-processor
-
Create a virtual environment and activate it:
python3 -m venv venv source venv/bin/activate # On Windows use `venv\Scripts\activate`
-
Install the required Python packages:
pip install -r requirements.txt
-
Apply database migrations:
python manage.py migrate
-
Create a superuser for the Django admin interface:
python manage.py createsuperuser
-
Build the ElasticSearch index:
python manage.py search_index --rebuild
To enable automatic email checking for invoices, you need to set up a .env
file in the root directory of the project with your email credentials:
-
Create a file named
.env
in the project root. -
Add the following content to the file:
EMAIL_HOST=your_email_host EMAIL_USER=your_email_username EMAIL_PASSWORD=your_email_password
Replace
your_email_host
,your_email_username
, andyour_email_password
with your actual email server details. -
Make sure to add
.env
to your.gitignore
file to avoid committing sensitive information.
-
Start the Redis server (if not already running)
-
Start the Celery worker:
celery -A invoice_processor worker -l info
-
Start the Celery beat scheduler:
celery -A invoice_processor beat -l info
-
Run the Django development server:
python manage.py runserver
-
Access the application at
http://localhost:8000
- Upload invoices through the web interface at
http://localhost:8000/ingestion/upload/
- View processed invoices and search results at
http://localhost:8000/ingestion/search/
- Access the admin interface at
http://localhost:8000/admin/
to manage invoices and other data - The system will automatically check for new invoices in the configured email account at regular intervals
Contributions are welcome! Please feel free to submit a Pull Request.
This project is open-source and available under the MIT License.
This is a demonstration project and may not be suitable for production use without further development and security enhancements. Use at your own risk.