Tiki is one of the most popular e-commerce websites in Vietnam. The purpose of this project is to crawl as much product information as possible and store it in a structured form in PostgreSQL. The scripts can be run daily to collect historical pricing data, for example over a week.
Features:
- Database: PostgreSQL 12.1 64-bit
- Language: Python 3.7.4
- Libraries: psycopg2, requests, beautifulsoup4, smtplib
- Platform: Anaconda
- Sample Scripts: see how each phase works below
Phase 01 - Gather a list of product URLs, apply conditions, and send an email when a condition is met.
Phase 02 - Gather the list of 16 main-category URLs, then loop through to the last page of each URL.
Phase 03 - Gather the list of all active category URLs, identify the leaf-category URLs, then loop through to the last page of each leaf.
Phase 04 - Get Tiki's Category and Product API.
Phase 05 - Scrapy, seller_id and configurable_product (In Progress).
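
The "loop until the last page" step in Phases 02 and 03 can be sketched roughly as follows. This is a minimal illustration and not the project's actual script: the function name `crawl_until_last_page`, the `?page=N` query parameter, and the rule that an empty page marks the end are all assumptions. The fetcher is passed in as a parameter, so the sketch runs offline with a stand-in.

```python
from typing import Callable, Iterator, List


def crawl_until_last_page(
    base_url: str,
    fetch_products: Callable[[str], List[dict]],
    max_pages: int = 1000,
) -> Iterator[dict]:
    """Yield products page by page until a page comes back empty.

    `fetch_products` is a hypothetical callable that downloads one listing
    page and returns the products found on it (an empty list if none).
    `max_pages` is a safety cap so a misbehaving site cannot loop forever.
    """
    for page in range(1, max_pages + 1):
        # Assumption: pages are addressed with a "?page=N" query parameter.
        products = fetch_products(f"{base_url}?page={page}")
        if not products:
            break  # An empty page is treated as "past the last page".
        yield from products


if __name__ == "__main__":
    # Stand-in fetcher with two fake pages, so the sketch runs without network.
    fake_site = {
        "https://example.com/category?page=1": [{"id": 1, "price": 100}],
        "https://example.com/category?page=2": [{"id": 2, "price": 250}],
    }
    for product in crawl_until_last_page(
        "https://example.com/category", lambda url: fake_site.get(url, [])
    ):
        print(product)
```

In a real run, `fetch_products` would wrap a `requests.get` call plus the HTML or JSON parsing, and the yielded dictionaries would be inserted into PostgreSQL with psycopg2.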
Please feel free to fork, comment or give feedback to [email protected]