This project builds a web crawler that checks every page of a website and reports any broken webpages.
As a developer
I want a tool that automatically checks all the webpages on the website
So that I can quickly identify whether new features or bug fixes introduced to the website break any existing pages.
- All public-facing webpages on the website can be easily located and tested.
- Any error pages should be logged for further follow-up.
In the spider class (e.g. ./mycrawler/spiders/pageavailability.py), replace the example.com URL with the real URL you want to crawl. A sketch of such a spider is shown below.
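For reference, here is a minimal sketch of what the availability spider might look like, assuming a standard Scrapy project. The class name, callback, and settings are illustrative and may differ from the actual pageavailability.py.

```python
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class PageAvailabilitySpider(CrawlSpider):
    # Hypothetical names; adjust to match the real spider in
    # ./mycrawler/spiders/pageavailability.py.
    name = "pageavailability"
    allowed_domains = ["example.com"]      # replace with the real domain
    start_urls = ["https://example.com"]   # replace with the real start URL

    # Let error responses (4xx/5xx) reach the callback instead of being dropped,
    # so broken pages can be logged.
    custom_settings = {"HTTPERROR_ALLOW_ALL": True}

    # Follow every in-domain link and check each response.
    rules = (Rule(LinkExtractor(), callback="parse_page", follow=True),)

    def parse_page(self, response):
        # Log any error page for further follow-up.
        if response.status >= 400:
            self.logger.error("Broken page: %s (status %d)",
                              response.url, response.status)
        yield {"url": response.url, "status": response.status}
```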
This project has been tested on macOS ONLY.
- Install Docker for Mac
- Clone this project to your local environment.
- Run `docker-compose up` from the top-level directory of the project.
The `docker-compose up` command starts a crawler service and runs the crawler against the specified website.
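For orientation, the compose file might look roughly like the sketch below; the service name, build context, and crawl command are assumptions and should be checked against the project's actual docker-compose.yml.

```yaml
# Assumed docker-compose.yml layout, not the project's actual file.
services:
  crawler:
    build: .                              # assumes a Dockerfile in the repo root
    command: scrapy crawl pageavailability  # assumes the spider name used above
```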