This tool extracts the external links from the snapshots of a website stored in the Internet Archive. The output can be used to perform link analysis on the website.
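The sketch below illustrates what "external link" means here. It is not the tool's own code. It assumes a link is external when its host differs from the host of the page it appears on, and that the snapshot HTML was fetched unmodified (for example through the Wayback `id_` URL modifier, which returns the archived page without links rewritten to `web.archive.org`). The function name `external_links` is hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _AnchorCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def external_links(html, page_url):
    """Return the sorted, de-duplicated http(s) links whose host differs from page_url's host.

    Simplifying assumption: "www.example.com" and "example.com" count as different hosts.
    """
    site_host = urlparse(page_url).hostname
    parser = _AnchorCollector()
    parser.feed(html)
    found = set()
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links against the page URL
        parts = urlparse(absolute)
        # Skip mailto:, javascript:, etc., and links that stay on the same host.
        if parts.scheme in ("http", "https") and parts.hostname and parts.hostname != site_host:
            found.add(absolute)
    return sorted(found)


html = (
    '<a href="/about">A</a>'
    '<a href="https://other.org/x">B</a>'
    '<a href="mailto:a@b.c">C</a>'
    '<a href="http://example.com/y">D</a>'
)
print(external_links(html, "http://example.com/"))  # ['https://other.org/x']
```

Relative links, same-host links, and non-HTTP schemes are all dropped; only the link to `other.org` survives.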
- Download or clone the project.
- Install the requirements:

```
pip install -r requirements.txt
```
Create a text file that lists the URLs, one URL per line. Then run:

```
python link_extractor.py -i filename
```
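As a rough illustration of the input format, the snippet below writes a one-URL-per-line file and reads it back. The helper `read_url_list` and the file name `urls.txt` are hypothetical; skipping blank lines is an assumption made here, and whether `link_extractor.py` itself tolerates blank lines is not stated.

```python
from pathlib import Path


def read_url_list(path):
    """Return the non-empty, whitespace-stripped lines of a one-URL-per-line text file."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


# Create a hypothetical input file (note the blank line in the middle) and read it back.
Path("urls.txt").write_text("http://example.com\n\nhttp://example.org\n", encoding="utf-8")
print(read_url_list("urls.txt"))  # ['http://example.com', 'http://example.org']
```

The resulting file could then be passed as `python link_extractor.py -i urls.txt`.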
To list the optional parameters, run:

```
python link_extractor.py -h
```
The output file is in JSON format, with one JSON record per line. Each line represents one URL from the input file together with the external links found in each of its snapshots.
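Because each line is a separate JSON record, the file can be loaded line by line rather than with a single `json.load` call. The sketch below is a hypothetical loader (`load_results` is not part of the tool); it makes no assumption about the field names inside each record, so inspect one record to see how the URL and its per-snapshot links are keyed.

```python
import json


def load_results(path):
    """Parse an output file in which each non-empty line is a separate JSON document."""
    results = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                results.append(json.loads(line))
            except json.JSONDecodeError as err:
                # Report which line is malformed instead of failing on the whole file.
                raise ValueError(f"line {line_number} is not valid JSON: {err}") from err
    return results
```

For link analysis, each loaded record can then be turned into edges from the input URL to every external link it contains.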