
Internet Archive Link Extractor

This tool extracts the external links from a website's snapshots in the Internet Archive. The output can be used to perform link analysis on the website.
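
The extraction step can be pictured with a short sketch. This is not the tool's own code. It assumes that links in an archived page are rewritten to the Wayback Machine form https://web.archive.org/web/<timestamp>/<original URL> (sometimes root-relative, sometimes with a modifier such as id_ after the timestamp). It treats a link as external when its host differs from the site's host, ignoring a leading "www.". The names original_url and external_links are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed Wayback Machine rewrite format: an optional https://web.archive.org host,
# then /web/<up to 14-digit timestamp><optional modifier such as id_>/<original URL>.
WAYBACK_PREFIX = re.compile(r"^(?:https?://web\.archive\.org)?/web/\d{1,14}[a-z_]*/")


def original_url(url):
    """Strip a Wayback Machine prefix, if present, to recover the archived URL."""
    return WAYBACK_PREFIX.sub("", url, count=1)


def _host(url):
    """Lower-cased host without a leading 'www.', or None (e.g. for mailto: links)."""
    host = urlparse(url).hostname
    if host and host.startswith("www."):
        host = host[4:]
    return host


class _LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value.strip())


def external_links(html, snapshot_url):
    """Return the sorted, de-duplicated links in html that point outside the site."""
    site_url = original_url(snapshot_url)
    site_host = _host(site_url)
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    links = set()
    for href in collector.hrefs:
        # Resolve relative links against the original site URL, not the archive URL.
        absolute = urljoin(site_url, original_url(href))
        host = _host(absolute)
        if host and host != site_host:
            links.add(absolute)
    return sorted(links)
```

For example, for the snapshot URL https://web.archive.org/web/20200101000000/http://example.com/, a link to /web/20200101000000/https://other.org/page is reported as https://other.org/page, while links back to example.com or www.example.com, in-page anchors and mailto: links are skipped.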

Preparations

  1. Download or clone the project.
  2. Install the requirements:
pip install -r requirements.txt

Usage

Create a file with a list of URLs, one URL per line. Then run the command:

python link_extractor.py -i filename
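
If the URL list is produced from Python, the sketch below writes it in the one-URL-per-line layout. The helper name write_url_list and the file name urls.txt are hypothetical. Stripping surrounding whitespace and skipping blank entries is a precaution of this sketch, not documented behavior of link_extractor.py.

```python
from pathlib import Path


def write_url_list(urls, path):
    """Write one URL per line, skipping blank entries; return the number written."""
    cleaned = [u.strip() for u in urls if u and u.strip()]
    Path(path).write_text("".join(u + "\n" for u in cleaned), encoding="utf-8")
    return len(cleaned)
```

After write_url_list(["http://example.com/", "http://example.org/"], "urls.txt"), the file can be passed to the tool with python link_extractor.py -i urls.txt.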

To get help on the optional parameters, run:

python link_extractor.py -h

Output Format

The output file is in JSON format. Each line of the output file represents one URL from the input file, together with all the external links found in each of its snapshots.
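
Because each line is a separate JSON document, the output can be read line by line with json.loads. The field names inside each record are not documented here, so the sketch below makes no assumption about them. It walks every parsed value (dictionary keys included), collects strings that start with http:// or https://, and counts their hosts as a simple starting point for link analysis. The function names are hypothetical. Pass the site's own host (and web.archive.org, if snapshot URLs appear in the records) in exclude_hosts so they are not counted as external links.

```python
import json
from collections import Counter
from urllib.parse import urlparse


def read_records(path):
    """Parse the output file line by line, treating each non-blank line as one JSON document."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def iter_urls(value):
    """Yield every http(s) URL string found anywhere in a parsed JSON value (keys included)."""
    if isinstance(value, str):
        if value.startswith(("http://", "https://")):
            yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from iter_urls(key)
            yield from iter_urls(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_urls(item)


def domain_counts(records, exclude_hosts=()):
    """Count how often each host occurs across all records, skipping excluded hosts."""
    counts = Counter()
    for record in records:
        for url in iter_urls(record):
            host = urlparse(url).hostname  # lower-cased, or None if there is no host
            if host and host not in exclude_hosts:
                counts[host] += 1
    return counts
```

A link that appears in several snapshots is counted once per occurrence, so the counts also reflect how persistently a site linked to each host over time.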