Jesús Alberto Martínez Mendoza edited this page Mar 22, 2020 · 1 revision

Welcome to the API-COVID-19 wiki!

Scraper Docs

Requirements

This project is built with Django 3.0 and uses the following libraries:

  • beautifulsoup4: Library for extracting PDF links from the Government website.
  • camelot-py: Tool to parse tables in PDF files and export them to CSV.
  • pandas: Auxiliary library to handle CSV data in an easy way.
  • requests: Library to make HTTP requests.

All the libraries are listed in the requirements.txt file and can be installed using the command pip install -r requirements.txt. It's recommended to use a virtual environment when installing new libraries.

Data source

The data is extracted from the Mexican Government's Daily Technical Report.

Data processing

All the data mining code is found in the file scripts/fetch_data.py. It contains all the functions to scrape the website, download the PDF files, parse them and store the results in CSV format.

The script can be run using Django Extensions:

python3 manage.py runscript fetch_data -v2
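
As a rough illustration of the link-extraction step, the sketch below collects PDF links from an HTML page. It is not the code in scripts/fetch_data.py: the project uses beautifulsoup4 for this step, while this stand-in uses Python's standard-library html.parser so it runs without extra dependencies. The names PdfLinkParser and extract_pdf_links, the sample HTML and the example.org URL are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collects the href of every <a> tag that points to a .pdf file."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # used to resolve relative links
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep only links whose target ends in .pdf (case-insensitive).
        if href and href.lower().endswith(".pdf"):
            self.links.append(urljoin(self.base_url, href))


def extract_pdf_links(html, base_url):
    """Return absolute URLs of all PDF links found in the given HTML."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content; the real Government page layout is not shown here.
SAMPLE_HTML = """
<ul>
  <li><a href="/docs/confirmed.pdf">Confirmed cases</a></li>
  <li><a href="/docs/suspected.PDF">Suspected cases</a></li>
  <li><a href="/about.html">About</a></li>
</ul>
"""

links = extract_pdf_links(SAMPLE_HTML, "https://example.org/reports/")
for link in links:
    print(link)
```

In the real pipeline, the HTML would presumably come from an HTTP request made with requests, and each resulting link would then be downloaded and handed to camelot-py for parsing.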

Output

When the script finishes, it generates 2 files: one with the confirmed cases and one with the suspected cases. Example: 2020.03.21_confirmed_cases.csv and 2020.03.21_suspected_cases.csv.
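
The naming pattern in the example above can be reproduced with a short sketch. It assumes that the prefix is the report date formatted as YYYY.MM.DD, which is inferred from the example file names rather than confirmed by the script. The function name output_filenames is hypothetical.

```python
from datetime import date


def output_filenames(report_date):
    """Build the two CSV file names for a given report date.

    Assumption: the prefix is the report date as YYYY.MM.DD,
    inferred from the example file names in this page.
    """
    prefix = report_date.strftime("%Y.%m.%d")
    return [
        f"{prefix}_confirmed_cases.csv",
        f"{prefix}_suspected_cases.csv",
    ]


names = output_filenames(date(2020, 3, 21))
for name in names:
    print(name)
```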
