Home
Jesús Alberto Martínez Mendoza edited this page Mar 22, 2020 · 1 revision
This project is built with Django 3.0 and uses the following libraries:
- beautifulsoup4: Library to extract PDF links from the Government website.
- camelot-py: Powerful tool to parse PDF tables into CSV.
- pandas: Auxiliary library to handle CSV data easily.
- requests: Library to make HTTP requests.
All the libraries are listed in the requirements.txt file and can be installed with the command pip install -r requirements.txt. It's recommended to use a virtual environment when installing new libraries.
The data is extracted from the Mexican Government's Daily Technical Report.
All the data mining code is found in the file scripts/fetch_data.py. It contains all the functions to scrape the website, download the PDF files, parse them, and store the results in CSV format.
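The scrape, download, parse, and store steps described above could be sketched roughly as follows. This is only an illustration, not the contents of scripts/fetch_data.py: the function names `extract_pdf_links`, `download`, and `pdf_to_csv` are hypothetical, and the sketch assumes the PDF links appear as plain `<a href="...">` anchors ending in `.pdf`.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def extract_pdf_links(html, base_url):
    """Return absolute URLs for every anchor in `html` that points to a PDF.

    Hypothetical helper: assumes PDF links are plain <a href="..."> tags.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"]
        if href.lower().endswith(".pdf"):
            # Resolve relative links against the page they were found on.
            links.append(urljoin(base_url, href))
    return links


def download(url, dest_path):
    """Download `url` to `dest_path` and return the path (hypothetical helper)."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    with open(dest_path, "wb") as fh:
        fh.write(response.content)
    return dest_path


def pdf_to_csv(pdf_path, csv_path):
    """Parse every table in a PDF with camelot and store them as one CSV."""
    # Imported lazily so the link helper works without these heavier packages.
    import camelot
    import pandas as pd

    tables = camelot.read_pdf(pdf_path, pages="all")
    if len(tables) == 0:
        raise ValueError(f"no tables found in {pdf_path}")
    # Each camelot table exposes its contents as a pandas DataFrame via `.df`.
    frame = pd.concat([table.df for table in tables], ignore_index=True)
    frame.to_csv(csv_path, index=False)
    return csv_path
```

Keeping the link extraction separate from the network call makes it possible to try the parsing logic against a saved copy of the page without sending any requests.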
The script can be run using Django Extensions:
python3 manage.py runscript fetch_data -v2
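For context, the `runscript` command from Django Extensions imports a module from the project's scripts directory (which, as far as I know, needs an `__init__.py` file) and calls its `run()` function. A minimal skeleton of such a module might look like this; `main` and its `verbose` flag are hypothetical stand-ins, not the project's actual code.

```python
def main(verbose=False):
    """Hypothetical stand-in for the real scraping and parsing steps."""
    if verbose:
        print("Fetching the daily technical report...")
    return "done"


def run(*args):
    """Entry point that `runscript` looks up and calls."""
    # Values passed with `--script-args` arrive here as positional strings.
    return main(verbose="verbose" in args)
```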
When the script finishes, it generates two files with the confirmed and suspected cases.
Example: 2020.03.21_confirmed_cases.csv and 2020.03.21_suspected_cases.csv
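The naming pattern in the example above can be reproduced with a small helper. This is a sketch inferred from the two example file names only; the function name `output_filenames` is hypothetical, and whether the real script uses the report date or the run date is an assumption.

```python
from datetime import date


def output_filenames(report_date):
    """Build the two CSV file names for a given date (hypothetical helper)."""
    # Dotted, zero-padded date prefix, matching the example file names.
    prefix = report_date.strftime("%Y.%m.%d")
    return (
        f"{prefix}_confirmed_cases.csv",
        f"{prefix}_suspected_cases.csv",
    )


print(output_filenames(date(2020, 3, 21)))
# ('2020.03.21_confirmed_cases.csv', '2020.03.21_suspected_cases.csv')
```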