"Sigan Viendo" is a web scraping application designed to gather and analyze information from Dominican Republic government websites.
This application is designed to:
- Search for all *.gob.do domains, which represent Dominican Republic government institutions.
- Collect data from these websites: the ID (cédula), position, and salary of public employees.
- Identify employees who work in multiple government institutions.
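
As a rough illustration of the first and last goals above, the sketch below filters URLs down to *.gob.do hosts and groups scraped records by cédula to find people listed at more than one institution. It is not taken from the repository: the function names, the record fields (cedula, institution, position, salary), and the sample data are assumptions made up for this example.

```python
from collections import defaultdict
from urllib.parse import urlparse


def is_gob_do(url: str) -> bool:
    """Return True if the URL's host is gob.do or a subdomain of it."""
    # urlparse only finds a hostname when the URL includes a scheme (https://...).
    host = (urlparse(url).hostname or "").lower()
    return host == "gob.do" or host.endswith(".gob.do")


def find_multi_institution_employees(records):
    """Map each cédula to its institutions, keeping only those seen in more than one."""
    institutions_by_cedula = defaultdict(set)
    for record in records:
        institutions_by_cedula[record["cedula"]].add(record["institution"])
    return {
        cedula: sorted(institutions)
        for cedula, institutions in institutions_by_cedula.items()
        if len(institutions) > 1
    }


# Made-up sample records, for illustration only (hypothetical field names).
sample_records = [
    {"cedula": "000-0000000-1", "institution": "a.gob.do", "position": "Analyst", "salary": 50000},
    {"cedula": "000-0000000-1", "institution": "b.gob.do", "position": "Advisor", "salary": 30000},
    {"cedula": "000-0000000-2", "institution": "a.gob.do", "position": "Clerk", "salary": 25000},
]

if __name__ == "__main__":
    print(find_multi_institution_employees(sample_records))
```

Matching on the parsed hostname rather than on the raw URL string avoids false positives such as a host that merely contains "gob.do" somewhere in its name.
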
- Clone this repository:
git clone https://github.com/gmarte/gobdo.git
- Change into the directory:
cd gobdo
- Install the required Python packages:
pip install -r requirements.txt
- Set up the MongoDB database by following the instructions in config.py.
- Run the application:
python src/main.py
To start the web scraping process, run the main.py script located in the src directory:
python src/main.py
The collected data is stored in the MongoDB database configured in config.py.
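
The storage step could look roughly like the sketch below, which upserts each scraped record so that re-running the scraper updates existing entries instead of duplicating them. This is an assumption about how such a step might be written, not the contents of database.py: the function name save_records, the (cedula, institution) key, and the database and collection names in the usage comment are all hypothetical. The only MongoDB call used is pymongo's Collection.update_one with upsert=True.

```python
def save_records(collection, records):
    """Upsert each record keyed by (cedula, institution); return how many were processed."""
    count = 0
    for record in records:
        # Hypothetical key: one document per person per institution.
        key = {"cedula": record["cedula"], "institution": record["institution"]}
        # Insert the document if it is missing, otherwise update its fields.
        collection.update_one(key, {"$set": record}, upsert=True)
        count += 1
    return count


# Usage sketch (needs pymongo and a running MongoDB; the URI and names are assumptions):
# from pymongo import MongoClient
# client = MongoClient("mongodb://localhost:27017")
# save_records(client["gobdo"]["employees"], records)
```
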
The application is structured as follows:
- src/: Contains the Python scripts for the application.
  - main.py: Entry point for the application.
  - web_scraper.py: Contains the logic for web scraping.
  - database.py: Contains the logic for interacting with the database.
- config.py: Contains the configuration for the database.
- requirements.txt: Contains the Python packages required for this application.
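
To give a feel for the kind of logic a scraping module might hold, here is a minimal sketch that turns an HTML payroll table into a list of dictionaries using only the standard library. It is not the code in web_scraper.py: the class and function names are hypothetical, and it assumes the page publishes the data as a plain table whose first row is the header, which real government sites may not do.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed in this row


def parse_payroll_table(html):
    """Return one dict per body row, using the first row as column names."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]
```

All values come back as strings, so a salary column would still need to be converted to a number before any analysis.
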
If you want to contribute to this project, please fork the repository, make your changes, and open a Pull Request.
This project is licensed under the MIT License. See LICENSE for more details.