
📧 A Python-based web scraping tool that recursively collects email addresses from websites. 🌐 It follows links to multiple pages and extracts emails using BeautifulSoup and regex.


AdrianTomin/email-scraper


Email Scraper Tool 📧

A Python-based tool that scrapes websites to collect email addresses. Given a starting URL, this tool will recursively follow links found on the page and extract email addresses from all visited pages.
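The link-following behaviour described above can be sketched as a queue-based crawl. This is a minimal illustration, not the tool's actual code: it walks pages iteratively (breadth-first) instead of using recursion, and it parses links with the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages. The names LinkCollector, crawl, fetch, and max_pages are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Visit pages breadth-first from start_url and yield (url, html) pairs.

    fetch is a hypothetical callable that takes a URL and returns the page's
    HTML as a string, or raises an exception if the page cannot be loaded.
    """
    queue = deque([start_url])
    seen = {start_url}  # every URL ever queued, so no page is visited twice
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        visited += 1
        yield url, html

        parser = LinkCollector()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop any "#fragment" suffix.
            link, _ = urldefrag(urljoin(url, href))
            if urlsplit(link).scheme in ("http", "https") and link not in seen:
                seen.add(link)
                queue.append(link)
```

With the requests library, fetch could be something like lambda url: requests.get(url, timeout=10).text. The sketch does not restrict the crawl to the starting domain; the max_pages cap is an added safeguard so the crawl always terminates.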

Built with:

Python

Features ✨

  • Recursive scraping: Follows links on each page to visit multiple pages for a thorough search.
  • Email extraction: Uses regular expressions to find and collect email addresses.
  • Easy to use: Just enter a URL, and the tool will start scraping.
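As a rough illustration of the email extraction feature, the sketch below pulls email-like strings out of a page's text with a regular expression. The pattern is a deliberately simple assumption, not necessarily the one the tool uses, and EMAIL_PATTERN and extract_emails are hypothetical names.

```python
import re

# A simple pattern (an assumption, not the tool's own regex):
# a local part, "@", domain labels, and a top-level domain of 2+ letters.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return the set of distinct email-like strings found in text."""
    return set(EMAIL_PATTERN.findall(text))
```

Returning a set means an address that appears several times on a page, for example once in a mailto: link and once in the visible text, is reported only once. A pattern this loose can also match strings that are not real addresses, so results may need manual review.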

Requirements 🛠️

  • Python 3.x
  • requests – For making HTTP requests.
  • beautifulsoup4 – For parsing and navigating HTML.
  • lxml – An XML/HTML parser for BeautifulSoup.

Installation Guide 📝

1. Clone the repository:

git clone https://github.com/AdrianTomin/email-scraper.git
cd email-scraper

2. Set up a virtual environment (optional but recommended):

python -m venv venv
source venv/bin/activate

On Windows, activate the environment with this command instead: venv\Scripts\activate

3. Install dependencies:

The required libraries are listed in requirements.txt. You can install them using pip:

pip install -r requirements.txt

If requirements.txt is missing, install the packages listed under Requirements first, then generate the file from the active environment:

pip install requests beautifulsoup4 lxml
pip freeze > requirements.txt

4. Run the tool:

Once the dependencies are installed, start the tool with the command below. It then prompts for the URL to scan.

python email_scraper.py

Example Output 🖥️

[+] Enter url to scan: https://example.com
[1] Processing https://example.com
[2] Processing https://example.com/contact
Found emails:
info@example.com
support@example.com
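Output in this shape could be assembled along the following lines. This is only a sketch: format_report is a hypothetical helper, and sorting and de-duplicating the addresses is an assumption, not documented behaviour of the tool.

```python
def format_report(visited_urls, emails):
    """Build numbered progress lines followed by a summary of found emails."""
    lines = [
        f"[{number}] Processing {url}"
        for number, url in enumerate(visited_urls, start=1)
    ]
    lines.append("Found emails:")
    # Assumption: report each address once, in alphabetical order.
    lines.extend(sorted(set(emails)))
    return "\n".join(lines)
```

In a real run the progress lines would more likely be printed as each page is fetched, with the summary printed once the crawl finishes.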

Badges

MIT License

Authors
