Best Practices for Scaling Your Web Scraping Projects in 2025

We invite you to explore our blog for more details.

Setting Up Your Coding Environment

Before building the application, you’ll need to set up a basic Python environment. Follow these steps to get started:

1. Install Python 3 on your system.
2. Install the required dependencies by running:

   ```bash
   python -m pip install -r requirements.txt
   ```

3. To make the webhook publicly accessible to Crawlbase servers for demonstration purposes, install and configure ngrok.

Obtaining API Credentials

  1. Sign up for a Crawlbase account and log in.
  2. Upon registration, you’ll receive 5,000 free requests to get started.
  3. Navigate to your Account Docs and copy your Crawling API token (Normal or JavaScript requests).
  4. Create a new Crawler to start configuring your crawl tasks.

Running the Example Scripts

Before running the examples, ensure that you replace all instances of the following placeholders:

1. `<Normal or Javascript requests token>` - Replace this with your Crawling API requests token.
2. `<Crawler name>` - Replace this with the name of your newly created crawler. You can create or view crawlers from your Crawlbase dashboard.
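As a rough sketch of what a crawl request looks like (the repo's `crawl.py` may differ), the snippet below builds the Crawler push URL from the two placeholders above. The endpoint and query-parameter names (`token`, `crawler`, `url`) follow the Crawling API pattern and should be checked against the Crawlbase docs.

```python
# Hypothetical sketch of a crawl request builder; the actual crawl.py in this
# repo may be structured differently. The API base URL and parameter names
# are assumptions based on the Crawlbase Crawling API.
from urllib.parse import urlencode

API_BASE = "https://api.crawlbase.com/"

def build_push_url(token: str, crawler: str, target_url: str) -> str:
    """Build the GET URL that pushes target_url to the named Crawler."""
    query = urlencode({"token": token, "crawler": crawler, "url": target_url})
    return f"{API_BASE}?{query}"

# Example usage with the placeholders from this README:
# push_url = build_push_url(
#     "<Normal or Javascript requests token>",
#     "<Crawler name>",
#     "https://example.com/",
# )
# A GET request to push_url then queues the page for crawling.
```

Keeping the URL construction in one helper makes it easy to swap in a different token type (Normal vs. JavaScript) without touching the request logic.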

Example Scripts

1. Start the ngrok tunnel:

   ```bash
   ngrok http 5768
   ```

2. Set the callback URL:

   Copy the forwarding URL provided by ngrok and paste it into the Callback URL field of your Crawler settings. Example: `https://xxxx-xxx-xxx-xxx-xx.ngrok-free.app/webhook`

3. Run the webhook HTTP server:

   ```bash
   python webhook_http_server.py
   ```

4. Send a crawl request (in a separate terminal):

   ```bash
   python crawl.py
   ```
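To illustrate what the server in step 3 needs to do, here is a minimal sketch of a webhook receiver (the repo's `webhook_http_server.py` may differ). It assumes Crawlbase delivers crawled pages as HTTP POST bodies to the `/webhook` path, possibly gzip-compressed; verify the delivery format against the Crawling API docs.

```python
# Hypothetical webhook receiver sketch using only the standard library.
# Port 5768 matches the ngrok command above; the gzip handling is an
# assumption about how the crawled payload may arrive.
import gzip
from http.server import BaseHTTPRequestHandler, HTTPServer

PORT = 5768

def decode_body(raw: bytes) -> str:
    """Decompress a gzip payload if present, then decode as UTF-8."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic bytes
        raw = gzip.decompress(raw)
    return raw.decode("utf-8", errors="replace")

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/webhook":
            self.send_response(404)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        body = decode_body(self.rfile.read(length))
        print(f"Received {len(body)} characters of crawled content")
        self.send_response(200)  # acknowledge so the Crawler treats it as delivered
        self.end_headers()

def run(port: int = PORT):
    """Serve forever on the given port; point the ngrok tunnel at it."""
    HTTPServer(("", port), WebhookHandler).serve_forever()

# Start with: run()
```

Returning a 200 promptly matters here: if the webhook is slow or errors out, the Crawler will typically retry delivery, so heavy processing should be deferred to a queue rather than done inside `do_POST`.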

🛡 Disclaimer: This repository is for educational purposes only. Please make sure you comply with the Terms of Service of any website you scrape. Use this responsibly and only where permitted.


Copyright 2025 Crawlbase