🌐 ScrapeGraph Python SDK

Official Python SDK for the ScrapeGraph API - Smart web scraping powered by AI.

📦 Installation

pip install scrapegraph-py

🚀 Features

  • 🤖 AI-powered web scraping
  • 🔄 Both sync and async clients
  • 📊 Structured output with Pydantic schemas
  • 🔍 Detailed logging
  • ⚡ Automatic retries
  • 🔐 Secure authentication

🎯 Quick Start

from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

Note

You can set the SGAI_API_KEY environment variable and initialize the client without parameters: client = Client()
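
The fallback described in this note can be sketched in plain Python. This is only an illustration of the pattern, not the SDK's internal code: `resolve_api_key` is a hypothetical helper, and the only name taken from this README is the `SGAI_API_KEY` variable.

```python
import os

ENV_VAR = "SGAI_API_KEY"  # variable name taken from the note above


def resolve_api_key(explicit_key=None, environ=None):
    """Hypothetical helper: prefer an explicit key, else fall back to the environment."""
    # Allow a mapping to be injected so the lookup can be exercised without touching os.environ.
    environ = os.environ if environ is None else environ
    key = explicit_key or environ.get(ENV_VAR)
    if not key:
        raise ValueError(f"No API key found; pass one explicitly or set {ENV_VAR}")
    return key


if __name__ == "__main__":
    # Demo with an injected mapping, so no real credential is read or printed.
    print(resolve_api_key(environ={ENV_VAR: "demo-key"}))
```

A helper like this could feed the client explicitly, for example `Client(api_key=resolve_api_key())`, which fails early with a clear message when no key is configured.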

📚 Available Endpoints

🔍 SmartScraper

Scrapes any webpage using AI to extract specific information.

from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

# Basic usage
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading and description"
)

print(response)

Output Schema (Optional)

from pydantic import BaseModel, Field
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

class WebsiteData(BaseModel):
    title: str = Field(description="The page title")
    description: str = Field(description="The meta description")

response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the title and description",
    output_schema=WebsiteData
)

📝 Markdownify

Converts any webpage into clean, formatted markdown.

from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

response = client.markdownify(
    website_url="https://example.com"
)

print(response)

💻 LocalScraper

Extracts information from HTML content using AI.

from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

html_content = """
<html>
    <body>
        <h1>Company Name</h1>
        <p>We are a technology company focused on AI solutions.</p>
        <div class="contact">
            <p>Email: contact@example.com</p>
        </div>
    </body>
</html>
"""

response = client.localscraper(
    user_prompt="Extract the company description",
    website_html=html_content
)

print(response)
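
When the HTML lives in a saved file rather than an inline string, it can be read from disk first. The sketch below assumes a UTF-8 encoded file; `read_html` is a hypothetical helper and is not part of the SDK.

```python
import tempfile
from pathlib import Path


def read_html(path, encoding="utf-8"):
    """Hypothetical helper: return a saved page's HTML as a string."""
    html = Path(path).read_text(encoding=encoding)
    # Reject empty files early instead of sending a blank document for extraction.
    if not html.strip():
        raise ValueError(f"{path} is empty")
    return html


if __name__ == "__main__":
    # Write a small page to a temporary directory, then read it back.
    with tempfile.TemporaryDirectory() as tmp:
        page = Path(tmp) / "page.html"
        page.write_text("<html><body><h1>Company Name</h1></body></html>", encoding="utf-8")
        print(read_html(page))
```

The returned string can then be passed as `website_html` in the `client.localscraper` call shown above.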

⚑ Async Support

All endpoints support async operations:

import asyncio
from scrapegraph_py import AsyncClient

async def main():
    async with AsyncClient() as client:
        response = await client.smartscraper(
            website_url="https://example.com",
            user_prompt="Extract the main content"
        )
        print(response)

asyncio.run(main())
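
One reason to use the async client is to have several requests in flight at once. The sketch below shows that pattern with `asyncio.gather`. To stay runnable without an API key it uses `fake_smartscraper`, a hypothetical stand-in that only mimics the keyword arguments of `client.smartscraper`. The dictionary it returns is invented for the demo and says nothing about the real response format.

```python
import asyncio


async def fake_smartscraper(website_url, user_prompt):
    """Hypothetical stand-in for `client.smartscraper`; performs no network access."""
    await asyncio.sleep(0.01)  # simulate waiting on the API
    return {"website_url": website_url, "user_prompt": user_prompt}


async def scrape_many(urls, user_prompt, scrape=fake_smartscraper):
    """Start one request per URL and wait for all of them together."""
    tasks = [scrape(website_url=url, user_prompt=user_prompt) for url in urls]
    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    results = asyncio.run(
        scrape_many(
            ["https://example.com", "https://example.org"],
            "Extract the main content",
        )
    )
    for item in results:
        print(item["website_url"])
```

Inside an `async with AsyncClient() as client:` block, the same function could be called with `scrape=client.smartscraper` to issue real requests.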

📖 Documentation

For detailed documentation, visit docs.scrapegraphai.com

🛠️ Development

For information about setting up the development environment and contributing to the project, see our Contributing Guide.

💬 Support & Feedback

  • 📧 Email: support@scrapegraphai.com
  • 💻 GitHub Issues: Create an issue
  • 🌟 Feature Requests: Request a feature
  • ⭐ API Feedback: You can also submit feedback programmatically using the feedback endpoint:
    from scrapegraph_py import Client
    
    client = Client(api_key="your-api-key-here")
    
    client.submit_feedback(
        request_id="your-request-id",
        rating=5,
        feedback_text="Great results!"
    )
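
The feedback call needs the ID of an earlier request. The sketch below assumes that a response is a dictionary carrying a `request_id` key. That shape is an assumption and should be checked against a real response. `feedback_kwargs` is a hypothetical helper, not an SDK function.

```python
def feedback_kwargs(response, rating, feedback_text):
    """Hypothetical helper: build keyword arguments for `submit_feedback`.

    Assumes `response` is a dict with a "request_id" key (unverified assumption).
    """
    request_id = response.get("request_id")
    if not request_id:
        raise KeyError("response has no 'request_id'")
    return {
        "request_id": request_id,
        "rating": rating,
        "feedback_text": feedback_text,
    }


if __name__ == "__main__":
    # Made-up response used only to demonstrate the helper.
    sample_response = {"request_id": "your-request-id"}
    print(feedback_kwargs(sample_response, 5, "Great results!"))
```

With a real response in hand, the call would read `client.submit_feedback(**feedback_kwargs(response, 5, "Great results!"))`.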

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🔗 Links


Made with ❤️ by ScrapeGraph AI