Merge pull request #14 from ScrapeGraphAI/pre/beta
Added Markdownify and Localscraper
PeriniM authored Dec 5, 2024
2 parents e9c852c + 5e65800 commit d789f59
Showing 15 changed files with 679 additions and 111 deletions.
7 changes: 7 additions & 0 deletions scrapegraph-py/CHANGELOG.md
@@ -1,3 +1,10 @@
## [1.7.0-beta.1](https://github.com/ScrapeGraphAI/scrapegraph-sdk/compare/v1.6.0...v1.7.0-beta.1) (2024-12-05)


### Features

* add markdownify and localscraper ([6296510](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/6296510b22ce511adde4265532ac6329a05967e0))

## [1.6.0](https://github.com/ScrapeGraphAI/scrapegraph-sdk/compare/v1.5.0...v1.6.0) (2024-12-05)


31 changes: 28 additions & 3 deletions scrapegraph-py/CONTRIBUTING.md
@@ -13,11 +13,36 @@ Thank you for your interest in contributing to **ScrapeGraphAI**! We welcome con

## Getting Started

### Development Setup

1. Fork the repository on GitHub **(FROM pre/beta branch)**.
2. Clone your forked repository:
```bash
git clone https://github.com/ScrapeGraphAI/scrapegraph-sdk.git
cd scrapegraph-sdk/scrapegraph-py
```

3. Install dependencies using uv (recommended):
```bash
# Install uv if you haven't already
pip install uv

# Install dependencies
uv sync

# Install pre-commit hooks
uv run pre-commit install
```

4. Run tests:
```bash
# Run all tests
uv run pytest

# Run specific test file
uv run pytest tests/test_client.py
```
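A minimal test in this layout might look like the sketch below. This is hypothetical: it stubs the SDK client with a mock instead of calling the real API, and the helper `extract_title` is an illustrative function, not part of the test suite in `tests/`:

```python
from unittest.mock import MagicMock


def extract_title(client, url):
    """Illustrative helper under test: wraps a smartscraper call."""
    response = client.smartscraper(
        website_url=url,
        user_prompt="Extract the page title",
    )
    return response["result"]


def test_extract_title():
    # Stub the SDK client so the test runs without network access
    client = MagicMock()
    client.smartscraper.return_value = {
        "request_id": "abc-123",
        "result": {"title": "Example Domain"},
    }

    result = extract_title(client, "https://example.com")

    assert result == {"title": "Example Domain"}
    client.smartscraper.assert_called_once()
```

Mock-based tests like this keep the suite fast and deterministic; tests that hit the live API belong behind a marker or a separate command.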

5. Make your changes or additions.
6. Test your changes thoroughly.
7. Commit your changes with descriptive commit messages.
203 changes: 107 additions & 96 deletions scrapegraph-py/README.md
@@ -6,164 +6,175 @@
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Documentation Status](https://readthedocs.org/projects/scrapegraph-py/badge/?version=latest)](https://scrapegraph-py.readthedocs.io/en/latest/?badge=latest)

Official Python SDK for the ScrapeGraph API - Smart web scraping powered by AI.

## 📦 Installation

### Using pip

```bash
pip install scrapegraph-py
```

## 🚀 Features

- 🤖 AI-powered web scraping
- 🔄 Both sync and async clients
- 📊 Structured output with Pydantic schemas
- 🔍 Detailed logging
- ⚡ Automatic retries
- 🔐 Secure authentication

## 🎯 Quick Start

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")
```

> [!NOTE]
> You can set the `SGAI_API_KEY` environment variable and initialize the client without parameters: `client = Client()`
## 📚 Available Endpoints

### 🔍 SmartScraper

Scrapes any webpage using AI to extract specific information.

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

# Basic usage
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading and description"
)

print(response)
```

<details>
<summary>Output Schema (Optional)</summary>

```python
from pydantic import BaseModel, Field
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

class WebsiteData(BaseModel):
title: str = Field(description="The page title")
description: str = Field(description="The meta description")

response = client.smartscraper(
website_url="https://example.com",
user_prompt="Extract the title and description",
output_schema=WebsiteData
)

print(response["result"])
```

</details>
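The same schema can also validate a result payload locally after the fact. The sketch below assumes the result comes back as a plain dict with those two keys (an illustrative assumption, not something this page documents) and uses Pydantic v2's `model_validate`; on Pydantic v1 the equivalent call is `WebsiteData.parse_obj(...)`:

```python
from pydantic import BaseModel, Field


class WebsiteData(BaseModel):
    title: str = Field(description="The page title")
    description: str = Field(description="The meta description")


# An illustrative result payload, shaped like response["result"]
raw_result = {
    "title": "Example Domain",
    "description": "An illustrative page used in documents.",
}

# Raises a ValidationError if fields are missing or mistyped
data = WebsiteData.model_validate(raw_result)
print(data.title)
```

Validating at the boundary turns silent shape mismatches into explicit errors before the data flows deeper into your code.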

### 📝 Markdownify

Converts any webpage into clean, formatted markdown.
```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

response = client.markdownify(
    website_url="https://example.com"
)

print(response)
```
### 💻 LocalScraper

Extracts information from HTML content using AI.
```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

html_content = """
<html>
<body>
<h1>Company Name</h1>
<p>We are a technology company focused on AI solutions.</p>
<div class="contact">
<p>Email: contact@example.com</p>
</div>
</body>
</html>
"""

response = client.localscraper(
user_prompt="Extract the company description",
website_html=html_content
)

print(response)
```

## ⚡ Async Support

All endpoints support async operations:

```python
import asyncio
from scrapegraph_py import AsyncClient

async def main():
async with AsyncClient() as client:
response = await client.smartscraper(
website_url="https://example.com",
user_prompt="Extract the main content"
)
print(response)

asyncio.run(main())
```
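The feature list mentions automatic retries; the sketch below shows the general shape of a retry loop with exponential backoff. It is a hand-rolled illustration of the pattern, not the SDK's actual retry implementation, and `flaky_request` is a dummy stand-in for a real API call:

```python
import asyncio


async def with_retries(coro_factory, attempts=3, base_delay=0.1):
    """Retry an awaitable built by coro_factory, backing off exponentially."""
    for attempt in range(attempts):
        try:
            return await coro_factory()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2 ** attempt)


async def main():
    calls = {"n": 0}

    async def flaky_request():
        # Fails twice, then succeeds -- stands in for a transient API error
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient failure")
        return {"result": "ok"}

    response = await with_retries(lambda: flaky_request())
    print(response)


asyncio.run(main())
```

A factory (rather than a coroutine object) is passed in because a coroutine can only be awaited once; each retry needs a fresh one.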

## 📖 Documentation

For detailed documentation, visit [scrapegraphai.com/docs](https://scrapegraphai.com/docs)

## 🛠️ Development

For information about setting up the development environment and contributing to the project, see our [Contributing Guide](CONTRIBUTING.md).

## 💬 Support & Feedback

- 📧 Email: support@scrapegraphai.com
- 💻 GitHub Issues: [Create an issue](https://github.com/ScrapeGraphAI/scrapegraph-sdk/issues)
- 🌟 Feature Requests: [Request a feature](https://github.com/ScrapeGraphAI/scrapegraph-sdk/issues/new)
- ⭐ API Feedback: You can also submit feedback programmatically using the feedback endpoint:
```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

client.submit_feedback(
request_id="your-request-id",
rating=5,
feedback_text="Great results!"
)
```

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🔗 Links

- [Website](https://scrapegraphai.com)
- [Documentation](https://scrapegraphai.com/docs)
- [GitHub](https://github.com/ScrapeGraphAI/scrapegraph-sdk)

---

37 changes: 37 additions & 0 deletions scrapegraph-py/examples/async_markdownify_example.py
@@ -0,0 +1,37 @@
import asyncio

from scrapegraph_py import AsyncClient
from scrapegraph_py.logger import sgai_logger

sgai_logger.set_logging(level="INFO")


async def main():
# Initialize async client
sgai_client = AsyncClient(api_key="your-api-key-here")

# Concurrent markdownify requests
urls = [
"https://scrapegraphai.com/",
"https://github.com/ScrapeGraphAI/Scrapegraph-ai",
]

tasks = [sgai_client.markdownify(website_url=url) for url in urls]

# Execute requests concurrently
responses = await asyncio.gather(*tasks, return_exceptions=True)

# Process results
for i, response in enumerate(responses):
if isinstance(response, Exception):
print(f"\nError for {urls[i]}: {response}")
else:
print(f"\nPage {i+1} Markdown:")
print(f"URL: {urls[i]}")
print(f"Result: {response['result']}")

await sgai_client.close()


if __name__ == "__main__":
asyncio.run(main())
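For longer URL lists, firing every request at once with `asyncio.gather` can overwhelm the API; a common refinement is to cap in-flight requests with a semaphore. This is an illustrative sketch of the pattern, where the dummy `fetch_markdown` stands in for `sgai_client.markdownify`:

```python
import asyncio


async def fetch_markdown(url):
    # Stand-in for sgai_client.markdownify(website_url=url)
    await asyncio.sleep(0.01)
    return f"# Markdown for {url}"


async def bounded_gather(urls, limit=5):
    semaphore = asyncio.Semaphore(limit)

    async def fetch_one(url):
        async with semaphore:  # at most `limit` requests in flight
            return await fetch_markdown(url)

    # Order of results still matches the order of `urls`
    return await asyncio.gather(*(fetch_one(u) for u in urls))


results = asyncio.run(bounded_gather([f"https://example.com/{i}" for i in range(10)]))
print(len(results))
```

The semaphore only bounds concurrency; error handling via `return_exceptions=True`, as in the example above, composes with it unchanged.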
31 changes: 31 additions & 0 deletions scrapegraph-py/examples/localscraper_example.py
@@ -0,0 +1,31 @@
from scrapegraph_py import Client
from scrapegraph_py.logger import sgai_logger

sgai_logger.set_logging(level="INFO")

# Initialize the client
sgai_client = Client(api_key="your-api-key-here")

# Example HTML content
html_content = """
<html>
<body>
<h1>Company Name</h1>
<p>We are a technology company focused on AI solutions.</p>
<div class="contact">
<p>Email: contact@example.com</p>
<p>Phone: (555) 123-4567</p>
</div>
</body>
</html>
"""

# LocalScraper request
response = sgai_client.localscraper(
user_prompt="Extract the company description and contact information",
website_html=html_content,
)

# Print the response
print(f"Request ID: {response['request_id']}")
print(f"Result: {response['result']}")