GRASP - GRAph-oriented Synthetic data generation Pipeline
Updated Sep 13, 2025 - Python
The Open Email Marketing Dataset, free to use without any restrictions.
An open-source collection of datasets, guides, and rankings for B2B email marketing and lead generation. Your go-to resource for sales prospecting strategies.
👨‍🏫 This project was developed under the guidance of Mr. Lokesh Sir as part of the AI & ML Training Program. It explores LLM integration using Google Gemini APIs with a custom UI built on Streamlit.
🔧 Modular pipeline for generating high-quality, domain-specific datasets for LLM fine-tuning — from PDFs and web scraping to synthetic Q&A generation, quality filtering, and training-ready formatting.
Sample edition of The Stack Enriched: an annotated, secure, and optimized code dataset.