LLM Content Extractor

Purpose

This project is designed to automatically extract user-facing content from various file types and convert it into Markdown format. It's particularly useful to quickly generate documentation or content summaries from codebases, where user facing content is disseminated. You can then easily input that content to another LLM for ghost-writing, finding answers, irrelancies, ...

How It Works

The script scans through specified files in your project.
It uses the Anthropic API (Claude AI) to determine if a file contains user-facing content.
If the file is deemed user-facing, the script extracts the relevant content and converts it to Markdown format.
The extracted content is appended to an output.md file.
The script keeps track of processed files to avoid duplicate extractions in a extracted.csv. file.

Key Features

Automatic detection of user-facing content
Conversion of various file types to Markdown
Avoidance of duplicate processing
Utilization of advanced AI (Claude) for content extraction and summarization

Prerequisites

Python 3.x
Anthropic API key

Setup

Clone the repository:

git clone [repository-url]
cd llm_content_extractor

Set up your Anthropic API key:

export ANTHROPIC_API_KEY='your-api-key-here'

Run the setup script to create a virtual environment and install dependencies:
```
chmod +x set_up_and_run.sh
./set_up_and_run.sh
```

Usage

To extract content from files, run:

./set_up_and_run.sh file1.py file2.js file3.html

Replace file1.py, file2.js, file3.html with the paths to the files you want to process.

The script will:

Create and activate a virtual environment
Install necessary dependencies
Run the content extraction process
Deactivate the virtual environment when done

Output

Extracted content is appended to output.md in the project root.
A list of processed files is maintained in extracted.csv to avoid reprocessing.

Notes

The script uses the Anthropic API, which may incur costs. Please check your usage and pricing.
Ensure your API key is kept secure and not shared publicly.

Contributing

Contributions to improve the script or extend its functionality are welcome. Please submit pull requests or open issues for any bugs or feature requests.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.gitignore		.gitignore
README.md		README.md
claude_content_to_md.py		claude_content_to_md.py
prompt.py		prompt.py
set_up_and_run.sh		set_up_and_run.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LLM Content Extractor

Purpose

How It Works

Key Features

Prerequisites

Setup

Usage

Output

Notes

Contributing

License

About

Releases

Packages

Languages

Lazare-42/llm-content-extractor

Folders and files

Latest commit

History

Repository files navigation

LLM Content Extractor

Purpose

How It Works

Key Features

Prerequisites

Setup

Usage

Output

Notes

Contributing

License

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages