Image Analysis with Anthropic's Claude LLM

This Accelerator for Machine Learning Projects ("AMP") allows users to perform transcription and information extraction on images using Anthropic Claude models. The app covers a variety of use cases, including text extraction, complex document question-answering, and converting unstructured content into structured formats like JSON. The advantage of using Claude 3 over traditional OCR systems is that you can specify exactly what you want to transcribe due to Claude 3's advanced reasoning capabilities.

Use Cases Solved With Anthropic Vision Models

Transcribing Typed Text: Extracts typed or printed text from images into editable digital text, suitable for scanned documents or printouts.
Transcribing Handwritten Text: Converts handwritten notes into machine-readable text, aiding in digitizing personal notes or historical documents.
Transcribing Forms: Extracts data from structured forms while preserving their organization, useful for automating data entry.
Complicated Document QA: Answers questions based on the content of complex documents, suitable for contexts requiring deeper understanding.
Unstructured Information → JSON: Converts unstructured document content into a structured JSON format, ideal for turning reports or diagrams into data.
User Defined Prompts: Provides flexibility for custom prompts to process images, catering to advanced or unique use cases.

Choose Your Claude: A Model for Every Task

Several Claude models are tested and supported:

claude-3-5-sonnet-20240620
claude-3-opus-20240229
claude-3-sonnet-20240229
claude-3-haiku-20240307

Haiku: Anthropic's fastest model that can execute lightweight actions, with industry-leading speed. Ideal for quick tasks where time is of the essence.

Sonnet: The best combination of performance and speed for efficient, high-throughput tasks. Strikes a balance between speed and power, making it suitable for most general-purpose tasks.

Opus: Anthropic's highest-performing model, capable of handling complex analysis, longer tasks with many steps, and higher-order math and coding tasks. Best for situations where accuracy and depth are prioritized over speed.

Using the Application

1. Transcribing Typed Text

The app can easily extract clean and accurate text from typed or printed images, such as scanned PDFs or printouts, allowing users to quickly digitize documents.

2. Transcribing Handwritten Text

With powerful recognition capabilities, the app can process handwritten notes from images, making it possible to convert personal writings into editable, searchable digital formats.

3. Transcibing Forms

The app preserves the structure and layout of forms while extracting content, ensuring that complex tables, questionnaires, and other structured documents are accurately digitized for data processing.

4. Complicated Document QA

Users can ask specific questions about the content of a complex document, and the app leverages the Claude model's capabilities to understand and extract context-based answers from the image.

5. Unstructured Information to JSONs

The app provides the functionality to convert unstructured or free-form content from images into structured JSON data, enabling easier integration with other systems or databases.

6. User Defined Prompts

This feature allows users to input their own custom prompts for Claude to process the image in any way they need, offering advanced flexibility for various unique use cases not covered by predefined options.

7. Upload Photos

Users can easily upload images for processing, manage their image library, and view or delete existing images, enabling efficient preparation for all use cases.

Deployment

AMP Deployment Methods

There are two ways to launch this prototype on CML:

From Prototype Catalog - Navigate to the Prototype Catalog on a CML workspace, select the "Document Summarization with Gemini from Vertex AI" tile, click "Launch as Project", click "Configure Project".
As ML Prototype - In a CML workspace, click "New Project", add a Project Name, select "ML Prototype" as the Initial Setup option, copy in the repo URL, click "Create Project", click "Configure Project".

AMP Deployment

In both cases, you will need to specify the ANTHROPIC_API_KEY (steps in next section on how to create this) which enables the connection between Anthropic's API and the Application in CML.

Requirements

Setup API Key with Access to Anthropic

Navigate to https://console.anthropic.com/ and sign up for an account.

Recommended Runtime

JupyterLab - Python 3.11 - Standard - 2024.05

Resource Requirements

This AMP creates the following workloads with resource requirements:

CML Session: 2 CPU, 8GB MEM
CML Application: 2 CPU, 8GB MEM

External Resources

This AMP requires pip packages and models from huggingface. Depending on your CML networking setup, you may need to whitelist some domains:

pypi.python.org
pypi.org
pythonhosted.org
huggingface.co

Additionally, it will require access to Anthropic's Claude API. Please ensure access to Claude is whitelisted as well.

Deploying on CML

There are two ways to launch this prototype on CML:

From Prototype Catalog - Navigate to the Prototype Catalog on a CML workspace, select the "Intelligent QA Chatbot with NiFi, Pinecone, and Llama2" tile, click "Launch as Project", click "Configure Project"
As ML Prototype - In a CML workspace, click "New Project", add a Project Name, select "ML Prototype" as the Initial Setup option, copy in the repo URL, click "Create Project", click "Configure Project"

The Fine Print

IMPORTANT: Please read the following before proceeding. This AMP includes or otherwise depends on certain third party software packages. Information about such third party software packages are made available in the notice file associated with this AMP. By configuring and launching this AMP, you will cause such third party software packages to be downloaded and installed into your environment, in some instances, from third parties' websites. For each third party software package, please see the notice file and the applicable websites for more information, including the applicable license terms.

If you do not wish to download and install the third party software packages, do not configure, launch or otherwise use this AMP. By configuring, launching or otherwise using the AMP, you acknowledge the foregoing statement and agree that Cloudera is not responsible or liable in any way for the third party software packages.

Refer to the Project NOTICE and LICENSE files in the root directory. Author: Cloudera Inc.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
1_session-install-dependencies		1_session-install-dependencies
2_application		2_application
assets		assets
data		data
.project-metadata.yaml		.project-metadata.yaml
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
catalog-entry.yaml		catalog-entry.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Image Analysis with Anthropic's Claude LLM

Use Cases Solved With Anthropic Vision Models

Choose Your Claude: A Model for Every Task

Using the Application

1. Transcribing Typed Text

2. Transcribing Handwritten Text

3. Transcibing Forms

4. Complicated Document QA

5. Unstructured Information to JSONs

6. User Defined Prompts

7. Upload Photos

Deployment

AMP Deployment Methods

AMP Deployment

Requirements

Setup API Key with Access to Anthropic

Recommended Runtime

Resource Requirements

External Resources

Deploying on CML

The Fine Print

About

Releases

Packages

Languages

License

cloudera/CML_AMP_Image-Analysis-with-Anthropic-Claude

Folders and files

Latest commit

History

Repository files navigation

Image Analysis with Anthropic's Claude LLM

Use Cases Solved With Anthropic Vision Models

Choose Your Claude: A Model for Every Task

Using the Application

1. Transcribing Typed Text

2. Transcribing Handwritten Text

3. Transcibing Forms

4. Complicated Document QA

5. Unstructured Information to JSONs

6. User Defined Prompts

7. Upload Photos

Deployment

AMP Deployment Methods

AMP Deployment

Requirements

Setup API Key with Access to Anthropic

Recommended Runtime

Resource Requirements

External Resources

Deploying on CML

The Fine Print

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages