Natserract AI

This repository contains code that demonstrates how to build an AI assistant using LangChain, integrating GPT-4 from OpenAI. The assistant handles question answering (QA), exposes various tools, and runs similarity search with a Doc2Vec approach to answer user queries based on the provided documents.

Throughout this project, I use PostgreSQL as the main database and the pgvector extension to store the embeddings.

Setup

Before running the script, you need to set up the required credentials and install the necessary libraries.

Install Required Libraries

You can install the required libraries using poetry. Run the following command in your terminal or command prompt:

poetry install

Install Spacy

python -m spacy download en_core_web_sm

Setup API Keys

The script uses the OpenAI API. Set your key as the OPENAI_API_KEY environment variable on your system, replacing any placeholder value with your actual API key.
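
As a quick sanity check before running the script, the key lookup could be sketched as below. This is a minimal sketch, not the repository's actual configuration code: the helper name `get_openai_api_key` is hypothetical, and only the `OPENAI_API_KEY` variable name comes from this README.

```python
import os


def get_openai_api_key(env=None):
    """Return the OpenAI API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    This helper is hypothetical and is not part of the repository.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running main.py"
        )
    return key
```

Failing early with an explicit message is usually easier to debug than an authentication error raised later by the API client.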

Setup Database

This project uses PostgreSQL 15 with the pgvector extension.
Enable the extension in the target database:
```
CREATE EXTENSION vector;
```
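
Once the extension is enabled, embeddings can be compared with pgvector's distance operators (`<=>` is cosine distance). The sketch below only builds the query text and the vector literal. The table name `documents` and the columns `id`, `content`, and `embedding` are assumptions for illustration, not the schema used in this repository.

```python
def to_pgvector_literal(vector):
    """Format a sequence of numbers as pgvector's text form, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(str(float(x)) for x in vector) + "]"


def build_similarity_query(table="documents", column="embedding"):
    """Build a parameterized nearest-neighbour query using cosine distance.

    The table and column names are hypothetical; the two %s placeholders are
    meant to be filled by the database driver (vector literal, then limit).
    """
    return (
        f"SELECT id, content FROM {table} "
        f"ORDER BY {column} <=> %s::vector LIMIT %s"
    )
```

With a driver such as psycopg, the query would be executed with parameters like `(to_pgvector_literal(query_embedding), 5)`.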

Running

poetry shell

poetry run python main.py

Process

Demo

natserract-ai.mp4

Custom Datasets

Create a _datasets directory and place all Markdown documents in it.
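
To illustrate how documents in that directory might be picked up, here is a minimal sketch using only the standard library. It assumes the documents use the `.md` extension. The function name `load_markdown_documents` is hypothetical, and the repository's own loaders (in data_loaders) may work differently.

```python
from pathlib import Path


def load_markdown_documents(root="_datasets"):
    """Read every .md file under `root` and return {relative_path: text}.

    Hypothetical helper for illustration; it is not the repository's loader.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        raise FileNotFoundError(f"Dataset directory not found: {root_path}")
    documents = {}
    # rglob also picks up Markdown files in nested subdirectories.
    for path in sorted(root_path.rglob("*.md")):
        key = path.relative_to(root_path).as_posix()
        documents[key] = path.read_text(encoding="utf-8")
    return documents
```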

Performance Considerations

If you need to run similarity search frequently, especially over a large set of word vectors, it may be practical to use a database or data store optimized for vector operations. These data stores can persist your word vectors and provide efficient similarity search functionality:

FAISS by Facebook AI Research is a library for efficient similarity search and clustering of dense vectors.
Elasticsearch has plugins like elasticsearch-vector-scoring to handle vector similarity.
Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point.

Using such systems can significantly speed up the similarity search process.
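
For comparison, the brute-force baseline that these systems improve on can be sketched in plain Python. It scores the query against every stored vector, so the cost grows linearly with the number of vectors. The function names and the toy vectors are illustrative assumptions, not code from this repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=3):
    """Return the k (name, score) pairs most similar to `query`, best first.

    `vectors` maps a document name to its embedding. Every entry is scored,
    which is the linear scan that indexes like FAISS or Annoy avoid.
    """
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy example with 2-dimensional vectors.
example_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
nearest = top_k([1.0, 0.0], example_vectors, k=2)
```

A linear scan like this is fine for a few thousand vectors; beyond that, an index or a vector-aware store generally pays off.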
