This application is a simple console interface that allows you to ask questions and get the most relevant information from the articles stored in the database.
- Create a virtual environment by running the following command:
python3 -m venv .venv
- Activate the virtual environment by running the following command:
- For Windows:
.venv\Scripts\activate
- For MacOS/Linux:
source .venv/bin/activate
- Install the required packages by running the following command:
pip install -r requirements.txt
- Create a file named .env in the root directory and fill it with your own values, following the variables listed in .env.example.
- Create a file named urls_config.txt in the root directory and add the URLs of the news articles you want to scrape (each URL must point to a specific article). Add one URL per line, using a format such as http://url.com/path-to-path or https://url.com/articles/news/article.
- Set up the database by running the following command:
python3 db_engine.py
This command processes all the URLs in urls_config.txt and stores the data in the database.
To add more articles later, add the new URLs to urls_config.txt and run db_engine.py again. URLs that have already been processed are skipped.
- Run the application by running the following command:
python3 app.py
- Type your question in the console interface. The application returns a summary of the most relevant articles stored in the database, along with links to the sources (press CTRL+C to exit). If there are no relevant articles in the database, the application returns "I have no articles about this topic".
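The question-and-answer loop from the last step can be pictured roughly as follows. This is only a sketch, not the code in app.py: `answer_question`, `run_console`, and `DEMO_ARTICLES` are hypothetical names, and retrieval is replaced by a plain keyword lookup so the example runs on its own.

```python
import re
from typing import Callable, Dict, Tuple

NO_ARTICLES_MESSAGE = "I have no articles about this topic"

# Hypothetical stand-in for the database: topic -> (summary, source URL).
DEMO_ARTICLES: Dict[str, Tuple[str, str]] = {
    "weather": (
        "Placeholder summary about the weather.",
        "https://url.com/articles/news/article",
    ),
}


def answer_question(question: str, articles: Dict[str, Tuple[str, str]]) -> str:
    """Return summaries with source links, or the fallback message if nothing matches."""
    words = set(re.findall(r"\w+", question.lower()))
    hits = [entry for topic, entry in articles.items() if topic in words]
    if not hits:
        return NO_ARTICLES_MESSAGE
    return "\n".join(f"{summary} (source: {url})" for summary, url in hits)


def run_console(
    articles: Dict[str, Tuple[str, str]],
    read: Callable[[str], str] = input,
    write: Callable[[str], None] = print,
) -> None:
    """Keep asking for questions until the user presses CTRL+C (or input ends)."""
    try:
        while True:
            question = read("Your question: ").strip()
            if question:  # ignore empty lines
                write(answer_question(question, articles))
    except (KeyboardInterrupt, EOFError):
        write("Bye")
```

Calling `run_console(DEMO_ARTICLES)` starts the loop in a terminal. The `read` and `write` parameters exist only so the loop can be exercised without a real console.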
The application consists of the following parts:
- vector database: stores the articles in vector form
- scraper: scrapes the articles from the provided URLs and extracts the headline and content
- AI article formatter: adds a summary and identifies the main topics of the article
- RAG system: retrieves the most relevant articles from the database based on the user's question
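To make the retrieval part more concrete, the sketch below ranks stored vectors by cosine similarity to a question vector and keeps only the closest matches. It is an illustration under assumptions: this README does not say which vector database, embedding model, or similarity metric the project uses, and `cosine_similarity`, `retrieve`, `min_score`, and the toy three-dimensional vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    question_vec: Sequence[float],
    store: Dict[str, Sequence[float]],
    top_k: int = 2,
    min_score: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return up to top_k (url, score) pairs scoring at least min_score, best first."""
    scored = [(url, cosine_similarity(question_vec, vec)) for url, vec in store.items()]
    relevant = [item for item in scored if item[1] >= min_score]
    relevant.sort(key=lambda item: item[1], reverse=True)
    return relevant[:top_k]


# Toy store: article URL -> made-up embedding (real embeddings have many more dimensions).
STORE: Dict[str, Sequence[float]] = {
    "https://url.com/articles/news/article": [1.0, 0.0, 0.0],
    "http://url.com/path-to-path": [0.0, 1.0, 0.0],
}
```

In this sketch, an empty result from `retrieve` corresponds to the case where the application answers "I have no articles about this topic".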