Stanford NLP Project Repo
A vision-language model (VLM) retrieval-augmented generation (RAG) pipeline based on ColPali.
Query: What is the hand-and-arm signal used for turning right while driving?
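
As a rough illustration of how a ColPali-style retriever could rank page images for a query like the one above, the sketch below implements late-interaction (MaxSim) scoring in plain Python. It is not taken from this repo: the function names (`maxsim_score`, `rank_pages`), the toy 2-dimensional embeddings, and the page names are all hypothetical, and a real pipeline would obtain embeddings from the model rather than hard-coding them.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], page_patches: List[Vector]) -> float:
    """Late-interaction score: for each query token embedding, take the
    best-matching page patch embedding, then sum over query tokens."""
    if not query_tokens or not page_patches:
        raise ValueError("query_tokens and page_patches must be non-empty")
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)


def rank_pages(
    query_tokens: List[Vector], pages: Dict[str, List[Vector]]
) -> List[Tuple[str, float]]:
    """Score every page and return (name, score) pairs, best first."""
    scores = {name: maxsim_score(query_tokens, patches) for name, patches in pages.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): two query token embeddings, two pages of patch embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.0, 1.0]],
    "page_b": [[0.5, 0.5], [0.5, 0.5]],
}

ranking = rank_pages(query, pages)
print(ranking)
```

The per-patch similarities computed inside `maxsim_score` are the kind of values a similarity heatmap over a page image would typically visualize.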
VLM RAG/
│
├── benchmark_run_metrics/ # ranking metrics for benchmark
│ ├── datasetName/
│ └── metrics.json
│
├── codes/
│ ├── finetune.py # script for fine-tuning retriever using contrastive learning
│ ├── run_benchmark.py # script to run model on benchmark
│ └── utils # util functions
│
├── interpreted_output # heatmap visualizing visual attention
│
├── main/ # main rag pipeline
│ ├── dbManager.py # script for article vectorization
│ ├── gen.py # script for inference and synthetic question generation
│ ├── preprocessor.py # script for doc preprocessing
│ ├── get_data.py # scraper for evaluation set
│ └── pipeline.py # script for RAG pipeline
│
├── dmv_example.png # example image used for interpretable similarity mapping
│
├── requirements.txt # Python dependencies