This plugin for Papis integrates paper-qa to allow you to use LLMs to ask questions about your library. Use it to search for documents or have it explain things to you. You can set it up to use a variety of local and online models. It is inspired by isaksamsten's excellent work on papisqa.
Papis-ask is under active development. Expect bugs and changes.
Install papis (if not already installed):
$ pipx install papis
Then inject papis-ask:
$ pipx inject papis git+https://github.com/jghauser/papis-ask
Nix users can use the flake to create an overlay for Papis that includes Papis-ask.
Nix configuration example
{
  description = "Papis-ask installation example";
  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs/nixos-unstable";
    papis-ask = {
      url = "github:jghauser/papis-ask"; # Replace with actual repository
      inputs.nixpkgs.follows = "nixpkgs";
    };
  };
  outputs = { self, nixpkgs, papis-ask, ... }:
    let
      system = "x86_64-linux";
      pkgs = import nixpkgs {
        inherit system;
        overlays = [
          (final: prev: {
            papis = prev.papis.overrideAttrs (oldAttrs: {
              propagatedBuildInputs = (oldAttrs.propagatedBuildInputs or []) ++ [
                papis-ask.packages.${system}.default
              ];
            });
          })
        ];
      };
    in {
      # NixOS system configuration
      nixosConfigurations.mySystem = nixpkgs.lib.nixosSystem {
        inherit system;
        modules = [
          ({ pkgs, ... }: {
            environment.systemPackages = [
              pkgs.papis
            ];
          })
        ];
      };
    };
}
Paper-qa (and hence Papis-ask) uses liteLLM for model access, which supports various LLM providers (Ollama, OpenAI, Anthropic, Google, etc.). You'll need to set up your models and API keys following the liteLLM documentation.
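As a rough illustration, liteLLM picks up provider credentials from environment variables. The sketch below assumes you use OpenAI or Anthropic models; the key values are placeholders, and other providers use their own variable names (check the liteLLM documentation). You would typically put these lines in your shell profile.

```shell
# Placeholder credentials: replace with your own keys.
# Only set the variables for the providers you actually use.
export OPENAI_API_KEY="sk-your-key-here"
export ANTHROPIC_API_KEY="your-anthropic-key-here"
```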
Configure the following settings in your Papis configuration file:
ask-llm = "your-preferred-llm-model"
ask-summary-llm = "your-preferred-summary-llm-model"
ask-embedding = "your-preferred-embedding-model"
I've had decent success using "ollama/nomic-embed-text" to create embeddings locally.
Additionally, you can set the settings that define defaults for the plugin's arguments. See the section on commands below for further information on what these settings do.
ask-evidence-k = 10
ask-max-sources = 5
ask-answer-length = "about 200 words, but can be longer"
ask-context = True
ask-excerpt = False
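Put together, a fully local setup might look like the fragment below. This is only a sketch: the `ollama/llama3.1` name is a placeholder chosen for illustration (only the embedding model is mentioned above), and the file name `papis-ask-example.conf` is made up. The snippet writes to a scratch file so you can review the lines before copying them into your Papis configuration file.

```shell
# Write an example settings fragment to a scratch file for review.
# The LLM names are placeholders; substitute models you have available.
cat > papis-ask-example.conf <<'EOF'
ask-llm = "ollama/llama3.1"
ask-summary-llm = "ollama/llama3.1"
ask-embedding = "ollama/nomic-embed-text"
ask-evidence-k = 10
ask-max-sources = 5
ask-context = True
ask-excerpt = False
EOF
```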
Papis-ask assumes that your library is in a good state: that your PDF files contain text and that the metadata is complete and correct. The contrib folder contains several scripts that can help you make sure this is the case. Create backups and use them at your own risk.
You might want to use the ocrpdf.sh script to OCR all PDFs that lack embedded text. The script is semi-smart at detecting which PDFs need to be processed and doesn't mess with annotations.
The editor-author-list.py and fix-months.sh scripts help fix the metadata in your info.yaml files. The first creates author_list and editor_list fields from the author and editor fields, respectively. The second converts the month fields to integers. Additionally, I suggest running papis doctor to make sure the library doesn't contain any errors. Files will be indexed even if metadata is missing or incorrect, but such mistakes might impact response quality.
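To preview which PDFs have no extractable text before running any OCR, something like the following bash sketch may help. It is not one of the contrib scripts: the function names are made up for illustration, and it assumes `pdftotext` (from poppler-utils) is installed. Files that `pdftotext` cannot read are also reported.

```shell
# Hypothetical helpers, not part of Papis-ask.
# Assumes `pdftotext` (poppler-utils) is on your PATH.

# Succeeds if the first three pages contain any non-whitespace text.
has_text() {
  pdftotext -l 3 "$1" - 2>/dev/null | grep -q '[^[:space:]]'
}

# Prints the path of every PDF under the given directory without such text.
list_untexted_pdfs() {
  local dir="$1"
  find "$dir" -type f -name '*.pdf' -print0 |
    while IFS= read -r -d '' pdf; do
      has_text "$pdf" || printf '%s\n' "$pdf"
    done
}
```

For example, `list_untexted_pdfs ~/Documents/papers` would list candidates for OCR, assuming your library lives in that directory.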
Before querying, you need to index your library:
$ papis ask index
Note that this can take a long time if you're indexing your whole library. Progress is saved after each document, so you can interrupt the command and continue later.
You can also index specific documents (note that this will remove documents that don't match the query from the index):
$ papis ask index "author:einstein"
Use the --force or -f flag to regenerate the entire index:
$ papis ask index --force
Ask questions about your library:
$ papis ask "What is the relationship between X and Y?"
Control the output format and level of detail:
$ papis ask "My question" --context/--no-context # Show context for each source (default: True)
$ papis ask "My question" --excerpt/--no-excerpt # Show context with excerpts (default: False)
$ papis ask "My question" --output markdown # Output format, one of terminal/markdown/json (default: terminal)
$ papis ask "My question" --answer-length short # Length of answer (default: "about 200 words, but can be longer")
$ papis ask "My question" --evidence-k 20 # Retrieve 20 pieces of evidence (default: 10)
$ papis ask "My question" --max-sources 10 # Use up to 10 sources in the answer (default: 5)
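The flags can be combined. As a small bash sketch, the hypothetical wrapper below (`ask_to_file` is not part of Papis-ask) asks a question with a larger evidence pool and saves the Markdown answer to a file. It assumes that the Markdown output is written to standard output.

```shell
# Hypothetical wrapper: save an answer as Markdown.
# Usage: ask_to_file "question" [output-file]
ask_to_file() {
  local question="$1"
  # Default file name is based on today's date, e.g. answer-2025-01-31.md.
  local outfile="${2:-answer-$(date +%F).md}"
  papis ask "$question" --output markdown --evidence-k 20 --max-sources 10 > "$outfile"
}
```

For example: `ask_to_file "What is the relationship between X and Y?" x-and-y.md`.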
Make sure your Papis library's cache is up-to-date. Run papis cache reset when in doubt.
Papis-ask queries Semantic Scholar for some metadata. This service is quite strictly rate-limited. Getting your own API key can help, though unfortunately there seems to be a long waitlist. Otherwise, rerunning the command is the only option at the moment.
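Rerunning by hand gets tedious, so a small retry helper can do it for you. The bash sketch below is a generic helper, not something Papis-ask ships; it assumes the command exits with a non-zero status when it fails, which I have not confirmed for rate-limit errors specifically.

```shell
# Hypothetical helper: rerun a command with exponential backoff.
# Usage: retry MAX_ATTEMPTS INITIAL_DELAY_SECONDS command [args...]
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))       # double the wait before the next try
    attempt=$((attempt + 1))
  done
}
```

For example, `retry 5 30 papis ask index` would try up to five times, waiting 30, 60, 120, and 240 seconds between attempts. Since indexing progress is saved after each document, each retry continues where the previous one stopped.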