A simple script that preps a synonyms list for Solr (and other matchers)

TechAndCheck/prepcook

Prepcook

A simple script that preps a synonyms list for Solr and the Chewy gem. Others are pretty easy to add if necessary.

Setup

This repo includes an Anaconda environment file, along with a requirements.txt. I'd suggest the former, but if you want to just go for the latter that's on you.

File Format

This expects a Google Doc with the following format (just copy and paste this in as the header of a new document):

This document is to allow a collaborative curation of synonyms used in our search algorithms. Nicknames, deferential titles, etc. should be added to this as they emerge or we think of them.

Please label the headword in “Heading 2”, followed by a newline (do not put a blank line after the headword), followed by the synonyms separated by commas, followed by a newline (again, do not put a blank line yourself afterwards). Please keep everything in lower case with no punctuation.

-----
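
As a rough illustration of how this format can be consumed (a minimal sketch, not necessarily how prepcook.py does it), the snippet below walks the JSON that the Google Docs API returns for a document. It treats each "Heading 2" paragraph as a headword and the normal paragraph after it as the comma-separated synonyms, then emits one comma-separated line per group, which is the equivalent-synonyms form that Solr's synonyms file accepts. The function names `paragraph_text`, `extract_synonyms`, and `to_solr_lines` are hypothetical, and the sample document is made up.

```python
def paragraph_text(paragraph):
    """Join the text runs of one Docs API paragraph and trim whitespace."""
    runs = paragraph.get("elements", [])
    return "".join(r.get("textRun", {}).get("content", "") for r in runs).strip()


def extract_synonyms(document):
    """Return {headword: [synonyms]} from a Docs API document dict."""
    groups = {}
    headword = None
    for element in document.get("body", {}).get("content", []):
        paragraph = element.get("paragraph")
        if paragraph is None:
            continue  # skip section breaks, tables, etc.
        text = paragraph_text(paragraph)
        if not text:
            continue
        style = paragraph.get("paragraphStyle", {}).get("namedStyleType")
        if style == "HEADING_2":
            headword = text.lower()
            groups[headword] = []
        elif headword is not None:
            # Text before the first heading (the intro) is ignored.
            groups[headword].extend(
                s.strip().lower() for s in text.split(",") if s.strip()
            )
    return groups


def to_solr_lines(groups):
    """Format each group as one comma-separated line of equivalent terms."""
    return [", ".join([head] + syns) for head, syns in groups.items() if syns]


# Made-up sample mimicking the shape of a Docs API response.
sample = {"body": {"content": [
    {"sectionBreak": {}},
    {"paragraph": {"paragraphStyle": {"namedStyleType": "NORMAL_TEXT"},
                   "elements": [{"textRun": {"content": "Intro text\n"}}]}},
    {"paragraph": {"paragraphStyle": {"namedStyleType": "HEADING_2"},
                   "elements": [{"textRun": {"content": "senator\n"}}]}},
    {"paragraph": {"paragraphStyle": {"namedStyleType": "NORMAL_TEXT"},
                   "elements": [{"textRun": {"content": "sen, the senator\n"}}]}},
]}}

for line in to_solr_lines(extract_synonyms(sample)):
    print(line)
```

Keeping the parsing separate from the output formatting is what would make it easy to add other matchers: only a new formatter like `to_solr_lines` is needed.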

Anaconda

  1. Clone this repo: $ git clone https://github.com/TechAndCheck/prepcook.git
  2. Install Anaconda (I prefer Miniconda since it has fewer packages)
  3. Create the Anaconda environment by running the following in a terminal in your repo folder (this sometimes takes a while): $ conda env create --file environment.yml

Note: If you get an error such as "PackagesNotFoundError: The following packages are not available from current channels", run conda config --append channels conda-forge to add conda-forge to your channels.

Google Docs

  1. Go to the Google Developer Console and create a new project.
  2. Then go to the Google Docs API.
  3. Click "Enable".
  4. This should take you back to the home page with a banner at the top and a button on the far right saying "CREATE CREDENTIALS". Click that. (If you don't see it, you can go to "Credentials" on the left side.)
  5. In the drop-down for "Which API are you using?" select "Google Docs API".
  6. In "Where will you be calling this API from?" select "Other UI".
  7. In "What data will you be accessing?" select "User data".
  8. Configure the consent screen by typing in a name (I use "Prep Cook").
  9. When you're configuring everything, make sure you add the Google Docs API scope ../auth/drive.file
  10. Create an OAuth credential, selecting "Other" for client type, and name it CLI (or whatever).
  11. Click "OK" after it's created.
  12. Download the credentials file by clicking the down arrow on the new credentials line.
  13. Then click the "Download Client Configuration" button and save the file to this repo. (DO NOT CHECK THIS IN IF YOU'RE MODIFYING ANY CODE)
  14. Rename the file to credentials.json

Running

  1. Get the document ID from Chris
  2. Run the command python prepcook.py --docid <DOCUMENT_ID>
  3. If it's your first time, the script should automatically open a website to get the OAuth credentials
  4. Go through, and yes, you want to trust PrepCook, even though Google hasn't verified it
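
For orientation, the run steps above could be wired together roughly as follows. This is a sketch under assumptions, not the contents of prepcook.py: it assumes the `google-auth-oauthlib` and `google-api-python-client` packages (which may or may not match the repo's environment file), the helper names `parse_args` and `fetch_document` are hypothetical, and the OAuth scopes are passed in by the caller rather than hard-coded.

```python
import argparse


def parse_args(argv=None):
    """Parse the --docid flag used in the run command."""
    parser = argparse.ArgumentParser(
        description="Fetch a synonyms Google Doc by ID."
    )
    parser.add_argument("--docid", required=True, help="ID of the Google Doc")
    return parser.parse_args(argv)


def fetch_document(doc_id, scopes, credentials_path="credentials.json"):
    """Run the browser-based OAuth flow, then download the document JSON."""
    # Imported lazily so argument parsing works without the Google libraries.
    from google_auth_oauthlib.flow import InstalledAppFlow
    from googleapiclient.discovery import build

    flow = InstalledAppFlow.from_client_secrets_file(credentials_path, scopes)
    creds = flow.run_local_server(port=0)  # opens a browser for consent
    service = build("docs", "v1", credentials=creds)
    return service.documents().get(documentId=doc_id).execute()


# Example wiring, left commented out so the sketch has no side effects:
# args = parse_args()
# document = fetch_document(args.docid, scopes=YOUR_SCOPES)
```

`YOUR_SCOPES` is a placeholder for the scope list you configured on the consent screen. This sketch reruns the consent flow on every call; a real script would more likely cache the resulting token so the browser opens only the first time.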

Requirements

  • Python 3
  • Pip

This has been tested on macOS, and should work just fine on Linux. Windows is up in the air.

Contributing

The main thing is that this uses Pylint and has a .pylintrc configuration file in the repository.

It also contains an Anaconda setup, so if you use that you can run conda env create --file environment.yml and it'll all be set up.

If you use just normal Pip, then pip install -r requirements.txt will do the trick.

Author

Christopher Guess @cguess

Lead Technologist, Duke Reporters' Lab, Duke University
christopher.guess@duke.edu
