Skip to content

Latest commit

 

History

History
76 lines (57 loc) · 3.17 KB

README.md

File metadata and controls

76 lines (57 loc) · 3.17 KB

lucene2sql

Converts (a subset of) the Lucene query syntax to SQL queries.

Currently, it supports the following features:

  • Words: single words, like potato;
  • Phrases: quoted strings, such as "some stuff";
  • Named words or phrases: a word or phrase prefixed by <fieldname>:;
  • Groups: parenthesized queries of any sort;
  • Negation: negates the condition of any term by prefixing it with -;
  • Ranges: numeric ranges in the format {a TO b} or [a TO b], for exclusive and inclusive ranges.

Sequential terms are ORed together. Additionally, explicit OR and ANDs are supported.

It generates correctly typed values for booleans, integers, and otherwise, strings. All strings are searched through LIKE with leading and trailing wildcards.

Purpose

This was developed for the purpose of being used internally in the main Troposphir, hence why it uses standard I/O for input instead of command-line or similar.

Other Lucene features such as weights and fuzzy searches are unimplemented due to not being possible in naive SQL, and/or not used in the game's search feature.

Input

This program expects a JSON file in the standard input with the following properties:

  • query: A string containing the Lucene source;
  • table: The table to which search. It will be quoted, so it cannot contain a database name;
  • default_fields: An array of field names. If a word or phrase is not named, then search these fields;
  • allowed_fields (optional): If provided, should be an array of field names that will be used as a whitelist. If any extraneous fields are found in the query, the program will error out;
  • renames (optional): A mapping of field names to field names, which replaces the keys for the values, to allow a different name in the input query than the actual database;
  • expressions (optional): This one is a bit tricky. It is a pattern-matching "switch": an object, where keys are field names, and contain an array of arrays. In this array, There should be 2-item arrays, the first being a expected value, and the second being a SQL condition. When a named or default field of that key is encountered, all cases will be tried in order. If the value of the input query matches the expected value of that rule, the whole term is replaced by the SQL condition. If the rule's value is null, the rule will always match.

Output

On the standard output, this program will emit a JSON with the contents:

  • body: A string containing the SQL query, using ? as a parameter placeholder.
  • params: The parameters to bind, in order.

Example

Input

{
  "query": "(a:true blabla) -foo AND -bar deleted:true",
  "table": "stuff",
  "default_fields": ["b", "c"],
  "allowed_fields": ["a", "b", "c", "deleted"],
  "expressions": {
    "deleted": [
      [true, "`DELETED_AT` IS NULL"],
      [false, "(NOT (`DELETED_AT` IS NULL))"]
    ]
  }
}

Output

{
  "body": "SELECT * FROM `stuff` WHERE (`a` = ? OR (`b` = ? OR `b` = ?)) OR ((NOT (`a` = ? OR `b` = ?)) AND (`a` = ? OR `b` = ?)) OR `DELETED_AT` IS NULL",
  "params": [
    1,
    "blabla",
    "blabla",
    "foo",
    "foo",
    "bar",
    "bar"
  ]
}