|
| 1 | +--- |
| 2 | +title: "Attempting to write formatter for GAMS" |
| 3 | +layout: single |
| 4 | +excerpt: "and having more respect for syntax highlighting" |
| 5 | +tags: [til] |
| 6 | +--- |
| 7 | + |
| 8 | +My current focus at work is developing the in house linear programming solver, mostly written in [GAMS](https://www.gams.com/products/gams/gams-language/). This "Generic Algebraic Modelling System" is a high level language for describing linear optimisation problems, that it can then compile into any low level solver, such as [CPLEX](https://www.ibm.com/products/ilog-cplex-optimization-studio/cplex-optimizer). |
| 9 | + |
| 10 | +As an example, say we have 5 items, $i$, and a rucksack with capacity `100`, how do we decide which items to put into the bag? Each one has a different weight, $w_{i}$ and value, $v_{i}$ and we can set up the problem by introducing a new binary variable $x_i$ to represent if we pick that item or not. |
| 11 | + |
| 12 | +We can then write the objective function to maximise as: |
| 13 | +$$Z = x_1*v_{1} + x_2*v_{2} + x_3*v_{3} + x_4*v_{4} + x_5*v_{5}$$ |
| 14 | + |
| 15 | +subject to the constraint: |
| 16 | +$$x_1*w_{1} + x_2*w_{2} + x_3*w_{3} + x_4*w_{4} + x_5*w_{5} + \le 100$$ |
| 17 | + |
| 18 | +This representation obviously does not scale well with 1000s of items, so we can refactor it in GAMS like so: |
| 19 | + |
| 20 | +``` |
| 21 | +Z =e= sum(i, x(i) * v(i) ) |
| 22 | +``` |
| 23 | + |
| 24 | +and |
| 25 | + |
| 26 | +```gams |
| 27 | +sum(i, x(i) * w(i) ) =l= 100 |
| 28 | +``` |
| 29 | + |
| 30 | +That all seems fairly innocuous, but when you write an application to handle complex business logic it can turn into a sea of parentheses... |
| 31 | + |
| 32 | +This issue gets worse when you discover that there are no formatting rules in GAMS! Imagine my horror when trying to understand what a thousand lines like this could be doing |
| 33 | + |
| 34 | +``` |
| 35 | +equation_1(a)$(condition_1(i,j,t) and value_1(x) < value_2(x)).. sum((ab(a,b),bc(b,c),cd(c,d)), value(a,b,c,d)$(condition_2(a,b,c,d))) =l= 100 |
| 36 | +``` |
| 37 | + |
| 38 | +# Topiary |
| 39 | + |
| 40 | +I couldn't find a formatter, like [ruff](https://docs.astral.sh/ruff/formatter/), that could understand this GAMS code, so I made the terrible decision of trying to write my own... |
| 41 | + |
| 42 | +Things started off well after finding this very recent (Jan 2025) blog [post](https://www.tweag.io/blog/2025-01-30-topiary-tutorial-part-1) about a new "universal formatting engine" called [Topiary](https://topiary.tweag.io/). Apparently, all I needed to do was use the tree-sitter package to generate a grammar.js file and I would be off and away. |
| 43 | + |
| 44 | +This great [intro](https://derek.stride.host/posts/comprehensive-introduction-to-tree-sitter) to syntax trees shows how we can define rules to turn expressions like $x*y+z$ into the syntax tree below |
| 45 | + |
| 46 | +<img src="https://derek.stride.host/assets/images/graphs/tree-sitter-parsing-part-6.svg"> |
| 47 | + |
| 48 | +Because we could parse this tree as `(x + y) + z` or `x + ( y + z)` we need to explicitly tell the parser that we prefer the left option (in the `grammar.js` this is done with `prec.left()` wrapped around the rule) |
| 49 | + |
| 50 | +In GAMS you can assign a set like `x = 1`, a subset of the set `x(i) = 1` or an attribute `x.lo(i) = 1`. |
| 51 | + |
| 52 | +We can set up these rules like so: |
| 53 | + |
| 54 | +```javascript |
| 55 | +parameter_reference: $ => seq( |
| 56 | + $.identifer, |
| 57 | + optional(seq(token("."), $.attribute)), |
| 58 | + optional($.indexing) |
| 59 | +) |
| 60 | + |
| 61 | +$.identifer: $ => /[a-zA-Z_][a-zA-Z0-9_]*/, //matches letters |
| 62 | + |
| 63 | +$.attribute: $ => choice( |
| 64 | + "l", |
| 65 | + "lo", |
| 66 | + "up", |
| 67 | + "scale", |
| 68 | + "fx" |
| 69 | +), |
| 70 | + |
| 71 | +$.indexing: $ => seq( |
| 72 | + "(", |
| 73 | + $.identifier, |
| 74 | + optional(seq(",", $.identifier)), |
| 75 | + ")" |
| 76 | +) |
| 77 | +``` |
| 78 | + |
| 79 | +which should be able to handle: |
| 80 | +- `x` as `identifer` |
| 81 | +- `x(i)` as `identifier(index)` |
| 82 | +- `x.lo(i)` as `identifier.attribute(index)` |
| 83 | + |
| 84 | +However, I couldn't vibe code my way past the issue that tree-sitter reads from left to right, meaning it reads the `x` in `x.lo(i)` and instantly assigns it as its own parameter_reference before waiting to read the whole `x.lo(i)`!!! |
| 85 | + |
| 86 | +I tried changing the precedence of rules (as explained above) but nothing worked. Weirdly if I have something like `x.lo(i) = y.lo(i)` it parses properly on the RHS but not the LHS!! |
| 87 | + |
| 88 | +At this point I gave up, because GAMS has a lot of syntax rules and I couldn't even get past `x.lo(i)`... |
| 89 | + |
| 90 | +...and then we got access to GitHub copilot at work. I asked copilot to format my files for me and hey presto, I got something like this |
| 91 | + |
| 92 | +``` |
| 93 | +equation_1(a)$( |
| 94 | + condition_1(i,j,t) |
| 95 | + and value_1(x) < value_2(x) |
| 96 | + ) |
| 97 | + .. |
| 98 | + sum( |
| 99 | + ( |
| 100 | + ab(a,b), |
| 101 | + bc(b,c), |
| 102 | + cd(c,d) |
| 103 | + ), |
| 104 | + value(a,b,c,d)$(condition_2(a,b,c,d)) |
| 105 | + ) =l= 100 |
| 106 | +``` |
| 107 | + |
| 108 | +So, who needs to meticulously build a parser these days when all the languages on the internet have been modelled in an LLM? |
| 109 | + |
| 110 | +At least I now have more respect whenever I see syntax highlighting. |
| 111 | + |
| 112 | +> As an aside, these types of syntax trees are useful for understanding grammar in natural languages (as I read in [The Sense of Style](https://en.wikipedia.org/wiki/The_Sense_of_Style)) |
| 113 | +> |
| 114 | +> <img src="https://external-content.duckduckgo.com/iu/?u=http%3A%2F%2Fellinfobcps.weebly.com%2Fuploads%2F4%2F8%2F6%2F7%2F48674241%2F650692067.png&f=1&nofb=1&ipt=8bcc605927ef165c999ea0fd776183bde15d174b32214f0a5340ababa29c3fb5"> |
| 115 | +> |
0 commit comments