Commit cf52085

Create 2025-05-09-gams-formatter.md
1 parent 3f544ea commit cf52085

1 file changed

+115
-0
lines changed

_posts/2025-05-09-gams-formatter.md

+115
@@ -0,0 +1,115 @@
1+
---
2+
title: "Attempting to write a formatter for GAMS"
3+
layout: single
4+
excerpt: "and having more respect for syntax highlighting"
5+
tags: [til]
6+
---
7+
8+
My current focus at work is developing the in-house linear programming solver, mostly written in [GAMS](https://www.gams.com/products/gams/gams-language/). The "General Algebraic Modeling System" is a high-level language for describing linear optimisation problems, which it then compiles into a form that a low-level solver, such as [CPLEX](https://www.ibm.com/products/ilog-cplex-optimization-studio/cplex-optimizer), can solve.
9+
10+
As an example, say we have 5 items, indexed by $i$, and a rucksack with capacity `100`. How do we decide which items to put into the bag? Each item has a different weight, $w_{i}$, and value, $v_{i}$, and we can set up the problem by introducing a binary variable $x_i$ that represents whether we pick that item or not.
11+
12+
We can then write the objective function to maximise as:
13+
$$Z = x_1*v_{1} + x_2*v_{2} + x_3*v_{3} + x_4*v_{4} + x_5*v_{5}$$
14+
15+
subject to the constraint:
16+
$$x_1*w_{1} + x_2*w_{2} + x_3*w_{3} + x_4*w_{4} + x_5*w_{5} \le 100$$
17+
18+
This representation obviously does not scale well with 1000s of items, so we can refactor it in GAMS like so:
19+
20+
```gams
Z =e= sum(i, x(i) * v(i) )
```
23+
24+
and
25+
26+
```gams
sum(i, x(i) * w(i) ) =l= 100
```
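To make the `sum(i, ...)` notation concrete, here is a minimal JavaScript sketch of the same objective and constraint, solved by brute force. The weights and values are made up for illustration (the post does not give any), and enumerating every selection is only feasible for a handful of items; a real solver does something far smarter.

```javascript
// Hypothetical data: these weights and values are assumptions, not from the post.
const weights = [40, 30, 50, 20, 10];
const values = [60, 40, 70, 30, 20];
const capacity = 100;

// Equivalent of sum(i, x(i) * coeff(i)) for a 0/1 selection vector x.
function weightedSum(x, coeff) {
  return x.reduce((total, xi, i) => total + xi * coeff[i], 0);
}

// Try all 2^n selections and keep the most valuable one that fits.
function bestSelection(weights, values, capacity) {
  const n = weights.length;
  let best = { x: new Array(n).fill(0), value: 0 };
  for (let mask = 0; mask < (1 << n); mask++) {
    // Bit i of mask plays the role of the binary variable x(i).
    const x = Array.from({ length: n }, (_, i) => (mask >> i) & 1);
    if (weightedSum(x, weights) > capacity) continue; // violates the =l= constraint
    const value = weightedSum(x, values);
    if (value > best.value) best = { x, value };
  }
  return best;
}
```

With these made-up numbers, `bestSelection(weights, values, capacity).value` is 150, and more than one selection reaches it.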
29+
30+
That all seems fairly innocuous, but when you write an application to handle complex business logic it can turn into a sea of parentheses...
31+
32+
This issue gets worse when you discover that there are no formatting rules in GAMS! Imagine my horror when trying to understand what a thousand lines like this could be doing:
33+
34+
```gams
equation_1(a)$(condition_1(i,j,t) and value_1(x) < value_2(x)).. sum((ab(a,b),bc(b,c),cd(c,d)), value(a,b,c,d)$(condition_2(a,b,c,d))) =l= 100
```
37+
38+
# Topiary
39+
40+
I couldn't find a formatter, like [ruff](https://docs.astral.sh/ruff/formatter/), that could understand this GAMS code, so I made the terrible decision of trying to write my own...
41+
42+
Things started off well after finding this very recent (Jan 2025) blog [post](https://www.tweag.io/blog/2025-01-30-topiary-tutorial-part-1) about a new "universal formatting engine" called [Topiary](https://topiary.tweag.io/). Apparently, all I needed to do was write a tree-sitter `grammar.js` file describing GAMS, generate a parser from it, and I would be off and away.
43+
44+
This great [intro](https://derek.stride.host/posts/comprehensive-introduction-to-tree-sitter) to syntax trees shows how we can define rules to turn expressions like $x*y+z$ into the syntax tree below
45+
46+
<img src="https://derek.stride.host/assets/images/graphs/tree-sitter-parsing-part-6.svg">
47+
48+
Because we could parse an expression like `x + y + z` as `(x + y) + z` or `x + (y + z)`, we need to explicitly tell the parser that we prefer the left option (in `grammar.js` this is done by wrapping the rule in `prec.left()`).
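Why the choice of grouping matters can be sketched in plain JavaScript, without tree-sitter. The helper names below (`leftAssoc`, `rightAssoc`, `show`, `evaluate`) are hypothetical and only illustrate the two possible trees; they are not part of any grammar.

```javascript
// Group operands to the left: ((a op b) op c)
function leftAssoc(operands, op) {
  return operands.slice(1).reduce((tree, next) => [op, tree, next], operands[0]);
}

// Group operands to the right: (a op (b op c))
function rightAssoc(operands, op) {
  const last = operands[operands.length - 1];
  return operands.slice(0, -1).reduceRight((tree, prev) => [op, prev, tree], last);
}

// Render a tree with explicit parentheses.
function show(tree) {
  if (!Array.isArray(tree)) return String(tree);
  const [op, lhs, rhs] = tree;
  return `(${show(lhs)} ${op} ${show(rhs)})`;
}

// Evaluate a numeric tree that uses only + and -.
function evaluate(tree) {
  if (!Array.isArray(tree)) return tree;
  const [op, lhs, rhs] = tree;
  const a = evaluate(lhs);
  const b = evaluate(rhs);
  return op === "-" ? a - b : a + b;
}

console.log(show(leftAssoc(["x", "y", "z"], "+")));  // ((x + y) + z)
console.log(show(rightAssoc(["x", "y", "z"], "+"))); // (x + (y + z))
```

For addition both trees give the same value, but for subtraction they do not: `evaluate(leftAssoc([10, 4, 3], "-"))` is 3 while `evaluate(rightAssoc([10, 4, 3], "-"))` is 9, which is why the grammar has to pick one.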
49+
50+
In GAMS you can assign a value to a symbol as a whole, like `x = 1`, to an indexed element of it, like `x(i) = 1`, or to one of its attributes, like `x.lo(i) = 1`.
51+
52+
We can set up these rules like so:
53+
54+
```javascript
// Rules as they would appear inside the `rules: { ... }` object of grammar.js
parameter_reference: $ => seq(
  $.identifier,
  optional(seq(token("."), $.attribute)),
  optional($.indexing)
),

// a letter or underscore, followed by any letters, digits or underscores
identifier: $ => /[a-zA-Z_][a-zA-Z0-9_]*/,

// a subset of the attributes that can follow the dot
attribute: $ => choice(
  "l",
  "lo",
  "up",
  "scale",
  "fx"
),

// one or more comma-separated indices in parentheses
indexing: $ => seq(
  "(",
  $.identifier,
  repeat(seq(",", $.identifier)),
  ")"
)
```
78+
79+
which should be able to handle:
80+
- `x` as `identifier`
81+
- `x(i)` as `identifier(index)`
82+
- `x.lo(i)` as `identifier.attribute(index)`
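The three target shapes can be checked with a small regex-based sketch in plain JavaScript. This is not tree-sitter and says nothing about how tree-sitter resolves the rules; `parseReference` and `ATTRIBUTES` are hypothetical names, and the attribute list is just the subset used in the grammar sketch.

```javascript
// Attribute names taken from the grammar sketch; not the full GAMS list.
const ATTRIBUTES = ["l", "lo", "up", "scale", "fx"];
const IDENTIFIER = /^[a-zA-Z_][a-zA-Z0-9_]*$/;

// Parse `name`, `name(i, j)` or `name.attr(i, j)`; return null if it does not match.
function parseReference(source) {
  const pattern = /^([a-zA-Z_][a-zA-Z0-9_]*)(?:\.([a-zA-Z]+))?(?:\(([^()]*)\))?$/;
  const match = pattern.exec(source.trim());
  if (!match) return null;
  const [, identifier, attribute, indexList] = match;
  if (attribute !== undefined && !ATTRIBUTES.includes(attribute)) return null;
  const indices = indexList === undefined
    ? []
    : indexList.split(",").map((part) => part.trim());
  // Every index must itself be a plain identifier.
  if (!indices.every((index) => IDENTIFIER.test(index))) return null;
  return {
    identifier,
    attribute: attribute === undefined ? null : attribute,
    indices,
  };
}

console.log(parseReference("x"));
console.log(parseReference("x(i)"));
console.log(parseReference("x.lo(i)"));
```

A regex can look at the whole string at once, so it never has to decide early whether `x` is complete. That is exactly the luxury an incremental parser does not have.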
83+
84+
However, I couldn't vibe code my way past the issue that tree-sitter reads from left to right, meaning it reads the `x` in `x.lo(i)` and instantly treats it as a complete `parameter_reference` instead of waiting to read the whole `x.lo(i)`!!!
85+
86+
I tried changing the precedence of rules (as explained above) but nothing worked. Weirdly, if I have something like `x.lo(i) = y.lo(i)`, it parses properly on the RHS but not the LHS!!
87+
88+
At this point I gave up, because GAMS has a lot of syntax rules and I couldn't even get past `x.lo(i)`...
89+
90+
...and then we got access to GitHub Copilot at work. I asked Copilot to format my files for me and hey presto, I got something like this:
91+
92+
```gams
equation_1(a)$(
    condition_1(i,j,t)
    and value_1(x) < value_2(x)
)
..
sum(
    (
        ab(a,b),
        bc(b,c),
        cd(c,d)
    ),
    value(a,b,c,d)$(condition_2(a,b,c,d))
) =l= 100
```
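For comparison, one mechanical step a rule-based formatter would need for this layout is splitting an argument list only at commas that sit outside any parentheses. The sketch below is a hypothetical helper in plain JavaScript, not what Copilot or Topiary actually do.

```javascript
// Split text at commas that are not nested inside parentheses.
function splitTopLevel(text) {
  const parts = [];
  let depth = 0;
  let current = "";
  for (const ch of text) {
    if (ch === "(") depth++;
    if (ch === ")") depth--;
    if (ch === "," && depth === 0) {
      // A top-level comma ends the current argument.
      parts.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  if (current.trim() !== "") parts.push(current.trim());
  return parts;
}

// The arguments of the sum(...) call from the example equation.
const sumArgs = "(ab(a,b),bc(b,c),cd(c,d)), value(a,b,c,d)$(condition_2(a,b,c,d))";
console.log(splitTopLevel(sumArgs));
```

Applied to `sumArgs` this yields two pieces, the index tuple and the summand, and applying it again to the inside of the tuple yields `ab(a,b)`, `bc(b,c)` and `cd(c,d)`, which matches the line breaks in the formatted output.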
107+
108+
So, who needs to meticulously build a parser these days when all the languages on the internet have been modelled in an LLM?
109+
110+
At least I now have more respect whenever I see syntax highlighting.
111+
112+
> As an aside, these types of syntax trees are useful for understanding grammar in natural languages (as I read in [The Sense of Style](https://en.wikipedia.org/wiki/The_Sense_of_Style))
113+
>
114+
> <img src="https://external-content.duckduckgo.com/iu/?u=http%3A%2F%2Fellinfobcps.weebly.com%2Fuploads%2F4%2F8%2F6%2F7%2F48674241%2F650692067.png&f=1&nofb=1&ipt=8bcc605927ef165c999ea0fd776183bde15d174b32214f0a5340ababa29c3fb5">
115+
>
