[Feature request] Capture groups in syntax definition files #3705

Open
SkyyySi opened this issue Mar 25, 2025 · 2 comments

SkyyySi commented Mar 25, 2025

Some languages require knowing a previous token to be able to correctly highlight them. For example:

  • Lua allows putting an arbitrary amount of equals signs = between the two square brackets for long strings ([[ ... ]]) and block comments (--[[ ... ]]). The equals signs must be balanced (same amount for the opening and closing token).
  • C# allows using more than three double-quote characters """ to delimit long string literals.
  • Unix shells (Bash, Zsh, etc.) allow creating "heredoc" strings with the << operator:
    cat << EOF
    This will be treated as text, piped into cat's standard input.
    Variables like $x may be expanded here.
    EOF

Shell heredocs are the hardest case: the delimiter word after << is chosen freely by the author, so the closing line cannot be enumerated in advance.

Thus, my suggestion: allow YAML grammar files to use captures with $1, $2 and so on, similar to how captures already work in the replaceall command.
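
To make the suggestion concrete, here is a minimal Python sketch of the general idea, not of micro's highlighter (which is written in Go and may work quite differently). The function name `find_region` and the `$1`-style placeholder in the end pattern are hypothetical and exist only for illustration: text captured by the start pattern is escaped and substituted into the end pattern before the end is searched for.

```python
import re


def find_region(text, start_pattern, end_template):
    """Return (start, end) offsets of the first region, or None.

    end_template is a regex in which "$1", "$2", ... stand for the text
    captured by the matching group of start_pattern (hypothetical syntax).
    """
    start = re.search(start_pattern, text)
    if start is None:
        return None

    def substitute(ref):
        # Escape the captured text so it is matched literally in the end regex.
        return re.escape(start.group(int(ref.group(1))) or "")

    end_pattern = re.sub(r"\$(\d+)", substitute, end_template)
    # MULTILINE lets "^" and "$" in the template anchor to individual lines.
    end = re.compile(end_pattern, re.MULTILINE).search(text, start.end())
    if end is None:
        return None
    return start.start(), end.end()


# Lua long string: the closing bracket must repeat the captured "=" run.
lua = "x = [==[ a ]] b ]=] c ]==] y"
lua_span = find_region(lua, r"\[(=*)\[", r"\]$1\]")

# Shell heredoc: the closing line must repeat the captured delimiter word.
sh = "cat << EOF\nsome $x text\nEOF\necho done\n"
sh_span = find_region(sh, r"<<\s*(\w+)", r"^$1$")
```

In this sketch the Lua region ends only at `]==]`, skipping the unbalanced `]]` and `]=]`, and the heredoc region ends at the line consisting solely of the captured word.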

@Andriamanitra
Contributor

How are you proposing that would work in practice? What would the syntax file using captures look like? How would the syntax rules know which capture they're referring to, especially when there are multiple? Could they be nested?

Personally, I don't think adding more complexity to the current highlighting system will ever provide a satisfactory solution. Micro's approach of defining regions with start and end tokens is not going to cut it for context-sensitive or recursive languages like Typst or Python's format specification mini-language. 1 2

At some point there were plans to integrate the PEG-based zyedidia/flare into micro, which would provide a more robust solution, but I don't know if anyone has worked on that since. 3

Footnotes

  1. comment by Andriamanitra on Typst syntax #2780

  2. comment by Andriamanitra on [Feature Request] syntax coloring of Python f-strings by default #3605

  3. comment by zyedidia on Syntax highlighter #2464

@Andriamanitra
Contributor

> Lua allows putting an arbitrary amount of equals signs = between the two square brackets for long strings ([[ ... ]]) and block comments (--[[ ... ]]). The equals signs must be balanced (same amount for the opening and closing token).

For this case it's in practice "good enough" to hard-code the first few levels (that's what I did for Rust's r##".."## strings in #3192).
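
As a rough illustration of the hard-coding workaround (a Python sketch, not the actual rules from #3192, which are not reproduced here), one fixed start/end pair can be generated per level up to some cutoff. The helper names `lua_long_bracket_rules` and `match_long_string` are made up for this example.

```python
import re


def lua_long_bracket_rules(max_level):
    """Build one (start, end) regex pair per level 0..max_level."""
    rules = []
    for level in range(max_level + 1):
        equals = "=" * level
        rules.append((re.escape(f"[{equals}["), re.escape(f"]{equals}]")))
    return rules


def match_long_string(text, rules):
    """Return the earliest long string found by any fixed-level rule, or None."""
    best = None
    for start_pat, end_pat in rules:
        # Non-greedy body so the region stops at the first matching close.
        m = re.search(start_pat + r".*?" + end_pat, text, re.DOTALL)
        if m and (best is None or m.start() < best.start()):
            best = m
    return best.group(0) if best else None


rules = lua_long_bracket_rules(2)  # covers [[ ]], [=[ ]=], [==[ ]==]
found = match_long_string("s = [=[ a ]] b ]=]", rules)
missed = match_long_string("s = [===[ x ]===]", rules)
```

The trade-off is visible in the last line: any level above the cutoff is simply not recognized, which is acceptable only because deeply nested levels are rare in real code.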
