Notes
I’ve chosen to implement char parsers very differently from how they work in other parser combinator libraries, and from how they worked in Parjs v1.
While I intend to design a compatibility API that will make using them a bit more familiar, there are some crucial reasons to drift away from the previous API.
Parjs allows parsing all kinds of characters based on various definitions. However, this flexibility comes at a cost in both performance and complexity.
While it’s possible to parse arbitrary sets of Unicode characters, doing so has relatively high overhead. If you just want to parse ASCII, the machinery involved in parsing Unicode is redundant.
But say you want to parse ASCII plus a limited subset of characters with diacritics: even then, there is no need for you to deal with the sheer number and variety of Japanese Kanji.
Similarly, if you just want to parse Japanese Kanji, you don’t have to deal with the composite structure of Korean Hangul characters.
Likewise, if you do want to parse Hangul, you still don’t have to deal with the confusing mess that is emoji variants.
As you can see, writing a truly universal character parser can get quite complicated, and that complexity will be lost on 90% of your users.
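As a rough illustration of the overhead gap: an ASCII-only predicate is a couple of integer comparisons, while a Unicode-aware one has to consult the engine’s full Unicode property tables. The sketch below is hypothetical and not Parjs’s API:

```typescript
// ASCII letter check: two cheap range comparisons on the char code.
function isAsciiLetter(ch: string): boolean {
    const code = ch.charCodeAt(0);
    return (code >= 65 && code <= 90) || (code >= 97 && code <= 122);
}

// Unicode letter check: delegates to the regex engine's Unicode property
// data, which must cover every script in every plane.
const unicodeLetter = /^\p{L}$/u;
function isUnicodeLetter(ch: string): boolean {
    return unicodeLetter.test(ch);
}

console.log(isAsciiLetter("a"));   // true
console.log(isAsciiLetter("é"));   // false: outside ASCII
console.log(isUnicodeLetter("é")); // true
console.log(isUnicodeLetter("字")); // true: CJK ideograph
```

Most users only ever need the first kind of check, which is why paying for the second one everywhere is a poor default.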