Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make StyledStrings a normal package again #57998

Open
KristofferC opened this issue Apr 3, 2025 · 9 comments
Open

Make StyledStrings a normal package again #57998

KristofferC opened this issue Apr 3, 2025 · 9 comments
Labels
excision Removal of code from Base or the repository latency Latency

Comments

@KristofferC
Copy link
Member

KristofferC commented Apr 3, 2025

StyledStrings makes writing colored text more convenient, but its design incurs significant latency for nearly all Julia users. This is partly due to its type piracy (as reported in JuliaLang/StyledStrings.jl#61) and untyped internal fields. I have debugged countless latency issues, and it is quite common that the root cause ends up being simply that StyledStrings get loaded (which now happens when just opening the REPL) and due to its type piracy invalidates many chunks of precompiled code. For instance, see this discussion, where the first call to Pkg.add becomes three times slower solely because StyledStrings is loaded. Similar issues have also impacted Julia's parallel precompilation, as detailed here.

Latency is arguably one of the biggest concerns in Julia, so any regressions must be taken very seriously. In the #ttfx Slack channel, many users mentioned that they continue using Julia v1.10 because of its generally better latency. Perhaps coincidentally, StyledStrings was added in version 1.11. The type piracy issue was reported a year ago, and there have been attempts (by contributors other than the original author) to work around it, the latest being this pull request. However, these feel to me like trying to plaster a fix on something that is quite fundamentally broken.

There has also been talk about using StyledStrings more extensively to render stacktraces and error messages etc. However, due to the untyped fields and dynamic dispatch this seems quite scary with the new interest in trimming to produce small binaries. AFAIU, the very dynamic design of StyledStrings makes them hard to support together with trimming.

My patience with debugging these latency issues has run out. I keep encountering the same problems repeatedly, and it is no longer fun. The benefit of having syntax highlighting in markdown is, in my opinion, vastly outweighed by the cost of imposing extra latency on all Julia users to the point that they may hesitate to upgrade to newer Julia versions. Therefore, I argue that it is time to remove the (documented as experimental) StyledStrings from being a standard library and treat it as a normal package. This change could even improve the latency for packages that depend on StyledStrings, as the AnnotatedX types and AnnotatedIOBuffer would then reside solely within the StyledStrings package, eliminating type piracy.

This was a worthwhile experiment, but it ultimately did not pan out, and it would be unfortunate to remain stuck in a bad situation indefinitely.

@KristofferC KristofferC added the latency Latency label Apr 3, 2025
@jakobnissen
Copy link
Member

You're right the increased latency is a big usability regression, and I empathize with your frustration having to debug recurrent latency issues. Thank you for the work you're putting in to make Julia's user experience better.

You suggest two latency sources in StyledStrings: Its piracy, and its poorly inferred code. While they lead to the same consequence - increased latency - I believe we should consider them separately, because they have different implications for how this issue should be solved.

If the problem is piracy, then the solution is to remove piracy in StyledStrings. If this is not possible (it could be StyledStrings fundamentally relies on pirating the Base.Annotated* types), then yes, StyledStrings is fundamentally misdesigned and I agree it should be removed. Or maybe redesigned, if that's remotely possible.

However, if the main cause of latency is poor inference, then I think StyledStrings is not ultimately the culprit but just "the messenger", of a deeper issue. That would lead to a broader conclusion, namely that it's simply impossible to depend on dynamically typed Julia code because it will trash its responsiveness. That is - REPL (or anything else) can't be deeply dynamic, and we can't ever improve or extend REPL using dynamic Julia code.

If the latter is the case, then it is indeed unfortunate that we are stuck in a bad situation indefinitely, but then the situation we're stuck in has little to do with StyledStrings.

Either way, the first step would be to either fix the type piracy as you pointed out, if possible), or remove StyledString from REPL if not. If the latency issues remain after type piracy is fixed, implying the latency is mostly due to badly inferred code, then I think we should keep StyledString and hope the issue is fixed at a deeper level.

@topolarity
Copy link
Member

You're right the increased latency is a big usability regression, and I empathize with your frustration having to debug recurrent latency issues. Thank you for the work you're putting in to make Julia's user experience better.

You suggest two latency sources in StyledStrings: Its piracy, and its poorly inferred code. While they lead to the same consequence - increased latency - I believe we should consider them separately, because they have different implications for how this issue should be solved.

If the problem is piracy, then the solution is to remove piracy in StyledStrings.

I agree - the solution in my mind is #56194. There are some minor challenges there regarding how to manage display state for an AnnotatedString with multiple different types of annotations, and it also moves the majority of the AnnotatedString display code into Base (where it is required to be by type-piracy)

That said, I've been considering a more --trim-compatible solution, that also fixes the dynamicism / inference issues, to fully-parameterize AnnotatedString{A,S} where the A represents the type of an annotation. (A would probably be required to be a concrete type - mixing AnnotatedString would not promote, it would error)

That allows StyledStrings to legally provide its own StyleAnnotation type and implement printing / display for it via Base.write(io::IO, s::AnnotatedString{AnnotationType}) - without the need for a "display protocol" and without bad inference.

Either way, the first step would be to either fix the type piracy as you pointed out, if possible), or remove StyledString from REPL if not. If the latency issues remain after type piracy is fixed, implying the latency is mostly due to badly inferred code, then I think we should keep StyledString and hope the issue is fixed at a deeper level.

Totally agreed.

In my mind, we were always committed to fixing this type-piracy problem in StyledStrings for 1.12, but it sounds like the time is now to take action. We have three decent pathways I think:

  1. Implement the display protocol, as in Add AnnotatedString display interface (eliminate StyledStrings type-piracy) #56194
  2. Make AnnotatedString parameterized on AnnotationType (as above)
  3. Move the AnnotatedString machinery entirely to StyledStrings and simply opt out of the AbstractString interface

Any of these options are fine.

I think what we need is for @tecosaur to choose a pathway forward for us (and then I can help implement it, if needed). We've been stalled on "that leaves us with x,y,z options to fix" for 6 months now - It's time to just pick a path and move forward.

@KristofferC
Copy link
Member Author

If the problem is piracy, then the solution is to remove piracy in StyledStrings.

I have had this conversation probably 10 times... First, moving the AnnotatedX stuff from Base to StyledStrings always gets blocked on the claim that this function needs to exist in Base and have access to the Annotated types:

julia/base/strings/basic.jl

Lines 270 to 271 in f211a77

_isannotated(S::Type) = S != Union{} && (S <: AnnotatedString || S <: AnnotatedChar)
_isannotated(s) = _isannotated(typeof(s))

for StyledStrings to inject itself in some places to "win" vs String, e.g:

julia/base/strings/io.jl

Lines 354 to 358 in f211a77

if isconcretetype(et) && !_isannotated(et) && !any(_isannotated, args)
sprint(join, iterator, args...)
else
io = AnnotatedIOBuffer()
join(io, iterator, args...)
.

However, looking at the uses of StyledString in the package ecosystem (which is only a few places actually) it is in all places I can find used as a "glorified printstyled", so removing the join functionality doesn't actually make any difference to anyone in practice. Moving more stuff into Base gets blocked on that the core developers don't want to add a bunch of extra cruft to Base.

or remove StyledString from REPL

This also requires removing StyledString from Markdown, which at that point it has almost no use. So, at that point, it can just be made its own package?

Move the AnnotatedString machinery entirely to StyledStrings

Yes, that's great and then it can also just be moved out of being an stdlib? This would be so much better for the package that can version itself independently, and the small loss of functionality we get in the stdlibs is not that hard to implement in other ways. For example, we have a LLVM syntax highlighter in InteractiveUtils without having to add a whole extra stdlib with theming support etc.

@KristofferC
Copy link
Member Author

KristofferC commented Apr 3, 2025

That said, I've been considering a more --trim-compatible solution, that also fixes the dynamicism / inference issues, to fully-parameterize AnnotatedString{A,S} where the A represents the type of an annotation. (A would probably be required to be a concrete type - mixing AnnotatedString would not promote, it would error)

Design issues like this are great to flesh out in a normal package.

@tecosaur
Copy link
Contributor

tecosaur commented Apr 3, 2025

In my mind, we were always committed to fixing this type-piracy problem in StyledStrings for 1.12, but it sounds like the time is now to take action.

This is the impression I have too, the piracy/invalidation isn't "fine/acceptable". I also don't like something I've put so much time into having a major caveat, the way StyledStrings currently does.

To elaborate on what I meant in #57997 (comment) when I said I have trouble identifying "actionable" ways for me to fix the issue, that's based on these three approaches I had in mind for fixing the piracy:

  1. Moving the display code to Base, next to live with the types we define.
    Issue: I don't feel like this is something I can do given this split was a triage decision.
  2. Modifying the way that display is done so that being split doesn't cause type piracy.
    Issue: I don't have a clean way of doing so in my mind.
    Saving grace: Cody's done some thinking and worked out an approach for this.
  3. Making String less privileged in Base, so a non-Base string type can be treated as a 1st class AbstractString on par with String
    Issue: This is quite a bit of work, and would also need a lot of careful thinking

(1) is a technically easy fix that I'd be happy with, but as mentioned by Kristoffer last this was brought up Triage thought it was too much cruft in Base. (2) is what Cody and I are currently trying, and (3) is the idea I like the most (it would mean that _isannotated is no longer needed in Base) but it's not something I feel up to implementing.

We have three decent pathways I think:

  1. Implement the display protocol, as in Add AnnotatedString display interface (eliminate StyledStrings type-piracy) #56194
  2. Make AnnotatedString parameterized on AnnotationType (as above)
  3. Move the AnnotatedString machinery entirely to StyledStrings and simply opt out of the AbstractString interface

Any of these options are fine.

I think what we need is for @tecosaur to choose a pathway forward for us (and then I can help implement it, if needed)

I've been cautiously onboard with (1) since discussing it with you, and find (2) an interesting option. It seems like going with (1) for now doesn't block moving to a better solution in the future, if I interpret the approach correctly?


However, looking at the uses of StyledString in the package ecosystem (which is only a few places actually) it is in all places I can find used as a "glorified printstyled", so removing the join functionality doesn't actually make any difference to anyone in practice.

That's not surprising, it's the most simple use of the library and currently there's not a lot of examples of fancier usage (mostly un-PR'd WIP code I've written and demoed in Zulip, I'd wager).

FWIW, having a quick look currently _isannotated is used to support AnnotatedStrings in the following places:

  • *
  • join
  • lpad/rpad (could be alternatively implemented via a small pile of method specialisations)
  • logging

I have tried to make use of this functionality in more key places, like stacktraces, but that's been ruled out by the display code living outside Base.

Yes, that's great and then it can also just be moved out of being an stdlib? This would be so much better for the package that can version itself independently.

I do still think this functionality is worth keeping if we can have it without the latency/invalidation headaches, to allow it to not just be used as it currently is with REPL/Markdown for more components in 1.13/1.14. If all the value was syntax highlighting, I'd more easily agree that it's not worth it.

@topolarity
Copy link
Member

Moving the display code to Base, next to live with the types we define.
Issue: I don't feel like this is something I can do given this split was a triage decision.

FWIW triage decisions are always re-negotiable, especially in light of new evidence / motivations

Type-piracy, as has become clear, is not an acceptable compromise to avoid bloating Base. The correct solution in that situation is not to take on the library, to accept the Base bloat, or to fix the type-piracy.

Issue: I don't feel like this is something I can do given this split was a triage decision.
Issue: I don't have a clean way of doing so in my mind.
Issue: This is quite a bit of work, and would also need a lot of careful thinking

If there's no actionable solution, @KristofferC is right that the path forward to revert triage's decision and remove the stdlib.

It'd be more helpful to choose / commit to / develop a solution, rather than re-explain the existing challenges (which I agree have been quite sticky)

@tecosaur
Copy link
Contributor

tecosaur commented Apr 3, 2025

It'd be more helpful to choose / commit to / develop a solution, rather than re-explain the existing challenges (which I agree have been quite sticky)

Regarding the three approaches I mentioned, I think long-term we want either (1) the display code to live in Base, or (3) the entire implementation lives out of Base. I'm interested in (2), but primarily as a stop-gap solution since I don't see a clear interface/API that can be drawn (though I very much appreciate your efforts to make an interface that works for getting rid of inference problems ❤).

So, this depends on whether or not the long-term plan is around:

  • Evicting all fancy printing -type stuff from Base (while discussing StyledStrings on Triage, I heard that this is the direction to move towards), i.e. just plain black and white text with no decoration
  • Whether it would be fine to have nicer stacktraces/errors/etc. in the REPL compared to outside the REPL via a parallel implementation.

If the answer is that these things will/can happen, then I think (3) is most suitable, otherwise I lean towards (1).

Technically (1) is easy. It's the most straightforwards/obvious fix, it just requires Triage to be ok with it.

While I said (3) requires an investment of technical effort, I'm willing to work on this with someone.

If we're working towards (3), I think going through with (2) / #56194 for 1.12 makes sense to solve the piracy issues in the meantime.

@KristofferC
Copy link
Member Author

FWIW, having a quick look currently _isannotated is used to support AnnotatedStrings in the following places:

Which is not a lot. I dare to say it isn't used anywhere (where it couldn't be easily worked around). As a first step, we could move all AnnotatedX code to StyledStrings, which would remove the type of piracy and likely fix the latency, with the drawback that the package will work like it does in 1.10 (no * and join). I don't see any issues in the repo complaining about the lack of these methods on 1.10 so maybe they are not very important.

Just to give some background on why this is such a big thing. Julia's main problem has almost always been latency. If Julia's latency were completely fixed, it would be a much better language. People chose not to use Julia due to its latency. To improve this, there have likely been many years of developer effort aimed at reducing latency in various ways. For someone to then put in a package that does "fancy printing" (which we to some extend already had) at the cost of undoing a lot of this latency work is extremely provocative, especially when the issue is known and not dealt with. I know if I had made a change with these latency effects I would have either immediately reverted it or it would have been my number one priority to fix. People are already starting to stay on old julia versions because it has better latency, making all the work we do here not reach users.

Something has to be done soon, and if nothing happens from other persons, I will have to push something through because it really has reached the point where this is not ok anymore.

@tecosaur
Copy link
Contributor

tecosaur commented Apr 8, 2025

As a first step, we could move all AnnotatedX code to StyledStrings, which would remove the type of piracy and likely fix the latency, with the drawback that the package will work like it does in 1.10 (no * and join).

I follow you here, but these are tangibly useful in making other string methods built on them (such as the recently introduced ltruncate / rtruncate methods) generically work.

I don't see any issues in the repo complaining about the lack of these methods on 1.10 so maybe they are not very important.

JuliaLang/StyledStrings.jl#102 is one.

That said, to enable work like Zulip #repl > Revamped REPL history @ 💬 to actually make its way to REPL, I would say this is a better option than tossing out StyledStrings entirely: the stdlib can use annotated_ltruncate etc. if it comes to it, as much as I dislike no longer being able to rely on generic methods that work.


Just to give some background on why this is such a big thing. [...]

I think my responses around this in the past may (on GitHub and Slack) may have misleadingly seemed as though I wasn't bothered about the latency impacts of the implementation split.

To be clear here, on the contrary I very much am. The implementation split in its current form is non-viable due to latency, makes it very difficult for a solid chunk of the value of this system to be realised, and is otherwise a pain. I have never been ok with this. In my mind, the long-term future of this feature is with the display code together with the types.

If I could have my way, I would have put the display code needed to avoid invalidation and piracy in Base, while keeping the API responsibility under StyledStrings so we're not committing to this particular arrangement long-term. There are four simple reason why this has not happened: (1) Triage initially decided on the split, (2) I can no longer make Triage meetings, (3) I don't feel this is a change I can unilaterally make, and (4) nobody else has pushed for this.

Something has to be done soon, and if nothing happens from other persons, I will have to push something through because it really has reached the point where this is not ok anymore.

This is completely reasonable, and I'm sorry you've had so many headaches from this. As you know, Cody and I started talking about a stop-gap/middle-ground solution to the piracy/invalidation in October last year which I was keen to explore, but unfortunately I ended up becoming particularly busy around November-December last year and then was overseas for an extended period.

Prompted by you re-raising this in #54561 (comment) this conversation has now started up again, and we are actively working on getting the piracy resolved. #57912 is the start of this work, I've made the matching change in StyledStrings.jl, and we're currently figuring out what needs to be tweaked in #56194.

Completing this effort is a priority of mine.

Longer term, I do want to get all the display code and types together, but I think this is the right (minimal viable) approach to stop being part of the latency problem in 1.12.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
excision Removal of code from Base or the repository latency Latency
Projects
None yet
Development

No branches or pull requests

5 participants