
Add extra lsp features #204


Merged: 19 commits into master on Apr 17, 2025

Conversation

JonasAlaif
Contributor

No description provided.

@LaurenzV

@rayman2000

So, I believe I've found three issues. For some of them I'm not sure what the best way of fixing them is, so for now I've just implemented "hacky" workarounds to demonstrate that these should fix the issue.

Current Architecture

Just to briefly summarize my understanding of the current architecture: At the start of the communication chain, we have two kinds of "message reporting tasks", namely verification tasks and AST construction tasks. Those tasks regularly send messages to the "reporter" (for example when an exception occurs or when a new state is reached), which in this case is a Queue Actor. The queue actor receives those messages and puts them into a queue. Others can "subscribe" to those queues and receive the messages there. In our case, this is what the RelayActor does: it receives those messages and uses them to update the state in the LSP server.
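To make that concrete, here is a rough, self-contained sketch of that flow (simplified names, assuming Akka 2.6-style streams; these are not the actual ViperServer classes):

import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Sink, Source}

object MessageFlowSketch {
  final case class Report(text: String)

  // Stands in for the RelayActor: it consumes the queued messages and would
  // use them to update the LSP server's state.
  class RelaySketch extends Actor {
    def receive: Receive = {
      case Report(text) => println(s"relay received: $text")
    }
  }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("sketch")
    val relay: ActorRef = system.actorOf(Props(new RelaySketch))

    // The queue side: reporting tasks (verification / AST construction) offer messages here...
    val queue = Source
      .queue[Report](64, OverflowStrategy.backpressure)
      // ...and the relay actor is the subscriber on the downstream end.
      .to(Sink.foreach(msg => relay ! msg))
      .run()

    queue.offer(Report("new verification state reached"))
    queue.complete() // corresponds to the reporting task finishing
  }
}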

Problem 1: Dead letters

The first problem I noticed is that I always kept getting those "dead letter" messages, even for very simple verification tasks. If I understood correctly, the reason for this is that at some point, we destroy the relay actor responsible for receiving the messages from AST constructions:

protected def discardAstOnCompletion(jid: AstJobId, jobActor: ActorRef) = {
  ast_jobs.lookupJob(jid).map(_.map(_.queue.watchCompletion().onComplete(_ => {
    ast_jobs.discardJob(jid)
    jobActor ! PoisonPill
  })))
}

We only do this once the queue has completed (i.e. AST construction has finished), but apparently akka still sends a final "Success" message after the watchCompletion callbacks have been executed, and since we kill the actor in this callback, that message can no longer be delivered. For now, I've just commented out the part where we kill the actor, but obviously this will require a more principled solution. And by the way, I might have missed something, but while debugging it seemed to me that we never kill the verification relay actors that we create, so that might also be something to look into.
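Just as an illustration of what a more principled fix could look like (this is an assumption on my part, not what my commit does): the job actor could stop itself only after it has actually processed the stream's final completion message, e.g.:

import akka.actor.{Actor, Status}

// Illustration only: the exact completion message depends on how the sink is
// wired up; matching on Status.Success here is an assumption.
class SelfStoppingJobActor extends Actor {
  def receive: Receive = {
    case Status.Success(_) =>
      // The final message has been delivered, so stopping now cannot turn it
      // into a dead letter.
      context.stop(self)
    case msg =>
      println(s"job message: $msg") // handle the regular job messages here
  }
}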

Problem 2: IDELanguageClient

Currently, in the client coordinator, we store the IdeLanguageClient as an option, which is set once on initialization and reset to None on exiting:

class ClientCoordinator(val server: ViperServerService)(implicit executor: VerificationExecutionContext) {
  val localLogger: Logger = ClientLogger(this, "Client logger", server.config.logLevel()).get
  val logger: Logger = server.combineLoggers(Some(localLogger))
  private var _client: Option[IdeLanguageClient] = None
  private val _files: ConcurrentMap[String, FileManager] = new ConcurrentHashMap() // each file is managed individually.
  private var _vprFileEndings: Option[Array[String]] = None
  private var _exited: Boolean = false

  def client: IdeLanguageClient = {
    _client.getOrElse({ assert(false, "getting client failed - client has not been set yet"); ??? })
  }

  def setClient(c: IdeLanguageClient): Unit = {
    assert(_client.isEmpty, "setting client failed - client has already been set")
    _client = Some(c)
  }

  /** called when client disconnects; the server should however remain running */
  def exit(): Unit = {
    _exited = true
    _client = None
    _files.clear()
  }

This is a bit of a problem during tests, because it can happen that we disconnect while a verification or something else is still running, and then each call to client leads to an assertion error, resulting in a whole array of very weird error messages that suddenly appear. The way I solved this for now is that we no longer store the language client as an option; instead, when resetting the coordinator, we fall back to a MockClient like the one used in testing. This makes the error messages go away for me, so I believe this approach can be used directly (though the MockClient should obviously be deduplicated).
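For illustration, the workaround boils down to something like this (the trait below is only a stand-in for the real IdeLanguageClient, which has many more methods):

// Stand-in for the real IdeLanguageClient interface.
trait LanguageClientLike {
  def notifyProgress(percent: Int): Unit
}

// Mirrors the MockClient used by the test suite: every method is a no-op, so
// messages sent after the client has disconnected are silently swallowed.
object NoOpClient extends LanguageClientLike {
  def notifyProgress(percent: Int): Unit = ()
}

class CoordinatorSketch {
  private var _client: LanguageClientLike = NoOpClient // never unset

  def client: LanguageClientLike = _client
  def setClient(c: LanguageClientLike): Unit = { _client = c }

  /** called when the client disconnects; fall back to the no-op client instead of None */
  def exit(): Unit = { _client = NoOpClient }
}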

Problem 3: Terminating a null job actor

The third problem, which caused the stress test "quickly switching backends" to get stuck when verifying a file that doesn't reach the verification stage (e.g. due to a parsing error), comes down to this part:

private def stopOnlyVerification(handle: VerHandle, combinedLogger: Logger): Future[Boolean] = {
  implicit val askTimeout: Timeout = Timeout(config.actorCommunicationTimeout() milliseconds)
  val interrupt: Future[String] = (handle.job_actor ? StopVerification).mapTo[String]
  handle.job_actor ! PoisonPill // the actor played its part.
  interrupt.map(msg => {
    combinedLogger.info(msg)
    true
  })
}

The problem is that we set the job actor of this verification to null when AST creation didn't complete:

Future.successful(VerHandle(null, null, null, prev_job_id_maybe))

And thus, we try to terminate a job actor that is null, which results in the server hanging completely. I fixed this by adding a simple null check before doing so; I believe this fix can be taken as is.
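Roughly, the guarded version looks like this (returning false when there is no actor to stop is my choice for the sketch, not necessarily the exact committed code):

private def stopOnlyVerification(handle: VerHandle, combinedLogger: Logger): Future[Boolean] = {
  if (handle.job_actor == null) {
    // AST creation never completed, so there is no verification actor to stop.
    Future.successful(false)
  } else {
    implicit val askTimeout: Timeout = Timeout(config.actorCommunicationTimeout() milliseconds)
    val interrupt: Future[String] = (handle.job_actor ? StopVerification).mapTo[String]
    handle.job_actor ! PoisonPill // the actor played its part.
    interrupt.map(msg => {
      combinedLogger.info(msg)
      true
    })
  }
}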

I've pushed my changes here, though as I said this still needs cleaning up and the first problem requires a more principled solution: LaurenzV@99b5bd4

If I run the Viper IDE tests using those changes, all tests seem to pass for me locally. :)

@rayman2000
Contributor

@LaurenzV
So problem 1 is somewhat unrelated to this pull request right? Since we do not have an easy solution, I would probably move this to a separate issue in the interest of getting things merged quickly.

Problem 3 seems to be the main thing we were looking for, great that you figured that out and that the solution is quite simple!

I am not entirely sure if problem 2 is new / related to the current issues. I do not love the solution of using a MockClient outside of tests. Can this only occur during tests?

@LaurenzV

LaurenzV commented Feb 23, 2025

So problem 1 is somewhat unrelated to this pull request right? Since we do not have an easy solution, I would probably move this to a separate issue in the interest of getting things merged quickly.

It's probably unrelated to the specific test failure, yes, but it's not unrelated to the PR, since afaik it was introduced here in the first place. But it probably should still be fixed before this PR is merged; it's an issue after all, and I'm not sure whether it could lead to other weirdness.

I am not entirely sure if problem 2 is new / related to the current issues. I do not love the solution of using a MockClient outside of tests. Can this only occur during tests?

Same as above, it might not be related to that issue specifically, but I still think it should be addressed before merging.

With that said, if you want, I can check whether just applying solution 3 fixes all of the test failures.

@rayman2000
Contributor

Ah, I was assuming (for no reason) that at least Problem 1 was pre-existing. Yeah, if these are regressions then we should definitely address them!

@rayman2000
Contributor

OK, I have talked with @ArquintL and we decided that we prefer the following solution to problem 2: instead of setting _client to a MockClient, we change the return type of client to an Option. This forces callers to decide what to do if the client is None, instead of silently dropping things.
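Roughly, the intended shape is the following (simplified; the actual signatures in the PR may differ):

class ClientCoordinatorSketch {
  private var _client: Option[IdeLanguageClient] = None

  // Callers now have to handle the "no client connected" case explicitly.
  def client: Option[IdeLanguageClient] = _client

  def setClient(c: IdeLanguageClient): Unit = {
    assert(_client.isEmpty, "setting client failed - client has already been set")
    _client = Some(c)
  }

  /** called when the client disconnects; the server keeps running */
  def exit(): Unit = {
    _client = None
  }
}

A call site then explicitly drops (or logs) a notification when no client is attached, e.g. coordinator.client.foreach(_.someNotification(...)), where someNotification is a made-up method name.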

@LaurenzV

LaurenzV commented Mar 1, 2025

@rayman2000 I took a look, and indeed it seems like on the current main branch, relay actors are never killed. I verified this by adding

    override def postStop(): Unit = {
      println("XActor killed")
    }

to the respective classes: for queue actors the message is printed after each parse/verification step, while for the relay actor such a message is never printed. So I guess if we decide to just not kill the actor, it at least wouldn't be a regression, but obviously it would still be better to fix the underlying problem.

@rayman2000
Contributor

@JonasAlaif and I have decided to just not kill the actors. The effort to fix this seems pretty large (mostly because no one really understands the actor model we use...), and it is not a regression; the issue existed beforehand as well. While having unnecessary actors lying around is not ideal, for known usages of ViperServer this has not been an issue so far.

@rayman2000
Contributor

Linard and I have done some more digging and found some more issues. First of all, I get stack traces when I verify multiple files one after the other; one of the gets here is called on a None:

if (uri == file.file_uri) this else project.left.toOption.get.get(uri).get

Second of all, we printed out all of the existing actors over time. It looks like each edit of a file causes a new AST Actor to be created, and they are never removed, even with the Poison Pills added back in. So the list of actors grows fairly quickly if you just edit a file a couple of times. This feels bad enough that we probably do not want to just merge it as is.
Having an actor for every AST job has always seemed like a questionable design decision, but now that we have lots and lots of AST jobs, we probably have to change this before this can reasonably be merged...

@LaurenzV

LaurenzV commented Mar 5, 2025

So should I prioritize looking into this over trying to rebase my existing branches on this one?

@rayman2000
Contributor

All right, gang, it's update time!
So, I think I have fixed the actor mess enough for this to progress. The final problem was a juicy race condition when you cancel an AST job and it finishes at the same time: this led to the cancellation waiting for a response from the already poisoned job actor. It now no longer waits, which means we will get a couple of dead letter messages (which is fine in my opinion, since it only happens in this race condition).
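Conceptually (illustrative names, not the exact code), the cancellation now tells instead of asking, so a message sent to an already stopped job actor just becomes a dead letter rather than a reply the caller waits for:

import akka.actor.{ActorRef, PoisonPill}

object CancellationSketch {
  // StopAstJob is a made-up message name standing in for the real stop message.
  case object StopAstJob

  // Fire-and-forget cancellation: if the job actor already finished and was
  // poisoned, both messages below end up as dead letters instead of making the
  // caller block on a reply that will never come.
  def cancelAstJob(jobActor: ActorRef): Unit = {
    jobActor ! StopAstJob
    jobActor ! PoisonPill
  }
}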
So now:

  • I think all the actors are killed appropriately, and the reflection hack to check this seems to confirm it.
  • The IDE tests run through for me.

@JonasAlaif the issue here is still open:

if (uri == file.file_uri) this else project.left.toOption.get.get(uri).get

This is all in your changes and is not akka-related, so I would be grateful if you could take a look. One of the gets is called on a None if I quickly click between multiple files in the IDE.

@ArquintL
Member

I only made a quick pass over the changes and left some comments

@JonasAlaif
Contributor Author

@ArquintL come chat about the remaining two points

@JonasAlaif
Contributor Author

@rayman2000 done with the review, happy to merge if you're ready

@rayman2000 added this pull request to the merge queue on Apr 17, 2025
Merged via the queue into master with commit 0195c6f on Apr 17, 2025
13 checks passed