Skip to content

Conversation

dblnz
Copy link
Contributor

@dblnz dblnz commented Aug 31, 2025

Description

This PR closes #723, #704 and partially addresses #318.
These changes modify the way we perform guest tracing to use the tracing crate and its macros (instrument, trace).

How it works

Guest

What makes this possible is the implementation of the Subscriber trait in the hyperlight-guest-tracing crate. By implementing it, we can now handle the capturing of spans and events and choose how to store them and when to export them to the host.

The GuestSubscriber type that implements Subscriber keeps an internal TraceState that holds all the needed information.
Whenever a new span is created, entered or exited, a callback on the subscriber is called so that we can handle the functionality. The same happens with the events also.

Each time a new span or event is added to the internal state, we check whether the buffer got full and send them to the host to process.

Host

When the host detects a VM exit from the guest, it checks whether it contains tracing information in the OutB instruction.
When tracing information is found, the host starts going through it and check against the local storage of spans.

  • If the spans have previously been created, just update the end timestamp (if present) and add new events (if any).
  • If they haven't been created, create and store them.

The spans parents are set based on the information got from the host.

TODO

  • The current issue is with the guest calls that end up calling back into the host.
    These do not correctly set the parents of the spans created in the host to the last one created in the guest before doing the VM exit
    I need to find a way to propagate the context into the guest and back whenever it is needed. But using the Opentelemetry propagators needs std support which we do not have in the guest.
  • There is a corner case with calculating the timestamp for the span that is open when a VM exit is done
Jaeger picture of a Guest call that calls back into the host image

dblnz added 13 commits August 31, 2025 11:32
- This feature is not used separate from the mem_profile
- All the unwind logic is now gated by mem_profile

Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
- The guest side does not use this type of OutBAction
- The stack unwinding is done either way when the mem_profile feature is enabled

Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
- This helps with keeping code separate and easily gating it out

Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
- This steps cleans up codebase for the new way of tracing guests
- The current method involves custom macros and logic that are not
the best for maintainability

Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
- Define a separate struct that holds the functionality related to
  memory profiling of the guest
…ents

- This helps us have control over the events fields and store what we care

Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
- The `heapless` definitions for spans and events are useful for memory
  footprint control of the guest

Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
- Parse the spans and events coming from the guest and
create corresponding spans and events from the host that
mimics a single call from host

Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
- conditionally handle logs either through tracing or the dedicated VM exit
  based on whether tracing is initialized on the guest

Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
@dblnz dblnz added the kind/enhancement For PRs adding features, improving functionality, docs, tests, etc. label Aug 31, 2025
@dblnz dblnz changed the title Tracing improvements Guest tracing improvements to use tracing crate Aug 31, 2025
@dblnz dblnz marked this pull request as draft August 31, 2025 13:48
Copy link
Contributor

@jprendes jprendes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a high level review, and what I've seen looks good :-)

@@ -1094,15 +1095,9 @@ impl Hypervisor for HypervWindowsDriver {
}

#[cfg(feature = "trace_guest")]
fn read_trace_reg(&self, reg: TraceRegister) -> Result<u64> {
fn read_trace_reg(&self) -> Result<X86_64Regs> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be read_regs(&self)?

}
#[cfg(feature = "trace_guest")]
fn trace_info_as_mut(&mut self) -> &mut TraceInfo {
fn trace_info_mut(&mut self) -> &mut TraceInfo {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<rant>
Ah, I never understand this pattern, why isn't trace_info directly a public member?
But we have that all over the codebase, so... it's ok here
</rant>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because in this case, I need to retrieve the TraceInfo through the &dyn Hypervisor which doesn't know about the KvmDriver's member field.

I don't like it either, to be honest, I think this will look a lot better when we unify the Hypervisors into a single struct. A lot of duplicated code will go also

@@ -11,7 +11,6 @@ members = [
"src/hyperlight_host",
"src/hyperlight_guest_capi",
"src/hyperlight_guest_tracing",
"src/hyperlight_guest_tracing_macro",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep it around, but as deprecated, shimming the new API? so that we don't break anyone who was already using this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this is the way to go. But I don't think we'll ever use it again

Comment on lines 117 to 118
#[cfg(feature = "trace_guest")]
104 => Ok(OutBAction::TraceRecord),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't though about this before, but, does this mean that a guest compiled with an older hyperlight_guest doesn't run in a newer hyperlight_host?
We should think about this more carefully, not particularly in this PR, but I think in general we haven't been too diligent on paying attention to that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created #845 to track that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and in this case, Ideally we should keep that 104 value reserved

HyperlightExit::Halt()
| HyperlightExit::IoOut(_, _, _, _)
| HyperlightExit::Mmio(_) => {
// If the result is not a halt, io out, mmio or debug exit, we need to process the trace batch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the comment wrong? the logic wrong? or my understanding wrong?

Suggested change
// If the result is not a halt, io out, mmio or debug exit, we need to process the trace batch
// If the result is a halt, io out, mmio or debug exit, we need to process the trace batch

HyperlightExit::Halt()
| HyperlightExit::IoOut(_, _, _, _)
| HyperlightExit::Mmio(_) => {
// If the result is not a halt, io out, mmio or debug exit, we need to process the trace batch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

HyperlightExit::Halt()
| HyperlightExit::IoOut(_, _, _, _)
| HyperlightExit::Mmio(_) => {
// If the result is not a halt, io out, mmio or debug exit, we need to process the trace batch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement For PRs adding features, improving functionality, docs, tests, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve guest tracing
2 participants