
Logging Instrumentation | Context & Prompt Logging Infra For Enhanced Understanding of Context Composition #196

Open
jkbrooks opened this issue Jan 7, 2025 · 38 comments
@jkbrooks

jkbrooks commented Jan 7, 2025

As the RSP team, we want deeper visibility into context construction via providers, so that we can understand how key details (Recent Messages, User Context, Relevant Facts) are assembled, for debugging and context-construction optimization.

Acceptance Criteria:

  • Ensure data is parsable and structured.
  • Provide a way to log events and stream them somewhere.
  • Log prompts sent to LLMs.
  • Allow piping the output to other sinks (Datadog, etc.) for data analysis.
  • Generate detailed logs for each step of the context composition process.
  • Include all relevant data in the logs: provider outputs, intermediate results, and final state.
  • Store logs where agents and team members can access them easily (JSON, a log file, or a database).
  • The target user is someone studying prompts and LLM output for prompt engineering and A/B prompt testing.

We want to log and review providers like this one as they relate to constructing context:

const comprehensiveProvider: Provider = {
    get: async (runtime: IAgentRuntime, message: Memory, state?: State) => {
        try {
            // Get recent messages
            const messages = await runtime.messageManager.getMemories({
                roomId: message.roomId,
                count: 5,
            });

            // Get user context
            const userContext = await runtime.descriptionManager.getMemories({
                roomId: message.roomId,
                userId: message.userId,
            });

            // Get relevant facts
            const facts = await runtime.messageManager.getMemories({
                roomId: message.roomId,
                tableName: "facts",
                count: 3,
            });

            // Format comprehensive context with one labeled section per source
            return `
Recent Messages:
${messages.map((m) => `- ${m.content.text}`).join("\n")}

User Context:
${userContext.map((c) => c.content.text).join("\n")}

Relevant Facts:
${facts.map((f) => `- ${f.content.text}`).join("\n")}
            `.trim();
        } catch (error) {
            console.error("Provider error:", error);
            return "Context temporarily unavailable";
        }
    },
};
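The acceptance criteria above ask for parsable, structured events that can be piped to Datadog or a file. A minimal sketch of what that could look like, assuming hypothetical names (`TraceEvent`, `logTraceEvent`) that are not part of the Eliza API:

```typescript
// Sketch only: TraceEvent and logTraceEvent are hypothetical names,
// not part of the Eliza API.
interface TraceEvent {
    runId: string;
    event: string;                 // e.g. "context_composed", "prompt_sent"
    timestamp: string;             // ISO-8601 so logs sort and parse easily
    data: Record<string, unknown>;
}

// Emit one JSON object per line; stdout can then be piped to Datadog,
// a file, or a database writer without changing any call sites.
function logTraceEvent(
    runId: string,
    event: string,
    data: Record<string, unknown>,
): TraceEvent {
    const traceEntry: TraceEvent = {
        runId,
        event,
        timestamp: new Date().toISOString(),
        data,
    };
    console.log(JSON.stringify(traceEntry));
    return traceEntry;
}

// Example: record what the provider assembled before returning it.
const entry = logTraceEvent("run-42", "context_composed", {
    recentMessageCount: 5,
    factCount: 3,
});
```

One JSON object per line keeps the stream trivially parsable by downstream tools, which is the main requirement in the criteria above.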

@jkbrooks

jkbrooks commented Jan 7, 2025

I care substantially about where these logs are stored and where they can be accessed, and I am extremely interested in agents being able to access all logs.

@ArsalonAI ArsalonAI changed the title Provider Context Logging For Enhanced Understanding of Provider Context Composition Logging Instrumentation | Provider Context Logging For Enhanced Understanding of Provider Context Composition Jan 9, 2025
@jzvikart

jzvikart commented Jan 9, 2025

@monilpat Which branch should this target?

@ArsalonAI ArsalonAI changed the title Logging Instrumentation | Provider Context Logging For Enhanced Understanding of Provider Context Composition Logging Instrumentation | Context & Prompt Logging Infra For Enhanced Understanding of Context Composition Jan 9, 2025
@ArsalonAI

> @monilpat What branch this should target?

Each issue should get its own new branch; after it's been completed, you make a pull request to the next env (dev ENV?), and after that, to the main branch.

@monilpat what are your thoughts? ^^

@monilpat

monilpat commented Jan 10, 2025 via email

@ArsalonAI

@monilpat we discussed logging and some thoughts came up.

  1. Where to store them: Pub/Sub vs. Postgres - @jzvikart recommends we use Postgres
  2. Add to RAG system - allow the agent to access these logs using RAG so that it can get some information about its own logs

More to be discussed.

@ArsalonAI

ArsalonAI commented Jan 14, 2025

Adding notes from our call on 1/13/25 w/ Jure and Monil
@monilpat @jzvikart
@jkbrooks adding you here for visibility

  • Need to be able to query effectively
  • Need to "define a run" and the different logs that make up a run
  • Need to store data and query it; we will need a DB even in the simplest case.
    We have the Eliza logger; we can write conditional logic to write to a defined DB, including what is being logged (.error, .debug, etc.)

Define the DB architecture

  • Jure suggests a simple Postgres instance. Eliza's internal logging is for users; what we want is instrumentation. It would be present but inactive by default unless env variables are set. We need to decide what a run is: an instance of an agent, one instance of memory/context.
  • Monil is thinking of it in a more relational way:
  • a run table with a 1-to-n relationship to events (easier to query)
  • a run has an ID, an agent, and the specific action being done (its own table); in the future, more info like swarm_id.
    For a run, we can then grab whatever events we need.
  • a scenario table
  • each event row includes an event type
  • and the output we are logging
  • Monil thinks it should live in the same infra as Eliza - no reason to seclude it; it is very easy to add a table and use the underlying methods they have to query it. He recommends creating it on the postgres adapter to start.
  • Jure thinks what Eliza uses internally has nothing to do with this - how much sense does it make to use the same DB?
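The relational layout sketched in these notes (a run table with a 1-to-n event table) could look something like the DDL below. Table and column names are assumptions for illustration, not a decided schema:

```typescript
// Hypothetical DDL matching the notes above: a runs table, and a
// run_events table with a 1-to-n relationship back to it.
const createRunsTable = `
  CREATE TABLE IF NOT EXISTS runs (
      id         UUID PRIMARY KEY,
      agent_id   TEXT NOT NULL,
      action     TEXT,                 -- the specific action being done
      started_at TIMESTAMPTZ NOT NULL DEFAULT now()
  )`;

const createRunEventsTable = `
  CREATE TABLE IF NOT EXISTS run_events (
      id        BIGSERIAL PRIMARY KEY,
      run_id    UUID NOT NULL REFERENCES runs (id),
      event     TEXT NOT NULL,         -- type of event, e.g. "prompt_sent"
      output    JSONB,                 -- the output we are logging
      logged_at TIMESTAMPTZ NOT NULL DEFAULT now()
  )`;

// Fetching everything for one run is then a single query:
const selectRunEvents = `
  SELECT event, output, logged_at
  FROM run_events
  WHERE run_id = $1
  ORDER BY id`;
```

The serial `id` on `run_events` doubles as execution order within a run, which keeps the "show me this run's events in order" query trivial.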

Run (Definition)

  • An instance of an agent can have many runs, doing multiple different workflows and actions.
    Ex.: an agent reviewing a PR -> query current state, generate a dynamic template of the PR info to review, make the LLM call, get the response, parse the response, call the associated GitHub API to create a review comment, return success/failure. That sequence is a single run (one iteration through the loop).

  • A unit of execution: start pulling data from various sources, make 1 or more LLM queries, return a result -> the steps are generic, and there is almost always a structured flow.

  • Categories and info to log every single time (this makes up a run):

  • state before doing anything

  • interpolated prompt

  • output of the LLM

  • any action taken and the output of that action

Query and UI

  • Want to query: for a run, what did the LLM say?
  • For a run, what did we pass into it?
    This makes a useful input/output pair for a prompt engineer.
  • How do we query? A non-technical user will not use SQL - how do we visualize?
  • Prompt and LLM output for the prompt engineer
  • A small script that gets this info from the DB.
    A non-technical person will use our UI (the agent chat UI), with some ability to review the logs and click around: for this agent, show the latest run for action X.
  • Could create a tiny app - enter a run_id into a text box, submit, and it prints all records and events for the run in execution order.
  • Can edit the prompt in the character file, see the input and output, and update the character file template.
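The "tiny app" idea above reduces to one lookup: given a run_id, return that run's events in execution order. A minimal in-memory sketch (the `RunEvent` shape and `eventsForRun` name are illustrative assumptions):

```typescript
// Minimal version of the "tiny app" idea: given a run_id, return that
// run's events in execution order, ready to print.
interface RunEvent {
    runId: string;
    seq: number;       // execution order within the run
    event: string;
    output: string;
}

function eventsForRun(events: RunEvent[], runId: string): RunEvent[] {
    return events
        .filter((e) => e.runId === runId)
        .sort((a, b) => a.seq - b.seq);
}

// Example data: two interleaved runs.
const log: RunEvent[] = [
    { runId: "r1", seq: 2, event: "llm_response", output: "draft reply" },
    { runId: "r2", seq: 1, event: "state_snapshot", output: "{}" },
    { runId: "r1", seq: 1, event: "interpolated_prompt", output: "Hello..." },
];

const ordered = eventsForRun(log, "r1");
for (const e of ordered) {
    console.log(`${e.seq}. ${e.event}: ${e.output}`);
}
```

In the real tool the array would come from a DB query, but the filter-then-order contract is the same.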

@jzvikart

jzvikart commented Jan 16, 2025

After some initial research, the approach that @monilpat suggested appears to have several drawbacks. In particular, if we use logging with the existing PostgreSQL adapter:

  1. Anybody who wants to use logging is forced to use PostgreSQL for everything else too (memories etc.). This introduces a new coupling constraint that in many scenarios might not be desirable.
  2. Using the PostgreSQL adapter requires installing 3rd-party extensions (e.g. vector) which are typically not part of distribution packages and generally need to be built from source. This adds complexity to deployment and maintenance, reduces the efficiency of the development process, and limits the availability of easy-to-use options such as Docker containers or hosted database instances.
  3. The storage/processing requirements of logging might bloat the requirements for the Eliza database itself, resulting in lower operational performance and higher system requirements that could otherwise be avoided.
  4. Taken together, the above means that this ticket needs a prior decision and planning for the DB hosting infrastructure, plus a separate ticket for deploying it. In that case it would be best to start with the PostgreSQL deployment, because we will already need it during development for this ticket, and having one ready would avoid duplicate work. As already suggested, at that point we can still decide to decouple the databases and use a separate instance for logging (without the need for extensions), or write logs to a text file.

@ArsalonAmini2024 @jkbrooks

@ArsalonAI

@jzvikart @monilpat @jkbrooks I created this ADR for the feature - we may have jumped in quickly and skipped this technical scoping step. Let's fill it out, take a step back, and ensure we're all on the same page with the ADR (architectural decision record): https://docs.google.com/document/d/11CB3FyorvSxPxqbO4P35wTNuHJ-EDD2rKBEUiyO-ngc/edit?usp=sharing

@jzvikart

jzvikart commented Jan 20, 2025

What is the scenario that we want to instrument?
Steps to reproduce? (command to start, env settings, character files, prompts etc.)

@jzvikart

jzvikart commented Jan 21, 2025

Implementation is now working, the recommended next steps are:

  • pair up with the person who's going to be analyzing the data
  • decide on a particular scenario (see above) and set up a testing system
  • start analyzing the data and add/refine tracepoints to capture information that is needed
  • develop tools for analyzing and processing the trace data

Collection and refinement of trace data should be done selectively and iteratively. Capturing and analyzing "everything" is not realistic.

To kick this off I recommend doing a demo, or a pair coding session.

@ArsalonAI

@jzvikart thanks for the update. A few PM comments -

  1. I don't see a pull request for this feature. Is this PR in review now?

  2. Can you comment on the solution approach or attach a Loom video so that our QA team can understand it and begin developing manual testing scenarios?

  • Did you utilize adapter-postgres and create an additional table in the DB?
  • Is this live in the test ENV running on an instance of PROSPER?
  • Is there a simple UI client (did you create a new React client UI to visualize the run data - a simple table or a filter by run ID)?

@monilpat can you review this PR and give feedback?

@jzvikart

> 1. I don't see a pull request for this feature. Is this PR in review now?

I did not create a PR yet since it would make sense to answer some of the questions first.

> 2. Can you comment on the solution approach or attach a Loom video so that our QA team can understand / begin manual testing scenario development

I cannot make a video, but I am happy to pair up and discuss whatever the person who will use this wants to know. Testing/QA does not make sense here.

> * did you utilize the adapter-postgres and create additional table in the DB?

Yes, as Monil suggested, although it is now clear that it would be better to keep the trace data in a separate DB instance. We can still change that, though.

> * is this live in the test ENV running on an instance of PROSPER?

No, this is currently only on my development machine, due to the reasons and limitations I mentioned, and we should make a plan for how/where to deploy it. See above.

> * is there a simple UI client (did you create a new React client UI to visualize the run data, a simple table or a filter by RUN ID?)

No, the current interface is SQL; any additional tools need to be discussed and developed.

@jzvikart

One more thing: building and running is still failing non-deterministically. I've tried 3 different branches and verified that the problem exists in versions prior to my changes. We should address this. I've been in contact with Caner, but so far there is no known cause or fix.

@monilpat

Hey, thanks so much for flagging this. In terms of the build issues: the V2 separation into community plugins (a separate repository) is going to solve this. Note that with the way it currently works, you will need to run the build multiple times for it to succeed, and if it still fails, you'll need to comment out the plugins that are to blame. We need to address this, but as long as your plugin is being built, you are not blocked by it - if you read the logs, you can see whether your plugin has been built or not.

@monilpat

> 1. I don't see a pull request for this feature. Is this PR in review now?
>
> I did not create a PR yet since it would make sense to answer some of the questions first.
>
> 2. Can you comment on the solution approach or attach a Loom video so that our QA team can understand / begin manual testing scenario development
>
> I cannot make a video, but I am happy to pair up and discuss whatever the person who will use this wants to know. Testing/QA does not make sense here.
>
> * did you utilize the adapter-postgres and create additional table in the DB?
>
> Yes, as Monil suggested, although it is now clear that it would be better to separate the trace data in a separate DB instance. We can still change that though.
>
> * is this live in the test ENV running on an instance of PROSPER?
>
> No, this is currently only on development machine due to reasons and limitations that I mentioned, and we should make a plan how/where to deploy it. See above.
>
> * is there a simple UI client (did you create a new React client UI to visualize the run data, a simple table or a filter by RUN ID?)
>
> No, the current interface is SQL, any additional tools need to be discussed and developed.

Yeah, that makes sense regarding the PR note - we prefer to have draft PRs when possible for review. I think Arsalon would probably be the best person to pair up with at this point, and I'm happy to hop in as needed. Separating the trace DB can be a fast follow if it makes sense, but right now getting something working is very important. Arsalon and I were also talking about a simple UI as part of the Eliza chat UI: for a conversation it shows the runs, and when you click on a run (or select it from a drop-down), it shows you all the associated logs.

@jzvikart

@monilpat Thanks for explanation, that's exactly what I've been doing. If it's a known issue that's being worked on that's enough for me.

@jzvikart

> Yeah, that makes sense regarding the PR note. We preferred to have draft PR's when possible for review yeah I think our arsalon would probably be the best person to pair up with at this point. And I'm happy to hop in as needed yeah that can be a fast fallout to separate it as it makes sense but right now getting something working is very important. Yeah, I think Ars and I were talking about a simple UI that is part of the Eliza chat UI that for a conversation shows the runs and then when you click on the run or select the run from a drop-down, it will show you all the associated logs

OK, I'll create a draft PR so that we can continue the discussion there. As for tools - everything is possible, but we need to decide on the right approach first, considering the tradeoffs and the skills of the person who will be doing this. I think more than a UI with dropdowns we will need data analysis tools, scripting, etc. And if we do build a UI, it should definitely be separate from the Eliza main UI.

@monilpat

monilpat commented Jan 21, 2025 via email

@jzvikart

#275

@TimKozak

@jzvikart @monilpat (tagging you on behalf of Ars) - how's everything going with this feature?

@jzvikart

jzvikart commented Jan 23, 2025

@TimKozak Tracing framework is implemented and works. To meaningfully continue this ticket, we would need a "customer" who will be analysing the data/prompts and one or more use cases. When we know who the "customer" is I can provide engineering support and everything that's needed. See my comments above.

@jzvikart

Next steps that we discussed so far:

  • Record a video
  • Trace a random scenario with unknown tracing criteria

@monilpat

monilpat commented Jan 23, 2025 via email

@monilpat

monilpat commented Jan 23, 2025 via email

@jzvikart

Note to self:

  1. Implement instrumentation according to Monil's suggestions as a baseline.
  2. Take a simple scenario such as Coinbase "create charge" (take something from the examples), or a simple generic character (Trump).
  3. Capture traces.
  4. Make some screenshots or a video, or export the data for review, analysis, and further discussion.

This would wrap up this ticket.

@ArsalonAI

ArsalonAI commented Jan 24, 2025

@jzvikart the above sounds good. The ultimate use case I want is this.

Have an API endpoint from which I can GET the traces for runs (with some pagination and optional query params like run ID, agent name, date range).

The endpoint will be consumed by a frontend (Swagger UI is fine) and displayed. If we don't have Swagger implemented in the codebase, please add the config and host it on a non-local (public) URL we can all access.

With the Swagger UI I can send in params like agent name, date range, etc. and get back the runs as a response.

I can then look into the response for various details like what the prompt was, the action, etc.

It will also help me to SEE and understand the implementation so I can give feedback on additional things we want to include.

If you can get the following done, we can consider this completed:

  1. Implement the logic to log the various components of a run in a well-defined table in some DB (your choice whether it is a separate DB instance or the same instance as the other info for an instantiated agent running on the server).
  2. Write tests to confirm the function/class/logic returns what we want and appends to the DB appropriately.
  3. Expose this via an API endpoint so we can query it on Swagger UI.
  4. Add it to the Swagger documentation so I can play around with it.

I think if I have a Swagger UI where I can play around with this endpoint, that will be good enough here.
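The requested GET endpoint boils down to filtering runs by optional query params plus pagination. A sketch of that core logic, assuming illustrative shapes (`RunSummary`, `TraceQuery`) and string-comparable ISO-8601 timestamps; the HTTP/Swagger wiring would sit on top:

```typescript
// Filtering/pagination a GET /traces handler might apply before
// returning JSON. Shapes and names are assumptions for illustration.
interface RunSummary {
    runId: string;
    agentName: string;
    startedAt: string;   // ISO-8601, so string comparison orders correctly
}

interface TraceQuery {
    runId?: string;
    agentName?: string;
    from?: string;       // inclusive lower bound on startedAt
    to?: string;         // inclusive upper bound on startedAt
    page?: number;       // 1-based
    pageSize?: number;
}

function queryTraces(runs: RunSummary[], q: TraceQuery): RunSummary[] {
    const filtered = runs.filter(
        (r) =>
            (!q.runId || r.runId === q.runId) &&
            (!q.agentName || r.agentName === q.agentName) &&
            (!q.from || r.startedAt >= q.from) &&
            (!q.to || r.startedAt <= q.to),
    );
    const pageSize = q.pageSize ?? 20;
    const page = q.page ?? 1;
    return filtered.slice((page - 1) * pageSize, page * pageSize);
}

// Example: Prosper's runs since Jan 25.
const runs: RunSummary[] = [
    { runId: "r1", agentName: "Prosper", startedAt: "2025-01-20T10:00:00Z" },
    { runId: "r2", agentName: "Prosper", startedAt: "2025-01-28T09:30:00Z" },
    { runId: "r3", agentName: "Other",   startedAt: "2025-01-28T11:00:00Z" },
];
const result = queryTraces(runs, { agentName: "Prosper", from: "2025-01-25" });
```

In production the filtering would happen in SQL rather than in memory, but the parameter semantics the Swagger docs describe would be the same.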

@ArsalonAI

FYI - moving this into two separate tickets:

  1. Implement Swagger UI on a public URL for the Dev ENV, and
  2. Implement a RESTful API (GET) for this data (to display on Swagger UI).

(screenshot attached)

@ArsalonAI

Additional discussion on Docker and vector extensions -

(screenshot attached)

@ArsalonAI

@jzvikart once the PR is merged, I will close this out.

@jzvikart

Blocked - to test traces with coinbase plugin I need the coinbase API keys.

@ArsalonAI

(screenshot attached)

@jzvikart please confirm this unblocks you -

@jzvikart

jzvikart commented Jan 27, 2025

With the Coinbase Commerce key I can only test the Coinbase Commerce part. If we're OK with that, I will use the Commerce key provided by Monil.

@ArsalonAI ArsalonAI assigned VisionOra and unassigned jzvikart Jan 28, 2025
@VisionOra

VisionOra commented Jan 29, 2025

📝 Project To-Do List

✅ Already Completed

  • ✅ A logging system has been implemented (Context composition tracking).
  • ✅ Logs are stored in a PostgreSQL database.

🚧 Work Remaining @ArsalonAmini2024

(Can you please confirm if I am on the right track?)

  • 🔹 REST API to fetch logs data from database.
  • 🔹 Swagger UI for easy log retrieval.
  • 🔹 UI for visualizing logs (React-based simple frontend).
  • 🔹 Production deployment and database optimization.
  • 🔹 Integration with Eliza and RAG system.
  • 🔹 Additional metadata & API improvements.

🚀 Upcoming Enhancements & Next Steps

1️⃣ UI for Viewing Logs

Currently, logs can only be accessed via the API. A simple UI needs to be built that:

  • Displays logs in a dropdown or table format.
  • Supports search and filtering (e.g., fetching logs by Run ID).

2️⃣ Deployment to Production

The system is currently in a development environment. To move it to production:

  • The PostgreSQL database needs final setup and optimization.
  • The logging system must be fine-tuned to prevent performance issues.

3️⃣ Integration with Existing Systems (Eliza, RAG, etc.)

  • The Eliza logging system must be integrated so that agents can access logs.
  • The RAG (Retrieval-Augmented Generation) system needs log access support.

4️⃣ Additional API Improvements

  • More metadata needs to be added to logs (e.g., agent actions, errors, timestamps).
  • Aggregation queries should be optimized for faster log retrieval.

@ArsalonAI

ArsalonAI commented Jan 29, 2025

@VisionOra - Make a Draft PR when you're done or a PR for @monilpat to Review

1️⃣ UI for Viewing Logs

  • We want a public Swagger URL on the Dev ENV so we can play around with the REST API and view the logs/responses
  • We want the logs publicly available on the PROD ENV in the current UI (settings) https://agents.realityspiral.com/

2️⃣ Deployment to Production

  • we want it deployed to here (settings section) - https://agents.realityspiral.com/
  • We currently have one agent running (Prosper), if we can integrate this into his deployment we can begin collecting his logs as well.

3️⃣ Integration with Existing Systems (Eliza, RAG, etc.)

  • We want to log the runs, so this should be largely completed already, since we are logging the context construction and the actions and events associated with an agent run loop.

4️⃣ Additional API Improvements

  • Yes, good.

@monilpat would be a good person to reach out to for more info on this.

@jzvikart

jzvikart commented Feb 1, 2025

We should consider using OpenTelemetry + LGTM stack for this.

@ArsalonAmini2024 @VisionOra @monilpat @jkbrooks

@Imsharad

Imsharad commented Feb 5, 2025

Instrumentation plan

| Stage | Sub-Stage | Event | Key Functions/Files | Data to Capture | Notes |
| --- | --- | --- | --- | --- | --- |
| Initialization | Runtime Boot | session_start | registerAgent() in packages/client-direct/src/index.ts<br>AgentRuntime constructor in packages/core | Character/Agent ID<br>Client (agent) count<br>Config hash | Log at agent startup registration; differentiate cold/warm starts; store hashed config details |
| Observe | Context Hydration | context_loaded | composeState() in packages/client-direct/src/index.ts<br>Context hydration functions in packages/core/src/context.ts<br>Logger call via instrument.ts | Character data loaded<br>Memory count<br>Hydration latency | Log immediately after the context is built; use the dedicated logger (instrument.ts) to track hydration details including cache hit/miss, latency, etc. |
| Observe | Input Reception | message_received | HTTP POST handlers in packages/client-direct/src/index.ts (e.g., /:agentId/message, /:agentId/speak) | Input source<br>Message type<br>First 100 chars of the message text | Detect empty messages and duplicated inputs; log trimmed content |
| Orient | Model Preparation | model_selected | generateMessageResponse() in packages/client-direct/src/index.ts | Model class<br>Context window size<br>Provider info | Log immediately before generating a response; track model fallback and version mismatches |
| Decide | Response Generation | generation_started | composeContext() and generateMessageResponse() in packages/client-direct/src/index.ts | Prompt token count<br>Template/version used<br>Attached knowledge count | Log before beginning LLM generation; monitor prompt size and template integrity |
| Decide | Safety Checking | safety_evaluated | runtime.processActions() in packages/client-direct/src/index.ts<br>Action evaluators in packages/core/src/actions.ts | Flagged content types<br>Override flags<br>Evaluator latency | Log the outcome of safety checks without including sensitive details; use this to monitor potential false positives |
| Act | Response Delivery | response_sent | Response handler in packages/client-direct/src/index.ts | Output token count<br>Delivery channel<br>Response latency | Log immediately before delivering the output to the client; ensure multi-channel delivery consistency and error monitoring |
| Act | Action Execution | action_triggered | Action execution via processActions() in packages/client-direct/src/index.ts<br>Action handlers in packages/core/src/actions.ts | Action type<br>Execution outcome<br>Handler latency | Log custom action executions; avoid capturing extensive payload details; track execution anomalies |
| Learn | Memory Formation | memory_persisted | createMemory() calls in packages/client-direct/src/index.ts<br>MemoryManager functions in packages/core | Memory ID<br>Creation timestamp<br>Embedding dimensions<br>Storage type | Log every persisted memory; use these logs for monitoring storage failures or potential memory leaks |
| Learn | Feedback Integration | feedback_received | Planned telemetry handlers in packages/core/src/telemetry.ts (or within a dedicated feedback module) | Feedback type<br>Response time delta<br>User rating | Ensure user feedback is anonymized; correlate these logs with output quality and overall response time |

@ArsalonAI

ArsalonAI commented Feb 19, 2025

@snobbee I reassigned this to you.

On the standup today, @monilpat mentioned composeState, composeContext, and generateObject (3 methods):

Monil Patel: 12:23
If you see the wrapper class for composeContext, composeState, and generateObject - that covers the three areas we want to log. The only thing you have to add, in each of the plugins and/or clients, is the result of the output.

Monil Patel: 13:06
All it is is a wrapper class around those three methods, called everywhere. We then get the instrumentation for free: in each plugin/client, before you return whatever the output of the API is, just throw it in and call it.


Also adding Jazear's video here for reference, to ensure we are covering all the points - https://drive.google.com/drive/u/2/folders/1Og7qFAZuQEzAWPamdwPiyiBiCINNsjmC
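The wrapper idea Monil describes can be sketched as a generic higher-order function: wrap a method once, and every call site gets input/output logging for free. `withTracing` and the stand-in `composeContext` below are illustrative, not the real Eliza signatures (which are async):

```typescript
// Hedged sketch of the wrapper-class idea: wrap a function such as
// composeContext so every call logs its input and output, while call
// sites stay unchanged.
type Logger = (event: string, payload: unknown) => void;

function withTracing<A extends unknown[], R>(
    name: string,
    fn: (...args: A) => R,
    log: Logger,
): (...args: A) => R {
    return (...args: A): R => {
        log(`${name}:input`, args);
        const result = fn(...args);
        log(`${name}:output`, result);
        return result;
    };
}

// Stand-in for composeContext: interpolate {{keys}} from a state object.
const composeContext = (template: string, state: Record<string, string>) =>
    template.replace(/{{(\w+)}}/g, (_m: string, k: string) => state[k] ?? "");

const events: Array<[string, unknown]> = [];
const traced = withTracing("composeContext", composeContext, (e, p) =>
    events.push([e, p]),
);
const out = traced("Hello {{name}}", { name: "Prosper" });
// out is "Hello Prosper"; events holds one input and one output entry
```

Because the wrapper preserves the wrapped function's signature, swapping `traced` in for the original is a one-line change per call site, which is what makes the instrumentation effectively free.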

@snobbee

snobbee commented Mar 13, 2025

@ArsalonAI @jkbrooks @monilpat the wrapper functions (compose state, context, and trace results) were implemented as part of ticket #384 and PR Sifchain/realityspiral#65

8 participants