Update troubleshooting.md #103

Merged
merged 4 commits into from
Mar 13, 2025
50 changes: 49 additions & 1 deletion docs/_edot-sdks/java/troubleshooting.md
@@ -7,4 +7,52 @@ parent: EDOT Java

# Troubleshooting the EDOT Java Agent

TODO
Work through the sections below in order, unless you have already identified the section you need.

This guide assumes you have tested the other components on the route from the application and agent to Elastic Observability (for example, the collector or APM server, Elasticsearch, and Kibana) and that the problem has been isolated to the application and agent.

## General

Ensure you have set a service name, for example with `-Dotel.service.name=Service1` or by setting the environment variable `OTEL_SERVICE_NAME` to `Service1`. Otherwise, the data (traces, metrics, logs) is sent under the default service name `unknown_service_java`; you may be receiving data, but all of it may be listed under that service.
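
As a rough illustration of the check described above, the following sketch reports which service name a given launch would use. The function name `effective_service_name` is hypothetical, and the sketch assumes the usual OpenTelemetry Java precedence in which a system property overrides the environment variable. It only looks at these two sources, so other ways of setting the name (for example resource attributes or a properties file) are not covered.

```shell
# Hypothetical helper: print the service name a launch would use.
# Pass the JVM arguments you intend to use as "$@".
# Assumption: -Dotel.service.name takes priority over OTEL_SERVICE_NAME.
effective_service_name() {
  for arg in "$@"; do
    case "$arg" in
      -Dotel.service.name=*)
        # Strip the property prefix and print the value.
        echo "${arg#-Dotel.service.name=}"
        return 0
        ;;
    esac
  done
  if [ -n "${OTEL_SERVICE_NAME:-}" ]; then
    echo "$OTEL_SERVICE_NAME"
    return 0
  fi
  # Neither source is set: data ends up under the default name.
  echo "unknown_service_java"
}
```

For example, `effective_service_name -Dotel.service.name=Service1 -jar app.jar` prints `Service1`. If it prints `unknown_service_java`, look for your data under that service.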
Member

[for later improvement] here when we have the configuration part ready, I think it would be simpler to link to it for how to set the config option, we only need to explain the "why we need it".


## Connectivity to endpoint

Check _from_ the host, VM, pod, container, or image running the app that the APM server or collector is reachable. The examples here use default URLs, which you should replace with the endpoint you are using:

- OpenTelemetry or EDOT collector without authentication: `curl -i http://127.0.0.1:4318/v1/traces -X POST -d '{}' -H content-type:application/json`
- APM server without authentication: `curl --verbose -X GET http://127.0.0.1:8200`
- APM server with secret token authentication: `curl -X POST http://127.0.0.1:8200/ -H "Authorization: Bearer <secret_token>"`
- APM server with API key authentication: `curl -X POST http://127.0.0.1:8200/ -H "Authorization: ApiKey <api_key>"`

The collector should produce output similar to:
```
{"partialSuccess":{}}
```

The APM server should produce output similar to:
```
{
"build_date": "2021-12-18T19:59:06Z",
"build_sha": "24fe620eeff5a19e2133c940c7e5ce1ceddb1445",
"publish_ready": true,
"version": "8.17.3"
}
```
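
When the response body is not enough to tell what went wrong, it can help to look at the HTTP status code alone. The sketch below wraps the collector check in a function and maps the status to a hint. The function names `describe_status` and `check_collector` and the hint wording are illustrative assumptions, not part of any Elastic or OpenTelemetry tooling; the default URL is the same placeholder used in the examples above.

```shell
# Hypothetical helper: turn an HTTP status code into a troubleshooting hint.
# curl reports 000 when it received no HTTP response at all.
describe_status() {
  case "$1" in
    000) echo "no HTTP response: check DNS, firewall, proxy, and port" ;;
    2??) echo "reachable" ;;
    401|403) echo "reachable, but authentication was rejected" ;;
    *) echo "reachable, but returned HTTP $1" ;;
  esac
}

# POST an empty JSON payload to the collector and report the outcome.
# Replace the default URL with the endpoint you are using.
check_collector() {
  url="${1:-http://127.0.0.1:4318/v1/traces}"
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
    -X POST -d '{}' -H 'content-type: application/json' "$url") || true
  echo "$url -> $(describe_status "$code")"
}
```

Run `check_collector` from the same host or container as the application, so that the result reflects the network path the agent uses.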


## Is it the agent?

Determine whether the issue is related to the agent by:

1. Starting the application without the agent and checking that the issue is absent, then restarting with the agent and checking that the issue returns
Member

[minor] we can advise to disable the agent through configuration or env variable as well.

Contributor Author

yeah, but I've seen people think they have but haven't. Better to advise to remove it I think

Member

Good point, this is also something that can be easily verified on a running JVM by doing a `ps -ef | grep javaagent`, unlike setting an environment variable.

2. Checking end-to-end connectivity without the agent by running one or more of the example apps in https://github.com/elastic/elastic-otel-java/blob/main/examples/troubleshooting/README.md . These use the OpenTelemetry SDK rather than auto-instrumentation (that is, no agent is present) and create traces, metrics, and logs, so they either confirm that the issue is specific to the agent or show that it lies elsewhere
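
To confirm that the agent really has been removed for step 1, the `ps` check mentioned in the review discussion can be wrapped as below. This is a minimal sketch: the function name `find_javaagent_processes` is hypothetical, and it only sees agents passed on the command line. An agent added through an environment variable such as `JAVA_TOOL_OPTIONS` would not appear in `ps` output.

```shell
# Hypothetical helper: print process lines that carry a -javaagent flag.
# Reads `ps` output on stdin. The [-] bracket keeps the pattern from
# matching the grep process's own command line.
find_javaagent_processes() {
  grep -e '[-]javaagent:' || true
}

# Usage:
#   ps -ef | find_javaagent_processes
```

No output suggests that no running JVM was started with a `-javaagent` argument.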

## Agent DEBUG

Debug output is enabled with `-Dotel.javaagent.debug=true` or by setting the environment variable `OTEL_JAVAAGENT_DEBUG` to `true`.

Once debug is enabled, look for:
- Errors and exceptions
- The expected traces or metrics, or their absence (the [technology may not be instrumented](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/docs/supported-libraries.md))
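
Debug output can be long, so a first pass with a filter may help. The sketch below assumes the debug output was captured to a file from the process's standard streams; the function name `scan_debug_log`, the file name `agent-debug.log`, and the jar paths are placeholders chosen for illustration.

```shell
# Hypothetical helper: print numbered lines from a debug log (stdin)
# that mention errors or exceptions, ignoring case.
scan_debug_log() {
  grep -n -i -E 'error|exception' || true
}

# Usage (paths are placeholders):
#   java -javaagent:/path/to/agent.jar -Dotel.javaagent.debug=true \
#     -jar app.jar 2>&1 | tee agent-debug.log
#   scan_debug_log < agent-debug.log
```

A simple text match like this will also pick up harmless lines that merely contain the word "error", so treat the result as a starting point for reading the full log rather than a verdict.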