-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Support for SQL Statement Sanitization in OpenTelemetryTraceEventListener
#132
Comments
We should explore this idea! What approach did you have in mind for parsing SQL strings and detecting sensitive values? |
Great, hopefully will free up some time to get a draft RP out.
Going with a lexical analysis approach similar to the existing SqlSanitizer in OpenTelemetry should work well. |
One option might be to have our project directly use SqlStatementSanitizer from the Open Telemetry project. Maintaining our own sanitizer could be costly, so my preference would be to reuse what's already available. I see "incubator" in the package. Usually, that means the API is not stable, and might even be removed. That's something we should consider. If we have to update our code to use a new API, it's not the end of the world. But if the API is removed, I guess we would have to think about forking and maintaining our on copy. At the very least, we would be able to preserve whatever functionality was already there. Maybe the next step is to kick tires on SqlStatementSanitizer? I wonder how capable it is in the current state. I'd want to know if it's ready for use in production systems. |
Some decent tire kicking happening here: https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/57c7cf2ad5f7272c1eb83b9816e652bf832c91d4/instrumentation-api-incubator/src/test/java/io/opentelemetry/instrumentation/api/incubator/semconv/db/SqlStatementSanitizerTest.java#L143 I'll try to find some time to go hard on this. I want to toss in a |
I completely agree that reusing SqlStatementSanitizer would be better than maintaining a new sanitizer. It’s also worth noting that OpenTelemetry’s specification mandates database statements must be sanitized, which is why SqlStatementSanitizer is already widely used across the auto-instrumentation package. This makes it likely that OpenTelemetry will continue to maintain SqlStatementSanitizer in a reusable form.
|
I threw a few cases at this: import io.opentelemetry.instrumentation.api.incubator.semconv.db.SqlStatementInfo;
import io.opentelemetry.instrumentation.api.incubator.semconv.db.SqlStatementSanitizer;
public class SanitizerTest {
public static void main(String[] args) {
SqlStatementSanitizer sanitizer = SqlStatementSanitizer.create(true);
print(sanitizer, "SELECT 1");
String[] cases = new String[]{
"select ? from dual",
"insert into dual values (?)",
"delete from dual",
"update dual set dummy = ?",
"merge tab into dual",
//weird spacing
" select ? from dual where ? = ?",
"select ?from dual where?=?for update",
// string literals
// NOTE: Sanitizer is not recognizing Oracle's wonderful q-tick syntax
"select '?', n'?', q'???', q'{?}', q'{cat's}' from dual",
"select'?',n'?',q'???',q'{?}',q'{cat's}'from dual",
// comments
// NOTE: Sanitizer is not preserving new lines,
"select--line\n? from dual",
"select --line\n? from dual",
"--line\nselect ? from dual",
" --line\nselect ? from dual",
"--line\n select ? from dual",
// problems
// NOTE: Sanitizer is not recognizing PL/SQL procedure call, getOperation() == null
"begin proc4in4out (:x1, :x2, :x3, :x4); end;",
"{CALL tkpjpn01(:pin, :pinout, :pout)}",
"select :NumberBindVar as the_number from dual",
"select {fn locate(bob(carol(),ted(alice,sue)), 'xfy')} from dual",
"CREATE USER vijay6 IDENTIFIED BY \"vjay?\"",
"ALTER SESSION SET TIME",
"SELECT ename FROM emp WHERE hiredate BETWEEN {ts'1980-12-17'} AND {ts '1981-09-28'} ",
};
for (String sql : cases)
print(sanitizer, sql);
print(sanitizer, "CREATE USER \"scott\" IDENTIFIED BY \"tiger\"");
}
static void print(SqlStatementSanitizer sanitizer, String sql) {
SqlStatementInfo info = sanitizer.sanitize(sql);
System.out.printf("%s\n%s\n\t%s\n\t%s\n\n",
sql, info.getFullStatement(), info.getOperation(), info.getMainIdentifier());
}
} Here's what I noted:
I'm also noting that a IMO, we should be adhering to the specifications, which means we should be sanitizing SQL and conforming to the semantic conventions. I'll give the rest of my team some time to weigh in on this. |
Yes seems that double-quoted strings aren't detected by default, but {DOUBLE_QUOTED_STR} {
if (dialect == SqlDialect.COUCHBASE) {
builder.append('?');
} else {
if (!insideComment && !extractionDone) {
extractionDone = operation.handleIdentifier();
}
appendCurrentFragment();
}
if (isOverLimit()) return YYEOF;
} We can probably use this dialect until we add proper support for
On the topic of adhering to the specifications would folks be open to the idea of apply the full set of semantic standards defined by otel DB query execute call spans ? #134
Sounds good, thanks again for the quick follow ups here! 🚀 |
Hi @agent-adam, you've got a valid point that we should sanitize the SQL before it gets published in the span. We would love to get your contribution. Have you reviewed the Oracle Contributor Agreement (OCA)? You'll see that it will take you to an Oracle website where you can fill the agreement. If we can use the OT sanitizer in production, then we should go with that. |
Sounds good, thanks folks. |
Hi @agent-adam, we discussed this more internally and concluded that because we don't include the SQL text by default (SENSITIVE mode needs to be enabled) we follow the OpenTelemetry requirement. Furthermore, doing sanitization on the client will be challenging as it requires parsing SQL. What do you think? |
Yes, to your point, sanitization is only required if the SQL statement is going to be added to the spans by default. OpenTelemetry’s DB semantic conventions emphasize this. |
Description:
Currently, the
OpenTelemetryTraceEventListener
enriches spans with valuable metadata such as SQL operation, connection ID, and database user. However, the addition of sensitive information, such as Original SQL Text and Actual SQL Text (enabled by theOPEN_TELEMETRY_TRACE_EVENT_LISTENER_SENSITIVE_ENABLED
flag), raises concerns about exposing sensitive data in trace spans.Proposed Feature:
I propose adding support for SQL statement sanitization in the
OpenTelemetryTraceEventListener
. This feature would allow users to include SQL statements in spans while masking sensitive information, such as literal values, without compromising visibility into database interactions.Sanitization Capability:
Example:
I am happy to contribute to this feature and provide an initial implementation if the maintainers agree on its inclusion. Please let me know your thoughts!
The text was updated successfully, but these errors were encountered: