Skip to content

Conversation

dimpavloff
Copy link

@dimpavloff dimpavloff commented Jul 7, 2025

Part one for grpc/proposal#492 (A97).
This is done in a new credentials/jwt package to provide file-based PerRPCCallCredentials. It can be used beyond XDS. The package handles token reloading, caching, and validation as per A97 .

There will be a separate PR which uses it in xds/bootstrap.

Whilst implementing the above, I considered credentials/oauth and credentials/xds packages instead of creating a new one. The former package has NewJWTAccessFromKey and jwtAccess which seem very relevant at first. However, I think the jwtAccess behaviour seems more tailored towards Google services. Also, the refresh, caching, and error behaviour for A97 is quite different than what's already there and therefore a separate implementation would have still made sense.
WRT credentials/xds, it could have been extended to both handle transport and call credentials. However, this is a bit at odds with A97 which says that the implementation should be non-XDS specific and, from reading between the lines, usable beyond XDS.
I think the current approach makes review easier but because of the similarities with the other two packages, it is a bit confusing to navigate. Please let me know whether the structure should change.

Relates to istio/istio#53532

RELEASE NOTES:

  • credentials: add credentials/jwt package providing file-based JWT PerRPCCredentials (A97)

Copy link

codecov bot commented Jul 7, 2025

Codecov Report

❌ Patch coverage is 64.58333% with 51 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.65%. Comparing base (dd718e4) to head (75fbc02).
⚠️ Report is 67 commits behind head on master.

Files with missing lines Patch % Lines
credentials/jwt/jwt_token_file_call_creds.go 66.66% 19 Missing and 13 partials ⚠️
credentials/jwt/jwt_file_reader.go 60.41% 10 Missing and 9 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8431      +/-   ##
==========================================
- Coverage   82.27%   80.65%   -1.62%     
==========================================
  Files         414      415       +1     
  Lines       40424    40969     +545     
==========================================
- Hits        33259    33044     -215     
- Misses       5795     6244     +449     
- Partials     1370     1681     +311     
Files with missing lines Coverage Δ
credentials/jwt/jwt_file_reader.go 58.00% <60.41%> (ø)
credentials/jwt/jwt_token_file_call_creds.go 66.66% <66.66%> (ø)

... and 261 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@dimpavloff dimpavloff changed the title xds: implement file-based JWT authentication (A97) xds: implement file-based JWT Call Credentials (A97) Jul 7, 2025
@dimpavloff
Copy link
Author

@dfawley hey 👋 Given you approved A97, would you mind having a cursory look at the PR to confirm if at least at a high level the approach looks good?

@eshitachandwani
Copy link
Member

I will take a look at this , I need to go through the gRFC first.

@dfawley dfawley self-assigned this Jul 22, 2025
@dfawley dfawley requested review from easwars and eshitachandwani and removed request for dfawley July 25, 2025 20:39
@dfawley dfawley assigned easwars and unassigned dfawley Jul 25, 2025
@dfawley
Copy link
Member

dfawley commented Jul 25, 2025

Sorry for the delay here.

@easwars would you be able to review this change? I think you have more background into some of the things than I do, like the bootstrap integration. Thank you!

@easwars
Copy link
Contributor

easwars commented Jul 28, 2025

Thank you for your contribution @dimpavloff. Yes, it would be nice if you can split this into smaller PRs. I will continue to use this PR to review the JWT call credentials implementation. If you can move the xDS implementation out to one or more PRs, I would greatly appreciate that and would be happy to review them as well.


// Verify cached expiration is 30 seconds before actual token expiration
impl := creds.(*jwtTokenFileCallCreds)
impl.mu.RLock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please only test the API surface. Relying on implementation internals in tests makes them brittle and would result in test changes when any changes to implementation is made.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you are referring to using a private field rather than obtaining mu specifically.
In general I agree -- white box tests may get fragile and break during a refactor. However, this test and the next couple of ones are about the caching behaviour -- it is meant to be transparent to the external API. If I don't make assertions about the private fields, the tests may pass trivially and become more flaky (e.g. when testing the backoff in the next test).
One alternative could be factoring out these behaviours out into a separate private struct with "public" functions which expose the same information. Given that it would require shifting the majority of the implementation into that struct, I'm not sure it is an improvement from the current approach.
Please do let me know your thoughts and if you have other suggestions.

@easwars easwars assigned dimpavloff and unassigned easwars Jul 28, 2025
@dimpavloff
Copy link
Author

@arjan-bal ping

}

// jWTFileReader handles reading and parsing JWT tokens from files.
type jWTFileReader struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: This struct should be named jwtFileReader. See https://google.github.io/styleguide/go/decisions#initialisms

func (r *jWTFileReader) extractExpiration(token string) (time.Time, error) {
parts := strings.Split(token, ".")
if len(parts) != 3 {
return time.Time{}, fmt.Errorf("expected 3 parts, got %d: %w", len(parts), errJWTValidation)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method seems to be wrapping errJWTValidation in every error it returns. I think we should move the wrapping to the calling function to avoid redundancy.

If there are reasons to perform the wrapping in this method, we should mention that the cases in which the returned error is an instance of errJWTValidation in the godoc.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point 👍

}

// extractExpiration parses the JWT token to extract the expiration time.
func (r *jWTFileReader) extractExpiration(token string) (time.Time, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JWT parsing seems like a fairly complex task. Even though we aren't verifying the token, my opinion is that we should use a popular library like golang-jwt. I would like to get others' opinions on this.

@dfawley @gtcooke94 @matthewstevenson88

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I scanned through that library. Looks like it has a lot of functionality. In the past, we have decided to implement things ourselves when we only need a small amount of functionality from a library. Also, we already have this functionality in the existing GCP JWT token call creds. So, I'm leaning against not taking a dependency on an external library (even though it seems to be a 1.x). Want to hear Doug's thoughts on this though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This functionality is preset in the go std lib.

import	"golang.org/x/oauth2/jws"

func (r *jwtFileReader) extractExpiration(token string) (time.Time, error) {
	cs, err := jws.Decode(token)
	if err != nil {
		return time.Time{}, err
	}
	expTime := time.Unix(cs.Exp, 0)
	if expTime.Before(time.Now()) {
		return time.Time{}, fmt.Errorf("expired token: %w", errJWTValidation)
	}
	return expTime, nil
}

This is what the Google oauth2 implementation uses internally. The jws package is marked as deprecated with the following message:

Deprecated: this package is not intended for public use and might be removed in the future. It exists for internal use only. Please switch to another JWS package or copy this package into your own source tree.

I think we can take inspiration from their implementation.

Copy link
Author

@dimpavloff dimpavloff Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can take inspiration from their implementation.

Their implementation's jws.Decode:

  1. Parses the token by splitting on the . character https://cs.opensource.google/go/x/oauth2/+/refs/tags/v0.31.0:jws/jws.go;drc=696f7b31289a98558822be146698b7834e477e63;l=182
  2. Calls base64.RawURLEncoding.DecodeString(claims)
  3. Decodes the JWT Claims json.NewDecoder(bytes.NewBuffer(decoded)).Decode(c)

This is exactly what the current function does, barring minor detail of what exact functions are called to achieve these steps. So I am struggling to understand what your suggestion is here – copy paste the code verbatim so that it's exactly the same as in golang.org/x/oauth2/jws? If yes, why is that preferred? Or ignore golang.org/x/oauth2/jws's deprecation notice and do what you have in the snippet?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the confusion, this thread is not actionable now. I left it open for other reviewers to see. I left comments about replacing strings.Split for efficiency and using RawURLEncoding below after going through the std lib implementation.

*
*/

// Package jwt implements gRPC credentials using JWT tokens from files.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a single file in the package should have the package level godoc. Since it's there in doc.go, we should remove it from here.

if needsPreemptiveRefresh {
// Start refresh if not pending (handling the prior RPC may have
// just spawned a goroutine).
if !c.pendingRefresh {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we combine this with the outer check to reduce one level of indentation?

if needsPreemptiveRefresh && !c.pendingRefresh {


// refreshToken reads the token from file and updates the cached data.
func (c *jwtTokenFileCallCreds) refreshToken() {
// Deliberately not locking c.mu here
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the mutex lock omitted here to avoid doing file IO while holding the lock? If that's the case, we should state this in the comment for the benefit of readers.

I think this is safe only as long fileReader can handle concurrent calls. It can presently, since it doesn't have any mutable state. We should probably this requirement in the godoc of fileReader to serve as a good warning in case state is ever added to fileReader. Something like "It is safe to call methods on this type concurrently" should work.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the mutex lock omitted here to avoid doing file IO while holding the lock?

Yes, exactly.

We should probably this requirement in the godoc of fileReader to serve as a good warning in case state is ever added to fileReader.

Good idea! 👍

c.mu.Lock()
defer c.mu.Unlock()
c.updateCacheLocked(token, expiry, err)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Please omit empty line here, it doesn't seem to improve readability here.

c.nextRetryTime = time.Time{}

c.cachedAuthHeader = "Bearer " + token
// Per RFC A97: consider token invalid if it expires within the next 30
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This should be Per gRFC instead of Per RFC.

func (c *jwtTokenFileCallCreds) GetRequestMetadata(ctx context.Context, _ ...string) (map[string]string, error) {
ri, _ := credentials.RequestInfoFromContext(ctx)
if err := credentials.CheckSecurityLevel(ri.AuthInfo, credentials.PrivacyAndIntegrity); err != nil {
return nil, fmt.Errorf("cannot send secure credentials on an insecure connection: %w", err)
Copy link
Contributor

@arjan-bal arjan-bal Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Use %v instead of %w to format the embedded error here. Using %w can leak the type of the embedded error and make it part of the API which people may start depending on. See https://google.github.io/styleguide/go/best-practices#adding-information-to-errors

It is best to avoid using %w unless you also document (and have tests that validate) the underlying errors that you expose. If you do not expect your caller to call errors.Unwrap, errors.Is and so on, don’t bother with %w.

impl.mu.Unlock()

if !nextRetryTime2.Equal(nextRetryTime) {
t.Errorf("nextRetryTime should not change due to backoff. Got: %v, Want: %v", nextRetryTime2, nextRetryTime)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check seems to rely on the fact that time b/w the two GetRequestMetadata calls is always <= 0.8s. I think we should try to avoid relying on time like this because overloaded CI infra can cause test execution to slow down occasionally. Can we change the nextRetryTime to something larger than defaultTestTimeout before this verification?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed; I've removed the reliance on the backoff strategy implementation entirely and now we control the nextRetryTime whenever we want to retry / ensure no retries are made.

// Third call should retry but still fail with UNAVAILABLE.
_, err3 := creds.GetRequestMetadata(ctx)
if err3 == nil {
t.Fatal("Expected cached error")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Use the word want instead of expect, e.g.:

t.Fatalf("creds.GetRequestMetadata() = %v, want non-nil". err)

See https://google.github.io/styleguide/go/decisions#got-before-want

impl.mu.Unlock()

// Third call should retry but still fail with UNAVAILABLE.
_, err3 := creds.GetRequestMetadata(ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Since we're not using the old err variable again, we can re-use the err names instead of adding numbers as suffixes.

t.Errorf("nextRetryTime should not change due to backoff. Got: %v, Want: %v", nextRetryTime4, nextRetryTime3)
}
if retryAttempt4 != retryAttempt3 {
t.Error("retry attempt should not change due to backoff")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Log message should being with a capital letter, like regular sentences.

Comment on lines 470 to 471
t.Errorf("after creating valid token file, GetRequestMetadata() should eventually succeed, but got: %v", err5)
t.Error("backoff should expire and trigger new attempt on next RPC")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Can we combine these two into a single t.Fatalf call? This should also allow us to un-indent the else block below. It is fine if the log message takes up more horizontal space.
See https://google.github.io/styleguide/go/guide#line-length

}

// extractExpiration parses the JWT token to extract the expiration time.
func (r *jWTFileReader) extractExpiration(token string) (time.Time, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This functionality is preset in the go std lib.

import	"golang.org/x/oauth2/jws"

func (r *jwtFileReader) extractExpiration(token string) (time.Time, error) {
	cs, err := jws.Decode(token)
	if err != nil {
		return time.Time{}, err
	}
	expTime := time.Unix(cs.Exp, 0)
	if expTime.Before(time.Now()) {
		return time.Time{}, fmt.Errorf("expired token: %w", errJWTValidation)
	}
	return expTime, nil
}

This is what the Google oauth2 implementation uses internally. The jws package is marked as deprecated with the following message:

Deprecated: this package is not intended for public use and might be removed in the future. It exists for internal use only. Please switch to another JWS package or copy this package into your own source tree.

I think we can take inspiration from their implementation.

Comment on lines 75 to 80
// Add padding if necessary for base64 decoding.
if m := len(payload) % 4; m != 0 {
payload += strings.Repeat("=", 4-m)
}

payloadBytes, err := base64.URLEncoding.DecodeString(payload)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use base64.RawURLEncoding to handle padding for us. See the std lib implementation.

Comment on lines 69 to 71
parts := strings.Split(token, ".")
if len(parts) != 3 {
return time.Time{}, fmt.Errorf("expected 3 parts, got %d: %w", len(parts), errJWTValidation)
Copy link
Contributor

@arjan-bal arjan-bal Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use the peel-off style used by the std lib implementation as its more efficient. It avoids extra allocations only re-slices the existing string a couple of times.

Copy link
Author

@dimpavloff dimpavloff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @arjan-bal . Could you please take another look?


// refreshToken reads the token from file and updates the cached data.
func (c *jwtTokenFileCallCreds) refreshToken() {
// Deliberately not locking c.mu here
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the mutex lock omitted here to avoid doing file IO while holding the lock?

Yes, exactly.

We should probably this requirement in the godoc of fileReader to serve as a good warning in case state is ever added to fileReader.

Good idea! 👍

grpctest.Tester
}

func TestTokenFileCallCreds(t *testing.T) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, nicely spotted. Thanks

name: "insufficient_security_level",
tokenContent: createTestJWT(t, now.Add(time.Hour)),
authInfo: &testAuthInfo{secLevel: credentials.NoSecurity},
wantCode: codes.Unknown, // http2Client.getCallAuthData actually transforms such errors into into Unauthenticated
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to a few lines above --

return nil, status.Error(codes.Unauthenticated, "transport: cannot send secure credentials on an insecure connection")

{
name: "unreachable_token_file",
invalidTokenPath: true,
tokenContent: "",
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree but because I am varying the value across the test cases, I thought it would be clearer to the reader what the intent of each test is as it's easier to compare visually.
The rule you quoted supports this: Zero-value fields may be omitted from struct literals when clarity is not lost as a result.

My preference would be to keep it as is but happy to remove if you insist.

t.Fatal("token should be cached after successful GetRequestMetadata")
}

if !shouldTriggerRefresh {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just explicitly confirms the precondition necessary for the rest of the test -- the test assumes that the subsequent call (the second call) will trigger a refresh. So this test will fail sooner if this doesn't happen.

I've updated the failure message to something more useful but happy to remove it, too, as it's not strictly needed.

impl.mu.Unlock()

if !nextRetryTime2.Equal(nextRetryTime) {
t.Errorf("nextRetryTime should not change due to backoff. Got: %v, Want: %v", nextRetryTime2, nextRetryTime)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed; I've removed the reliance on the backoff strategy implementation entirely and now we control the nextRetryTime whenever we want to retry / ensure no retries are made.

if retryAttempt != 1 {
t.Errorf("Expected retry attempt to be 1, got %d", retryAttempt)
}
if nextRetryTime.IsZero() || nextRetryTime.Before(time.Now()) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, doesn't this make the test a bit more noisy as it will have to be aware of the backoff details, which is another implementation detail, calculate the above, and explain it?

I've attempted to improve the test flow a bit. Please lmk what you think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Auth Includes regular credentials API and implementation. Also includes advancedtls, authz, rbac etc. Type: Feature New features or improvements in behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants