Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ProductCatalogService] Add product catalog service problem patterns #1382

Conversation

klucsik
Copy link
Contributor

@klucsik klucsik commented Feb 15, 2024

Changes

Refactored the featre flag querying in ProductCatalogService to be reusable
Added Problems to ListProducts and GetProduct endpoints in the product catalog service:

  • ItemListInternal - will fail ListProducts with Internal error code
  • ItemNotFound - will fail GetProduct with NotFound error code

Merge Requirements

For new features contributions please make sure you have completed the following
essential items:

  • CHANGELOG.md updated to document new feature additions
  • Appropriate documentation updates in the docs
    * [ ] Appropriate Helm chart updates in the [helm-charts][]

Copy link

linux-foundation-easycla bot commented Feb 15, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@klucsik klucsik changed the title Klucsik add product catalog service problem patterns Add product catalog service problem patterns Feb 15, 2024
@klucsik klucsik marked this pull request as ready for review February 15, 2024 10:01
@klucsik klucsik requested a review from a team February 15, 2024 10:01
@klucsik klucsik changed the title Add product catalog service problem patterns [ProductCatalogService} Add product catalog service problem patterns Feb 15, 2024
@klucsik klucsik changed the title [ProductCatalogService} Add product catalog service problem patterns [ProductCatalogService] Add product catalog service problem patterns Feb 15, 2024
@klucsik klucsik marked this pull request as draft February 15, 2024 11:24
@julianocosta89
Copy link
Member

Hey @klucsik, nice to see you here!
Before we check your PR, could you sign the EasyCLA?

@klucsik klucsik force-pushed the klucsik-add-product-catalog-service-problem-patterns branch 2 times, most recently from 9eba716 to 7084b43 Compare February 15, 2024 11:48
@klucsik klucsik closed this Feb 15, 2024
@klucsik klucsik force-pushed the klucsik-add-product-catalog-service-problem-patterns branch from 7084b43 to 371b36f Compare February 15, 2024 11:52
@klucsik
Copy link
Contributor Author

klucsik commented Feb 15, 2024

Hey @klucsik, nice to see you here! Before we check your PR, could you sign the EasyCLA?

Yes yes, we are on it, just having some learning path with commit signing :D

@klucsik klucsik reopened this Feb 15, 2024
@klucsik
Copy link
Contributor Author

klucsik commented Feb 15, 2024

Hey @klucsik, nice to see you here! Before we check your PR, could you sign the EasyCLA?

Nice to work with you again :) EasyCLA is green

@klucsik klucsik marked this pull request as ready for review February 15, 2024 12:23
Co-authored-by: Juliano Costa <julianocosta89@outlook.com>
Copy link
Member

@austinlparker austinlparker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! I'm not categorically opposed to these changes, but they seem like very blunt ways of triggering 100% failure scenarios for a service that already has failure scenarios available. Could you help me understand what you're trying to accomplish by adding these cases?

@@ -237,6 +244,13 @@ func (p *productCatalog) GetProduct(ctx context.Context, req *pb.GetProductReque
return nil, status.Errorf(codes.Internal, msg)
}

if p.getFeatureFlag(ctx, "ItemNotFound") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by this failure case -- do we need to force it via flag? Wouldn't simply passing an invalid product ID to the service trigger the same result as this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flag would still need to be in place to control when to send the wrong id, but yeah, agree with you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm, validating the PR I actually understood the scenario.
ItemNotFound is being triggered just when navigating to the product page.

I can see that being used in demos, to showcase errors on a page that is part of the checkout flow.

ItemListInternal on the other hand, fails on listing the products, so you can't even see the products.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could move the ItemNotFound logic to loadgenerator to query some clearly invalid products, the requirement from our end is to demonstrate a notfound error in the sample

@@ -217,6 +217,13 @@ func (p *productCatalog) Watch(req *healthpb.HealthCheckRequest, ws healthpb.Hea
func (p *productCatalog) ListProducts(ctx context.Context, req *pb.Empty) (*pb.ListProductsResponse, error) {
span := trace.SpanFromContext(ctx)

if p.getFeatureFlag(ctx, "ItemListInternal") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would a 500 error happen here naturally? Is there a better way to represent this failure scenario?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This 500 error is initially meant to be in the SearchProducts endpoint, but that is not called by anything, thats why I moved this to here to be able to throw an Internal error in a Go service, so we can demonstrate the detection of that.

@puckpuck
Copy link
Contributor

I agree with @austinlparker's comment about introducing another feature flag for a simple blunt failure scenario. We already have several similar feature flags across different aspects of the demo. What observability demo and/or use cases are we trying to solve with this feature flag?

@klucsik
Copy link
Contributor Author

klucsik commented Feb 19, 2024

@austinlparker @julianocosta89 @puckpuck Thank you for reading through my PR, and the comments.
What do you mean of triggering 100% failure scenarios? As I understand, and experienced, the problem scenarios can be triggered partially if we set the feature flag in the feature flag service like 0.3 (which gets triggered every 30% of the calls).
My goal is basically to introduce 404 and 500 errors in a Go service, so we can demonstrate what can one see with open telemetry data, and with more instrumentation. Initally these meant to be in the SearchProducts, but turned out, nothing calls the SearchProducts endpoint (and the loadgenerator doesnt seem to handle gRPC requests) so I moved them where it made most sense to me.
I could trigger the 404 from the loadgenerator side, but 500 should be in the ProductCatalogService.
I could imagine the ListProduct would be able to error out if it gets a malformed data json, or if we would use some DB to store the products and that would have an outage or similar problem.
I'm very much open for your suggestions, my goal would be to introduce these errortypes in a meaningfull form, triggerable from the featureflag service.

@puckpuck
Copy link
Contributor

@klucsik we already have a feature flag in the Product Catalog service that will return a 500 for a specific product ID when the flag is enabled. Would the productCatalogFailure feature flag support your need to see a Go service return a 500 under some conditions? https://github.com/open-telemetry/opentelemetry-demo/blob/main/src/productcatalogservice/main.go#L279-L303

I want to leverage feature flags to showcase potential failures, but I would like the failures to be more than just blunt failures on a feature flag being positive. It would help that the failure was due to a specific scenario, such as a product ID or multiple specific products being purchased, etc.

@klucsik klucsik force-pushed the klucsik-add-product-catalog-service-problem-patterns branch from 6344637 to 96969b6 Compare February 22, 2024 12:50
Copy link

github-actions bot commented Mar 1, 2024

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@julianocosta89
Copy link
Member

@klucsik we have just merged a PR that removed the current FeatureFlagService in favor of OpenFeature and FlagD.
Unfortunately your PR needs to be revisited.
Would you be able to update it?

@klucsik
Copy link
Contributor Author

klucsik commented Mar 8, 2024

Hey @julianocosta89. Sorry for going radiosilent about this. We discussed it a lot internally.
I'll definitely rework this, as we need these problem patterns present, but maybe we shouldn't merge it to upstream as you guys pointed out that from pure otel perspective these aren't providing much value.

@julianocosta89
Copy link
Member

@klucsik got it.
Should we close this PR then?

@klucsik klucsik closed this Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants