Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the repository retrieval in get_dr_push #1376

Merged
merged 1 commit into from
Nov 9, 2023

Conversation

lubosmj
Copy link
Member

@lubosmj lubosmj commented Aug 25, 2023

closes #1275
closes #489

@lubosmj lubosmj marked this pull request as ready for review August 25, 2023 20:58
@lubosmj lubosmj force-pushed the refactor-get-dr-push-intererrors branch from cbfcfb4 to ef0d823 Compare October 31, 2023 13:53
@@ -0,0 +1,2 @@
Ensured repositories are correctly retrieved or initialized when handling concurrent requests
Copy link
Member

@ipanova ipanova Oct 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/when handling concurrent requests/when handling client parallel blob upload

try:
dist_serializer.is_valid(raise_exception=True)
distribution = dist_serializer.create(dist_serializer.validated_data)
except (IntegrityError, ValidationError):
Copy link
Member

@ipanova ipanova Oct 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should catch these error separately because the error message is different

@@ -349,6 +325,32 @@ def get_dr_push(self, request, path, create=False):
raise RepositoryNotFound(name=path)
return distribution, repository

@transaction.atomic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a different wrapping into a transaction whole method vs just the try block

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this change. The logic should be similar to the previous one, where we catch exceptions outside the transaction.atomic()'s context manager.

@lubosmj lubosmj force-pushed the refactor-get-dr-push-intererrors branch from ef0d823 to aa2bf06 Compare November 1, 2023 13:22
ipanova
ipanova previously approved these changes Nov 1, 2023
@lubosmj lubosmj marked this pull request as draft November 1, 2023 14:05
@lubosmj lubosmj force-pushed the refactor-get-dr-push-intererrors branch 2 times, most recently from 695a008 to 13906c8 Compare November 1, 2023 17:03
@lubosmj
Copy link
Member Author

lubosmj commented Nov 1, 2023

@ipanova, I believe this is the best I can do. We no longer granularly handle the validation and creation of entities inside pulp-container, see https://github.com/pulp/pulpcore/blob/3594624a3556af2ad400c29871041252ae947495/pulpcore/app/serializers/base.py#L360.

@lubosmj lubosmj marked this pull request as ready for review November 1, 2023 17:37
distribution = serializers.ContainerDistributionSerializer.get_or_create(
{"base_path": path, "name": path}, {"repository": get_url(repository)}
)
except ObjectDoesNotExist:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to be unnecessary, we whether retrieve existing instance or create a new one

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is necessary. Once the validation fails (e.g., because of an invalid base_path), it tries to fetch an object from the database: https://github.com/pulp/pulpcore/blob/3594624a3556af2ad400c29871041252ae947495/pulpcore/app/serializers/base.py#L374. If the object does not exist after the last attempt, we should return RepositoryNotFound.

Copy link
Member

@ipanova ipanova Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're saying and you are right only because the get_or_create is not implemented fully as it should be. Get or create implies that on success an object will always be returned and failure shuld not be related to object existence but rather only validation.
Here https://github.com/pulp/pulpcore/blob/3594624a3556af2ad400c29871041252ae947495/pulpcore/app/serializers/base.py#L372 we should not catch these together but separately, because if validation fails because of field being missing or invalid we need to raise it. And if it failed because 'this field must be unique; then we retrieve existing object.
Here I took a stab pulp/pulpcore#4643

{"name": path}
)
distribution = serializers.ContainerDistributionSerializer.get_or_create(
{"base_path": path, "name": path}, {"repository": get_url(repository)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, you figured it 👍

Copy link
Member

@ipanova ipanova Nov 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here you need to check whether returned distribution has a repo. and if it has it should be same repo-id as on line 330 otherwise we need to raise an error that repo is read-only
how about this workflow:

  1. create repo(regular) foo and distribution foo
  2. docker push foo and it should fail

raise RepositoryInvalid(name=path)

repository = repository.cast()
if not repository.PUSH_ENABLED:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah so this needs to be moved upper per my previous comment. now you're checking the repo you have created on line 330

repository = repository.cast()
if not repository.PUSH_ENABLED:
raise RepositoryInvalid(name=path, message="Repository is read-only.")
distribution, repository = self.create_dr(path, request)
else:
raise RepositoryNotFound(name=path)
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you checked the else branch? I think it needs re-working too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 328 makes little sense without trying to understand much the code, can you double check the logic if this is still needed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I revised the else branch and it also needs to switch the serializers to use get_or_create.

@lubosmj lubosmj dismissed ipanova’s stale review November 2, 2023 12:11

The code has changed significantly after the approval. New comments arose.

@lubosmj lubosmj marked this pull request as draft November 3, 2023 12:41
@lubosmj
Copy link
Member Author

lubosmj commented Nov 6, 2023

Waiting for pulp/pulpcore@bf0654b to be released.

@lubosmj lubosmj force-pushed the refactor-get-dr-push-intererrors branch 3 times, most recently from 4954d4c to 424f2a3 Compare November 8, 2023 14:43
@lubosmj lubosmj marked this pull request as ready for review November 8, 2023 15:14
@lubosmj lubosmj requested a review from ipanova November 8, 2023 15:14
)
repo_serializer.is_valid(raise_exception=True)
repository = repo_serializer.create(repo_serializer.validated_data)
distribution.repository = repository
distribution.save()
except IntegrityError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be removed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unsure if save() could raise any error. By the way, should we enforce saving the object just in time with force_update=True?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{"base_path": path, "name": path}, {"repository": get_url(repository)}
)

dist_repository = distribution.repository.cast()
Copy link
Member

@ipanova ipanova Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to account for the case when distribution.repository will be None, cast will fail

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes.

@lubosmj lubosmj force-pushed the refactor-get-dr-push-intererrors branch from 424f2a3 to f761804 Compare November 8, 2023 17:11
{"base_path": path, "name": path}, {"repository": get_url(repository)}
)

if distribution.repository:
Copy link
Member

@ipanova ipanova Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it needs an else:
distribution.repository = repository
distribution.save()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not needed if we create a new distribution. We want to check the names and references without updating anything.

In [1]: from pulp_container.app.serializers import ContainerPushRepositorySerializer

In [2]: from pulp_container.app.serializers import ContainerDistributionSerializer

In [3]: repo = ContainerPushRepositorySerializer.get_or_create({"name": "bla"})

In [4]: from pulp_container.app.models import ContainerPushRepository, ContainerDistribution

In [5]: ContainerPushRepository.objects.filter(name="bla")
Out[5]: <QuerySet [<ContainerPushRepository (pulp_type=container.container-push): pk=018bb01f-a735-79ab-a913-8878aec9226c>]>

In [6]: ContainerDistribution.objects.filter(name="bla")
Out[6]: <QuerySet []>

In [7]: from pulpcore.plugin.util import get_url

In [8]: dist = ContainerDistributionSerializer.get_or_create({"name": "bla", "base_path": "bla"}, {"repository": get_url(repo)})

In [9]: dist
Out[9]: <ContainerDistribution (pulp_type=container.container): pk=018bb022-33d0-723e-8562-576ce14e9ed8>

In [10]: dist.repository
Out[10]: <ContainerPushRepository (pulp_type=container.container-push): pk=018bb01f-a735-79ab-a913-8878aec9226c>

In [11]: repo
Out[11]: <ContainerPushRepository (pulp_type=container.container-push): pk=018bb01f-a735-79ab-a913-8878aec9226c>

Copy link
Member Author

@lubosmj lubosmj Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code path suggests that we are about to create a new distribution:

            distribution = models.ContainerDistribution.objects.get(base_path=path)
        except models.ContainerDistribution.DoesNotExist:
            if create:
                distribution, repository = self.create_dr(path, request)

Do you agree?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree that we are about to create new distribution, however to account for race condition during get_or_create ( and that's why we're using it here) another process could have created a distribution(without a repo) and the get_or_create would get the newly-created-dist by another process

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right!

)
distribution.repository = repository
distribution.save()
# Seems like another process created our stuff already. Retry fetching it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 309-315 can be entirely removed

@lubosmj lubosmj force-pushed the refactor-get-dr-push-intererrors branch from f761804 to 8ea9166 Compare November 8, 2023 18:20
@lubosmj lubosmj force-pushed the refactor-get-dr-push-intererrors branch from 8ea9166 to b856afd Compare November 8, 2023 19:52
@lubosmj lubosmj requested a review from ipanova November 8, 2023 19:53
@lubosmj lubosmj force-pushed the refactor-get-dr-push-intererrors branch from b856afd to afe751c Compare November 9, 2023 10:29
@lubosmj
Copy link
Member Author

lubosmj commented Nov 9, 2023

I am closing #489 with this change.

Copy link
Member

@ipanova ipanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥂

@lubosmj lubosmj merged commit 499e084 into pulp:main Nov 9, 2023
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants