-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor the repository retrieval in get_dr_push #1376
Conversation
cbfcfb4
to
ef0d823
Compare
CHANGES/1275.misc
Outdated
@@ -0,0 +1,2 @@ | |||
Ensured repositories are correctly retrieved or initialized when handling concurrent requests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/when handling concurrent requests/when handling client parallel blob upload
pulp_container/app/registry_api.py
Outdated
try: | ||
dist_serializer.is_valid(raise_exception=True) | ||
distribution = dist_serializer.create(dist_serializer.validated_data) | ||
except (IntegrityError, ValidationError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should catch these error separately because the error message is different
pulp_container/app/registry_api.py
Outdated
@@ -349,6 +325,32 @@ def get_dr_push(self, request, path, create=False): | |||
raise RepositoryNotFound(name=path) | |||
return distribution, repository | |||
|
|||
@transaction.atomic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a different wrapping into a transaction whole method vs just the try block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this change. The logic should be similar to the previous one, where we catch exceptions outside the transaction.atomic()
's context manager.
ef0d823
to
aa2bf06
Compare
695a008
to
13906c8
Compare
@ipanova, I believe this is the best I can do. We no longer granularly handle the validation and creation of entities inside pulp-container, see https://github.com/pulp/pulpcore/blob/3594624a3556af2ad400c29871041252ae947495/pulpcore/app/serializers/base.py#L360. |
pulp_container/app/registry_api.py
Outdated
distribution = serializers.ContainerDistributionSerializer.get_or_create( | ||
{"base_path": path, "name": path}, {"repository": get_url(repository)} | ||
) | ||
except ObjectDoesNotExist: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems to be unnecessary, we whether retrieve existing instance or create a new one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is necessary. Once the validation fails (e.g., because of an invalid base_path), it tries to fetch an object from the database: https://github.com/pulp/pulpcore/blob/3594624a3556af2ad400c29871041252ae947495/pulpcore/app/serializers/base.py#L374. If the object does not exist after the last attempt, we should return RepositoryNotFound
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you're saying and you are right only because the get_or_create is not implemented fully as it should be. Get or create implies that on success an object will always be returned and failure shuld not be related to object existence but rather only validation.
Here https://github.com/pulp/pulpcore/blob/3594624a3556af2ad400c29871041252ae947495/pulpcore/app/serializers/base.py#L372 we should not catch these together but separately, because if validation fails because of field being missing or invalid we need to raise it. And if it failed because 'this field must be unique; then we retrieve existing object.
Here I took a stab pulp/pulpcore#4643
pulp_container/app/registry_api.py
Outdated
{"name": path} | ||
) | ||
distribution = serializers.ContainerDistributionSerializer.get_or_create( | ||
{"base_path": path, "name": path}, {"repository": get_url(repository)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great, you figured it 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here you need to check whether returned distribution has a repo. and if it has it should be same repo-id as on line 330 otherwise we need to raise an error that repo is read-only
how about this workflow:
- create repo(regular) foo and distribution foo
- docker push foo and it should fail
pulp_container/app/registry_api.py
Outdated
raise RepositoryInvalid(name=path) | ||
|
||
repository = repository.cast() | ||
if not repository.PUSH_ENABLED: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah so this needs to be moved upper per my previous comment. now you're checking the repo you have created on line 330
repository = repository.cast() | ||
if not repository.PUSH_ENABLED: | ||
raise RepositoryInvalid(name=path, message="Repository is read-only.") | ||
distribution, repository = self.create_dr(path, request) | ||
else: | ||
raise RepositoryNotFound(name=path) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you checked the else branch? I think it needs re-working too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line 328 makes little sense without trying to understand much the code, can you double check the logic if this is still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I revised the else branch and it also needs to switch the serializers to use get_or_create.
The code has changed significantly after the approval. New comments arose.
Waiting for pulp/pulpcore@bf0654b to be released. |
4954d4c
to
424f2a3
Compare
pulp_container/app/registry_api.py
Outdated
) | ||
repo_serializer.is_valid(raise_exception=True) | ||
repository = repo_serializer.create(repo_serializer.validated_data) | ||
distribution.repository = repository | ||
distribution.save() | ||
except IntegrityError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was unsure if save()
could raise any error. By the way, should we enforce saving the object just in time with force_update=True
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pulp_container/app/registry_api.py
Outdated
{"base_path": path, "name": path}, {"repository": get_url(repository)} | ||
) | ||
|
||
dist_repository = distribution.repository.cast() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to account for the case when distribution.repository will be None, cast will fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes.
424f2a3
to
f761804
Compare
{"base_path": path, "name": path}, {"repository": get_url(repository)} | ||
) | ||
|
||
if distribution.repository: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it needs an else:
distribution.repository = repository
distribution.save()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not needed if we create a new distribution. We want to check the names and references without updating anything.
In [1]: from pulp_container.app.serializers import ContainerPushRepositorySerializer
In [2]: from pulp_container.app.serializers import ContainerDistributionSerializer
In [3]: repo = ContainerPushRepositorySerializer.get_or_create({"name": "bla"})
In [4]: from pulp_container.app.models import ContainerPushRepository, ContainerDistribution
In [5]: ContainerPushRepository.objects.filter(name="bla")
Out[5]: <QuerySet [<ContainerPushRepository (pulp_type=container.container-push): pk=018bb01f-a735-79ab-a913-8878aec9226c>]>
In [6]: ContainerDistribution.objects.filter(name="bla")
Out[6]: <QuerySet []>
In [7]: from pulpcore.plugin.util import get_url
In [8]: dist = ContainerDistributionSerializer.get_or_create({"name": "bla", "base_path": "bla"}, {"repository": get_url(repo)})
In [9]: dist
Out[9]: <ContainerDistribution (pulp_type=container.container): pk=018bb022-33d0-723e-8562-576ce14e9ed8>
In [10]: dist.repository
Out[10]: <ContainerPushRepository (pulp_type=container.container-push): pk=018bb01f-a735-79ab-a913-8878aec9226c>
In [11]: repo
Out[11]: <ContainerPushRepository (pulp_type=container.container-push): pk=018bb01f-a735-79ab-a913-8878aec9226c>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code path suggests that we are about to create a new distribution:
distribution = models.ContainerDistribution.objects.get(base_path=path)
except models.ContainerDistribution.DoesNotExist:
if create:
distribution, repository = self.create_dr(path, request)
Do you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i agree that we are about to create new distribution, however to account for race condition during get_or_create ( and that's why we're using it here) another process could have created a distribution(without a repo) and the get_or_create would get the newly-created-dist by another process
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right!
pulp_container/app/registry_api.py
Outdated
) | ||
distribution.repository = repository | ||
distribution.save() | ||
# Seems like another process created our stuff already. Retry fetching it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line 309-315 can be entirely removed
f761804
to
8ea9166
Compare
8ea9166
to
b856afd
Compare
b856afd
to
afe751c
Compare
I am closing #489 with this change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥂
closes #1275
closes #489