Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

loader: Lazily allocate ICD surface objects #1659

Conversation

aqnuep
Copy link
Contributor

@aqnuep aqnuep commented Feb 18, 2025

Fixes #1541.

This is marked as draft, because there are some open questions:

  • Do we want to have code-gen for the surface create info structure chain copies? I had to do a separate copy, because the currently captured create parameters are not complete, and there's one NVIDIA extension struct for surface creation which requires chained copies. They don't get modified often, but code-gen would be a more future proof solution
  • I had to modify the test framework, because the sType of surface create infos were not set before and now the new code expects correct input, not invalid (due to the create info copies). I hope that's okay.

@ci-tester-lunarg
Copy link

CI Vulkan-Loader build queued with queue ID 374508.

@ci-tester-lunarg
Copy link

CI Vulkan-Loader build queued with queue ID 374525.

@ci-tester-lunarg
Copy link

CI Vulkan-Loader build # 2918 running.

@ci-tester-lunarg
Copy link

CI Vulkan-Loader build # 2918 passed.

Copy link
Collaborator

@charles-lunarg charles-lunarg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a first review, my impression is that this 1. cleans the code up nicely and 2. the wsi code should really be generated.

As far as pNext chains go, there shouldn't be any unknown types due to the Loader preventing any unknown instance level extensions from being enabled. This can be disabled, but the policy prevents the common situation of "new instance level extension silently breaks code" by forcing a new loader to be used.

There is a new codegen feature in town with the latest header update, vulkan_object.py.

I would hold off doing any major changes to the code gen due so that the new (much better!) code gen can be used instead, which should be much easier to use and less "oh dear god". I can work on it shortly but am happy to review any codgen related changes you make. Wanted to establish who's doing what so nobody does work unnecessarily.

@aqnuep
Copy link
Contributor Author

aqnuep commented Feb 24, 2025

Thank you for the feedback!

All of that makes sense, but I'm not entirely sure what your suggested path is going forward.

I agree that we'd better add new code-gen based on the new infrastructure, but at the same time I'm not sure if/when the existing code-gen will be updated to use the new infrastructure.

Also, more importantly, do you want to take this as is and add the code-gen yourself, or you'd like me to do it just wait until the new code-gen infrastructure lands in the loader?

@charles-lunarg
Copy link
Collaborator

So in the interest of keeping things moving, I went ahead and started working to refactor the codegen to use vulkan_object.py #1665 - your review would be appreciated, as I'm sure I broke VulkanSC somehow.

As far as plans go, I think a reasonable path forward is to merge this PR without any (further) codegen changes, just focusing on correctness and features. Later, codegen can be added after the vulkan_object.py PR is finished. I did notice that because this PR touches codegen (for surface handle unwrapping), one PR is going to have to be rebased on the other. I suggest this PR goes first, since the codegen PR attempts to make no changes to the generated files.

wsi.c has been ripe for rewritting using codegen, so the above plan gives time for that without delaying any current PR.

@aqnuep
Copy link
Contributor Author

aqnuep commented Feb 26, 2025

So in the interest of keeping things moving, I went ahead and started working to refactor the codegen to use vulkan_object.py #1665 - your review would be appreciated, as I'm sure I broke VulkanSC somehow.

Thanks, I'll take a look.

As far as plans go, I think a reasonable path forward is to merge this PR without any (further) codegen changes, just focusing on correctness and features. Later, codegen can be added after the vulkan_object.py PR is finished. I did notice that because this PR touches codegen (for surface handle unwrapping), one PR is going to have to be rebased on the other. I suggest this PR goes first, since the codegen PR attempts to make no changes to the generated files.

Sure, that works, let me do another round of testing and review.

Regarding the codegen for surface handle unwrapping, I noticed that lot of the codegen doesn't actually produce any difference in output, I guess certain things changed over time related to which wsi parts are generated or not.

wsi.c has been ripe for rewritting using codegen, so the above plan gives time for that without delaying any current PR.

My thinking is to only extract out the surface create info pNext chain copies into codegen for now (in a separate PR per your suggestion), no general wsi codegen, simply because a lot of custom behavior is currently baked manually, e.g. certain differences across the different WSI platforms as well as customizations to support VK_GOOGLE_surfaceless_query. So completely replacing the wsi code with codegen requires additional design / thought put into it.

@aqnuep
Copy link
Contributor Author

aqnuep commented Feb 28, 2025

@charles-lunarg, what should be the course of action here?

Because if we want to postpone code-gen for the surface create info pNext chain copy for a future PR, then I could remove the "Draft" state from this PR and get it in pretty much as is (barring any rebasing that may be necessary).

@charles-lunarg
Copy link
Collaborator

Yes, lets mark this PR as 'ready' and review it again.

I want to review it again for consistency and to make sure I'm doing my due diligence.

@charles-lunarg charles-lunarg marked this pull request as ready for review February 28, 2025 16:26
@aqnuep aqnuep force-pushed the rastergrid/icd-surface-creation-fix branch from bacdd2a to decc83a Compare March 3, 2025 14:21
@ci-tester-lunarg
Copy link

CI Vulkan-Loader build queued with queue ID 384707.

@ci-tester-lunarg
Copy link

CI Vulkan-Loader build # 2929 running.

@ci-tester-lunarg
Copy link

CI Vulkan-Loader build queued with queue ID 384725.

@ci-tester-lunarg
Copy link

CI Vulkan-Loader build # 2930 running.

@ci-tester-lunarg
Copy link

CI Vulkan-Loader build # 2930 passed.

@aqnuep
Copy link
Contributor Author

aqnuep commented Mar 3, 2025

I've re-checked the code, rebased, all seems fine. Please feel free to review as you get to it.

Copy link
Collaborator

@charles-lunarg charles-lunarg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found one actual bug (google surfaceless query) and some quibles over type punning that may or may not be valid. Would like your thoughts, especially since I believe the worries over UB may be irrelevant when codegen comes into play.

@aqnuep aqnuep force-pushed the rastergrid/icd-surface-creation-fix branch from decc83a to 15b50a4 Compare March 4, 2025 10:15
@ci-tester-lunarg
Copy link

CI Vulkan-Loader build queued with queue ID 385696.

@ci-tester-lunarg
Copy link

CI Vulkan-Loader build queued with queue ID 385713.

Copy link
Contributor Author

@aqnuep aqnuep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated what can be updated.

I know about the casting UB (theoretical one though), but it's not really unique to this situation. Strictly speaking any cast from any pointer type to something but char pointer is UB, but no SW (not even existing loader code) would work without that.

@ci-tester-lunarg
Copy link

CI Vulkan-Loader build # 2937 running.

@ci-tester-lunarg
Copy link

CI Vulkan-Loader build # 2937 passed.

Copy link
Collaborator

@charles-lunarg charles-lunarg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last nit around google_surfaceless_query, and I think this PR is good to go.

I have to reiterate that I love how the refactor makes a lot of the ugly 'check if handle is valid' code is replaced with single function call. Much easier to read & grok. So thank you for taking time to make the code better and not just different.

@aqnuep aqnuep force-pushed the rastergrid/icd-surface-creation-fix branch from 15b50a4 to 69d1ab1 Compare March 4, 2025 17:01
@ci-tester-lunarg
Copy link

CI Vulkan-Loader build queued with queue ID 386163.

@ci-tester-lunarg
Copy link

CI Vulkan-Loader build # 2943 running.

Copy link
Collaborator

@charles-lunarg charles-lunarg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI is going to 'fail' due to a scheduled brownout. I have another PR to fix it, I won't block this on that.

@ci-tester-lunarg
Copy link

CI Vulkan-Loader build # 2943 passed.

@charles-lunarg charles-lunarg merged commit 322d634 into KhronosGroup:main Mar 4, 2025
40 of 44 checks passed
@y-novikov
Copy link
Contributor

We have trouble rolling this into ANGLE.
SwiftShader doesn't look happy with this change.
https://chromium-review.googlesource.com/c/angle/angle/+/6325016
https://ci.chromium.org/ui/p/angle/builders/try/linux-test/26232/overview
https://chromium-swarm.appspot.com/task?id=6f7e751ca786b710

We don't get any useful stack from the crash, though.
If you can build ANGLE with this PR, you can repro on Linux with:
./angle_deqp_gles2_tests --use-angle=swiftshader --gtest_filter=dEQP-GLES2.info.vendor

@charles-lunarg
Copy link
Collaborator

@aqnuep I may revert this commit to unblock angle while the regression is diagnosed. Pinging you as a heads up.

@y-novikov
Copy link
Contributor

Here is a roll with a smaller blamelist than the one I gave earlier
https://chromium-review.googlesource.com/c/angle/angle/+/6333474

@charles-lunarg
Copy link
Collaborator

I believe I know what went wrong - the pAllocator callbacks passed in to vkCreateSurfaceXXX aren't being saved, and the instance's allocator callbacks are being used instead - however, no allocator callbacks were ever set, so angle tries to use the allocator callbacks and crashes since they are all nullptr.

Should be an easy fix, I'll work on it now.

@charles-lunarg
Copy link
Collaborator

PR is up, testing locally with ANGLE before merging
#1671

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

terminator_CreateDisplayPlaneSurfaceKHR() with multiple ICDs
4 participants