Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

manifest,sources: add librepo support to Serialize/GenSources #1142

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

mvo5
Copy link
Contributor

@mvo5 mvo5 commented Jan 14, 2025

This commit enables librepo sources generation via a new
osbuild.RpmDownloader iota that is passed to to
manifest.Serialize() and osbuild.GenSources().

We also need to pass the resolved repoConfig to manifest.Serialize().
This is currently not type-safe, ideally we would look into how to do
this in a type-safe way. Serialize should also take less args
ideally.

Superseeds #1132

With that we can enable --use-librepo in

This commit enables librepo sources generation via a new
osbuild.RpmDownloader iota that is passed to to
`manifest.Serialize()` and `osbuild.GenSources()`.

We also need to pass the resolved repoConfig to `manifest.Serialize()`.
This is currently not type-safe, ideally we would look into how to do
this in a type-safe way. Serialize should also take less args
ideally.
@achilleas-k
Copy link
Member

achilleas-k commented Jan 15, 2025

Something about this approach doesn't sit well with me. It's getting a bit confusing, I think. I think part of (or maybe the source of) the issue is that we carry around repo configs alongside the content (depsolved RPMs) but so far it was only used for creating the rpm stage options (GPG keys). Now, we're (optionally) swapping out the curl source for a librepo source, which changes both the associated sources and the stage options. This all makes sense. Where it gets confusing is how everything interacts, where information on one argument needs to line up with things from the other.

Here's what I mean by that:
There are two (package-related) collections we need when serialising: RPMs and repositories. There's also an option/enum that defines if we're going to use curl or librepo for the source download.

  1. In the curl case we need:
    • The remote_location property set on all RPMs.
    • OPTIONALLY: a collection of repositories with GPG keys and repo_ids that match the IDs in the RPMs.
      • In fact, these aren't really needed since the GPG keys are available on the repository configurations that are associated with pipelines before we even depsolve.
  2. In the librepo case we need:
    • The path property set on all RPMs.
    • MANDATORY: a collection of repositories with repo_ids that match the IDs in the RPMs.

This is simple enough for the RPMs themselves, it's pretty much just requiring either remote_location or path based on use case and both should always be set by the depsolver anyway. What I think creates confusion is that the arguments to serializeStart() (now Inputs) are a bit inconsistent:

  • All the fields define content (packages, ostree payloads, containers) except the RPM repos.
  • All the types of content are independent of one another in terms of source and stage option creation, except the RPM repos.

What I'm getting at here is that it might be cleaner to bundle the RPMs with the repositories from their creation in the depsolver and carry them around as one. This would change the following (fake-ish) code:

// init manifest
manifest := imgType.Manifest(bp, options, repos)

// resolve content
packageSets := manifest.GetPackageSetChains()
packageSpecs, repos := depsolve(packageSets)

containers := manifest.GetContainerSourceSpecs()
containerSpecs := resolveContainers(containers)

commits := manifest.GetOSTreeSourceSpecs()
commitSpecs := resolveCommits(commits)

// serialise with content
// NOTE: passing `nil` instead of `repos` might have no effect here, because
// the pipelines have the repo configs with the GPG keys already; we don't need
// resolved repos
serializedManifest := manifest.Serialize(packageSpecs, containerSpecs, commitSpecs, repos)  

to

// init manifest
manifest := imgType.Manifest(bp, options, repos)

// resolve content
packageSets := manifest.GetPackageSetChains()
packageRepoSpecs := depsolve(packageSets)  // *** NOTE: NEW STUFF HERE *** //

containers := manifest.GetContainerSourceSpecs()
containerSpecs := resolveContainers(containers)

commits := manifest.GetOSTreeSourceSpecs()
commitSpecs := resolveCommits(commits)

// serialise with content
serializedManifest := manifest.Serialize(packageRepoSpecs, containerSpecs, commitSpecs)

packageRepoSpecs can be a map[string]PkgRepos (keyed by pipeline name), where PkgRepos (better name needed) is a new type that just looks like:

type PkgRepos struct {
  rpms []rpmmd.PackageSpec
  repos []rpmmd.RepoConfig
}

The librepo vs curl option can be part of the Serialize() call, like this PR does already, or we could make it an option on the manifest, set during manifest initialisation, i.e., as part of one of the options in imgType.Manifest(...). Perhaps it could somehow be part of the repos argument, since it only really makes sense to use librepo when the original repo configs use mirrorlists.

One benefit here is that we can contain the consistency validation inside PkgRepos. In other words, we can have getters which will return an error if the rpms and repos are inconsistent¹. It would also allow us to guard against modifications of one of the two in a way that might create that inconsistency.

¹ Source/stage option creation should be left to pkg/osbuild and we can have getters with validation, e.g:

  • PkgRepos.GPGKeys() for the RPM stage options,
  • PkgRepos.PackageURLs() which returns package IDs alongside their full remote location URLs for the curl source,
  • PkgRepos.PackageMirrors() (again, bad name) which returns a list of (package path, mirror ID) 2-tuples and the mirror information, and checks for inconsistencies along the way.

The details aren't completely clear in my mind, so maybe ignore any issues with some of the specifics. What I think is the good idea here is having a single entity that we can be reasonably certain is internally consistent (because it comes straight from the depsolver) and provides conveniences for pulling out the data we need for stages and sources in a way that we don't need to verify or worry about outside of the entity itself.

@thozza
Copy link
Member

thozza commented Jan 15, 2025

I second @achilleas-k's comment. Bundling RPMs and repos into a single input type makes a lot of sense.

I don't have a strong opinion on adding the rpmDownloader argument to Manifest.Serialize(), but making it an option of the manifest when initializing it would be in my opinion cleaner.

Would it make sense to use librepo implicitly (if not forced) in cases when the repo configuration uses mirrorlist? I expect that we may end up using a mix of (3rd party) repos that use just baseurl and system (e.g. Fedora) repos, that by default use mirrorlist. While it would be possible to default to librepo for all such repos, using curl and librepo in combination may be technically feasible.

@mvo5
Copy link
Contributor Author

mvo5 commented Jan 15, 2025

Thanks for the feedback, those are good thoughts and they align well with my plans - I am happy to change the types, fwiw, the TODOs about type-safety in this PR align well with the ideas there. I went this way to land librepo in bib/ibcli asap to fix the issue that we see in CI and that our users have with flaky mirror and then I wanted to get the manifestgen code in images so that there is a single place accross the projects to change when changing how depsolve works and then change the types. But I'm happy to flip this around, it is nicer (but will be a bigger diff).

About the rpmDownloader argument, I have no strong opinion but I assume eventually we will just sunset the curl backend for package downloading? AIUI librepo is also curl based and can do everything that we do with curl too (i.e. if the repo does not have a mirrorlist or a metalink it will just use the BaseUrls. Or am I missing something?

@achilleas-k
Copy link
Member

You're right, we could use librepo everywhere, even when there's just one repo.

I don't mind doing the bigger change in a follow-up though.

@mvo5
Copy link
Contributor Author

mvo5 commented Jan 15, 2025

You're right, we could use librepo everywhere, even when there's just one repo.

I don't mind doing the bigger change in a follow-up though.

I did a quick test in 6a3ffdb - if that looks reasonable (and not too big) I am happy to merge it in here. Needs some naming tweaks probably and we need to decide if we really want this or a tailored dnfjson.Result (or in a different pkg) but it seems the principle is there (unless I misunderstood some of the above suggestions of course :)

@achilleas-k
Copy link
Member

Yeah, pull that in here too. I was thinking of even bigger things, but that's a good start.

@thozza
Copy link
Member

thozza commented Jan 16, 2025

I did a quick test in 6a3ffdb - if that looks reasonable (and not too big) I am happy to merge it in here. Needs some naming tweaks probably and we need to decide if we really want this or a tailored dnfjson.Result (or in a different pkg) but it seems the principle is there (unless I misunderstood some of the above suggestions of course :)

This looks OK to me at the first glance.

What I liked a lot from @achilleas-k 's comment were also the methods for getting data for stage / source generators:

  • PkgRepos.GPGKeys() for the RPM stage options,
  • PkgRepos.PackageURLs() which returns package IDs alongside their full remote location URLs for the curl source,
  • PkgRepos.PackageMirrors() (again, bad name) which returns a list of (package path, mirror ID) 2-tuples and the mirror information, and checks for inconsistencies along the way.

But that can definitely be a follow-up.

mvo5 added 2 commits January 16, 2025 08:43
This is a quick draft to show what it make manifest.Serialize()
less "floppy", i.e. to make it
a) type-safe so that only depsolved pkgs/repos can be passed
b) ensure pkg/repos that got depsolved together are grouped together

Note that this is reusing `dnfjson.DepsolveResult` for simplicity
to how the idea its an open question if we should actually use
it. The upside is that it gives us e.g. the sbom everyhwere which
might be nice. The downside is that its not tailored and a bit
long (dnfjson.Result might be enough for this already).
This commit renames `dnfjson.DepsolveResult` to `dnfjson.Result`.
It is slightly less descriptive but shorter and the only kind
of result we have in the pacakge is from a depsolve so the meaning
should be clear enough.
@mvo5
Copy link
Contributor Author

mvo5 commented Jan 16, 2025

Thank you! I pulled in the described variant of the suggestion now. It uses the dnfjson.DepsolveResult as input for the serialization. This gives us the coupling that we all want and type-safety (one can no longer pass a non-depsolved rpmmd.RepoConfig. I also renamed dnfjson.DepsolveResult to just Result in a separate commit (so that I can revert again if the feeling is that this is too short/non-descriptive/muddy).

I am happy to do followups with more cleanup, my plan is to pull in "manifestgen" into "images" from "image-builder-cli", that might be a good opportunity for more refactor. I will think more about the suggestion for the helpers and how that fits (would love love to hear more of your ideas, maybe in a short call?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants