Add s3:// as possible package URLs #960

Open
pavelzw opened this issue Nov 25, 2024 · 15 comments

@pavelzw
Contributor

pavelzw commented Nov 25, 2024

@delsner and I are planning to add S3 support to pixi/rattler.
Here are our ideas for modifying the config files:

pixi.toml

[project]
name = "my-project"
channels = [
  "s3://my-bucket/channel",
  "conda-forge",
]

[s3]
region = "us-west-2"
endpoint = "http://localhost:9000"
url-style = "subdomain" # or "path" (used by Cloudflare R2 or minio)

[s3."my-bucket"]
region = "eu-west-1"

config.toml

[s3]
region = "eu-west-1" # globally (used in pixi init, pixi search, pixi exec)

[s3.conda-forge] # globally (used in pixi init, pixi search, pixi exec)
# not used when not specified in pixi.toml but used in s3://
endpoint = "http://localhost:9000"
region = "us-east-1"
url-style = "subdomain" # or "path"

pixi auth login --access-key-id <access-key-id> --secret-access-key <secret-access-key> --session-token <session-token> s3://my-bucket

credentials.json

{
    "s3://my-bucket": {
        "S3": {
            "AccessKeyId": "<access-key-id>",
            "SecretAccessKey": "<secret-access-key>",
            "SessionToken": "<session-token>" // optional
        }
    }
}
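
A minimal sketch of how such a store might be queried for a package URL. It assumes entries are matched by URL prefix and that the longest matching key wins; that matching rule and the name `find_s3_credentials` are assumptions for illustration, not the behaviour of pixi's credential stores.

```python
def find_s3_credentials(store: dict, url: str):
    """Return the "S3" entry whose key is the longest prefix of `url`, or None."""
    matches = [
        key
        for key in store
        # Match the key itself or anything below it, but not "s3://my-bucket-2".
        if url == key or url.startswith(key.rstrip("/") + "/")
    ]
    if not matches:
        return None
    return store[max(matches, key=len)].get("S3")


# Same shape as credentials.json above, with placeholder values.
store = {
    "s3://my-bucket": {
        "S3": {
            "AccessKeyId": "<access-key-id>",
            "SecretAccessKey": "<secret-access-key>",
        }
    }
}

print(find_s3_credentials(store, "s3://my-bucket/channel/noarch/repodata.json"))
print(find_s3_credentials(store, "s3://my-bucket-2/channel"))
```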

pixi.lock

version: 6
environments:
  default:
    channels:
    - url: https://conda.anaconda.org/conda-forge/
    - url: s3://my-bucket/channel/
      s3:
        region: eu-west-1
        endpoint: http://localhost:9000
        url-style: subdomain # or path

Any thoughts on this proposal?

@pavelzw
Contributor Author

pavelzw commented Nov 25, 2024

One open question would be:

do we want to use the rust-s3 crate (which might introduce additional dependencies) that also provides some sort of native credential reading (through env vars, ...) or should we only rely on our own implementation where we store access keys, ... in pixi's credential stores?

I personally would like to be able to handle pixi's credential stores for S3 credentials as well.
We could theoretically default to aws's defaults if we didn't specify anything else.

For reproducibility's sake we should IMO definitely add the aws region and endpoint (if not default) into pixi.lock (and pixi.toml).
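
The "pixi store first, AWS defaults otherwise" order described above could look roughly like this. Only the standard AWS environment variables are consulted in this sketch; the real AWS default chain also covers shared config files and other sources. The function name and the `source` field are hypothetical.

```python
import os


def resolve_credentials(stored, env=os.environ):
    """Prefer credentials from pixi's store, else fall back to AWS env vars."""
    if stored is not None:
        return {"source": "pixi", **stored}

    access_key = env.get("AWS_ACCESS_KEY_ID")
    secret_key = env.get("AWS_SECRET_ACCESS_KEY")
    if access_key and secret_key:
        creds = {"AccessKeyId": access_key, "SecretAccessKey": secret_key}
        # The session token is optional, as in credentials.json.
        token = env.get("AWS_SESSION_TOKEN")
        if token:
            creds["SessionToken"] = token
        return {"source": "environment", **creds}

    # Nothing configured: the caller would attempt an unauthenticated request.
    return None


fake_env = {"AWS_ACCESS_KEY_ID": "id", "AWS_SECRET_ACCESS_KEY": "secret"}
print(resolve_credentials(None, env=fake_env))
```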

@pavelzw
Contributor Author

pavelzw commented Nov 25, 2024

AFAICT the design above should also be able to handle other providers like minio, cloudflare, and gcs (for which we already have a middleware, maybe deprecate it afterwards?).

@olivier-lacroix

@pavelzw , any reason for not using the same approach as for gcs?

I reckon using existing crates for authentication is the best path: these things have a lot of corner cases...

@pavelzw
Contributor Author

pavelzw commented Nov 25, 2024

IMO S3 is a protocol that has a much larger scope than AWS itself since it's used by almost all other cloud storage providers and not only AWS.
Thus, I would find it a bit irritating if the only configuration option for S3 would be via AWS config files / AWS environment variables.

Also, from what I've seen, AWS S3 is not really a hard protocol to implement. But I'm open to including it via, for example, rust-s3. I'm not sure about pulling in the whole aws sdk as a dependency, since that might result in unnecessarily large binary sizes for pixi.

@olivier-lacroix

@pavelzw currently, the GCS middleware relies on a small external crate for authentication, but then directly maps the gcs URL to an HTTPS endpoint, rather than depending on any google cloud storage SDK.

Moving to rust-s3 would only replace that URL mapping, and may be valuable to handle uploads, for instance.

However, I think including middlewares with specific authentication mechanism is something orthogonal to this, and quite valuable (I am not loving keyring python packages...). What do you think?

@baszalmstra
Collaborator

Roping in @ruben-arts since this also concerns pixi.

@pavelzw do you have an idea on how you would implement this in rattler? We currently don't really have a way to parameterize channels.

@pavelzw
Contributor Author

pavelzw commented Nov 26, 2024

Shouldn't we be able to parametrize them similarly to how we do mirror configuration and feed this information into rattler from pixi and rattler-build?
(Haven't looked at much concerning rattler code yet, though)

@baszalmstra
Collaborator

That would mean that we have configuration for channels living both in the “channels” and in the middleware, which couples the two. You also want to store the information in the lock file, which then also requires specific configuration.

It would be nice if we could encapsulate that some more.

@ruben-arts
Collaborator

Would there be configuration like this we want to be able to set for gcs://? If so I think it makes sense to move the table under [remote.s3] or something else. To show a clear connection.

@olivier-lacroix

> Would there be configuration like this we want to be able to set for gcs://? If so I think it makes sense to move the table under [remote.s3] or something else. To show a clear connection.

I don't think so, @ruben-arts, but maybe I am missing something.

@pavelzw
Contributor Author

pavelzw commented Nov 26, 2024

theoretically gcs is s3-compatible as well, so we would be able to use GCS like this:

s3://my-bucket/channel

# pixi.toml
[s3]
region = "us-east1"
endpoint = "https://storage.googleapis.com/"
url-style = "path"

see https://github.com/durch/rust-s3/blob/7c6fdc0646704eac315c11eb60bf9f125975159b/examples/gcs-tokio.rs#L18

you would need to specify your credentials manually using pixi auth login s3, though
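
To make the `endpoint` and `url-style` settings concrete, here is a rough sketch of how an `s3://` channel URL could be mapped to an HTTP(S) URL in the two styles. The function name `s3_url_to_http` is hypothetical, and request signing and authentication are deliberately left out, so this is not a working S3 client.

```python
from urllib.parse import urlsplit, urlunsplit


def s3_url_to_http(s3_url: str, endpoint: str, url_style: str) -> str:
    """Map s3://<bucket>/<key> onto an endpoint using path or subdomain style."""
    parsed = urlsplit(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {s3_url!r}")
    bucket, key = parsed.netloc, parsed.path.lstrip("/")

    ep = urlsplit(endpoint)
    base_path = ep.path.rstrip("/")
    if url_style == "path":
        # Bucket becomes the first path segment: https://host/bucket/key
        host, path = ep.netloc, f"{base_path}/{bucket}/{key}"
    elif url_style == "subdomain":
        # Bucket becomes a subdomain: https://bucket.host/key
        host, path = f"{bucket}.{ep.netloc}", f"{base_path}/{key}"
    else:
        raise ValueError(f"unknown url-style: {url_style!r}")
    return urlunsplit((ep.scheme, host, path, "", ""))


url = "s3://my-bucket/channel/noarch/repodata.json"
print(s3_url_to_http(url, "https://storage.googleapis.com/", "path"))
print(s3_url_to_http(url, "http://localhost:9000", "subdomain"))
```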

@olivier-lacroix

@pavelzw, I agree this would work. But I feel this would be much less practical than the current solution, where one only has to specify the gcs url and use the usual GCP authentication mechanisms.

to summarize,

  • like you @pavelzw , I agree that handling s3-like object store would be valuable
  • but that handling authentication in a « native » way is key to make it convenient. Which tends to be specific to each object store
  • And that handling the mapping of the « native » url (e.g. gcs://…) to the s3 config makes it a lot simpler

Also, I wonder if, in terms of pixi config, it wouldn’t be more natural to allow s3 config as inline tables in the channels array?

@pavelzw
Contributor Author

pavelzw commented Nov 26, 2024

How about making it possible to configure it as described above (or your suggestion with inline tables) and if nothing is provided, we fall back to the aws-creds crate?

@olivier-lacroix

@pavelzw , I am not sure I understand what you mean. I think there are two orthogonal aspects to consider:

  • Handling of S3-like object stores, which the rust-s3 crate can help with
  • Handling of authentication to these stores. This is where aws-creds can help, but also other helpers (for GCP, Azure, etc.). Selection of the right helper could/should be done automatically

@pavelzw
Contributor Author

pavelzw commented Jan 2, 2025

In #1008 I added a first draft of S3 support in rattler. I followed @olivier-lacroix's suggestion and hand all authentication over to the AWS crates.

It should be possible to use different endpoints like minio, cloudflare r2, ... as well by specifying a custom AWS config.
I will write some docs on this in the pixi repo soon.

feel free to already take a look


EDIT:

This is a list of all S3-related PRs:
