Make it easier to get versioned set of parameters saved #4571

Open
maxschulz-COL opened this issue Mar 13, 2025 · 0 comments
Labels
Community Issue/PR opened by the open-source community Issue: Feature Request New feature or improvement to existing feature

maxschulz-COL commented Mar 13, 2025

Description

As you will have seen on the Slack channel, I wanted an easy way (no extensions) to save a versioned file of the parameters I am using. I would think this is something many people need to do.

So my feature request would be: make it easier to save a versioned set of parameters.

The current solution I found (after much going back and forth) is the following hook:

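from typing import Any

import pandas as pd
from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog


class ParameterVersioningHooks:  # illustrative class name; the original snippet shows only the method
    @hook_impl
    def after_pipeline_run(
        self,
        run_params: dict[str, Any],
        catalog: DataCatalog,
    ) -> None:
        """Hook implementation to run after a pipeline completes to save versioned data."""
        session_id = run_params["session_id"]
        # The full parameters dictionary is available in the catalog as "parameters"
        parameters = catalog.load("parameters")

        # Create a one-row dataframe with the session_id and the parameters as a string
        parameters_df = pd.DataFrame(
            {"session_id": [session_id], "parameters": [str(parameters)]}
        )
        # Save through the versioned "params" catalog entry
        catalog.save("params", parameters_df)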

together with this catalog entry:

params:
  type: pandas.ParquetDataset
  filepath: data/08_reporting/params.parquet
  versioned: true
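
A side note on the hook: `str(parameters)` stores the Python repr of the dictionary, which is awkward to turn back into a dictionary later. Below is a minimal sketch of one alternative, assuming the parameters hold mostly JSON-friendly values (anything else falls back to `str`). The helper name `serialise_parameters` and the example parameters are made up for illustration.

```python
import json
from typing import Any


def serialise_parameters(parameters: dict[str, Any]) -> str:
    """Return the parameters as a JSON string with a stable key order."""
    # sort_keys makes identical parameters produce identical text across runs;
    # default=str is a fallback for values json cannot encode (dates, paths, ...).
    return json.dumps(parameters, sort_keys=True, default=str)


# Invented example parameters, only to show the round trip.
example = {"model_options": {"test_size": 0.2, "features": ["a", "b"]}, "seed": 42}
as_text = serialise_parameters(example)
restored = json.loads(as_text)  # parses back into a plain dict
```

In the hook, `str(parameters)` could then be swapped for `serialise_parameters(parameters)`, leaving the rest unchanged.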

Context

Some context on the journey I had (I'm not saying these things can't eventually be found, but they are scattered across the docs):

  • searching for parameter saving/versioning yields no results
  • Once I realised hooks might be a solution (scary for a newbie), I found the docs lackluster as to what I have access to, see:
  • Once I understood hooks in general, and also that I can interact with the catalog, I struggled to get the session_id and the parameters together
    • Aside: KedroContext and KedroSession are very confusing as to what they contain
  • In the end, it was super hard to realise the catalog also contains the parameters - which was the last missing piece
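
To illustrate the last point: the catalog exposes the whole parameters dictionary under the name `parameters`, and individual entries under `params:<key>` names, with nested keys joined by dots. The sketch below only mimics that naming scheme on a plain dictionary. `parameter_dataset_names` is a hypothetical helper, not Kedro's own code, and the example parameters are invented.

```python
from typing import Any


def parameter_dataset_names(parameters: dict[str, Any]) -> dict[str, Any]:
    """Map catalog-style names to parameter values for a nested dictionary."""
    # The full dictionary is reachable under the single name "parameters".
    names: dict[str, Any] = {"parameters": parameters}

    def add(name: str, value: Any) -> None:
        # Every entry gets a "params:" name; nested dicts are walked recursively.
        names[f"params:{name}"] = value
        if isinstance(value, dict):
            for key, child in value.items():
                add(f"{name}.{key}", child)

    for key, value in parameters.items():
        add(key, value)
    return names


example = {"model_options": {"test_size": 0.2}, "seed": 42}
names = parameter_dataset_names(example)
```

This is why `catalog.load("parameters")` in the hook returns everything at once, while a node input would typically name a single `params:` entry.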

Anyway, that was my journey for something that I feel should be easier...

Possible Implementation

So the solution to this could be:

  • to have better/easier documentation (especially on hooks) that gets you to the answer of this question quicker
  • extend some arguments in hooks to have more variables (a major pain was realising that the parameters could be found in the catalog; could this be flattened?)
  • build something even more native, not sure what this would look like

In the end it's probably realistic to only do the first, but because of the other options, I left it as a feature request.
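
As a rough idea of what a lighter-weight option might look like, here is a sketch using only the standard library: it writes the parameters as JSON into a folder named after the session_id. The function name `save_parameters_snapshot`, the directory layout and the file name are all hypothetical, and none of this is an existing Kedro feature.

```python
import json
from pathlib import Path
from typing import Any


def save_parameters_snapshot(
    parameters: dict[str, Any], session_id: str, base_dir: Path
) -> Path:
    """Write the parameters to <base_dir>/<session_id>/parameters.json."""
    # One folder per run keeps every snapshot, so older versions are never overwritten.
    target_dir = base_dir / session_id
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "parameters.json"
    # default=str is a fallback for values json cannot encode directly.
    target.write_text(
        json.dumps(parameters, indent=2, sort_keys=True, default=str),
        encoding="utf-8",
    )
    return target
```

Called from an `after_pipeline_run` hook with `catalog.load("parameters")` and `run_params["session_id"]`, this would avoid the extra catalog entry, at the cost of bypassing the catalog's own versioning.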

@maxschulz-COL maxschulz-COL added the Issue: Feature Request New feature or improvement to existing feature label Mar 13, 2025
@maxschulz-COL maxschulz-COL changed the title <Title> Make it easier to get versioned set of parameters saved Mar 13, 2025
@merelcht merelcht added the Community Issue/PR opened by the open-source community label Mar 13, 2025
@github-project-automation github-project-automation bot moved this to Wizard inbox in Kedro Wizard 🪄 Mar 13, 2025
Projects
Status: Wizard inbox