thomasteisberg

This change allows nested dictionaries within Dataset attributes to be displayed as collapsible sections in notebook output (_repr_html_).

I'm working on a project that produces xarray Datasets with large nested attribute dictionaries. I couldn't find any way of overriding the _repr_html_ output without subclassing Dataset. We've temporarily implemented an accessor as a work-around, but I'm hoping this change in display format is generally useful enough to make it into xarray.

I'd appreciate any feedback on this display format and if xarray would be interested in incorporating something like this.

Minimal example

import xarray as xr

ds = xr.Dataset(
    {},
    attrs={
        "test": "This is a test dataset",
        "nested_dict": {
            "key1": "value1",
            "key2": "value2",
            "nested": {
                "subkey1": "subvalue1",
                "subkey2": "subvalue2",
            },
        },
        "nested_dict_2": {
            "key3": "value3",
            "key4": "value4",
        },
        "regular_key": "regular_value",
    }
)

ds

Before:

Screenshot from 2025-08-20 16-58-32

After:

Screenshot from 2025-08-20 16-46-46

There's an example of how we're using this (currently through an accessor) here: https://www.thomasteisberg.com/xopr/demo-notebook/


welcome bot commented Aug 21, 2025

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

@shoyer
Member

shoyer commented Aug 21, 2025

Thanks Thomas for sharing!

My initial thought is that this feels a little complex to add into Xarray for a relatively niche use-case (sorry!). In particular, new CSS gets added to the saved output of every Xarray object, whether it uses dict attributes or not.

One small fix that might help a bit for your use-case is to clip long attributes in the HTML repr at some max line length, similar to what we already do in the text repr:

<xarray.Dataset> Size: 0B
Dimensions:  ()
Data variables:
    *empty*
Attributes:
    test:           This is a test dataset
    nested_dict:    {'key1': 'value1', 'key2': 'value2', 'nested': {'subkey0'...
    nested_dict_2:  {'key3': 'value3', 'key4': 'value4'}
    regular_key:    regular_value

Then at least Xarray's HTML repr would not take up the entire screen.
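A minimal sketch of that kind of clipping, for illustration only. The helper name `clip_attr_value` and the `max_width` parameter are hypothetical and not part of xarray; the idea is just to collapse the value to one line, truncate it, and escape it for HTML afterwards so that no entity is cut in half.

```python
from html import escape


def clip_attr_value(value, max_width=80):
    """Return a one-line, HTML-escaped summary of an attribute value.

    Hypothetical helper: the name and the default width are assumptions.
    """
    # Collapse newlines and runs of whitespace so the summary stays on one line.
    text = " ".join(str(value).split())
    # Truncate before escaping so an entity such as "&lt;" is never split.
    if len(text) > max_width:
        text = text[: max_width - 3] + "..."
    return escape(text)
```

With this, a long nested dict attribute would take up a single row in the HTML repr, ending in `...` the same way the text repr above does.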

@thomasteisberg
Author

Hi @shoyer thanks for taking a look.

We were trying to come up with the most general form of an improved repr that would still work for us, but it may be that this is still too niche. Given the "do not subclass" recommendation, is there any recommended way to implement a custom _repr_html_ for libraries that build upon Dataset?

I ran across the same question here, but it doesn't seem like it went anywhere.

As for CSS bloat, this PR adds 702 bytes to an 8048-byte stylesheet. If CSS file size is the main concern, I think it would likely be possible to cut that down further by re-using rules from existing collapsible elements. I'm happy to take a shot at that if it would make a difference, but I suspect the issue here is more about whether this functionality is generally useful enough. Please tell me if I'm wrong, of course.

@shoyer
Member

shoyer commented Aug 21, 2025

Thinking about this a little more, I am coming around to the idea that expandable HTML reprs of attribute values could be broadly useful.

The structure would look something like:

  • Create a one line summary of an attribute. This could reuse the _repr_inline_ method, which we currently support for array values, or we could have a specific method for generating one line HTML summaries (maybe _repr_html_inline_?).
  • If the summary is shorter than the full repr, add an "expand" icon.
  • Clicking on "expand" shows the full repr, reusing protocols like _repr_html_ from IPython notebooks, which Xarray also currently supports for array values (e.g., for showing nice dask.array reprs).
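The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `one_line_summary`, `full_html`, and `render_attr` are hypothetical names, `_repr_inline_` is assumed to take a `max_width` argument as it does for array values, and a plain HTML `<details>` element stands in for whatever expand icon the real repr would use.

```python
from html import escape


def one_line_summary(value, max_width=40):
    """Step 1: build a one-line text summary (hypothetical helper)."""
    inline = getattr(value, "_repr_inline_", None)
    if callable(inline):
        text = str(inline(max_width))
    else:
        text = " ".join(repr(value).split())
    if len(text) > max_width:
        text = text[: max_width - 3] + "..."
    return text


def full_html(value):
    """Step 3: the expanded view, using _repr_html_ when the value has one."""
    html = getattr(value, "_repr_html_", None)
    if callable(html):
        return html()
    # Assumption: fall back to the escaped plain repr for ordinary objects.
    return f"<pre>{escape(repr(value))}</pre>"


def render_attr(value, max_width=40):
    summary = one_line_summary(value, max_width)
    full_text = " ".join(repr(value).split())
    has_rich = callable(getattr(value, "_repr_html_", None))
    # Step 2: only offer "expand" when the summary hides something.
    if summary == full_text and not has_rich:
        return f"<span>{escape(summary)}</span>"
    return (
        f"<details><summary>{escape(summary)}</summary>"
        f"{full_html(value)}</details>"
    )
```

Short values come back as a plain `<span>`, while anything that gets truncated (or that provides `_repr_html_`) is wrapped in a collapsible element.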

I can see a case for implementing default HTML reprs for built-in Python objects like dict.

There's some related discussion about nested inline reprs here: #4324

@thomasteisberg
Author

That generally seems reasonable. I took a try at implementing a quick version of what I think you're describing. The behavior is:

For each attribute, we look for a _repr_html_inline_ and a _repr_html_. If we only have the inline, use that. If we have both and they are not equal, then show the _repr_html_inline_ as the preview with _repr_html_ as what it expands to.
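As a rough sketch of that selection rule (not the code in this PR): `choose_reprs` is a hypothetical name, `_repr_html_inline_` is the proposed method from this thread and is assumed here to take no arguments, and the fallback for objects that lack an inline repr is an assumption, since that case is not spelled out above.

```python
from html import escape


def choose_reprs(value):
    """Return (preview, expanded); expanded is None if there is nothing to expand."""
    inline = getattr(value, "_repr_html_inline_", None)
    full = getattr(value, "_repr_html_", None)
    preview = inline() if callable(inline) else None
    expanded = full() if callable(full) else None

    # Both exist and differ: inline is the preview, _repr_html_ is the expansion.
    if preview is not None and expanded is not None and preview != expanded:
        return preview, expanded
    # Only the inline repr exists (or both are identical): show it, no expansion.
    if preview is not None:
        return preview, None
    # Assumption: with no inline repr, fall back to the escaped plain repr.
    return escape(repr(value)), None


class BothReprs:
    """Toy object providing both methods, for demonstration only."""

    def _repr_html_inline_(self):
        return "<i>BothReprs</i>"

    def _repr_html_(self):
        return "<table><tr><td>full view</td></tr></table>"


class InlineOnly:
    """Toy object providing only the inline method."""

    def _repr_html_inline_(self):
        return "<i>InlineOnly</i>"
```

Here `choose_reprs(BothReprs())` yields a preview plus an expansion, while `choose_reprs(InlineOnly())` yields only the preview.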

Here's an example with a test dask array subclass:

image

All of the special casing makes the code quite a bit messier, but I suppose that's the tradeoff for a more general solution.

I added a test notebook you can try running if you want to play with a live version.

@ianhi
Contributor

ianhi commented Aug 27, 2025

In general I am pro more interactive and rich displays so this is a welcome direction! I'm not ready to comment on generalization and protocols yet (but think I have thoughts).

My quick note on a possible failure mode of this PR is that there are some pathological dictionaries that this display will need to protect against. For example this code gives a recursion error:

# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "xarray @ git+https://github.com/thomasteisberg/xarray.git@html-repr-nested-dictionary",
# ]
# ///

import xarray as xr
a = dict()
a["b"] = a  # the dict contains itself
ds = xr.Dataset(
    {},
    attrs={
        "a": a,
    },
)

ds

@thomasteisberg
Author

That's a good point @ianhi. One option would be to include a maximum recursion depth for dictionaries.
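A minimal sketch of what a depth limit could look like, assuming a simple recursive renderer that emits nested `<details>` elements. The function name `dict_to_html`, the `max_depth` default, and the `{...}` placeholder are all hypothetical and are not taken from this PR.

```python
from html import escape


def dict_to_html(d, max_depth=3, _depth=0):
    """Render a dict as nested <details> blocks, stopping at max_depth levels."""
    parts = []
    for key, value in d.items():
        label = escape(str(key))
        if isinstance(value, dict):
            if _depth + 1 >= max_depth:
                # Too deep (or a self-referencing dict): show a placeholder
                # instead of recursing any further.
                body = "{...}"
            else:
                body = dict_to_html(value, max_depth, _depth + 1)
            parts.append(f"<details><summary>{label}</summary>{body}</details>")
        else:
            # repr() of built-in containers already guards against cycles.
            parts.append(f"<div>{label}: {escape(repr(value))}</div>")
    return "".join(parts)


# The self-referencing dict from the example above now terminates.
a = dict()
a["b"] = a
cyclic_html = dict_to_html(a, max_depth=3)
```

A depth limit also bounds output size for dicts that are merely very deep rather than cyclic. Tracking the `id()` of dicts already visited would be an alternative that detects true cycles without cutting off legitimate deep nesting.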

Looking forward to seeing your thoughts on the rest of it.
