Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue in the cell for the glueing #227

Open
burnout87 opened this issue Jan 15, 2025 · 4 comments · May be fixed by #226
Open

Issue in the cell for the glueing #227

burnout87 opened this issue Jan 15, 2025 · 4 comments · May be fixed by #226
Assignees

Comments

@burnout87
Copy link
Contributor

burnout87 commented Jan 15, 2025

As noticed by @ferrigno , there is probably an issue with the decode function of the ImageDataProduct and the NumpyDataProduct class

https://gitlab.renkulab.io/astronomy/mmoda/jemx-expert/-/blob/main/test_products.ipynb?ref_type=heads

@burnout87 burnout87 self-assigned this Jan 15, 2025
@burnout87
Copy link
Contributor Author

burnout87 commented Jan 15, 2025

After some investigation, I noticed that the error:

UnpicklingError: invalid load key, '\xfc'.

appears when executing the code:

ImageDataProduct.decode(result_image)

where result_image is read from the outputs of the notebook execution.Could it be related to the processing performed by the operations of glueing in the last cell (added by the backend nb2workflow)?

@dsavchenko maybe you have an idea of this

@burnout87 burnout87 transferred this issue from oda-hub/oda_api Jan 15, 2025
@burnout87 burnout87 linked a pull request Jan 15, 2025 that will close this issue
@burnout87 burnout87 changed the title Issue in decoding Issue in the cell for the glueing Jan 15, 2025
@dsavchenko
Copy link
Member

I changed the test a bit (see PR), to demonstrate what's going on.
While it works in the first case, it fails in the second.

The difference is that in the second case, the encoded object is a string (kind of double-encoded in a sense). So it ends up being processed through literal_to_json. And by simple coincidence, there is 'nan' substring in the encoded representation of the binary data. It gets modified, then unsurprisingly the data is corrupted.

I don't see a simple and clear solution, the overall encoding-decoding is convoluted, and overall, I don't really understand why it is made like this, with all this pickling etc.
Changing it may have unpredicted consequences and compatibility issues.
So we really need to think well...

@ferrigno
Copy link
Contributor

It seems to affect only products retrieved through the nb2workflow run function, as files downloaded from the front end are correct. Maybe @volodymyrss has a historical overall view.

Please notice that also the rmf file in the spectral extraction has an issue related to encoding and decoding. Most probably due to the peculiar nature of the compression used in the matrix or maybe just for the presence of NaN.

@volodymyrss
Copy link
Member

Is it really for this repository?

The complexity with encoding in odaapi is old, but the complexity is already with astropy fits serialization, there were astropy bugs with variable size fits tables.

I think as long as there are tests, refactoring is safe.
Maybe we can add a bit more tests to be safer. And refactor a bit as needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants