It looks like the problem is with the Apache Tika v1.2 service that we use. Something in this particular PDF breaks Tika.
There are several ways to fix this. We could try some newer Go libraries for converting PDFs to text and for normalizing text encodings to UTF-8, if they perform well enough.
Alternatively, there is a newer version of Tika (v3.0.0) that might help, but in that case we would have to rebuild the Tika client, because the new version's API is not compatible with the one we currently use.
I get an internal server error when using this URL:
http://dermestidae.wz.cz/wp-content/uploads/2024/09/Catalogue-Derodontidae-2024.pdf
at:
https://finder.globalnames.org
Uploading the PDF also produces an error, although copying and pasting the text works.