Internal Server Error #154

Open
gdower opened this issue Dec 5, 2024 · 1 comment
gdower commented Dec 5, 2024

I get an internal server error when using this URL:

http://dermestidae.wz.cz/wp-content/uploads/2024/09/Catalogue-Derodontidae-2024.pdf

at:

https://finder.globalnames.org

The PDF upload also errors, although copying and pasting in the text works.

Member

dimus commented Dec 6, 2024

Looks like the problem is with the Apache Tika v1.2 service that we use. Something in this particular PDF breaks Tika.

There are several ways we can fix this problem. There are some new Go libraries that we should try for converting PDFs to text and for normalizing text encodings to UTF-8, provided they perform well enough.

There is also a newer version of Tika (v3.0.0), which might help, but in that case we would have to rebuild the Tika client, because the new version's API is not compatible with the one we currently use.
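
As a rough illustration of the encoding-normalization step mentioned above, here is a minimal sketch that uses only the Go standard library. It is not the project's code: the function name `toUTF8` is hypothetical, and the fallback assumes that any input which is not valid UTF-8 is ISO-8859-1 (Latin-1). Real documents may use other encodings, so a dedicated detection library would likely be needed in practice.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// toUTF8 returns the input as a valid UTF-8 string.
// Assumption (for illustration only): input that is not valid UTF-8
// is ISO-8859-1, where each byte maps to the code point of the same value.
func toUTF8(raw []byte) string {
	if utf8.Valid(raw) {
		// Already UTF-8: return unchanged.
		return string(raw)
	}
	// Widen each Latin-1 byte to a rune; string() then encodes as UTF-8.
	runes := make([]rune, len(raw))
	for i, b := range raw {
		runes[i] = rune(b)
	}
	return string(runes)
}

func main() {
	latin1 := []byte{'c', 'a', 'f', 0xE9} // "café" encoded as Latin-1
	fmt.Println(toUTF8(latin1))
}
```

Text that is already valid UTF-8 passes through untouched, so the function is safe to apply to every extracted document before name finding.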
