Issue with loading Additional Entities #19
Comments
I went through a very similar issue after updating the files with the latest wiki dumps. I believe it is indeed attributable to the different shape of the classes tensor. To perform zero-shot inference without retraining your model, you may want to use a mixture of the original files (the ones that reflect the old number of classes) and newly generated ones. The combination that I found to run the model effectively is the following:
NOTE:
Thanks heaps for replying @lucatorellimxm. With your suggestions I was at least able to run the model... but for whatever reason the performance is way off. Some entities that it was previously disambiguating/linking are no longer linking correctly, and my 'additional entities' are also not linking.
Just an update in case anyone ever looks here... Eventually got everything working well... But discovered 2 things:
Great advice, thank you @seanaedmiston. Does point 2 still hold in the case of full model training? I am experiencing linking issues with rather easy mentions even after training the model from scratch on new data, and that could be the cause.
Yes - I saw poor linking performance (point 2) even with full model training. Fixing the 'redirect' parsing problems I found should fix that. It made a huge difference for me. My fork is a bit of a mess, but the only changes you should need are in process_wiki.py: main...Simbolo-io:ReFinED:main#diff-7aac257f29f9e00bda22f968125b52fc5bc3ced71e9627c5bf51780c4a8230c3 One little wrinkle: in the latest Wikipedia dumps there is an article title that consists of just a backslash. If that causes you problems, you may need the additional fix to loaders.py here: main...Simbolo-io:ReFinED:main#diff-7fbb3c56891f6094624a3872d81cde9dab1d4585452975093f5fdd63dece42ea
I am trying to add additional entities without retraining, but I cannot find the file "chosen_classes.txt" in the original folder: datasets: roberta-base: wikipedia_data: wikipedia_model: wikipedia_model_with_numbers:
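One way to rule out a misplaced file is to search the whole data folder recursively for the file name. The sketch below is only an illustration using the standard library. The helper name find_file is hypothetical and not part of ReFinED, and the root folder is whatever directory your model and data files were downloaded to.

```python
from pathlib import Path


def find_file(root, filename):
    """Return a sorted list of files named `filename` anywhere under `root`.

    Hypothetical helper for illustration; `root` is an assumption and should
    point at the folder that holds the downloaded model and data files.
    """
    return sorted(p for p in Path(root).rglob(filename) if p.is_file())
```

Calling find_file on your data folder with "chosen_classes.txt" returns an empty list if the file is not present in any subfolder.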
I have tried to load additional entities as per the README by running preprocess_all. Everything appears to run fine. However, when I try to load the ReFinED model afterwards with something like:
I get an error like:
To the best of my understanding, this is because the number of classes in the Wikidata dump has changed since the original model was trained. (class_to_label.json now has 1446 entries.) Is there any way to accommodate this without completely retraining the model?
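A quick way to confirm the mismatch is to count the entries in the regenerated class_to_label.json and compare that with the number of classes the trained checkpoint expects. The sketch below is an illustration only. The helper names are hypothetical and not part of ReFinED, it assumes the file is a single JSON object or array, and the expected count is a value you supply yourself, for example from the sizes reported in the error message.

```python
import json


def count_classes(class_to_label_path):
    """Return the number of top-level entries in a class_to_label.json-style file."""
    with open(class_to_label_path, encoding="utf-8") as f:
        return len(json.load(f))


def check_class_count(class_to_label_path, expected_num_classes):
    """Compare the file's class count with the count the checkpoint expects.

    `expected_num_classes` is an assumption supplied by the caller; this
    helper does not read it from the model.
    Returns (matches, found_count).
    """
    found = count_classes(class_to_label_path)
    return found == expected_num_classes, found
```

If the two counts differ, the classes tensor built from the new files will not have the shape the saved weights were trained with.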