Some questions about training dataset #27

Open
MikeDean2367 opened this issue May 1, 2024 · 0 comments

MikeDean2367 commented May 1, 2024

Great work!

I executed the following command and obtained the data file named wikipedia_links_aligned_spans.json in the folder ~/.cache/refined/datasets.

python3 src/refined/training/train/train.py --experiment_name test

I have two questions regarding this file:

  • Is wikipedia_links_aligned_spans.json the training data?
  • If so, which fields are used for training? I found three fields in wikipedia_links_aligned_spans.json: hyperlinks_clean, hyperlinks, and predicted_spans. I'm not familiar with these three fields, and I'm unsure how to derive the training data from them.
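
For anyone trying to reproduce the field list above, here is a minimal sketch for counting the top-level keys in the file. It assumes the file is stored as JSON Lines (one JSON object per line), which is not confirmed by the repository; the helper name `count_field_names` and the `max_records` parameter are hypothetical and are not part of ReFinED.

```python
import json
from collections import Counter
from pathlib import Path

# Path taken from the issue text; adjust if your cache lives elsewhere.
DATA_PATH = (
    Path.home() / ".cache" / "refined" / "datasets"
    / "wikipedia_links_aligned_spans.json"
)


def count_field_names(lines, max_records=1000):
    """Count top-level keys over the first `max_records` non-empty lines.

    Assumption: each line is one standalone JSON object (JSON Lines).
    This helper is illustrative only and is not part of ReFinED.
    """
    counts = Counter()
    seen = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts.update(record.keys())
        seen += 1
        if seen >= max_records:
            break
    return counts


if __name__ == "__main__":
    # Only read the file if it is actually present on this machine.
    if DATA_PATH.exists():
        with DATA_PATH.open(encoding="utf-8") as f:
            print(count_field_names(f))
```

Reading line by line avoids loading the whole file into memory, which matters if the dataset is large. If the file turns out to be a single JSON array rather than JSON Lines, the line-based parsing will fail and `json.load` on the whole file would be needed instead.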

Thanks!
