Parsing NER task #61

Open
fdalvi opened this issue Jun 14, 2023 · 1 comment

@fdalvi
Collaborator

fdalvi commented Jun 14, 2023

Sometimes the model output looks like this:

{
  "label": "B-LOC B-LOC O B-PERS I-PERS O O O O B-PERS I-PERS O O O O O O O O O O O O O O O O O O B-LOC B-LOC O O O O O O O O O O O O B-PERS O O O O O O O O O O O O O O O O O O O O B-LOC B-LOC O O O O O O O",
  "model_output": "Output: [('الصالحية', 'LOC'), ('المفرق', 'LOC'), ('-', 'O'), ('غيث', 'PER'), ('الطراونة', 'PER'), ('-', 'O'), ('أمر', 'O'), ('جلالة', 'O'), ('الملك', 'PER'), ('عبدالله', 'PER'), ('الثاني', 'PER'), ('أمس', 'O'), ('بتنفيذ', 'O'), ('حزمة', 'O'), ('من',... ('التحديات', 'O'), ('التي', 'O'), ('يواجهها', 'O'), ('أبناء', 'O'), ('الصالحية', 'LOC'), ('ونايفة', 'LOC'), ('خصوصا', 'O'), ('فيما', 'O'), ('يتعلق', 'O'), ('بمشكلتي', 'O'), ('الفقر', 'O'), ('والبطالة', 'O'), ('.', 'O')]",
}

Should we count "LOC" as "B-LOC"? And what about consecutive tokens with the same type: should the first be "B-" and the second "I-"? That is not always correct; the first two tokens in the example above are both labeled "B-LOC". Up for discussion, @firojalam @baselmousi.
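To make the two options concrete, here is a minimal sketch of both mapping rules. The helper names `parse_pairs` and `to_bio` are hypothetical and not part of the project's code, and the regex assumes the model returns single-quoted `(token, tag)` pairs without embedded quotes, as in the sample above.

```python
import re

# Matches pairs such as ('غيث', 'PER'). Assumption: tokens and tags are
# single-quoted and contain no embedded single quotes.
PAIR_RE = re.compile(r"\('([^']*)',\s*'([^']*)'\)")


def parse_pairs(model_output):
    """Extract (token, tag) pairs from the raw model response.

    A regex is used instead of ast.literal_eval so that a response with a
    prefix such as "Output:" or a truncated list still yields the pairs
    that are present.
    """
    return PAIR_RE.findall(model_output)


def to_bio(tags, chain=True):
    """Turn prefix-less tags (LOC, PER, O) into BIO labels.

    chain=True:  the first tag of a run gets "B-", later ones get "I-".
    chain=False: every entity tag gets "B-".
    """
    labels = []
    previous = "O"
    for tag in tags:
        if tag == "O":
            labels.append("O")
        elif chain and tag == previous:
            labels.append("I-" + tag)
        else:
            labels.append("B-" + tag)
        previous = tag
    return labels


sample = "Output: [('الصالحية', 'LOC'), ('المفرق', 'LOC'), ('-', 'O'), ('غيث', 'PER'), ('الطراونة', 'PER')]"
tags = [tag for _, tag in parse_pairs(sample)]
print(to_bio(tags, chain=True))   # ['B-LOC', 'I-LOC', 'O', 'B-PER', 'I-PER']
print(to_bio(tags, chain=False))  # ['B-LOC', 'B-LOC', 'O', 'B-PER', 'B-PER']
```

The gold labels for these five tokens are `B-LOC B-LOC O B-PERS I-PERS`, so neither rule reproduces both spans: chaining gets the person span right but merges the two locations, while always using "B-" does the opposite. Note also that the model returns `PER` where the gold labels use `PERS`.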

@baselmousi
Contributor

Thanks for bringing this up. I will prepare output files that compare the gold labels with the post-processed responses returned by both gpt-3.5 and gpt-4. Evaluating with 5 labels (the entity types without the "B-"/"I-" prefixes) instead of 9 should improve the results quite a bit.
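A rough sketch of what a prefix-free comparison could look like is below. The names `collapse` and `token_accuracy` are hypothetical, the `PERS` to `PER` alias is an assumption based only on the sample in this issue, and plain token accuracy is used as a stand-in because the issue does not say which metric the project reports.

```python
# Assumption: the gold data writes "PERS" where the model writes "PER",
# as seen in the sample above. Extend this map if other tags differ.
ALIASES = {"PERS": "PER"}


def collapse(label):
    """Drop a leading "B-" or "I-" and normalize tag aliases."""
    entity = label[2:] if label[:2] in ("B-", "I-") else label
    return ALIASES.get(entity, entity)


def token_accuracy(gold, predicted):
    """Fraction of positions whose collapsed labels agree."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    matches = sum(collapse(g) == collapse(p) for g, p in zip(gold, predicted))
    return matches / len(gold)


gold = "B-LOC B-LOC O B-PERS I-PERS".split()
predicted = ["LOC", "LOC", "O", "PER", "PER"]
print(token_accuracy(gold, predicted))  # 1.0
```

With the prefixes removed, the first five tokens of the sample agree completely, which avoids having to guess between "B-" and "I-" for the model's prefix-less tags.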
