Bug in Span.sents #13769

Open
nrodnova opened this issue Mar 12, 2025 · 0 comments

When a Doc's entity is in the second-to-last sentence and the last sentence consists of a single token, entity.sents includes that final one-token sentence, even though the entity is fully contained in the previous sentence.

How to reproduce the behaviour

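import spacy

# Assumption: the original report does not show how nlp was created;
# a blank English pipeline is enough here because only the tokenizer is used.
nlp = spacy.blank("en")

text = "This is a sentence. This is another sentence. Third"
doc = nlp.tokenizer(text)
# Mark sentence boundaries manually: tokens 0, 5 and 10 start the three sentences.
doc[0].is_sent_start = True
doc[5].is_sent_start = True
doc[10].is_sent_start = True
doc.ents = [('ENTITY', 7, 9)]  # "another sentence" phrase in the second sentence
entity = doc.ents[0]
print(f"Entity: {entity}. Sentence: {entity.sent} Sentences: {list(entity.sents)}")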

Output:

Entity: another sentence. Sentence: This is another sentence. Sentences: [This is another sentence., Third]
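
For comparison, the result implied by the description can be modelled without spaCy. The following is a minimal sketch, not spaCy's implementation: the helper names `sentence_ranges` and `overlapping_sentences` are hypothetical, and the sketch assumes a sentence belongs to a span only if the two share at least one token. The token indices are taken from the reproduction above.

```python
def sentence_ranges(sent_starts, n_tokens):
    """Turn sorted sentence-start token indices into half-open (start, end) ranges."""
    bounds = list(sent_starts) + [n_tokens]
    return [(bounds[i], bounds[i + 1]) for i in range(len(sent_starts))]


def overlapping_sentences(sent_starts, n_tokens, span_start, span_end):
    """Return the sentence ranges that share at least one token with the span.

    The span is the half-open token range [span_start, span_end).
    """
    return [
        (start, end)
        for start, end in sentence_ranges(sent_starts, n_tokens)
        if start < span_end and end > span_start
    ]


# Values from the reproduction: 11 tokens, sentences starting at tokens 0, 5
# and 10, and an entity covering tokens 7 and 8 (half-open range 7-9).
print(overlapping_sentences([0, 5, 10], 11, 7, 9))  # prints [(5, 10)]
```

Under that assumption only the second sentence (tokens 5 to 9) qualifies, so the trailing "Third" sentence would not be expected in entity.sents.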

Your Environment

  • Operating System:
  • Python Version Used:
  • spaCy Version Used:
  • Environment Information: