Although the call to _get_tld_pos() in gen_urls() determines the correct position of the TLD using rfind(), this correction is not applied to tld_pos, so the returned indices are incorrect and the offset used on the next loop iteration is invalid.
If the same TLD appears multiple times within a hostname, it may match repeatedly.
For example:
>>> txt = "String bbb.aaa.bbb.aaa.aaa test string"
>>> for out in urlextract.URLExtract().gen_urls(txt, get_indices=1):
... print(out, txt[out[1][0] : out[1][1]])
...
('bbb.aaa.bbb.aaa.aaa', (-5, 14))
('bbb.aaa.bbb.aaa.aaa', (3, 22)) ing bbb.aaa.bbb.aaa
('bbb.aaa.bbb.aaa.aaa', (7, 26)) bbb.aaa.bbb.aaa.aaa
If the string contains a query part, further matches may be skipped. For example:
>>> txt = "String http://bbb.aaa.aaa/tests test string"
>>> for out in urlextract.URLExtract().gen_urls(txt, get_indices=1):
... print(out, txt[out[1][0] : out[1][1]])
...
('http://bbb.aaa.aaa/tests', (3, 27)) ing http://bbb.aaa.aaa/t
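The first result above can be reproduced with a minimal sketch, assuming the start index is derived by subtracting the TLD's rfind() offset inside the URL from the TLD position found in the text. This is a simplified model of the mismatch, not the library's actual code; url_span is a hypothetical helper introduced only for illustration.

```python
def url_span(tld_pos_in_text: int, url: str, tld: str) -> tuple:
    """Derive the URL's (start, end) in the text from one TLD position.

    Hypothetical helper: assumes the TLD position passed in refers to the
    LAST occurrence of the TLD inside the URL (hence rfind()).
    """
    start = tld_pos_in_text - url.rfind(tld)
    return start, start + len(url)


txt = "String bbb.aaa.bbb.aaa.aaa test string"
url = "bbb.aaa.bbb.aaa.aaa"
tld = ".aaa"

# Position of the FIRST ".aaa" in the text (what a forward find() yields).
first_pos = txt.find(tld)
# Position of the LAST ".aaa" of the URL in the text (what rfind() refers to).
last_pos = txt.find(url) + url.rfind(tld)

# Mixing a find()-based text position with an rfind()-based URL offset
# shifts the span to the left; using the same occurrence on both sides
# gives the real span.
mismatched = url_span(first_pos, url, tld)
consistent = url_span(last_pos, url, tld)

print(mismatched)  # (-5, 14)
print(consistent)  # (7, 26)
```

Under this assumption, keeping tld_pos in sync with the position found by rfind() would give both the returned indices and the offset for the next loop iteration the same reference point.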