andreys42 changed the title: Exotic pattern that broke find_urls (Oct 10, 2024)
I found an unusual pattern of text, a combination of URLs and IPs, that the `find_urls` function cannot parse correctly.
Here is the text:
data = """ blablabla https://advengineering.ru/ru/aden/software/mezhdisciplinarnyj-inzhenernyj-analiz/logos/o-programme/ blababla2 https://advengineering.ru/ru/aden/software/proektirovanie/kompas-3d/o-programme/ blablabla3 https://advengineering.ru/ru/aden/software/proektirovanie/kompas-3d/o-programme/ T-FLEX CAD 7.1.17.0 blablabla4 http://government.ru/news/51998/) bla bla bla5(https://t.me/government_rus/13877 finally 7.1.17.0 """
And here is what `find_urls` returns:
['https://advengineering.ru/ru/aden/software/mezhdisciplinarnyj-inzhenernyj-analiz/logos/o-programme/', 'https://advengineering.ru/ru/aden/software/proektirovanie/kompas-3d/o-programme/', 'https://advengineering.ru/ru/aden/software/proektirovanie/kompas-3d/o-programme/', '7.1.17.0', '7.1.17.0']
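Evidently, the URLs that follow the first of the duplicated IPs (7.1.17.0) are ignored: http://government.ru/news/51998/ and https://t.me/government_rus/13877 are both missing from the result. I'm pretty sure that the problem (and the magic) lies in some of the numbers in the IPs.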
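To make the gap concrete, here is a rough, standard-library-only check that lists which scheme-prefixed URLs in `data` are absent from the reported output. The `NAIVE_URL` pattern and the `missing_urls` helper are hypothetical names introduced for illustration; the regex is a deliberately crude stand-in, not how `find_urls` works internally.

```python
import re

# Input text copied verbatim from this issue.
data = """ blablabla https://advengineering.ru/ru/aden/software/mezhdisciplinarnyj-inzhenernyj-analiz/logos/o-programme/ blababla2 https://advengineering.ru/ru/aden/software/proektirovanie/kompas-3d/o-programme/ blablabla3 https://advengineering.ru/ru/aden/software/proektirovanie/kompas-3d/o-programme/ T-FLEX CAD 7.1.17.0 blablabla4 http://government.ru/news/51998/) bla bla bla5(https://t.me/government_rus/13877 finally 7.1.17.0 """

# Output of find_urls as reported in this issue.
reported = [
    "https://advengineering.ru/ru/aden/software/mezhdisciplinarnyj-inzhenernyj-analiz/logos/o-programme/",
    "https://advengineering.ru/ru/aden/software/proektirovanie/kompas-3d/o-programme/",
    "https://advengineering.ru/ru/aden/software/proektirovanie/kompas-3d/o-programme/",
    "7.1.17.0",
    "7.1.17.0",
]

# Assumption: a crude pattern that matches an http(s) scheme followed by
# any run of characters that are not whitespace or parentheses.
NAIVE_URL = re.compile(r"https?://[^\s()]+")


def missing_urls(text, found):
    """Return scheme-prefixed URLs present in text but absent from found."""
    expected = NAIVE_URL.findall(text)
    return [url for url in expected if url not in found]


print(missing_urls(data, reported))
```

Both URLs this prints appear after the first `7.1.17.0` and sit next to a parenthesis, so either factor could be involved; the check alone does not tell them apart.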
I tried to dive into the generator in `def gen_urls`, but it is quite complex for me. Maybe someone else would like to take this on... you're welcome.