Exotic pattern that broke find_urls #170

Open
andreys42 opened this issue Oct 9, 2024 · 0 comments

andreys42 commented Oct 9, 2024

I found an unusual pattern of text, a combination of URLs and IPs, that the find_urls function cannot parse correctly.

Here is the text:
data = """ blablabla https://advengineering.ru/ru/aden/software/mezhdisciplinarnyj-inzhenernyj-analiz/logos/o-programme/ blababla2 https://advengineering.ru/ru/aden/software/proektirovanie/kompas-3d/o-programme/ blablabla3 https://advengineering.ru/ru/aden/software/proektirovanie/kompas-3d/o-programme/ T-FLEX CAD 7.1.17.0 blablabla4 http://government.ru/news/51998/) bla bla bla5(https://t.me/government_rus/13877 finally 7.1.17.0 """

And here is what find_urls returns:

['https://advengineering.ru/ru/aden/software/mezhdisciplinarnyj-inzhenernyj-analiz/logos/o-programme/', 'https://advengineering.ru/ru/aden/software/proektirovanie/kompas-3d/o-programme/', 'https://advengineering.ru/ru/aden/software/proektirovanie/kompas-3d/o-programme/', '7.1.17.0', '7.1.17.0']

Evidently, the URLs that follow the first of the duplicated IPs (7.1.17.0) are ignored: http://government.ru/news/51998/ and https://t.me/government_rus/13877 are both missing from the result. I'm pretty sure the problem (and the magic) lies in some of the numbers in the IPs.
I tried to dig into the generator in gen_urls, but it is too complex for me. Maybe someone else would like to take this on ... you're welcome.
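
To make the expected result concrete, here is a minimal standard-library sketch that lists the scheme-prefixed URLs in the sample text and compares them with the list reported above. The helper `expected_urls` and the regex are hypothetical and written only for this comparison; they are not part of the library and are far less general than find_urls (they only see tokens starting with http:// or https://).

```python
import re

# Sample text copied from the report above.
data = """ blablabla https://advengineering.ru/ru/aden/software/mezhdisciplinarnyj-inzhenernyj-analiz/logos/o-programme/ blababla2 https://advengineering.ru/ru/aden/software/proektirovanie/kompas-3d/o-programme/ blablabla3 https://advengineering.ru/ru/aden/software/proektirovanie/kompas-3d/o-programme/ T-FLEX CAD 7.1.17.0 blablabla4 http://government.ru/news/51998/) bla bla bla5(https://t.me/government_rus/13877 finally 7.1.17.0 """

# Output of find_urls as quoted in the report.
reported = [
    "https://advengineering.ru/ru/aden/software/mezhdisciplinarnyj-inzhenernyj-analiz/logos/o-programme/",
    "https://advengineering.ru/ru/aden/software/proektirovanie/kompas-3d/o-programme/",
    "https://advengineering.ru/ru/aden/software/proektirovanie/kompas-3d/o-programme/",
    "7.1.17.0",
    "7.1.17.0",
]

# Hypothetical baseline: a scheme followed by anything up to whitespace
# or a parenthesis. This is an assumption for illustration, not the
# library's matching logic.
SCHEME_URL = re.compile(r"https?://[^\s()]+")


def expected_urls(text):
    """Return every http(s)-prefixed token found in text, in order."""
    return SCHEME_URL.findall(text)


found = expected_urls(data)

# URLs the baseline sees that are absent from the reported output.
missing = [url for url in found if url not in reported]
print(missing)
```

Under these assumptions the baseline sees five URLs, and the two it reports as missing are exactly the ones that come after the first 7.1.17.0 in the text.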
