XML parsing error #7

Open
goodmami opened this issue Nov 17, 2016 · 3 comments

@goodmami
Member

Not sure which document it was or what the problem was exactly.

  File ".../freki/readers/tetml.py", line 19, in __init__
    self.init_pages()
  File ".../freki/readers/tetml.py", line 25, in init_pages
    for event in xml_iter:
  File "/opt/python-3.4.1/lib/python3.4/xml/etree/ElementTree.py", line 1294, in __next__
    for event in self._parser.read_events():
  File "/opt/python-3.4.1/lib/python3.4/xml/etree/ElementTree.py", line 1277, in read_events
    raise event
  File "/opt/python-3.4.1/lib/python3.4/xml/etree/ElementTree.py", line 1235, in feed
    self._parser.feed(data)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 12, column 32
@goodmami
Member Author

Also seeing this:

xml.etree.ElementTree.ParseError: unclosed token: line 720851, column 4
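Since the tracebacks do not say which document failed, a minimal sketch like the one below might help pin that down. It assumes the reader drives `xml.etree.ElementTree.iterparse`, as the traceback suggests; `find_parse_error` and the label `example.tetml` are hypothetical names for illustration, not part of freki.

```python
import io
import xml.etree.ElementTree as ET


def find_parse_error(source, name="<unknown>"):
    """Return None if `source` parses cleanly, else a dict describing the error.

    `source` is a filename or a binary file object; `name` is only a label
    used in the report (hypothetical helper, not part of freki).
    """
    try:
        for _event, elem in ET.iterparse(source, events=("end",)):
            elem.clear()  # drop element contents to limit memory on large files
    except ET.ParseError as err:
        line, column = err.position  # same line/column as in the error message
        return {"name": name, "line": line, "column": column, "message": str(err)}
    return None


# In-memory document with a stray control character on its third line.
bad = io.BytesIO(b"<root>\n  <a>ok</a>\n  <b>bad \x01 byte</b>\n</root>")
report = find_parse_error(bad, name="example.tetml")
print(report)
```

Calling this once per input file and logging the returned dict would record the file name next to the line and column that `ParseError` already reports.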

@goodmami
Member Author

A quick inspection suggests that many of these documents have encoding issues, i.e. corrupted data makes the XML invalid.
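One way to check for this kind of corruption before parsing is sketched below. It assumes the files are meant to be UTF-8, which this thread does not confirm, and `check_xml_bytes` is a hypothetical helper name.

```python
import re

# Characters that XML 1.0 does not allow in a document: C0 controls other
# than tab, newline and carriage return, plus U+FFFE and U+FFFF.
# Lone surrogates are also disallowed, but a strict UTF-8 decode rejects them.
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def check_xml_bytes(data):
    """Return a list of (problem, offset) tuples; empty if nothing was found.

    Assumes UTF-8 input. The offset is a byte offset for decoding errors
    and a character offset for disallowed characters.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as err:
        return [("invalid UTF-8", err.start)]
    return [("invalid XML character", m.start())
            for m in _INVALID_XML_CHARS.finditer(text)]


print(check_xml_bytes(b"<a>ok</a>"))
print(check_xml_bytes(b"<a>\xff</a>"))
print(check_xml_bytes(b"<a>\x01</a>"))
```

An empty result does not prove the document is well-formed; it only rules out these two causes of an "invalid token" error.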

@timo-kang

timo-kang commented Dec 8, 2018

It looks like your XML file is broken :(

Try recovering the XML file and then parsing it again.

from lxml import etree  # recover=True is an lxml option; xml.etree.ElementTree has no equivalent
parser = etree.XMLParser(recover=True)
