HTML Cleaner allows crafted scripts in special contexts like svg or math to pass through
High severity • GitHub Reviewed • Published Nov 19, 2024 in fedora-python/lxml_html_clean • Updated Jan 14, 2025
Description
Published to the GitHub Advisory Database: Nov 19, 2024
Reviewed: Nov 19, 2024
Published by the National Vulnerability Database: Nov 19, 2024
Last updated: Jan 14, 2025
Impact
The HTML parser in lxml does not properly handle context switching for special HTML tags such as <svg>, <math> and <noscript>. This behavior deviates from how web browsers parse and interpret these tags. Specifically, content inside CSS comments is ignored by lxml_html_clean but may be interpreted differently by web browsers, which lets malicious scripts bypass the cleaning process. This vulnerability could lead to Cross-Site Scripting (XSS) attacks, compromising the security of applications that rely on lxml_html_clean in its default configuration to sanitize untrusted HTML content.
Patches
Users employing the HTML cleaner in a security-sensitive context should upgrade to lxml_html_clean 0.4.0, which addresses this issue.
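To confirm that an environment has actually picked up the fix, a small standard-library check can compare the installed lxml_html_clean version against 0.4.0. This is a sketch: the helper names are our own, the distribution name "lxml_html_clean" is assumed to match the installed package, and the version parsing is deliberately naive (it ignores pre-release and post-release suffixes), so packaging.version is the better tool where that dependency is available.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version

# First release of lxml_html_clean that contains the fix.
PATCHED = (0, 4, 0)


def release_tuple(ver: str) -> tuple[int, ...]:
    """Extract the leading numeric release, e.g. '0.4.0.post1' -> (0, 4, 0).

    Naive on purpose: suffixes such as 'rc1' or '.post1' are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_patched(ver: str) -> bool:
    """Return True if `ver` is at or above the patched release."""
    release = release_tuple(ver)
    # Pad short versions so that '0.4' compares equal to '0.4.0'.
    release += (0,) * (len(PATCHED) - len(release))
    return release >= PATCHED


def installed_version() -> str | None:
    """Return the installed lxml_html_clean version, or None if absent."""
    try:
        return version("lxml_html_clean")
    except PackageNotFoundError:
        return None
```

For example, is_patched("0.3.1") is False and is_patched("0.4.0") is True; combine it with installed_version() in a deployment check to flag environments still running a vulnerable release.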
Workarounds
As a temporary mitigation, users can configure lxml_html_clean with the following settings to prevent the exploitation of this vulnerability:
remove_tags: tags to remove; the tag itself is dropped, but its content is moved into the parent element.
kill_tags: tags to remove completely, together with all of their content.
allow_tags: restricts the set of permitted tags; leave out context-switching tags such as <svg>, <math> and <noscript>.