HTML Cleaner allows crafted scripts in special contexts like svg or math to pass through

High severity GitHub Reviewed Published Nov 19, 2024 in fedora-python/lxml_html_clean • Updated Jan 14, 2025

Package

lxml-html-clean (pip)

Affected versions

< 0.4.0

Patched versions

0.4.0

Description

Impact

The HTML parser in lxml does not properly handle context switching for special HTML tags such as <svg>, <math> and <noscript>. This behavior deviates from how web browsers parse and interpret such tags. Specifically, lxml_html_clean ignores content in CSS comments, but web browsers may interpret that content differently, which lets malicious scripts bypass the cleaning process. This vulnerability could lead to Cross-Site Scripting (XSS) attacks against users who rely on lxml_html_clean in its default configuration to sanitize untrusted HTML content.

Patches

Users employing the HTML cleaner in a security-sensitive context should upgrade to lxml_html_clean 0.4.0, which addresses this issue.
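
A minimal sketch of how an environment could be checked against the patched release 0.4.0. The helper names (parse_release, is_patched, installed_version) are hypothetical, and the distribution name "lxml_html_clean" is assumed to match what is installed. The version parsing is deliberately simple and treats a pre-release such as "0.4.0rc1" the same as "0.4.0".

```python
import re
from importlib import metadata

# First patched release according to the advisory.
PATCHED = (0, 4, 0)


def parse_release(version):
    """Turn a version string like '0.3.1' into a tuple of ints.

    Simplified on purpose: only the leading digits of each dotted part
    are used, so pre-release suffixes are ignored.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as '0.4' so tuple comparison is fair.
    while len(parts) < len(PATCHED):
        parts.append(0)
    return tuple(parts)


def is_patched(version):
    """Return True if the given version is 0.4.0 or later."""
    return parse_release(version) >= PATCHED


def installed_version(dist="lxml_html_clean"):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("lxml_html_clean is not installed")
    elif is_patched(found):
        print(f"lxml_html_clean {found} includes the fix")
    else:
        print(f"lxml_html_clean {found} is affected; upgrade to 0.4.0 or later")
```

Running the file as a script prints one of three status lines; is_patched can also be called directly in a dependency audit.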

Workarounds

As a temporary mitigation, users can configure lxml_html_clean with the following settings to prevent the exploitation of this vulnerability:

  • remove_tags: Specify tags to remove; their content is moved into the parent element.
  • kill_tags: Specify tags to remove completely, together with their content.
  • allow_tags: Restrict the set of permitted tags, excluding context-switching tags like <svg>, <math> and <noscript>.
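
The settings above might be applied as in the following sketch, which uses kill_tags to drop the context-switching tags and everything inside them. The helper names (hardened_cleaner_kwargs, make_hardened_cleaner) are hypothetical, and the advisory does not state that this exact configuration closes every bypass, so treat it as a stopgap until the upgrade. allow_tags is left out here because the Cleaner appears to require remove_unknown_tags=False when it is used.

```python
# Tags named in the advisory as switching the browser's parsing context.
CONTEXT_SWITCHING_TAGS = ("svg", "math", "noscript")


def hardened_cleaner_kwargs(extra_kill_tags=()):
    """Build keyword arguments for Cleaner that kill the risky tags.

    extra_kill_tags is a hypothetical convenience for adding more tags;
    duplicates are dropped while keeping the original order.
    """
    kill = list(dict.fromkeys([*CONTEXT_SWITCHING_TAGS, *extra_kill_tags]))
    return {"kill_tags": kill}


def make_hardened_cleaner():
    """Create a Cleaner with the mitigation applied.

    The import is local so the helper above stays usable even when
    lxml_html_clean (a third-party package) is not installed.
    """
    from lxml_html_clean import Cleaner

    return Cleaner(**hardened_cleaner_kwargs())
```

A caller would then sanitize input with make_hardened_cleaner().clean_html(untrusted_html), where untrusted_html is the HTML string to clean.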

Published to the GitHub Advisory Database Nov 19, 2024
Reviewed Nov 19, 2024
Published by the National Vulnerability Database Nov 19, 2024
Last updated Jan 14, 2025

Severity

High

CVSS v3 base metrics

Attack vector: Network
Attack complexity: High
Privileges required: None
User interaction: None
Scope: Unchanged
Confidentiality: High
Integrity: Low
Availability: High

Attack vector: More severe the more remote (logically and physically) an attacker can be in order to exploit the vulnerability.
Attack complexity: More severe for the least complex attacks.
Privileges required: More severe if no privileges are required.
User interaction: More severe when no user interaction is required.
Scope: More severe when a scope change occurs, e.g. one vulnerable component impacts resources in components beyond its security scope.
Confidentiality: More severe when loss of data confidentiality is highest, measuring the level of data access available to an unauthorized user.
Integrity: More severe when loss of data integrity is highest, measuring the consequence of data modification possible by an unauthorized user.
Availability: More severe when the loss of impacted component availability is highest.
CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:L/A:H

EPSS score

0.063% (29th percentile)

CVE ID

CVE-2024-52595

GHSA ID

GHSA-5jfw-gq64-q45f
