HTML Sanitization

HTML sanitization is the process of cleaning and filtering HTML code to ensure it is safe to display and use. This process removes or neutralizes potentially harmful code that could be used for cross-site scripting (XSS) attacks or other malicious activities. Sanitizing HTML is crucial for maintaining the security and integrity of web applications.

How HTML Sanitization Works

HTML sanitization involves parsing the input HTML code and removing or escaping any potentially dangerous elements or attributes. The goal is to retain the safe, expected content while eliminating the risk of harmful actions.
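The parse-then-filter idea described above can be sketched with Python's standard-library HTML parser. This is a minimal illustration, not the registry's actual implementation: the names AllowlistSanitizer and sanitize and the tiny allowlists are hypothetical, and the sketch omits checks that a production sanitizer needs, such as validating URL schemes in href values.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical, deliberately tiny allowlists used only for this illustration.
ALLOWED_TAGS = {"a", "b", "em", "p"}
ALLOWED_ATTRS = {"a": {"href", "title"}}


class AllowlistSanitizer(HTMLParser):
    """Rebuilds the input, keeping only allowlisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; any text inside is still escaped
        permitted = ALLOWED_ATTRS.get(tag, set())
        kept = [
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in permitted
        ]
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always re-escaped so it cannot be read back as markup.
        self.out.append(escape(data, quote=False))


def sanitize(html):
    parser = AllowlistSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p onclick="x()">Hi <b>there</b></p>'))  # <p>Hi <b>there</b></p>
```

The sketch removes disallowed tags and attributes while keeping their text content as escaped text, which is one of the two strategies mentioned above (removing versus escaping).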

Default Allowed Tags and Attributes

Our HTML sanitization process uses a predefined set of allowed tags and attributes. All other tags and attributes are blocked to prevent potential security risks.

Default Allowed Tags

These are the default tags allowed for user-generated content everywhere in the Aristotle Metadata Registry. Any tags not listed here will be removed during the sanitization process:

<a>, <abbr>, <acronym>, <b>, <blockquote>, <br>, <code>, <col>, <colgroup>, <del>, <em>, <h1>, <h2>, <h3>, <h4>, <h5>, <h6>, <hr>, <i>, <img>, <ins>, <li>, <ol>, <p>, <strong>, <sub>, <sup>, <table>, <tbody>, <td>, <th>, <thead>, <tr>, <u>, <ul>

On custom pages, we also allow the <iframe> tag to accommodate additional functionality.
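As a rough sketch, the tag rules above could be expressed as a pair of sets, with custom pages extending the default set. The constant and function names here are hypothetical and are not taken from the registry's code.

```python
# Default tags listed above (hypothetical constant name).
DEFAULT_ALLOWED_TAGS = frozenset({
    "a", "abbr", "acronym", "b", "blockquote", "br", "code", "col",
    "colgroup", "del", "em", "h1", "h2", "h3", "h4", "h5", "h6", "hr",
    "i", "img", "ins", "li", "ol", "p", "strong", "sub", "sup", "table",
    "tbody", "td", "th", "thead", "tr", "u", "ul",
})

# Custom pages additionally allow <iframe>.
CUSTOM_PAGE_ALLOWED_TAGS = DEFAULT_ALLOWED_TAGS | {"iframe"}


def is_tag_allowed(tag, is_custom_page=False):
    """Return True if the tag may appear in the given context."""
    allowed = CUSTOM_PAGE_ALLOWED_TAGS if is_custom_page else DEFAULT_ALLOWED_TAGS
    return tag.lower() in allowed


print(is_tag_allowed("script"))                       # False
print(is_tag_allowed("iframe"))                       # False
print(is_tag_allowed("iframe", is_custom_page=True))  # True
```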

Default Allowed Attributes

In addition to allowing certain tags, we also specify which attributes are permitted for each tag to further control the content and ensure security. The following attributes are allowed:

  • Links (<a>): href, title, class, data-aristotle-concept-id, target

  • Abbreviations (<abbr>): title

  • Acronyms (<acronym>): title

  • Images (<img>): src, height, width, alt, style

  • Table Data Cells and Rows (<td> and <tr>): colspan, rowspan, style

  • Table Headers (<th>): colspan, rowspan, style

  • Column Groups (<colgroup> and <col>): span

  • Strong Emphasis (<strong>): title

  • Tables (<table>): align, border, cellpadding, cellspacing

For custom pages, we also permit certain attributes for <iframe> elements, such as src, height, width, title, allowfullscreen, style, and sandbox.
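The per-tag attribute rules above map naturally onto a dictionary keyed by tag name. The following is a sketch under that assumption; the names DEFAULT_ALLOWED_ATTRIBUTES, IFRAME_ATTRIBUTES, and filter_attributes are hypothetical, and the iframe set contains only the attributes named above, which may not be the complete list.

```python
# Attributes permitted per tag, as listed above (hypothetical constant name).
DEFAULT_ALLOWED_ATTRIBUTES = {
    "a": {"href", "title", "class", "data-aristotle-concept-id", "target"},
    "abbr": {"title"},
    "acronym": {"title"},
    "img": {"src", "height", "width", "alt", "style"},
    "td": {"colspan", "rowspan", "style"},
    "tr": {"colspan", "rowspan", "style"},
    "th": {"colspan", "rowspan", "style"},
    "colgroup": {"span"},
    "col": {"span"},
    "strong": {"title"},
    "table": {"align", "border", "cellpadding", "cellspacing"},
}

# Only the iframe attributes named in the text; the full list may be longer.
IFRAME_ATTRIBUTES = {
    "src", "height", "width", "title", "allowfullscreen", "style", "sandbox",
}


def filter_attributes(tag, attrs, is_custom_page=False):
    """Return a copy of attrs containing only the attributes allowed on tag."""
    allowed = DEFAULT_ALLOWED_ATTRIBUTES.get(tag, set())
    if tag == "iframe" and is_custom_page:
        allowed = IFRAME_ATTRIBUTES
    return {name: value for name, value in attrs.items() if name in allowed}


print(filter_attributes("a", {"href": "/item/1", "onclick": "steal()"}))
# {'href': '/item/1'}
```

Tags without an entry, such as <p>, keep no attributes at all under this scheme.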

Allowed CSS Styles

We also allow specific CSS properties to enable better control over the presentation of the content. These include:

  • height, width, background-color, vertical-align, text-align
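One way to picture the style filtering is as a pass over the declarations in a style attribute that keeps only the listed properties. This is a simplified sketch with hypothetical names (ALLOWED_CSS_PROPERTIES, filter_style); it splits naively on ";" and ":" and does not validate property values, both of which a real CSS sanitizer would handle more carefully.

```python
# CSS properties listed above (hypothetical constant name).
ALLOWED_CSS_PROPERTIES = {
    "height", "width", "background-color", "vertical-align", "text-align",
}


def filter_style(style):
    """Keep only declarations whose property name is on the allowlist."""
    kept = []
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        name = name.strip().lower()
        value = value.strip()
        # Skip empty or malformed declarations and disallowed properties.
        if sep and value and name in ALLOWED_CSS_PROPERTIES:
            kept.append(f"{name}: {value}")
    return "; ".join(kept)


print(filter_style("width: 10px; position: fixed; TEXT-ALIGN:center"))
# width: 10px; text-align: center
```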

This approach to HTML sanitization balances functionality and security: authors keep the formatting they need, while tags, attributes, and styles that could introduce threats are blocked.
