HTML Sanitization
HTML sanitization is the process of cleaning and filtering HTML code to ensure it is safe to display and use. This process removes or neutralizes potentially harmful code that could be used for cross-site scripting (XSS) attacks or other malicious activities. Sanitizing HTML is crucial for maintaining the security and integrity of web applications.
How HTML Sanitization Works
HTML sanitization involves parsing the input HTML code and removing or escaping any potentially dangerous elements or attributes. The goal is to retain the safe, expected content while eliminating the risk of harmful actions.
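The parse-then-filter idea described above can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not the registry's actual implementation: the names `AllowlistSanitizer` and `sanitize` and the tiny allowlist are hypothetical, and the sketch does not validate URL schemes in `href`, which a maintained sanitization library would do.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical, deliberately small allowlist used only for this illustration.
ALLOWED_TAGS = {"a", "b", "em", "p", "strong"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
# Elements whose text content is dropped entirely, not just the tags.
DROP_CONTENT_TAGS = {"script", "style"}


class AllowlistSanitizer(HTMLParser):
    """Rebuilds the input, keeping only allowlisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # the tag is removed; any inner text is still kept
        allowed = ALLOWED_ATTRS.get(tag, set())
        kept = []
        for name, value in attrs:
            if name in allowed:
                safe_value = escape(value or "", quote=True)
                kept.append(f' {name}="{safe_value}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape text so it can never be read back as markup.
            self.out.append(escape(data, quote=False))


def sanitize(html):
    parser = AllowlistSanitizer()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

For example, `sanitize('<p onclick="x()">Hi <script>alert(1)</script><a href="/x" onclick="y()">link</a></p>')` returns `<p>Hi <a href="/x">link</a></p>`: the event-handler attributes and the script element are gone, while the expected content is retained.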
Default Allowed Tags and Attributes
Our sanitization process uses a predefined allowlist of tags and attributes. Any tag or attribute that is not on the list is blocked to prevent potential security risks.
Default Allowed Tags
These are the default tags allowed for user-generated content everywhere in the Aristotle Metadata Registry. Any tag not listed here is removed during sanitization:
<a>, <abbr>, <acronym>, <b>, <blockquote>, <br>, <code>, <col>, <colgroup>, <del>, <em>, <h1>, <h2>, <h3>, <h4>, <h5>, <h6>, <hr>, <i>, <img>, <ins>, <li>, <ol>, <p>, <strong>, <sub>, <sup>, <table>, <tbody>, <td>, <th>, <thead>, <tr>, <u>, <ul>
On custom pages, we also allow the <iframe> tag to accommodate additional functionality.
Default Allowed Attributes
In addition to allowing certain tags, we also specify which attributes are permitted for each tag to further control the content and ensure security. The following attributes are allowed:
Links (<a>): href, title, class, data-aristotle-concept-id, target
Abbreviations (<abbr>): title
Acronyms (<acronym>): title
Images (<img>): src, height, width, alt, style
Table Data (<td> and <tr>): colspan, rowspan, style
Table Headers (<th>): colspan, rowspan, style
Column Groups (<colgroup> and <col>): span
Strong Emphasis (<strong>): title
Tables (<table>): align, border, cellpadding, cellspacing
For custom pages, we also permit certain attributes for <iframe> elements, such as src, height, width, title, allowfullscreen, style, and sandbox.
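The tag and attribute allowlists in this section can be written down as plain data. The sketch below is one hypothetical way to express them in Python. The constant and function names are assumptions made for illustration and are not the registry's actual configuration. The <iframe> attribute set contains only the examples named above, which may not be the complete list.

```python
# Tags allowed for user-generated content, as listed in this document.
DEFAULT_ALLOWED_TAGS = frozenset({
    "a", "abbr", "acronym", "b", "blockquote", "br", "code", "col",
    "colgroup", "del", "em", "h1", "h2", "h3", "h4", "h5", "h6", "hr",
    "i", "img", "ins", "li", "ol", "p", "strong", "sub", "sup", "table",
    "tbody", "td", "th", "thead", "tr", "u", "ul",
})

# Extra tag allowed on custom pages only.
CUSTOM_PAGE_EXTRA_TAGS = frozenset({"iframe"})

# Per-tag attribute allowlist, as listed in this document.
DEFAULT_ALLOWED_ATTRIBUTES = {
    "a": {"href", "title", "class", "data-aristotle-concept-id", "target"},
    "abbr": {"title"},
    "acronym": {"title"},
    "img": {"src", "height", "width", "alt", "style"},
    "td": {"colspan", "rowspan", "style"},
    "tr": {"colspan", "rowspan", "style"},
    "th": {"colspan", "rowspan", "style"},
    "colgroup": {"span"},
    "col": {"span"},
    "strong": {"title"},
    "table": {"align", "border", "cellpadding", "cellspacing"},
}

# Assumption: only the <iframe> attributes named as examples in the text.
CUSTOM_PAGE_ATTRIBUTES = {
    "iframe": {"src", "height", "width", "title",
               "allowfullscreen", "style", "sandbox"},
}


def allowed_tags(custom_page=False):
    """Return the set of tags allowed in the given context."""
    if custom_page:
        return DEFAULT_ALLOWED_TAGS | CUSTOM_PAGE_EXTRA_TAGS
    return DEFAULT_ALLOWED_TAGS


def is_attribute_allowed(tag, attribute, custom_page=False):
    """Check one tag/attribute pair against the allowlists."""
    if tag not in allowed_tags(custom_page):
        return False
    if attribute in DEFAULT_ALLOWED_ATTRIBUTES.get(tag, set()):
        return True
    return custom_page and attribute in CUSTOM_PAGE_ATTRIBUTES.get(tag, set())
```

With this layout, `is_attribute_allowed("a", "onclick")` is `False` because `onclick` is not listed for <a>, and `is_attribute_allowed("iframe", "src")` is `False` unless `custom_page=True` is passed.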
Allowed CSS Styles
We also allow specific CSS styles to enable better control over the presentation of the content. These include:
height, width, background-color, vertical-align, text-align
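As an illustration of how a style allowlist can be applied, the sketch below filters the declarations in a style attribute down to the properties listed above. The function name `filter_style` is hypothetical. The sketch assumes simple declarations separated by semicolons and checks property names only, not values, so it is a simplified view of the idea and not the registry's actual code.

```python
# CSS properties allowed in style attributes, as listed in this document.
ALLOWED_CSS_PROPERTIES = frozenset({
    "height", "width", "background-color", "vertical-align", "text-align",
})


def filter_style(style):
    """Keep only declarations whose property name is on the allowlist.

    Assumption: values contain no semicolons of their own, so splitting
    on ";" is enough to separate declarations.
    """
    kept = []
    for declaration in style.split(";"):
        name, separator, value = declaration.partition(":")
        name = name.strip().lower()
        value = value.strip()
        if separator and value and name in ALLOWED_CSS_PROPERTIES:
            kept.append(f"{name}: {value}")
    return "; ".join(kept)
```

For example, `filter_style("width: 100px; position: fixed; TEXT-ALIGN:center")` returns `width: 100px; text-align: center`, dropping the `position` declaration.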
Together, these allowlists balance functionality and security: the markup needed for rich content is kept, and everything else is removed before it can pose a threat.