# HTML Sanitization

HTML sanitization is the process of cleaning and filtering HTML code to ensure it is safe to display and use. This process removes or neutralizes potentially harmful code that could be used for cross-site scripting (XSS) attacks or other malicious activities. Sanitizing HTML is crucial for maintaining the security and integrity of web applications.

### How HTML Sanitization Works

HTML sanitization involves parsing the input HTML code and removing or escaping any potentially dangerous elements or attributes. The goal is to retain the safe, expected content while eliminating the risk of harmful actions.
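The parse-then-filter idea described above can be sketched with Python's standard-library `html.parser`. This is only an illustration of the general technique, not the registry's actual implementation: the class name `AllowlistSanitizer`, the helper `sanitize`, and the deliberately tiny `ALLOWED_TAGS` set are all hypothetical, and the sketch drops every attribute for simplicity.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical, deliberately tiny allowlist used only for this illustration.
ALLOWED_TAGS = {"p", "b", "em", "strong"}
# Elements whose text content is dropped together with the tag itself.
DROP_CONTENT = {"script", "style"}


class AllowlistSanitizer(HTMLParser):
    """Rebuild the input, keeping only allowlisted tags and escaped text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            # All attributes are discarded in this sketch.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is kept, but re-escaped so it can never be read as markup.
        if not self.skip_depth:
            self.out.append(escape(data))


def sanitize(html_text: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(sanitize("<p>Hi <script>alert(1)</script><b>there</b></p>"))
```

In this sketch, disallowed tags such as `<div>` are removed while their text is kept, whereas `<script>` and `<style>` are removed together with their content. A production sanitizer would also need to handle attributes, URL schemes, and malformed markup.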

#### Default Allowed Tags and Attributes

Our HTML sanitization process uses a predefined set of allowed tags and attributes. All other tags and attributes are blocked to prevent potential security risks.

**Default Allowed Tags**

These are the default tags allowed *for user-generated content* everywhere in the Aristotle Metadata Registry. Any tags not listed here will be removed during the sanitization process:

`<a>`, `<abbr>`, `<acronym>`, `<b>`, `<blockquote>`, `<br>`, `<code>`, `<col>`, `<colgroup>`, `<del>`, `<em>`, `<h1>`, `<h2>`, `<h3>`, `<h4>`, `<h5>`, `<h6>`, `<hr>`, `<i>`, `<img>`, `<ins>`, `<li>`, `<ol>`, `<p>`, `<strong>`, `<sub>`, `<sup>`, `<table>`, `<tbody>`, `<td>`, `<th>`, `<thead>`, `<tr>`, `<u>`, `<ul>`

On custom pages, we also allow the `<iframe>` tag to accommodate additional functionality.
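As a rough sketch, the tag rules above can be expressed as a pair of sets plus a lookup that depends on where the content appears. The names `DEFAULT_ALLOWED_TAGS`, `CUSTOM_PAGE_EXTRA_TAGS`, and `allowed_tags` are hypothetical and chosen for illustration; only the tag names themselves come from the list above.

```python
# Default tags for user-generated content, copied from the list above.
DEFAULT_ALLOWED_TAGS = frozenset({
    "a", "abbr", "acronym", "b", "blockquote", "br", "code", "col",
    "colgroup", "del", "em", "h1", "h2", "h3", "h4", "h5", "h6", "hr",
    "i", "img", "ins", "li", "ol", "p", "strong", "sub", "sup", "table",
    "tbody", "td", "th", "thead", "tr", "u", "ul",
})

# Extra tag permitted only on custom pages.
CUSTOM_PAGE_EXTRA_TAGS = frozenset({"iframe"})


def allowed_tags(custom_page: bool = False) -> frozenset:
    """Return the tag allowlist for the given context (hypothetical helper)."""
    if custom_page:
        return DEFAULT_ALLOWED_TAGS | CUSTOM_PAGE_EXTRA_TAGS
    return DEFAULT_ALLOWED_TAGS


print("iframe" in allowed_tags())                  # False
print("iframe" in allowed_tags(custom_page=True))  # True
```

Keeping the custom-page addition in a separate set makes it clear that `<iframe>` is an exception to the default rather than part of it.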

**Default Allowed Attributes**

In addition to allowing certain tags, we also specify which attributes are permitted for each tag to further control the content and ensure security. The following attributes are allowed:

* **Links (`<a>`)**: `href`, `title`, `class`, `data-aristotle-concept-id`, `target`
* **Abbreviations (`<abbr>`)**: `title`
* **Acronyms (`<acronym>`)**: `title`
* **Images (`<img>`)**: `src`, `height`, `width`, `alt`, `style`
* **Table Cells and Rows (`<td>` and `<tr>`)**: `colspan`, `rowspan`, `style`
* **Table Headers (`<th>`)**: `colspan`, `rowspan`, `style`
* **Column Groups (`<colgroup>` and `<col>`)**: `span`
* **Strong Emphasis (`<strong>`)**: `title`
* **Tables (`<table>`)**: `align`, `border`, `cellpadding`, `cellspacing`

For custom pages, we also permit certain attributes for `<iframe>` elements, such as `src`, `height`, `width`, `title`, `allowfullscreen`, `style`, and `sandbox`.
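The per-tag attribute rules above can be modelled as a mapping from tag name to permitted attribute names. The following is a minimal sketch under that assumption: `DEFAULT_ALLOWED_ATTRIBUTES`, `IFRAME_ATTRIBUTES`, and `filter_attributes` are hypothetical names, and because the `<iframe>` attributes are introduced with "such as", the set shown for it may not be exhaustive.

```python
# Attribute allowlist per tag, copied from the list above.
DEFAULT_ALLOWED_ATTRIBUTES = {
    "a": {"href", "title", "class", "data-aristotle-concept-id", "target"},
    "abbr": {"title"},
    "acronym": {"title"},
    "img": {"src", "height", "width", "alt", "style"},
    "td": {"colspan", "rowspan", "style"},
    "tr": {"colspan", "rowspan", "style"},
    "th": {"colspan", "rowspan", "style"},
    "colgroup": {"span"},
    "col": {"span"},
    "strong": {"title"},
    "table": {"align", "border", "cellpadding", "cellspacing"},
}

# Attributes named for <iframe> on custom pages (possibly not exhaustive).
IFRAME_ATTRIBUTES = {
    "src", "height", "width", "title", "allowfullscreen", "style", "sandbox",
}


def filter_attributes(tag: str, attrs: dict, custom_page: bool = False) -> dict:
    """Keep only the attributes allowlisted for this tag (hypothetical helper)."""
    allowed = DEFAULT_ALLOWED_ATTRIBUTES.get(tag, set())
    if custom_page and tag == "iframe":
        allowed = IFRAME_ATTRIBUTES
    # Attribute values (for example URL schemes in href) are not checked here.
    return {name: value for name, value in attrs.items() if name.lower() in allowed}


print(filter_attributes("a", {"href": "/item/1", "onclick": "evil()"}))
# {'href': '/item/1'}
```

Tags without an entry in the mapping, such as `<p>`, keep no attributes at all in this sketch.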

**Allowed CSS Styles**

Where the `style` attribute is permitted, we allow specific CSS properties to enable better control over the presentation of the content. These include:

* `height`, `width`, `background-color`, `vertical-align`, `text-align`
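One way to picture the CSS rule above is a filter that splits a `style` value into declarations and keeps only those whose property name is allowlisted. This is a simplified sketch, not the registry's implementation: `ALLOWED_CSS_PROPERTIES` and `filter_style` are hypothetical names, the property set assumes the list above is complete, and the naive split on `;` does not validate values or handle semicolons inside quoted strings or `url(...)`.

```python
# CSS properties copied from the list above (assumed complete for this sketch).
ALLOWED_CSS_PROPERTIES = {
    "height", "width", "background-color", "vertical-align", "text-align",
}


def filter_style(style_value: str) -> str:
    """Return a style string containing only allowlisted declarations."""
    kept = []
    for declaration in style_value.split(";"):
        name, sep, value = declaration.partition(":")
        name = name.strip().lower()
        value = value.strip()
        # Skip empty or malformed declarations and non-allowlisted properties.
        if sep and value and name in ALLOWED_CSS_PROPERTIES:
            kept.append(f"{name}: {value}")
    return "; ".join(kept)


print(filter_style("width: 50%; position: fixed; text-align:center"))
# width: 50%; text-align: center
```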

This allowlist approach balances functionality and security: content that uses the permitted tags, attributes, and styles is kept, and everything else is removed.

