[PR #6867] feat(editor): try to have ckeditor not crash when handling stranger tags #4862

Open
opened 2025-10-01 16:40:35 -05:00 by giteasync · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/TriliumNext/Trilium/pull/6867
Author: @perfectra1n
Created: 9/1/2025
Status: 🔄 Open

Base: mainHead: fix/html-tags-again


📝 Commits (1)

  • ba1c6ba feat(editor): try to have ckeditor not crash when handling stranger tags

📊 Changes

2 files changed (+440 additions, -10 deletions)

View changed files

apps/client/src/widgets/type_widgets/editable_text.test.ts (+295 -0)
📝 apps/client/src/widgets/type_widgets/editable_text.ts (+145 -10)

📄 Description

this definitely isn't being merged, but keeping it open for Elian. Here's what the annoying AI said the fix was for below lol:

CKEditor Generic Type Syntax Fix

The Problem

We discovered a critical bug where CKEditor would crash when switching from read-only to edit mode on notes containing programming language generic type syntax. Here's what was happening:

The Crash

When a note contained code like this:

HashMap<String, PromptTemplate> templates = new HashMap<>();
List<String, Value> items = getItems();

Or Rust code like:

fn process(error: Box<dyn Error>) -> Result<()> {
    // ...
}

CKEditor would crash with this error:

InvalidCharacterError: Failed to execute 'createElement' on 'Document': 
The tag name provided ('string,') is not a valid name.

Why It Crashed

CKEditor was trying to parse the generic type syntax as HTML tags! When it saw <String, PromptTemplate>, it thought:

  • "Oh, this must be an HTML tag called String,"
  • "Let me create that element..."
  • 💥 Crash! Because String, isn't a valid HTML tag name (it has a comma!)

The same happened with:

  • Box<dyn → tried to create a box<dyn element
  • <OpenAIClient> → tried to create an openaiclient element

Why Existing Solutions Didn't Work

Server-side HTML Sanitizer

Trilium already has an HTML sanitizer that uses an allowedHtmlTags setting. You might think, "Why not just use that?" Here's why that wouldn't work:

The sanitizer removes non-allowed tags entirely:

// What the sanitizer does:
sanitizeHtml("Here is <String, Type> code", { allowedTags: ['div', 'p'] })
// Result: "Here is  code"  
// ❌ The generic type syntax is DELETED!

We don't want to delete the code - we want to preserve it as text!

The Key Insight

  • Sanitizing: Removes unwanted HTML for security (data loss)
  • Escaping: Converts angle brackets to HTML entities (data preserved)

We need escaping, not sanitizing!

Our Solution

We implemented a smart escaping mechanism that:

  1. Identifies what's actually HTML using the user's allowedHtmlTags setting
  2. Escapes everything else so it displays as text instead of being parsed as HTML

Here's how it works:

private escapeGenericTypeSyntax(content: string): string {
    // Get allowed HTML tags from user settings
    const allowedTags = JSON.parse(options.get("allowedHtmlTags"));
    const htmlTags = new Set(allowedTags.map(tag => tag.toLowerCase()));
    
    // Step 1: Protect real HTML tags with placeholders
    const htmlProtectionMap = new Map<string, string>();
    let protectedContent = content.replace(/<\/?([a-zA-Z][a-zA-Z0-9-]*)\b[^>]*>/gi, (match, tagName) => {
        if (htmlTags.has(tagName.toLowerCase())) {
            const placeholder = `__HTML_PROTECTED_${counter++}__`;
            htmlProtectionMap.set(placeholder, match);
            return placeholder;
        }
        return match;
    });
    
    // Step 2: Escape ALL remaining angle brackets (these are code, not HTML!)
    protectedContent = protectedContent
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
    
    // Step 3: Restore the real HTML tags
    htmlProtectionMap.forEach((original, placeholder) => {
        protectedContent = protectedContent.replace(placeholder, original);
    });
    
    return protectedContent;
}

Example Transformation

Input:

<p>Here's a Java class:</p>
<pre>
HashMap<String, PromptTemplate> templates;
Box<dyn Error> error;
</pre>
<div>More content</div>

After Step 1 (Protect HTML):

__HTML_0__Here's a Java class:__HTML_1__
__HTML_2__
HashMap<String, PromptTemplate> templates;
Box<dyn Error> error;
__HTML_3__
__HTML_4__More content__HTML_5__

After Step 2 (Escape remaining brackets):

__HTML_0__Here's a Java class:__HTML_1__
__HTML_2__
HashMap&lt;String, PromptTemplate&gt; templates;
Box&lt;dyn Error&gt; error;
__HTML_3__
__HTML_4__More content__HTML_5__

Final Output:

<p>Here's a Java class:</p>
<pre>
HashMap&lt;String, PromptTemplate&gt; templates;
Box&lt;dyn Error&gt; error;
</pre>
<div>More content</div>

CKEditor receives this and:

  • Parses <p>, <pre>, <div> as HTML tags
  • Displays &lt;String, PromptTemplate&gt; as visible text: <String, PromptTemplate>
  • No crash!

Why Our Fix is Correct

  1. Preserves Data: Unlike sanitization which deletes unknown tags, escaping preserves the code content
  2. Respects User Settings: Uses the allowedHtmlTags option so users can customize what's treated as HTML
  3. No False Positives: Only escapes things that aren't in the allowed tags list
  4. Bidirectional: We unescape when saving, so the database content remains unchanged
  5. Fail-Safe: Has fallback error handling if escaping fails

Configuration

Users can customize which tags are treated as HTML through:
Settings → Other → Allowed HTML Tags

The default list includes common HTML tags (div, span, p, etc.) and Trilium-specific elements (includenote).

Testing

The fix handles all these patterns correctly:

  • Java/C#: HashMap<String, List<Item>>
  • TypeScript: Record<string, unknown>
  • Rust: Box<dyn Error + Send>
  • C++: std::vector<int>
  • Incomplete: <String, (missing closing bracket)
  • Mixed: HTML and code in the same note

Summary

The bug occurred because CKEditor tried to parse programming language generic syntax as HTML tags. Our fix intelligently escapes non-HTML angle brackets while preserving real HTML tags, preventing crashes while maintaining all content. The solution respects user preferences through the configurable allowedHtmlTags setting and ensures no data loss occurs.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/TriliumNext/Trilium/pull/6867 **Author:** [@perfectra1n](https://github.com/perfectra1n) **Created:** 9/1/2025 **Status:** 🔄 Open **Base:** `main` ← **Head:** `fix/html-tags-again` --- ### 📝 Commits (1) - [`ba1c6ba`](https://github.com/TriliumNext/Trilium/commit/ba1c6ba0e1ef406022e39bbb87cd937c968c430b) feat(editor): try to have ckeditor not crash when handling stranger tags ### 📊 Changes **2 files changed** (+440 additions, -10 deletions) <details> <summary>View changed files</summary> ➕ `apps/client/src/widgets/type_widgets/editable_text.test.ts` (+295 -0) 📝 `apps/client/src/widgets/type_widgets/editable_text.ts` (+145 -10) </details> ### 📄 Description this definitely isn't being merged, but keeping it open for Elian. Here's what the annoying AI said the fix was for below lol: CKEditor Generic Type Syntax Fix ## The Problem We discovered a critical bug where CKEditor would crash when switching from read-only to edit mode on notes containing programming language generic type syntax. Here's what was happening: ### The Crash When a note contained code like this: ```java HashMap<String, PromptTemplate> templates = new HashMap<>(); List<String, Value> items = getItems(); ``` Or Rust code like: ```rust fn process(error: Box<dyn Error>) -> Result<()> { // ... } ``` CKEditor would crash with this error: ``` InvalidCharacterError: Failed to execute 'createElement' on 'Document': The tag name provided ('string,') is not a valid name. ``` Why It Crashed CKEditor was trying to parse the generic type syntax as HTML tags! When it saw `<String, PromptTemplate>`, it thought: - "Oh, this must be an HTML tag called `String,`" - "Let me create that element..." - 💥 Crash! Because `String,` isn't a valid HTML tag name (it has a comma!) The same happened with: - `Box<dyn` → tried to create a `box<dyn` element - `<OpenAIClient>` → tried to create an `openaiclient` element ## Why Existing Solutions Didn't Work ### Server-side HTML Sanitizer Trilium already has an HTML sanitizer that uses an `allowedHtmlTags` setting. You might think, "Why not just use that?" Here's why that wouldn't work: The sanitizer **removes** non-allowed tags entirely: ```javascript // What the sanitizer does: sanitizeHtml("Here is <String, Type> code", { allowedTags: ['div', 'p'] }) // Result: "Here is code" // ❌ The generic type syntax is DELETED! ``` We don't want to delete the code - we want to preserve it as text! ### The Key Insight - **Sanitizing**: Removes unwanted HTML for security (data loss) - **Escaping**: Converts angle brackets to HTML entities (data preserved) We need escaping, not sanitizing! ## Our Solution We implemented a smart escaping mechanism that: 1. **Identifies what's actually HTML** using the user's `allowedHtmlTags` setting 2. **Escapes everything else** so it displays as text instead of being parsed as HTML Here's how it works: ```typescript private escapeGenericTypeSyntax(content: string): string { // Get allowed HTML tags from user settings const allowedTags = JSON.parse(options.get("allowedHtmlTags")); const htmlTags = new Set(allowedTags.map(tag => tag.toLowerCase())); // Step 1: Protect real HTML tags with placeholders const htmlProtectionMap = new Map<string, string>(); let protectedContent = content.replace(/<\/?([a-zA-Z][a-zA-Z0-9-]*)\b[^>]*>/gi, (match, tagName) => { if (htmlTags.has(tagName.toLowerCase())) { const placeholder = `__HTML_PROTECTED_${counter++}__`; htmlProtectionMap.set(placeholder, match); return placeholder; } return match; }); // Step 2: Escape ALL remaining angle brackets (these are code, not HTML!) protectedContent = protectedContent .replace(/</g, '&lt;') .replace(/>/g, '&gt;'); // Step 3: Restore the real HTML tags htmlProtectionMap.forEach((original, placeholder) => { protectedContent = protectedContent.replace(placeholder, original); }); return protectedContent; } ``` ### Example Transformation **Input:** ```html <p>Here's a Java class:</p> <pre> HashMap<String, PromptTemplate> templates; Box<dyn Error> error; </pre> <div>More content</div> ``` **After Step 1 (Protect HTML):** ``` __HTML_0__Here's a Java class:__HTML_1__ __HTML_2__ HashMap<String, PromptTemplate> templates; Box<dyn Error> error; __HTML_3__ __HTML_4__More content__HTML_5__ ``` **After Step 2 (Escape remaining brackets):** ``` __HTML_0__Here's a Java class:__HTML_1__ __HTML_2__ HashMap&lt;String, PromptTemplate&gt; templates; Box&lt;dyn Error&gt; error; __HTML_3__ __HTML_4__More content__HTML_5__ ``` **Final Output:** ```html <p>Here's a Java class:</p> <pre> HashMap&lt;String, PromptTemplate&gt; templates; Box&lt;dyn Error&gt; error; </pre> <div>More content</div> ``` CKEditor receives this and: - ✅ Parses `<p>`, `<pre>`, `<div>` as HTML tags - ✅ Displays `&lt;String, PromptTemplate&gt;` as visible text: `<String, PromptTemplate>` - ✅ No crash! ## Why Our Fix is Correct 1. **Preserves Data**: Unlike sanitization which deletes unknown tags, escaping preserves the code content 2. **Respects User Settings**: Uses the `allowedHtmlTags` option so users can customize what's treated as HTML 3. **No False Positives**: Only escapes things that aren't in the allowed tags list 4. **Bidirectional**: We unescape when saving, so the database content remains unchanged 5. **Fail-Safe**: Has fallback error handling if escaping fails ## Configuration Users can customize which tags are treated as HTML through: **Settings → Other → Allowed HTML Tags** The default list includes common HTML tags (div, span, p, etc.) and Trilium-specific elements (includenote). ## Testing The fix handles all these patterns correctly: - ✅ Java/C#: `HashMap<String, List<Item>>` - ✅ TypeScript: `Record<string, unknown>` - ✅ Rust: `Box<dyn Error + Send>` - ✅ C++: `std::vector<int>` - ✅ Incomplete: `<String,` (missing closing bracket) - ✅ Mixed: HTML and code in the same note ## Summary The bug occurred because CKEditor tried to parse programming language generic syntax as HTML tags. Our fix intelligently escapes non-HTML angle brackets while preserving real HTML tags, preventing crashes while maintaining all content. The solution respects user preferences through the configurable `allowedHtmlTags` setting and ensures no data loss occurs. --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
giteasync added the
pull-request
label 2025-10-01 16:40:35 -05:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: TriliumNext/Trilium#4862
No description provided.