Description
Search terms
#router #createNormalizedUrl #utf8 #emoji
I’ve identified an issue in TypeDoc’s source code related to the createNormalizedUrl function, which is responsible for normalizing the output file name (URL). The problem occurs when the input contains Unicode characters outside the Basic Multilingual Plane (BMP), such as emoji.
Expected Behavior
- The createNormalizedUrl function should correctly handle all Unicode characters, including those that require more than one 16-bit code unit (like emoji and other non-BMP characters).
- The URL should be properly normalized without breaking or displaying incorrect characters.
- A non-BMP character that passes normalization should appear unchanged in the final output file name.
Actual Behavior
In the createNormalizedUrl function, TypeDoc uses String.fromCharCode() to rebuild the URL string after excluding unsupported characters. However, String.fromCharCode() interprets each argument as a single UTF-16 code unit and keeps only its low 16 bits, so it cannot rebuild characters outside the Basic Multilingual Plane (BMP), such as emoji or certain rare Unicode characters, from their code points.
Why This Is a Problem:
- Incorrect Rebuilding of Characters: String.fromCharCode() truncates any value greater than 0xFFFF to 16 bits, so a code point above that limit (e.g., many emoji) comes back as a different, unrelated character.
- Incorrect URL Normalization: This leads to improperly normalized output file names, potentially creating invalid or inconsistent file names for URLs that contain non-BMP characters.
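The truncation can be seen in isolation with a minimal sketch. It does not use TypeDoc's code; the variable names are made up for illustration, and only standard String methods are called.

```typescript
// Illustration only: these names are hypothetical and not part of TypeDoc.
const wrenchEmoji = "🔧"; // U+1F527, outside the BMP

// The string holds two UTF-16 code units (a surrogate pair) but one code point.
const wrenchCodeUnits: number[] = [
    wrenchEmoji.charCodeAt(0), // 0xD83D (high surrogate)
    wrenchEmoji.charCodeAt(1), // 0xDD27 (low surrogate)
];
const wrenchCodePoint: number = wrenchEmoji.codePointAt(0)!; // 0x1F527

// fromCharCode keeps only the low 16 bits: 0x1F527 becomes 0xF527,
// which is a private-use character, not the emoji.
const truncated: string = String.fromCharCode(wrenchCodePoint);

// fromCodePoint rebuilds the original two-unit string.
const restored: string = String.fromCodePoint(wrenchCodePoint);
```

Here `truncated` is a single code unit that no longer matches the input, while `restored` is identical to it.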
Suggested Fix:
Replace String.fromCharCode() with String.fromCodePoint(). String.fromCodePoint() accepts full code points up to 0x10FFFF and emits a surrogate pair where one is needed, so it correctly handles all Unicode characters, including those that require more than one 16-bit code unit.
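As a rough sketch of the filter-and-rebuild pattern described in this report, assuming the string is split into code points before filtering: `normalizeSketch` and `UNSUPPORTED_SKETCH` are hypothetical names, and the set of dropped characters is invented for the example, so TypeDoc's real rules may differ.

```typescript
// Hypothetical set of characters to drop; not TypeDoc's actual list.
const UNSUPPORTED_SKETCH = new Set<number>(
    Array.from('<>:"|?*', (ch) => ch.codePointAt(0)!),
);

// Hypothetical stand-in for the normalization step, not TypeDoc's function.
function normalizeSketch(name: string): string {
    // Array.from iterates a string by code point, so an emoji is one entry.
    const kept = Array.from(name, (ch) => ch.codePointAt(0)!).filter(
        (cp) => !UNSUPPORTED_SKETCH.has(cp),
    );
    // fromCodePoint re-creates surrogate pairs for values above 0xFFFF.
    return String.fromCodePoint(...kept);
}

const normalizedTitle = normalizeSketch("🔧 Foo?bar");
```

With `String.fromCharCode(...kept)` on the last line of the function, the emoji in this input would come back as a different character; with `String.fromCodePoint` it is preserved.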
Steps to reproduce the bug
- Test with a character that requires more than one 16-bit code unit (e.g., an emoji) and observe the incorrect URL normalization.
- For example, add this frontmatter to an external document:
---
title: 🔧 🧑💻 Foo bar
---
Environment
- TypeDoc version: ^0.27.9
- TypeScript version: 5.8.2
- Node.js version: 20.18.3
- OS: macOS 15.3.1