CmdPal: Enhance font icon classification and visuals (#41573)

## Summary of the Pull Request - Introduces `FontIconGlyphClassifier` for classifying emojis and symbols. - Correctly recognizes multi-codepoint glyphs (e.g., 🧙🏼‍♀️ *woman mage with medium-light skin tone*). - Explicitly disallows multi-glyph icons (they would overflow anyway). - Distinguishes between emojis and regular text characters (letters, numbers, symbols), since emojis are slightly larger and require different padding. - Recognizes Unicode [Variation Selectors](https://en.wikipedia.org/wiki/Variation_Selectors_(Unicode_block)) to enforce specific styles: VS15 (U+FE0E) for text style (monochrome) and VS16 (U+FE0F) for emoji style (color). This lets developers choose which variant to display. By default, characters with both representations render as text/monochrome (e.g., ▶ `\u25B6`): <img width="428" height="39" alt="image" src="https://github.com/user-attachments/assets/c5e6865f-61de-4f45-9f3a-4e15e5e5ceb8" /> - Invalid icons are displayed as a dashed circle so extension developers can spot issues, without being overly distracting if they slip into production. - Updates `IconPathConverter` to use the new classifier for improved icon handling. - Adds `SampleIconPage` to demonstrate various icon usages and classifications. - Adjusts icon alignment in `IconBox` so icons are centered. - Scales negative padding for emojis in `IconBox` with control size, fixing misalignment and clipping (noticeable in tags and the details pane hero image). - Applies negative padding to all font icons. This removes the need for classification in these cases and ensures symbols rendered below the baseline remain visible. Based on [microsoft/terminal#19143](https://github.com/microsoft/terminal/pull/19143): Co-authored-by: Dustin L. Howett <duhowett@microsoft.com> Pictures? Pictures! <img width="1912" height="2394" alt="image" src="https://github.com/user-attachments/assets/05a16309-b658-4f21-8f9d-9a3f20db6ad8" /> Keyboard and flag/country emojis may look a bit off, but that’s how they’re actually rendered: <img width="482" height="95" alt="image" src="https://github.com/user-attachments/assets/dc7d4d0d-3dc8-4df5-9b9f-9e977e7e989f" />  ## PR Checklist - [x] Closes: - #41489 - #41496 - [ ] **Communication:** I've discussed this with core contributors already. If the work hasn't been agreed, this work might be rejected - [ ] **Tests:** Added/updated and all pass - [ ] **Localization:** All end-user-facing strings can be localized - [ ] **Dev docs:** Added/updated - [ ] **New binaries:** Added on the required places - [ ] [JSON for signing](https://github.com/microsoft/PowerToys/blob/main/.pipelines/ESRPSigning_core.json) for new binaries - [ ] [WXS for installer](https://github.com/microsoft/PowerToys/blob/main/installer/PowerToysSetup/Product.wxs) for new binaries and localization folder - [ ] [YML for CI pipeline](https://github.com/microsoft/PowerToys/blob/main/.pipelines/ci/templates/build-powertoys-steps.yml) for new test projects - [ ] [YML for signed pipeline](https://github.com/microsoft/PowerToys/blob/main/.pipelines/release.yml) - [ ] **Documentation updated:** If checked, please file a pull request on [our docs repo](https://github.com/MicrosoftDocs/windows-uwp/tree/docs/hub/powertoys) and link it here: #xxx  ## Detailed Description of the Pull Request / Additional comments  ## Validation Steps Performed
2026-04-05 18:57:19 +02:00 · 2025-09-03 20:17:52 +02:00
parent b71bbf89ce
commit 347c3f1efa
9 changed files with 532 additions and 51 deletions
--- a/src/modules/cmdpal/Microsoft.Terminal.UI/FontIconGlyphClassifier.cpp
+++ b/src/modules/cmdpal/Microsoft.Terminal.UI/FontIconGlyphClassifier.cpp
@@ -0,0 +1,177 @@
+#include "pch.h"
+#include "FontIconGlyphClassifier.h"
+#include "FontIconGlyphClassifier.g.cpp"
+
+#include <icu.h>
+#include <utility>
+
+namespace winrt::Microsoft::Terminal::UI::implementation
+{
+    namespace
+    {
+        // Check if the code point is in the Private Use Area range used by Fluent UI icons.
+        [[nodiscard]] constexpr bool _isFluentIconPua(const UChar32 cp) noexcept
+        {
+            static constexpr UChar32 _fluentIconsPrivateUseAreaStart = 0xE700;
+            static constexpr UChar32 _fluentIconsPrivateUseAreaEnd = 0xF8FF;
+            return cp >= _fluentIconsPrivateUseAreaStart && cp <= _fluentIconsPrivateUseAreaEnd;
+        }
+
+        // Determine if the given text (as a sequence of UChar code units) is emoji
+        [[nodiscard]] bool _isEmoji(const UChar* p, const int32_t length) noexcept
+        {
+            if (!p || length < 1)
+            {
+                return false;
+            }
+
+            // https://www.unicode.org/reports/tr51/#Emoji_Variation_Selector_Notes
+            constexpr UChar32 vs15CodePoint = 0xFE0E; // Variation Selectors 15: text variation selector
+            constexpr UChar32 vs16CodePoint = 0xFE0F; // Variation Selectors: 16 emoji variation selector
+
+            // Decode the first code point correctly (surrogate-safe)
+            int32_t i0{ 0 };
+            UChar32 first{ 0 };
+            U16_NEXT(p, i0, length, first);
+
+            for (int32_t i = 0; i < length;)
+            {
+                UChar32 cp{ 0 };
+                U16_NEXT(p, i, length, cp);
+
+                if (cp == vs16CodePoint) { return true; }
+                if (cp == vs15CodePoint) { return false; }
+            }
+
+            return !U_IS_SURROGATE(first) && u_hasBinaryProperty(first, UCHAR_EMOJI_PRESENTATION);
+        }
+    }
+
+    bool FontIconGlyphClassifier::IsLikelyToBeEmojiOrSymbolIcon(const hstring& text)
+    {
+        if (text.empty())
+        {
+            return false;
+        }
+
+        if (text.size() == 1 && !IS_HIGH_SURROGATE(text[0]))
+        {
+            // If it's a single code unit, it's definitely either zero or one grapheme clusters.
+            // If it turns out to be illegal Unicode, we don't really care.
+            return true;
+        }
+
+        if (text.size() >= 2 && text[0] <= 0x7F && text[1] <= 0x7F)
+        {
+            // Two adjacent ASCII characters (as seen in most file paths) aren't a single
+            // grapheme cluster.
+            return false;
+        }
+
+        // Use ICU to determine whether text is composed of a single grapheme cluster.
+        int32_t off{ 0 };
+        UErrorCode status{ U_ZERO_ERROR };
+
+        UBreakIterator* const bi{ ubrk_open(UBRK_CHARACTER,
+                                            nullptr,
+                                            reinterpret_cast<const UChar*>(text.data()),
+                                            static_cast<int>(text.size()),
+                                            &status) };
+        if (bi)
+        {
+            if (U_SUCCESS(status))
+            {
+                off = ubrk_next(bi);
+            }
+            ubrk_close(bi);
+        }
+        return std::cmp_equal(off, text.size());
+    }
+
+    FontIconGlyphKind FontIconGlyphClassifier::Classify(hstring const& text) noexcept
+    {
+        if (text.empty())
+        {
+            return FontIconGlyphKind::None;
+        }
+
+        const size_t textSize{ text.size() };
+        const auto* buffer{ reinterpret_cast<const UChar*>(text.c_str()) };
+
+        // Fast path 1: Single UTF-16 code unit (most common case)
+        if (textSize == 1)
+        {
+            const UChar ch{ buffer[0] };
+
+            if (IS_HIGH_SURROGATE(ch))
+            {
+                return FontIconGlyphKind::Invalid;
+            }
+
+            if (_isFluentIconPua(ch))
+            {
+                return FontIconGlyphKind::FluentSymbol;
+            }
+
+            if (_isEmoji(&ch, 1))
+            {
+                return FontIconGlyphKind::Emoji;
+            }
+
+            return FontIconGlyphKind::Other;
+        }
+
+        // Fast path 2: Common file path pattern - two ASCII printable characters
+        if (textSize >= 2 && buffer[0] <= 0x7F && buffer[1] <= 0x7F)
+        {
+            // Definitely multiple graphemes
+            return FontIconGlyphKind::Invalid;
+        }
+
+        // Expensive path: Use ICU to determine grapheme boundaries
+        UErrorCode status{ U_ZERO_ERROR };
+
+        UBreakIterator* bi{ ubrk_open(UBRK_CHARACTER,
+                                      nullptr,
+                                      buffer,
+                                      static_cast<int32_t>(textSize),
+                                      &status) };
+
+        if (U_FAILURE(status) || !bi)
+        {
+            return FontIconGlyphKind::Invalid;
+        }
+
+        const int32_t start{ ubrk_first(bi) };
+        const int32_t end{ ubrk_next(bi) }; // end of first grapheme
+        ubrk_close(bi);
+
+        // No graphemes found
+        if (end == UBRK_DONE || end <= start)
+        {
+            return FontIconGlyphKind::None;
+        }
+
+        // If there's more than one grapheme, it's not a valid icon glyph
+        if (std::cmp_not_equal(end, textSize))
+        {
+            return FontIconGlyphKind::Invalid;
+        }
+
+        // Exactly one grapheme: classify
+        const UChar* grapheme = buffer + start;
+        const int32_t graphemeLength = end - start;
+
+        if (graphemeLength == 1 && _isFluentIconPua(grapheme[0]))
+        {
+            return FontIconGlyphKind::FluentSymbol;
+        }
+
+        if (_isEmoji(grapheme, graphemeLength))
+        {
+            return FontIconGlyphKind::Emoji;
+        }
+
+        return FontIconGlyphKind::Other;
+    }
+}