mirror of
https://github.com/microsoft/PowerToys.git
synced 2026-04-03 01:36:31 +02:00
[Run] Replace WindowWalker's brute-force fuzzy matching algorithm with optimal DP solution (#44551)
## Summary of the Pull Request

Window Walker's fuzzy string matching algorithm exhibits exponential memory usage and execution time when given inputs containing repeated characters or phrases. When a user has several windows open with long titles (such as browser windows), it is straightforward to trigger a pathological case which uses up gigabytes of memory and freezes the UI. This is exacerbated by Run's lack of thread pruning, meaning work triggered by older keystrokes consumes CPU and memory until completion.

## PR Checklist

- [x] Closes: #44546
- [x] Closes: #44184
- [ ] **Communication:** I've discussed this with core contributors already. If the work hasn't been agreed, this work might be rejected
- [ ] **Tests:** Added/updated and all pass
- [ ] **Localization:** All end-user-facing strings can be localized
- [ ] **Dev docs:** Added/updated
- [ ] **New binaries:** Added on the required places
   - [ ] [JSON for signing](https://github.com/microsoft/PowerToys/blob/main/.pipelines/ESRPSigning_core.json) for new binaries
   - [ ] [WXS for installer](https://github.com/microsoft/PowerToys/blob/main/installer/PowerToysSetup/Product.wxs) for new binaries and localization folder
   - [ ] [YML for CI pipeline](https://github.com/microsoft/PowerToys/blob/main/.pipelines/ci/templates/build-powertoys-steps.yml) for new test projects
   - [ ] [YML for signed pipeline](https://github.com/microsoft/PowerToys/blob/main/.pipelines/release.yml)
- [ ] **Documentation updated:** If checked, please file a pull request on [our docs repo](https://github.com/MicrosoftDocs/windows-uwp/tree/docs/hub/powertoys) and link it here: #xxx

## Detailed Description of the Pull Request / Additional comments

The existing algorithm in `FuzzyMatching.cs` is a brute-force search: its `GetAllMatchIndexes()` method builds every possible match of the search string within the candidate, after which the best match is selected and the others are discarded. This is tolerable when few alternative matches exist, but it causes a combinatorial explosion when characters or substrings repeat, even when the search string is short.

With **n** the length of the candidate text and **m** the length of the search string, the current brute-force algorithm has time complexity of **O(n * m * C(n,m))**, where **C(n,m)** = **n!/(m!(n-m)!)**, and space complexity of **O(C(n,m) * m)**, because it stores all possible match combinations before choosing the best.

For example, matching `"eeee"` in `"eeeeeeee"` creates **C(8,4)** = **70** match combinations, which stores 70 lists with 4 integers each, plus overhead from the LINQ-based list copying and appending:

```csharp
var tempList = results
    .Where(x => x.Count == secondIndex && x[x.Count - 1] < firstIndex)
    .Select(x => x.ToList()) // Creates a full copy of each matching path
    .ToList();               // Materializes all copies

results.AddRange(tempList);  // Adds lists to results
```

Each potential sub-match may be recalculated many times. Window Walker queries across all window titles, so the problem is magnified if the search text matches multiple titles or if the search string consists of a single repeated character. This is especially problematic for browser windows, where titles may be long, and similarly for Explorer windows with longer paths.

## Proposed solution

The solution presented here is a dynamic programming algorithm which finds the optimal match directly without generating all possibilities.

In terms of complexity, the new algorithm makes a single pass through its DP table and only has to store two integer tables sized proportionally to the search and candidate string lengths, so it is **O(n * m)** for both time and space, i.e. polynomial instead of exponential.

Scoring is equivalent between the old and new algorithms, based strictly on the minimum match span within the candidate string.
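The idea behind the proposed solution can be illustrated with a reduced sketch. This is not the code in this PR: the class and method names (`MinSpanSketch`, `MinSpan`) are hypothetical, it compares characters case-sensitively, and it only returns the length of the smallest span rather than the matched indexes. Because it does not reconstruct the path, it can get by with two rolling rows instead of full tables.

```csharp
using System;

internal static class MinSpanSketch
{
    // Returns (last index - first index) of the tightest window of `text` that
    // contains `search` as a subsequence, or -1 when there is no match.
    internal static int MinSpan(string text, string search)
    {
        ArgumentNullException.ThrowIfNull(text);
        ArgumentNullException.ThrowIfNull(search);

        int n = text.Length;
        int m = search.Length;
        if (m == 0 || m > n)
        {
            return -1;
        }

        // prev[i]: latest start index of a match for the search prefix processed
        // so far that ends exactly at text[i], or -1 if no such match exists.
        int[] prev = new int[n];
        int[] curr = new int[n];

        // Base case: the first search character matches wherever it occurs.
        for (int i = 0; i < n; i++)
        {
            prev[i] = text[i] == search[0] ? i : -1;
        }

        for (int k = 1; k < m; k++)
        {
            int latestStart = -1; // best (latest) start among prev[0..i-1]

            for (int i = 0; i < n; i++)
            {
                curr[i] = (text[i] == search[k] && latestStart != -1) ? latestStart : -1;

                // Make prev[i] available to positions after i.
                if (prev[i] > latestStart)
                {
                    latestStart = prev[i];
                }
            }

            (prev, curr) = (curr, prev);
        }

        // Pick the end position with the smallest span.
        int best = -1;
        for (int i = 0; i < n; i++)
        {
            if (prev[i] != -1 && (best == -1 || i - prev[i] < best))
            {
                best = i - prev[i];
            }
        }

        return best;
    }
}
```

Under these assumptions, `MinSpanSketch.MinSpan("abcadecfg", "ac")` evaluates to 2 (the window `abc`), and `MinSpanSketch.MinSpan("eeeeeeee", "eeee")` evaluates to 3, without enumerating any of the 70 combinations mentioned above.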
## Implementation notes

The new algorithm tracks the best start index for matches ending at each position, eliminating the need to store all possible paths. Keeping only the latest possible start for each end position as the scan moves through the search text guarantees that the span for that end position is minimised.

To recreate the best match, a separate table of parent indexes is kept and walked backwards once the DP step is complete. The path is collected from the last character to the first, so reversing it gives the same result as the original algorithm (or an equivalent one if there are multiple best matches).

For this "minimum-span" fuzzy matching method, this should be optimal, as it only scans once and storage is proportional to the lengths of the search and candidate strings.

## Benchmarks

A verification and benchmarking suite is here: https://github.com/daverayment/WindowWalkerBench

Results from comparing the old and new algorithms are here: https://docs.google.com/spreadsheets/d/1eXmmnN2eI3774QxXXyx1Dv4SKu78U96q28GYnpHT0_8/edit?usp=sharing

| Method          | Mean             | Error            | StdDev           | Ratio      | RatioSD   | Gen0       | Gen1       | Gen2      | Allocated    | Alloc Ratio |
|---------------- |-----------------:|-----------------:|-----------------:|-----------:|----------:|-----------:|-----------:|----------:|-------------:|------------:|
| Old_Normal      | 4,034.4 ns       | 220.94 ns        | 647.98 ns        | 1.02       | 0.23      | 1.9760     | -          | -         | 8.09 KB      | 1.00        |
| New_Normal      | 804.5 ns         | 24.29 ns         | 70.47 ns         | 0.20       | 0.04      | 0.4339     | -          | -         | 1.77 KB      | 0.22        |
| Old_Repetitive  | 7,624.7 ns       | 318.06 ns        | 912.57 ns        | 1.94       | 0.38      | 3.7079     | -          | -         | 15.16 KB     | 1.87        |
| New_Repetitive  | 2,714.6 ns       | 109.03 ns        | 318.03 ns        | 0.69       | 0.13      | 1.6403     | -          | -         | 6.72 KB      | 0.83        |
| Old_Explosion   | 881,443,209.3 ns | 26,273,980.96 ns | 76,225,588.43 ns | 223,872.87 | 39,357.31 | 50000.0000 | 27000.0000 | 5000.0000 | 351885.11 KB | 43,518.16   |
| New_Explosion   | 3,225.4 ns       | 111.98 ns        | 315.84 ns        | 0.82       | 0.15      | 1.7738     | -          | -         | 7.26 KB      | 0.90        |
| Old_Explosion_8 | 460,153,862.6 ns | 18,744,417.95 ns | 54,974,137.06 ns | 116,871.93 | 22,719.87 | 25000.0000 | 14000.0000 | 3000.0000 | 173117.13 KB | 21,409.65   |
| New_Explosion_8 | 2,958.3 ns       | 78.16 ns         | 230.45 ns        | 0.75       | 0.13      | 1.5793     | -          | -         | 6.46 KB      | 0.80        |
| Old_Explosion_7 | 189,069,384.8 ns | 3,774,916.46 ns  | 6,202,296.49 ns  | 48,020.68  | 7,501.98  | 11000.0000 | 6333.3333  | 2000.0000 | 71603.96 KB  | 8,855.37    |
| New_Explosion_7 | 2,667.5 ns       | 117.69 ns        | 337.68 ns        | 0.68       | 0.13      | 1.3924     | -          | -         | 5.7 KB       | 0.70        |
| Old_Explosion_6 | 71,960,114.8 ns  | 1,757,017.15 ns  | 5,125,301.87 ns  | 18,276.75  | 3,083.86  | 4500.0000  | 2666.6667  | 1333.3333 | 25515.96 KB  | 3,155.60    |
| New_Explosion_6 | 2,232.5 ns       | 72.65 ns         | 202.52 ns        | 0.57       | 0.10      | 1.1978     | -          | -         | 4.91 KB      | 0.61        |
| Old_Explosion_5 | 9,121,126.4 ns   | 180,744.42 ns    | 228,583.84 ns    | 2,316.62   | 358.55    | 1000.0000  | 968.7500   | 484.3750  | 7630.49 KB   | 943.67      |
| New_Explosion_5 | 1,917.3 ns       | 48.63 ns         | 133.95 ns        | 0.49       | 0.08      | 1.0109     | -          | -         | 4.13 KB      | 0.51        |
| Old_Explosion_4 | 2,489,593.2 ns   | 82,937.33 ns     | 236,624.90 ns    | 632.32     | 113.96    | 281.2500   | 148.4375   | 74.2188   | 1729.71 KB   | 213.92      |
| New_Explosion_4 | 1,598.3 ns       | 51.92 ns         | 152.28 ns        | 0.41       | 0.07      | 0.8163     | -          | -         | 3.34 KB      | 0.41        |
| Old_Explosion_3 | 202,814.0 ns     | 7,684.44 ns      | 22,293.96 ns     | 51.51      | 9.72      | 72.7539    | 0.2441     | -         | 298.13 KB    | 36.87       |
| New_Explosion_3 | 1,222.5 ns       | 26.07 ns         | 76.45 ns         | 0.31       | 0.05      | 0.6275     | -          | -         | 2.57 KB      | 0.32        |
| Old_Subsequence | 419,417.7 ns     | 8,308.97 ns      | 22,178.33 ns     | 106.53     | 17.23     | 266.6016   | 0.9766     | -         | 1090.05 KB   | 134.81      |
| New_Subsequence | 2,501.9 ns       | 80.91 ns         | 233.43 ns        | 0.64       | 0.11      | 1.3542     | -          | -         | 5.55 KB      | 0.69        |

(Where "Old_Explosion" is "e" repeated 9 times. Times are in nanoseconds; one nanosecond is one millionth of a millisecond.)

It is worth noting that the results show a **single string match**. So matching "eeeeee" against a 99-character string took about 25 MB of memory and 72 milliseconds to compute. For the new algorithm, this is reduced to <5 KB and 0.002 milliseconds. Even for a three-character repetition, the new algorithm is >150x faster with <1% of the allocations.

## Real world example

**Before (results still pending after more than a minute):**

<img width="837" height="336" alt="Image" src="https://github.com/user-attachments/assets/c4c3ae04-6a47-40b9-a2a4-7a4da169f7d5" />

**After (instantaneous results):**

<img width="829" height="444" alt="image" src="https://github.com/user-attachments/assets/055fc4a6-f34f-4bed-a12c-408b52274de2" />

## Validation Steps Performed

The verification tests in the benchmark project pass, with results identical to the original across a number of test cases, including the pathological cases identified earlier and edge cases such as single-character searches. All unit tests under `Wox.Test`, including all 38 `FuzzyMatcherTest` entries, still pass.
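Since the implementation notes allow the new algorithm to return an equivalent match when several best matches exist, a comparison between the two algorithms may need to check validity and span rather than exact index equality. The following is a minimal sketch of such a check, not taken from the linked verification suite; `MatchCheckSketch`, `IsValidMatch` and `Span` are hypothetical names, and the inputs are assumed to be lower-cased already.

```csharp
using System;
using System.Collections.Generic;

internal static class MatchCheckSketch
{
    // True when `indexes` is a valid fuzzy match of `search` in `text`:
    // one index per search character, strictly increasing, characters equal.
    internal static bool IsValidMatch(string text, string search, IReadOnlyList<int> indexes)
    {
        if (indexes.Count != search.Length)
        {
            return false;
        }

        int previous = -1;
        for (int k = 0; k < indexes.Count; k++)
        {
            int i = indexes[k];
            if (i <= previous || i >= text.Length || text[i] != search[k])
            {
                return false;
            }

            previous = i;
        }

        return true;
    }

    // Span as described for scoring: last matched index minus first matched index.
    internal static int Span(IReadOnlyList<int> indexes) =>
        indexes.Count == 0 ? 0 : indexes[indexes.Count - 1] - indexes[0];
}
```

Two results for the same inputs could then be treated as equivalent when both pass `IsValidMatch` and have the same `Span`, even if their index lists differ.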
@@ -6,7 +6,6 @@
 using System;
 using System.Collections.Generic;
 using System.Globalization;
-using System.Linq;
 
 namespace Microsoft.Plugin.WindowWalker.Components
 {
@@ -16,99 +15,128 @@ namespace Microsoft.Plugin.WindowWalker.Components
     internal static class FuzzyMatching
     {
         /// <summary>
-        /// Finds the best match (the one with the most
-        /// number of letters adjacent to each other) and
-        /// returns the index location of each of the letters
-        /// of the matches
+        /// Find the best match (the one with the smallest span) using a Dynamic Programming approach
+        /// to minimize candidate matches.
         /// </summary>
-        /// <param name="text">The text to search inside of</param>
-        /// <param name="searchText">the text to search for</param>
-        /// <returns>returns the index location of each of the letters of the matches</returns>
+        /// <param name="text">The text to search inside of.</param>
+        /// <param name="searchText">The text to search for.</param>
+        /// <returns>The index location of each of the letters in the best match.</returns>
         internal static List<int> FindBestFuzzyMatch(string text, string searchText)
         {
             ArgumentNullException.ThrowIfNull(searchText);
 
             ArgumentNullException.ThrowIfNull(text);
 
-            // Using CurrentCulture since this is user facing
-            searchText = searchText.ToLower(CultureInfo.CurrentCulture);
-            text = text.ToLower(CultureInfo.CurrentCulture);
+            var sLower = searchText.ToLower(CultureInfo.CurrentCulture);
+            var tLower = text.ToLower(CultureInfo.CurrentCulture);
+            int m = sLower.Length;
+            int n = tLower.Length;
 
-            // Create a grid to march matches like
-            // e.g.
-            //   a b c a d e c f g
-            // a x     x
-            // c     x       x
-            bool[,] matches = new bool[text.Length, searchText.Length];
-            for (int firstIndex = 0; firstIndex < text.Length; firstIndex++)
+            // A subsequence longer than the candidate text can never match.
+            if (m > n)
             {
-                for (int secondIndex = 0; secondIndex < searchText.Length; secondIndex++)
+                return [];
+            }
+
+            // bestStart[k, i] stores the latest possible start index of a match for s[0..k] that
+            // ends exactly at t[i], or -1 if no such match exists.
+            //
+            // Tracking the latest start ensures that we only retain the smallest span of all matches
+            // that end at i.
+            int[,] bestStart = new int[m, n];
+
+            // parent[k, i] stores the index where the previous character matched to allow for
+            // reconstruction of the best path once the DP step completes.
+            int[,] parent = new int[m, n];
+
+            // Initialize tables.
+            for (int k = 0; k < m; k++)
+            {
+                for (int i = 0; i < n; i++)
                 {
-                    matches[firstIndex, secondIndex] =
-                        searchText[secondIndex] == text[firstIndex] ?
-                        true :
-                        false;
+                    bestStart[k, i] = -1;
                 }
             }
 
-            // use this table to get all the possible matches
-            List<List<int>> allMatches = GetAllMatchIndexes(matches);
-
-            // return the score that is the max
-            int maxScore = allMatches.Count > 0 ? CalculateScoreForMatches(allMatches[0]) : 0;
-            List<int> bestMatch = allMatches.Count > 0 ? allMatches[0] : new List<int>();
-
-            foreach (var match in allMatches)
+            // Base case: match the first character of the search string s[0].
+            for (int i = 0; i < n; i++)
             {
-                int score = CalculateScoreForMatches(match);
-                if (score > maxScore)
+                if (tLower[i] == sLower[0])
                 {
-                    bestMatch = match;
-                    maxScore = score;
+                    bestStart[0, i] = i;
+                    parent[0, i] = -1;
                 }
             }
 
-            return bestMatch;
-        }
-
-        /// <summary>
-        /// Gets all the possible matches to the search string with in the text
-        /// </summary>
-        /// <param name="matches"> a table showing the matches as generated by
-        /// a two dimensional array with the first dimension the text and the second
-        /// one the search string and each cell marked as an intersection between the two</param>
-        /// <returns>a list of the possible combinations that match the search text</returns>
-        internal static List<List<int>> GetAllMatchIndexes(bool[,] matches)
-        {
-            ArgumentNullException.ThrowIfNull(matches);
-
-            List<List<int>> results = new List<List<int>>();
-
-            for (int secondIndex = 0; secondIndex < matches.GetLength(1); secondIndex++)
+            // Dynamic programming step: extend matches for the remaining characters s[1..m-1].
+            for (int k = 1; k < m; k++)
             {
-                for (int firstIndex = 0; firstIndex < matches.GetLength(0); firstIndex++)
-                {
-                    if (secondIndex == 0 && matches[firstIndex, secondIndex])
-                    {
-                        results.Add(new List<int> { firstIndex });
-                    }
-                    else if (matches[firstIndex, secondIndex])
-                    {
-                        var tempList = results.Where(x => x.Count == secondIndex && x[x.Count - 1] < firstIndex).Select(x => x.ToList()).ToList();
+                int currentMaxStart = -1;
+                int currentParentIndex = -1;
 
-                        foreach (var pathSofar in tempList)
+                for (int i = 0; i < n; i++)
+                {
+                    // 1. Try to match s[k] at t[i].
+                    // We must use a valid start from the previous row (k-1) that appeared BEFORE i.
+                    // 'currentMaxStart' holds the best start value from indices 0 to i-1.
+                    if (tLower[i] == sLower[k])
+                    {
+                        if (currentMaxStart != -1)
                         {
-                            pathSofar.Add(firstIndex);
+                            bestStart[k, i] = currentMaxStart;
+                            parent[k, i] = currentParentIndex;
                         }
+                    }
 
-                        results.AddRange(tempList);
+                    // 2. Maintain the dominating predecessor for the next column.
+                    // We only keep the match with the latest start index, as it strictly dominates
+                    // all earlier-starting matches for the purpose of minimizing the match span.
+                    if (bestStart[k - 1, i] > currentMaxStart)
+                    {
+                        currentMaxStart = bestStart[k - 1, i];
+                        currentParentIndex = i;
                     }
                 }
-
-                results = results.Where(x => x.Count == secondIndex + 1).ToList();
             }
 
-            return results.Where(x => x.Count == matches.GetLength(1)).ToList();
+            // Select the ending position that minimizes span.
+            int bestEndIndex = -1;
+            int maxScore = int.MinValue;
+
+            // Score logic: -(LastIndex - StartIndex).
+            // We want to Maximize Score => Minimize Span.
+            for (int i = 0; i < n; i++)
+            {
+                if (bestStart[m - 1, i] != -1)
+                {
+                    int start = bestStart[m - 1, i];
+                    int score = -(i - start);
+
+                    if (score > maxScore)
+                    {
+                        maxScore = score;
+                        bestEndIndex = i;
+                    }
+                }
+            }
+
+            if (bestEndIndex == -1)
+            {
+                return [];
+            }
+
+            // Reconstruct only the winning path.
+            var result = new List<int>(m);
+            int curr = bestEndIndex;
+
+            for (int k = m - 1; k >= 0; k--)
+            {
+                result.Add(curr);
+                curr = parent[k, curr];
+            }
+
+            result.Reverse();
+            return result;
         }
 
         /// <summary>