Fix the instruction and dump prs

This commit is contained in:
Gordon Lam (SH)
2025-09-30 11:51:30 +08:00
parent 235c3ef3e6
commit e963125210
4 changed files with 272 additions and 12 deletions

View File

@@ -26,11 +26,11 @@ This document describes how to collect pull requests for a milestone, request a
1) run `dump-prs-information.ps1` to export PRs for the target milestone (initial run, CopilotSummary likely empty)
- Open `dump-prs-information.ps1` and set:
- `$repo` (e.g., `microsoft/PowerToys`)
- `$milestone` (milestone title exactly as in GitHub, e.g., `PowerToys 0.94`)
- `$milestone` (milestone title exactly as in GitHub, e.g., `PowerToys 0.95`)
- run the script in PowerShell; it will generate `milestone_prs.json` and `sorted_prs.csv`.
2) Request Copilot reviews for each PR listed in the CSV in Agent mode (MUST NOT generate or run any ps1)
- Use MCP tools "MCP Server: github-remote" in current Agent mode to request Copilot reviews for all PR Ids in `sorted_prs.csv`.
- Must use MCP tools "MCP Server: github-remote" in current Agent mode to request Copilot reviews for all PR Ids in `sorted_prs.csv`.
3) run `dump-prs-information.ps1` again
- This refresh collects the latest Copilot review body into the `CopilotSummary` column in `sorted_prs.csv`.
@@ -39,15 +39,33 @@ This document describes how to collect pull requests for a milestone, request a
5) Summarize PRs into perlabel Markdown files in Agent mode (MUST NOT generate or run any script in terminal nor ps1)
- Read the the csv files in the folder grouped_csv one by one
- Generate the summary md file as the following instruciton in two parts:
- For each label group, create a markdown file under a new folder `grouped_md/` (create if missing). File name: sanitized label group name (same pattern as CSV) with `.md` extension. Example: `Area-Build.md`.
- Each markdown file content must follow the structure below (two sections) and preserve the PR order from the source CSV.
- Do not embed PR numbers in the bullet list lines; only link them in the table.
- If re-running, overwrite existing markdown files (idempotent generation).
- After generation, you should have a 1:1 correspondence between files in `grouped_csv/` and `grouped_md/` (excluding any intentionally skipped groups—document if skipped).
- Generate the summary md file as the following instruction in two parts:
1. Markdown list: one concise, userfacing line per PR (no deep technical jargon). Use "Verbed" + "Scenario" + "Impact" as setence structure. Use `Title`, `Body`, and `CopilotSummary` as sources.
- If `Author` is NOT in `MemberList.md`, append a "Thanks @handle!" see `SampleOutput.md` as example.
- Do NOT include PR numbers or IDs in the list line; keep the PR link only in the table mentioned in 2. below, please refer to `SampleOutput.md` as example.
- If `Author` is NOT in `**/MemberList.md`, append a "Thanks @handle!" see `**/SampleOutput.md` as example.
- Do NOT include PR numbers or IDs in the list line; keep the PR link only in the table mentioned in 2. below, please refer to `**/SampleOutput.md` as example.
- If confidence to have enough information for summarization according to guideline above is < 70%, write: `Human Summary Needed: <PR full link>` on that line.
2. Threecolumn table (in the same PR order):
- Column 1: The concise, userfacing summary (the "cut version")
- Column 2: PR link
- Column 3: Confidence (e.g., `High/Medium/Low`) and the reason if < 70%
- Column 3: Confidence (e.g., `High/Medium/Low`) and the reasoning if < 70%
6) According the generated grouped_md/*.md, update back the repo root's `Readme.md`. Here is the guideline:
a. Replace all versioned references in `README.md`:
- Bump current release heading (e.g. **Version 0.xx**) by +0.01.
- Shift link references: previous `[github-current-release-work]` becomes old version; increment `[github-next-release-work]` to point to the following milestone.
- Update download asset filenames (e.g. `PowerToysSetup-0.94.0-...``PowerToysSetup-0.95.0-...`).
b. Build the What's New content from `grouped_md`:
- Combine `Area-Build` and `Area-Tests` entries under a single `Development` subsection (keep bullet order from CSV).
- Each other `Product-*` group gets its own subsection titled by the module name.
- Order subsections alphabetically by their heading text, with **Highlights** always first and **Development** always last (e.g., Environment Variables, File Locksmith, Find My Mouse, ... , ZoomIt, Development).
- Copy bullet lines verbatim from the corresponding `grouped_md` files (preserve punctuation and any trailing `Thanks @handle!`). Do NOT add, remove, or reevaluate thanks in the README stage.
c. Highlights: choose up to 10 bullets focused on user-visible feature additions or impactful fixes (avoid purely internal refactors). Use pattern: `Module/Feature <past-tense verb> <scenario> <impact>`.
d. Keep wording concise (aim 1 line per bullet), no PR numbers, no deep implementation details.
e. After updating, verify total highlight count ≤ 10 and that all internal contributors are not thanked.
## Notes and conventions
- Terminal usage: Disabled by default. Do NOT run terminal commands or ps1 scripts unless the user explicitly instructs you to.

View File

@@ -19,7 +19,7 @@ vanzue
zadjii-msft
khmyznikov
chatasweetie
MichaelJolley
michaeljolley
Jaylyn-Barbee
zateutsch
crutkas

View File

@@ -25,7 +25,7 @@ $csvData = $sorted | ForEach-Object {
$reviewsCommand = "gh pr view $prNumber --repo $repo --json reviews"
$reviewsJson = Invoke-Expression $reviewsCommand | ConvertFrom-Json
# Find the latest Copilot review - try different possible author names
# Collect Copilot reviews (match various author logins). Choose the LONGEST body (more content) vs newest.
$copilotReviews = $reviewsJson.reviews | Where-Object {
($_.author.login -eq "github-copilot[bot]" -or
$_.author.login -eq "copilot" -or
@@ -33,11 +33,11 @@ $csvData = $sorted | ForEach-Object {
$_.author.login -like "*copilot*") -and
$_.body -and
$_.body.Trim() -ne ""
} | Sort-Object submittedAt -Descending
}
if ($copilotReviews -and $copilotReviews.Count -gt 0) {
$copilotOverview = $copilotReviews[0].body.Replace("`r", "").Replace("`n", " ") -replace '\s+', ' '
Write-Host " Found Copilot review from: $($copilotReviews[0].author.login)"
$longest = $copilotReviews | Sort-Object { $_.body.Length } -Descending | Select-Object -First 1
$copilotOverview = $longest.body.Replace("`r", "").Replace("`n", " ") -replace '\s+', ' '
Write-Host " Selected Copilot review (author=$($longest.author.login) length=$($longest.body.Length))"
} else {
Write-Host " No Copilot reviews found for PR #$prNumber"
}

View File

@@ -0,0 +1,242 @@
[CmdletBinding()]
param(
[Parameter(Mandatory = $true)][string]$StartCommit, # exclusive start (commits AFTER this one)
[string]$EndCommit = "HEAD",
[string]$Repo = "microsoft/PowerToys",
[string]$OutputCsv = "sorted_prs.csv",
[string]$OutputJson = "milestone_prs.json"
)
<#
.SYNOPSIS
Dump merged PR information whose merge commits are reachable from EndCommit but not from StartCommit.
.DESCRIPTION
Uses git rev-list to compute commits in the (StartCommit, EndCommit] range, extracts PR numbers from merge commit messages,
queries GitHub (gh CLI) for details, then outputs a CSV similar to dump-prs-information.ps1.
PR merge commit messages in PowerToys generally contain patterns like:
Merge pull request #12345 from ...
.EXAMPLE
pwsh ./dump-prs-since-commit.ps1 -StartCommit 0123abcd
.EXAMPLE
pwsh ./dump-prs-since-commit.ps1 -StartCommit 0123abcd -EndCommit 89ef7654 -OutputCsv changes.csv
.NOTES
Requires: gh CLI authenticated; git available in working directory (must be inside PowerToys repo clone).
CopilotSummary behavior:
- Attempts to locate the latest GitHub Copilot authored review (preferred).
- If no review is found, lazily fetches PR comments to look for a Copilot-authored comment.
- Normalizes whitespace and strips newlines. Empty when no Copilot activity detected.
- Run with -Verbose to see whether the summary came from a 'review' or 'comment' source.
#>
function Write-Info($msg) { Write-Host $msg -ForegroundColor Cyan }
function Write-Warn($msg) { Write-Host $msg -ForegroundColor Yellow }
function Write-Err($msg) { Write-Host $msg -ForegroundColor Red }
function Write-DebugMsg($msg) { if ($PSBoundParameters.ContainsKey('Verbose') -or $VerbosePreference -eq 'Continue') { Write-Host "[VERBOSE] $msg" -ForegroundColor DarkGray } }
# Validate we are in a git repo
#if (-not (Test-Path .git)) {
# Write-Err "Current directory does not appear to be the root of a git repository."
# exit 1
#}
# Resolve commits
try {
$startSha = (git rev-parse --verify $StartCommit) 2>$null
if (-not $startSha) { throw "StartCommit '$StartCommit' not found" }
$endSha = (git rev-parse --verify $EndCommit) 2>$null
if (-not $endSha) { throw "EndCommit '$EndCommit' not found" }
}
catch {
Write-Err $_
exit 1
}
Write-Info "Collecting commits between $startSha..$endSha (excluding start, including end)."
# Get list of commits reachable from end but not from start.
# IMPORTANT: In PowerShell, the .. operator creates a numeric/char range. If $startSha and $endSha look like hex strings,
# `$startSha..$endSha` will expand unexpectedly (often to empty/undesired) instead of passing the literal "sha1..sha2".
# Therefore we build the range explicitly as a single string argument.
$rangeArg = "$startSha..$endSha"
$commitList = git rev-list $rangeArg
# Normalize list (filter out empty strings)
$normalizedCommits = $commitList | Where-Object { $_ -and $_.Trim() -ne '' }
$commitCount = ($normalizedCommits | Measure-Object).Count
Write-DebugMsg ("Raw commitList length (including blanks): {0}" -f (($commitList | Measure-Object).Count))
Write-DebugMsg ("Normalized commit count: {0}" -f $commitCount)
if ($commitCount -eq 0) {
Write-Warn "No commits found in specified range ($startSha..$endSha)."; exit 0
}
Write-DebugMsg ("First 5 commits: {0}" -f (($normalizedCommits | Select-Object -First 5) -join ', '))
<#
Extract PR numbers from commits.
Patterns handled:
1. Merge commits: 'Merge pull request #12345 from ...'
2. Squash commits: 'Some feature change (#12345)' (GitHub default squash format)
We collect both. If a commit matches both (unlikely), it's deduped later.
#>
# Extract PR numbers from merge or squash commits
$mergeCommits = @()
foreach ($c in $normalizedCommits) {
$subject = git show -s --format=%s $c
$matched = $false
# Pattern 1: Traditional merge commit
if ($subject -match 'Merge pull request #([0-9]+) ') {
$prNumber = [int]$matches[1]
$mergeCommits += [PSCustomObject]@{ Sha = $c; Pr = $prNumber; Subject = $subject; Pattern = 'merge' }
Write-DebugMsg "Matched merge PR #$prNumber in commit $c"
$matched = $true
}
# Pattern 2: Squash merge subject line with ' (#12345)' at end (allow possible whitespace before paren)
if ($subject -match '\(#([0-9]+)\)$') {
$prNumber2 = [int]$matches[1]
# Avoid duplicate object if pattern 1 already captured same number for same commit
if (-not ($mergeCommits | Where-Object { $_.Sha -eq $c -and $_.Pr -eq $prNumber2 })) {
$mergeCommits += [PSCustomObject]@{ Sha = $c; Pr = $prNumber2; Subject = $subject; Pattern = 'squash' }
Write-DebugMsg "Matched squash PR #$prNumber2 in commit $c"
}
$matched = $true
}
if (-not $matched) {
Write-DebugMsg "No PR pattern in commit $c : $subject"
}
}
if (-not $mergeCommits -or $mergeCommits.Count -eq 0) {
Write-Warn "No merge commits with PR numbers found in range."; exit 0
}
# Deduplicate PR numbers (in case of revert or merges across branches)
$prNumbers = $mergeCommits | Select-Object -ExpandProperty Pr -Unique | Sort-Object
Write-Info ("Found {0} unique PRs: {1}" -f $prNumbers.Count, ($prNumbers -join ', '))
Write-DebugMsg ("Total merge commits examined: {0}" -f $mergeCommits.Count)
# Query GitHub for each PR
$prDetails = @()
function Get-CopilotSummaryFromPrJson {
param(
[Parameter(Mandatory=$true)]$PrJson,
[switch]$VerboseMode
)
# Returns a hashtable with Summary and Source keys.
$result = @{ Summary = ""; Source = "" }
if (-not $PrJson) { return $result }
$candidateAuthors = @(
'github-copilot[bot]', 'github-copilot', 'copilot'
)
# 1. Reviews (preferred) pick the LONGEST valid Copilot body, not the most recent
$reviews = $PrJson.reviews
if ($reviews) {
$copilotReviews = $reviews | Where-Object {
($candidateAuthors -contains $_.author.login -or $_.author.login -like '*copilot*') -and $_.body -and $_.body.Trim() -ne ''
}
if ($copilotReviews) {
$longest = $copilotReviews | Sort-Object { $_.body.Length } -Descending | Select-Object -First 1
if ($longest) {
$body = $longest.body
$norm = ($body -replace "`r", '') -replace "`n", ' '
$norm = $norm -replace '\s+', ' '
$result.Summary = $norm
$result.Source = 'review'
if ($VerboseMode) { Write-DebugMsg "Selected Copilot review length=$($body.Length) (longest)." }
return $result
}
}
}
# 2. Comments fallback (some repos surface Copilot summaries as PR comments rather than review objects)
if ($null -eq $PrJson.comments) {
try {
# Lazy fetch comments only if needed
$commentsJson = gh pr view $PrJson.number --repo $Repo --json comments 2>$null | ConvertFrom-Json
if ($commentsJson -and $commentsJson.comments) {
$PrJson | Add-Member -NotePropertyName comments -NotePropertyValue $commentsJson.comments -Force
}
} catch {
if ($VerboseMode) { Write-DebugMsg "Failed to fetch comments for PR #$($PrJson.number): $_" }
}
}
if ($PrJson.comments) {
$copilotComments = $PrJson.comments | Where-Object {
($candidateAuthors -contains $_.author.login -or $_.author.login -like '*copilot*') -and $_.body -and $_.body.Trim() -ne ''
}
if ($copilotComments) {
$longestC = $copilotComments | Sort-Object { $_.body.Length } -Descending | Select-Object -First 1
if ($longestC) {
$body = $longestC.body
$norm = ($body -replace "`r", '') -replace "`n", ' '
$norm = $norm -replace '\s+', ' '
$result.Summary = $norm
$result.Source = 'comment'
if ($VerboseMode) { Write-DebugMsg "Selected Copilot comment length=$($body.Length) (longest)." }
return $result
}
}
}
return $result
}
foreach ($pr in $prNumbers) {
Write-Info "Fetching PR #$pr ..."
try {
# Include comments only if Verbose asked, otherwise we lazily pull if reviews missing
$fields = 'number,title,labels,author,url,body,reviews'
if ($PSBoundParameters.ContainsKey('Verbose')) { $fields += ',comments' }
$json = gh pr view $pr --repo $Repo --json $fields 2>$null | ConvertFrom-Json
if ($null -eq $json) { throw "Empty response" }
$copilot = Get-CopilotSummaryFromPrJson -PrJson $json -VerboseMode:($PSBoundParameters.ContainsKey('Verbose'))
if ($copilot.Summary -and $copilot.Source -and $PSBoundParameters.ContainsKey('Verbose')) {
Write-DebugMsg "Copilot summary source=$($copilot.Source) chars=$($copilot.Summary.Length)"
} elseif (-not $copilot.Summary) {
Write-DebugMsg "No Copilot summary found for PR #$pr"
}
# Filter labels
$filteredLabels = $json.labels | Where-Object {
($_.name -like "Product-*") -or
($_.name -like "Area-*") -or
($_.name -like "Github*") -or
($_.name -like "*Plugin") -or
($_.name -like "Issue-*")
}
$labelNames = ($filteredLabels | ForEach-Object { $_.name }) -join ", "
$bodyValue = if ($json.body) { ($json.body -replace "`r", '') -replace "`n", ' ' } else { '' }
$bodyValue = $bodyValue -replace '\s+', ' '
$prDetails += [PSCustomObject]@{
Id = $json.number
Title = $json.title
Labels = $labelNames
Author = $json.author.login
Url = $json.url
Body = $bodyValue
CopilotSummary = $copilot.Summary
}
}
catch {
$err = $_
Write-Warn ("Failed to fetch PR #{0}: {1}" -f $pr, $err)
}
}
if (-not $prDetails) { Write-Warn "No PR details fetched."; exit 0 }
# Sort by Labels like original script (first label alphabetical)
$sorted = $prDetails | Sort-Object { ($_.Labels -split ',')[0] }
# Output JSON raw (optional)
$sorted | ConvertTo-Json -Depth 6 | Out-File -Encoding UTF8 $OutputJson
Write-Info "Saving CSV to $OutputCsv ..."
$sorted | Export-Csv $OutputCsv -NoTypeInformation
Write-Host "✅ Done. Generated $($prDetails.Count) PR rows." -ForegroundColor Green