Files
wrkflw/crates/executor/src/substitution.rs
Gokul 6016887a3b feat(executor): easy GHA emulation fixes for better compatibility (#82)
* feat(executor): add easy GHA emulation fixes for better compatibility

- Expand github.* context with 13 missing env vars (CI, GITHUB_ACTIONS,
  GITHUB_REF_NAME, GITHUB_REF_TYPE, GITHUB_REPOSITORY_OWNER, etc.) and
  improve GITHUB_ACTOR to use git config / $USER instead of hardcoded value
- Enforce timeout-minutes at both job level (default 360m per GHA spec)
  and step level via tokio::time::timeout
- Implement defaults.run.shell and defaults.run.working-directory with
  proper fallback chain: step > job defaults > workflow defaults > bash
- Implement hashFiles() expression function with glob matching, sorted
  file hashing (SHA-256), and integration into the substitution pipeline

* fix(executor): harden hashFiles, working-directory, and shell -e

Three issues from code review, all in the "we got the GHA emulation
*almost* right" category:

1. hashFiles() was returning an empty string when no files matched.
   GHA returns the SHA-256 of empty input (e3b0c44...), not nothing.
   An empty string as a cache key component is the kind of thing
   that silently ruins your day. Also, unreadable files were being
   skipped without a peep — now we at least warn about it.

2. The working-directory default resolution was doing a naive
   Path::join with user-controlled input. If someone writes
   `working-directory: ../../../etc` or an absolute path, join
   happily replaces the base. Inside a container this is *somewhat*
   contained, but in emulation mode it's a real path traversal.
   Normalize the path and reject anything that escapes the
   workspace.

3. The bash -e flag change (correct per GHA spec) was undocumented.
   Scripts that relied on intermediate commands failing without
   aborting the step will now break. Document it in
   BREAKING_CHANGES.md so users aren't left guessing.

* fix(executor): complete the GHA shell invocation and harden hashFiles

The previous commit added `-e` to bash but stopped there, even
though the BREAKING_CHANGES.md *literally documented* the full GHA
invocation as `bash --noprofile --norc -e -o pipefail {0}`. So we
were advertising behavior we weren't actually implementing. This is
not great.

Without `-o pipefail`, piped commands like `false | echo ok` would
silently succeed, which is exactly the kind of divergence that makes
you distrust an emulator. And without `--noprofile --norc`, user
profile scripts can interfere with reproducibility.

While at it, fix hashFiles error handling — it was silently
swallowing read errors and producing a partial hash, which is worse
than failing because you get a *wrong* cache key with no indication
anything went sideways. preprocess_hash_files and
preprocess_expressions now return Result and the engine surfaces
failures as step errors.

Also add the tests that should have been there from the start:
shell invocation flags, working-directory path traversal rejection,
and defaults cascade (step > job > workflow).

* fix(executor): harden hashFiles, timeout, and shell edge cases

The previous round of GHA emulation fixes left a few holes that
would bite you in production:

hashFiles() would happily glob '../../etc/passwd' and hash whatever
it found outside the workspace. It also loaded entire file contents
into memory before hashing, which is *not great* when someone points
it at a large binary artifact. The glob patterns now reject '..'
traversal, and file contents are streamed into the SHA-256 hasher
via io::copy instead.

timeout-minutes accepted any f64 from YAML, including negative
values, NaN, and infinity — all of which make Duration::from_secs_f64
panic. Non-finite and non-positive values now fall back to the GHA
default of 360 minutes.

Unknown shell values were silently accepted with a '-c' fallback.
Now they emit a warning so you at least *know* something is off.

While at it, replaced the hash_files_read_error_returns_err test
that was testing two Ok paths (despite its name) with proper
path-traversal rejection tests.

* fix(executor): fix shadowed timeout_mins and extract sanitization helper

It turns out the job timeout error path was re-reading the *raw*
timeout_minutes value instead of using the already-sanitized one.
If someone set timeout-minutes to NaN or a negative number, the
sanitization would correctly fall back to 360, but the error
message would happily print "Job exceeded timeout of NaN minutes."

Not great.

Extract sanitize_timeout_minutes() so both the job and step
timeout paths use the same logic instead of duplicating the
is_finite/positive/clamp dance. While at it, add proper tests
for NaN, Infinity, negative, zero, and the max clamp — plus a
test that actually exercises the job-level timeout expiry branch,
which previously had zero coverage.
2026-04-02 18:00:41 +05:30

343 lines
12 KiB
Rust

use lazy_static::lazy_static;
use regex::Regex;
use serde_yaml::Value;
use sha2::{Digest, Sha256};
use std::collections::HashMap;
use std::path::Path;
lazy_static! {
static ref MATRIX_PATTERN: Regex =
Regex::new(r"\$\{\{\s*matrix\.([a-zA-Z0-9_]+)\s*\}\}").unwrap();
static ref HASH_FILES_PATTERN: Regex =
Regex::new(r"\$\{\{\s*hashFiles\(([^)]+)\)\s*\}\}").unwrap();
}
/// Preprocesses a command string to replace GitHub-style matrix variable references
/// with their values from the environment
pub fn preprocess_command(command: &str, matrix_values: &HashMap<String, Value>) -> String {
// Replace matrix references like ${{ matrix.os }} with their values
let result = MATRIX_PATTERN.replace_all(command, |caps: &regex::Captures| {
let var_name = &caps[1];
// Get the value from matrix context
if let Some(value) = matrix_values.get(var_name) {
// Convert value to string
match value {
Value::String(s) => s.clone(),
Value::Number(n) => n.to_string(),
Value::Bool(b) => b.to_string(),
_ => format!("\\${{{{ matrix.{} }}}}", var_name), // Escape $ for shell
}
} else {
// Keep original if not found but escape $ to prevent shell errors
format!("\\${{{{ matrix.{} }}}}", var_name)
}
});
result.into_owned()
}
/// Apply variable substitution to step run commands
pub fn process_step_run(run: &str, matrix_combination: &Option<HashMap<String, Value>>) -> String {
if let Some(matrix) = matrix_combination {
preprocess_command(run, matrix)
} else {
// Escape $ in GitHub expression syntax to prevent shell interpretation
MATRIX_PATTERN
.replace_all(run, |caps: &regex::Captures| {
let var_name = &caps[1];
format!("\\${{{{ matrix.{} }}}}", var_name)
})
.to_string()
}
}
/// Replace `${{ hashFiles(...) }}` expressions with the SHA-256 hash of matched files.
///
/// Accepts one or more comma-separated, quoted glob patterns. Files are matched
/// relative to `workspace`, sorted lexicographically, and hashed in order to
/// produce a deterministic digest — matching GitHub Actions behavior.
///
/// Returns `Err` if any matched file cannot be read.
pub fn preprocess_hash_files(text: &str, workspace: &Path) -> Result<String, String> {
let mut error: Option<String> = None;
let result = HASH_FILES_PATTERN
.replace_all(text, |caps: &regex::Captures| {
if error.is_some() {
return String::new();
}
let args_raw = &caps[1];
match compute_hash_files(args_raw, workspace) {
Ok(hash) => hash,
Err(e) => {
error = Some(e);
String::new()
}
}
})
.into_owned();
match error {
Some(e) => Err(e),
None => Ok(result),
}
}
/// Compute a SHA-256 hash of the contents of all files matching the given glob patterns.
///
/// `args_raw` is the raw argument string inside `hashFiles(...)`, e.g.
/// `'**/package-lock.json', '**/yarn.lock'`.
///
/// Returns `Ok(hash)` on success or `Err(message)` if any matched file cannot be read.
fn compute_hash_files(args_raw: &str, workspace: &Path) -> Result<String, String> {
// Parse comma-separated, quoted patterns
let patterns: Vec<&str> = args_raw
.split(',')
.map(|s| s.trim().trim_matches('\'').trim_matches('"'))
.filter(|s| !s.is_empty())
.collect();
if patterns.is_empty() {
return Ok(String::new());
}
// Reject patterns containing path traversal components
for pattern in &patterns {
if pattern.split('/').any(|seg| seg == "..") {
return Err(format!(
"hashFiles: pattern '{}' contains '..' path traversal",
pattern
));
}
}
// Collect all matching files
let mut matched_files = Vec::new();
for pattern in &patterns {
let full_pattern = workspace.join(pattern).to_string_lossy().to_string();
if let Ok(entries) = glob::glob(&full_pattern) {
for entry in entries.flatten() {
if entry.is_file() {
matched_files.push(entry);
}
}
}
}
if matched_files.is_empty() {
// GHA returns the SHA-256 of empty input when no files match
return Ok(format!("{:x}", Sha256::new().finalize()));
}
// Sort for deterministic output (GHA sorts lexicographically)
matched_files.sort();
matched_files.dedup();
// Hash all file contents (stream to avoid loading large files into memory)
let mut hasher = Sha256::new();
for path in &matched_files {
let mut file = std::fs::File::open(path)
.map_err(|e| format!("hashFiles: could not read '{}': {}", path.display(), e))?;
std::io::copy(&mut file, &mut hasher)
.map_err(|e| format!("hashFiles: could not read '{}': {}", path.display(), e))?;
}
Ok(format!("{:x}", hasher.finalize()))
}
/// Apply all expression substitutions: hashFiles, matrix variables.
///
/// Returns `Err` if a `hashFiles()` expression fails (e.g. unreadable file).
pub fn preprocess_expressions(
text: &str,
workspace: &Path,
matrix_combination: &Option<HashMap<String, Value>>,
) -> Result<String, String> {
// First resolve hashFiles
let result = preprocess_hash_files(text, workspace)?;
// Then resolve matrix variables
Ok(if let Some(matrix) = matrix_combination {
preprocess_command(&result, matrix)
} else {
MATRIX_PATTERN
.replace_all(&result, |caps: &regex::Captures| {
let var_name = &caps[1];
format!("\\${{{{ matrix.{} }}}}", var_name)
})
.to_string()
})
}
#[cfg(test)]
mod tests {
use super::*;
use std::fs;
use tempfile::tempdir;
#[test]
fn test_preprocess_simple_matrix_vars() {
let mut matrix = HashMap::new();
matrix.insert("os".to_string(), Value::String("ubuntu-latest".to_string()));
matrix.insert(
"node".to_string(),
Value::Number(serde_yaml::Number::from(14)),
);
let cmd = "echo \"Running on ${{ matrix.os }} with Node ${{ matrix.node }}\"";
let processed = preprocess_command(cmd, &matrix);
assert_eq!(processed, "echo \"Running on ubuntu-latest with Node 14\"");
}
#[test]
fn test_preprocess_with_missing_vars() {
let mut matrix = HashMap::new();
matrix.insert("os".to_string(), Value::String("ubuntu-latest".to_string()));
let cmd = "echo \"Running on ${{ matrix.os }} with Node ${{ matrix.node }}\"";
let processed = preprocess_command(cmd, &matrix);
// Missing vars should be escaped
assert_eq!(
processed,
"echo \"Running on ubuntu-latest with Node \\${{ matrix.node }}\""
);
}
#[test]
fn test_preprocess_preserves_other_text() {
let mut matrix = HashMap::new();
matrix.insert("os".to_string(), Value::String("ubuntu-latest".to_string()));
let cmd = "echo \"Starting job\" && echo \"OS: ${{ matrix.os }}\" && echo \"Done!\"";
let processed = preprocess_command(cmd, &matrix);
assert_eq!(
processed,
"echo \"Starting job\" && echo \"OS: ubuntu-latest\" && echo \"Done!\""
);
}
#[test]
fn test_process_without_matrix() {
let cmd = "echo \"Value: ${{ matrix.value }}\"";
let processed = process_step_run(cmd, &None);
assert_eq!(processed, "echo \"Value: \\${{ matrix.value }}\"");
}
#[test]
fn hash_files_single_pattern() {
let dir = tempdir().unwrap();
fs::write(dir.path().join("package-lock.json"), "lock-content").unwrap();
fs::write(dir.path().join("other.txt"), "other").unwrap();
let text = "${{ hashFiles('package-lock.json') }}";
let result = preprocess_hash_files(text, dir.path()).unwrap();
assert!(!result.is_empty());
assert!(!result.contains("hashFiles"));
// Hash should be 64 hex chars (SHA-256)
assert_eq!(result.len(), 64);
}
#[test]
fn hash_files_multiple_patterns() {
let dir = tempdir().unwrap();
fs::write(dir.path().join("a.lock"), "aaa").unwrap();
fs::write(dir.path().join("b.json"), "bbb").unwrap();
let text = "${{ hashFiles('*.lock', '*.json') }}";
let result = preprocess_hash_files(text, dir.path()).unwrap();
assert_eq!(result.len(), 64);
}
#[test]
fn hash_files_no_matches_returns_hash_of_empty() {
let dir = tempdir().unwrap();
let text = "${{ hashFiles('nonexistent-*.xyz') }}";
let result = preprocess_hash_files(text, dir.path()).unwrap();
// GHA returns SHA-256 of empty input when no files match
let expected = format!("{:x}", Sha256::new().finalize());
assert_eq!(result, expected);
assert_eq!(result.len(), 64);
}
#[test]
fn hash_files_deterministic() {
let dir = tempdir().unwrap();
fs::write(dir.path().join("a.txt"), "hello").unwrap();
fs::write(dir.path().join("b.txt"), "world").unwrap();
let text = "${{ hashFiles('*.txt') }}";
let r1 = preprocess_hash_files(text, dir.path()).unwrap();
let r2 = preprocess_hash_files(text, dir.path()).unwrap();
assert_eq!(r1, r2);
}
#[test]
fn hash_files_glob_recursive() {
let dir = tempdir().unwrap();
let sub = dir.path().join("sub");
fs::create_dir(&sub).unwrap();
fs::write(sub.join("deep.lock"), "deep-content").unwrap();
let text = "${{ hashFiles('**/deep.lock') }}";
let result = preprocess_hash_files(text, dir.path()).unwrap();
assert_eq!(result.len(), 64);
}
#[test]
fn hash_files_inline_with_other_text() {
let dir = tempdir().unwrap();
fs::write(dir.path().join("Cargo.lock"), "lockfile").unwrap();
let text = "cache-key-${{ hashFiles('Cargo.lock') }}-suffix";
let result = preprocess_hash_files(text, dir.path()).unwrap();
assert!(result.starts_with("cache-key-"));
assert!(result.ends_with("-suffix"));
assert!(!result.contains("hashFiles"));
}
#[test]
fn preprocess_expressions_combines_hash_and_matrix() {
let dir = tempdir().unwrap();
fs::write(dir.path().join("Cargo.lock"), "lockfile").unwrap();
let mut matrix = HashMap::new();
matrix.insert("os".to_string(), Value::String("ubuntu".to_string()));
let text = "${{ matrix.os }}-${{ hashFiles('Cargo.lock') }}";
let result = preprocess_expressions(text, dir.path(), &Some(matrix)).unwrap();
assert!(result.starts_with("ubuntu-"));
assert!(!result.contains("hashFiles"));
assert!(!result.contains("matrix"));
}
#[test]
fn hash_files_rejects_path_traversal() {
let dir = tempdir().unwrap();
fs::write(dir.path().join("legit.txt"), "content").unwrap();
let text = "${{ hashFiles('../../etc/passwd') }}";
let result = preprocess_hash_files(text, dir.path());
assert!(result.is_err());
assert!(result.unwrap_err().contains("path traversal"));
}
#[test]
fn hash_files_rejects_mid_path_traversal() {
let dir = tempdir().unwrap();
let result = compute_hash_files("'subdir/../../etc/passwd'", dir.path());
assert!(result.is_err());
assert!(result.unwrap_err().contains("path traversal"));
}
}