Parallel and Matrix Steps for Atmos Workflows
Atmos workflows can now run independent work concurrently with first-class parallel and matrix control steps. Add dependency-aware fan-out, readable grouped or live-prefixed output, and explicit failure behavior directly to your workflow YAML.
The Problem
Workflows are where teams encode the operational knowledge that should not live in someone's shell history: run the checks, build the thing, deploy the dependencies, then summarize what happened.
Until now, those steps were sequential. That was easy to reason about, but it meant a workflow with four independent checks took the sum of all four runtimes. The usual workaround was to drop into shell scripts, background jobs, wait, temp files, and hand-rolled log prefixes. That works until it doesn't:
- Output from concurrent commands interleaves into unreadable logs.
- Failure behavior is implicit and different in every script.
- Dependency relationships are hidden in shell control flow.
- Local workflows and CI matrices drift apart.
Infrastructure automation should not force you to choose between "simple but slow" and "fast but fragile."
What's New
Atmos now supports two new workflow control step types:
- parallel runs sibling steps concurrently.
- matrix expands literal axes and schedules the generated child steps.
Both support:
- needs dependencies between sibling steps.
- max_concurrency to bound parallelism.
- Failure modes: wait_all, fail_fast, and best_effort.
- Output modes: grouped, prefixed, and none.
- Parent-owned summaries with success, failed, skipped, and canceled counts.
This is built into the workflow engine, so the orchestration rules are visible in the workflow file instead of buried in shell glue.
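To make the idea of bounding parallelism concrete, here is a minimal Python sketch of what a max_concurrency limit means in general. It is not Atmos code: run_bounded, make_check, and the check names vet and fmt are hypothetical, and the thread pool is only one possible way to enforce such a limit.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_bounded(tasks, max_concurrency):
    """Run zero-argument callables with at most max_concurrency in flight.

    Returns (results in definition order, peak number of tasks seen running).
    Hypothetical helper for illustration; not part of Atmos.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def tracked(task):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            return task()
        finally:
            with lock:
                running -= 1

    # The pool size is what enforces the concurrency bound.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        results = list(pool.map(tracked, tasks))
    return results, peak


def make_check(name, seconds=0.05):
    """Build a stand-in for an independent check that takes some time."""
    def check():
        time.sleep(seconds)
        return name
    return check


names = ["lint", "test", "vet", "fmt"]
results, peak = run_bounded([make_check(n) for n in names], max_concurrency=2)
print(results, "peak concurrency:", peak)
```

With four checks and a limit of two, the checks run in roughly two rounds instead of four sequential runs, and the observed peak never exceeds the limit.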
Parallel Checks
Run independent checks together, then run a dependent summary step only after both prerequisites succeed:
    workflows:
      checks:
        steps:
          - name: checks
            type: parallel
            max_concurrency: 4
            fail:
              mode: wait_all
            output:
              mode: grouped
              order: completion
              show_summary: true
              prefix: "{{ .step.name }}"
            steps:
              - name: lint
                type: shell
                command: make lint
              - name: test
                type: shell
                command: make test
              - name: summarize
                type: shell
                needs: [lint, test]
                command: ./scripts/summary.sh
The workflow is still declarative: summarize says what it needs, not how to poll for it. Atmos schedules everything else.
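As a rough illustration of how needs can drive scheduling, the following Python sketch groups the steps from the example above into waves using the standard library's graphlib module. It is a conceptual model under stated assumptions, not how Atmos is implemented; schedule_waves is a hypothetical helper, and a real scheduler would likely start each step as soon as its own dependencies finish rather than waiting for a whole wave.

```python
from graphlib import TopologicalSorter


def schedule_waves(needs):
    """Group step names into waves.

    `needs` maps each step name to the names it depends on. Every step in a
    wave has all of its dependencies satisfied by earlier waves, so the steps
    inside one wave could run concurrently.
    """
    sorter = TopologicalSorter(needs)
    sorter.prepare()  # raises graphlib.CycleError if needs form a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


needs = {"lint": [], "test": [], "summarize": ["lint", "test"]}
waves = schedule_waves(needs)
print(waves)  # [['lint', 'test'], ['summarize']]
```

The dependency map is the only input: nothing in it says how to wait for lint and test, which is the declarative property the workflow file relies on.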
Matrix Fan-Out
Use matrix when the same step should run across combinations:
    workflows:
      test-matrix:
        steps:
          - name: test-matrix
            type: matrix
            max_concurrency: 3
            output:
              mode: grouped
              order: definition
            matrix:
              os: [linux, darwin]
              go: ["1.22", "1.23"]
            steps:
              - name: test
                type: shell
                command: make test OS={{ .matrix.os }} GO_VERSION={{ .matrix.go }}
That gives you CI-style fan-out without requiring the workflow to become a GitHub Actions-only construct. The same workflow can run locally, in CI, or inside a larger operational runbook.
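The expansion of literal axes can be pictured as a Cartesian product. The Python sketch below shows the four combinations the example above would generate; expand_matrix is a hypothetical helper, and the ordering shown (last axis varying fastest) is an assumption rather than documented Atmos behavior.

```python
from itertools import product


def expand_matrix(axes):
    """Expand {axis: [values]} into one dict per combination of axis values."""
    keys = list(axes)
    return [dict(zip(keys, combo)) for combo in product(*(axes[k] for k in keys))]


axes = {"os": ["linux", "darwin"], "go": ["1.22", "1.23"]}
combos = expand_matrix(axes)

# Render the child command for each combination, mirroring the template above.
commands = [f"make test OS={c['os']} GO_VERSION={c['go']}" for c in combos]
for cmd in commands:
    print(cmd)
```

Two axes with two values each yield four child steps, which is why max_concurrency: 3 in the example means one of them waits for a free slot.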
Output That Stays Readable
Concurrent output is only useful if humans can read it. The control step owns child output rendering:
- grouped captures child stdout/stderr and prints labeled blocks.
- prefixed streams live output with complete-line prefixes.
- none suppresses terminal output while still capturing metadata.
For live logs:
    output:
      mode: prefixed
      prefix: "{{ .step.name }}"
Example output:
    [lint] checking formatting
    [test] running unit tests
    [lint] passed
    [test] passed
    [checks] summary: 2 succeeded, 0 failed, 0 skipped, 0 canceled
The summary uses the same Atmos UI formatter as other command output, so success, warning, and failure states are immediately visible.
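Complete-line prefixing is the detail that keeps concurrent streams from interleaving mid-line. A minimal Python sketch of the idea follows, assuming each child's output arrives in arbitrary chunks; LinePrefixer is a hypothetical class for illustration and not the Atmos renderer.

```python
class LinePrefixer:
    """Buffer raw output chunks and release only complete, prefixed lines."""

    def __init__(self, name):
        self.prefix = f"[{name}] "
        self._partial = ""  # text received since the last newline

    def feed(self, chunk):
        """Accept a chunk of output; return the prefixed lines it completes."""
        self._partial += chunk
        # Everything before the final newline is complete; the rest is held back.
        *complete, self._partial = self._partial.split("\n")
        return [self.prefix + line for line in complete]

    def flush(self):
        """Return any trailing text that never received a newline."""
        if not self._partial:
            return []
        line, self._partial = self._partial, ""
        return [self.prefix + line]


lint = LinePrefixer("lint")
lines = []
for chunk in ["checking form", "atting\npas", "sed"]:
    lines += lint.feed(chunk)
lines += lint.flush()
print(lines)  # ['[lint] checking formatting', '[lint] passed']
```

Because a line is only emitted once it is complete, two children writing at the same time can alternate lines but never split one.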
Explicit Failure Semantics
Parallel work needs a clear answer to "what happens when one branch fails?"
    fail:
      mode: wait_all # wait_all | fail_fast | best_effort
      max_failures: 2 # 0 means unlimited
- wait_all lets independent ready/running branches continue, skips dependents of failed children, and fails the parent after schedulable work settles.
- fail_fast cancels pending and running siblings once the failure threshold is reached.
- best_effort records failures and skips dependents, but lets the parent succeed unless the control step itself is invalid.
That makes failure behavior reviewable. Operators can choose fast feedback for checks, complete collection for reports, or best-effort fan-out where partial success is still useful.
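The three modes can be compared with a small simulation. The Python sketch below is a simplified model of the descriptions above, not the Atmos scheduler: simulate is a hypothetical function, children are processed one at a time in definition order (so already-running siblings are not modeled), and the threshold parameter with a default of 1 is an assumption standing in for the failure threshold.

```python
def simulate(children, failing, mode, threshold=1):
    """Return (per-child status, parent status) for one failure mode.

    children: list of (name, needs) in definition order; needs must refer to
              earlier names.
    failing:  set of child names whose command would fail if it ran.
    mode:     "wait_all", "fail_fast", or "best_effort".
    """
    status = {}
    failures = 0
    canceled = False
    for name, needs in children:
        if canceled:
            status[name] = "canceled"  # fail_fast already tripped
        elif any(status[dep] != "succeeded" for dep in needs):
            status[name] = "skipped"  # a prerequisite did not succeed
        elif name in failing:
            status[name] = "failed"
            failures += 1
            if mode == "fail_fast" and failures >= threshold:
                canceled = True
        else:
            status[name] = "succeeded"
    parent_ok = failures == 0 or mode == "best_effort"
    return status, ("succeeded" if parent_ok else "failed")


children = [("lint", []), ("test", []), ("summarize", ["lint", "test"])]
for mode in ("wait_all", "fail_fast", "best_effort"):
    print(mode, simulate(children, failing={"lint"}, mode=mode))
```

With lint failing, wait_all and best_effort both still run test and skip summarize but disagree on the parent result, while fail_fast cancels everything after the first failure.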
Guardrails for v1
The first version intentionally allows only non-interactive child steps inside concurrent groups:
- shell
- atmos
- sleep
Interactive prompts, terminal-owning renderers, file editors, pagers, spinners, environment-mutating steps, and exec are kept outside concurrent groups for now. That boundary is deliberate: concurrent workflows should not start by letting multiple children fight over the same terminal.
You can still use rich UI steps before or after a parallel or matrix control step to frame the workflow, show tables, render markdown, or summarize the result.
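The guardrail amounts to an allowlist check on child step types. A minimal Python sketch of such a check is shown below, assuming the control step has been loaded into a plain dictionary; invalid_children and the step name open-shell are hypothetical, and the real validation in Atmos may work differently.

```python
# Child step types the v1 guardrails allow inside concurrent groups.
ALLOWED_CHILD_TYPES = {"shell", "atmos", "sleep"}


def invalid_children(control_step):
    """Return the names of child steps whose type is not on the allowlist."""
    return [
        child.get("name", "<unnamed>")
        for child in control_step.get("steps", [])
        if child.get("type") not in ALLOWED_CHILD_TYPES
    ]


control_step = {
    "name": "checks",
    "type": "parallel",
    "steps": [
        {"name": "lint", "type": "shell", "command": "make lint"},
        {"name": "pause", "type": "sleep"},
        {"name": "open-shell", "type": "exec"},  # exec is kept outside concurrent groups
    ],
}
rejected = invalid_children(control_step)
print(rejected)  # ['open-shell']
```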
Why This Matters
Parallel and matrix workflow steps make Atmos workflows feel like real orchestration instead of a sequential macro runner.
- Local runbooks get faster without becoming bash concurrency puzzles.
- CI and local automation can share the same workflow definition.
- Dependency relationships are visible as needs, not hidden in scripts.
- Output remains readable by default.
- Failure behavior is part of the contract.
- Matrix fan-out is available anywhere Atmos runs, not only inside a CI provider.
This is especially useful for validation workflows, multi-component smoke tests, cross-platform checks, reporting jobs, and any operational task where several independent branches can run safely at the same time.
Try It
This PR includes a runnable example:
    cd examples/parallel-steps
    atmos workflow checks -f parallel
    atmos workflow prefixed -f parallel
    atmos workflow matrix -f parallel
Start with validation and reporting workflows. They usually have the safest fan-out shape: independent checks, obvious dependencies, and low risk if one branch fails.
For the full field reference, see the parallel and matrix step type documentation.
Get Involved
Try the new control steps on real workflows and tell us where the v1 guardrails feel too strict or exactly right. We're especially interested in feedback on output modes, failure semantics, and which additional non-interactive step types should be allowed inside concurrent groups next.
