Component Retry
Configure per-component retry behavior so transient failures — provider download 502s, S3 backend timeouts, registry rate limits — recover automatically without manual re-runs.
Terraform commands often fail with transient infrastructure errors that have nothing to do with the Terraform code. The most common is a 502 Bad Gateway during terraform init:
Error: Failed to install provider … 502 Bad Gateway returned from
https://github.com/opentofu/terraform-provider-local/releases/...
Configure a retry block on the component and Atmos will retry each subprocess invocation (init, workspace, plan/apply) when the captured output matches one of the regex patterns you list as recoverable.
Usage
components:
  terraform:
    vpc:
      retry:
        max_attempts: 5
        backoff_strategy: exponential
        initial_delay: 2s
        max_delay: 30s
        conditions:
          - /Bad Gateway/
          - /5\d\d /
          - /connection reset/
          - /TLS handshake timeout/
          - /could not query provider registry/
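To check which of these patterns would fire on a given error before committing them, the matching can be approximated offline. The following is a minimal sketch, not Atmos code: it assumes the `/.../` wrapper is purely cosmetic and that a pattern counts as matching when it is found anywhere in the captured output. The helper names and the sample error line are hypothetical. Atmos is written in Go, whose regex syntax differs from Python's in places, but simple patterns like these behave the same in both.

```python
import re

# The conditions from the usage example above, written as they appear in YAML.
CONDITIONS = [
    r"/Bad Gateway/",
    r"/5\d\d /",
    r"/connection reset/",
    r"/TLS handshake timeout/",
    r"/could not query provider registry/",
]


def strip_slashes(pattern):
    """Remove an optional /.../ wrapper (assumed to be cosmetic only)."""
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        return pattern[1:-1]
    return pattern


def matching_conditions(output, conditions=CONDITIONS):
    """Return every condition whose regex is found somewhere in the output."""
    return [c for c in conditions if re.search(strip_slashes(c), output)]


# Hypothetical captured output, for illustration only.
transient = "Error: Failed to install provider: 502 Bad Gateway returned from registry"
real_failure = "Error: Unsupported argument on main.tf line 12"

print(matching_conditions(transient))
print(matching_conditions(real_failure))
```

Note that `5\d\d ` matches any three digits starting with 5 and followed by a space, so it also matches text such as "waited 500 ms". Anchoring it to surrounding words keeps the condition narrower.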
How it works
When retry.conditions is configured, each call Atmos makes to the terraform/tofu binary is wrapped in an independent retry loop:
- The subprocess runs as usual; its stdout and stderr stream to the user and are captured into an in-memory buffer.
- When the subprocess exits with a non-zero status, the captured output is matched against every regex in conditions.
- If at least one pattern matches, Atmos waits for the configured backoff and retries.
- If no pattern matches, the error is returned immediately: real failures (terraform plan exit-code 2, schema errors, permission denials) are never silently retried.
Each subprocess invocation has its own retry loop. Retrying terraform init does not consume the apply retry budget.
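The loop described above can be sketched as follows. This is an illustration of the documented behavior under stated assumptions, not the Atmos implementation: `run` is a hypothetical stand-in for one subprocess invocation that returns an exit code plus its captured output, the delay is a fixed value rather than a real backoff, and patterns are written without the `/.../` wrapper.

```python
import re
import time


def run_with_retry(run, conditions, max_attempts=3, delay_seconds=0.0, sleep=time.sleep):
    """Call run() until it succeeds, fails unrecoverably, or attempts run out.

    Returns (exit_code, output, attempts_used).
    """
    patterns = [re.compile(c) for c in conditions]
    attempt = 0
    while True:
        attempt += 1
        exit_code, output = run()
        if exit_code == 0:
            return exit_code, output, attempt
        # Retry only when the captured output matches a listed condition.
        recoverable = any(p.search(output) for p in patterns)
        if not recoverable or attempt >= max_attempts:
            return exit_code, output, attempt
        sleep(delay_seconds)


def make_fake_command(results):
    """Build a fake subprocess that yields the given results in order."""
    iterator = iter(results)
    return lambda: next(iterator)


conditions = [r"Bad Gateway", r"connection reset"]

# init fails twice with a transient error, then succeeds.
fake_init = make_fake_command([(1, "502 Bad Gateway"), (1, "502 Bad Gateway"), (0, "ok")])
# plan fails with a real error that matches no condition.
fake_plan = make_fake_command([(1, "Error: Unsupported argument")])

# Each invocation gets its own loop, so init's retries do not reduce plan's budget.
init_result = run_with_retry(fake_init, conditions, max_attempts=5)
plan_result = run_with_retry(fake_plan, conditions, max_attempts=5)
print(init_result)
print(plan_result)
```

In this sketch the fake init uses three of its five attempts, while the fake plan returns after a single attempt because its output matches no condition.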
Arguments
conditions
  A list of regex patterns matched against captured stdout/stderr. Only errors whose output matches at least one pattern are retried. Patterns may be wrapped in /.../ for readability. Without conditions, no retry is attempted. This is a safety default so a typo never silently retries real failures.
max_attempts
  Maximum number of attempts, including the first. Omit for unlimited; defaults to 1 (no retry) when retry is unset entirely.
backoff_strategy
  One of constant, linear, or exponential. Defaults to exponential when initial_delay is set.
initial_delay
  Delay before the second attempt as a Go duration string (e.g. "2s", "500ms").
max_delay
  Upper bound on any single delay; caps exponential growth (e.g. "30s").
max_elapsed_time
  Total time budget across all attempts (e.g. "5m"). After this the most recent error is returned regardless of max_attempts.
multiplier
  Growth multiplier for exponential backoff. Defaults to 2.0.
random_jitter
  Fractional jitter applied to each delay (0–1). Useful to avoid thundering-herd retries from CI fleets.
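To get a feel for how these arguments interact, the delay schedule can be estimated with a short sketch. The formulas here are assumptions rather than the documented Atmos behavior: exponential delays are taken as initial_delay × multiplier^n, linear delays as initial_delay × (n + 1), jitter as a symmetric random fraction of the delay, and max_delay as a cap applied last. Durations are plain seconds, and `backoff_delays` is a hypothetical helper.

```python
import random


def backoff_delays(max_attempts, initial_delay, strategy="exponential",
                   multiplier=2.0, max_delay=None, random_jitter=0.0, rng=random):
    """Return the waits (in seconds) before attempts 2..max_attempts."""
    delays = []
    for retry_index in range(max_attempts - 1):
        if strategy == "constant":
            delay = initial_delay
        elif strategy == "linear":
            delay = initial_delay * (retry_index + 1)  # assumed linear growth
        elif strategy == "exponential":
            delay = initial_delay * multiplier ** retry_index
        else:
            raise ValueError(f"unknown backoff strategy: {strategy}")
        if random_jitter:
            # Assumed symmetric jitter: scale by a factor in [1 - j, 1 + j].
            delay *= 1 + rng.uniform(-random_jitter, random_jitter)
        if max_delay is not None:
            delay = min(delay, max_delay)  # cap applied last so it is a hard bound
        delays.append(delay)
    return delays


# The settings from the usage example: 5 attempts, 2s initial delay, 30s cap.
print(backoff_delays(5, 2.0, max_delay=30.0))
# With more attempts the cap starts to apply.
print(backoff_delays(7, 2.0, max_delay=30.0))
```

Under these assumptions, the usage example waits 2, 4, 8, and 16 seconds between its five attempts, so max_delay: 30s only takes effect from the sixth attempt onward.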
Inheritance
The retry block participates in component inheritance. Abstract components can define a default retry policy that concrete components inherit and override:
components:
  terraform:
    base/network:
      metadata:
        type: abstract
      retry:
        max_attempts: 3
        initial_delay: 2s
        backoff_strategy: exponential
        conditions:
          - /Bad Gateway/
          - /5\d\d /
    vpc:
      metadata:
        component: base/network
      # vpc inherits the base retry block.
    transit-gateway:
      metadata:
        component: base/network
      retry:
        # Override only the fields we care about. `conditions` is appended to,
        # `max_attempts` is replaced.
        max_attempts: 5
        conditions:
          - /TLS handshake timeout/
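The effective policy for each concrete component can be worked out by hand. The sketch below follows the merge rule stated in the example's comment (conditions lists are appended, other fields are replaced). `merge_retry` is a hypothetical helper written for illustration and does not reflect how Atmos performs the merge internally.

```python
def merge_retry(base, override):
    """Merge a child retry block onto an inherited one.

    Assumed rule, taken from the example's comment: `conditions` lists are
    appended, and every other field is replaced by the child's value.
    """
    merged = dict(base)
    for key, value in override.items():
        if key == "conditions":
            merged["conditions"] = list(base.get("conditions", [])) + list(value)
        else:
            merged[key] = value
    return merged


base_network = {
    "max_attempts": 3,
    "initial_delay": "2s",
    "backoff_strategy": "exponential",
    "conditions": [r"/Bad Gateway/", r"/5\d\d /"],
}
transit_gateway_override = {
    "max_attempts": 5,
    "conditions": [r"/TLS handshake timeout/"],
}

vpc_retry = merge_retry(base_network, {})  # no override: same as the base
tgw_retry = merge_retry(base_network, transit_gateway_override)

print(vpc_retry)
print(tgw_retry)
```

Under that rule, transit-gateway ends up with five attempts, the inherited 2s exponential backoff, and three conditions, while vpc keeps the base policy unchanged.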
When NOT to use retry
- Real Terraform failures. A plan that exits with code 2 because of a configuration error or a validate that catches a typo should fail loudly. Keep conditions narrowly scoped to transient infra patterns.
- Stateful mutations with non-idempotent side effects. If a third-party API is invoked from a local-exec provisioner and is not safe to retry, do not configure retry conditions that could match its error output.
- Hooks. Atmos hooks have their own configuration and are not affected by component retry.
See also
- Workflow retry — retry configuration for multi-step workflows.
- Vendor retry — retry configuration for component vendoring.