
Performance Profiling

Atmos provides comprehensive performance profiling capabilities through two complementary approaches: a lightweight interactive heatmap for quick performance insights, and deep pprof integration for detailed runtime analysis.

Profiling Approaches

Atmos offers two methods for performance analysis, each suited to different use cases:

Performance Heatmap (Quick Analysis)

A lightweight, built-in visualization tool that provides real-time performance metrics with minimal overhead:

  • Function-level metrics: Call counts, execution times, and percentile statistics
  • Interactive visualization: Multiple display modes (bar charts, sparklines, tables)
  • Zero setup: Single CLI flag (--heatmap) enables tracking
  • Microsecond precision: Captures fast function executions
  • Best for: Quick performance checks, development workflow, identifying hot paths

pprof Profiling (Deep Analysis)

Go's comprehensive profiling toolkit for detailed runtime performance analysis:

  • Multiple profile types: CPU, memory (heap/allocs), goroutines, blocking, mutex contention
  • Line-level profiling: Pinpoint exact code locations
  • Flame graphs: Visual call stack analysis
  • Persistent data: Export profiles for historical comparison
  • Best for: Deep performance investigation, memory leak detection, production debugging

Performance Heatmap

Quick Start

Display the performance heatmap after any Atmos command:

# Run any Atmos command with the --heatmap flag
atmos describe stacks --heatmap

This launches an interactive TUI (Terminal User Interface) where you can switch between visualization modes by pressing 1-3.

Real-World Example

Here's actual output from atmos describe stacks --heatmap:

Interactive Mode (when running in a terminal with TTY):

Performance Heatmap - Bar Chart

The interactive TUI shows the top 25 functions with color-coded bars representing total CPU time. Each function displays its average execution time per call alongside the total number of calls, making it easy to identify both slow functions and high-frequency functions at a glance.

Example Interactive Display:

🔥 Performance Heatmap - Bar Chart

utils.processCustomTags ████████████████████████████████████████ avg: 0.37ms | calls: 447586
exec.ProcessYAMLConfigFile ████████████████████████████████ avg: 686µs | calls: 199
utils.UnmarshalYAMLFromFile █████████████████ avg: 82µs | calls: 53429

  • Bar length: Proportional to total CPU time (shows overall impact on performance)
  • avg: Average self-time per call (shows typical function performance)
  • calls: Number of times the function was invoked

Interactive TUI Legend:

The TUI displays a comprehensive legend at the top showing performance summary and metric descriptions:

Parallelism: ~0.9x | Elapsed: 279.002ms | CPU Time: 255.378ms
Count: # calls (incl. recursion) | CPU Time: sum of self-time (excludes children)
Avg: avg self-time | Max: max self-time | P95: 95th percentile self-time

  • Line 1: Live performance metrics for this execution (Parallelism, Elapsed time, Total CPU time)
  • Line 2: Explanation of Count and CPU Time columns
  • Line 3: Explanation of statistical timing columns (Avg, Max, P95)

Non-Interactive Mode (CI/CD, scripts, or redirected output):

Performance heatmap output

=== Atmos Performance Summary ===
Elapsed: 54.16ms | CPU Time: 37.21ms | Parallelism: ~0.7x
Functions: 42 | Total Calls: 5980

Function                               Count  CPU Time  Avg      Max      P95
exec.ProcessYAMLConfigFileWithContext  52     7.488ms   144µs    760µs    662µs
exec.Execute                           1      6.233ms   6.233ms  6.233ms  6.233ms
exec.ValidateStacks                    1      5.499ms   5.499ms  5.499ms  5.499ms
merge.MergeWithOptions                 746    4.561ms   6µs      654µs    21µs
utils.processCustomTags                1024   4.27ms    4µs      146µs    15µs
utils.GetHighlightedYAML               1      3.86ms    3.86ms   3.86ms   3.86ms
merge.MergeWithContext                 356    3.123ms   8µs      655µs    29µs
utils.ConvertToYAML                    177    2.5ms     14µs     309µs    56µs
exec.ProcessStackConfig                12     1.926ms   160µs    367µs    186µs
utils.GetGlobMatches                   48     1.859ms   38µs     418µs    226µs
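
For scripted analysis, the rows of the non-interactive table can be turned into structured records. The following is a minimal Python sketch, assuming the six-column layout shown above and durations printed with µs, ms, or s suffixes. The helper names to_ms and parse_row are illustrative and are not part of Atmos.

```python
import re

# Multipliers to milliseconds. Both code points that render as the micro sign
# are accepted. Assumption: only these suffixes appear in the summary output.
UNIT_TO_MS = {"\u00b5s": 0.001, "\u03bcs": 0.001, "ms": 1.0, "s": 1000.0}


def to_ms(text):
    """Convert a printed duration such as '144µs' or '7.488ms' to milliseconds."""
    match = re.fullmatch(r"([0-9.]+)(\D+)", text)
    if match is None or match.group(2) not in UNIT_TO_MS:
        raise ValueError(f"unrecognized duration: {text!r}")
    return float(match.group(1)) * UNIT_TO_MS[match.group(2)]


def parse_row(line):
    """Split one table row into a record keyed by the column names."""
    name, count, cpu, avg, worst, p95 = line.rsplit(maxsplit=5)
    return {
        "function": name,
        "count": int(count),
        "cpu_ms": to_ms(cpu),
        "avg_ms": to_ms(avg),
        "max_ms": to_ms(worst),
        "p95_ms": to_ms(p95),
    }


row = parse_row("exec.ProcessYAMLConfigFileWithContext 52 7.488ms 144µs 760µs 662µs")
print(row["function"], row["count"], row["cpu_ms"])
```

Normalizing every value to one unit makes rows directly comparable, which matters because the table mixes microseconds and milliseconds.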

Visualization Modes

The heatmap supports three visualization modes. In interactive mode, press 1-3 to switch:

1. Bar Chart (Default)

Horizontal bars with color gradient showing relative total CPU time:

atmos describe stacks --heatmap --heatmap-mode=bar

  • Bar length: Proportional to total CPU time (overall performance impact)
  • Display format: Shows avg: Xms | calls: N for each function
  • Color gradient: Red (highest impact) → Green (lowest impact)
  • Best for: Quick identification of both slow functions and high-impact functions
  • Example: utils.processCustomTags ████████ avg: 0.37ms | calls: 447586

2. Sparkline Mode

Compact sparkline charts showing relative average execution times:

atmos describe stacks --heatmap --heatmap-mode=sparkline

  • Sparkline height: Proportional to average self-time per call
  • Display format: Shows avg: Xms | calls: N for each function
  • Compact view: See more functions in less vertical space
  • Best for: Quick pattern recognition and comparing many functions at once
  • Example: utils.processCustomTags ▇▇▇▇▇▇▇▇▇▇ avg: 0.37ms | calls: 447586

3. Table Mode

Detailed tabular view with all metric columns (top 50 rows):

atmos describe stacks --heatmap --heatmap-mode=table

  • All metric columns (Count/CPU Time/Avg/Max/P95)
  • Shows top 50 functions by CPU time
  • Best for detailed analysis

Understanding Metrics

Performance Summary:

=== Atmos Performance Summary ===
Elapsed: 234.56ms | CPU Time: 185.32ms | Parallelism: ~0.8x
Functions: 12 | Total Calls: 156

  • Elapsed: Total wall-clock execution time for the command
  • CPU Time: Sum of all self-times (actual CPU work done, excludes time spent in child functions)
  • Parallelism: CPU Time ÷ Elapsed ratio (above 1.0 = work ran in parallel across cores; at or below 1.0 = effectively single-threaded, with any shortfall being untracked time or overhead)
  • Functions: Number of unique functions tracked
  • Total Calls: Total number of function calls tracked
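
As a worked example of the Parallelism figure, the sketch below divides CPU Time by Elapsed for the summary values printed on this page. The function name is illustrative and is not something Atmos exposes.

```python
def parallelism(cpu_time_ms, elapsed_ms):
    """Return the CPU Time / Elapsed ratio shown in the performance summary."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed time must be positive")
    return cpu_time_ms / elapsed_ms


# Values copied from the three summaries shown on this page.
for cpu, elapsed in [(185.32, 234.56), (255.378, 279.002), (37.21, 54.16)]:
    print(f"~{parallelism(cpu, elapsed):.1f}x")
```

All three runs land below 1.0x, which is consistent with mostly sequential execution plus some time that is not attributed to any tracked function.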

Function Metrics:

  • Function: Name of the tracked function
  • Count: Number of times the function was called (includes all recursive calls)
  • CPU Time: Sum of self-time across all calls - total CPU work done by the function itself, excluding time spent in child function calls. This avoids double-counting in nested/recursive calls and accurately represents the total CPU time spent executing the function's own code.
  • Avg: Average self-time per call - actual work done in the function, excluding time spent in child function calls (CPU Time ÷ Count). This metric accurately represents the function's own execution time and is the best indicator for optimization opportunities.
  • Max: Maximum self-time for a single function call among ALL executions (excludes time spent in children). Critical for identifying performance outliers and worst-case scenarios. If a function is called 100 times with most calls taking ~10ms but one call took 500ms, Max will show 500ms - revealing intermittent issues like excessive work, algorithmic edge cases, or system resource contention affecting that specific call.
  • P95: 95th percentile of self-time latency - the self-time below which 95% of function calls complete. A consistent indicator of the function's own work performance, excluding time spent in child functions.

Understanding CPU Time and Self-Time Metrics

All timing columns use Self-Time: CPU Time, Avg, Max, and P95 all exclude time spent in child function calls, showing only the actual work done in the function itself. This avoids double-counting in nested/recursive calls.

  • CPU Time: Sum of self-time across all calls - represents total CPU work done by the function
  • Avg: Average self-time per call (CPU Time ÷ Count)
  • Max: Maximum self-time for any single call
  • P95: 95th percentile of self-time across all calls

Example: If ProcessConfig is called 10 times and each call does 20ms of its own work (excluding 80ms spent in child functions):

  • CPU Time: 200ms (10 calls × 20ms per call)
  • Avg: 20ms (200ms ÷ 10 calls)
  • Wall-Clock Time: Would be ~1000ms (10 × 100ms including children) - but this would double-count child execution time

Why CPU Time vs Wall-Clock? When functions call each other, summing wall-clock times counts the same work multiple times. CPU Time (sum of self-times) accurately represents total work without double-counting, making it possible to meaningfully compare the sum of all CPU Times to the command's elapsed time.
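
The double-counting argument can be checked with a small model of nested calls. This is a sketch of the idea only, not how Atmos records timings; the Call class and the child function name LoadFile are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Call:
    """One recorded call: its own work plus any nested child calls."""
    name: str
    self_ms: float
    children: list = field(default_factory=list)

    def inclusive_ms(self):
        """Wall-clock duration of this call, including all descendants."""
        return self.self_ms + sum(c.inclusive_ms() for c in self.children)


def walk(call):
    """Yield a call and every call nested beneath it."""
    yield call
    for child in call.children:
        yield from walk(child)


def totals(roots):
    """Aggregate self-time and inclusive time per function name."""
    self_total = defaultdict(float)
    inclusive_total = defaultdict(float)
    for root in roots:
        for call in walk(root):
            self_total[call.name] += call.self_ms
            inclusive_total[call.name] += call.inclusive_ms()
    return dict(self_total), dict(inclusive_total)


# Ten ProcessConfig calls: 20ms of own work each, 80ms in a hypothetical child.
roots = [Call("ProcessConfig", 20.0, [Call("LoadFile", 80.0)]) for _ in range(10)]
self_total, inclusive_total = totals(roots)
print(self_total, sum(self_total.values()))
print(inclusive_total, sum(inclusive_total.values()))
```

Summing self-times gives 1000ms, the work actually performed. Summing inclusive times gives 1800ms, because the child's 800ms is counted once under its own name and again under ProcessConfig.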

Parallelism Factor: The ratio of CPU Time to Elapsed time shows execution characteristics:

  • ~0.8x: Single-threaded with some overhead
  • ~1.0x: Perfect single-threaded execution
  • Greater than 1.0x: Parallel execution across multiple cores (e.g., 4.0x = ~4 cores utilized)

Detecting Performance Outliers with Max

Max is your outlier detector! It shows the slowest single execution among all calls to a function (excluding time spent in children).

Example - Spotting Intermittent Issues:

Function             Count  CPU Time  Avg   Max    P95
network.FetchRemote  100    1.2s      12ms  850ms  15ms

Analysis:

  • Count: 100 calls total
  • CPU Time: 1.2s total (sum of all self-times)
  • Avg: 12ms average (typical performance)
  • Max: 850ms (one call's own work took 71x longer!)
  • P95: 15ms (95% of calls complete within 15ms)

Diagnosis: The large gap between Max (850ms) and P95 (15ms) reveals an intermittent performance issue. Because 95% of calls finish within 15ms, the slowdown affects at most 5% of calls, and possibly only one. Since Max tracks self-time (excluding children), this indicates the function's own work was slow. Likely causes:

  • Network timeout on one request (if function makes network calls)
  • Disk I/O spike during that specific call
  • CPU contention or system resource starvation
  • GC pause occurred during execution
  • Algorithmic edge case (e.g., processing unusually large input)

Action: Investigate why that specific call was 71x slower than average. Use --profile-file with pprof for deeper analysis.
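
The Max versus P95 comparison can be sketched in a few lines of Python. This example uses synthetic self-times and the nearest-rank percentile method; both the percentile method and the 10x threshold are assumptions made for illustration, since this page does not state how Atmos computes P95.

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the value at the given percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(self_times_ms):
    """Compute Count, CPU Time, Avg, Max, and P95 for one function's self-times."""
    total = sum(self_times_ms)
    return {
        "count": len(self_times_ms),
        "cpu_time_ms": total,
        "avg_ms": total / len(self_times_ms),
        "max_ms": max(self_times_ms),
        "p95_ms": percentile_nearest_rank(self_times_ms, 95),
    }


def has_outlier(stats, ratio=10.0):
    """Flag a function whose slowest call far exceeds its 95th percentile."""
    return stats["max_ms"] > ratio * stats["p95_ms"]


# Synthetic data: 99 ordinary calls plus a single slow one.
samples = [10.0] * 94 + [15.0] * 5 + [850.0]
stats = summarize(samples)
print(stats, has_outlier(stats))
```

A single slow call barely moves P95 but sets Max, so the ratio between the two is a cheap signal that a function has rare slow executions rather than uniformly slow ones.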

Interactive Controls

Keyboard Shortcuts:

  • ↑/↓: Move selection up/down (also k/j)
  • 1: Switch to Bar Chart mode
  • 2: Switch to Sparkline mode
  • 3: Switch to Table mode
  • q/esc: Quit and return to terminal

CLI Flags

  • --heatmap: Show performance heatmap visualization after command execution (includes P95 latency) (default: false)
  • --heatmap-mode: Heatmap visualization mode: bar, sparkline, table (press 1-3 to switch in TUI) (default: bar)

Common Use Cases

Identifying Performance Bottlenecks:

# Track which functions consume the most time
atmos terraform plan large-component -s prod --heatmap

# Focus on functions with high CPU Time

Before/After Comparison:

# Baseline measurement
atmos describe stacks --heatmap 2>baseline.txt

# After optimization
atmos describe stacks --heatmap 2>optimized.txt

# Compare results
diff baseline.txt optimized.txt
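
Because timings vary slightly from run to run, a plain diff tends to flag every line. One alternative is to compare per-function CPU Time with a tolerance. The sketch below assumes the two summaries have already been parsed into dictionaries of milliseconds; the function names, sample values, and the 10% tolerance are illustrative.

```python
def percent_change(baseline, optimized):
    """Return per-function change in CPU time as a percentage of the baseline."""
    changes = {}
    for name, before in baseline.items():
        after = optimized.get(name)
        if after is None or before == 0:
            continue  # missing from the second run, or nothing to compare against
        changes[name] = (after - before) / before * 100
    return changes


def regressions(changes, tolerance_pct=10.0):
    """List functions that got slower by more than the tolerance."""
    return sorted(name for name, pct in changes.items() if pct > tolerance_pct)


# Hypothetical CPU Time values in milliseconds for two runs.
baseline = {"merge.MergeWithOptions": 4.0, "utils.ConvertToYAML": 2.5, "exec.Execute": 6.0}
optimized = {"merge.MergeWithOptions": 3.0, "utils.ConvertToYAML": 2.5, "exec.Execute": 9.0}
changes = percent_change(baseline, optimized)
print(changes)
print(regressions(changes))
```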

CI/CD Integration:

# Capture performance in automated pipelines
atmos terraform plan vpc -s prod --heatmap 2>&1 | tee performance.log

# Parse for regression detection

Development Workflow:

# Create an alias for convenient usage
alias atmos-perf='atmos --heatmap'

# Use during development
atmos-perf validate stacks
atmos-perf describe component vpc -s dev

pprof Profiling

Overview

pprof is Go's standard profiling tool that captures detailed runtime performance data.

Profile Types:

  • CPU Profile: Where your program spends CPU time
  • Heap Profile: Current heap memory allocation patterns
  • Allocs Profile: All memory allocations since program start
  • Goroutine Profile: Active goroutines and call stacks
  • Block Profile: Operations blocking on synchronization
  • Mutex Profile: Lock contention patterns
  • Thread Create Profile: Stack traces leading to thread creation
  • Trace Profile: Detailed execution traces for analysis

File-Based Profiling

Capture profiles directly to a file - ideal for CLI tools:

# CPU profiling (default)
atmos terraform plan vpc -s plat-ue2-dev --profile-file=cpu.prof

# Memory heap profiling
atmos terraform plan vpc -s plat-ue2-dev --profile-file=heap.prof --profile-type=heap

# Execution trace profiling
atmos terraform plan vpc -s plat-ue2-dev --profile-file=trace.out --profile-type=trace

# Goroutine profiling
atmos terraform plan vpc -s plat-ue2-dev --profile-file=goroutine.prof --profile-type=goroutine

File-based profiling output

INFO Profiling started type=cpu file=cpu.prof
INFO Profiling completed type=cpu file=cpu.prof

Server-Based Profiling

Start an HTTP server for interactive profiling:

atmos terraform plan vpc -s plat-ue2-dev --profiler-enabled

Server-based profiling output

INFO Profiler server available at: url=http://localhost:6060/debug/pprof/

Access different profiles through HTTP endpoints:

# CPU profile (30-second sample)
go tool pprof http://localhost:6060/debug/pprof/profile

# Memory allocation profile
go tool pprof http://localhost:6060/debug/pprof/heap

# Goroutine profile
go tool pprof http://localhost:6060/debug/pprof/goroutine

# Web interface
open http://localhost:6060/debug/pprof/
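
The same endpoints can be fetched from a script while the Atmos command is still running. The sketch below assumes the server exposes the standard Go net/http/pprof handlers, as the URLs above suggest, including the seconds query parameter those handlers accept on the CPU profile endpoint. The helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def profile_url(name, host="localhost", port=6060, seconds=None):
    """Build the URL for a named pprof endpoint such as 'heap' or 'profile'."""
    url = f"http://{host}:{port}/debug/pprof/{name}"
    if seconds is not None:
        url += "?" + urlencode({"seconds": seconds})
    return url


def save_profile(name, path, **kwargs):
    """Download a profile to a file; requires the profiler server to be up."""
    with urlopen(profile_url(name, **kwargs)) as response, open(path, "wb") as out:
        out.write(response.read())


print(profile_url("heap"))
print(profile_url("profile", seconds=10))
```

A file saved this way can then be opened with go tool pprof in the same way as one written by --profile-file.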

Configuration

CLI Flags:

  • --profile-file: Write profiling data to file instead of starting server
  • --profile-type: Type of profile to collect when using --profile-file. Options: cpu, heap, allocs, goroutine, block, mutex, threadcreate, trace (default: cpu)
  • --profiler-enabled: Enable pprof profiling server (default: false)
  • --profiler-host: Host for pprof profiling server (default: localhost)
  • --profiler-port: Port for pprof profiling server (default: 6060)

Environment Variables:

  • ATMOS_PROFILER_ENABLED: Enable pprof profiling server
  • ATMOS_PROFILER_HOST: Host address for profiling server
  • ATMOS_PROFILER_PORT: Port for profiling server
  • ATMOS_PROFILE_FILE: File path for file-based profiling
  • ATMOS_PROFILE_TYPE: Profile type for file-based profiling

Configuration File:

atmos.yaml

profiler:
  enabled: true
  host: "localhost"
  port: 6060
  file: "profile.out" # Optional: file-based profiling
  profile_type: "cpu" # Optional: profile type

Configuration Precedence:

  1. Command-line flags (highest priority)
  2. Environment variables
  3. Configuration file (atmos.yaml)
  4. Default values (lowest priority)
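
The precedence order can be pictured as a layered lookup in which the first source that defines a setting wins. This is only a model of the rule above, not Atmos's implementation; the dictionary keys are illustrative and the defaults are the ones listed under CLI Flags.

```python
from collections import ChainMap

# Defaults as documented for the CLI flags.
DEFAULTS = {"enabled": False, "host": "localhost", "port": 6060, "profile_type": "cpu"}


def resolve(flags, env, config):
    """Merge settings so that flags beat env vars, which beat the config file."""
    return dict(ChainMap(flags, env, config, DEFAULTS))


settings = resolve(
    flags={"port": 7070},
    env={"port": 8060, "host": "127.0.0.1"},
    config={"host": "0.0.0.0", "enabled": True},
)
print(settings)
```

Here the port comes from the flag, the host from the environment, enabled from the configuration file, and the profile type from the defaults.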

Analyzing Profiles

CPU and Memory Profiles:

# Interactive text mode
go tool pprof cpu.prof
go tool pprof heap.prof

# Web interface (requires Graphviz: brew install graphviz)
go tool pprof -http=:8080 cpu.prof
go tool pprof -http=:8080 heap.prof

# Direct text output
go tool pprof -top cpu.prof
go tool pprof -top heap.prof

Trace Profiles:

# Use go tool trace for execution traces
go tool trace trace.out

# Opens web interface showing:
# - Timeline view of goroutines
# - Network blocking profile
# - Synchronization blocking profile
# - System call blocking profile

Sample pprof output

(pprof) top
Showing nodes accounting for 230ms, 95.83% of 240ms total
Dropped 15 nodes (cum <= 1.20ms)
      flat  flat%   sum%        cum   cum%
      80ms 33.33% 33.33%       80ms 33.33%  github.com/cloudposse/atmos/internal/exec.processStackConfig
      60ms 25.00% 58.33%       60ms 25.00%  gopkg.in/yaml.v3.(*Decoder).Decode
      40ms 16.67% 75.00%       40ms 16.67%  github.com/cloudposse/atmos/pkg/utils.ProcessTmplWithDatasources
      30ms 12.50% 87.50%       30ms 12.50%  encoding/json.(*Decoder).Decode
      20ms  8.33% 95.83%       20ms  8.33%  github.com/cloudposse/atmos/pkg/stack.ProcessStackConfig

Understanding pprof Data:

  • flat: Time spent in the function itself
  • cum: Cumulative time (function + callees)
  • flat%: flat as a percentage of the profile's total sampled time
  • sum%: Running total of flat% for this row and every row above it

Focus optimization on functions with high flat time and flat%.
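
To make the relationship between the columns concrete, the sketch below recomputes flat% and sum% from the flat values and the 240ms total in the sample output. It illustrates the arithmetic only and is not part of pprof; the function name is hypothetical.

```python
def flat_table(flat_ms, total_ms):
    """Return (name, flat%, sum%) rows ordered by flat time, largest first."""
    rows = []
    running = 0.0
    for name, flat in sorted(flat_ms.items(), key=lambda item: item[1], reverse=True):
        running += flat
        rows.append((name, round(flat / total_ms * 100, 2), round(running / total_ms * 100, 2)))
    return rows


# Flat times taken from the sample output above (package paths shortened).
sample = {
    "exec.processStackConfig": 80,
    "yaml.v3.(*Decoder).Decode": 60,
    "utils.ProcessTmplWithDatasources": 40,
    "json.(*Decoder).Decode": 30,
    "stack.ProcessStackConfig": 20,
}
for name, flat_pct, sum_pct in flat_table(sample, 240):
    print(f"{flat_pct:6.2f}% {sum_pct:6.2f}%  {name}")
```

The final sum% of 95.83% matches the header line of the sample, which reports that the listed nodes account for 230ms of the 240ms total.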

Common Scenarios

Performance Optimization:

# Profile CPU usage in slow operations
atmos terraform plan large-component -s prod --profile-file=slow-plan.prof --profile-type=cpu

# Profile memory usage
atmos terraform plan large-component -s prod --profile-file=memory.prof --profile-type=heap

# Profile detailed execution
atmos terraform plan large-component -s prod --profile-file=trace.out --profile-type=trace

# Analyze results
go tool pprof -http=:8080 slow-plan.prof
go tool pprof -http=:8080 memory.prof
go tool trace trace.out

Memory Analysis:

# File-based memory profiling
atmos describe stacks --profile-file=heap.prof --profile-type=heap
atmos describe stacks --profile-file=allocs.prof --profile-type=allocs

# Analyze
go tool pprof -http=:8080 heap.prof
go tool pprof -http=:8080 allocs.prof

# Server-based memory profiling
atmos describe stacks --profiler-enabled

# In another terminal
go tool pprof http://localhost:6060/debug/pprof/heap
go tool pprof http://localhost:6060/debug/pprof/allocs

Custom Server Configuration:

atmos terraform apply vpc -s prod \
  --profiler-enabled \
  --profiler-host=0.0.0.0 \
  --profiler-port=8060

Choosing the Right Tool

Feature         Performance Heatmap             pprof Profiling
Overhead        Minimal (microsecond tracking)  Higher (full runtime instrumentation)
Granularity     Function-level metrics          Line-level profiling
Setup           CLI flag only (--heatmap)       Requires flags or config
Output          Real-time visualization         File or HTTP endpoint
Use Case        Quick performance checks        Deep performance analysis
Visualization   Built-in interactive UI         Requires go tool pprof
Data Retention  In-memory (command lifetime)    Persistent files
Profile Types   Function timing only            CPU, memory, goroutines, blocking, etc.

Use Performance Heatmap when:

  • Validating performance during development
  • Identifying which functions are called most
  • Comparing execution times between commands
  • Need quick insights without external tools
  • Monitoring lightweight operations

Use pprof Profiling when:

  • Deep diving into performance bottlenecks
  • Analyzing CPU and memory usage patterns
  • Investigating memory leaks
  • Profiling goroutines and lock contention
  • Need flame graphs and detailed visualizations
  • Historical comparison across runs

Use Both Together:

# Combine for comprehensive analysis
atmos terraform plan vpc -s prod \
  --heatmap \
  --profile-file=cpu.prof \
  --profile-type=cpu

# View quick heatmap summary
# Then deep dive with pprof
go tool pprof -http=:8080 cpu.prof

Dependencies

Graphviz (Optional for pprof)

The pprof web interface requires Graphviz for visual graphs:

macOS:

brew install graphviz

Ubuntu/Debian:

sudo apt-get install graphviz

CentOS/RHEL:

sudo yum install graphviz

Without Graphviz, text-based pprof analysis still works.

Best Practices

File-Based vs Server-Based (pprof)

  • Use file-based profiling for most CLI operations and performance analysis
  • Use server-based profiling for long-running operations or interactive profiling
  • File-based profiling supports all profile types: cpu, heap, allocs, goroutine, block, mutex, threadcreate, trace

Performance Heatmap Usage

  • Enable only when needed: Zero overhead when not using --heatmap flag
  • Use appropriate modes: bar for quick comparison, table for detailed analysis
  • Combine with logging: Correlate performance with debug logs
  • Automate in CI/CD: Track performance regressions in pipelines

Profile Regularly

  • Profile before and after optimizations
  • Establish baseline profiles for typical operations
  • Profile different stack sizes and complexity levels
  • Compare heatmap metrics for quick validation, pprof for deep analysis

Security Considerations

  • Server-based profiling exposes runtime information through HTTP
  • Use localhost binding in production environments
  • Disable profiling in production unless actively debugging
  • Performance heatmap is safe (in-memory only, no network exposure)
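
One way to act on the localhost recommendation is to check the configured host before enabling the server, for example in a wrapper script. This is a small sketch with a hypothetical helper name; it treats any hostname other than localhost as non-loopback rather than resolving it.

```python
import ipaddress


def is_loopback(host):
    """Return True when the profiler host is reachable only from this machine."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # unresolved hostname: assume it may be reachable externally


for host in ("localhost", "127.0.0.1", "::1", "0.0.0.0"):
    print(host, is_loopback(host))
```

Binding to 0.0.0.0 listens on every interface, so the check reports it as not loopback.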

Troubleshooting

Performance Heatmap Issues

Heatmap Not Showing:

# Ensure you're using the latest Atmos version
atmos version --heatmap

No Functions Tracked:

  • Performance tracking is added incrementally
  • Run commands that process stacks/components for more data:

atmos describe stacks --heatmap
atmos terraform plan component -s stack --heatmap

P95 Shows Zero:

  • P95 requires multiple calls to be meaningful
  • Run commands that execute functions multiple times

Interactive Mode Not Working:

  • Requires TTY (terminal)
  • In scripts/CI/CD, static summary is displayed instead:

⚠️  No TTY available for interactive visualization. Summary displayed above.

pprof Issues

Profile File Creation Errors:

# Ensure directory exists
mkdir -p /path/to/profile/
atmos command --profile-file=/path/to/profile/cpu.prof --profile-type=cpu

Invalid Profile Type:

# Check supported types
echo "Supported: cpu, heap, allocs, goroutine, block, mutex, threadcreate, trace"
atmos command --profile-file=profile.out --profile-type=heap

Graphviz Not Found:

# Use text-based analysis instead
go tool pprof -top cpu.prof
go tool pprof -list=functionName cpu.prof

Server Already Running:

# Use different port
atmos command --profiler-enabled --profiler-port=7070

Examples

Development Workflow

# Quick performance check with heatmap
atmos describe stacks --heatmap

# Deep analysis with pprof if issues found
atmos describe stacks --profile-file=cpu.prof
go tool pprof -http=:8080 cpu.prof

CI/CD Integration

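#!/bin/bash
# ci-performance-check.sh

# Run with heatmap for quick metrics
atmos validate stacks --heatmap 2>&1 | tee perf-output.txt

# Extract the elapsed value with its unit (e.g. "54.16ms") from the summary line
elapsed=$(grep -m1 "Elapsed:" perf-output.txt | sed -E 's/.*Elapsed: ([^ ]+).*/\1/')

# Normalize to milliseconds; assumes the value is printed in µs, ms, or s
case "$elapsed" in
  *ms) elapsed_ms=${elapsed%ms} ;;
  *µs) elapsed_ms=$(echo "${elapsed%µs} / 1000" | bc -l) ;;
  *s)  elapsed_ms=$(echo "${elapsed%s} * 1000" | bc -l) ;;
  *)   echo "Could not parse elapsed time: '${elapsed}'" >&2; exit 2 ;;
esac

# Check for regression (e.g., > 500ms)
if [ "$(echo "$elapsed_ms > 500" | bc -l)" -eq 1 ]; then
  echo "Performance regression: ${elapsed} > 500ms"
  # Deep profile for investigation
  atmos validate stacks --profile-file=cpu.prof --profile-type=cpu
  exit 1
fi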

Combined Analysis

# Use both tools for comprehensive analysis
atmos terraform plan large-stack -s prod \
  --heatmap \
  --heatmap-mode=bar \
  --profile-file=analysis.prof \
  --profile-type=cpu \
  2>&1 | tee combined-analysis.log

# View heatmap summary immediately
# Then analyze with pprof
go tool pprof -http=:8080 analysis.prof