Performance Profiling

Atmos provides comprehensive performance profiling capabilities through two complementary approaches: a lightweight interactive heatmap for quick performance insights, and deep pprof integration for detailed runtime analysis.

Profiling Approaches

Atmos offers two methods for performance analysis, each suited to different use cases:

Performance Heatmap (Quick Analysis)

A lightweight, built-in visualization tool that provides real-time performance metrics with minimal overhead:

Function-level metrics: Call counts, execution times, and percentile statistics
Interactive visualization: Multiple display modes (bar charts, sparklines, tables)
Zero setup: Single CLI flag (--heatmap) enables tracking
Microsecond precision: Captures fast function executions
Best for: Quick performance checks, development workflow, identifying hot paths

pprof Profiling (Deep Analysis)

Go's comprehensive profiling toolkit for detailed runtime performance analysis:

Multiple profile types: CPU, memory (heap/allocs), goroutines, blocking, mutex contention
Line-level profiling: Pinpoint exact code locations
Flame graphs: Visual call stack analysis
Persistent data: Export profiles for historical comparison
Best for: Deep performance investigation, memory leak detection, production debugging

Performance Heatmap

Quick Start

Display the performance heatmap after any Atmos command:

# Run any Atmos command with the --heatmap flag
atmos describe stacks --heatmap

This launches an interactive TUI (Terminal User Interface) where you can switch between visualization modes by pressing 1-3.

Real-World Example

Here's actual output from atmos describe stacks --heatmap:

Interactive Mode (when running in a terminal with TTY):

[Image: Performance Heatmap - Bar Chart]

The interactive TUI shows the top 25 functions with color-coded bars representing total CPU time. Each function displays its average execution time per call alongside the total number of calls, making it easy to identify both slow functions and high-frequency functions at a glance.

Example Interactive Display:

🔥 Performance Heatmap - Bar Chart

utils.processCustomTags           ████████████████████████████████████████ avg: 0.37ms | calls: 447586
exec.ProcessYAMLConfigFile        ████████████████████████████████ avg: 686µs | calls: 199
utils.UnmarshalYAMLFromFile       █████████████████ avg: 82µs | calls: 53429

Bar length: Proportional to total CPU time (shows overall impact on performance)
avg: Average self-time per call (shows typical function performance)
calls: Number of times the function was invoked

Interactive TUI Legend:

The TUI displays a comprehensive legend at the top showing performance summary and metric descriptions:

Parallelism: ~0.9x | Elapsed: 279.002ms | CPU Time: 255.378ms
Count: # calls (incl. recursion) | CPU Time: sum of self-time (excludes children)
Avg: avg self-time | Max: max self-time | P95: 95th percentile self-time

Line 1: Live performance metrics for this execution (Parallelism, Elapsed time, Total CPU time)
Line 2: Explanation of Count and CPU Time columns
Line 3: Explanation of statistical timing columns (Avg, Max, P95)

Non-Interactive Mode (CI/CD, scripts, or redirected output):

Performance heatmap output

=== Atmos Performance Summary ===
Elapsed: 54.16ms | CPU Time: 37.21ms | Parallelism: ~0.7x
Functions: 42 | Total Calls: 5980

Function                                            Count    CPU Time        Avg        Max        P95
exec.ProcessYAMLConfigFileWithContext                  52     7.488ms      144µs      760µs      662µs
exec.Execute                                            1     6.233ms    6.233ms    6.233ms    6.233ms
exec.ValidateStacks                                     1     5.499ms    5.499ms    5.499ms    5.499ms
merge.MergeWithOptions                                746     4.561ms        6µs      654µs       21µs
utils.processCustomTags                              1024      4.27ms        4µs      146µs       15µs
utils.GetHighlightedYAML                                1      3.86ms     3.86ms     3.86ms     3.86ms
merge.MergeWithContext                                356     3.123ms        8µs      655µs       29µs
utils.ConvertToYAML                                   177       2.5ms       14µs      309µs       56µs
exec.ProcessStackConfig                                12     1.926ms      160µs      367µs      186µs
utils.GetGlobMatches                                   48     1.859ms       38µs      418µs      226µs

Visualization Modes

The heatmap supports three visualization modes. In interactive mode, press 1-3 to switch:

1. Bar Chart (Default)

Horizontal bars with color gradient showing relative total CPU time:

atmos describe stacks --heatmap --heatmap-mode=bar

Bar length: Proportional to total CPU time (overall performance impact)
Display format: Shows avg: Xms | calls: N for each function
Color gradient: Red (highest impact) → Green (lowest impact)
Best for: Quick identification of both slow functions and high-impact functions
Example: utils.processCustomTags ████████ avg: 0.37ms | calls: 447586

2. Sparkline Mode

Compact sparkline charts showing relative average execution times:

atmos describe stacks --heatmap --heatmap-mode=sparkline

Sparkline height: Proportional to average self-time per call
Display format: Shows avg: Xms | calls: N for each function
Compact view: See more functions in less vertical space
Best for: Quick pattern recognition and comparing many functions at once
Example: utils.processCustomTags ▇▇▇▇▇▇▇▇▇▇ avg: 0.37ms | calls: 447586

3. Table Mode

Detailed tabular view with all metric columns (top 50 rows):

atmos describe stacks --heatmap --heatmap-mode=table

All metric columns (Count/CPU Time/Avg/Max/P95)
Shows top 50 functions by CPU time
Best for: Detailed analysis

Understanding Metrics

Performance Summary:

=== Atmos Performance Summary ===
Elapsed: 234.56ms | CPU Time: 185.32ms | Parallelism: ~0.8x
Functions: 12 | Total Calls: 156

Elapsed: Total wall-clock execution time for the command
CPU Time: Sum of all self-times (actual CPU work done, excludes time spent in child functions)
Parallelism: Ratio of CPU Time to Elapsed time (greater than 1.0 = work ran in parallel; 1.0 or less = single-threaded, with any shortfall being overhead)
Functions: Number of unique functions tracked
Total Calls: Total number of function calls tracked

Function Metrics:

Function: Name of the tracked function
Count: Number of times the function was called (includes all recursive calls)
CPU Time: Sum of self-time across all calls - total CPU work done by the function itself, excluding time spent in child function calls. This avoids double-counting in nested/recursive calls and accurately represents the total CPU time spent executing the function's own code.
Avg: Average self-time per call (CPU Time ÷ Count) - actual work done in the function, excluding time spent in child function calls. This metric represents the function's own execution time and is a good first indicator of optimization opportunities.
Max: Maximum self-time for a single function call among ALL executions (excludes time spent in children). Critical for identifying performance outliers and worst-case scenarios. If a function is called 100 times with most calls taking ~10ms but one call took 500ms, Max will show 500ms - revealing intermittent issues like excessive work, algorithmic edge cases, or system resource contention affecting that specific call.
P95: 95th percentile of self-time latency - represents the self-time below which 95% of function calls complete. A consistent indicator of the function's own work performance, excluding time spent in child functions.

Understanding CPU Time and Self-Time Metrics

All timing columns use Self-Time: CPU Time, Avg, Max, and P95 all exclude time spent in child function calls, showing only the actual work done in the function itself. This avoids double-counting in nested/recursive calls.

CPU Time: Sum of self-time across all calls - represents total CPU work done by the function
Avg: Average self-time per call (CPU Time ÷ Count)
Max: Maximum self-time for any single call
P95: 95th percentile of self-time across all calls

Example: If ProcessConfig is called 10 times and each call does 20ms of its own work (excluding 80ms spent in child functions):

CPU Time: 200ms (10 calls × 20ms per call)
Avg: 20ms (200ms ÷ 10 calls)
Wall-Clock Time: Would be ~1000ms (10 × 100ms including children) - but this would double-count child execution time

Why CPU Time vs Wall-Clock? When functions call each other, summing wall-clock times counts the same work multiple times. CPU Time (sum of self-times) accurately represents total work without double-counting, making it possible to meaningfully compare the sum of all CPU Times to the command's elapsed time.
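The arithmetic in the ProcessConfig example can be reproduced with a small shell sketch. The record format (function name, inclusive milliseconds, milliseconds spent in children) and the helper name self_time_summary are made up for illustration; they are not an Atmos output format or API.

```shell
#!/bin/bash
# Hypothetical input, one line per call: "<function> <inclusive_ms> <children_ms>".
# Self-time for a call is its inclusive time minus the time spent in children.
self_time_summary() {
  awk '{
    calls++
    self += $2 - $3        # this call'"'"'s own work
    inclusive += $2        # includes child time, so it overlaps with the children'"'"'s own totals
  }
  END {
    printf "calls=%d cpu_time=%dms avg=%dms inclusive_sum=%dms\n",
           calls, self, self / calls, inclusive
  }'
}

# Ten calls, each 100ms inclusive with 80ms spent in children.
for i in $(seq 1 10); do echo "ProcessConfig 100 80"; done | self_time_summary
# prints: calls=10 cpu_time=200ms avg=20ms inclusive_sum=1000ms
```

The 1000ms inclusive sum overlaps with whatever the child functions report for themselves, which is why the heatmap columns are based on the 200ms self-time figure instead.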

Parallelism Factor: The ratio of CPU Time to Elapsed time shows execution characteristics:

~0.8x: Single-threaded with some overhead
~1.0x: Perfect single-threaded execution
Greater than 1.0x: Parallel execution across multiple cores (e.g., 4.0x = ~4 cores utilized)
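As a quick check, the Parallelism values shown in the summaries on this page can be recomputed from their CPU Time and Elapsed figures. The helper name parallelism is hypothetical, and rounding to one decimal place is an assumption based on how the value is displayed.

```shell
#!/bin/bash
# parallelism CPU_MS ELAPSED_MS: print CPU Time / Elapsed to one decimal place.
parallelism() {
  awk -v cpu="$1" -v elapsed="$2" 'BEGIN { printf "%.1f\n", cpu / elapsed }'
}

parallelism 255.378 279.002   # prints 0.9
parallelism 37.21 54.16       # prints 0.7
parallelism 185.32 234.56     # prints 0.8
```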

Detecting Performance Outliers with Max

Max is your outlier detector! It shows the slowest single execution among all calls to a function (excluding time spent in children).

Example - Spotting Intermittent Issues:

Function                Count    CPU Time  Avg       Max       P95
network.FetchRemote     100      1.2s      12ms      850ms     15ms

Analysis:

Count: 100 calls total
CPU Time: 1.2s total (sum of all self-times)
Avg: 12ms average (typical performance)
Max: 850ms (one call's own work took 71x longer!)
P95: 15ms (95% of calls complete within 15ms)

Diagnosis: The large gap between Max (850ms) and P95 (15ms) reveals an intermittent performance issue confined to the slowest 5% of calls, possibly a single call. Since Max tracks self-time (excluding children), this indicates the function's own work was slow. Likely causes:

Network timeout on one request (if function makes network calls)
Disk I/O spike during that specific call
CPU contention or system resource starvation
GC pause occurred during execution
Algorithmic edge case (e.g., processing unusually large input)

Action: Investigate why that specific call was 71x slower than average. Use --profile-file with pprof for deeper analysis.
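To see how one outlier produces this pattern, the sketch below builds a hypothetical set of 100 self-times that matches the network.FetchRemote row and recomputes the columns. The sample values and the helper names are invented for illustration, and the nearest-rank percentile method is an assumption; this page does not state how Atmos calculates P95.

```shell
#!/bin/bash
# Hypothetical self-times in ms for 100 calls, chosen to match the row above:
# 7 calls at 2ms, 87 at 3ms, 5 at 15ms, and one 850ms outlier (sum = 1200ms).
samples() {
  for i in $(seq 1 7);  do echo 2;  done
  for i in $(seq 1 87); do echo 3;  done
  for i in $(seq 1 5);  do echo 15; done
  echo 850
}

# Read one self-time per line; report Count, CPU Time, Avg, Max, and nearest-rank P95.
latency_stats() {
  sort -n | awk '{ v[NR] = $1; sum += $1 }
    END {
      rank = int((95 * NR + 99) / 100)    # ceil(0.95 * NR) using integer arithmetic
      printf "count=%d cpu_time=%dms avg=%dms max=%dms p95=%dms\n",
             NR, sum, sum / NR, v[NR], v[rank]
    }'
}

samples | latency_stats
# prints: count=100 cpu_time=1200ms avg=12ms max=850ms p95=15ms
```

In this data set most calls take 3ms; the 12ms average is pulled up by the single 850ms call, which is why Max and P95 are worth reading together with Avg.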

Interactive Controls

Keyboard Shortcuts:

↑/↓: Move selection up/down (also k/j)
1: Switch to Bar Chart mode
2: Switch to Sparkline mode
3: Switch to Table mode
q/esc: Quit and return to terminal

CLI Flags

--heatmap: Show performance heatmap visualization after command execution (includes P95 latency) (default: false)
--heatmap-mode: Heatmap visualization mode: bar, sparkline, table (press 1-3 to switch in TUI) (default: bar)

Common Use Cases

Identifying Performance Bottlenecks:

# Track which functions consume the most time
atmos terraform plan large-component -s prod --heatmap

# Focus on functions with high CPU Time

Before/After Comparison:

# Baseline measurement
atmos describe stacks --heatmap 2>baseline.txt

# After optimization
atmos describe stacks --heatmap 2>optimized.txt

# Compare results
diff baseline.txt optimized.txt
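Because timings vary from run to run, a line-based diff will usually flag every row. A per-function comparison can be easier to read. The sketch below is one possible approach, assuming both files contain the non-interactive table shown earlier (Function, Count, CPU Time, Avg, Max, P95) and that durations use the µs, ms, and s suffixes seen in the sample output. The helper name compare_cpu_time is hypothetical.

```shell
#!/bin/bash
# compare_cpu_time BASELINE OPTIMIZED
# Print "<function> <change in CPU Time, ms>" for functions present in both files.
compare_cpu_time() {
  awk '
    # Convert 144µs, 7.488ms, or 1.2s to milliseconds; return -1 for other units.
    function to_ms(d,   num, unit) {
      num = d
      sub(/[^0-9.]+$/, "", num)            # keep the numeric part
      unit = substr(d, length(num) + 1)    # trailing unit
      if (unit == "s")  return num * 1000
      if (unit == "ms") return num + 0
      if (unit == "µs") return num / 1000
      return -1
    }
    # Data rows have six fields and a numeric Count; headers and summaries are skipped.
    NF == 6 && $2 ~ /^[0-9]+$/ {
      ms = to_ms($3)
      if (ms < 0) next
      if (FNR == NR) base[$1] = ms
      else if ($1 in base) printf "%s %+.3f\n", $1, ms - base[$1]
    }
  ' "$1" "$2"
}

# Usage: compare_cpu_time baseline.txt optimized.txt
```

Negative values mean the function used less CPU Time in the second capture.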

CI/CD Integration:

# Capture performance in automated pipelines
atmos terraform plan vpc -s prod --heatmap 2>&1 | tee performance.log

# Parse for regression detection

Development Workflow:

# Create an alias for convenient usage
alias atmos-perf='atmos --heatmap'

# Use during development
atmos-perf validate stacks
atmos-perf describe component vpc -s dev

pprof Profiling

Overview

pprof is Go's standard profiling tool that captures detailed runtime performance data.

Profile Types:

CPU Profile: Where your program spends CPU time
Heap Profile: Current heap memory allocation patterns
Allocs Profile: All memory allocations since program start
Goroutine Profile: Active goroutines and call stacks
Block Profile: Operations blocking on synchronization
Mutex Profile: Lock contention patterns
Thread Create Profile: Stack traces leading to thread creation
Trace Profile: Detailed execution traces for analysis

File-Based Profiling

Capture profiles directly to a file - ideal for CLI tools:

# CPU profiling (default)
atmos terraform plan vpc -s plat-ue2-dev --profile-file=cpu.prof

# Memory heap profiling
atmos terraform plan vpc -s plat-ue2-dev --profile-file=heap.prof --profile-type=heap

# Execution trace profiling
atmos terraform plan vpc -s plat-ue2-dev --profile-file=trace.out --profile-type=trace

# Goroutine profiling
atmos terraform plan vpc -s plat-ue2-dev --profile-file=goroutine.prof --profile-type=goroutine

File-based profiling output

INFO Profiling started type=cpu file=cpu.prof
INFO Profiling completed type=cpu file=cpu.prof

Server-Based Profiling

Start an HTTP server for interactive profiling:

atmos terraform plan vpc -s plat-ue2-dev --profiler-enabled

Server-based profiling output

INFO Profiler server available at: url=http://localhost:6060/debug/pprof/

Access different profiles through HTTP endpoints:

# CPU profile (30-second sample)
go tool pprof http://localhost:6060/debug/pprof/profile

# Memory allocation profile
go tool pprof http://localhost:6060/debug/pprof/heap

# Goroutine profile
go tool pprof http://localhost:6060/debug/pprof/goroutine

# Web interface
open http://localhost:6060/debug/pprof/

Configuration

CLI Flags:

--profile-file: Write profiling data to file instead of starting server
--profile-type: Type of profile to collect when using --profile-file. Options: cpu, heap, allocs, goroutine, block, mutex, threadcreate, trace (default: cpu)
--profiler-enabled: Enable pprof profiling server (default: false)
--profiler-host: Host for pprof profiling server (default: localhost)
--profiler-port: Port for pprof profiling server (default: 6060)

Environment Variables:

ATMOS_PROFILER_ENABLED: Enable pprof profiling server
ATMOS_PROFILER_HOST: Host address for profiling server
ATMOS_PROFILER_PORT: Port for profiling server
ATMOS_PROFILE_FILE: File path for file-based profiling
ATMOS_PROFILE_TYPE: Profile type for file-based profiling

Configuration File:

atmos.yaml

profiler:
  enabled: true
  host: "localhost"
  port: 6060
  file: "profile.out"           # Optional: file-based profiling
  profile_type: "cpu"           # Optional: profile type

Configuration Precedence:

Command-line flags (highest priority)
Environment variables
Configuration file (atmos.yaml)
Default values (lowest priority)
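The precedence order amounts to taking the first source that provides a value. The sketch below illustrates the idea only; it is not how Atmos implements it. The helper first_set and the variables flag_port and config_port are hypothetical stand-ins for a value from --profiler-port and a value read from atmos.yaml.

```shell
#!/bin/bash
# first_set VALUE...: print the first non-empty argument.
first_set() {
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Hypothetical inputs: no flag given, environment variable and config file both set.
flag_port=""
ATMOS_PROFILER_PORT="8060"
config_port="7070"

# Order: flag, environment variable, configuration file, default.
first_set "$flag_port" "$ATMOS_PROFILER_PORT" "$config_port" "6060"
# prints: 8060
```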

Analyzing Profiles

CPU and Memory Profiles:

# Interactive text mode
go tool pprof cpu.prof
go tool pprof heap.prof

# Web interface (requires Graphviz: brew install graphviz)
go tool pprof -http=:8080 cpu.prof
go tool pprof -http=:8080 heap.prof

# Direct text output
go tool pprof -top cpu.prof
go tool pprof -top heap.prof

Trace Profiles:

# Use go tool trace for execution traces
go tool trace trace.out

# Opens web interface showing:
# - Timeline view of goroutines
# - Network blocking profile
# - Synchronization blocking profile
# - System call blocking profile

Sample pprof output

(pprof) top
Showing nodes accounting for 230ms, 95.83% of 240ms total
Dropped 15 nodes (cum <= 1.20ms)
      flat  flat%   sum%        cum   cum%
      80ms 33.33% 33.33%       80ms 33.33%  github.com/cloudposse/atmos/internal/exec.processStackConfig
      60ms 25.00% 58.33%       60ms 25.00%  gopkg.in/yaml.v3.(*Decoder).Decode
      40ms 16.67% 75.00%       40ms 16.67%  github.com/cloudposse/atmos/pkg/utils.ProcessTmplWithDatasources
      30ms 12.50% 87.50%       30ms 12.50%  encoding/json.(*Decoder).Decode
      20ms  8.33% 95.83%       20ms  8.33%  github.com/cloudposse/atmos/pkg/stack.ProcessStackConfig

Understanding pprof Data:

flat: Time spent in the function itself
cum: Cumulative time (function + callees)
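flat%: flat as a percentage of the total sampled time
sum%: Running total of flat% for this row and all rows above it

Focus optimization on functions with high flat time and flat%. A high cum with a low flat points to expensive callees rather than the function itself.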
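The percentage columns in the sample output can be recomputed from the flat values and the 240ms total, which shows how flat% and sum% relate. The helper name pprof_percentages is hypothetical; it is a worked calculation, not part of pprof.

```shell
#!/bin/bash
# pprof_percentages TOTAL_MS: read one flat value (ms) per line, print flat% and sum%.
pprof_percentages() {
  awk -v total="$1" '{
    running += $1
    printf "%dms %.2f%% %.2f%%\n", $1, 100 * $1 / total, 100 * running / total
  }'
}

printf '%s\n' 80 60 40 30 20 | pprof_percentages 240
# prints:
# 80ms 33.33% 33.33%
# 60ms 25.00% 58.33%
# 40ms 16.67% 75.00%
# 30ms 12.50% 87.50%
# 20ms 8.33% 95.83%
```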

Common Scenarios

Performance Optimization:

# Profile CPU usage in slow operations
atmos terraform plan large-component -s prod --profile-file=slow-plan.prof --profile-type=cpu

# Profile memory usage
atmos terraform plan large-component -s prod --profile-file=memory.prof --profile-type=heap

# Profile detailed execution
atmos terraform plan large-component -s prod --profile-file=trace.out --profile-type=trace

# Analyze results
go tool pprof -http=:8080 slow-plan.prof
go tool pprof -http=:8080 memory.prof
go tool trace trace.out

Memory Analysis:

# File-based memory profiling
atmos describe stacks --profile-file=heap.prof --profile-type=heap
atmos describe stacks --profile-file=allocs.prof --profile-type=allocs

# Analyze
go tool pprof -http=:8080 heap.prof
go tool pprof -http=:8080 allocs.prof

# Server-based memory profiling
atmos describe stacks --profiler-enabled

# In another terminal
go tool pprof http://localhost:6060/debug/pprof/heap
go tool pprof http://localhost:6060/debug/pprof/allocs

Custom Server Configuration:

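# Caution: 0.0.0.0 exposes the profiler on every network interface; prefer localhost unless remote access is required
atmos terraform apply vpc -s prod \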
  --profiler-enabled \
  --profiler-host=0.0.0.0 \
  --profiler-port=8060

Choosing the Right Tool

Feature         Performance Heatmap             pprof Profiling
Overhead        Minimal (microsecond tracking)  Higher (full runtime instrumentation)
Granularity     Function-level metrics          Line-level profiling
Setup           CLI flag only (--heatmap)       Requires flags or config
Output          Real-time visualization         File or HTTP endpoint
Use Case        Quick performance checks        Deep performance analysis
Visualization   Built-in interactive UI         Requires go tool pprof
Data Retention  In-memory (command lifetime)    Persistent files
Profile Types   Function timing only            CPU, memory, goroutines, blocking, etc.

Use Performance Heatmap when:

Validating performance during development
Identifying which functions are called most
Comparing execution times between commands
Needing quick insights without external tools
Monitoring lightweight operations

Use pprof Profiling when:

Deep diving into performance bottlenecks
Analyzing CPU and memory usage patterns
Investigating memory leaks
Profiling goroutines and lock contention
Needing flame graphs and detailed visualizations
Comparing profiles across runs

Use Both Together:

# Combine for comprehensive analysis
atmos terraform plan vpc -s prod \
  --heatmap \
  --profile-file=cpu.prof \
  --profile-type=cpu

# View quick heatmap summary
# Then deep dive with pprof
go tool pprof -http=:8080 cpu.prof

Dependencies

Graphviz (Optional for pprof)

The pprof web interface requires Graphviz for visual graphs:

macOS:

brew install graphviz

Ubuntu/Debian:

sudo apt-get install graphviz

CentOS/RHEL:

sudo yum install graphviz

Without Graphviz, text-based pprof analysis still works.

Best Practices

File-Based vs Server-Based (pprof)

Use file-based profiling for most CLI operations and performance analysis
Use server-based profiling for long-running operations or interactive profiling
File-based profiling supports all profile types: cpu, heap, allocs, goroutine, block, mutex, threadcreate, trace

Performance Heatmap Usage

Enable only when needed: Zero overhead when not using --heatmap flag
Use appropriate modes: bar for quick comparison, table for detailed analysis
Combine with logging: Correlate performance with debug logs
Automate in CI/CD: Track performance regressions in pipelines

Profile Regularly

Profile before and after optimizations
Establish baseline profiles for typical operations
Profile different stack sizes and complexity levels
Compare heatmap metrics for quick validation, pprof for deep analysis

Security Considerations

Server-based profiling exposes runtime information through HTTP
Use localhost binding in production environments
Disable profiling in production unless actively debugging
Performance heatmap is safe (in-memory only, no network exposure)

Troubleshooting

Performance Heatmap Issues

Heatmap Not Showing:

# Ensure you're using the latest Atmos version
atmos version --heatmap

No Functions Tracked:

Performance tracking is added incrementally
Run commands that process stacks/components for more data:

atmos describe stacks --heatmap
atmos terraform plan component -s stack --heatmap

P95 Shows Zero:

P95 requires multiple calls to be meaningful
Run commands that execute functions multiple times

Interactive Mode Not Working:

Requires TTY (terminal)
In scripts/CI/CD, static summary is displayed instead:

⚠️  No TTY available for interactive visualization. Summary displayed above.

pprof Issues

Profile File Creation Errors:

# Ensure directory exists
mkdir -p /path/to/profile/
atmos command --profile-file=/path/to/profile/cpu.prof --profile-type=cpu

Invalid Profile Type:

# Check supported types
echo "Supported: cpu, heap, allocs, goroutine, block, mutex, threadcreate, trace"
atmos command --profile-file=profile.out --profile-type=heap

Graphviz Not Found:

# Use text-based analysis instead
go tool pprof -top cpu.prof
go tool pprof -list=functionName cpu.prof

Server Already Running:

# Use different port
atmos command --profiler-enabled --profiler-port=7070

Examples

Development Workflow

# Quick performance check with heatmap
atmos describe stacks --heatmap

# Deep analysis with pprof if issues found
atmos describe stacks --profile-file=cpu.prof
go tool pprof -http=:8080 cpu.prof

CI/CD Integration

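#!/bin/bash
# ci-performance-check.sh

# Run with heatmap for quick metrics (without a TTY, the static summary is printed)
atmos validate stacks --heatmap 2>&1 | tee perf-output.txt

# Extract the value that follows "Elapsed:" in the summary line, e.g. "54.16ms"
elapsed=$(awk '/Elapsed:/ { for (i = 1; i < NF; i++) if ($i == "Elapsed:") { print $(i + 1); exit } }' perf-output.txt)

# Convert to milliseconds before comparing; bc cannot parse a value with a unit suffix.
# Assumes the µs/ms/s suffixes seen in the sample output above.
case "$elapsed" in
  *µs)    elapsed_ms=$(echo "${elapsed%µs} / 1000" | bc -l) ;;
  *ms)    elapsed_ms=${elapsed%ms} ;;
  *[hm]*) echo "Elapsed time ${elapsed} is a minute or more" >&2; exit 1 ;;
  *s)     elapsed_ms=$(echo "${elapsed%s} * 1000" | bc -l) ;;
  *)      echo "Could not parse elapsed time: '${elapsed}'" >&2; exit 2 ;;
esac

# Check for regression (e.g., > 500ms)
if [ "$(echo "$elapsed_ms > 500" | bc -l)" -eq 1 ]; then
  echo "Performance regression: ${elapsed} > 500ms"
  # Deep profile for investigation
  atmos validate stacks --profile-file=cpu.prof --profile-type=cpu
  exit 1
fi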

Combined Analysis

# Use both tools for comprehensive analysis
atmos terraform plan large-stack -s prod \
  --heatmap \
  --heatmap-mode=bar \
  --profile-file=analysis.prof \
  --profile-type=cpu \
  2>&1 | tee combined-analysis.log

# View heatmap summary immediately
# Then analyze with pprof
go tool pprof -http=:8080 analysis.prof

Related Documentation

Error Messages: Common error messages and solutions
