AI Providers
The ai.providers section configures connections to AI model providers like Anthropic, OpenAI, Google Gemini,
Grok, Ollama, AWS Bedrock, and Azure OpenAI.
Configuration
atmos.yaml
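A minimal sketch of the provider block, using the setting names documented below (the nesting under ai.providers follows this page; the exact schema should be checked against your Atmos version):

```yaml
# atmos.yaml -- minimal ai.providers sketch
ai:
  default_provider: anthropic           # provider used by CLI commands
  providers:
    anthropic:
      model: claude-sonnet-4-6          # model identifier for this provider
      api_key: !env "ANTHROPIC_API_KEY" # read from environment; never commit keys
      max_tokens: 4096                  # maximum tokens per response (default)
```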
Provider Settings
Each provider in the providers map supports the following settings:
- model — Model identifier to use for this provider. Defaults vary by provider (see below).
- api_key — API key for the provider. Use the !env YAML function to read from an environment variable (e.g., api_key: !env "ANTHROPIC_API_KEY"). Not required for Ollama (local) or Bedrock (uses AWS credentials).
- max_tokens — Maximum tokens per response (default: 4096). OpenAI reasoning models (o3, o4-mini) use max_completion_tokens internally; Atmos handles this automatically.
- base_url — Custom API endpoint. Required for Grok (https://api.x.ai/v1) and Azure OpenAI (https://<resource>.openai.azure.com). For Bedrock, set to the AWS region (e.g., us-east-1). Ollama defaults to http://localhost:11434/v1.
- api_version — API version string. Required for Azure OpenAI (e.g., 2025-04-01-preview).
Supported Providers
- Anthropic (Claude) — claude-sonnet-4-6 — API key authentication. Advanced reasoning, default choice.
- OpenAI (GPT) — gpt-4o — API key authentication. Widely available, strong general capabilities.
- Google (Gemini) — gemini-2.5-flash — API key authentication. Fast responses, larger context window.
- xAI (Grok) — grok-4-latest — API key authentication. OpenAI-compatible, real-time knowledge.
- Ollama (Local) — llama4 — No authentication (local). Privacy, offline use, zero API costs.
- AWS Bedrock — anthropic.claude-sonnet-4-6 — AWS credentials authentication. Enterprise security, AWS integration.
- Azure OpenAI — gpt-4o — API key / Azure AD authentication. Enterprise compliance, Azure integration.
Provider Comparison
| | Cloud | Ollama (Local) | Enterprise |
|---|---|---|---|
| Setup | Easy (API key) | Medium (install + download model) | Complex (cloud config) |
| Cost | Per-token pricing | Free | Per-token + cloud costs |
| Data Privacy | Sent to provider | 100% local | Stays in your cloud |
| Offline | No | Yes | No |
| Rate Limits | Yes | No | Yes (configurable) |
| Compliance | Provider-dependent | Your control | SOC 2, HIPAA, ISO |
| Private Network | No | Yes | Yes (VPC/VNet) |
Performance and Cost
- Anthropic (Claude) — 200K token context — $3-$15 per 1M tokens (input/output).
- OpenAI (GPT-4o) — 128K token context — $2.50-$10 per 1M tokens (input/output).
- Google (Gemini) — up to 2M token context — $0.075-$0.30 per 1M tokens (input/output).
- xAI (Grok) — 2M token context — $3-$15 per 1M tokens (input/output).
- Ollama — 128K token context — $0 (hardware only).
- AWS Bedrock — 200K token context — $3-$15 + AWS costs per 1M tokens.
- Azure OpenAI — 128K token context — $2.50-$10 + Azure costs per 1M tokens.
For budget-conscious usage, try Gemini (cheapest cloud) or Ollama (free). For balanced cost and quality, Claude Sonnet or GPT-4o work well. Enterprise providers can leverage committed-use discounts.
Privacy and Security
- Anthropic — Data location: US (cloud). Compliance: SOC 2, GDPR.
- OpenAI — Data location: US (cloud). Compliance: SOC 2, GDPR.
- Gemini — Data location: US (cloud). Compliance: SOC 2, GDPR.
- Grok — Data location: US (cloud). Compliance: Standard.
- Ollama — Data location: Your machine. Compliance: Your control.
- Bedrock — Data location: Your AWS region. Compliance: SOC 2, HIPAA, ISO.
- Azure OpenAI — Data location: Your Azure region. Compliance: SOC 2, HIPAA, ISO.
For sensitive infrastructure data, consider Ollama (never leaves your machine) or enterprise providers (stays in your cloud account).
Provider Examples
- Cloud Providers
- Local (Ollama)
- Enterprise
atmos.yaml
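A sketch of cloud provider entries, combining the models and endpoints listed above (the provider key names and the Gemini/xAI environment variable names are assumptions, not confirmed by this page):

```yaml
# atmos.yaml -- cloud provider sketch
ai:
  providers:
    openai:
      model: gpt-4o
      api_key: !env "OPENAI_API_KEY"
    gemini:
      model: gemini-2.5-flash
      api_key: !env "GEMINI_API_KEY"  # assumed variable name
    grok:
      model: grok-4-latest
      api_key: !env "XAI_API_KEY"     # assumed variable name
      base_url: https://api.x.ai/v1   # required for Grok
```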
Ollama Installation
Start the service and pull a model:
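On most systems this looks like the following (llama4 is the recommended production model; pick a smaller model from the table below if your hardware is limited):

```shell
ollama serve        # start the local Ollama service (listens on 127.0.0.1:11434)
ollama pull llama4  # download the model (~40GB)
```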
Model Selection
- llama4 — ~40GB, 64GB+ RAM — Production use (recommended).
- llama3.3:70b — ~40GB, 64GB+ RAM — Production use (previous generation).
- llama3.1:8b — ~5GB, 8GB+ RAM — Quick queries, laptops.
- codellama:34b — ~19GB, 32GB+ RAM — Code generation.
- codellama:13b — ~8GB, 16GB+ RAM — Code on mid-range hardware.
- mixtral:8x7b — ~26GB, 32GB+ RAM — Complex reasoning.
- mistral:7b — ~4GB, 8GB+ RAM — Fast responses, consumer hardware.
Performance Tips
Hardware recommendations:
- Apple Silicon (M1/M2/M3/M4): Handles llama4 well with unified memory
- 16GB RAM: Use llama3.1:8b or mistral:7b
- 32GB+ RAM: Use mixtral:8x7b or llama3.3:70b
- 64GB+ RAM: Use llama4
- GPU acceleration (NVIDIA/AMD/Apple Silicon) provides 10-50x faster inference; Ollama detects and uses it automatically
atmos.yaml
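A sketch of a local Ollama provider entry; no API key is needed, and base_url can be omitted since it defaults to Ollama's local endpoint:

```yaml
# atmos.yaml -- local Ollama sketch
ai:
  default_provider: ollama
  providers:
    ollama:
      model: llama3.1:8b                  # choose per the hardware guidance above
      base_url: http://localhost:11434/v1 # Ollama's default endpoint
```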
Remote Deployment
To enable remote access on the server, set OLLAMA_HOST=0.0.0.0:11434 before starting Ollama.
Binding to 0.0.0.0 exposes the Ollama API to all network interfaces. Ensure the server is behind a firewall or reverse proxy with authentication in production environments.
- OLLAMA_HOST — Server bind address (default: 127.0.0.1:11434).
- OLLAMA_ORIGINS — CORS allowed origins (default: * — allow all).
- OLLAMA_MODELS — Model storage path (default: ~/.ollama/models).
atmos.yaml
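On the client side, point base_url at the remote server instead of localhost (the hostname below is a placeholder):

```yaml
# atmos.yaml -- remote Ollama client sketch
ai:
  providers:
    ollama:
      model: llama4
      base_url: http://ollama.internal:11434/v1  # replace with your server's address
```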
AWS Bedrock
Authentication uses the standard AWS SDK credential chain (IAM roles, profiles, environment variables):
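A sketch of a Bedrock provider entry (the provider key name bedrock is an assumption). Note there is no api_key, since credentials come from the AWS SDK chain, and for Bedrock base_url holds the region:

```yaml
# atmos.yaml -- AWS Bedrock sketch; authenticated via the AWS credential chain
ai:
  providers:
    bedrock:
      model: anthropic.claude-sonnet-4-6  # or a cross-region inference profile ID
      base_url: us-east-1                 # for Bedrock, base_url is the AWS region
```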
Use the global. prefix for dynamic routing across regions, or the anthropic. prefix for a specific region. Cross-region inference profiles use IDs like us.anthropic.claude-sonnet-4-6-v1:0.
Azure OpenAI
The model field must be your deployment name from Azure Portal, not the OpenAI model name. Both base_url (your resource endpoint) and api_version are required. See Azure OpenAI Documentation for available API versions.
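A sketch of an Azure OpenAI entry (the provider key name, resource name, deployment name, and environment variable are placeholders):

```yaml
# atmos.yaml -- Azure OpenAI sketch
ai:
  providers:
    azure:
      model: my-gpt4o-deployment                     # your deployment name, not the model name
      api_key: !env "AZURE_OPENAI_API_KEY"           # or use Azure AD authentication
      base_url: https://my-resource.openai.azure.com # your resource endpoint (required)
      api_version: 2025-04-01-preview                # required
```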
Token Caching
Token caching (prompt caching) reduces costs by reusing frequently-sent content like system prompts.
- Anthropic — 90% discount — Manual markers. Configuration required (see below).
- OpenAI — 50% discount — Automatic. No configuration required.
- Gemini — Free (included) — Automatic. No configuration required.
- Grok — 75% discount — Automatic. No configuration required.
- Bedrock — Up to 90% discount — Automatic. No configuration required.
- Azure OpenAI — 50-100% discount — Automatic. No configuration required.
- Ollama — N/A — Local processing; caching not applicable.
Most providers implement caching automatically. Only Anthropic requires explicit configuration:
atmos.yaml
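A sketch of the Anthropic cache settings, using the keys documented below (the placement of the cache block under the provider entry is an assumption):

```yaml
# atmos.yaml -- Anthropic prompt caching sketch
ai:
  providers:
    anthropic:
      model: claude-sonnet-4-6
      api_key: !env "ANTHROPIC_API_KEY"
      cache:
        enabled: true                    # default; cached tokens cost 90% less
        cache_system_prompt: true        # cache the skill system prompt
        cache_project_instructions: true # cache ATMOS.md content
```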
- cache.enabled — Enable or disable prompt caching (default: true). Cached tokens cost 90% less for Anthropic.
- cache.cache_system_prompt — Cache the skill system prompt across messages (default: true).
- cache.cache_project_instructions — Cache the ATMOS.md project instructions content (default: true).
For Anthropic, keep your ATMOS.md content stable during conversations and use named sessions for extended work on the same topic. For all other providers, caching works transparently with no action required.
Multi-Provider Configuration
Configure multiple providers and switch between them. Set default_provider for CLI commands, and press Ctrl+P during a chat session to switch mid-conversation.
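A sketch with two providers configured, where default_provider selects the one CLI commands use:

```yaml
# atmos.yaml -- multi-provider sketch
ai:
  default_provider: anthropic   # used by CLI commands
  providers:
    anthropic:
      model: claude-sonnet-4-6
      api_key: !env "ANTHROPIC_API_KEY"
    ollama:
      model: llama3.1:8b        # local alternative; switch with Ctrl+P in chat
```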
Provider Isolation
Each provider maintains its own completely isolated conversation history. When you switch providers, the new provider only sees messages from its own thread. When you switch back, it picks up right where you left off. UI notifications like "Switched to..." are never sent to any provider.
Security Best Practices
- Cloud Providers (Anthropic, OpenAI, Google, xAI) — Use environment variables for API keys (never commit to git), rotate keys regularly, and review each provider's data retention policies.
- Local Provider (Ollama) — All processing is local. Keep Ollama updated and use disk encryption for sensitive data.
- Enterprise Providers (Bedrock, Azure OpenAI) — Use IAM roles or managed identities instead of API keys, enable audit logging (CloudTrail or Azure Monitor), configure private endpoints, and use customer-managed encryption keys where available.
Related Documentation
- AI Configuration - Top-level AI settings
- MCP Server - Universal MCP integration
- Claude Code Integration - IDE integration with Claude Code