# AI Providers

The `ai.providers` section configures connections to AI model providers. Atmos supports **API providers**
(Anthropic, OpenAI, Gemini, Grok, Ollama, Bedrock, Azure OpenAI) and **CLI providers** (Claude Code, OpenAI Codex, Gemini CLI) that reuse your existing subscription.

> ⚠️ Experimental

## Configuration

**File:** `atmos.yaml`

```yaml
ai:
  default_provider: "anthropic"
  providers:
    anthropic:
      model: "claude-sonnet-4-6"
      api_key: !env "ANTHROPIC_API_KEY"
      max_tokens: 4096
    openai:
      model: "gpt-4o"
      api_key: !env "OPENAI_API_KEY"
      max_tokens: 4096
```

## Provider Settings

Each provider in the `providers` map supports the following settings:

- **`model`**
  Model identifier to use for this provider. Defaults vary by provider (see below).
- **`api_key`**
  API key for the provider. Use the `!env` YAML function to read it from an environment variable (e.g., `api_key: !env "ANTHROPIC_API_KEY"`). Not required for Ollama (local) or Bedrock (uses AWS credentials).
- **`max_tokens`**
  Maximum tokens per response (default: `4096`). OpenAI reasoning models (o3, o4-mini) use `max_completion_tokens` internally; Atmos handles this automatically.
- **`base_url`**
  Custom API endpoint. Required for Grok (`https://api.x.ai/v1`) and Azure OpenAI (`https://<resource>.openai.azure.com`). For Bedrock, set to the AWS region (e.g., `us-east-1`). Ollama defaults to `http://localhost:11434/v1`.
- **`api_version`**
  API version string. **Required** for Azure OpenAI (e.g., `2025-04-01-preview`).
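
As a sketch of how these settings combine, the fragment below configures an OpenAI reasoning model next to Grok. The model name `o4-mini` comes from the note on `max_tokens`; whether that model is available to your API key is an assumption.

```yaml
ai:
  providers:
    openai:
      model: "o4-mini"                  # Reasoning model; assumed to be enabled for your key
      api_key: !env "OPENAI_API_KEY"
      max_tokens: 4096                  # Atmos sends this as max_completion_tokens for o3/o4-mini
    grok:
      model: "grok-4-latest"
      api_key: !env "XAI_API_KEY"
      base_url: "https://api.x.ai/v1"   # Required for Grok
      max_tokens: 4096
```

Note that `max_tokens` is written the same way for both providers; the reasoning-model translation happens inside Atmos.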

## Supported Providers

- **Anthropic (Claude)**: `claude-sonnet-4-6`
  API Key authentication. Advanced reasoning, default choice.
- **OpenAI (GPT)**: `gpt-4o`
  API Key authentication. Widely available, strong general capabilities.
- **Google (Gemini)**: `gemini-2.5-flash`
  API Key authentication. Fast responses, larger context window.
- **xAI (Grok)**: `grok-4-latest`
  API Key authentication. OpenAI-compatible, real-time knowledge.
- **Ollama (Local)**: `llama4`
  No authentication (local). Privacy, offline use, zero API costs.
- **AWS Bedrock**: `anthropic.claude-sonnet-4-6`
  AWS Credentials authentication. Enterprise security, AWS integration.
- **Azure OpenAI**: `gpt-4o`
  API Key / Azure AD authentication. Enterprise compliance, Azure integration.

## CLI Providers

CLI providers invoke a locally installed AI tool as a subprocess. No API key needed — the CLI tool uses your existing subscription.

- **Claude Code**: `claude-code`
  Claude Pro/Max subscription. Full MCP support. Best for interactive development.
- **OpenAI Codex**: `codex-cli`
  ChatGPT Plus/Pro subscription. Full MCP support. Open source (Apache 2.0).
- **Gemini CLI**: `gemini-cli`
  Google account (free tier: 1K req/day). MCP blocked for personal accounts.

### CLI Provider Settings

In addition to `model`, CLI providers support:

- **`binary`**
  Path to the CLI binary. Optional — defaults to finding it on PATH.
- **`max_turns`**
  Maximum agentic turns per invocation (Claude Code only).
- **`max_budget_usd`**
  Budget cap per invocation in USD (Claude Code only).
- **`full_auto`**
  Enable automatic approval for file writes (Codex CLI only). When MCP servers are configured, `--dangerously-bypass-approvals-and-sandbox` is used automatically.
- **`allowed_tools`**
  List of tools Claude Code is allowed to use (Claude Code only).

### CLI Provider Examples

**File:** `atmos.yaml`

```yaml
ai:
  default_provider: "claude-code"
  providers:
    claude-code:
      max_turns: 10
      # max_budget_usd: 1.00
      # allowed_tools: ["Read", "Glob", "Grep"]

    codex-cli:
      model: "gpt-5.4-mini"
      full_auto: true

    gemini-cli:
      model: "gemini-2.5-flash"
```
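
If the CLI tool is not on `PATH`, or you want to limit what a single invocation can do, the remaining settings can be combined as in the sketch below. The binary path and the budget value are illustrative assumptions, not defaults.

```yaml
ai:
  default_provider: "claude-code"
  providers:
    claude-code:
      binary: "/usr/local/bin/claude"           # Hypothetical path; omit to resolve from PATH
      max_turns: 5
      max_budget_usd: 0.50                      # Illustrative cap per invocation
      allowed_tools: ["Read", "Glob", "Grep"]   # Tool names taken from the example above
```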

### MCP Pass-Through

When `mcp.servers` is configured, CLI providers automatically pass MCP servers to the CLI tool:

- **Claude Code:** Servers are written to a temporary `.mcp.json` file and passed via `--mcp-config`.
- **Codex CLI:** Servers are written to `~/.codex/config.toml`; the original file is backed up and restored after exit.
- **Gemini CLI:** Servers are written to `.gemini/settings.json` in the current working directory, with the same backup and restore. MCP is blocked for personal accounts.

Servers that require authentication are wrapped with `atmos auth exec -i <identity>`. The toolchain `PATH` and `ATMOS_*` environment variables are injected automatically.

## Provider Comparison

| Feature | Cloud | Ollama | Enterprise |
|---|---|---|---|
| Setup | Easy (API key) | Medium (install + download model) | Complex (cloud config) |
| Cost | Per-token pricing | Free | Per-token + cloud costs |
| Data Privacy | Sent to provider | 100% local | Stays in your cloud |
| Offline | No | Yes | No |
| Rate Limits | Yes | No | Yes (configurable) |
| Compliance | Provider-dependent | Your control | SOC2, HIPAA, ISO |
| Private Network | No | Yes | Yes (VPC/VNet) |

### Performance and Cost

- **Anthropic (Claude)**: 200K token context
  $3-$15 per 1M tokens (input/output).
- **OpenAI (GPT-4o)**: 128K token context
  $2.50-$10 per 1M tokens (input/output).
- **Google (Gemini)**: up to 2M token context
  $0.075-$0.30 per 1M tokens (input/output).
- **xAI (Grok)**: 2M token context
  $3-$15 per 1M tokens (input/output).
- **Ollama**: 128K token context
  $0 (hardware only).
- **AWS Bedrock**: 200K token context
  $3-$15 + AWS costs per 1M tokens.
- **Azure OpenAI**: 128K token context
  $2.50-$10 + Azure costs per 1M tokens.

:::tip Cost Optimization
For budget-conscious usage, try Gemini (cheapest cloud) or Ollama (free). For balanced cost and quality, Claude Sonnet or GPT-4o work well. Enterprise providers can leverage committed-use discounts.
:::

### Privacy and Security

- **Anthropic**
  Data location: US (cloud). Compliance: SOC 2, GDPR.
- **OpenAI**
  Data location: US (cloud). Compliance: SOC 2, GDPR.
- **Gemini**
  Data location: US (cloud). Compliance: SOC 2, GDPR.
- **Grok**
  Data location: US (cloud). Compliance: Standard.
- **Ollama**
  Data location: Your machine. Compliance: Your control.
- **Bedrock**
  Data location: Your AWS region. Compliance: SOC 2, HIPAA, ISO.
- **Azure OpenAI**
  Data location: Your Azure region. Compliance: SOC 2, HIPAA, ISO.

For sensitive infrastructure data, consider Ollama (never leaves your machine) or enterprise providers (stays in your cloud account).

## Provider Examples

### Cloud Providers

**File:** `atmos.yaml`

```yaml
ai:
  providers:
    anthropic:
      model: "claude-sonnet-4-6"
      api_key: !env "ANTHROPIC_API_KEY"
      max_tokens: 4096

    openai:
      model: "gpt-4o"
      api_key: !env "OPENAI_API_KEY"
      max_tokens: 4096

    gemini:
      model: "gemini-2.5-flash"
      api_key: !env "GEMINI_API_KEY"
      max_tokens: 8192

    grok:
      model: "grok-4-latest"
      api_key: !env "XAI_API_KEY"
      base_url: "https://api.x.ai/v1"
      max_tokens: 4096
```

### Local (Ollama)

**File:** `atmos.yaml`

```yaml
ai:
  providers:
    ollama:
      model: "llama4"
      # base_url defaults to http://localhost:11434/v1
      # No api_key needed for local instances

    # Remote Ollama server
    ollama-remote:
      model: "llama4"
      base_url: "http://<server-ip>:11434/v1"
```

#### Ollama Installation

```shell
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows: download from https://ollama.com/download
```

Start the service and pull a model:

```shell
ollama serve         # Start Ollama (Linux auto-starts as a service)
ollama pull llama4   # Pull recommended model
ollama list          # List available models
```

#### Model Selection

- **`llama4` — ~40GB, 64GB+ RAM**
  Production use (recommended).
- **`llama3.3:70b` — ~40GB, 64GB+ RAM**
  Production use (previous generation).
- **`llama3.1:8b` — ~5GB, 8GB+ RAM**
  Quick queries, laptops.
- **`codellama:34b` — ~19GB, 32GB+ RAM**
  Code generation.
- **`codellama:13b` — ~8GB, 16GB+ RAM**
  Code on mid-range hardware.
- **`mixtral:8x7b` — ~26GB, 32GB+ RAM**
  Complex reasoning.
- **`mistral:7b` — ~4GB, 8GB+ RAM**
  Fast responses, consumer hardware.

#### Performance Tips

Hardware recommendations:

- **Apple Silicon (M1/M2/M3/M4):** Handles `llama4` well with unified memory
- **16GB RAM:** Use `llama3.1:8b` or `mistral:7b`
- **32GB+ RAM:** Use `mixtral:8x7b` or `codellama:34b`
- **64GB+ RAM:** Use `llama4` or `llama3.3:70b`
- **GPU acceleration** (NVIDIA/AMD/Apple Silicon) provides 10-50x faster inference; Ollama detects and uses it automatically

**File:** `atmos.yaml`

```yaml
ai:
  providers:
    ollama:
      model: "llama4"
      max_tokens: 4096       # Lower limits mean shorter, faster responses
```

#### Remote Deployment

To enable remote access on the server, set `OLLAMA_HOST=0.0.0.0:11434` before starting Ollama.

:::caution
Binding to `0.0.0.0` exposes the Ollama API to all network interfaces. Ensure the server is behind a firewall or reverse proxy with authentication in production environments.
:::

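- **`OLLAMA_HOST`**
  Server bind address (default: `127.0.0.1:11434`).
- **`OLLAMA_ORIGINS`**
  Additional allowed CORS origins. By default Ollama accepts cross-origin requests only from local origins (`127.0.0.1` and `0.0.0.0`); set this to `*` to allow all.
- **`OLLAMA_MODELS`**
  Model storage path (default: `~/.ollama/models`).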

### Enterprise

**File:** `atmos.yaml`

```yaml
ai:
  providers:
    bedrock:
      model: "anthropic.claude-sonnet-4-6"
      max_tokens: 4096
      base_url: "us-east-1"  # AWS region

    azureopenai:
      model: "gpt-4o"  # Azure deployment name
      api_key: !env "AZURE_OPENAI_API_KEY"
      base_url: "https://<your-resource>.openai.azure.com"
      api_version: "2025-04-01-preview"
      max_tokens: 4096
```

#### AWS Bedrock

Authentication uses the standard AWS SDK credential chain (IAM roles, profiles, environment variables):

```shell
# Option 1: Environment variables
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."

# Option 2: AWS Profile
export AWS_PROFILE="bedrock-profile"

# Option 3: IAM role (automatic in EC2/ECS/EKS)
```

:::warning Bedrock Endpoints
Use the `global.` prefix for dynamic routing across regions, or the `anthropic.` prefix for a specific region. Cross-region inference profiles use IDs like `us.anthropic.claude-sonnet-4-6-v1:0`.
:::
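
For example, a provider entry that targets a cross-region inference profile might look like the sketch below. The profile ID is the one quoted in the note above; whether that profile is enabled in your AWS account and region is an assumption.

```yaml
ai:
  providers:
    bedrock:
      model: "us.anthropic.claude-sonnet-4-6-v1:0"   # Cross-region inference profile ID
      base_url: "us-east-1"                          # AWS region
      max_tokens: 4096
```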

#### Azure OpenAI

:::warning Azure-Specific Configuration
The `model` field must be your **deployment name** from Azure Portal, not the OpenAI model name. Both `base_url` (your resource endpoint) and `api_version` are required. See [Azure OpenAI Documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/) for available API versions.
:::

## Token Caching

Token caching (prompt caching) reduces costs by reusing frequently-sent content like system prompts.

- **Anthropic**: 90% discount
  Manual markers. Configuration required (see below).
- **OpenAI**: 50% discount
  Automatic. No configuration required.
- **Gemini**: Free (included)
  Automatic. No configuration required.
- **Grok**: 75% discount
  Automatic. No configuration required.
- **Bedrock**: Up to 90% discount
  Automatic. No configuration required.
- **Azure OpenAI**: 50-100% discount
  Automatic. No configuration required.
- **Ollama**: N/A
  Local processing, caching not applicable.

Most providers cache automatically. Only Anthropic relies on explicit cache markers, which Atmos controls through the `cache` settings (all enabled by default):

**File:** `atmos.yaml`

```yaml
ai:
  providers:
    anthropic:
      model: "claude-sonnet-4-6"
      api_key: !env "ANTHROPIC_API_KEY"
      cache:
        enabled: true              # Enable prompt caching (default)
        cache_system_prompt: true  # Cache skill system prompt
        cache_project_instructions: true # Cache ATMOS.md content
```

- **`cache.enabled`**
  Enable or disable prompt caching (default: `true`). Cached tokens cost 90% less for Anthropic.
- **`cache.cache_system_prompt`**
  Cache the skill system prompt across messages (default: `true`).
- **`cache.cache_project_instructions`**
  Cache the ATMOS.md project instructions content (default: `true`).
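
Because all three options default to `true`, the `cache` block is only needed when you want to change something. A minimal sketch that keeps system prompt caching but skips ATMOS.md, assuming your project instructions are edited often enough that caching them rarely pays off:

```yaml
ai:
  providers:
    anthropic:
      model: "claude-sonnet-4-6"
      api_key: !env "ANTHROPIC_API_KEY"
      cache:
        cache_project_instructions: false   # ATMOS.md changes often; skip caching it
```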

:::tip Maximize Cache Savings
For Anthropic, keep your ATMOS.md content stable during conversations and use named sessions for extended work on the same topic. For all other providers, caching works transparently with no action required.
:::

## Multi-Provider Configuration

Configure multiple providers and switch between them. Set `default_provider` for CLI commands, and press `Ctrl+P` during a chat session to switch mid-conversation.
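
A minimal sketch of such a setup, combining provider entries shown earlier on this page. The choice of Anthropic as the default, with Ollama and Bedrock as alternatives, is only illustrative.

```yaml
ai:
  default_provider: "anthropic"   # Used by CLI commands
  providers:
    anthropic:
      model: "claude-sonnet-4-6"
      api_key: !env "ANTHROPIC_API_KEY"
    ollama:
      model: "llama4"             # Local; no api_key needed
    bedrock:
      model: "anthropic.claude-sonnet-4-6"
      base_url: "us-east-1"       # AWS region
```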

### Provider Isolation

Each provider maintains its own completely isolated conversation history. When you switch providers, the new provider only sees messages from its own thread. When you switch back, it picks up right where you left off. UI notifications like "Switched to..." are never sent to any provider.

## Security Best Practices

- **Cloud Providers (Anthropic, OpenAI, Google, xAI)**
  Use environment variables for API keys (never commit to git), rotate keys regularly, and review each provider's data retention policies.
- **Local Provider (Ollama)**
  All processing is local. Keep Ollama updated and use disk encryption for sensitive data.
- **Enterprise Providers (Bedrock, Azure OpenAI)**
  Use IAM roles or managed identities instead of API keys, enable audit logging (CloudTrail or Azure Monitor), configure private endpoints, and use customer-managed encryption keys where available.

## Related Documentation

- [AI Configuration](/cli/configuration/ai) - Top-level AI settings
- [MCP Server](/ai/mcp-server) - Universal MCP integration
- [Claude Code Integration](/ai/claude-code-integration) - IDE integration with Claude Code