AI Providers
The ai.providers section configures connections to AI model providers like Anthropic, OpenAI, Google Gemini,
Grok, Ollama, AWS Bedrock, and Azure OpenAI.
Configuration
atmos.yaml
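A minimal sketch of the provider block, using the setting names documented below (the nesting under ai.providers follows this page; the exact schema should be checked against your Atmos version):

```yaml
# atmos.yaml -- minimal ai.providers sketch
ai:
  default_provider: anthropic           # provider used by CLI commands
  providers:
    anthropic:
      model: claude-sonnet-4-6          # model identifier for this provider
      api_key: !env "ANTHROPIC_API_KEY" # read from environment; never commit keys
      max_tokens: 4096                  # maximum tokens per response (default)
```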
Provider Settings
Each provider in the providers map supports the following settings:
- model — Model identifier to use for this provider. Defaults vary by provider (see below).
- api_key — API key for the provider. Use the !env YAML function to read from an environment variable (e.g., api_key: !env "ANTHROPIC_API_KEY"). Not required for Ollama (local) or Bedrock (uses AWS credentials).
- max_tokens — Maximum tokens per response (default: 4096). OpenAI reasoning models (o3, o4-mini) use max_completion_tokens internally; Atmos handles this automatically.
- base_url — Custom API endpoint. Required for Grok (https://api.x.ai/v1) and Azure OpenAI (https://<resource>.openai.azure.com). For Bedrock, set to the AWS region (e.g., us-east-1). Ollama defaults to http://localhost:11434/v1.
- api_version — API version string. Required for Azure OpenAI (e.g., 2025-04-01-preview).
Supported Providers
- Anthropic (Claude) — claude-sonnet-4-6 — API key authentication. Advanced reasoning, default choice.
- OpenAI (GPT) — gpt-4o — API key authentication. Widely available, strong general capabilities.
- Google (Gemini) — gemini-2.5-flash — API key authentication. Fast responses, larger context window.
- xAI (Grok) — grok-4-latest — API key authentication. OpenAI-compatible, real-time knowledge.
- Ollama (Local) — llama4 — No authentication (local). Privacy, offline use, zero API costs.
- AWS Bedrock — anthropic.claude-sonnet-4-6 — AWS credentials authentication. Enterprise security, AWS integration.
- Azure OpenAI — gpt-4o — API key / Azure AD authentication. Enterprise compliance, Azure integration.
Provider Comparison
| | Cloud | Ollama (Local) | Enterprise |
|---|---|---|---|
| Setup | Easy (API key) | Medium (install + download model) | Complex (cloud config) |
| Cost | Per-token pricing | Free | Per-token + cloud costs |
| Data Privacy | Sent to provider | 100% local | Stays in your cloud |
| Offline | No | Yes | No |
| Rate Limits | Yes | No | Yes (configurable) |
| Compliance | Provider-dependent | Your control | SOC 2, HIPAA, ISO |
| Private Network | No | Yes | Yes (VPC/VNet) |
Performance and Cost
- Anthropic (Claude) — 200K token context — $3-$15 per 1M tokens (input/output).
- OpenAI (GPT-4o) — 128K token context — $2.50-$10 per 1M tokens (input/output).
- Google (Gemini) — up to 2M token context — $0.075-$0.30 per 1M tokens (input/output).
- xAI (Grok) — 2M token context — $3-$15 per 1M tokens (input/output).
- Ollama — 128K token context — $0 (hardware only).
- AWS Bedrock — 200K token context — $3-$15 + AWS costs per 1M tokens.
- Azure OpenAI — 128K token context — $2.50-$10 + Azure costs per 1M tokens.
For budget-conscious usage, try Gemini (cheapest cloud) or Ollama (free). For balanced cost and quality, Claude Sonnet or GPT-4o work well. Enterprise providers can leverage committed-use discounts.
Privacy and Security
- Anthropic — Data location: US (cloud). Compliance: SOC 2, GDPR.
- OpenAI — Data location: US (cloud). Compliance: SOC 2, GDPR.
- Gemini — Data location: US (cloud). Compliance: SOC 2, GDPR.
- Grok — Data location: US (cloud). Compliance: Standard.
- Ollama — Data location: Your machine. Compliance: Your control.
- Bedrock — Data location: Your AWS region. Compliance: SOC 2, HIPAA, ISO.
- Azure OpenAI — Data location: Your Azure region. Compliance: SOC 2, HIPAA, ISO.
For sensitive infrastructure data, consider Ollama (never leaves your machine) or enterprise providers (stays in your cloud account).
Provider Examples
- Cloud Providers
- Local (Ollama)
- Enterprise
atmos.yaml
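A sketch of cloud provider entries, combining the models and endpoints listed above (the provider key names and the Gemini/xAI environment variable names are assumptions, not confirmed by this page):

```yaml
# atmos.yaml -- cloud provider sketch
ai:
  providers:
    openai:
      model: gpt-4o
      api_key: !env "OPENAI_API_KEY"
    gemini:
      model: gemini-2.5-flash
      api_key: !env "GEMINI_API_KEY"  # assumed variable name
    grok:
      model: grok-4-latest
      api_key: !env "XAI_API_KEY"     # assumed variable name
      base_url: https://api.x.ai/v1   # required for Grok
```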
Ollama Installation
Start the service and pull a model:
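On most systems this looks like the following (llama4 is the recommended production model; pick a smaller model from the table below if your hardware is limited):

```shell
ollama serve        # start the local Ollama service (listens on 127.0.0.1:11434)
ollama pull llama4  # download the model (~40GB)
```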
Model Selection
- llama4 — ~40GB, 64GB+ RAM — Production use (recommended).
- llama3.3:70b — ~40GB, 64GB+ RAM — Production use (previous generation).
- llama3.1:8b — ~5GB, 8GB+ RAM — Quick queries, laptops.
- codellama:34b — ~19GB, 32GB+ RAM — Code generation.
- codellama:13b — ~8GB, 16GB+ RAM — Code on mid-range hardware.
- mixtral:8x7b — ~26GB, 32GB+ RAM — Complex reasoning.
- mistral:7b — ~4GB, 8GB+ RAM — Fast responses, consumer hardware.
Performance Tips
Hardware recommendations:
- Apple Silicon (M1/M2/M3/M4): Handles llama4 well with unified memory
- 16GB RAM: Use llama3.1:8b or mistral:7b
- 32GB+ RAM: Use mixtral:8x7b or llama3.3:70b
- 64GB+ RAM: Use llama4
- GPU acceleration (NVIDIA/AMD/Apple Silicon) provides 10-50x faster inference; Ollama detects and uses it automatically
atmos.yaml
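A sketch of a local Ollama provider entry; no API key is needed, and base_url can be omitted since it defaults to Ollama's local endpoint:

```yaml
# atmos.yaml -- local Ollama sketch
ai:
  default_provider: ollama
  providers:
    ollama:
      model: llama3.1:8b                  # choose per the hardware guidance above
      base_url: http://localhost:11434/v1 # Ollama's default endpoint
```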
Remote Deployment
To enable remote access on the server, set OLLAMA_HOST=0.0.0.0:11434 before starting Ollama.
Binding to 0.0.0.0 exposes the Ollama API to all network interfaces. Ensure the server is behind a firewall or reverse proxy with authentication in production environments.
- OLLAMA_HOST — Server bind address (default: 127.0.0.1:11434).
- OLLAMA_ORIGINS — CORS allowed origins (default: * — allow all).
- OLLAMA_MODELS — Model storage path (default: ~/.ollama/models).
atmos.yaml
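On the client side, point base_url at the remote server instead of localhost (the hostname below is a placeholder):

```yaml
# atmos.yaml -- remote Ollama client sketch
ai:
  providers:
    ollama:
      model: llama4
      base_url: http://ollama.internal:11434/v1  # replace with your server's address
```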
AWS Bedrock
Authentication uses the standard AWS SDK credential chain (IAM roles, profiles, environment variables):
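A sketch of a Bedrock provider entry (the provider key name bedrock is an assumption). Note there is no api_key, since credentials come from the AWS SDK chain, and for Bedrock base_url holds the region:

```yaml
# atmos.yaml -- AWS Bedrock sketch; authenticated via the AWS credential chain
ai:
  providers:
    bedrock:
      model: anthropic.claude-sonnet-4-6  # or a cross-region inference profile ID
      base_url: us-east-1                 # for Bedrock, base_url is the AWS region
```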
Use the global. prefix for dynamic routing across regions, or the anthropic. prefix for a specific region. Cross-region inference profiles use IDs like us.anthropic.claude-sonnet-4-6-v1:0.
Azure OpenAI
The model field must be your deployment name from Azure Portal, not the OpenAI model name. Both base_url (your resource endpoint) and api_version are required. See Azure OpenAI Documentation for available API versions.
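A sketch of an Azure OpenAI entry (the provider key name, resource name, deployment name, and environment variable are placeholders):

```yaml
# atmos.yaml -- Azure OpenAI sketch
ai:
  providers:
    azure:
      model: my-gpt4o-deployment                     # your deployment name, not the model name
      api_key: !env "AZURE_OPENAI_API_KEY"           # or use Azure AD authentication
      base_url: https://my-resource.openai.azure.com # your resource endpoint (required)
      api_version: 2025-04-01-preview                # required
```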
Token Caching
Token caching (prompt caching) reduces costs by reusing frequently-sent content like system prompts.
- Anthropic — 90% discount — Manual markers. Configuration required (see below).
- OpenAI — 50% discount — Automatic. No configuration required.
- Gemini — Free (included) — Automatic. No configuration required.
- Grok — 75% discount — Automatic. No configuration required.
- Bedrock — Up to 90% discount — Automatic. No configuration required.
- Azure OpenAI — 50-100% discount — Automatic. No configuration required.
- Ollama — N/A — Local processing; caching not applicable.
Most providers implement caching automatically. Only Anthropic requires explicit configuration:
atmos.yaml
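A sketch of the Anthropic cache settings, using the keys documented below (the placement of the cache block under the provider entry is an assumption):

```yaml
# atmos.yaml -- Anthropic prompt caching sketch
ai:
  providers:
    anthropic:
      model: claude-sonnet-4-6
      api_key: !env "ANTHROPIC_API_KEY"
      cache:
        enabled: true                    # default; cached tokens cost 90% less
        cache_system_prompt: true        # cache the skill system prompt
        cache_project_instructions: true # cache ATMOS.md content
```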
- cache.enabled — Enable or disable prompt caching (default: true). Cached tokens cost 90% less for Anthropic.
- cache.cache_system_prompt — Cache the skill system prompt across messages (default: true).
- cache.cache_project_instructions — Cache the ATMOS.md project instructions content (default: true).
For Anthropic, keep your ATMOS.md content stable during conversations and use named sessions for extended work on the same topic. For all other providers, caching works transparently with no action required.
Multi-Provider Configuration
Configure multiple providers and switch between them. Set default_provider for CLI commands, and press Ctrl+P during a chat session to switch mid-conversation.
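A sketch with two providers configured, where default_provider selects the one CLI commands use:

```yaml
# atmos.yaml -- multi-provider sketch
ai:
  default_provider: anthropic   # used by CLI commands
  providers:
    anthropic:
      model: claude-sonnet-4-6
      api_key: !env "ANTHROPIC_API_KEY"
    ollama:
      model: llama3.1:8b        # local alternative; switch with Ctrl+P in chat
```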
Provider Isolation
Each provider maintains its own completely isolated conversation history. When you switch providers, the new provider only sees messages from its own thread. When you switch back, it picks up right where you left off. UI notifications like "Switched to..." are never sent to any provider.
Security Best Practices
- Cloud Providers (Anthropic, OpenAI, Google, xAI) — Use environment variables for API keys (never commit to git), rotate keys regularly, and review each provider's data retention policies.
- Local Provider (Ollama) — All processing is local. Keep Ollama updated and use disk encryption for sensitive data.
- Enterprise Providers (Bedrock, Azure OpenAI) — Use IAM roles or managed identities instead of API keys, enable audit logging (CloudTrail or Azure Monitor), configure private endpoints, and use customer-managed encryption keys where available.
Related Documentation
- AI Configuration - Top-level AI settings
- MCP Server - Universal MCP integration
- Claude Code Integration - IDE integration with Claude Code