# AI Troubleshooting
Solutions for common Atmos AI issues, organized by symptom.
## Quick Checks
Before diving into specific errors, verify these basics:
```yaml
ai:
  enabled: true
  default_provider: "anthropic"
```
## Common Errors
### "AI features are not enabled"
Add `ai.enabled: true` to your `atmos.yaml`.
### "API key not found"
Export the key for your provider:

```shell
export ANTHROPIC_API_KEY="sk-ant-..."   # Anthropic: console.anthropic.com
export OPENAI_API_KEY="sk-..."          # OpenAI: platform.openai.com
export GEMINI_API_KEY="..."             # Gemini: aistudio.google.com
export XAI_API_KEY="xai-..."            # Grok: x.ai/api
export AZURE_OPENAI_API_KEY="..."       # Azure OpenAI: Azure Portal > OpenAI resource > Keys
```

AWS Bedrock: no API key needed. It uses AWS SDK credentials (IAM roles, profiles, environment variables):

```shell
aws sts get-caller-identity # Verify AWS credentials
```

Ollama: no API key needed. Just make sure the server is running:

```shell
curl http://localhost:11434/api/version
```
### "Failed to create AI client"
Usually means an invalid or revoked API key, or insufficient credits. Test the key directly:
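A sketch of testing outside Atmos, using Anthropic as the example. The `check_anthropic_key` helper is illustrative (not an Atmos command); the commented `curl` exercises the key against the live API:

```shell
# Sanity-check the key format locally before blaming the provider.
# check_anthropic_key is an illustrative helper, not part of Atmos.
check_anthropic_key() {
  case "$1" in
    sk-ant-*) echo "format ok" ;;
    *)        echo "unexpected format" ;;
  esac
}

check_anthropic_key "${ANTHROPIC_API_KEY:-}"

# Then exercise the key against the API itself (requires network access):
#   curl -s https://api.anthropic.com/v1/models \
#     -H "x-api-key: $ANTHROPIC_API_KEY" \
#     -H "anthropic-version: 2023-06-01"
# A 401 response here means the key itself is invalid or revoked.
```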
### "Unsupported AI provider"
Valid provider names: `anthropic`, `openai`, `gemini`, `grok`, `ollama`, `bedrock`, `azureopenai`.
### Rate Limiting (429 Errors)
Long conversations send full history with each request. Reduce it:
```yaml
ai:
  max_history_messages: 20
```
## Provider Issues
### Ollama
"Connection refused" — Start the server (`ollama serve`) and verify your config:
```yaml
providers:
  ollama:
    base_url: "http://localhost:11434/v1"
```
"Model not found" — Pull the model with `ollama pull <name>` and use the exact name shown by `ollama list` in your config.
Slow or out of memory — Use a smaller model (e.g. `llama3.1:8b` needs ~8GB RAM) or reduce `max_tokens`.
### AWS Bedrock
"Authentication failed" — Verify AWS credentials with `aws sts get-caller-identity`.
"Access denied" — Your IAM role needs `bedrock:InvokeModel` and `bedrock:InvokeModelWithResponseStream` permissions. Also enable model access in the AWS Console under Bedrock > Model access.
"Model not found" — Use the full Bedrock model ID and check your region:
```yaml
providers:
  bedrock:
    model: "anthropic.claude-sonnet-4-6"
    base_url: "us-east-1" # AWS region
```
### Azure OpenAI
"Authentication failed" — Get the key from Azure Portal > OpenAI resource > Keys and Endpoint.
"Resource not found" — Copy the exact endpoint URL from Azure Portal:
```yaml
providers:
  azureopenai:
    base_url: "https://your-resource-name.openai.azure.com"
```
"Deployment not found" — Use your deployment name (not the model name):
```yaml
providers:
  azureopenai:
    model: "gpt-4o" # Your deployment name
    api_version: "2025-04-01-preview"
```
List deployments with `az cognitiveservices account deployment list --name your-resource --resource-group your-rg`.
### Other Providers
- Anthropic — "Authentication error": key must start with `sk-ant-`.
- OpenAI — "Insufficient quota": add billing info at platform.openai.com.
- Grok — "Connection failed": set `base_url: "https://api.x.ai/v1"`.
See AI Providers for full provider documentation.
## Sessions
### Sessions Not Persisting
Enable sessions and use named sessions:
```yaml
ai:
  sessions:
    enabled: true
    path: ".atmos/sessions"
```
### Database Errors
If you get "Failed to initialize session storage", back up and recreate:
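A minimal sketch of the backup step, assuming the default `sessions.path` of `.atmos/sessions`; Atmos should recreate the store on the next run:

```shell
SESSIONS_DIR=".atmos/sessions"

if [ -d "$SESSIONS_DIR" ]; then
  mv "$SESSIONS_DIR" "$SESSIONS_DIR.bak"   # back up the old store; a fresh one is created next run
fi
```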
### Conversation Memory Not Working
- Use `--session name` — anonymous sessions don't persist
- Add `sessions.enabled: true` to `atmos.yaml`
- Check directory permissions: `chmod 755 .atmos/sessions`
All seven providers support conversation memory.
### Auto-Compact Not Working
Verify the required settings are all present:
```yaml
ai:
  max_history_messages: 50 # Required — threshold is calculated from this
  sessions:
    enabled: true
  auto_compact:
    enabled: true
    trigger_threshold: 0.75 # Triggers at 38 messages (50 x 0.75)
```
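Illustrative arithmetic only (not an Atmos command): the trigger point is the ceiling of `max_history_messages × trigger_threshold`, worked through here in integer shell math:

```shell
MAX_HISTORY=50
THRESHOLD_PCT=75   # trigger_threshold of 0.75, expressed as a percentage for integer math

# Ceiling division: ceil(50 * 0.75) = ceil(37.5) = 38
TRIGGER=$(( (MAX_HISTORY * THRESHOLD_PCT + 99) / 100 ))
echo "$TRIGGER"   # 38
```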
If `use_ai_summary: true` and no summaries appear, check that the AI provider is configured and the API key is valid. Test with `use_ai_summary: false` first.

Triggers too often? Increase `trigger_threshold` or `max_history_messages`.

Too expensive? Set `use_ai_summary: false` to use simple concatenation instead of AI summaries.
See Auto-Compact for details.
## MCP Server
### "MCP server is not enabled"
The MCP server is disabled by default and must be explicitly enabled:
```yaml
mcp:
  enabled: true
ai:
  enabled: true
  tools:
    enabled: true
```
### Not Starting
Test the MCP server directly by running `atmos mcp start` from a terminal.
For Claude Desktop, use the full path to `atmos`:
```json
{
  "mcpServers": {
    "atmos": {
      "command": "/usr/local/bin/atmos",
      "args": ["mcp", "start"]
    }
  }
}
```
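If the client still won't start the server, malformed JSON in the client config is a common culprit. A quick well-formedness check, assuming `python3` is available and using the standard macOS Claude Desktop config path (adjust for your platform):

```shell
# Default to the macOS Claude Desktop config location; override CONFIG elsewhere.
CONFIG="${CONFIG:-$HOME/Library/Application Support/Claude/claude_desktop_config.json}"

if python3 -m json.tool "$CONFIG" > /dev/null 2>&1; then
  echo "valid JSON: $CONFIG"
else
  echo "missing or malformed: $CONFIG"
fi
```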
### Tools Not Available
Restart the MCP client and check for errors with debug logging.
See MCP Server for the complete setup guide.
## LSP Integration
### Server Not Found
Install the language server and verify it is on your `PATH` (for YAML, e.g. `npm install -g yaml-language-server`, then `which yaml-language-server`).
If installed but not found, use the full path:
```yaml
lsp:
  servers:
    yaml-ls:
      command: "/usr/local/bin/yaml-language-server"
```
### Not Validating Files
Verify LSP is enabled and the server config matches your file types:
```yaml
lsp:
  enabled: true
  servers:
    yaml-ls:
      filetypes: ["yaml", "yml"]
```
See LSP Client for detailed configuration.
## Skills
### Skill Not Appearing
Press Ctrl+A in chat to open the skill selector. Skills need all three required fields:
```yaml
ai:
  skills:
    my-skill:
      display_name: "My Skill" # Required
      description: "..." # Required
      system_prompt: "..." # Required
```
Ctrl+A only works in the main chat view — press Esc first if you're in another panel.
### Skill Gives Generic Responses
The `system_prompt` needs more detail. Include a role definition, focus areas, and tool usage instructions:
```yaml
system_prompt: |
  You are a specialized Atmos stacks analyst.
  FOCUS: Stack configuration, dependency analysis.
  APPROACH: Use atmos_describe_component first, then recommend.
  Always use tools to gather data before answering.
```
### Tool Access Denied
The tool isn't in the skill's `allowed_tools`. Add it, or switch to the General skill (Ctrl+A), which has access to all tools.

Tool names use the `atmos_` prefix: `atmos_describe_component`, `atmos_list_stacks`, `atmos_validate_stacks`, etc. See AI Tools for the full list.
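A sketch of granting a skill an extra tool, assuming `allowed_tools` sits alongside the skill's other fields (check your Atmos version's skill schema):

```yaml
ai:
  skills:
    my-skill:
      allowed_tools:
        - atmos_describe_component
        - atmos_list_stacks
```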
## Claude Code Subagents
### Subagent Not Invoked
Check the file exists and has proper frontmatter:
```markdown
---
name: atmos-expert
description: Expert in Atmos infrastructure...
tools:
  - mcp__atmos__describe_stacks
model: inherit
---
```
Invoke with `@atmos-expert your question`.
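If invocation does nothing, first confirm the file is where Claude Code looks: project subagents conventionally live under `.claude/agents/` (user-level ones under `~/.claude/agents/`). The filename `atmos-expert.md` below matches the `name` field above:

```shell
# Check that the subagent definition exists at the expected path.
ls .claude/agents/atmos-expert.md 2>/dev/null || echo "subagent file not found"
```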
### Can't Access MCP Tools
Tool names need the `mcp__atmos__` prefix. The MCP server must be running, and the server name must match:
```yaml
tools:
  - mcp__atmos__describe_stacks # Correct
  - describe_stacks # Wrong
```
Restart Claude Desktop after changing subagent or MCP configuration.
See Claude Code Integration for the complete guide.
## Project Instructions
### AI Not Using Instructions
Verify instructions are enabled and the file exists:
```yaml
ai:
  instructions:
    enabled: true
    file: "ATMOS.md"
```
If the file doesn't exist, instructions are silently skipped. Create it manually.
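A minimal starter file; the content below is illustrative, so replace it with your project's actual guidance:

```shell
# Create a starter ATMOS.md so instructions stop being silently skipped.
cat > ATMOS.md <<'EOF'
# Project Instructions

Prefer concise answers. Component stacks live under stacks/.
EOF
```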
### Large File Performance
Keep `ATMOS.md` under 10KB. Remove outdated content to reduce tokens per request.
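A quick way to check the size against that guideline (a sketch; 10KB taken as 10240 bytes):

```shell
# Byte count of ATMOS.md; 0 if the file is missing.
SIZE=$(( $(wc -c < ATMOS.md 2>/dev/null || echo 0) ))

if [ "$SIZE" -gt 10240 ]; then
  echo "ATMOS.md is ${SIZE} bytes; consider trimming"
else
  echo "ATMOS.md is ${SIZE} bytes (within the 10KB guideline)"
fi
```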
## Debugging
When nothing else works:
1. Test network connectivity:

```shell
curl -sI https://api.anthropic.com | head -1                            # Anthropic
curl -sI https://api.openai.com | head -1                               # OpenAI
curl http://localhost:11434/api/version                                 # Ollama
aws bedrock list-foundation-models --region us-east-1 --max-results 1   # Bedrock
```

2. Try a minimal config to rule out config problems:
```yaml
ai:
  enabled: true
  default_provider: "anthropic"
```
## Performance
Slow responses — Try a faster model (`claude-haiku-4-5-20251001`, `gpt-5-mini`, `gemini-2.5-flash`), go local with Ollama, reduce `max_tokens`, or limit history with `max_history_messages: 20`.
Timeout errors — Break complex questions into smaller parts.
## Getting Help
- Review AI Configuration
- Search GitHub Issues
- Open a new issue with: Atmos version, provider/model, error message, and reproduction steps