# AI Troubleshooting
Solutions for common Atmos AI issues, organized by symptom.
## Quick Checks
Before diving into specific errors, verify these basics:
```yaml
ai:
  enabled: true
  default_provider: "anthropic"
```
## Common Errors
### "AI features are not enabled"
Add `ai.enabled: true` to your `atmos.yaml`.
### "API key not found"
Export the key for your provider:

```shell
export ANTHROPIC_API_KEY="sk-ant-..."   # Anthropic: console.anthropic.com
export OPENAI_API_KEY="sk-..."          # OpenAI: platform.openai.com
export GEMINI_API_KEY="..."             # Gemini: aistudio.google.com
export XAI_API_KEY="xai-..."            # Grok: x.ai/api
export AZURE_OPENAI_API_KEY="..."       # Azure OpenAI: Azure Portal > OpenAI resource > Keys
```

AWS Bedrock: no API key needed. It uses AWS SDK credentials (IAM roles, profiles, environment variables):

```shell
aws sts get-caller-identity # Verify AWS credentials
```

Ollama: no API key needed. Just make sure the server is running:

```shell
curl http://localhost:11434/api/version
```
### "Failed to create AI client"
Usually means an invalid or revoked API key, or insufficient credits. Test the key directly:
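A sketch of testing outside Atmos, using Anthropic as the example. The `check_anthropic_key` helper is illustrative (not an Atmos command); the commented `curl` exercises the key against the live API:

```shell
# Sanity-check the key format locally before blaming the provider.
# check_anthropic_key is an illustrative helper, not part of Atmos.
check_anthropic_key() {
  case "$1" in
    sk-ant-*) echo "format ok" ;;
    *)        echo "unexpected format" ;;
  esac
}

check_anthropic_key "${ANTHROPIC_API_KEY:-}"

# Then exercise the key against the API itself (requires network access):
#   curl -s https://api.anthropic.com/v1/models \
#     -H "x-api-key: $ANTHROPIC_API_KEY" \
#     -H "anthropic-version: 2023-06-01"
# A 401 response here means the key itself is invalid or revoked.
```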
### "Unsupported AI provider"
Valid provider names: `anthropic`, `openai`, `gemini`, `grok`, `ollama`, `bedrock`, `azureopenai`.
### Rate Limiting (429 Errors)
Long conversations send full history with each request. Reduce it:
```yaml
ai:
  max_history_messages: 20
```
## Provider Issues
### Ollama
"Connection refused" — Start the server (`ollama serve`) and verify your config:
```yaml
providers:
  ollama:
    base_url: "http://localhost:11434/v1"
```
"Model not found" — Pull the model with `ollama pull <name>` and use the exact name shown by `ollama list` in your config.
Slow or out of memory — Use a smaller model (e.g. `llama3.1:8b` needs ~8GB RAM) or reduce `max_tokens`.
### AWS Bedrock
"Authentication failed" — Verify AWS credentials with `aws sts get-caller-identity`.
"Access denied" — Your IAM role needs `bedrock:InvokeModel` and `bedrock:InvokeModelWithResponseStream` permissions. Also enable model access in the AWS Console under Bedrock > Model access.
"Model not found" — Use the full Bedrock model ID and check your region:
```yaml
providers:
  bedrock:
    model: "anthropic.claude-sonnet-4-6"
    base_url: "us-east-1" # AWS region
```
### Azure OpenAI
"Authentication failed" — Get the key from Azure Portal > OpenAI resource > Keys and Endpoint.
"Resource not found" — Copy the exact endpoint URL from Azure Portal:
```yaml
providers:
  azureopenai:
    base_url: "https://your-resource-name.openai.azure.com"
```
"Deployment not found" — Use your deployment name (not the model name):
```yaml
providers:
  azureopenai:
    model: "gpt-4o" # Your deployment name
    api_version: "2025-04-01-preview"
```
List deployments with `az cognitiveservices account deployment list --name your-resource --resource-group your-rg`.
### Other Providers
- Anthropic — "Authentication error": key must start with `sk-ant-`.
- OpenAI — "Insufficient quota": add billing info at platform.openai.com.
- Grok — "Connection failed": set `base_url: "https://api.x.ai/v1"`.
See AI Providers for full provider documentation.
## Sessions
### Sessions Not Persisting
Enable sessions and use named sessions:
```yaml
ai:
  sessions:
    enabled: true
    path: ".atmos/sessions"
```
### Database Errors
If you get "Failed to initialize session storage", back up and recreate:
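A minimal sketch of the backup step, assuming the default `sessions.path` of `.atmos/sessions`; Atmos should recreate the store on the next run:

```shell
SESSIONS_DIR=".atmos/sessions"

if [ -d "$SESSIONS_DIR" ]; then
  mv "$SESSIONS_DIR" "$SESSIONS_DIR.bak"   # back up the old store; a fresh one is created next run
fi
```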
### Conversation Memory Not Working
- Use `--session name` — anonymous sessions don't persist
- Add `sessions.enabled: true` to `atmos.yaml`
- Check directory permissions: `chmod 755 .atmos/sessions`
All seven providers support conversation memory.
### Auto-Compact Not Working
Verify the required settings are all present:
```yaml
ai:
  max_history_messages: 50 # Required — threshold is calculated from this
  sessions:
    enabled: true
  auto_compact:
    enabled: true
    trigger_threshold: 0.75 # Triggers at 38 messages (50 x 0.75)
```
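Illustrative arithmetic only (not an Atmos command): the trigger point is the ceiling of `max_history_messages × trigger_threshold`, worked through here in integer shell math:

```shell
MAX_HISTORY=50
THRESHOLD_PCT=75   # trigger_threshold of 0.75, expressed as a percentage for integer math

# Ceiling division: ceil(50 * 0.75) = ceil(37.5) = 38
TRIGGER=$(( (MAX_HISTORY * THRESHOLD_PCT + 99) / 100 ))
echo "$TRIGGER"   # 38
```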
If `use_ai_summary: true` and no summaries appear, check that the AI provider is configured and the API key is valid. Test with `use_ai_summary: false` first.

Triggers too often? Increase `trigger_threshold` or `max_history_messages`.

Too expensive? Set `use_ai_summary: false` to use simple concatenation instead of AI summaries.
See Auto-Compact for details.
## MCP Server
### "MCP server is not enabled"
The MCP server is disabled by default and must be explicitly enabled:
```yaml
mcp:
  enabled: true
ai:
  enabled: true
  tools:
    enabled: true
```
### Not Starting
Test the MCP server directly by running `atmos mcp start` from a terminal.
For Claude Desktop, use the full path to `atmos`:
```json
{
  "mcpServers": {
    "atmos": {
      "command": "/usr/local/bin/atmos",
      "args": ["mcp", "start"]
    }
  }
}
```
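If the client still won't start the server, malformed JSON in the client config is a common culprit. A quick well-formedness check, assuming `python3` is available and using the standard macOS Claude Desktop config path (adjust for your platform):

```shell
# Default to the macOS Claude Desktop config location; override CONFIG elsewhere.
CONFIG="${CONFIG:-$HOME/Library/Application Support/Claude/claude_desktop_config.json}"

if python3 -m json.tool "$CONFIG" > /dev/null 2>&1; then
  echo "valid JSON: $CONFIG"
else
  echo "missing or malformed: $CONFIG"
fi
```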
### Tools Not Available
Restart the MCP client and check for errors with debug logging.
See MCP Server for the complete setup guide.
## LSP Integration
### Server Not Found
Install the language server and verify it is on your `PATH` (for YAML, e.g. `npm install -g yaml-language-server`, then `which yaml-language-server`).
If installed but not found, use the full path:
```yaml
lsp:
  servers:
    yaml-ls:
      command: "/usr/local/bin/yaml-language-server"
```
### Not Validating Files
Verify LSP is enabled and the server config matches your file types:
```yaml
lsp:
  enabled: true
  servers:
    yaml-ls:
      filetypes: ["yaml", "yml"]
```
See LSP Client for detailed configuration.
## Skills
### Skill Not Appearing
Press Ctrl+A in chat to open the skill selector. Skills need all three required fields:
```yaml
ai:
  skills:
    my-skill:
      display_name: "My Skill" # Required
      description: "..." # Required
      system_prompt: "..." # Required
```
Ctrl+A only works in the main chat view — press Esc first if you're in another panel.
### Skill Gives Generic Responses
The `system_prompt` needs more detail. Include a role definition, focus areas, and tool usage instructions:
```yaml
system_prompt: |
  You are a specialized Atmos stacks analyst.
  FOCUS: Stack configuration, dependency analysis.
  APPROACH: Use atmos_describe_component first, then recommend.
  Always use tools to gather data before answering.
```
### Tool Access Denied
The tool isn't in the skill's `allowed_tools`. Add it, or switch to the General skill (Ctrl+A), which has access to all tools.

Tool names use the `atmos_` prefix: `atmos_describe_component`, `atmos_list_stacks`, `atmos_validate_stacks`, etc. See AI Tools for the full list.
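A sketch of granting a skill an extra tool, assuming `allowed_tools` sits alongside the skill's other fields (check your Atmos version's skill schema):

```yaml
ai:
  skills:
    my-skill:
      allowed_tools:
        - atmos_describe_component
        - atmos_list_stacks
```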
## Claude Code Subagents
### Subagent Not Invoked
Check the file exists and has proper frontmatter:
```markdown
---
name: atmos-expert
description: Expert in Atmos infrastructure...
tools:
  - mcp__atmos__describe_stacks
model: inherit
---
```
Invoke with `@atmos-expert your question`.
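If invocation does nothing, first confirm the file is where Claude Code looks: project subagents conventionally live under `.claude/agents/` (user-level ones under `~/.claude/agents/`). The filename `atmos-expert.md` below matches the `name` field above:

```shell
# Check that the subagent definition exists at the expected path.
ls .claude/agents/atmos-expert.md 2>/dev/null || echo "subagent file not found"
```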
### Can't Access MCP Tools
Tool names need the `mcp__atmos__` prefix. The MCP server must be running, and the server name must match:
```yaml
tools:
  - mcp__atmos__describe_stacks # Correct
  - describe_stacks # Wrong
```
Restart Claude Desktop after changing subagent or MCP configuration.
See Claude Code Integration for the complete guide.
## Project Instructions
### AI Not Using Instructions
Verify instructions are enabled and the file exists:
```yaml
ai:
  instructions:
    enabled: true
    file: "ATMOS.md"
```
If the file doesn't exist, instructions are silently skipped. Create it manually.
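A minimal starter file; the content below is illustrative, so replace it with your project's actual guidance:

```shell
# Create a starter ATMOS.md so instructions stop being silently skipped.
cat > ATMOS.md <<'EOF'
# Project Instructions

Prefer concise answers. Component stacks live under stacks/.
EOF
```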
### Large File Performance
Keep `ATMOS.md` under 10KB. Remove outdated content to reduce tokens per request.
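A quick way to check the size against that guideline (a sketch; 10KB taken as 10240 bytes):

```shell
# Byte count of ATMOS.md; 0 if the file is missing.
SIZE=$(( $(wc -c < ATMOS.md 2>/dev/null || echo 0) ))

if [ "$SIZE" -gt 10240 ]; then
  echo "ATMOS.md is ${SIZE} bytes; consider trimming"
else
  echo "ATMOS.md is ${SIZE} bytes (within the 10KB guideline)"
fi
```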
## Debugging
When nothing else works:
1. Test network connectivity:

```shell
curl -sI https://api.anthropic.com | head -1                            # Anthropic
curl -sI https://api.openai.com | head -1                               # OpenAI
curl http://localhost:11434/api/version                                 # Ollama
aws bedrock list-foundation-models --region us-east-1 --max-results 1   # Bedrock
```

2. Try a minimal config to rule out config problems:
```yaml
ai:
  enabled: true
  default_provider: "anthropic"
```
## Performance
Slow responses — Try a faster model (`claude-haiku-4-5-20251001`, `gpt-5-mini`, `gemini-2.5-flash`), go local with Ollama, reduce `max_tokens`, or limit history with `max_history_messages: 20`.
Timeout errors — Break complex questions into smaller parts.
## Getting Help
- Review AI Configuration
- Search GitHub Issues
- Open a new issue with: Atmos version, provider/model, error message, and reproduction steps