AIProxy now supports YAML configuration files for managing channels, model configurations, and system options.
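The configuration system follows this priority order (highest to lowest):
1. Environment variables
2. YAML configuration file
3. Database
This means:
- Values in the YAML file override values stored in the database.
- Environment variables override both.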
By default, AIProxy looks for config.yaml in the current working directory.
You can specify a custom location using the CONFIG_FILE_PATH environment variable:
export CONFIG_FILE_PATH=/path/to/your/config.yaml
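The lookup rule above can be sketched in a few lines of Python. This is an illustration of the documented behavior, not AIProxy's actual code; the function name `resolve_config_path` is hypothetical, and treating an empty `CONFIG_FILE_PATH` as unset is an assumption.

```python
import os
from pathlib import Path

DEFAULT_CONFIG_NAME = "config.yaml"


def resolve_config_path(environ=None, cwd=None):
    """Return the config file path implied by the documented lookup rule.

    CONFIG_FILE_PATH wins when it is set; otherwise config.yaml in the
    current working directory is used.
    """
    environ = os.environ if environ is None else environ
    cwd = Path.cwd() if cwd is None else Path(cwd)
    # Assumption: an empty or whitespace-only value counts as "not set".
    custom = environ.get("CONFIG_FILE_PATH", "").strip()
    if custom:
        return Path(custom)
    return cwd / DEFAULT_CONFIG_NAME
```

Calling `resolve_config_path()` with no arguments uses the real environment and working directory, which makes it a quick way to check which file a given shell session would point at.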
The YAML configuration file has three main sections. The channel and modelconfig structures directly correspond to the database model types, making it easy to understand and maintain.
Define your API provider channels:
channels:
  - name: "openai-primary"
    type_name: "openai"  # Human-readable type name (recommended)
    # OR use numeric type:
    # type: 1  # OpenAI channel type
    key: "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    base_url: "https://api.openai.com"
    models:
      - "gpt-4"
      - "gpt-3.5-turbo"
    model_mapping:
      "gpt-4": "gpt-4-0613"
    status: 1  # 1=Enabled, 2=Disabled
    priority: 0
    balance: 100.0
    balance_threshold: 10.0
    enabled_auto_balance_check: true
    sets:
      - "default"
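The channel entry above identifies its provider through either `type_name` or a numeric `type`. A minimal Python sketch of how such a lookup could work follows. It covers only the four numeric codes this document lists (1, 3, 14, 24). The function name, the case-insensitive matching, and the choice to prefer `type_name` when both fields are present are assumptions, not confirmed AIProxy behavior.

```python
# Only the numeric codes named in this document; the full table lives in
# the AIProxy source.
CHANNEL_TYPE_BY_NAME = {
    "openai": 1,
    "azure": 3,
    "anthropic": 14,
    "claude": 14,
    "gemini": 24,
    "google gemini": 24,
}


def resolve_channel_type(channel):
    """Return the numeric channel type for a channel mapping parsed from YAML."""
    name = channel.get("type_name")
    if name is not None:
        # Assumption: names are matched case-insensitively.
        key = str(name).strip().lower()
        if key not in CHANNEL_TYPE_BY_NAME:
            raise ValueError(f"unknown channel type_name: {name!r}")
        return CHANNEL_TYPE_BY_NAME[key]
    if "type" in channel:
        return int(channel["type"])
    raise ValueError("channel needs either type_name or type")
```

Failing loudly on an unknown name, rather than falling back to a default type, keeps a typo in `type_name` from silently routing traffic to the wrong provider.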
You can use either type_name (human-readable string) or type (numeric code). Using type_name is recommended for better readability:
Supported Type Names:
- openai: OpenAI API
- azure / azure2: Azure OpenAI
- anthropic / claude: Anthropic Claude
- gemini / google gemini: Google Gemini
- gemini-openai / google gemini (openai): Google Gemini via OpenAI API
- zhipu: Zhipu AI
- ali / aliyun: Alibaba Cloud
- baidu: Baidu Wenxin
- baiduv2 / baidu v2: Baidu Wenxin v2
- xunfei: iFlytek Spark
- tencent: Tencent Hunyuan
- moonshot: Moonshot AI
- deepseek: DeepSeek
- aws: AWS Bedrock
- vertexai / vertex: Google Vertex AI
- xai: xAI Grok
- groq: Groq
- mistral: Mistral AI
- cohere: Cohere
- openrouter: OpenRouter
(See core/model/yaml_integration.go for the complete list.)
Numeric Channel Types:
- 1: OpenAI
- 3: Azure
- 14: Anthropic/Claude
- 24: Google Gemini
See core/model/chtype.go for the complete list.
Define model-specific settings:
modelconfigs:
  - model: "gpt-4"
    owner: "openai"
    type_name: "chat"  # Human-readable type name (recommended)
    # OR use numeric type:
    # type: 1  # ChatCompletions
    rpm: 3500  # Requests per minute
    tpm: 80000  # Tokens per minute
    retry_times: 3
    timeout_config:
      request_timeout: 300
      stream_request_timeout: 600
    warn_error_rate: 0.5
    max_error_rate: 0.8
    price:
      input: 0.03  # Price per 1000 input tokens
      output: 0.06  # Price per 1000 output tokens
    config:
      max_context_tokens: 8192
      max_output_tokens: 4096
      vision: false
      tool_choice: true
  - model: "text-embedding-3-small"
    owner: "openai"
    type_name: "embedding"  # Embedding model
    rpm: 3000
    tpm: 1000000
    price:
      input: 0.00002
      output: 0
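The `price` fields above are expressed per 1000 tokens, so the cost of a request follows from simple arithmetic. The Python sketch below works through that calculation with the example values. The function name `estimate_cost` is hypothetical, and the result is in whatever currency unit the price fields use, which this document does not specify.

```python
def estimate_cost(price, input_tokens, output_tokens):
    """Estimate the cost of one request from per-1000-token prices."""
    per_1k_input = float(price.get("input", 0))
    per_1k_output = float(price.get("output", 0))
    return (input_tokens / 1000) * per_1k_input + (output_tokens / 1000) * per_1k_output


# Values copied from the modelconfigs example above.
gpt4_price = {"input": 0.03, "output": 0.06}
embedding_price = {"input": 0.00002, "output": 0}

# 2000 input tokens and 500 output tokens on gpt-4:
# 2 * 0.03 + 0.5 * 0.06, roughly 0.09
gpt4_cost = estimate_cost(gpt4_price, input_tokens=2000, output_tokens=500)

# One million input tokens on the embedding model: 1000 * 0.00002, roughly 0.02
embedding_cost = estimate_cost(embedding_price, input_tokens=1_000_000, output_tokens=0)
```

The results are floating-point values, so compare them with a tolerance rather than with exact equality.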
You can use either type_name (human-readable string) or type (numeric code). Using type_name is recommended for better readability:
Supported Type Names:
- chat / chatcompletions: Chat completion models
- completion / completions: Text completion models
- embedding / embeddings: Embedding models
- moderation / moderations: Moderation models
- image / imagegenerations: Image generation models
- imageedit / imageedits: Image editing models
- audio / speech / audiospeech: Text-to-speech models
- transcription / audiotranscription: Audio transcription models
- translation / audiotranslation: Audio translation models
- rerank: Reranking models
- pdf / parsepdf: PDF parsing models
- anthropic: Anthropic-specific models
(See core/model/yaml_integration.go for the complete list.)
Numeric Model Types:
- 1: ChatCompletions
- 2: Completions
- 3: Embeddings
- 4: Moderations
- 5: ImagesGenerations
See core/relay/mode/define.go for the complete list.
Common configuration keys:
- max_context_tokens: Maximum context window size
- max_output_tokens: Maximum output tokens
- vision: Whether the model supports vision/image inputs
- tool_choice: Whether the model supports function calling
Configure system-wide options:
options:
  # Log retention (in hours)
  LogStorageHours: "168"  # 7 days
  RetryLogStorageHours: "72"  # 3 days
  LogDetailStorageHours: "24"  # 1 day
  # Log settings
  SaveAllLogDetail: "false"
  LogDetailRequestBodyMaxSize: "10000"
  LogDetailResponseBodyMaxSize: "10000"
  # Rate limiting
  IPGroupsThreshold: "100"  # Requests per minute
  IPGroupsBanThreshold: "200"
  # Retry settings
  RetryTimes: "3"
  # Error rate alerts
  DefaultWarnNotifyErrorRate: "0.5"
  # Usage alerts
  UsageAlertThreshold: "100"
- LogStorageHours: How long to keep logs (hours)
- RetryLogStorageHours: How long to keep retry logs (hours)
- LogDetailStorageHours: How long to keep detailed logs (hours)
- CleanLogBatchSize: Batch size for log cleanup operations
- IPGroupsThreshold: Request rate limit per IP
- IPGroupsBanThreshold: Ban threshold for IP
- SaveAllLogDetail: Whether to save all request/response details
- LogDetailRequestBodyMaxSize: Max size of request body to log
- LogDetailResponseBodyMaxSize: Max size of response body to log
- DisableServe: Disable API serving (for maintenance)
- RetryTimes: Number of retry attempts
- DefaultChannelModels: Default models for new channels (JSON array)
- GroupMaxTokenNum: Max tokens per group
- DefaultWarnNotifyErrorRate: Default error rate warning threshold
- UsageAlertThreshold: Usage alert threshold
- FuzzyTokenThreshold: Fuzzy token matching threshold
See config.example.yaml for a complete example configuration file.
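Every value in the options example is a quoted string, and this document notes that option values are parsed according to their type. The Python sketch below shows one way such parsing could look. The `OPTION_KINDS` table is hypothetical: its kinds are inferred from the example values, not taken from AIProxy's source.

```python
# Hypothetical kind table, inferred from the example values in this document.
OPTION_KINDS = {
    "LogStorageHours": "int",
    "RetryTimes": "int",
    "SaveAllLogDetail": "bool",
    "DefaultWarnNotifyErrorRate": "float",
}


def parse_option(value, kind):
    """Convert one string option value to int, float, or bool."""
    if not isinstance(value, str):
        # An unquoted YAML scalar such as 168 arrives as an int, not a string.
        raise TypeError(f"option values must be strings, got {type(value).__name__}")
    text = value.strip()
    if kind == "int":
        return int(text)
    if kind == "float":
        return float(text)
    if kind == "bool":
        lowered = text.lower()
        if lowered in ("true", "false"):
            return lowered == "true"
        raise ValueError(f"not a boolean: {value!r}")
    raise ValueError(f"unsupported kind: {kind!r}")


def parse_options(options):
    """Parse every option whose kind is known; unknown keys are skipped."""
    return {
        key: parse_option(value, OPTION_KINDS[key])
        for key, value in options.items()
        if key in OPTION_KINDS
    }
```

The type check in `parse_option` illustrates why quoting matters: writing `LogStorageHours: 168` without quotes yields an integer from the YAML parser, which a strings-only options section would reject.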
Create a config.yaml file in the directory you start AIProxy from, or specify a custom location via CONFIG_FILE_PATH.
Start AIProxy as usual:
./aiproxy
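At startup, configuration is applied in this order: database values are loaded first, YAML values then override them, and environment variables override both.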
Changes to the YAML configuration file require restarting the application to take effect.
However, you can still use the web UI or API to modify configurations at runtime, which will be stored in the database.
You can extract your current database configuration and convert it to YAML format, following the structure shown in config.example.yaml.
All values in the options section must be strings (they will be parsed according to their type).
If the file is not picked up, check that it exists at the expected location (config.yaml in the current working directory, or the path set in the CONFIG_FILE_PATH environment variable).
Environment variables override YAML values through the config.SetXxx() functions, which check for environment variables on every call. Make sure you're using the correct environment variable names (see core/common/config/env.go for the list).
If you modify configuration through the web UI or API, those changes are written to the database. On the next restart, YAML overrides those database values again. Use YAML for persistent configuration and the web UI for temporary changes.
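The override behavior described in this document (environment variables over YAML, YAML over the database) can be modeled as a lookup across three layers. The Python sketch below is a simplified model, not AIProxy's implementation. It uses the same key in every layer, whereas real environment variable names differ (see core/common/config/env.go), and it assumes keys absent from YAML keep their database value.

```python
def effective_value(key, env, yaml_cfg, database):
    """Return (value, source) using env > YAML > database precedence."""
    layers = (("env", env), ("yaml", yaml_cfg), ("database", database))
    for source, layer in layers:
        if key in layer:
            return layer[key], source
    return None, "unset"


# A value changed through the web UI is stored in the database...
database = {"RetryTimes": "5", "GroupMaxTokenNum": "10"}
# ...but after a restart the YAML value takes precedence again.
yaml_cfg = {"RetryTimes": "3"}

after_restart = effective_value("RetryTimes", {}, yaml_cfg, database)
with_env = effective_value("RetryTimes", {"RetryTimes": "1"}, yaml_cfg, database)
db_only = effective_value("GroupMaxTokenNum", {}, yaml_cfg, database)
```

Here `after_restart` is `("3", "yaml")`, `with_env` is `("1", "env")`, and `db_only` is `("10", "database")`. Returning the source alongside the value makes it easier to see which layer a surprising setting came from.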