Stream Fake Plugin Configuration Guide
Overview
Stream Fake Plugin addresses timeouts on non-streaming requests. When an AI model takes a long time to respond, a non-streaming request may time out while waiting for the complete response. The plugin internally converts the non-streaming request into a streaming one, then reassembles the streamed response into the non-streaming format the client expects. Timeouts are avoided while client compatibility is preserved.
Features
- Timeout Avoidance: Prevents request timeouts caused by long waits through streaming transmission
- Transparent Conversion: Automatically converts non-streaming requests to streaming format, transparent to clients
- Response Reconstruction: Collects all streaming data chunks and reconstructs them into complete non-streaming responses
- Content Integrity: Ensures all content types are properly processed and aggregated:
  - Regular content
  - Reasoning content (for models that support thinking processes)
  - Tool calls and their proper merging
  - Log probabilities
- Connection Keep-Alive: Maintains active connections through streaming transmission to avoid network timeouts
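The tool-call merging mentioned above can be illustrated with a minimal sketch. It assumes OpenAI-style streamed fragments, where each fragment carries an `index` and the `function.arguments` string arrives in pieces. The function name `merge_tool_calls` is hypothetical and is not the plugin's actual code.

```python
def merge_tool_calls(deltas):
    """Merge streamed tool-call fragments, keyed by their `index` field.

    Assumption: fragments follow the OpenAI-style chunk layout.
    """
    merged = {}
    for delta in deltas:
        slot = merged.setdefault(
            delta["index"],
            {"id": "", "type": "function",
             "function": {"name": "", "arguments": ""}},
        )
        if delta.get("id"):
            slot["id"] = delta["id"]
        fn = delta.get("function", {})
        if fn.get("name"):
            slot["function"]["name"] = fn["name"]
        # Argument JSON arrives as string fragments; concatenate in order.
        slot["function"]["arguments"] += fn.get("arguments") or ""
    # Return the calls ordered by index, as a non-streaming response would.
    return [merged[i] for i in sorted(merged)]
```

Keying on `index` rather than `id` matters in this sketch because, in the assumed format, only the first fragment of a call carries the `id`.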
Problems Solved
Primary Issue: Upstream Request Timeout
- Long Response Timeout: When AI models generate long texts or complex responses, non-streaming requests are prone to timeout
- Network Timeout: In unstable network environments, long waits for complete responses cause connection timeouts
- Proxy Timeout: When going through proxy servers, proxies may disconnect due to prolonged periods without data
Solution
Through internal streaming transmission, connections remain active at all times, avoiding various timeout issues while clients still receive the expected non-streaming response format.
Use Cases
- Long Text Generation: Avoiding timeouts when generating long articles, reports, or code
- Complex Reasoning Tasks: Handling complex problems that require extended thinking time
- Unstable Network Environments: Environments with high latency or unstable networks
- Strict Timeout Restrictions: Clients or middleware with strict timeout limitations
- Legacy System Compatibility: Legacy systems where client timeout settings cannot be modified
How It Works
Problem Identification
- Detects non-streaming chat completion requests ("stream": false or not set)
- Identifies scenarios with long responses that may cause timeouts
Internal Conversion
- Modifies the request to streaming format ("stream": true)
- Forwards the modified request to upstream API
- Begins receiving streaming response data
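The detection and conversion steps can be sketched as follows. This is an illustration under assumptions, not the plugin's implementation: the request body is taken to be a JSON object, and the helper names `is_non_streaming` and `to_streaming` are hypothetical.

```python
import json


def is_non_streaming(body: bytes) -> bool:
    """Return True when the JSON request body does not ask for streaming."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False  # not valid JSON: leave the request untouched
    if not isinstance(payload, dict):
        return False
    # A missing "stream" key is treated the same as "stream": false.
    return payload.get("stream", False) is not True


def to_streaming(body: bytes) -> bytes:
    """Return a copy of the request body with "stream" forced to true."""
    payload = json.loads(body)
    payload["stream"] = True
    return json.dumps(payload).encode("utf-8")
```

A caller would check `is_non_streaming(body)` first and forward `to_streaming(body)` upstream only when it returns true. Requests that already stream pass through unchanged.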
Response Processing
- Receives streaming data chunks in real-time, keeping connection active
- Aggregates all response content
- Processes different types of content fragments
- Reconstructs complete non-streaming response format
- Sets correct response headers and returns to client
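The aggregation steps above can be sketched for the simplest case. The sketch assumes the upstream sends OpenAI-style server-sent events (`data: {...}` lines ending with `data: [DONE]`), a single choice per response, and a `reasoning_content` delta field for reasoning models. The function name `aggregate_sse` is hypothetical, and tool calls and log probabilities are omitted for brevity.

```python
import json
from typing import Iterable


def aggregate_sse(lines: Iterable[str]) -> dict:
    """Fold streamed `data:` chunk lines into one non-streaming response."""
    content, reasoning = [], []
    response = {"object": "chat.completion", "choices": []}
    finish_reason = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        # Copy top-level metadata from the first chunk that carries it.
        for key in ("id", "model", "created"):
            if key in chunk and key not in response:
                response[key] = chunk[key]
        if chunk.get("usage"):
            response["usage"] = chunk["usage"]
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                content.append(delta["content"])
            if delta.get("reasoning_content"):
                reasoning.append(delta["reasoning_content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    message = {"role": "assistant", "content": "".join(content)}
    if reasoning:
        message["reasoning_content"] = "".join(reasoning)
    response["choices"].append(
        {"index": 0, "message": message, "finish_reason": finish_reason}
    )
    return response
```

Collecting fragments in lists and joining once at the end avoids repeated string copies on long responses.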
Timeout Avoidance Mechanism
- Continuous Data Flow: Streamed chunks keep data moving on the connection while the model is generating
- Connection Keep-Alive: Avoids disconnection due to prolonged periods without response
- Progressive Processing: Chunks are processed as they arrive rather than after the full response is available
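A small simulation shows why continuous data flow helps. It assumes the timeout in question is an idle timeout, one that fires when no data arrives for a set period. The function name `hits_idle_timeout` and the timing values are illustrative only.

```python
def hits_idle_timeout(arrival_times, idle_timeout):
    """Return True if any gap between consecutive data arrivals exceeds the limit.

    arrival_times: seconds after the request was sent at which data arrived.
    """
    last = 0.0
    for t in arrival_times:
        if t - last > idle_timeout:
            return True
        last = t
    return False


# Non-streaming: the only data is the full response, arriving after 90 s.
print(hits_idle_timeout([90.0], idle_timeout=60.0))  # True
# Streaming: one chunk every 0.5 s over the same 90 s.
print(hits_idle_timeout([i * 0.5 for i in range(1, 181)], idle_timeout=60.0))  # False
```

Total generation time is the same in both cases. What changes is the longest silent gap on the connection.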
Configuration Example
{
"model": "gpt-4",
"type": 1,
"plugin": {
"stream-fake": {
"enable": true
}
}
}
Configuration Fields
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| enable | bool | Yes | false | Whether to enable Stream Fake Plugin to avoid timeout issues |
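As a rough illustration of how the `enable` field and its default might be read, the sketch below parses a configuration shaped like the example above. The function name `stream_fake_enabled` is hypothetical, and treating a missing field as `false` is an assumption based on the documented default.

```python
import json


def stream_fake_enabled(config_json: str) -> bool:
    """Return the "enable" flag of the stream-fake plugin, defaulting to False."""
    config = json.loads(config_json)
    plugin = config.get("plugin", {}).get("stream-fake", {})
    return bool(plugin.get("enable", False))
```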
Timeout Scenario Examples
Scenario 1: Long Text Generation Timeout
Problem: A non-streaming request to generate a 5000-word technical document times out after 60 seconds
Original Request:
{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "Please write a detailed 5000-word technical document introducing microservice architecture design principles and best practices"
}
],
"stream": false,
"max_tokens": 4000
}
Plugin Processing:
- Automatically converts to "stream": true
- Receives response fragments in real-time, avoiding timeout
- Reconstructs into complete non-streaming response for return
Scenario 2: Complex Reasoning Task Timeout
Problem: Complex mathematical problems require a long thinking time, causing the request to time out
Solution:
- Plugin ensures connection remains active during model thinking process
- No timeout occurs even with extended reasoning time
- Client ultimately receives complete reasoning results
Performance Benefits
Timeout Avoidance
- Eliminates Connection Timeouts: Streaming transmission keeps connections active
- Avoids Proxy Timeouts: Intermediate proxies won't disconnect due to prolonged periods without data
- Reduces Retry Attempts: Avoids request retries caused by timeouts
Response Time
- Faster Perceived Response: Total generation time remains essentially the same, but the time lost to timeout-triggered retries is avoided
- Better User Experience: Avoids request failures and the need to reinitiate requests
- Improved Resource Utilization: Reduces resource waste caused by timeouts