# Stream Fake Plugin Configuration Guide

## Overview

Stream Fake Plugin is a specialized plugin designed to solve timeout issues with non-streaming requests. When AI models take a long time to respond, non-streaming requests may time out while waiting for the complete response. The plugin internally converts non-streaming requests to streaming requests, then reassembles the streaming response into non-streaming format for the client. This avoids timeouts while maintaining client compatibility.

## Features

- **Timeout Avoidance**: Prevents request timeouts caused by long waits through streaming transmission
- **Transparent Conversion**: Automatically converts non-streaming requests to streaming format, transparent to clients
- **Response Reconstruction**: Collects all streaming data chunks and reconstructs them into complete non-streaming responses
- **Content Integrity**: Ensures all content types are properly processed and aggregated:
  - Regular content
  - Reasoning content (for models that support thinking processes)
  - Tool calls and their proper merging
  - Log probabilities
- **Connection Keep-Alive**: Maintains active connections through streaming transmission to avoid network timeouts

## Problems Solved

### Primary Issue: Upstream Request Timeout

- **Long Response Timeout**: When AI models generate long texts or complex responses, non-streaming requests are prone to timing out
- **Network Timeout**: In unstable network environments, long waits for complete responses cause connection timeouts
- **Proxy Timeout**: When going through proxy servers, proxies may disconnect due to prolonged periods without data

### Solution

Through internal streaming transmission, connections remain active at all times, avoiding various timeout issues while clients still receive the expected non-streaming response format.

## Use Cases

1. **Long Text Generation**: Avoiding timeouts when generating long articles, reports, or code
2. **Complex Reasoning Tasks**: Handling complex problems that require extended thinking time
3. **Unstable Network Environments**: Environments with high latency or unstable networks
4. **Strict Timeout Restrictions**: Clients or middleware with strict timeout limitations
5. **Legacy System Compatibility**: Legacy systems where client timeout settings cannot be modified

## How It Works

### Problem Identification

1. Detects non-streaming chat completion requests (`"stream": false` or not set)
2. Identifies scenarios with long responses that may cause timeouts

### Internal Conversion

1. Modifies the request to streaming format (`"stream": true`)
2. Forwards the modified request to the upstream API
3. Begins receiving streaming response data

### Response Processing

1. Receives streaming data chunks in real time, keeping the connection active
2. Aggregates all response content
3. Processes different types of content fragments
4. Reconstructs the complete non-streaming response format
5. Sets correct response headers and returns the response to the client

### Timeout Avoidance Mechanism

- **Continuous Data Flow**: Streaming responses ensure connections always have data transmission
- **Connection Keep-Alive**: Avoids disconnection due to prolonged periods without response
- **Progressive Processing**: Receives and processes simultaneously, reducing overall wait time

## Configuration Example

```json
{
  "model": "gpt-4",
  "type": 1,
  "plugin": {
    "stream-fake": {
      "enable": true
    }
  }
}
```

## Configuration Fields

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `enable` | bool | Yes | false | Whether to enable Stream Fake Plugin to avoid timeout issues |

## Timeout Scenario Examples

### Scenario 1: Long Text Generation Timeout

**Problem**: A request to generate a 5000-word technical document is sent as a non-streaming request and times out after 60 seconds

**Original Request**:

```json
{
  "model": "gpt-4",
  "messages": [
    {
      "role": "user",
      "content": "Please write a detailed 5000-word technical document introducing microservice architecture design principles and best practices"
    }
  ],
  "stream": false,
  "max_tokens": 4000
}
```

**Plugin Processing**:

1. Automatically converts to `"stream": true`
2. Receives response fragments in real time, avoiding timeout
3. Reconstructs them into a complete non-streaming response for return

### Scenario 2: Complex Reasoning Task Timeout

**Problem**: Complex mathematical problems require long thinking time, causing request timeout

**Solution**:

- Plugin ensures the connection remains active during the model's thinking process
- No timeout occurs even with extended reasoning time
- Client ultimately receives complete reasoning results

## Performance Benefits

### Timeout Avoidance

- **Eliminates Connection Timeouts**: Streaming transmission keeps connections active
- **Avoids Proxy Timeouts**: Intermediate proxies won't disconnect due to prolonged periods without data
- **Reduces Retry Attempts**: Avoids request retries caused by timeouts

### Response Time

- **Faster Perceived Response**: While total time remains essentially the same, timeout retries are avoided
- **Better User Experience**: Avoids request failures and the need to reinitiate requests
- **Improved Resource Utilization**: Reduces resource waste caused by timeouts
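
To make the Response Processing steps under How It Works more concrete, the following is a minimal Python sketch of how streamed chunks could be folded back into a single non-streaming response. It is not the plugin's actual implementation. It assumes the upstream returns OpenAI-compatible server-sent events (`data: {...}` lines ending with `data: [DONE]`) whose chunks carry `choices[].delta.content`, `delta.reasoning_content`, and `delta.tool_calls`; the function name `aggregate_sse` is hypothetical, and log probabilities and multiple choices are left out for brevity.

```python
import json
from typing import Any, Dict, Iterable, List


def aggregate_sse(lines: Iterable[str]) -> Dict[str, Any]:
    """Fold OpenAI-style streaming chunks into one non-streaming response.

    Hypothetical helper: the chunk layout is an assumption, not the
    plugin's documented wire format.
    """
    content: List[str] = []
    reasoning: List[str] = []
    tool_calls: Dict[int, Dict[str, Any]] = {}
    meta: Dict[str, Any] = {}
    finish_reason = None
    usage = None

    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)

        # Keep the first id/model/created values seen in the stream.
        for key in ("id", "model", "created"):
            if key in chunk and key not in meta:
                meta[key] = chunk[key]
        if chunk.get("usage"):
            usage = chunk["usage"]

        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # this sketch handles a single choice only
            delta = choice.get("delta") or {}
            if delta.get("content"):
                content.append(delta["content"])
            if delta.get("reasoning_content"):
                reasoning.append(delta["reasoning_content"])
            # Tool-call fragments are merged by their index.
            for tc in delta.get("tool_calls") or []:
                slot = tool_calls.setdefault(
                    tc.get("index", 0),
                    {"id": None, "type": "function",
                     "function": {"name": "", "arguments": ""}},
                )
                if tc.get("id"):
                    slot["id"] = tc["id"]
                fn = tc.get("function") or {}
                slot["function"]["name"] += fn.get("name") or ""
                slot["function"]["arguments"] += fn.get("arguments") or ""
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]

    message: Dict[str, Any] = {"role": "assistant", "content": "".join(content)}
    if reasoning:
        message["reasoning_content"] = "".join(reasoning)
    if tool_calls:
        message["tool_calls"] = [tool_calls[i] for i in sorted(tool_calls)]

    response = dict(meta)
    response["object"] = "chat.completion"
    response["choices"] = [
        {"index": 0, "message": message, "finish_reason": finish_reason}
    ]
    if usage is not None:
        response["usage"] = usage
    return response


# Usage: two content fragments followed by the end-of-stream marker.
sample = [
    'data: {"id":"c1","model":"gpt-4","choices":[{"index":0,"delta":{"role":"assistant","content":"Hel"}}]}',
    'data: {"id":"c1","model":"gpt-4","choices":[{"index":0,"delta":{"content":"lo"},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
result = aggregate_sse(sample)
print(result["choices"][0]["message"]["content"])  # prints "Hello"
```

Tool-call fragments are keyed by `index` because, in OpenAI-style streams, a call's `arguments` string typically arrives split across several chunks; concatenating per index is one straightforward way to merge them before the final response is built.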