Stream Fake Plugin Configuration Guide

Overview

Stream Fake Plugin addresses timeouts on non-streaming requests. When an AI model takes a long time to respond, a non-streaming request may time out while waiting for the complete response. The plugin internally converts the non-streaming request to a streaming request, then reassembles the streamed response into non-streaming format for the client. This avoids the timeout while keeping the client-facing behavior unchanged.

Features

Timeout Avoidance: Uses streaming transmission to prevent request timeouts caused by long waits
Transparent Conversion: Automatically converts non-streaming requests to streaming format, transparent to clients
Response Reconstruction: Collects all streaming data chunks and reconstructs them into complete non-streaming responses
Content Integrity: Ensures all content types are properly processed and aggregated:
- Regular content
- Reasoning content (for models that support thinking processes)
- Tool calls and their proper merging
- Log probabilities
Connection Keep-Alive: Maintains active connections through streaming transmission to avoid network timeouts
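
As an illustration of the tool-call merging listed above, here is a minimal sketch. It assumes the upstream emits OpenAI-style streaming deltas, where fragments of the same tool call share an `index` and the `function.arguments` string arrives in pieces. The function name `merge_tool_call_deltas` is hypothetical and is not taken from the plugin's source.

```python
def merge_tool_call_deltas(deltas):
    """Merge streamed tool-call fragments into complete tool calls.

    Assumption: OpenAI-style deltas, where fragments of one call share an
    `index` and `function.arguments` arrives as string pieces.
    """
    merged = {}
    for delta in deltas:
        call = merged.setdefault(
            delta["index"],
            {"id": "", "type": "function",
             "function": {"name": "", "arguments": ""}},
        )
        if delta.get("id"):
            call["id"] = delta["id"]
        if delta.get("type"):
            call["type"] = delta["type"]
        fn = delta.get("function") or {}
        if fn.get("name"):
            call["function"]["name"] = fn["name"]
        # Argument fragments are concatenated in arrival order.
        call["function"]["arguments"] += fn.get("arguments") or ""
    # Return the calls ordered by their stream index.
    return [merged[i] for i in sorted(merged)]
```

Only once every fragment has been appended does `arguments` hold a complete JSON string, which is why the merge has to finish before the non-streaming response is built.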

Problems Solved

Primary Issue: Upstream Request Timeout

Long Response Timeout: When AI models generate long texts or complex responses, non-streaming requests are prone to timing out
Network Timeout: In unstable network environments, long waits for complete responses cause connection timeouts
Proxy Timeout: When going through proxy servers, proxies may disconnect due to prolonged periods without data

Solution

Because the plugin streams internally, the upstream connection keeps carrying data and is not dropped as idle, while clients still receive the expected non-streaming response format.

Use Cases

Long Text Generation: Avoiding timeouts when generating long articles, reports, or code
Complex Reasoning Tasks: Handling complex problems that require extended thinking time
Unstable Network Environments: Environments with high latency or unstable networks
Strict Timeout Restrictions: Clients or middleware with strict timeout limitations
Legacy System Compatibility: Legacy systems where client timeout settings cannot be modified

How It Works

Problem Identification

Detects non-streaming chat completion requests ("stream": false or not set)
Identifies scenarios with long responses that may cause timeouts

Internal Conversion

Modifies the request to streaming format ("stream": true)
Forwards the modified request to upstream API
Begins receiving streaming response data
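
The detection and conversion steps can be sketched roughly as follows. This is an illustration rather than the plugin's actual code: the function name `to_streaming_request` is hypothetical, and the request body is assumed to be a JSON object in the OpenAI chat completion format.

```python
import json


def to_streaming_request(raw_body: bytes):
    """Return (body, converted).

    `converted` is True when the incoming request was non-streaming
    ("stream" false or missing) and has been rewritten to "stream": true.
    """
    payload = json.loads(raw_body)
    if payload.get("stream"):
        # Already a streaming request: pass it through untouched.
        return raw_body, False
    payload["stream"] = True
    return json.dumps(payload).encode("utf-8"), True
```

The `converted` flag tells the response path whether the streamed reply needs to be reassembled or can be passed through as-is.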

Response Processing

Receives streaming data chunks in real-time, keeping connection active
Aggregates all response content
Processes different types of content fragments
Reconstructs complete non-streaming response format
Sets correct response headers and returns to client
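
A minimal sketch of the aggregation step, under stated assumptions: chunks follow the OpenAI `chat.completion.chunk` shape, there is a single choice, and reasoning text arrives in a `reasoning_content` delta field (a name used by some providers, assumed here). Tool calls and log probabilities are left out for brevity. The function name `aggregate_chunks` is hypothetical.

```python
def aggregate_chunks(chunks):
    """Fold streaming chunks into one non-streaming chat completion."""
    response = {"object": "chat.completion", "choices": []}
    message = {"role": "assistant", "content": ""}
    finish_reason = None
    for chunk in chunks:
        # Top-level metadata is taken from the first chunk that carries it.
        for key in ("id", "model", "created"):
            if key in chunk:
                response.setdefault(key, chunk[key])
        if chunk.get("usage"):
            response["usage"] = chunk["usage"]
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            message["content"] += delta.get("content") or ""
            if delta.get("reasoning_content"):
                message["reasoning_content"] = (
                    message.get("reasoning_content", "")
                    + delta["reasoning_content"]
                )
            finish_reason = choice.get("finish_reason") or finish_reason
    response["choices"].append(
        {"index": 0, "message": message, "finish_reason": finish_reason}
    )
    return response
```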

Timeout Avoidance Mechanism

Continuous Data Flow: Streaming responses keep data flowing on the connection for the whole generation
Connection Keep-Alive: Avoids disconnection due to prolonged periods without response
Progressive Processing: Processes chunks as they arrive rather than waiting for the full response
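
The mechanism depends on the timeout being an idle timeout (measured between data arrivals) rather than a deadline on total duration. The sketch below illustrates the difference with made-up figures: a 90-second generation, a chunk every 2 seconds, and a 60-second idle timeout. All names and numbers are illustrative assumptions.

```python
def survives_idle_timeout(arrival_times, idle_timeout):
    """Return True if no gap between data arrivals exceeds idle_timeout.

    arrival_times are seconds since the request was sent, ascending.
    """
    previous = 0.0
    for t in arrival_times:
        if t - previous > idle_timeout:
            return False
        previous = t
    return True


# Non-streaming: the only data arrives when the full response is ready.
non_streaming = [90.0]
# Streaming: a chunk arrives every 2 s during the same 90 s generation.
streaming = [float(t) for t in range(2, 92, 2)]
```

With a 60-second idle timeout, `survives_idle_timeout(non_streaming, 60.0)` is `False` while `survives_idle_timeout(streaming, 60.0)` is `True`. If the timeout were a cap on total request duration instead, streaming alone would not help.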

Configuration Example

{
    "model": "gpt-4",
    "type": 1,
    "plugin": {
        "stream-fake": {
            "enable": true
        }
    }
}

Configuration Fields

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `enable` | bool | Yes | false | Whether to enable Stream Fake Plugin to avoid timeout issues |
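
To show how the `enable` default applies, here is a small sketch that reads the configuration shape shown above. The helper name `stream_fake_enabled` is hypothetical; how the plugin itself loads its configuration is not described here.

```python
def stream_fake_enabled(model_config: dict) -> bool:
    """Return True only when plugin.stream-fake.enable is set to true.

    Missing sections or a missing `enable` field fall back to the
    documented default of false.
    """
    plugin = (model_config.get("plugin") or {}).get("stream-fake") or {}
    return bool(plugin.get("enable", False))
```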

Timeout Scenario Examples

Scenario 1: Long Text Generation Timeout

Problem: A non-streaming request to generate a 5000-word technical document times out after 60 seconds

Original Request:

{
    "model": "gpt-4",
    "messages": [
        {
            "role": "user",
            "content": "Please write a detailed 5000-word technical document introducing microservice architecture design principles and best practices"
        }
    ],
    "stream": false,
    "max_tokens": 4000
}

Plugin Processing:

Automatically converts to "stream": true
Receives response fragments in real-time, avoiding timeout
Reconstructs into complete non-streaming response for return
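
For the "receives response fragments" step, a rough sketch of reading the stream is shown below. It assumes the upstream uses server-sent events in the OpenAI-compatible form, with one JSON chunk per `data:` line and a final `data: [DONE]` marker. The function name `parse_sse_chunks` is hypothetical.

```python
import json


def parse_sse_chunks(lines):
    """Yield JSON chunks from OpenAI-style SSE lines, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        yield json.loads(data)
```

Each yielded chunk can then be folded into the response that is eventually returned to the client in non-streaming form.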

Scenario 2: Complex Reasoning Task Timeout

Problem: Complex mathematical problems require long thinking time, causing the request to time out

Solution:

Plugin ensures connection remains active during model thinking process
No timeout occurs even with extended reasoning time
Client ultimately receives complete reasoning results

Performance Benefits

Timeout Avoidance

Eliminates Connection Timeouts: Streaming transmission keeps connections active
Avoids Proxy Timeouts: Intermediate proxies won't disconnect due to prolonged periods without data
Reduces Retry Attempts: Avoids request retries caused by timeouts

Response Time

Faster Perceived Response: While total time remains essentially the same, timeout retries are avoided
Better User Experience: Avoids request failures and the need to reinitiate requests
Improved Resource Utilization: Reduces resource waste caused by timeouts
