Stream Fake Plugin Configuration Guide

Overview

Stream Fake Plugin solves timeout issues with non-streaming requests. When an AI model takes a long time to respond, a non-streaming request may time out while waiting for the complete response. The plugin avoids this by internally converting the non-streaming request into a streaming request, then reassembling the streamed response into the non-streaming format the client expects. Client compatibility is preserved because the client never sees the streaming exchange.

Features

  • Timeout Avoidance: Prevents request timeouts caused by long waits through streaming transmission
  • Transparent Conversion: Automatically converts non-streaming requests to streaming format, transparent to clients
  • Response Reconstruction: Collects all streaming data chunks and reconstructs them into complete non-streaming responses
  • Content Integrity: Ensures all content types are properly processed and aggregated:
    • Regular content
    • Reasoning content (for models that support thinking processes)
    • Tool calls and their proper merging
    • Log probabilities
  • Connection Keep-Alive: Maintains active connections through streaming transmission to avoid network timeouts
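
As an illustration of the "tool calls and their proper merging" item above, the sketch below merges streamed tool-call fragments by their index. It is not the plugin's actual code: the function name `merge_tool_call_deltas` is hypothetical, and the fragment layout (`index`, `id`, `function.name`, `function.arguments`) is assumed to follow the OpenAI-style streaming format.

```python
from typing import Any, Dict, Iterable, List


def merge_tool_call_deltas(deltas: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge streamed tool-call fragments into complete tool calls, keyed by index.

    Hypothetical helper; assumes OpenAI-style tool-call fragments.
    """
    merged: Dict[int, Dict[str, Any]] = {}
    for d in deltas:
        # Create an empty tool call the first time an index is seen.
        call = merged.setdefault(
            d["index"],
            {"id": "", "type": "function", "function": {"name": "", "arguments": ""}},
        )
        if d.get("id"):
            call["id"] = d["id"]
        fn = d.get("function") or {}
        if fn.get("name"):
            call["function"]["name"] = fn["name"]
        # Argument JSON arrives as string pieces; concatenate in arrival order.
        call["function"]["arguments"] += fn.get("arguments") or ""
    return [merged[i] for i in sorted(merged)]
```

The arguments string is only valid JSON once every fragment has been appended, so a consumer of this helper should parse it after the stream ends, not per fragment.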

Problems Solved

Primary Issue: Upstream Request Timeout

  • Long Response Timeout: When AI models generate long texts or complex responses, non-streaming requests are prone to timing out
  • Network Timeout: In unstable network environments, long waits for a complete response cause connection timeouts
  • Proxy Timeout: Proxy servers on the request path may close connections that carry no data for a prolonged period

Solution

Because the plugin streams internally, data keeps flowing on the upstream connection while the model generates, which avoids idle-connection timeouts. Clients still receive the expected non-streaming response format.

Use Cases

  1. Long Text Generation: Avoiding timeouts when generating long articles, reports, or code
  2. Complex Reasoning Tasks: Handling complex problems that require extended thinking time
  3. Unstable Network Environments: Environments with high latency or unstable networks
  4. Strict Timeout Restrictions: Clients or middleware with strict timeout limitations
  5. Legacy System Compatibility: Legacy systems where client timeout settings cannot be modified

How It Works

Problem Identification

  1. Detects non-streaming chat completion requests ("stream": false or not set)
  2. Identifies scenarios with long responses that may cause timeouts
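
The first detection step above can be sketched as a small check on the request body. This is a minimal Python sketch, not the plugin's implementation: the function name `is_non_streaming_request` is hypothetical, and it assumes an OpenAI-style JSON body in which `stream` is an optional boolean.

```python
import json


def is_non_streaming_request(raw_body: bytes) -> bool:
    """Return True when the JSON body does not ask for streaming.

    Hypothetical helper: a missing or null "stream" key is treated as false.
    """
    body = json.loads(raw_body)
    return not body.get("stream", False)
```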

Internal Conversion

  1. Modifies the request to streaming format ("stream": true)
  2. Forwards the modified request to upstream API
  3. Begins receiving streaming response data
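
The request rewrite in step 1 can be illustrated as follows. This is only a sketch under the assumption that the request body is plain JSON; `to_streaming_request` is a hypothetical name, and forwarding to the upstream API is left out.

```python
import json


def to_streaming_request(raw_body: bytes) -> bytes:
    """Return a copy of the request body with "stream" forced to true.

    Hypothetical helper; every other field is passed through unchanged.
    """
    body = json.loads(raw_body)
    body["stream"] = True
    return json.dumps(body).encode("utf-8")
```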

Response Processing

  1. Receives streaming data chunks in real-time, keeping connection active
  2. Aggregates all response content
  3. Processes different types of content fragments
  4. Reconstructs complete non-streaming response format
  5. Sets correct response headers and returns to client
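
The aggregation in steps 2 to 4 can be sketched as a fold over the received chunks. The code below is an assumption-laden illustration, not the plugin's code: `aggregate_chunks` is a hypothetical name, chunks are assumed to be OpenAI-style (`choices[].delta`), the `reasoning_content` field name is an assumption about how reasoning text is delivered, and only choice index 0 is handled.

```python
from typing import Any, Dict, Iterable, List


def aggregate_chunks(chunks: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold streaming chunks into one non-streaming style response.

    Hypothetical helper; assumes OpenAI-style chunks and a single choice.
    """
    content: List[str] = []
    reasoning: List[str] = []
    finish_reason = None
    meta: Dict[str, Any] = {}

    for chunk in chunks:
        # Keep top-level metadata from the first chunk that carries it.
        for key in ("id", "model", "created"):
            if key in chunk and key not in meta:
                meta[key] = chunk[key]
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue
            delta = choice.get("delta") or {}
            if delta.get("content"):
                content.append(delta["content"])
            # "reasoning_content" is an assumed field name for thinking output.
            if delta.get("reasoning_content"):
                reasoning.append(delta["reasoning_content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]

    message: Dict[str, Any] = {"role": "assistant", "content": "".join(content)}
    if reasoning:
        message["reasoning_content"] = "".join(reasoning)
    return {
        **meta,
        "object": "chat.completion",
        "choices": [{"index": 0, "message": message, "finish_reason": finish_reason}],
    }
```

Collecting fragments in lists and joining once at the end avoids repeated string copying on long responses.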

Timeout Avoidance Mechanism

  • Continuous Data Flow: Streaming responses keep data moving on the connection while the model generates
  • Connection Keep-Alive: Avoids disconnection due to prolonged periods without a response
  • Progressive Processing: Aggregates chunks as they arrive, so little work remains once the final chunk is received

Configuration Example

{
    "model": "gpt-4",
    "type": 1,
    "plugin": {
        "stream-fake": {
            "enable": true
        }
    }
}

Configuration Fields

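| Field  | Type | Required | Default | Description                                                  |
|--------|------|----------|---------|--------------------------------------------------------------|
| enable | bool | Yes      | false   | Whether to enable Stream Fake Plugin to avoid timeout issues |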
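
For illustration, reading the `enable` field from a configuration shaped like the example above might look like this. The helper name `stream_fake_enabled` is hypothetical, and how the plugin itself loads its configuration is not described here.

```python
from typing import Any, Dict


def stream_fake_enabled(model_config: Dict[str, Any]) -> bool:
    """Return the "enable" flag of the stream-fake plugin.

    Hypothetical helper; falls back to the documented default (false)
    when the plugin section or the field is missing.
    """
    plugin_cfg = (model_config.get("plugin") or {}).get("stream-fake") or {}
    return bool(plugin_cfg.get("enable", False))
```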

Timeout Scenario Examples

Scenario 1: Long Text Generation Timeout

Problem: A request to generate a 5000-word technical document is sent as a non-streaming request and times out after 60 seconds

Original Request:

{
    "model": "gpt-4",
    "messages": [
        {
            "role": "user",
            "content": "Please write a detailed 5000-word technical document introducing microservice architecture design principles and best practices"
        }
    ],
    "stream": false,
    "max_tokens": 4000
}

Plugin Processing:

  1. Automatically converts to "stream": true
  2. Receives response fragments in real-time, avoiding timeout
  3. Reconstructs into complete non-streaming response for return
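
Receiving the response fragments in step 2 could look like the sketch below, assuming the upstream API delivers fragments as server-sent events with `data:` lines and a final `data: [DONE]` sentinel, as OpenAI-style APIs do. The function name `iter_sse_chunks` is hypothetical, and multi-line `data:` events are not handled.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield parsed JSON chunks from SSE lines until the [DONE] sentinel.

    Hypothetical helper; assumes one JSON object per "data:" line.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)
```

Because the function is a generator, each fragment can be handed to the aggregation step as soon as it arrives.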

Scenario 2: Complex Reasoning Task Timeout

Problem: Complex mathematical problems require a long thinking time, causing the request to time out

Solution:

  • Plugin ensures connection remains active during model thinking process
  • No timeout occurs even with extended reasoning time
  • Client ultimately receives complete reasoning results

Performance Benefits

Timeout Avoidance

  • Eliminates Connection Timeouts: Streaming transmission keeps connections active
  • Avoids Proxy Timeouts: Intermediate proxies won't disconnect due to prolonged periods without data
  • Reduces Retry Attempts: Avoids request retries caused by timeouts

Response Time

  • Faster Perceived Response: Total generation time remains essentially the same, but requests no longer fail and get retried because of timeouts
  • Better User Experience: Avoids request failures and the need to reinitiate requests
  • Improved Resource Utilization: Reduces resource waste caused by timeouts