System Architecture: claude-code-hub

Date: 2025-11-29 Architect: ding Version: 1.0 Project Type: web-app Project Level: 4 Status: Draft

Document Overview

This document defines the system architecture for Claude Code Hub. It provides the technical blueprint for implementation, addressing all functional and non-functional requirements from the PRD.

Related Documents:

Product Requirements Document: docs/prd-claude-code-hub-2025-11-29.md
Product Brief: docs/product-brief-claude-code-hub-2025-11-29.md

Executive Summary

Claude Code Hub (CCH) is an intelligent AI API proxy platform built on a Modular Monolith architecture using Next.js 15 with App Router and Hono. The system provides multi-provider management, intelligent load balancing, circuit breaker patterns, and comprehensive monitoring for AI coding tools like Claude Code and Codex.

The architecture emphasizes:

High Availability through multi-provider failover and circuit breakers
Low Latency through optimized proxy pipeline and Redis caching
Observability through comprehensive request logging and real-time dashboards
Extensibility through format converters and pluggable provider types

Architectural Drivers

These requirements heavily influence architectural decisions:

Critical Drivers (Must Address)

NFR	Requirement	Architectural Impact
NFR-001	Proxy latency < 50ms overhead	Requires in-memory caching, connection pooling, streaming optimization
NFR-002	100+ concurrent sessions	Requires stateless design, Redis for session state, connection limits
NFR-003	Reliable streaming	Requires proper backpressure handling, chunked transfer, timeout management
NFR-007	High availability	Requires multi-provider failover, circuit breaker, Redis fail-open strategy

Important Drivers (Should Address)

NFR	Requirement	Architectural Impact
NFR-004	Secure authentication	API key hashing, constant-time comparison, rate limiting
NFR-005	Data protection	TLS everywhere, key encryption at rest, masked logging
NFR-008	Horizontal scalability	Stateless app design, shared Redis, connection pooling
NFR-011	Code quality	TypeScript strict mode, Drizzle ORM for type safety

System Overview

High-Level Architecture

Claude Code Hub follows a Modular Monolith pattern with clear internal boundaries:

┌─────────────────────────────────────────────────────────────────────────────┐
│                              CLIENT LAYER                                    │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐        │
│  │ Claude Code │  │  Codex CLI  │  │ Cursor IDE  │  │  Admin UI   │        │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘        │
└─────────┼────────────────┼────────────────┼────────────────┼────────────────┘
          │                │                │                │
          ▼                ▼                ▼                ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                           API GATEWAY LAYER                                  │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    Next.js 15 + Hono Router                          │    │
│  │  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐         │    │
│  │  │ /v1/      │  │ /api/     │  │ /settings │  │ /dashboard│         │    │
│  │  │ messages  │  │ actions   │  │ (UI)      │  │ (UI)      │         │    │
│  │  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘         │    │
│  └────────┼──────────────┼──────────────┼──────────────┼───────────────┘    │
└───────────┼──────────────┼──────────────┼──────────────┼────────────────────┘
            │              │              │              │
            ▼              ▼              ▼              ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         GUARD PIPELINE LAYER                                 │
│  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐         │
│  │  Auth  │→│Version │→│ Probe  │→│Session │→│Sensitive│→│  Rate  │→...    │
│  │ Guard  │ │ Guard  │ │Handler │ │ Guard  │ │ Guard  │ │ Limit  │         │
│  └────────┘ └────────┘ └────────┘ └────────┘ └────────┘ └────────┘         │
└─────────────────────────────────────────────────────────────────────────────┘
            │
            ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                        PROXY CORE LAYER                                      │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐              │
│  │ Provider        │  │ Format          │  │ Response        │              │
│  │ Selector        │  │ Converter       │  │ Handler         │              │
│  │ (weighted LB)   │  │ (Claude↔OpenAI) │  │ (streaming)     │              │
│  └────────┬────────┘  └────────┬────────┘  └────────┬────────┘              │
│           │                    │                    │                        │
│  ┌────────┴────────────────────┴────────────────────┴────────┐              │
│  │                    Circuit Breaker                         │              │
│  │  ┌─────────┐  ┌─────────┐  ┌─────────┐                    │              │
│  │  │ CLOSED  │←→│  OPEN   │←→│HALF-OPEN│                    │              │
│  │  └─────────┘  └─────────┘  └─────────┘                    │              │
│  └────────────────────────────────────────────────────────────┘              │
└─────────────────────────────────────────────────────────────────────────────┘
            │
            ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                       UPSTREAM PROVIDERS                                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐      │
│  │ Anthropic│  │  OpenAI  │  │  Gemini  │  │  Relay   │  │  Custom  │      │
│  │ (claude) │  │ (codex)  │  │ (gemini) │  │(claude-  │  │(openai-  │      │
│  │          │  │          │  │          │  │  auth)   │  │compatible│      │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘  └──────────┘      │
└─────────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────────┐
│                         DATA LAYER                                           │
│  ┌─────────────────────────┐      ┌─────────────────────────┐               │
│  │       PostgreSQL        │      │          Redis          │               │
│  │  ┌─────┐ ┌─────┐ ┌────┐│      │  ┌─────┐ ┌─────┐ ┌────┐│               │
│  │  │users│ │keys │ │prov││      │  │sess │ │rate │ │circ││               │
│  │  ├─────┤ ├─────┤ ├────┤│      │  │ions │ │limit│ │uit ││               │
│  │  │msgs │ │rules│ │conf││      │  └─────┘ └─────┘ └────┘│               │
│  │  └─────┘ └─────┘ └────┘│      └─────────────────────────┘               │
│  └─────────────────────────┘                                                 │
└─────────────────────────────────────────────────────────────────────────────┘

Architecture Diagram

flowchart TB
    subgraph Clients["Client Layer"]
        CC[Claude Code CLI]
        CX[Codex CLI]
        CU[Cursor IDE]
        WEB[Admin Dashboard]
    end

    subgraph Gateway["API Gateway (Next.js + Hono)"]
        V1["/v1/* Proxy Endpoints"]
        API["/api/actions/* Server Actions"]
        UI["React Admin UI"]
    end

    subgraph Guards["Guard Pipeline"]
        AUTH[Auth Guard]
        VER[Version Guard]
        SESS[Session Guard]
        SENS[Sensitive Guard]
        RATE[Rate Limit Guard]
        PROV[Provider Selector]
    end

    subgraph Core["Proxy Core"]
        FMT[Format Converter]
        FWD[Request Forwarder]
        RSP[Response Handler]
        CB[Circuit Breaker]
    end

    subgraph Providers["Upstream Providers"]
        ANT[Anthropic API]
        OAI[OpenAI API]
        GEM[Gemini API]
        RELAY[Relay Services]
    end

    subgraph Data["Data Layer"]
        PG[(PostgreSQL)]
        RD[(Redis)]
    end

    CC & CX & CU --> V1
    WEB --> UI
    UI --> API

    V1 --> AUTH --> VER --> SESS --> SENS --> RATE --> PROV
    PROV --> FMT --> FWD --> RSP
    FWD <--> CB
    CB --> ANT & OAI & GEM & RELAY

    AUTH & RATE --> RD
    SESS --> RD
    CB --> RD
    API --> PG
    RSP --> PG

Architectural Pattern

Pattern: Modular Monolith with Guard Pipeline

Rationale:

Simplicity: Single deployment unit reduces operational complexity
Performance: In-process communication minimizes latency (critical for proxy)
Coherence: Next.js provides unified frontend/backend development experience
Scalability: Stateless design with Redis allows horizontal scaling when needed
Guard Pipeline: Clean separation of cross-cutting concerns (auth, rate limiting, session)

Why Not Microservices:

Proxy latency requirement (< 50ms) makes network hops expensive
Single team (open-source maintainers) doesn't need service autonomy
Operational complexity of distributed systems not justified for this scale

Technology Stack

Frontend

Choice: Next.js 15 (App Router) + React 19 + Tailwind CSS + shadcn/ui

Rationale:

Next.js 15: Latest stable with App Router, Server Components, and Server Actions
React 19: Latest features for optimal performance
Tailwind CSS: Utility-first CSS for rapid UI development
shadcn/ui: High-quality, accessible component library

Trade-offs:

✓ Gain: Full-stack TypeScript, SSR, optimal DX
✗ Lose: Heavier than pure SPA for simple UIs

Key Libraries:

next-intl - Internationalization (i18n)
zustand - Client state management
recharts - Dashboard charts
lucide-react - Icon library

Backend

Choice: Hono + Next.js API Routes + Server Actions

Rationale:

Hono: Ultra-fast, lightweight routing (critical for proxy performance)
Next.js API Routes: Seamless integration with frontend
Server Actions: Type-safe RPC for admin operations

Trade-offs:

✓ Gain: Sub-millisecond routing, TypeScript end-to-end
✗ Lose: Less conventional than Express/Fastify

Key Libraries:

undici - HTTP client for upstream requests (faster than node-fetch)
zod - Runtime schema validation
next-safe-action - Type-safe Server Actions with OpenAPI generation

Database

Choice: PostgreSQL + Drizzle ORM

Rationale:

PostgreSQL: Robust ACID compliance, JSON support, excellent indexing
Drizzle ORM: Type-safe, SQL-first ORM with minimal overhead

Trade-offs:

✓ Gain: Full SQL power, type safety, relational integrity
✗ Lose: Requires PostgreSQL infrastructure (not serverless-native)

Key Features:

Connection pooling via pg driver
Soft delete pattern (deletedAt column)
JSON columns for flexible data (modelRedirects, providerChain)
Comprehensive indexing strategy

Caching & Session

Choice: Redis (ioredis)

Rationale:

Session Stickiness: 5-minute TTL session-to-provider mapping
Rate Limiting: Atomic Lua scripts for distributed rate limiting
Circuit Breaker State: Shared circuit state across instances

Trade-offs:

✓ Gain: Fast, distributed state, atomic operations
✗ Lose: Additional infrastructure dependency

Fail-Open Strategy:

Rate limiting fails open (allow requests) when Redis unavailable
Session creation falls back to new selection
Circuit breaker uses in-memory fallback

Infrastructure

Choice: Docker + Docker Compose (Self-hosted)

Rationale:

Self-Deployment Focus: Users deploy on their own infrastructure
Simplicity: Single docker-compose.yaml for complete stack
Portability: Runs on any Docker-compatible host

Stack:

services:
  app: # Next.js application
  postgres: # PostgreSQL database
  redis: # Redis cache

Future Consideration: Kubernetes Helm charts for enterprise deployments

Third-Party Services

Service	Purpose	Integration
Anthropic API	Primary AI provider	Claude Messages API
OpenAI API	Codex provider	Responses API
Google Gemini	Alternative provider	Gemini API
LiteLLM	Pricing data sync	Periodic fetch
Various Relays	Alternative access	Claude-auth format

Development & Deployment

Version Control: Git + GitHub

Package Manager: Bun (faster than npm/yarn)

Build Tool: Turbopack (Next.js built-in)

CI/CD: GitHub Actions

Lint + Type check on PR
Build verification
Docker image publishing

Code Quality:

ESLint + Prettier
TypeScript strict mode
Husky pre-commit hooks

System Components

Component: Guard Pipeline

Purpose: Ordered chain of request processors for cross-cutting concerns

Responsibilities:

Request authentication
Client version validation
Session management
Content filtering
Rate limiting
Provider selection

Interfaces:

interface GuardStep {
  name: string;
  execute(session: ProxySession): Promise<Response | null>;
}

interface GuardPipeline {
  run(session: ProxySession): Promise<Response | null>;
}

Dependencies:

Redis (session, rate limiting)
PostgreSQL (user/key lookup)

FRs Addressed: FR-010, FR-011, FR-012, FR-021, FR-023

Pipeline Configuration:

// Full pipeline for chat requests
const CHAT_PIPELINE = [
  "auth", // API key validation
  "version", // Client version check
  "probe", // Probe request handling
  "session", // Session stickiness
  "sensitive", // Content filtering
  "rateLimit", // Rate limiting
  "provider", // Provider selection
  "messageContext", // Request logging
];

// Minimal pipeline for count_tokens
const COUNT_TOKENS_PIPELINE = ["auth", "version", "probe", "provider"];

Component: Provider Selector

Purpose: Intelligently select optimal upstream provider for each request

Responsibilities:

Weighted random selection
Priority-based failover
Circuit breaker integration
Session stickiness
Group tag filtering

Algorithm:

1. Filter enabled providers
2. Filter by circuit breaker state (exclude OPEN)
3. Filter by effective provider group (key.providerGroup overrides user.providerGroup; key.providerGroup is admin-only; user.providerGroup is derived from Key groups on Key changes)
4. Check session cache for sticky provider
5. If no sticky: weighted random selection by weight
6. Return selected provider or null (all unavailable)

Interfaces:

interface ProviderSelection {
  provider: Provider;
  isSticky: boolean;
}

class ProxyProviderResolver {
  static ensure(session: ProxySession): Promise<Response | null>;
}

Dependencies:

Redis (session cache, circuit state)
PostgreSQL (provider list)

FRs Addressed: FR-003, FR-005, FR-007

Component: Format Converter

Purpose: Bidirectional conversion between different AI API formats

Responsibilities:

Request format transformation
Response format transformation
Streaming format adaptation
Token count mapping

Supported Conversions:

From	To	Direction
Claude Messages	OpenAI Chat	Bidirectional
Claude Messages	Codex Response	Bidirectional
Claude Messages	Gemini	Bidirectional

Interfaces:

interface FormatConverter {
  convertRequest(request: any, targetFormat: string): any;
  convertResponse(response: any, sourceFormat: string): any;
  convertStreamChunk(chunk: any, sourceFormat: string): any;
}

FRs Addressed: FR-006, FR-026, FR-027, FR-028

Component: Circuit Breaker

Purpose: Prevent cascade failures by isolating failing providers

Responsibilities:

Track provider failure rates
Transition between states (CLOSED → OPEN → HALF-OPEN → CLOSED)
Per-provider configuration
Distributed state via Redis

State Machine:

                    failure threshold reached
        ┌───────────────────────────────────────┐
        │                                       ▼
    ┌───────┐                             ┌─────────┐
    │CLOSED │                             │  OPEN   │
    └───┬───┘                             └────┬────┘
        │                                      │
        │ success                              │ timeout expires
        │                                      ▼
        │                               ┌───────────┐
        └───────────────────────────────│ HALF-OPEN │
                  success threshold     └───────────┘

Configuration (per provider):

{
  circuitBreakerFailureThreshold: 5,      // Failures before opening
  circuitBreakerOpenDuration: 1800000,    // 30 min open duration
  circuitBreakerHalfOpenSuccessThreshold: 2  // Successes to close
}

FRs Addressed: FR-004, FR-007

Component: Response Handler

Purpose: Process and transform upstream responses for clients

Responsibilities:

Streaming response forwarding
Error classification
Token counting
Cost calculation
Request logging

Streaming Architecture:

Upstream Provider (SSE)
        │
        ▼
┌───────────────────┐
│ Response Handler  │
│  - Parse chunks   │
│  - Transform      │
│  - Count tokens   │
│  - Calculate cost │
└────────┬──────────┘
         │
         ▼
   Client (SSE)

FRs Addressed: FR-006, FR-013

Component: Message Service

Purpose: Log and track all proxy requests

Responsibilities:

Request metadata capture
Token usage tracking
Cost calculation
Error logging
Session context

Captured Data:

interface MessageRequestLog {
  providerId: number;
  userId: number;
  key: string;
  model: string;
  sessionId: string;
  durationMs: number;
  costUsd: number;
  inputTokens: number;
  outputTokens: number;
  statusCode: number;
  errorMessage?: string;
  providerChain: { id: number; name: string }[];
}

FRs Addressed: FR-013, FR-015, FR-016

Component: Server Actions (Admin API)

Purpose: Type-safe RPC for admin operations

Modules:

providers.ts - Provider CRUD, testing, stats
users.ts - User management
keys.ts - API key management
error-rules.ts - Error rule configuration
sensitive-words.ts - Content filter rules
statistics.ts - Usage analytics
model-prices.ts - Pricing management

Features:

Automatic OpenAPI generation
Type-safe input validation (Zod)
Consistent error handling
Authentication via admin token

FRs Addressed: FR-001, FR-008, FR-009, FR-019-FR-025

Data Architecture

Data Model

┌─────────────────┐     ┌─────────────────┐
│     users       │     │    providers    │
├─────────────────┤     ├─────────────────┤
│ id (PK)         │     │ id (PK)         │
│ name            │     │ name            │
│ role            │     │ url             │
│ rpmLimit        │     │ key (encrypted) │
│ dailyLimitUsd   │     │ weight          │
│ limit5hUsd      │     │ priority        │
│ limitWeeklyUsd  │     │ providerType    │
│ limitMonthlyUsd │     │ isEnabled       │
│ createdAt       │     │ circuitBreaker* │
│ deletedAt       │     │ limits*         │
└────────┬────────┘     │ timeouts*       │
         │              │ createdAt       │
         │              │ deletedAt       │
         │              └────────┬────────┘
         │                       │
         ▼                       ▼
┌─────────────────┐     ┌─────────────────┐
│      keys       │     │ message_request │
├─────────────────┤     ├─────────────────┤
│ id (PK)         │     │ id (PK)         │
│ userId (FK)     │────▶│ userId (FK)     │
│ key (unique)    │     │ providerId (FK) │◀────┘
│ name            │     │ key             │
│ isEnabled       │     │ model           │
│ expiresAt       │     │ sessionId       │
│ limits*         │     │ costUsd         │
│ createdAt       │     │ inputTokens     │
│ deletedAt       │     │ outputTokens    │
└─────────────────┘     │ durationMs      │
                        │ statusCode      │
                        │ providerChain   │
                        │ createdAt       │
                        └─────────────────┘

┌─────────────────┐     ┌─────────────────┐
│   error_rules   │     │ sensitive_words │
├─────────────────┤     ├─────────────────┤
│ id (PK)         │     │ id (PK)         │
│ pattern         │     │ word            │
│ matchType       │     │ matchType       │
│ category        │     │ isEnabled       │
│ isEnabled       │     │ createdAt       │
│ priority        │     └─────────────────┘
└─────────────────┘

┌─────────────────┐     ┌─────────────────────┐
│  model_prices   │     │ system_settings     │
├─────────────────┤     ├─────────────────────┤
│ id (PK)         │     │ id (PK)             │
│ modelName       │     │ siteTitle           │
│ priceData (JSON)│     │ currencyDisplay     │
│ createdAt       │     │ enableAutoCleanup   │
└─────────────────┘     │ cleanupRetention    │
                        └─────────────────────┘

┌───────────────────────┐
│ notification_settings │
├───────────────────────┤
│ id (PK)               │
│ enabled               │
│ circuitBreakerWebhook │
│ dailyLeaderboard*     │
│ costAlert*            │
└───────────────────────┘

Database Design

Indexing Strategy:

Table	Index	Purpose
users	`idx_users_active_role_sort`	User list query optimization
keys	`idx_keys_user_id`	Foreign key lookup
providers	`idx_providers_enabled_priority`	Provider selection
message_request	`idx_message_request_user_date_cost`	Statistics aggregation
message_request	`idx_message_request_session_id`	Session grouping
error_rules	`unique_pattern`	Pattern uniqueness

Soft Delete Pattern:

All main tables have deletedAt column
Queries filter WHERE deleted_at IS NULL
Preserves referential integrity for historical data

JSON Columns:

providers.modelRedirects - Model name mapping
providers.allowedModels - Model whitelist
message_request.providerChain - Request routing history
model_prices.priceData - Flexible pricing structure

Data Flow

Request Flow:

1. Client Request
   │
2. Auth Guard: Lookup key in PostgreSQL
   │
3. Session Guard: Check Redis for sticky session
   │
4. Rate Limit Guard: Check Redis counters (Lua script)
   │
5. Provider Selector: Query enabled providers from cache
   │
6. Forwarder: Send to upstream provider
   │
7. Response Handler: Stream response to client
   │
8. Message Service: Write to message_request table

Session Data Flow:

Redis Key: session:{session_id}
Value: { providerId: number, createdAt: timestamp }
TTL: 300 seconds (5 minutes)

Lookup: O(1) hash get
Creation: O(1) hash set with TTL

Rate Limit Data Flow:

Redis Keys:
- ratelimit:rpm:{userId}:{minute}
- ratelimit:5h:{userId}:{window}
- ratelimit:daily:{userId}:{day}
- ratelimit:weekly:{userId}:{week}
- ratelimit:monthly:{userId}:{month}

Operations: Lua scripts for atomic increment + check

API Design

API Architecture

Pattern: REST + Server Actions

Versioning: URL path versioning (/v1/)

Authentication:

Proxy endpoints: API key in Authorization: Bearer {key} or x-api-key: {key}
Admin endpoints: Admin token in cookie (HTTP-only)

Response Format: JSON (Claude API compatible for proxy endpoints)

Proxy Endpoints

Method	Endpoint	Description	FR
POST	`/v1/messages`	Claude Messages API	FR-026
POST	`/v1/messages/count_tokens`	Token counting	FR-026
POST	`/v1/chat/completions`	OpenAI Chat API	FR-027
POST	`/v1/responses`	Codex Response API	FR-028
POST	`/v1/generateContent`	Gemini API	FR-002

Request Headers:

Authorization: Bearer {api_key}
Content-Type: application/json
x-session-id: {optional_session_id}
x-client-version: {optional_version}
anthropic-version: 2023-06-01

Response Headers:

Content-Type: application/json  (or text/event-stream for streaming)
x-request-id: {request_id}

Admin API (Server Actions)

Providers:

POST /api/actions - createProvider
GET  /api/actions - listProviders
GET  /api/actions - getProvider
PUT  /api/actions - updateProvider
DEL  /api/actions - deleteProvider
POST /api/actions - testProviderConnection
GET  /api/actions - getProviderStats

Users:

POST /api/actions - createUser
GET  /api/actions - listUsers
GET  /api/actions - getUser
PUT  /api/actions - updateUser
DEL  /api/actions - deleteUser
GET  /api/actions - getUserStats

Keys:

POST /api/actions - createKey
GET  /api/actions - listKeys
PUT  /api/actions - updateKey
DEL  /api/actions - revokeKey

Statistics:

GET /api/actions - getOverview
GET /api/actions - getUsageByUser
GET /api/actions - getUsageByProvider
GET /api/actions - getUsageByModel
GET /api/actions - getLeaderboard

Authentication & Authorization

API Key Authentication:

// Key format: cch_xxx...xxx (32+ characters)
// Storage: SHA-256 hash in database
// Lookup: O(1) indexed query

async function authenticateKey(key: string): Promise<AuthResult> {
  const hash = sha256(key);
  const keyRecord = await db.query.keys.findFirst({
    where: eq(keys.key, hash),
  });
  // Validate: exists, enabled, not expired
}

Admin Token Authentication:

// Environment variable: ADMIN_TOKEN
// Stored in HTTP-only cookie after login
// Validated on each admin request

Authorization Model:

Roles:
- admin: Full access to admin UI and API
- user: Access to personal stats and settings

Permissions:
- API key gives user-level access
- Admin token gives admin access

Non-Functional Requirements Coverage

NFR-001: Proxy Latency

Requirement: Proxy overhead < 50ms for request processing

Architecture Solution:

Hono Router: Sub-millisecond routing (vs 10ms+ for Express)
Connection Pooling: Reuse upstream connections
Redis Caching: In-memory session and rate limit checks
Streaming: Zero-copy chunk forwarding
Minimal Middleware: Guard pipeline optimized for speed

Implementation Notes:

// Use undici for upstream requests (faster than node-fetch)
import { fetch } from "undici";

// Redis connection pooling
const redis = new Redis({ lazyConnect: true });

// Streaming response forwarding
return new Response(
  new ReadableStream({
    async start(controller) {
      for await (const chunk of upstreamResponse.body) {
        controller.enqueue(chunk);
      }
    },
  })
);

Validation:

Measure request timing in message_request.duration_ms
Monitor p95/p99 latency in production
Target: < 50ms overhead, < 100ms p99

NFR-002: Concurrent Capacity

Requirement: Support 100+ concurrent streaming sessions

Architecture Solution:

Stateless Design: No in-memory session state
Redis Sessions: Distributed session storage
Connection Limits: Per-provider connection caps
Graceful Degradation: Queue or reject excess connections

Implementation Notes:

// Session stored in Redis with TTL
await redis.setex(
  `session:${sessionId}`,
  300,
  JSON.stringify({
    providerId,
    createdAt: Date.now(),
  })
);

// Provider concurrent session limit
if (activeSessions >= provider.limitConcurrentSessions) {
  return new Response("Too many concurrent sessions", { status: 429 });
}

Validation:

Load test with 100+ concurrent WebSocket connections
Monitor Redis memory usage
Track active_sessions metric

NFR-003: Streaming Reliability

Requirement: Reliable SSE streaming without message loss

Architecture Solution:

Chunked Transfer: Stream chunks as received
Backpressure: Respect client consumption rate
Timeout Handling: Configurable idle timeouts
Error Recovery: Clean connection termination

Implementation Notes:

// Streaming with backpressure
const stream = new ReadableStream({
  async pull(controller) {
    const chunk = await reader.read();
    if (chunk.done) {
      controller.close();
    } else {
      controller.enqueue(chunk.value);
    }
  },
});

// Idle timeout (provider-configurable)
setTimeout(() => {
  if (!lastChunkTime || Date.now() - lastChunkTime > idleTimeout) {
    stream.cancel();
  }
}, idleTimeout);

Validation:

End-to-end streaming tests
Verify no chunk loss under load
Test timeout behavior

NFR-004: Authentication Security

Requirement: Secure API key handling

Architecture Solution:

Key Hashing: SHA-256 hash stored, not plaintext
Constant-Time Comparison: Prevent timing attacks
Rate Limiting: Brute-force protection
Key Rotation: Support for key revocation

Implementation Notes:

// Key generation
function generateKey(): string {
  return `cch_${randomBytes(32).toString("hex")}`;
}

// Key hashing
function hashKey(key: string): string {
  return createHash("sha256").update(key).digest("hex");
}

// Constant-time comparison (via crypto.timingSafeEqual)

Validation:

Security audit of key handling
Penetration testing for auth bypass
Verify no plaintext keys in logs

NFR-007: High Availability

Requirement: Service continuity despite provider failures

Architecture Solution:

Circuit Breaker: Isolate failing providers (per-provider)
Automatic Failover: Retry on next available provider
Redis Fail-Open: Allow requests when Redis unavailable
Health Checks: Provider connectivity verification

Implementation Notes:

// Circuit breaker states in Redis
const state = await redis.hget(`cb:${providerId}`, "state");
if (state === "OPEN") {
  // Skip this provider, try next
  return selectNextProvider();
}

// Fail-open for Redis
try {
  await checkRateLimit();
} catch (redisError) {
  console.warn("Redis unavailable, allowing request");
  // Continue without rate limiting
}

Validation:

Chaos testing (kill providers)
Redis failover testing
Measure failover latency

Security Architecture

Authentication

API Key Authentication:

Format: cch_ prefix + 32 hex characters
Storage: SHA-256 hash in PostgreSQL
Transmission: HTTPS only, Authorization: Bearer header
Lifetime: Configurable expiration, revocable

Admin Authentication:

Single admin token (environment variable)
HTTP-only cookie after login
Session-based (not stateless)

Authorization

RBAC Model:

Roles:
├── admin
│   ├── Manage providers
│   ├── Manage users
│   ├── Manage keys
│   ├── View all statistics
│   └── Configure system
└── user
    ├── Use proxy endpoints
    ├── View own statistics
    └── Manage own keys

Permission Enforcement:

// Guard middleware pattern
const adminOnly = async (ctx, next) => {
  const token = ctx.cookies.get("adminToken");
  if (token !== process.env.ADMIN_TOKEN) {
    return ctx.json({ error: "Unauthorized" }, 401);
  }
  return next();
};

Data Encryption

In Transit:

TLS 1.3 for all connections
HTTPS enforced via middleware
Secure WebSocket (WSS) for real-time

At Rest:

Provider API keys: Application-level encryption (recommended)
Database: PostgreSQL encryption (optional, infrastructure)
Backups: Encrypted (infrastructure responsibility)

Security Best Practices

Input Validation:

// Zod schemas for all inputs
const createProviderSchema = z.object({
  name: z.string().min(1).max(255),
  url: z.string().url(),
  key: z.string().min(10),
  // ...
});

SQL Injection Prevention:

Drizzle ORM with parameterized queries
No raw SQL with user input

XSS Prevention:

React's automatic escaping
Content-Security-Policy headers

Rate Limiting:

Per-key rate limits
Auth failure rate limiting
DDoS protection (infrastructure)

Logging Security:

API keys masked in logs
Request bodies not logged by default
Audit trail for admin actions

Scalability & Performance

Scaling Strategy

Horizontal Scaling (Primary):

                    Load Balancer
                          │
          ┌───────────────┼───────────────┐
          │               │               │
      ┌───┴───┐       ┌───┴───┐       ┌───┴───┐
      │ CCH-1 │       │ CCH-2 │       │ CCH-3 │
      └───┬───┘       └───┬───┘       └───┬───┘
          │               │               │
          └───────────────┼───────────────┘
                          │
              ┌───────────┴───────────┐
              │                       │
          ┌───┴───┐               ┌───┴───┐
          │ Redis │               │Postgres│
          │(Master)│              │(Primary)│
          └───┬───┘               └───┬───┘
              │                       │
          ┌───┴───┐               ┌───┴───┐
          │ Redis │               │Postgres│
          │(Replica)              │(Replica)│
          └───────┘               └───────┘

Scaling Triggers:

CPU > 70% sustained
Memory > 80%
Request latency p99 > 200ms

Performance Optimization

Query Optimization:

Indexed queries for all hot paths
Connection pooling (pgBouncer compatible)
Prepared statements via Drizzle

N+1 Prevention:

Eager loading for related entities
Batch queries for statistics

Memory Efficiency:

Streaming responses (no buffering)
LRU cache for hot data (providers, rules)

Caching Strategy

Cache Layers:

┌─────────────────────────────────────┐
│         CDN (Static Assets)         │  TTL: Long
├─────────────────────────────────────┤
│     Application Cache (In-Memory)   │  TTL: Short
│  - Provider list                    │
│  - Error rules                      │
│  - Sensitive words                  │
├─────────────────────────────────────┤
│            Redis Cache              │  TTL: Medium
│  - Session mappings                 │
│  - Rate limit counters              │
│  - Circuit breaker state            │
├─────────────────────────────────────┤
│         PostgreSQL                  │  Source of truth
└─────────────────────────────────────┘

Cache Invalidation:

Provider changes: Invalidate provider cache
Rule changes: Reload rules
Session: TTL-based expiration

Load Balancing

Strategy: Round-robin with health checks

Health Check Endpoint: GET /api/health

{
  "status": "healthy",
  "checks": {
    "database": "ok",
    "redis": "ok"
  }
}

Reliability & Availability

High Availability Design

No Single Points of Failure:

Multiple app instances behind load balancer
Redis Sentinel or Cluster for failover
PostgreSQL streaming replication

Graceful Degradation:

Redis unavailable → Fail-open (allow requests)
Single provider down → Automatic failover
Database read issues → Serve from cache

Disaster Recovery

RPO (Recovery Point Objective): 1 hour RTO (Recovery Time Objective): 4 hours

Backup Strategy:

PostgreSQL: pg_dump daily, WAL archiving
Configuration: Git-versioned

Recovery Procedure:

Restore PostgreSQL from latest backup
Restore configuration from Git
Start application instances
Verify health checks
Resume traffic

Backup Strategy

Data	Frequency	Retention	Method
PostgreSQL	Daily	30 days	pg_dump + S3
WAL logs	Continuous	7 days	WAL archiving
Config	On change	Unlimited	Git
Redis	N/A	Ephemeral	Rebuilt on restart

Monitoring & Alerting

Metrics (Prometheus format):

# Request metrics
cch_requests_total{endpoint, status}
cch_request_duration_seconds{endpoint}

# Provider metrics
cch_provider_requests_total{provider, status}
cch_provider_circuit_state{provider}

# Resource metrics
cch_active_sessions
cch_redis_connections
cch_db_connections

Alerting Thresholds: | Alert | Condition | Severity | |-------|-----------|----------| | High Error Rate | 5xx > 5% for 5min | Critical | | High Latency | p99 > 500ms for 5min | Warning | | Circuit Open | Any provider OPEN | Warning | | Database Down | Health check fail | Critical |

Integration Architecture

External Integrations

Upstream AI Providers:

interface ProviderAdapter {
  formatRequest(session: ProxySession): Request;
  parseResponse(response: Response): AsyncGenerator<Chunk>;
  handleError(error: Error): ErrorClassification;
}

// Implementations
class AnthropicAdapter implements ProviderAdapter {}
class OpenAIAdapter implements ProviderAdapter {}
class GeminiAdapter implements ProviderAdapter {}

Webhook Integrations:

Enterprise WeChat robot notifications
Circuit breaker alerts
Daily usage reports

Internal Integrations

Module Communication:

Actions Layer ←→ Repository Layer ←→ Database
     ↓
Guard Pipeline ←→ Redis ←→ Circuit Breaker
     ↓
Format Converters ←→ Provider Adapters

Message/Event Architecture

Current: Synchronous request-response

Future Consideration: Event-driven for:

Async cost calculation
Background log processing
Notification delivery

Development Architecture

Code Organization

src/
├── app/                    # Next.js App Router
│   ├── v1/                 # Proxy endpoints
│   │   └── _lib/           # Proxy core
│   │       ├── proxy/      # Guard pipeline, forwarder
│   │       ├── converters/ # Format converters
│   │       └── gemini/     # Gemini-specific
│   ├── api/                # REST API
│   │   └── actions/        # OpenAPI documentation
│   ├── dashboard/          # Admin UI pages
│   └── settings/           # Settings pages
├── actions/                # Server Actions
├── repository/             # Database queries
├── drizzle/                # Schema + migrations
├── lib/                    # Shared utilities
│   ├── rate-limit/         # Rate limiting
│   ├── circuit-breaker/    # Circuit breaker
│   └── session/            # Session management
├── types/                  # TypeScript types
└── components/             # React components

Module Structure

Bounded Contexts:

Proxy: Request handling, format conversion, provider selection
Admin: User management, provider management, configuration
Analytics: Statistics, logging, monitoring

Dependency Rules:

Repository depends on Drizzle schema only
Actions depend on Repository
Guards depend on Repository and Redis
UI depends on Actions

Testing Strategy

Unit Tests:

Format converters
Rate limiting logic
Circuit breaker state machine
Validation schemas

Integration Tests:

Guard pipeline
Database queries
Redis operations

E2E Tests:

Full proxy flow
Admin workflows
Authentication flows

Coverage Target: 70%+

CI/CD Pipeline

name: CI/CD

on: [push, pull_request]

jobs:
  lint:
    - bun install
    - bun run lint
    - bun run typecheck

  test:
    - bun install
    - bun test

  build:
    - bun run build
    - docker build

  deploy:
    - docker push
    - deploy to staging/production

Deployment Architecture

Environments

Environment	Purpose	Database	Redis
Development	Local development	Local PostgreSQL	Local Redis
Staging	Pre-production testing	Staging DB	Staging Redis
Production	Live service	Production DB	Production Redis

Deployment Strategy

Current: Docker Compose (self-hosted)

version: "3.8"
services:
  app:
    image: claude-code-hub:latest
    ports:
      - "3000:3000"
    environment:
      - DSN=postgresql://...
      - REDIS_URL=redis://...
      - ADMIN_TOKEN=...
    depends_on:
      - postgres
      - redis

  postgres:
    image: postgres:15
    volumes:
      - pgdata:/var/lib/postgresql/data

  redis:
    image: redis:7
    volumes:
      - redisdata:/data

Future: Kubernetes with Helm charts

Infrastructure as Code

Docker:

Multi-stage Dockerfile for minimal image
Non-root user for security
Health check endpoint

Docker Compose:

Development environment
Local PostgreSQL + Redis
Auto-migration on startup

Requirements Traceability

Functional Requirements Coverage

FR ID	FR Name	Components	Status
FR-001	Multi-Provider Management	Admin Actions, Repository	✓
FR-002	Provider Type Support	Provider Adapters, Format Converters	✓
FR-003	Intelligent Provider Selection	Provider Selector	✓
FR-004	Circuit Breaker	Circuit Breaker Component	✓
FR-005	Session Stickiness	Session Guard, Redis	✓
FR-006	Format Conversion	Format Converters	✓
FR-007	Automatic Retry	Guard Pipeline, Circuit Breaker	✓
FR-008	User Management	Admin Actions, Repository	✓
FR-009	API Key Management	Admin Actions, Repository	✓
FR-010	Authentication	Auth Guard	✓
FR-011	Rate Limiting	Rate Limit Guard, Redis	✓
FR-012	Concurrent Session Limiting	Session Guard	✓
FR-013	Request Logging	Message Service	✓
FR-014	Real-Time Dashboard	Dashboard UI	✓
FR-015	Usage Statistics	Statistics Actions	✓
FR-016	Active Session Monitoring	Session Actions	✓
FR-017	Provider Health Status	Provider Stats	✓
FR-019	Admin Dashboard	Next.js UI	✓
FR-020	Error Rules Management	Error Rules Actions	✓
FR-021	Sensitive Words Filtering	Sensitive Guard	✓
FR-026	Claude Messages API	Proxy Endpoints	✓
FR-027	OpenAI Chat Completions	Proxy Endpoints, Converters	✓
FR-028	Codex Response API	Proxy Endpoints, Converters	✓

Non-Functional Requirements Coverage

NFR ID	NFR Name	Solution	Validation
NFR-001	Proxy Latency	Hono, Redis, streaming	p95 < 50ms
NFR-002	Concurrent Capacity	Stateless, Redis	100+ sessions
NFR-003	Streaming Reliability	Chunked transfer	No loss test
NFR-004	Authentication Security	Key hashing	Security audit
NFR-005	Data Protection	TLS, encryption	Compliance check
NFR-007	High Availability	Circuit breaker, failover	Chaos testing
NFR-008	Horizontal Scalability	Stateless design	Load testing
NFR-009	Internationalization	next-intl	UI review
NFR-011	Code Quality	TypeScript strict	Coverage report

Trade-offs & Decision Log

Decision 1: Modular Monolith vs Microservices

Decision: Modular Monolith

Trade-off:

✓ Gain: Simple deployment, low latency, easier debugging
✗ Lose: Independent scaling of components, team autonomy

Rationale: Proxy latency requirement (< 50ms) makes network hops expensive. Single team doesn't benefit from service boundaries.

Decision 2: Hono vs Express/Fastify

Decision: Hono

Trade-off:

✓ Gain: Sub-millisecond routing, modern API, lightweight
✗ Lose: Smaller ecosystem, less community support

Rationale: Performance is critical for proxy use case. Hono's edge-first design aligns with low-latency goals.

Decision 3: Redis for Sessions vs Database Sessions

Decision: Redis

Trade-off:

✓ Gain: Sub-millisecond session lookups, TTL support, atomic operations
✗ Lose: Additional infrastructure, data loss on restart

Rationale: Session stickiness requires very fast lookups on every request. 5-minute TTL makes persistence unnecessary.

Decision 4: PostgreSQL vs SQLite for Simplicity

Decision: PostgreSQL

Trade-off:

✓ Gain: Concurrent writes, full SQL, scalability
✗ Lose: Infrastructure complexity vs SQLite

Rationale: Multi-tenant workload requires proper concurrent write handling. Future multi-database support planned.

Open Issues & Risks

Issue	Impact	Mitigation
Redis single point	Medium	Redis Sentinel/Cluster for production
Large log volume	Medium	Auto-cleanup, retention policies
Provider API changes	Low	Adapter pattern for isolation
Key storage security	Medium	Consider HSM/KMS integration

Assumptions & Constraints

Assumptions:

Docker available for deployment
PostgreSQL 14+ and Redis 7+ available
TLS termination at load balancer or reverse proxy
Users have reliable network connectivity
Upstream provider APIs remain stable

Constraints:

Single admin token (no RBAC for admins)
PostgreSQL required (no serverless DB option currently)
Redis required (no in-memory-only option)
English + Chinese UI (i18n limited)

Future Considerations

Phase 2: Enhanced Intelligence

Provider Balance Fetching: Real-time quota awareness
Quota-Aware Scheduling: Smart routing based on remaining balance
Advanced Analytics: ML-based usage prediction

Phase 3: Platform Expansion

Multi-Database Support: MySQL, SQLite adapters
Electron Client: Desktop application for personal use
Plugin System: Custom provider adapters

Phase 4: Enterprise Features

SSO/SAML: Enterprise authentication
RBAC: Fine-grained admin permissions
Audit Logging: Compliance-grade logging
Multi-Region: Geographic distribution

Approval & Sign-off

Review Status:

Project Maintainer
Core Contributors
Community Review

Revision History

Version	Date	Author	Changes
1.0	2025-11-29	ding	Initial architecture

Next Steps

Phase 4: Sprint Planning & Implementation

Run /sprint-planning to:

Break epics into detailed user stories
Estimate story complexity
Plan sprint iterations
Begin implementation following this architectural blueprint

Key Implementation Principles:

Follow component boundaries defined in this document
Implement NFR solutions as specified
Use technology stack as defined
Follow API contracts exactly
Adhere to security and performance guidelines

This document was created using BMAD Method v6 - Phase 3 (Solutioning)

To continue: Run /workflow-status to see your progress and next recommended workflow.

Appendix A: Technology Evaluation Matrix

Category	Choice	Alternatives Considered	Key Factor
Framework	Next.js 15	Remix, Nuxt	SSR + API in one
Router	Hono	Express, Fastify	Performance
ORM	Drizzle	Prisma, TypeORM	Type safety + SQL
Database	PostgreSQL	MySQL, SQLite	JSON support + reliability
Cache	Redis	Memcached	Data structures + Lua
UI Library	shadcn/ui	MUI, Chakra	Customizable + lightweight

Appendix B: Capacity Planning

Baseline Assumptions:

50 concurrent users
10 requests/minute per user average
500 RPM total
2KB average request, 10KB average response

Resource Estimates:

Component	CPU	Memory	Storage
App (per instance)	2 cores	2GB	-
PostgreSQL	2 cores	4GB	50GB
Redis	1 core	1GB	1GB

Scaling Triggers:

Add app instance at 70% CPU
Scale PostgreSQL at 80% connections
Scale Redis at 70% memory

Appendix C: Cost Estimation

Self-Hosted (Small Team):

1 VPS (4 CPU, 8GB): ~$40/month
Total: ~$40/month

Self-Hosted (Medium Team):

2 App VPS (4 CPU, 8GB): ~$80/month
1 Database VPS (4 CPU, 16GB): ~$60/month
Total: ~$140/month

Cloud (AWS equivalent):

2 x t3.medium (app): ~$60/month
1 x db.t3.medium (RDS): ~$50/month
1 x cache.t3.micro (ElastiCache): ~$15/month
Total: ~$125/month

Note: Excludes upstream AI API costs (user-provided)

architecture-claude-code-hub-2025-11-29.md 58 KB Link permanente Histórico Em bruto

System Architecture: claude-code-hub

Document Overview

Executive Summary

Architectural Drivers

Critical Drivers (Must Address)

Important Drivers (Should Address)

System Overview

High-Level Architecture

Architecture Diagram

Architectural Pattern

Technology Stack

Frontend

Backend

Database

Caching & Session

Infrastructure

Third-Party Services

Development & Deployment

System Components

Component: Guard Pipeline

Component: Provider Selector

Component: Format Converter

Component: Circuit Breaker

Component: Response Handler

Component: Message Service

Component: Server Actions (Admin API)

Data Architecture

Data Model

Database Design

Data Flow

API Design

API Architecture

Proxy Endpoints

Admin API (Server Actions)

Authentication & Authorization

Non-Functional Requirements Coverage

NFR-001: Proxy Latency

NFR-002: Concurrent Capacity

NFR-003: Streaming Reliability

NFR-004: Authentication Security

NFR-007: High Availability

Security Architecture

Authentication

Authorization

Data Encryption

Security Best Practices

Scalability & Performance

Scaling Strategy

Performance Optimization

Caching Strategy

Load Balancing

Reliability & Availability

High Availability Design

Disaster Recovery

Backup Strategy

Monitoring & Alerting

Integration Architecture

External Integrations

Internal Integrations

Message/Event Architecture

Development Architecture

Code Organization

Module Structure

Testing Strategy

CI/CD Pipeline

Deployment Architecture

Environments

Deployment Strategy

Infrastructure as Code

Requirements Traceability

Functional Requirements Coverage

Non-Functional Requirements Coverage

Trade-offs & Decision Log

Decision 1: Modular Monolith vs Microservices

Decision 2: Hono vs Express/Fastify

Decision 3: Redis for Sessions vs Database Sessions

Decision 4: PostgreSQL vs SQLite for Simplicity

Open Issues & Risks

Assumptions & Constraints

architecture-claude-code-hub-2025-11-29.md 58 KB

Link permanente Histórico Em bruto