---
title: "Automatic Context Summarization"
sidebarTitle: "Auto Compact"
---

When your conversation approaches the model's context window limit, Cline automatically summarizes it to free up space and keep working.

<Frame>
	<img
		src="https://storage.googleapis.com/cline_public_images/docs/assets/condensing.png"
		alt="Auto-compact feature condensing conversation context"
	/>
</Frame>
## How It Works

Cline monitors token usage during your conversation. When you're getting close to the limit, he:

1. Creates a comprehensive summary of everything that's happened
2. Preserves all the technical details, code changes, and decisions
3. Replaces the conversation history with the summary
4. Continues exactly where he left off
You'll see a summarization tool call when this happens, showing the total cost like any other API call in the chat view.
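
In pseudocode, the loop looks roughly like the sketch below. This is an illustrative TypeScript sketch, not Cline's actual implementation; the names, the 80% threshold, and the `summarize` callback are all assumptions.

```typescript
// Illustrative sketch of the auto-compact flow. The threshold, names,
// and `summarize` callback are assumptions, not Cline's actual code.
interface ConversationState {
	messages: string[]
	tokensUsed: number
	contextWindow: number
}

const SUMMARIZE_THRESHOLD = 0.8 // assume compaction at ~80% of the window

async function maybeCompact(
	state: ConversationState,
	summarize: (messages: string[]) => Promise<string>,
): Promise<ConversationState> {
	if (state.tokensUsed < state.contextWindow * SUMMARIZE_THRESHOLD) {
		return state // still plenty of room; keep the full history
	}
	// Steps 1-2: request a comprehensive summary of the conversation
	const summary = await summarize(state.messages)
	// Step 3: replace the history with the summary
	// Step 4: the next request continues from this compacted state
	return { ...state, messages: [summary], tokensUsed: 0 }
}
```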
## Why This Matters

Previously, Cline would truncate older messages when hitting context limits. This meant losing important context from earlier in the conversation.

Now with summarization:

- All technical decisions and code patterns are preserved
- File changes and project context remain intact
- Cline remembers everything he's done
- You can work on much larger projects without interruption

<Tip>
	Context Summarization synergizes beautifully with [Focus Chain](/features/focus-chain). When Focus Chain is enabled, todo lists persist across summarizations. This means Cline can work on long-horizon tasks that span multiple context windows while staying on track with the todo list guiding him through each reset.
</Tip>
## Technical Details

The summarization happens through your configured API provider, with the same model you're already using, and it leverages prompt caching to minimize costs.

1. Cline uses a [summarization prompt](https://github.com/cline/cline/blob/main/src/core/prompts/contextManagement.ts) to request a summary of the conversation.
2. Once the summary is generated, Cline replaces the conversation history with a [continuation prompt](https://github.com/cline/cline/blob/main/src/core/prompts/contextManagement.ts#L69) that asks Cline to keep working and provides the summary as context.

Different models have different context window thresholds for when auto-summarization kicks in. You can see how thresholds are determined in [context-window-utils.ts](https://github.com/cline/cline/blob/main/src/core/context/context-management/context-window-utils.ts).
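
As a rough illustration of how such a per-model token budget might look (the numbers below are assumptions; the real values live in `context-window-utils.ts`):

```typescript
// Illustrative per-model token budget; the numbers are assumptions,
// not the actual values in context-window-utils.ts.
function getMaxAllowedTokens(contextWindow: number): number {
	// Smaller windows need a proportionally larger reserve for the
	// summarization request and the model's response.
	if (contextWindow <= 64_000) {
		return contextWindow - 27_000
	}
	if (contextWindow <= 128_000) {
		return contextWindow - 30_000
	}
	// Large windows: keep a buffer of 40k tokens or 20% of the
	// window, whichever is smaller.
	return Math.max(contextWindow - 40_000, contextWindow * 0.8)
}

// Auto-summarization kicks in once usage crosses the budget:
const shouldSummarize = (tokensUsed: number, contextWindow: number) =>
	tokensUsed >= getMaxAllowedTokens(contextWindow)
```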
## Cost Considerations

Summarization leverages your existing prompt cache from the conversation, so it costs about the same as any other tool call.

Since most input tokens are already cached, you're primarily paying for the summary generation (output tokens), making it very cost-effective.
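
For a back-of-the-envelope sense of the savings (all prices and token counts here are hypothetical):

```typescript
// Hypothetical prices and token counts, purely for illustration.
const cachedInputTokens = 150_000 // conversation already in the prompt cache
const summaryOutputTokens = 2_000 // the generated summary

const cachedInputPricePerMTok = 0.3 // $/1M tokens at an assumed cached rate
const outputPricePerMTok = 15 // $/1M tokens at an assumed output rate

const summaryCost =
	(cachedInputTokens / 1e6) * cachedInputPricePerMTok +
	(summaryOutputTokens / 1e6) * outputPricePerMTok
console.log(summaryCost.toFixed(3)) // ≈ $0.075

// Without caching, re-sending the same 150k input tokens at an assumed
// uncached rate of $3/MTok would cost ≈ $0.45 before any output.
```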
## Restoring Context with Checkpoints

You can use [checkpoints](/features/checkpoints) to restore your task state from before a summarization occurred. This means you never truly lose context - you can always roll back to previous versions of your conversation.

<Note>
	Editing a message before a summarization tool call works similarly to a checkpoint, allowing you to restore the conversation to that point.
</Note>
## Next Generation Model Support

Auto Compact uses advanced LLM-based summarization, which we've found works significantly better with next-generation models. We currently support this feature for the following models:

- **Claude 4 series**
- **Gemini 2.5 series**
- **GPT-5**
- **Grok 4**

<Note>
	When using other models, Cline automatically falls back to the standard rule-based context truncation method, even if Auto Compact is enabled in settings.
</Note>
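
The gate is conceptually simple. A hypothetical sketch (the model patterns and helper names are assumptions, not Cline's actual code):

```typescript
// Hypothetical feature gate; model patterns and names are assumptions.
const NEXT_GEN_MODELS = [/claude-(opus|sonnet)-4/, /gemini-2\.5/, /gpt-5/, /grok-4/]

const supportsAutoCompact = (modelId: string): boolean =>
	NEXT_GEN_MODELS.some((pattern) => pattern.test(modelId))

function contextStrategy(modelId: string, autoCompactEnabled: boolean) {
	// Models outside the list fall back to rule-based truncation,
	// even when Auto Compact is enabled in settings.
	return autoCompactEnabled && supportsAutoCompact(modelId)
		? "llm-summarization"
		: "rule-based-truncation"
}
```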