Peter Dave Hello 85b18ad524 chore: bump Node.js version from 20.19.2 to 20.20.0 (#5076) 3 weeks ago
..
approvals 815feaec44 Add test case for inline completion in middle of expression 2 months ago
test-cases 1067415ac9 Merge branch 'main' into mark/improve-evals 1 month ago
.gitignore 8f1046bf07 Add html-output to gitignore and document report command 2 months ago
README.md 3b3721f364 docs: shorten LLM autocompletion README 2 months ago
approvals.spec.ts ecd17d4d74 Improve prompt for opus reviewing autocomplete suggestions 2 months ago
approvals.ts ecd17d4d74 Improve prompt for opus reviewing autocomplete suggestions 2 months ago
ghost-provider-tester.ts 96ad75aa1b feat: add context file support to autocomplete benchmark 2 months ago
html-report.ts 2fffbe00c4 Style prefix/suffix in grey in HTML report outputs 2 months ago
llm-client.ts 9f9fcb5a6b refactor: reuse getKiloBaseUriFromToken from llm-client in opus-approval 2 months ago
mock-context-provider.ts 4addf81c98 refactor: merge snippets/types.ts into autocomplete/types.ts 1 month ago
mock-vscode.ts 086beab38b do less mocking for the benchmark runner 4 months ago
opus-approval.ts ecd17d4d74 Improve prompt for opus reviewing autocomplete suggestions 2 months ago
package.json 85b18ad524 chore: bump Node.js version from 20.19.2 to 20.20.0 (#5076) 3 weeks ago
runner.ts 92c23c83d7 refactor: extract HTML report generation to separate file 2 months ago
test-cases.spec.ts 3c0da31183 also add tests 4 months ago
test-cases.ts 883a44b897 inline 2 months ago
tsconfig.json 6563f031d2 less duplication 4 months ago
utils.ts ff7a4773d9 Added fim-tester to test-llm-autocompletion 3 months ago

README.md

LLM Autocompletion Tests

Standalone approval-test suite for GhostInlineCompletionProvider using real LLM calls.

Setup

cd src/test-llm-autocompletion
cp .env.example .env

Set your kilocode API key in .env.

What “approval testing” means here

  • The first time a test produces a new completion, you are asked to approve or reject it.
  • The decision is stored under approvals/<category>/<test>/approved|rejected/*.txt.
  • On later runs:
    • output matches an approved output → pass
    • output matches a rejected output → fail
    • unseen output → prompt again (unless --skip-approval or --opus-approval is used)
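
The decision rules above can be sketched as a small pure function. This is only an illustration, not the code in approvals.ts: the names Verdict, KnownOutputs, classifyCompletion, and approvalDir are hypothetical, and it assumes that "matches" means exact string equality, which the README does not state.

```typescript
// Hypothetical sketch of the approval rules; not the suite's actual API.
type Verdict = "pass" | "fail" | "unknown" | "needs-review";

interface KnownOutputs {
  approved: Set<string>; // outputs previously approved for this test
  rejected: Set<string>; // outputs previously rejected for this test
}

// Assumption: an output "matches" a stored one when the strings are identical.
function classifyCompletion(
  output: string,
  known: KnownOutputs,
  skipApproval: boolean,
): Verdict {
  if (known.approved.has(output)) return "pass";
  if (known.rejected.has(output)) return "fail";
  // Unseen output: reported as "unknown" when prompting is disabled,
  // otherwise it needs a decision (from a person or an automated reviewer).
  return skipApproval ? "unknown" : "needs-review";
}

// Directory that would hold a stored decision, following the layout
// approvals/<category>/<test>/approved|rejected/ described above.
function approvalDir(
  category: string,
  test: string,
  decision: "approved" | "rejected",
): string {
  return ["approvals", category, test, decision].join("/");
}
```

Keeping the classification free of file access makes the three outcomes easy to check in isolation; loading the stored .txt files is left out of the sketch.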

Run tests

# All tests
pnpm run test

# Verbose
pnpm run test:verbose

# Single test (substring match)
pnpm run test closing-brace

# Repeat runs (works with single test or all)
pnpm run test --runs 5
pnpm run test closing-brace --runs 5
pnpm run test -r 5

Non-interactive / CI mode

# Don’t prompt; fail only on known rejected outputs.
# New outputs become "unknown".
pnpm run test --skip-approval
pnpm run test -sa

Opus auto-approval (batching new outputs)

# Uses Claude Opus to auto-judge new outputs as APPROVED/REJECTED
pnpm run test --opus-approval
pnpm run test -oa
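
For orientation, here is a minimal sketch of how the flags shown so far could be parsed. It is not the parser in runner.ts: TestOptions, parseTestArgs, and selectTests are hypothetical names, the default of one run is an assumption, and the sketch treats the first bare argument as the test-name filter, so it does not cover the report command.

```typescript
// Hypothetical option parser for the flags documented above.
interface TestOptions {
  filter?: string; // substring used to select tests
  runs: number; // assumed default: 1
  skipApproval: boolean;
  opusApproval: boolean;
}

function parseTestArgs(argv: string[]): TestOptions {
  const options: TestOptions = { runs: 1, skipApproval: false, opusApproval: false };
  const args = [...argv]; // copy so the caller's array is not mutated
  while (args.length > 0) {
    const arg = args.shift();
    if (arg === undefined) break;
    if (arg === "--runs" || arg === "-r") {
      // A missing value becomes NaN and is rejected below.
      const value = Number(args.shift());
      if (!Number.isInteger(value) || value < 1) {
        throw new Error(`${arg} expects a positive integer`);
      }
      options.runs = value;
    } else if (arg === "--skip-approval" || arg === "-sa") {
      options.skipApproval = true;
    } else if (arg === "--opus-approval" || arg === "-oa") {
      options.opusApproval = true;
    } else if (!arg.startsWith("-") && options.filter === undefined) {
      options.filter = arg; // first bare argument is taken as the filter
    }
  }
  return options;
}

// Substring match on test names, as in "pnpm run test closing-brace".
function selectTests(names: string[], filter?: string): string[] {
  return filter === undefined ? names : names.filter((name) => name.includes(filter));
}
```

For example, parseTestArgs(["closing-brace", "--runs", "5"]) yields a filter of "closing-brace" and runs set to 5.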

Clean up approvals for removed/renamed test cases

pnpm run clean

Model + completion strategy

Default model: mistralai/codestral-2508 (supports FIM).

# Override model
LLM_MODEL=anthropic/claude-3-haiku pnpm run test

The suite mirrors production behavior via GhostProviderTester:

  • If the model supports FIM → ghost-provider-fim (uses FimPromptBuilder)
  • Otherwise → ghost-provider-holefiller (uses HoleFiller)
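
The selection rule above can be sketched as follows. This is an assumption-laden illustration rather than the logic inside GhostProviderTester: resolveModel, pickStrategy, and FIM_CAPABLE_MODELS are hypothetical names, and the capability set lists only the default model because that is the only model this README describes as supporting FIM.

```typescript
const DEFAULT_MODEL = "mistralai/codestral-2508";

// Hypothetical capability list for illustration only; the real suite may
// determine FIM support differently.
const FIM_CAPABLE_MODELS = new Set<string>([DEFAULT_MODEL]);

type Strategy = "ghost-provider-fim" | "ghost-provider-holefiller";

// Reads the LLM_MODEL override from an environment-like object and falls
// back to the default when it is unset or empty.
function resolveModel(env: Record<string, string | undefined>): string {
  return env["LLM_MODEL"] || DEFAULT_MODEL;
}

// FIM-capable models use the FIM strategy; every other model uses the hole filler.
function pickStrategy(
  model: string,
  fimCapable: Set<string> = FIM_CAPABLE_MODELS,
): Strategy {
  return fimCapable.has(model) ? "ghost-provider-fim" : "ghost-provider-holefiller";
}
```

Passing the environment in as a parameter, instead of reading process.env inside the function, keeps the sketch testable without touching the real environment.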

HTML report

pnpm run test report

The report is written to html-output/ (gitignored):

  • html-output/index.html: overview by category
  • per-test pages showing the input and all approved/rejected outputs