Peter Dave Hello 85b18ad524 chore: bump Node.js version from 20.19.2 to 20.20.0 (#5076)		3 weeks ago
approvals	815feaec44 Add test case for inline completion in middle of expression	2 months ago
test-cases	1067415ac9 Merge branch 'main' into mark/improve-evals	1 month ago
.gitignore	8f1046bf07 Add html-output to gitignore and document report command	2 months ago
README.md	3b3721f364 docs: shorten LLM autocompletion README	2 months ago
approvals.spec.ts	ecd17d4d74 Improve prompt for opus reviewing autocomplete suggestions	2 months ago
approvals.ts	ecd17d4d74 Improve prompt for opus reviewing autocomplete suggestions	2 months ago
ghost-provider-tester.ts	96ad75aa1b feat: add context file support to autocomplete benchmark	2 months ago
html-report.ts	2fffbe00c4 Style prefix/suffix in grey in HTML report outputs	2 months ago
llm-client.ts	9f9fcb5a6b refactor: reuse getKiloBaseUriFromToken from llm-client in opus-approval	2 months ago
mock-context-provider.ts	4addf81c98 refactor: merge snippets/types.ts into autocomplete/types.ts	1 month ago
mock-vscode.ts	086beab38b do less mocking for the benchmark runner	4 months ago
opus-approval.ts	ecd17d4d74 Improve prompt for opus reviewing autocomplete suggestions	2 months ago
package.json	85b18ad524 chore: bump Node.js version from 20.19.2 to 20.20.0 (#5076)	3 weeks ago
runner.ts	92c23c83d7 refactor: extract HTML report generation to separate file	2 months ago
test-cases.spec.ts	3c0da31183 also add tests	4 months ago
test-cases.ts	883a44b897 inline	2 months ago
tsconfig.json	6563f031d2 less duplication	4 months ago
utils.ts	ff7a4773d9 Added fim-tester to test-llm-autocompletion	3 months ago

LLM Autocompletion Tests

A standalone approval-test suite for GhostInlineCompletionProvider that makes real LLM calls.

Setup

cd src/test-llm-autocompletion
cp .env.example .env

Set your kilocode API key in .env.

What “approval testing” means here

The first time a test produces a new completion, you are asked to approve or reject it.
The decision is stored under approvals/<category>/<test>/approved|rejected/*.txt.
On later runs:
- output matches an approved file → pass
- output matches a rejected file → fail
- unseen output → you are prompted again (unless --skip-approval or --opus-approval is set)
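
The decision rules above can be sketched roughly as follows. This is only an illustration, not the suite's actual code: the `Verdict` type, `ApprovalStore`, `approvalDir`, `classify`, and the `syntax` category are all hypothetical names, and the stored outputs are held in memory here instead of being read from the approvals/ directory. The --opus-approval path, which would replace the manual prompt with a model judgment, is not modeled.

```typescript
// Hypothetical sketch of the approval lookup; names are illustrative only.
type Verdict = "pass" | "fail" | "unknown" | "needs-review";

interface ApprovalStore {
  approved: Set<string>; // outputs previously approved for this test
  rejected: Set<string>; // outputs previously rejected for this test
}

// Builds the directory layout described above:
// approvals/<category>/<test>/approved|rejected
function approvalDir(
  category: string,
  test: string,
  kind: "approved" | "rejected",
): string {
  return ["approvals", category, test, kind].join("/");
}

// Approved output passes, rejected output fails. An unseen output either
// needs a human decision or, with skipApproval set, is reported as unknown.
function classify(
  output: string,
  store: ApprovalStore,
  skipApproval: boolean,
): Verdict {
  if (store.approved.has(output)) return "pass";
  if (store.rejected.has(output)) return "fail";
  return skipApproval ? "unknown" : "needs-review";
}

const store: ApprovalStore = {
  approved: new Set(["}"]),
  rejected: new Set(["});"]),
};

console.log(approvalDir("syntax", "closing-brace", "approved"));
console.log(classify("}", store, false));
console.log(classify("});", store, false));
console.log(classify("]", store, false));
console.log(classify("]", store, true));
```

Keeping the lookup separate from the prompt step means the same function can serve both the interactive and the non-interactive mode; only the handling of unseen outputs differs.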

Run tests

# All tests
pnpm run test

# Verbose
pnpm run test:verbose

# Single test (substring match)
pnpm run test closing-brace

# Repeat runs (works with single test or all)
pnpm run test --runs 5
pnpm run test closing-brace --runs 5
pnpm run test -r 5
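
As a rough illustration of how a substring filter and a repeat count could combine, here is a minimal sketch. It assumes the filter is matched against the test name only; the `TestCase` shape, the function names, and the sample test names are hypothetical and may not match what runner.ts does.

```typescript
// Hypothetical sketch: select tests by substring, then repeat each one.
interface TestCase {
  category: string;
  name: string;
}

// No filter selects everything; otherwise keep names containing the substring.
function selectTests(all: TestCase[], filter?: string): TestCase[] {
  if (!filter) return all;
  return all.filter((t) => t.name.includes(filter));
}

// Repeats every selected test `runs` times, keeping repeats adjacent.
function expandRuns(tests: TestCase[], runs: number = 1): TestCase[] {
  const plan: TestCase[] = [];
  for (const t of tests) {
    for (let i = 0; i < runs; i++) plan.push(t);
  }
  return plan;
}

// Sample data, invented for this example.
const cases: TestCase[] = [
  { category: "syntax", name: "closing-brace-after-if" },
  { category: "imports", name: "import-statement" },
];

console.log(selectTests(cases, "closing-brace").length);
console.log(expandRuns(selectTests(cases, "closing-brace"), 5).length);
console.log(expandRuns(cases, 5).length);
```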

Non-interactive / CI mode

# Don’t prompt; fail only on known rejected outputs.
# New outputs become "unknown".
pnpm run test --skip-approval
pnpm run test -sa

Opus auto-approval (batching new outputs)

# Uses Claude Opus to auto-judge new outputs as APPROVED/REJECTED
pnpm run test --opus-approval
pnpm run test -oa

Clean up approvals for removed/renamed test cases

pnpm run clean

Model + completion strategy

Default model: mistralai/codestral-2508 (supports FIM).

# Override model
LLM_MODEL=anthropic/claude-3-haiku pnpm run test

The suite mirrors production behavior via GhostProviderTester:

- If the model supports FIM → ghost-provider-fim (uses FimPromptBuilder)
- Otherwise → ghost-provider-holefiller (uses HoleFiller)
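
The selection rule can be sketched as below. How the suite actually detects FIM support is not stated here, so the allow-list `FIM_MODELS` is a stand-in assumption, and `resolveModel` and `selectStrategy` are hypothetical names. Only the default model name, the LLM_MODEL variable, and the two strategy names come from this README.

```typescript
// Hypothetical sketch of model resolution and strategy selection.
type Strategy = "ghost-provider-fim" | "ghost-provider-holefiller";

const DEFAULT_MODEL = "mistralai/codestral-2508";

// Assumption: a hard-coded allow-list stands in for real FIM detection.
const FIM_MODELS = new Set<string>([DEFAULT_MODEL]);

// LLM_MODEL overrides the default when it is set and non-empty.
function resolveModel(env: Record<string, string | undefined>): string {
  return env["LLM_MODEL"] || DEFAULT_MODEL;
}

// FIM-capable models use the FIM strategy; everything else uses the hole filler.
function selectStrategy(model: string): Strategy {
  return FIM_MODELS.has(model)
    ? "ghost-provider-fim"
    : "ghost-provider-holefiller";
}

const defaultModel = resolveModel({});
const overridden = resolveModel({ LLM_MODEL: "anthropic/claude-3-haiku" });

console.log(defaultModel, selectStrategy(defaultModel));
console.log(overridden, selectStrategy(overridden));
```

Passing the environment in as a plain object keeps the sketch testable without touching the real process environment.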

HTML report

pnpm run test report

Outputs to html-output/ (gitignored):

- html-output/index.html: overview by category
- per-test pages showing the input and all approved/rejected outputs
