
Benchmark Harness

Configure the environment variables (OpenRouter, PostHog, etc.):

cp .env.local.sample .env.local
# Update ENV vars as needed.
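
Before starting the container, it can help to see which entries in .env.local are still empty. The following is a minimal sketch and not part of the harness: the function name `missing_env_keys` is hypothetical, and it assumes the file uses plain `KEY=value` lines.

```shell
#!/bin/sh
# Hypothetical helper (not part of the harness): print the name of every
# KEY=value entry in an env file whose value is empty.
# Assumes plain KEY=value lines; comments and blank lines are skipped.
missing_env_keys() {
  env_file="${1:-.env.local}"
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;   # skip blank lines and comments
      *=*) ;;                # looks like an assignment, keep going
      *) continue ;;         # anything else is ignored
    esac
    key="${line%%=*}"
    value="${line#*=}"
    # Treat empty and empty-quoted values as unset.
    case "$value" in
      ''|'""'|"''") printf '%s\n' "$key" ;;
    esac
  done < "$env_file"
}
```

Calling `missing_env_keys .env.local` after sourcing this snippet prints one key per line; no output means every entry has a value.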

Build and run a Docker image with the development environment needed to run the benchmarks (C++, Go, Java, Node.js, Python & Rust):

npm run docker:start

Run an exercise:

npm run docker:benchmark -- -e exercises/javascript/binary

Select and run an exercise:

npm run cli

Select and run an exercise for a specific language:

npm run cli -- run rust

Run all exercises for a language:

npm run cli -- run rust all
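
To run several languages back to back, the command above can be wrapped in a loop. This is a sketch under assumptions, not something the harness provides: `run_languages` is a hypothetical name, and `rust` is the only language name this README shows for the command, so any other name passed in is a guess at what the CLI accepts.

```shell
#!/bin/sh
# Hypothetical helper (not part of the harness): run every exercise for
# each language name given as an argument, one language at a time.
# Keeps going after a failure and reports the failed languages at the end.
run_languages() {
  failed=""
  for lang in "$@"; do
    if ! npm run cli -- run "$lang" all; then
      failed="$failed $lang"
    fi
  done
  if [ -n "$failed" ]; then
    echo "failed:$failed" >&2
    return 1
  fi
}
```

For example, `run_languages rust` behaves like the single command above, while passing more names runs them in sequence.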

Run all exercises:

npm run cli -- run all

Run all exercises using a specific runId (useful for retrying after an unexpected error):

npm run cli -- run all --runId 1
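
The retry can also be automated with a small wrapper. The sketch below is an illustration only: `retry_run` is a hypothetical name, and it assumes that re-running with the same runId continues the earlier run, which the description above suggests but does not spell out.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the harness): run a command until it
# succeeds or the attempt limit is reached.
# Usage: retry_run MAX_ATTEMPTS COMMAND [ARGS...]
retry_run() {
  max_attempts="$1"
  shift
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt of $max_attempts failed" >&2
    attempt=$((attempt + 1))
  done
  return 1
}

# Example (assumes runId 1 refers to the run being retried):
#   retry_run 3 npm run cli -- run all --runId 1
```

The wrapper returns as soon as one attempt succeeds and returns a non-zero status if every attempt fails, so it can be chained with other commands.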