Chris Estreich e2bdce0961 Update benchmark/prompts/cpp.md 11 months ago
prompts e2bdce0961 Update benchmark/prompts/cpp.md 11 months ago
src f108dfaeb8 Evals 11 months ago
.env.local.sample f108dfaeb8 Evals 11 months ago
Dockerfile f108dfaeb8 Evals 11 months ago
README.md f108dfaeb8 Evals 11 months ago
entrypoint.sh f108dfaeb8 Evals 11 months ago
package-lock.json f108dfaeb8 Evals 11 months ago
package.json f108dfaeb8 Evals 11 months ago
tsconfig.json f108dfaeb8 Evals 11 months ago

Benchmark Harness

Configure environment variables (OpenRouter, PostHog, etc.):

cp .env.local.sample .env.local
# Update ENV vars as needed.
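
If you re-run setup often, it can help to avoid overwriting an already edited .env.local and to see which variables are still blank. The following is a minimal POSIX shell sketch; init_env and list_empty_vars are hypothetical helpers, not part of the harness, and list_empty_vars assumes the file uses plain KEY=value lines.

```shell
# Hypothetical helpers (not part of the harness).

# init_env [SAMPLE] [TARGET]: copy SAMPLE to TARGET only if TARGET does
# not exist yet, so re-running setup never clobbers values you edited.
init_env() {
  local sample="${1:-.env.local.sample}"
  local target="${2:-.env.local}"
  if [ -f "$target" ]; then
    echo "$target already exists; leaving it unchanged" >&2
    return 0
  fi
  cp "$sample" "$target"
}

# list_empty_vars [FILE]: print the name of every KEY= line whose value
# is empty (assumes plain KEY=value lines, one variable per line).
list_empty_vars() {
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=[[:space:]]*$' "${1:-.env.local}" | cut -d= -f1
}
```

Typical use: run init_env, edit .env.local, then run list_empty_vars to check which variables still have no value. Quoted empty values such as KEY="" are not reported by this simple pattern.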

Build and run the Docker image that provides the development environment for the benchmarks (C++, Go, Java, Node.js, Python and Rust):

npm run docker:start
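
A quick preflight check can fail early with a clear message when a host tool is missing. This sketch assumes the host needs the docker CLI and npm on its PATH; require_cmds is a hypothetical helper, not part of the harness.

```shell
# require_cmds CMD...: report every command that is not on PATH and
# return non-zero if any of them is missing (hypothetical helper).
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: require_cmds docker npm && npm run docker:start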

Run a single exercise by path (here, the JavaScript binary exercise):

npm run docker:benchmark -- -e exercises/javascript/binary
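
To run several exercises of one language back to back, the single-exercise command can be wrapped in a loop. This is a sketch under two assumptions: exercises follow the exercises/<language>/<name> layout of the path above, and the docker:benchmark script exits non-zero on failure. run_exercises is a hypothetical helper, not part of the harness.

```shell
# run_exercises LANGUAGE NAME...: run each exercises/LANGUAGE/NAME through
# the docker:benchmark script in order, stopping at the first failure.
# Assumes the exercises/<language>/<name> layout (hypothetical helper).
run_exercises() {
  local lang="$1" name
  shift
  for name in "$@"; do
    echo "==> exercises/$lang/$name" >&2
    npm run docker:benchmark -- -e "exercises/$lang/$name" || return 1
  done
}
```

For example: run_exercises javascript binary <another-exercise>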

Interactively select and run an exercise:

npm run cli

Interactively select and run an exercise for a specific language (Rust in this example):

npm run cli -- run rust

Run all exercises for a language:

npm run cli -- run rust all

Run all exercises:

npm run cli -- run all

Run all exercises under a specific runId (useful for retrying after an unexpected error):

npm run cli -- run all --runId 1
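
If such errors are intermittent, the retry can be automated by re-running the same command, with the same runId, until it succeeds. A minimal sketch, assuming the CLI exits with a non-zero status when a run fails (not confirmed by this README); retry is a hypothetical helper, not part of the harness.

```shell
# retry MAX CMD...: run CMD up to MAX times, stopping at the first
# success; returns non-zero if every attempt fails (hypothetical helper).
retry() {
  local max="$1" attempt=1
  shift
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt/$max failed: $*" >&2
    attempt=$((attempt + 1))
  done
  return 1
}
```

For example: retry 3 npm run cli -- run all --runId 1. Keeping the runId fixed across attempts is what makes each attempt a retry of the same run rather than a new one.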