Chris Estreich e2bdce0961 Update benchmark/prompts/cpp.md 9 months ago
prompts e2bdce0961 Update benchmark/prompts/cpp.md 9 months ago
src f108dfaeb8 Evals 9 months ago
.env.local.sample f108dfaeb8 Evals 9 months ago
Dockerfile f108dfaeb8 Evals 9 months ago
README.md f108dfaeb8 Evals 9 months ago
entrypoint.sh f108dfaeb8 Evals 9 months ago
package-lock.json f108dfaeb8 Evals 9 months ago
package.json f108dfaeb8 Evals 9 months ago
tsconfig.json f108dfaeb8 Evals 9 months ago

README.md

Benchmark Harness

Configure environment variables (OpenRouter, PostHog, etc.):

cp .env.local.sample .env.local
# Update ENV vars as needed.

Build and run a Docker image containing the development environment needed to run the benchmarks (C++, Go, Java, Node.js, Python, and Rust):

npm run docker:start

Run an exercise:

npm run docker:benchmark -- -e exercises/javascript/binary

Select and run an exercise:

npm run cli

Select and run an exercise for a specific language:

npm run cli -- run rust

Run all exercises for a language:

npm run cli -- run rust all

Run all exercises:

npm run cli -- run all

Run all exercises using a specific runId (useful for retrying after an unexpected error):

npm run cli -- run all --runId 1
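
The retry workflow above could be automated with a small shell wrapper. The sketch below is not part of the harness: `retry` and `RETRY_DELAY` are hypothetical names introduced here, and it assumes that the CLI exits with a non-zero status when a run fails and that re-running with the same runId is safe, as the note above suggests.

```shell
# Hypothetical helper (not part of this repo): re-run a command until it
# succeeds or the attempt limit is reached.
# Usage: retry MAX_ATTEMPTS COMMAND [ARGS...]
retry() {
  local max="$1"
  shift
  local attempt=1
  # Assumption: the command signals failure with a non-zero exit status.
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    # Pause between attempts; RETRY_DELAY is in seconds and defaults to 5.
    sleep "${RETRY_DELAY:-5}"
  done
}
```

For example, `retry 3 npm run cli -- run all --runId 1` would attempt the same run up to three times, reusing the same runId on each attempt.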