Clone the Roo Code repo:
git clone https://github.com/RooCodeInc/Roo-Code.git
cd Roo-Code
Add your OpenRouter API key:
echo "OPENROUTER_API_KEY=sk-or-v1-[...]" > packages/evals/.env.local
Start the evals service:
docker compose -f packages/evals/docker-compose.yml --profile server --profile runner up --build --scale runner=0
The initial build can take a minute or two. On success, you should see output indicating that a web service is running on localhost:3000.
Additionally, you'll find in Docker Desktop that database and Redis services are running.
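If you prefer to confirm from a terminal that the web service is reachable, a small polling loop along these lines may help. This is a minimal sketch: the function name `wait_for_web`, the retry count, and the assumption that the app answers with a successful HTTP status at the root path are illustrative and not part of Roo Code Evals.

```shell
# Sketch: poll until the web service answers, or give up after N attempts.
# ASSUMPTION: the app returns a successful HTTP status at the root path.
wait_for_web() {
  url="${1:-http://localhost:3000}"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors (4xx/5xx).
    if curl --silent --fail --output /dev/null "$url"; then
      echo "web service is up at $url"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "web service did not respond at $url" >&2
  return 1
}

# Usage (run after the compose stack has started):
#   wait_for_web http://localhost:3000 30
```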
Navigate to localhost:3000 in your browser and click the 🚀 button.
By default, an evals run executes all programming exercises in the Roo Code Evals repository with the Claude Sonnet 4 model and default settings. For basic configuration, you can specify the LLM to use and any subset of the exercises you'd like. For advanced configuration, you can import a Roo Code settings file, which lets you run the evals with Roo Code configured any way you'd like (this includes custom modes, a footgun prompt, etc.).
After clicking "Launch" you should find that a "controller" container has spawned, along with N "task" containers, where N is the value you chose for concurrency.
The web app's UI should update in real time with the results of the eval run.
If you want to run evals with high parallelism by increasing the concurrency, be mindful of your Docker resource limits.
We've found the following formula to be helpful in practice:
Memory Limit = 3GB * concurrency
CPU Limit = 2 * concurrency
The memory and CPU limits can be set from the "Resources" section of the Docker Desktop settings.
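As a quick worked example of the formula above, the following sketch computes both limits for a given concurrency. The function name `docker_limits` and its output format are illustrative only; the 3GB and 2-CPU multipliers come from the rule of thumb stated above.

```shell
# Sketch: apply the rule of thumb (3GB of memory and 2 CPUs per concurrent task).
# docker_limits is a hypothetical helper, not part of Roo Code Evals.
docker_limits() {
  printf 'memory=%sGB cpus=%s\n' "$((3 * $1))" "$((2 * $1))"
}

docker_limits 4   # prints: memory=12GB cpus=8
```

For a concurrency of 4, this suggests giving Docker at least 12GB of memory and 8 CPUs.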
To stop an evals run early, stop the "controller" container using Docker Desktop. This prevents any new task containers from being spawned. You can optionally stop any existing task containers immediately, or let them finish their current tasks, at which point they will exit.
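The same can likely be done from a terminal with the Docker CLI. The sketch below assumes that the controller container's name contains the string "controller"; that naming is an assumption, so check `docker ps` for the actual name on your machine. The function name `stop_controller` is likewise illustrative.

```shell
# Sketch: stop the controller container from the command line.
# ASSUMPTION: the container name contains "controller"; verify with `docker ps`.
stop_controller() {
  # The name filter matches substrings; --quiet prints only container IDs.
  ids="$(docker ps --quiet --filter "name=controller")"
  if [ -z "$ids" ]; then
    echo "no controller container is running"
    return 0
  fi
  # $ids is intentionally unquoted so that each ID becomes its own argument.
  docker stop $ids
}

# Usage:
#   stop_controller
```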
The evals system runs VS Code headlessly in Docker containers for consistent, reproducible environments. While this design ensures reliability, it can make debugging more challenging. For debugging purposes, you can run the system locally on macOS, though this approach is less reliable due to hardware and environment variability.
To configure your macOS system to run evals locally, execute the setup script:
cd packages/evals && ./scripts/setup.sh
The setup script prepares the local environment, including the .env.local file.
Here are some errors that you might encounter along with potential fixes:
Problem:
Error response from daemon: network 3d812c43410fcad072c764fa872a53fc0a5edf33634964699242a886947aff1a not found
Solution:
Prune orphaned resources:
docker system prune -f