
Run Roo Code Evals

Prerequisites

Setup

Clone the Roo Code repo:

git clone https://github.com/RooCodeInc/Roo-Code.git
cd Roo-Code

Add your OpenRouter API key:

echo "OPENROUTER_API_KEY=sk-or-v1-[...]" > packages/evals/.env.local
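Note that `>` replaces any existing packages/evals/.env.local. If that file already holds other settings (for example the port overrides described under Customizing Ports), a minimal sketch like the following appends the key only when it is missing. `add_env_var` is a hypothetical helper for illustration and is not part of the repo.

```shell
# Append KEY=VALUE to an env file unless KEY is already set there.
# add_env_var is a hypothetical helper, not part of the evals tooling.
add_env_var() {
  file="$1"
  key="$2"
  value="$3"
  touch "$file"
  if grep -q "^${key}=" "$file"; then
    # Leave the existing value alone rather than overwrite it.
    echo "${key} already set in ${file}"
  else
    echo "${key}=${value}" >> "$file"
  fi
}

# Usage (run from the repository root):
#   add_env_var packages/evals/.env.local OPENROUTER_API_KEY "sk-or-v1-[...]"
```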

Run

Start the evals service:

pnpm evals

The initial build can take a minute or two. On success, the output should indicate that a web service is running on localhost:3446.

Docker Desktop should also show that the database and Redis services are running.

Navigate to localhost:3446 in your browser and click the 🚀 button.

By default, an evals run executes all programming exercises in the Roo Code Evals repository with the Claude Sonnet 4 model and default settings. For basic configuration, you can specify the LLM to use and any subset of the exercises. For advanced configuration, you can import a Roo Code settings file, which lets you run the evals with Roo Code configured any way you'd like (including custom modes, a footgun prompt, etc.).

After clicking "Launch", a "controller" container should spawn along with N "task" containers, where N is the value you chose for concurrency.

The web app's UI should update in real time with the results of the eval run.

Resource Usage

If you raise the concurrency to run evals with high parallelism, be mindful of your Docker resource limits.

We've found the following formula to be helpful in practice:

Memory Limit = 3GB * concurrency
CPU Limit = 2 * concurrency
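
As a worked example of the formula above, the sketch below computes the suggested limits for a given concurrency. The function names `memory_limit_gb` and `cpu_limit` are illustrative only and are not part of the evals tooling.

```shell
# Suggested Docker resource limits for a given concurrency, using the
# rule of thumb above: 3 GB of memory and 2 CPUs per concurrent task.
# Both functions are hypothetical helpers, not part of the repo.
memory_limit_gb() {
  echo $(( 3 * $1 ))
}

cpu_limit() {
  echo $(( 2 * $1 ))
}

concurrency=4
echo "concurrency=${concurrency}: memory=$(memory_limit_gb "$concurrency")GB cpus=$(cpu_limit "$concurrency")"
# prints: concurrency=4: memory=12GB cpus=8
```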

The memory and CPU limits can be set from the "Resources" section of the Docker Desktop settings.

Stopping

To stop an evals run early, stop the "controller" container using Docker Desktop. This prevents any new task containers from being spawned. You can then either stop the existing task containers immediately or let them finish their current tasks, at which point they will exit.
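
The controller can also be stopped from the command line. The sketch below assumes the controller container's name contains the string `controller`; that naming is an assumption, so check `docker ps` for the actual name on your machine. `stop_matching_containers` is a hypothetical helper.

```shell
# Stop every running container whose name matches a filter.
# The default filter "controller" is an assumption about how the
# controller container is named; verify it with `docker ps` first.
stop_matching_containers() {
  name_filter="${1:-controller}"
  ids="$(docker ps --quiet --filter "name=${name_filter}")"
  if [ -z "$ids" ]; then
    echo "no running containers match '${name_filter}'"
    return 0
  fi
  # Word splitting is intentional: docker ps prints one ID per line.
  docker stop $ids
}

# Usage: stop_matching_containers controller
```

Because the filter is a substring match, a broad filter can stop more containers than intended; listing matches with `docker ps --filter "name=..."` first is the safer order of operations.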

Advanced Usage

The evals system runs VS Code headlessly in Docker containers for consistent, reproducible environments. While this design ensures reliability, it can make debugging more challenging. For debugging purposes, you can run the system locally on macOS, though this approach is less reliable due to hardware and environment variability.

To configure your macOS system to run evals locally, execute the setup script:

cd packages/evals && ./scripts/setup.sh

The setup script does the following:

  • Installs development tools: Homebrew, mise (mise-en-place), GitHub CLI, pnpm
  • Installs programming languages: Node.js 20.19.2, Python 3.13.2, Go 1.24.2, Rust 1.85.1, Java 17
  • Sets up VS Code with required extensions
  • Configures Docker services (PostgreSQL, Redis)
  • Clones/updates the evals repository
  • Creates and migrates a Postgres database
  • Prompts for an OpenRouter API key to add to .env.local
  • Optionally builds and installs the Roo Code extension from source

Port Configuration

By default, the evals system uses the following ports:

  • PostgreSQL: 5433 (external) → 5432 (internal)
  • Redis: 6380 (external) → 6379 (internal)
  • Web Service: 3446 (external) → 3446 (internal)

These ports are configured to avoid conflicts with other services that might be running on the standard PostgreSQL (5432) and Redis (6379) ports.

Customizing Ports

If you need to use different ports, you can customize them by creating a .env.local file in the packages/evals/ directory:

# Copy the example file and customize as needed
cp packages/evals/.env.local.example packages/evals/.env.local

Then edit .env.local to set your preferred ports:

# Custom port configuration
EVALS_DB_PORT=5434
EVALS_REDIS_PORT=6381
EVALS_WEB_PORT=3447

# Optional: Override database URL if needed
DATABASE_URL=postgres://postgres:password@localhost:5434/evals_development
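
If you override both values, the port in DATABASE_URL presumably needs to match EVALS_DB_PORT. A minimal sketch that derives one from the other; `evals_database_url` is a hypothetical helper, and the credentials and database name are copied from the example above rather than confirmed against the repo's configuration.

```shell
# Build a DATABASE_URL whose port matches the chosen external DB port.
# evals_database_url is a hypothetical helper; user, password, and
# database name are taken from the example above.
evals_database_url() {
  port="${1:-5433}"
  echo "postgres://postgres:password@localhost:${port}/evals_development"
}

EVALS_DB_PORT=5434
echo "DATABASE_URL=$(evals_database_url "$EVALS_DB_PORT")"
# prints: DATABASE_URL=postgres://postgres:password@localhost:5434/evals_development
```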

Port Conflict Resolution

If you encounter port conflicts when running pnpm evals, you have several options:

  1. Use the default configuration (recommended): The system now uses non-standard ports by default
  2. Stop conflicting services: Temporarily stop other PostgreSQL/Redis services
  3. Customize ports: Use the .env.local file to set different ports
  4. Use Docker networks: Run services in isolated Docker networks
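
Before choosing among these options, it can help to see which ports are actually taken. The sketch below checks for listeners with `lsof`, which must be installed; `port_in_use` and `check_evals_ports` are hypothetical helpers, not part of the evals tooling.

```shell
# Report which of the given TCP ports already have a listener.
# Requires lsof; if lsof is missing, every port is reported as free.
port_in_use() {
  lsof -nP -iTCP:"$1" -sTCP:LISTEN >/dev/null 2>&1
}

check_evals_ports() {
  for port in "$@"; do
    if port_in_use "$port"; then
      echo "port ${port}: in use"
    else
      echo "port ${port}: free"
    fi
  done
}

# Usage, with the default external ports listed above:
#   check_evals_ports 5433 6380 3446
```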

Troubleshooting

Here are some errors that you might encounter along with potential fixes:

Problem:

Error response from daemon: network 3d812c43410fcad072c764fa872a53fc0a5edf33634964699242a886947aff1a not found

Solution:

Prune orphaned resources:

docker system prune -f