This guide explains how to add new coding exercises to the Roo Code evals system. The evals system is a distributed evaluation platform that runs AI coding tasks in isolated VS Code environments to test AI coding capabilities across multiple programming languages.
An eval (evaluation) is fundamentally a coding exercise whose expected behavior is expressed as a set of unit tests; a solution is considered correct only if all of the tests pass. Each eval consists of:
The key principle is that the tests define the contract - if all tests pass, the solution is considered correct. This provides an objective, automated way to measure AI coding performance across different programming languages and problem domains.
Example Flow:
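As a minimal Python sketch of this flow (the exercise name, function, and test here are illustrative, not taken from the evals repository): the runner executes the exercise's test suite against the candidate implementation, and the pass/fail result alone decides acceptance.

```python
import unittest

# Hypothetical example of the eval contract: a candidate solution
# plus the tests that decide whether it is accepted.
def reverse(text):
    return text[::-1]  # one candidate solution

class ReverseStringTest(unittest.TestCase):  # the contract
    def test_reverse(self):
        self.assertEqual(reverse("hello"), "olleh")

# Run the suite programmatically and report the verdict.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ReverseStringTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("solution accepted" if result.wasSuccessful() else "solution rejected")
```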
The evals system consists of several key components:
- Roo-Code-Evals - Contains all exercise definitions
- `apps/web-evals` - Management interface for creating and monitoring evaluation runs
- `packages/evals` - Contains both controller logic for orchestrating evaluation runs and runner container code for executing individual tasks, including Dockerfiles for the controller and runner as well as a Docker Compose file that provisions the Postgres and Redis instances required for eval runs

The system currently supports these programming languages:
- Go - uses `go test` for testing
- Rust - uses `cargo test` for testing
- Java
- JavaScript
- Python

TL;DR - Here's a pull request that adds a new JavaScript eval: https://github.com/RooCodeInc/Roo-Code-Evals/pull/3
Each exercise follows a standardized directory structure:
/evals/{language}/{exercise-name}/
├── docs/
│ ├── instructions.md # Main exercise description
│ └── instructions.append.md # Additional instructions (optional)
├── {exercise-name}.{ext} # Implementation stub
├── {exercise-name}_test.{ext} # Test file
└── {language-specific-files} # go.mod, package.json, etc.
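The layout above can be scaffolded with a short script. This is a sketch only; the `python/reverse-string` paths follow the running example in this guide, not a required tool.

```python
from pathlib import Path

# Scaffold the standard exercise layout for a hypothetical
# python/reverse-string eval.
root = Path("python") / "reverse-string"
(root / "docs").mkdir(parents=True, exist_ok=True)
(root / "docs" / "instructions.md").touch()   # main exercise description
(root / "reverse_string.py").touch()          # implementation stub
(root / "reverse_string_test.py").touch()     # test file

# Print the resulting tree for a quick visual check.
for path in sorted(root.rglob("*")):
    print(path)
```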
Clone the evals repository:
git clone https://github.com/RooCodeInc/Roo-Code-Evals.git evals
cd evals
Create exercise directory:
mkdir {language}/{exercise-name}
cd {language}/{exercise-name}
Create docs/instructions.md with a clear problem description:
# Instructions
Create an implementation of [problem description].
## Problem Description
[Detailed explanation of what needs to be implemented]
## Examples
- Input: [example input]
- Output: [expected output]
## Constraints
- [Any constraints or requirements]
Example from a simple reverse-string exercise:
# Instructions
Create a function that reverses a string.
## Problem Description
Write a function called `reverse` that takes a string as input and returns the string with its characters in reverse order.
## Examples
- Input: `reverse("hello")` → Output: `"olleh"`
- Input: `reverse("world")` → Output: `"dlrow"`
- Input: `reverse("")` → Output: `""`
- Input: `reverse("a")` → Output: `"a"`
## Constraints
- Input will always be a valid string
- Empty strings should return empty strings
Create the main implementation file with function signatures but no implementation:
Python example (reverse_string.py):
def reverse(text):
    pass
Go example (reverse_string.go):
package reversestring
// Reverse returns the input string with its characters in reverse order
func Reverse(s string) string {
	// TODO: implement
	return ""
}
Create test files that validate the implementation:
Python example (reverse_string_test.py):
import unittest
from reverse_string import reverse
class ReverseStringTest(unittest.TestCase):
    def test_reverse_hello(self):
        self.assertEqual(reverse("hello"), "olleh")

    def test_reverse_world(self):
        self.assertEqual(reverse("world"), "dlrow")

    def test_reverse_empty_string(self):
        self.assertEqual(reverse(""), "")

    def test_reverse_single_character(self):
        self.assertEqual(reverse("a"), "a")
Go example (reverse_string_test.go):
package reversestring
import "testing"
func TestReverse(t *testing.T) {
	tests := []struct {
		input    string
		expected string
	}{
		{"hello", "olleh"},
		{"world", "dlrow"},
		{"", ""},
		{"a", "a"},
	}
	for _, test := range tests {
		result := Reverse(test.input)
		if result != test.expected {
			t.Errorf("Reverse(%q) = %q, expected %q", test.input, result, test.expected)
		}
	}
}
For Go exercises, create go.mod:
module reverse-string
go 1.18
For Python exercises, ensure the parent directory has pyproject.toml:
[project]
name = "python-exercises"
version = "0.1.0"
description = "Python exercises for Roo Code evals"
requires-python = ">=3.9"
dependencies = [
    "pytest>=8.3.5",
]
Before committing, test your exercise locally:
Python:
cd python/reverse-string
uv run python3 -m pytest -o markers=task reverse_string_test.py
Go:
cd go/reverse-string
go test
The tests should fail with the stub implementation and pass when properly implemented.
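For reference, here is one Python implementation that would turn the failing reverse-string tests green (illustrative only; in a real eval run the implementation is produced by the AI under test):

```python
# reverse_string.py - a reference solution for the example exercise.
def reverse(text):
    # Slicing with a step of -1 walks the string backwards.
    return text[::-1]

# The same cases exercised by the test file, runnable standalone:
assert reverse("hello") == "olleh"
assert reverse("world") == "dlrow"
assert reverse("") == ""
assert reverse("a") == "a"
print("all example cases pass")
```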
Adding a new programming language requires changes to both the evals repository and the main Roo Code repository.
Add the new language to the supported list in packages/evals/src/exercises/index.ts:
export const exerciseLanguages = [
  "go",
  "java",
  "javascript",
  "python",
  "rust",
  "your-new-language", // Add here
] as const
Create prompts/{language}.md in the evals repository:
Your job is to complete a coding exercise described in the markdown files inside the `docs` directory.
A file with the implementation stubbed out has been created for you, along with a test file (the tests should be failing initially).
To successfully complete the exercise, you must pass all the tests in the test file.
To confirm that your solution is correct, run the tests with `{test-command}`. Do not alter the test file; it should be run as-is.
Do not use the "ask_followup_question" tool. Your job isn't done until the tests pass. Don't attempt completion until you run the tests and they pass.
You should start by reading the files in the `docs` directory so that you understand the exercise, and then examine the stubbed out implementation and the test file.
Replace {test-command} with the appropriate testing command for your language.
Modify packages/evals/Dockerfile.runner to install the new language runtime:
# Install your new language runtime
RUN apt update && apt install -y your-language-runtime
# Or for languages that need special installation:
ARG YOUR_LANGUAGE_VERSION=1.0.0
RUN curl -sSL https://install-your-language.sh | sh -s -- --version ${YOUR_LANGUAGE_VERSION}
If your language requires special test execution, update packages/evals/src/cli/runUnitTest.ts to handle the new language's testing framework.
Create at least 2-3 exercises for the new language following the structure described in the previous section.