# Adding Additional Evals Exercises

This guide explains how to add new coding exercises to the Roo Code evals system. The evals system is a distributed evaluation platform that runs AI coding tasks in isolated VS Code environments to test AI coding capabilities across multiple programming languages.

## Table of Contents

1. [What is an "Eval"?](#what-is-an-eval)
2. [System Overview](#system-overview)
3. [Adding Exercises to Existing Languages](#adding-exercises-to-existing-languages)
4. [Adding Support for New Programming Languages](#adding-support-for-new-programming-languages)

## What is an "Eval"?

An **eval** (evaluation) is a coding exercise with a known solution, expressed as a set of unit tests that must pass to prove a solution correct. Each eval consists of:

- **Problem Description**: Clear instructions explaining what needs to be implemented
- **Implementation Stub**: A skeleton file with function signatures but no implementation
- **Unit Tests**: A comprehensive test suite that validates the correctness of the solution
- **Success Criteria**: The AI must implement the solution such that all unit tests pass

The key principle is that the tests define the contract: if all tests pass, the solution is considered correct. This provides an objective, automated way to measure AI coding performance across programming languages and problem domains.

**Example Flow**:

1. The AI receives a problem description (e.g., "implement a function that reverses a string")
2. The AI examines the stub implementation and the test file
3. The AI writes code to make all tests pass
4. The system runs the tests to verify correctness
5. Success is measured by the test pass/fail rate
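
The scoring in steps 4-5 boils down to a pass rate over unit test results. A minimal sketch of that bookkeeping (illustrative only; the real accounting lives in `packages/evals` and will differ):

```python
def pass_rate(results: dict[str, bool]) -> float:
    """Return the fraction of tests that passed (0.0 when there are no tests)."""
    if not results:
        return 0.0
    return sum(results.values()) / len(results)


def run_succeeded(results: dict[str, bool]) -> bool:
    """A run succeeds only if there is at least one test and every test passes."""
    return bool(results) and all(results.values())
```

For example, `pass_rate({"test_hello": True, "test_empty": False})` is `0.5`, and `run_succeeded` on the same results is `False`.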

## System Overview

The evals system consists of several key components:

- **Exercises Repository**: [`Roo-Code-Evals`](https://github.com/RooCodeInc/Roo-Code-Evals) - Contains all exercise definitions
- **Web Interface**: [`apps/web-evals`](../apps/web-evals) - Management interface for creating and monitoring evaluation runs
- **Evals Package**: [`packages/evals`](../packages/evals) - Contains both the controller logic for orchestrating evaluation runs and the runner container code for executing individual tasks
- **Docker Configuration**: Container definitions for the `controller` and `runner`, as well as a Docker Compose file that provisions the Postgres and Redis instances required for eval runs

### Current Language Support

The system currently supports these programming languages:

- **Go** - `go test` for testing
- **Java** - Maven/Gradle for testing
- **JavaScript** - Node.js with Jest/Mocha
- **Python** - pytest for testing
- **Rust** - `cargo test` for testing

## Adding Exercises to Existing Languages

TL;DR - Here's a pull request that adds a new JavaScript eval: https://github.com/RooCodeInc/Roo-Code-Evals/pull/3

### Step 1: Understand the Exercise Structure

Each exercise follows a standardized directory structure:

```
/evals/{language}/{exercise-name}/
├── docs/
│   ├── instructions.md            # Main exercise description
│   └── instructions.append.md     # Additional instructions (optional)
├── {exercise-name}.{ext}          # Implementation stub
├── {exercise-name}_test.{ext}     # Test file
└── {language-specific-files}      # go.mod, package.json, etc.
```
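
A quick way to sanity-check that a new exercise directory has the required pieces is a small script like the following. This is a convenience sketch, not part of the evals tooling, and the glob patterns simply assume the naming scheme shown above:

```python
from pathlib import Path


def missing_pieces(exercise_dir: str) -> list[str]:
    """Return the required pieces that are absent from an exercise directory."""
    root = Path(exercise_dir)
    missing = []
    if not (root / "docs" / "instructions.md").is_file():
        missing.append("docs/instructions.md")
    # Test files follow {exercise-name}_test.{ext} or {exercise-name}.test.{ext};
    # the stub itself is harder to detect generically, so it is not checked here.
    if not (list(root.glob("*_test.*")) + list(root.glob("*.test.*"))):
        missing.append("test file")
    return missing
```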

### Step 2: Create Exercise Directory

1. **Clone the evals repository**:

    ```bash
    git clone https://github.com/RooCodeInc/Roo-Code-Evals.git evals
    cd evals
    ```

2. **Create the exercise directory**:

    ```bash
    mkdir -p {language}/{exercise-name}
    cd {language}/{exercise-name}
    ```

### Step 3: Write Exercise Instructions

Create `docs/instructions.md` with a clear problem description:

```markdown
# Instructions

Create an implementation of [problem description].

## Problem Description

[Detailed explanation of what needs to be implemented]

## Examples

- Input: [example input]
- Output: [expected output]

## Constraints

- [Any constraints or requirements]
```

**Example from a simple reverse-string exercise**:

```markdown
# Instructions

Create a function that reverses a string.

## Problem Description

Write a function called `reverse` that takes a string as input and returns the string with its characters in reverse order.

## Examples

- Input: `reverse("hello")` → Output: `"olleh"`
- Input: `reverse("world")` → Output: `"dlrow"`
- Input: `reverse("")` → Output: `""`
- Input: `reverse("a")` → Output: `"a"`

## Constraints

- Input will always be a valid string
- Empty strings should return empty strings
```

### Step 4: Create Implementation Stub

Create the main implementation file with function signatures but no implementation:

**Python example** (`reverse_string.py`):

```python
def reverse(text):
    pass
```

**Go example** (`reverse_string.go`):

```go
package reversestring

// Reverse returns the input string with its characters in reverse order.
func Reverse(s string) string {
	// TODO: implement
	return ""
}
```
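
It also helps to keep a private reference solution on hand for verifying that your tests actually pass; for this example, in Python, it is a one-liner:

```python
def reverse(text: str) -> str:
    """Reference solution: return the characters of text in reverse order."""
    return text[::-1]
```

For instance, `reverse("hello")` returns `"olleh"`. Do not commit the reference solution to the exercise directory; the stub is what the AI starts from.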

### Step 5: Write Comprehensive Tests

Create test files that validate the implementation:

**Python example** (`reverse_string_test.py`):

```python
import unittest

from reverse_string import reverse


class ReverseStringTest(unittest.TestCase):
    def test_reverse_hello(self):
        self.assertEqual(reverse("hello"), "olleh")

    def test_reverse_world(self):
        self.assertEqual(reverse("world"), "dlrow")

    def test_reverse_empty_string(self):
        self.assertEqual(reverse(""), "")

    def test_reverse_single_character(self):
        self.assertEqual(reverse("a"), "a")
```

**Go example** (`reverse_string_test.go`):

```go
package reversestring

import "testing"

func TestReverse(t *testing.T) {
	tests := []struct {
		input    string
		expected string
	}{
		{"hello", "olleh"},
		{"world", "dlrow"},
		{"", ""},
		{"a", "a"},
	}

	for _, test := range tests {
		result := Reverse(test.input)
		if result != test.expected {
			t.Errorf("Reverse(%q) = %q, expected %q", test.input, result, test.expected)
		}
	}
}
```

### Step 6: Add Language-Specific Configuration

**For Go exercises**, create `go.mod`:

```go
module reverse-string

go 1.18
```

**For Python exercises**, ensure the parent directory has a `pyproject.toml`:

```toml
[project]
name = "python-exercises"
version = "0.1.0"
description = "Python exercises for Roo Code evals"
requires-python = ">=3.9"
dependencies = [
    "pytest>=8.3.5",
]
```

### Step 7: Test Locally

Before committing, test your exercise locally:

**Python**:

```bash
cd python/reverse-string
uv run python3 -m pytest -o markers=task reverse_string_test.py
```

**Go**:

```bash
cd go/reverse-string
go test
```

The tests should **fail** with the stub implementation and **pass** when properly implemented.

## Adding Support for New Programming Languages

Adding a new programming language requires changes to both the evals repository and the main Roo Code repository.

### Step 1: Update Language Configuration

1. **Add the language to the supported list** in [`packages/evals/src/exercises/index.ts`](../packages/evals/src/exercises/index.ts):

```typescript
export const exerciseLanguages = [
	"go",
	"java",
	"javascript",
	"python",
	"rust",
	"your-new-language", // Add here
] as const
```

### Step 2: Create Language-Specific Prompt

Create `prompts/{language}.md` in the evals repository:

```markdown
Your job is to complete a coding exercise described in the markdown files inside the `docs` directory.

A file with the implementation stubbed out has been created for you, along with a test file (the tests should be failing initially).

To successfully complete the exercise, you must pass all the tests in the test file.

To confirm that your solution is correct, run the tests with `{test-command}`. Do not alter the test file; it should be run as-is.

Do not use the "ask_followup_question" tool. Your job isn't done until the tests pass. Don't attempt completion until you run the tests and they pass.

You should start by reading the files in the `docs` directory so that you understand the exercise, and then examine the stubbed-out implementation and the test file.
```

Replace `{test-command}` with the appropriate testing command for your language.
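
If you maintain prompts for several languages, the placeholder substitution is trivial to script. This is purely illustrative; the evals pipeline does its own prompt handling:

```python
def render_prompt(template: str, test_command: str) -> str:
    """Fill the {test-command} placeholder in a prompt template."""
    return template.replace("{test-command}", test_command)
```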

### Step 3: Update Docker Configuration

Modify [`packages/evals/Dockerfile.runner`](../packages/evals/Dockerfile.runner) to install the new language runtime:

```dockerfile
# Install your new language runtime
RUN apt update && apt install -y your-language-runtime

# Or, for languages that need special installation:
ARG YOUR_LANGUAGE_VERSION=1.0.0
RUN curl -sSL https://install-your-language.sh | sh -s -- --version ${YOUR_LANGUAGE_VERSION}
```

### Step 4: Update Test Runner Integration

If your language requires special test execution, update [`packages/evals/src/cli/runUnitTest.ts`](../packages/evals/src/cli/runUnitTest.ts) to handle the new language's testing framework.

### Step 5: Create Initial Exercises

Create at least 2-3 exercises for the new language, following the structure described in the previous section.