Run batch evaluations
The Judge Batch Execution API allows you to evaluate multiple request-response pairs in parallel using a single judge. This is ideal for bulk evaluation scenarios like testing datasets and offline evals.
Typical Workflow
Step 1: Create a Batch Execution
Submit multiple inputs for evaluation. The API returns immediately with a batch execution ID.
curl -X POST "https://api.app.rootsignals.ai/v1/judges/{my_judge_id}/batch-execute/" \
-H "Authorization: Api-Key ${ROOTSIGNALS_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"inputs": [
{
"request": "What is the capital of France?",
"response": "Paris is the capital and largest city of France."
},
{
"request": "What is the capital of Spain?",
"response": "Madrid is the capital of Spain."
},
{
"request": "What is the capital of Italy?",
"response": "Rome is the capital city of Italy."
}
],
"tags": ["my-app-v1.2"]
}'
Request Parameters:
inputs (required): Array of evaluation inputs (min: 1, max: 100)
  request: The input/prompt/question
  response: The output/answer to evaluate
  contexts (optional): Array of context strings (if judge requires it)
  functions (optional): Array of function definitions (if judge requires it)
  expected_output (optional): Expected output for comparison (if judge requires it)
tags (optional): Array of strings to tag the execution logs with
judge_version_id (optional): Specific judge version UUID (defaults to latest)
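Because a single request accepts at most 100 inputs, a larger dataset has to be split across several batch executions. The following is a minimal Python sketch of that step, assuming only the endpoint, header, and field names shown above. The helper names `chunk_inputs` and `build_batch_request` are hypothetical, and the request is only constructed here, not sent.

```python
import json
import os
import urllib.request

API_BASE = "https://api.app.rootsignals.ai"
MAX_INPUTS_PER_BATCH = 100  # documented maximum for "inputs"


def chunk_inputs(inputs, size=MAX_INPUTS_PER_BATCH):
    """Split a list of input dicts into lists of at most `size` items."""
    if not inputs:
        raise ValueError("at least one input is required")
    return [inputs[i:i + size] for i in range(0, len(inputs), size)]


def build_batch_request(judge_id, inputs, tags=None, judge_version_id=None):
    """Build (but do not send) the POST request for one batch execution."""
    if not 1 <= len(inputs) <= MAX_INPUTS_PER_BATCH:
        raise ValueError("inputs must contain between 1 and 100 items")
    payload = {"inputs": inputs}
    # Optional fields are only included when the caller provides them.
    if tags:
        payload["tags"] = tags
    if judge_version_id:
        payload["judge_version_id"] = judge_version_id
    api_key = os.environ.get("ROOTSIGNALS_API_KEY", "")
    return urllib.request.Request(
        f"{API_BASE}/v1/judges/{judge_id}/batch-execute/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Each chunk can then be submitted with `urllib.request.urlopen(build_batch_request(judge_id, chunk))`, which yields one batch execution ID per chunk.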
Response (202 Accepted):
{
"batch_execution_id": "123e4567-e89b-12d3-a456-426614174000",
"status_url": "/v1/judges/batch-executions/123e4567-e89b-12d3-a456-426614174000/"
}
Step 2: Poll for Status
Check the progress of your batch execution. Poll this endpoint until status is completed or failed.
BATCH_ID="123e4567-e89b-12d3-a456-426614174000"
curl -X GET "https://api.app.rootsignals.ai/v1/judges/batch-executions/${BATCH_ID}/" \
-H "Authorization: Api-Key ${ROOTSIGNALS_API_KEY}"
Response:
{
"batch_execution_id": "123e4567-e89b-12d3-a456-426614174000",
"status": "processing",
"total_count": 3,
...
}
Batch Status Values:
pending: Batch is queued and waiting to start
processing: Batch is currently being executed
completed: All items completed (check individual items for failures)
failed: Entire batch failed
Item Status Values:
pending: Item waiting to be processed
processing: Item currently being evaluated
completed: Item evaluation finished
failed: Item evaluation failed
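The polling step can be sketched as a loop that stops on either terminal batch status. This is a minimal Python sketch, not an official client: the function names `fetch_status` and `wait_for_batch`, the 2-second interval, and the 10-minute timeout are all assumptions chosen for illustration, since the documentation does not recommend specific values.

```python
import json
import os
import time
import urllib.request
from urllib.parse import urljoin

API_BASE = "https://api.app.rootsignals.ai"
TERMINAL_STATUSES = {"completed", "failed"}  # batch statuses that end polling


def fetch_status(status_url):
    """GET the batch status; status_url is the path returned in Step 1."""
    req = urllib.request.Request(
        urljoin(API_BASE, status_url),
        headers={"Authorization": f"Api-Key {os.environ['ROOTSIGNALS_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_batch(status_url, fetch=fetch_status, interval=2.0,
                   timeout=600.0, sleep=time.sleep):
    """Poll until the batch reaches a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        batch = fetch(status_url)
        if batch["status"] in TERMINAL_STATUSES:
            return batch
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"batch still {batch['status']!r} after {timeout} seconds"
            )
        sleep(interval)  # assumed fixed interval; adjust to your batch size
```

Passing `fetch` and `sleep` as parameters keeps the loop testable without network access. A batch that returns `completed` may still contain failed items, so the caller should inspect item statuses afterwards.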
Step 3: Retrieve Results
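Once status is completed, each entry in the items array of the status response carries its evaluator results. A completed batch can still contain failed items, so check each item's status before reading its scores.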
{
...
"items": [
{
"index": 0,
"status": "completed",
"input": {
"request": "What is the capital of France?",
"response": "Paris is the capital and largest city of France.",
"contexts": null,
"functions": null,
"expected_output": null
},
"evaluator_results": [
{
"score": 0.95,
"justification": "The response is relevant to the request...",
"evaluator_name": "Relevance"
},
...
]
},
...
]
}
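Once the results are retrieved, a common next step is to aggregate scores per evaluator and list the items that failed. The sketch below assumes the response shape shown above; `summarize_batch` is a hypothetical helper, and it further assumes that scores are numeric and that a failed item may lack `evaluator_results`, since the shape of a failed item is not shown here.

```python
def summarize_batch(batch):
    """Return (average score per evaluator, indexes of failed items)."""
    scores = {}
    failed = []
    for item in batch.get("items", []):
        if item["status"] == "failed":
            failed.append(item["index"])
        elif item["status"] == "completed":
            # Assumption: failed items may omit evaluator_results entirely.
            for result in item.get("evaluator_results", []):
                scores.setdefault(result["evaluator_name"], []).append(
                    result["score"]
                )
    averages = {name: sum(vals) / len(vals) for name, vals in scores.items()}
    return averages, failed
```

The returned `failed` list holds the `index` values from the response, which map back to positions in the original `inputs` array, so failed inputs can be resubmitted in a new batch.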