Run batch evaluations
The Judge Batch Execution API allows you to evaluate multiple request-response pairs in parallel using a single judge. This is ideal for bulk evaluation scenarios like testing datasets and offline evals.
Typical Workflow
Step 1: Create a Batch Execution
Submit multiple inputs for evaluation. The API returns immediately with a batch execution ID.
curl -X POST "https://api.app.rootsignals.ai/v1/judges/{my_judge_id}/batch-execute/" \
  -H "Authorization: Api-Key ${ROOTSIGNALS_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": [
      {
        "request": "What is the capital of France?",
        "response": "Paris is the capital and largest city of France."
      },
      {
        "request": "What is the capital of Spain?",
        "response": "Madrid is the capital of Spain."
      },
      {
        "request": "What is the capital of Italy?",
        "response": "Rome is the capital city of Italy."
      }
    ],
    "tags": ["my-app-v1.2"]
  }'
Request Parameters:
inputs (required): Array of evaluation inputs (min: 1, max: 100)
  request: The input/prompt/question
  response: The output/answer to evaluate
  contexts (optional): Array of context strings (if the judge requires it)
  functions (optional): Array of function definitions (if the judge requires it)
  expected_output (optional): Expected output for comparison (if the judge requires it)
tags (optional): Array of strings to tag the execution logs with
judge_version_id (optional): Specific judge version UUID (defaults to latest)
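Because a single request accepts at most 100 inputs, larger datasets have to be split across several batch executions. A minimal Python sketch of building the request bodies, assuming the field names listed above; `build_batch_payloads` and `MAX_INPUTS_PER_BATCH` are hypothetical helper names for illustration, not part of the API:

```python
from typing import Any, Dict, List, Optional

MAX_INPUTS_PER_BATCH = 100  # documented maximum for "inputs"


def build_batch_payloads(
    inputs: List[Dict[str, Any]],
    tags: Optional[List[str]] = None,
    judge_version_id: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Split inputs into request bodies of at most 100 items each."""
    if not inputs:
        raise ValueError("inputs must contain at least one item")

    payloads = []
    for start in range(0, len(inputs), MAX_INPUTS_PER_BATCH):
        chunk = inputs[start:start + MAX_INPUTS_PER_BATCH]
        payload: Dict[str, Any] = {"inputs": chunk}
        # Optional top-level fields are only sent when provided.
        if tags:
            payload["tags"] = list(tags)
        if judge_version_id:
            payload["judge_version_id"] = judge_version_id
        payloads.append(payload)
    return payloads
```

Each returned dictionary can be serialized to JSON and sent as the body of one batch-execute request.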
Response (202 Accepted):
{
  "batch_execution_id": "123e4567-e89b-12d3-a456-426614174000",
  "status_url": "/v1/judges/batch-executions/123e4567-e89b-12d3-a456-426614174000/"
}
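The same submission can be made from Python. The sketch below uses only the standard library; the host, path, and header format are copied from the curl example above, while `build_batch_request`, `submit_batch`, and `resolve_status_url` are hypothetical helper names. It also assumes that `status_url` is relative to the API host, which the sample response suggests but does not state.

```python
import json
import urllib.request
from typing import Any, Dict
from urllib.parse import urljoin

BASE_URL = "https://api.app.rootsignals.ai"


def build_batch_request(
    judge_id: str, api_key: str, payload: Dict[str, Any]
) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/judges/{judge_id}/batch-execute/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_batch(judge_id: str, api_key: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request and return the parsed JSON response body."""
    request = build_batch_request(judge_id, api_key, payload)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def resolve_status_url(status_url: str) -> str:
    """Make status_url absolute (assumption: it is relative to BASE_URL)."""
    return urljoin(BASE_URL, status_url)
```

Separating request construction from sending keeps the first function easy to inspect or test without network access.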
Step 2: Poll for Status
Check the progress of your batch execution. Poll this endpoint until status is completed or failed.
BATCH_ID="123e4567-e89b-12d3-a456-426614174000"
curl -X GET "https://api.app.rootsignals.ai/v1/judges/batch-executions/${BATCH_ID}/" \
-H "Authorization: Api-Key ${ROOTSIGNALS_API_KEY}"
Response:
{
  "batch_execution_id": "123e4567-e89b-12d3-a456-426614174000",
  "status": "processing",
  "total_count": 3,
  ...
}
Batch Status Values:
pending: Batch is queued and waiting to start
processing: Batch is currently being executed
completed: All items completed (check individual items for failures)
failed: Entire batch failed
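Since completed and failed are the two terminal batch statuses, a polling loop can stop as soon as it sees either. A minimal Python sketch: `wait_for_batch` is a hypothetical helper, `fetch_status` stands for any function that performs the GET request and returns the parsed JSON body, and the interval and timeout defaults are arbitrary choices rather than documented recommendations.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_batch(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_seconds: float = 2.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status until the batch reaches a terminal status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        body = fetch_status()
        if body.get("status") in TERMINAL_STATUSES:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"batch still {body.get('status')!r} after {timeout_seconds}s"
            )
        # Batch is still pending or processing: wait before asking again.
        sleep(interval_seconds)
```

Passing `sleep` as a parameter lets the loop be exercised without real delays.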
Item Status Values:
pending: Item waiting to be processed
processing: Item currently being evaluated
completed: Item evaluation finished
failed: Item evaluation failed
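A batch can report completed while individual items have failed, so it is worth inspecting item statuses before using the results. A minimal Python sketch, assuming each item carries the `index` and `status` fields that appear in the sample response in Step 3; `summarize_items` is a hypothetical helper name:

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def summarize_items(items: List[Dict[str, Any]]) -> Tuple[Dict[str, int], List[int]]:
    """Return per-status counts and the indexes of failed items."""
    counts = Counter(item["status"] for item in items)
    failed_indexes = [item["index"] for item in items if item["status"] == "failed"]
    return dict(counts), failed_indexes
```

The failed indexes refer to positions in the submitted inputs array, which makes it straightforward to collect those inputs for a retry.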
Step 3: Retrieve Results
Once status is completed, all evaluator results are available in the response.
{
  ...
  "items": [
    {
      "index": 0,
      "status": "completed",
      "input": {
        "request": "What is the capital of France?",
        "response": "Paris is the capital and largest city of France.",
        "contexts": null,
        "functions": null,
        "expected_output": null
      },
      "evaluator_results": [
        {
          "score": 0.95,
          "justification": "The response is relevant to the request...",
          "evaluator_name": "Relevance"
        },
        ...
      ]
    },
    ...
  ]
}
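For offline evals it is often useful to aggregate these results per evaluator. A minimal Python sketch, assuming the item shape shown above (`status`, `evaluator_results`, `evaluator_name`, `score`); `average_scores` is a hypothetical helper, and averaging is only one possible aggregation:

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, List


def average_scores(items: List[Dict[str, Any]]) -> Dict[str, float]:
    """Average each evaluator's score across completed items."""
    scores = defaultdict(list)
    for item in items:
        if item.get("status") != "completed":
            continue  # skip failed or unfinished items
        for result in item.get("evaluator_results") or []:
            scores[result["evaluator_name"]].append(result["score"])
    return {name: mean(values) for name, values in scores.items()}
```

Items that did not complete are skipped so that missing results do not distort the averages.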