Add a calibration set

To ensure the reliability of an evaluator, you can create and use test data, referred to as a calibration set. A calibration set is a collection of LLM outputs, prompts, and expected scores that serves as a benchmark for the evaluator's performance.
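
If you prefer to prepare the rows outside the dataset editor, the sketch below shows one way to assemble such a file with Python's standard csv module. The two-column layout (expected score first, then the LLM output) mirrors the sample row used in the steps below; the file name and header-less layout are assumptions to adapt to your own data.

    import csv

    # Each row pairs an expected score with the LLM output to be judged;
    # the row below is the sample used in this guide.
    rows = [
        ("0,2", "I am pretty sure that is what we need to do"),
    ]

    with open("direct_language_calibration_set.csv", "w", newline="") as f:
        csv.writer(f, quoting=csv.QUOTE_ALL).writerows(rows)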


1. Attaching a Calibration Set

Start by attaching an empty calibration set to the evaluator:

  1. Navigate to the Direct Language evaluator page and click Edit.

  2. Select the Calibration section and click Add Dataset.

  3. Name the dataset (e.g., “Direct Language Calibration Set”).

  4. Optionally, add sample rows, such as:

    "0,2","I am pretty sure that is what we need to do"
  5. Click Save and close the dataset editor.

  6. Optionally, click the Calibrate button to run the calibration set (a programmatic sketch of this check follows the list).

  7. Save the evaluator.
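
If you want to sanity-check calibration from code rather than the UI, the sketch below approximates what the Calibrate button does: it runs the evaluator over each calibration row and compares the returned score with the expected one. The RootSignals client, get_by_name, and run calls follow the Python SDK's documented usage, but treat the exact method names, the CSV file name, and the decimal-comma handling as assumptions to verify against the SDK reference.

    import csv

    from root import RootSignals

    # Rough programmatic stand-in for the Calibrate button: run the evaluator
    # over each calibration row and compare its score with the expected score.
    client = RootSignals()  # API key is read from the environment (see the SDK docs)
    evaluator = client.evaluators.get_by_name(name="Direct Language")

    with open("direct_language_calibration_set.csv", newline="") as f:
        for expected, response in csv.reader(f):
            result = evaluator.run(response=response)
            expected_score = float(expected.replace(",", "."))  # sample row uses a decimal comma
            print(f"expected={expected_score:.2f} "
                  f"got={result.score:.2f} "
                  f"diff={abs(result.score - expected_score):.2f}")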


2. Adding Production Samples to the Calibration Set

You can enhance your calibration set using real-world data from evaluator runs stored in the execution log.

  1. Go to the Execution Logs page.

  2. Locate a relevant evaluator run and click on it.

  3. Click Add to Calibration Dataset to include its output and score in the calibration set.

By regularly updating and running the calibration set, you safeguard the evaluator against unexpected behavior, ensuring its continued accuracy and reliability.
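
At the file level, the Add to Calibration Dataset action amounts to appending the run's output and score as a new row to the calibration set. The sketch below is illustrative only: the score and output values are placeholders for data you would copy from an execution log entry.

    import csv

    # Placeholder values standing in for an entry copied from the execution log.
    logged_score = "0,9"   # keep the same score format as the rest of the set
    logged_output = "That is definitely the only way forward."

    with open("direct_language_calibration_set.csv", "a", newline="") as f:
        csv.writer(f, quoting=csv.QUOTE_ALL).writerow((logged_score, logged_output))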
