Add a calibration set

To ensure the reliability of an evaluator, you can create and use test data, referred to as a calibration set. A calibration set is a collection of LLM outputs, prompts, and expected scores that serves as a benchmark for the evaluator's performance.
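
If you prefer to prepare the rows outside the dataset editor, the sketch below shows one way to assemble such a file with Python's standard csv module. The two-column layout (expected score first, then the LLM output) mirrors the sample row used in the steps below; the file name and header-less layout are assumptions to adapt to your own data.

    import csv

    # Each row pairs an expected score with the LLM output to be judged;
    # the row below is the sample used in this guide.
    rows = [
        ("0,2", "I am pretty sure that is what we need to do"),
    ]

    with open("direct_language_calibration_set.csv", "w", newline="") as f:
        csv.writer(f, quoting=csv.QUOTE_ALL).writerows(rows)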


1. Attaching a Calibration Set

Start by attaching an empty calibration set to the evaluator:

  1. Navigate to the Direct Language evaluator page and click Edit.

  2. Select the Calibration section and click Add Dataset.

  3. Name the dataset (e.g., “Direct Language Calibration Set”).

  4. Optionally, add sample rows, such as:

    "0,2","I am pretty sure that is what we need to do"
  5. Click Save and close the dataset editor.

  6. Optionally, click the Calibrate button to run the calibration set (a programmatic sketch of this check follows the list).

  7. Save the evaluator.
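
If you want to sanity-check calibration from code rather than the UI, the sketch below approximates what the Calibrate button does: it runs the evaluator over each calibration row and compares the returned score with the expected one. The RootSignals client, get_by_name, and run calls follow the Python SDK's documented usage, but treat the exact method names, the CSV file name, and the decimal-comma handling as assumptions to verify against the SDK reference.

    import csv

    from root import RootSignals

    # Rough programmatic stand-in for the Calibrate button: run the evaluator
    # over each calibration row and compare its score with the expected score.
    client = RootSignals()  # API key is read from the environment (see the SDK docs)
    evaluator = client.evaluators.get_by_name(name="Direct Language")

    with open("direct_language_calibration_set.csv", newline="") as f:
        for expected, response in csv.reader(f):
            result = evaluator.run(response=response)
            expected_score = float(expected.replace(",", "."))  # sample row uses a decimal comma
            print(f"expected={expected_score:.2f} "
                  f"got={result.score:.2f} "
                  f"diff={abs(result.score - expected_score):.2f}")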


2. Adding Production Samples to the Calibration Set

You can enhance your calibration set using real-world data from evaluator runs stored in the execution log.

  1. Go to the Execution Logs page.

  2. Locate a relevant evaluator run and click on it.

  3. Click Add to Calibration Dataset to include its output and score in the calibration set.

By regularly updating and running the calibration set, you safeguard the evaluator against unexpected behavior, ensuring its continued accuracy and reliability.
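
At the file level, the Add to Calibration Dataset action amounts to appending the run's output and score as a new row to the calibration set. The sketch below is illustrative only: the score and output values are placeholders for data you would copy from an execution log entry.

    import csv

    # Placeholder values standing in for an entry copied from the execution log.
    logged_score = "0,9"   # keep the same score format as the rest of the set
    logged_output = "That is definitely the only way forward."

    with open("direct_language_calibration_set.csv", "a", newline="") as f:
        csv.writer(f, quoting=csv.QUOTE_ALL).writerow((logged_score, logged_output))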
