
Frequently Asked Questions

Terminology

What is Intent for?

Intent is the high-level, human-understandable description of the attribute an Evaluator measures. For example: “To measure how clearly the returns handler explains the 20% discount offer on the next purchase”.
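
For Custom Evaluators, Intent is typically supplied alongside the evaluator prompt when the evaluator is created. A minimal sketch with the Python SDK is shown below; the parameter names (intent, predicate, model) are assumptions, so check the Python SDK reference for the exact signature.

```python
from root import RootSignals

# Assumes a ROOTSIGNALS_API_KEY environment variable is set.
client = RootSignals()

# Illustrative only -- field names may differ in your SDK version.
evaluator = client.evaluators.create(
    name="Discount clarity",
    intent=(
        "To measure how clearly the returns handler explains "
        "the 20% discount offer on the next purchase"
    ),
    predicate="Does the following response clearly explain the discount terms? {{response}}",
    model="gpt-4o",
)
```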

What are Datasets?

Datasets let you bring your own test data for benchmarking evaluators (both Root and Custom) and for optimizing Custom evaluators.
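
As an illustration, uploading a dataset with the Python SDK might look like the sketch below; the datasets.create call and its parameters are assumptions, so consult the SDK reference for the exact names.

```python
from root import RootSignals

client = RootSignals()

# Hypothetical call: upload a CSV of test cases to benchmark or optimize evaluators.
dataset = client.datasets.create(
    name="returns-handler-test-set",
    path="returns_handler_cases.csv",
)
print(dataset.id)
```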

Behaviour

Does Intent change the behaviour of the evaluator?

No. An evaluator's Intent is descriptive only and does not alter its behaviour.

Does Calibration change the behaviour of the evaluator?

No. Calibration is for benchmarking (testing) evaluators to understand whether they are "calibrated" to your expected or desired behaviour. Calibration samples do not alter the behaviour of the evaluators.

How do Demonstrations work?

Demonstrations are used as in-context few-shot samples combined with our well-tuned meta-prompt. They are not utilized for supervised fine-tuning (SFT).
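
In practice, a Demonstration is an annotated sample: a response paired with the score and justification you expect. The sketch below shows the general shape, assuming a demonstrations argument on evaluator creation; treat the field and parameter names as illustrative rather than the exact SDK contract.

```python
from root import RootSignals

client = RootSignals()

# Hypothetical demonstration records -- annotated samples the evaluator should imitate.
demonstrations = [
    {
        "response": "You will get 20% off your next purchase, applied at checkout.",
        "score": 0.9,
        "justification": "States the discount amount and how it is applied.",
    },
    {
        "response": "There may be some kind of discount later.",
        "score": 0.2,
        "justification": "Vague; no amount or mechanism is mentioned.",
    },
]

evaluator = client.evaluators.create(
    name="Discount clarity",
    predicate="Does the following response clearly explain the discount terms? {{response}}",
    demonstrations=demonstrations,  # used as in-context few-shot examples, not for SFT
)
```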

Usage

Our stack is not in Python, can we still use Root Signals?

Absolutely. We have a REST API that you can use from your favourite tech stack.
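
For example, any language that can make HTTPS requests can call the evaluators directly. The snippet below uses Python's requests purely for brevity; the base URL, endpoint path, auth header, and payload shape are assumptions, so follow the REST API reference for the exact contract.

```python
import os

import requests

API_KEY = os.environ["ROOTSIGNALS_API_KEY"]
BASE_URL = "https://api.app.rootsignals.ai"  # assumed base URL -- see the REST API reference

# Hypothetical endpoint and payload for executing an evaluator by id.
resp = requests.post(
    f"{BASE_URL}/v1/evaluators/<evaluator-id>/execute/",
    headers={"Authorization": f"Api-Key {API_KEY}"},  # auth scheme assumed
    json={"response": "You will get 20% off your next purchase."},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```
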
Do I need to have Calibrations for all Custom Evaluators?

You do not have to bring Calibration samples but we strongly recommend at least a handful of them in order to understand the behaviour of the evaluators.
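
A calibration sample is simply a test case paired with the score you expect, and running the calibration set compares the evaluator's actual outputs against those expectations. The sketch below shows the general idea; the calibrate call and field names are hypothetical, so check the SDK or UI workflow for the exact mechanism.

```python
from root import RootSignals

client = RootSignals()

# Hypothetical calibration samples: expected scores for known responses.
calibration_samples = [
    {"response": "20% off your next purchase, applied automatically at checkout.", "expected_score": 0.9},
    {"response": "We might give you something at some point.", "expected_score": 0.1},
]

# Illustrative call -- benchmark the evaluator against the expected scores.
report = client.evaluators.calibrate(
    evaluator_id="<evaluator-id>",
    test_data=calibration_samples,
)
print(report)
```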

Can I change the behaviour of the evaluator by bringing labeled data?

You can change the behaviour of your Custom Evaluators by bringing annotated samples as Demonstrations. The behaviour of Root Evaluators cannot be altered.

Can I run a previous version of a Custom Evaluator?

Yes.

If we already have a ground truth expected output, can we use your evaluators?

Yes. Various evaluators from us support reference-based evaluations where you can bring your ground truth expected responses. See our evaluator catalogue.

How can I differentiate evaluations and related statistics for different applications (or versions) of mine?

You can use arbitrary tags for evaluation executions. See the example in our documentation.

Can I integrate Root Signals evaluators with experiment tracking tools such as MLflow?

Yes. Our evaluators return a structured response (e.g. a dictionary) with scores, justifications, tags, etc. These results can be logged to any experiment tracking system or database, just like any other metric, metadata, or attribute.
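
For instance, with MLflow you can log the score as a metric and the structured result as an artifact. The MLflow calls below are standard; the evaluator execution and its tags parameter are illustrative, so adapt them to your SDK version.

```python
import mlflow
from root import RootSignals

client = RootSignals()

# Illustrative evaluator execution -- the tags parameter is an assumption.
result = client.evaluators.run(
    "<evaluator-id>",
    response="You will get 20% off your next purchase.",
    tags=["checkout-bot", "v2.3"],
)

with mlflow.start_run(run_name="checkout-bot-v2.3-eval"):
    # Log the numeric score like any other metric.
    mlflow.log_metric("discount_clarity_score", result.score)
    # Keep the full structured result alongside the run for auditability.
    mlflow.log_dict(
        {"justification": result.justification, "tags": ["checkout-bot", "v2.3"]},
        "evaluation_result.json",
    )
```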

Models

What is the LLM that powers ready-made Root evaluators? Can I change it?

Root Evaluators are powered by various LLMs under the hood. This cannot be changed, except for on-premise deployments.
