Frequently Asked Questions
Terminology
What is Intent for?
Intent is the high-level, human-understandable description of the attribute an Evaluator measures. For example: “To measure how clearly the returns handler explains the 20% discount offer on the next purchase”.
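For illustration, a minimal sketch of where the Intent lives when creating a Custom Evaluator with the Python SDK. The method and parameter names (`evaluators.create`, `intent`, `predicate`) are assumptions for this sketch, not a definitive API reference.

```python
# Sketch: the Intent is descriptive metadata attached at evaluator creation.
# SDK method and parameter names here are illustrative assumptions.
from root import RootSignals

client = RootSignals()  # assumes the API key is available in the environment

evaluator = client.evaluators.create(
    name="Discount clarity",
    # The Intent: a human-readable statement of what the evaluator measures.
    intent=(
        "To measure how clearly the returns handler explains the 20% "
        "discount offer on the next purchase"
    ),
    # The evaluation instruction itself; this, not the Intent, drives scoring.
    predicate="Does the following response clearly explain the discount offer? {{response}}",
    model="gpt-4o",
)
```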
What are Datasets?
Datasets let you bring your own test data, both for benchmarking evaluators (Root and Custom) and for optimizing Custom evaluators.
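As a rough sketch, assuming the Python SDK exposes a `datasets.create` method (the method and parameter names below are assumptions), bringing a dataset might look like this:

```python
# Sketch: uploading test data to use when benchmarking or optimizing evaluators.
# Method and parameter names are illustrative assumptions, not the exact API.
from root import RootSignals

client = RootSignals()

dataset = client.datasets.create(
    name="returns-handler-test-set",
    path="returns_handler_samples.csv",  # local file with your test cases
)
print(dataset.id)  # reference this dataset when benchmarking an evaluator
```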
Behaviour
Does Intent change the behaviour of the evaluator?
No. An evaluator's Intent is descriptive only; it does not alter the evaluator's behaviour.
Does Calibration change the behaviour of the evaluator?
No. Calibration is for benchmarking (testing) evaluators, to understand whether they are "calibrated" to your expected behaviour. Calibration samples do not alter the evaluators' behaviour.
How do Demonstrations work?
Demonstrations are used as in-context few-shot samples, combined with our well-tuned meta-prompt. They are not used for supervised fine-tuning (SFT).
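A minimal sketch of supplying Demonstrations when creating a Custom Evaluator. The `demonstrations` parameter and the sample fields are assumptions for illustration; each demonstration pairs an example output with the score you would assign it.

```python
# Sketch: Demonstrations are annotated samples injected as in-context few-shot
# examples; they are not used for fine-tuning. Names below are illustrative.
from root import RootSignals

client = RootSignals()

evaluator = client.evaluators.create(
    name="Discount clarity",
    predicate="Does the following response clearly explain the discount offer? {{response}}",
    model="gpt-4o",
    demonstrations=[
        # An output you consider good, with the score you would give it.
        {"response": "You get 20% off your next purchase, applied automatically at checkout.", "score": 0.9},
        # An output you consider poor, with a correspondingly low score.
        {"response": "There might be some discount later.", "score": 0.2},
    ],
)
```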
Usage
Our stack is not in Python, can we still use Root Signals?
Absolutely. We provide a REST API that you can call from your favourite tech stack.
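The sketch below is shown in Python only to keep the examples in one language; the same HTTP request can be made from any stack. The endpoint path and payload fields are placeholders, not the exact routes; see the REST API reference for the real ones.

```python
# Illustrative shape of an evaluator execution over plain HTTP.
# The URL path and JSON fields are placeholders, not the documented API.
import os
import requests

resp = requests.post(
    "https://api.app.rootsignals.ai/v1/evaluators/<evaluator_id>/execute/",  # placeholder
    headers={"Authorization": f"Api-Key {os.environ['ROOTSIGNALS_API_KEY']}"},
    json={"response": "You get 20% off your next purchase."},
    timeout=30,
)
print(resp.json())  # structured result: score, justification, etc.
```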
Do I need to have Calibrations for all Custom Evaluators?
You do not have to bring Calibration samples, but we strongly recommend at least a handful of them so that you can verify how your evaluators behave.
Can I change the behaviour of the evaluator by bringing labeled data?
You can change the behaviour of your Custom Evaluators by bringing annotated samples as Demonstrations. The behaviour of Root Evaluators cannot be altered.
If we already have a ground truth expected output, can we use your evaluators?
Yes. Several of our evaluators support reference-based evaluation, where you provide your ground-truth expected responses. See our evaluator catalogue here.
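A minimal sketch of a reference-based run, assuming the run call accepts an expected-output field alongside the response (the method and parameter names are assumptions):

```python
# Sketch: reference-based evaluation with a ground-truth expected answer.
# Method and parameter names are illustrative assumptions.
from root import RootSignals

client = RootSignals()

result = client.evaluators.run(
    evaluator_id="<your-evaluator-id>",
    response="The return window is 30 days from delivery.",
    expected_output="Returns are accepted within 30 days of delivery.",
)
print(result.score)
```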
How can I differentiate evaluations and related statistics for different applications (or versions) of mine?
You can attach arbitrary tags to evaluation executions, which lets you separate results per application or version. See the example here.
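For illustration, a sketch of tagging an execution so results can later be separated per application or version (the `tags` parameter name is an assumption):

```python
# Sketch: attach free-form tags to an evaluation execution.
# The `tags` parameter name is an illustrative assumption.
from root import RootSignals

client = RootSignals()

result = client.evaluators.run(
    evaluator_id="<your-evaluator-id>",
    response="You get 20% off your next purchase.",
    tags=["returns-handler", "v2.3"],  # e.g. application name and version
)
```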
Can I integrate Root Signals evaluators to experiment tracking tools such as MLflow etc.?
Yes. Our evaluators return a structured response (e.g. a dictionary) with scores, justifications, tags, etc. These results can be logged to any experiment tracking system or database, just like any other metric, metadata, or attribute.
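For example, a sketch of logging a result to MLflow. The MLflow calls are standard tracking APIs; the result field names (`score`, `justification`) are assumptions about the structured response.

```python
# Sketch: push an evaluator result into MLflow like any other metric.
# Result field names are assumptions; mlflow calls are standard tracking APIs.
import mlflow
from root import RootSignals

client = RootSignals()

result = client.evaluators.run(
    evaluator_id="<your-evaluator-id>",
    response="You get 20% off your next purchase.",
)

with mlflow.start_run(run_name="returns-handler-v2.3"):
    mlflow.log_metric("discount_clarity_score", result.score)
    mlflow.log_text(result.justification, "discount_clarity_justification.txt")
```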
Models