RAG evaluation
Root Signals provides evaluators for Retrieval-Augmented Generation (RAG) use cases, where you can pass the retrieved context as part of the evaluated content.
Hallucination Detection
One of the most useful evaluators in RAG settings is Faithfulness, which detects claims that cannot be deduced from the context, i.e., hallucinations in a RAG setup.
Here is an example of running a hallucination check using the Python SDK:
from root import RootSignals
client = RootSignals()
request = "Is the number of pensioners working more than 100k in 2023?"
response = "Yes, 150000 pensioners were working in 2024."
# Chunks retrieved from a RAG pipeline
retrieved_document_1 = """
While the work undertaken by seniors is often irregular and part-time, more than 150,000 pensioners were employed in 2023, the centre's statistics reveal. The centre noted that pensioners have increasingly continued to work for some time now.
"""
retrieved_document_2 = """
According to the pension centre's latest data, a total of around 1.3 million people in Finland were receiving old-age pensions, with average monthly payments of 1,948 euros.
"""
# Measures whether the answer is faithful to the given contexts (knowledge base / documents)
faithfulness_result = client.evaluators.Faithfulness(
    request=request,
    response=response,
    contexts=[retrieved_document_1, retrieved_document_2],
)
print(faithfulness_result.score) # 0.0 as the response does not match the retrieved documents
print(faithfulness_result.justification)
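In practice, the score can be used as a guardrail for a RAG application, for example by only returning the generated answer when faithfulness exceeds a chosen cutoff. Here is a minimal sketch, reusing the result from above and assuming an arbitrary threshold of 0.7 and an illustrative fallback message:
# Gate the RAG answer on the faithfulness score (0.7 is an assumed, application-specific threshold)
if faithfulness_result.score >= 0.7:
    final_answer = response
else:
    # Fall back when the evaluator flags a likely hallucination
    final_answer = "I could not find a reliable answer in the retrieved documents."
print(final_answer)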
Another such evaluator is the Truthfulness evaluator, which measures the factual consistency of the generated answer against the given context as well as general knowledge.
Here is an example of running the Truthfulness evaluator:
result = client.evaluators.Truthfulness(
    request="What was the revenue in Q1/2023?",
    response="The revenue in the last quarter was 5.2 M USD",
    contexts=[
        "Financial statement of 2023",
        "2023 revenue and expenses...",
    ],
)
print(result.score)
# 0.5
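As with the Faithfulness example above, the returned result should also expose the reasoning behind the score, so it can be inspected the same way:
print(result.justification)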
For other RAG evaluators, refer to our Evaluator Portfolio page.