Roadmap

Root Signals builds with a philosophy of transparency, reflected in our multiple open source projects. This roadmap is a living document describing what we're working on and what's next. Root Signals is the world's most principled and powerful system for measuring the behavior of LLM-based applications, agents, and workflows.

Scorable is our automated LLM Evaluation Engineer agent that co-manages this platform with you.

Vision

Our vision is to create and auto-optimize the strongest automated knowledge-process evaluation stack possible, with the least effort and information required from the user.

  • Maximum Automated Information Extraction

    • From user intent and/or provided example/instruction data, extract as much relevant information as possible.

  • Awareness of Information Quality

    • Engage the user with the smallest number of maximally impactful questions.

  • Maximally Powerful Evaluation Stack Generation

    • Build the most comprehensive and accurate evaluation capabilities possible, within the confines of the available data.

  • Built for Agents

    • Maximum compatibility with autonomous agents and workflows.

  • Maximum Integration Surface

    • Seamless integration with all key AI frameworks.

  • EvalOps Principles for the Long Term

    • Follow Root EvalOps Principles for evaluator lifecycle management.

All feedback is highly appreciated and often leads to immediate action. Submit new GitHub issues or vote on existing ones so we can act quickly on what is important to you.

🚀 Recently Released

  • Automated Policy Adherence Judges

    • Create judges from uploaded policy documents and intents (see the SDK sketch after this list)

  • GDPR Awareness of Models

    • Filter out models that do not comply with GDPR

  • Evaluator Calibration Data Synthesizer v1.0

    • In the evaluator drill-in view, expand your calibration dataset from one or more examples

  • Evaluator version history extended to include all native Root Evaluators

  • Evaluator standard deviation ranges in reference datasets

  • Root Judge, our 70B judge LLM, is available for download and free to run in Root Signals!
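
Many of the features above are used through the Root Signals platform and its Python SDK. As a rough illustration, here is a minimal sketch of scoring an LLM response with an evaluator; the client and method names (`RootSignals`, `evaluators.run`) and the response fields are assumptions about the SDK's shape, so check the SDK documentation for the current interface.

```python
# Minimal sketch: scoring an LLM response with a Root Signals evaluator.
# NOTE: method and field names below are assumptions; consult the Python
# SDK docs (pip install root-signals) for the exact interface.
from root import RootSignals

client = RootSignals()  # assumed to read the API key from the environment

result = client.evaluators.run(  # assumed method name
    evaluator_name="Policy Adherence",  # e.g. a judge built from a policy document
    request="What is your refund policy?",
    response="We offer refunds within 30 days of purchase.",
)
print(result.score)  # assumed: a normalized score in [0, 1]
```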

🏗️ Next Up

  • Public Evaluation Reports

    • Generate HTML reports from any judge execution

  • TypeScript SDK

  • Revamp of Example-driven Evaluation

    • Smoothly create the full judge from examples

  • Native Speech Evaluator API

    • Upload or stream audio directly to evaluators

  • Unified Experiments framework to replace Skill Tests

  • Command Line Interface

  • Advanced Judge visibility controls

    • RBAC coverage for Judges (as already available for Evaluators, Skills, and Datasets)

  • Output Refinement At-Origin

    • Refine your LLM outputs automatically based on scores (see the sketch below)
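
To make the idea concrete, here is a conceptual sketch of score-gated refinement: generate an output, score it, and regenerate with the judge's feedback until the score clears a threshold. The `generate` and `evaluate` callables are stand-ins for your LLM call and a Root Signals judge run, and the threshold, retry count, and feedback format are illustrative assumptions, not the shipped design.

```python
# Conceptual sketch of at-origin output refinement (not the shipped design):
# regenerate until the evaluator score clears a threshold.
from typing import Callable, NamedTuple

class EvalResult(NamedTuple):
    score: float          # assumed: normalized score in [0, 1]
    justification: str    # the judge's explanation of the score

def refine(
    prompt: str,
    generate: Callable[[str], str],         # stand-in for your LLM call
    evaluate: Callable[[str], EvalResult],  # stand-in for a judge run
    threshold: float = 0.8,
    max_attempts: int = 3,
) -> str:
    feedback = ""
    for _ in range(max_attempts):
        output = generate(prompt + feedback)
        result = evaluate(output)
        if result.score >= threshold:
            return output
        # Fold the rejected draft and the judge's justification into the
        # next attempt; a real implementation would manage this context.
        feedback = (
            f"\n\nYour previous draft was:\n{output}\n"
            f"Revise it to address: {result.justification}"
        )
    return output  # best effort after max_attempts
```

The design intent behind "at-origin" is that refinement happens where the output is produced, before it reaches the user, driven by the same scores you already collect.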

🗓️ Planned

Scorable Features

  • Agentic Classifier Generation 2.0

    • Create classifiers with the same robustness as metric evaluator stacks

  • Automatic Context Engineering

    • Refine your prompt templates automatically based on scores

  • Support for all RAG evaluators

Core Platform Features

  • Improved Playground View

Root Evaluators

  • Agent Evaluation Pack 2.0

  • (Root Evaluator list expanding every 1-2 weeks, stay tuned)

Integrations

  • Full OpenTelemetry Support (see the tracing sketch after this list)

  • LiteLLM Direct Support

  • OpenRouter Support

  • (more coming)
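
Since OpenTelemetry is a stable public standard, the shape of the telemetry involved is already clear even though the integration itself is still planned. Below is a minimal sketch of tracing an LLM call with the standard OpenTelemetry Python SDK; the `gen_ai.*` attribute names follow the draft GenAI semantic conventions rather than any Root Signals-specific schema, and `call_model` is a hypothetical stand-in for your model client.

```python
# Tracing an LLM call with the standard OpenTelemetry Python API.
# pip install opentelemetry-api opentelemetry-sdk
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("my-llm-app")

def call_model(prompt: str) -> str:
    return "stub response"  # hypothetical stand-in for your LLM client

def traced_completion(prompt: str) -> str:
    with tracer.start_as_current_span("llm.completion") as span:
        # Attribute names follow the draft GenAI semantic conventions
        span.set_attribute("gen_ai.request.model", "gpt-4o")
        response = call_model(prompt)
        span.set_attribute("gen_ai.usage.output_tokens", len(response.split()))
        return response

print(traced_completion("What is EvalOps?"))
```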

Developer Tools

  • Sync Judge & Evaluator Definitions to GitHub

Community & Deployment

  • Community Evals

  • Self-Hostable Evaluation Executor

MCP

  • Remote MCP Server (see the connection sketch after this list)

  • MCP Feature Extension Pack

    • Full judge feature access

    • Full log insights access
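
The official MCP Python SDK already supports remote servers over SSE, so connecting to a hosted Root Signals MCP server would plausibly look like the sketch below; the endpoint URL is a placeholder, and the feature itself is still planned.

```python
# Connecting to a remote MCP server with the official MCP Python SDK
# (pip install mcp). The URL below is a placeholder, not a real endpoint.
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    async with sse_client("https://mcp.example.com/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name, "-", tool.description)

asyncio.run(main())
```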

Model Support

  • Reasoner-specific model parameters (incl. budget) in evaluators

  • (the model support list is continuously expanding; stay tuned)

More planned features are coming as we sync our changelogs and the rest of our internal roadmap!

Feature Requests and Bug Reports:

🐛 Bug Reports: GitHub Issues

📧 Enterprise Features: Contact [email protected]

💡 General: Discord


Last updated: 2025-06-30
