Usage Flows

Root Signals enables several key workflows that transform how organizations measure, optimize, and control their AI applications. These flows represent common patterns for leveraging the platform's capabilities to achieve concrete outcomes.

Flow 1: Explicit Decomposition Structure as a First-Class Citizen

In this flow, we transform a description of the workflow or measurement problem into a judge, consisting of a concrete set of evaluators that precisely measure success. The process involves:

  1. Success Criteria Definition: Start from your business problem or use case description and identify which dimensions of success matter in your specific context

  2. Evaluator Selection: Map success criteria to specific evaluators from the Root Signals portfolio where suitable ones exist

  3. Evaluator Construction: Create custom evaluators for measurement targets not covered by the portfolio

  4. Judge Assembly: Combine selected evaluators into a coherent measurement strategy

Example: For a customer service chatbot, the problem "a chatbot for which we must ensure helpful and accurate responses" might decompose into the following evaluators (see the code sketch after the list):

  • Relevance evaluator (responses address the customer's question)

  • Completeness evaluator (all aspects of queries are addressed)

  • Politeness evaluator (maintaining professional tone)

  • Policy adherence evaluator (following company guidelines)
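
One way to picture the resulting judge is as a simple container of evaluators. The sketch below is an illustrative Python model, not the Root Signals SDK; the lambda scorers are placeholders for real evaluator calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Evaluator:
    """One measurable dimension of success (illustrative stand-in, not the SDK object)."""
    name: str
    score: Callable[[str, str], float]  # (request, response) -> score in [0, 1]


@dataclass
class Judge:
    """A judge bundles evaluators into a single measurement strategy."""
    evaluators: list[Evaluator]

    def run(self, request: str, response: str) -> dict[str, float]:
        return {e.name: e.score(request, response) for e in self.evaluators}


# The chatbot example decomposed into four evaluators; the lambdas are
# placeholders for real evaluator calls.
chatbot_judge = Judge(evaluators=[
    Evaluator("relevance", lambda q, a: 0.0),
    Evaluator("completeness", lambda q, a: 0.0),
    Evaluator("politeness", lambda q, a: 0.0),
    Evaluator("policy_adherence", lambda q, a: 0.0),
])

scores = chatbot_judge.run("Where is my order?", "Order #123 shipped yesterday.")
print(scores)
```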

Flow 2: Optimization Flow

Evaluator-Driven Improvement of Operational Prompts and Models

Given a set of evaluators, this flow systematically improves your AI application's performance:

  1. Baseline Measurement: Evaluate current prompts and models against the evaluators

  2. Variation Testing: Test different prompts, models, and configurations

  3. Optimal Performance Selection: Choose the configuration that best balances evaluator scores against cost and latency (see the selection sketch after the considerations below)

Key considerations:

  • Balance accuracy improvements against cost increases

  • Consider latency requirements for real-time applications
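
A minimal sketch of the selection step, assuming each candidate configuration has already been scored by the evaluators on a shared test set; the utility weights and the numbers are illustrative assumptions, not a prescribed formula.

```python
from dataclasses import dataclass


@dataclass
class CandidateResult:
    name: str         # e.g. "revised prompt + smaller model"
    quality: float    # mean evaluator score over a test set, in [0, 1]
    cost_usd: float   # cost per 1k requests
    latency_s: float  # p95 latency in seconds


def utility(r: CandidateResult, cost_weight: float = 0.01, latency_weight: float = 0.05) -> float:
    """Illustrative trade-off: reward quality, penalize cost and latency."""
    return r.quality - cost_weight * r.cost_usd - latency_weight * r.latency_s


# The scores and prices below are made-up illustrations of the selection step.
candidates = [
    CandidateResult("baseline prompt + large model", quality=0.86, cost_usd=12.0, latency_s=2.1),
    CandidateResult("revised prompt + small model", quality=0.83, cost_usd=2.5, latency_s=0.9),
]

best = max(candidates, key=utility)
print(f"Selected configuration: {best.name}")
```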

Calibration-Data-Driven Improvement of Evaluator Predicates and Models

Given a calibration dataset, this flow systematically improves the performance of individual evaluators:

  1. Baseline Measurement: Evaluate the current predicate and model against the calibration dataset

  2. Variation Testing: Test different predicates, models, and configurations

  3. Optimal Performance Selection: Choose the configuration that best balances calibration scores against cost and latency (see the calibration sketch after the considerations below)

Key considerations:

  • Balance accuracy improvements against cost increases.

  • Consider latency requirements for real-time applications; note that some workflows (e.g., email, offline agent operations) are not latency-sensitive.
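
One concrete way to run the calibration comparison is to score each variant on the labeled examples and rank variants by their disagreement with the human reference labels. The mean-absolute-error metric and the toy variant below are illustrative assumptions, not a fixed methodology.

```python
# Calibration examples: (evaluator input, human reference score in [0, 1]).
# The texts and labels are illustrative.
calibration_set = [
    ("The refund was processed on Monday; you should see it within 3 days.", 0.9),
    ("I don't know, figure it out yourself.", 0.1),
]


def mean_absolute_error(predicted: list[float], reference: list[float]) -> float:
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def calibration_error(score_fn, dataset) -> float:
    """Score every calibration example with one predicate+model variant and
    compare against the human reference labels."""
    predicted = [score_fn(text) for text, _ in dataset]
    reference = [label for _, label in dataset]
    return mean_absolute_error(predicted, reference)


def toy_politeness_variant(text: str) -> float:
    """Stand-in for one predicate+model configuration."""
    return 0.1 if "figure it out" in text.lower() else 0.9


errors = {"toy_politeness_variant": calibration_error(toy_politeness_variant, calibration_set)}
best_variant = min(errors, key=errors.get)
print(best_variant, errors[best_variant])
```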

Flow 3: Offline Data Measurement and Scoring

Transform Existing Data into Actionable Insights

This flow applies evaluators to existing datasets or LLM input-output telemetry, enabling data quality assessment and filtering (see the batch-scoring sketch after the steps):

  1. Data Ingestion: Load transcripts, chat logs, or other text data

  2. Evaluator Application: Score each data point across multiple evaluation dimensions

  3. Metadata Enrichment: Attach scores as searchable metadata

  4. Filtering and Analysis: Identify high/low quality samples, policy violations, or improvement opportunities
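
A toy batch-scoring pass over existing transcripts is sketched below. The heuristic scorer, field names, and the 0.5 threshold are illustrative placeholders for real evaluator calls and real thresholds.

```python
def score_record(question: str, answer: str) -> dict[str, float]:
    """Stand-in for running a judge over one data point (toy heuristics only)."""
    overlap = len(set(question.lower().split()) & set(answer.lower().split()))
    return {
        "relevance": min(1.0, overlap / 3),
        "completeness": min(1.0, len(answer.split()) / 20),
    }


transcripts = [
    {"id": "call-001", "question": "Can I change my flight?", "answer": "Yes, you can change it online for a fee."},
    {"id": "call-002", "question": "Please cancel my account.", "answer": "No."},
]

for record in transcripts:
    # Metadata enrichment: attach scores to each record.
    record["scores"] = score_record(record["question"], record["answer"])

# Filtering and analysis: flag low-quality samples for review.
flagged = [r["id"] for r in transcripts if min(r["scores"].values()) < 0.5]
print(f"Flagged for review: {flagged}")
```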

Applications:

  • Call center transcript analysis (clarity, policy alignment, customer satisfaction indicators)

  • Training data curation (identifying high-quality examples)

  • Compliance monitoring (detecting policy violations)

  • Quality assurance sampling (focusing review on problematic cases)

Flow 4: Automated Self-Improvement and Rectification with Evaluation Feedback

This flow creates a feedback loop that automatically improves content based on evaluation results (see the sketch after the steps):

  1. Initial Evaluation: Score the original content with relevant evaluators

  2. Feedback Generation: Extract scores and justifications from evaluators

  3. Improvement Execution:

    • For LLM-generated content: Re-prompt the original model with evaluation feedback

    • For existing content: Pass to any LLM with improvement instructions based on evaluator feedback

  4. Verification: Re-evaluate to confirm improvements
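
The loop itself is small. In the hedged sketch below, `generate` and `evaluate` are hypothetical helpers standing in for your LLM call and your evaluators, and the 0.8 threshold and 3-round cap are arbitrary illustrative choices.

```python
def improvement_loop(prompt: str, generate, evaluate, threshold: float = 0.8, max_rounds: int = 3) -> str:
    """Evaluate, feed justifications back, regenerate, and re-verify.

    `generate(prompt, feedback)` and `evaluate(text)` are hypothetical helpers;
    `evaluate` is assumed to return something like
    {"score": 0.62, "justification": "..."}.
    """
    feedback = None
    text = generate(prompt, feedback)           # initial generation
    for _ in range(max_rounds):
        result = evaluate(text)                 # 1. evaluation
        if result["score"] >= threshold:        # 4. verification passed
            break
        feedback = result["justification"]      # 2. feedback generation
        text = generate(prompt, feedback)       # 3. improvement execution
    return text
```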

Use cases:

  • Iterative response refinement in production

  • Batch improvement of historical data

  • Automated content enhancement pipelines

  • Self-improving AI systems

Flow 5: Guardrail Flow

Real-Time Protection Through Evaluation-Based Blocking

This flow implements safety and quality controls by preventing substandard LLM outputs from reaching users (see the sketch after the steps):

  1. Threshold Definition: Set minimum acceptable scores for critical evaluators

  2. Real-Time Evaluation: Score LLM outputs before delivery

  3. Conditional Blocking: Prevent responses that fall below thresholds from being served

  4. Fallback Handling: Trigger alternative responses or escalation procedures for blocked content
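
A minimal sketch of threshold definition, conditional blocking, and fallback handling; the evaluator names, threshold values, and fallback message are illustrative, not recommended settings.

```python
SAFE_FALLBACK = "I'm sorry, I can't help with that request. Let me connect you with a human agent."

# Minimum acceptable scores per critical evaluator (illustrative values).
THRESHOLDS = {"harmlessness": 0.95, "policy_adherence": 0.90, "relevance": 0.60}


def guard(response: str, scores: dict[str, float]) -> str:
    """Block a response that falls below any threshold and serve a fallback.

    `scores` is assumed to come from a real-time evaluation step."""
    failed = {name: s for name, s in scores.items()
              if name in THRESHOLDS and s < THRESHOLDS[name]}
    if failed:
        print(f"blocked response, failing evaluators: {failed}")  # logging/alerting hook
        return SAFE_FALLBACK
    return response


print(guard("Sure, the customer's card number is ...",
            {"harmlessness": 0.2, "policy_adherence": 0.1, "relevance": 0.9}))
```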

Implementation strategies:

  • Critical evaluators: Harmlessness, confidentiality, policy adherence

  • Quality thresholds: Minimum coherence, relevance, or completeness scores

  • Graceful degradation: Provide safe default responses when blocking occurs

  • Logging and alerting: Track blocked responses for system improvement

Applications:

  • Customer-facing chatbots requiring brand safety

  • Healthcare AI with strict accuracy requirements

  • Financial services with regulatory compliance needs

  • Educational tools requiring age-appropriate content

Flow 6: Lean Observation Flow

Zero-Impact Monitoring of LLM Traffic

This flow enables comprehensive observability without affecting application performance:

With Root Proxy (Simpler Implementation)

  1. Proxy Configuration: Route LLM traffic through Root Signals proxy

  2. Automatic Capture: All requests and responses logged transparently

  3. Asynchronous Processing of Evaluations: Evaluations occur out-of-band

  4. Dashboard Visibility: Real-time metrics

Benefits:

  • No code changes required in the application; only a base_url update (see the sketch after this list)

  • Automatic request/response pairing

  • Built-in retry and error handling

  • Centralized configuration management
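
To make the base_url-only point concrete, here is how an application that already uses the OpenAI Python SDK might be pointed at a proxy; the proxy hostname below is a placeholder, not Root Signals' actual endpoint.

```python
from openai import OpenAI

# Only the base URL (and any proxy credential) changes; the rest of the
# application code stays as it was. The URL below is a placeholder.
client = OpenAI(
    base_url="https://<your-llm-proxy-host>/v1",
    api_key="YOUR_PROVIDER_OR_PROXY_KEY",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Where is my order?"}],
)
print(response.choices[0].message.content)
```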

Without Proxy (Direct Integration)

  1. Asynchronous Logging: Send request/response pairs to Root Signals API

  2. Non-Blocking Implementation: Use a fire-and-forget pattern or background queues (see the sketch after the steps)

  3. Batching Strategy: Aggregate logs for efficient transmission

  4. Resilient Design: Handle logging failures without affecting main flow
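
A minimal sketch of the non-blocking pattern using a background thread and an in-process queue; `send_batch`, the batch size of 50, and the queue bound are illustrative assumptions about the transport and tuning, not Root Signals specifics.

```python
import queue
import threading

log_queue: "queue.Queue[dict]" = queue.Queue(maxsize=10_000)


def send_batch(batch: list[dict]) -> None:
    """Placeholder transport; replace with a real HTTP POST to the logging API."""
    pass


def _drain() -> None:
    """Background worker: batch queued items and ship them off the request path."""
    batch: list[dict] = []
    while True:
        batch.append(log_queue.get())
        if len(batch) >= 50 or log_queue.empty():
            try:
                send_batch(batch)  # resilient design: failures never propagate
            except Exception:
                pass               # drop or retry later; never block the main flow
            batch = []


threading.Thread(target=_drain, daemon=True).start()


def log_interaction(request: str, response: str) -> None:
    """Fire-and-forget: enqueue and return immediately; never block the caller."""
    try:
        log_queue.put_nowait({"request": request, "response": response})
    except queue.Full:
        pass  # backpressure choice: drop rather than add latency


log_interaction("Where is my order?", "Order #123 shipped yesterday.")
```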

Benefits:

  • Full control over what gets logged

  • No network topology changes

  • Custom metadata enrichment

  • Selective logging based on business logic

Key considerations for both approaches:

  • Zero latency addition: Logging happens asynchronously

  • High-volume support: Handles production-scale traffic

  • Cost optimization: Sample high-volume, low-risk traffic
