Usage Flows
Root Signals enables several key workflows that transform how organizations measure, optimize, and control their AI applications. These flows represent common patterns for leveraging the platform's capabilities to achieve concrete outcomes.
Flow 1: Explicit Decomposition Structure as a First-Class Citizen
In this flow, we transform a description of the workflow or measurement problem into a judge, consisting of a concrete set of evaluators that precisely measure success. The process involves:
Success Criteria Definition: Start with your business problem or use-case description and identify which dimensions of success matter in your specific context
Evaluator Selection: Map success criteria to specific evaluators from the Root Signals portfolio or create custom ones
Evaluator Construction: Create custom evaluators for key measurement targets
Judge Assembly: Combine selected evaluators into a coherent measurement strategy
Example: For a customer service chatbot, the problem statement "a chatbot for which we must ensure helpful and accurate responses" might decompose into the following evaluators (see the sketch after this list):
Relevance evaluator (responses address the customer's question)
Completeness evaluator (all aspects of queries are addressed)
Politeness evaluator (maintaining professional tone)
Policy adherence evaluator (following company guidelines)
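A minimal sketch of such a decomposition, using a plain Python stand-in for a judge rather than the actual Root Signals SDK; the evaluator names and weights are illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Judge:
    """Bundles evaluators into one measurement strategy (illustrative stand-in)."""
    name: str
    evaluators: dict[str, float] = field(default_factory=dict)  # evaluator name -> weight

    def overall(self, scores: dict[str, float]) -> float:
        """Weighted aggregate of per-evaluator scores in [0, 1]."""
        total_weight = sum(self.evaluators.values())
        return sum(scores[name] * w for name, w in self.evaluators.items()) / total_weight

# Decomposition of "helpful and accurate chatbot responses" into concrete evaluators.
support_judge = Judge(
    name="customer-support-judge",
    evaluators={
        "relevance": 0.3,         # responses address the customer's question
        "completeness": 0.3,      # all aspects of the query are covered
        "politeness": 0.2,        # professional tone maintained
        "policy_adherence": 0.2,  # company guidelines followed
    },
)

print(support_judge.overall({"relevance": 0.9, "completeness": 0.8,
                             "politeness": 1.0, "policy_adherence": 0.7}))
```

Keeping the decomposition explicit in this way makes each success dimension independently measurable and tunable.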
Flow 2: Optimization Flow
Evaluator-Driven Improvement of Operational Prompts and Models
Given a set of evaluators, this flow systematically improves your AI application's performance:
Baseline Measurement: Evaluate current prompts and models against the evaluators
Variation Testing: Test different prompts, models, and configurations
Optimal Performance Selection: Choose the configuration that maximizes evaluator scores while balancing cost and latency
Key considerations:
Balance accuracy improvements against cost increases
Consider latency requirements for real-time applications
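The selection step can be made concrete with a small comparison loop. This is a hedged sketch: `generate` and `judge_score` are placeholder stand-ins for your LLM call and a Root Signals judge call, and the budgets and model names are illustrative assumptions:

```python
import random
import time

def generate(model: str, prompt: str) -> tuple[str, float]:
    """Stand-in for your LLM call; returns (response, cost). Replace with a real client."""
    return f"[{model}] answer to: {prompt}", 0.002

def judge_score(response: str) -> float:
    """Stand-in for a Root Signals judge call; returns a score in [0, 1]."""
    return random.random()

candidates = [
    {"model": "model-a", "prompt": "Answer concisely: {question}"},
    {"model": "model-b", "prompt": "Answer step by step: {question}"},
]
test_set = [{"question": "How do I reset my password?"}]

results = []
for cfg in candidates:
    scores, latencies, cost = [], [], 0.0
    for item in test_set:
        start = time.perf_counter()
        response, item_cost = generate(cfg["model"], cfg["prompt"].format(**item))
        latencies.append(time.perf_counter() - start)
        scores.append(judge_score(response))
        cost += item_cost
    results.append({**cfg,
                    "score": sum(scores) / len(scores),
                    "latency": sum(latencies) / len(latencies),
                    "cost": cost})

# Keep only configurations inside latency/cost budgets, then maximize evaluator score.
feasible = [r for r in results if r["latency"] <= 2.0 and r["cost"] <= 0.05]
best = max(feasible, key=lambda r: r["score"]) if feasible else None
print(best)
```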
Calibration Data-Driven Improvement of Predicates and Models for Evaluators
Given a calibration dataset, this flow systematically improves the performance of individual evaluators:
Baseline Measurement: Evaluate the current predicate and model against the calibration dataset
Variation Testing: Test different predicates, models, and configurations
Optimal Performance Selection: Choose the configuration that maximizes calibration scores while balancing cost and latency
Key considerations:
Balance accuracy improvements against cost increases.
Consider latency requirements for real-time applications; note that some workflows (email, offline agent operations) are not latency-sensitive
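A sketch of the calibration loop, assuming a calibration set of human-labeled scores; `evaluator_score` is a dummy stand-in for running one evaluator variant (predicate plus judge model), and mean absolute error is just one possible agreement metric:

```python
# Calibration set: (llm_output, expected_score) pairs labeled by humans (placeholder data).
calibration_set = [
    ("Thanks for reaching out! You can reset your password from Settings.", 0.9),
    ("I don't know.", 0.2),
]

# Evaluator variants: different predicate wordings and judge models (names are placeholders).
variants = [
    {"predicate": "Is the response helpful to the customer?",
     "model": "judge-model-small"},
    {"predicate": "Does the response fully and politely resolve the customer's request?",
     "model": "judge-model-large"},
]

def evaluator_score(variant: dict, output: str) -> float:
    """Stand-in for running one evaluator variant; replace with a real evaluator call.
    The dummy heuristic exists only so the comparison below produces different numbers."""
    cutoff = 40 if variant["model"] == "judge-model-small" else 60
    return min(1.0, len(output) / cutoff)

def mean_absolute_error(variant: dict) -> float:
    """Lower is better: distance between the evaluator and the human calibration labels."""
    errors = [abs(evaluator_score(variant, output) - expected)
              for output, expected in calibration_set]
    return sum(errors) / len(errors)

best = min(variants, key=mean_absolute_error)
print("Best calibrated variant:", best)
```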
Flow 3: Offline Data Measurement and Scoring
Transform Existing Data into Actionable Insights
This flow applies evaluators to existing datasets or LLM input-output telemetry, enabling data quality assessment and filtering:
Data Ingestion: Load transcripts, chat logs, or other text data
Evaluator Application: Score each data point across multiple evaluation dimensions
Metadata Enrichment: Attach scores as searchable metadata
Filtering and Analysis: Identify high/low quality samples, policy violations, or improvement opportunities
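A minimal sketch of the enrichment-and-filtering pattern; `score` is a placeholder for a real evaluator call, and the records, evaluator names, and threshold are illustrative assumptions:

```python
records = [
    {"id": "t-001", "transcript": "Agent: Happy to help you close your account today..."},
    {"id": "t-002", "transcript": "Agent: That's not my problem, call back later."},
]

EVALUATORS = ["clarity", "policy_alignment", "customer_satisfaction"]

def score(evaluator: str, text: str) -> float:
    """Stand-in for an evaluator call; replace with a Root Signals evaluator."""
    return 0.9 if "help" in text.lower() else 0.3

# Metadata enrichment: attach per-evaluator scores to each record.
for rec in records:
    rec["scores"] = {ev: score(ev, rec["transcript"]) for ev in EVALUATORS}

# Filtering: surface low-quality or policy-risk samples for human review.
flagged = [r["id"] for r in records if min(r["scores"].values()) < 0.5]
print("Needs review:", flagged)
```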
Applications:
Call center transcript analysis (clarity, policy alignment, customer satisfaction indicators)
Training data curation (identifying high-quality examples)
Compliance monitoring (detecting policy violations)
Quality assurance sampling (focusing review on problematic cases)
Flow 4: Automated Self-Improvement and Rectification with Evaluation Feedback
This flow creates a feedback loop that automatically improves content based on evaluation results:
Initial Evaluation: Score the original content with relevant evaluators
Feedback Generation: Extract scores and justifications from evaluators
Improvement Execution:
For LLM-generated content: Re-prompt the original model with evaluation feedback
For existing content: Pass to any LLM with improvement instructions based on evaluator feedback
Verification: Re-evaluate to confirm improvements
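A sketch of the feedback loop, with `evaluate` and `llm` as placeholder stand-ins for an evaluator call and your generation model; the threshold and round limit are illustrative assumptions:

```python
def evaluate(text: str) -> tuple[float, str]:
    """Stand-in for an evaluator; returns (score, justification)."""
    if "step" in text.lower():
        return 0.9, "Response includes concrete steps."
    return 0.4, "Response lacks concrete steps."

def llm(prompt: str) -> str:
    """Stand-in for your LLM call; replace with a real client."""
    return "1. Open Settings  2. Choose 'Reset password'  3. Follow the emailed steps."

def improve(content: str, threshold: float = 0.8, max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        score, justification = evaluate(content)
        if score >= threshold:              # verification: good enough, stop iterating
            return content
        content = llm(                      # re-prompt with evaluator feedback as instructions
            f"Improve the following answer.\n"
            f"Evaluator feedback (score {score:.2f}): {justification}\n\n{content}"
        )
    return content

print(improve("You can reset your password somehow."))
```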
Use cases:
Iterative response refinement in production
Batch improvement of historical data
Automated content enhancement pipelines
Self-improving AI systems
Flow 5: Guard Rail Flow: Real-Time Protection Through Evaluation-Based Blocking
This flow implements safety and quality controls by preventing substandard LLM outputs from reaching users:
Threshold Definition: Set minimum acceptable scores for critical evaluators
Real-Time Evaluation: Score LLM outputs before delivery
Conditional Blocking: Prevent responses that fall below thresholds from being served
Fallback Handling: Trigger alternative responses or escalation procedures for blocked content
Implementation strategies:
Critical evaluators: Harmlessness, confidentiality, policy adherence
Quality thresholds: Minimum coherence, relevance, or completeness scores
Graceful degradation: Provide safe default responses when blocking occurs
Logging and alerting: Track blocked responses for system improvement
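A sketch of evaluation-based blocking with graceful degradation; the evaluator names, thresholds, fallback message, and `score_all` stand-in are illustrative assumptions rather than the actual API:

```python
import logging

THRESHOLDS = {"harmlessness": 0.95, "policy_adherence": 0.9, "relevance": 0.7}
FALLBACK = "I'm sorry, I can't help with that. Let me connect you with a human agent."

def score_all(response: str) -> dict[str, float]:
    """Stand-in for scoring the response with each critical evaluator before delivery."""
    return {"harmlessness": 0.99, "policy_adherence": 0.85, "relevance": 0.9}

def guard(response: str) -> str:
    scores = score_all(response)
    failing = {name: s for name, s in scores.items() if s < THRESHOLDS[name]}
    if failing:
        # Log blocked responses for later system improvement, then degrade gracefully.
        logging.warning("Blocked response, failing evaluators: %s", failing)
        return FALLBACK
    return response

print(guard("Sure, here is how to cancel your subscription..."))
```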
Applications:
Customer-facing chatbots requiring brand safety
Healthcare AI with strict accuracy requirements
Financial services with regulatory compliance needs
Educational tools requiring age-appropriate content
Flow 6: Lean Observation Flow
Zero-Impact Monitoring of LLM Traffic
This flow enables comprehensive observability without affecting application performance:
With Root Proxy (Simpler Implementation)
Proxy Configuration: Route LLM traffic through Root Signals proxy
Automatic Capture: All requests and responses logged transparently
Asynchronous Processing of Evaluations: Evaluations occur out-of-band
Dashboard Visibility: Real-time metrics
Benefits:
No code changes required in the application; only the base_url needs updating (see the sketch below)
Automatic request/response pairing
Built-in retry and error handling
Centralized configuration management
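Assuming the application already uses the standard OpenAI Python client, the change can be as small as pointing base_url at the proxy; the proxy host and key below are placeholders:

```python
from openai import OpenAI

# Only change: point base_url at the Root Signals proxy (host below is a placeholder).
client = OpenAI(
    base_url="https://<root-signals-proxy-host>/v1",
    api_key="YOUR_PROVIDER_OR_PROXY_KEY",
)

# Application code stays the same; requests and responses are captured by the proxy
# and evaluated asynchronously out-of-band.
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "How do I reset my password?"}],
)
print(completion.choices[0].message.content)
```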
Without Proxy (Direct Integration)
Asynchronous Logging: Send request/response pairs to Root Signals API
Non-Blocking Implementation: Use a fire-and-forget pattern or background queues (see the sketch below)
Batching Strategy: Aggregate logs for efficient transmission
Resilient Design: Handle logging failures without affecting main flow
Benefits:
Full control over what gets logged
No network topology changes
Custom metadata enrichment
Selective logging based on business logic
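A sketch of a non-blocking, batched logger built on a background queue; the endpoint URL, batch size, and payload shape are illustrative assumptions, not the actual Root Signals API:

```python
import atexit
import queue
import threading

import requests

LOG_ENDPOINT = "https://example.invalid/logs"  # placeholder for your logging endpoint
BATCH_SIZE = 20
log_queue: queue.Queue = queue.Queue()

def _drain() -> None:
    """Background worker: batch queued request/response pairs and ship them."""
    batch = []
    while True:
        item = log_queue.get()
        if item is not None:
            batch.append(item)
        if len(batch) >= BATCH_SIZE or (item is None and batch):
            try:
                requests.post(LOG_ENDPOINT, json={"records": batch}, timeout=5)
            except requests.RequestException:
                pass  # never let logging failures affect the main flow
            batch = []
        if item is None:
            break

worker = threading.Thread(target=_drain, daemon=True)
worker.start()
atexit.register(lambda: log_queue.put(None))  # best-effort flush on shutdown

def log_llm_call(request: dict, response: dict, metadata: dict) -> None:
    """Fire-and-forget: enqueue and return immediately, adding no latency to the caller."""
    log_queue.put({"request": request, "response": response, "metadata": metadata})
```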
Key considerations for both approaches:
Zero latency addition: Logging happens asynchronously
High-volume support: Handles production-scale traffic
Cost optimization: Sample high-volume, low-risk traffic
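Sampling can be as simple as per-class log rates; the traffic classes and rates below are illustrative assumptions:

```python
import random

# Sampling rates by risk tier: always log high-risk traffic, thin out bulk low-risk traffic.
SAMPLE_RATES = {"high_risk": 1.0, "default": 0.25, "bulk_low_risk": 0.02}

def should_log(traffic_class: str) -> bool:
    """Probabilistically decide whether to log this call, by traffic class."""
    return random.random() < SAMPLE_RATES.get(traffic_class, SAMPLE_RATES["default"])

if should_log("bulk_low_risk"):
    print("enqueue this request/response pair, e.g. via the logger sketched above")
```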