Usage Flows
Root Signals enables several key workflows that transform how organizations measure, optimize, and control their AI applications. These flows represent common patterns for leveraging the platform's capabilities to achieve concrete outcomes.
Flow 1: Explicit Decomposition Structure as a First-Class Citizen
In this flow, we transform a description of the workflow or measurement problem into a judge, consisting of a concrete set of evaluators that precisely measure success. The process involves:
Success Criteria Definition: Start with your business problem or use-case description and identify which dimensions of success matter in your specific context
Evaluator Selection: Map success criteria to specific evaluators from the Root Signals portfolio or create custom ones
Evaluator Construction: Create custom evaluators for key measurement targets
Judge Assembly: Combine selected evaluators into a coherent measurement strategy
Example: For a customer service chatbot, the problem statement "a chatbot for which we must ensure helpful and accurate responses" might decompose into the following evaluators (see the sketch after this list):
Relevance evaluator (responses address the customer's question)
Completeness evaluator (all aspects of queries are addressed)
Politeness evaluator (maintaining professional tone)
Policy adherence evaluator (following company guidelines)
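A minimal sketch of such a decomposition, using a plain Python stand-in for a judge rather than the actual Root Signals SDK; the evaluator names and weights are illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Judge:
    """Bundles evaluators into one measurement strategy (illustrative stand-in)."""
    name: str
    evaluators: dict[str, float] = field(default_factory=dict)  # evaluator name -> weight

    def overall(self, scores: dict[str, float]) -> float:
        """Weighted aggregate of per-evaluator scores in [0, 1]."""
        total_weight = sum(self.evaluators.values())
        return sum(scores[name] * w for name, w in self.evaluators.items()) / total_weight

# Decomposition of "helpful and accurate chatbot responses" into concrete evaluators.
support_judge = Judge(
    name="customer-support-judge",
    evaluators={
        "relevance": 0.3,         # responses address the customer's question
        "completeness": 0.3,      # all aspects of the query are covered
        "politeness": 0.2,        # professional tone maintained
        "policy_adherence": 0.2,  # company guidelines followed
    },
)

print(support_judge.overall({"relevance": 0.9, "completeness": 0.8,
                             "politeness": 1.0, "policy_adherence": 0.7}))
```

Keeping the decomposition explicit in this way makes each success dimension independently measurable and tunable.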
Flow 2: Optimization Flow
Evaluator-Driven Improvement of Operational Prompts and Models
Given a set of evaluators, this flow systematically improves your AI application's performance:
Baseline Measurement: Evaluate current prompts and models against the evaluators
Variation Testing: Test different prompts, models, and configurations
Optimal Performance Selection: Choose the configuration that maximizes evaluator scores while balancing cost and latency
Key considerations:
Balance accuracy improvements against cost increases
Consider latency requirements for real-time applications
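The selection step can be made concrete with a small comparison loop. This is a hedged sketch: `generate` and `judge_score` are placeholder stand-ins for your LLM call and a Root Signals judge call, and the budgets and model names are illustrative assumptions:

```python
import random
import time

def generate(model: str, prompt: str) -> tuple[str, float]:
    """Stand-in for your LLM call; returns (response, cost). Replace with a real client."""
    return f"[{model}] answer to: {prompt}", 0.002

def judge_score(response: str) -> float:
    """Stand-in for a Root Signals judge call; returns a score in [0, 1]."""
    return random.random()

candidates = [
    {"model": "model-a", "prompt": "Answer concisely: {question}"},
    {"model": "model-b", "prompt": "Answer step by step: {question}"},
]
test_set = [{"question": "How do I reset my password?"}]

results = []
for cfg in candidates:
    scores, latencies, cost = [], [], 0.0
    for item in test_set:
        start = time.perf_counter()
        response, item_cost = generate(cfg["model"], cfg["prompt"].format(**item))
        latencies.append(time.perf_counter() - start)
        scores.append(judge_score(response))
        cost += item_cost
    results.append({**cfg,
                    "score": sum(scores) / len(scores),
                    "latency": sum(latencies) / len(latencies),
                    "cost": cost})

# Keep only configurations inside latency/cost budgets, then maximize evaluator score.
feasible = [r for r in results if r["latency"] <= 2.0 and r["cost"] <= 0.05]
best = max(feasible, key=lambda r: r["score"]) if feasible else None
print(best)
```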
Calibration Data-Driven Improvement of Predicates and Models for Evaluators
Given a calibration dataset, this flow systematically improves the performance of individual evaluators:
Baseline Measurement: Evaluate the current predicate and model against the calibration dataset
Variation Testing: Test different predicates, models, and configurations
Optimal Performance Selection: Choose the configuration that maximizes calibration scores while balancing cost and latency
Key considerations:
Balance accuracy improvements against cost increases.
Consider latency requirements for real-time applications; note that some workflows (email, offline agent operations) are not latency-sensitive
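A sketch of the calibration loop, assuming a calibration set of human-labeled scores; `evaluator_score` is a dummy stand-in for running one evaluator variant (predicate plus judge model), and mean absolute error is just one possible agreement metric:

```python
# Calibration set: (llm_output, expected_score) pairs labeled by humans (placeholder data).
calibration_set = [
    ("Thanks for reaching out! You can reset your password from Settings.", 0.9),
    ("I don't know.", 0.2),
]

# Evaluator variants: different predicate wordings and judge models (names are placeholders).
variants = [
    {"predicate": "Is the response helpful to the customer?",
     "model": "judge-model-small"},
    {"predicate": "Does the response fully and politely resolve the customer's request?",
     "model": "judge-model-large"},
]

def evaluator_score(variant: dict, output: str) -> float:
    """Stand-in for running one evaluator variant; replace with a real evaluator call.
    The dummy heuristic exists only so the comparison below produces different numbers."""
    cutoff = 40 if variant["model"] == "judge-model-small" else 60
    return min(1.0, len(output) / cutoff)

def mean_absolute_error(variant: dict) -> float:
    """Lower is better: distance between the evaluator and the human calibration labels."""
    errors = [abs(evaluator_score(variant, output) - expected)
              for output, expected in calibration_set]
    return sum(errors) / len(errors)

best = min(variants, key=mean_absolute_error)
print("Best calibrated variant:", best)
```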
Flow 3: Offline Data Measurement and Scoring
Transform Existing Data into Actionable Insights
This flow applies evaluators to existing datasets or LLM input-output telemetry, enabling data quality assessment and filtering:
Data Ingestion: Load transcripts, chat logs, or other text data
Evaluator Application: Score each data point across multiple evaluation dimensions
Metadata Enrichment: Attach scores as searchable metadata
Filtering and Analysis: Identify high/low quality samples, policy violations, or improvement opportunities
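A minimal sketch of the enrichment-and-filtering pattern; `score` is a placeholder for a real evaluator call, and the records, evaluator names, and threshold are illustrative assumptions:

```python
records = [
    {"id": "t-001", "transcript": "Agent: Happy to help you close your account today..."},
    {"id": "t-002", "transcript": "Agent: That's not my problem, call back later."},
]

EVALUATORS = ["clarity", "policy_alignment", "customer_satisfaction"]

def score(evaluator: str, text: str) -> float:
    """Stand-in for an evaluator call; replace with a Root Signals evaluator."""
    return 0.9 if "help" in text.lower() else 0.3

# Metadata enrichment: attach per-evaluator scores to each record.
for rec in records:
    rec["scores"] = {ev: score(ev, rec["transcript"]) for ev in EVALUATORS}

# Filtering: surface low-quality or policy-risk samples for human review.
flagged = [r["id"] for r in records if min(r["scores"].values()) < 0.5]
print("Needs review:", flagged)
```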
Applications:
Call center transcript analysis (clarity, policy alignment, customer satisfaction indicators)
Training data curation (identifying high-quality examples)
Compliance monitoring (detecting policy violations)
Quality assurance sampling (focusing review on problematic cases)
Flow 4: Automated Self-Improvement and Rectification with Evaluation Feedback
This flow creates a feedback loop that automatically improves content based on evaluation results:
Initial Evaluation: Score the original content with relevant evaluators
Feedback Generation: Extract scores and justifications from evaluators
Improvement Execution:
For LLM-generated content: Re-prompt the original model with evaluation feedback
For existing content: Pass to any LLM with improvement instructions based on evaluator feedback
Verification: Re-evaluate to confirm improvements
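A sketch of the feedback loop, with `evaluate` and `llm` as placeholder stand-ins for an evaluator call and your generation model; the threshold and round limit are illustrative assumptions:

```python
def evaluate(text: str) -> tuple[float, str]:
    """Stand-in for an evaluator; returns (score, justification)."""
    if "step" in text.lower():
        return 0.9, "Response includes concrete steps."
    return 0.4, "Response lacks concrete steps."

def llm(prompt: str) -> str:
    """Stand-in for your LLM call; replace with a real client."""
    return "1. Open Settings  2. Choose 'Reset password'  3. Follow the emailed steps."

def improve(content: str, threshold: float = 0.8, max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        score, justification = evaluate(content)
        if score >= threshold:              # verification: good enough, stop iterating
            return content
        content = llm(                      # re-prompt with evaluator feedback as instructions
            f"Improve the following answer.\n"
            f"Evaluator feedback (score {score:.2f}): {justification}\n\n{content}"
        )
    return content

print(improve("You can reset your password somehow."))
```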
Use cases:
Iterative response refinement in production
Batch improvement of historical data
Automated content enhancement pipelines
Self-improving AI systems
Flow 5: Guard Rail Flow: Real-Time Protection Through Evaluation-Based Blocking
This flow implements safety and quality controls by preventing substandard LLM outputs from reaching users:
Threshold Definition: Set minimum acceptable scores for critical evaluators
Real-Time Evaluation: Score LLM outputs before delivery
Conditional Blocking: Prevent responses that fall below thresholds from being served
Fallback Handling: Trigger alternative responses or escalation procedures for blocked content
Implementation strategies:
Critical evaluators: Harmlessness, confidentiality, policy adherence
Quality thresholds: Minimum coherence, relevance, or completeness scores
Graceful degradation: Provide safe default responses when blocking occurs
Logging and alerting: Track blocked responses for system improvement
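A sketch of evaluation-based blocking with graceful degradation; the evaluator names, thresholds, fallback message, and `score_all` stand-in are illustrative assumptions rather than the actual API:

```python
import logging

THRESHOLDS = {"harmlessness": 0.95, "policy_adherence": 0.9, "relevance": 0.7}
FALLBACK = "I'm sorry, I can't help with that. Let me connect you with a human agent."

def score_all(response: str) -> dict[str, float]:
    """Stand-in for scoring the response with each critical evaluator before delivery."""
    return {"harmlessness": 0.99, "policy_adherence": 0.85, "relevance": 0.9}

def guard(response: str) -> str:
    scores = score_all(response)
    failing = {name: s for name, s in scores.items() if s < THRESHOLDS[name]}
    if failing:
        # Log blocked responses for later system improvement, then degrade gracefully.
        logging.warning("Blocked response, failing evaluators: %s", failing)
        return FALLBACK
    return response

print(guard("Sure, here is how to cancel your subscription..."))
```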
Applications:
Customer-facing chatbots requiring brand safety
Healthcare AI with strict accuracy requirements
Financial services with regulatory compliance needs
Educational tools requiring age-appropriate content
Flow 6: Lean Observation Flow
Zero-Impact Monitoring of LLM Traffic
This flow enables comprehensive observability without affecting application performance:
With Root Proxy (Simpler Implementation)
Proxy Configuration: Route LLM traffic through Root Signals proxy
Automatic Capture: All requests and responses logged transparently
Asynchronous Processing of Evaluations: Evaluations occur out-of-band
Dashboard Visibility: Real-time metrics
Benefits:
No code changes required in the application; only the base_url needs updating (see the sketch below)
Automatic request/response pairing
Built-in retry and error handling
Centralized configuration management
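Assuming the application already uses the standard OpenAI Python client, the change can be as small as pointing base_url at the proxy; the proxy host and key below are placeholders:

```python
from openai import OpenAI

# Only change: point base_url at the Root Signals proxy (host below is a placeholder).
client = OpenAI(
    base_url="https://<root-signals-proxy-host>/v1",
    api_key="YOUR_PROVIDER_OR_PROXY_KEY",
)

# Application code stays the same; requests and responses are captured by the proxy
# and evaluated asynchronously out-of-band.
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "How do I reset my password?"}],
)
print(completion.choices[0].message.content)
```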
Without Proxy (Direct Integration)
Asynchronous Logging: Send request/response pairs to Root Signals API
Non-Blocking Implementation: Use a fire-and-forget pattern or background queues (see the sketch below)
Batching Strategy: Aggregate logs for efficient transmission
Resilient Design: Handle logging failures without affecting main flow
Benefits:
Full control over what gets logged
No network topology changes
Custom metadata enrichment
Selective logging based on business logic
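A sketch of a non-blocking, batched logger built on a background queue; the endpoint URL, batch size, and payload shape are illustrative assumptions, not the actual Root Signals API:

```python
import atexit
import queue
import threading

import requests

LOG_ENDPOINT = "https://example.invalid/logs"  # placeholder for your logging endpoint
BATCH_SIZE = 20
log_queue: queue.Queue = queue.Queue()

def _drain() -> None:
    """Background worker: batch queued request/response pairs and ship them."""
    batch = []
    while True:
        item = log_queue.get()
        if item is not None:
            batch.append(item)
        if len(batch) >= BATCH_SIZE or (item is None and batch):
            try:
                requests.post(LOG_ENDPOINT, json={"records": batch}, timeout=5)
            except requests.RequestException:
                pass  # never let logging failures affect the main flow
            batch = []
        if item is None:
            break

worker = threading.Thread(target=_drain, daemon=True)
worker.start()
atexit.register(lambda: log_queue.put(None))  # best-effort flush on shutdown

def log_llm_call(request: dict, response: dict, metadata: dict) -> None:
    """Fire-and-forget: enqueue and return immediately, adding no latency to the caller."""
    log_queue.put({"request": request, "response": response, "metadata": metadata})
```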
Key considerations for both approaches:
Zero latency addition: Logging happens asynchronously
High-volume support: Handles production-scale traffic
Cost optimization: Sample high-volume, low-risk traffic
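Sampling can be as simple as per-class log rates; the traffic classes and rates below are illustrative assumptions:

```python
import random

# Sampling rates by risk tier: always log high-risk traffic, thin out bulk low-risk traffic.
SAMPLE_RATES = {"high_risk": 1.0, "default": 0.25, "bulk_low_risk": 0.02}

def should_log(traffic_class: str) -> bool:
    """Probabilistically decide whether to log this call, by traffic class."""
    return random.random() < SAMPLE_RATES.get(traffic_class, SAMPLE_RATES["default"])

if should_log("bulk_low_risk"):
    print("enqueue this request/response pair, e.g. via the logger sketched above")
```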