Background and Purpose
This SOP guides users on configuring and optimizing AI response times and queuing settings for AI assistants. The purpose is to balance response speed, contextual accuracy, and a human-like feel, aligning with user preferences and expectations.
Performance Targets
Response Time Goals
- Simple responses: 2-3 seconds
- Complex responses with tools: 6-9 seconds
- Human-like interactions: 15+ seconds with synthetic delays
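These targets can be checked mechanically once response times are logged. A minimal sketch, assuming a Python integration; `meets_target` is a hypothetical helper, and its thresholds come straight from the goals listed above:

```python
def meets_target(elapsed_s: float, complexity: str) -> bool:
    """Check a measured response time against the response time goals."""
    # Thresholds (seconds) taken from the goals above: 2-3 s simple,
    # 6-9 s complex with tools. Unknown types default to the complex cap.
    targets = {"simple": 3.0, "complex": 9.0}
    return elapsed_s <= targets.get(complexity, 9.0)
```

A check like this can run inside a monitoring job and flag responses that miss their tier.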
Key Benefits
- ✅ Improved user experience through optimized response timing
- ✅ Balanced performance between speed and accuracy
- ✅ Customizable interaction feel from instant to human-like
- ✅ Resource optimization through efficient configuration
- ✅ Scalable performance across different use cases
Step-by-Step Optimization Process
1. Set Up the AI Assistant
Initial Configuration
- Log into your AI configuration portal (buildassistants.app)
- Navigate to the workspace and create a new assistant
- Name appropriately (e.g., “Response Time Optimizer”)
- Add an Active Tag to enable logging and monitoring of this assistant
Purpose of Active Tags
- Enable logging for performance monitoring
- Track response metrics across conversations
- Monitor optimization impact over time
2. Adjust Queuing Times
Access Queuing Settings
- Access the assistant’s autopilot or settings panel
- Locate the “Wait Time” setting
- Configure based on use case:
Queuing Time Guidelines
Zero Seconds (0s):
- Use Case: Instant responses
- Ideal For: Widgets, fast interactions, technical queries
- Trade-off: May feel less human but maximizes efficiency
Five to Ten Seconds (5-10s):
- Use Case: Balanced response timing
- Ideal For: Customer service, general inquiries
- Trade-off: Good balance of speed and natural feel
Fifteen-Plus Seconds (15s+):
- Use Case: Human-like delays
- Ideal For: Conversational realism, relationship building
- Trade-off: More natural but slower overall interaction
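The tiers above can be captured as a small configuration map. An illustrative Python sketch; `QUEUING_PRESETS`, the key names, and the exact second values are assumptions for the example, while the real setting lives in the platform's "Wait Time" panel:

```python
# Illustrative mapping of use case to queuing time in seconds,
# mirroring the tier guidelines above. Names and values are examples.
QUEUING_PRESETS = {
    "widget": 0,            # instant responses for fast interactions
    "customer_service": 7,  # balanced speed and natural feel (5-10 s tier)
    "conversational": 15,   # human-like delays for relationship building
}

def wait_time_for(use_case: str) -> int:
    """Look up a queuing preset, defaulting to instant responses."""
    return QUEUING_PRESETS.get(use_case, 0)
```

Keeping the presets in one place makes it easy to audit and adjust them as data comes in.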
3. Monitor and Test Response Time
Testing Methodology
- Test using multiple channels (SMS, chat widget, voice)
- Record generation delay for different query types
- Measure total response time including queuing
- Document performance metrics for analysis
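The methodology above can be sketched as a small timing harness that separates generation delay from total response time including queuing. `generate` and `QUEUE_WAIT_SECONDS` are stand-ins for the demo, not platform APIs:

```python
import time

QUEUE_WAIT_SECONDS = 0.05  # configured queuing time (shortened for the demo)

def generate(query: str) -> str:
    # Placeholder for the actual model call; swap in your real client.
    return query.upper()

def measure(query: str) -> dict:
    """Record generation delay and total time including queuing."""
    start = time.monotonic()
    time.sleep(QUEUE_WAIT_SECONDS)  # simulated queuing delay
    gen_start = time.monotonic()
    reply = generate(query)
    end = time.monotonic()
    return {
        "reply": reply,
        "generation_s": end - gen_start,
        "total_s": end - start,
    }

m = measure("book a demo")
```

Running the harness across SMS, widget, and voice test queries gives the per-channel metrics this step calls for.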
Performance Monitoring
- Track average response times across different scenarios
- Monitor user satisfaction with timing
- Identify patterns in slow responses
- Adjust settings based on data
4. Optimize Prompt Design
Prompt Length Guidelines
Minimize prompt length to reduce processing time.
Effective Practices:
- Use concise, direct commands instead of verbose instructions
- Employ templates for consistent structure
- Remove unnecessary context from prompts
- Focus on essential instructions only
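As a hedged illustration of the practices above, the two prompts below are invented, but they show the kind of trimming that reduces processing time:

```python
# Both prompts are fabricated examples, not production instructions.
verbose = (
    "I would like you to please take a moment to carefully look at the "
    "customer's message below and then, when you are ready, write a "
    "polite and helpful response that addresses their question."
)
concise = "Answer the customer's question politely and helpfully."

def word_count(prompt: str) -> int:
    return len(prompt.split())

# Fewer words in the prompt generally means less processing time.
savings = word_count(verbose) - word_count(concise)
```

The concise version carries the same instruction in a fraction of the length.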
5. Evaluate AI Tools and Contextual Features
Tool Impact Analysis
Analyze logs to identify the performance impact of:
Tool Integrations:
- Calendar booking tools - May add 2-4 seconds
- CRM lookup tools - May add 1-3 seconds
- External API calls - Variable based on service
- Custom extraction tools - May add 1-2 seconds
Contextual Features:
- Large context size can slow processing
- Review context retention settings
- Consider conversation length limits
Optimization Strategies
- Remove unnecessary tools that aren’t being used
- Simplify complex tool configurations
- Optimize tool descriptions for clarity
- Consider tool call frequency and impact
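One way to ground this analysis, sketched in Python: wrap each tool call in a timer so slow tools stand out in the logs. `timed_tool` and the example lookup are hypothetical, not platform APIs:

```python
import time

def timed_tool(name, fn, *args):
    """Run one tool call and record how long it took."""
    start = time.monotonic()
    result = fn(*args)
    elapsed = time.monotonic() - start
    return result, {"tool": name, "seconds": elapsed}

# Hypothetical CRM lookup used only for the demo.
result, timing = timed_tool(
    "crm_lookup", lambda email: {"email": email}, "a@b.co"
)
```

Aggregating the `timing` records per tool shows which integrations contribute the extra seconds listed above.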
6. Incorporate Synthetic Delays (Optional)
When to Use Synthetic Delays
Appropriate scenarios:
- Non-immediate workflows requiring processing time
- Human-like interaction preferences
- Workflow detection time allowance
- Active tag processing delays
Implementation Guidelines
- 5-10 second delays for moderate human-like feel
- 15+ second delays for highly conversational interactions
- Consider user expectations for the specific use case
- Test different delay amounts to find optimal timing
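A minimal sketch of a synthetic delay, assuming a Python integration; `respond_with_delay` is a hypothetical helper, and the default is shortened from the 5-15+ second guidance above for the demo:

```python
import time

def respond_with_delay(reply: str, delay_s: float = 0.05) -> str:
    """Hold a finished reply for a synthetic delay before sending.
    Per the guidance above, ~5-10 s gives a moderate human-like feel
    and 15+ s suits highly conversational interactions."""
    time.sleep(delay_s)
    return reply
```

Making the delay a parameter keeps it easy to A/B test different amounts, as the last point suggests.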
7. Enable Knowledge Bases
Knowledge Base Optimization
- Connect relevant knowledge bases for contextual responses
- Optimize knowledge base size and structure
- Use specific, targeted knowledge rather than broad databases
- Regular knowledge base maintenance to remove outdated information
Performance Considerations
- Larger knowledge bases may increase query time
- Complex queries require more processing
- Optimize search algorithms within knowledge bases
- Consider knowledge base alternatives for frequently accessed information
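The last point can be sketched as a small in-memory cache consulted before the slower knowledge base query. `FAQ_CACHE`, `kb_query`, and the sample answer are all invented for the example:

```python
# Frequently asked answers kept in memory; values are examples only.
FAQ_CACHE = {
    "hours": "We are open 9am-5pm, Monday to Friday.",
}

def kb_query(question: str) -> str:
    # Stand-in for the real (slower) knowledge base lookup.
    return f"(knowledge base answer for: {question})"

def answer(question: str) -> str:
    """Serve frequent questions from the cache; fall back to the KB."""
    key = question.lower().strip("?")
    if key in FAQ_CACHE:
        return FAQ_CACHE[key]
    return kb_query(question)
```

Cache hits skip the knowledge base entirely, which is what makes them fast.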
8. Test for Robustness
Comprehensive Testing Scenarios
Simple Inquiries:
- Basic questions (hours, contact info)
- FAQ-type queries
- Single-step responses
Complex Inquiries:
- Multi-step processes (booking appointments)
- Tool-dependent queries (CRM lookups)
- Knowledge base searches
Context Variations:
- High context (long conversation history)
- Low context (new conversations)
- Mixed complexity within single conversations
Testing Metrics to Track
- Response time distribution across query types
- Success rate of complex interactions
- User satisfaction with timing
- Error rates related to timing issues
9. Iterate and Refine
Continuous Improvement Process
- Regular log review (weekly or bi-weekly)
- Performance trend analysis over time
- User feedback collection on response timing
- Incremental adjustments based on data
Optimization Cycle
- Measure current performance
- Identify bottlenecks or issues
- Implement targeted changes
- Test and validate improvements
- Document successful optimizations
10. Document Adjustments
Documentation Best Practices
- Record all configuration changes with timestamps
- Document performance impact of each change
- Share optimization results with team members
- Maintain configuration version history
- Create optimization playbooks for future reference
Performance Troubleshooting
Common Issues and Solutions
Slow Response Times (>10 seconds)
Potential Causes:
- Oversized prompts consuming processing power
- Too many active tools slowing down decision-making
- Large conversation history requiring more processing
- Complex knowledge base queries
Solutions:
- Simplify and shorten prompts
- Remove unnecessary tools
- Implement conversation history limits
- Optimize knowledge base structure
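The conversation history limit can be sketched as follows; `trim_history` is a hypothetical helper, and 20 messages is an arbitrary example cap:

```python
def trim_history(messages: list[dict], max_messages: int = 20) -> list[dict]:
    """Keep only the most recent messages to bound processing time.
    The first message is preserved so system instructions survive."""
    if len(messages) <= max_messages:
        return messages
    return [messages[0]] + messages[-(max_messages - 1):]
```

Bounding history keeps long conversations from steadily pushing responses past the 10-second mark.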
Inconsistent Response Times
Potential Causes:
- Variable tool execution times
- Network latency with external services
- Load balancing during peak usage
- Knowledge base query complexity variations
Solutions:
- Monitor external service performance
- Implement consistent tool timeout settings
- Use caching for frequently accessed data
- Optimize knowledge base search algorithms
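Consistent tool timeouts might look like this in a Python integration; `call_with_timeout` is a hypothetical wrapper, not a platform API:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

_pool = ThreadPoolExecutor(max_workers=4)

def call_with_timeout(fn, timeout_s, *args):
    """Apply one consistent timeout to an external tool call so a slow
    service cannot stall the whole response."""
    future = _pool.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except TimeoutError:
        return None  # caller should fall back to a default answer
```

Applying the same timeout to every external call removes one source of the variability described above.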
Unnatural Response Timing
Potential Causes:
- Inappropriate queuing settings for use case
- Inconsistent synthetic delays
- Tool execution creating unexpected pauses
Solutions:
- Adjust queuing times to match user expectations
- Implement consistent delay patterns
- Optimize tool performance and reliability
Advanced Optimization Strategies
Performance Monitoring Dashboard
Key Metrics to Track
- Average response time by query type
- 95th percentile response time for outlier detection
- Tool execution time breakdown
- Knowledge base query performance
- User satisfaction scores related to timing
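The average and 95th-percentile metrics can be computed directly from logged response-time samples. A sketch; `summarize` is illustrative and uses a simple nearest-rank percentile:

```python
import math
import statistics

def summarize(samples_s):
    """Average and 95th-percentile response times for outlier detection."""
    ordered = sorted(samples_s)
    # Nearest-rank percentile: the value at ceil(0.95 * n), 1-indexed.
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "avg_s": statistics.fmean(ordered),
        "p95_s": ordered[p95_index],
    }
```

Running `summarize` per query type yields the per-type breakdown the dashboard calls for.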
A/B Testing for Response Times
Testing Framework
- Create test groups with different timing configurations
- Measure user engagement and satisfaction
- Compare completion rates across groups
- Analyze conversation quality metrics
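Deterministic group assignment keeps each user on one timing configuration for the whole test. A sketch with invented group names; the hashing approach is a common technique, not a platform feature:

```python
import hashlib

# Two example timing configurations under test.
TIMING_GROUPS = {0: "instant", 1: "balanced"}

def assign_group(user_id: str) -> str:
    """Hash the user ID so the same user always lands in the same group."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return TIMING_GROUPS[int(digest, 16) % len(TIMING_GROUPS)]
```

Stable assignment is what makes engagement and completion rates comparable across groups.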
Load Testing and Scaling
Performance Under Load
- Test response times during peak usage
- Monitor degradation patterns as load increases
- Implement graceful degradation strategies
- Plan scaling strategies for growth
Definition of Done
This optimization process is complete when:
- ✅ AI responds within 2-3 seconds for minimal-context queries
- ✅ AI responds within 6-9 seconds for complex queries with tools and knowledge bases
- ✅ Synthetic delays and queuing times meet user preferences without negatively impacting quality
- ✅ Performance logs confirm improved or maintained performance after adjustments
- ✅ User satisfaction with response timing meets targets
- ✅ Documentation is complete and shared with team
FAQ
What is the ideal response time for AI assistants?
A:
- Simple responses: 2-3 seconds
- Contextual responses: 6-9 seconds
- Add synthetic delays if human-like interaction is preferred
How does prompt size affect response time?
A: Larger prompts increase processing time significantly. Use short, task-focused instructions for optimal efficiency.
What causes delays besides queuing?
A:
- Lengthy conversation history
- Tool integrations and API calls
- Large knowledge base queries
- Network latency
- Complex decision-making processes
Can I bypass synthetic delays for specific interactions?
A: Yes, set queuing time to zero for immediate responses, or use conditional logic to apply delays selectively.
How do I debug slow responses?
A: Use log analysis to pinpoint delays in:
- Processing time
- Tool call execution
- Knowledge base retrieval
- External API responses
Should I prioritize speed or accuracy?
A: Balance both based on use case:
- Transactional interactions: Prioritize speed
- Consultative interactions: Prioritize accuracy
- Customer service: Balance both with slight preference for accuracy
Best Practices Summary
Configuration Principles
- Start with baseline settings and optimize incrementally
- Test thoroughly before deploying changes
- Monitor continuously for performance degradation
- Document all changes for future reference
Performance Optimization
- Keep prompts concise and focused
- Remove unnecessary tools and features
- Optimize knowledge bases for specific use cases
- Use appropriate queuing for interaction type
User Experience
- Match timing to user expectations for the channel
- Provide consistent experience across interactions
- Consider context when setting response times
- Gather user feedback on timing preferences