Run A/B Experiments
The Experiments feature allows you to test different prompt strategies to see which performs better across AI models, helping you optimize your brand’s AI presence scientifically.

Professional tier and above: Experiments are available on Professional and Enterprise plans. View pricing for details.
What are Experiments?
Experiments in friction let you:
- Test prompt variations: Compare control vs. test prompts
- Measure performance differences: Track visibility, sentiment, authority, and purchase intent
- Make data-driven decisions: Choose the best-performing prompts based on actual AI responses
- Optimize continuously: Run multiple experiments to improve over time
- Control vs Variant Testing: Compare two sets of prompts head-to-head
- Multi-Metric Tracking: Monitor up to 4 target metrics simultaneously
- AI Model Selection: Test across specific LLMs and platforms
- Automated Execution: Nightly job runs experiments automatically
Experiment Lifecycle
The 6 Experiment Statuses
- Draft
- Scheduled
- Running
- Completed
- Failed
- Archived
Draft
Status: Experiment created but not started
What You Can Do:
- Edit all experiment details
- Add or remove prompts
- Delete the experiment
- Save changes without starting
Creating an Experiment
Step 1: Experiment Setup
Click “+ Create Experiment” to open the creation form.

Required Fields:
Experiment Name
Descriptive name for your experiment (e.g., “Product Page Prompt Test - Q1 2024”)

Hypothesis
Your expected outcome (required for “Save & Start”, optional for drafts)
Example: “Benefit-focused prompts will increase purchase intent by 15% compared to feature-focused prompts”

Target Metrics
Select 1-4 metrics to track:
- Visibility: Frequency and prominence of brand mentions
- Sentiment: Positive, neutral, or negative tone
- Authority: Recognition as category leader
- Purchase Intent: Likelihood of AI recommendation
Optional Fields:
LLM Models
Specific AI models to test (if none selected, uses backend defaults). Options:
- GPT-4, GPT-3.5 (OpenAI)
- Claude 3 Opus, Sonnet, Haiku (Anthropic)
- Gemini Pro, Ultra (Google)
- And more…
Platforms
Specific AI platforms to test (if none selected, uses backend defaults). Examples: ChatGPT, Claude, Gemini, Perplexity

Date Range
Optional start and end dates for the experiment:
- Start Date: When experiment should begin (optional)
- End Date: When experiment should conclude (optional)
- Validation: End date must be after start date
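Putting Step 1 together, a purely illustrative sketch of the information the form collects is shown below. The key names and structure are assumptions made for this example only; they are not friction’s actual data model or API.

```python
# Illustrative only: a rough sketch of the information collected in Step 1.
# None of these key names come from friction; they simply mirror the form fields above.
experiment_setup = {
    "name": "Product Page Prompt Test - Q1 2024",              # required
    "hypothesis": ("Benefit-focused prompts will increase purchase intent "
                   "by 15% compared to feature-focused prompts"),  # required for Save & Start
    "target_metrics": ["visibility", "purchase_intent"],        # select 1-4
    "llm_models": ["GPT-4", "Claude 3 Opus", "Gemini Pro"],      # optional; backend defaults if empty
    "platforms": ["ChatGPT", "Claude", "Gemini"],                # optional; backend defaults if empty
    "start_date": "2024-01-15",                                  # optional
    "end_date": "2024-02-15",                                    # optional
}

# The one validation rule called out above: the end date must be after the start date.
# (ISO-formatted date strings compare correctly as plain strings.)
assert experiment_setup["end_date"] > experiment_setup["start_date"]
```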
Step 2: Add Prompts
Add prompts to both the Control and Variant buckets.

Control Bucket
The baseline prompts representing your current approach (for example, feature-focused questions).

Variant Bucket
The test prompts representing your new approach (for example, benefit-focused questions). See the illustrative buckets sketched below.

Prompt Requirements:
- Drafts can be saved with any number of prompts, but each bucket needs at least 5 prompts before the experiment can be started.
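For illustration only, here is what a pair of buckets might look like for a feature-vs-benefit test. Every prompt text below is invented for this example; swap in your own product name and questions.

```python
# Hypothetical buckets for a feature-vs-benefit test; every prompt below is invented.
# Replace [Product] with your own brand or product name.
control_prompts = [  # baseline: feature-focused
    "What features does [Product] have?",
    "What are the technical specifications of [Product]?",
    "How does [Product] work?",
    "What is included with [Product]?",
    "What is [Product]?",
]

variant_prompts = [  # test: benefit-focused
    "How can [Product] help me?",
    "What problems does [Product] solve?",
    "How can [Product] save me time?",
    "Why would [Product] make my work easier?",
    "What outcomes can I expect from [Product]?",
]
```

Five prompts per bucket is the minimum needed to start an experiment; the Best Practices section below recommends 10-20 per bucket for more reliable results.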
Step 3: Save or Start
Save Draft
What It Does:
- Saves experiment without starting
- Status remains “Draft”
- Can edit anytime
- No minimum prompt requirement

Use When:
- Building experiment incrementally
- Not ready to commit
- Need approval before running

Save & Start
What It Does:
- Saves the experiment and queues it for the nightly execution job
- Status moves to “Scheduled”
- Requires a hypothesis, at least 1 target metric, and at least 5 prompts per bucket
Viewing Experiment Results
Experiments List Page
List View Features:
Filter experiments by status:
- All Experiments
- Draft
- Scheduled
- Running
- Completed
- Failed
- Show Archived (toggle)
Navigate large experiment lists:
- Page size: 10, 25, 50, or 100 experiments
- Previous/Next navigation
- Page count display
Key information at a glance:
- Experiment Name
- Status (with color-coded badge)
- Target Metrics
- Created Date
- Last Executed
- Actions (View, Edit, Archive)
Experiment Detail Page
Click any experiment to view detailed results and timeline. The detail page is organized into four tabs: Overview, Results, Prompt Lists, and Execution Timeline.

The Overview tab shows:
- Experiment name and hypothesis
- Current status
- Target metrics
- LLM models and platforms
- Date range
- Creation and last updated timestamps
Understanding Results
Metric Comparison
Results show scores for each target metric in both buckets. How to read results, using Visibility as an example (Sentiment and Purchase Intent follow the same layout):

Control: 68/100
Variant: 82/100
Lift: +14 points (+20.6%)

Interpretation: Variant prompts resulted in significantly better brand visibility
Action: Adopt variant prompt style for visibility improvement
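The lift figures are straightforward arithmetic: the point difference between buckets, and that difference expressed as a percentage of the control score. A quick check in Python (just the math, not friction code):

```python
def lift(control: float, variant: float) -> tuple[float, float]:
    """Return the lift in points and as a percentage of the control score."""
    points = variant - control
    return points, points / control * 100

points, pct = lift(control=68, variant=82)
print(f"Lift: {points:+.0f} points ({pct:+.1f}%)")  # Lift: +14 points (+20.6%)
```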
Statistical Significance
Confidence Level: Results are considered statistically significant if the confidence level exceeds 95%.
Sample Size: Larger prompt sets (20+ per bucket) yield more reliable results.
Interpreting Significance:
High Confidence (95%+)
Meaning: Very likely the observed difference is real, not random chance
Action: Confidently implement the winning variant
Medium Confidence (80-95%)
Meaning: Likely a real difference, but some uncertainty remains
Action: Consider running a larger follow-up experiment or implement cautiously
Low Confidence (<80%)
Meaning: Observed differences may be due to chance
Action: Run a longer experiment with more prompts, or interpret results as inconclusive
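friction reports the confidence level for you. If you want to sanity-check a result yourself, one common approach, used here only as an assumption rather than friction’s internal method, is a two-sample t-test on per-prompt scores from each bucket, for example with SciPy:

```python
# Assumption: one score per prompt per bucket (e.g. visibility out of 100).
# This is a generic two-sample t-test, not friction's internal method.
from scipy import stats

control_scores = [61, 70, 65, 72, 68, 66, 71, 63, 69, 67]  # made-up data
variant_scores = [78, 85, 80, 83, 81, 79, 86, 82, 84, 80]  # made-up data

t_stat, p_value = stats.ttest_ind(control_scores, variant_scores)
confidence = (1 - p_value) * 100  # rough reading of the p-value as a confidence level
print(f"confidence ~ {confidence:.1f}%")  # above 95% would count as statistically significant
```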
Winner Determination
friction automatically recommends a winner based on:
- Primary Metric Performance: Largest improvement in target metrics
- Statistical Significance: Confidence level of results
- No Negative Impacts: Guard rail metrics don’t decline significantly
- Consistency: Performance across different AI models
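Read as a decision rule, those criteria look roughly like the sketch below. This is only a paraphrase of the list above, not friction’s actual algorithm, and every field name and threshold in it is assumed for illustration.

```python
# Hypothetical paraphrase of the winner-determination criteria; not friction's real logic.
def recommend_winner(results: dict) -> str:
    primary = results["primary_metric"]
    improved = primary["variant"] > primary["control"]        # 1. improvement on the primary metric
    significant = primary["confidence"] >= 95                 # 2. statistically significant
    no_regressions = all(                                      # 3. guard-rail metrics don't decline significantly
        delta > -2 for delta in results["guard_rail_deltas"]  #    (the -2 point threshold is assumed)
    )
    consistent = all(                                          # 4. consistent across AI models
        model_lift > 0 for model_lift in results["per_model_lift"].values()
    )
    if improved and significant and no_regressions and consistent:
        return "variant"
    return "no clear winner"
```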
Best Practices
Designing Good Experiments
1. Test One Variable
Change one thing between control and variant for clear causality.
Good: Control uses features, Variant uses benefits
Bad: Control uses features + formal tone, Variant uses benefits + casual tone (which caused the difference?)
2. Use Sufficient Sample Size
Aim for 10-20 prompts per bucket minimum.
Why: Larger samples reduce noise and increase confidence
3. Run for Appropriate Duration
Allow the nightly job to execute multiple times if needed.
Recommendation: Let experiments run for at least 7 days for AI model caching to settle
4. Align with Business Goals
Choose target metrics that matter to your business.
E-Commerce: Prioritize Purchase Intent
B2B SaaS: Prioritize Authority
Consumer Brand: Prioritize Sentiment
Experiment Ideas
- Prompt Framing
- Feature vs Benefit
- Comparison Positioning
- Specificity
- Problem-Solution
Example (Prompt Framing):
Control: “What is [Product]?”
Variant: “How can [Product] help me?”
Tests: Question framing impact on recommendations
Common Mistakes to Avoid
Testing Too Many Variables
Problem: Can’t identify which change caused the difference
Solution: Test one major variable at a time
Insufficient Prompts
Problem: Results lack statistical power
Solution: Use 10-20 prompts per bucket minimum
Ignoring Negative Results
Problem: Only implementing winning experiments misses learnings
Solution: Document why variants failed - equally valuable insight
Not Aligning with Strategy
Problem: Running random experiments without clear goals
Solution: Tie experiments to specific business objectives
Forgetting to Implement Winners
Problem: Running experiments but not applying insights
Solution: Create an action plan to implement winning prompts across the Visibility page
Managing Experiments
Editing Experiments
What you can edit depends on the experiment’s status: Draft, Scheduled/Running, or Completed.

Draft Experiments
Can Edit: Everything
- Name, hypothesis
- Target metrics
- LLM models, platforms
- Date range
- Add/remove/edit prompts
Archiving Experiments
Archive old or irrelevant experiments to keep your list clean:

1. Select Experiment: Click the archive icon on the experiments list or detail page
2. Confirm Archive: The experiment moves to archived status
3. View Archived: Toggle the “Show archived” filter to see archived experiments
4. Unarchive (if needed): Click the unarchive icon to restore the experiment to the active list
Archiving vs Deleting: Archiving preserves experiment data for future reference. Deleting permanently removes the experiment.
Deleting Experiments
When to Delete:
- Test experiments during onboarding
- Duplicate experiments created by mistake
- Experiments with critical errors in setup
- Data you’re certain you’ll never need

When Not to Delete:
- Completed experiments (preserve historical data)
- Experiments with learnings to reference later
- Anything you might want to review in the future
Experiment Workflow Example
Real-World Scenario: E-Commerce Brand
Goal: Increase purchase intent for flagship product

1. Hypothesis Formation
Hypothesis: “Benefit-focused prompts emphasizing problem-solving will increase purchase intent by 20% compared to feature-focused prompts”
Reasoning: Customers care more about outcomes than specifications
2. Create Experiment
- Name: “Product Page Prompts - Benefit vs Feature Test”
- Target Metrics: Purchase Intent (primary), Sentiment (secondary)
- LLM Models: GPT-4, Claude 3, Gemini Pro
3. Build Control Bucket (Feature-Focused)

4. Build Variant Bucket (Benefit-Focused)

5. Save & Start
Experiment moves to “Scheduled” status, queued for nightly execution
6. Monitor Results
After execution completes:
- Control Purchase Intent: 62/100
- Variant Purchase Intent: 79/100
- Lift: +17 points (+27.4%)
- Confidence: 96%
7. Apply Insights
- Update Visibility page prompts to use benefit-focused language
- Apply benefit framing to product pages
- Run follow-up experiment testing different benefit themes
Troubleshooting
Experiment Stuck in Scheduled
Possible Causes:
- Nightly job hasn’t run yet (wait 24 hours)
- Queue backlog (multiple experiments scheduled)
- System issue
Experiment Failed
Check: Failure reason in status badge tooltip or detail page
Common Causes:
- Invalid prompts (empty, malformed)
- API rate limiting
- Timeout due to too many prompts
Results Show No Difference
Interpretation: Control and variant performed similarly
Possible Reasons:
- Tested variable doesn’t matter for this metric
- Change was too subtle
- Sample size too small
Can't Start Experiment
Check:
- Hypothesis filled in
- At least 1 target metric selected
- Minimum 5 prompts per bucket
- Valid date range (end > start)
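Those checks amount to a short pre-flight list. The sketch below mirrors them in code purely for illustration; the function and field names are made up and are not part of friction.

```python
# Hypothetical pre-flight check mirroring the list above; not friction's actual validation.
def can_start(experiment: dict) -> list[str]:
    problems = []
    if not experiment.get("hypothesis"):
        problems.append("hypothesis is missing")
    if not experiment.get("target_metrics"):
        problems.append("select at least 1 target metric")
    if (len(experiment.get("control_prompts", [])) < 5
            or len(experiment.get("variant_prompts", [])) < 5):
        problems.append("each bucket needs at least 5 prompts")
    start, end = experiment.get("start_date"), experiment.get("end_date")
    if start and end and end <= start:
        problems.append("end date must be after start date")
    return problems  # an empty list means the experiment can be started
```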
Next Steps
- Define Prompts: Apply experiment insights to your prompt library
- Analyze Results: See how experiments impact overall brand health
- View Dashboard: Monitor impact of implemented changes
- Commerce Tracking: Run commerce-specific experiments
Pro Tip: Start with high-impact, low-effort experiments. Test prompt framing or feature vs. benefit messaging first - these often show significant results with minimal effort. Save complex multivariate tests for later when you have more experience.