
Run A/B Experiments

The Experiments feature allows you to test different prompt strategies to see which performs better across AI models, helping you optimize your brand’s AI presence scientifically.
Professional tier and above: Experiments are available on Professional and Enterprise plans. View pricing for details.

What are Experiments?

Experiments in friction let you:
  • Test prompt variations: Compare control vs. test prompts
  • Measure performance differences: Track visibility, sentiment, authority, and purchase intent
  • Make data-driven decisions: Choose the best-performing prompts based on actual AI responses
  • Optimize continuously: Run multiple experiments to improve over time

Control vs Variant Testing

Compare two sets of prompts head-to-head

Multi-Metric Tracking

Monitor up to 4 target metrics simultaneously

AI Model Selection

Test across specific LLMs and platforms

Automated Execution

Nightly job runs experiments automatically
[Screenshot: A/B experiment results comparing control vs. variant across multiple metrics, with statistical significance indicators]

Experiment Lifecycle

The 6 Experiment Statuses

Draft

Status: Experiment created but not started
What You Can Do:
  • Edit all experiment details
  • Add or remove prompts
  • Delete the experiment
  • Save changes without starting
Next Step: Click “Save & Start” when ready to begin

Creating an Experiment

Step 1: Experiment Setup

Click “+ Create Experiment” to open the creation form.

Required Fields:

Experiment Name (string, required)
Descriptive name for your experiment (e.g., “Product Page Prompt Test - Q1 2024”)

Hypothesis (string, required)
Your expected outcome (required for “Save & Start”, optional for drafts)
Example: “Benefit-focused prompts will increase purchase intent by 15% compared to feature-focused prompts”

Target Metrics (array, required)
Select 1-4 metrics to track:
  • Visibility: Frequency and prominence of brand mentions
  • Sentiment: Positive, neutral, or negative tone
  • Authority: Recognition as category leader
  • Purchase Intent: Likelihood of AI recommendation
Note: At least 1 metric must be selected

Optional Fields:

LLM Models (array, optional)
Specific AI models to test (if none selected, uses backend defaults)
Options:
  • GPT-4, GPT-3.5 (OpenAI)
  • Claude 3 Opus, Sonnet, Haiku (Anthropic)
  • Gemini Pro, Ultra (Google)
  • And more…
AI Platforms (array, optional)
Specific AI platforms to test (if none selected, uses backend defaults)
Examples: ChatGPT, Claude, Gemini, Perplexity

Date Range (object, optional)
Optional start and end dates for the experiment
  • Start Date: When experiment should begin (optional)
  • End Date: When experiment should conclude (optional)
  • Validation: End date must be after start date
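The fields above map naturally to a simple setup payload. The sketch below is a minimal illustration in Python; the payload shape and field names are assumptions for illustration, not friction’s actual API. It also applies the metric-count and date-range rules described above.

from datetime import date

# Hypothetical experiment setup; field names are illustrative only.
experiment = {
    "name": "Product Page Prompt Test - Q1 2024",
    "hypothesis": "Benefit-focused prompts will increase purchase intent "
                  "by 15% compared to feature-focused prompts",
    "target_metrics": ["visibility", "purchase_intent"],  # required: select 1-4
    "llm_models": ["GPT-4", "Claude 3 Opus"],              # optional: backend defaults if empty
    "ai_platforms": ["ChatGPT", "Perplexity"],             # optional
    "start_date": date(2024, 1, 15),                       # optional
    "end_date": date(2024, 3, 31),                         # optional
}

# Validation mirroring the rules above.
assert 1 <= len(experiment["target_metrics"]) <= 4, "Select 1-4 target metrics"
if experiment["start_date"] and experiment["end_date"]:
    assert experiment["end_date"] > experiment["start_date"], "End date must be after start date"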

Step 2: Add Prompts

Add prompts to both the Control and Variant buckets.

Control Bucket

The baseline prompts representing your current approach. Examples:
"What is [Brand Name]?"
"Features of [Product]"
"How does [Product] work?"
"[Brand] specifications"

Variant Bucket

The test prompts representing your new approach. Examples (benefit-focused):
"How can [Brand Name] help me?"
"Benefits of [Product]"
"What problems does [Product] solve?"
"Why choose [Brand]?"

Prompt Requirements:

Minimum: 5 prompts per bucket (10 total) required before you can “Save & Start”
Recommendation: 10-20 prompts per bucket for statistically significant results
You can save as a draft with fewer than 5 prompts per bucket, but you must add more before starting the experiment.
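As a quick illustration of the minimum-prompt rule, here is a small Python check. The bucket lists and the can_start helper are hypothetical; friction enforces this rule in the UI.

# The four example prompts from each bucket above.
control_prompts = [
    "What is [Brand Name]?",
    "Features of [Product]",
    "How does [Product] work?",
    "[Brand] specifications",
]
variant_prompts = [
    "How can [Brand Name] help me?",
    "Benefits of [Product]",
    "What problems does [Product] solve?",
    "Why choose [Brand]?",
]

MIN_PROMPTS_PER_BUCKET = 5  # required before "Save & Start"; 10-20 recommended

def can_start(control, variant, minimum=MIN_PROMPTS_PER_BUCKET):
    # Drafts may have fewer prompts, but starting requires the minimum in each bucket.
    return len(control) >= minimum and len(variant) >= minimum

print(can_start(control_prompts, variant_prompts))  # False: each bucket still needs at least one more prompt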

Step 3: Save or Start

Save as Draft

What It Does:
  • Saves experiment without starting
  • Status remains “Draft”
  • Can edit anytime
  • No minimum prompt requirement
Use When:
  • Building experiment incrementally
  • Not ready to commit
  • Need approval before running

Save & Start

What It Does:
  • Moves the experiment to “Scheduled” status, queued for the nightly execution job
  • Requires a hypothesis, at least 1 target metric, and 5 prompts per bucket
Use When:
  • The experiment is complete and ready to run

Viewing Experiment Results

Experiments List Page

List View Features:

Filters (object)
Filter experiments by status:
  • All Experiments
  • Draft
  • Scheduled
  • Running
  • Completed
  • Failed
  • Show Archived (toggle)
Pagination (object)
Navigate large experiment lists:
  • Page size: 10, 25, 50, or 100 experiments
  • Previous/Next navigation
  • Page count display
Columns (array)
Key information at a glance:
  • Experiment Name
  • Status (with color-coded badge)
  • Target Metrics
  • Created Date
  • Last Executed
  • Actions (View, Edit, Archive)

Experiment Detail Page

Click any experiment to view detailed results and timeline.

What You’ll See:

  • Experiment name and hypothesis
  • Current status
  • Target metrics
  • LLM models and platforms
  • Date range
  • Creation and last updated timestamps

Understanding Results

Metric Comparison

Results show scores for each target metric in both buckets:

How to Read Results:

Control: 68/100
Variant: 82/100
Lift: +14 points (+20.6%)
Interpretation: Variant prompts resulted in significantly better brand visibility
Action: Adopt variant prompt style for visibility improvement
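The lift figures follow directly from the two scores. A minimal worked example in Python (the lift helper is just for illustration):

def lift(control_score, variant_score):
    """Return absolute lift in points and relative lift as a percentage of the control score."""
    points = variant_score - control_score
    percent = points / control_score * 100
    return points, percent

points, percent = lift(68, 82)
print(f"Lift: +{points} points (+{percent:.1f}%)")  # Lift: +14 points (+20.6%)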

Statistical Significance

Confidence Level: Results are considered statistically significant if the confidence level exceeds 95%.
Sample Size: Larger prompt sets (20+ per bucket) yield more reliable results.

Interpreting Significance:

High confidence (above 95%)
Meaning: Very likely the observed difference is real, not random chance
Action: Confidently implement the winning variant

Moderate confidence
Meaning: Likely a real difference, but some uncertainty remains
Action: Consider running a larger follow-up experiment or implement cautiously

Low confidence
Meaning: Observed differences may be due to chance
Action: Run a longer experiment with more prompts, or interpret results as inconclusive
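As a rough illustration of how these tiers translate into a decision, here is a small Python helper. The 95% cut-off comes from the note above; the 90% boundary for the moderate band is an assumption for illustration only.

def interpret_confidence(confidence_pct):
    """Map a confidence level (0-100) to a recommended action."""
    if confidence_pct > 95:      # statistically significant per the threshold above
        return "Implement the winning variant"
    if confidence_pct >= 90:     # assumed boundary for 'moderate' confidence
        return "Likely real; run a larger follow-up or implement cautiously"
    return "Inconclusive; run a longer experiment with more prompts"

print(interpret_confidence(96))  # Implement the winning variant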

Winner Determination

friction automatically recommends a winner based on the following criteria (sketched in code after this list):
  1. Primary Metric Performance: Largest improvement in target metrics
  2. Statistical Significance: Confidence level of results
  3. No Negative Impacts: Guard rail metrics don’t decline significantly
  4. Consistency: Performance across different AI models
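The decision rule implied by these criteria might be sketched as follows. This is only an illustration in Python with an assumed ExperimentResult shape and an assumed guard rail threshold; it is not friction’s actual algorithm.

from dataclasses import dataclass

@dataclass
class ExperimentResult:
    # Hypothetical result shape, for illustration only.
    primary_lift_pct: float             # lift on the primary target metric
    confidence_pct: float               # statistical confidence of the result
    guardrail_changes_pct: list[float]  # % change on non-target (guard rail) metrics
    per_model_lifts: list[float]        # lift observed on each tested AI model

def recommend_winner(result: ExperimentResult) -> str:
    improved = result.primary_lift_pct > 0                                 # 1. primary metric performance
    significant = result.confidence_pct > 95                               # 2. statistical significance
    no_harm = all(change > -5 for change in result.guardrail_changes_pct)  # 3. no significant decline (5% threshold assumed)
    consistent = all(lift > 0 for lift in result.per_model_lifts)          # 4. consistency across AI models
    if improved and significant and no_harm and consistent:
        return "variant"
    if significant and not improved:
        return "control"
    return "inconclusive"

print(recommend_winner(ExperimentResult(17.0, 96.0, [-1.2], [12.0, 9.5, 6.8])))  # variant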

Best Practices

Designing Good Experiments

1. Test One Variable

Change one thing between control and variant for clear causality.
Good: Control uses features, Variant uses benefits
Bad: Control uses features + formal tone, Variant uses benefits + casual tone (which caused the difference?)

2. Use Sufficient Sample Size

Aim for 10-20 prompts per bucket minimum.
Why: Larger samples reduce noise and increase confidence

3. Run for Appropriate Duration

Allow the nightly job to execute multiple times if needed.
Recommendation: Let experiments run for at least 7 days for AI model caching to settle

4. Align with Business Goals

Choose target metrics that matter to your business.
E-Commerce: Prioritize Purchase Intent
B2B SaaS: Prioritize Authority
Consumer Brand: Prioritize Sentiment

Experiment Ideas

Control: “What is [Product]?”
Variant: “How can [Product] help me?”
Tests: Question framing impact on recommendations

Common Mistakes to Avoid

Don’t end experiments prematurely. Wait for full execution and statistical significance before drawing conclusions.

Problem: Can’t identify which change caused the difference
Solution: Test one major variable at a time

Problem: Results lack statistical power
Solution: Use 10-20 prompts per bucket minimum

Problem: Only implementing winning experiments misses learnings
Solution: Document why variants failed - equally valuable insight

Problem: Running random experiments without clear goals
Solution: Tie experiments to specific business objectives

Problem: Running experiments but not applying insights
Solution: Create an action plan to implement winning prompts across the Visibility page

Managing Experiments

Editing Experiments

Can Edit: Everything
  • Name, hypothesis
  • Target metrics
  • LLM models, platforms
  • Date range
  • Add/remove/edit prompts
How: Click “Edit” on experiments list or detail page

Archiving Experiments

Archive old or irrelevant experiments to keep your list clean:
1. Select Experiment

Click the archive icon on the experiments list or detail page

2. Confirm Archive

Experiment moves to archived status

3. View Archived

Toggle “Show archived” filter to see archived experiments

4. Unarchive (if needed)

Click the unarchive icon to restore the experiment to the active list
Archiving vs Deleting: Archiving preserves experiment data for future reference. Deleting permanently removes the experiment.

Deleting Experiments

Deleting an experiment is permanent and cannot be undone. All data will be lost.
When to Delete:
  • Test experiments during onboarding
  • Duplicate experiments created by mistake
  • Experiments with critical errors in setup
  • Data you’re certain you’ll never need
When to Archive Instead:
  • Completed experiments (preserve historical data)
  • Experiments with learnings to reference later
  • Anything you might want to review in the future

Experiment Workflow Example

Real-World Scenario: E-Commerce Brand

Goal: Increase purchase intent for flagship product
1. Hypothesis Formation

Hypothesis: “Benefit-focused prompts emphasizing problem-solving will increase purchase intent by 20% compared to feature-focused prompts”
Reasoning: Customers care more about outcomes than specifications
2. Create Experiment

  • Name: “Product Page Prompts - Benefit vs Feature Test”
  • Target Metrics: Purchase Intent (primary), Sentiment (secondary)
  • LLM Models: GPT-4, Claude 3, Gemini Pro
3. Build Control Bucket (Feature-Focused)

"What are the features of [Product]?"
"Specifications of [Product]"
"Technical details of [Product]"
"[Product] capabilities"
"What does [Product] include?"
... (10 total feature-focused prompts)
4. Build Variant Bucket (Benefit-Focused)

"How can [Product] help me save time?"
"What problems does [Product] solve?"
"Benefits of using [Product]"
"Why should I choose [Product]?"
"How does [Product] make life easier?"
... (10 total benefit-focused prompts)
5. Save & Start

Experiment moves to “Scheduled” status, queued for nightly execution
6. Monitor Results

After execution completes:
  • Control Purchase Intent: 62/100
  • Variant Purchase Intent: 79/100
  • Lift: +17 points (+27.4%)
  • Confidence: 96%
Winner: Variant (benefit-focused prompts)
7. Apply Insights

  • Update Visibility page prompts to use benefit-focused language
  • Apply benefit framing to product pages
  • Run follow-up experiment testing different benefit themes

Troubleshooting

Experiment hasn’t executed yet

Possible Causes:
  • Nightly job hasn’t run yet (wait 24 hours)
  • Queue backlog (multiple experiments scheduled)
  • System issue
Solution: Wait for the next nightly execution window. Contact support if stuck >48 hours.

Experiment status shows “Failed”

Check: Failure reason in the status badge tooltip or detail page
Common Causes:
  • Invalid prompts (empty, malformed)
  • API rate limiting
  • Timeout due to too many prompts
Solution: Fix the issue, then create a new experiment

Results show no clear difference

Interpretation: Control and variant performed similarly
Possible Reasons:
  • Tested variable doesn’t matter for this metric
  • Change was too subtle
  • Sample size too small
Next Steps: Try a more dramatic variation or a different test variable

Can’t “Save & Start” the experiment

Check:
  • Hypothesis filled in
  • At least 1 target metric selected
  • Minimum 5 prompts per bucket
  • Valid date range (end > start)
Solution: Address the missing requirements
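If “Save & Start” remains unavailable, the checklist above can be expressed as one quick validation pass. This is an illustrative Python sketch with assumed argument names; friction performs these checks in the form itself.

def ready_to_start(hypothesis, target_metrics, control_prompts, variant_prompts,
                   start_date=None, end_date=None):
    """Mirror of the 'Save & Start' checklist above; all names are illustrative."""
    return (
        bool(hypothesis and hypothesis.strip())                      # hypothesis filled in
        and 1 <= len(target_metrics) <= 4                            # at least 1 target metric (max 4)
        and len(control_prompts) >= 5 and len(variant_prompts) >= 5  # minimum prompts per bucket
        and (start_date is None or end_date is None or end_date > start_date)  # valid date range
    )

print(ready_to_start("Benefit-focused prompts will lift purchase intent",
                     ["purchase_intent"], ["p1"] * 5, ["p2"] * 5))  # True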

Next Steps


Pro Tip: Start with high-impact, low-effort experiments. Test prompt framing or feature vs. benefit messaging first - these often show significant results with minimal effort. Save complex multivariate tests for later when you have more experience.