Why More Articles Beat Fewer High-Ranking Ones

by impaca.ai team

As many SEO and GEO practitioners have discovered independently, ChatGPT's client-side code contains references that appear to relate to Reciprocal Rank Fusion (RRF): ret-rr-skysight-v3, rrf_alpha, rrf_input_threshhold. This discovery led us to run an experiment that challenges conventional SEO wisdom.

What is RRF?

Reciprocal Rank Fusion (RRF) is a method for combining search results from multiple searches into one ranked list. Think of it like asking several people the same question and then combining their answers to get the best overall response.

Here's how it works: each search result gets a score based on its position using this formula:

$$\text{RRF score} = \frac{1}{\text{result count} + \text{rank position}}$$

The result count in ChatGPT appears to be around 30-60. Taking 45 as an example, the RRF score of a single page ranking #5 would be: $\frac{1}{45+5} = 0.02$

So a #1 result gets a higher score than a #10 result. If the same page shows up in multiple searches, all its scores get added together. Pages that consistently appear across different searches end up ranking higher in the final results.
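
To make the summing step concrete, here is a minimal Python sketch of the fusion described above. The function name `rrf_fuse` and the example URLs are made up for illustration. The parameter `k` plays the role of the "result count" term in the formula; in common descriptions of RRF it is a fixed smoothing constant, often 60. This is a sketch of the general technique, not ChatGPT's actual implementation.

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Fuse several ranked lists into one list sorted by summed RRF score.

    ranked_lists: list of result lists, each ordered best-first.
    k: the "result count" term from the formula (assumed value).
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        # Ranks are 1-based: the top result has rank position 1.
        for rank, page in enumerate(results, start=1):
            scores[page] += 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical results of two searches for the same user question.
searches = [
    ["a.example/guide", "b.example/review", "c.example/faq"],
    ["b.example/review", "d.example/news", "a.example/guide"],
]
fused = rrf_fuse(searches)
for page, score in fused:
    print(f"{page}: {score:.4f}")
```

In this toy input, b.example/review ends up first because it ranks #2 and #1, while a.example/guide ranks #1 and #3.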

This causes a weird phenomenon:

Let's compare two scenarios (with $\text{result count} = 60$):

  • Scenario A: Your page ranks #1 in one sub-query: $\frac{1}{60+1} \approx 0.0164$
  • Scenario B: Your page ranks #10 in two different sub-queries: $\frac{1}{60+10} + \frac{1}{60+10} \approx 2 \times 0.0143 \approx 0.0286$

Even though Scenario A has a higher individual ranking, Scenario B achieves a better final RRF score by appearing in multiple searches! The experiment in this research post will attempt to test this in practice.
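
The same comparison can be checked in a few lines of Python. The helper names `rrf_score` and `appearances_needed` are our own, and the value 60 is the assumed result count from the scenarios above. The second helper answers a related question: how many appearances at a given rank are needed to strictly beat a single #1?

```python
def rrf_score(rank, k=60):
    """RRF score of a single appearance at a 1-based rank position."""
    return 1.0 / (k + rank)

def appearances_needed(rank, target_rank=1, k=60):
    """Smallest n such that n appearances at `rank` strictly outscore
    one appearance at `target_rank`.

    From n / (k + rank) > 1 / (k + target_rank) it follows that
    n > (k + rank) / (k + target_rank); integer division keeps this exact.
    """
    return (k + rank) // (k + target_rank) + 1

scenario_a = rrf_score(1)        # one #1 ranking
scenario_b = 2 * rrf_score(10)   # two #10 rankings

print(round(scenario_a, 4), round(scenario_b, 4))
print(appearances_needed(10))    # appearances at #10 needed to beat one #1
```

Under these assumptions, two appearances at #10 are already enough, which matches Scenario B.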

Why is RRF needed?

RRF is needed because ChatGPT often uses a "query fan-out" approach: it breaks a user question down into multiple sub-queries to gather comprehensive information from various sources. The results of those sub-queries then have to be merged into a single ranking. This aggregation and re-ranking is managed by SonicBerry, ChatGPT's search engine system.

Example:

  • User query: "What are the best headphones for running?"
  • Fanned-out queries could be:
    • "best running headphones 2025"
    • "wireless earbuds sweatproof waterproof"
    • "bone conduction headphones running"
    • "sports earbuds secure fit review"

Hypothesis

Many lower-ranking articles are better than a few high-ranking ones, because appearing in more sub-query results can add up to a higher total RRF score.

The Red vs Blue Experiment

We created two blogs for fictional products: "Red Rixie RXTYZ4 drones" and "Blue Rixie RXTYZ4 drones". Red had more (5) but lower-quality articles; Blue had fewer (2) but higher-quality articles. We used impaca.ai to measure visibility.

Experiment Setup

  • Red site: 5 slightly lower-quality articles
  • Blue site: 2 longer and higher-quality articles
  • Both sites registered with Google Search Console, since research suggests ChatGPT search uses Google SERPs in combination with other sources
  • Used impaca.ai to monitor GEO performance of both sites

SEO Performance Results

  • Red articles: Mostly ranked in spots 2-4
  • Blue articles: Both ranked #1

Results

Understanding the Metrics

Visibility Score measures the percentage of AI answers that mention a brand across selected topics. Only answers that mention at least one industry player are counted, making it a relative measure of how often a brand appears in AI responses compared to competitors.

Top Recommendation Share tracks the percentage of prompts where a brand appears as the first recommendation. This metric focuses on prominence rather than just presence, measuring how often a brand is positioned as the leading choice in AI-generated answers.
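
As a rough illustration of how these two metrics could be computed, here is a small Python sketch. It is not impaca.ai's actual implementation. The data format (each answer as an ordered list of the brands it mentions), the brand names, and the choice to divide Top Recommendation Share by all prompts are assumptions made for this example.

```python
def visibility_score(answers, brand):
    """Percentage of counted answers that mention `brand`.

    Answers that mention no brand at all are excluded, following the
    definition above.
    """
    counted = [a for a in answers if a]
    if not counted:
        return 0.0
    return 100.0 * sum(brand in a for a in counted) / len(counted)

def top_recommendation_share(answers, brand):
    """Percentage of prompts whose first recommendation is `brand`.

    Assumption: the denominator is all prompts, including answers that
    mention no brand.
    """
    if not answers:
        return 0.0
    return 100.0 * sum(1 for a in answers if a and a[0] == brand) / len(answers)

# Hypothetical answers to four prompts (toy data, not experiment results).
answers = [
    ["BrandA", "BrandB"],
    ["BrandA"],
    [],                    # mentions no brand, so ignored for visibility
    ["BrandB", "BrandA"],
]
print(visibility_score(answers, "BrandB"))
print(top_recommendation_share(answers, "BrandA"))
```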

Monitored with impaca.ai

Top recommendation share over 6 days: Red site (5 articles, red line) vs Blue site (2 articles, blue line). Despite Blue's initial SEO advantage, Red consistently captures more top recommendations through a volume strategy.

Monitored with impaca.ai

Visibility score over 6 days: Red site (5 articles, red line) vs Blue site (2 articles, blue line). Shows the raw visibility metrics that contribute to the top recommendation share.

Key Findings

Red (more articles) consistently appeared above Blue in ChatGPT results, and for some user prompts Blue did not appear at all.

This doesn't mean that SEOs/GEOs shouldn't target high-ranking articles, but ~5 articles ranking 3-5 may be better than one or two ranking #1.

Why RRF Favors Volume

The RRF algorithm's strength lies in its ability to aggregate signals across multiple queries. When ChatGPT fans out a user question into sub-queries:

  1. More articles = more opportunities to appear in different sub-query results
  2. Consistent moderate ranking across multiple searches beats perfect ranking in fewer searches
  3. Cross-query reinforcement - the same domain appearing in multiple sub-queries gets boosted
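
The three points above can be sketched in Python by summing RRF scores per domain across sub-queries. Whether ChatGPT actually aggregates at the domain level is an assumption here. The sub-query results, URLs, and rank positions below are invented to mirror the setup loosely (Blue at #1 in two sub-queries, Red at #2-#4 in five), and `k=60` is an assumed result count.

```python
from collections import defaultdict
from urllib.parse import urlparse

def domain_rrf(results_by_subquery, k=60):
    """Sum RRF scores per domain over all sub-query result lists."""
    totals = defaultdict(float)
    for urls in results_by_subquery.values():
        for rank, url in enumerate(urls, start=1):
            # Group pages by host name, e.g. "red.example".
            totals[urlparse(url).netloc] += 1.0 / (k + rank)
    return dict(totals)

o = "https://other.example/"  # filler competitor pages (hypothetical)
results = {
    "sub-query 1": ["https://blue.example/a", "https://red.example/1"],
    "sub-query 2": ["https://blue.example/b", o + "p", "https://red.example/2"],
    "sub-query 3": [o + "q", "https://red.example/3"],
    "sub-query 4": [o + "r", o + "s", o + "t", "https://red.example/4"],
    "sub-query 5": [o + "u", o + "v", "https://red.example/5"],
}
totals = domain_rrf(results)
print(f"red:  {totals['red.example']:.4f}")
print(f"blue: {totals['blue.example']:.4f}")
```

With these made-up ranks, Red's five moderate positions add up to more than Blue's two #1 positions, which is the effect the list describes.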