The GEO score: Quantifying the perfect content playbook
by impaca.ai team
Building on our RRF findings, we at impaca.ai decided to take a more methodical and data science-based approach to GEO in ChatGPT, which led us to the following hypothesis:
The best article strategy for GEO is for articles to live in the same idea space, but avoid over-clustering.
The natural follow-up to this hypothesis is to find a way to quantify the concepts of "the same idea space" and "over-clustering".
We defined a machine learning-derived metric ("the GEO score") that we believed would quantify these two concepts.
Technical deep dive into the GEO score
1. Ensuring Same Idea Space: Cohesion to Theme
We used sentence embeddings (OpenAI's text-embedding-3-small) to convert each article title into a high-dimensional vector representation. To ensure all titles live in the same conceptual neighborhood:
- Create a theme anchor by computing the centroid of all title embeddings
- Calculate cosine similarity between each title embedding and the theme anchor
- Compute the mean similarity across all titles
This gives us a value in [0,1] where higher values indicate the titles are thematically consistent: they're all in the same conceptual neighborhood rather than scattered across different domains.
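As a rough illustration, the three steps above could be sketched in Python as follows. This is not necessarily the code used in the experiment. It assumes the title embeddings have already been fetched and stacked into a NumPy array with one row per title, and the function name `cohesion` is a hypothetical choice.

```python
import numpy as np


def cohesion(embeddings: np.ndarray) -> float:
    """Mean cosine similarity between each title embedding and the theme anchor.

    `embeddings` is assumed to be a 2-D array with one row per article title.
    """
    # Theme anchor: the centroid of all title embeddings.
    anchor = embeddings.mean(axis=0)

    # Cosine similarity between every title and the anchor.
    title_norms = np.linalg.norm(embeddings, axis=1)
    anchor_norm = np.linalg.norm(anchor)
    similarities = (embeddings @ anchor) / (title_norms * anchor_norm)

    # Mean similarity across all titles.
    return float(similarities.mean())
```

With toy vectors, a set of identical titles gives a cohesion of 1, while titles pointing in unrelated directions pull the value down.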
2. Preventing Over-clustering: Top-k Neighbor Analysis
To ensure titles aren't just variations of the same concept, we analyze top-k nearest-neighbor similarity:
- For each title, find its k=3 most similar neighbors
- Calculate Top3_sim as the mean cosine similarity to these 3 neighbors
Using the top-3 neighbors rather than a single nearest neighbor reduces noise and provides a more stable redundancy measure. A low Top3_sim indicates healthy diversity: titles are related but not duplicative.
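A minimal sketch of this neighbor analysis, again assuming precomputed embeddings in a NumPy array. Averaging the per-title values into one site-level number is an assumption made here for illustration, and `top_k_similarity` is a hypothetical name.

```python
import numpy as np


def top_k_similarity(embeddings: np.ndarray, k: int = 3) -> float:
    """Mean cosine similarity of each title to its k most similar other titles."""
    n_titles = embeddings.shape[0]
    if n_titles <= k:
        raise ValueError("need more than k titles to find k neighbors")

    # Normalize rows so that dot products are cosine similarities.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T

    # A title is not its own neighbor: exclude the diagonal.
    np.fill_diagonal(sims, -np.inf)

    # Keep the k largest similarities in each row.
    top_k = np.sort(sims, axis=1)[:, -k:]

    # Assumption: average over neighbors and then over titles.
    return float(top_k.mean())
```

Near-duplicate titles push this value toward 1, and mutually unrelated titles push it toward 0.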
3. Composite Score
We use a harmonic mean to force balance between our metrics:
$$\text{GEO score} = \frac{2 \cdot C \cdot N}{C + N}$$
Where:
- C (Cohesion): Measures how well titles align with the theme (0 to 1)
- N (Non-redundancy): 1 - Top3_sim, measures diversity (0 to 1)
Both metrics are normalized to [0,1], making the score directly comparable across test runs. The harmonic mean ensures both metrics must be reasonably high: a high score means titles are cohesively on-theme while avoiding redundancy.
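To make the balancing effect concrete, here is a small sketch that assumes the standard unweighted harmonic mean of C and N. The function name `geo_score` and the input values in the usage note are hypothetical, not measurements from the experiment.

```python
def geo_score(cohesion: float, top3_sim: float) -> float:
    """Harmonic mean of cohesion (C) and non-redundancy (N = 1 - Top3_sim).

    Assumes the unweighted two-term harmonic mean: 2 * C * N / (C + N).
    """
    non_redundancy = 1.0 - top3_sim
    total = cohesion + non_redundancy
    if total == 0:
        # Both metrics are zero, so the score is zero as well.
        return 0.0
    return 2.0 * cohesion * non_redundancy / total
```

For instance, `geo_score(0.6, 0.7)` works out to about 0.40, while `geo_score(0.9, 0.9)` works out to about 0.18. In the second case cohesion is very high, but the low non-redundancy (0.1) drags the score down, where an arithmetic mean of the same two values would have given 0.5.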
We then:
- Created 30 fake blogs separated into 6 groups, with 5 articles each.
- Gave each of the blogs different article topics.
- Used impaca.ai to measure the AI visibility and GEO.
How we built the sites and measured the visibility
Site Creation Process
For each test site, we followed a standardized methodology:
- Created small Astro blogs - Fast, lightweight static sites optimized for content delivery
- Deployed statically on AWS - Used S3 + Amplify.
- Registered with Google Search Console - To allow ChatGPT to consume results from the Google SERP.
- Waited for indexing.
- Used the impaca.ai platform to monitor GEO visibility for all sites.
Group strategy
We created 5 groups of 6 sites each (30 total), where each group uses a unique gibberish word that only appears in that group's titles. This prevents groups from competing with each other or with existing, established sites on AI mentions.
Results
After a few days of tracking AI mentions across multiple queries, the data revealed a clear pattern.
Figure: Composite Score vs Visibility scatter plot. The trend line shows a clear positive correlation between composite score and AI visibility. Points are color-coded by group, with each group testing the same strategy variations using different gibberish words. Sites with higher composite scores (balancing cohesion and non-redundancy) consistently achieve greater visibility. Data monitored by impaca.ai.
Key Findings
Across all 30 test sites, the data confirms our hypothesis:
- Composite Score ≥ 0.35 Consistently Wins: Sites with scores between 0.35 and 0.43 achieve the highest visibility (29-38 mentions)
- Over-Clustering Penalty: Very similar titles (scores 0.15-0.21) plateau at moderate visibility (9-24 mentions)
- Random Strategy Penalty: Scattered topics (scores 0.15-0.19) consistently score lowest (8-18 mentions)
- Optimal Balance: High cohesion with low redundancy (Top3_sim < 0.5) represents the sweet spot