AI Overviews are the most significant SEO change agent since mobile – maybe ever.
Until now, we’ve lacked a representative data set to thoroughly analyze how AIOs (AI Overviews) work.
Thanks to exclusive data from Surfer, I conducted the largest analysis of AI Overviews so far, with over 546,000 rows and more than 44 GB of data.
The data answers who, why, and how to rank in AIOs with astonishing clarity. In other cases, it raises new questions we can seek to answer and refine our understanding of how to succeed in AIOs.
The stakes are high: AIOs can decrease traffic by a significant 10% (according to my first analysis), depending on citation design and user intent – and there is no escaping this.
Since Google pulled AIOs back at the end of May, two weeks after the initial launch, they have slowly been ramping up again.
The Data
The data set spans 546,513 rows, 44.4 GB, and over 12 million domains. To my knowledge, no comparable data set has been analyzed before.
85% of queries and results are in English.
253,710 results are live (not part of SGE, Google’s beta environment), and 285,000 results are part of SGE.
8,297 queries show AIOs for both SGE and non-SGE.
The data contains queries, organic results, cited domains, and AIO answers.
The dataset was pulled in June.
Limitations:
It’s possible that new features are not included since AIOs change all the time.
The dataset does not yet contain languages like Portuguese or Spanish that were recently added.
I will share insights over several Memos, so stay tuned for part 2.
Answers
I sought to answer five questions in this first exploration.
Which domains are most visible in AIOs?
Does every AIO have citations?
Does organic position determine AIO visibility?
How many AIOs contain the search query?
How different are AIOs in vs. outside of SGE?
Which Domains Are Most Visible In AIOs?
We can assume that the most cited domains also get the most traffic from AIOs.
In my previous analyses, Wikipedia and Reddit were the most cited sources. This time, we see a different picture.
The top 10 most cited domains in AIOs:
youtube.com.
wikipedia.org.
linkedin.com.
NIH (National Library of Medicine).
support.google.com.
healthline.com.
webmd.com.
support.microsoft.com.
mayoclinic.org.
The top 10 best-ranking domains in classic search results:
www.google.com.
www.youtube.com.
www.reddit.com.
www.quora.com.
en.wikipedia.org.
www.linkedin.com.
support.google.com.
www.healthline.com.
www.ncbi.nlm.nih.gov.
www.webmd.com.
The biggest difference? Reddit, Quora, and Google are heavily underrepresented in AIO citations, which is counterintuitive and runs against the trends we’ve seen in the past. I found only a few AIO citations for the three domains:
Reddit: 130.
Quora: 398.
Google: 612.
Did Google make a conscious change here?
We can see that AIOs can show vast differences between cited URLs and ranking URLs in classic search results.
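The comparison between the two lists can be reproduced with a simple domain count. The snippet below is a minimal sketch, not the method used for this analysis: the helper names (domain_of, top_domains) and the sample URLs are made up for illustration, and it assumes the cited and ranking URLs are available as plain lists of strings.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the host of a URL, lowercased and without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def top_domains(urls, n=10):
    """Return the n most frequent domains in an iterable of URLs."""
    counts = Counter(domain_of(u) for u in urls)
    return [domain for domain, _ in counts.most_common(n)]


# Made-up sample rows, NOT taken from the data set.
ranking_urls = [
    "https://www.reddit.com/r/seo/1",
    "https://www.reddit.com/r/seo/2",
    "https://www.youtube.com/watch?v=a",
    "https://www.quora.com/q",
]
cited_urls = [
    "https://www.youtube.com/watch?v=a",
    "https://www.youtube.com/watch?v=b",
    "https://www.linkedin.com/pulse/x",
]

ranking_top = top_domains(ranking_urls)
cited_top = top_domains(cited_urls)

# Domains that rank well but are missing from the most-cited list.
ranked_not_cited = [d for d in ranking_top if d not in cited_top]
print(ranked_not_cited)
```

Stripping the "www." prefix matters here because the two lists above use different notations (youtube.com vs. www.youtube.com). Subdomains such as en.wikipedia.org or support.google.com are kept as-is in this sketch, so whether to fold them into their parent domain is a separate decision.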
The fact that two social networks, YouTube and LinkedIn, are in the top three most cited domains raises the question of whether we can influence AIO answers more with content on YouTube and LinkedIn than with content on our own sites.
Videos take more effort to produce than LinkedIn answers, but they might also be more defensible against copycats. AIO-optimization strategies should include social and video content.
Does Every AIO Have Citations?
We assume every AIO has citations, but that’s not always the case.
Queries with very simple user intent, like “What is a meta description for an article?” or “Is 1.5 a whole number?” don’t show any citations.
I counted 4,691 zero-citation queries in the data set, less than 1% (0.85%).
It’s questionable how valuable this traffic would have been in the first place.
However, the fact that Google is willing to display AI answers without citations raises the question of whether we’ll also see more complex and valuable queries without sources.
The impact would be devastating, as citations are the only way to get clicks from AIOs.
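For anyone who wants to track the zero-citation share in their own data, here is a minimal sketch. It assumes each row is a dictionary with a "citations" list, which is a hypothetical layout and not the schema of the Surfer data set; the sample rows are invented.

```python
def zero_citation_share(rows):
    """Return (count, share) of rows whose AIO lists no cited URLs."""
    if not rows:
        return 0, 0.0
    zero = sum(1 for row in rows if not row["citations"])
    return zero, zero / len(rows)


# Invented sample rows for illustration only.
sample_rows = [
    {"query": "is 1.5 a whole number", "citations": []},
    {"query": "best running shoes", "citations": ["https://example.com/a"]},
    {"query": "how to fix a flat tire", "citations": ["https://example.com/b"]},
    {"query": "symptoms of flu", "citations": ["https://example.com/c"]},
]

count, share = zero_citation_share(sample_rows)
print(f"{count} zero-citation queries ({share:.2%})")
```

Re-running a check like this over time would show whether citation-free answers start to appear for more complex queries.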
Does Organic Position Determine AIO Visibility?
Recently, more data has come out showing a high overlap between pages cited in AIOs and pages ranking in the top spots for the same query.
The underlying question is: Do you need to do anything different to optimize for AIOs than for classic search results?
Early on, Google would cite URLs in AIOs that don’t rank in the top 10 results. Some would even come from penalized or non-indexed domains.
The concern was that a system would pick citations far removed from classic search results ranking, making it hard to optimize for AIOs and leading to questionable answers.
Over the last one to two months, that trend appeared to change, but this data set does not show a turnaround.
I found:
9.2 million total unique URLs in the top 20 search results.
2.7 million total URLs in AIO citations.
1.1 million unique URLs in both the top 20 search result positions and as AIO citations.
12.1% of URLs in the top 20 search results are also AIO citations. In reverse, 59.6% of AIO citations are not from the top 20 search results.
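The two percentages follow from set arithmetic on the URL lists. The sketch below shows one way to compute them; the function name and the tiny sample are hypothetical, and it assumes both inputs are deduplicated before comparison (the citation figure above is reported as "total" rather than "unique", so the exact inputs may differ).

```python
def overlap_stats(top20_urls, cited_urls):
    """Compare URLs ranking in the top 20 with URLs cited in AIOs."""
    top20 = set(top20_urls)
    cited = set(cited_urls)
    if not top20 or not cited:
        raise ValueError("both URL collections must be non-empty")
    both = top20 & cited
    return {
        # Share of top-20 URLs that also appear as AIO citations.
        "top20_also_cited": len(both) / len(top20),
        # Share of AIO citations that do not rank in the top 20.
        "citations_outside_top20": 1 - len(both) / len(cited),
    }


# Tiny made-up example: 4 ranking URLs, 2 cited URLs, 1 in both.
stats = overlap_stats(["a", "b", "c", "d"], ["a", "x"])
print(stats)

# Plugging in the rounded figures from the list above (in millions).
rounded_top20_share = 1.1 / 9.2
rounded_outside_share = 1 - 1.1 / 2.7
print(f"{rounded_top20_share:.1%}, {rounded_outside_share:.1%}")
```

With the rounded figures, this gives roughly 12% and 59%, close to the reported 12.1% and 59.6%. The small gap is presumably due to rounding of the published counts.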
This observation is supported by a Google patent that describes how links are selected after summarization, and by weak correlations between search result rank and AIO citations: -0.19 overall and -0.21 for the top 3 search results.
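To make the correlation figures more concrete, here is a minimal sketch of how such a number could be computed. It is an assumption, not a description of the analysis: the article does not state which coefficient was used or how the variables were defined. The sketch uses a Pearson correlation between organic position (1 = top) and a 0/1 flag for whether the URL was cited, on invented data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Invented example: organic positions and whether each URL was cited.
positions = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
cited = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

r = pearson(positions, cited)
print(round(r, 2))
```

Under this setup, a negative value means citations become less likely as the position number grows, that is, further down the page. A value near -0.2, like the ones reported above, would indicate only a weak relationship.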
Ranking higher in the search results certainly increases the chances of being visible in AIOs, but it is far from the only factor. Google aims for more diversity in AIO citations.
In the search results, URLs rank for ~15.7 keywords on average, no matter whether they’re in or out of the top 10 positions. In AIO citations, it’s roughly half: ~8.7 keywords.