Each major LLM lists different products, with only one shoe consistently recommended by all four tested AI systems.
All the systems exhibit a degree of creativity, suggesting varying products across different sessions.
Notably, Copilot, Perplexity and ChatGPT primarily reference commercial sources, such as shop websites or product detail pages, aligning with the prompt’s purpose.
Claude was not tested further. While it also suggests shoe models, its recommendations are based solely on initial training data without access to real-time data or its own retrieval system.
As you can see from the different results, each LLM will have its own process of selecting sources and content, making the GEO challenge even greater.
The recommendations are influenced by co-occurrences, co-mentions and context.
The more frequently specific tokens are mentioned together, the more likely they are to be contextually related.
In simple terms, this increases the probability score during decoding.
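The co-occurrence idea can be illustrated with a small counting sketch. It is only a rough proxy: an LLM does not look up counts like this, and the mini-corpus, the brand names and the helper functions below are all made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mini-corpus; the brand names are invented for this example.
DOCS = [
    "acme trailrunner is a cushioned running shoe",
    "the acme trailrunner running shoe suits long distances",
    "zephyr sandal is a light summer shoe",
]

def cooccurrence_counts(docs):
    """Count how often each unordered token pair shares a document."""
    pair_counts = Counter()
    token_counts = Counter()
    for doc in docs:
        tokens = set(doc.lower().split())
        token_counts.update(tokens)
        pair_counts.update(frozenset(p) for p in combinations(sorted(tokens), 2))
    return pair_counts, token_counts

def conditional_share(a, b, pair_counts, token_counts):
    """Share of documents containing `a` that also contain `b`."""
    if token_counts[a] == 0:
        return 0.0
    return pair_counts[frozenset((a, b))] / token_counts[a]

pair_counts, token_counts = cooccurrence_counts(DOCS)
print(conditional_share("running", "acme", pair_counts, token_counts))    # 1.0
print(conditional_share("running", "zephyr", pair_counts, token_counts))  # 0.0
```

In this toy corpus, every document that mentions "running" also mentions "acme", so the brand is strongly tied to that context, while "zephyr" never appears alongside it.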
Dig deeper: How to gain visibility in generative AI answers: GEO for Perplexity and ChatGPT
Source and information selection for retrieval-augmented generation
GEO focuses on positioning products, brands and content within the training data of LLMs. Understanding the training process of LLMs is crucial for identifying potential opportunities for inclusion.
The following insights are drawn from studies, patents, scientific documents, research on E-E-A-T and personal analysis. The central questions are:
How much do the retrieval systems influence the RAG process?
How important is the initial training data?
What other factors can play a role?
Recent studies, particularly on source selection for AI Overviews, Perplexity and Copilot, suggest overlaps in selected sources.
For example, Google AI Overviews show about 50% overlap in source selection, as evidenced by studies from Rich Sanger and Authoritas and from Surfer.
The range is wide, though. Studies from the beginning of 2024 still found an overlap of only around 15%, while some studies found a 99% overlap.
The retrieval system appears to influence approximately 50% of the AI Overviews’ results, suggesting ongoing experimentation to improve performance. This aligns with justified criticism regarding the quality of AI Overview outputs.
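The cited studies do not necessarily define "overlap" the same way. One plausible definition, used here purely as an assumption, is the share of sources cited in an AI answer that also appear among the top organic results for the same query. The function name, the `top_n` cutoff and the example URLs are hypothetical.

```python
def source_overlap(ranked_urls, cited_urls, top_n=10):
    """Share of cited sources that also appear in the top_n organic results."""
    top = set(ranked_urls[:top_n])
    cited = set(cited_urls)
    if not cited:
        return 0.0
    return len(top & cited) / len(cited)

# Placeholder data: organic ranking vs. sources cited in the AI answer.
ranked = ["a.example/shoes", "b.example/review", "c.example/guide", "d.example/shop"]
cited = ["b.example/review", "d.example/shop", "x.example/forum", "y.example/wiki"]

print(f"{source_overlap(ranked, cited):.0%}")  # 50%
```

Changing `top_n` or the set of queries can move this figure considerably, which may partly explain why published overlap numbers differ so much.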
The selection of referenced sources in AI answers highlights where it is beneficial to position brands or products in a contextually appropriate way.
It’s important to differentiate between sources used during the initial training of models and those added on a topic-specific basis during the RAG process.
Examining the model training process provides clarity. For instance, Google’s Gemini – a multimodal large language model – processes diverse data types, including text, images, audio, video and code.
Its training data comprises web documents, books, code and multimedia, enabling it to perform complex tasks efficiently.
Studies on AI Overviews and their most frequently referenced sources offer insights into which sources Google uses for its indices and knowledge graph during pre-training, providing opportunities to align content for inclusion.
In the RAG process, domain-specific sources are incorporated to enhance contextual relevance.
A key feature of Gemini is its use of a Mixture of Experts (MoE) architecture.
Unlike traditional Transformers, which operate as a single large neural network, an MoE model is divided into smaller “expert” networks.
The model selectively activates the most relevant expert paths based on the input type, significantly improving efficiency and performance.
The RAG process is likely integrated into this architecture.
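The routing idea behind a Mixture of Experts layer can be sketched in a few lines. This is a generic top-k gating example, not Gemini's actual router, whose details are not public. The toy experts, the gate weights and the function names are assumptions chosen for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, gate_weights, experts, k=2):
    """Route input x to the k highest-scoring experts and mix their outputs."""
    # Gate: one linear score per expert.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalise over the selected experts only; the rest stay inactive.
    weights = softmax([scores[i] for i in top])
    outputs = [experts[i](x) for i in top]
    mixed = [sum(w * out[d] for w, out in zip(weights, outputs)) for d in range(len(x))]
    return mixed, top

# Toy experts (hypothetical): each one simply scales the input.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]

output, chosen = moe_layer([0.5, 2.0], gate_weights, experts, k=2)
print(chosen)  # [2, 1]
```

Only two of the four experts run for this input, which is where the efficiency gain comes from: compute scales with `k`, not with the total number of experts.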
Gemini is developed by Google through multiple training phases, utilizing publicly available data and specialized techniques to maximize the relevance and precision of its generated content:
Pre-training
Similar to other large language models (LLMs), Gemini is first pre-trained on various public data sources. Google applies various filters to ensure data quality and avoid problematic content.
The training considers a flexible selection of likely words, allowing for more creative and contextually appropriate responses.
Supervised fine-tuning (SFT)
After pre-training, the model is optimized using high-quality examples either created by experts or generated by models and then reviewed by experts.
This process is similar to learning good text structure and content by seeing examples of well-written texts.
Reinforcement learning from human feedback (RLHF)
The model is further developed based on human evaluations. A reward model based on user preferences helps Gemini recognize and learn preferred response styles and content.
Extensions and retrieval augmentation
Gemini can search external data sources such as Google Search, Maps, YouTube or specific extensions to provide contextual information about the response.
For example, when asked about current weather conditions or news, Gemini could access Google Search directly to find timely, reliable data and incorporate it into the response.
Gemini performs search results filtering to select the most relevant information for the answer. The model takes into account the contextuality of the query and filters the data so that it fits the question as closely as possible.
An example of this would be a complex technical question where the model selects results that are scientific or technical in nature rather than using general web content.
Gemini combines the information retrieved from external sources with the model output.
This process involves creating an optimized draft response that draws on both the model’s prior knowledge and information from the retrieved data sources.
The model structures the answer so that the information is logically brought together and presented in a readable manner.
Each answer undergoes additional review to ensure that it meets Google’s quality standards and does not contain problematic or inappropriate content.
This security check is complemented by a ranking that favors the best quality versions of the answer. The model then presents the highest-ranked answer to the user.
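The retrieval, filtering, combination and ranking steps described above can be sketched as a toy pipeline. This is not Google's implementation. The keyword-overlap relevance score, the thresholds, the prompt format and all names below are assumptions, and a real system would call a search backend and a language model where this sketch uses placeholders.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    source: str
    text: str

def relevance(query, passage):
    """Toy relevance: fraction of query terms found in the passage."""
    q_terms = set(query.lower().split())
    p_terms = set(passage.text.lower().split())
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0

def retrieve_and_filter(query, passages, min_score=0.5, top_k=2):
    """Keep only passages that fit the query closely enough, best first."""
    scored = [(relevance(query, p), p) for p in passages]
    kept = [(s, p) for s, p in scored if s >= min_score]
    kept.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in kept[:top_k]]

def build_prompt(query, passages):
    """Combine retrieved context with the question for the model."""
    context = "\n".join(f"[{i + 1}] {p.source}: {p.text}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using the context above."

def pick_answer(candidates, is_safe, quality):
    """Drop unsafe drafts, then return the highest-quality remaining one."""
    safe = [c for c in candidates if is_safe(c)]
    return max(safe, key=quality) if safe else None

passages = [
    Passage("weather.example", "rain expected in berlin today"),
    Passage("news.example", "berlin marathon results"),
    Passage("shop.example", "buy umbrellas online"),
]
query = "weather in berlin today"
selected = retrieve_and_filter(query, passages)
prompt = build_prompt(query, selected)

# Placeholder drafts standing in for model outputs; length stands in for quality.
drafts = ["short", "a longer, sourced answer [1]", "blocked text"]
best = pick_answer(drafts, is_safe=lambda c: "blocked" not in c, quality=len)
print([p.source for p in selected], best)
```

Here only the weather passage clears the relevance threshold, and the final step mirrors the review-then-rank order described in the text: the safety filter runs first, and ranking applies only to what remains.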
User feedback and continuous optimization
Google continuously integrates feedback from users and experts to adapt the model and fix any weak points.
One possibility is that AI applications access existing retrieval systems and use their search results.
Studies suggest that a strong ranking in the respective search engine increases the likelihood of being cited as a source in connected AI applications.
However, as noted, the overlaps do not yet show a clear correlation between top rankings and referenced sources.
Another criterion appears to influence source selection.
Google’s approach, for example, emphasizes adherence to quality standards when choosing sources for pre-training and RAG.
The use of classifiers is also mentioned as a factor in this process.
The mention of classifiers suggests a link to E-E-A-T, where quality classifiers are also used.
Information from Google regarding post-training also references using E-E-A-T in classifying sources according to quality.
The reference to evaluators connects to the role of quality raters in assessing E-E-A-T.
Rankings in most search engines are influenced by relevance and quality at the document, domain and author or source entity levels.
Sources may be chosen less for relevance and more for quality at the domain and source entity level.
This would make sense, since more complex prompts have to be rewritten in the background into suitable search queries before the rankings can be queried.
While relevance is query-dependent, quality remains consistent.
This distinction helps explain the weak correlation between rankings and sources referenced by generative AI and why lower-ranking sources are sometimes included.
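A small sketch shows how a quality-first selection could produce this pattern. It is a thought experiment, not a description of any search engine's logic. The quality scores, the threshold and the domains are invented, and neither Google nor Bing publishes values like these.

```python
# Hypothetical, query-independent quality scores per domain.
DOMAIN_QUALITY = {"clinic.example": 0.9, "forum.example": 0.4, "blog.example": 0.6}

def select_sources(results, quality_floor=0.5, top_k=2):
    """results: (domain, relevance) pairs in ranking order for one rewritten query."""
    eligible = [(d, r) for d, r in results if DOMAIN_QUALITY.get(d, 0.0) >= quality_floor]
    # Order by quality first; relevance only breaks ties.
    eligible.sort(key=lambda dr: (DOMAIN_QUALITY[dr[0]], dr[1]), reverse=True)
    return [d for d, _ in eligible[:top_k]]

# Ranking order by relevance: the forum ranks first, the clinic third.
results = [("forum.example", 0.95), ("blog.example", 0.80), ("clinic.example", 0.55)]
print(select_sources(results))  # ['clinic.example', 'blog.example']
```

The top-ranked result is dropped because it falls below the quality floor, while the third-ranked result is selected first, which is the kind of mismatch between rankings and cited sources that the overlap studies report.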
To assess quality, search engines like Google and Bing rely on classifiers, including Google’s E-E-A-T framework.
Google has emphasized that E-E-A-T varies by subject area, necessitating topic-specific strategies, particularly for GEO.
Referenced domain sources differ by industry or topic, with platforms like Wikipedia, Reddit and Amazon playing varying roles, according to a BrightEdge study.
Thus, industry- and topic-specific factors must be integrated into positioning strategies.
Dig deeper: How to implement generative engine optimization (GEO) strategies
Tactical and strategic approaches for LLMO / GEO
As previously noted, there are no proven success stories yet for influencing the results of generative AI.
Platform operators themselves seem uncertain about how to qualify the sources selected during the RAG process.
These points underscore the importance of identifying where optimization efforts should focus – specifically, determining which sources are sufficiently trustworthy and relevant to prioritize.
The next challenge is understanding how to establish yourself as one of those sources.
The research paper “GEO: Generative Engine Optimization” introduced the concept of GEO, exploring how generative AI outputs can be influenced and identifying the factors responsible for this.
According to the study, the visibility and effectiveness of GEO can be enhanced by the following factors:
Authority in writing: Improves performance, particularly on debate questions and queries in historical contexts, as more persuasive writing is likely to have more value in debate-like contexts.
Citations (cite sources): Particularly beneficial for factual questions, as they provide a source of verification for the facts presented, thereby increasing the credibility of the answer.
Statistical addition: Particularly effective in fields such as Law, Government and Opinion, where incorporating relevant statistics into webpage content can enhance visibility in specific contexts.
Quotation addition: Most impactful in areas like People and Society, Explanations and History, likely because these topics often involve personal narratives or historical events where direct quotes add authenticity and depth.
These factors vary in effectiveness depending on the domain, suggesting that incorporating domain-specific, targeted customizations into web pages is essential for increased visibility.
The following tactical dos for GEO and LLMO can be derived from the paper:
Use citable sources: Incorporate citable sources into your content to increase credibility and authenticity, especially for factual content.
Insert statistics: Add relevant statistics to strengthen your arguments, especially in areas like Law and Government and for opinion questions.
Add quotes: Use quotes to enrich content in areas such as People and Society, Explanations and History as they add authenticity and depth.
Domain-specific optimization: Consider the specifics of your domain when optimizing, as the effectiveness of GEO methods varies depending on the area.
Focus on content quality: Focus on creating high-quality, relevant and informative content that provides value to users.
Tactical don’ts can also be identified:
Avoid keyword stuffing: Traditional keyword stuffing shows little to no improvement in generative search engine responses and should be avoided.
Don’t ignore the context: Avoid generating content that is unrelated to the topic or does not provide any added value for the user.
Don’t overlook user intent: Don’t neglect the intent behind search queries. Make sure your content actually answers users’ questions.
BrightEdge has outlined the following strategic considerations based on the aforementioned research:
Different impacts of backlinks and co-citations
AI Overviews and Perplexity favor distinct domain sets depending on the industry.
In healthcare and education, both platforms prioritize trusted sources like mayoclinic.org and coursera.org, making these or similar domains key targets for effective SEO strategies.
Conversely, in sectors like ecommerce and finance, Perplexity shows a preference for domains such as reddit.com, yahoo.com, and marketwatch.com.
Tailoring SEO efforts to these preferences by leveraging backlinks and co-citations can significantly enhance performance.
Tailored strategies for AI-powered search
AI-powered search approaches must be customized for each industry.
For instance, Perplexity’s preference for reddit.com underscores the importance of community insights in ecommerce, while AI Overviews leans toward established review and Q&A sites like consumerreports.org and quora.com.
Marketers and SEOs should align their content strategies with these tendencies by creating detailed product reviews or fostering Q&A forums to support ecommerce brands.
Anticipate changes in the citation landscape
SEOs must closely monitor Perplexity’s preferred domains, especially the platform’s reliance on reddit.com for community-driven content.
Google’s partnership with Reddit could influence Perplexity’s algorithms to prioritize Reddit’s content further. This trend indicates a growing emphasis on user-generated content.
SEOs should remain proactive and adaptable, refining strategies to align with Perplexity’s evolving citation preferences to maintain relevance and effectiveness.
Below are industry-specific tactical and strategic measures for GEO.
B2B tech
Establish a presence on authoritative tech domains, particularly techtarget.com, ibm.com, microsoft.com and cloudflare.com, which are recognized as trusted sources by both platforms.
Leverage content syndication on these established platforms to get cited as a trusted source faster.
In the long term, build your own domain authority through high-quality content, as competition for syndication spots will increase.
Enter into partnerships with leading tech platforms and actively contribute content there.
Demonstrate expertise through credentials, certifications and expert opinions to signal trustworthiness.
Ecommerce
Establish a strong presence on Amazon, as Perplexity widely uses the platform as a source.
Actively promote product reviews and user-generated content on Amazon and other relevant platforms.
Distribute product information via established dealer platforms and comparison sites.
Syndicate content and partner with trusted domains.
Maintain detailed and up-to-date product descriptions on all sales platforms.
Get involved on relevant specialist portals and community platforms such as Reddit.
Pursue a balanced marketing strategy that relies on both external platforms and your own domain authority.
Continuing education
Build trustworthy sources and collaborate with authoritative domains such as coursera.org, usnews.com and bestcolleges.com, as these are considered relevant by both systems.
Create up-to-date, high-quality content that AI systems classify as trustworthy. The content should be clearly structured and supported by expert knowledge.
Build an active presence on relevant platforms like Reddit as community-driven content becomes increasingly important.
Optimize your own content for AI systems through clear structuring, clear headings and concise answers to common user questions.
Clearly highlight quality features such as certifications and accreditations, as these increase credibility.
Finance
Build a presence on trustworthy financial portals such as yahoo.com and marketwatch.com, as these are preferred sources by AI systems.
Maintain current and accurate company information on leading platforms such as Yahoo Finance.
Create high-quality, factually correct content and support it with references to recognized sources.
Build an active presence in relevant Reddit communities as Reddit gains traction as a source for AI systems.
Enter into partnerships with established financial media to increase your own visibility and credibility.
Demonstrate expertise through specialist knowledge, certifications and expert opinions.
Health
Link and reference content to trusted sources such as mayoclinic.org, nih.gov and medlineplus.gov.
Incorporate current medical research and trends into the content.
Provide comprehensive and well-researched medical information backed by official institutions.
Demonstrate credibility and expertise through certifications and qualifications.
Conduct regular content updates with new medical findings.
Pursue a balanced content strategy that both builds your own domain authority and leverages established healthcare platforms.
Insurance
Use trustworthy sources: Place content on recognized domains such as forbes.com and official government websites (.gov), as these are considered particularly credible by AI search engines.
Provide current and accurate information: Insurance information must always be current and factually correct. This particularly applies to product and service descriptions.
Content syndication: Publish content on authoritative platforms such as Forbes or recognized specialist portals in order to be cited as a trustworthy source more quickly.
Emphasize local relevance: Content should be adapted to regional markets and take local insurance regulations into account.
Restaurants
Build and maintain a strong presence on key review platforms such as Yelp, TripAdvisor, OpenTable and GrubHub.
Actively promote and collect positive ratings and reviews from guests.
Provide complete and up-to-date information on these platforms (menus, opening times, photos, etc.).
Interact with food communities and specialized gastro platforms such as Eater.com.
Perform local SEO, as AI searches place a strong emphasis on local relevance.
Create and update comprehensive and well-maintained Wikipedia entries.
Offer a seamless online reservation process via relevant platforms.
Provide high-quality content about the restaurant on various channels.
Tourism / Travel
Optimize presence on key travel platforms such as TripAdvisor, Expedia, Kayak, Hotels.com and Booking.com, as they are viewed as trusted sources by AI search engines.
Create comprehensive content with travel guides, tips and authentic reviews.
Optimize the booking process and make it user-friendly.
Perform local SEO since AI searches are often location-based.
Be active on relevant platforms and encourage reviews.
Provide high-quality content with added value for the user.
Collaborate with trusted domains and partners.
The future of GEO and what it means for brands
The significance of GEO for companies hinges on whether future generations will adapt their search behavior and shift from Google to other platforms.
Emerging trends in this area should become apparent in the coming years, potentially affecting the search market share.
For instance, ChatGPT Search relies heavily on Microsoft Bing’s search technology.
If ChatGPT establishes itself as a dominant generative AI application, ranking well on Microsoft Bing could become critical for companies aiming to influence AI-driven applications.
This development could offer Microsoft Bing an opportunity to gain market share indirectly.
Whether LLMO or GEO will evolve into a viable strategy for steering LLMs toward specific goals remains uncertain.
However, if it does, achieving the following objectives will be essential:
Establishing owned media as a source for LLM training data through E-E-A-T principles.
Generating mentions of the brand and its products in reputable media.
Creating co-occurrences of the brand with relevant entities and attributes in authoritative media.
Producing high-quality content that ranks well and is considered in RAG processes.
Ensuring inclusion in established graph databases like the Knowledge Graph or Shopping Graph.
The success of LLM optimization correlates with market size. In niche markets, it is easier to position a brand within its thematic context due to reduced competition.
Fewer co-occurrences in qualified media are required to associate the brand with relevant attributes and entities in LLMs.
Conversely, in larger markets, achieving this is more challenging because competitors often have extensive PR and marketing resources and a well-established presence.
Implementing GEO or LLMO demands significantly greater resources than traditional SEO, as it involves influencing public perception at scale.
Companies must strategically prepare for these shifts, which is where frameworks like digital authority management come into play. This concept helps organizations align structurally and operationally to succeed in an AI-driven future.
In the future, large brands are likely to hold substantial advantages in search engine rankings and generative AI outputs due to their superior PR and marketing resources.
However, traditional SEO can still play a role in training LLMs by leveraging high-ranking content.
The extent of this influence depends on how retrieval systems weigh content in the training process.
Ultimately, companies should prioritize the co-occurrence of their brands/products with relevant attributes and entities while optimizing for these relationships in qualified media.
Dig deeper: 5 GEO trends shaping the future of search
Contributing authors are invited to create content for Search Engine Land and are chosen for their expertise and contribution to the search community. Our contributors work under the oversight of the editorial staff and contributions are checked for quality and relevance to our readers. The opinions they express are their own.