
By Wesley Fouse

A recent survey from the Imagining the Digital Future Center at Elon University found that more than half (52%) of adult Americans now use Large Language Models (LLMs) like ChatGPT, Gemini, and Claude, with 68% of those users saying they use LLMs as search engines to look up information, answer questions and solve problems. 

With a large and growing user base, it’s natural to wonder: How do LLMs find the information they display and decide what content to share with users?

The underlying decision-making processes remain opaque, since LLM developers rarely disclose their source-selection algorithms and weights. Even so, researchers and technologists are piecing together a growing understanding of how these systems operate, revealing their tendencies and the kinds of content they favor. That understanding offers valuable lessons for anyone trying to boost visibility and credibility in a world where AI now plays a major role in surfacing both earned and owned media.

How Large Language Models Select Sources:

LLMs have two primary ways of answering a query. The first is through what the model “remembers” from training: the patterns and facts it absorbed from the massive datasets of books, websites, articles, code and more it was trained on. This suffices for a question that doesn’t need new information, like a math problem or the definition of a word.

The second is live retrieval. As people increasingly use LLMs as search engines, some questions cannot be answered from training data alone. For example, a user might ask: “Are any wildfires happening right now?” To answer this, the model needs real-time information and must comb through live sources to find it, rather than relying on its static training data.

To generate timely answers, LLMs rely on a four-step process: search, rank, filter and synthesize.

1) Search

The first step an LLM takes is a live search of the web. When a user asks a time-sensitive question, the model looks at the prompt and generates multiple keyword-based search queries, which it uses to scan the internet for the most relevant, recent information. Imagine performing several Google searches on the same topic, tweaking your query slightly each time to cast a wider net; it works like that. For example, when asked “Are any wildfires happening right now?”, the LLM might search for terms like “active wildfire incidents June 26 2025” or “Cal Fire current reports.”
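
As a rough illustration of that fan-out step, the sketch below turns one question into several keyword-style queries. The helper function and the variants it produces are hypothetical; production systems typically have the model itself draft the queries.

```python
from datetime import date

def expand_queries(question: str, topic_terms: list[str]) -> list[str]:
    """Fan one user question out into several keyword-style search
    queries (hypothetical helper; real systems usually have the
    model draft these variants itself)."""
    today = date.today().strftime("%B %d %Y")
    queries = [question]                           # the raw question itself
    for term in topic_terms:
        queries.append(f"{term} {today}")          # time-anchored variant
        queries.append(f"{term} current reports")  # recency-focused variant
    return queries

print(expand_queries(
    "Are any wildfires happening right now?",
    ["active wildfire incidents", "Cal Fire"],
))
```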

2) Rank

Searching the web multiple times returns countless results. Since the LLM doesn’t have the time or resources to read each one thoroughly, it quickly ranks the sources and determines which it should crawl. It appears to do this by weighing a few elements (a toy scoring sketch follows this list):

  • Relevancy: The LLM will rank sources by determining how closely it believes the content matches the user’s question. 
  • Domain Authority: Domain authority is an industry metric that estimates the trustworthiness of a website. LLMs want to show users high-quality, factually correct answers, so high domain-authority sites like government agencies, universities or major media outlets are often weighted more heavily: they are trusted sources with lots of high-quality backlinks, consistent traffic and a long history online.
  • Freshness: Depending on how time-sensitive the user’s query is, the most up-to-date information on a topic can matter most. An article from this morning will win out over one from five months ago if the topic is urgent.
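
One way to picture how those three signals might combine is a weighted score in which freshness decays over time. The weights and the 48-hour half-life below are illustrative assumptions, not values any vendor has disclosed:

```python
def rank_score(relevance: float, authority: float, age_hours: float,
               half_life_hours: float = 48.0) -> float:
    """Toy ranking score: a weighted blend of relevance, domain
    authority, and exponential freshness decay. The weights and
    half-life are illustrative assumptions, not disclosed values."""
    freshness = 0.5 ** (age_hours / half_life_hours)  # halves every 48 hours
    return 0.5 * relevance + 0.3 * authority + 0.2 * freshness

# A fresh wire story beats a months-old explainer on an urgent topic:
print(rank_score(relevance=0.9, authority=0.7, age_hours=3))     # this morning
print(rank_score(relevance=0.9, authority=0.7, age_hours=3600))  # ~5 months old
```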

3) Filter

Then comes filtering. Not everything that ranks highly makes it to the user. Content that’s low quality, paywalled, hard to read or misleading may be removed before the model uses it. Essentially, the LLM is doing what you would do when looking for information online, just much faster: it scans multiple sites, and if it can’t access a site’s content, or the site doesn’t actually provide the information the user is looking for, that source gets filtered out.
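
A simplified version of that pass might look like the sketch below. The result fields (status_code, paywalled, text) are hypothetical stand-ins for whatever a real retrieval system records about each fetched page:

```python
def filter_results(results: list[dict], min_words: int = 200) -> list[dict]:
    """Drop sources the model couldn't actually use. Field names are
    hypothetical stand-ins for a real retrieval record."""
    kept = []
    for r in results:
        if r.get("status_code") != 200:                 # unreachable or blocked
            continue
        if r.get("paywalled"):                          # content not accessible
            continue
        if len(r.get("text", "").split()) < min_words:  # too thin to cite
            continue
        kept.append(r)
    return kept

sources = [
    {"url": "https://example.gov/incidents", "status_code": 200,
     "paywalled": False, "text": "word " * 500},
    {"url": "https://example.com/premium", "status_code": 200,
     "paywalled": True, "text": "word " * 500},
]
print([s["url"] for s in filter_results(sources)])  # only the open page survives
```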

4) Synthesize

Finally, the system pulls the most useful excerpts into the model’s context window, the portion of memory it can access when formulating a response. Only then does it generate an answer, citing the sources that it used throughout its response.
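
Because the context window is finite, this step is essentially a packing problem: fit the best excerpts under a budget while keeping track of where each one came from. A minimal sketch, using a character budget as a stand-in for a real token budget:

```python
def build_context(excerpts: list[tuple[str, str]],
                  budget_chars: int = 8000) -> str:
    """Pack top-ranked (url, text) excerpts into a bounded context
    window, keeping URLs so the final answer can cite its sources.
    A character budget stands in for a real token budget."""
    parts, used = [], 0
    for url, text in excerpts:  # assumed already rank-ordered
        chunk = f"[source: {url}]\n{text}\n"
        if used + len(chunk) > budget_chars:
            break               # window is full; stop packing
        parts.append(chunk)
        used += len(chunk)
    return "\n".join(parts)
```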

What This Means:

This has big implications for communications – and informs how we should think about visibility. Since LLMs are constantly retrieving live content and deciding what to surface, the materials you publish today can shape the answers people receive tomorrow.

Earned media remains a powerful tool. Articles in well-regarded publications tend to rank highly in LLM retrievals. They’re recent, relevant and come from sources that systems have likely been trained on and trust. If your brand is featured in the right story at the right time, it could show up in an AI-generated answer, especially for fast-moving topics.

But even when your organization isn’t named directly, your message can still break through. LLMs often pull language and framing from trusted sources, meaning that strong, consistent messaging – whether in a press release, a background briefing or a well-placed quote – can echo across summaries and responses. In that sense, influencing the narrative matters as much as visibility.

Owned content plays a role here too. Press releases, blog posts and updates on your own website can be pulled into an LLM’s context window if they’re accessible. That means your site should be structured so crawlers can find and index your materials. Clear news sections, up-to-date content and technically sound web design all help. So does avoiding blockers like login walls, slow page speed or inconsistent formatting.
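
One concrete, low-effort check is whether your robots.txt even lets AI crawlers in. The sketch below uses Python’s standard urllib.robotparser; the user-agent names listed are ones several AI vendors have published, but they change over time, so treat the list as an assumption to verify against each vendor’s documentation:

```python
from urllib.robotparser import RobotFileParser

# User-agent strings published by several AI vendors (verify against
# current documentation; these names change over time).
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

def ai_crawler_access(site: str, page: str = "/") -> dict[str, bool]:
    """Read a site's robots.txt and report whether common AI crawlers
    may fetch a given page."""
    rp = RobotFileParser()
    rp.set_url(f"{site.rstrip('/')}/robots.txt")
    rp.read()  # fetches robots.txt over the network
    url = f"{site.rstrip('/')}{page}"
    return {bot: rp.can_fetch(bot, url) for bot in AI_CRAWLERS}

print(ai_crawler_access("https://example.com", "/newsroom/"))
```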

More than ever, traditional communications strategies now intersect with the technical mechanics of information discovery. In the age of AI, communicators must think like web architects as well as storytellers.

LLMs aren’t rewriting the rules of visibility, but they are reshaping how messages spread. Organizations must adapt to this new reality to both show up in the right moments and shape the public conversation.
