Physical Address

304 North Cardinal St.
Dorchester Center, MA 02124

SEOs Are Recommending Structured Data For AI Search… Why?


A post on LinkedIn questioned the idea that Schema.org structured data has an impact on what the big language model provides. There are reportedly some SEOs who recommend structured data for better AI search engine rankings.

Patrick Stox wrote the following post on LinkedIn:

“Did I miss something? Why do SEOs think schema markup will affect LLM results?”

Patrick said “LLM output” in the context of the SEO recommendation so it’s likely a reference to ChatGPT Search and other AI search engines. So do AI search engines get their data from structured data?

LLMs are trained on web text, books, government records, legal documents and other textual data (as well as other forms of media) which are then used to produce summaries and responses, but without plagiarizing the training data. This means that it is pointless to think that optimizing your web content will result in LLM itself sending referrals to that website.

AI search engines are based on search indexes (and knowledge graphs) via Retrieval Augmented Generation (RAG). Search engine indexes themselves are created from indexed data, not structured schema data.

For example, Perplexity AI ranks web-indexed content using a modified version of PageRank on its search index. Google and Bing index text data and do things like remove duplicate content, remove stop words, and other manipulations of text extracted from HTML, plus not every page has structured data on it.

In fact, Google uses only a fraction of the Schema.org structured data available for certain types of search experiences and rich results, which in turn limits the type of structured data that publishers use.

There’s also the fact that both Bing and Google’s crawlers render HTML, identify headers, footers, and body content (from which they extract text for ranking purposes). Why would they do that if they’re going to rely on structured Schema data, right?

The idea that it’s good to use Schema.org structured data for better AI search engine rankings is not based on fact, it’s just fanciful speculation. Or it could be from the “game of telephone” effect where one person says something and then twenty people later it transforms into something completely different.

For example, Jono Alderson suggested it structured data could be the standard that AI search engines could use better understand the web. He didn’t say that AI search engines currently use it, he just suggested that AI search engines should consider adopting it and maybe that post twenty SEOs later turned into a full theory.

Unfortunately, there are many unfounded ideas in SEO circles. I saw an SEO claim on social media the other day that Google Local Search does not use IP addresses in response to “near me” search queries. All anyone had to do to test the idea was to log into a VPN, select a geographic location for their IP address, and perform a search query for “near me” and they would see that the IP address used by the VPN had affected “near me” ” search results.

Screenshot of Near Me query affected by IP address

Google even publishes a support page it says they use the IP address to personalize search results, but there are people who believe otherwise because some SEO did a correlation study, and when questioned, we’re back to someone yelling that Google is lying.

Will you believe your lying eyes?

Schema.Org Structured data and AI search results

“SEOs” recommending that publishers use Schema.org structured data for LLM training data also make no sense because the training data is not listed in the LLM output, only for the web-sourced output, which itself is sourced from the search index which is from the caterpillar. As mentioned earlier, publishers use only a fraction of the Schema.org structured data available because Google itself uses only a small fraction of it. So it doesn’t make sense for an AI search engine to rely on structured data for its output.

Search marketing expert Christopher Shin (LinkedIn profilee) commented:

“I was thinking the same thing after reading your post Patrick. This is how I interpret it at the moment. I thought LLM usually doesn’t generate answers from search engine serps, but from data interpretation. Right? But SER{s would use schema data markup to display rich snippets, etc. not? I think the key nuance with schema and LLMs is that search engines use schemas for SERPs, while LLMs use data interpretation when it comes to how schema affects LLMs.”

People like Christopher Shin and Patrick Stox give me hope that pragmatic and sensible SEO is still struggling to cut through the noise, Patrick’s post on LinkedIn is proof of that.

Pragmatic SEO

The definition of pragmatics is doing things for reasonable and realistic reasons, not based on opinions based on incomplete information and guesswork.

Speaking as someone who has been involved in SEO since its birth, not thinking things through is why SEOs and publishers have traditionally wasted time on ill-defined problems, spinning their wheels on useless activities like superficial EEAT signals and so on and so forth . It’s really disheartening to point to documentation and official statements and be answered with statements like “Google is lying”. Such an attitude makes a person “want to shout”.

A little more pragmatic SEO please.



Source link

Leave a Reply

Your email address will not be published. Required fields are marked *