The Limits of Schema Markup for AI Search: An Empirical Analysis of Citation Patterns Across Major Models

You may also read a concise version of this research in our blog: The Limits of Schema Markup for AI Search: LLM Citation Analysis

This study examines whether implementing schema markup on webpages influences how often domains are cited by major large language models (LLMs). Using a dataset of extracted HTML schema information and a separate dataset measuring domain visibility in LLM responses, we compare schema coverage with visibility metrics across OpenAI, Gemini, and Perplexity.

Overall, the analysis shows that schema adoption varies significantly across domains, but higher schema coverage does not reliably lead to higher visibility scores. This suggests that schema markup alone is not a major driver of LLM citation behavior, despite common industry assumptions.

Methodology

1. Schema Data Extraction

We analyzed webpage HTML outputs to determine whether each URL contained schema markup in formats such as JSON-LD, microdata, or RDFa.

Each page was labeled as either containing schema markup or not. Domain names were then extracted from URLs so that schema usage could be evaluated at the domain level and compared with LLM visibility scores.
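For illustration, a minimal detection sketch along these lines, assuming BeautifulSoup; the function names are ours, and the naive `www.` stripping stands in for proper public-suffix handling (e.g., a library like tldextract):

```python
from urllib.parse import urlparse

from bs4 import BeautifulSoup  # pip install beautifulsoup4


def has_schema_markup(html: str) -> bool:
    """True if the page carries JSON-LD, microdata, or RDFa markup."""
    soup = BeautifulSoup(html, "html.parser")
    json_ld = soup.find("script", attrs={"type": "application/ld+json"})
    microdata = soup.find(attrs={"itemscope": True})  # microdata container
    rdfa = soup.find(attrs={"typeof": True}) or soup.find(attrs={"vocab": True})
    return any(tag is not None for tag in (json_ld, microdata, rdfa))


def extract_domain(url: str) -> str:
    """Reduce a URL to a bare domain.tld key (naive 'www.' stripping)."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host
```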

2. Computing Schema Coverage

Schema coverage was calculated as the proportion of URLs within each domain that contained schema markup. This resulted in a domain-level schema percentage indicating how widely structured data was implemented.
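A minimal sketch of this calculation, assuming pandas; the frame and column names (`pages`, `domain`, `has_schema`) are hypothetical stand-ins for the per-URL labels from step 1:

```python
import pandas as pd

# One row per sampled URL, labeled by the extraction step.
pages = pd.DataFrame({
    "domain":     ["a.com", "a.com", "a.com", "b.com", "b.com"],
    "has_schema": [True, True, False, False, False],
})

# Domain-level coverage: share of a domain's URLs that contained schema, in %.
schema_coverage = (
    pages.groupby("domain")["has_schema"].mean().mul(100).rename("schema_pct")
)
print(schema_coverage)  # a.com ~ 66.7, b.com = 0.0
```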

3. Categorizing Domains into Schema Buckets

To enable meaningful comparisons, each domain was assigned to one of five schema usage categories:

  • No Schema: 0%
  • Minimal Schema: 1–30%
  • Moderate Schema: 31–70%
  • High Schema: 71–99%
  • Full Schema: 100%

These buckets allowed us to compare visibility patterns across different levels of schema adoption.
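As a sketch, the assignment logic with these thresholds (the function name is ours; treating fractional boundary values such as 0.5% as "Minimal Schema" is an assumption):

```python
def schema_bucket(pct: float) -> str:
    """Map a domain-level schema percentage (0-100) to its usage category."""
    if pct == 0:
        return "No Schema"
    if pct <= 30:
        return "Minimal Schema"
    if pct <= 70:
        return "Moderate Schema"
    if pct < 100:
        return "High Schema"
    return "Full Schema"


assert schema_bucket(0) == "No Schema"
assert schema_bucket(66.7) == "Moderate Schema"
assert schema_bucket(100) == "Full Schema"
```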

4. LLM Visibility Data Preparation

The visibility dataset measured how often each domain appeared in LLM-generated responses. Some domains appeared multiple times within the same platform, so additional preprocessing ensured accurate aggregation.

This included:

  • Normalizing all domain names into a consistent domain.tld format to allow correct matching across datasets.
  • Aggregating repeated domain records using weighted averages, where visibility scores were weighted by the number of responses the domain competed in. This ensured that visibility measurements based on a large number of LLM responses carried more influence than one-off or low-sample observations.
  • Ensuring cross-platform comparability so that each domain had a unified and consistent visibility representation across OpenAI, Gemini, and Perplexity.

This produced a single, reliable visibility record per domain, per platform.
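A minimal sketch of the weighted aggregation, assuming pandas and hypothetical column names (`visibility` for the per-record score, `n_responses` for the number of responses the domain competed in):

```python
import pandas as pd

records = pd.DataFrame({
    "domain":      ["a.com", "a.com", "b.com"],
    "platform":    ["OpenAI", "OpenAI", "OpenAI"],
    "visibility":  [80.0, 20.0, 50.0],
    "n_responses": [90, 10, 40],
})

# Weight each record by the responses it covers, then normalize per group,
# so large-sample measurements dominate one-off observations.
records["weighted"] = records["visibility"] * records["n_responses"]
agg = records.groupby(["domain", "platform"])[["weighted", "n_responses"]].sum()
agg["visibility"] = agg["weighted"] / agg["n_responses"]
print(agg["visibility"])  # a.com/OpenAI -> 74.0, pulled toward the 90-response record
```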

5. Merging Schema and Visibility Data

Schema coverage metrics were matched with LLM visibility records using normalized domain names.

This produced a unified dataset containing each domain’s schema percentage, schema category, visibility scores across platforms, and supporting metrics such as competition counts and appearance frequency.

This merged dataset formed the basis of all downstream analysis.
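The merge itself reduces to a join on the normalized domain key. A sketch with stand-in frames, assuming an inner join (i.e., keeping only domains present in both datasets) and reusing the `schema_bucket` helper from step 3:

```python
import pandas as pd

# Stand-ins for the two domain-level outputs of the previous steps.
coverage = pd.DataFrame({"domain": ["a.com", "b.com"],
                         "schema_pct": [66.7, 0.0]})
visibility = pd.DataFrame({"domain":     ["a.com", "a.com", "b.com"],
                           "platform":   ["OpenAI", "Gemini", "OpenAI"],
                           "visibility": [74.0, 61.0, 50.0]})

# Inner join on the normalized domain key; unmatched domains drop out.
merged = visibility.merge(coverage, on="domain", how="inner")
merged["schema_bucket"] = merged["schema_pct"].map(schema_bucket)  # helper from step 3
```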

6. Analysis Framework

The analysis examined how visibility scores varied across schema categories. The schema buckets served as the primary comparison groups. We evaluated:

  • Distribution of visibility scores across schema levels
  • Whether higher schema adoption correlated with better visibility
  • Platform-specific behavior and consistency

This framework enabled a direct assessment of whether schema markup influences LLM citation behavior.
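Under these definitions, the comparisons reduce to a few grouped summaries. A sketch over the `merged` frame from step 5 (using Spearman rank correlation as the monotonic-trend check is our choice):

```python
# Per-platform, per-bucket distribution of visibility scores (count, quartiles).
dist = merged.groupby(["platform", "schema_bucket"])["visibility"].describe()

# Rank correlation between raw coverage and visibility; values near zero
# across platforms would indicate no monotonic schema-visibility relationship.
corr = merged.groupby("platform")[["schema_pct", "visibility"]].corr(method="spearman")
print(dist, corr, sep="\n\n")
```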

Distribution of Schema Coverage Across Domains

[Figure: Distribution of Schema Coverage Across Domains]

The distribution of schema coverage is highly polarized, with most domains at either 0% or 100%; this is driven in part by uneven sample sizes across domains.

Many domains in the dataset had only a small number of sampled URLs, and when all of those pages either contained schema or lacked it, those domains naturally fell into the 0% or 100% buckets.

Larger domains with hundreds of sampled URLs showed more variation, but even in those cases, schema adoption tended to be consistent enough that their percentages still clustered near the extremes.

As a result, the distribution reflects differences in sampling density as much as differences in schema adoption practices.

Impact of Schema Coverage on LLM Visibility

[Figure: Impact of Schema Coverage on LLM Visibility]

Box Plot

Key Insights

Visibility distributions for Perplexity, Gemini, and OpenAI are highly similar across all schema categories.

There is no consistent upward trend showing that domains with more schema markup achieve higher visibility scores.

The medians, interquartile ranges, and overall spread look nearly identical from “No Schema” to “Full Schema,” indicating that schema adoption does not appear to be a determining factor in how prominently a domain is cited by any of the three LLM platforms.

Violin Plot

This visualization highlights the density and spread of visibility scores within each schema category.

[Figure: Violin plots of the density and spread of visibility scores within each schema category]
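For reference, a plot of this kind can be produced in a few lines with seaborn, assuming the long-format `merged` frame from the methodology sketches:

```python
import matplotlib.pyplot as plt
import seaborn as sns  # pip install seaborn

# One violin per schema category, colored by platform.
sns.violinplot(data=merged, x="schema_bucket", y="visibility", hue="platform")
plt.xlabel("Schema category")
plt.ylabel("Visibility score")
plt.tight_layout()
plt.show()
```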

Key Insights

1. Distributions remain unchanged across schema levels

The density shapes, medians, and spread look almost identical across all schema categories, indicating that schema percentage has no meaningful effect on the distribution of visibility scores.

2. High- and low-performing domains appear in every category

Domains with very high visibility (70–100) and very low visibility (0–20) are present in all schema buckets. This shows that schema usage does not distinguish strong performers from weak ones.

3. Platform patterns remain stable

Perplexity consistently has the highest median visibility, Gemini the lowest, and OpenAI falls in between. These platform differences stay the same regardless of schema adoption level.

4. No upward trend with higher schema adoption

If schema improved visibility, high-schema categories would show higher medians or tighter distributions. Instead, the shapes remain virtually the same across all buckets.

5. Wide overlap across all categories

All platforms show broad visibility ranges (0 to 100) within every schema group, demonstrating that schema coverage alone does not explain or predict visibility outcomes.

Conclusion

The findings show that schema markup has no measurable effect on LLM visibility across OpenAI, Gemini, or Perplexity.

Domains with complete schema coverage perform no better than those with minimal or no schema, and visibility distributions remain almost identical across all schema categories. While schema markup continues to play a role in traditional SEO, it does not appear to influence how frequently LLMs cite a domain.

This suggests that the common belief that schema improves LLM visibility is overstated, and that other factors, such as content quality, topical relevance, and LLM retrieval behavior, are likely far more important in determining which sources LLMs reference.
