CatalogSignal Field Notes

What AI assistants get wrong about real catalogs

CatalogSignal Field Notes · ~700 words · For commerce leaders

The CEI Benchmark measures how AI shopping assistants actually read, trust, and recommend real product catalogs. Here is some of what it is showing, and it is not flattering to the current state of catalogs.

The average catalog is only about half-ready. The mean Commerce Eligibility Index (CEI) score across the panel lands near 50 on a 0 to 100 scale. In plain terms, the typical brand has substantial work left before AI-assisted discovery reliably works in its favor.

Roughly half of what AI says about a catalog does not hold up. Our funnel-accuracy measure checks AI-generated product claims against the brand's own catalog, and it sits around 48%. Close to half of the claims an assistant makes about products fail validation against the source data. Not malicious, just wrong: a fit, a material, a compatibility detail the data never actually supported.
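For teams that want to see the idea in concrete terms, here is a minimal sketch of checking assistant claims against catalog data. It is an illustration only, not the benchmark's actual method: the Claim structure, the function names, the SKUs, and the exact-match rule are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One product claim made by an assistant (hypothetical structure)."""
    sku: str
    attribute: str
    value: str


def _norm(text: str) -> str:
    # Compare case-insensitively and ignore extra whitespace.
    return " ".join(text.lower().split())


def validate_claim(claim: Claim, catalog: dict) -> bool:
    """A claim passes only if the catalog records that attribute for the SKU
    and the recorded value matches. A missing attribute counts as a failure,
    because the source data never supported the claim."""
    actual = catalog.get(claim.sku, {}).get(claim.attribute)
    return actual is not None and _norm(actual) == _norm(claim.value)


def claim_accuracy(claims: list, catalog: dict) -> float:
    """Share of claims that pass validation, between 0.0 and 1.0."""
    if not claims:
        raise ValueError("need at least one claim to compute accuracy")
    passed = sum(validate_claim(c, catalog) for c in claims)
    return passed / len(claims)


# Made-up catalog and claims, for illustration only.
catalog = {
    "JKT-001": {"material": "Recycled Nylon", "fit": "relaxed"},
    "BAG-002": {"material": "canvas"},
}
claims = [
    Claim("JKT-001", "material", "recycled nylon"),  # supported by the catalog
    Claim("JKT-001", "fit", "slim"),                 # contradicts the catalog
    Claim("BAG-002", "laptop_size", "15 inch"),      # attribute is missing
    Claim("BAG-002", "material", "Canvas"),          # supported by the catalog
]

print(claim_accuracy(claims, catalog))  # 0.5
```

Two of the four sample claims pass, so the sketch reports 0.5. Note that the missing laptop_size attribute fails in the same way as the contradicted fit: a gap in the data and an error in the data look identical from the assistant's side.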

About one in ten responses carries a commercial-harm risk. Roughly 9% of responses contained an outright hallucination, and around 11% carried what we classify as commercial harm: a confident answer that could cost the brand a sale or mislead the shopper. At the scale AI discovery is reaching, that is not a rounding error.

Figure 1. Across the panel: the average catalog scores near 50 on the 0 to 100 CEI, close to half of AI product claims fail validation against the source catalog, and roughly one in ten responses carries a commercial-harm risk.

Being well known does not make you safe. Familiarity and accuracy are not the same thing. Established brands tended to draw fewer outright hallucinations, but plenty of recognizable names still landed in risky territory, while some smaller, cleaner catalogs were described more accurately than far bigger ones. Brand equity does not travel into the model on its own. The data does.

Category matters. Readiness varied widely by vertical, and the gap was large enough that a brand's score only makes sense in the context of its own category, not the market as a whole.

None of these are verdicts on any single brand, and we do not publish individual scores. They are the shape of the shelf as AI sees it: a lot of catalogs that look fine to a human and read as half-finished to a machine. The encouraging part is that almost everything driving a low score is fixable, because it comes down to the same unglamorous things in the data: missing attributes, descriptions written for style instead of meaning, and claims that do not reconcile with reviews.

The benchmark refreshes over time, so we can watch how the shelf moves. For now the takeaway is simple. AI is already building the shortlist, it is getting a meaningful share of catalogs wrong, and the brands that measure and close those gaps first will be the ones it learns to recommend.

See where your catalog sits. A Commerce Eligibility Index™ assessment shows exactly where assistants can, and cannot, find, trust, and recommend your products. Request a CEI assessment at catalogsignal.com.

About these figures

Figures are drawn from the CatalogSignal CEI Benchmark and reflect aggregate, directional patterns across the panel. We do not report any individual brand's score.
