How ChatGPT, Claude, and Perplexity Read Your Website Images

By the ImageSEO Team. Updated June 2026. ~12 min read.

When a user asks ChatGPT “show me red leather handbags under €100,” a short, mostly invisible sequence of events decides whether your product appears in the answer or your competitor’s does. The same sequence runs when someone asks Perplexity for “the best image optimization plugin for WordPress” or asks Claude to “compare WebP and AVIF for a photography site.” Understanding that sequence — and the specific signals each engine reads off your images — is how you get cited by AI search engines in 2026.

This guide breaks down exactly what happens to your images when an AI assistant answers a question, how ChatGPT, Claude, Perplexity, and Google’s AI Overviews differ, which signals they actually read, and the concrete steps that make your images quotable. If you want the strategic view of why AI visibility matters at all, read our companion piece on optimizing for AI search first — this article is the technical how-to that sits underneath it.

Why this matters now

Search is no longer a single ranked list of blue links. A growing share of product research and how-to questions now starts inside an AI assistant that reads across sources and synthesizes one answer. In that answer there is no “position six” — either your brand and your images are part of the picture the model paints, or they are absent. For visual and e-commerce sites especially, images are not decoration in this world; they are quotable evidence the model can attribute to you.

The good news: almost everything that makes an image legible to an AI engine is the same work that makes it rank in Google Images. You are not building a second, separate strategy — you are sharpening the one you already have. The rest of this article shows where the emphasis changes.

The four steps: what happens when an AI answers a query

ChatGPT (with browsing), Claude (with web search), Perplexity, and Google’s AI Overviews all follow roughly the same pipeline. The details differ, but the shape is consistent.

Step 1: The query is parsed

The engine breaks the request into structured parts: intent (shopping, research, comparison), the object (handbag, plugin, camera), attributes (red, leather, under €100), and the desired output (images, a list, a recommendation). This parsed intent determines what the model goes looking for — and what kind of image, if any, belongs in the answer.

Step 2: A web search runs

All of these engines call a search index — Bing in ChatGPT’s case, a mix of Google, Bing, and Brave in others — and get back a ranked list of candidate URLs. This is the gate most teams forget about: if you don’t rank in conventional search for the query, you are not in the AI’s candidate set. AI visibility is not a replacement for traditional SEO; it is built on top of it. Your image SEO foundations — crawlable pages, relevant content, Google Images presence — are what get you into the room.

Step 3: The AI reads the top results

This is where images come in. Modern models are multimodal — they can see images directly through vision — but in a live retrieval answer they rely far more on the text around an image than on pixel-level analysis, because reading markup is fast, cheap, and unambiguous. When the engine fetches your page, it reads:

alt attributes on your <img> tags
The image file URL (semantic filenames are a signal on their own)
<figcaption> elements and nearby captions
The surrounding heading and paragraph text
Structured data — ImageObject, Product, and other JSON-LD
Open Graph and Twitter Card tags

If your alt text says “red leather handbag, autumn 2026 boutique collection, €89”, that is what the AI can quote. If the attribute is empty (alt=""), the model has nothing to attribute to you and reaches for a competitor’s description instead.
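To make the text-first reading concrete, here is a minimal Python sketch of what a markup-only fetcher could pull from a page: the src, alt, and figcaption of every image. It is an illustration built on the standard library, not how any particular engine is implemented, and the names ImageSignalParser and extract_image_signals are ours.

```python
from html.parser import HTMLParser


class ImageSignalParser(HTMLParser):
    """Collects src, alt, and figcaption text for every <img> in a page."""

    def __init__(self):
        super().__init__()
        self.images = []            # one dict per <img>, in document order
        self._figure_images = None  # images inside the current <figure>, if any
        self._caption_parts = []    # text chunks of the current <figcaption>
        self._in_figcaption = False

    def handle_starttag(self, tag, attrs):
        if tag == "figure":
            self._figure_images, self._caption_parts = [], []
        elif tag == "figcaption":
            self._in_figcaption = True
        elif tag == "img":
            attr = dict(attrs)
            record = {
                "src": attr.get("src") or "",
                "alt": attr.get("alt") or "",  # missing and empty alt both become ""
                "caption": "",
            }
            self.images.append(record)
            if self._figure_images is not None:
                self._figure_images.append(record)

    def handle_endtag(self, tag):
        if tag == "figcaption":
            self._in_figcaption = False
        elif tag == "figure" and self._figure_images is not None:
            # Attach the caption to every image in the figure (nested figures are not handled).
            caption = "".join(self._caption_parts).strip()
            for record in self._figure_images:
                record["caption"] = caption
            self._figure_images = None

    def handle_data(self, data):
        if self._in_figcaption:
            self._caption_parts.append(data)


def extract_image_signals(html):
    """Return a list of {"src", "alt", "caption"} dicts for the given HTML string."""
    parser = ImageSignalParser()
    parser.feed(html)
    parser.close()
    return parser.images
```

Run it against your own templates: any record with an empty alt and an empty caption is an image that offers nothing to quote.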

Step 4: The AI composes and cites the answer

Finally the model writes a natural-language answer and attaches citations. ChatGPT and Claude show citation links; Perplexity shows both a text answer and image carousels. Whether you get cited comes down to how quotable your page is — and alt text plus captions are the easiest things on a page to quote. A clear, specific caption is, in effect, the sentence the AI reads out on your behalf.

How each engine differs

The pipeline is shared, but each engine treats images a little differently. Here is the practical breakdown as of 2026.

Engine	Search source	How it uses images	What to optimize for
ChatGPT (browsing)	Bing index	Shows inline images with citation links; reads alt text and captions heavily	Bing indexing, alt text, semantic filenames
Perplexity	Google / Bing / Brave mix	Dedicated image carousels alongside the text answer	Image rank, descriptive alt, unique captions
Claude (web search)	Brave / web	Text-first; reads image context to describe and attribute	Surrounding text, figcaption, structured data
Google AI Overviews	Google index	Pulls thumbnails from already-ranking Google Images results	Classic Google Images SEO + schema
Microsoft Copilot	Bing index	Inline images with source cards	Bing image indexing, Open Graph tags

The common thread: every engine leans on the text representation of your image. None of them can reliably attribute a photo to you from pixels alone. The metadata is the citation hook.

The signals AI engines read from your images

If you only optimize one thing, optimize alt text. But the full set of signals compounds — a page that gets all of them right is dramatically more citable than one that nails only a single field.

Signal	Why the AI cares	Good example
Alt text	The primary caption the model quotes	`alt="red leather tote bag, autumn 2026 collection, €89"`
Filename	Visible in the image URL before the page body is parsed	`red-leather-tote-autumn-2026.jpg`
Figcaption	Visible, human-written context the model trusts	“Our best-selling tote, photographed in natural light.”
Surrounding text	Disambiguates what the image shows	A product paragraph naming material, price, use
ImageObject / Product schema	Machine-readable facts: price, SKU, availability	JSON-LD with `name`, `caption`, `contentUrl`
Open Graph	Drives the preview card and a fallback description	`og:image` + `og:image:alt`
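The Open Graph row in the table above can be sketched as a small Python helper that emits the image tags for a page head. The property names og:image and og:image:alt and the twitter:* names are the published ones. The helper name social_image_tags and the choice of summary_large_image are our assumptions.

```python
from html import escape


def social_image_tags(image_url, alt):
    """Build Open Graph and Twitter Card image tags with matching alt text."""
    url, alt = escape(image_url), escape(alt)  # escape quotes and ampersands for attributes
    return "\n".join([
        f'<meta property="og:image" content="{url}">',
        f'<meta property="og:image:alt" content="{alt}">',
        '<meta name="twitter:card" content="summary_large_image">',
        f'<meta name="twitter:image" content="{url}">',
        f'<meta name="twitter:image:alt" content="{alt}">',
    ])


print(social_image_tags(
    "https://example.com/red-leather-tote-autumn-2026.jpg",
    "red leather tote bag, autumn 2026 collection, €89",
))
```

Reusing the same description as the on-page alt text keeps the fallback description consistent with what the page itself says.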

Alt text is the caption the AI reads out

Think of alt text in 2026 as the line an AI assistant attributes to you. When ChatGPT wants to show a photo in an answer, the alt text is the description it reads. Get it right and the model says “according to imageseo.io, this is a red leather handbag from the autumn 2026 collection.” Get it wrong — or leave it empty — and it uses a competitor’s words instead. The difference between good and bad alt text is the difference between being the source and being invisible.

Bad alt text	Why it fails	Better
`alt=""`	Nothing to quote; image is invisible to text retrieval	Describe the subject in plain language
`alt="IMG_4821"`	Filename noise, zero meaning	`alt="ceramic pour-over coffee dripper on oak counter"`
`alt="handbag bag purse leather buy cheap handbag"`	Keyword stuffing reads as spam and gets downweighted	`alt="red leather handbag, autumn 2026 collection, €89"`

For a deeper treatment of writing alt text that ranks and reads naturally, see our guide to alt text for SEO.
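The three failure modes in the table above can be flagged automatically. The Python sketch below is a crude heuristic under our own assumptions (the filename pattern, the four-letter threshold, and the label names are all ours), meant for triage rather than as a definition of what any engine treats as spam.

```python
import re
from collections import Counter

# Matches camera-style names such as "IMG_4821", "DSC-0042.jpg", or bare digits.
FILENAME_NOISE = re.compile(
    r"^(?:img|dsc|image|photo|screenshot)?[_\-\s]?\d+(?:\.\w+)?$",
    re.IGNORECASE,
)


def classify_alt(alt):
    """Return "empty", "filename-noise", "possible-stuffing", or "ok"."""
    text = alt.strip()
    if not text:
        return "empty"
    if FILENAME_NOISE.match(text):
        return "filename-noise"
    # Count only alphabetic words of four or more letters, so short words
    # ("on", "red") and numbers do not trigger the repeat check.
    words = [w for w in re.findall(r"\w+", text.lower()) if w.isalpha() and len(w) >= 4]
    if any(count >= 2 for count in Counter(words).values()):
        return "possible-stuffing"
    return "ok"


for sample in ["", "IMG_4821", "handbag bag purse leather buy cheap handbag",
               "red leather handbag, autumn 2026 collection, €89"]:
    print(repr(sample), "->", classify_alt(sample))
```

Treat "possible-stuffing" as a prompt for human review: a legitimate description can repeat a word.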

How to make your images more AI-citable

1. Write natural-language alt text, not keyword lists

Models are trained on human writing, so they reward human phrasing. Describe what is genuinely in the image, including the details that matter for the query you want to win — material, color, context, price where relevant. Stuffed alt text reads as spam to the same models that read natural alt text as a trustworthy caption.

2. Use semantic filenames

A URL like red-leather-handbag-autumn-2026.jpg is a signal before the AI even parses the page body. IMG_2024.jpg tells it nothing. Rename files to describe their contents — our guide on how to name images for SEO covers the conventions.
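Renaming by hand does not scale, so here is a minimal Python sketch that turns a short description into a filename of that shape. The function name semantic_filename is ours, and folding accents to plain ASCII is one possible convention, not a requirement.

```python
import re
import unicodedata


def semantic_filename(description, extension):
    """Turn "Red leather handbag, autumn 2026" into "red-leather-handbag-autumn-2026.jpg"."""
    # Fold accented characters to ASCII so the name stays URL-safe.
    ascii_text = (
        unicodedata.normalize("NFKD", description)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    if not slug:
        raise ValueError("description contains no usable characters")
    return f"{slug}.{extension.lower().lstrip('.')}"


print(semantic_filename("Red leather handbag, autumn 2026", "jpg"))
```

If you rename files that are already indexed, redirect the old URLs so existing image rankings are not lost.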

3. Add ImageObject and Product schema

Structured data hands the model clean, unambiguous facts. On a product page, Product schema exposes price, SKU, and availability; ImageObject exposes a caption and the canonical image URL. These are exactly the fields an AI quotes when it recommends a product. A minimal example:

{
  "@context": "https://schema.org",
  "@type": "ImageObject",
  "contentUrl": "https://example.com/red-leather-handbag-autumn-2026.jpg",
  "name": "Red leather handbag, autumn 2026 collection",
  "caption": "Hand-stitched red leather tote, autumn 2026, €89"
}
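On a product page you would usually nest the image inside a Product so that price, SKU, and availability travel with it. The Python sketch below builds that JSON-LD and wraps it in a script tag. The property names come from schema.org, while the function name product_jsonld and the SKU value are hypothetical.

```python
import json


def product_jsonld(name, sku, price, currency, image_url, caption, in_stock=True):
    """Return a <script type="application/ld+json"> tag describing a product and its image."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "sku": sku,
        "image": {
            "@type": "ImageObject",
            "contentUrl": image_url,
            "caption": caption,
        },
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": "https://schema.org/InStock" if in_stock
            else "https://schema.org/OutOfStock",
        },
    }
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a value can never close the <script> element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


print(product_jsonld(
    name="Red leather handbag, autumn 2026 collection",
    sku="RLH-2026",  # hypothetical SKU for illustration
    price=89,
    currency="EUR",
    image_url="https://example.com/red-leather-handbag-autumn-2026.jpg",
    caption="Hand-stitched red leather tote, autumn 2026, €89",
))
```

Generating the block from the same record that renders the visible page keeps the structured price and the on-page price from drifting apart.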

4. Write captions with context or credit

A <figcaption> that adds context (“photographed in natural light at our Lisbon studio”) makes the image uniquely yours and gives the model a sentence it can attribute. Generic captions get ignored; specific ones get quoted.
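For reference, the markup in question is small. This Python sketch renders a figure with an escaped alt attribute and caption. The helper name figure_html and the sample file path are our own, and your CMS or theme may already produce equivalent markup.

```python
from html import escape


def figure_html(src, alt, caption):
    """Render an image with both alt text and a visible caption."""
    return (
        "<figure>\n"
        f'  <img src="{escape(src)}" alt="{escape(alt)}">\n'
        f"  <figcaption>{escape(caption)}</figcaption>\n"
        "</figure>"
    )


print(figure_html(
    "/img/red-leather-tote-autumn-2026.jpg",  # hypothetical path
    "red leather tote bag, autumn 2026 collection, €89",
    "Photographed in natural light at our Lisbon studio.",
))
```

The alt text says what the image shows and the caption adds context, so the two should not simply repeat each other.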

5. Keep every description unique

Duplicating the same alt text across dozens of images reads as boilerplate and gets downweighted. Each image should describe its own subject. This is where automation helps — writing unique, specific alt text by hand across a large library is where most teams give up.
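Finding the duplicates is the easy part to automate. A minimal Python sketch, assuming you already have (src, alt) pairs from a crawl or a database export; the function name find_duplicate_alts is ours.

```python
from collections import defaultdict


def find_duplicate_alts(images):
    """Map each alt text used more than once to the image URLs that share it.

    `images` is an iterable of (src, alt) pairs. Alt text is compared
    case-insensitively with whitespace collapsed; empty alts are skipped.
    """
    by_alt = defaultdict(list)
    for src, alt in images:
        key = " ".join(alt.lower().split())
        if key:
            by_alt[key].append(src)
    return {alt: srcs for alt, srcs in by_alt.items() if len(srcs) > 1}


catalog = [
    ("/img/tote-red.jpg", "Acme Bags"),
    ("/img/tote-blue.jpg", "acme  bags"),
    ("/img/dripper.jpg", "ceramic pour-over coffee dripper on oak counter"),
]
print(find_duplicate_alts(catalog))
```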

6. Stay technically crawlable

None of the above matters if a bot cannot reach the image. Don’t block image directories in robots.txt, serve images fast (retrieval pipelines time out on slow pages), and implement lazy loading correctly so crawlers still see the <img> markup. See our image-for-SEO fundamentals for the checklist.
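The robots.txt part of that check can be scripted with Python's standard urllib.robotparser. In this sketch the user-agent tokens (GPTBot, ClaudeBot, PerplexityBot) are examples that you should verify against each vendor's current crawler documentation, and the robots.txt content and image URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt, image_url, user_agents):
    """Return the user agents that robots.txt forbids from fetching image_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in user_agents if not parser.can_fetch(agent, image_url)]


# Hypothetical robots.txt that blocks the uploads folder for everyone except Googlebot.
robots_txt = """\
User-agent: *
Disallow: /wp-content/uploads/

User-agent: Googlebot
Allow: /
"""

print(blocked_agents(
    robots_txt,
    "https://example.com/wp-content/uploads/red-leather-handbag-autumn-2026.jpg",
    ["GPTBot", "ClaudeBot", "PerplexityBot", "Googlebot"],
))
```

A rule written years ago to keep bots out of an uploads folder is a common way to end up with images that rank nowhere.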

Image format and speed: the retrieval tax

Retrieval pipelines operate under tight time budgets. When an AI engine fetches your page to read it, a slow response or a multi-megabyte hero image can cause the fetch to time out before your content is parsed — which means none of your carefully written alt text gets read. Format and weight are therefore not just a Core Web Vitals concern; they are an AI-visibility concern.

Serve modern formats (WebP or AVIF) at appropriately sized dimensions. Use <picture> with typed <source> elements so each client gets a format it supports, and srcset so it gets an appropriately sized file. A 200 KB WebP that loads instantly is read in full; a 4 MB PNG that takes three seconds may never be reached. The same compression discipline that wins LCP in real-user metrics is what keeps your images inside the retrieval window. If you are weighing formats, our comparison of image optimization fundamentals walks through the trade-offs.

The rule of thumb: every meaningful image should be both describable (good metadata) and reachable (fast, crawlable, not JavaScript-gated). Miss either half and the image is effectively invisible to AI search.
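One way to satisfy both halves in markup is a picture element that offers AVIF and WebP with a plain fallback, and keeps the img tag and its alt text in the HTML itself. The Python sketch below assumes you have already exported the same image in each format under one base name; picture_html and its parameters are hypothetical.

```python
from html import escape

MIME_TYPES = {"avif": "image/avif", "webp": "image/webp"}


def picture_html(base_url, alt, width, height,
                 modern_formats=("avif", "webp"), fallback="jpg", lazy=True):
    """Render <picture> markup: modern formats first, then a fallback <img>.

    base_url is the image URL without its extension. Set lazy=False for the
    above-the-fold hero image so it is not deferred.
    """
    base = escape(base_url)
    lines = ["<picture>"]
    for fmt in modern_formats:
        # The client uses the first <source> whose type it supports.
        lines.append(f'  <source srcset="{base}.{fmt}" type="{MIME_TYPES[fmt]}">')
    loading = ' loading="lazy"' if lazy else ""
    lines.append(
        f'  <img src="{base}.{fallback}" alt="{escape(alt)}" '
        f'width="{width}" height="{height}"{loading}>'
    )
    lines.append("</picture>")
    return "\n".join(lines)


print(picture_html(
    "https://example.com/img/red-leather-handbag-autumn-2026",
    "red leather handbag, autumn 2026 collection, €89",
    1200, 800,
))
```

Native loading="lazy" leaves the real src in the markup, unlike script-based lazy loaders that swap it in later, so a bot that does not run JavaScript still sees the image.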

Common mistakes that make images invisible to AI

Empty alt attributes on meaningful content images — the single most common reason an image is uncitable.
Decorative-only thinking — treating every image as decoration and never describing the ones that carry information.
Boilerplate alt text repeated site-wide (e.g., your brand name on every image).
Blocking crawlers from image folders or rendering images only via JavaScript that bots don’t execute.
No structured data on product or recipe pages, leaving the most quotable facts machine-invisible.
Slow-loading originals that retrieval bots abandon before reading.

Different site types, different priorities

The signals are universal, but where you spend your effort first depends on what kind of site you run. A quick map:

Site type	Highest-leverage signal	Why
E-commerce	`Product` + `ImageObject` schema, price-bearing alt text	Shopping queries want quotable facts: price, material, availability
Publisher / blog	Descriptive alt text + figcaptions on supporting images	AI quotes captions when illustrating an explanatory answer
Photographer / portfolio	Semantic filenames + unique per-image descriptions	Visual queries lean on filename and caption to attribute the shot
SaaS / B2B	Diagram alt text + entity-consistent naming	Comparison queries reward clear, consistent product descriptions

Whatever the type, start with your highest-traffic pages: they already rank, which means they are already in the candidate set, so improving their image metadata has the fastest path to an AI citation.

How to measure AI citations and image visibility

There is no official “AI citations” dashboard yet, so you triangulate from three sources:

Referral traffic from chatgpt.com, claude.ai, and perplexity.ai in your analytics. Watch these over 4–8 week windows after any alt-text overhaul — they are the leading indicator that you are being cited.
Manual prompt testing. Pick the 20–30 questions most central to your category and run them across ChatGPT, Perplexity, Google AI Overviews, and Claude on a regular cadence. Note which brands appear, how they are described, and which sources get cited.
AI-visibility monitoring tools. A new category of platforms tracks brand presence in AI answers and share of voice over time. They turn the manual testing above into a repeatable report.
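The first of those three sources is easy to script. A minimal Python sketch that buckets raw referrer URLs by assistant, using the three domains named above; the function names are ours, and how you export referrers will depend on your analytics tool.

```python
from collections import Counter
from urllib.parse import urlparse

AI_REFERRERS = {
    "chatgpt.com": "ChatGPT",
    "claude.ai": "Claude",
    "perplexity.ai": "Perplexity",
}


def ai_engine_for(referrer):
    """Return the assistant name for a referrer URL, or None if it is not one of ours."""
    host = (urlparse(referrer).hostname or "").lower()
    for domain, engine in AI_REFERRERS.items():
        # Match the bare domain and its subdomains, but not look-alike hosts.
        if host == domain or host.endswith("." + domain):
            return engine
    return None


def count_ai_referrals(referrers):
    """Count visits per assistant across an iterable of referrer URLs."""
    return Counter(engine for engine in map(ai_engine_for, referrers) if engine)


sample = [
    "https://chatgpt.com/",
    "https://www.perplexity.ai/search?q=red+leather+handbag",
    "https://www.google.com/",
]
print(count_ai_referrals(sample))
```

Run the same count over consecutive 4–8 week windows so you are comparing trends rather than individual days.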

A 30-day action plan

Week	Focus	Outcome
Week 1	Audit: find empty, duplicated, and noise alt text across your top-traffic pages	A prioritized fix list
Week 2	Rewrite alt text and rename files on your highest-value pages	Most-quotable pages cleaned up
Week 3	Add ImageObject / Product schema and captions where missing	Machine-readable facts exposed
Week 4	Baseline manual prompt tests + start tracking AI referral traffic	A measurement loop you can repeat

Frequently asked questions

Do AI engines “see” my images or just read the text around them?

Both, but in a live web answer they lean heavily on the text — alt text, captions, filenames, and surrounding copy — because reading markup is faster and less ambiguous than analyzing pixels. The vision capability matters most when you upload an image directly into a chat, not when the model is retrieving from the web.

Is image alt text still worth it if I rank well in normal search?

Yes — ranking gets you into the candidate set, but alt text and captions decide whether the AI can quote you once you’re there. Two pages that rank similarly can have very different AI visibility depending on how citable their image metadata is.

Should alt text include prices and product details?

On product images, yes, where it reads naturally — those are exactly the facts an AI quotes when recommending products. Keep it descriptive, not stuffed. Put the structured, machine-readable version of the same facts in Product schema.

How long until I see AI referral traffic after fixing alt text?

Plan on 4–8 weeks. Engines re-crawl and re-index on their own schedule, and AI referral traffic builds gradually rather than spiking, so judge it on a trend over a window, not day to day.

Does this apply to non-English sites?

Yes. Write alt text in the language of the page and the audience you want to reach. Multilingual sites should describe images in each locale rather than reusing one English description everywhere.

Can I get penalized for over-optimizing alt text for AI?

You can get downweighted, not penalized in a formal sense. The failure mode is keyword stuffing — cramming repeated terms into alt text reads as spam to the same models you are trying to impress. The safe path is to describe the image accurately and specifically; that satisfies both classic search and AI retrieval at once.

Do I need different alt text for ChatGPT versus Perplexity?

No. Write one clear, natural description per image and it serves every engine — they all read the same underlying markup. There is no value in maintaining engine-specific variants; there is enormous value in making the single description specific and unique.

Conclusion

AI assistants do not read your images the way a person does — they read the words you attach to them. Alt text, filenames, captions, and structured data are the caption the model reads out, the facts it quotes, and the citation it attributes to your brand. The sites that win AI visibility in 2026 are the ones that treat every meaningful image as a quotable, attributable piece of evidence rather than decoration.

For WordPress, ImageSEO writes alt text tuned for exactly this — natural-language, specific, and unique per image, with semantic filenames and the structured data AI engines read. It’s the same setup we run on our own site. If you’re ready to go deeper, start with our complete image SEO guide and the strategic overview in optimizing for AI search.