By the ImageSEO Team. Updated June 2026. ~12 min read.
When a user asks ChatGPT “show me red leather handbags under €100,” a short, mostly invisible sequence of events decides whether your product appears in the answer or your competitor’s does. The same sequence runs when someone asks Perplexity for “the best image optimization plugin for WordPress” or asks Claude to “compare WebP and AVIF for a photography site.” Understanding that sequence — and the specific signals each engine reads off your images — is how you get cited by AI search engines in 2026.
This guide breaks down exactly what happens to your images when an AI assistant answers a question, how ChatGPT, Claude, Perplexity, and Google’s AI Overviews differ, which signals they actually read, and the concrete steps that make your images quotable. If you want the strategic view of why AI visibility matters at all, read our companion piece on optimizing for AI search first — this article is the technical how-to that sits underneath it.
Search is no longer a single ranked list of blue links. A growing share of product research and how-to questions now starts inside an AI assistant that reads across sources and synthesizes one answer. In that answer there is no “position six” — either your brand and your images are part of the picture the model paints, or they are absent. For visual and e-commerce sites especially, images are not decoration in this world; they are quotable evidence the model can attribute to you.
The good news: almost everything that makes an image legible to an AI engine is the same work that makes it rank in Google Images. You are not building a second, separate strategy — you are sharpening the one you already have. The rest of this article shows where the emphasis changes.
ChatGPT (with browsing), Claude (with web search), Perplexity, and Google’s AI Overviews all follow roughly the same pipeline. The details differ, but the shape is consistent.
The engine breaks the request into structured parts: intent (shopping, research, comparison), the object (handbag, plugin, camera), attributes (red, leather, under €100), and the desired output (images, a list, a recommendation). This parsed intent determines what the model goes looking for — and what kind of image, if any, belongs in the answer.
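As a rough illustration of what those structured parts could look like, the sketch below models a parsed request as a small Python dataclass. The `ParsedQuery` class and its field names are hypothetical, chosen only to mirror the four parts named above; no engine publishes its internal representation.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:
    """Hypothetical shape of a parsed request; not any engine's real schema."""
    intent: str                       # e.g. "shopping", "research", "comparison"
    obj: str                          # the thing being asked about
    attributes: dict[str, str] = field(default_factory=dict)
    output: str = "list"              # e.g. "images", "list", "recommendation"

    def wants_images(self) -> bool:
        # In this sketch, only an explicit image request asks for pictures.
        return self.output == "images"


# The handbag query from the introduction, expressed in this shape.
query = ParsedQuery(
    intent="shopping",
    obj="handbag",
    attributes={"color": "red", "material": "leather", "max_price": "€100"},
    output="images",
)
```

The point of the sketch is that attributes such as color, material, and price are exactly the details your image metadata needs to state if it is going to match.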
All of these engines call a search index — Bing in ChatGPT’s case, a mix of Google, Bing, and Brave in others — and get back a ranked list of candidate URLs. This is the gate most teams forget about: if you don’t rank in conventional search for the query, you are not in the AI’s candidate set. AI visibility is not a replacement for traditional SEO; it is built on top of it. Your image SEO foundations — crawlable pages, relevant content, Google Images presence — are what get you into the room.
This is where images come in. Modern models are multimodal — they can see images directly through vision — but in a live retrieval answer they rely far more on the text around an image than on pixel-level analysis, because reading markup is fast, cheap, and unambiguous. When the engine fetches your page, it reads:
- alt attributes on your <img> tags
- <figcaption> elements and nearby captions
- ImageObject, Product, and other JSON-LD

If your alt text says “red leather handbag, autumn 2026 boutique collection, €89”, that is what the AI can quote. If it says alt="", the model has nothing to attribute to you and reaches for a competitor’s description instead.
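To see roughly what a text-first fetch can pick up from a page, the sketch below uses Python's standard `html.parser` to collect the same three kinds of signal: alt attributes, figcaptions, and JSON-LD blocks. The `ImageSignalParser` class and the sample markup are illustrative assumptions, not a description of how any engine actually parses pages.

```python
import json
from html.parser import HTMLParser


class ImageSignalParser(HTMLParser):
    """Collects alt text, figcaptions, and JSON-LD blocks from one HTML page."""

    def __init__(self):
        super().__init__()
        self.images = []       # one {"src": ..., "alt": ...} dict per <img>
        self.captions = []     # text of each <figcaption>
        self.json_ld = []      # parsed JSON-LD objects
        self._capture = None   # which element's text is being buffered
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            # alt is None when the attribute is missing, "" when it is empty.
            self.images.append({"src": attrs.get("src"), "alt": attrs.get("alt")})
        elif tag == "figcaption":
            self._capture, self._buffer = "figcaption", []
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._capture, self._buffer = "json_ld", []

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        text = "".join(self._buffer).strip()
        if tag == "figcaption" and self._capture == "figcaption":
            self.captions.append(text)
            self._capture = None
        elif tag == "script" and self._capture == "json_ld":
            try:
                self.json_ld.append(json.loads(text))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is skipped in this sketch
            self._capture = None


page = """
<figure>
  <img src="/red-leather-tote.jpg" alt="red leather tote bag, autumn 2026 collection, €89">
  <figcaption>Our best-selling tote, photographed in natural light.</figcaption>
</figure>
<img src="/IMG_4821.jpg" alt="">
<script type="application/ld+json">{"@type": "ImageObject", "name": "Red leather tote"}</script>
"""

parser = ImageSignalParser()
parser.feed(page)
parser.close()
```

Running this over your own templates is a quick way to check what a reader that never renders the page would be left with.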
Finally the model writes a natural-language answer and attaches citations. ChatGPT and Claude show citation links; Perplexity shows both a text answer and image carousels. Whether you get cited comes down to how quotable your page is — and alt text plus captions are the easiest things on a page to quote. A clear, specific caption is, in effect, the sentence the AI reads out on your behalf.
The pipeline is shared, but each engine treats images a little differently. Here is the practical breakdown as of 2026.
| Engine | Search source | How it uses images | What to optimize for |
|---|---|---|---|
| ChatGPT (browsing) | Bing index | Shows inline images with citation links; reads alt text and captions heavily | Bing indexing, alt text, semantic filenames |
| Perplexity | Google / Bing / Brave mix | Dedicated image carousels alongside the text answer | Image rank, descriptive alt, unique captions |
| Claude (web search) | Brave / web | Text-first; reads image context to describe and attribute | Surrounding text, figcaption, structured data |
| Google AI Overviews | Google index | Pulls thumbnails from already-ranking Google Images results | Classic Google Images SEO + schema |
| Bing Copilot | Bing index | Inline images with source cards | Bing image indexing, Open Graph tags |
The common thread: every engine leans on the text representation of your image. None of them can reliably attribute a photo to you from pixels alone. The metadata is the citation hook.
If you only optimize one thing, optimize alt text. But the full set of signals compounds — a page that gets all of them right is dramatically more citable than one that nails only a single field.
| Signal | Why the AI cares | Good example |
|---|---|---|
| Alt text | The primary caption the model quotes | alt="red leather tote bag, autumn 2026 collection, €89" |
| Filename | A signal read before the page even loads | red-leather-tote-autumn-2026.jpg |
| Figcaption | Visible, human-written context the model trusts | “Our best-selling tote, photographed in natural light.” |
| Surrounding text | Disambiguates what the image shows | A product paragraph naming material, price, use |
| ImageObject / Product schema | Machine-readable facts: price, SKU, availability | JSON-LD with name, caption, contentUrl |
| Open Graph | Drives the preview card and a fallback description | og:image + og:image:alt |
Think of alt text in 2026 as the line an AI assistant attributes to you. When ChatGPT wants to show a photo in an answer, the alt text is the description it reads. Get it right and the model says “according to imageseo.io, this is a red leather handbag from the autumn 2026 collection.” Get it wrong — or leave it empty — and it uses a competitor’s words instead. The difference between good and bad alt text is the difference between being the source and being invisible.
| Bad alt text | Why it fails | Better |
|---|---|---|
| alt="" | Nothing to quote; image is invisible to text retrieval | Describe the subject in plain language |
| alt="IMG_4821" | Filename noise, zero meaning | alt="ceramic pour-over coffee dripper on oak counter" |
| alt="handbag bag purse leather buy cheap handbag" | Keyword stuffing reads as spam and gets downweighted | alt="red leather handbag, autumn 2026 collection, €89" |
For a deeper treatment of writing alt text that ranks and reads naturally, see our guide to alt text for SEO.
Models are trained on human writing, so they reward human phrasing. Describe what is genuinely in the image, including the details that matter for the query you want to win — material, color, context, price where relevant. Stuffed alt text reads as spam to the same models that read natural alt text as a trustworthy caption.
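The three failure modes above (empty, filename noise, keyword stuffing) are simple enough to screen for automatically. The following is a minimal sketch of such a check; the `lint_alt` function, the filename pattern, and the "any longer word used more than once" threshold are assumptions made for illustration, not rules any engine documents.

```python
import re
from collections import Counter

# Assumed pattern for camera-style names such as IMG_4821 or DSC-0042.jpg.
FILENAME_NOISE = re.compile(
    r"^(img|dsc|image|photo)[_\-\s]?\d+(\.\w+)?$", re.IGNORECASE
)


def lint_alt(alt, max_repeats=1):
    """Return a list of problems found in one alt string (empty list = OK)."""
    text = (alt or "").strip()
    if not text:
        return ["empty"]
    if FILENAME_NOISE.match(text):
        return ["filename noise"]
    # Count words longer than three characters; short words repeat naturally.
    counts = Counter(w for w in re.findall(r"\w+", text.lower()) if len(w) > 3)
    if any(n > max_repeats for n in counts.values()):
        return ["repeated keywords"]
    return []
```

A check like this only catches the obvious cases. Whether a description is accurate and specific still needs a human or a vision-capable tool to judge.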
A URL like red-leather-handbag-autumn-2026.jpg is a signal before the AI even parses the page body. IMG_2024.jpg tells it nothing. Rename files to describe their contents — our guide on how to name images for SEO covers the conventions.
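If you are renaming files in bulk, a small helper can turn a short description into a filename in this style. The sketch below is one possible approach; `semantic_filename` and its `max_words` cap are hypothetical choices, not an established convention.

```python
import re
import unicodedata


def semantic_filename(description, extension="jpg", max_words=6):
    """Turn a short description into a lowercase, hyphenated filename."""
    # Strip accents so "café" becomes "cafe" instead of losing the letter.
    ascii_text = (
        unicodedata.normalize("NFKD", description)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    if not words:
        raise ValueError("description contains no usable words")
    return "-".join(words[:max_words]) + "." + extension


name = semantic_filename("Red leather handbag, autumn 2026")
```

With that input, `name` is `red-leather-handbag-autumn-2026.jpg`. Remember to redirect or update references to the old URLs when renaming images that are already indexed.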
Structured data hands the model clean, unambiguous facts. On a product page, Product schema exposes price, SKU, and availability; ImageObject exposes a caption and the canonical image URL. These are exactly the fields an AI quotes when it recommends a product. A minimal example:
{
  "@context": "https://schema.org",
  "@type": "ImageObject",
  "contentUrl": "https://example.com/red-leather-handbag-autumn-2026.jpg",
  "name": "Red leather handbag, autumn 2026 collection",
  "caption": "Hand-stitched red leather tote, autumn 2026, €89"
}
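On a product page, the same idea extends to Product schema with the image nested inside it. The sketch below generates such a block from Python; the `product_json_ld` helper and the SKU value are hypothetical, while the property names (`sku`, `image`, `offers`, `price`, `priceCurrency`, `availability`) are standard schema.org vocabulary.

```python
import json


def product_json_ld(name, image_url, caption, price, currency, sku, in_stock=True):
    """Build a Product JSON-LD <script> tag that embeds an ImageObject."""
    availability = "InStock" if in_stock else "OutOfStock"
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "sku": sku,
        "image": {
            "@type": "ImageObject",
            "contentUrl": image_url,
            "caption": caption,
        },
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
            "availability": f"https://schema.org/{availability}",
        },
    }
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # "</" inside a script element could close it early, so escape the slash.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


tag = product_json_ld(
    name="Red leather handbag, autumn 2026 collection",
    image_url="https://example.com/red-leather-handbag-autumn-2026.jpg",
    caption="Hand-stitched red leather tote, autumn 2026, €89",
    price="89.00",
    currency="EUR",
    sku="RLH-2026",  # hypothetical SKU for illustration
)
```

Generating the block from the same product record that renders the page helps keep the visible price and the structured price from drifting apart.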
A <figcaption> that adds context (“photographed in natural light at our Lisbon studio”) makes the image uniquely yours and gives the model a sentence it can attribute. Generic captions get ignored; specific ones get quoted.
Duplicating the same alt text across dozens of images reads as boilerplate and gets downweighted. Each image should describe its own subject. This is where automation helps — writing unique, specific alt text by hand across a large library is where most teams give up.
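Finding the duplicates is the easy part to automate. As a minimal sketch, assuming you already have a list of (src, alt) pairs from a crawl or a database export, the hypothetical `duplicated_alts` function below groups images that share the same alt text:

```python
from collections import defaultdict


def duplicated_alts(images, min_count=2):
    """Map each alt text used on min_count or more images to their sources.

    `images` is an iterable of (src, alt) pairs. Matching ignores case and
    extra whitespace, and empty alts are skipped.
    """
    by_alt = defaultdict(list)
    for src, alt in images:
        key = " ".join((alt or "").lower().split())
        if key:
            by_alt[key].append(src)
    return {alt: srcs for alt, srcs in by_alt.items() if len(srcs) >= min_count}


library = [
    ("/tote-front.jpg", "Red leather handbag"),
    ("/tote-side.jpg", "red leather  handbag"),
    ("/tote-detail.jpg", "hand-stitched strap detail on red leather tote"),
    ("/spacer.png", ""),
]
duplicates = duplicated_alts(library)
```

Here the first two images are reported together because their alt text differs only in case and spacing.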
None of the above matters if a bot cannot reach the image. Don’t block image directories in robots.txt, serve images fast (retrieval pipelines time out on slow pages), and implement lazy loading correctly so crawlers still see the <img> markup. See our image-for-SEO fundamentals for the checklist.
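One way to confirm the robots.txt part is to test your image URLs against the file with Python's standard `urllib.robotparser`. In the sketch below, the `blocked_images` helper, the sample rules, and the `ExampleBot` user agent are all made up for illustration; substitute the user-agent tokens of the crawlers you care about.

```python
from urllib.robotparser import RobotFileParser


def blocked_images(robots_txt, image_urls, user_agent="*"):
    """Return the image URLs that the given user agent may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in image_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt that blocks an uploads directory for everyone
# and blocks one named crawler entirely.
robots = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /wp-content/uploads/
Allow: /
"""

urls = [
    "https://example.com/wp-content/uploads/red-leather-handbag.jpg",
    "https://example.com/images/red-leather-tote.jpg",
]
blocked_for_all = blocked_images(robots, urls)
blocked_for_bot = blocked_images(robots, urls, user_agent="ExampleBot")
```

This only checks robots.txt rules. It does not tell you whether the image is slow to load or hidden behind JavaScript.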
Retrieval pipelines operate under tight time budgets. When an AI engine fetches your page to read it, a slow response or a multi-megabyte hero image can cause the fetch to time out before your content is parsed — which means none of your carefully written alt text gets read. Format and weight are therefore not just a Core Web Vitals concern; they are an AI-visibility concern.
Serve modern formats (WebP or AVIF) at appropriately sized dimensions. Use a <picture> element with typed <source> entries so each browser or crawler picks a format it supports, and srcset so it picks an appropriate size. A 200 KB WebP that loads instantly is read in full; a 4 MB PNG that takes three seconds may never be reached. The same compression discipline that wins LCP in real-user metrics is what keeps your images inside the retrieval window. If you are weighing formats, our comparison of image optimization fundamentals walks through the trade-offs.
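One common way to offer several formats and sizes at once is a `<picture>` element whose `<source>` entries declare a MIME type and a `srcset`, with a plain `<img>` as the fallback that carries the alt text. The sketch below generates that markup; the `picture_markup` helper and the `{base}-{width}.{ext}` naming scheme are assumptions, and the files are presumed to exist already.

```python
from html import escape


def picture_markup(base, alt, widths=(480, 960, 1600), sizes="100vw"):
    """Build a <picture> element offering AVIF and WebP with a JPEG fallback.

    Assumes files named "{base}-{width}.{ext}" already exist on the server.
    """
    def srcset(ext):
        return ", ".join(f"{base}-{w}.{ext} {w}w" for w in widths)

    lines = ["<picture>"]
    # Sources are tried in order, so the smaller format is listed first.
    for ext in ("avif", "webp"):
        lines.append(
            f'  <source type="image/{ext}" srcset="{srcset(ext)}" sizes="{sizes}">'
        )
    jpg_srcset = srcset("jpg")
    safe_alt = escape(alt, quote=True)
    lines.append(
        f'  <img src="{base}-{widths[0]}.jpg" srcset="{jpg_srcset}" '
        f'sizes="{sizes}" alt="{safe_alt}">'
    )
    lines.append("</picture>")
    return "\n".join(lines)


markup = picture_markup(
    "/img/red-leather-tote",
    "red leather tote bag, autumn 2026 collection, €89",
)
```

Because the `<img>` fallback is always present in the HTML, a client that ignores `<source>` still finds a working image URL and its alt text.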
The rule of thumb: every meaningful image should be both describable (good metadata) and reachable (fast, crawlable, not JavaScript-gated). Miss either half and the image is effectively invisible to AI search.
The signals are universal, but where you spend your effort first depends on what kind of site you run. A quick map:
| Site type | Highest-leverage signal | Why |
|---|---|---|
| E-commerce | Product + ImageObject schema, price-bearing alt text | Shopping queries want quotable facts: price, material, availability |
| Publisher / blog | Descriptive alt text + figcaptions on supporting images | AI quotes captions when illustrating an explanatory answer |
| Photographer / portfolio | Semantic filenames + unique per-image descriptions | Visual queries lean on filename and caption to attribute the shot |
| SaaS / B2B | Diagram alt text + entity-consistent naming | Comparison queries reward clear, consistent product descriptions |
Whatever the type, start with your highest-traffic pages: they already rank, which means they are already in the candidate set, so improving their image metadata has the fastest path to an AI citation.
There is no official “AI citations” dashboard yet, so you triangulate from three sources:
- Referral traffic from chatgpt.com, claude.ai, and perplexity.ai in your analytics. Watch these over 4–8 week windows after any alt-text overhaul; they are the leading indicator that you are being cited.

| Week | Focus | Outcome |
|---|---|---|
| Week 1 | Audit: find empty, duplicated, and noise alt text across your top-traffic pages | A prioritized fix list |
| Week 2 | Rewrite alt text and rename files on your highest-value pages | Most-quotable pages cleaned up |
| Week 3 | Add ImageObject / Product schema and captions where missing | Machine-readable facts exposed |
| Week 4 | Baseline manual prompt tests + start tracking AI referral traffic | A measurement loop you can repeat |
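For the referral-tracking part of that measurement loop, the three domains named earlier can be matched against the referrer URLs in a log or analytics export. The sketch below is one minimal way to do it; the `ai_source` and `count_ai_referrals` helpers are hypothetical, and the domain list covers only the three assistants this article names.

```python
from collections import Counter
from urllib.parse import urlparse

# Referrer domains named in this article; extend as needed.
AI_REFERRERS = {
    "chatgpt.com": "ChatGPT",
    "claude.ai": "Claude",
    "perplexity.ai": "Perplexity",
}


def ai_source(referrer_url):
    """Return the assistant name for a referrer URL, or None if it is not one."""
    host = (urlparse(referrer_url).hostname or "").lower()
    for domain, name in AI_REFERRERS.items():
        # Match the domain itself or any subdomain, but not lookalikes.
        if host == domain or host.endswith("." + domain):
            return name
    return None


def count_ai_referrals(referrer_urls):
    """Count visits per assistant, ignoring all other referrers."""
    return Counter(name for name in map(ai_source, referrer_urls) if name)


visits = [
    "https://chatgpt.com/",
    "https://www.perplexity.ai/search?q=red+leather+handbag",
    "https://www.google.com/",
    "https://claude.ai/chat/abc",
    "https://chatgpt.com/c/123",
]
totals = count_ai_referrals(visits)
```

Comparing these counts across 4–8 week windows gives you the trend line; a single day's numbers are too noisy to read.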
Both, but in a live web answer they lean heavily on the text — alt text, captions, filenames, and surrounding copy — because reading markup is faster and less ambiguous than analyzing pixels. The vision capability matters most when you upload an image directly into a chat, not when the model is retrieving from the web.
Yes — ranking gets you into the candidate set, but alt text and captions decide whether the AI can quote you once you’re there. Two pages that rank similarly can have very different AI visibility depending on how citable their image metadata is.
On product images, yes, where it reads naturally — those are exactly the facts an AI quotes when recommending products. Keep it descriptive, not stuffed. Put the structured, machine-readable version of the same facts in Product schema.
Plan on 4–8 weeks. Engines re-crawl and re-index on their own schedule, and AI referral traffic builds gradually rather than spiking, so judge it on a trend over a window, not day to day.
Yes. Write alt text in the language of the page and the audience you want to reach. Multilingual sites should describe images in each locale rather than reusing one English description everywhere.
You can get downweighted, not penalized in a formal sense. The failure mode is keyword stuffing — cramming repeated terms into alt text reads as spam to the same models you are trying to impress. The safe path is to describe the image accurately and specifically; that satisfies both classic search and AI retrieval at once.
No. Write one clear, natural description per image and it serves every engine — they all read the same underlying markup. There is no value in maintaining engine-specific variants; there is enormous value in making the single description specific and unique.
AI assistants do not read your images the way a person does — they read the words you attach to them. Alt text, filenames, captions, and structured data are the caption the model reads out, the facts it quotes, and the citation it attributes to your brand. The sites that win AI visibility in 2026 are the ones that treat every meaningful image as a quotable, attributable piece of evidence rather than decoration.
For WordPress, ImageSEO writes alt text tuned for exactly this — natural-language, specific, and unique per image, with semantic filenames and the structured data AI engines read. It’s the same setup we run on our own site. If you’re ready to go deeper, start with our complete image SEO guide and the strategic overview in optimizing for AI search.