GPT Image 2 is OpenAI's new flagship image generation model, announced on April 21, 2026 under the product name "ChatGPT Images 2.0". The model API ID is gpt-image-2, with a pinned snapshot of gpt-image-2-2026-04-21. It is the direct successor to gpt-image-1 (March 2025), which is now deprecated, and DALL·E 3, which retires on May 12, 2026.
OpenAI's framing on launch day: this is "OpenAI's most capable image generation model yet" — and notably, it is the company's first image model with O-series reasoning baked in. In OpenAI's own words, the model is "built for production workflows, where images need to be accurate, readable, on-brand, localized, formatted for the destination surface, and usable without heavy cleanup."
This post breaks down what shipped, why it matters for production work, and how to actually call the API.
The four official capability pillars
OpenAI organized the launch around four pillars. Each one is a direct answer to a long-standing pain point with gpt-image-1 and DALL·E 3.
1. Asset Creation
More aspect ratios and resolutions up to 2K, targeted at apps, ads, product flows, social, presentations, and docs. The previous generation maxed out at 1536×1024 with a thin set of ratios; GPT Image 2 widens the surface to cover the formats real product teams actually ship in.
2. Text-Heavy Visuals
Stronger structured generation — diagrams, infographics, charts, posters, comics — and significantly improved multilingual text rendering. Latin scripts are no longer the only first-class citizens; Japanese, Korean, Cyrillic, Arabic, and dense CJK layouts now render legibly, including non-trivial cases like manga panels and infographics with stacked labels.
3. Control & Precision
More reliable instruction-following, detail preservation, and composition. If you have ever asked the previous generation for "two specific objects, one in each corner" and gotten one or the other, this is the pillar that addresses that failure mode.
4. Reasoning Integration
This is the architectural story. GPT Image 2 is the first OpenAI image model that uses O-series reasoning — it thinks before it draws. With reasoning models, it can research the prompt, transform inputs, generate variations, and self-check the result. For workflows like "render an infographic that summarizes this PDF" or "produce a poster localized to four markets," the model is no longer just painting — it is reasoning over the brief first.
Why "thinks before it draws" matters
Previous image models, whether diffusion or autoregressive, went straight from the prompt to pixels: diffusion models started from noise and denoised toward a plausible image, while autoregressive models emitted the image token by token. GPT Image 2 inserts a reasoning step before the visual generation begins. That sounds incremental, but it changes the shape of what the model can do:
- Multi-step briefs become reliable. "Build a quarterly performance poster from these numbers, in the brand color, with the headline in Korean" stops being three separate retry rounds.
- Self-check closes obvious failure modes. The model can catch its own mis-rendered text or wrong aspect ratio before returning a result.
- Tool-aware generation. Combined with reasoning models in the broader stack, it can transform structured input (CSVs, JSON, snippets of source) into the visuals that summarize them.
This is the key reason OpenAI is positioning GPT Image 2 as a production tool rather than a creative toy. The Image Arena results back the framing — gpt-image-2 took #1 in every category on launch, with a +242 point lead in Text-to-Image (the largest lead ever recorded on that leaderboard).
Multilingual text rendering, in practice
Multilingual text was the most-requested fix from the gpt-image-1 era, and it is the area where GPT Image 2 makes the biggest visible jump. Where the old model would scramble non-Latin glyphs or invent characters, the new one ships:
- Japanese vertical layouts and manga panels with readable speech bubbles
- Korean Hangul without the typical block-spacing failures
- Cyrillic signage at full body length — store fronts, posters, transit signs
- Arabic with correct right-to-left flow and connected forms
- Dense infographics with stacked labels, axis text, and footnotes that stay aligned
For e-commerce, app screenshots, posters, and any layout where copy lives inside the frame, this is the difference between "needs a designer to fix" and "ship it."
How it compares
| | GPT Image 2 | gpt-image-1 (Mar 2025) | DALL·E 3 |
|---|---|---|---|
| Status | Live | Deprecated | Retires May 12, 2026 |
| Reasoning | O-series, integrated | None | None |
| Max resolution | Up to 2K | 1536×1024 | 1024×1792 |
| Aspect ratios | Wide range | 3 fixed ratios | 3 fixed ratios |
| Multilingual text | Strong (CJK, Cyrillic, Arabic) | Latin-first, weak CJK | Latin-first |
| Image inputs | High-fidelity | Yes | Limited |
| API endpoints | images/generations, images/edits | Same | Same |
| Image Arena rank | #1 every category, +242 in T2I | — | — |
DALL·E retirement note: OpenAI is retiring both DALL·E 2 and DALL·E 3 on May 12, 2026. GPT Image 2 becomes the default image model across ChatGPT and the OpenAI API. If you are still calling dall-e-3, the migration window is short.
API quick-start
GPT Image 2 uses the same image endpoints as the previous generation, so the migration is mostly a model-ID swap once API access reaches your account (dates are listed under "Availability rollout" below). The API model card documents the full surface.
Endpoints
- POST /v1/images/generations: text-to-image
- POST /v1/images/edits: image editing (with image input)
Modalities — input: text + image. Output: image.
Model ID — gpt-image-2. Pin a snapshot with gpt-image-2-2026-04-21 if you need stable behavior.
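As a rough illustration of the text-to-image call, here is a minimal sketch using only the Python standard library. The endpoint path and model IDs come from this post; the `size` parameter, the `OPENAI_API_KEY` environment variable, and the `data[0].b64_json` response shape are assumptions carried over from how gpt-image-1 behaves, so check them against the model card before relying on them. The helper names `build_generation_request` and `generate_image` are hypothetical.

```python
import base64
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/images/generations"


def build_generation_request(prompt: str, size: str = "1024x1024",
                             pin_snapshot: bool = False) -> dict:
    """Build the JSON body for a text-to-image request."""
    # Model IDs are taken from the post; the snapshot ID pins behavior.
    model = "gpt-image-2-2026-04-21" if pin_snapshot else "gpt-image-2"
    # Assumption: `size` is accepted in the same form as for gpt-image-1.
    return {"model": model, "prompt": prompt, "size": size}


def generate_image(prompt: str, out_path: str = "out.png", **options) -> str:
    """POST the request and write the first returned image to out_path."""
    body = json.dumps(build_generation_request(prompt, **options)).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # There is no streaming at launch, so this call blocks until the
    # full image is ready; the generous timeout reflects that.
    with urllib.request.urlopen(request, timeout=300) as response:
        payload = json.load(response)
    # Assumption: base64 image data under data[0].b64_json, as with gpt-image-1.
    with open(out_path, "wb") as image_file:
        image_file.write(base64.b64decode(payload["data"][0]["b64_json"]))
    return out_path
```

Calling `generate_image("Quarterly results poster, headline in Korean", pin_snapshot=True)` would send the pinned snapshot ID; leaving `pin_snapshot` off follows whatever gpt-image-2 currently resolves to.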
Pricing (per million tokens)
| | Input | Cached input | Output |
|---|---|---|---|
| Image tokens | $8.00 | $2.00 | $30.00 |
| Text tokens | $5.00 | $1.25 | $10.00 |
The cached-input pricing is the relevant lever for production. If your prompts share a stable preamble (brand guidelines, style constraints, layout rules), you pay a quarter of the per-token input rate on the cached portion: $2.00 instead of $8.00 for image tokens, and $1.25 instead of $5.00 for text tokens.
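To make the pricing table concrete, the sketch below turns it into a small cost estimator. The rates are copied from the table; the function name `token_cost` and the token counts in the usage note are hypothetical, and whether a given prompt prefix actually hits the cache depends on OpenAI's caching rules, which this post does not describe.

```python
# USD per million tokens, copied from the pricing table above.
PRICES = {
    "image": {"input": 8.00, "cached_input": 2.00, "output": 30.00},
    "text": {"input": 5.00, "cached_input": 1.25, "output": 10.00},
}


def token_cost(kind: str, input_tokens: int = 0, cached_tokens: int = 0,
               output_tokens: int = 0) -> float:
    """Return the USD cost for one token kind ("image" or "text").

    cached_tokens is the subset of input_tokens billed at the cached rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    rates = PRICES[kind]
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * rates["input"]
        + cached_tokens * rates["cached_input"]
        + output_tokens * rates["output"]
    )
    return total / 1_000_000
```

For a hypothetical 2,000-token text prompt, `token_cost("text", input_tokens=2000)` comes to $0.010 with no cache hit, while `token_cost("text", input_tokens=2000, cached_tokens=1600)` comes to $0.004 when a 1,600-token preamble is cached.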
Availability rollout
- April 21, 2026 — announcement
- April 22, 2026 — ChatGPT and Codex users get access
- Early May 2026 — API rollout begins (see the developer community post for the live status)
What is missing today
The launch surface is intentionally narrow. Worth knowing before you architect around it:
- No streaming. You wait for the full image; you cannot incrementally render.
- No function calling.
- No structured outputs.
- No fine-tuning yet.
For most product use cases this does not bite. If you were planning to build a pipeline that fine-tunes a brand-specific image model on your own asset library, that capability is not on the menu at launch — track the model card for updates.
What this means for builders
- Migrate off DALL·E 3 now. API shutoff is May 12, 2026. The lift is mostly a model-ID change; see the API quick-start above.
- Audit your prompt library. GPT Image 2 reads instructions much more literally than DALL·E 3, and reasoning-integrated generation rewards prompts that state intent precisely. Our curated GPT Image 2 prompt library shows patterns that work, with verified output images.
- Reach for cached-input pricing. If you are running a high-volume generation pipeline, structuring prompts so the brand/style preamble is reusable cuts the input bill by 75% on that portion.
- Text-in-image is finally production-ready. If you have parked features waiting for correct CJK or RTL rendering, unpark them.
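The model-ID swap from the first bullet can be sketched as a small request rewriter. This is an illustration under assumptions, not a documented migration path: the helper name `migrate_request` is hypothetical, and the list of parameters to drop (`style`, `response_format`, and the DALL·E 3 `quality` values `standard` and `hd`) is based on how gpt-image-1 differs from DALL·E 3, on the guess that gpt-image-2 behaves the same way. Confirm the accepted parameters on the model card.

```python
# Assumption: these request fields are DALL·E-specific and are not
# accepted by gpt-image-2 (this mirrors gpt-image-1; verify on the model card).
DALLE_ONLY_PARAMS = ("style", "response_format")
DALLE_QUALITY_VALUES = ("standard", "hd")


def migrate_request(old_request: dict, pin_snapshot: bool = False) -> dict:
    """Return a copy of a DALL·E 3 request body retargeted at gpt-image-2."""
    new_request = dict(old_request)  # leave the caller's dict untouched
    new_request["model"] = "gpt-image-2-2026-04-21" if pin_snapshot else "gpt-image-2"
    for name in DALLE_ONLY_PARAMS:
        new_request.pop(name, None)
    # Drop DALL·E-style quality values rather than guessing a new mapping.
    if new_request.get("quality") in DALLE_QUALITY_VALUES:
        del new_request["quality"]
    return new_request
```

Dropping unknown-fit parameters instead of translating them keeps the sketch conservative: the migrated request falls back to the model's defaults, and you can reintroduce options once you have checked what gpt-image-2 accepts.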
Try it now
GPT Image 2 is live in our generator via the gpt-image-2 model — no waitlist, no migration work. Open the canvas and drop a prompt, or browse the curated prompt library for ship-ready examples with verified output images you can copy and tweak.
New signups get 10 free credits — enough to see the new text rendering and reasoning integration on your own prompts before spending anything.
We will keep this post current as the API rollout completes and OpenAI ships follow-on capabilities. Subscribe via our changelog for updates.

