Commit 9bd5ab5

feat (provider): add providerMetadata to ImageModelV2 interface (vercel#5977)
## Background

Compare vercel#5698

## Summary

Additional provider-specific options to the image model provider interface. They are passed through to the provider from the AI SDK and enable provider-specific functionality that can be fully encapsulated in the provider.

Unlike other models, ImageModel requests return an array of images, and providers can return image-specific metadata for each. So far, this pull request passes through the revised prompt used for each image.

To make that possible, I introduced a new type `ImageModelV2ProviderMetadata`, which is the same as `SharedV2ProviderMetadata` except that it guarantees the presence of the `.images` key:

```js
export type ImageModelV2ProviderMetadata = Record<
  string,
  {
    images: JSONArray;
  } & JSONValue
>;
```

That also makes it possible to deeply merge providerMetadata from multiple responses effectively.

## Verification

I updated the `examples/ai-core/src/generate-image/openai.ts` example to verify that the code is working.
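To make the shape of `ImageModelV2ProviderMetadata` concrete, here is a minimal sketch that can be compiled on its own. The `JSONValue` and `JSONArray` aliases below are local stand-ins written for this illustration; they are assumed to resemble the types exported by `@ai-sdk/provider` but are not copied from it. The sample value reuses the placeholder string from the unit test in this commit.

```typescript
// Local stand-ins for the JSON types (assumption: the real ones in
// '@ai-sdk/provider' have a similar recursive shape).
type JSONValue =
  | null
  | string
  | number
  | boolean
  | { [key: string]: JSONValue }
  | JSONValue[];
type JSONArray = JSONValue[];

// Same definition as quoted in the pull request summary.
type ImageModelV2ProviderMetadata = Record<
  string,
  { images: JSONArray } & JSONValue
>;

// One entry per generated image; `null` marks an image without metadata.
const metadata: ImageModelV2ProviderMetadata = {
  openai: {
    images: [{ revisedPrompt: 'test-revised-prompt' }, null],
  },
};

// The `images` key is guaranteed by the type, so its length can be read
// without first checking that the key exists.
const imageCount = metadata['openai']?.images.length;
```

Because `images` is guaranteed for every provider key, callers can rely on it as the per-image slot while still treating everything inside it as provider-defined JSON.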
1 parent fd1924b commit 9bd5ab5

15 files changed: +224 -14 lines changed

.changeset/sour-bananas-remain.md

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
---
2+
'@ai-sdk/provider': patch
3+
'@ai-sdk/openai': patch
4+
'ai': patch
5+
---
6+
7+
feat (provider): add providerMetadata to ImageModelV2 interface (#5977)
8+
9+
The `experimental_generateImage` method from the `ai` package now returns revised prompts for OpenAI's image models.
10+
11+
```js
12+
const prompt = 'Santa Claus driving a Cadillac';
13+
14+
const { providerMetadata } = await experimental_generateImage({
15+
model: openai.image('dall-e-3'),
16+
prompt,
17+
});
18+
19+
const revisedPrompt = providerMetadata.openai.images[0]?.revisedPrompt;
20+
21+
console.log({
22+
prompt,
23+
revisedPrompt,
24+
});
25+
```

content/docs/03-ai-sdk-core/35-image-generation.mdx

Lines changed: 22 additions & 0 deletions
@@ -182,6 +182,28 @@ const { image, warnings } = await generateImage({
182182
});
183183
```
184184

185+
### Additional provider-specific metadata
186+
187+
Some providers expose additional metadata for the result overall or per image.
188+
189+
```tsx
190+
const prompt = 'Santa Claus driving a Cadillac';
191+
192+
const { image, providerMetadata } = await generateImage({
193+
model: openai.image('dall-e-3'),
194+
prompt,
195+
});
196+
197+
const revisedPrompt = providerMetadata.openai.images[0]?.revisedPrompt;
198+
199+
console.log({
200+
prompt,
201+
revisedPrompt,
202+
});
203+
```
204+
205+
The outer key of the returned `providerMetadata` is the provider name. The inner values are the metadata. An `images` key is always present in the metadata and is an array with the same length as the top-level `images` key.
206+
185207
### Error Handling
186208

187209
When `generateImage` cannot generate a valid image, it throws an [`AI_NoImageGeneratedError`](/docs/reference/ai-sdk-errors/ai-no-image-generated-error).
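
Since the documentation added above states that the metadata `images` array has the same length as the top-level `images` array, the two can be walked in parallel by index. The following is a sketch only: `ImageResultLike` and `pairImagesWithMetadata` are hypothetical names introduced here, and the result is modeled as a plain object rather than the real `generateImage` return value.

```typescript
// Hypothetical, simplified view of a generateImage result.
type ImageMetadataEntry = { revisedPrompt?: string } | null;

interface ImageResultLike {
  images: string[]; // stand-in for the generated image payloads
  providerMetadata: Record<string, { images: ImageMetadataEntry[] }>;
}

// Pairs each image with the metadata entry at the same index.
// Missing providers and null entries both yield `revisedPrompt: null`.
function pairImagesWithMetadata(
  result: ImageResultLike,
  providerName: string,
): Array<{ image: string; revisedPrompt: string | null }> {
  const entries = result.providerMetadata[providerName]?.images ?? [];
  return result.images.map((image, index) => ({
    image,
    revisedPrompt: entries[index]?.revisedPrompt ?? null,
  }));
}

const sample: ImageResultLike = {
  images: ['image-a', 'image-b'],
  providerMetadata: {
    openai: { images: [{ revisedPrompt: 'test-revised-prompt' }, null] },
  },
};

const paired = pairImagesWithMetadata(sample, 'openai');
```

Looking entries up by index keeps the pairing stable even when some images carry no metadata, which is why the `null` placeholders matter.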

content/docs/07-reference/01-ai-sdk-core/10-generate-image.mdx

Lines changed: 7 additions & 0 deletions
@@ -165,6 +165,13 @@ console.log(images);
165165
description:
166166
'Warnings from the model provider (e.g. unsupported settings).',
167167
},
168+
{
169+
name: 'providerMetadata',
170+
type: 'ImageModelV2ProviderMetadata',
171+
isOptional: true,
172+
description:
173+
'Optional metadata from the provider. The outer key is the provider name. The inner values are the metadata. An `images` key is always present in the metadata and is an array with the same length as the top-level `images` key. Details depend on the provider.',
174+
},
168175
{
169176
name: 'responses',
170177
type: 'Array<ImageModelResponseMetadata>',

content/providers/01-ai-sdk-providers/02-openai.mdx

Lines changed: 3 additions & 1 deletion
@@ -909,7 +909,7 @@ const model = openai.image('dall-e-3');
909909
You can pass optional `providerOptions` to the image model. These are prone to change by OpenAI and are model dependent. For example, the `gpt-image-1` model supports the `quality` option:
910910

911911
```ts
912-
const { image } = await generateImage({
912+
const { image, providerMetadata } = await generateImage({
913913
model: openai.image('gpt-image-1'),
914914
prompt: 'A salamander at sunrise in a forest pond in the Seychelles.',
915915
providerOptions: {
@@ -920,6 +920,8 @@ const { image } = await generateImage({
920920

921921
For more on `generateImage()` see [Image Generation](/docs/ai-sdk-core/image-generation).
922922

923+
OpenAI's image models may return a revised prompt for each image. It can be accessed at `providerMetadata.openai.images[0]?.revisedPrompt`.
924+
923925
For more information on the available OpenAI image model options, see the [OpenAI API reference](https://platform.openai.com/docs/api-reference/images/create).
924926

925927
## Transcription Models

examples/ai-core/src/generate-image/openai.ts

Lines changed: 12 additions & 3 deletions
@@ -4,12 +4,21 @@ import { presentImages } from '../lib/present-image';
44
import 'dotenv/config';
55

66
async function main() {
7-
const { image } = await generateImage({
7+
const prompt = 'Santa Claus driving a Cadillac';
8+
const result = await generateImage({
89
model: openai.image('dall-e-3'),
9-
prompt: 'Santa Claus driving a Cadillac',
10+
prompt,
1011
});
1112

12-
await presentImages([image]);
13+
// @ts-expect-error
14+
const revisedPrompt = result.providerMetadata.openai.images[0]?.revisedPrompt;
15+
16+
console.log({
17+
prompt,
18+
revisedPrompt,
19+
});
20+
21+
await presentImages([result.image]);
1322
}
1423

1524
main().catch(console.error);
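
The example above suppresses a type error with `// @ts-expect-error`, presumably because entries in `images` are typed as generic JSON values rather than objects with a known `revisedPrompt` field. One alternative is a small runtime guard that narrows the entry before reading it. `getRevisedPrompt` is a hypothetical helper written for this sketch; it is not part of the AI SDK.

```typescript
// Hypothetical helper: safely reads `revisedPrompt` from an untyped
// metadata entry. Returns undefined for null, arrays, primitives, and
// objects whose `revisedPrompt` is missing or not a string.
function getRevisedPrompt(entry: unknown): string | undefined {
  if (typeof entry !== 'object' || entry === null || Array.isArray(entry)) {
    return undefined;
  }
  const value = (entry as Record<string, unknown>).revisedPrompt;
  return typeof value === 'string' ? value : undefined;
}

// Usage with the kinds of values the `images` array may contain.
const fromObject = getRevisedPrompt({ revisedPrompt: 'test-revised-prompt' });
const fromNull = getRevisedPrompt(null);
const fromWrongType = getRevisedPrompt({ revisedPrompt: 42 });
```

With such a guard, a call like `getRevisedPrompt(result.providerMetadata.openai?.images[0])` would type-check without the suppression comment, at the cost of one extra function.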

packages/ai/core/generate-image/generate-image-result.ts

Lines changed: 10 additions & 1 deletion
@@ -1,5 +1,8 @@
11
import { GeneratedFile } from '../generate-text';
2-
import { ImageGenerationWarning } from '../types/image-model';
2+
import {
3+
ImageGenerationWarning,
4+
ImageModelProviderMetadata,
5+
} from '../types/image-model';
36
import { ImageModelResponseMetadata } from '../types/image-model-response-metadata';
47

58
/**
@@ -26,4 +29,10 @@ Warnings for the call, e.g. unsupported settings.
2629
Response metadata from the provider. There may be multiple responses if we made multiple calls to the model.
2730
*/
2831
readonly responses: Array<ImageModelResponseMetadata>;
32+
33+
/**
34+
* Provider-specific metadata. It is passed through from the provider to the AI SDK and enables provider-specific
35+
* results that can be fully encapsulated in the provider.
36+
*/
37+
readonly providerMetadata: ImageModelProviderMetadata;
2938
}

packages/ai/core/generate-image/generate-image.test.ts

Lines changed: 37 additions & 1 deletion
@@ -1,4 +1,8 @@
1-
import { ImageModelV2, ImageModelV2CallWarning } from '@ai-sdk/provider';
1+
import {
2+
ImageModelV2,
3+
ImageModelV2CallWarning,
4+
ImageModelV2ProviderMetadata,
5+
} from '@ai-sdk/provider';
26
import { MockImageModelV2 } from '../test/mock-image-model-v2';
37
import { generateImage } from './generate-image';
48
import {
@@ -20,10 +24,16 @@ const createMockResponse = (options: {
2024
warnings?: ImageModelV2CallWarning[];
2125
timestamp?: Date;
2226
modelId?: string;
27+
providerMetaData?: ImageModelV2ProviderMetadata;
2328
headers?: Record<string, string>;
2429
}) => ({
2530
images: options.images,
2631
warnings: options.warnings ?? [],
32+
providerMetadata: options.providerMetaData ?? {
33+
testProvider: {
34+
images: options.images.map(() => null),
35+
},
36+
},
2737
response: {
2838
timestamp: options.timestamp ?? new Date(),
2939
modelId: options.modelId ?? 'test-model-id',
@@ -382,4 +392,30 @@ describe('generateImage', () => {
382392
},
383393
]);
384394
});
395+
396+
it('should return provider metadata', async () => {
397+
const result = await generateImage({
398+
model: new MockImageModelV2({
399+
doGenerate: async () =>
400+
createMockResponse({
401+
images: [pngBase64, pngBase64],
402+
timestamp: testDate,
403+
modelId: 'test-model',
404+
providerMetaData: {
405+
testProvider: {
406+
images: [{ revisedPrompt: 'test-revised-prompt' }, null],
407+
},
408+
},
409+
headers: {},
410+
}),
411+
}),
412+
prompt,
413+
});
414+
415+
expect(result.providerMetadata).toStrictEqual({
416+
testProvider: {
417+
images: [{ revisedPrompt: 'test-revised-prompt' }, null],
418+
},
419+
});
420+
});
385421
});

packages/ai/core/generate-image/generate-image.ts

Lines changed: 24 additions & 2 deletions
@@ -1,10 +1,11 @@
1-
import { ImageModelV2, JSONValue } from '@ai-sdk/provider';
1+
import { ImageModelV2, ImageModelV2ProviderMetadata } from '@ai-sdk/provider';
22
import { NoImageGeneratedError } from '../../errors/no-image-generated-error';
33
import {
44
DefaultGeneratedFile,
55
GeneratedFile,
66
} from '../generate-text/generated-file';
77
import { prepareRetries } from '../prompt/prepare-retries';
8+
import { ProviderMetadata } from '../types';
89
import { ImageGenerationWarning } from '../types/image-model';
910
import { ImageModelResponseMetadata } from '../types/image-model-response-metadata';
1011
import { GenerateImageResult } from './generate-image-result';
@@ -144,6 +145,7 @@ Only applicable for HTTP-based providers.
144145
const images: Array<DefaultGeneratedFile> = [];
145146
const warnings: Array<ImageGenerationWarning> = [];
146147
const responses: Array<ImageModelResponseMetadata> = [];
148+
const providerMetadata: ImageModelV2ProviderMetadata = {};
147149
for (const result of results) {
148150
images.push(
149151
...result.images.map(
@@ -159,29 +161,49 @@ Only applicable for HTTP-based providers.
159161
),
160162
);
161163
warnings.push(...result.warnings);
164+
165+
if (result.providerMetadata) {
166+
for (const [providerName, metadata] of Object.entries<{
167+
images: unknown;
168+
}>(result.providerMetadata)) {
169+
providerMetadata[providerName] ??= { images: [] };
170+
providerMetadata[providerName].images.push(
171+
...result.providerMetadata[providerName].images,
172+
);
173+
}
174+
}
175+
162176
responses.push(result.response);
163177
}
164178

165179
if (!images.length) {
166180
throw new NoImageGeneratedError({ responses });
167181
}
168182

169-
return new DefaultGenerateImageResult({ images, warnings, responses });
183+
return new DefaultGenerateImageResult({
184+
images,
185+
warnings,
186+
responses,
187+
providerMetadata,
188+
});
170189
}
171190

172191
class DefaultGenerateImageResult implements GenerateImageResult {
173192
readonly images: Array<GeneratedFile>;
174193
readonly warnings: Array<ImageGenerationWarning>;
175194
readonly responses: Array<ImageModelResponseMetadata>;
195+
readonly providerMetadata: ImageModelV2ProviderMetadata;
176196

177197
constructor(options: {
178198
images: Array<GeneratedFile>;
179199
warnings: Array<ImageGenerationWarning>;
180200
responses: Array<ImageModelResponseMetadata>;
201+
providerMetadata: ImageModelV2ProviderMetadata;
181202
}) {
182203
this.images = options.images;
183204
this.warnings = options.warnings;
184205
this.responses = options.responses;
206+
this.providerMetadata = options.providerMetadata;
185207
}
186208

187209
get image() {
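
The merge loop in the hunk above concatenates the per-image metadata of every call under its provider name, so the combined `images` array stays aligned with the combined image list. Below is a standalone sketch of that idea. `mergeProviderMetadata` and the simplified `ProviderMetadata` type are assumptions made for illustration, not the SDK's own exports. As in the hunk, only the `images` key is carried over.

```typescript
// Simplified stand-in for ImageModelV2ProviderMetadata.
type ProviderMetadata = Record<string, { images: unknown[] }>;

// Concatenates `images` per provider across several call results,
// preserving call order. Results without metadata are skipped.
function mergeProviderMetadata(
  results: Array<{ providerMetadata?: ProviderMetadata }>,
): ProviderMetadata {
  const merged: ProviderMetadata = {};
  for (const result of results) {
    if (!result.providerMetadata) continue;
    for (const [providerName, metadata] of Object.entries(
      result.providerMetadata,
    )) {
      // Create the provider bucket on first sight, then append.
      const target = (merged[providerName] ??= { images: [] });
      target.images.push(...metadata.images);
    }
  }
  return merged;
}

const merged = mergeProviderMetadata([
  { providerMetadata: { openai: { images: [{ revisedPrompt: 'first' }] } } },
  {}, // a call that returned no provider metadata
  { providerMetadata: { openai: { images: [null, { revisedPrompt: 'third' }] } } },
]);
```

Appending in call order matters: `generateImage` may issue several model calls, and index `i` in the merged array should describe image `i` in the merged result.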

packages/ai/core/types/image-model.ts

Lines changed: 10 additions & 1 deletion
@@ -1,4 +1,8 @@
1-
import { ImageModelV2, ImageModelV2CallWarning } from '@ai-sdk/provider';
1+
import {
2+
ImageModelV2,
3+
ImageModelV2CallWarning,
4+
ImageModelV2ProviderMetadata,
5+
} from '@ai-sdk/provider';
26

37
/**
48
Image model that is used by the AI SDK Core functions.
@@ -10,3 +14,8 @@ Warning from the model provider for this call. The call will proceed, but e.g.
1014
some settings might not be supported, which can lead to suboptimal results.
1115
*/
1216
export type ImageGenerationWarning = ImageModelV2CallWarning;
17+
18+
/**
19+
Metadata from the model provider for this call.
20+
*/
21+
export type ImageModelProviderMetadata = ImageModelV2ProviderMetadata;

packages/ai/core/types/index.ts

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@ export type { Embedding, EmbeddingModel } from './embedding-model';
22
export type {
33
ImageModel,
44
ImageGenerationWarning as ImageModelCallWarning,
5+
ImageModelProviderMetadata,
56
} from './image-model';
67
export type { ImageModelResponseMetadata } from './image-model-response-metadata';
78
export type { JSONValue } from './json-value';

packages/openai/src/openai-image-model.test.ts

Lines changed: 25 additions & 0 deletions
@@ -253,4 +253,29 @@ describe('doGenerate', () => {
253253
const requestBody = await server.calls[server.calls.length - 1].requestBody;
254254
expect(requestBody).toHaveProperty('response_format', 'b64_json');
255255
});
256+
257+
it('should return image meta data', async () => {
258+
prepareJsonResponse();
259+
260+
const result = await model.doGenerate({
261+
prompt,
262+
n: 1,
263+
size: '1024x1024',
264+
aspectRatio: undefined,
265+
seed: undefined,
266+
providerOptions: { openai: { style: 'vivid' } },
267+
});
268+
269+
expect(result.providerMetadata).toStrictEqual({
270+
openai: {
271+
images: [
272+
{
273+
revisedPrompt:
274+
'A charming visual illustration of a baby sea otter swimming joyously.',
275+
},
276+
null,
277+
],
278+
},
279+
});
280+
});
256281
});

packages/openai/src/openai-image-model.ts

Lines changed: 14 additions & 1 deletion
@@ -99,12 +99,25 @@ export class OpenAIImageModel implements ImageModelV2 {
9999
modelId: this.modelId,
100100
headers: responseHeaders,
101101
},
102+
providerMetadata: {
103+
openai: {
104+
images: response.data.map(item =>
105+
item.revised_prompt
106+
? {
107+
revisedPrompt: item.revised_prompt,
108+
}
109+
: null,
110+
),
111+
},
112+
},
102113
};
103114
}
104115
}
105116

106117
// minimal version of the schema, focussed on what is needed for the implementation
107118
// this approach limits breakages when the API changes and increases efficiency
108119
const openaiImageResponseSchema = z.object({
109-
data: z.array(z.object({ b64_json: z.string() })),
120+
data: z.array(
121+
z.object({ b64_json: z.string(), revised_prompt: z.string().optional() }),
122+
),
110123
});
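
The provider change above maps each response item to either `{ revisedPrompt }` or `null`, which keeps one metadata slot per image. A dependency-free sketch of that mapping follows. `OpenAIImageItem` and `toImageMetadata` are names invented for this illustration, and the interface only mirrors the two fields that appear in the schema in the diff.

```typescript
// Mirrors the two fields validated by the response schema in the diff.
interface OpenAIImageItem {
  b64_json: string;
  revised_prompt?: string;
}

// Builds provider metadata with one entry per image. Items without a
// revised prompt (missing or empty string) map to null.
function toImageMetadata(data: OpenAIImageItem[]) {
  return {
    openai: {
      images: data.map(item =>
        item.revised_prompt ? { revisedPrompt: item.revised_prompt } : null,
      ),
    },
  };
}

const providerMetadata = toImageMetadata([
  { b64_json: 'base64-image-1', revised_prompt: 'test-revised-prompt' },
  { b64_json: 'base64-image-2' },
]);
```

Returning `null` instead of dropping the entry is what preserves the length guarantee described in the documentation changes of this commit.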

packages/provider/src/image-model/v2/image-model-v2-call-options.ts

Lines changed: 3 additions & 3 deletions
@@ -39,9 +39,9 @@ The outer record is keyed by the provider name, and the inner
3939
record is keyed by the provider-specific metadata key.
4040
```ts
4141
{
42-
"openai": {
43-
"style": "vivid"
44-
}
42+
"openai": {
43+
"style": "vivid"
44+
}
4545
}
4646
```
4747
*/
