feat: add voice pipeline support #265

Merged · 2 commits · Mar 20, 2025

3 changes: 3 additions & 0 deletions docs/ref/voice/events.md
@@ -0,0 +1,3 @@
# `Events`

::: agents.voice.events
3 changes: 3 additions & 0 deletions docs/ref/voice/exceptions.md
@@ -0,0 +1,3 @@
# `Exceptions`

::: agents.voice.exceptions
3 changes: 3 additions & 0 deletions docs/ref/voice/input.md
@@ -0,0 +1,3 @@
# `Input`

::: agents.voice.input
3 changes: 3 additions & 0 deletions docs/ref/voice/model.md
@@ -0,0 +1,3 @@
# `Model`

::: agents.voice.model
3 changes: 3 additions & 0 deletions docs/ref/voice/models/openai_provider.md
@@ -0,0 +1,3 @@
# `OpenAIVoiceModelProvider`

::: agents.voice.models.openai_model_provider
3 changes: 3 additions & 0 deletions docs/ref/voice/models/openai_stt.md
@@ -0,0 +1,3 @@
# `OpenAI STT`

::: agents.voice.models.openai_stt
3 changes: 3 additions & 0 deletions docs/ref/voice/models/openai_tts.md
@@ -0,0 +1,3 @@
# `OpenAI TTS`

::: agents.voice.models.openai_tts
3 changes: 3 additions & 0 deletions docs/ref/voice/pipeline.md
@@ -0,0 +1,3 @@
# `Pipeline`

::: agents.voice.pipeline
3 changes: 3 additions & 0 deletions docs/ref/voice/pipeline_config.md
@@ -0,0 +1,3 @@
# `Pipeline Config`

::: agents.voice.pipeline_config
3 changes: 3 additions & 0 deletions docs/ref/voice/result.md
@@ -0,0 +1,3 @@
# `Result`

::: agents.voice.result
3 changes: 3 additions & 0 deletions docs/ref/voice/utils.md
@@ -0,0 +1,3 @@
# `Utils`

::: agents.voice.utils
3 changes: 3 additions & 0 deletions docs/ref/voice/workflow.md
@@ -0,0 +1,3 @@
# `Workflow`

::: agents.voice.workflow
75 changes: 75 additions & 0 deletions docs/voice/pipeline.md
@@ -0,0 +1,75 @@
# Pipelines and workflows

[`VoicePipeline`][agents.voice.pipeline.VoicePipeline] is a class that makes it easy to turn your agentic workflows into a voice app. You pass in a workflow to run, and the pipeline takes care of transcribing input audio, detecting when the audio ends, calling your workflow at the right time, and turning the workflow output back into audio.

```mermaid
graph LR
%% Input
A["🎤 Audio Input"]

%% Voice Pipeline
subgraph Voice_Pipeline [Voice Pipeline]
direction TB
B["Transcribe (speech-to-text)"]
C["Your Code"]:::highlight
D["Text-to-speech"]
B --> C --> D
end

%% Output
E["🎧 Audio Output"]

%% Flow
A --> Voice_Pipeline
Voice_Pipeline --> E

%% Custom styling
classDef highlight fill:#ffcc66,stroke:#333,stroke-width:1px,font-weight:700;

```

## Configuring a pipeline

When you create a pipeline, you can set a few things:

1. The [`workflow`][agents.voice.workflow.VoiceWorkflowBase], which is the code that runs each time new audio is transcribed.
2. The [`speech-to-text`][agents.voice.model.STTModel] and [`text-to-speech`][agents.voice.model.TTSModel] models used.
3. The [`config`][agents.voice.pipeline_config.VoicePipelineConfig], which lets you configure things like the following (see the sketch after this list):
    - A model provider, which can map model names to models
    - Tracing, including whether to disable tracing, whether audio files are uploaded, the workflow name, trace IDs, etc.
    - Settings on the TTS and STT models, like the prompt, language, and data types used.
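
Putting these together, here is a minimal configuration sketch. The keyword names (`stt_model`, `tts_model`, `config`), the import path mirroring the quickstart's `from agents import ...` style, and the use of string model names are assumptions here; `agent` is defined as in the quickstart. Check the API reference for the exact signatures.

```python
from agents import SingleAgentVoiceWorkflow, VoicePipeline, VoicePipelineConfig

# A minimal sketch, assuming these keyword names and that string model names
# are resolved by the configured model provider.
pipeline = VoicePipeline(
    workflow=SingleAgentVoiceWorkflow(agent),  # `agent` defined as in the quickstart
    stt_model="gpt-4o-transcribe",             # assumption: speech-to-text model by name
    tts_model="gpt-4o-mini-tts",               # assumption: text-to-speech model by name
    config=VoicePipelineConfig(workflow_name="Voice example"),
)
```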

## Running a pipeline

You can run a pipeline via the [`run()`][agents.voice.pipeline.VoicePipeline.run] method, which lets you pass in audio input in two forms (a short sketch of both follows this list):

1. [`AudioInput`][agents.voice.input.AudioInput] is used when you have a full audio transcript, and just want to produce a result for it. This is useful in cases where you don't need to detect when a speaker is done speaking; for example, when you have pre-recorded audio or in push-to-talk apps where it's clear when the user is done speaking.
2. [`StreamedAudioInput`][agents.voice.input.StreamedAudioInput] is used when you might need to detect when a user is done speaking. It allows you to push audio chunks as they are detected, and the voice pipeline will automatically run the agent workflow at the right time, via a process called "activity detection".
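
As a rough sketch, the two forms look like this. Wrapping a buffer in `AudioInput` mirrors the quickstart; the `add_audio()` method on `StreamedAudioInput` is an assumed name, so check the input reference for the exact streaming API.

```python
import numpy as np

from agents import AudioInput, StreamedAudioInput

# 1) Full clip up front: wrap the buffer and run once.
buffer = np.zeros(24000 * 3, dtype=np.int16)  # placeholder; use real audio in practice
result = await pipeline.run(AudioInput(buffer=buffer))

# 2) Streaming: push chunks as they arrive; the pipeline detects turn boundaries.
streamed_input = StreamedAudioInput()
result = await pipeline.run(streamed_input)
# ... later, as microphone chunks arrive (assumed method name):
# await streamed_input.add_audio(chunk)
```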

## Results

The result of a voice pipeline run is a [`StreamedAudioResult`][agents.voice.result.StreamedAudioResult]. This is an object that lets you stream events as they occur. There are a few kinds of [`VoiceStreamEvent`][agents.voice.events.VoiceStreamEvent], including:

1. [`VoiceStreamEventAudio`][agents.voice.events.VoiceStreamEventAudio], which contains a chunk of audio.
2. [`VoiceStreamEventLifecycle`][agents.voice.events.VoiceStreamEventLifecycle], which informs you of lifecycle events like a turn starting or ending.
3. [`VoiceStreamEventError`][agents.voice.events.VoiceStreamEventError], which is an error event.

```python
result = await pipeline.run(input)

async for event in result.stream():
    if event.type == "voice_stream_event_audio":
        ...  # play audio
    elif event.type == "voice_stream_event_lifecycle":
        ...  # handle lifecycle events (e.g. turn_started / turn_ended)
    elif event.type == "voice_stream_event_error":
        ...  # handle the error
```

## Best practices

### Interruptions

The Agents SDK currently does not have built-in interruption support for [`StreamedAudioInput`][agents.voice.input.StreamedAudioInput]. Instead, for every detected turn it will trigger a separate run of your workflow. If you want to handle interruptions inside your application, you can listen to the [`VoiceStreamEventLifecycle`][agents.voice.events.VoiceStreamEventLifecycle] events. `turn_started` indicates that a new turn was transcribed and processing is beginning. `turn_ended` triggers after all the audio for the respective turn has been dispatched. You could use these events to mute the speaker's microphone when the model starts a turn and unmute it after you have flushed all the related audio for that turn.
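
For example, here is a minimal sketch of that muting pattern. It assumes a lifecycle event exposes its name on an `event` attribute, and `mic.mute()` / `mic.unmute()` are hypothetical helpers standing in for your audio capture layer.

```python
async def handle_events(result, player, mic):
    async for event in result.stream():
        if event.type == "voice_stream_event_lifecycle":
            if event.event == "turn_started":   # assumption: lifecycle name is on `event.event`
                mic.mute()      # stop capturing while the agent is speaking
            elif event.event == "turn_ended":
                mic.unmute()    # resume capturing once the turn's audio has been flushed
        elif event.type == "voice_stream_event_audio":
            player.write(event.data)
```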
189 changes: 189 additions & 0 deletions docs/voice/quickstart.md
@@ -0,0 +1,189 @@
# Quickstart

## Prerequisites

Make sure you've followed the base [quickstart instructions](../quickstart.md) for the Agents SDK, and set up a virtual environment. Then, install the optional voice dependencies from the SDK:

```bash
pip install openai-agents[voice]
```

## Concepts

The main concept to know about is a [`VoicePipeline`][agents.voice.pipeline.VoicePipeline], which is a 3 step process:

1. Run a speech-to-text model to turn audio into text.
2. Run your code, which is usually an agentic workflow, to produce a result.
3. Run a text-to-speech model to turn the result text back into audio.

```mermaid
graph LR
%% Input
A["🎤 Audio Input"]

%% Voice Pipeline
subgraph Voice_Pipeline [Voice Pipeline]
direction TB
B["Transcribe (speech-to-text)"]
C["Your Code"]:::highlight
D["Text-to-speech"]
B --> C --> D
end

%% Output
E["🎧 Audio Output"]

%% Flow
A --> Voice_Pipeline
Voice_Pipeline --> E

%% Custom styling
classDef highlight fill:#ffcc66,stroke:#333,stroke-width:1px,font-weight:700;

```

## Agents

First, let's set up some Agents. This should feel familiar to you if you've built any agents with this SDK. We'll have a couple of Agents, a handoff, and a tool.

```python
import asyncio
import random

from agents import (
    Agent,
    function_tool,
)
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions


@function_tool
def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    print(f"[debug] get_weather called with city: {city}")
    choices = ["sunny", "cloudy", "rainy", "snowy"]
    return f"The weather in {city} is {random.choice(choices)}."


spanish_agent = Agent(
    name="Spanish",
    handoff_description="A spanish speaking agent.",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. Speak in Spanish.",
    ),
    model="gpt-4o-mini",
)

agent = Agent(
    name="Assistant",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the spanish agent.",
    ),
    model="gpt-4o-mini",
    handoffs=[spanish_agent],
    tools=[get_weather],
)
```

## Voice pipeline

We'll set up a simple voice pipeline, using [`SingleAgentVoiceWorkflow`][agents.voice.workflow.SingleAgentVoiceWorkflow] as the workflow.

```python
from agents import SingleAgentVoiceWorkflow, VoicePipeline

pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
```

## Run the pipeline

```python
import numpy as np
import sounddevice as sd

from agents import AudioInput

# For simplicity, we'll just create 3 seconds of silence
# In reality, you'd get microphone data
buffer = np.zeros(24000 * 3, dtype=np.int16)
audio_input = AudioInput(buffer=buffer)

result = await pipeline.run(audio_input)

# Create an audio player using `sounddevice`
player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
player.start()

# Play the audio stream as it comes in
async for event in result.stream():
    if event.type == "voice_stream_event_audio":
        player.write(event.data)

```

## Put it all together

```python
import asyncio
import random

import numpy as np
import sounddevice as sd

from agents import (
    Agent,
    AudioInput,
    SingleAgentVoiceWorkflow,
    VoicePipeline,
    function_tool,
    set_tracing_disabled,
)
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions


@function_tool
def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    print(f"[debug] get_weather called with city: {city}")
    choices = ["sunny", "cloudy", "rainy", "snowy"]
    return f"The weather in {city} is {random.choice(choices)}."


spanish_agent = Agent(
    name="Spanish",
    handoff_description="A spanish speaking agent.",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. Speak in Spanish.",
    ),
    model="gpt-4o-mini",
)

agent = Agent(
    name="Assistant",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the spanish agent.",
    ),
    model="gpt-4o-mini",
    handoffs=[spanish_agent],
    tools=[get_weather],
)


async def main():
    pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
    buffer = np.zeros(24000 * 3, dtype=np.int16)
    audio_input = AudioInput(buffer=buffer)

    result = await pipeline.run(audio_input)

    # Create an audio player using `sounddevice`
    player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
    player.start()

    # Play the audio stream as it comes in
    async for event in result.stream():
        if event.type == "voice_stream_event_audio":
            player.write(event.data)


if __name__ == "__main__":
    asyncio.run(main())
```

If you run this example, the agent will speak to you! Check out the example in [examples/voice/static](https://github.com/openai/openai-agents-python/tree/main/examples/voice/static) to see a demo where you can speak to the agent yourself.
14 changes: 14 additions & 0 deletions docs/voice/tracing.md
@@ -0,0 +1,14 @@
# Tracing

Just as [agents are traced](../tracing.md), voice pipelines are automatically traced.

You can read the tracing doc above for basic tracing information, but you can additionally configure tracing of a pipeline via [`VoicePipelineConfig`][agents.voice.pipeline_config.VoicePipelineConfig].

Key tracing-related fields are listed below (a configuration sketch follows the list):

- [`tracing_disabled`][agents.voice.pipeline_config.VoicePipelineConfig.tracing_disabled]: controls whether tracing is disabled. By default, tracing is enabled.
- [`trace_include_sensitive_data`][agents.voice.pipeline_config.VoicePipelineConfig.trace_include_sensitive_data]: controls whether traces include potentially sensitive data, like audio transcripts. This applies specifically to the voice pipeline, not to anything that happens inside your workflow.
- [`trace_include_sensitive_audio_data`][agents.voice.pipeline_config.VoicePipelineConfig.trace_include_sensitive_audio_data]: controls whether traces include audio data.
- [`workflow_name`][agents.voice.pipeline_config.VoicePipelineConfig.workflow_name]: the name of the trace workflow.
- [`group_id`][agents.voice.pipeline_config.VoicePipelineConfig.group_id]: the `group_id` of the trace, which lets you link multiple traces together.
- [`trace_metadata`][agents.voice.pipeline_config.VoicePipelineConfig.trace_metadata]: additional metadata to include with the trace.
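
As a rough sketch, these fields can be set on the config passed to the pipeline. The import path and constructor keywords are assumed to mirror the quickstart's style; `agent` is defined as in the quickstart.

```python
from agents import SingleAgentVoiceWorkflow, VoicePipeline, VoicePipelineConfig

config = VoicePipelineConfig(
    workflow_name="Customer support voice",     # name shown on the trace
    group_id="conversation-123",                # links traces from the same conversation
    trace_include_sensitive_data=False,         # drop transcripts from traces
    trace_include_sensitive_audio_data=False,   # drop audio from traces
    trace_metadata={"user_id": "abc"},          # arbitrary extra metadata
)

pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent), config=config)
```
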
15 changes: 5 additions & 10 deletions examples/financial_research_agent/manager.py
@@ -42,16 +42,14 @@ async def run(self, query: str) -> None:
is_done=True,
hide_checkmark=True,
)
self.printer.update_item(
"start", "Starting financial research...", is_done=True)
self.printer.update_item("start", "Starting financial research...", is_done=True)
search_plan = await self._plan_searches(query)
search_results = await self._perform_searches(search_plan)
report = await self._write_report(query, search_results)
verification = await self._verify_report(report)

final_report = f"Report summary\n\n{report.short_summary}"
self.printer.update_item(
"final_report", final_report, is_done=True)
self.printer.update_item("final_report", final_report, is_done=True)

self.printer.end()

@@ -76,8 +74,7 @@ async def _plan_searches(self, query: str) -> FinancialSearchPlan:
async def _perform_searches(self, search_plan: FinancialSearchPlan) -> Sequence[str]:
with custom_span("Search the web"):
self.printer.update_item("searching", "Searching...")
tasks = [asyncio.create_task(self._search(item))
for item in search_plan.searches]
tasks = [asyncio.create_task(self._search(item)) for item in search_plan.searches]
results: list[str] = []
num_completed = 0
for task in asyncio.as_completed(tasks):
@@ -112,8 +109,7 @@ async def _write_report(self, query: str, search_results: Sequence[str]) -> Fina
tool_description="Use to get a short write‑up of potential red flags",
custom_output_extractor=_summary_extractor,
)
writer_with_tools = writer_agent.clone(
tools=[fundamentals_tool, risk_tool])
writer_with_tools = writer_agent.clone(tools=[fundamentals_tool, risk_tool])
self.printer.update_item("writing", "Thinking about report...")
input_data = f"Original query: {query}\nSummarized search results: {search_results}"
result = Runner.run_streamed(writer_with_tools, input_data)
@@ -126,8 +122,7 @@ async def _write_report(self, query: str, search_results: Sequence[str]) -> Fina
next_message = 0
async for _ in result.stream_events():
if time.time() - last_update > 5 and next_message < len(update_messages):
self.printer.update_item(
"writing", update_messages[next_message])
self.printer.update_item("writing", update_messages[next_message])
next_message += 1
last_update = time.time()
self.printer.mark_item_done("writing")
1 change: 1 addition & 0 deletions examples/financial_research_agent/printer.py
@@ -10,6 +10,7 @@ class Printer:
Simple wrapper to stream status updates. Used by the financial bot
manager as it orchestrates planning, search and writing.
"""

def __init__(self, console: Console) -> None:
self.live = Live(console=console)
self.items: dict[str, tuple[str, bool]] = {}
Empty file added examples/voice/__init__.py
Empty file.