Introducing GPT-4o Audio Models in Microsoft Foundry: A Practical Guide for Developers
Allan Carranza presents a step-by-step developer guide to using the latest GPT-4o audio models on Azure OpenAI via Microsoft Foundry, with practical examples for speech-to-text and TTS integration.
Author: Allan Carranza
Overview
This guide explores the latest additions to Azure OpenAI: GPT-4o-Transcribe, GPT-4o-Mini-Transcribe, and GPT-4o-Mini-TTS, which bring state-of-the-art audio capabilities to transcription and text-to-speech (TTS) use cases.
What’s New in OpenAI’s Audio Models?
- GPT-4o-Transcribe & GPT-4o-Mini-Transcribe: Advanced speech-to-text models that surpass previous benchmarks for accuracy and speed.
- GPT-4o-Mini-TTS: A flexible text-to-speech model supporting custom speech instructions for interactive and accessible applications.
| Feature | GPT-4o-Transcribe | GPT-4o-Mini-Transcribe | GPT-4o-Mini-TTS |
|---|---|---|---|
| Performance | Best Quality | Great Quality | Best Quality |
| Speed | Fast | Fastest | Fastest |
| Input | Text, Audio | Text, Audio | Text |
| Output | Text | Text | Audio |
| Streaming | ✅ | ✅ | ✅ |
| Ideal Use Cases | Accurate transcription for complex audio (e.g., call centers, meetings) | Live captioning, rapid response, budget scenarios | Interactive voice outputs for bots, assistants, accessibility, edu apps |
Technical Innovations
- Targeted Audio Pretraining: Utilizes large-scale, specialized datasets for improved speech understanding.
- Advanced Distillation: Preserves high model performance while reducing size for efficiency.
- Reinforcement Learning: Boosts transcription accuracy, minimizing misrecognition in complex environments.
Getting Started Guide
1. Set Up Azure OpenAI Environment
- Obtain your Azure OpenAI endpoint and API key.
- Authenticate via Azure CLI:
```bash
az login
```
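If you prefer to avoid API keys entirely, the sketch below shows a keyless alternative that reuses your `az login` identity through Microsoft Entra ID. It assumes the `azure-identity` package (not in the dependency list in step 3) alongside the `openai` package, and the endpoint and API-version values are placeholders for the ones you configure in step 2.
```python
# Keyless authentication sketch: reuse the `az login` identity instead of an API key.
# Assumption: the azure-identity package is installed alongside openai.
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# Exchange your Azure CLI sign-in for bearer tokens scoped to Azure AI services.
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default",
)

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder endpoint
    azure_ad_token_provider=token_provider,
    api_version="<api-version>",  # use the version you set in your .env (step 2)
)
```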
2. Configure Project Environment
- Create a `.env` file with your credentials:
```
AZURE_OPENAI_ENDPOINT="your-endpoint-url"
AZURE_OPENAI_API_KEY="your-api-key"
AZURE_OPENAI_API_VERSION="2025-04-14"
```
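A minimal sketch of loading these values with `python-dotenv` and constructing an Azure OpenAI client (assumes the `openai` and `python-dotenv` packages installed in step 3; the variable names match the `.env` above):
```python
# Load the .env created above and construct an Azure OpenAI client.
import os

from dotenv import load_dotenv
from openai import AzureOpenAI

load_dotenv()  # reads the AZURE_OPENAI_* values from .env into the environment

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
)
```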
3. Install Dependencies
- Configure your Python virtual environment and install the essentials:
```bash
uv venv
source .venv/bin/activate   # macOS/Linux
.venv\Scripts\activate      # Windows
uv pip install openai python-dotenv gradio aiohttp
```
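With the client from step 2, a transcription request is a single call. The sketch below assumes a GPT-4o-Transcribe deployment named `gpt-4o-transcribe` and a local `meeting.wav` file; both names are placeholders for your own.
```python
# Transcribe a local audio file with a GPT-4o-Transcribe deployment.
# `gpt-4o-transcribe` and `meeting.wav` are placeholder names.
with open("meeting.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # your Azure deployment name
        file=audio_file,
    )

print(result.text)
```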
4. Deploy and Test Using Gradio
- Launch your Gradio app for audio streaming:
```bash
python your_gradio_app.py
```
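`your_gradio_app.py` is a placeholder; a minimal sketch of such an app, wiring a microphone or uploaded recording to the transcription call from step 3, might look like this (Gradio UI details kept deliberately simple):
```python
# your_gradio_app.py (sketch): record or upload audio, return the transcript.
# Reuses the AzureOpenAI client construction shown in step 2.
import os

import gradio as gr
from dotenv import load_dotenv
from openai import AzureOpenAI

load_dotenv()
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
)

def transcribe(audio_path: str | None) -> str:
    """Send the recorded file to a GPT-4o-Transcribe deployment (placeholder name)."""
    if audio_path is None:
        return "Please record or upload an audio clip first."
    with open(audio_path, "rb") as f:
        result = client.audio.transcriptions.create(model="gpt-4o-transcribe", file=f)
    return result.text

demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
    outputs="text",
    title="GPT-4o Transcription Demo",
)

if __name__ == "__main__":
    demo.launch()
```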
Developer Impact
By integrating GPT-4o audio models, developers can:
- Easily add transcription and TTS capabilities to apps
- Build responsive, accessible, and voice-driven user experiences
- Leverage customizable voice features for unique interfaces
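As one example of those customizable voice features, here is a GPT-4o-Mini-TTS sketch that applies a custom speech instruction. It reuses the `client` from the setup sketch above; the deployment name, voice, and availability of the `instructions` parameter depend on your deployment and `openai` package version, so treat them as assumptions.
```python
# Text-to-speech sketch with a GPT-4o-Mini-TTS deployment (placeholder name).
# The `instructions` parameter steers speaking style and may require a recent
# version of the openai package.
speech = client.audio.speech.create(
    model="gpt-4o-mini-tts",  # your Azure deployment name
    voice="alloy",            # assumed available voice
    input="Your order has shipped and should arrive tomorrow.",
    instructions="Speak in a warm, upbeat customer-service tone.",
)

# Write the returned audio bytes to disk for playback.
with open("reply.mp3", "wb") as out_file:
    out_file.write(speech.content)
```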
Additional Resources
We encourage developers to explore these new models and share results and feedback to keep improving the platform.
This post appeared first on the Microsoft AI Foundry Blog.