Introducing GPT-4o Audio Models in Microsoft Foundry: A Practical Guide for Developers
Allan Carranza presents a step-by-step developer guide to using the latest GPT-4o audio models on Azure OpenAI via Microsoft Foundry, with practical examples for speech-to-text and TTS integration.
Author: Allan Carranza
Overview
This guide explores the latest additions to Azure OpenAI: GPT-4o-Transcribe, GPT-4o-Mini-Transcribe, and GPT-4o-Mini-TTS, which bring state-of-the-art audio capabilities to transcription and text-to-speech (TTS) use cases.
What’s New in OpenAI’s Audio Models?
- GPT-4o-Transcribe & GPT-4o-Mini-Transcribe: Advanced speech-to-text models that surpass previous benchmarks for accuracy and speed.
- GPT-4o-Mini-TTS: A flexible text-to-speech model supporting custom speech instructions for interactive and accessible applications.
| Feature | GPT-4o-Transcribe | GPT-4o-Mini-Transcribe | GPT-4o-Mini-TTS |
|---|---|---|---|
| Performance | Best Quality | Great Quality | Best Quality |
| Speed | Fast | Fastest | Fastest |
| Input | Text, Audio | Text, Audio | Text |
| Output | Text | Text | Audio |
| Streaming | ✅ | ✅ | ✅ |
| Ideal Use Cases | Accurate transcription for complex audio (e.g., call centers, meetings) | Live captioning, rapid response, budget scenarios | Interactive voice outputs for bots, assistants, accessibility, edu apps |
Technical Innovations
- Targeted Audio Pretraining: Utilizes large-scale, specialized datasets for improved speech understanding.
- Advanced Distillation: Preserves high model performance while reducing size for efficiency.
- Reinforcement Learning: Boosts transcription accuracy, minimizing misrecognition in complex environments.
Getting Started Guide
1. Set Up Azure OpenAI Environment
- Obtain your Azure OpenAI endpoint and API key.
- Authenticate via Azure CLI:
```bash
az login
```
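If you prefer to avoid API keys entirely, the sketch below shows a keyless alternative that reuses your `az login` identity through Microsoft Entra ID. It assumes the `azure-identity` package (not in the dependency list in step 3) alongside the `openai` package, and the endpoint and API-version values are placeholders for the ones you configure in step 2.
```python
# Keyless authentication sketch: reuse the `az login` identity instead of an API key.
# Assumption: the azure-identity package is installed alongside openai.
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# Exchange your Azure CLI sign-in for bearer tokens scoped to Azure AI services.
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default",
)

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder endpoint
    azure_ad_token_provider=token_provider,
    api_version="<api-version>",  # use the version you set in your .env (step 2)
)
```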
2. Configure Project Environment
- Create a `.env` file with your credentials:
```
AZURE_OPENAI_ENDPOINT="your-endpoint-url"
AZURE_OPENAI_API_KEY="your-api-key"
AZURE_OPENAI_API_VERSION="2025-04-14"
```
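A minimal sketch of loading these values with `python-dotenv` and constructing an Azure OpenAI client (assumes the `openai` and `python-dotenv` packages installed in step 3; the variable names match the `.env` above):
```python
# Load the .env created above and construct an Azure OpenAI client.
import os

from dotenv import load_dotenv
from openai import AzureOpenAI

load_dotenv()  # reads the AZURE_OPENAI_* values from .env into the environment

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
)
```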
3. Install Dependencies
- Configure your Python virtual environment and install the essentials:
```bash
uv venv
source .venv/bin/activate   # macOS/Linux
.venv\Scripts\activate      # Windows
uv pip install openai python-dotenv gradio aiohttp
```
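With the client from step 2, a transcription request is a single call. The sketch below assumes a GPT-4o-Transcribe deployment named `gpt-4o-transcribe` and a local `meeting.wav` file; both names are placeholders for your own.
```python
# Transcribe a local audio file with a GPT-4o-Transcribe deployment.
# `gpt-4o-transcribe` and `meeting.wav` are placeholder names.
with open("meeting.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # your Azure deployment name
        file=audio_file,
    )

print(result.text)
```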
4. Deploy and Test Using Gradio
- Launch your Gradio app for audio streaming:
```bash
python your_gradio_app.py
```
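`your_gradio_app.py` is a placeholder; a minimal sketch of such an app, wiring a microphone or uploaded recording to the transcription call from step 3, might look like this (Gradio UI details kept deliberately simple):
```python
# your_gradio_app.py (sketch): record or upload audio, return the transcript.
# Reuses the AzureOpenAI client construction shown in step 2.
import os

import gradio as gr
from dotenv import load_dotenv
from openai import AzureOpenAI

load_dotenv()
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
)

def transcribe(audio_path: str | None) -> str:
    """Send the recorded file to a GPT-4o-Transcribe deployment (placeholder name)."""
    if audio_path is None:
        return "Please record or upload an audio clip first."
    with open(audio_path, "rb") as f:
        result = client.audio.transcriptions.create(model="gpt-4o-transcribe", file=f)
    return result.text

demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
    outputs="text",
    title="GPT-4o Transcription Demo",
)

if __name__ == "__main__":
    demo.launch()
```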
Developer Impact
By integrating GPT-4o audio models, developers can:
- Easily add transcription and TTS capabilities to apps
- Build responsive, accessible, and voice-driven user experiences
- Leverage customizable voice features for unique interfaces
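As one example of those customizable voice features, here is a GPT-4o-Mini-TTS sketch that applies a custom speech instruction. It reuses the `client` from the setup sketch above; the deployment name, voice, and availability of the `instructions` parameter depend on your deployment and `openai` package version, so treat them as assumptions.
```python
# Text-to-speech sketch with a GPT-4o-Mini-TTS deployment (placeholder name).
# The `instructions` parameter steers speaking style and may require a recent
# version of the openai package.
speech = client.audio.speech.create(
    model="gpt-4o-mini-tts",  # your Azure deployment name
    voice="alloy",            # assumed available voice
    input="Your order has shipped and should arrive tomorrow.",
    instructions="Speak in a warm, upbeat customer-service tone.",
)

# Write the returned audio bytes to disk for playback.
with open("reply.mp3", "wb") as out_file:
    out_file.write(speech.content)
```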
Additional Resources
We encourage developers to explore these new models and share results and feedback to keep improving the platform.
This post appeared first on the Microsoft AI Foundry Blog.