Create multimodal AI agents with persistent memory | DEMSP390
Edo Segal demonstrates how to build multimodal AI agents with persistent memory, including a live walkthrough of provisioning Napster as an Azure resource and integrating the agent securely with Azure AI Foundry.
Overview
This Microsoft Build 2026 session focuses on building a working “video AI agent” that can operate across multiple user touchpoints (web, app, store, support) while retaining context over time via persistent memory.
Key themes covered in the session description and chapter outline include:
- Building multimodal agents via an API, including iterating in a “vibe coding” style.
- Provisioning Napster as an Azure Resource and using the Napster Omniagent API.
- Embedding an MCP Server directly in JavaScript to enable local intelligence.
- Secure integration with Azure AI Foundry using minimal roles (least-privilege/RBAC-style approach).
- A demo scenario that queries for products (for example, searching for OLED TVs under $2000).
Session structure (from chapters)
Starting point and motivation
- The session opens with a brief mention of a book exploring AI’s impact on society and future generations.
- It then frames the problem: users interact across many channels, but each interaction often “starts from zero” without shared context.
Building multimodal agents via API
- Edo Segal outlines how to start creating multimodal agents through an API-driven approach.
- The session emphasizes practical code patterns and a “next step” path to build an agent that can live across multiple surfaces.
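To make the "shared context across surfaces" idea concrete, here is a minimal sketch of a memory store keyed by user rather than by channel. Every name in it (`MemoryStore`, `remember`, `recall`, `buildContext`) is hypothetical and is not taken from the Napster Omniagent API. The store is held in memory for brevity, so real persistence would need a durable backing store.

```typescript
// Hypothetical sketch: none of these names come from the Napster Omniagent API.
type Channel = "web" | "app" | "store" | "support";

interface MemoryEntry {
  channel: Channel;
  text: string;
  at: number; // epoch milliseconds
}

class MemoryStore {
  // Keyed by user ID so that every channel reads and writes the same history.
  private entries = new Map<string, MemoryEntry[]>();

  remember(userId: string, channel: Channel, text: string, at: number = Date.now()): void {
    const list = this.entries.get(userId) ?? [];
    list.push({ channel, text, at });
    this.entries.set(userId, list);
  }

  // Returns the most recent `limit` entries, oldest first.
  recall(userId: string, limit: number = 5): MemoryEntry[] {
    const list = this.entries.get(userId) ?? [];
    return limit <= 0 ? [] : list.slice(-limit);
  }
}

// Flattens recent memory into text that could be prepended to an agent request.
function buildContext(store: MemoryStore, userId: string): string {
  return store
    .recall(userId)
    .map((e) => `[${e.channel}] ${e.text}`)
    .join("\n");
}
```

With this shape, a support conversation can start from what the same user did on the web rather than from zero, because both surfaces call `recall` with the same user ID.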
Provisioning and setup on Azure
- The demo includes provisioning Napster as an Azure Resource.
- The walkthrough uses the Napster Omniagent API (“Omni Agent”) as the core agent capability.
- The session also touches on partnership benefits related to streamlined enterprise procurement.
MCP Server embedded in JavaScript
- A highlighted technical point is embedding an MCP Server directly in JavaScript to provide local intelligence.
- The session presents this as a breakthrough: agent behavior can run closer to where the code executes, which opens up new patterns.
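As a rough illustration of what an in-process, MCP-style tool server can look like, the sketch below dispatches JSON-RPC 2.0 requests for the `tools/list` and `tools/call` methods without any network hop. It is a simplified assumption, not the session's implementation and not a complete MCP server: the initialization handshake, tool input schemas, and transports are left out, and `LocalToolServer` and `registerTool` are hypothetical names.

```typescript
// Simplified sketch of MCP-style tool dispatch running in the same process
// as the application. Not a complete MCP server.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string;
  result?: unknown;
  error?: { code: number; message: string };
}

type ToolHandler = (args: Record<string, unknown>) => unknown;

class LocalToolServer {
  private tools = new Map<string, { description: string; handler: ToolHandler }>();

  registerTool(name: string, description: string, handler: ToolHandler): void {
    this.tools.set(name, { description, handler });
  }

  handle(req: JsonRpcRequest): JsonRpcResponse {
    if (req.method === "tools/list") {
      const tools = Array.from(this.tools.entries()).map(([name, t]) => ({
        name,
        description: t.description,
      }));
      return { jsonrpc: "2.0", id: req.id, result: { tools } };
    }
    if (req.method === "tools/call") {
      const tool = this.tools.get(req.params?.name ?? "");
      if (!tool) {
        // -32602 is the JSON-RPC 2.0 "invalid params" code.
        return { jsonrpc: "2.0", id: req.id, error: { code: -32602, message: "Unknown tool" } };
      }
      const output = tool.handler(req.params?.arguments ?? {});
      return {
        jsonrpc: "2.0",
        id: req.id,
        result: { content: [{ type: "text", text: JSON.stringify(output) }] },
      };
    }
    // -32601 is the JSON-RPC 2.0 "method not found" code.
    return { jsonrpc: "2.0", id: req.id, error: { code: -32601, message: "Method not found" } };
  }
}
```

Because `handle` is a plain function call, the agent can invoke tools that live beside the page or app code, which is one plausible reading of "local intelligence" here.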
Secure integration with Azure AI Foundry
- The session calls out secure integration within Azure AI Foundry.
- It specifically mentions doing this with minimal roles, aligning with least-privilege access patterns.
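The least-privilege idea can be sketched as a small selection problem: given the actions an agent actually needs, pick the role that covers them while granting the fewest extras. The role names and action strings below are made up for illustration and are not real Azure role definitions. The roles the session actually assigns are not listed in the source.

```typescript
// Hypothetical role catalog: names and actions are illustrative only,
// not real Azure built-in roles.
interface RoleDef {
  name: string;
  actions: string[];
}

const roles: RoleDef[] = [
  { name: "agent-reader", actions: ["agents/read"] },
  { name: "agent-invoker", actions: ["agents/read", "agents/invoke"] },
  { name: "project-admin", actions: ["agents/read", "agents/invoke", "agents/write", "roles/assign"] },
];

// Picks the role that covers every required action with the fewest grants.
function minimalRole(required: string[], catalog: RoleDef[] = roles): RoleDef | undefined {
  return catalog
    .filter((r) => required.every((a) => r.actions.includes(a)))
    .sort((a, b) => a.actions.length - b.actions.length)[0];
}

// Lists what a role grants beyond what is required: the over-privilege.
function excessActions(role: RoleDef, required: string[]): string[] {
  return role.actions.filter((a) => !required.includes(a));
}
```

For an agent that only needs to read and invoke, `minimalRole` selects `agent-invoker`, while `excessActions` on `project-admin` shows the two extra grants that a broader assignment would carry.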
Demo example
- The chapter list includes a concrete demo query: searching for OLED TVs under $2000.
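The demo query maps naturally onto a filter over a product catalog. The sketch below shows one way such a lookup could work. The `Product` shape, the `searchProducts` function, and the catalog entries are all invented for illustration, and "under $2000" is read as a strict upper bound.

```typescript
interface Product {
  name: string;
  category: string;
  panel: string;
  priceUsd: number;
}

// Made-up catalog entries for illustration only.
const catalog: Product[] = [
  { name: "Example OLED 55", category: "tv", panel: "OLED", priceUsd: 1499 },
  { name: "Example OLED 77", category: "tv", panel: "OLED", priceUsd: 2799 },
  { name: "Example LED 65", category: "tv", panel: "LED", priceUsd: 899 },
];

// Returns TVs with the given panel type priced strictly below the limit,
// cheapest first. Panel matching ignores case.
function searchProducts(items: Product[], panel: string, maxPriceUsd: number): Product[] {
  return items
    .filter(
      (p) =>
        p.category === "tv" &&
        p.panel.toLowerCase() === panel.toLowerCase() &&
        p.priceUsd < maxPriceUsd
    )
    .sort((a, b) => a.priceUsd - b.priceUsd);
}
```

Calling `searchProducts(catalog, "oled", 2000)` on this sample data keeps only the 55-inch entry, since the 77-inch model exceeds the price limit and the LED model has the wrong panel type.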
What you should take away
- A reference architecture for multimodal agents with persistent memory.
- Practical code patterns for building an agent via API.
- A security posture that emphasizes minimal roles when integrating with Azure AI Foundry.
- A clear next step to implement a similar agent across multiple user-facing channels.