Weekly AI Roundup: Agent Runtimes, MCP Auth, and Day-Two Ops
This week's AI updates pushed in two directions: more “agent runtime + tools + governance” building blocks reaching GA, and clearer paths to operationalize them (local models, MCP tool wiring with real auth, and agent-specific observability/grounding patterns that can work in production). It continues last week's “run it like software” framing: stable runtimes, inspectable tool contracts, and day-two controls (cost, identity, evaluation, safety) becoming the default.
Microsoft Foundry: GA agent runtime, SDK 2.0 shifts, and the “cloud + local” story
Foundry's March update reads like consolidation. Foundry Agent Service is now GA and built on the OpenAI Responses API, which keeps agents wire-compatible while adding enterprise needs like private networking (BYO VNet with no public egress, subnet/container injection), Entra RBAC, tracing, and production-focused evaluations/monitoring. It builds on last week's Agent Framework/Copilot Studio direction: orchestration and governance are moving toward platform capabilities (versionable, observable, enforceable) instead of being rebuilt per application.
For Azure AI Projects SDK users, workflows are increasingly centered on AIProjectClient, and SDKs moved to 2.0 stable across Python/JS/TS/Java (with .NET 2.0 shortly after on April 1). That also means migration work: packages and namespaces changed (for example, AIProjectClient.OpenAI → ProjectOpenAIClient in .NET, plus renames in JS/TS and Java). Teams should plan the upgrade rather than treating it as a last-minute patch. This matches last week's theme that stable contracts matter, and now the “contract” is often the SDK surface and Responses API shape.
On models and routing, Foundry's catalog continues to expand. GPT-5.4 (GA) is positioned for production agent reliability (tool calling, fewer mid-run failures, better multi-step stability) with up to 272K context and cached-input pricing. GPT-5.4 Mini (GA) targets high-volume classification/extraction and lightweight tool calls. Additions like Phi-4 Reasoning Vision 15B and Fireworks AI integration for open models/BYOW broaden options for “right model per step” architectures. This fits last week's cost/ops thread: once you route by step, provider choice and model tier become routine engineering settings.
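A "right model per step" setup can be as small as a lookup table. The sketch below is illustrative only: the step kinds and the deployment names ("gpt-5.4", "gpt-5.4-mini") are assumptions standing in for whatever your Foundry project actually deploys.

```python
# Minimal per-step model routing sketch.
# Step kinds and deployment names are hypothetical placeholders.
ROUTES = {
    "classify": "gpt-5.4-mini",   # high-volume classification
    "extract": "gpt-5.4-mini",    # lightweight extraction
    "plan": "gpt-5.4",            # multi-step reasoning and tool calling
}
DEFAULT_MODEL = "gpt-5.4"


def pick_model(step_kind: str) -> str:
    """Return the deployment name for a step, falling back to the default tier."""
    return ROUTES.get(step_kind, DEFAULT_MODEL)


if __name__ == "__main__":
    for kind in ("classify", "plan", "summarize"):
        print(kind, "->", pick_model(kind))
```

Keeping the table in configuration rather than in code is one way to make model tier a routine setting that can change without a redeploy.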
Foundry Local reaching GA completes the “cloud + local” story. It keeps an OpenAI-compatible API while moving inference to device/edge. It follows last week's Foundry Local offline blueprints (RAG vs CAG) by pairing local-first patterns with a supported GA runtime: OpenAI-compatible schemas, local caching/streaming, and acceleration selection. The flow is designed to be straightforward: you add a thin SDK wrapper (JS/Python/.NET/Rust); the build pulls in Foundry Local Core and ONNX Runtime native bits; the first run downloads a hardware-optimized model from the catalog; and later runs work fully offline with caching and token streaming. The runtime chooses acceleration automatically (GPU/NPU with CPU fallback), supports OpenAI-compatible chat completions and audio transcription (plus Open Responses API format), and can run without a local HTTP server unless you enable an OpenAI-compatible endpoint. Platform specifics include WinML on Windows (with OS-managed execution provider/plugin acquisition), Metal on Apple Silicon, and support for Linux/macOS/Windows.
Overall, Foundry's message is less about one new feature and more about “these are the runtimes and SDK shapes to build on”: cloud-hosted agents with production controls plus a local runtime that preserves message schemas and tool calling across deployments. It follows the same continuity thread as last week: standardize runtimes, keep tools and governance enforceable, and make local vs cloud a deployment choice rather than a rewrite.
MCP on Azure: from tool hosting to authentication patterns and self-hosted automation
MCP continued to take shape as the connection layer between agents and real systems, with progress focused on making MCP deployments look like supportable services rather than one-off demos. This builds on last week's MCP “tooling glue” theme (Functions hosting, VS Code loops, governance-aware access). The emphasis is shifting from “you can host a tool” to “here is how you secure it, version it, and run it as a shared capability.”
For UI-capable MCP “Apps” on Azure Functions, there are now two paths. The TypeScript quickstart shows the serverless workflow: define tools via app.mcpTool() and UI resources via app.mcpResource() (text/html;profile=mcp-app), run locally with func start, validate with MCP Inspector, and deploy with azd plus Bicep to /runtime/webhooks/mcp. In .NET, the Functions MCP extension added a fluent API (preview) to hide brittle protocol wiring between tools and UI. AsMcpApp(...) generates the synthetic UI resource function, sets the MIME type, and aligns _meta.ui bindings. It also moves security into code: explicit permissions (for example, clipboard read/write), CSP allowlisting, static asset hosting with source maps excluded by default, and visibility controls to keep UI renderable while hiding tools from model tool selection. This matches last week's “remote HTTPS endpoints plus identity” intent, but it pushes safer defaults up into the framework so teams need less custom glue.
Once tools are hosted, safe calling becomes the next focus. The Foundry integration walkthrough focuses on Functions-hosted MCP servers as reusable backends: one server can serve IDE clients (VS Code, Visual Studio, Cursor, etc.) and Foundry agents using the same tool schema. The main value is the auth decision tree, continuing last week's identity options but with concrete fields: function keys for shared-secret access, Entra managed identity when agents act as themselves (agent identity preferred for production; project managed identity often OK for dev), OAuth passthrough for per-user permissions and auditing, and unauthenticated only for public or dev cases. It also lists the specific Foundry fields (audience/App ID URI, tenant endpoints, scopes like user_impersonation) to reduce portal guesswork.
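The auth decision tree described above can be written down as a small function. This is a sketch under stated assumptions: the enum names, the parameter names, and the order in which the questions are asked are illustrative, not the walkthrough's own code or a Foundry API.

```python
from enum import Enum


class McpAuth(Enum):
    """Auth options named in the walkthrough; identifiers here are hypothetical."""
    FUNCTION_KEY = "function_key"
    AGENT_IDENTITY = "entra_agent_identity"
    PROJECT_MANAGED_IDENTITY = "entra_project_managed_identity"
    OAUTH_PASSTHROUGH = "oauth_passthrough"
    UNAUTHENTICATED = "unauthenticated"


def choose_mcp_auth(
    *,
    per_user_permissions: bool,
    agent_acts_as_itself: bool,
    production: bool,
    public_or_dev_only: bool = False,
) -> McpAuth:
    """Pick an auth mode for a Functions-hosted MCP server (assumed ordering)."""
    if per_user_permissions:
        # Per-user permissions and auditing call for OAuth passthrough.
        return McpAuth.OAUTH_PASSTHROUGH
    if agent_acts_as_itself:
        # Agent identity is preferred in production; project identity is often fine for dev.
        return McpAuth.AGENT_IDENTITY if production else McpAuth.PROJECT_MANAGED_IDENTITY
    if public_or_dev_only and not production:
        # Unauthenticated access is reserved for public or dev scenarios.
        return McpAuth.UNAUTHENTICATED
    # Shared-secret access via function keys is the remaining option.
    return McpAuth.FUNCTION_KEY
```

For example, `choose_mcp_auth(per_user_permissions=False, agent_acts_as_itself=True, production=True)` selects the agent identity option.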
Azure MCP Server 2.0 reaching stable is the other key MCP update. It is an open-source MCP server that exposes Azure operations as structured tools (276 tools across 57 services), with 2.0 focusing on remote/self-hosted deployment. This matches last week's “governed tool surfaces” direction. Instead of each dev running local tool defaults, you can run Azure MCP as a centrally managed internal service with consistent tenant/subscription defaults, telemetry policy, and network boundaries. It supports managed identity (with guidance for Foundry-adjacent setups) and an OBO pattern via OpenID Connect delegation to operate in the signed-in user's context. The release also highlights hardening (endpoint validation, injection protections for query-style tools), container image improvements, and sovereign cloud support. These are day-two details that fit treating tools as production infrastructure.
For Entra-authenticated MCP servers (especially with a pre-authorized client like VS Code), the Entra guide is explicit. Since Entra does not support MCP's CIMD/DCR flows today, the recommendation is pre-registration and pre-authorization: register the MCP server as a protected resource (API), define a delegated scope (for example, user_impersonation), configure requested_access_token_version=2, and add VS Code as a pre_authorized_application to avoid extra consent prompts. FastMCP validates JWTs using Entra public keys (no client secret needed for validation), and middleware captures the user oid so tools can enforce per-user auth/storage. For downstream calls (like Microsoft Graph) in user context, it shows the OBO flow, including admin consent and a managed-identity plus federated identity credential setup for Azure Container Apps. This is the identity glue implied by last week's hosting patterns, now written out as a workable recipe.
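To make the per-user enforcement step concrete, here is a minimal sketch of the claim checks a tool middleware might run after the JWT signature has already been validated. It is not FastMCP's API: the function name, the `AuthError` type, and the example audience value are hypothetical. The claim names (`ver`, `aud`, `scp`, `oid`) are standard in Entra v2 access tokens.

```python
class AuthError(Exception):
    """Raised when verified token claims do not authorize the call."""


def user_oid_from_claims(
    claims: dict,
    *,
    expected_audience: str,
    required_scope: str = "user_impersonation",
) -> str:
    """Check already-verified claims and return the caller's object ID (oid).

    Assumes signature, issuer, and expiry were validated upstream.
    """
    if claims.get("ver") != "2.0":
        raise AuthError("expected a v2 access token")
    if claims.get("aud") != expected_audience:
        raise AuthError("token audience does not match this MCP server")
    # Delegated scopes arrive as a space-separated string in the scp claim.
    if required_scope not in claims.get("scp", "").split():
        raise AuthError(f"missing delegated scope: {required_scope}")
    oid = claims.get("oid")
    if not oid:
        raise AuthError("token has no oid claim")
    return oid


if __name__ == "__main__":
    demo = {"ver": "2.0", "aud": "my-api-client-id", "scp": "user_impersonation", "oid": "user-123"}
    print(user_oid_from_claims(demo, expected_audience="my-api-client-id"))
```

Tools can then key per-user storage or authorization decisions on the returned oid rather than on anything the client supplies in tool arguments.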
- Announcing Azure MCP Server 2.0 Stable Release for Self-Hosted Agentic Cloud Automation
- Give your Foundry Agent Custom Tools with MCP Servers on Azure Functions
- MCP Apps on Azure Functions: Quickstart with TypeScript
- MCP as Easy as 1-2-3: Introducing the Fluent API for MCP Apps
- Building MCP servers with Entra ID and pre-authorized clients
Agent operations in practice: observability, real-time UIs, and guarded automation
As teams move from “an agent that works” to “an agent we can run,” two themes stood out: visibility into agent behavior and explicit control points for risky actions. It continues last week's operational thread (Agent Framework 1.0 plus Copilot Studio GA hooks, plus SRE Agent prerequisites/billing). Once agents touch real systems, you need traceability, approval gates, and cost/usage visibility built into the loop.
For observability, Application Insights' new Agents view (preview) shows the platform adapting to agent-focused telemetry. The walkthrough instruments a .NET multi-agent “travel planner” on Azure App Service with OpenTelemetry GenAI semantic conventions so App Insights can answer agent-shaped questions: token usage per agent, per-agent latency/error rate, and end-to-end traces across an API plus WebJob split. The Agents view requires the right GenAI attributes (for example, gen_ai.agent.name). It demonstrates two instrumentation layers: Microsoft.Extensions.AI for LLM-call spans (token/model/provider attrs) and Microsoft Agent Framework (MAF) for agent-identity spans used for per-agent grouping. It also calls out a tradeoff: double instrumentation can duplicate spans, so you may choose identity grouping vs token detail depending on what you need most. This matches last week's “inspectable and governable” stance: instrumentation is part of system design, not an afterthought.
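A small sketch shows why per-agent grouping depends on the agent attribute being present. It aggregates token usage from exported spans represented as plain dictionaries. The span shape is an assumption for illustration, and the `gen_ai.usage.*` attribute names follow the OpenTelemetry GenAI semantic conventions as I understand them, so check them against the version you instrument with.

```python
from collections import defaultdict


def tokens_per_agent(spans: list[dict]) -> dict[str, dict[str, int]]:
    """Sum input/output tokens per agent from span attribute dictionaries."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: {"input": 0, "output": 0})
    for span in spans:
        attrs = span.get("attributes", {})
        agent = attrs.get("gen_ai.agent.name")
        if agent is None:
            # A span without agent identity cannot be attributed to any agent.
            continue
        totals[agent]["input"] += attrs.get("gen_ai.usage.input_tokens", 0)
        totals[agent]["output"] += attrs.get("gen_ai.usage.output_tokens", 0)
    return dict(totals)


if __name__ == "__main__":
    spans = [
        {"attributes": {"gen_ai.agent.name": "flights", "gen_ai.usage.input_tokens": 120, "gen_ai.usage.output_tokens": 30}},
        {"attributes": {"gen_ai.agent.name": "flights", "gen_ai.usage.input_tokens": 80, "gen_ai.usage.output_tokens": 20}},
        {"attributes": {"gen_ai.usage.input_tokens": 500}},  # no agent name: skipped
    ]
    print(tokens_per_agent(spans))
```

The third span illustrates the tradeoff mentioned above: an LLM-call span with token detail but no agent identity contributes nothing to a per-agent view.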
For making behavior visible and controllable, the AG-UI plus MAF demo streams multi-agent execution events to a live frontend over SSE so users can see which agent is active, what step is running, and why it is waiting. This echoes last week's streaming patterns in Foundry Local samples. Production agents need explicit progress/state signaling. The backend uses an explicit handoff graph (declared routing edges) and interrupts for user-info requests and human approval of sensitive tools. Marking tools with approval_mode="always_require" forces an interrupt instead of execution, and the React UI renders an approval modal with tool name/args before resuming. This “declared topology plus interruptible tools plus real-time events” matches last week's versioned orchestration and moderated tool integration, expressed as runtime plus UX.
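The "interruptible tools plus real-time events" idea can be sketched without any framework. The code below is not the Agent Framework or AG-UI API: the `Tool` and `ApprovalRequired` types, the `call_tool` helper, the event name, and the "never_require" default are hypothetical. Only the "always_require" value comes from the demo, and the wire format follows the standard Server-Sent Events framing.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    fn: Callable[..., Any]
    approval_mode: str = "never_require"  # illustrative default


@dataclass
class ApprovalRequired:
    """Returned instead of a result when a human must approve the call."""
    tool_name: str
    args: dict


def call_tool(tool: Tool, args: dict, *, approved: bool = False) -> Any:
    """Run the tool, or interrupt with an approval request if it is gated."""
    if tool.approval_mode == "always_require" and not approved:
        return ApprovalRequired(tool.name, args)
    return tool.fn(**args)


def sse_event(event: str, data: dict) -> str:
    """Format one Server-Sent Events message (event name plus JSON payload)."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


if __name__ == "__main__":
    book = Tool("book_flight", lambda dest: f"booked {dest}", approval_mode="always_require")
    outcome = call_tool(book, {"dest": "Oslo"})
    if isinstance(outcome, ApprovalRequired):
        # The frontend would render this as an approval modal with tool name and args.
        print(sse_event("approval_required", {"tool": outcome.tool_name, "args": outcome.args}), end="")
    print(call_tool(book, {"dest": "Oslo"}, approved=True))
```

Returning an interrupt value rather than raising keeps the pending call's name and arguments available to stream to the UI, which can resume the same call once the user approves.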
For day-two ops, the Well-Architected operational excellence discussion reinforced a practical sequence: start with observability (OpenTelemetry), then use AI to summarize incidents and suggest next steps, and only move toward automation once guardrails, evaluation, and human-in-the-loop controls exist for high-impact actions. This aligns with last week's SRE Agent framing: the constraint is not whether an agent can act, but whether you can prove what happened, limit blast radius, and sustain it on-call.
- Monitor AI Agents on App Service with OpenTelemetry and the New Application Insights Agents View
- Building a Real-Time Multi-Agent UI with AG-UI and Microsoft Agent Framework Workflows
- Use AI to Achieve Operational Excellence with the Well-Architected Framework practices
Other AI News
Data and analytics teams got two options for making data easier to use with agents without adding a separate AI pipeline layer. Fabric Data Warehouse introduced built-in AI functions (preview) that run directly in T-SQL for JSON extraction (ai_extract), sentiment (ai_analyze_sentiment), classification (ai_classify), summarization/translation/grammar fixes, and a prompt-based escape hatch (ai_generate_response) that you can wrap in UDFs/stored procedures to standardize prompts. This extends last week's Fabric “trusted data for AI” story by moving agent-enablement into the governed warehouse surface where permissions, lineage, and operational controls already exist.
A related “intelligence platforms” guide described an enterprise agent architecture: unify access with OneLake, enforce meaning/permissions with Fabric semantic models (measures/RLS), expose governed NL querying via Fabric Data Agents (preview), and connect that to Azure AI Foundry agents as a reusable tool, optionally enriched with Microsoft Graph context. It continues last week's governance-and-metadata theme: grounding tends to be more reliable when it is a permission-aware tool call over curated semantics, rather than copied text in prompts.
- Working with unstructured text in Fabric Data Warehouse with built-in AI functions (Preview)
- Why data platforms must become intelligence platforms for AI agents (with Microsoft Fabric + Azure AI Foundry)
Legacy modernization got a concrete agent example. An IIS migration guide showed using an MCP server to orchestrate Microsoft's IIS-to-App-Service migration scripts with human approvals, producing artifacts like install.ps1, adapter ARM templates, and MigrationSettings.json. It highlights Managed Instance on App Service specifics (PremiumV4 with IsCustomMode=true, plus OS dependencies like COM, MSMQ/SMTP, registry, drive-letter storage). This is a practical “MCP as governed automation interface” pattern: wrap existing scripts as typed tools, host them remotely, and require explicit approval before provisioning billable resources. That is similar to last week's remote-only MCP endpoints and identity boundaries.
- Agentic IIS Migration to Managed Instance on Azure App Service
On developer workflow, a short Cozy AI Kitchen episode showed design-to-code MCP: wiring a Figma MCP server into VS Code so design artifacts become structured context. The practical goal is fewer handoff cycles by referencing the real source of truth (Figma components/layout metadata) instead of text descriptions, reinforcing the wider shift across these roundups toward tools over prompts.
- Setting Up Figma MCP Server in VS Code