Weekly AI Roundup: Production-ready agents, MCP tools, Fabric AI

This week's AI updates focused less on feature demos and more on making agent systems easier to run. Microsoft moved Azure AI Foundry's agent runtime into GA with enterprise networking, identity, and evaluation hooks; MCP kept showing up as the tool-wiring layer; and Fabric continued to blend analytics and AI app building with more multimodal, real-time, and Copilot-driven workflows. Overall, it feels like a continuation of last week's “run it like software” focus (approval gates, sandboxing, OpenTelemetry, structured outputs): more of those patterns are arriving as defaults (private networking, managed identity options, continuous eval, and tool connectivity without bespoke glue).

Azure AI Foundry Agents move into production: GA runtime, private networking, voice, and built-in evaluation

Foundry Agent Service reached GA with a runtime built on the OpenAI Responses API, aiming to be wire-compatible for teams already aligned to Responses/Agents-style interfaces. Last week's “treat agents like deployable software” theme shows up here as consolidation: rather than assembling orchestration, auth, and telemetry by hand (like last week's Agent Framework + Foundry examples), Foundry is standardizing how agents are created, invoked, and governed.

For Python, GA also tightens SDK guidance: agents become first-class operations on AIProjectClient in azure-ai-projects, with an explicit migration off azure-ai-agents (remove the pin, depend on azure-ai-projects, and call agents via AIProjectClient.get_openai_client() and responses.create(..., extra_body={"agent_reference": ...})).

The most practical production change is end-to-end private networking with a “Standard Setup” that keeps agent traffic off the public internet, which matters once agents start doing retrieval and tool calls. It extends last week's “secure execution + governed gateways” theme: last week emphasized Foundry as a controlled model gateway plus approval and sandbox patterns at the tool layer, and this week adds the network boundary so models, retrieval, and tools can stay on private paths. Microsoft says this covers model runtime traffic and tool connectivity, including MCP servers, Azure AI Search indexes, and Fabric data agents, enabling a BYO VNet design without public egress for the workflow.

Identity and tool access also became more enterprise-shaped. Foundry expanded MCP authentication patterns: key-based access, Entra Agent Identity, Entra Foundry Project Managed Identity (project-level isolation), and OAuth identity passthrough for user-delegated access. This connects to last week's least-privilege and approvals theme: when tools are real actuators (deployments, incidents, repo ops), these identity modes keep “agent can do things” from defaulting to shared secrets and broad access.

On interaction modes, Voice Live arrived as a managed, real-time speech-to-speech channel using the same agent runtime as text. Voice does not change the core problems (tool calling, tracing, approvals, compaction), but it does make latency and reliability requirements stricter. Having one runtime surface with shared tracing, evaluation, and cost accounting helps avoid building a separate voice stack that is hard to monitor.

Foundry Evaluations are now GA, with built-in evaluators (fluency/coherence, relevance, groundedness, retrieval quality, safety), custom evaluators, and continuous evaluation sampling of production traffic. This is the platform counterpart to last week's operational loops (OpenTelemetry/Aspire dashboards, azd debugging, Fabric telemetry pipelines): instead of per-app eval harnesses, quality checks become continuous and viewable alongside latency, failures, and cost in Azure Monitor / Application Insights.
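
As a rough illustration of the Python migration path described in this section (get an OpenAI-compatible client from AIProjectClient, then pass an agent reference through extra_body), the call shape might look like the sketch below. The helper name invoke_agent is hypothetical, and the agent reference is treated as an opaque value supplied by the caller because its exact shape is not given here. A stub client stands in for the object that AIProjectClient.get_openai_client() would return, so the snippet runs without Azure credentials.

```python
from types import SimpleNamespace
from typing import Any


def invoke_agent(openai_client: Any, agent_reference: Any, user_input: str) -> Any:
    """Send one request to an agent through a Responses-style client.

    `openai_client` is assumed to be what AIProjectClient.get_openai_client()
    returns. `agent_reference` is passed through untouched because its exact
    shape is not described in this roundup.
    """
    return openai_client.responses.create(
        input=user_input,
        extra_body={"agent_reference": agent_reference},
    )


class _StubResponses:
    """Stand-in for `client.responses` that records calls (illustration only)."""

    def __init__(self) -> None:
        self.calls: list = []

    def create(self, **kwargs: Any) -> dict:
        self.calls.append(kwargs)
        return {"output_text": "stubbed reply"}


# Placeholder values: a real agent reference comes from your Foundry project.
stub_client = SimpleNamespace(responses=_StubResponses())
result = invoke_agent(stub_client, {"placeholder": "agent-ref"}, "Summarize open incidents")
```

Keeping the call behind a small helper like this means the same test code can run against a stub in CI and against the real client in an environment with project credentials.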

MCP as the agent tool layer: remote servers in Foundry, managed observability gateways, and a C# SDK v1.0

MCP kept moving from “developer curiosity” to product integration, continuing last week's story of MCP as a bridge between agent frameworks and external systems. This week's tone is less “how to wire MCP” and more “here are MCP endpoints you can use,” which is typically when protocols become operationally relevant.

Microsoft Foundry added a remote Azure DevOps MCP Server (public preview), letting Foundry agents connect to an Azure DevOps org via the tool catalog and call DevOps operations through MCP. It fits last week's operational-agent direction (investigation, PR-shaped fixes): DevOps is where “agent does work” becomes risky without boundaries. A key control is restricting which DevOps tools the agent can use, which helps prevent early experiments from turning into “agent can access everything,” echoing last week's approval-gated tools and structured outputs.

Azure Managed Grafana MCP takes a different angle. Instead of deploying and securing a custom MCP server to expose telemetry, every Azure Managed Grafana instance can provide a built-in managed remote MCP endpoint. This connects to last week's point that agents need day-two loops (telemetry, debugging, evaluation): MCP makes the observability estate queryable by agents while still using Azure RBAC and Grafana access controls. The approach is straightforward: authenticate with managed identity and let agents query Azure Monitor, Application Insights, and Kusto-backed sources without adding another hosted service.

For .NET teams, the MCP C# SDK v1.0 shipped with a community standup walkthrough focused on MCP as a vendor-neutral contract for exchanging context, requests, and responses. This matches last week's polyglot MCP tool-server hint: Python ecosystems help, but a supported .NET SDK makes it easier to standardize tool wiring across enterprise services without tying everything to one runtime.

The ecosystem conversation remains noisy (GitHub's “The Download” jokes about an “MCP funeral”), but Microsoft shipped multiple concrete MCP updates in the same week. Alongside last week's repeated MCP appearances in Agent Framework + Foundry designs, the developer takeaway is that MCP is increasingly a practical option for tool connectivity, with identity, governance, and hosted endpoints becoming less DIY. Portability and cross-vendor compatibility still deserve attention.
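
To make the “restrict which tools the agent can use” idea from this section concrete, here is a minimal client-side sketch. It builds an MCP tools/call request (MCP messages are JSON-RPC 2.0) only when the tool name is on an allow-list. The function name and the tool names are hypothetical, and Foundry's own restriction is configured in the tool catalog rather than in code like this, so treat it as an illustration of the principle only.

```python
import json
from itertools import count

_request_ids = count(1)  # simple monotonically increasing JSON-RPC ids


def build_tool_call(tool_name: str, arguments: dict, allowed_tools: set) -> str:
    """Return a JSON-RPC 2.0 `tools/call` request, or raise if the tool is blocked."""
    if tool_name not in allowed_tools:
        raise PermissionError(f"tool {tool_name!r} is not in the agent's allow-list")
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical tool names; a real server advertises its own via `tools/list`.
ALLOWED = {"list_work_items", "get_pull_request"}
payload = json.loads(build_tool_call("list_work_items", {"project": "demo"}, ALLOWED))
```

Calling build_tool_call with a name outside ALLOWED raises PermissionError before any request is serialized, which is the behavior you want for early experiments with write-capable tools.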

Knowledge-grounded agents with Foundry IQ: permission-aware retrieval via Azure AI Search + MCP

Foundry IQ aims to make “enterprise RAG” less bespoke by formalizing reusable knowledge bases over multiple sources (SharePoint, OneLake, Blob Storage, Azure AI Search, and more). It extends last week's pattern of moving context out of prompts and into governed systems (Azure SRE Agent “Deep Context,” Fabric telemetry-as-data-plane). Instead of each agent writing retrieval glue, IQ treats knowledge access as a platform component called through a standard tool surface.

The tutorial shows how this connects to Foundry Agent Service via MCP: the agent calls knowledge_base_retrieve exposed through an Azure AI Search endpoint, using a preview API path like /knowledgebases/{kb-name}/mcp?api-version=2025-11-01-preview.

The parts that matter most for developers are security and ops patterns. Retrieval is permission-aware: ACLs can sync into the index and be enforced at query time so the agent retrieves only what the current user is allowed to see, with citations generated as part of retrieval. This matches last week's “structured outputs + approvals + least privilege” theme: production RAG is mostly access control, attribution, and repeatability. The sample also shows Entra ID + RBAC setup instead of keys: enable RBAC auth on Azure AI Search, grant project managed identity Search Index Data Reader, create an Azure AI Projects project connection as a RemoteTool target authenticated with Project Managed Identity, then attach an MCPTool and require tool usage plus citations in instructions.

Microsoft also announced a three-episode “IQ Series: Foundry IQ” starting March 18, 2026, with videos, notebooks, and cookbooks aimed at taking teams from concepts to multi-source knowledge bases and queries. The message is that retrieval is becoming a reusable platform surface (sources/bases + MCP endpoint + RBAC/ACL), not app-specific glue.
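
The preview path quoted above is easy to get subtly wrong when assembled by hand (trailing slashes, unescaped names). A small helper along these lines can build it from a search service endpoint and a knowledge base name. The function name, the example service host, and the knowledge base name are placeholders. The path template and api-version come from the tutorial path above and may change while the API is in preview.

```python
from urllib.parse import quote, urlencode

API_VERSION = "2025-11-01-preview"  # preview version quoted in the tutorial path


def knowledge_base_mcp_url(search_endpoint: str, kb_name: str) -> str:
    """Build <endpoint>/knowledgebases/<kb-name>/mcp?api-version=<version>."""
    base = search_endpoint.rstrip("/")  # tolerate a trailing slash on the endpoint
    path = f"/knowledgebases/{quote(kb_name, safe='')}/mcp"
    return f"{base}{path}?{urlencode({'api-version': API_VERSION})}"


# Placeholder service and knowledge base names.
url = knowledge_base_mcp_url("https://example-search.search.windows.net/", "hr-policies")
```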

Building, testing, and operating agents: VS Code AI Toolkit, azd local run/invoke, and resilient long-running runs

Agent tooling posts kept shortening the path from prototype to repeatable testing, continuing last week's day-two operability thread (azd status/logs, Aspire dashboards, production harnesses). The VS Code AI Toolkit + Foundry walkthrough shows an end-to-end workflow: start in an Agent Builder UI (assemble agent, attach tools, iterate in playground, ground on business data), then move to a code-first hosted template where you add custom Python functions, debug locally, deploy, run evals, do AI red teaming, and monitor quality/latency/cost across a fleet. It pulls last week's production patterns into an IDE workflow so more teams can apply them consistently.

On CLI workflows, the Azure Developer CLI extension azure.ai.agents (v0.1.14-preview) adds azd ai agent run and azd ai agent invoke. This builds on last week's hosted-agent visibility commands: run and invoke are what make regression testing and troubleshooting scriptable. The run command detects project type (Python/Node.js) and dependencies before launching. The invoke command supports streaming responses and persistent session IDs so multi-turn testing does not require manual state handling. Both fit prompt testing scripts and CI-like agent checks, especially with Foundry Evaluations now GA.

For reliability, Microsoft Agent Framework added background responses for long-running operations without holding client connections open, especially for reasoning models that can take minutes. This extends last week's long-session concerns (compaction, dynamic sessions, latency/cost control) into a resumable job model. In .NET, enable AllowBackgroundResponses; in Python, set background=True. If the backend supports it, you get a continuation token that you can poll (non-streaming) or persist during streaming so that, after a disconnect, you can resume from the interruption point. The advice to persist continuation tokens (DB/cache) is a sign that agent runs increasingly behave like durable workflows with checkpoints, retries, and reattachment.
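
The polling side of that continuation-token model can be sketched generically. The code below does not use the Agent Framework API: poll and save_token are hypothetical callables standing in for “ask the service for status using a token” and “write the latest token to a DB or cache”. The point is only the control flow: persist the newest token before waiting, so another process could reattach if this one dies.

```python
import time
from typing import Callable, Optional, Tuple

# poll(token) -> (next_token, result); next_token is None once the run has finished.
PollFn = Callable[[str], Tuple[Optional[str], Optional[str]]]


def resume_until_done(
    token: str,
    poll: PollFn,
    save_token: Callable[[Optional[str]], None],
    interval_s: float = 0.0,
    max_polls: int = 100,
) -> str:
    """Poll a background run to completion, persisting the latest token each round."""
    for _ in range(max_polls):
        next_token, result = poll(token)
        save_token(next_token)  # persist first so a crash here is recoverable
        if next_token is None:
            return result or ""
        token = next_token
        time.sleep(interval_s)
    raise TimeoutError("background run did not finish within max_polls")


# Fake service for illustration: finishes on the third poll.
_steps = iter([("tok-2", None), ("tok-3", None), (None, "final answer")])
saved: list = []
answer = resume_until_done("tok-1", lambda _token: next(_steps), saved.append)
```

In a real deployment save_token would write to durable storage keyed by session, and a restarted worker would read the last saved token and call resume_until_done again.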

Agent patterns in practice: Python agent workflows and realtime voice multi-agent systems in .NET

Two longer pieces focused on architecture patterns, mapping closely to last week's “multi-agent composition + observability + evaluation + approvals” blueprint. Pamela Fox's recap of the “Python + Agents” livestream series is a Microsoft Agent Framework curriculum: tool calling → MCP servers → supervisor/subagent patterns → RAG with SQLite/PostgreSQL → memory with Redis/Mem0 → OpenTelemetry via Aspire dashboards → evaluation with Azure AI Evaluation SDK → branching/fan-out/fan-in workflows → human approvals and checkpoint/resume for long-running work. It turns last week's “real apps” examples into a repeatable build approach, including the same ops loops (OTel/Aspire, evaluation SDK) that both weeks keep reinforcing.

On the .NET side, RT.Assistant is a reference for low-latency voice assistants using OpenAI's Realtime API over WebRTC, orchestrated in F#. It complements this week's Foundry Voice Live: both treat voice as a first-class modality, but RT.Assistant focuses on runtime details (WebRTC/Opus, message bus), while Voice Live emphasizes a managed channel with evaluation and tracing. RT.Assistant also makes multi-agent behavior more predictable with a deterministic state machine (“Flow”) and a strongly typed message bus using F# discriminated unions, similar in spirit to last week's structured schemas and explicit handoffs for automatable outputs. It argues for the efficiency of WebRTC + Opus over base64-encoded PCM sent through WebSockets, and shows “structured RAG” where an LLM generates Prolog queries executed against a local KB (Tau Prolog) instead of embeddings/vector search.
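
A back-of-the-envelope calculation shows why the WebRTC + Opus argument carries weight. The numbers below are assumptions chosen for illustration, not figures from RT.Assistant: 16-bit mono PCM at 24 kHz for the WebSocket path and a 32 kbps Opus stream for the WebRTC path. Only the base64 ratio is fixed: it emits 4 characters for every 3 input bytes.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int = 16, channels: int = 1) -> float:
    """Raw PCM bitrate in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def base64_bitrate_kbps(raw_kbps: float) -> float:
    """Bitrate after base64 encoding (4 output characters per 3 input bytes)."""
    return raw_kbps * 4 / 3


# Assumed values for illustration only.
ASSUMED_SAMPLE_RATE_HZ = 24_000
ASSUMED_OPUS_KBPS = 32.0

raw = pcm_bitrate_kbps(ASSUMED_SAMPLE_RATE_HZ)  # 384.0 kbps of raw PCM
encoded = base64_bitrate_kbps(raw)              # 512.0 kbps once base64-encoded
ratio = encoded / ASSUMED_OPUS_KBPS             # 16.0x the assumed Opus stream
```

The estimate ignores JSON framing and transport overhead on both sides, so it is a lower bound on the WebSocket payload, not a benchmark.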

Microsoft Fabric as an AI execution surface: multimodal functions, Copilot-in-notebooks, real-time intelligence, and pipeline automation

Fabric's AI story kept moving from individual features to an integrated execution surface for analytics plus AI, extending last week's Fabric thread (operational telemetry/eval loops, ontology automation, schema-controlled extraction). This week mostly deepens that direction with more modalities, more cost and operability surfaces, and more ways to push outputs into governed pipelines instead of one-off notebooks.

Fabric AI Functions added multimodal input (preview), so notebooks and Dataflows Gen2 can process images and PDFs (and common text formats) by passing file paths (including column_type="path"). It continues last week's ExtractLabel theme: turn unstructured inputs into pipeline-friendly structured outputs, with schemas as the contract for downstream reliability. Helpers like aifunc.load() (folder-to-table with optional prompt/schema), aifunc.list_file_paths(), and ai.infer_schema() shorten the path from files to reproducible extraction via ai.extract().

Operability also improved: a progress bar estimating tokens and Capacity Units (CUs), and clearer capacity attribution in the Fabric Capacity Metrics App under “AI Functions.” Evaluation notebooks for LLM-as-judge loops (executor + judge models, with precision/recall/F1/coherence) aim to reduce ad-hoc iteration and complement last week's “reuse telemetry as eval data” guidance.

In notebooks, Fabric previewed an updated Copilot for data engineering/science with always-on context awareness (workspace, Lakehouse, notebook structure, runtime state), Spark performance recommendations based on observed behavior (joins, shuffles), and a “Fix with Copilot” loop that captures failure context, proposes patches, and applies them via diff review, plus a /Fix command for a cell or whole notebook. It is a notebook-native version of last week's debugging and operability push: close the loop where people iterate.

Beyond notebooks, Fabric continued pushing real-time and operational AI via Real-Time Intelligence + Fabric IQ (ontology), building on last week's Ontology Rules + Activator “Observe → Analyze → Decide → Act” loop. OneLake ties to Eventstream/Eventhouse/Activator/real-time dashboards with a semantics layer so teams (and agents) interpret events consistently. Developer callouts include Maps GA, Business Events for semantic detection/triggers, Fabric Graph scaling and GQL updates (including shortest-path), and an Eventstream SQL Operator for SQL-based streaming transforms/routing (early April). Microsoft also announced a Microsoft-NVIDIA Omniverse direction (private preview planned in April) to embed 3D scenes into real-time dashboards for digital-twin/physical AI scenarios.

Fabric's pipeline tooling also advanced: Data Factory pipelines added a Lakehouse Utility Suite (preview) with Lakehouse Maintenance activity and Refresh SQL Endpoint activity, while Copilot in the pipeline expression builder is GA for natural-language-to-expression authoring. It is a practical follow-on: once AI extraction and agent signals land in OneLake, you still need scheduled maintenance and automation to keep tables and endpoints healthy.
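
Returning to the LLM-as-judge evaluation notebooks mentioned earlier in this section: how those notebooks compute their metrics is not described here, but the precision/recall/F1 part can be sketched in plain Python. The sketch assumes each evaluated item has already been reduced to a pair of booleans, (executor said positive, judge said positive), with the judge's verdict treated as the reference label. The function name and the sample data are illustrative.

```python
from typing import Iterable, Tuple


def precision_recall_f1(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from (predicted, reference) boolean pairs."""
    tp = fp = fn = 0
    for predicted, reference in pairs:
        if predicted and reference:
            tp += 1  # executor and judge agree on a positive
        elif predicted:
            fp += 1  # executor said positive, judge disagreed
        elif reference:
            fn += 1  # executor missed something the judge marked positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative data: 2 true positives, 1 false positive, 1 false negative, 1 true negative.
sample = [(True, True), (True, False), (False, True), (True, True), (False, False)]
precision, recall, f1 = precision_recall_f1(sample)
```

With this sample, all three metrics come out to 2/3, which is a handy sanity check when wiring the same calculation into a notebook.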

Other AI News

Fabric's broader analytics/AI update mixed GA and preview items: Materialized Lake Views GA for incremental, quality-constrained lakehouse transforms; Runtime 2.0 preview moving toward Spark 4.0 / Delta Lake 4.0; new connectivity (JDBC GA, Spark ODBC and ADO.NET preview); and warehouse updates like compute isolation (Custom SQL Pools preview), freshness/stats automation, and AI functions callable from T-SQL. The roundup also introduced open-source “Agent Skills for Fabric” for GitHub Copilot CLI to scaffold and automate Fabric tasks from natural language, similar to last week's Azure Skills direction but focused on Fabric operations.

Fabric Mirroring added opt-in paid “extended capabilities” (preview): Delta Change Data Feed into OneLake and Mirroring Views (Snowflake preview) to materialize source views as Delta tables, supporting incremental processing without custom change tracking. This matches last week's governed data plane theme: keeping OneLake current helps downstream analytics, quality loops, and automation triggers stay aligned.

Fabric previewed Planning in Fabric IQ for budgeting/forecasting/scenarios on governed data and Power BI semantic models, with SQL writeback plus approval/audit/RBAC hooks. It is useful context for end-to-end operational analytics systems and another example of “semantic layer + governed actions,” consistent with last week's Ontology Rules theme.

Microsoft introduced MAI-Image-2, a text-to-image model focused on photorealism and more reliable in-image typography, with testing via MAI Playground and broader API access expected via Foundry.

For Foundry model optimization, Microsoft published videos on supervised fine-tuning, improving tool-calling accuracy (synthetic data and distillation), and post-training workflows (custom graders, evaluation, cost planning, deployment). The emphasis on graders, evaluation, and cost planning matches last week's evaluation-loop focus and this week's Foundry Evaluations GA.

For teams hosting inference stacks, AKS “inference at scale” guidance covered tensor/pipeline/data parallelism tradeoffs, quantization-first advice, Ray placement groups (Anyscale on Azure) for shard-aware scheduling, and production security posture (private clusters, Cilium policy, Entra ID + managed identities, Key Vault), plus core metrics (tokens/sec/GPU, tail latency, KV cache hit rate, tokens per GPU-hour). It fits the broader arc: private networking, managed identity, and measurable ops loops become baseline expectations when AI moves from prototypes to services.
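
Several of the core metrics listed above are simple ratios over counters that most serving stacks already expose. The sketch below shows one way to derive them from a measurement window. The WindowStats fields and the sample numbers are hypothetical, and the AKS guidance itself may define or collect these metrics differently.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Counters collected over one measurement window (hypothetical field names)."""
    tokens_generated: int
    window_seconds: float
    gpu_count: int
    kv_cache_hits: int
    kv_cache_lookups: int


def tokens_per_sec_per_gpu(s: WindowStats) -> float:
    return s.tokens_generated / s.window_seconds / s.gpu_count


def tokens_per_gpu_hour(s: WindowStats) -> float:
    # Same rate as above, scaled from seconds to hours.
    return tokens_per_sec_per_gpu(s) * 3600


def kv_cache_hit_rate(s: WindowStats) -> float:
    return s.kv_cache_hits / s.kv_cache_lookups if s.kv_cache_lookups else 0.0


# Made-up numbers: 1.44M tokens in 60 s across 8 GPUs, 750 cache hits in 1000 lookups.
stats = WindowStats(1_440_000, 60.0, 8, 750, 1000)
rate = tokens_per_sec_per_gpu(stats)
hourly = tokens_per_gpu_hour(stats)
hit_rate = kv_cache_hit_rate(stats)
```

Dividing cost per GPU-hour by tokens per GPU-hour then gives a cost-per-token figure that can be tracked alongside tail latency.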