Weekly AI Roundup: IaC Agents, Local Endpoints, Governed Context

Mar 30, 2026 by TechHub

This week's AI updates tracked two parallel themes: shipping agents into production with repeatable workflows and governance, and adopting more local-first, inspectable patterns for building and operating AI systems. Across Azure AI Foundry, Foundry Local, and Microsoft Fabric, the common thread was making agent behavior easier to deploy, ground, observe, and control via IaC scaffolding, structured tool plans, ontology/graph grounding, and cost guardrails. This continues last week's “run it like software” arc: last week delivered GA runtimes, private networking, managed identity, evaluation hooks, and MCP tooling glue; this week shows how teams ship and operate those ideas (IaC-first delivery, offline OpenAI-style endpoints, and more traceable retrieval/reasoning).

This Week's Overview

Azure AI Foundry & Foundry Local: agent delivery from cloud endpoints to offline workflows

Foundry updates covered both ends of deployment: “code-to-cloud” publishing of agent endpoints and offline/on-device systems that still look like OpenAI-compatible apps. If last week was about the agent runtime being ready for production (GA runtime, private networking, evaluation, managed identities), this week focused on making that posture reproducible from a repo and extending “treat it like a service” to local endpoints. On the cloud side, the Azure Developer CLI added a direct path from a Python agent repo to a live Azure AI Foundry agent endpoint via the azd ai agent extension. Building on last week's azd ai agent run/invoke, it now covers deployment with infra/identity defaults that match the governance patterns we've been tracking (managed identity + RBAC, scripted flows for CI/CD). The workflow is intentionally opinionated: azd ai agent init scaffolds Bicep IaC (notably infra/main.bicep), azure.yaml, and agent.yaml for metadata/env vars. Then azd up provisions Foundry, deploys a model config (GPT-4o example), configures managed identity + RBAC, and publishes the endpoint with a portal link for playground validation. The inner loop includes azd ai agent invoke (multi-turn), azd ai agent run (local execution through the same flow), azd ai agent monitor/--follow for logs, and azd down for cost cleanup. The optional chat frontend wiring highlights using azd env get-values/set to keep app<→agent connections repeatable across environments and CI/CD (for example, GitHub Actions running azd up on main), complementing last week's focus on repeatable evaluation/monitoring loops. Foundry Local also matured as the offline counterpart, with examples that treat “local LLM runtime” as a dependency you operate rather than a demo shortcut. The OpenAI-compatible endpoint detail continues last week's wire-compatibility thread: whether you use cloud Foundry or Foundry Local, client code can stay stable while you swap endpoints/environments. One guide shows a multi-agent robotics automation pipeline that keeps the LLM away from direct simulator control using a constrained contract: “LLM → strict JSON plan → safety validation → executor.” This matches last week's themes (structured outputs, approvals, least-privilege tools) but in a local control loop where safety/determinism matter more. Foundry Local exposes an OpenAI-compatible endpoint, so the main client change is the base URL; the example uses FoundryLocalManager and models like qwen2.5-coder-0.5b with automatic backend selection (CUDA GPU → QNN NPU → CPU). Agents are split cleanly: PlannerAgent emits JSON tool calls, SafetyAgent validates schema/bounds in sub-millisecond time, and ExecutorAgent runs PyBullet behaviors (IK movement, pick/place, scene description). It also includes offline voice commands (browser MediaRecorder, 16kHz mono WAV resampling, server-side ONNX Whisper with caching/chunking) feeding the same flow, which is useful for hands-free or low-latency local control. Compared with last week's Foundry Voice Live, it's a contrast between cloud real-time voice and local capture/transcription, with the same need for traceable plans and safety gates. Model timing comparisons (sub-5s on the smallest vs ~35-45s on larger ones) make the interactive tradeoffs concrete. A second Foundry Local tutorial applies the OpenAI-compatible runtime to an offline RAG assistant (“Interview Doctor”), using a deliberately lightweight retrieval approach. Instead of embeddings, it chunks docs (~200 tokens + overlap), stores term-frequency vectors in SQLite (sql.js), and retrieves via cosine similarity, positioned as ~1ms for small corpora (CV + job descriptions) without running an embedding model alongside the LLM. This pairs with last week's Foundry IQ direction (permission-aware retrieval as a standard tool surface): different stack, same goal of grounding as a reusable, testable component. The app is Node/Express with a single-file web UI streaming via SSE plus a CLI, and it notes an operational gotcha: Foundry Local uses a dynamic port, so use the SDK-discovered endpoint (manager.endpoint) instead of hardcoding localhost. It also demonstrates testability using node:test so core logic can be validated without the local runtime running, another “run it like software” signal applied to offline builds.

Microsoft Fabric for AI agents: governed events, ontology context, and inspectable graph reasoning (Previews + GA)

Fabric's AI updates focused on making real-time signals and business context reusable and governable, so agents, automation, and analytics can share definitions across notebooks, pipelines, and dashboards. It continues last week's Fabric direction: put AI work into shared, governed surfaces (real-time intelligence + IQ context + semantics) instead of isolating it in one notebook or app. Business Events in Fabric (preview) add a business-level event layer in Real-Time Hub. Instead of raw telemetry sent to tightly coupled consumers, teams define business event types with governed schemas via Schema Registry/Schema Sets, then emit events from Fabric compute (Notebooks and User Data Functions are called out). This extends last week's “Observe → Analyze → Decide → Act” loop by making “observe” a versioned contract: less ad-hoc plumbing, more reusable signals downstream systems (including agents) can trust. The value is decoupled fan-out: one Business Event can drive Activator actions, Power Automate, Notebooks, Spark jobs, Dataflows, and AI/ML enrichment without the publisher knowing consumers. The manufacturing example (anomaly → “CriticalVibrationDetected” → safe-mode + ticket + root-cause notebook) illustrates reducing glue code while keeping schemas consistent. For agent context, Fabric IQ Ontology (preview) positions ontology items as operational context: mapping entities/relationships to OneLake data and events so agents do not rely on inconsistent definitions. This builds on last week's Fabric IQ/ontology push: centralize semantics for humans and agents. The roadmap adds embedded rules/actions (via Activator), Fabric-aligned permissions/sharing (Read/Edit/Reshare), and tenant/workspace Azure Private Link hardening, mirroring last week's move toward private networking and governed access across Foundry/Fabric. It also points to interoperability via upcoming “Ontology MCP endpoints,” exposing ontology context through public MCP endpoints so external MCP-capable agents can retrieve the same grounded business context. Given last week's MCP endpoint momentum, this is another step toward “business context as a standard tool surface,” not copied into prompts. Fabric also previewed “graph-powered AI reasoning,” combining Fabric Data Agent with Fabric Graph for more inspectable answers via deterministic traversal (“graph RAG”). This matches last week's evaluation/traceability emphasis but constrains the reasoning path: translate natural language to validated GQL via NL2GQL, run deterministic graph traversals, and expose a GQL trace so users can review which relationships produced the answer. The Adventure Works example highlights recommendation logic that can be awkward in SQL but explicit in a graph (including derived nodes like country) and queryable with traceable outputs, which is useful when you need an auditable reasoning path rather than only probabilistic text. Finally, Fabric's Workload Hub delivered an AI data-readiness GA: Tonic Textual is now generally available for scanning unstructured OneLake text for sensitive entities and applying transformations (redaction, masking, synthetic replacement, custom rules), writing results back to a separate OneLake location. This aligns with last week's “production RAG is access control and repeatability” theme: data needs a standardized privacy/prep step before retrieval/eval pipelines become trustworthy. The practical benefit is OneLake-to-OneLake de-identification that preserves structure (dialogue, contracts) instead of exporting to external tools.

MCP and agent platform choices: standardizing tool/context access and picking the right builder

A recurring architecture thread this week was how agents get tools and context, and how teams choose an agent-building surface as requirements grow. Last week framed MCP as increasingly operational (remote MCP servers in Foundry, managed Grafana MCP endpoints, and a .NET SDK). This week continued that theme with protocol maturity and tool-surface examples beyond typical enterprise apps. Two MCP pieces reinforce the momentum. GitHub's Universe video explains MCP as a standardized contract for exposing tools/data to agents, especially private or new information that is not in training data, and notes the official open-source MCP server was rewritten from TypeScript to Go, which changes deployment and contribution details. That kind of reference shift suggests MCP is moving from experimentation into maintainability (runtime footprint, deployment model, contributor workflow), matching last week's shift toward hosted, identity-aware endpoints. Unity-MCP shows MCP-style structured calls in a game engine where “context” is editor/project state (scenes, GameObjects, assets, components), giving AI a more deterministic surface than text alone. Two “which platform should I use?” pieces also landed, making explicit what tends to trigger migration. When governance, evaluation, observability, and custom tool/knowledge wiring matter, teams often move from simpler builders toward Foundry/Azure AI Agents-style surfaces (often keeping a separate interaction layer). One compares Copilot Studio vs Azure AI Agents: low-code connectors and predictable pricing for well-defined assistants vs developer-built, consumption-based systems with model choice, RAG, orchestration, evaluation, and observability. The other provides a broader framework across Agent Builder, Copilot Studio, and Azure AI Foundry, emphasizing criteria that drive migration: complexity, model flexibility, deployment targets, lifecycle ops (eval/observability), safety/guardrails, memory/state, tool/knowledge integration, and cost control. Together they reinforce a hybrid approach: Copilot Studio for UI/flows, backed by a programmable Foundry/Azure AI Agents layer for intelligence and governance, consistent with last week's standardization on MCP and operational loops under multiple app surfaces.

Other AI News

Cost and operations management showed up as platform guardrails and incident automation, connecting to last week's “day-two” focus (evaluation, observability, private networking, identity). One guide shows an Azure-native spend control loop for Azure OpenAI: Cost Management Budgets trigger Action Groups, which run Azure Automation PowerShell to disable local auth (Set-AzCognitiveServicesAccount -DisableLocalAuth $true) when thresholds are hit, with a separate manual runbook to re-enable after review. Separately, Azure SRE Agent HTTP Triggers shows starting an automated investigation from Jira using an Azure Logic App as a Managed Identity auth bridge (the trigger endpoint is Entra-protected and uses SRE Agent data-plane RBAC). The pattern (external system → Managed Identity bridge → SRE Agent trigger) keeps credentials out of Jira while preserving audit history, and uses a Jira MCP connector (mcp-atlassian 2.0.0 over STDIO). In the context of last week's MCP identity modes and managed endpoints, it's another example of pairing agent actions with identity boundaries and auditability.

Automating Azure OpenAI Cost Control Using Budgets, Action Groups, and Automation Runbooks
HTTP Triggers in Azure SRE Agent: From Jira Ticket to Automated Investigation .NET developers got an updated learning path: “Generative AI for Beginners .NET” v2 is rebuilt on .NET 10, switches foundational model calling from Semantic Kernel to Microsoft.Extensions.AI (IChatClient + middleware pipeline), and standardizes auth with AzureCliCredential so you can log in once via Azure CLI instead of distributing keys. This aligns with last week's managed identity/consistent auth defaults and complements last week's MCP/.NET tooling by clarifying the baseline stack before heavier orchestration. RAG content is reworked toward native SDKs, and the agent module uses Microsoft Agent Framework (RC), keeping orchestration as a dedicated topic rather than the default entry point.
Generative AI for Beginners .NET: Version 2 on .NET 10 RAG patterns continued to diversify beyond embeddings, echoing last week's themes of governed grounding and traceability. A “vectorless reasoning-based RAG” tutorial uses PageIndex to build a hierarchical document tree for long PDFs, then has an LLM select relevant nodes (pages/sections) via strict JSON before answering only from retrieved text. The goal is fewer moving parts than embeddings + vector DB and better traceability back to page indices and node IDs.
Vectorless Reasoning-Based RAG: A New Approach to Retrieval-Augmented Generation Foundry Labs introduced a “scout → evaluate → graduate” workflow: try early-stage model/agent experiments (30+ projects) with clear maturity expectations, then move promising work into Azure AI Foundry where evaluation, tracing, monitoring, and governance are first-class. This maps to the two-week story: prototype quickly, but capture telemetry early and graduate into a runtime where evaluation/observability are standard (last week's Foundry Evaluations GA and agent runtime GA are the likely destinations). It also ties observability to Azure API Management's genAI gateway controls (token metrics, prompt logging, quotas, safety policies) and suggests capturing telemetry from day one (even JSONL logs).
Microsoft Foundry Labs: A Practical Fast Lane from Research to Real Developer Work Fabric Real-Time Dashboards added a Copilot preview that generates and iterates KQL tiles from natural-language requests, suggesting a visualization, showing a preview table, and exposing editable KQL. It matches last week's Fabric theme of speeding authoring where teams work while keeping the query layer inspectable, applied to real-time ops dashboards.
Use Copilot to create visuals in Real-Time Dashboards (Preview) Two “Budget Bytes” teaser posts pointed to a cost-constrained, hands-on AI app series around Azure SQL Database, linking to playlist/repo/free offer resources rather than going deep technically. In the context of last week's cost/eval/ops focus, it's another signal that budget-aware engineering is now common in AI guidance.
Budget Bytes: Azure Data Leaders on AI & Budget (Sneak Peek)
What Would You Buy With $25? Answers from Execs A manufacturing case study highlighted a “copilot for operators” pattern: ARUM's CNC assistant uses Azure AI Speech plus Azure OpenAI hosted in Microsoft Foundry (noted as GPT-5) to provide Japanese, step-by-step guidance for safety-critical setup tasks. It's light on implementation details, but it connects threads we've been tracking: voice modalities (last week's Voice Live vs this week's offline voice pipeline) and production guardrails where procedures and human confirmation matter.
Japan’s ARUM turns craftsmanship into scalable AI for precision manufacturing