Weekly AI Roundup: Multi-Agent Ops, Safe Tools, and Telemetry Loops

Mar 16, 2026 by TechHub

AI coverage kept coming back to a practical question: how do you move from “an LLM that chats” to systems that can operate safely, repeatably, and at scale? This continues last week’s thread on production-ready agent tooling (skills, orchestration, sandboxing, MCP/OpenTelemetry), but with more “run it like software” patterns: multi-agent composition, approval gates, context compaction, and the operational plumbing (deploy automation, debugging loops, telemetry/evaluation, data platforms) needed for real deployments.

This Week's Overview

Microsoft Agent Framework: production patterns for multi-agent apps (Python + .NET)

Microsoft Agent Framework content leaned more toward engineering details this week. It builds on last week’s reusable skills, orchestration patterns, and secure execution by showing how teams assemble multi-agent apps that can ship and run reliably.

One guide turns incident response into a multi-agent workflow by splitting an “on-call copilot” into four narrow agents (Triage, Summary, Comms, and PIR) with strict JSON schemas so outputs can feed automation (tickets, updates, runbooks) without brittle parsing. The orchestrator uses ConcurrentBuilder and asyncio.gather() to run the agents in parallel, replacing one large prompt and yielding lower latency and a more predictable structure. Deployment is set up for production use: a containerized Python orchestrator runs as a Foundry Hosted Agent, with model choice delegated to Azure OpenAI Model Router (a single deployment such as model-router that routes between gpt-4o and gpt-4o-mini). Auth uses DefaultAzureCredential with the https://cognitiveservices.azure.com/.default scope (local via az login, prod via managed identity) so teams do not have to distribute API keys. It reads as a direct follow-on to last week’s guidance on boundaries and orchestration.

On the .NET side, the “Interview Coach” architecture uses the same multi-agent approach: receptionist/triage, behavioral interviewer, technical interviewer, and summarizer with explicit handoffs. It uses Agent Framework patterns (DI, type safety, OpenTelemetry), Microsoft Foundry as a governed model gateway (single endpoint, centralized identity/governance/moderation like PII detection), and external capabilities via MCP tool servers. The design is deliberately polyglot (for example, Python MarkItDown used by a .NET agent), which extends last week’s open-standards/MCP SDK coverage by showing cross-language tool servers in practice. .NET Aspire provides orchestration, service discovery, health checks, and a traces/health dashboard, with an end-to-end path from local aspire run to cloud azd up.

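The parallel fan-out described for the incident-response guide can be illustrated with plain asyncio. This is a minimal sketch, not the Agent Framework API: ConcurrentBuilder is not used, the agent function is a hypothetical stub that returns canned JSON where a real agent would call a model, and the key check is only a stand-in for the strict JSON schemas the guide describes. The key names (severity, component, and so on) are assumptions made for illustration.

```python
import asyncio
import json

# Hypothetical output keys per agent; the guide's real schemas are not given here.
REQUIRED_KEYS = {
    "triage": {"severity", "component"},
    "summary": {"summary"},
    "comms": {"status_update"},
    "pir": {"timeline"},
}


async def run_agent(name: str, incident: str) -> str:
    """Stub agent: a real implementation would call a model here."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    canned = {
        "triage": {"severity": "high", "component": "api-gateway"},
        "summary": {"summary": f"Incident: {incident}"},
        "comms": {"status_update": "We are investigating elevated errors."},
        "pir": {"timeline": ["alert fired", "on-call paged"]},
    }
    return json.dumps(canned[name])


def parse_strict(name: str, raw: str) -> dict:
    """Parse agent output and reject missing or unexpected keys."""
    data = json.loads(raw)
    if set(data) != REQUIRED_KEYS[name]:
        raise ValueError(
            f"{name}: expected keys {sorted(REQUIRED_KEYS[name])}, got {sorted(data)}"
        )
    return data


async def orchestrate(incident: str) -> dict:
    names = list(REQUIRED_KEYS)
    # Run all four agents concurrently; gather returns results in input order.
    raw_outputs = await asyncio.gather(*(run_agent(n, incident) for n in names))
    return {n: parse_strict(n, raw) for n, raw in zip(names, raw_outputs)}


if __name__ == "__main__":
    print(json.dumps(asyncio.run(orchestrate("HTTP 5xx spike")), indent=2))
```

Because each agent's output is validated before it is merged, a malformed response fails loudly at the orchestrator instead of reaching ticketing or runbook automation.
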
As agent apps start executing actions, the agent harness patterns post fills in safety and operability details that were previewed last week (dynamic sessions, secure execution). The theme is consistent: expose shell/filesystem tools with explicit approvals, move execution into hosted/container sandboxes, and keep long sessions from inflating token counts and latency. Python shows an approval-gated shell tool around subprocess.run(...) (timeouts, stdout/stderr capture), while .NET uses approval-required wrappers (for example, ApprovalRequiredAIFunction). For context compaction, Python shows sliding-window retention, while .NET combines strategies (tool result compaction + sliding windows + truncation) via Microsoft.Agents.AI.Compaction to tune responsiveness and cost. These are practical complements to last week’s “load skills only when needed” theme.

Python SDK Agent Skills updates also move skills closer to normal software development and extend last week’s skills SDK coverage. Skills can be defined in code (not only bundled files), resources can be dynamic via functions, scripts can be decorator-based in-process functions or file-based scripts via pluggable runners, and script execution can require human approval (require_script_approval=True).

Across the posts, the pattern is consistent: multi-agent composition for clarity and latency, structured outputs for automation, tool execution behind approvals and sandboxing, and explicit strategies to keep long sessions reliable and affordable.
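
The approval-gated shell tool idea can be sketched with the standard library alone. This is an illustration under assumptions, not the Agent Framework implementation: run_tool, ToolResult, and the approve callback are hypothetical names, and the only behaviors taken from the post are the approval gate, the timeout, and stdout/stderr capture around subprocess.run.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class ToolResult:
    approved: bool
    returncode: Optional[int] = None
    stdout: str = ""
    stderr: str = ""


def run_tool(
    argv: Sequence[str],
    approve: Callable[[Sequence[str]], bool],
    timeout_s: float = 10.0,
) -> ToolResult:
    """Run a command only if the approval callback returns True."""
    if not approve(argv):
        # Nothing is executed when approval is denied.
        return ToolResult(approved=False)
    try:
        completed = subprocess.run(
            list(argv),           # argv list, so no shell interpolation
            capture_output=True,  # capture stdout and stderr
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return ToolResult(approved=True, stderr=f"timed out after {timeout_s}s")
    return ToolResult(True, completed.returncode, completed.stdout, completed.stderr)


if __name__ == "__main__":
    # An interactive approver might prompt a human; here approval is hard-coded.
    result = run_tool([sys.executable, "-c", "print('hello')"], approve=lambda argv: True)
    print(result)
```

Passing the approver in as a callback keeps the policy (console prompt, ticket check, allow-list) separate from the execution path, so the same tool can run under different approval rules.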

Azure automation and agent operations: Skills Plugin, SRE Agent “Deep Context,” and azd debugging

This week’s operational theme was making agents less advice-only and more able to execute real Azure work, while also improving how teams debug deployed agents. This continues last week’s secure execution and durable-tasks story (dynamic sessions, MCP SDKs), but with a day-two focus: environment context, tool-backed deployments, and CLI-first troubleshooting.

The Azure Skills Plugin is the clearest push in this area. It ships Azure skills (19+ guarded workflows), an Azure MCP Server with 200+ tools across 40+ services, and a Foundry MCP Server for model catalog/management/deployment. The goal is to turn prompts like “Deploy my Python Flask API to Azure” into a structured Prepare → Validate → Deploy flow: generate artifacts (for example, Dockerfiles), run preflight checks, generate/use IaC, then deploy via azd. It operationalizes last week’s reusable skills and tool discovery approach by shipping a ready-made Azure tool/skills surface. Requirements make it clear this is meant for execution: a compatible host (Copilot in VS Code, Copilot CLI, or Claude Code), Node.js 18+, az, azd, and an authenticated Azure account. Smoke tests include a guidance-only question and a live tool call (list resource groups) to confirm MCP servers and skills are active.

Azure SRE Agent also moved further from an incident assistant toward an operations agent that builds environment-specific expertise. Deep Context (described as generally available) centers on continuous access to connected repositories and artifacts (auto-cloned/indexed), persistent memory across sessions (including capture via #remember), and background intelligence that discovers log schemas/KQL tables and generates reusable query templates. This extends last week’s boundaries theme: rather than stuffing context into prompts, the agent maintains a governed workspace and pulls evidence into the conversation when needed.

The example (HTTP 5xx spike on a container app) shows the intent: start incidents with recent code/config and history already ingested. Another post describes “autonomous investigation” using a real cache-hit alert: parallel subagents tested hypotheses, filesystem workflows (grep, find, shell, reading files) tied telemetry to exact code versions, and the result was PR-shaped remediation (exclude uncacheable requests from alerting logic; restore prompt-prefix stability affecting caching). Across both, the pattern is consistent: treat the agent like a developer in a repository, layer context intentionally, keep evidence out of prompts until needed, and route changes through PR/CI gates.

To support hosted-agent operations, azd added debugging via the azure.ai.agents extension. azd ai agent show reports container status/health/replicas/errors, and azd ai agent monitor streams container logs, keeping troubleshooting in one CLI loop instead of bouncing between portals. This complements last week’s traceability focus (OpenTelemetry/OAuth): once agents are services, a status/log loop and consistent identity become part of basic supportability. Version details are explicit: azure.ai.agents v0.1.12-preview, included with azd 1.23.7+, plus commands for upgrading (azd extension upgrade azure.ai.agents) and bootstrapping (azd ai agent init).

Microsoft Foundry and Microsoft Fabric: model deployment choices and production telemetry/evaluation loops

Platform coverage connected two parts of production AI work: model deployment with controls, and telemetry/data for evaluating and governing what agent apps actually do. This extends last week’s Foundry theme (models + agent features + stable SDKs) by adding open-model deployment options, while Fabric positions observability and accountability as an end-to-end data plane.

Microsoft Foundry added a public preview integration with Fireworks AI for open-model inference hosted on Azure but managed through Foundry’s control plane. Teams can browse the catalog, evaluate models, deploy endpoints, monitor usage/quality, and apply governance without wiring together separate tools. Deployment supports serverless pay-per-token (“Data Zone Standard”) and provisioned throughput units (PTUs). It also adds BYOW (Bring Your Own Weights): upload/register custom (quantized or fine-tuned) weights and serve them through the same workflow. This extends last week’s “single control plane + stable SDKs” message to teams mixing frontier models and open weights. The post cites catalog models (for example, DeepSeek V3.2, OpenAI gpt-oss-120b, Kimi K2.5, MiniMax M2.5), signaling a consistent “try → deploy → govern” flow even as the open-model set changes.

Microsoft Fabric’s agentic guidance focuses on observability and operations. One post frames Fabric as the operational data plane for agents: land structured telemetry into a governed OneLake workspace so teams can monitor routing, tool calls, latency, safety blocks, and failures in near real time (Eventstream → Eventhouse with KQL), and also do historical/business correlation (Lakehouse + semantic model + Power BI). It builds on last week’s best practices (boundaries, compliance, observability) by describing what to emit and where to store it.

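To make “what to emit” concrete, the sketch below shows one possible shape for a structured agent telemetry event, serialized as one JSON object per line. The field names (event_type, agent, session_id, latency_ms, and so on) are illustrative assumptions, not a Fabric or Eventstream schema; only the event categories come from the signals listed in the post (routing, tool calls, safety blocks, failures).

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional

# Event categories mirroring the signals named in the post.
EVENT_TYPES = {"routing", "tool_call", "safety_block", "failure"}


@dataclass
class AgentEvent:
    event_type: str
    agent: str            # hypothetical: name of the agent emitting the event
    session_id: str
    latency_ms: float
    tool: Optional[str] = None
    detail: Optional[str] = None
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: float = field(default_factory=time.time)

    def __post_init__(self) -> None:
        # Reject unknown categories early so downstream queries stay consistent.
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event_type: {self.event_type!r}")

    def to_json_line(self) -> str:
        """Serialize as a single JSON line for a streaming sink."""
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    event = AgentEvent(
        "tool_call", agent="accounts", session_id="s-1",
        latency_ms=182.4, tool="get_balance",
    )
    print(event.to_json_line())
```

A fixed set of event types and flat fields keeps the same records usable for both near-real-time queries and later evaluation runs.
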
A reference implementation (Agentic Banking App: React + Python/LangGraph) demonstrates the telemetry pipeline, and the quality loop uses notebooks plus the Azure AI Evaluation SDK, reusing captured telemetry instead of rebuilding ad-hoc datasets.

Fabric also strengthened the link between business semantics and automation. Ontology Rules integrate with Fabric Activator so teams define real-time conditions/actions using Ontology entities/properties (Customer, Order, Device) rather than raw tables or stream-specific logic. The cold-chain example (“Freezer temperature exceeds safe limits for sustained period → trigger alert”) shows the goal: define thresholds in a governed semantic layer so analytics, agents, and automations reuse consistent definitions.

Fabric AI Functions added ExtractLabel for schema-driven extraction of structured fields from unstructured text in pandas and PySpark. The key is enforcing an explicit output contract (JSON Schema or Pydantic schema) with required/optional fields, enums, nested structures, and additionalProperties=False to prevent extra keys, making outputs predictable for downstream validation and pipelines. This mirrors the structured-output discipline in Agent Framework workflows: reliable machine-consumable AI outputs reduce brittle parsing. It also works in distributed PySpark via synapse.ml.spark.aifunc, supporting LLM extraction at data-engineering scale.
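
The output-contract idea behind ExtractLabel can be illustrated without the Fabric API. The sketch below is a stand-in under stated assumptions: it does not call ExtractLabel or synapse.ml, the schema and its fields (product, sentiment, order_id) are hypothetical, and the hand-rolled checker covers only required fields, enums, and additionalProperties set to false. A real pipeline would more likely rely on a full JSON Schema or Pydantic validator.

```python
from typing import Any, Dict, List

# Hypothetical contract for fields extracted from a support ticket.
TICKET_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "product": {"type": "string"},
        "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
        "order_id": {"type": "string"},  # optional: not listed under "required"
    },
    "required": ["product", "sentiment"],
    "additionalProperties": False,
}


def contract_errors(record: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of contract violations; an empty list means the record conforms."""
    errors: List[str] = []
    props = schema["properties"]
    for name in schema.get("required", []):
        if name not in record:
            errors.append(f"missing required field: {name}")
    for name, value in record.items():
        if name not in props:
            # Extra keys are only an error when the schema forbids them.
            if schema.get("additionalProperties", True) is False:
                errors.append(f"unexpected field: {name}")
            continue
        allowed = props[name].get("enum")
        if allowed is not None and value not in allowed:
            errors.append(f"{name}: {value!r} not in {allowed}")
    return errors


if __name__ == "__main__":
    print(contract_errors({"product": "router", "sentiment": "angry", "mood": "bad"}, TICKET_SCHEMA))
```

Returning every violation rather than stopping at the first makes it easier to log rejected rows in bulk before they reach downstream tables.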

Other AI News

Microsoft Research introduced AgentRx, a framework concept for systematic debugging of multimodal agents by centralizing traces across modalities and adding verifier-style checks to isolate failures (input interpretation, action selection, intermediate decisions, output validation). With this week’s production debugging focus (azd logs/status, Aspire dashboards, OpenTelemetry), AgentRx reads like the research-side version of the same idea: as tools and modalities expand, agents need failure modes that teams can observe and debug.

‘Systematic Debugging for AI Agents: Introducing the AgentRx Framework’

VS Code added chat forking: branch a conversation at any point, explore alternatives in parallel, and keep the original thread for comparison. This aligns with last week’s VS Code agent UX work (including forking) and reinforces that chat is becoming a workflow control surface, not only a single linear thread.

Forking Chat Sessions in Visual Studio Code

Several higher-level pieces reinforced common constraints around autonomy and security in agentic systems. One describes an IT loop (observe → detect → analyze → act → learn) using Azure Monitor, Automation/runbooks, AKS self-healing, CI/CD hooks, and security tooling. Another breaks down Copilot agent design (goals, memory, tools, autonomy) with guardrails like least privilege and human approval. A “computer use agents” overview highlights risk when agents can operate software environments, which puts least-privilege identity and authorization design at the center. This echoes last week’s secure execution focus once agents move from recommend to act.

‘Agentic AI in IT: Self-Healing Systems and Smart Incident Response in the Microsoft Ecosystem’
‘How Copilot Agents Think: Goals, Memory, Tools, and Autonomy’
‘Building Computer Use Agents: Types, Functionality, and Security Risks’

Low-code agent building showed up via a cost-focused walkthrough: Copilot Studio with Azure SQL Database as system of record, including how to keep an entry-level deployment around ~$10/month by using free/low-cost options and careful SKU choices, then iterating agent behavior in Copilot Studio. It complements last week’s Copilot/Fabric coverage by grounding adoption in budgeting, SKU selection, and incremental rollout.

Building Low-Code AI Agents with Copilot Studio and Azure SQL Database for Under $10/Month