Weekly AI Roundup: IDE agents, governed tools, and hosting

This week's AI news leaned into making agent development look more like normal software engineering: tighter IDE loops for building, debugging, and evaluating; clearer production hosting and orchestration options; and concrete patterns for connecting agents to governed data and automation. This continues last week's “run it like software” framing where stable runtimes, inspectable tool contracts, and day-two controls (identity, policy, cost, evaluation) become the default rather than add-ons. Microsoft Foundry and Fabric also expanded platform capabilities with new models, fine-tuning options, MCP toolchains, and agent experiences that are easier to monitor and audit.

Microsoft Foundry in the IDE: end-to-end agent and model workflows

Building on last week's Foundry standardization theme (Responses API compatibility, Agent Service GA, cloud+local deployment), Microsoft Foundry Toolkit for VS Code is now GA. It positions VS Code as the place where experimentation, agent engineering, evaluation-as-tests, deployment, and on-device model work (enabled by Foundry Local GA) can live in one loop.

For experimentation, the Model Catalog includes 100+ cloud and local models (OpenAI/Anthropic/Google plus local ONNX/Foundry Local/Ollama), along with a Model Playground for side-by-side comparisons, multimodal testing, optional web search, and streaming. “View Code” generates Python/JavaScript/C#/Java snippets that match what you tested, which mirrors last week's focus on keeping a tight contract between what you test and what you run.

Agent building splits into low-code Agent Builder (prompt optimizer, tool catalog including local MCP servers, tool approvals, save to Foundry) and code-first scaffolding aligned to frameworks like Microsoft Agent Framework and LangGraph. The toolkit treats MCP servers as first-class tool sources inside the IDE, continuing last week's MCP operationalization thread.

Agent Inspector adds IDE-native debugging and observability: F5 debugging with breakpoints, stepping and variables, streaming tool-call visualization, workflow graphs, and local span tracing across tool calls. That brings last week's “observability is design” message earlier into local development.

Evaluations show up as tests: pytest-style definitions run in VS Code Test Explorer, results can be analyzed in Data Wrangler, and reused for scaled runs in Foundry. This fits last week's point that evaluation, monitoring, and cost controls are day-two expectations, not something you bolt on later.

The GA deep dive also treats Windows on-device AI as part of the same surface, extending last week's “cloud + local” story with IDE controls. The local pipeline converts, quantizes, and evaluates Hugging Face models into ONNX for Windows ML, and targets execution providers across hardware (OpenVINO, TensorRT, Qualcomm QNN, AMD MIGraphX/VitisAI). Profiling includes CPU/GPU/NPU/memory views, Windows ML event breakdown (startup vs per-request), and operator-level tracing showing placement and timing. It also supports LoRA fine-tuning for Phi Silica via a cloud job on Azure Container Apps, then downloads the adapter for Phi Silica LoRA APIs. The intent is to reduce bespoke ML infrastructure when you only need an adapter.
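
To make the evaluation-as-tests idea above concrete, here is a minimal sketch of a pytest-style evaluation. It is not the Foundry Toolkit's own evaluation format: `fake_agent`, `EvalCase`, `keyword_score`, and the 0.9 threshold are hypothetical names and values chosen for illustration, and the agent is a canned stub so the example runs offline.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    prompt: str
    must_contain: str


def fake_agent(prompt: str) -> str:
    """Hypothetical stand-in for a real agent call; returns canned answers."""
    canned = {
        "What is 2 + 2?": "The answer is 4.",
        "Capital of France?": "Paris is the capital of France.",
    }
    return canned.get(prompt, "I don't know.")


def keyword_score(answer: str, expected: str) -> float:
    """Deterministic evaluator: 1.0 if the expected keyword appears, else 0.0."""
    return 1.0 if expected.lower() in answer.lower() else 0.0


CASES = [
    EvalCase("What is 2 + 2?", "4"),
    EvalCase("Capital of France?", "Paris"),
]


def pass_rate(cases) -> float:
    """Average score across all cases."""
    scores = [keyword_score(fake_agent(c.prompt), c.must_contain) for c in cases]
    return sum(scores) / len(scores)


def test_agent_meets_quality_bar():
    # pytest collects functions named test_*, so a quality regression
    # fails the test run like any other broken test.
    assert pass_rate(CASES) >= 0.9
```

Saved in a file named like `test_agent_eval.py`, this would be picked up by pytest's default discovery. In a real project the stub would be replaced by a call to the agent under test, and the keyword check by whatever evaluator fits the task.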

Production agent architecture: security, governance, memory, and hosting choices

As agents move from prototypes to production, guidance converged on control: observability does not help much without enforceable policies (tool allowlists, least privilege, auditable decisions) and a hosting model that fits scale and operational constraints. This continues last week's shift from “build an agent” to “run an agent,” where tracing, identity, evaluation, and guarded automation are treated as core design inputs.

On governance, the Agent Governance Toolkit walkthrough shows deterministic runtime policy enforcement for a multi-agent ASP.NET Core app on Azure App Service (MAF 1.0) using Microsoft.AgentGovernance 3.0.2. The flow is middleware-like: default-deny governance-policies.yaml with allow rules, loaded into a GovernanceKernel at startup (audit + metrics), then builder.UseGovernance(kernel, AgentName) so tool calls are evaluated before execution. This complements last week's “interruptible tools” and human approvals by making governance a runtime gate across agents. Decisions and reasons land in Application Insights alongside OpenTelemetry traces, with KQL to find violations (customDimensions["governance.decision"] != "ALLOWED") and track token budgets via customMetrics ("governance.tokens.consumed"). It also extends into reliability with YAML SLOs and circuit breakers that reduce autonomy as error budgets burn, which matches last week's “guardrails before automation” sequencing.

The Foundry security architecture checklist maps agent risks (prompt injection, tool misuse, exfiltration, over-privilege, drift) into Azure patterns: managed identities and Entra RBAC, private endpoints/Private Link, Key Vault, API Management gateways, tool allowlists, and strict output validation (JSON schema). It also calls for CI/CD evaluation and red-teaming so prompt and model changes trigger regression tests that can block deploys.
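
Both the governance walkthrough and the checklist above rest on a default-deny tool allowlist that is evaluated before a tool runs. The sketch below illustrates that gate in Python. It is not the Microsoft.AgentGovernance API (which is .NET): `PolicyGate`, `Decision`, and the rule layout are hypothetical, and the in-memory audit list stands in for real telemetry.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


@dataclass
class PolicyGate:
    # Hypothetical rule layout: agent name -> set of tool names it may call.
    allow_rules: dict
    audit_log: list = field(default_factory=list)

    def evaluate(self, agent: str, tool: str) -> Decision:
        """Default deny: a call is allowed only if an explicit rule matches."""
        if tool in self.allow_rules.get(agent, set()):
            decision = Decision(True, f"allow rule matched for {agent}/{tool}")
        else:
            decision = Decision(False, f"no allow rule for {agent}/{tool}")
        # Every decision is recorded, allowed or not, so it can be audited.
        self.audit_log.append((agent, tool, decision))
        return decision

    def call(self, agent: str, tool: str, func, *args):
        """Evaluate the policy first; run the tool only when allowed."""
        decision = self.evaluate(agent, tool)
        if not decision.allowed:
            raise PermissionError(decision.reason)
        return func(*args)


def lookup_invoice(invoice_id: str) -> dict:
    """Hypothetical tool used only for this demonstration."""
    return {"id": invoice_id, "status": "paid"}


gate = PolicyGate(allow_rules={"billing-agent": {"lookup_invoice"}})
```

Calling `gate.call("billing-agent", "lookup_invoice", lookup_invoice, "INV-1")` runs the tool, while the same call from an agent with no matching rule raises `PermissionError` and still leaves an audit entry with the reason.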
The checklist reinforces last week's MCP auth patterns (managed identity vs OAuth passthrough vs secrets) in an end-to-end posture: tool auth is not enough without least privilege, network boundaries, and validation.

Statefulness also advanced with Foundry managed memory (preview), positioned as built-in long-term persistence integrated with Microsoft Agent Framework and LangGraph. Hooks include per-user scoping, automatic extraction (the platform decides what to store), and CRUD APIs so apps can inspect and correct memory (including “forget this”). It reduces the need for custom memory stores and standardizes user controls.

For hosting, a guide compared Container Apps, AKS, Functions, App Service, Foundry Agents, and Foundry Hosted Agents, then focused on Hosted Agents as “containerized custom app + agent-native APIs.” This ties to last week's Agent Service GA by showing how runtime and code package together while keeping agent concepts like Responses protocols and telemetry export. The walkthrough is implementation-level: LangGraph calculator agent + adapter, agent.yaml (kind: hosted, protocols: responses), Python 3.11-slim container on port 8088, deployed via azd and azure.ai.agents extension (build/push to ACR, create/start Hosted Agent). It also calls out scale-to-zero vs min-replicas cold-start tradeoffs, automatic OpenTelemetry export to App Insights, and RBAC implications (publishing can create a dedicated agent identity that needs separate permissions from the project managed identity).
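
The per-user scoping and CRUD controls described for managed memory above can be pictured with a small in-memory model. This is only a sketch of the concept, not the Foundry memory API: `UserScopedMemory` and its method names are hypothetical, and nothing here performs automatic extraction.

```python
import itertools


class UserScopedMemory:
    """Toy memory store where every operation is scoped to one user."""

    def __init__(self):
        self._store = {}                 # user_id -> {memory_id: text}
        self._ids = itertools.count(1)   # simple increasing identifiers

    def create(self, user_id: str, text: str) -> int:
        memory_id = next(self._ids)
        self._store.setdefault(user_id, {})[memory_id] = text
        return memory_id

    def read(self, user_id: str) -> dict:
        # Return a copy so callers cannot mutate the store by accident.
        return dict(self._store.get(user_id, {}))

    def update(self, user_id: str, memory_id: int, text: str) -> None:
        if memory_id not in self._store.get(user_id, {}):
            raise KeyError(f"memory {memory_id} not found for this user")
        self._store[user_id][memory_id] = text

    def delete(self, user_id: str, memory_id: int) -> bool:
        """The 'forget this' path: returns True if something was removed."""
        return self._store.get(user_id, {}).pop(memory_id, None) is not None


memory = UserScopedMemory()
alice_pref = memory.create("alice", "Prefers metric units")
memory.create("bob", "Works in the Seattle office")
```

Because the user id is part of every call, `memory.read("alice")` never returns Bob's entries, and `memory.delete("alice", alice_pref)` removes exactly one item that the user asked the app to forget.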

MCP and data-connected agents: Fabric OneLake, Postgres, and Oracle Database@Azure

Tool calling and agent-to-data patterns continue consolidating around MCP, extending last week's shift from “tool glue” to supportable infrastructure (hosting, auth, self-hosted Azure MCP Server 2.0). The emphasis is on explicit discovery (schemas, metadata, permissions) so agents behave predictably, and on identity and RBAC as the primary safety boundary.

In Fabric, OneLake MCP tools are GA: a 19-command toolset for discovering workspaces and items, inspecting schemas and metadata via Table APIs, and browsing, reading, writing, and mapping storage via OneLake File/Access APIs. All access is constrained by the caller's Azure identity and Fabric permissions. This pairs with last week's Fabric “intelligence platform” guidance: instead of copying data into agent pipelines, you expose governed surfaces (semantic models, Table APIs, OneLake files) as tools with enforceable permissions. The example is practical: an agent inventories a mirrored database (“House Price Open Mirror”), documents schemas, distinguishes Parquet landing zones from managed Delta outputs, and checks replication health and monitoring signals to generate docs and basic health reports without manual portal work.

For Postgres, a Foundry + PostgreSQL walkthrough emphasizes MCP as a controlled integration layer for exploring the database, pairing natural-language-to-SQL with vector search for RAG retrieval. This matches last week's Entra-authenticated MCP guidance (pre-authorized clients, JWT validation, OBO): database access becomes a governable tool surface instead of a broad connection string. A related “PostgreSQL Like a Pro” announcement points to more demos on modernizing Postgres apps on Azure, including MCP agent patterns and AI-assisted Oracle-to-Azure Database for PostgreSQL migrations in VS Code, which suggests IDE loops where the agent helps iterate on conversion issues.
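
As an illustration of database access exposed as a narrow tool rather than a broad connection string, the sketch below wraps a query function with a read-only check and a table allowlist. It uses Python's built-in sqlite3 instead of PostgreSQL so it runs without a server. `run_readonly_query`, `ALLOWED_TABLES`, and the `listings` table are hypothetical, and the regex checks are deliberately naive; a real deployment would rely on database roles and permissions as the actual boundary.

```python
import re
import sqlite3

ALLOWED_TABLES = {"listings"}  # hypothetical allowlist for this tool


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database for the demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE listings (city TEXT, price INTEGER)")
    conn.executemany(
        "INSERT INTO listings VALUES (?, ?)",
        [("Oslo", 500), ("Bergen", 300)],
    )
    return conn


def run_readonly_query(conn: sqlite3.Connection, sql: str, max_rows: int = 50):
    """Run a single SELECT against allowlisted tables and cap the row count."""
    statement = sql.strip().rstrip(";")
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"(?i)^select\b", statement):
        raise ValueError("only SELECT statements are allowed")
    # Naive table extraction: identifiers that follow FROM or JOIN.
    found = re.findall(r"(?i)\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)", statement)
    tables = {name.lower() for name in found}
    if not tables or not tables <= ALLOWED_TABLES:
        raise ValueError(f"tables not allowed: {sorted(tables - ALLOWED_TABLES)}")
    return conn.execute(statement).fetchmany(max_rows)


conn = make_demo_db()
rows = run_readonly_query(conn, "SELECT city FROM listings ORDER BY price")
```

An agent that generates SQL from natural language would call `run_readonly_query` instead of holding the connection itself, so statements such as `DROP TABLE listings` or reads from tables outside the allowlist are rejected before they reach the database.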
For Oracle estates, an Oracle Database@Azure patterns article lays out options based on how deterministic you need behavior to be: Copilot Studio + Oracle connectors; ORDS + PL/SQL REST APIs for predictable behavior (with DB governance like RLS/VPD); and hybrid Oracle “Select AI” (DBMS_CLOUD_AI.GENERATE) using Azure OpenAI to generate and validate SQL inside Oracle. It includes code-first Azure Functions (JDBC/python-oracledb), Logic Apps/Power Automate orchestration, and an advanced direction where MCP and in-database runtimes participate in ReAct-style loops. That continues last week's “MCP as governed automation interface” thread applied to database operations and policy controls.

Model updates and customization: text-to-image efficiency and reinforcement fine-tuning

Foundry expanded its model surface in two ways: a higher-throughput text-to-image option intended for production usage, and more workflow controls for reinforcement fine-tuning with graders you can tune over time. This continues last week's “right model per step” and “model choice as an engineering setting” thread, expanding beyond text agents to image pipelines and training workflows.

MAI-Image-2-Efficient (MAI-Image-2e) is now available in Microsoft Foundry without a waitlist, positioned as the throughput-focused sibling to MAI-Image-2 for high-volume and interactive generation. The announcement is explicit about tradeoffs: MAI-Image-2e targets latency and per-image cost (pricing: $5 per 1M text input tokens and $19.50 per 1M image output tokens), while MAI-Image-2 remains positioned for higher fidelity (portraits/photorealism, stylized outputs, longer/complex in-image text). Benchmark context (NVIDIA H100 at 1024x1024 with normalization notes) helps set expectations for how batch size and concurrency affect real deployments.

The April 2026 Foundry fine-tuning update focused on reinforcement fine-tuning (RFT) for o4-mini with api-version=2025-04-01-preview. Key changes include “global training” (trainingType: “globalstandard”) across 13+ regions to lower per-token training rates with consistent infrastructure and model quality, plus more grader options. RFT now supports model graders using GPT-4.1, GPT-4.1-mini, and GPT-4.1-nano, alongside deterministic graders (string checks, Python, endpoint-based). Guidance is operational: start deterministic for speed, cost, and reproducibility, use model graders for semantic or multi-dimensional partial credit, and iterate nano → mini → full as rubrics stabilize.

It also calls out common pitfalls (role ordering, schema mismatches, missing structured response_format when graders reference output_json) and reinforces a best practice aligned with last week's MCP thread: treat tools as part of the environment and consider MCP for production tools even if you also offer a function-calling-compatible interface for fine-tuning.
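
The “start deterministic” advice above can be illustrated with a small Python grader that awards partial credit per matching field. The `grade(sample, item)` shape and the `output_text` and `expected` field names are assumptions made for this sketch; check the current grader documentation for the exact contract before reusing it.

```python
import json


def grade(sample: dict, item: dict) -> float:
    """Return a score in [0, 1]: the fraction of expected fields that match.

    Assumed inputs (hypothetical for this sketch):
      sample["output_text"] - the model's raw output, expected to be JSON
      item["expected"]      - a dict of reference field values
    """
    try:
        output = json.loads(sample["output_text"])
    except (KeyError, TypeError, json.JSONDecodeError):
        # Unparseable or missing output earns no credit.
        return 0.0
    if not isinstance(output, dict):
        return 0.0

    expected = item.get("expected", {})
    if not expected:
        return 0.0

    matched = sum(1 for key, value in expected.items() if output.get(key) == value)
    return matched / len(expected)


reference = {"expected": {"city": "Oslo", "price": 500}}
half_right = grade({"output_text": '{"city": "Oslo", "price": 400}'}, reference)
```

Because the score depends only on the inputs, repeated runs give the same reward, which is the reproducibility argument for deterministic graders. A model grader would only be layered on for the semantic judgments a field comparison cannot express.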

Copilot Studio orchestration: mixing agents with workflows

Copilot Studio added clearer orchestration primitives for business process automation scenarios where you want agent reasoning but still need deterministic execution and audit trails. This builds on last week's focus on approvals, guarded automation, and explicit topology by formalizing two patterns inside a workflow engine. The update describes “agent-in-workflow” (agent nodes for interpretation/synthesis at specific steps, with request-for-information escalation) and “workflow-as-tool” (agents call workflows as reliable sub-process tools, authored in natural language or reused from a library). The practical benefit is keeping the workflow as the orchestrator (branching, handoffs, approvals stay explicit) while inserting LLM steps where ambiguity is unavoidable.
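
The two patterns above can be sketched in a few lines of Python. This is a conceptual illustration rather than Copilot Studio's authoring model: `classify_request`, `refund_workflow`, `handle_ticket`, and the approval limit are hypothetical, and the “agent” is a keyword stub standing in for an LLM call.

```python
def classify_request(text: str) -> str:
    """Hypothetical agent node: interprets free text (stub for an LLM call)."""
    return "refund" if "refund" in text.lower() else "other"


def refund_workflow(amount: float, approval_limit: float = 100.0) -> dict:
    """Deterministic sub-process with an explicit approval branch."""
    if amount <= approval_limit:
        return {"status": "approved", "amount": amount}
    return {"status": "needs_human_approval", "amount": amount}


def handle_ticket(text: str, amount: float):
    """Agent-in-workflow: the workflow owns branching and the audit trail;
    the agent is used only at the step where interpretation is needed."""
    audit = []
    intent = classify_request(text)
    audit.append(("classify", intent))
    if intent == "refund":
        result = refund_workflow(amount)
    else:
        result = {"status": "routed_to_queue"}
    audit.append(("result", result["status"]))
    return result, audit


# Workflow-as-tool: the same deterministic workflow is registered under a
# name so an agent can invoke it like any other tool.
TOOLS = {"refund_workflow": refund_workflow}

result, audit = handle_ticket("Please refund my last order", 250.0)
```

In both directions the branching logic stays in ordinary, testable code, so the audit trail shows which step was interpreted by the agent and which outcome the workflow decided.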

Other AI News

Visual Studio's Copilot Chat gained a “Debugger Agent” workflow in Visual Studio 18.5 GA that ties an agent into a live debugging session. You provide a GitHub/ADO item (or description), it proposes a hypothesis, sets breakpoints (with approval), observes the repro, inspects runtime state (variables/call stack), then proposes or applies a fix and reruns. This complements the Foundry Toolkit “tight IDE loop” story: with agent inspectors, evaluation-as-tests, and debugger participation, agents can be tested and observed with familiar software engineering workflows.