greenie-msft explains how the durable task extension for the Microsoft Agent Framework simplifies building highly resilient and scalable AI agents on Azure, with practical code samples and best practices for serverless durability.

Building Resilient AI Agents with the Durable Task Extension for Microsoft Agent Framework

Author: greenie-msft

Overview

The durable task extension for the Microsoft Agent Framework allows developers to create production-ready AI agents that are resilient, scalable, and cost-effective. By integrating features from Azure Durable Functions, the extension brings automatic session management, distributed execution, and deterministic orchestration to AI agents deployed on Azure.

Key Features

Automatic Session Management: Agents preserve context and state across crashes, restarts, and distributed environments.
Deterministic Multi-Agent Orchestrations: Predictable, code-driven workflows for coordinating specialized agents.
Serverless Cost Savings: Pause agent execution for human input without consuming compute resources.
Built-in Observability: Visualize agent operations and orchestrations with the Durable Task Scheduler’s UI dashboard.
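To make the idea behind automatic session management concrete, here is a minimal conceptual sketch using only the Python standard library. It is not the extension's implementation: the CheckpointedSession class, the JSON-file store, and the session ID are all hypothetical stand-ins for the durable state the extension manages for you. The point is only that checkpointing history after every turn lets a fresh process resume the same conversation.

```python
import json
import tempfile
from pathlib import Path


class CheckpointedSession:
    """Toy session that persists its message history after every turn.

    Hypothetical illustration only; the durable task extension handles
    this persistence itself.
    """

    def __init__(self, store_dir: Path, session_id: str):
        self.path = store_dir / f"{session_id}.json"
        # Reload prior history if a checkpoint exists (e.g. after a crash).
        if self.path.exists():
            self.history = json.loads(self.path.read_text())
        else:
            self.history = []

    def add_turn(self, role: str, text: str) -> None:
        self.history.append({"role": role, "text": text})
        # Checkpoint before returning so a later crash cannot lose this turn.
        self.path.write_text(json.dumps(self.history))


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        first = CheckpointedSession(store, "session-1")
        first.add_turn("user", "Tell me a joke")
        first.add_turn("assistant", "Why did the function go serverless?")
        del first  # simulate the original process going away
        # A brand-new object (think: a different worker) resumes the session.
        resumed = CheckpointedSession(store, "session-1")
        return resumed.history
```

Calling demo() returns both turns even though the object that recorded them no longer exists, which is the behavior the extension provides across crashes and restarts.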

Why Use the Durable Task Extension?

AI workloads increasingly require features such as:

Long-running conversations and workflows
Resilience to failures
Elastic scaling across thousands of parallel agent instances
Human-in-the-loop approval flows without unnecessary compute costs

The durable task extension enables these patterns using the orchestration and durable entity technology proven in Azure Durable Functions, abstracted behind the Microsoft Agent Framework for ease of use.

The 4 Pillars (“4D’s”)

Durability: Agents automatically checkpoint state for reliable, persistent operation.
Distributed: Scale execution across many Azure instances with seamless failover.
Deterministic: Write orchestrations as ordinary code for testability and predictability.
Debuggability: Use familiar programming tools and techniques for agent development and troubleshooting.
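The link between durability and determinism can be illustrated with a toy replay engine. This is a simplified sketch, not the Durable Task runtime: run_with_replay, joke_pipeline, and the ACTIVITIES table are hypothetical names. The orchestrator is an ordinary generator; the engine records each step's result in a history list and, on a later run, feeds recorded results back instead of repeating the work. That only works if the orchestrator makes the same requests in the same order every time, which is why orchestration code must be deterministic.

```python
from typing import Any, Callable, Dict, Generator, List, Tuple

Activity = Callable[[Any], Any]
Orchestrator = Callable[[], Generator[Tuple[str, Any], Any, Any]]


def run_with_replay(
    orchestrator: Orchestrator,
    activities: Dict[str, Activity],
    history: List[Any],
) -> Tuple[Any, int]:
    """Drive a generator-based orchestrator, reusing recorded results.

    Returns (final_result, number_of_activities_actually_executed).
    """
    gen = orchestrator()
    executed = 0
    step = 0
    try:
        request = next(gen)
        while True:
            name, arg = request
            if step < len(history):
                # Replay: reuse the checkpointed result, skip the work.
                result = history[step]
            else:
                # First execution of this step: do the work and checkpoint it.
                result = activities[name](arg)
                history.append(result)
                executed += 1
            step += 1
            request = gen.send(result)
    except StopIteration as done:
        return done.value, executed


def joke_pipeline():
    # Each yield asks the engine to run one named activity.
    draft = yield ("write", "cats")
    final = yield ("polish", draft)
    return final


ACTIVITIES: Dict[str, Activity] = {
    "write": lambda topic: f"draft about {topic}",
    "polish": lambda text: text.upper(),
}
```

Running the pipeline with an empty history executes both activities. Running it again with the recorded history returns the same result while executing none, which mirrors how a durable orchestration recovers after a restart.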

Example: Building an AI Agent with Durable Execution

Python Example

import os
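# Import path as used in the Agent Framework Python samples; verify it against your installed package version.
from agent_framework.azure import AzureOpenAIChatClient, AgentFunctionApp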
from azure.identity import AzureCliCredential

endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
deployment_name = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME", "gpt-4o-mini")

agent = AzureOpenAIChatClient(
    endpoint=endpoint,
    deployment_name=deployment_name,
    credential=AzureCliCredential()
).create_agent(
    instructions="You are good at telling jokes.",
    name="Joker"
)

app = AgentFunctionApp(agents=[agent])
app.run()

C# Example

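// Fail fast with a clear message if the endpoint is missing; new Uri(null) would otherwise throw.
var endpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")
    ?? throw new InvalidOperationException("AZURE_OPENAI_ENDPOINT is not set.");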
var deploymentName = Environment.GetEnvironmentVariable("AZURE_OPENAI_DEPLOYMENT") ?? "gpt-4o-mini";

AIAgent agent = new AzureOpenAIClient(new Uri(endpoint), new AzureCliCredential())
    .GetChatClient(deploymentName)
    .CreateAIAgent(instructions: "You are good at telling jokes.", name: "Joker");

var app = FunctionsApplication
    .CreateBuilder(args)
    .ConfigureFunctionsWebApplication()
    .ConfigureDurableAgents(options => options.AddAIAgent(agent))
    .Build();
app.Run();

Advanced Scenarios

Human-in-the-Loop Workflow

An orchestration can pause indefinitely for human approval without incurring compute costs, because its state is checkpointed rather than held in memory while it waits. Once input arrives, execution resumes automatically from that checkpoint with all relevant state.

Python Sample

from datetime import timedelta

from azure.durable_functions import DurableOrchestrationContext

# `app` is the AgentFunctionApp created in the earlier example.
@app.orchestration_trigger(context_name="context")
def content_approval_workflow(context: DurableOrchestrationContext):
    topic = context.get_input()

    # Step 1: generate a draft with the content agent.
    content_agent = context.get_agent("ContentGenerationAgent")
    draft_content = yield content_agent.run(f"Write an article about {topic}")

    # Step 2: notify the human reviewer.
    yield context.call_activity("notify_reviewer", draft_content)

    # Step 3: wait for either the approval event or a 24-hour timeout.
    approval_event = context.wait_for_external_event("ApprovalDecision")
    timeout_task = context.create_timer(context.current_utc_datetime + timedelta(hours=24))
    winner = yield context.task_any([approval_event, timeout_task])

    if winner == approval_event:
        # Cancel the pending timer so the orchestration can complete promptly.
        timeout_task.cancel()
        approved = approval_event.result
        if approved:
            result = yield context.call_activity("publish_content", draft_content)
            return result
        else:
            return "Content rejected"
    else:
        # No decision arrived within 24 hours: escalate.
        result = yield context.call_activity("escalate_for_review", draft_content)
        return result
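The workflow above waits on an "ApprovalDecision" event, so something has to deliver it. One option, sketched here under assumptions, is the Durable Functions built-in HTTP API, which exposes a raise-event route under /runtime/webhooks/durabletask. The helper name build_raise_event_request, the local base URL, and the instance ID are hypothetical, and whether this route is reachable depends on how your app is hosted and secured.

```python
import json
from typing import Any, Optional
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_raise_event_request(
    base_url: str,
    instance_id: str,
    event_name: str,
    payload: Any,
    system_key: Optional[str] = None,
    task_hub: Optional[str] = None,
) -> Request:
    """Build (but do not send) a POST that raises an external event."""
    path = (
        "/runtime/webhooks/durabletask/instances/"
        f"{quote(instance_id, safe='')}/raiseEvent/{quote(event_name, safe='')}"
    )
    query = {}
    if task_hub:
        query["taskHub"] = task_hub
    if system_key:
        # Deployed apps normally require the extension's system key.
        query["code"] = system_key
    url = base_url.rstrip("/") + path
    if query:
        url += "?" + urlencode(query)
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),  # event body is JSON
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical usage: approve the draft held by orchestration "abc123".
request = build_raise_event_request(
    "http://localhost:7071", "abc123", "ApprovalDecision", True
)
```

Sending the request with urllib.request.urlopen(request) would deliver the JSON value true to the waiting orchestration, where it appears as approval_event.result. The instance ID must be the one returned when the orchestration was started.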

Operational Visibility

Integrate your Function App backend with the Durable Task Scheduler for built-in observability:

Conversation history
Multi-agent visualization
Performance metrics and execution logs
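Pointing a Function App at the Durable Task Scheduler is typically a host.json setting. The sketch below generates the relevant fragment from Python so it stays in the same language as the other samples. The helper name and the two app-setting names are assumptions for illustration, and a real host.json usually carries further entries (such as an extension bundle) that are omitted here, so treat the output as a fragment to merge rather than a complete file.

```python
import json


def durable_task_scheduler_host_config(
    connection_setting: str = "DURABLE_TASK_SCHEDULER_CONNECTION_STRING",
    hub_setting: str = "TASKHUB_NAME",
) -> dict:
    """Return a host.json fragment selecting the Durable Task Scheduler backend.

    Both arguments are names of app settings (assumed here), not the
    connection string or task hub name themselves.
    """
    return {
        "version": "2.0",
        "extensions": {
            "durableTask": {
                # %NAME% is resolved from app settings at runtime.
                "hubName": f"%{hub_setting}%",
                "storageProvider": {
                    "type": "azureManaged",
                    "connectionStringName": connection_setting,
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(durable_task_scheduler_host_config(), indent=2))
```

Keeping the connection string and hub name in app settings, and referencing them by name, avoids committing environment-specific values to source control.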

Supported Languages and Environments

C# (.NET 8.0+) on Azure Functions
Python (3.10+) on Azure Functions
Additional compute options coming soon

Get Started

This post first appeared on “Microsoft Tech Community”.