Browse DevOps Community (199)

Surprise AI bill? GitHub Billing controls to the rescue!

Yesterday by Chris Noring

Chris Noring explains how to investigate unexpected GitHub Copilot Enterprise AI-credit spend and then control it using GitHub Billing guardrails. The post walks through identifying the SKU driving costs, attributing usage to organizations and cost centers, and applying budgets, alerts, and per-user limits without breaking productive workflows.

Cap it with GitHub, make it count with Azure: governing GitHub Copilot spend

Yesterday by tonimontez

tonimontez explains how to govern GitHub Copilot’s usage-based spend using GitHub’s native budget controls first, then adds an Azure API Management gateway in front of an Azure AI Foundry (Azure OpenAI) deployment for real-time, token-granular quotas and per-developer telemetry.

AI Gateway tier of API Management now in public preview

2 days ago by vladvino

vladvino announces the public preview of the AI Gateway tier for Azure API Management, focused on publishing and governing AI models and MCP servers. The post explains the new portal experience, policy-card governance (rate limits, quotas, Content Safety, fallback), and OpenTelemetry token metrics to destinations like Application Insights.

Preserve a Legacy IP During Azure Migration with Private Link Service Direct Connect

2 days ago by Kumar Kaushal

Kumar Kaushal demonstrates a transitional Azure networking pattern for phased migrations where an application is hard-coded to a legacy private IP. The lab uses Private Link Service Direct Connect plus a Private Endpoint and a site-to-site VPN so Azure-hosted tiers can keep calling the same on-premises address during the interim phase.

MCP Connect: Why Every AI Engineer and Developer Should Care About the Model Context Protocol

2 days ago by Lee Stott

Lee Stott explains why the Model Context Protocol (MCP) is becoming the standard way AI agents connect to tools and data, then shows minimal runnable MCP server examples in Python and TypeScript plus practical guidance for VS Code hosting, security, and production operations.

Your Entire Agentic AI Workflow, Now Inside VS Code: New Course Available

5 days ago by carlottacaste

carlottacaste shares a short walkthrough of the Foundry Toolkit for Visual Studio Code and a companion VS Code Learn course, focused on keeping model selection, prompt iteration, and agent development inside VS Code, with links to a full video playlist and a featured setup episode.

Changing the engine while the plane is flying: migrating 60,000 apps under live load

6 days ago by AbodeSaafan

AbodeSaafan explains how the Azure Logic Apps team migrated roughly 60,000 per-customer Azure Functions apps from the end-of-life v1/v2 runtime to Functions v4 (isolated worker) under live load, using shadow traffic, strict parity checks, and a progressive rollout with fast rollback and careful retirement of the old fleet.

Introducing Kubernetes-Native Policy Validation with CEL and VAP in Azure Policy

6 days ago by stevenbucher

stevenbucher explains how Azure Policy for Kubernetes can now enforce policies using Kubernetes Validating Admission Policy (VAP) with CEL, via Gatekeeper’s integration. The post contrasts the older webhook/Rego flow with in-process validation, then walks through packaging a CEL constraint template into an Azure Policy definition and rolling it out across AKS clusters.

Design, test, and ship Foundry hosted agents from a canvas in GitHub Copilot App

6 days ago by junjieli

junjieli introduces the Foundry Agent Canvas (public preview), a GitHub Copilot App extension that lets developers discover resources, scaffold and configure a Foundry hosted agent, test it locally with an embedded Agent Inspector, and deploy it to Foundry Agent Service using azd-driven workflows.

Hybrid Logic Apps on RKE2: a self-managed cluster with MetalLB

6 days ago by anandgmenon

anandgmenon walks through running Azure Logic Apps Hybrid on a self-managed RKE2 Kubernetes cluster, including how to provide an ingress IP with MetalLB, connect the cluster to Azure Arc, install the Container Apps extension, and fix the RKE2-specific DNS and inotify issues that can break deployments.

Reminder: Path to Production for Agents Webinar Series Starts Next Week

6 days ago by brauerblogs

brauerblogs shares a reminder to register for Microsoft’s “Path to Production for Agents” webinar series (July 27–28), focused on taking AI agent solutions from experimentation to secure, scalable production with guidance on governance, platform design, AgentOps, and multi-agent architecture patterns.

CDK Global modernizes automotive CRM on Azure SQL Managed Instance

1 week ago by Ricardo Duncan

Ricardo Duncan describes how CDK Global migrated a business-critical automotive CRM platform to Azure SQL Managed Instance, including the architecture choices, migration approach, and operational practices used to move more than 1,000 databases with minimal downtime.

Connecting Microsoft Discovery App to Azure HPC with Azure NetApp Files and CycleCloud

1 week ago by richpaw

richpaw describes a reference architecture for running Microsoft Discovery on a Windows VM in Azure and connecting it to an Azure CycleCloud HPC cluster via Azure NetApp Files (NFS) and SSH, so agentic workflows can submit Slurm jobs, read/write POSIX files, and stay inside a private network boundary.

Take command of your Microsoft Foundry AI agents with the Azure Copilot Observability Agent

1 week ago by Ron Frenkel

Ron Frenkel explains how the Azure Copilot Observability Agent (now generally available in Azure Monitor) helps teams investigate Azure AI Foundry and GenAI agent issues using Application Insights telemetry, including failures, latency, tool-call errors, token spikes, and dependency bottlenecks with evidence-backed root cause analysis.

Hypervelocity Engineering: Accelerating Enterprise AI with Azure AI Landing Zones

1 week ago by VimalVerma

VimalVerma outlines Hypervelocity Engineering (HVE) as an operating model for building and continuously evolving Azure AI Landing Zones, with a focus on platform engineering, Infrastructure as Code, Policy as Code, and security-by-design so enterprise AI platforms can scale without losing governance.

How Microsoft 365 built a platform engineering layer on AKS to ship faster at global scale

1 week ago by Suma SaganeGowda

Suma SaganeGowda explains how Microsoft 365 built COSMIC, an internal platform layer on Azure Kubernetes Service (AKS), to standardize provisioning, deployments, security/compliance guardrails, and observability across globally distributed services so product teams can ship faster without taking on Kubernetes operational overhead.

Public Preview: Advanced platform metrics in Azure Monitor

2 weeks ago by alyssaschimm

alyssaschimm announces a public preview in Azure Monitor that adds advanced platform metrics for Azure Storage, giving container-level visibility into blob capacity and blob count so teams can investigate growth drivers, set better alerts, and improve cost and capacity planning without building custom reporting pipelines.

Load testing Copilot Studio agents with Locust and Azure Load Testing

2 weeks ago by Krishna Roy

Krishna Roy shows how to load test Copilot Studio agents by simulating real multi-turn conversations over Direct Line (HTTP + WebSockets) with Locust, then running the same workload in Azure Load Testing. The post covers measuring TTFB versus full turn completion, handling turn.complete and file uploads, and managing secrets with Key Vault.

Orchestrate Azure Container Apps Jobs with Apache Airflow

2 weeks ago by hetvip

hetvip explains how to use Apache Airflow as an orchestrator for Azure Container Apps (ACA) Jobs, enabling dependency ordering, parallel fan-out, and per-task retries. The post introduces two open-source templates and a shared Airflow operator that triggers ACA Job executions via the ACA Jobs API.

Introducing Azure Front Door edge actions - Bringing secure, programmable logic to the edge

2 weeks ago by akhilkarmalkar

akhilkarmalkar announces Azure Front Door edge actions (public preview), a way to run lightweight JavaScript during request processing at Microsoft’s global edge. The post explains where edge actions run in the Front Door pipeline, what scenarios they enable (routing, headers, auth checks), and how Hyperlight micro-VM isolation is used to keep execution secure.

Building AI Agents from Zero to Production

2 weeks ago by Lee Stott

Lee Stott tours Microsoft’s open-source course on taking AI agents from prototype to production using Microsoft Agent Framework and Microsoft Foundry, covering agent design, multi-agent orchestration, evaluation, deployment, data sovereignty, and tool governance with concrete commands and code patterns.

From Multi-Model Chaos to a Governed AI Gateway: Cost Optimization on Azure

2 weeks ago by jisunchoi

jisunchoi explains how to replace “multi-model chaos” with a governed AI gateway on Azure using Azure API Management, covering cost controls (token quotas and budget-based model downgrades), security hardening (managed identity + private endpoints), observability with Application Insights, and a Terraform-based deployment you can integrate with GitHub Copilot.

The AI Agent Lifecycle: A Simple Guide

2 weeks ago by supriyas

supriyas lays out a practical, end-to-end lifecycle for building enterprise AI agents, using a banking “loan agent” example to show how to design guardrails, build with safety controls, test with evaluations and red teaming, deploy gradually, and continuously monitor and iterate using Microsoft Foundry and Azure services.

Beyond the Canvas: The Azure Architecture Diagram Builder Becomes Agent-Ready

2 weeks ago by Arturo Quiroga

Arturo Quiroga shares major updates to the open-source Azure Architecture Diagram Builder, including a multi-turn Architecture Chat, a Blueprint (whiteboard-style) rendering mode, and a new MCP server interface so AI agents can generate, validate, cost, and render Azure architectures programmatically.

Designing for cloud sovereignty with Radius and Dapr

3 weeks ago by CollinBrian

CollinBrian explains how to design cloud-sovereign applications by keeping both deployment and runtime dependencies portable, using Radius for an application model and Dapr for consistent distributed-systems APIs across environments like Kubernetes and Azure.

A Paradigm Shift in Cloud Operations with Azure SRE Agent

3 weeks ago by Nir Mashkowski

Nir Mashkowski shares customer examples of how Azure SRE Agent is being used to reduce incident triage and investigation time by having an AI-powered agent gather evidence, classify issues, and recommend next steps, with an emphasis on governance controls and operational “memory” for teams running production on Azure.

Azure Monitor Observability Agent goes autonomous (preview)

3 weeks ago by Efrat Ben Porat

Efrat Ben Porat announces public preview support for autonomous operations in the Azure Copilot Observability Agent, which can listen to Azure Monitor alerts, correlate them into issues, and run deep investigations automatically. The post explains what’s new, how correlation is guided by custom instructions, and how this changes on-call triage workflows.

Hybrid Logic Apps Deployment on Red Hat OpenShift

3 weeks ago by anandgmenon

anandgmenon walks through deploying Azure Logic Apps (Standard) in Hybrid mode on Red Hat OpenShift using Azure Arc and the Container Apps extension, calling out the OpenShift-specific gotchas around SCC permissions and DNS, plus concrete CLI/Helm commands and troubleshooting tips.

Golden Paths Are a Product. Treat Them Like One.

3 weeks ago by KishoreKumarPattabiraman

KishoreKumarPattabiraman explains why “golden paths” in platform engineering need to be treated as long-lived products, not one-off projects. Using examples like an AKS migration and identity modernization, the post lays out an operating canvas for ownership, guardrails, adoption strategy, and measurable feedback loops that keep paved paths trusted over time.

Optimizing GitHub Copilot Cost in the Usage-Based Billing Era

3 weeks ago by Gaurav Bhardwaj

Gaurav Bhardwaj outlines practical ways to keep GitHub Copilot usage predictable under usage-based billing, focusing on token drivers like model choice, context size, and tool/agent usage. The post includes concrete prompting patterns, team guardrails, and lightweight policy ideas to reduce waste without losing the benefits of AI-assisted development.

Coding with Logic Apps Standard: Local Functions

3 weeks ago by hcamposu

hcamposu introduces Logic Apps Standard Local Functions and explains when workflow-scoped .NET code belongs inside the same Logic Apps project instead of being split into a separate Azure Functions app, with practical guidance on scenarios, runtime behavior, and deployment implications.

Logic Apps Aviators Newsletter - July 2026

3 weeks ago by WSilveira

WSilveira’s July 2026 Logic Apps Aviators newsletter rounds up key Azure Integration Services updates and community posts, including the Logic Apps Standard move toward Azure Functions out-of-proc hosting for .NET 10, dynamic connection names, MCP server management in API Management, and agentic Logic Apps patterns.

Automatically Route Azure Service Health Alerts to the Right Service Owners Using Agentic Logic Apps

3 weeks ago by Arpit_MSFT

Arpit_MSFT shows how to route Azure Service Health alerts to the right team automatically by using Azure Monitor Action Groups to trigger an Autonomous Agent Logic App, backed by a simple service-to-owner mapping (for example, a CSV in Azure Blob Storage) with a default fallback recipient.

Azure landing zone (ALZ) enters its next chapter

3 weeks ago by Jack Tracey

Jack Tracey announces that Azure landing zone (ALZ) has moved from a community-led open-source initiative to an officially owned Microsoft product under the Azure Migrate team, while keeping the existing repos and consumption model unchanged for users.

Find anomalies in Prometheus and OpenTelemetry metrics with Dynamic Thresholds (Preview)

3 weeks ago by yairgil

yairgil introduces Dynamic Thresholds (Preview) for query-based metric alerts in Azure Monitor, showing how Azure can learn per-time-series baselines for Prometheus and OpenTelemetry metrics. The post includes PromQL examples for AKS CPU anomaly detection and p95 latency regression alerting, plus practical query design tips to reduce noisy alerts.

Introducing kars - an Agent Reference Stack for Kubernetes

4 weeks ago by pallakatos

pallakatos introduces kars, a Kubernetes-native runtime for running AI agents on Azure with a “treat agents as untrusted code” security model: per-agent sandboxes, policy enforced via CRDs, zero credentials in the agent process, and an end-to-end encrypted inter-agent mesh designed for governance at scale on AKS.

Smoke Test Microsoft Foundry Agents with GitHub Actions

4 weeks ago by j_folberth

j_folberth shows how to add fast smoke tests to a GitHub Actions deployment pipeline for Azure AI Foundry hosted agents, using a JSON prompt catalog and a Python runner to validate basic agent behavior (reachability, prompt alignment, threading, refusals, and hallucination resistance) right after deployment.

Shaping Software While It Runs: A Canvas Scenario, Start to Finish

4 weeks ago by Lee Stott

Lee Stott walks through a full “Multi-Agent Dev Canvas” scenario for GitHub Copilot Canvas, showing how to decompose work, execute agent flows, validate with in-surface tests, inject failures, and evolve the design live (including GDPR/PII redaction) until the system meets its acceptance criteria.

My Journey with Azure SRE Agent

Jun 29, 2026 by jometzg

jometzg walks through building an autonomous PIM elevation audit workflow using Azure SRE Agent, including how to move from interactive chat exploration to a headless scheduled subagent that queries Log Analytics and emails a daily alignment report to stakeholders.

A Better Way to View Logs in Kudu for Azure App Service on Linux

Jun 25, 2026 by TulikaC

TulikaC introduces the new Log stream page in Kudu for Azure App Service on Linux, showing how to stream and inspect application and platform logs with filters and search to speed up troubleshooting for startup issues, runtime errors, failed requests, and container restarts.