Weekly Azure Roundup: Managed Identity, Observability, DNS

Apr 13, 2026 by TechHub

Azure's updates this week leaned toward making production operations less brittle, continuing last week's theme of controlled transitions and day-two readiness. Identity continues shifting away from long-lived secrets, ops tooling continues emphasizing “observe first, automate safely,” and app hosting continues smoothing runtime upgrades and practical deployment paths. Architecture guidance stayed grounded in scale realities: DNS as a hard dependency in private-first designs and DR choices aligned to real RTO/RPO needs.

This Week's Overview

Managed identities and least-privilege automation across Azure platforms

Azure Red Hat OpenShift (ARO) reached an identity/governance milestone: Azure Managed Identities (for platform/operators) and Microsoft Entra Workload Identity (for pods/apps) are now GA. It fits last week's identity/governance thread by moving ARO components off one broadly permissioned service principal and onto multiple user-assigned managed identities, each mapped to dedicated built-in ARO roles so RBAC can be scoped per component. Provisioning is supported via portal (including an all-in-one flow), ARM/Bicep, and az aro CLI (Azure CLI 2.84.0+). For workloads, Workload Identity uses OIDC federation to map a Kubernetes service account to a user-assigned managed identity so pods can request short-lived tokens for narrowly scoped access (Key Vault, Storage, Azure SQL) without in-cluster secrets. A key transition note remains: legacy service-principal clusters cannot migrate in-place. You need a new managed-identity cluster and then migrate workloads.

The same move away from secrets shows up in App Service CI/CD guidance. A walkthrough shows deploying to Azure Web Apps from Azure DevOps using an ARM service connection authenticated via a user-assigned managed identity (UAMI). The flow is simple: assign Website Contributor on the app (or resource group), create the service connection with UAMI auth, and verify who deployed via App Service audit logs (AppServiceAuditLogs), where the initiator is the UAMI object ID. It also notes that setup-time interactive sign-in may appear in logs, so “who set it up” can differ from “which identity deployed.”

Hybrid onboarding got a similar least-privilege update with a new Azure Arc onboarding role for Ansible. It matches last week's “standardize the baseline” message by letting teams onboard Arc-enabled servers through existing Ansible playbooks with more tightly scoped permissions, which helps with fleet-scale onboarding.
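
To make the Workload Identity mapping more concrete, here is a minimal Python sketch of the data a federated identity credential carries when it ties a Kubernetes service account to a user-assigned managed identity. The subject format and the token-exchange audience follow the usual Microsoft Entra Workload ID conventions; the helper, the issuer URL, the namespace, and the service account name are hypothetical placeholders rather than values from the ARO announcement.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FederatedCredential:
    """Fields of a federated identity credential on a user-assigned identity."""
    name: str
    issuer: str        # the cluster's OIDC issuer URL
    subject: str       # identifies the Kubernetes service account
    audiences: tuple   # token audience expected during the exchange


def build_federated_credential(name, issuer_url, namespace, service_account):
    """Hypothetical helper: map one service account to one managed identity."""
    if not issuer_url.startswith("https://"):
        raise ValueError("OIDC issuer must be an https URL")
    subject = f"system:serviceaccount:{namespace}:{service_account}"
    return FederatedCredential(
        name=name,
        issuer=issuer_url,
        subject=subject,
        audiences=("api://AzureADTokenExchange",),
    )


# Placeholder values for illustration only.
cred = build_federated_credential(
    name="payments-api-fic",
    issuer_url="https://example.invalid/oidc/cluster-id",
    namespace="payments",
    service_account="payments-api",
)
print(asdict(cred))
```

Because the subject names exactly one namespace and service account, only pods running under that service account can exchange their token for the identity, which is what keeps access narrowly scoped.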

Kubernetes and App Service operations: tracing, alert automation, and simpler deployments

AKS troubleshooting got a service-mesh tracing pattern that reinforces last week's “constraints plus observability” message. The guide correlates Istio/Envoy access logs with Application Insights using W3C Trace Context. It enables Istio Telemetry in the managed Istio system namespace, then uses EnvoyFilter resources (inbound/outbound) to emit structured JSON logs to stdout so they land in Log Analytics (ContainerLogV2). The practical win is correlation: parse the traceparent header from Envoy logs, extract the 32-hex trace id, and match it to App Insights operation_Id via KQL to follow one request across mesh hops and app telemetry without adding another tracing stack.

For incident response, Azure SRE Agent's Azure Monitor integration continued last week's day-two focus. It ingests fired alerts through the Azure Alerts Management REST API (“Get all alerts”), handles multiple alert types, and routes matches into Incident Response Plans in “review” or “autonomous” mode. The demo (AKS Node.js app failing Redis auth due to a bad secret) shows polling every 60s, acknowledging alerts, investigating logs, fixing the secret by retrieving the right Redis key, rolling pods, and resolving. A key nuance is merging: repeated firings from the same rule merge into one active thread (7-day window) rather than creating new incidents, which interacts with whether a rule uses auto-resolve. The advice to keep auto-resolve OFF for persistent failures (bad creds, CrashLoopBackOff-like issues) matters if you want one investigation per ongoing problem.

App Service got two developer-facing improvements that reduce friction at opposite ends of the workflow, building on last week's hosting direction. PHP 8.5 is now supported on App Service for Linux in all public regions, including better fatal error backtraces and language features like the pipe operator (|>). For quick deployment, App Service for Linux now offers a simpler zip deployment flow in Kudu/SCM: drag-and-drop a zip, preview contents, choose whether to run a server-side build, and track Upload/Build/Deployment progress with logs and runtime log links. It targets quick tests and initial setup, while still pointing teams to CI/CD for repeatability.
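
The trace correlation step in the AKS tracing pattern above comes down to pulling the trace id out of a W3C traceparent header. The following Python sketch shows that parsing under a few assumptions: it handles the version 00 layout (version, 32-hex trace id, 16-hex parent id, flags), and the JSON field name `traceparent` is a hypothetical choice for how the Envoy log line is structured, not something the guide specifies.

```python
import json
import re

# version-traceid-parentid-flags, all lowercase hex (version 00 layout).
TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


def extract_trace_id(traceparent):
    """Return the 32-hex trace id, or None if the header is malformed."""
    match = TRACEPARENT_RE.match(traceparent.strip().lower())
    if match is None:
        return None
    trace_id = match.group("trace_id")
    # The spec treats an all-zero trace id as invalid.
    if trace_id == "0" * 32:
        return None
    return trace_id


def trace_id_from_log_line(line, field="traceparent"):
    """Parse one structured JSON access-log line (field name is an assumption)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    value = record.get(field) if isinstance(record, dict) else None
    return extract_trace_id(value) if isinstance(value, str) else None


sample = json.dumps({
    "path": "/orders",
    "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
})
print(trace_id_from_log_line(sample))
```

The value returned here is the one you would compare against operation_Id on the Application Insights side; in the guide that matching is done in KQL rather than in application code.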

Reliability and private networking: DNS pitfalls, multi-region choices, and Key Vault continuity

A hub-spoke incident write-up reinforced a point that extends last week's private networking items: DNS is Tier-0, especially with custom hub DNS, centralized egress (firewall/NVA), Private Endpoints (OpenAI, AI Search, Key Vault, Storage), and Azure Container Apps (ACA) with internal ingress. The symptoms looked like inconsistent platform behavior (ACA startup/scaling issues, Terraform stalls, endpoint reachability gaps, secret/auth problems) but came down to DNS gaps: custom resolvers not forwarding to 168.63.129.16, missing conditional forwarders for privatelink zones (privatelink.openai.azure.com, privatelink.vaultcore.azure.net, privatelink.search.windows.net, privatelink.blob.core.windows.net), incomplete private DNS zone links across VNets/subscriptions, and missed ACA internal ingress DNS needs (private DNS zone for the environment domain plus wildcard/apex records pointing to the static IP). As with last week's DNS reconciliation pattern, remediation required no application changes, only consistent zone management and correct forwarding/linking.

Azure multi-region resilience guidance continues last week's reliability framing by focusing on matching patterns to requirements: Availability Zones for in-region HA, regional BCDR across paired or non-paired regions based on capacity/latency/residency, and active/active when you can handle the operational complexity. It reinforces that regional failover is often customer-orchestrated, so testing failover/failback and aligning data replication/consistency are still on you. It distinguishes Azure Site Recovery (VM mobility/failover) from Azure Backup (restore-based protection) and points to the Resiliency agent in Azure Copilot (preview) as emerging tooling for coverage gaps and drills.

Key Vault continuity got a practical warning that fits last week's “do not assume day-2 is handled” theme. Paired-region replication improves survivability but does not guarantee continuity for applications that need writes during outages. During Microsoft-managed regional failover, the vault becomes read-only (reads and crypto ops continue, but writes/updates/rotation/cert renewal stop). If you need rotation and deterministic failover/testing, the recommended pattern is multiple independent vaults across regions with customer-managed replication and failover, supported by Terraform reference architectures including event-based sync (variants for private-endpoint-only regulated environments vs simpler public setups).
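
The DNS gaps from the hub-spoke write-up earlier in this section lend themselves to a simple inventory check. The Python sketch below compares a resolver's conditional forwarders and each VNet's private DNS zone links against the privatelink zones named in the write-up. The two input dictionaries are hypothetical inventory formats (not an Azure API response), and which VNets actually need links depends on your topology, so treat it as a starting point rather than a complete audit.

```python
REQUIRED_PRIVATELINK_ZONES = (
    "privatelink.openai.azure.com",
    "privatelink.vaultcore.azure.net",
    "privatelink.search.windows.net",
    "privatelink.blob.core.windows.net",
)
AZURE_DNS_IP = "168.63.129.16"


def audit_dns(forwarders, linked_zones_by_vnet):
    """Return human-readable findings; an empty list means no gaps were found.

    forwarders: zone name -> upstream IP configured on the custom resolver.
    linked_zones_by_vnet: VNet name -> set of private DNS zones linked to it.
    Both shapes are assumptions made for this sketch.
    """
    findings = []
    for zone in REQUIRED_PRIVATELINK_ZONES:
        target = forwarders.get(zone)
        if target is None:
            findings.append(f"missing conditional forwarder for {zone}")
        elif target != AZURE_DNS_IP:
            findings.append(f"{zone} forwards to {target}, expected {AZURE_DNS_IP}")
    for vnet, zones in sorted(linked_zones_by_vnet.items()):
        for zone in REQUIRED_PRIVATELINK_ZONES:
            if zone not in zones:
                findings.append(f"{vnet} is not linked to {zone}")
    return findings


# Placeholder inventory for illustration only.
forwarders = {
    "privatelink.openai.azure.com": "168.63.129.16",
    "privatelink.vaultcore.azure.net": "10.0.0.4",
}
links = {"hub-vnet": set(REQUIRED_PRIVATELINK_ZONES)}
for finding in audit_dns(forwarders, links):
    print(finding)
```

Running a check like this in CI alongside Terraform plans is one way to catch a missing forwarder or zone link before it surfaces as an apparent platform problem.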

Other Azure News

Provisioning workflows continue to standardize around repeatability, echoing last week's deterministic dev/test and predictable rollout theme. An Azure Developer CLI guide walks through the azd loop (azd init, azd auth login, azd up, azd show, azd down --force --purge) and the project structure (azure.yaml, infra/, .azure/) so teams can move from local code to reproducible deployments without portal-driven glue.

AZD for Beginners: A Practical Introduction to Azure Developer CLI

JavaScript teams got a dependency planning notice that fits last week's “small notices become real work” reminder. The Azure SDK for JavaScript ends support for Node.js 20.x after July 9, 2026, and newer releases will require Node.js 22.x in package.json engines. If CI uses engine-strict=true, installs will fail once you adopt those SDK versions, so plan Node upgrades or be ready to pin packages.
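
To see why engine-strict turns this notice into a build break, the following Python sketch models the decision: a Node.js version that does not satisfy the package's engines range produces a warning by default and a failed install when engine-strict is on. The exact engines string the SDK will publish is not stated in the notice, so the `>=22.0.0` range used here is an assumption, and the sketch only understands that simple form rather than full semver ranges.

```python
import re


def parse_version(text):
    """Parse 'v22.3.0' or '22.3.0' into a tuple of ints."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def satisfies_minimum(node_version, engines_range):
    """Check a '>=X.Y.Z' range only; real semver ranges are richer than this."""
    if not engines_range.startswith(">="):
        raise ValueError("this sketch only understands '>=X.Y.Z' ranges")
    return parse_version(node_version) >= parse_version(engines_range[2:])


def install_outcome(node_version, engines_range, engine_strict):
    """Model the outcome: a mismatch fails only when engine-strict is on."""
    if satisfies_minimum(node_version, engines_range):
        return "ok"
    return "error" if engine_strict else "warning"


# Assumed engines range; the CI runner version is a placeholder.
for strict in (False, True):
    print(strict, install_outcome("v20.11.1", ">=22.0.0", strict))
```

A check like this against your CI images can flag which pipelines need a Node upgrade before you take the newer SDK releases.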
Announcing the end of support for Node.js 20.x in the Azure SDK for JavaScript

Essential Machine Management (public preview) in Compute Infrastructure Hub pulls more machine operations into subscription-level baselines, aligning with last week's governance-by-default and runbook/observability emphasis. It is an “enroll once per subscription” bundle that enables VM Insights plus recommended alerts, Update Manager, Change Tracking and Inventory, Machine Configuration, and a security baseline policy for Azure VMs and Arc-enabled servers, with pricing depending on Arc coverage/licensing.
Announcing Public Preview for Essential Machine Management

For constrained environments, Azure Local Disconnected Operations described running Azure Local fully air-gapped while still supporting VMs, Kubernetes (AKS), and selected Arc-enabled services. It extends last week's sovereign/edge direction (Azure Local modular datacenters, Foundry Local on-site inference) with more operational detail on what “run disconnected” involves.
Azure Local Disconnected Operations: Running Sovereign Cloud, Productivity, and AI in Air‑Gapped Environments

This week's broader platform roundup continued last week's messaging/governance theme. Service Bus Network Security Perimeter (NSP) support appears again, reinforcing “perimeter plus transition mode” as a cross-service pattern to track. The roundup also included AKS networking/ops updates (CNI overlay CIDR expansion, disabling HTTP proxy, observability improvements), a new Azure Functions MCP resource trigger, ARO NVIDIA GPU support, Network Watcher rule impact analysis, Azure Migrate Azure Files assessment, and PostgreSQL updates (maintenance notifications, PgBouncer 1.25.1).
Azure Update 10th April 2026

Cosmos DB cost/performance tuning got a longer optimization walkthrough aligned with last week's FinOps framing: RU budgeting, throughput choice (manual vs autoscale), account throughput limits, reserved capacity, and how partition keys, document shape, and indexing affect RU use and hot partitions. A short companion video is linked, but it adds little detail beyond the main walkthrough.
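
The manual-versus-autoscale choice is easier to reason about with a small cost model. The Python sketch below compares a flat provisioned throughput against autoscale billed on each hour's peak. It is a sketch under stated assumptions, not the walkthrough's own method: the 1.5x autoscale rate multiplier, the 10% scale-down floor, and the unit price are parameters you should replace with current pricing for your account type and region.

```python
def manual_cost(provisioned_ru, hours, price_per_100ru_hour):
    """Cost of a fixed RU/s setting held for the given number of hours."""
    return provisioned_ru / 100 * price_per_100ru_hour * hours


def autoscale_cost(hourly_peak_ru, max_ru, price_per_100ru_hour,
                   multiplier=1.5, floor_fraction=0.1):
    """Cost when each hour is billed at that hour's peak RU/s.

    multiplier and floor_fraction are assumptions about autoscale pricing;
    the billed value is clamped between the floor and max_ru.
    """
    total = 0.0
    for peak in hourly_peak_ru:
        billed = min(max(peak, max_ru * floor_fraction), max_ru)
        total += billed / 100 * price_per_100ru_hour * multiplier
    return total


# Placeholder workload: 4 busy hours at 10,000 RU/s, 20 quiet hours at 1,000 RU/s.
PRICE = 0.008  # placeholder price per 100 RU/s per hour
peaks = [10_000] * 4 + [1_000] * 20
print("manual:", round(manual_cost(10_000, 24, PRICE), 2))
print("autoscale:", round(autoscale_cost(peaks, 10_000, PRICE), 2))
```

With a spiky profile like this one autoscale comes out cheaper; if the workload sat near its peak most of the day, the multiplier would make manual throughput the better choice.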
Cosmos DB Optimization
Cosmos DB Optimization #cosmosdb #database

Fabric updates continued last week's “modernize without rewrites” and “reduce glue code” storyline. Shortcut transformations in OneLake/Lakehouse are now GA for turning shortcut files (CSV/Parquet/JSON) into managed Delta tables with continuous sync and schema inference/evolution without ETL pipelines. SQL database in Fabric also added a Migration Assistant (preview) using DACPAC schema migration plus Fabric Copy Jobs (optional Data Gateway) for data moves, including compatibility checks and Copilot fix suggestions.
Shortcut transformations: Turn files into Delta tables without pipelines (Generally Available)
Introducing Migration assistant for SQL database in Fabric (Preview)

For IIS behind Azure Application Gateway, a troubleshooting guide focused on preventing false Unhealthy probe states (and 502/504s) by using a dedicated /health endpoint returning 200 anonymously, avoiding redirects/auth, keeping it fast and dependency-free (even static content), and matching probe TLS/host header settings (including “Pick host name from backend” to avoid SNI/CN mismatch). It matches last week's “design around managed dataplane behaviors” lesson: with App Gateway, probe behavior is part of the contract, and small mismatches can look like random outages until probes are made deterministic.
Designing Reliable Health Check Endpoints for IIS Behind Azure Application Gateway
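
The health endpoint criteria from the IIS guide can be expressed as a small checklist function. This Python sketch evaluates a recorded probe response against the properties the guide calls for: a direct 200, no redirect, no authentication challenge, and a fast answer. The `ProbeResult` type and the one-second latency budget are hypothetical choices for illustration; Application Gateway's own probe settings are configured separately.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """A recorded response from the health endpoint (hypothetical shape)."""
    status: int
    elapsed_ms: float


def evaluate_probe(result, max_elapsed_ms=1000.0):
    """Return reasons the response makes a poor probe target; empty means fine."""
    problems = []
    if 300 <= result.status < 400:
        problems.append("redirects instead of answering directly")
    elif result.status in (401, 403):
        problems.append("requires authentication")
    elif result.status != 200:
        problems.append(f"unexpected status {result.status}")
    if result.elapsed_ms > max_elapsed_ms:
        problems.append("too slow for a dependency-free check")
    return problems


# Placeholder responses for illustration only.
for sample in (ProbeResult(200, 12.0), ProbeResult(302, 8.0), ProbeResult(401, 2500.0)):
    print(sample.status, evaluate_probe(sample))
```

Pointing a check like this at /health from a test runner, with the same host header the gateway uses, is one way to confirm the endpoint behaves deterministically before relying on it for probes.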