Scaling Kubernetes Securely and Reliably with AKS
Microsoft Events presents practical lessons on operating large Kubernetes clusters with AKS, featuring security, scaling, and cluster management tips from Microsoft Ignite 2025 speakers Brendan Burns, Jorge Palma, and Durga Rachapudi.
Overview
As Kubernetes and AI adoption continue to accelerate, managing clusters at scale is critical for modern organizations. This Microsoft Ignite 2025 breakout session (BRK120), led by Brendan Burns, Jorge Palma, and Durga Rachapudi, delivers hands-on strategies for administering large clusters using Azure Kubernetes Service (AKS).
Key Topics Covered
- Kubernetes as foundational infrastructure for AI and modern applications
- Microsoft 365 platform alignment with Azure and the adoption of Kubernetes for efficiency
- AKS enterprise-grade scalability: Handling millions of workloads and the reality of hybrid clusters
- "What's New in AKS": Latest advancements and the two operational models, Standard and Automatic
- Azure Kubernetes Fleet Manager: Simplifying global cluster capacity, cluster placement, and consistent multi-cluster management
- AI-Aware Scheduling: Optimizing resources for AI workloads running at scale
- Automated Rollouts and Upgrades: Ensuring reliability and reducing manual intervention across environments
- Hybrid and Edge Environments: Maintaining consistent management practices
- Local DNS Enhancements: Reducing latency and improving reliability for services
Practical Tips and Recommendations
- Adopt multi-cluster management tools like Azure Kubernetes Fleet Manager to simplify global operations
- Use AI-aware scheduling to optimize resource usage and performance, particularly for AI-centric clusters
- Leverage automated upgrade rollouts to improve reliability and reduce downtime
- Apply consistent management practices for hybrid and edge deployments
- Employ local DNS solutions to reduce application latency and improve reliability
Speakers
- Brendan Burns
- Jorge Palma
- Durga Rachapudi
This session is for intermediate practitioners looking to improve the reliability, scalability, and security of their AKS clusters for complex, AI-powered applications.