Distributed systems to AI platforms with Mark Russinovich & Ion Stoica | BRK227
Mark Russinovich and Ion Stoica discuss how distributed-systems principles are shaping next-generation AI platforms, covering what changes as workloads become agentic, multimodal, and globally distributed, and why open source, security, and governance are now core requirements from training through real-time serving.
Overview
This Microsoft Build 2026 breakout (BRK227) is a conversation about how AI infrastructure needs to evolve to support the “agent era”. The session connects classic distributed-systems fundamentals to modern AI platform requirements, spanning training, real-time serving, and platform concerns like developer experience, security, and governance.
What’s changing in AI platform architecture
- AI systems are increasingly:
  - Agentic (systems that plan and act across tools and services)
  - Multimodal (working across text, images, audio, etc.)
  - Globally distributed (latency, data locality, and reliability become first-class constraints)
- The discussion frames these shifts as architectural drivers that affect the full stack: algorithms, systems software, hardware, and data center design.
Distributed systems fundamentals applied to AI infrastructure
- The session emphasizes using distributed-systems principles to reason about:
  - Scalability and fault tolerance
  - Scheduling and resource management
  - Data movement and locality
  - End-to-end performance across training and serving
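
To make the scheduling and data-locality items concrete, here is a minimal sketch of locality-aware placement. It is not taken from the session: the `Node`, `Task`, and `place` names, the region labels, and the "prefer the data's region, then the node with the most free GPUs" policy are all hypothetical, chosen only to illustrate the trade-off between resource availability and data movement.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    region: str      # hypothetical region label, e.g. "us" or "eu"
    free_gpus: int


@dataclass
class Task:
    name: str
    gpus: int
    data_region: str  # region where the task's input data lives


def place(task: Task, nodes: List[Node]) -> Optional[Node]:
    """Pick a node with enough free GPUs, preferring the task's data region.

    Illustrative policy only: real schedulers also weigh queueing,
    preemption, network topology, and failure domains.
    """
    candidates = [n for n in nodes if n.free_gpus >= task.gpus]
    if not candidates:
        return None  # no capacity anywhere; the caller could queue or retry
    # Sort key: nodes in the data's region first, then the most free GPUs.
    best = min(
        candidates,
        key=lambda n: (n.region != task.data_region, -n.free_gpus),
    )
    best.free_gpus -= task.gpus  # reserve the GPUs on the chosen node
    return best


nodes = [Node("a", "eu", 8), Node("b", "us", 4)]
local = place(Task("embed", gpus=2, data_region="us"), nodes)
remote = place(Task("train", gpus=6, data_region="us"), nodes)
too_big = place(Task("huge", gpus=16, data_region="eu"), nodes)
```

In this sketch the first task lands in its data's region, while the second falls back to a remote region because no local node has enough capacity, which is the point where data movement cost enters the picture.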
From data centers to AI supercomputing regions
- The speakers discuss the evolution of data centers and the rise of large-scale AI compute regions designed for modern training and inference workloads.
Serverless computing in AI workloads
- The session highlights modern serverless computing and its role in AI workloads, particularly where elasticity and operational simplicity matter.
Cross-layer optimization and the open source stack
- The conversation covers optimization layers across:
  - Algorithms
  - Hardware
  - Architecture / systems design
- It also calls out the importance (and difficulty) of cross-layer optimization in an open source AI infrastructure stack.
Security, governance, and protecting sensitive AI data
- Security and governance are treated as core platform requirements, including:
  - Protecting sensitive AI data
  - Confidential computing as part of the security story
Developer experience and verification challenges
- The session touches on developer experience and the challenges of code verification, including a discussion of the limits of future automation.