Scale agentic AI from on-device to cloud orchestration | BRKSP92

Name: Scale agentic AI from on-device to cloud orchestration | BRKSP92
Uploaded: 2026-06-04T12:40:12+00:00
Description: Karthik Vijayan, Colin Helms, Imran Sheik Mohamed, and Jayneel Vora show how agentic AI systems can span client, edge, and cloud, with demos covering...

Jun 4, 2026 by Karthik Vijayan, Colin Helms, Imran Sheik Mohamed, Jayneel Vora

Karthik Vijayan, Colin Helms, Imran Sheik Mohamed, and Jayneel Vora present a Microsoft Build 2026 breakout on designing agentic AI systems that run across client devices, edge environments, and the cloud.

Overview

Modern AI systems often span multiple environments rather than running as a single model in one place. This session explores how agentic AI workloads operate across client, edge, and cloud through three demos:

- Real-time on-device agents (including on-device reasoning and NPU activity)
- Distributed inference across edge systems
- Enterprise-scale multi-agent orchestration on Azure Kubernetes Service (AKS) with Intel Xeon

The session also focuses on practical guidance for deciding where to place inference, reasoning, and orchestration to balance responsiveness, scale, and efficiency.

Session chapters (from the video)

00:00 - Introduction to distributed AI systems across cloud, edge, and client
03:04 - Live AI client demo showing on-device reasoning and NPU activity
07:17 - Overview of Intel Core Ultra Series 3 processors with integrated Arc graphics
07:48 - Shared GPU and fast LPDDR5X memory enabling AI workloads
14:21 - Demo begins: running a sandboxed script with PowerShell
16:01 - Demonstration of model speed and token processing performance
22:01 - OpenFlow sends execution results to an SGLang LLM engine for summarization
24:16 - Demo of auto-scaling mini pods and instance replicas based on CPU load
25:02 - Multi-agent workflow and nightly job automation within a single VM setup

Key themes

Placing AI capabilities across environments

How to think about splitting responsibilities between:
- On-device reasoning (latency/responsiveness)
- Edge inference (locality and distributed capacity)
- Cloud orchestration (coordination and scale)
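
The split above can be sketched as a small routing function. This is an illustrative sketch only, not the presenters' method: the `Task` fields, the `Tier` names, the 200 ms threshold, and the order in which the rules are checked are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    DEVICE = "on-device"
    EDGE = "edge"
    CLOUD = "cloud"


@dataclass
class Task:
    name: str
    latency_budget_ms: int    # how quickly a response is needed
    needs_local_data: bool    # data should stay close to where it is produced
    coordinates_agents: bool  # task supervises or fans out to other agents


# Hypothetical cutoff for "interactive" work, chosen only for illustration.
INTERACTIVE_BUDGET_MS = 200


def place(task: Task) -> Tier:
    """Pick a tier from the three criteria: coordination, latency, locality."""
    if task.coordinates_agents:
        return Tier.CLOUD   # coordination and scale
    if task.latency_budget_ms <= INTERACTIVE_BUDGET_MS:
        return Tier.DEVICE  # latency and responsiveness
    if task.needs_local_data:
        return Tier.EDGE    # locality and distributed capacity
    return Tier.CLOUD       # default for everything else


if __name__ == "__main__":
    for t in (
        Task("voice reply", 100, False, False),
        Task("camera analytics", 1000, True, False),
        Task("nightly report", 60000, False, True),
    ):
        print(f"{t.name}: {place(t).value}")
```

A real system would likely weigh more signals (model size, device power state, network cost), but the ordering here makes the trade-off explicit: coordination wins first, then the latency budget, then locality.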

Orchestrating multi-agent systems on AKS

- Using Azure Kubernetes Service (AKS) as the platform for enterprise-scale orchestration
- Scaling behavior demonstrated via:
  - Pods and replicas
  - Autoscaling based on CPU load
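
To make CPU-based autoscaling concrete, here is a minimal sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler. The session summary does not say which autoscaler or settings the demo used, so treating it as HPA-style scaling is an assumption, and the minimum, maximum, and target values are hypothetical.

```python
import math


def desired_replicas(current_replicas: int,
                     current_cpu_pct: float,
                     target_cpu_pct: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Replica count from the HPA rule: ceil(current * currentMetric / targetMetric)."""
    ratio = current_cpu_pct / target_cpu_pct
    # Skip scaling when utilization is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (hypothetical defaults above).
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Two replicas averaging 90% CPU against a 50% target.
    print(desired_replicas(2, 90, 50))
```

With two replicas at 90% average CPU and a 50% target, the ratio is 1.8, so the function asks for four replicas. When load falls, the same formula scales the deployment back toward the minimum.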

Performance and hardware considerations

- Observing on-device NPU activity during real-time reasoning
- Hardware context called out in the session:
  - Intel Core Ultra Series 3 processors with integrated Arc graphics
  - Shared GPU and LPDDR5X memory characteristics
  - Intel Xeon for cloud/cluster scenarios

Demo workflow elements

- Running a sandboxed script with PowerShell
- Measuring model speed and token processing performance
- Sending execution results through OpenFlow to an SGLang LLM engine for summarization
- Multi-agent workflow and nightly job automation within a single VM setup
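
For the model speed and token processing item above, one common way to quantify it is time to first token plus decode throughput in tokens per second. The sketch below is an assumption about how such a measurement could be taken, not the tooling shown in the demo. `measure_stream` and `fake_stream` are hypothetical names, and the fake stream stands in for a real streaming model response.

```python
import time
from typing import Callable, Iterable, Iterator


def measure_stream(tokens: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> dict:
    """Consume a token stream and report time to first token and decode rate."""
    start = clock()
    first = last = None
    count = 0
    for _ in tokens:
        last = clock()          # arrival time of this token
        if first is None:
            first = last        # arrival time of the first token
        count += 1
    if count == 0:
        return {"tokens": 0, "ttft_s": None, "decode_tok_per_s": None}
    decode_time = last - first
    # Decode rate covers the tokens generated after the first one.
    rate = (count - 1) / decode_time if decode_time > 0 else None
    return {"tokens": count, "ttft_s": first - start, "decode_tok_per_s": rate}


def fake_stream(n: int, delay_s: float) -> Iterator[str]:
    """Stand-in for a streaming model response (illustration only)."""
    for i in range(n):
        time.sleep(delay_s)
        yield f"tok{i}"


if __name__ == "__main__":
    print(measure_stream(fake_stream(20, 0.01)))
```

Separating time to first token from decode rate matters because the two respond to different bottlenecks: the first is dominated by prompt processing, the second by per-token generation.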