.NET AI Community Standup: High-Performance AI Inference on a Budget

Hosted by the dotnet team, this session features Bruno Capuano and Tal Wald as they demonstrate strategies for achieving fast, cost-effective AI inference using .NET, ONNX Runtime, and modern APIs.


Overview

This session explores how developers can process over 20,000 sentences per second at minimal cost by leveraging .NET for AI inference. The presenters demonstrate how AI workloads, typically developed in Python, can be accelerated and optimized within the .NET ecosystem.
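The throughput figures above center on running inference through ONNX Runtime's .NET bindings. As a rough illustration (not code shown in the session), a minimal batch-inference call with the Microsoft.ML.OnnxRuntime package might look like the sketch below; the model file name, input tensor names, and the pre-tokenized inputs are placeholder assumptions:

```csharp
// Minimal sketch: batch inference with ONNX Runtime from C#.
// Assumes a sentence-embedding model exported to ONNX ("model.onnx")
// with "input_ids" / "attention_mask" inputs; tokenization is stubbed out.
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

using var session = new InferenceSession("model.onnx");

// A pre-tokenized batch of shape [batchSize, seqLen] (placeholder zeros).
int batchSize = 32, seqLen = 128;
var inputIds = new DenseTensor<long>(new[] { batchSize, seqLen });
var attentionMask = new DenseTensor<long>(new[] { batchSize, seqLen });

var inputs = new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor("input_ids", inputIds),
    NamedOnnxValue.CreateFromTensor("attention_mask", attentionMask),
};

// Run returns one value per model output; disposing frees native buffers.
using var results = session.Run(inputs);
var embeddings = results.First().AsTensor<float>();
```

Reusing one `InferenceSession` across batches, as here, is what keeps per-call overhead low enough for high-throughput workloads.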

Key Topics Covered

Migrating from Python to .NET for AI

Hugging Face Model Integration

High-Performance Inference with ONNX Runtime

.NET 9 & 10 AI APIs

AI Library Architecture

Demos and Performance Comparisons

From Research to Production
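Among the topics above, the .NET 9 and 10 AI APIs include provider-agnostic abstractions such as Microsoft.Extensions.AI. A hedged sketch of batch embedding generation through its `IEmbeddingGenerator` abstraction follows; the concrete generator (ONNX-backed, cloud-hosted, etc.) is assumed to be wired up elsewhere:

```csharp
// Sketch: provider-agnostic embeddings via Microsoft.Extensions.AI.
// IEmbeddingGenerator is the abstraction; which implementation backs it
// (local ONNX model, remote service, ...) is an assumption left to DI.
using Microsoft.Extensions.AI;

async Task<float[][]> EmbedAsync(
    IEmbeddingGenerator<string, Embedding<float>> generator,
    IReadOnlyList<string> sentences)
{
    // GenerateAsync accepts a batch of inputs and returns one embedding each.
    GeneratedEmbeddings<Embedding<float>> embeddings =
        await generator.GenerateAsync(sentences);

    return embeddings.Select(e => e.Vector.ToArray()).ToArray();
}
```

Coding against the interface rather than a specific provider is what lets the same pipeline move between a local ONNX model and a hosted endpoint without rewrites.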

Conclusion

This standup provides actionable insights for .NET developers aiming to bring AI workloads to production efficiently. By combining ONNX Runtime, .NET’s new APIs, and proper architectural design, high performance and low cost are achievable without sacrificing flexibility.