STATE-Bench: Memory-agnostic Benchmark

Microsoft Developer introduces STATE-Bench (Stateful Task Agent Evaluation Benchmark), an open-source, memory-agnostic benchmark that measures whether memory improves AI agent performance on realistic, stateful enterprise tasks.

Overview

STATE-Bench evaluates AI agents beyond simple recall by focusing on how well they execute procedural, stateful workflows and how they behave across repeated runs.
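
To make "behavior across repeated runs" concrete, here is a minimal sketch of how repeated pass/fail outcomes could be summarized. It is an illustration only, not STATE-Bench's actual scoring: the function name `summarize_runs`, the input layout, and both metric names are assumptions introduced for this example.

```python
def summarize_runs(outcomes: dict[str, list[bool]]) -> dict[str, float]:
    """Summarize repeated-run results (hypothetical helper, not part of STATE-Bench).

    `outcomes` maps a task id to the pass/fail result of each repeated run.
    """
    if not outcomes or any(len(runs) == 0 for runs in outcomes.values()):
        raise ValueError("every task needs at least one recorded run")

    n_tasks = len(outcomes)
    # Average per-task success rate: how often the agent succeeds overall.
    mean_success_rate = sum(sum(runs) / len(runs) for runs in outcomes.values()) / n_tasks
    # Fraction of tasks passed on every run: a simple consistency signal.
    all_runs_pass_rate = sum(all(runs) for runs in outcomes.values()) / n_tasks

    return {
        "mean_success_rate": mean_success_rate,
        "all_runs_pass_rate": all_runs_pass_rate,
    }


# Example with made-up task ids and results.
example = {"task_a": [True, True], "task_b": [True, False]}
summary = summarize_runs(example)
```

Running the same summary for an agent with and without memory enabled would give one simple way to compare the two configurations, under the assumptions above.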

What STATE-Bench is

Why traditional memory benchmarks fall short

What STATE-Bench measures

STATE-Bench evaluates agent performance across dimensions such as:

It includes domains such as:

How to contribute and learn more
