STATE-Bench: Memory-agnostic Benchmark

Uploaded: 2026-05-19T17:00:52+00:00

May 19, 2026 by Microsoft Developer

Microsoft Developer introduces STATE-Bench (Stateful Task Agent Evaluation Benchmark), an open-source, memory-agnostic benchmark that measures whether memory improves AI agent performance on realistic, stateful enterprise tasks.

Overview

STATE-Bench goes beyond simple recall tests: it evaluates how well AI agents execute procedural, stateful workflows and how consistently they behave across repeated runs.

What STATE-Bench is

STATE-Bench stands for Stateful Task Agent Evaluation Benchmark.
It is designed to test production-readiness characteristics of agents on realistic enterprise tasks.
It is memory-agnostic and supports a “bring your own memory” approach.
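
One way to picture a "bring your own memory" setup is a small interface that any memory backend can implement, so the same task can be run with and without memory. The sketch below is illustrative only: `MemoryBackend`, `NoMemory`, `ListMemory`, and `run_task` are hypothetical names chosen for this example and are not taken from STATE-Bench's actual API.

```python
from typing import Protocol


class MemoryBackend(Protocol):
    """Hypothetical interface; STATE-Bench's real API may differ."""

    def store(self, text: str) -> None: ...

    def retrieve(self, query: str) -> list[str]: ...


class NoMemory:
    """Baseline backend that remembers nothing."""

    def store(self, text: str) -> None:
        pass

    def retrieve(self, query: str) -> list[str]:
        return []


class ListMemory:
    """Toy backend: keeps every note, returns notes sharing a word with the query."""

    def __init__(self) -> None:
        self.notes: list[str] = []

    def store(self, text: str) -> None:
        self.notes.append(text)

    def retrieve(self, query: str) -> list[str]:
        words = set(query.lower().split())
        return [n for n in self.notes if words & set(n.lower().split())]


def run_task(task: str, memory: MemoryBackend) -> str:
    """Stand-in for an agent run: looks up context, then records the task."""
    context = memory.retrieve(task)
    memory.store(f"completed: {task}")
    return f"task={task}; context_items={len(context)}"
```

Because `run_task` depends only on the interface, swapping `NoMemory()` for `ListMemory()` changes the memory system without touching the task code, which is the property a memory-agnostic harness needs.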

Why traditional memory benchmarks fall short

Many memory benchmarks emphasize recall-style tests, which check whether an agent can retrieve a fact it has seen before.
STATE-Bench targets stateful task execution, where success depends on reliably completing multi-step procedures rather than remembering isolated facts.
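
The difference between the two styles of test can be sketched with a toy example. Everything below is an assumption made for illustration (the `BookingState` fields, the tool functions, and both check functions); it is not a description of how STATE-Bench scores tasks.

```python
from dataclasses import dataclass, field


@dataclass
class BookingState:
    """Toy environment state for a hypothetical travel task."""

    seat: str = "12A"
    refunded: bool = False
    log: list[str] = field(default_factory=list)


def change_seat(state: BookingState, new_seat: str) -> None:
    """Hypothetical tool call that mutates the environment."""
    state.seat = new_seat
    state.log.append(f"change_seat:{new_seat}")


def issue_refund(state: BookingState) -> None:
    """Hypothetical tool call that mutates the environment."""
    state.refunded = True
    state.log.append("issue_refund")


def recall_check(answer: str, expected: str) -> bool:
    """Recall-style scoring: did the agent state the right fact?"""
    return expected.lower() in answer.lower()


def stateful_check(state: BookingState, expected_seat: str, expect_refund: bool) -> bool:
    """Stateful scoring: did the environment end in the required state?"""
    return state.seat == expected_seat and state.refunded == expect_refund
```

In this sketch an agent that changes the seat but skips the refund would pass `recall_check` on the seat number while failing `stateful_check`, because one step of the procedure was never carried out.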

What STATE-Bench measures

STATE-Bench evaluates agent performance across dimensions such as:

Procedural workflow handling (stateful, multi-step tasks)
Reliability across repeated runs
Efficiency
User experience
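
Reliability across repeated runs can be summarized in more than one way. As a minimal sketch, assuming each task is run several times and each run is recorded as pass or fail, the functions below report the average success rate and the fraction of tasks that pass on every run. The function names and the result layout are hypothetical, and the metrics STATE-Bench actually reports may differ.

```python
def success_rate(outcomes: list[bool]) -> float:
    """Fraction of runs that succeeded (0.0 for an empty list)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def all_runs_pass(outcomes: list[bool]) -> bool:
    """True only if there is at least one run and every run succeeded."""
    return bool(outcomes) and all(outcomes)


def summarize(results: dict[str, list[bool]]) -> dict[str, float]:
    """Aggregate repeated-run outcomes; `results` maps task id -> run outcomes."""
    n = len(results)
    if n == 0:
        return {"mean_success_rate": 0.0, "fraction_tasks_always_passing": 0.0}
    rates = [success_rate(o) for o in results.values()]
    always = [all_runs_pass(o) for o in results.values()]
    return {
        "mean_success_rate": sum(rates) / n,
        "fraction_tasks_always_passing": sum(always) / n,
    }
```

The two numbers can diverge sharply: an agent that succeeds on half of its attempts for every task has a mean success rate of 0.5 but an always-passing fraction of 0.0, which is why repeated runs say more than a single pass.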

The benchmark includes domains such as:

Customer support
Travel
Shopping

How to contribute and learn more

GitHub repository: https://github.com/microsoft/STATE-Bench
Related video: Using Microsoft Agent Framework with Foundry managed memory: https://youtu.be/DZn9bNDEs4U?si=IV2itRlRjMXPYQl8
Short link for this video: https://aka.ms/memory-benchmark

Video chapters

00:00 What's Project STATE-Bench
03:45 Why this benchmark is different
13:06 How it works
18:57 What's Next and How to Contribute
20:58 Final statements