Microsoft Developer’s session with Ayan Gupta and Julien Dubois explores building advanced multi-agent AI systems in Java using LangChain4j, with hands-on agent orchestration, GPT-4o Mini, and speech synthesis. Learn how these technologies work together to automate creative and practical AI workflows.

Building Multi-Agent AI Systems with LangChain4j and Java

Presented by Microsoft Developer: Ayan Gupta (host) and Julien Dubois (guest)

This episode dives into modern AI development practices with a focus on multi-agent orchestration using LangChain4j. Learn how to construct a coordinated system where specialized AI agents collaborate to perform complex tasks—such as generating poetry and transforming it into speech automatically.

Session Overview

Introduction to AI agents and their real-world applications (e.g., automating a coffee order with agent coordination).
Step-by-step creation of a three-agent system:
- Author Agent: Uses GPT-4o Mini for poetry generation.
- Actor Agent: Leverages Mistral 3B and MaryTTS for text-to-speech (TTS), outputting audio files.
- Supervisor Agent: Orchestrates workflow, ensuring seamless collaboration.

Orchestration Strategies

Pure AI Orchestration: A supervisor LLM determines which agent to call and when.
Workflow-Based Orchestration: LangChain4j’s SequenceBuilder and related APIs enable defining explicit sequences, parallel flows, and loops.
For this session, workflow orchestration is emphasized for clarity and accessibility.

Technical Walkthrough

Author Agent
- Implements poetry generation using GPT-4o Mini.
- Features annotations such as @UserMessage and @Agent for input handling.
- Testing and validating stand-alone behavior.
Actor Agent
- Integrates Mistral 3B for advanced text processing.
- Connects to MaryTTS (run via Docker container) for TTS conversion.
- Uses the @Tool annotation for tool integration.
Supervisor Agent
- Maintains a shared context map for workflow state.
- Employs SequenceBuilder to link Author and Actor agents.
Demo: Request a poem about the Java Virtual Machine (JVM), automatically have the Author agent generate it, the Actor agent render audio, and the Supervisor chain the process—resulting in a playable audio file (output.wav).

Implementation Steps

Set up LangChain4j and Agent Interfaces
Register Model APIs (GPT-4o Mini, Mistral 3B)
Configure Docker for MaryTTS
Annotate and deploy agents
Use SequenceBuilder to define coordination logic
Test end-to-end: from prompt input to audio output

Key Technologies

Java + LangChain4j
GPT-4o Mini (text generation)
Mistral 3B (text processing)
MaryTTS (text-to-speech, via Docker)
SequenceBuilder (workflow orchestration)

Resources

Official session page & code resources
LangChain4j on GitHub
Docker docs for MaryTTS setup

By following this walkthrough, developers can build and orchestrate their own intelligent multi-agent AI solutions in Java, combining generative text and speech, and unlocking new workflow automation possibilities. Presented by Julian Dubois with Ayan Gupta for Microsoft Developer.