.NET data ingestion with MarkItDown MCP, SQL Server 2025, Ollama, Docker Desktop, and a PDF file
Authorised Territory demonstrates a .NET data ingestion pipeline that converts a PDF to Markdown via the MarkItDown MCP server, generates embeddings with a local Ollama model, and stores those embeddings in SQL Server 2025 running in Docker Desktop.
Overview
The video walks through building an ingestion workflow in .NET for turning PDF content into LLM-ready text and vectors:
- Convert a PDF into Markdown using MarkItDown (a Python utility) exposed via Microsoft’s MarkItDown MCP server.
- Use the NuGet package Microsoft.Extensions.DataIngestion to build the ingestion pipeline and generate embeddings with a local Ollama model.
- Store the generated embeddings in SQL Server 2025.
- Run both the MarkItDown MCP server and SQL Server 2025 in Docker Desktop.
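
The environment described above could be set up roughly as follows. This is a minimal sketch, not the commands used in the video. The container names, the placeholder password, port 3001, the `2025-latest` image tag, and the `nomic-embed-text` model are illustrative assumptions. It also assumes the `mcp/markitdown` image passes extra arguments to the `markitdown-mcp` command, whose README documents the `--http`, `--host`, and `--port` flags.

```shell
# SQL Server 2025 in a container.
# Assumptions: the 2025-latest tag and the sql2025 name; replace the placeholder password.
docker run -d --name sql2025 \
  -e 'ACCEPT_EULA=Y' \
  -e 'MSSQL_SA_PASSWORD=<your-strong-password>' \
  -p 1433:1433 \
  mcr.microsoft.com/mssql/server:2025-latest

# MarkItDown MCP server over HTTP, published on localhost only.
# Assumption: the image's entrypoint is markitdown-mcp, so the flags below reach it.
docker run -d --name markitdown-mcp \
  -p 127.0.0.1:3001:3001 \
  mcp/markitdown --http --host 0.0.0.0 --port 3001

# Embedding model for the local Ollama instance (the model name is an assumption).
ollama pull nomic-embed-text

# Add the ingestion package to the .NET project.
# --prerelease also allows preview versions of the package.
dotnet add package Microsoft.Extensions.DataIngestion --prerelease
```

With the server started this way, an MCP client would typically connect to `http://localhost:3001/mcp`, and the .NET application would reach SQL Server on `localhost,1433`. Both endpoints follow from the assumed port mappings above.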
Links referenced
- MarkItDown MCP image on Docker Hub: https://hub.docker.com/r/mcp/markitdown/
- MarkItDown MCP package source: https://github.com/microsoft/markitdown/tree/main/packages/markitdown-mcp