Introducing langchain-azure-storage: Azure Storage Integration with LangChain
Kyle Knapp presents langchain-azure-storage, detailing how Microsoft’s official Azure Storage integration enhances LangChain RAG pipelines with secure, scalable, and customizable document loading capabilities.
Author: Kyle Knapp
Microsoft has released langchain-azure-storage, the official package for connecting Azure Storage with LangChain 1.0. This integration introduces the new AzureBlobStorageLoader, now in public preview, which loads blobs from Azure Blob Storage as LangChain documents. It is aimed in particular at applications that use Retrieval-Augmented Generation (RAG) with LLMs.
Key Features
- Unified Blob/Container Access: Load documents by entire container, specific prefix, or individual blob names.
- Memory Efficiency: Supports lazy loading, enabling scalable processing even for very large datasets (millions or billions of blobs).
- Secure Authentication: Uses OAuth 2.0 token-based authentication through DefaultAzureCredential by default, and supports custom credentials such as a managed identity or a SAS token.
- Pluggable Parsing: Plug in any LangChain-compatible loader through the loader factory interface to parse file types such as PDF and DOCX.
How it Fits into RAG Workflows
- In a RAG pipeline, documents are often stored on Azure Blob Storage.
- A typical workflow has four steps:
  1. Collect the source documents (PDFs, DOCX, etc.).
  2. Parse them into text and metadata as LangChain Document objects.
  3. Chunk and embed the text, then store it in a vector database (e.g., Azure AI Search).
  4. At query time, retrieve relevant documents to give the LLM context.
- LangChain loaders make steps 1 and 2 simple and consistent. See RAG tutorial.
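The chunking step of the workflow above can be illustrated with a minimal sketch. It assumes a hypothetical `Doc` dataclass standing in for a LangChain document and a simple fixed-size character splitter with overlap; neither is part of langchain-azure-storage, and a real pipeline would more likely use one of LangChain's text splitters.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def chunk_document(doc: Doc, chunk_size: int = 200, overlap: int = 20) -> list[Doc]:
    """Split one document into overlapping fixed-size character chunks."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(doc.page_content), step):
        text = doc.page_content[start:start + chunk_size]
        # Carry the source metadata forward so each chunk can be traced back.
        chunks.append(Doc(text, {**doc.metadata, "start": start}))
        if start + chunk_size >= len(doc.page_content):
            break  # the current chunk already reaches the end of the text
    return chunks


chunks = chunk_document(Doc("a" * 500, {"source": "example.txt"}))
print(len(chunks), [c.metadata["start"] for c in chunks])
```

Keeping the source metadata on every chunk is what lets a retrieved chunk be traced back to the blob it came from.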
Using AzureBlobStorageLoader
Install the Package
pip install langchain-azure-storage
Load All Blobs From a Container
from langchain_azure_storage.document_loaders import AzureBlobStorageLoader
loader = AzureBlobStorageLoader("https://<your-storage-account>.blob.core.windows.net/", "<your-container-name>")
for doc in loader.lazy_load():
    print(doc.metadata["source"])
    print(doc.page_content)
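Because lazy_load() yields documents one at a time, downstream work such as embedding can be done in small batches without holding the whole container in memory. The sketch below shows one way to do that. The `batched` helper and the `fake_lazy_load` generator are hypothetical, and the generator only stands in for the real loader so the example runs without an Azure account.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield lists of up to batch_size items, pulling from the source lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def fake_lazy_load(count: int) -> Iterator[str]:
    """Hypothetical stand-in for loader.lazy_load(); yields one item at a time."""
    for i in range(count):
        yield f"document-{i}"


for batch in batched(fake_lazy_load(5), batch_size=2):
    # In a real pipeline this is where a batch would be embedded and stored.
    print(len(batch), batch[0])
```

Only one batch is held in memory at a time, so the batch size bounds memory use regardless of how many blobs the container holds.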
Load Specific Blobs by Name
from langchain_azure_storage.document_loaders import AzureBlobStorageLoader
loader = AzureBlobStorageLoader(
    "https://<your-storage-account>.blob.core.windows.net/",
    "<your-container-name>",
    ["<blob-name-1>", "<blob-name-2>"]
)
for doc in loader.lazy_load():
    print(doc.metadata["source"])
    print(doc.page_content)
Customize Parsing with Loader Factory
from langchain_azure_storage.document_loaders import AzureBlobStorageLoader
from langchain_community.document_loaders import PyPDFLoader # install langchain-community and pypdf
loader = AzureBlobStorageLoader(
    "https://<your-storage-account>.blob.core.windows.net/",
    "<your-container-name>",
    prefix="pdfs/",  # only blobs prefixed with 'pdfs/'
    loader_factory=PyPDFLoader
)
for doc in loader.lazy_load():
    print(doc.page_content)
- This approach works with any file-path-based loader (e.g., DOCX, Markdown, custom formats).
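For a custom format, a file-path-based loader could look roughly like the sketch below. It assumes the loader factory is called with a local file path and returns an object exposing lazy_load(). `LineLoader` and `Doc` are hypothetical names used only for illustration; a production loader would typically subclass LangChain's base loader and yield real Document objects.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterator


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class LineLoader:
    """Hypothetical file-path-based loader: one document per non-empty line."""

    def __init__(self, file_path: str) -> None:
        self.file_path = file_path

    def lazy_load(self) -> Iterator[Doc]:
        with open(self.file_path, encoding="utf-8") as handle:
            for number, line in enumerate(handle, start=1):
                if line.strip():
                    yield Doc(line.strip(), {"source": self.file_path, "line": number})


# Exercise the loader locally against a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.txt"
    path.write_text("first\n\nsecond\n", encoding="utf-8")
    docs = list(LineLoader(str(path)).lazy_load())

print([(d.page_content, d.metadata["line"]) for d in docs])
```

Under the stated assumption, the class itself would be passed as the factory (loader_factory=LineLoader), in the same way PyPDFLoader is passed in the example above.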
Migration Guide: Community Loaders to langchain-azure-storage
If you use the old AzureBlobStorageContainerLoader or AzureBlobStorageFileLoader (from langchain-community), migrate by:
- Switching to langchain-azure-storage as a dependency.
- Updating imports to langchain_azure_storage.document_loaders.
- Using AzureBlobStorageLoader in place of the container and file loaders.
- Passing account URLs instead of connection strings.
- Adopting Microsoft Entra ID authentication (enabled through az login or a managed identity) in place of connection strings.
Code Sample: Before/After
Before:
from langchain_community.document_loaders import AzureBlobStorageContainerLoader, AzureBlobStorageFileLoader
container_loader = AzureBlobStorageContainerLoader("DefaultEndpointsProtocol=https;AccountName=...", "container")
file_loader = AzureBlobStorageFileLoader("DefaultEndpointsProtocol=https;AccountName=...", "container", "blob")
After:
from langchain_azure_storage.document_loaders import AzureBlobStorageLoader
from langchain_unstructured import UnstructuredLoader  # install langchain-unstructured
container_loader = AzureBlobStorageLoader("https://<account>.blob.core.windows.net", "container", loader_factory=UnstructuredLoader)
file_loader = AzureBlobStorageLoader("https://<account>.blob.core.windows.net", "container", "blob", loader_factory=UnstructuredLoader)
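When existing configuration only holds a connection string, the account URL needed by the new loader can usually be derived from it. The helper below is a hypothetical sketch, not part of either package. It assumes the standard semicolon-separated key=value connection string format and deliberately ignores the account key, since the migration moves to Microsoft Entra ID authentication.

```python
def account_url_from_connection_string(connection_string: str) -> str:
    """Derive the Blob service URL from a connection string's fields."""
    fields = {}
    for part in connection_string.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)  # account keys may contain '='
            fields[key.strip()] = value.strip()
    # An explicit BlobEndpoint, when present, already is the account URL.
    if "BlobEndpoint" in fields:
        return fields["BlobEndpoint"].rstrip("/")
    protocol = fields.get("DefaultEndpointsProtocol", "https")
    suffix = fields.get("EndpointSuffix", "core.windows.net")
    return f"{protocol}://{fields['AccountName']}.blob.{suffix}"


example = (
    "DefaultEndpointsProtocol=https;AccountName=examplestore;"
    "AccountKey=not-a-real-key==;EndpointSuffix=core.windows.net"
)
print(account_url_from_connection_string(example))
# prints: https://examplestore.blob.core.windows.net
```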
Resources and Feedback
You can provide feedback or request features by filing an issue on GitHub or emailing the team.
Try out the new loader and help make it better for the entire community!
This post appeared first on “Microsoft Tech Community”.