Weekly Machine Learning Roundup: Faster Lakehouse Workflows

Oct 20, 2025 by TechHub

This week's ML updates focus on smoother data engineering and tighter Azure integration, with performance and reliability improvements for the lakehouse and machine learning frameworks commonly used in big data workflows.

This Week's Overview

Microsoft Fabric Spark: Adaptive File Size Management for Delta Tables

Fabric Spark introduces adaptive file size management, which automatically chooses target file sizes for Delta tables based on telemetry data. The automation streamlines ELT and analytics tasks, delivering up to 2.8 times faster file compaction and 1.6 times better TPC-DS performance. Target sizes update automatically as workloads shift, and developers can still override the configuration to suit specific needs. Other benefits include improved data skipping, lower file rewrite costs, and increased processing parallelism.

Adaptive Target File Size Management in Fabric Spark
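
To make the idea of an adaptive target file size concrete, here is a minimal sketch of a size-based heuristic. It is not Fabric Spark's algorithm: the function name `pick_target_file_size`, the 128 MiB and 1 GiB bounds, and the 10 GiB and 10 TiB table thresholds are all assumptions chosen for illustration. It only shows the general shape of the technique, where small tables get smaller files and large tables get larger ones.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

# Hypothetical bounds and thresholds, chosen for illustration only.
# They are NOT taken from Fabric Spark's implementation.
MIN_TARGET = 128 * MIB
MAX_TARGET = 1 * GIB
SMALL_TABLE = 10 * GIB
LARGE_TABLE = 10 * TIB


def pick_target_file_size(table_bytes: int) -> int:
    """Return a target file size in bytes that grows with table size."""
    if table_bytes <= SMALL_TABLE:
        return MIN_TARGET
    if table_bytes >= LARGE_TABLE:
        return MAX_TARGET
    # Interpolate on a log scale between the two thresholds, so each
    # tenfold increase in table size raises the target by the same factor.
    fraction = math.log(table_bytes / SMALL_TABLE) / math.log(LARGE_TABLE / SMALL_TABLE)
    target = MIN_TARGET * (MAX_TARGET / MIN_TARGET) ** fraction
    # Round down to a whole number of MiB for readability.
    return int(target // MIB) * MIB


if __name__ == "__main__":
    for size in (1 * GIB, 320 * GIB, 50 * TIB):
        print(f"{size / GIB:>10.0f} GiB table -> {pick_target_file_size(size) // MIB} MiB files")
```

A real system would also weigh signals such as write patterns and query behavior. The point of the sketch is only that the target is derived from the table's state rather than fixed up front.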

Azure Data Lake Integrations: adlfs Python Library Improvements

The adlfs Python library gains speed improvements through parallel block uploads and smaller default chunk sizes, which help users avoid timeouts on geo-distributed systems. Frameworks such as Dask, pandas, Ray, PyTorch, and PyIceberg work with these updates, which also bring easier authentication and continued fsspec compatibility for modern data and AI workflows.

Easily Connect AI Workloads to Azure Blob Storage with adlfs
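
The parallel block upload pattern mentioned above can be sketched with the standard library alone. This is not adlfs code: `InMemoryBlockStore` and `upload_in_parallel` are hypothetical names, and the in-memory store is a stand-in for a block-blob service that stages blocks and then commits them in order. The sketch shows why smaller chunks and concurrency help: each request is short, and several are in flight at once.

```python
from concurrent.futures import ThreadPoolExecutor


class InMemoryBlockStore:
    """Hypothetical stand-in for a block-blob service (not a real client)."""

    def __init__(self):
        self.staged = {}  # block_id -> bytes, uploaded but not yet committed
        self.blobs = {}   # blob name -> assembled bytes

    def stage_block(self, block_id: str, data: bytes) -> None:
        self.staged[block_id] = data

    def commit_block_list(self, name: str, block_ids: list) -> None:
        # The commit fixes the final order, so blocks may arrive in any order.
        self.blobs[name] = b"".join(self.staged.pop(b) for b in block_ids)


def upload_in_parallel(store, name: str, data: bytes,
                       block_size: int = 4, max_workers: int = 4) -> list:
    """Split data into blocks, stage them concurrently, then commit in order."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    block_ids = [f"{i:08d}" for i in range(len(chunks))]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every block and re-raises any worker exception.
        list(pool.map(store.stage_block, block_ids, chunks))
    store.commit_block_list(name, block_ids)
    return block_ids


if __name__ == "__main__":
    store = InMemoryBlockStore()
    payload = b"hello, lakehouse!" * 3
    ids = upload_in_parallel(store, "demo.bin", payload, block_size=8)
    print(len(ids), store.blobs["demo.bin"] == payload)
```

Because the final commit lists block IDs in order, a slow or retried block does not corrupt the result. That ordering step is what makes the concurrent staging safe.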