Weekly Machine Learning Roundup: Open Models, Lakehouses, Faster Training
Machine learning updates this week include expanded cloud support for open models, step-by-step LLM deployment, scalable optimizers, and upgraded analytics tools. Microsoft introduced more open-source, cloud-native, and production-friendly options, while new tools like Dion are making large model training more efficient. Companies are highlighting useful deployment strategies and tuning guidance so teams can deliver quality ML systems with less friction.
Cloud-Native LLM Deployment and Optimization
A new, comprehensive guide walks through deploying OpenAI’s GPT-OSS-20B model on Azure Kubernetes Service (AKS) with KAITO and vLLM, using managed GPUs for scalable and reproducible inference. The tutorial covers everything from setting up clusters to benchmarking, making it easier for teams to roll out open LLMs in Azure environments.
Innovations in Data Lake Interoperability
Microsoft Fabric's OneLake now lets you access Delta Lake tables as Apache Iceberg format using Apache XTable. This enables analytics engines such as Spark, Trino, or Snowflake to work with lake data without ETL or duplication, advancing Microsoft’s vision for a more flexible, open lakehouse platform.
Advances in Distributed Optimization for AI Model Training
Microsoft Research introduced Dion, a distributed optimizer for training massive models like LLaMA-3 405B. Dion leverages orthonormal updates to make optimizer steps up to 10x faster while preserving accuracy, and works well with distributed training frameworks such as FSDP2 and tensor parallelism.
Practical Data Engineering and Analytics Platform Enhancements
A deep dive into the Spark UI offers practical advice for improving job run times, fixing data skew and joins, and spotting garbage collection issues—especially for Databricks users seeking to move past trial-and-error tuning.
- A Deep Dive into Spark UI for Job Optimization Microsoft Fabric's Copy Job feature now supports table-level incremental resets, automatic destination table creation, and JSON files—streamlining ETL pipeline deployment by reducing manual steps.
- Enhancements to Microsoft Fabric Copy Job: Reset Incremental Copy, Auto Table Creation, and JSON Support Azure Essentials Show featured Databricks, highlighting unified analytics, ML lifecycle support, and integration across the Azure platform—useful for developers building new skills for Azure ML environments.
- Supercharge Data and AI Innovation with Azure Databricks
Enterprise ML Transformation and Modern DataOps
A case study from Adastra and Heritage Grocers Group illustrates how Microsoft Fabric and Azure OpenAI unified post-acquisition data, powered predictive analytics, and rolled out a working system in just six months, showing real benefits from a modern, cloud-based ML setup.
Other ML News
Excel’s 40th anniversary content showcases its transformation into a capable platform for analytics and ML, including expanded modeling support, Power BI linkage, and deeper connection to Microsoft Fabric.