Weekly Machine Learning Roundup: OneLake, streaming, and ML ops

May 4, 2026 by TechHub

This week, the Machine Learning story was mostly about getting data into shape for ML and analytics at scale: Microsoft Fabric leaned further into OneLake as the common data layer, tightened up real-time streaming so features and signals can arrive with fewer surprises, and nudged SQL developers toward a more modern, Git-friendly workflow in VS Code. Alongside those platform updates, Microsoft also shared an early look at how unconventional hardware (and its digital twins) might run real lending models in the future.

This Week's Overview

Microsoft Fabric and OneLake: broader access to governed data and metadata

Fabric expanded the practical ways teams can discover and reuse data without copying it. A new preview feature, “Mirrored Dremio catalog”, mirrors Dremio-managed Apache Iceberg catalog metadata into OneLake using shortcuts. Access stays effectively zero-copy, and the tables still show up across Fabric workloads. The key idea is that Fabric can “see” the Iceberg tables through the catalog mirror instead of needing another ingestion path. That helps when Dremio already owns table layout, optimization, and governance.

On the discovery side, Fabric also introduced a preview OneLake Catalog Search REST API for finding items across workspaces by metadata. The same capability is wired into the Fabric core MCP server and exposed in Fabric CLI as the fab find command. For teams scaling ML across multiple domains and workspaces, the payoff is less time hunting for the right lakehouse, warehouse, or semantic model. Teams also get a consistent way to script discovery into tooling, including agentic workflows, through API calls or CLI filtering (for example with JMESPath).
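This roundup doesn't describe the Catalog Search API's request or response format. The sketch below therefore starts from search results that have already been fetched into a list of records. The record shape (name, item type, workspace, tags) is hypothetical. The sketch only shows the kind of metadata filtering that scripted discovery relies on. It does not reproduce the actual behavior of the API or of fab find.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogItem:
    # Hypothetical shape of one search hit; the real API's fields may differ.
    name: str
    item_type: str  # e.g. "Lakehouse", "Warehouse", "SemanticModel"
    workspace: str
    tags: set[str] = field(default_factory=set)


def find_items(items, item_type=None, workspace=None, required_tags=()):
    """Return the items that match every metadata filter that was supplied."""
    results = []
    for item in items:
        if item_type is not None and item.item_type != item_type:
            continue
        if workspace is not None and item.workspace != workspace:
            continue
        if not set(required_tags) <= item.tags:  # all required tags present
            continue
        results.append(item)
    return results


# Hypothetical results gathered from several workspaces.
catalog = [
    CatalogItem("sales_lakehouse", "Lakehouse", "finance", {"certified", "pii"}),
    CatalogItem("churn_features", "Lakehouse", "ml-platform", {"certified"}),
    CatalogItem("churn_model_sm", "SemanticModel", "ml-platform", set()),
]

hits = find_items(catalog, item_type="Lakehouse", required_tags=["certified"])
print([h.name for h in hits])  # ['sales_lakehouse', 'churn_features']
```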

Real-time pipelines in Fabric: SQL-based streaming plus better observability

Fabric's real-time tooling got a clearer “build it, test it, run it, monitor it” arc this week. The Eventstreams SQL operator reached general availability. That makes SQL a first-class way to express streaming transforms, and it adds production-ready capabilities like multi-destination fan-out, built-in testing, and event-time processing. Event-time processing matters when late or out-of-order events are normal, as they are in IoT, clickstreams, and operational logs. It lets you reason about “when it happened” instead of “when it arrived”, which can stabilize the aggregations and windowed calculations that downstream ML features depend on. At the same time, Eventstreams gained workspace monitoring in preview. Enabling it creates a managed monitoring Eventhouse whose KQL tables track node status, per-minute throughput, and per-minute error metrics. The guidance to republish existing Eventstreams to pick up monitoring is a practical detail for teams already running production pipelines. It also shows that instrumentation is becoming part of the product surface rather than a separate DIY logging project.
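Eventstreams handles event-time logic inside the service. The general idea can still be sketched in plain Python. Each event goes into a one-minute tumbling window according to when it happened, and a watermark decides when a window no longer accepts stragglers. The window size, the 30-second lateness allowance, and the drop-late-events policy are assumptions for illustration. They are not Eventstreams settings.

```python
from collections import defaultdict

WINDOW_SECONDS = 60     # one-minute tumbling windows
ALLOWED_LATENESS = 30   # assumed watermark slack, in seconds


def window_start(event_time):
    """Start of the tumbling window that contains event_time (in seconds)."""
    return event_time - event_time % WINDOW_SECONDS


def aggregate_by_event_time(events):
    """Sum values per event-time window.

    events: (event_time, value) pairs in *arrival* order.
    The watermark trails the largest event time seen by ALLOWED_LATENESS.
    An event whose window ended at or before the watermark is dropped, so a
    window that downstream consumers already used is never reopened.
    """
    sums = defaultdict(float)
    max_event_time = float("-inf")
    dropped = 0
    for event_time, value in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - ALLOWED_LATENESS
        start = window_start(event_time)
        if start + WINDOW_SECONDS <= watermark:
            dropped += 1
            continue
        sums[start] += value
    return dict(sums), dropped


# (50, 3.0) arrives out of order but is still on time for the 0-60 s window.
# (20, 5.0) arrives after the watermark has passed 60 s, so it is dropped.
arrivals = [(5, 1.0), (70, 2.0), (50, 3.0), (130, 4.0), (20, 5.0)]
totals, late = aggregate_by_event_time(arrivals)
print(totals, late)  # {0: 4.0, 60: 2.0, 120: 4.0} 1
```

If the same events were bucketed by arrival order, the out-of-order reading would land in the wrong minute. That is exactly the kind of drift that destabilizes windowed features.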

Fabric April 2026 update: MLflow logging, notebooks, and warehouse features that affect ML workflows

The April 2026 Fabric feature summary tied together several changes that land directly in day-to-day ML and analytics work. Fabric added VS Code-based workspace and environment management, which fits the broader theme of moving operational tasks closer to developer tooling. Notebooks picked up retry policies, a small-sounding change that can make scheduled training and feature engineering runs more resilient when transient failures happen. On the ML lifecycle side, Fabric now supports MLflow cross-workspace logging (including OAP workspaces), which is useful when teams separate experimentation, shared model registries, and production workspaces for governance. Semantic Link (SemPy) advanced to 0.14.0 with admin APIs, which matters for teams automating semantic model management and connecting Power BI semantics to Python-driven analysis and ML feature work. Data Warehouse improvements like transactional ALTER TABLE and COPY INTO support for JSONL also feed into ML pipelines, especially when teams stage semi-structured data and want predictable schema evolution and repeatable loads. Real-Time Intelligence updates (including Eventstream observability and an Eventhouse remote MCP) reinforce the push to make streaming systems easier to operate and easier to connect into automated workflows.

Fabric April 2026 Feature Summary
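The summary doesn't cover how Fabric's notebook retry policies are configured. As a generic illustration of the idea, here is a Python sketch that retries a step on transient failures with exponential backoff. The TransientError class, the attempt count, and the backoff schedule are assumptions. They are not Fabric defaults.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for a retryable failure, such as a storage timeout."""


def run_with_retries(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run task(); on TransientError, back off exponentially and try again.

    Delays before retries are base_delay, 2*base_delay, 4*base_delay, ...
    plus up to 10% random jitter so parallel jobs don't retry in lockstep.
    Any other exception, or the final TransientError, propagates to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: let the scheduler see the failure
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, 0.1 * delay))


# Usage: a feature-engineering step that fails twice before succeeding.
calls = {"count": 0}


def flaky_feature_job():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary storage timeout")
    return "features written"


# sleep is stubbed out so the example runs instantly.
result = run_with_retries(flaky_feature_job, sleep=lambda seconds: None)
print(result, calls["count"])  # features written 3
```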

SQL development for Fabric: Azure Data Studio retirement and the move to VS Code

Fabric SQL developers got a clear direction: Azure Data Studio is retired, and the recommended path is VS Code with SQL Database Projects and the MSSQL extension. The emphasis is on adopting software-engineering workflows for database changes: Git-based source control, pull request reviews, schema compare, and publish script previews so teams can see what a deployment will do before it runs. For ML teams that manage feature-store-like tables or training data schemas in Fabric warehouses, this shift reduces “drift by manual edits” and makes schema changes auditable and reviewable. The VS Code MSSQL extension's support for GitHub Copilot is positioned as a productivity boost inside the editor, and Microsoft also called out an ADS migration toolkit to help teams move existing setups rather than starting from scratch.

Azure Data Studio to VS Code: What it means for SQL database in Fabric developers
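SQL Database Projects produce their own schema compare results and publish scripts. The sketch below is only a toy illustration of the underlying idea. It diffs a current and a desired column map and lists the statements a deployment would run, before anything executes. The table name, the columns, and the choice to flag type changes for manual review are hypothetical. The generated T-SQL is illustrative and is not what the tooling emits.

```python
def preview_schema_changes(table, current, desired):
    """Diff two {column: sql_type} maps and return the statements a deployment
    would run, without executing anything (a 'publish script preview')."""
    plan = []
    for column, sql_type in desired.items():
        if column not in current:
            plan.append(f"ALTER TABLE {table} ADD {column} {sql_type};")
        elif current[column] != sql_type:
            # Type changes can lose data, so flag them for a human reviewer.
            plan.append(
                f"-- REVIEW: {column} changes from {current[column]} to {sql_type}"
            )
    for column in current:
        if column not in desired:
            plan.append(f"ALTER TABLE {table} DROP COLUMN {column};")
    return plan


# Hypothetical feature table: what is deployed vs. what the project declares.
deployed = {"customer_id": "INT", "tenure_months": "INT", "legacy_flag": "BIT"}
declared = {"customer_id": "INT", "tenure_months": "SMALLINT", "avg_spend": "DECIMAL(10,2)"}

for line in preview_schema_changes("dbo.churn_features", deployed, declared):
    print(line)
```

Reviewing a plan like this in a pull request, before publishing, is the habit that schema compare and publish previews are meant to build.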

Other Machine Learning News

Fabric pipelines continued to shift from classic ETL toward broader workflow orchestration, with a preview Approval activity that enables human-in-the-loop steps (useful for governance gates like model sign-off, data access approval, or controlled production promotion), plus more focus on observability for long-running workflows.

Pipelines are evolving beyond ETL

A longer-horizon case study looked at a real fintech lending decisioning workload (weighted ensembles with explainability and auditability requirements) evaluated using Microsoft Analog Optical Computer (AOC) digital twins on Azure, offering an early signal of how alternative compute approaches might be tested against regulated ML scenarios before hardware is broadly available.

First real-world Lending ML workload evaluated on Microsoft Optical AOC Computer digital twins
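The case study doesn't spell out its ensemble members, weights, or explainability method. The sketch below is therefore a generic weighted-average ensemble with hypothetical member names and weights. It shows why weighted ensembles suit auditability: each member's contribution to the final score can be recorded alongside the decision. It says nothing about how the workload ran on the AOC digital twins.

```python
def weighted_ensemble(scores, weights):
    """Return the weighted-average score and each member's contribution to it.

    scores:  {member_name: predicted probability, e.g. of default}
    weights: {member_name: non-negative weight}
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    contributions = {
        name: weights[name] * scores[name] / total_weight for name in weights
    }
    return sum(contributions.values()), contributions


# Hypothetical members and weights for one loan application.
scores = {"gbm": 0.80, "logistic": 0.60, "rules": 0.20}
weights = {"gbm": 0.5, "logistic": 0.3, "rules": 0.2}

score, parts = weighted_ensemble(scores, weights)
print(round(score, 2))  # 0.62
# Audit trail: members ordered by how much they contributed to the final score.
for name, value in sorted(parts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.3f}")
```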