Weekly ML Roundup: Fabric Governance, RAG Discovery, and Ops
This Week's Overview
- Microsoft Fabric and OneLake: governance, discovery, and operations tighten up
  - OneLake security reaches GA (RLS/CLS, UX, and APIs)
  - OneLake catalog becomes native in Azure AI Foundry (GA)
  - Cross-workspace discovery: OneLake Catalog Search API, MCP, and Fabric CLI (Preview)
  - Capacity visibility and chargeback go GA
  - Spark automation scales out: high concurrency Fabric Livy API (Preview)
  - Monitoring hub adds centralized schedule failure notifications (Preview)
  - Dataflow Gen2 introduces “My queries” for Power Query M reuse (Preview)
  - Fabric Data Warehouse adds ALTER COLUMN to support schema evolution (Preview)
  - Direct Lake on SQL: guidance to keep Power BI models fast and avoid DirectQuery fallback
  - Community signal: Fabric Influencers Spotlight (April 2026)
Microsoft Fabric and OneLake: governance, discovery, and operations tighten up
Most of this week's ML-adjacent data platform news comes from Microsoft Fabric and OneLake, with several changes aimed at making governed data easier to find, safer to share, and cheaper to operate. The common thread is reducing the friction between “where the data lives” and “how you build models and analytics on top of it”, especially when multiple workspaces, teams, and capacity constraints are involved.
OneLake security reaches GA (RLS/CLS, UX, and APIs)
OneLake security is now generally available in Fabric, with a default-enablement rollout that puts security posture front and center instead of making it an optional add-on. The GA release includes improved role management UX, inline row-level security (RLS) validation, and a role creation wizard that guides you through RLS and column-level security (CLS) setup.
For developers and platform teams, the most practical change is the new REST APIs for role management, which make it easier to manage governance through automation instead of click-ops. If you're building internal tooling around dataset onboarding, workspace provisioning, or “data product” publishing, you can now treat OneLake access control as part of your deployment pipeline, alongside networking (for example, Private Link) and mirroring. This continues last week's thread that “operationalizing ML in Fabric” increasingly means baking security and environment boundaries into the default workflow.
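One way to treat access control as a pipeline step is a declarative reconcile: keep the desired roles in source control, compare them with what is deployed, and apply only the difference. The sketch below covers just the comparison step. The role shape (role name mapped to a set of member IDs) and the `plan_role_changes` helper are hypothetical and do not mirror the actual payloads of the OneLake role management APIs, which the announcement does not show.

```python
from dataclasses import dataclass, field


@dataclass
class RolePlan:
    """Changes needed to move deployed roles to the desired state."""
    create: list[str] = field(default_factory=list)
    delete: list[str] = field(default_factory=list)
    update: list[str] = field(default_factory=list)


def plan_role_changes(desired: dict[str, set[str]],
                      deployed: dict[str, set[str]]) -> RolePlan:
    """Diff two hypothetical {role_name: member_ids} maps.

    This only computes the plan; applying it would require calls to the
    real role management API, which are intentionally left out.
    """
    plan = RolePlan()
    for name in sorted(desired.keys() | deployed.keys()):
        if name not in deployed:
            plan.create.append(name)      # role exists only in source control
        elif name not in desired:
            plan.delete.append(name)      # role exists only in the deployment
        elif desired[name] != deployed[name]:
            plan.update.append(name)      # membership drifted
    return plan


if __name__ == "__main__":
    desired = {"analysts": {"u1", "u2"}, "ml-engineers": {"u3"}}
    deployed = {"analysts": {"u1"}, "legacy-readers": {"u9"}}
    print(plan_role_changes(desired, deployed))
```

Keeping the plan separate from the apply step makes it easy to review role changes in a pull request before anything touches the live environment.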
OneLake catalog becomes native in Azure AI Foundry (GA)
Azure AI Foundry now has native OneLake catalog access (GA), so teams building RAG apps can discover and select governed Fabric/OneLake data without context switching between tools. This also pairs with knowledge base creation using Azure AI Search, aligning the “data discovery → indexing → retrieval” flow more closely with enterprise governance.
In practice, this reduces the gap between data platform ownership and application teams trying to ship copilots or AI assistants. If your organization standardizes on OneLake as the lake layer, Foundry can now act as a front door for building knowledge bases while staying inside the boundaries of cataloged, permissioned content. This builds on last week's focus on making the “plumbing” (streaming, lake structure, and private networking) production-ready by connecting it to the actual RAG build surface.
Cross-workspace discovery: OneLake Catalog Search API, MCP, and Fabric CLI (Preview)
OneLake Catalog Search is now available as a REST API in Preview, with additional tooling support via the Fabric Core MCP Server (Model Context Protocol) and a new Fabric CLI find command. The theme here is discoverability at scale: locating items across workspaces becomes a programmable operation rather than a UI scavenger hunt.
For AI teams, this matters because RAG and analytics pipelines often need “inventory” capabilities (what tables exist, where, and under which governance rules) before they can safely build indexes, semantic models, or feature sets. The MCP angle is especially relevant if you're wiring catalog lookup into agentic workflows or internal developer tools, since MCP provides a standardized way for tools and agents to query systems. It also pairs naturally with last week's cross-workspace MLflow story by making “what exists across workspaces” and “what can be promoted across workspaces” feel like two parts of the same operational loop.
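As a rough illustration of the “inventory” step, the sketch below groups search hits by workspace and item type so a pipeline can check what exists before indexing it. The hit records and their `workspace`, `type`, and `name` keys are assumptions made for this example; the real Catalog Search response shape is not described here and may differ.

```python
from collections import defaultdict


def build_inventory(hits: list[dict]) -> dict[str, dict[str, list[str]]]:
    """Group hypothetical search hits as {workspace: {item_type: [names]}}."""
    grouped: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for hit in hits:
        grouped[hit["workspace"]][hit["type"]].append(hit["name"])
    # Convert to plain dicts and sort names for stable, diff-friendly output.
    return {
        workspace: {item_type: sorted(names) for item_type, names in types.items()}
        for workspace, types in grouped.items()
    }


if __name__ == "__main__":
    # Made-up sample data standing in for a search response.
    sample_hits = [
        {"workspace": "sales", "type": "Lakehouse", "name": "orders"},
        {"workspace": "sales", "type": "Warehouse", "name": "finance_dw"},
        {"workspace": "ml", "type": "Lakehouse", "name": "features"},
        {"workspace": "sales", "type": "Lakehouse", "name": "customers"},
    ]
    print(build_inventory(sample_hits))
```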
Capacity visibility and chargeback go GA
Fabric's capacity metrics app added a new health page plus timepoint summary/detail views, and the Fabric Chargeback app is now generally available to allocate capacity costs across workspaces and workloads. This is a welcome shift from “capacity is slow/expensive” complaints toward actionable data about where capacity is being spent and when the pressure spikes.
For teams running Spark, Warehousing, Eventhouse, and BI together, chargeback helps turn shared platform costs into something each product team can see and influence. It also gives platform owners a stronger feedback loop when they need to justify reserved capacity sizing, workload isolation decisions, or guardrails on expensive patterns. This is the same kind of operational maturity push we saw last week as Fabric added more “run it repeatedly and safely” building blocks around MLOps and real-time pipelines.
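To make the chargeback idea concrete, here is a minimal sketch of a proportional split: each workspace pays the share of the capacity bill that matches its share of measured usage. This is a generic allocation rule chosen for illustration, not a description of how the Fabric Chargeback app computes its numbers, and the usage figures are made up.

```python
def allocate_cost(total_cost: float,
                  usage_by_workspace: dict[str, float]) -> dict[str, float]:
    """Split total_cost in proportion to each workspace's usage."""
    total_usage = sum(usage_by_workspace.values())
    if total_usage <= 0:
        raise ValueError("total usage must be positive to allocate cost")
    return {
        workspace: total_cost * usage / total_usage
        for workspace, usage in usage_by_workspace.items()
    }


if __name__ == "__main__":
    # Hypothetical monthly bill and usage units per workspace.
    shares = allocate_cost(1000.0, {"spark": 600.0, "warehouse": 300.0, "bi": 100.0})
    for workspace, cost in shares.items():
        print(f"{workspace}: {cost:.2f}")
```

A proportional split is the simplest defensible rule; teams that reserve capacity for peak load sometimes prefer to weight by peak rather than total usage, which would change only the inputs to this function.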
Spark automation scales out: high concurrency Fabric Livy API (Preview)
Fabric's Livy API now supports high concurrency sessions in Preview, enabling parallel Spark execution with session reuse via sessionTag, isolated REPLs, and monitoring in the Fabric Monitoring Hub. This targets a real operational pain point: orchestrators and automation systems often need to run many Spark jobs concurrently without paying the startup tax or losing observability.
If you build platform automation (for example, triggering notebooks from CI, running data quality suites, or launching feature engineering pipelines), session reuse can reduce latency and make throughput more predictable. The inclusion of monitoring hooks matters too, because parallel session patterns are hard to debug without a clear view into which session ran what and when. It also complements last week's theme of tightening the MLOps loop by making “repeatable execution at scale” less dependent on one-off notebook runs.
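The benefit of tag-based reuse can be seen in a small in-memory model: callers that present the same tag share one session, so the expensive startup happens once per tag rather than once per job. The sketch below does not call the Livy API. The `TaggedSessionPool` class and `fake_start` function are hypothetical stand-ins, and in Fabric the reuse decision is made by the service based on `sessionTag`, not by client code like this.

```python
import itertools
from typing import Callable


class TaggedSessionPool:
    """Toy model of session reuse keyed by a tag."""

    def __init__(self, start_session: Callable[[str], str]):
        self._start_session = start_session  # stands in for the slow startup
        self._by_tag: dict[str, str] = {}
        self.starts = 0                       # how many startups were paid

    def acquire(self, tag: str) -> str:
        """Return the session for this tag, starting one only if needed."""
        if tag not in self._by_tag:
            self._by_tag[tag] = self._start_session(tag)
            self.starts += 1
        return self._by_tag[tag]


_session_ids = itertools.count(1)


def fake_start(tag: str) -> str:
    """Pretend to start a Spark session and return a made-up ID."""
    return f"session-{next(_session_ids)}"


if __name__ == "__main__":
    pool = TaggedSessionPool(fake_start)
    jobs = ["data-quality", "data-quality", "features", "data-quality"]
    for tag in jobs:
        print(tag, "->", pool.acquire(tag))
    print("startups paid:", pool.starts)
```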
Monitoring hub adds centralized schedule failure notifications (Preview)
Fabric's monitoring hub adds a Schedule failures page (Preview) to configure and manage email notifications for scheduled items across workspaces. Instead of distributing responsibility across many workspace owners (or relying on ad-hoc alerting), this centralizes “who gets paged when the pipeline breaks” in one place.
For data and ML teams, this is a small but practical reliability improvement. When feature pipelines, refreshes, or ingestion jobs fail silently, downstream model training and dashboards suffer, so tightening the failure-notification loop is often one of the fastest ways to improve end-to-end stability. That is especially true after last week's emphasis on streaming and orchestration patterns, where more moving parts can otherwise mean more silent failure modes.
Dataflow Gen2 introduces “My queries” for Power Query M reuse (Preview)
Dataflow Gen2 now has “My queries” in Preview, which lets you save Power Query M queries to a personal library and import them into other dataflows. This targets a common source of waste: rewriting the same cleanup and normalization logic across projects.
The bigger implication is standardization. If your team relies on Power Query for ingestion and prep feeding semantic models or training datasets, reusable M building blocks reduce drift between pipelines and make it easier to enforce shared transformation patterns (naming, type handling, PII redaction steps, and so on). It is an incremental follow-on to last week's “make data prep more maintainable” story (nested folders, more orchestration options), but aimed at reuse and consistency rather than lake layout mechanics.
Fabric Data Warehouse adds ALTER COLUMN to support schema evolution (Preview)
Fabric Data Warehouse now supports ALTER TABLE ... ALTER COLUMN in Preview, enabling metadata-only schema changes such as widening numeric and string types. The key point is avoiding table rebuilds or rewriting stored data files for common evolution scenarios.
For developers, this is a quality-of-life improvement that reduces downtime and operational risk when upstream sources evolve. If you're building ML feature tables or curated “gold” datasets that change over time, being able to widen types without a migration rebuild helps keep pipelines flowing and reduces pressure to over-provision initial schemas “just in case”. It also maps cleanly to last week's medallion guidance, which emphasized schema evolution as one of the operational details that determines whether curated layers stay trustworthy.
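A pipeline that reacts to upstream schema changes might guard its migrations with a check like the one sketched below, which accepts a change only when it strictly widens a type and then emits the ALTER TABLE ... ALTER COLUMN statement. The widening rules here (an integer ladder and longer `varchar`/`char` lengths) are an illustrative subset chosen for the example, not the documented list of changes Fabric Data Warehouse supports in Preview. The table and column names are made up.

```python
import re

# Illustrative ordering only; check the product docs for supported changes.
_INT_RANK = {"smallint": 1, "int": 2, "bigint": 3}
_SIZED = re.compile(r"^(varchar|char)\((\d+)\)$")


def is_widening(old_type: str, new_type: str) -> bool:
    """Return True if new_type is strictly wider than old_type."""
    old, new = old_type.strip().lower(), new_type.strip().lower()
    if old in _INT_RANK and new in _INT_RANK:
        return _INT_RANK[new] > _INT_RANK[old]
    old_match, new_match = _SIZED.match(old), _SIZED.match(new)
    if old_match and new_match and old_match.group(1) == new_match.group(1):
        return int(new_match.group(2)) > int(old_match.group(2))
    return False  # anything else is treated as unsafe in this sketch


def alter_column_sql(table: str, column: str, new_type: str) -> str:
    """Build the T-SQL statement; identifiers are assumed to be trusted."""
    return f"ALTER TABLE {table} ALTER COLUMN {column} {new_type};"


if __name__ == "__main__":
    if is_widening("varchar(50)", "varchar(200)"):
        print(alter_column_sql("dbo.features", "user_name", "varchar(200)"))
```

Rejecting everything outside the known-safe list keeps the check conservative: a narrowing or cross-family change falls through to a manual migration instead of being attempted automatically.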
Direct Lake on SQL: guidance to keep Power BI models fast and avoid DirectQuery fallback
New guidance explains how Direct Lake on SQL works with Fabric Data Warehouse, with practical design advice to keep Power BI semantic models performant and avoid falling back to DirectQuery. The performance nuance matters because Direct Lake aims to combine lake-style storage with in-memory performance (via VertiPaq), but design choices can push you into less predictable query paths.
For teams building analytics layers that support ML monitoring or business-facing model outputs, this guidance helps reduce “it was fast in testing but slow in production” surprises. Expect to pay attention to model design details that influence when Direct Lake can stay in the happy path versus when it degrades to DirectQuery behavior. It is a useful complement to last week's focus on getting fresher signals into Fabric (Eventstream/Eventhouse), since low-latency data is only valuable if your consumption layer stays reliably fast.
Community signal: Fabric Influencers Spotlight (April 2026)
Microsoft published a roundup of community posts and videos spanning Fabric warehousing (including T-SQL AI functions), Power BI/DAX performance, SQL Server 2025 mirroring into Fabric, governance features, and real-time patterns with Eventstream, Eventhouse, and KQL. It's not a product release, but it is a good map of what practitioners are actively tuning and debating.
If you're trying to decide where to invest learning time, the mix is telling: performance (DAX and warehousing), governance, and real-time ingestion remain recurring themes. Those are the same pressure points you run into when you operationalize ML systems beyond notebooks and prototypes. It also mirrors last week's Fabric-heavy roundup, where MLOps, streaming, and private networking kept showing up as the “stuff that breaks first” in real deployments.