Weekly Machine Learning Roundup: Fabric ops, dbt, and event actions
This week's ML-adjacent momentum mostly came through Microsoft Fabric, with updates that make analytics engineering more like a managed product: repeatable transformation workflows (dbt), more event-driven automation (Activator + UDFs), and steadier ingestion mechanics (Copy job upgrades, more connectors, easier troubleshooting). Building on last week's “pipelines over one-off notebooks” theme (Materialized Lake Views, Environments, Notebook Public APIs), the thread is Fabric turning those building blocks into managed operating surfaces: author in familiar tools, execute in Fabric, and connect actions with less custom glue. Fabric also tightened admin/governance with better workspace organization at scale.
Microsoft Fabric’s dbt roadmap: adapters, operational dbt Jobs, and a path to Fusion
Fabric continues to treat dbt as a first-class workflow, focusing not just on adapter availability but on correctness for Fabric SQL semantics, materializations, and performance. This mirrors last week's shift toward declarative transforms via Materialized Lake Views: dbt is another “transformations as code” path, and Fabric is aiming for a clean mapping to Warehouse and (soon) Lakehouse execution. Today, the recommendation for SQL-first managed warehouse work is the Fabric Warehouse dbt Core adapter. A Fabric Lakehouse dbt Core adapter is listed as “coming soon” at GA for running dbt directly on Lakehouse tables in OneLake, aligned with Fabric governance and compute/storage separation.
Operationally, dbt Jobs in Fabric (public preview since December 2025) is positioned as the control plane for scheduling, retries, environment promotion, and observability. This matches last week's “managed orchestration” focus (Notebook Public APIs + Job Scheduler): less interactive execution, more managed jobs with traceable outputs. Recent additions include public package support, native GitHub support (run jobs from GitHub-hosted dbt projects for CI/CD alignment), and OneLake-based enterprise logging with no size limits (removing the prior 1 MB cap). API support enables automation, and “coming soon” items include dbt Jobs as a Fabric Pipelines activity with parameterization, plus Lakehouse adapter support in dbt Jobs (only Warehouse is supported today).
Looking ahead, Fabric called out planned dbt Fusion support, expected later in calendar Q2 2026, focusing on clean Warehouse/Lakehouse adapter integration and aligned execution metadata/observability as Fusion enters dbt's runtime story. The net effect is a cohesive path: author in GitHub, execute and schedule in Fabric, centralize logs in OneLake, and adopt Fusion-backed execution later without reworking Warehouse/Lakehouse layouts.
Fabric Real-Time Intelligence: Activator grows from alerting into action (Teams, Spark, Dataflows, and UDF triggers)
Fabric Activator is expanding from “tell me something happened” to “do something when it happens,” adding rule actions that send Microsoft Teams messages and trigger compute/pipeline work: run a Spark job, run a User Data Function (UDF), or run a Dataflow (Dataflows Gen2). This reduces glue code by removing the need for custom listener services that translate events into downstream work, especially when teams want event-driven processing instead of scheduled refresh. It follows last week's automation direction: after notebooks became easier to run and manage via APIs, Activator now provides an “event → execution” surface inside Fabric without external schedulers.
Two additions stand out for operational workflows. First, triggering UDFs from Activator creates a direct event-to-function bridge: rules can pass entity IDs, values, and timestamps into code, enabling incidents, runbooks, and custom logic without new infrastructure. This pairs with this week's UDF defaults update: as UDFs become shared primitives invoked by rules, backwards-compatible signatures matter more. Second, Spark job and Dataflow actions can respond to Fabric and Azure Blob Storage events, enabling “data landed, process now” patterns rather than waiting for schedules. This is similar in spirit to last week's near-real-time pipeline patterns, but implemented through Fabric's event/action model.
Authoring surfaces broadened too: Warehouse SQL query monitoring rules (Preview) let rules evaluate ad-hoc or saved query results at a set frequency, and Ontology entity rules (Preview) support entity-level conditions. Rule creation is now embedded in Eventstream, and Power BI integration improved so Activator can alert when a new row appears in a table visual in a published report, which helps when dashboards function as queue views.
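To make the event-to-function bridge concrete, here is a minimal sketch of the kind of function a rule could call with an entity ID, a value, and a timestamp. It is plain Python: the Fabric UDF decorator and registration are deliberately omitted, and `RuleEvent`, `handle_rule_event`, the `threshold` parameter, and the incident record shape are all hypothetical names chosen for illustration, not part of any Fabric API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RuleEvent:
    """Hypothetical payload: the fields a rule might pass into a function."""
    entity_id: str
    value: float
    timestamp: str  # ISO 8601 string, e.g. "2026-03-02T10:15:00+00:00"


def handle_rule_event(event: RuleEvent, threshold: float = 80.0) -> dict:
    """Turn one rule firing into an incident record (illustrative logic only)."""
    observed_at = datetime.fromisoformat(event.timestamp)
    if observed_at.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        observed_at = observed_at.replace(tzinfo=timezone.utc)
    severity = "high" if event.value >= threshold else "low"
    return {
        # Entity plus timestamp gives a stable key, so a repeated firing
        # for the same event can be deduplicated downstream.
        "incident_key": f"{event.entity_id}:{observed_at:%Y%m%dT%H%M%S}",
        "severity": severity,
        "observed_at": observed_at.isoformat(),
    }
```

Deriving a deterministic key from the entity and timestamp is one way to keep such a function safe to call more than once for the same event, which is worth considering whenever rules rather than people are the callers.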
Fabric Data Factory: Copy job and connector upgrades for incremental movement, CDC, and cross-cloud destinations
Fabric Data Factory's Copy job updates target ingestion cases where source tables do not fit ideal assumptions, such as a clean datetime watermark column or a supported CDC source. This is Fabric's version of the “productionize the plumbing” story we touched on last week (Databricks Lakeflow simplifying ingestion + CDC + SCD): in Fabric, the improvements are landing in Copy job incremental and CDC behavior, an area that often blocks teams before they reach transformations like MLVs or dbt.
Incremental copy is now more flexible in GA with additional watermark types: ROWVERSION, date/datetime (with delayed extraction to reduce missed late updates), and string columns interpreted as datetime. This reduces custom query workarounds while still using built-in state tracking and checkpointing.
CDC replication added three practical updates: Oracle as a CDC source, Fabric Data Warehouse as a CDC sink, and an SCD Type 2 write method in Preview, exposed as a simple toggle. The SCD2 option provides history-table semantics (new version rows on updates; soft deletes via expiring current versions), reducing per-table MERGE logic and custom frameworks. It echoes last week's SCD2-as-first-class capability in Databricks, but here it is pushed down into ingestion so history tables can be created earlier without bespoke transform code.
Connector and throughput improvements also landed. SharePoint Online File is now GA as source and destination, easing “files in SharePoint” ingestion and publishing. BigQuery, MySQL, and PostgreSQL gained destination write support in Preview for more cross-cloud movement. “Native incremental copy” expanded to more connectors (including RDS variants, ODBC, GCS, SharePoint Lists/Files, and Fabric Lakehouse tables/files), and automatic partitioning was introduced to speed up large-table loads by parallelizing reads and writes via a selected partition column without manual setup.
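The watermark idea behind incremental copy can be sketched in a few lines. This is not how Copy job is implemented; it is one common reading of “delayed extraction”, where the upper bound of each read is held back by a delay so that rows committed late are picked up by the next run. The names `parse_watermark`, `next_window`, `rows_in_window`, and the `modified_at` column are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple, Union

Watermark = Union[datetime, str]


def parse_watermark(value: Watermark) -> datetime:
    """Accept a datetime, or a string column holding an ISO 8601 datetime."""
    if isinstance(value, datetime):
        return value
    return datetime.fromisoformat(value)


def next_window(
    last_watermark: datetime, now: datetime, delay: timedelta
) -> Optional[Tuple[datetime, datetime]]:
    """Return (lower, upper) for the next read, or None if nothing is due.

    The upper bound trails `now` by `delay`, so rows that land slightly
    late are not skipped; they fall into the following window instead.
    """
    upper = now - delay
    if upper <= last_watermark:
        return None
    return last_watermark, upper


def rows_in_window(rows, window, column="modified_at"):
    """Select rows with lower < watermark <= upper (each row read exactly once)."""
    lower, upper = window
    return [row for row in rows if lower < parse_watermark(row[column]) <= upper]
```

After a successful run, the stored watermark would advance to the window's upper bound, which is the state tracking that the built-in feature handles for you.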
- Incremental copy gets more flexible—New watermark column types in Copy job in Fabric Data Factory (Generally Available)
- Richer CDC in Fabric Data Factory Copy job: Oracle source, Fabric Data Warehouse sink, and SCD Type 2 (Preview)
- Outstanding connectivity for data movement in Fabric Data Factory
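For readers less familiar with SCD Type 2, the history-table semantics described above (a new version row on each update, soft deletes by expiring the current version) can be sketched as follows. This is an in-memory illustration under assumed column names (`key`, `attrs`, `valid_from`, `valid_to`, `is_current`), not the Copy job's actual write logic or schema.

```python
from datetime import datetime


def apply_scd2(history, changes, as_of):
    """Apply (op, key, attrs) changes to a list of history rows, in place.

    op is "insert", "update", or "delete"; attrs is ignored for deletes.
    """
    current = {row["key"]: row for row in history if row["is_current"]}
    for op, key, attrs in changes:
        existing = current.get(key)
        if existing is not None:
            if op == "update" and existing["attrs"] == attrs:
                continue  # nothing changed, so no new version is needed
            # Expire the current version; for a delete this is the soft delete.
            existing["valid_to"] = as_of
            existing["is_current"] = False
            del current[key]
        if op in ("insert", "update"):
            new_row = {
                "key": key,
                "attrs": dict(attrs),
                "valid_from": as_of,
                "valid_to": None,
                "is_current": True,
            }
            history.append(new_row)
            current[key] = new_row
    return history
```

Writing this per table, usually as MERGE statements, is the boilerplate the toggle is meant to replace.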
Other ML News
Fabric's programmable surfaces got a small but useful update: User Data Functions (UDFs) now support default arguments in Python. Because inputs are JSON-serialized, defaults must be JSON-serializable (strings, numbers, booleans, arrays/lists, objects/dicts, and datetime-like strings, ideally ISO 8601). The guidance also reiterates standard Python practice for mutable defaults (use None then assign inside), which helps teams evolve shared UDFs without breaking callers. This pairs with Activator triggering UDFs: defaults allow signature extension without updating every rule immediately.
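A short sketch of the default-argument guidance, in plain Python with the Fabric UDF decorator omitted. The function `score_reading`, its parameters, and the `call_from_json` helper are hypothetical; the helper only simulates a caller that sends JSON-serialized inputs.

```python
import json
from typing import Optional


def score_reading(
    sensor_id: str,
    value: float,
    unit: str = "celsius",        # JSON-serializable default (string)
    tags: Optional[list] = None,  # mutable default: use None, assign inside
) -> dict:
    # Build a fresh list per call so calls never share state and the
    # caller's own list is not modified.
    tags = list(tags) if tags is not None else []
    tags.append(f"unit:{unit}")
    return {"sensor_id": sensor_id, "value": value, "unit": unit, "tags": tags}


def call_from_json(payload: str) -> dict:
    """Simulate a caller that passes JSON-serialized arguments."""
    return score_reading(**json.loads(payload))
```

A caller written before `unit` and `tags` existed can keep sending only `sensor_id` and `value`; the defaults fill in the rest, which is what lets a shared function grow its signature without every rule being updated at once.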
- Support for default arguments in Fabric User data functions
Dataflow Gen2 troubleshooting is becoming more self-service. A Preview feature lets admins/support download a per-run diagnostic package from run history after completion. It bundles metadata, structured logs, execution traces, and runtime/environment signals, reducing time spent collecting evidence across views for failed or slow runs. This continues last week's day-2 manageability thread: as more execution becomes managed and event-driven, diagnostics determine whether failures are quickly explainable.
- Dataflow Gen2 – Dataflow Diagnostics Download (Preview)
Workspace tags are now GA, providing a first-class way to label workspaces (team, project, environment, cost center) and filter them in the workspaces list and OneLake Catalog Explorer. Tags are also exposed via REST APIs (create/apply/remove and included in Get/List Workspaces), supporting automated inventory and governance reporting; Fabric Scanner APIs are expected to include tags later. This complements last week's API-driven ops push: as teams automate notebook/job lifecycles, programmatic workspace organization helps control sprawl.
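As a rough illustration of tag-based inventory reporting, the sketch below groups workspaces by tag. It works on already-fetched data and makes no API calls; the response shape (a list of workspaces, each with `displayName` and a `tags` list of objects carrying `displayName`) is an assumption for illustration, so check the REST API reference for the actual fields.

```python
from collections import defaultdict


def inventory_by_tag(workspaces):
    """Group workspace names by tag; untagged workspaces get their own bucket."""
    by_tag = defaultdict(list)
    for ws in workspaces:
        tags = ws.get("tags") or []
        if not tags:
            # Surfacing untagged workspaces is often the point of the report.
            by_tag["(untagged)"].append(ws["displayName"])
        for tag in tags:
            by_tag[tag["displayName"]].append(ws["displayName"])
    return {tag: sorted(names) for tag, names in sorted(by_tag.items())}
```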
- Find and manage workspaces faster with workspace tags (Generally Available)
Fabric Open Mirroring added a GA ERP replication option: the BC2Fab Fabric Workload (Navida) replicates Dynamics 365 Business Central tables into Fabric with incremental change detection and schema evolution handling. The goal is to cut transformation-heavy ingestion work and reduce load on the production ERP system, while enabling querying in Fabric engines and Power BI reporting on OneLake-backed copies. Like last week's consolidation of ingestion and governance for near-real-time pipelines, it continues moving replication closer to standardized OneLake landing zones so downstream dbt/MLV work can focus on shaping data, not extraction.
- Integrating Dynamics 365 Business Central with Microsoft Fabric using Open Mirroring with BC2Fab workload (Generally Available)