Weekly Machine Learning Roundup: Fabric ops, dbt, and event actions

This week's ML-adjacent momentum mostly came through Microsoft Fabric, with updates that make analytics engineering feel more like a managed product: repeatable transformation workflows (dbt), more event-driven automation (Activator + UDFs), and steadier ingestion mechanics (Copy job upgrades, more connectors, easier troubleshooting). Building on last week's “pipelines over one-off notebooks” theme (Materialized Lake Views, Environments, Notebook Public APIs), the common thread is Fabric turning those building blocks into managed operating surfaces: author in familiar tools, execute in Fabric, and connect actions with less custom glue. Fabric also tightened admin/governance with better workspace organization at scale.

Microsoft Fabric’s dbt roadmap: adapters, operational dbt Jobs, and a path to Fusion

Fabric continues to treat dbt as a first-class workflow, focusing not just on adapter availability but also on correctness for Fabric SQL semantics, materializations, and performance. This mirrors last week's shift toward declarative transforms via Materialized Lake Views: dbt is another “transformations as code” path, and Fabric is aiming for a clean mapping to Warehouse and (soon) Lakehouse execution. Today, the recommended adapter for SQL-first managed warehouse work is the Fabric Warehouse dbt Core adapter. A Fabric Lakehouse dbt Core adapter is “coming soon” as a GA release for running dbt directly on Lakehouse tables in OneLake, aligned with Fabric governance and compute/storage separation.

Operationally, dbt Jobs in Fabric (public preview since December 2025) is positioned as the control plane for scheduling, retries, environment promotion, and observability. This matches last week's “managed orchestration” focus (Notebook Public APIs + Job Scheduler): less interactive execution, more managed jobs with traceable outputs. Recent additions include public package support, native GitHub support (run jobs from GitHub-hosted dbt projects for CI/CD alignment), and OneLake-based enterprise logging with no size limits (removing the prior 1 MB cap). API support enables automation, and “coming soon” items include dbt Jobs as a Fabric Pipelines activity with parameterization, plus Lakehouse adapter support in dbt Jobs (Warehouse is supported today).

Looking ahead, Fabric called out planned dbt Fusion support, expected later in calendar Q2 2026, with a focus on clean Warehouse/Lakehouse adapter integration and aligned execution metadata/observability as Fusion becomes part of dbt's runtime story. The net effect is a cohesive path: author in GitHub, execute and schedule in Fabric, centralize logs in OneLake, and adopt Fusion-backed execution later without reworking Warehouse/Lakehouse layouts.

Fabric Real-Time Intelligence: Activator grows from alerting into action (Teams, Spark, Dataflows, and UDF triggers)

Fabric Activator is expanding from “tell me something happened” to “do something when it happens,” adding rule actions that send Microsoft Teams messages and trigger compute/pipeline work: run a Spark job, run a User Data Function (UDF), or run a Dataflow (Dataflows Gen2). This reduces glue code by removing the need for custom listener services that translate events into downstream work, especially when teams want event-driven processing instead of scheduled refresh. It follows last week's automation direction: after notebooks became easier to run and manage via APIs, Activator now provides an “event → execution” surface inside Fabric without external schedulers.

Two additions stand out for operational workflows. First, triggering UDFs from Activator creates a direct event-to-function bridge: rules can pass entity IDs, values, and timestamps into code, enabling incident creation, runbooks, and custom logic without new infrastructure. This pairs with this week's UDF defaults update: as UDFs become shared primitives invoked by rules, backwards-compatible signatures matter more. Second, Spark job and Dataflow actions can respond to Fabric and Azure Blob Storage events, enabling “data landed, process now” patterns rather than waiting for schedules. This is similar in spirit to last week's near-real-time pipeline patterns, but implemented through Fabric's event/action model.

Authoring surfaces broadened too. Warehouse SQL query monitoring rules (Preview) let rules evaluate ad-hoc or saved query results on a set frequency, and Ontology entity rules (Preview) support entity-level conditions. Rule creation is now embedded in Eventstream, and Power BI integration improved so that Activator can alert when a new row appears in a table visual in a published report, which helps when dashboards function as queue views.
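To make the event-to-function bridge concrete, here is a minimal sketch of the kind of function a rule might invoke. It is plain Python and does not use the Fabric UDF SDK; the function name `open_incident`, the `limit` parameter, the severity tiers, and the shape of the returned record are all hypothetical, chosen only to show how an entity ID, a value, and a timestamp could become an incident record.

```python
from datetime import datetime, timezone


def open_incident(entity_id: str, value: float, timestamp: str, limit: float = 80.0) -> dict:
    """Turn one rule activation into an incident record (a plain dict).

    Hypothetical handler: the parameters mirror the entity ID, value, and
    timestamp that a rule is described as passing into code.
    """
    event_time = datetime.fromisoformat(timestamp)
    if event_time.tzinfo is None:
        # Assumption for this sketch: timestamps without an offset are UTC.
        event_time = event_time.replace(tzinfo=timezone.utc)

    # Illustrative severity tiers relative to the limit.
    if value >= limit * 1.25:
        severity = "high"
    elif value >= limit:
        severity = "medium"
    else:
        severity = "info"

    return {
        # Deterministic key so a repeated activation maps to the same incident.
        "incident_key": f"{entity_id}:{event_time.isoformat()}",
        "entity_id": entity_id,
        "severity": severity,
        "observed_value": value,
    }
```

Deriving the key from the entity and event time is one simple way to keep the handler idempotent if the same activation is delivered twice; real deployments may need a different deduplication strategy.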

Fabric Data Factory: Copy job and connector upgrades for incremental movement, CDC, and cross-cloud destinations

Fabric Data Factory's Copy job updates target ingestion constraints that arise when source schemas do not match ideal assumptions. This is Fabric's version of the “productionize the plumbing” story we touched on last week (Databricks Lakeflow simplifying ingestion + CDC + SCD): in Fabric, the improvements are landing in Copy job incremental and CDC behavior, areas that often block teams before they reach transformations like Materialized Lake Views (MLVs) or dbt.

Incremental copy is now more flexible in GA with additional watermark types: ROWVERSION, date/datetime (with delayed extraction to reduce missed late updates), and string columns interpreted as datetime. This reduces custom query workarounds while still using built-in state tracking and checkpointing.

CDC replication added three practical updates: Oracle as a CDC source, Fabric Data Warehouse as a CDC sink, and an SCD Type 2 write method, in Preview, exposed as a simple toggle. The SCD2 option provides history-table semantics (new version rows on updates; soft deletes via expiring current versions), reducing per-table MERGE logic and custom frameworks. It echoes last week's SCD2-as-first-class capability in Databricks, but here it is pushed down into ingestion so history tables can be created earlier without bespoke transform code.

Connector and throughput improvements also landed. SharePoint Online File is now GA as a source and destination, easing “files in SharePoint” ingestion and publishing. BigQuery, MySQL, and PostgreSQL gained destination write support in Preview for more cross-cloud movement. “Native incremental copy” expanded to more connectors (including RDS variants, ODBC, GCS, SharePoint Lists/Files, and Fabric Lakehouse tables/files), and automatic partitioning was introduced to speed up large-table loads by parallelizing reads/writes via a selected partition column without manual setup.
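The SCD Type 2 semantics described above (a new version row on each update, and a soft delete that expires the current version) can be sketched in a few lines of Python. This is an in-memory illustration of the general technique, not how Copy job implements it; the `VersionRow` fields (`valid_from`, `valid_to`, `is_current`) and the `apply_change` helper are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VersionRow:
    key: str
    attrs: dict
    valid_from: str
    valid_to: Optional[str] = None  # None means "still current"
    is_current: bool = True


def apply_change(history: list, op: str, key: str, attrs: Optional[dict], ts: str) -> list:
    """Apply one change ('upsert' or 'delete') to the history list in place."""
    if op not in ("upsert", "delete"):
        raise ValueError(f"unsupported operation: {op}")

    current = next((r for r in history if r.key == key and r.is_current), None)

    # An upsert that changes nothing should not create a new version.
    if op == "upsert" and current is not None and current.attrs == attrs:
        return history

    # Expire the current version on both updates and deletes.
    if current is not None:
        current.valid_to = ts
        current.is_current = False

    # Updates and first-time inserts add a fresh current version;
    # deletes stop here, leaving only expired rows (a soft delete).
    if op == "upsert":
        history.append(VersionRow(key, dict(attrs or {}), ts))
    return history
```

The same logic is what a hand-written per-table MERGE typically encodes: match on the business key and the current flag, close out the old row, and insert the new one.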

Other ML News

Fabric's programmable surfaces got a small but useful update: User Data Functions (UDFs) now support default arguments in Python. Because inputs are JSON-serialized, default values must be JSON-serializable (strings, numbers, booleans, arrays/lists, objects/dicts, and datetime-like strings, ideally ISO 8601). The guidance also reiterates standard Python practice for mutable defaults (use None as the default, then assign the real value inside the function), which helps teams evolve shared UDFs without breaking callers. This pairs with Activator triggering UDFs: defaults let a signature grow without every rule being updated immediately.
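A short sketch of both points, in plain Python rather than the Fabric UDF SDK: a function whose later-added parameters carry JSON-serializable defaults, with None standing in for a mutable default, plus a small checker for the serializability constraint. The names `classify_reading` and `defaults_are_json_serializable` are hypothetical and exist only for illustration.

```python
import inspect
import json
from typing import Optional


def classify_reading(
    entity_id: str,
    value: float,
    threshold: float = 10.0,        # added later; older callers can omit it
    labels: Optional[list] = None,  # None stands in for "empty list"
) -> dict:
    # Build the list per call so separate calls never share state.
    if labels is None:
        labels = []
    if value > threshold:
        labels.append("over_threshold")
    return {"entity_id": entity_id, "value": value, "labels": labels}


def defaults_are_json_serializable(func) -> bool:
    """Return True if every default argument of func survives json.dumps."""
    for param in inspect.signature(func).parameters.values():
        if param.default is inspect.Parameter.empty:
            continue  # required parameter, nothing to check
        try:
            json.dumps(param.default)
        except (TypeError, ValueError):
            return False
    return True
```

A caller written before `threshold` and `labels` existed, such as `classify_reading("pump-1", 12.5)`, keeps working unchanged, which is the backwards-compatibility property that matters once many rules invoke the same function.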