Microsoft Fabric Blog introduces a major update by enabling native Pandas DataFrame and Series support in User Data Functions within Fabric Notebooks. This enhancement, powered by Apache Arrow, streamlines large-scale data analysis and enables high-efficiency workflows for data engineers and scientists.

Enhancing Fabric Notebooks: Native Pandas DataFrame Support in User Data Functions

Microsoft Fabric has introduced a significant enhancement to its Notebook integration: User Data Functions (UDFs) now natively support Pandas DataFrames and Series both as input and output types, thanks to deep integration with Apache Arrow.

Key Highlights

  • Pandas Integration: UDFs in Fabric Notebooks now accept and return Pandas DataFrames and Series, greatly simplifying analytics workflows.
  • Powered by Apache Arrow: Adoption of Arrow’s efficient columnar memory format enables high-performance serialization, zero-copy data sharing, and improved scalability for large datasets.
  • Cross-language Compatibility: The update allows seamless use of UDFs in Python, PySpark, Scala, and R.

Benefits

  • Performance: Significant improvements in data transfer efficiency and reduced memory overhead.
  • Developer Experience: No need to manually serialize large datasets to JSON; DataFrames can be passed directly for computation.
  • Scalability: Effortlessly handle millions of rows in real-time analytics or feature engineering tasks.
  • Code Reuse and Collaboration: Teams can modularize business logic and share tested functions, promoting consistent implementation across projects.

Sample Usage

PySpark / Python

# Retrieve the function

agg_func = notebookutils.udf.getFunctions("AggregateRevenueByDriver")

# Example Pandas DataFrame input

import pandas as pd
df = pd.DataFrame({"driver_id": [1, 2, 1], "revenue": [100.0, 150.0, 200.0]})

# Call the UDF

esult_df = agg_func.aggregate(df)
print(result_df)

Scala

val aggFunc = notebookutils.udf.getFunctions("AggregateRevenueByDriver")
val input = Seq((1, 100.0), (2, 150.0), (1, 200.0)).toDF("driver_id", "revenue")
val result = aggFunc.aggregate(input)
result.show()

R

agg_func <- notebookutils.udf.getFunctions("AggregateRevenueByDriver")
df <- data.frame(driver_id = c(1, 2, 1), revenue = c(100.0, 150.0, 200.0))
result <- agg_func$aggregate(df)
print(result)

Impact

This update empowers data professionals to:

  • Accelerate interactive analytics on massive datasets
  • Standardize and reuse robust function logic across projects
  • Speed up workflows for real-time metrics, aggregation, and feature engineering

Getting Started

Register a Pandas-compatible UDF in your Fabric Notebook and start leveraging these performance benefits today. See NotebookUtils for Fabric documentation for setup instructions.

Further Reading


Article originally published by Microsoft Fabric Blog.

This post appeared first on “Microsoft Fabric Blog”. Read the entire article here