Microsoft Fabric Blog explains how to leverage Tonic Textual for secure AI/ML development by automatically preparing sensitive unstructured text within Fabric, maintaining privacy and compliance throughout the workflow.

Transform Sensitive Text into AI-Ready Data on Microsoft Fabric

Organizations face significant challenges unlocking sensitive unstructured text data for responsible and compliant AI/ML development. This guide demonstrates how integrating Tonic Textual into Microsoft Fabric empowers teams to process Office documents, PDFs, and images containing confidential information directly within their existing Fabric ecosystem.

Key Features

Automated Entity Detection & Redaction: Tonic Textual detects personal identifiers (names, dates, medical/financial details) in unstructured data, allowing users to anonymize or synthesize content securely.
Compliant Data Preparation: Process sensitive data within governed Fabric environments, maintaining compliance with regulations like HIPAA and GDPR.
End-to-End Integration: Workflows run natively in Fabric, eliminating manual off-platform preprocessing and minimizing risk.
Unlock Downstream Analytics: Privacy-preserving datasets become immediately usable for model training, generative AI, retrieval-augmented generation (RAG), and analytics apps.

Step-by-Step Workflow

1. Add the Tonic Textual Workload

From your Fabric console, add Tonic Textual as a workload to your workspace. This makes the Textual UI accessible directly within Microsoft Fabric.

For details on adding workloads, see the official documentation.

2. Configure Input and Output Locations

Select the source Lakehouse containing files to process and specify a destination folder for sanitized outputs.

3. Create a Tonic Textual Item

Open your workspace, create a new item, and select ‘Tonic Textual’.
Choose files or folders to process; these will be analyzed for sensitive data.

4. Scan Files for Sensitive Text

The tool scans and detects a range of sensitive entities within the chosen files.
Detected items (e.g., names, dates) are displayed for review.

5. Set De-identification Preferences

Decide whether to redact, synthesize (replace with realistic surrogates), or leave entities unchanged.
Bulk edit options enable large-scale processing across datasets.

6. Access Sanitized Outputs

Sanitized files are saved in the specified destination and are replicas with sensitive data safely redacted or synthesized.
Original files remain untouched in their source location.

7. Enable AI/ML Workflows

Use your newly sanitized data for:
- Building AI agents with Azure Copilot Studio
- AI-powered search via Azure AI Search
- Training custom ML models in Azure Machine Learning

Why This Matters

Compliance-First: Keeps privacy-sensitive data within your Microsoft Fabric environment
Seamless Integration: Eliminates risky manual transfers or external preprocessing
Acceleration: Enables rapid development of compliant, AI-ready datasets
Enterprise-Ready: Scales with organizational needs and Fabric’s governance framework

Learn More

Explore integration details and further resources at Tonic.ai’s Microsoft Fabric partner page
Attend upcoming sessions at Microsoft Ignite for live demonstrations

By leveraging Tonic Textual within Fabric, organizations move from blocked to AI-ready, ultimately driving innovation while maintaining privacy and regulatory compliance.

This post appeared first on “Microsoft Fabric Blog”. Read the entire article here