Harshi_mrinal provides a detailed walkthrough for monitoring and calculating blob expiry and retention periods in Azure Blob Storage, using inventory reports and Azure Synapse to ensure compliance with storage policies.

Calculating Expiry and Retention Days for Azure Blob Storage Using Inventory and Synapse

Managing large volumes of data in Azure Blob Storage and Azure Data Lake Gen2 involves ensuring data compliance and optimizing cloud storage costs. One critical aspect is identifying blobs that have been soft deleted or are set to expire in the future and calculating their remaining retention periods.

This guide outlines a step-by-step solution:

1. Understanding Blob Expiry and Soft Delete

  • Blob soft delete allows you to recover blobs after deletion within a specified retention period. After this period, data is permanently deleted.
  • Blob expiry can be configured through the Set Blob Expiry operation for hierarchical namespace (HNS) enabled storage accounts, scheduling automatic deletions.

2. Setting Blob Expiry

3. Generating and Accessing the Blob Inventory Report

  • Configure a Blob Inventory rule in your storage account.
  • Inventory jobs output a CSV file listing metadata for each blob, such as name, deletion status, expiry time, etc.
  • To download the report:
    • Go to the dedicated inventory container.
    • Navigate to the latest date folder.
    • Obtain the CSV file URL.

4. Querying Blob Inventory with Azure Synapse Analytics

SELECT
  LEFT([Name], CHARINDEX('/', [Name]) - 1) AS Container,
  RIGHT([Name], LEN([Name]) - CHARINDEX('/', [Name])) AS Blob,
  [Expiry-time]
FROM OPENROWSET(
  BULK '<URL to your inventory CSV file>',
  FORMAT = 'csv', PARSER_VERSION = '2.0', HEADER_ROW = TRUE
) AS Source;
  • To also retrieve retention days for soft-deleted blobs:
SELECT
  LEFT([Name], CHARINDEX('/', [Name]) - 1) AS Container,
  RIGHT([Name], LEN([Name]) - CHARINDEX('/', [Name])) AS Blob,
  [Expiry-time],
  RemainingRetentionDays
FROM OPENROWSET(
  BULK '<URL to your inventory CSV file>',
  FORMAT = 'csv', PARSER_VERSION = '2.0', HEADER_ROW = TRUE
) AS Source;
  • Null expiry values mean either the blob is not deleted or no expiry time is set.

5. Alternative Approaches

  • For organizations requiring automation, similar results can be achieved with PowerShell or Azure CLI scripts.

6. References

Summary

By leveraging Azure Blob Inventory and Synapse Analytics, you can efficiently manage your data’s lifecycle within Azure, ensuring policies and compliance requirements are met without manual overhead.

This post appeared first on “Microsoft Tech Community”. Read the entire article here