DQ Validator Module


The DQ Validator module performs in-memory data quality validation on streaming data records, Pandas DataFrames, and PySpark DataFrames.

This module is designed for high-performance validation with support for:

  • Array-based Records: Optimized for streaming data

  • Pandas DataFrames: Memory-efficient chunked processing

  • PySpark DataFrames: Distributed validation at scale

  • REST API Integration: Integration with IBM Cloud Pak for Data
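The general idea behind chunked, memory-bounded processing of a record stream can be sketched in plain Python. This is only an illustration under stated assumptions: `batched`, `count_missing`, and the record layout are hypothetical names chosen for this example and are not part of the DQ Validator API.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence

# Assumption: an "array-based record" is a sequence of values in column order.
Record = Sequence[object]


def batched(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Yield lists of up to `size` records without materializing the whole stream."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def count_missing(records: Iterable[Record], column: int, batch_size: int = 2) -> int:
    """Hypothetical completeness check: count records whose value at `column` is None or ''."""
    missing = 0
    for batch in batched(records, batch_size):
        # Only one batch is held in memory at a time.
        missing += sum(1 for rec in batch if rec[column] in (None, ""))
    return missing


# A small stand-in for a real stream of records.
stream = iter([
    ("a1", "alice@example.com"),
    ("a2", ""),
    ("a3", None),
    ("a4", "dan@example.com"),
])
missing_emails = count_missing(stream, column=1)
```

Because the stream is consumed one batch at a time, peak memory depends on the batch size rather than on the total number of records.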

Key Capabilities

Validation Engine

Core validation framework with metadata-driven rules and fluent API

Ten Check Types

Comprehensive validation coverage including length, format, datatype, range, regex, CEL expressions, and more

Data Quality Dimensions

Track validations across 8 standard DQ dimensions (Accuracy, Completeness, Conformity, etc.)

Result Consolidation

Aggregate and analyze validation results with detailed statistics

REST API Integration

Fetch rules from the glossary, report issues, and integrate with IBM Cloud Pak for Data
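To make the capabilities above more concrete, here is a minimal, self-contained sketch of how length, regex, and range checks might be tagged with a data quality dimension and their results consolidated into per-dimension counts. Everything in it is an assumption made for illustration: the `Check` class, the helper functions, and the mapping of checks to dimensions are hypothetical and do not reflect the module's actual classes or fluent API.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Check:
    """Hypothetical check definition, not the module's actual class."""
    name: str
    column: str
    dimension: str  # assumed labels such as "Conformity" or "Accuracy"
    predicate: Callable[[object], bool]


def length_check(max_len: int) -> Callable[[object], bool]:
    # Passes when the value is a string no longer than max_len.
    return lambda v: isinstance(v, str) and len(v) <= max_len


def regex_check(pattern: str) -> Callable[[object], bool]:
    # Passes when the whole string matches the pattern.
    compiled = re.compile(pattern)
    return lambda v: isinstance(v, str) and compiled.fullmatch(v) is not None


def range_check(low: float, high: float) -> Callable[[object], bool]:
    # Passes when the value is numeric and lies within [low, high].
    return lambda v: isinstance(v, (int, float)) and low <= v <= high


def validate(records: List[Dict[str, object]], checks: List[Check]) -> Dict[str, Counter]:
    """Run every check on every record and tally passed/failed per dimension."""
    summary: Dict[str, Counter] = {}
    for record in records:
        for check in checks:
            outcome = "passed" if check.predicate(record.get(check.column)) else "failed"
            summary.setdefault(check.dimension, Counter())[outcome] += 1
    return summary


checks = [
    Check("name_length", "name", "Conformity", length_check(10)),
    Check("email_format", "email", "Conformity", regex_check(r"[^@\s]+@[^@\s]+\.[^@\s]+")),
    Check("age_range", "age", "Accuracy", range_check(0, 120)),
]
records = [
    {"name": "Alice", "email": "alice@example.com", "age": 34},
    {"name": "Bartholomew Q", "email": "not-an-email", "age": 150},
]
summary = validate(records, checks)
```

Keeping the dimension on the check definition, rather than deciding it when results are reported, means the consolidation step only has to group and count outcomes.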

Note

This documentation is under active development. More detailed content will be added in upcoming releases.