DQ Validator Module
The DQ Validator module provides in-memory data quality validation for streaming data records, Pandas DataFrames, and PySpark DataFrames.
It is designed for high-performance validation and supports:
- Array-based Records: Optimized for streaming data
- Pandas DataFrames: Memory-efficient chunked processing
- PySpark DataFrames: Distributed validation at scale
- REST API Integration: Connectivity with IBM Cloud Pak for Data
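To make the idea of chunked, in-memory validation of array-based records concrete, here is a minimal standard-library sketch. It does not use the DQ Validator API: `chunked`, `count_failures`, the `[id, name, age]` column layout, and the check itself are all hypothetical names chosen for illustration.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence

# One array-based record: values stored positionally, in column order.
Record = Sequence[object]


def chunked(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Yield lists of at most `size` records without materializing the whole stream."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def count_failures(
    records: Iterable[Record],
    check: Callable[[Record], bool],
    chunk_size: int = 2,
) -> int:
    """Validate the stream chunk by chunk and return the number of failing records."""
    failures = 0
    for chunk in chunked(records, chunk_size):
        failures += sum(1 for rec in chunk if not check(rec))
    return failures


# Hypothetical column layout: [id, name, age]
records = [
    [1, "Ada", 36],
    [2, "", 41],          # fails: empty name
    [3, "Grace", -5],     # fails: negative age
    [4, "Linus", 29],
    [5, "Margaret", 200], # fails: implausible age
]


def name_present_and_age_plausible(rec: Record) -> bool:
    """Illustrative check: name must be non-empty and age within 0..150."""
    return bool(rec[1]) and 0 <= rec[2] <= 150


failed = count_failures(records, name_present_and_age_plausible)
```

Only one chunk is held in memory at a time, which is the property that keeps memory use bounded regardless of stream length. The same pattern carries over to DataFrames by iterating over row slices instead of lists.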
Key Capabilities
- Validation Engine: Core validation framework with metadata-driven rules and a fluent API
- Ten Check Types: Validation coverage including length, format, datatype, range, regex, CEL expressions, and more
- Data Quality Dimensions: Track validations across 8 standard DQ dimensions (Accuracy, Completeness, Conformity, etc.)
- Result Consolidation: Aggregate and analyze validation results with detailed statistics
- REST API Integration: Fetch rules from the glossary, report issues, and integrate with IBM Cloud Pak for Data
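The capabilities above (fluent rule definition, checks tagged with a DQ dimension, and consolidated results) can be sketched in plain Python as follows. This illustrates the general pattern only and is not the module's actual API: the `Rule` class, its `max_length`, `matches`, and `in_range` methods, and the `validate` function are hypothetical names assumed for this example.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A check is (check name, DQ dimension, predicate over a single value).
Check = Tuple[str, str, Callable[[object], bool]]


@dataclass
class Rule:
    """Hypothetical fluent rule: a column name plus an ordered list of checks."""
    column: str
    checks: List[Check] = field(default_factory=list)

    def _add(self, name: str, dimension: str, predicate: Callable[[object], bool]) -> "Rule":
        self.checks.append((name, dimension, predicate))
        return self  # returning self is what allows method chaining

    def max_length(self, n: int, dimension: str = "Conformity") -> "Rule":
        return self._add("length", dimension, lambda v: v is not None and len(str(v)) <= n)

    def matches(self, pattern: str, dimension: str = "Conformity") -> "Rule":
        rx = re.compile(pattern)
        return self._add(
            "regex", dimension, lambda v: v is not None and rx.fullmatch(str(v)) is not None
        )

    def in_range(self, lo: float, hi: float, dimension: str = "Accuracy") -> "Rule":
        return self._add(
            "range", dimension, lambda v: isinstance(v, (int, float)) and lo <= v <= hi
        )


def validate(rows: List[dict], rules: List[Rule]) -> Dict[str, Counter]:
    """Run every check on every row and consolidate pass/fail counts per dimension."""
    summary: Dict[str, Counter] = {}
    for row in rows:
        for rule in rules:
            value = row.get(rule.column)
            for _name, dimension, predicate in rule.checks:
                outcome = "passed" if predicate(value) else "failed"
                summary.setdefault(dimension, Counter())[outcome] += 1
    return summary


rules = [
    Rule("email").max_length(20).matches(r"[^@\s]+@[^@\s]+\.[a-z]+"),
    Rule("age").in_range(0, 150),
]
rows = [
    {"email": "ada@example.com", "age": 36},
    {"email": "not-an-email", "age": 41},   # fails the regex check
    {"email": "grace@example.org", "age": -5},  # fails the range check
]
summary = validate(rows, rules)
```

Keeping rules as data (a column name plus a list of tagged predicates) is what makes them metadata-driven: the same structure could be populated from a glossary or configuration source instead of being written inline.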
Note
This documentation is under active development. More detailed content will be added in upcoming releases.