Core Classes API
The core classes provide the foundation for data quality validation.
Validator
- class wxdi.dq_validator.validator.Validator(metadata: AssetMetadata)
Bases: object
Main validator for data quality checks.
Supports both column-level and table-level validation rules:
- Column-level rules: validate individual column values.
- Table-level rules: validate entire records with cross-column logic.
Example
>>> from wxdi.dq_validator import (
...     Validator, ValidationRule, TableValidationRule,
...     LengthCheck, TableCELCheck
... )
>>>
>>> # `metadata` is an AssetMetadata instance describing the table
>>> validator = Validator(metadata)
>>>
>>> # Column-level rule
>>> validator.add_rule(
...     ValidationRule('name')
...     .add_check(LengthCheck(min_length=2))
... )
>>>
>>> # Table-level rule
>>> validator.add_table_rule(
...     TableValidationRule('business_rules')
...     .add_check(TableCELCheck('salary > min_salary && age >= 18'))
... )
Initialize validator.
- Parameters:
metadata (AssetMetadata) – Asset metadata defining the table structure
- add_rule(rule: ValidationRule)
Add a column-level validation rule (fluent API).
- Parameters:
rule (ValidationRule) – The validation rule to add
- Return type:
Validator
- Returns:
Self for method chaining
Example
>>> validator.add_rule(
...     ValidationRule('email')
...     .add_check(FormatCheck('email'))
... )
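Because add_rule and add_table_rule are documented as returning the validator itself, several calls can be chained in one expression. The sketch below illustrates that fluent pattern with a hypothetical stand-in class, MiniValidator, which only records the rule names it receives; it is not the library's implementation.

```python
from typing import List


class MiniValidator:
    """Hypothetical stand-in illustrating the fluent (self-returning) API style."""

    def __init__(self) -> None:
        self.rules: List[str] = []
        self.table_rules: List[str] = []

    def add_rule(self, rule: str) -> "MiniValidator":
        # Store the column-level rule, then return self so calls can be chained.
        self.rules.append(rule)
        return self

    def add_table_rule(self, rule: str) -> "MiniValidator":
        # Same pattern for table-level rules.
        self.table_rules.append(rule)
        return self


validator = (
    MiniValidator()
    .add_rule("name")
    .add_rule("email")
    .add_table_rule("salary_check")
)
print(validator.rules, validator.table_rules)  # ['name', 'email'] ['salary_check']
```

With the real Validator, the same style would read validator.add_rule(...).add_table_rule(...), since both methods return Self.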
- add_table_rule(rule: TableValidationRule)
Add a table-level validation rule (fluent API).
Table-level rules validate entire records, enabling cross-column validation and complex business logic.
- Parameters:
rule (TableValidationRule) – The table validation rule to add
- Return type:
Validator
- Returns:
Self for method chaining
Example
>>> validator.add_table_rule(
...     TableValidationRule('salary_check')
...     .add_check(TableCELCheck('salary > min_salary'))
... )
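A table-level rule sees the whole record, so one condition can relate several columns. The sketch below mimics the CEL expression from the class example, salary > min_salary && age >= 18, with a plain Python predicate over a record paired with its column names. The helpers record_as_dict and evaluate_table_rule, and the sample data, are hypothetical; TableCELCheck evaluates CEL expressions, not Python.

```python
from typing import Any, Callable, Dict, List, Sequence


def record_as_dict(columns: Sequence[str], record: Sequence[Any]) -> Dict[str, Any]:
    """Pair each column name with its value (hypothetical helper)."""
    return dict(zip(columns, record))


def evaluate_table_rule(
    columns: Sequence[str],
    records: List[List[Any]],
    predicate: Callable[[Dict[str, Any]], bool],
) -> List[int]:
    """Return the indices of records that fail a cross-column predicate."""
    failures = []
    for index, record in enumerate(records):
        if not predicate(record_as_dict(columns, record)):
            failures.append(index)
    return failures


def salary_and_age_rule(row: Dict[str, Any]) -> bool:
    # Python counterpart of the CEL expression 'salary > min_salary && age >= 18'.
    return row["salary"] > row["min_salary"] and row["age"] >= 18


columns = ["name", "age", "salary", "min_salary"]
records = [
    ["Ana", 34, 5200, 3000],   # passes both conditions
    ["Ben", 17, 4000, 3000],   # fails: age < 18
    ["Caro", 45, 2500, 3000],  # fails: salary <= min_salary
]

print(evaluate_table_rule(columns, records, salary_and_age_rule))  # [1, 2]
```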
ValidationRule
- class wxdi.dq_validator.rule.ValidationRule(column_name: str)
Bases: object
Validation rules for a specific column.
Initialize validation rule.
- Parameters:
column_name (str) – Name of the column to validate
- add_check(check: BaseCheck)
Add a validation check (fluent API).
- Parameters:
check (BaseCheck) – The check to add
- Return type:
ValidationRule
- Returns:
Self for method chaining
- validate(record: List[Any], metadata: AssetMetadata)
Validate the column value in the record.
- Parameters:
record (List[Any]) – The full record, as a list of column values
metadata (AssetMetadata) – Asset metadata for column mapping
- Return type:
List[ValidationError]
- Returns:
List of validation errors (empty if all pass)
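A column-level rule must locate its column in the record before running its checks. The sketch below shows one plausible flow under stated assumptions: metadata is reduced to a plain list of column names, checks are plain callables taking the (value, context) pair documented for BaseCheck.validate, and the context carries the documented keys column_name, record, and metadata. SketchColumnRule, not_null, and min_length are hypothetical illustrations, not the library's code.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence

# Hypothetical check signature mirroring BaseCheck.validate(value, context):
# returns an error message on failure, None on success.
Check = Callable[[Any, Dict[str, Any]], Optional[str]]


class SketchColumnRule:
    """Hypothetical stand-in for a column-level rule."""

    def __init__(self, column_name: str) -> None:
        self.column_name = column_name
        self.checks: List[Check] = []

    def add_check(self, check: Check) -> "SketchColumnRule":
        self.checks.append(check)
        return self

    def validate(self, record: Sequence[Any], column_names: Sequence[str]) -> List[str]:
        # Assumed metadata lookup: map the column name to its position.
        # list.index raises ValueError if the column is unknown.
        value = record[column_names.index(self.column_name)]
        context = {
            "column_name": self.column_name,
            "record": record,
            "metadata": column_names,  # stand-in for an AssetMetadata object
        }
        errors: List[str] = []
        for check in self.checks:
            error = check(value, context)
            if error is not None:
                errors.append(error)
        return errors  # empty when every check passes


def not_null(value: Any, context: Dict[str, Any]) -> Optional[str]:
    return f"{context['column_name']}: missing value" if value is None else None


def min_length(n: int) -> Check:
    def check(value: Any, context: Dict[str, Any]) -> Optional[str]:
        if value is None or len(str(value)) < n:
            return f"{context['column_name']}: shorter than {n}"
        return None
    return check


rule = SketchColumnRule("name").add_check(not_null).add_check(min_length(2))
columns = ["id", "name"]
print(rule.validate([1, "Al"], columns))  # []
print(rule.validate([2, "A"], columns))   # ['name: shorter than 2']
```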
BaseCheck
- class wxdi.dq_validator.base.BaseCheck(dimension: DataQualityDimension)
Bases: ABC
Base class for all validation checks.
Initialize the base check with its data quality dimension.
- Parameters:
dimension (DataQualityDimension) – The data quality dimension this check belongs to
- get_dimension()
Return the dimension to which the check belongs.
- Return type:
DataQualityDimension
- set_dimension(dimension: DataQualityDimension)
Set the dimension to which the check belongs.
- Parameters:
dimension (DataQualityDimension) – The data quality dimension to assign
- Return type:
None
- abstractmethod validate(value: Any, context: Dict[str, Any])
Validate a value.
- Parameters:
value (Any) – The value to validate
context (Dict[str, Any]) – Additional context (e.g., other column values, metadata). Expected keys:
  - 'column_name': Name of the column being validated
  - 'record': The full record array (for column-to-column comparisons)
  - 'metadata': AssetMetadata object (for column lookups)
- Return type:
Optional[ValidationError]
- Returns:
ValidationError if validation fails, None if it passes
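A custom check subclasses BaseCheck and implements validate(value, context). The sketch below uses the documented record and metadata context keys for a column-to-column comparison. So that it runs without the package, it defines local stand-ins: BaseCheck mirrors the documented interface, DataQualityDimension keeps two members with abbreviated values, and ValidationError uses hypothetical fields (the real class may differ). GreaterThanColumnCheck is a hypothetical example check, and metadata is simplified to a list of column names.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional


class DataQualityDimension(Enum):
    # Local stand-in with two documented members (values abbreviated).
    CONSISTENCY = "consistency"
    VALIDITY = "validity"


@dataclass
class ValidationError:
    # Hypothetical fields for illustration only.
    column_name: str
    message: str
    dimension: DataQualityDimension


class BaseCheck(ABC):
    """Stand-in mirroring the documented BaseCheck interface."""

    def __init__(self, dimension: DataQualityDimension) -> None:
        self._dimension = dimension

    def get_dimension(self) -> DataQualityDimension:
        return self._dimension

    def set_dimension(self, dimension: DataQualityDimension) -> None:
        self._dimension = dimension

    @abstractmethod
    def validate(self, value: Any, context: Dict[str, Any]) -> Optional[ValidationError]:
        ...


class GreaterThanColumnCheck(BaseCheck):
    """Hypothetical check: the value must exceed another column in the same record."""

    def __init__(self, other_column: str) -> None:
        super().__init__(DataQualityDimension.CONSISTENCY)
        self.other_column = other_column

    def validate(self, value: Any, context: Dict[str, Any]) -> Optional[ValidationError]:
        # Look up the other column's value via 'metadata' (here: column names) and 'record'.
        other_value = context["record"][context["metadata"].index(self.other_column)]
        if value > other_value:
            return None
        return ValidationError(
            column_name=context["column_name"],
            message=f"{value} is not greater than {self.other_column}={other_value}",
            dimension=self.get_dimension(),
        )


check = GreaterThanColumnCheck("min_salary")
ctx = {"column_name": "salary", "metadata": ["salary", "min_salary"]}
ok_record, bad_record = [5200, 3000], [2500, 3000]
print(check.validate(ok_record[0], {**ctx, "record": ok_record}))  # None
print(check.validate(bad_record[0], {**ctx, "record": bad_record}).message)
# 2500 is not greater than min_salary=3000
```

Returning None on success and an error object on failure matches the documented return contract, which lets a rule collect only the failures.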
ValidationError
Data Quality Dimensions
- class wxdi.dq_validator.data_quality_dimension.DataQualityDimension(*values)
Bases: Enum
Standard data quality dimensions with their definitions.
Each dimension represents a key aspect of data quality that can be measured and validated to ensure data meets business requirements.
- ACCURACY = 'The degree to which data correctly describes the real world object or event being described.'
- COMPLETENESS = 'The proportion of data stored against the potential for 100%.'
- CONFORMITY = 'The degree to which data adheres to defined standards, formats, and permissible values.'
- CONSISTENCY = 'The absence of difference, when comparing two or more representations of a thing against a definition.'
- COVERAGE = 'The extent to which the expected dataset is represented, typically measured by record counts or population completeness.'
- TIMELINESS = 'The degree to which data represent reality from the required point in time.'
- UNIQUENESS = 'No entity instance (thing) will be recorded more than once based upon how that thing is identified.'
- VALIDITY = 'Data is valid if it conforms to the syntax of its definition.'
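Because DataQualityDimension is a standard Enum whose values are the definition strings listed above, the usual Enum operations apply: .name, .value, lookup by name with square brackets, lookup by value with a call, and iteration. The sketch mirrors two members locally so it runs without the package installed; with the library available, the class would be imported from wxdi.dq_validator.data_quality_dimension instead.

```python
from enum import Enum


class DataQualityDimension(Enum):
    # Local mirror of two documented members, copied from the listing above.
    COMPLETENESS = "The proportion of data stored against the potential for 100%."
    VALIDITY = "Data is valid if it conforms to the syntax of its definition."


dim = DataQualityDimension["COMPLETENESS"]     # lookup by member name
print(dim.name)                                # COMPLETENESS
print(dim.value)                               # the definition string
print([d.name for d in DataQualityDimension])  # ['COMPLETENESS', 'VALIDITY']

# Lookup by value returns the same member object.
print(DataQualityDimension(dim.value) is DataQualityDimension.COMPLETENESS)  # True
```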
Usage Examples
See Core Concepts for detailed usage examples.