Core Classes API#

The core classes provide the foundation for data quality validation.

Validator#

class wxdi.dq_validator.validator.Validator(metadata: AssetMetadata)#

Bases: object

Main validator for data quality checks.

Supports both column-level and table-level validation rules:

  • Column-level rules: Validate individual column values

  • Table-level rules: Validate entire records with cross-column logic

Example

>>> from wxdi.dq_validator import (
...     Validator, ValidationRule, TableValidationRule,
...     LengthCheck, TableCELCheck
... )
>>>
>>> validator = Validator(metadata)
>>>
>>> # Column-level rule
>>> validator.add_rule(
...     ValidationRule('name')
...         .add_check(LengthCheck(min_length=2))
... )
>>>
>>> # Table-level rule
>>> validator.add_table_rule(
...     TableValidationRule('business_rules')
...         .add_check(TableCELCheck('salary > min_salary && age >= 18'))
... )

Initialize validator.

Parameters:

metadata (AssetMetadata) – Asset metadata defining table structure

add_rule(rule: ValidationRule)#

Add a column-level validation rule (fluent API).

Parameters:

rule (ValidationRule) – The validation rule to add

Return type:

Validator

Returns:

Self for method chaining

Example

>>> validator.add_rule(
...     ValidationRule('email')
...         .add_check(FormatCheck('email'))
... )

add_table_rule(rule: TableValidationRule)#

Add a table-level validation rule (fluent API).

Table-level rules validate entire records, enabling cross-column validation and complex business logic.

Parameters:

rule (TableValidationRule) – The table validation rule to add

Return type:

Validator

Returns:

Self for method chaining

Example

>>> validator.add_table_rule(
...     TableValidationRule('salary_check')
...         .add_check(TableCELCheck('salary > min_salary'))
... )

validate(record: List[Any], record_index: int = 0)#

Validate a single record using both column-level and table-level rules.

Parameters:
  • record (List[Any]) – The record array to validate

  • record_index (int, default: 0) – Index of the record (for tracking)

Return type:

ValidationResult

Returns:

ValidationResult with errors and statistics

validate_batch(records: List[List[Any]])#

Validate multiple records in a single call.

Parameters:

records (List[List[Any]]) – List of record arrays to validate

Return type:

List[ValidationResult]

Returns:

List of ValidationResult objects
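
The record_index parameter of validate() suggests one way to process many records while keeping track of each record's position. The helper below is a minimal sketch of that pattern, not the library's implementation: validate_all and fake_validate are hypothetical names introduced here, and whether validate_batch() numbers records the same way is an assumption.

```python
from typing import Any, Callable, List, TypeVar

R = TypeVar("R")


def validate_all(
    validate_one: Callable[[List[Any], int], R],
    records: List[List[Any]],
) -> List[R]:
    """Call a validate(record, record_index)-style function once per record.

    Hypothetical helper: it only illustrates how record_index can carry each
    record's position. It is not part of wxdi.
    """
    return [validate_one(record, index) for index, record in enumerate(records)]


def fake_validate(record: List[Any], record_index: int = 0) -> dict:
    """Stand-in for Validator.validate so the sketch runs on its own."""
    return {"record_index": record_index, "field_count": len(record)}


results = validate_all(fake_validate, [["Ann", 31], ["Bob", 17], ["Cy", 45]])
```

With a configured validator, the same helper could presumably be called as validate_all(validator.validate, records).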

ValidationRule#

class wxdi.dq_validator.rule.ValidationRule(column_name: str)#

Bases: object

A validation rule (a set of checks) for a specific column

Initialize validation rule

Parameters:

column_name (str) – Name of the column to validate

add_check(check: BaseCheck)#

Add a validation check (fluent API)

Parameters:

check (BaseCheck) – The check to add

Return type:

ValidationRule

Returns:

Self for method chaining

validate(record: List[Any], metadata: AssetMetadata)#

Validate the column value in the record

Parameters:
  • record (List[Any]) – The record array

  • metadata (AssetMetadata) – Asset metadata for column mapping

Return type:

List[ValidationError]

Returns:

List of validation errors (empty if all pass)

BaseCheck#

class wxdi.dq_validator.base.BaseCheck(dimension: DataQualityDimension)#

Bases: ABC

Base class for all validation checks

Initialize base check with dimension

Parameters:

dimension (DataQualityDimension) – The data quality dimension this check belongs to

get_dimension()#

Return the dimension to which the check belongs

Return type:

DataQualityDimension

set_dimension(dimension: DataQualityDimension)#

Set the dimension to which the check belongs

Return type:

None

Parameters:

dimension (DataQualityDimension) – The data quality dimension to assign to this check

abstractmethod validate(value: Any, context: Dict[str, Any])#

Validate a value

Parameters:
  • value (Any) – The value to validate

  • context (Dict[str, Any]) – Additional context (e.g., other column values, metadata). Expected keys:

      - 'column_name': Name of the column being validated
      - 'record': The full record array (for column-to-column comparisons)
      - 'metadata': AssetMetadata object (for column lookups)

Return type:

ValidationError | None

Returns:

ValidationError if validation fails, None if passes

abstractmethod get_check_name()#

Return the name of this check type

Return type:

str
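
Because BaseCheck is abstract, a custom check has to implement both validate() and get_check_name(). The sketch below shows one possible subclass under stated assumptions: the three stand-in classes only mirror the signatures documented on this page so the example runs without the package installed, and NonNegativeCheck is a hypothetical check, not one shipped with the library.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Any, Dict, Optional


# Stand-ins mirroring the documented signatures. In real code these would be
# imported from wxdi.dq_validator.base and
# wxdi.dq_validator.data_quality_dimension instead.
class DataQualityDimension(Enum):
    VALIDITY = "Data is valid if it conforms to the syntax of its definition."


class ValidationError:
    def __init__(self, column_name: str, check_name: str, message: str,
                 value: Any, expected: Any = None):
        self.column_name = column_name
        self.check_name = check_name
        self.message = message
        self.value = value
        self.expected = expected


class BaseCheck(ABC):
    def __init__(self, dimension: DataQualityDimension):
        self._dimension = dimension

    def get_dimension(self) -> DataQualityDimension:
        return self._dimension

    @abstractmethod
    def validate(self, value: Any, context: Dict[str, Any]) -> Optional[ValidationError]:
        ...

    @abstractmethod
    def get_check_name(self) -> str:
        ...


class NonNegativeCheck(BaseCheck):
    """Hypothetical check: passes only for numbers greater than or equal to 0."""

    def __init__(self) -> None:
        super().__init__(DataQualityDimension.VALIDITY)

    def validate(self, value: Any, context: Dict[str, Any]) -> Optional[ValidationError]:
        if isinstance(value, (int, float)) and value >= 0:
            return None  # None signals that the value passed
        return ValidationError(
            column_name=context["column_name"],
            check_name=self.get_check_name(),
            message=f"{value!r} is not a non-negative number",
            value=value,
            expected=">= 0",
        )

    def get_check_name(self) -> str:
        return "non_negative"


check = NonNegativeCheck()
passed = check.validate(5, {"column_name": "age"})
failed = check.validate(-1, {"column_name": "age"})
```

Here only the 'column_name' key of the context is read; a check that compares columns would also need 'record' and 'metadata'.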

ValidationError#

class wxdi.dq_validator.base.ValidationError(column_name: str, check_name: str, message: str, value: Any, expected: Any = None)#

Bases: object

Represents a validation error

Initialize validation error

Parameters:
  • column_name (str) – Name of the column that failed

  • check_name (str) – Type of check that failed

  • message (str) – Human-readable error message

  • value (Any) – The actual value that failed

  • expected (Any, default: None) – The expected value/constraint (optional)

to_dict()#

Convert to dictionary

Return type:

dict

Data Quality Dimensions#

class wxdi.dq_validator.data_quality_dimension.DataQualityDimension(*values)#

Bases: Enum

Standard data quality dimensions with their definitions.

Each dimension represents a key aspect of data quality that can be measured and validated to ensure data meets business requirements.

ACCURACY = 'The degree to which data correctly describes the real world object or event being described.'#
COMPLETENESS = 'The proportion of data stored against the potential for 100%.'#
CONFORMITY = 'The degree to which data adheres to defined standards, formats, and permissible values.'#
CONSISTENCY = 'The absence of difference, when comparing two or more representations of a thing against a definition.'#
COVERAGE = 'The extent to which the expected dataset is represented, typically measured by record counts or population completeness.'#
TIMELINESS = 'The degree to which data represent reality from the required point in time.'#
UNIQUENESS = 'No entity instance (thing) will be recorded more than once based upon how that thing is identified.'#
VALIDITY = 'Data is valid if it conforms to the syntax of its definition.'#

property description: str#

Returns the description of the dimension

classmethod get_all_dimensions()#

Returns all dimensions as a dictionary.

Returns:

Dictionary mapping dimension names to their descriptions

Return type:

dict
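
Since each member's value is its definition string, the description property and get_all_dimensions() can be pictured with a small stand-in enum. This is a sketch, not the library's source: it assumes that description returns the member's value and that the dictionary keys are the member names (for example 'VALIDITY'). Only two of the documented members are reproduced.

```python
from enum import Enum
from typing import Dict


class DataQualityDimension(Enum):
    """Stand-in with two of the documented members, for illustration only."""

    COMPLETENESS = "The proportion of data stored against the potential for 100%."
    VALIDITY = "Data is valid if it conforms to the syntax of its definition."

    @property
    def description(self) -> str:
        # Assumption: the description is the enum member's value.
        return self.value

    @classmethod
    def get_all_dimensions(cls) -> Dict[str, str]:
        # Assumption: keys are member names, values are descriptions.
        return {member.name: member.value for member in cls}


validity_text = DataQualityDimension.VALIDITY.description
all_dimensions = DataQualityDimension.get_all_dimensions()
```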

Usage Examples#

See Core Concepts for detailed usage examples.