Core Concepts

The DQ Validator module is built around a few core concepts, each covered below: metadata, the validator, validation rules, validation checks, validation results, and data quality dimensions.

Note

This section provides an overview of core concepts. Detailed API documentation is available in the API Reference.

Metadata

Metadata defines the structure of your data and is the foundation of validation.

AssetMetadata

Represents a data asset (table) with its columns:

from wxdi.dq_validator import AssetMetadata, ColumnMetadata, DataType

metadata = AssetMetadata(
    table_name="customers",
    columns=[
        ColumnMetadata(name="customer_id", data_type=DataType.INTEGER, position=0),
        ColumnMetadata(name="email", data_type=DataType.STRING, position=1),
        ColumnMetadata(name="age", data_type=DataType.INTEGER, position=2)
    ]
)

ColumnMetadata

Defines individual column properties:

  • name: Column name

  • data_type: Expected data type

  • position: Position in the record array (0-based)
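
Because position is a 0-based index into the record array, a column's value can be found by index once its name is resolved. The following is a minimal sketch of that lookup, not the library's implementation: ColumnSketch, position_of, and value_for are hypothetical stand-ins that use only the standard library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSketch:
    """Hypothetical stand-in for ColumnMetadata; not the library class."""
    name: str
    data_type: type
    position: int  # 0-based index into the record array


columns = [
    ColumnSketch("customer_id", int, 0),
    ColumnSketch("email", str, 1),
    ColumnSketch("age", int, 2),
]

# Map each column name to its index in the record array.
position_of = {col.name: col.position for col in columns}

# A record is assumed here to be a plain list ordered by column position.
record = [101, "ada@example.com", 36]


def value_for(column_name, record):
    """Return the value stored at the named column's position."""
    return record[position_of[column_name]]


print(value_for("email", record))
```

The sketch shows why positions must match the order of values in each record: a wrong position silently reads a different column's value.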

Validator

The main validation engine that applies rules to records.

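from wxdi.dq_validator import Validator

# `metadata` is the AssetMetadata instance defined above.
validator = Validator(metadata)

# `rule1` and `rule2` are ValidationRule instances (see ValidationRule below).
validator.add_rule(rule1)
validator.add_rule(rule2)

# `record` is a single row of values, ordered by column position.
result = validator.validate(record)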

ValidationRule

Defines validation logic for a specific column.

from wxdi.dq_validator import ValidationRule
from wxdi.dq_validator.checks import CompletenessCheck, LengthCheck

rule = ValidationRule("email")
rule.add_check(CompletenessCheck())
rule.add_check(LengthCheck(min_length=5, max_length=100))
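
To make the rule-and-check relationship concrete, here is a self-contained sketch of one way a rule could hold several checks for a column and collect their failures. It is an illustration under assumptions, not the wxdi implementation: RuleSketch, completeness_check, length_check, and the error message format are all hypothetical.

```python
from typing import Callable, List, Optional

# A check takes a value and returns an error message, or None when it passes.
Check = Callable[[object], Optional[str]]


def completeness_check() -> Check:
    """Hypothetical check: fail when the value is None or an empty string."""
    def check(value):
        if value is None or value == "":
            return "value is missing"
        return None
    return check


def length_check(min_length: int, max_length: int) -> Check:
    """Hypothetical check: fail when the string length is out of bounds."""
    def check(value):
        if value is None:
            return None  # absence is left to the completeness check
        n = len(str(value))
        if n < min_length or n > max_length:
            return f"length {n} outside [{min_length}, {max_length}]"
        return None
    return check


class RuleSketch:
    """Hypothetical stand-in for ValidationRule: checks bound to one column."""

    def __init__(self, column_name: str):
        self.column_name = column_name
        self.checks: List[Check] = []

    def add_check(self, check: Check) -> None:
        self.checks.append(check)

    def run(self, value) -> List[str]:
        """Apply every check in order and return the collected error messages."""
        errors = []
        for check in self.checks:
            message = check(value)
            if message is not None:
                errors.append(f"{self.column_name}: {message}")
        return errors


email_rule = RuleSketch("email")
email_rule.add_check(completeness_check())
email_rule.add_check(length_check(min_length=5, max_length=100))

print(email_rule.run("a@b"))
```

In this sketch every check runs even after an earlier one fails, so a single value can produce several errors. Whether the library short-circuits instead is not stated here.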

Validation Checks

Individual validation checks that can be added to rules. See Validation Checks for details.

ValidationResult

Contains the results of validating a single record:

result = validator.validate(record)

print(f"Score: {result.validation_score}")
print(f"Pass Rate: {result.pass_rate}%")
print(f"Errors: {len(result.errors)}")
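
The exact definitions of validation_score and pass_rate belong to the API Reference. As a rough illustration only, the sketch below shows one plausible reading of a pass rate: the percentage of executed checks that did not produce an error. ResultSketch and its fields are hypothetical, and the formula is an assumption rather than the library's documented behavior.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResultSketch:
    """Hypothetical stand-in for ValidationResult; not the library class."""
    total_checks: int = 0
    errors: List[str] = field(default_factory=list)

    @property
    def passed_checks(self) -> int:
        # Assumes each failed check contributes exactly one error.
        return self.total_checks - len(self.errors)

    @property
    def pass_rate(self) -> float:
        """Percentage of checks that passed (assumed definition)."""
        if self.total_checks == 0:
            return 100.0  # nothing was checked, so nothing failed
        return 100.0 * self.passed_checks / self.total_checks


result = ResultSketch(total_checks=4, errors=["email: value is missing"])

print(f"Pass Rate: {result.pass_rate}%")
print(f"Errors: {len(result.errors)}")
```

With four checks and one error, this assumed formula gives a pass rate of 75.0.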

Data Quality Dimensions

Validation checks are categorized into eight standard data quality dimensions:

  • Accuracy: Data correctly represents real-world values

  • Completeness: Required data is present

  • Conformity: Data conforms to specified formats

  • Consistency: Data is consistent across systems

  • Coverage: Data covers the required scope

  • Timeliness: Data is up-to-date

  • Uniqueness: Data has no duplicates where required

  • Validity: Data is valid according to business rules
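
One way to work with these dimensions in code is to model them as an enumeration and group check failures by dimension for reporting. The sketch below assumes this approach: DimensionSketch, the check names, and their dimension assignments are hypothetical and do not reflect how the library tags its own checks.

```python
from collections import Counter
from enum import Enum


class DimensionSketch(Enum):
    """Hypothetical enumeration of the eight dimensions listed above."""
    ACCURACY = "accuracy"
    COMPLETENESS = "completeness"
    CONFORMITY = "conformity"
    CONSISTENCY = "consistency"
    COVERAGE = "coverage"
    TIMELINESS = "timeliness"
    UNIQUENESS = "uniqueness"
    VALIDITY = "validity"


# Illustrative failures as (check name, dimension) pairs; the names are made up.
failures = [
    ("missing_email", DimensionSketch.COMPLETENESS),
    ("missing_age", DimensionSketch.COMPLETENESS),
    ("bad_email_format", DimensionSketch.CONFORMITY),
    ("duplicate_customer_id", DimensionSketch.UNIQUENESS),
]

# Count failures per dimension to see where quality problems concentrate.
failures_by_dimension = Counter(dimension for _, dimension in failures)

for dimension, count in failures_by_dimension.most_common():
    print(f"{dimension.value}: {count}")
```

Grouping by dimension turns a flat list of errors into a summary that shows which kind of quality problem dominates a dataset.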

For more information, see the API Reference.