DataFrame Integration


The DQ Validator module integrates with Pandas and PySpark DataFrames through two validator classes: PandasValidator and SparkValidator.

Note

This section provides an overview. Detailed API documentation is available in the API Reference.

Pandas Integration

PandasValidator processes a Pandas DataFrame in chunks to keep memory use low:

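from wxdi.dq_validator.integrations import PandasValidator

# `metadata`, `rule`, and `df` are assumed to be defined beforehand.
validator = PandasValidator(metadata)
validator.add_rule(rule)

# Validate the DataFrame chunk by chunk; chunk_size sets the size of each chunk.
result_df = validator.validate_dataframe(df, chunk_size=1000)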
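
To illustrate the general idea behind chunked validation, here is a minimal sketch in plain Python. It is not how PandasValidator is implemented; the helper names `chunk_bounds` and `validate_in_chunks`, the range rule, and the `<column>_is_valid` output column are all hypothetical and exist only for this example. The functions rely solely on standard DataFrame methods (`iloc`, `assign`, `Series.between`).

```python
def chunk_bounds(n_rows, chunk_size):
    """Yield (start, stop) row offsets that cover n_rows in steps of chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)


def validate_in_chunks(df, column, low, high, chunk_size=1000):
    """Yield each chunk of df with an added boolean '<column>_is_valid' column.

    Hypothetical stand-in for a rule: values must lie in [low, high].
    Only one chunk is materialized at a time.
    """
    for start, stop in chunk_bounds(len(df), chunk_size):
        chunk = df.iloc[start:stop]
        # Series.between is inclusive on both ends by default; NaN yields False.
        mask = chunk[column].between(low, high)
        yield chunk.assign(**{f"{column}_is_valid": mask})
```

Because `validate_in_chunks` is a generator, a caller can stream the chunks to disk one at a time, or reassemble them with `pd.concat(validate_in_chunks(df, "age", 0, 120))` when the full result fits in memory.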

PySpark Integration

SparkValidator runs validation through Spark UDFs, so the work is distributed across the cluster:

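from wxdi.dq_validator.integrations import SparkValidator

# `metadata`, `rule`, and `spark_df` are assumed to be defined beforehand.
validator = SparkValidator(metadata)
validator.add_rule(rule)

# Validate the Spark DataFrame; the result is returned as a DataFrame.
result_df = validator.validate_dataframe(spark_df)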
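
For readers unfamiliar with UDF-based validation, the following sketch shows the general pattern using only the public PySpark API. It makes no claim about SparkValidator's internals; `is_valid_age`, `add_validity_column`, the 0 to 120 range, and the output column name are hypothetical and chosen for illustration. PySpark is imported inside the function so that the row-level rule can be tested on its own.

```python
def is_valid_age(value):
    """Row-level rule (hypothetical): value is non-null and within 0..120."""
    return value is not None and 0 <= value <= 120


def add_validity_column(spark_df, column="age"):
    """Return spark_df with a boolean '<column>_is_valid' column.

    The rule runs as a Python UDF, so Spark evaluates it on the executors
    that hold each partition.
    """
    # Imported lazily so is_valid_age stays testable without a Spark install.
    from pyspark.sql import functions as F
    from pyspark.sql.types import BooleanType

    is_valid_udf = F.udf(is_valid_age, BooleanType())
    return spark_df.withColumn(f"{column}_is_valid", is_valid_udf(F.col(column)))
```

Given a Spark DataFrame `spark_df` with an `age` column, `add_validity_column(spark_df, "age")` returns a new DataFrame with the extra flag column. For simple checks like this one, built-in column expressions usually perform better than Python UDFs; the UDF form is shown because it generalizes to arbitrary Python rule logic.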

For detailed usage examples and API documentation, see the API Reference.