DataFrame Integration
The DQ Validator module integrates with both Pandas and PySpark DataFrames through two validator classes, PandasValidator and SparkValidator.
Note
This section provides an overview. Detailed API documentation is available in the API Reference.
Pandas Integration
PandasValidator processes a Pandas DataFrame in chunks to limit memory use; the chunk_size argument sets the size of each chunk:
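from wxdi.dq_validator.integrations import PandasValidator
# `metadata`, `rule`, and `df` are assumed to be defined beforehand
validator = PandasValidator(metadata)
validator.add_rule(rule)
# Validate the DataFrame in chunks of 1000
result_df = validator.validate_dataframe(df, chunk_size=1000)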
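The general idea behind chunked validation can be sketched in plain pandas. This is only an illustration of the technique, not PandasValidator's actual implementation: the helper `validate_in_chunks`, the check `age_in_range`, and the `age` column are all hypothetical names chosen for the example.

```python
import pandas as pd


def validate_in_chunks(df, check, chunk_size=1000):
    """Apply `check` to successive row slices and concatenate the results.

    `check` takes a DataFrame slice and returns a boolean Series
    aligned with that slice's index.
    """
    parts = []
    for start in range(0, len(df), chunk_size):
        # Only one slice of at most `chunk_size` rows is checked at a time.
        chunk = df.iloc[start:start + chunk_size]
        parts.append(check(chunk))
    # An empty DataFrame yields an empty boolean Series.
    return pd.concat(parts) if parts else pd.Series(dtype=bool)


def age_in_range(chunk):
    # Hypothetical rule: ages must lie in the inclusive range 0..120.
    return chunk["age"].between(0, 120)


df = pd.DataFrame({"age": [25, -3, 47, 130, 61]})
df["age_valid"] = validate_in_chunks(df, age_in_range, chunk_size=2)
```

Because each per-chunk result keeps the index of its slice, the concatenated Series lines up with the original DataFrame when it is assigned back as a column.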
PySpark Integration
SparkValidator runs validation in a distributed fashion using Spark UDFs:
from wxdi.dq_validator.integrations import SparkValidator
# `metadata`, `rule`, and `spark_df` are assumed to be defined beforehand
validator = SparkValidator(metadata)
validator.add_rule(rule)
result_df = validator.validate_dataframe(spark_df)
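For readers unfamiliar with UDF-based validation, the pattern can be sketched with the standard PySpark API. This is a minimal illustration under stated assumptions, not SparkValidator's internals: `is_valid_age`, `add_validity_column`, and the `age` column are hypothetical names, and the PySpark imports are placed inside the function so the row-level check can be exercised without a Spark installation.

```python
def is_valid_age(value):
    """Hypothetical row-level rule: non-null and within 0..120 inclusive."""
    return value is not None and 0 <= value <= 120


def add_validity_column(spark_df, column="age"):
    """Return `spark_df` with an extra boolean `<column>_valid` column."""
    # Imported lazily so the rule above stays usable without PySpark.
    from pyspark.sql import functions as F
    from pyspark.sql.types import BooleanType

    # Wrap the plain Python function as a UDF; Spark evaluates it on the
    # executors, one row value at a time.
    check = F.udf(is_valid_age, BooleanType())
    return spark_df.withColumn(f"{column}_valid", check(F.col(column)))
```

With an active Spark session, calling `add_validity_column(spark_df)` on a DataFrame that has an `age` column would return a new DataFrame with an added `age_valid` column.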
For detailed usage examples and API documentation, see the API Reference.