REST API Providers#

The providers module integrates with IBM Cloud Pak for Data REST APIs.

Configuration#

ProviderConfig#

class wxdi.dq_validator.provider.config.ProviderConfig(url: str, auth_token: str | None = None, project_id: str | None = None, catalog_id: str | None = None, auth_config: AuthConfig | None = None)#

Bases: object

Parameters:
  • url (str)

  • auth_token (str | None)

  • project_id (str | None)

  • catalog_id (str | None)

  • auth_config (AuthConfig | None)

property auth_token: str#
get_auth_token()#
Return type:

str

Base Provider#

class wxdi.dq_validator.provider.base_provider.BaseProvider(config: ProviderConfig)#

Bases: object

Base provider class with shared functionality for all providers.

This class provides common functionality like thread-local session management that is shared across all provider implementations.

config#

Configuration containing URL and authentication token

Type:

ProviderConfig

Initialize the BaseProvider with configuration.

Parameters:

config (ProviderConfig) – Provider configuration with URL and auth token

property session: Session#

Get or create a thread-local session instance.

Returns:

A requests Session object unique to the current thread

Return type:

Session
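The thread-local session behavior described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `ThreadLocalSessionHolder` and `StandInSession` are hypothetical names, and `StandInSession` stands in for `requests.Session` so the sketch runs with the standard library alone.

```python
import threading


class StandInSession:
    """Placeholder for requests.Session (assumption: used only to keep the sketch stdlib-only)."""


class ThreadLocalSessionHolder:
    """Hypothetical illustration of a per-thread session property."""

    def __init__(self, session_factory=StandInSession):
        self._local = threading.local()
        self._session_factory = session_factory

    @property
    def session(self):
        # Create the session lazily, the first time the current thread asks for it.
        if not hasattr(self._local, "session"):
            self._local.session = self._session_factory()
        return self._local.session


holder = ThreadLocalSessionHolder()
main_session = holder.session

# A second thread asking for the session gets its own instance.
worker_sessions = []
worker = threading.Thread(target=lambda: worker_sessions.append(holder.session))
worker.start()
worker.join()

same_thread_reuses = holder.session is main_session
other_thread_differs = worker_sessions[0] is not main_session
```

Keeping one session per thread avoids sharing connection state across threads while still reusing connections within a thread.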

Glossary Provider#

class wxdi.dq_validator.provider.glossary.GlossaryProvider(config: ProviderConfig)#

Bases: BaseProvider

Initialize the BaseProvider with configuration.

Parameters:

config (ProviderConfig) – Provider configuration with URL and auth token

get_published_artifact_by_id(artifact_id: str, options: dict | None = None)#
Return type:

GlossaryTerm

Parameters:
  • artifact_id (str)

  • options (dict | None)

get_term_by_version_id(artifact_id: str, version_id: str, options: dict | None = None)#
Return type:

GlossaryTerm

Parameters:
  • artifact_id (str)

  • version_id (str)

  • options (dict | None)

CAMS Provider#

class wxdi.dq_validator.provider.cams.CamsProvider(config: ProviderConfig)#

Bases: BaseProvider

Initialize the BaseProvider with configuration.

Parameters:

config (ProviderConfig) – Provider configuration with URL and auth token

get_asset_by_id(asset_id: str, options: dict | None = None)#
Return type:

DataAsset

Parameters:
  • asset_id (str)

  • options (dict | None)

Assets Provider#

class wxdi.dq_validator.provider.assets.DQAssetsProvider(config: ProviderConfig)#

Bases: BaseProvider

Provider for managing data quality assets.

This provider allows interaction with the data quality assets API, such as retrieving asset information.

Parameters:

config (ProviderConfig) – Configuration containing URL and authentication token

Example

>>> from wxdi.dq_validator.provider import ProviderConfig, DQAssetsProvider
>>> config = ProviderConfig(
...     url="https://your-instance.com",
...     auth_token="Bearer your-token"
... )
>>> provider = DQAssetsProvider(config)
>>> assets = provider.get_assets(project_id="project-123")

Initialize the DQAssetsProvider with configuration.

Parameters:

config (ProviderConfig) – Provider configuration with URL and auth token

get_assets(project_id: str | None = None, catalog_id: str | None = None, start: str | None = None, limit: int | None = None, include_children: bool | None = None, asset_type: str | None = None)#

Get data quality assets.

This method retrieves data quality assets based on the provided filters.

Parameters:
  • project_id (str | None, default: None) – The project ID to use

  • catalog_id (str | None, default: None) – The catalog ID to use

  • start (str | None, default: None) – The start token for pagination

  • limit (int | None, default: None) – Maximum number of resources to return

  • include_children (bool | None, default: None) – If true, include children in the response

  • asset_type (str | None, default: None) – The type of resource to search

Returns:

The response from the API containing the assets data

Return type:

dict

Raises:

ValueError – If the API request fails or returns an error status, or if neither project_id nor catalog_id is provided, or if both are provided

Example

>>> # Using project_id
>>> provider.get_assets(
...     project_id="project-123",
...     limit=10,
...     asset_type="column"
... )
>>> # Using catalog_id
>>> provider.get_assets(
...     catalog_id="catalog-123",
...     include_children=True
... )
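Most provider methods in this module take either a project_id or a catalog_id and raise ValueError when neither or both are given. A caller that wants to fail early, before any request is made, could apply the same rule with a small helper. The sketch below is an assumption about how such a check might look; `resolve_scope` is a hypothetical name and is not part of the library.

```python
from typing import Optional, Tuple


def resolve_scope(project_id: Optional[str] = None,
                  catalog_id: Optional[str] = None) -> Tuple[str, str]:
    """Return ("project_id", value) or ("catalog_id", value); hypothetical helper."""
    if project_id and catalog_id:
        raise ValueError("Provide either project_id or catalog_id, not both")
    if project_id:
        return ("project_id", project_id)
    if catalog_id:
        return ("catalog_id", catalog_id)
    raise ValueError("Either project_id or catalog_id is required")


# Exactly one scope is accepted.
scope = resolve_scope(project_id="project-123")

# Supplying both (or neither) is rejected with ValueError.
try:
    resolve_scope(project_id="project-123", catalog_id="catalog-123")
    both_rejected = False
except ValueError:
    both_rejected = True
```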

Dimensions Provider#

class wxdi.dq_validator.provider.dimensions.DimensionsProvider(config: ProviderConfig)#

Bases: BaseProvider

Provider for managing data quality dimensions.

This provider allows interaction with the data quality dimensions API, including retrieving dimension information.

Parameters:

config (ProviderConfig) – Configuration containing URL and authentication token

Example

>>> from wxdi.dq_validator.provider import ProviderConfig, DimensionsProvider
>>> config = ProviderConfig(
...     url="https://your-instance.com",
...     auth_token="Bearer your-token"
... )
>>> provider = DimensionsProvider(config)
>>> dimension_id = provider.search_dimension("Completeness")

Initialize the DimensionsProvider with configuration.

Parameters:

config (ProviderConfig) – Provider configuration with URL and auth token

search_dimension(name: str)#

Search for a data quality dimension ID by name.

This method searches for dimension information by name and returns the dimension ID. The name matching is case-insensitive.

Parameters:

name (str) – The name of the dimension (e.g., “Completeness”, “Accuracy”)

Returns:

The dimension ID

Return type:

str

Raises:

ValueError – If the API request fails, dimension not found, or response is invalid

Example

>>> provider = DimensionsProvider(config)
>>> provider.search_dimension("Completeness")
'371114cd-5516-4691-8b2e-1e66edf66486'
>>> # Case-insensitive matching
>>> provider.search_dimension("completeness")
'371114cd-5516-4691-8b2e-1e66edf66486'
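The case-insensitive name matching can be illustrated with a small lookup over a list of dimension records. This is a sketch under assumptions: the record shape (`id` and `name` keys), the `find_dimension_id` helper, and the second sample entry are hypothetical and do not describe the actual API response.

```python
from typing import Dict, List


def find_dimension_id(dimensions: List[Dict[str, str]], name: str) -> str:
    """Return the ID of the dimension whose name matches, ignoring case."""
    wanted = name.casefold()
    for dimension in dimensions:
        if dimension["name"].casefold() == wanted:
            return dimension["id"]
    # Mirrors the documented behavior of raising ValueError when nothing matches.
    raise ValueError(f"Dimension not found: {name!r}")


# Made-up sample data; only the first ID appears in the examples above.
sample_dimensions = [
    {"id": "371114cd-5516-4691-8b2e-1e66edf66486", "name": "Completeness"},
    {"id": "dimension-2", "name": "Accuracy"},
]

exact = find_dimension_id(sample_dimensions, "Completeness")
lowercase = find_dimension_id(sample_dimensions, "completeness")
```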

Checks Provider#

class wxdi.dq_validator.provider.checks.ChecksProvider(config: ProviderConfig)#

Bases: BaseProvider

Provider for managing data quality checks.

This provider allows interaction with the data quality checks API, including creating new checks for data assets.

Parameters:

config (ProviderConfig) – Configuration containing URL and authentication token

Example

>>> from wxdi.dq_validator.provider import ProviderConfig, ChecksProvider
>>> config = ProviderConfig(
...     url="https://your-instance.com",
...     auth_token="Bearer your-token",
...     project_id="project-123"
... )
>>> provider = ChecksProvider(config)
>>> provider.create_check(
...     name="check_uniqueness_of_id",
...     dimension_id="371114cd-5516-4691-8b2e-1e66edf66486",
...     native_id="4cdcd382-4e3a-4537-b7ae-09993acee4cf/3e51167c-6eb2-4069-96dc-5d6df808fd47",
...     project_id="project-123"
... )

Initialize the ChecksProvider with configuration.

Parameters:

config (ProviderConfig) – Provider configuration with URL and auth token

create_check(name: str, dimension_id: str, native_id: str, check_type: str | None = None, project_id: str | None = None, catalog_id: str | None = None, parent_check_id: str | None = None)#

Create a new check for a data asset.

Note: Table-level checks are created without a parent_check_id, while column-level checks require the table-level check ID as parent_check_id to establish the hierarchical relationship.

Parameters:
  • name (str) – Name of the check (e.g., “check_uniqueness_of_id”)

  • dimension_id (str) – The dimension ID for the check

  • native_id (str) – Native ID in format “<asset_id>/<check_id>”

  • check_type (str | None, default: None) – Type of check (optional, defaults to the check name if not provided)

  • project_id (str | None, default: None) – The project ID containing the check

  • catalog_id (str | None, default: None) – The catalog ID containing the check

  • parent_check_id (str | None, default: None) – The parent check ID. Required for column-level checks (use table-level check ID). Omit for table-level checks.

Returns:

The check ID from the created check

Return type:

str

Raises:

ValueError – If the API request fails or returns an error status, or if neither project_id nor catalog_id is provided, or if both are provided

Example

>>> provider.create_check(
...     name="check_uniqueness_of_id",
...     dimension_id="371114cd-5516-4691-8b2e-1e66edf66486",
...     native_id="4cdcd382-4e3a-4537-b7ae-09993acee4cf/3e51167c-6eb2-4069-96dc-5d6df808fd47",
...     project_id="project-123"
... )
'6be18374-573a-4cf8-8ab7-e428506e428b'
>>> # With parent parameter
>>> provider.create_check(
...     name="Format check",
...     dimension_id="ec453723-669c-48bb-82c1-11b69b3b8c93",
...     native_id="ba23145a-6d0a-46db-b314-41526b1e465f/format/sample3",
...     project_id="project-123",
...     parent_check_id="848aaddc-7401-4a43-ad2b-96a0946d4674"
... )
'7be18374-573a-4cf8-8ab7-e428506e428c'
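The ordering implied by the note on parent_check_id (create the table-level check first, then pass its ID to each column-level check) can be sketched as below. `StubChecksProvider` is a stand-in that only mirrors the documented create_check parameters and makes no HTTP calls; `create_check_hierarchy` and the native_id naming scheme are hypothetical.

```python
import uuid
from typing import Dict, List, Optional, Tuple


class StubChecksProvider:
    """Stand-in mirroring the documented create_check parameters; no HTTP calls."""

    def __init__(self) -> None:
        self.calls: List[dict] = []

    def create_check(self, name: str, dimension_id: str, native_id: str,
                     check_type: Optional[str] = None,
                     project_id: Optional[str] = None,
                     catalog_id: Optional[str] = None,
                     parent_check_id: Optional[str] = None) -> str:
        check_id = str(uuid.uuid4())  # the real provider returns the ID assigned by the API
        self.calls.append({"id": check_id, "native_id": native_id,
                           "parent_check_id": parent_check_id})
        return check_id


def create_check_hierarchy(provider, asset_id: str, check_type: str,
                           dimension_id: str, columns: List[str],
                           project_id: str) -> Tuple[str, Dict[str, str]]:
    # 1. Table-level check: created without parent_check_id.
    table_check_id = provider.create_check(
        name=f"{check_type} check",
        dimension_id=dimension_id,
        native_id=f"{asset_id}/{check_type}",  # hypothetical naming scheme
        check_type=check_type,
        project_id=project_id,
    )
    # 2. Column-level checks: each one references the table-level check as its parent.
    column_check_ids: Dict[str, str] = {}
    for column in columns:
        column_check_ids[column] = provider.create_check(
            name=f"{check_type} check on {column}",
            dimension_id=dimension_id,
            native_id=f"{asset_id}/{check_type}/{column}",  # hypothetical naming scheme
            check_type=check_type,
            project_id=project_id,
            parent_check_id=table_check_id,
        )
    return table_check_id, column_check_ids


stub = StubChecksProvider()
table_id, column_ids = create_check_hierarchy(
    stub,
    asset_id="ba23145a-6d0a-46db-b314-41526b1e465f",
    check_type="format",
    dimension_id="ec453723-669c-48bb-82c1-11b69b3b8c93",
    columns=["email", "phone"],
    project_id="project-123",
)
```

With a real ChecksProvider in place of the stub, the same function would issue one request per check.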
get_checks(dq_asset_id: str, check_type: str, project_id: str | None = None, catalog_id: str | None = None, include_children: bool = True)#

Get all checks for a specific asset filtered by check type.

Parameters:
  • dq_asset_id (str) – The data quality asset identifier (column asset ID)

  • check_type (str) – Type of check to filter by (e.g., “case”, “completeness”, “comparison”)

  • project_id (str | None, default: None) – The project ID containing the checks

  • catalog_id (str | None, default: None) – The catalog ID containing the checks

  • include_children (bool, default: True) – If true, include children in the returned resource

Returns:

List of check objects matching the criteria

Return type:

list

Raises:

ValueError – If the API request fails or returns an error status, or if neither project_id nor catalog_id is provided, or if both are provided

Example

>>> provider.get_checks(
...     dq_asset_id="column-asset-123",
...     check_type="case",
...     project_id="project-123"
... )
[{'id': 'check-id-1', 'type': 'case', ...}, ...]

Issues Provider#

class wxdi.dq_validator.provider.issues.IssuesProvider(config: ProviderConfig)#

Bases: BaseProvider

Provider for managing data quality issues.

This provider allows interaction with the data quality issues API, including updating issue occurrences and tested records.

Parameters:

config (ProviderConfig) – Configuration containing URL and authentication token

Example

>>> from wxdi.dq_validator.provider import ProviderConfig, IssuesProvider
>>> config = ProviderConfig(
...     url="https://your-instance.com",
...     auth_token="Bearer your-token"
... )
>>> provider = IssuesProvider(config)
>>> provider.update_issue_values("issue-123", occurrences=10, tested_records=100, project_id="project-123")

Initialize the IssuesProvider with configuration.

Parameters:

config (ProviderConfig) – Provider configuration with URL and auth token

update_issue_values(issue_id: str, occurrences: int, tested_records: int, project_id: str | None = None, catalog_id: str | None = None, operation: str = 'add')#

Update issue occurrences and tested records in a single PATCH call.

This method combines updates for both number_of_occurrences and number_of_tested_records in a single API call for better performance. Both occurrences and tested_records are mandatory.

Parameters:
  • issue_id (str) – The unique identifier of the issue to update

  • occurrences (int) – The number of occurrences to update/add (mandatory)

  • tested_records (int) – The number of tested records to update/add (mandatory)

  • project_id (str | None, default: None) – The project ID containing the issue

  • catalog_id (str | None, default: None) – The catalog ID containing the issue

  • operation (str, default: 'add') – Operation for both metrics - “add” or “replace” (default: “add”)

Returns:

The response from the API containing the updated issue data

Return type:

dict

Raises:

ValueError – If the API request fails or returns an error status, or if neither project_id nor catalog_id is provided, or if both are provided

Example

>>> # Update both metrics with add operation using project_id
>>> provider.update_issue_values("issue-123", occurrences=10, tested_records=100, project_id="project-456")
{'issue_id': 'issue-123', 'number_of_occurrences': 777, 'number_of_tested_records': 1100, ...}
>>> # Use catalog_id instead
>>> provider.update_issue_values("issue-123", occurrences=10, tested_records=100, catalog_id="catalog-789")
>>> # Use replace operation for both
>>> provider.update_issue_values(
...     "issue-123",
...     occurrences=100,
...     tested_records=1000,
...     project_id="project-456",
...     operation="replace"
... )
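The difference between the two operation values can be pictured with plain arithmetic. The sketch below assumes that "add" increments the stored counter and "replace" overwrites it, which is consistent with the sample output above but is not confirmed by the API documentation; `apply_operation` is a hypothetical helper.

```python
def apply_operation(current: int, value: int, operation: str = "add") -> int:
    """Return the counter value after applying the operation (assumed semantics)."""
    if operation == "add":
        return current + value      # increment the existing count
    if operation == "replace":
        return value                # discard the existing count
    raise ValueError(f"Unsupported operation: {operation!r}")


# Starting from 767 occurrences and 1000 tested records:
occurrences_after_add = apply_operation(767, 10, "add")
tested_after_add = apply_operation(1000, 100, "add")
occurrences_after_replace = apply_operation(767, 100, "replace")
```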
get_issue(reported_for_id: str, dq_check_id: str, project_id: str | None = None, catalog_id: str | None = None)#

Get the issue for a specific asset and check.

This method uses the REST API POST /data_quality/v4/search_dq_issue to search for an issue and returns the full response.

Parameters:
  • reported_for_id (str) – The DQ asset ID to search for

  • dq_check_id (str) – The check ID to search for

  • project_id (str | None, default: None) – The project ID containing the issue

  • catalog_id (str | None, default: None) – The catalog ID containing the issue

Returns:

The response from the API containing the issue data

Return type:

dict

Raises:

ValueError – If the API request fails or returns an error status, or if neither project_id nor catalog_id is provided, or if both are provided

Example

>>> # Using project_id
>>> provider.get_issue(
...     reported_for_id="1488a413-99f9-4bed-906d-c33b505d5728",
...     dq_check_id="ad277842-dea7-44ef-8e4b-d940df0f79aa",
...     project_id="24419069-d649-45cb-a2c1-64d6eed650d5"
... )
>>> # Using catalog_id
>>> provider.get_issue(
...     reported_for_id="1488a413-99f9-4bed-906d-c33b505d5728",
...     dq_check_id="ad277842-dea7-44ef-8e4b-d940df0f79aa",
...     catalog_id="catalog-123"
... )
{
    'id': 'b8f4252b-cd35-4668-9b35-4635bfc6e2e0',
    'project_id': '24419069-d649-45cb-a2c1-64d6eed650d5',
    ...
}
get_issue_id(reported_for_id: str, dq_check_id: str, project_id: str | None = None, catalog_id: str | None = None)#

Get the issue ID for a specific asset and check.

This method uses the REST API POST /data_quality/v4/search_dq_issue to search for an issue and returns just the issue ID.

Parameters:
  • reported_for_id (str) – The DQ asset ID to search for

  • dq_check_id (str) – The check ID to search for

  • project_id (str | None, default: None) – The project ID containing the issue

  • catalog_id (str | None, default: None) – The catalog ID containing the issue

Returns:

The issue ID

Return type:

str

Raises:

ValueError – If the API request fails or returns an error status, or if neither project_id nor catalog_id is provided, or if both are provided

Example

>>> # Using project_id
>>> provider.get_issue_id(
...     reported_for_id="1488a413-99f9-4bed-906d-c33b505d5728",
...     dq_check_id="ad277842-dea7-44ef-8e4b-d940df0f79aa",
...     project_id="24419069-d649-45cb-a2c1-64d6eed650d5"
... )
>>> # Using catalog_id
>>> provider.get_issue_id(
...     reported_for_id="1488a413-99f9-4bed-906d-c33b505d5728",
...     dq_check_id="ad277842-dea7-44ef-8e4b-d940df0f79aa",
...     catalog_id="catalog-123"
... )
'b8f4252b-cd35-4668-9b35-4635bfc6e2e0'
update_issue_metrics(occurrences: int, tested_records: int, column_name: str, check_type: str, project_id: str | None = None, catalog_id: str | None = None, asset_type: str = 'column', operation: str = 'add', asset_id: str | None = None, check_id: str | None = None, check_native_id: str | None = None)#

Update issue metrics using CAMS asset and check IDs or check native_id.

This is a convenience method that combines searching for the DQ asset, finding the issue by check ID, and updating its metrics in a single call.

Parameters:
  • occurrences (int) – The number of occurrences to update/add

  • tested_records (int) – The number of tested records to update/add

  • column_name (str) – The column name (required for column type assets)

  • check_type (str) – The type of check (e.g., “format”, “completeness”, “range”, “datatype”, “length”, “regex”, “valid_values”, “case”, “comparison”)

  • project_id (str | None, default: None) – The project ID containing the issue

  • catalog_id (str | None, default: None) – The catalog ID containing the issue

  • asset_type (str, default: 'column') – The type of asset (“column” or “table”). Default is “column”

  • operation (str, default: 'add') – Operation for both metrics - “add” or “replace”. Default is “add”

  • asset_id (str | None, default: None) – The CAMS data asset ID (required if check_native_id not provided)

  • check_id (str | None, default: None) – The CAMS check ID (required if check_native_id not provided)

  • check_native_id (str | None, default: None) – The check native_id (required if asset_id and check_id not provided). Format: “<asset_id>/<check_id>” where check_id can contain slashes

Returns:

The response from the API containing the updated issue data

Return type:

dict

Raises:

ValueError – If the API request fails or returns an error status, or if neither project_id nor catalog_id is provided, or if both are provided, or if neither (asset_id + check_id) nor check_native_id is provided

Example

>>> # Using asset_id and check_id with project_id
>>> provider.update_issue_metrics(
...     asset_id="b2debda2-6ab9-4a39-8c23-17954e004dcf",
...     check_id="7377e2cd-ac0e-4833-8760-fd0e8cb682aa",
...     occurrences=10,
...     tested_records=100,
...     column_name="RTN",
...     check_type="format",
...     project_id="24419069-d649-45cb-a2c1-64d6eed650d5",
...     asset_type="column"
... )
>>> # Using check_native_id with project_id
>>> provider.update_issue_metrics(
...     check_native_id="b2debda2-6ab9-4a39-8c23-17954e004dcf/rtn/format",
...     occurrences=10,
...     tested_records=100,
...     column_name="RTN",
...     check_type="format",
...     project_id="24419069-d649-45cb-a2c1-64d6eed650d5"
... )
>>> # Using check_native_id with catalog_id
>>> provider.update_issue_metrics(
...     check_native_id="b2debda2-6ab9-4a39-8c23-17954e004dcf/rtn/format",
...     occurrences=10,
...     tested_records=100,
...     column_name="RTN",
...     check_type="format",
...     catalog_id="catalog-789"
... )
{'issue_id': 'b8f4252b-cd35-4668-9b35-4635bfc6e2e0', 'number_of_occurrences': 10, ...}
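Because the check_id part of a native_id may itself contain slashes, a native_id in the format "<asset_id>/<check_id>" has to be split on the first slash only. A minimal sketch of that parsing follows; `split_check_native_id` is a hypothetical helper and not part of the library.

```python
from typing import Tuple


def split_check_native_id(native_id: str) -> Tuple[str, str]:
    """Split "<asset_id>/<check_id>" on the first slash only."""
    asset_id, separator, check_id = native_id.partition("/")
    if not separator or not asset_id or not check_id:
        raise ValueError(f"Expected '<asset_id>/<check_id>', got {native_id!r}")
    return asset_id, check_id


# The check_id keeps its inner slash.
asset_id, check_id = split_check_native_id(
    "b2debda2-6ab9-4a39-8c23-17954e004dcf/rtn/format"
)
```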
get_issues(dq_asset_id: str, check_type: str, check_id: str, project_id: str | None = None, catalog_id: str | None = None, limit: int | None = 200, latest_only: bool | None = True, include_children: bool | None = False, sort_by: str | None = 'check_name', sort_direction: str | None = 'asc')#

Get issues for a specific DQ asset and check type, filtered by check_id.

This method uses the REST API GET /data_quality/v4/issues to retrieve issues based on the DQ asset ID and check type, then filters to return only the issue whose check.native_id contains the specified check_id.

Parameters:
  • dq_asset_id (str) – The DQ asset ID (reported_for.id)

  • check_type (str) – The type of check (e.g., “completeness”, “format”, “range”, “datatype”, “length”, “regex”, “valid_values”, “case”, “comparison”)

  • check_id (str) – The check ID to filter by. Returns only the issue whose check.native_id contains this check_id

  • project_id (str | None, default: None) – The project ID containing the issues

  • catalog_id (str | None, default: None) – The catalog ID containing the issues

  • limit (int | None, default: 200) – Maximum number of issues to return. Default is 200

  • latest_only (bool | None, default: True) – Return only the latest issues. Default is True

  • include_children (bool | None, default: False) – Include child issues. Default is False

  • sort_by (str | None, default: 'check_name') – Field to sort by. Default is “check_name”

  • sort_direction (str | None, default: 'asc') – Sort direction (“asc” or “desc”). Default is “asc”

Returns:

The specific issue item that matches the check_id, or None if no match found

Return type:

dict | None

Raises:

ValueError – If the API request fails or returns an error status, or if neither project_id nor catalog_id is provided, or if both are provided

Example

>>> # Filter by check_id with catalog_id
>>> provider.get_issues(
...     dq_asset_id="08b139ca-35a6-4b61-b87b-aa832870d89c",
...     check_type="format",
...     check_id="45a0e78c-ee2c-40bc-956f-743251cef2a6",
...     catalog_id="07708fd8-8d77-4a07-a01b-0132130bce0e"
... )
{
    'id': '656a80e0-64b4-418e-bdf4-de86450d2e76',
    'check': {
        'id': '6ff20abc-b41e-4c2e-8bc8-2ad94e3fd562',
        'native_id': 'b7a254e8-a88d-44e2-920d-5b237a1085dd/45a0e78c-ee2c-40bc-956f-743251cef2a6',
        ...
    },
    ...
}
>>> # Using project_id
>>> provider.get_issues(
...     dq_asset_id="08b139ca-35a6-4b61-b87b-aa832870d89c",
...     check_type="format",
...     check_id="b8f3616c-dac2-40bb-a4d3-59aba475ebee",
...     project_id="24419069-d649-45cb-a2c1-64d6eed650d5"
... )
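The filtering step described for get_issues (keep only the issue whose check.native_id contains the given check_id, otherwise return None) can be sketched as follows. The list-of-dicts input and the `find_issue_for_check` name are assumptions made for illustration; the sample records reuse IDs from the example above.

```python
from typing import Any, Dict, List, Optional


def find_issue_for_check(issues: List[Dict[str, Any]],
                         check_id: str) -> Optional[Dict[str, Any]]:
    """Return the first issue whose check.native_id contains check_id, else None."""
    for issue in issues:
        native_id = issue.get("check", {}).get("native_id", "")
        if check_id in native_id:
            return issue
    return None


sample_issues = [
    {
        "id": "issue-other",  # made-up record that should not match
        "check": {"native_id": "b7a254e8-a88d-44e2-920d-5b237a1085dd/some-other-check"},
    },
    {
        "id": "656a80e0-64b4-418e-bdf4-de86450d2e76",
        "check": {
            "native_id": "b7a254e8-a88d-44e2-920d-5b237a1085dd/45a0e78c-ee2c-40bc-956f-743251cef2a6"
        },
    },
]

match = find_issue_for_check(sample_issues, "45a0e78c-ee2c-40bc-956f-743251cef2a6")
missing = find_issue_for_check(sample_issues, "not-a-real-check-id")
```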
create_issue(dq_check_id: str, reported_for_id: str, number_of_occurrences: int, number_of_tested_records: int, status: str = 'actual', ignored: bool = False, project_id: str | None = None, catalog_id: str | None = None)#

Create a new data quality issue.

This method creates a new issue for a specific check and data asset.

Parameters:
  • dq_check_id (str) – The ID of the check for which to create the issue

  • reported_for_id (str) – The ID of the data asset being reported on

  • number_of_occurrences (int) – Number of issue occurrences

  • number_of_tested_records (int) – Total number of records tested

  • status (str, default: 'actual') – Status of the issue (default: “actual”)

  • ignored (bool, default: False) – Whether the issue is ignored (default: False)

  • project_id (str | None, default: None) – The project ID containing the issue

  • catalog_id (str | None, default: None) – The catalog ID containing the issue

Returns:

The created issue ID

Return type:

str

Raises:

ValueError – If the API request fails or returns an error status, or if neither project_id nor catalog_id is provided, or if both are provided

Example

>>> provider.create_issue(
...     dq_check_id="6be18374-573a-4cf8-8ab7-e428506e428b",
...     reported_for_id="894d01fd-bdfc-4a4f-b68b-62751e06e06a",
...     number_of_occurrences=123,
...     number_of_tested_records=456789,
...     project_id="project-123"
... )
'046605b5-48d9-489e-b846-8ef96a7a1aba'
create_issues_bulk(payload: dict, project_id: str | None = None, catalog_id: str | None = None, incremental_reporting: bool = False, refresh_assets: bool = False)#

Create multiple data quality issues in bulk.

This method creates multiple issues, assets, and checks in a single API call for better performance when reporting multiple related issues.

Parameters:
  • payload (dict) –

    The bulk payload containing issues, assets, and existing_checks arrays. Expected structure is

    >>> {
    ...     "issues": [
    ...         {
    ...             "check": {"native_id": str, "type": str},
    ...             "reported_for": {"native_id": str, "type": str},
    ...             "number_of_occurrences": int,
    ...             "number_of_tested_records": int,
    ...             "status": str,
    ...             "ignored": bool
    ...         },
    ...         ...
    ...     ],
    ...     "assets": [
    ...         {
    ...             "name": str,
    ...             "type": str,
    ...             "native_id": str,
    ...             "weight": int,
    ...             "parent": {"native_id": str, "type": str}  # optional
    ...         },
    ...         ...
    ...     ],
    ...     "existing_checks": [
    ...         {"native_id": str, "type": str},
    ...         ...
    ...     ]
    ... }
    

  • project_id (str | None, default: None) – The project ID containing the issues

  • catalog_id (str | None, default: None) – The catalog ID containing the issues

  • incremental_reporting (bool, default: False) – If true, adds archived issue counts to new issues instead of replacing them. Default is False.

  • refresh_assets (bool, default: False) – If true, assets will be refreshed and any assets not present in the updated list will be deleted. Default is False.

Returns:

The response from the API containing the created issues data

Return type:

dict

Raises:

ValueError – If the API request fails or returns an error status, or if neither project_id nor catalog_id is provided, or if both are provided

Example

>>> payload = {
...     "issues": [
...         {
...             "check": {
...                 "native_id": "ba23145a-6d0a-46db-b314-41526b1e465f/format/Validity",
...                 "type": "format"
...             },
...             "reported_for": {
...                 "native_id": "ba23145a-6d0a-46db-b314-41526b1e465f",
...                 "type": "data_asset"
...             },
...             "number_of_occurrences": 200,
...             "number_of_tested_records": 1000,
...             "status": "aggregation",
...             "ignored": False
...         }
...     ],
...     "assets": [
...         {
...             "name": "ACCOUNT_HOLDERS.csv",
...             "type": "data_asset",
...             "native_id": "ba23145a-6d0a-46db-b314-41526b1e465f",
...             "weight": 1
...         }
...     ],
...     "existing_checks": [
...         {
...             "native_id": "ba23145a-6d0a-46db-b314-41526b1e465f/format/Validity",
...             "type": "format"
...         }
...     ]
... }
>>> provider.create_issues_bulk(
...     payload=payload,
...     project_id="project-123",
...     incremental_reporting=True
... )
{'issues': [...], 'assets': [...], ...}
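When several check results for one asset need to be reported together, the payload can be assembled programmatically. The sketch below builds the documented structure (issues, assets, existing_checks) from a list of per-check results. `build_bulk_payload` and the keys of the input records (`check_native_id`, `check_type`, `occurrences`, `tested_records`) are hypothetical names chosen for this illustration.

```python
from typing import Any, Dict, List


def build_bulk_payload(asset_native_id: str, asset_name: str,
                       results: List[Dict[str, Any]],
                       status: str,
                       asset_type: str = "data_asset") -> Dict[str, Any]:
    """Assemble a payload with the issues, assets, and existing_checks arrays."""
    issues = []
    existing_checks = []
    for result in results:
        check_ref = {"native_id": result["check_native_id"],
                     "type": result["check_type"]}
        issues.append({
            "check": check_ref,
            "reported_for": {"native_id": asset_native_id, "type": asset_type},
            "number_of_occurrences": result["occurrences"],
            "number_of_tested_records": result["tested_records"],
            "status": status,
            "ignored": False,
        })
        # Copy so the two arrays do not share the same dict object.
        existing_checks.append(dict(check_ref))
    assets = [{"name": asset_name, "type": asset_type,
               "native_id": asset_native_id, "weight": 1}]
    return {"issues": issues, "assets": assets, "existing_checks": existing_checks}


bulk_payload = build_bulk_payload(
    asset_native_id="ba23145a-6d0a-46db-b314-41526b1e465f",
    asset_name="ACCOUNT_HOLDERS.csv",
    results=[{
        "check_native_id": "ba23145a-6d0a-46db-b314-41526b1e465f/format/Validity",
        "check_type": "format",
        "occurrences": 200,
        "tested_records": 1000,
    }],
    status="aggregation",
)
```

The resulting dictionary has the same shape as the payload in the example above and could be passed as the payload argument of create_issues_bulk.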

DQ Search Provider#

class wxdi.dq_validator.provider.dq_search.DQSearchProvider(config: ProviderConfig)#

Bases: BaseProvider

Provider for searching data quality checks and assets.

This provider allows searching for DQ checks and assets using their native IDs.

Parameters:

config (ProviderConfig) – Configuration containing URL and authentication token

Example

>>> from wxdi.dq_validator.provider import ProviderConfig, DQSearchProvider
>>> config = ProviderConfig(
...     url="https://your-instance.com",
...     auth_token="Bearer your-token"
... )
>>> provider = DQSearchProvider(config)
>>> check = provider.search_dq_check(
...     native_id="b2debda2-6ab9-4a39-8c23-17954e004dcf/7377e2cd-ac0e-4833-8760-fd0e8cb682aa",
...     check_type="format",
...     project_id="24419069-d649-45cb-a2c1-64d6eed650d5"
... )

Initialize the DQSearchProvider with configuration.

Parameters:

config (ProviderConfig) – Provider configuration with URL and auth token

search_dq_check(native_id: str, check_type: str, project_id: str | None = None, catalog_id: str | None = None, include_children: bool = True)#

Search for a DQ check by native ID and type.

This method searches for data quality checks using the native ID (format: <cams_data_asset_id>/<check_id>) and the check type.

Parameters:
  • native_id (str) – The native ID of the check in format <cams_data_asset_id>/<check_id>

  • check_type (str) – The type of check (e.g., “format”, “completeness”, “range”, etc.)

  • project_id (str | None, default: None) – The project ID containing the check

  • catalog_id (str | None, default: None) – The catalog ID containing the check

  • include_children (bool, default: True) – Include child checks. Default is True

Returns:

The response from the API containing the check data

Return type:

dict

Raises:

ValueError – If the API request fails or returns an error status, or if neither project_id nor catalog_id is provided, or if both are provided

Example

>>> # Using project_id
>>> provider.search_dq_check(
...     native_id="b2debda2-6ab9-4a39-8c23-17954e004dcf/7377e2cd-ac0e-4833-8760-fd0e8cb682aa",
...     check_type="format",
...     project_id="24419069-d649-45cb-a2c1-64d6eed650d5"
... )
>>> # Using catalog_id
>>> provider.search_dq_check(
...     native_id="b2debda2-6ab9-4a39-8c23-17954e004dcf/7377e2cd-ac0e-4833-8760-fd0e8cb682aa",
...     check_type="format",
...     catalog_id="catalog-123"
... )
{
    'id': 'ad277842-dea7-44ef-8e4b-d940df0f79aa',
    'name': 'Format check',
    'type': 'format',
    ...
}
search_dq_asset(native_id: str, project_id: str | None = None, catalog_id: str | None = None, asset_type: str = 'column', include_children: bool = True, get_actual_asset: bool = False)#

Search for a DQ asset by native ID.

This method searches for data quality assets using the native ID (format: <cams_data_asset_id>/<column_name>).

Parameters:
  • native_id (str) – The native ID of the asset in format <cams_data_asset_id>/<column_name>

  • project_id (str | None, default: None) – The project ID containing the asset

  • catalog_id (str | None, default: None) – The catalog ID containing the asset

  • asset_type (str, default: 'column') – The type of asset. Default is “column”

  • include_children (bool, default: True) – Include child assets. Default is True

  • get_actual_asset (bool, default: False) – Get the actual asset details. Default is False

Returns:

The response from the API containing the asset data

Return type:

dict

Raises:

ValueError – If the API request fails or returns an error status, or if neither project_id nor catalog_id is provided, or if both are provided

Example

>>> # Using project_id
>>> provider.search_dq_asset(
...     native_id="b2debda2-6ab9-4a39-8c23-17954e004dcf/RTN",
...     project_id="24419069-d649-45cb-a2c1-64d6eed650d5",
...     asset_type="column"
... )
>>> # Using catalog_id
>>> provider.search_dq_asset(
...     native_id="b2debda2-6ab9-4a39-8c23-17954e004dcf/RTN",
...     catalog_id="catalog-123",
...     asset_type="column"
... )
{
    'id': '1488a413-99f9-4bed-906d-c33b505d5728',
    'name': 'RTN',
    'type': 'column',
    ...
}

Data Models#

DataAsset#

Pydantic models for Data Asset API responses

class wxdi.dq_validator.provider.data_asset_model.AssetMetadata(*args: Any, **kwargs: Any)#

Bases: BaseModel

Metadata for data asset

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

project_id: str#
name: str#
catalog_id: str#
tags: List[str] | None = []#
asset_type: str | None = None#
created: int | None = None#
created_at: str | None = None#
owner_id: str | None = None#
size: int | None = None#
version: int | None = None#
asset_state: str | None = None#
asset_attributes: List[str] | None = []#
asset_id: str | None = None#
asset_category: str | None = None#
creator_id: str | None = None#
class wxdi.dq_validator.provider.data_asset_model.ColumnType(*args: Any, **kwargs: Any)#

Bases: BaseModel

Column type information

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

type: str#
scale: int | None = None#
length: int | None = None#
signed: bool | None = None#
nullable: bool | None = True#
native_type: str | None = None#
class wxdi.dq_validator.provider.data_asset_model.Column(*args: Any, **kwargs: Any)#

Bases: BaseModel

Column definition in data asset

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

name: str#
type: ColumnType#
class wxdi.dq_validator.provider.data_asset_model.Property(*args: Any, **kwargs: Any)#

Bases: BaseModel

Property key-value pair

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

name: str#
value: str#
class wxdi.dq_validator.provider.data_asset_model.DataAssetInfo(*args: Any, **kwargs: Any)#

Bases: BaseModel

Data asset information

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

columns: List[Column]#
dataset: bool#
mime_type: str | None = None#
properties: List[Property] | None = []#
class wxdi.dq_validator.provider.data_asset_model.DataClass(*args: Any, **kwargs: Any)#

Bases: BaseModel

Data class information

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

id: str#
name: str | None = None#
setByUser: bool | None = False#
confidence: float | None = None#
class wxdi.dq_validator.provider.data_asset_model.DataClassInfo(*args: Any, **kwargs: Any)#

Bases: BaseModel

Data class information for a column

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

suggested_classes: List[DataClass] | None = None#
selected_data_class: DataClass | None = None#
class wxdi.dq_validator.provider.data_asset_model.InferredType(*args: Any, **kwargs: Any)#

Bases: BaseModel

Inferred type information

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

type: str#
scale: int | None = None#
length: int | None = None#
precision: int | None = None#
display_name: str | None = None#
class wxdi.dq_validator.provider.data_asset_model.ColumnInfo(*args: Any, **kwargs: Any)#

Bases: BaseModel

Column information including checks and data class

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

data_class: DataClassInfo | None = None#
column_checks: List[DataQualityConstraint] | None = []#
inferred_type: InferredType | None = None#
rejected_checks: List[Any] | None = []#
suggested_checks: List[Any] | None = []#
class wxdi.dq_validator.provider.data_asset_model.RecordInfo(*args: Any, **kwargs: Any)#

Bases: BaseModel

Record information

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

computed: bool#
approximated: bool#
number_of_records: int#
class wxdi.dq_validator.provider.data_asset_model.ExtendedMetadata(*args: Any, **kwargs: Any)#

Bases: BaseModel

Extended metadata item

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

name: str#
value: str#
class wxdi.dq_validator.provider.data_asset_model.AssetDataQualityConstraint(*args: Any, **kwargs: Any)#

Bases: BaseModel

Asset-level data quality constraints

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

asset_checks: List[Any] | None = []#
rejected_checks: List[Any] | None = []#
suggested_checks: List[Any] | None = []#
class wxdi.dq_validator.provider.data_asset_model.DataAssetEntity(*args: Any, **kwargs: Any)#

Bases: BaseModel

Entity containing all data asset information

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

data_asset: DataAssetInfo#
column_info: Dict[str, ColumnInfo] | None = None#
asset_data_quality_constraint: AssetDataQualityConstraint | None = None#
class wxdi.dq_validator.provider.data_asset_model.DataAsset(*args: Any, **kwargs: Any)#

Bases: BaseModel

Root model for data asset response

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

metadata: AssetMetadata#
entity: DataAssetEntity#
classmethod from_dict(data: Dict)#

Create DataAsset instance from dictionary

Return type:

DataAsset

Parameters:

data (Dict)
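
The field listings above imply a nested payload for DataAsset.from_dict. The following is a minimal sketch of that shape as a plain dictionary, assuming the dictionary keys match the documented field names one-to-one (the source does not state whether aliases are used). All values are invented for illustration, and column_names is a hypothetical helper, not part of the library.

```python
from typing import Any, Dict, List

# Illustrative payload only: keys mirror the documented field names,
# values are invented for the example.
payload: Dict[str, Any] = {
    "metadata": {
        "project_id": "proj-123",
        "name": "customers",
        "catalog_id": "cat-456",
        "tags": ["pii"],
    },
    "entity": {
        "data_asset": {
            "dataset": True,
            "columns": [
                {"name": "id", "type": {"type": "integer", "nullable": False}},
                {"name": "email", "type": {"type": "varchar", "length": 255}},
            ],
        },
        # column_info is keyed by column name (Dict[str, ColumnInfo]).
        "column_info": {
            "email": {"column_checks": [], "suggested_checks": []},
        },
    },
}


def column_names(data: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: list column names from a DataAsset-shaped dict."""
    columns = data["entity"]["data_asset"]["columns"]
    return [column["name"] for column in columns]


print(column_names(payload))
```

With the package installed, a dictionary of this shape would presumably be passed to DataAsset.from_dict; that call is left out so the sketch runs without the library.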

Constraint Models#

Pydantic models for Data Quality Constraints

class wxdi.dq_validator.provider.constraint_model.CheckType(*values)#

Bases: StrEnum

Enumeration of data quality check types

UNIQUENESS = 'uniqueness'#
COMPLETENESS = 'completeness'#
COMPARISON = 'comparison'#
DATA_CLASS = 'data_class'#
DATA_TYPE = 'data_type'#
FORMAT = 'format'#
RANGE = 'range'#
POSSIBLE_VALUES = 'possible_values'#
REGEX = 'regex'#
LENGTH = 'length'#
RULE = 'rule'#
CASE = 'case'#
NONSTANDARD_MISSING_VALUES = 'nonstandard_missing_values'#
SUSPECT_VALUES = 'suspect_values'#
REFERENTIAL_INTEGRITY = 'referential_integrity'#
HISTORY_STABILITY = 'history_stability'#
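
Because CheckType derives from StrEnum, its members can be looked up by their string value and compare equal to plain strings. The sketch below illustrates this with a local stand-in that carries only three of the documented members and is built on str and Enum so that it also runs on Python versions before 3.11. parse_check_type is a hypothetical helper; the real enum lives in wxdi.dq_validator.provider.constraint_model.

```python
from enum import Enum
from typing import Optional


class CheckType(str, Enum):
    """Local stand-in for the documented enum (subset of members)."""

    UNIQUENESS = "uniqueness"
    RANGE = "range"
    POSSIBLE_VALUES = "possible_values"


def parse_check_type(raw: str) -> Optional[CheckType]:
    """Hypothetical helper: return the matching member, or None if unknown."""
    try:
        # Calling the enum with a value performs a lookup by value.
        return CheckType(raw)
    except ValueError:
        return None


print(parse_check_type("range"))
print(parse_check_type("not_a_check"))
```

Returning None for unknown values is a design choice of this sketch; raising the ValueError may be preferable when an unrecognised check type indicates a malformed response.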
class wxdi.dq_validator.provider.constraint_model.CheckConstraint(*args: Any, **kwargs: Any)#

Bases: BaseModel

Data quality check constraint

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

name: str#
value: str | None = None#
numeric_value: int | None = None#
boolean_value: bool | None = None#
list_value: list | None = None#
timestamp_value: datetime | None = None#
date_value: datetime | None = None#
time_value: datetime | None = None#
classmethod from_dict(data: Dict)#

Create CheckConstraint instance from dictionary

Return type:

CheckConstraint

Parameters:

data (Dict)

get_constraint_value()#

Get the actual constraint value from the appropriate field

Return type:

Any
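
The source does not say how get_constraint_value chooses the "appropriate field". One plausible reading is that it returns the first typed field that has been set. The sketch below shows that reading with a stand-in dataclass; the class name CheckConstraintSketch and the precedence order (the documented field order) are assumptions, not the library's confirmed behaviour.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass
class CheckConstraintSketch:
    """Stand-in with the documented fields; not the library class."""

    name: str
    value: Optional[str] = None
    numeric_value: Optional[int] = None
    boolean_value: Optional[bool] = None
    list_value: Optional[list] = None
    timestamp_value: Optional[datetime] = None
    date_value: Optional[datetime] = None
    time_value: Optional[datetime] = None

    def get_constraint_value(self) -> Any:
        # Assumed precedence: documented field order, first non-None wins.
        for field_name in (
            "value", "numeric_value", "boolean_value", "list_value",
            "timestamp_value", "date_value", "time_value",
        ):
            candidate = getattr(self, field_name)
            # Compare against None so that 0 and False count as set.
            if candidate is not None:
                return candidate
        return None


constraint = CheckConstraintSketch(name="min", numeric_value=0)
print(constraint.get_constraint_value())
```

The explicit None comparison matters here: a truthiness test would silently skip legitimate values such as 0 or False.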

class wxdi.dq_validator.provider.constraint_model.CheckDefinition#

Bases: object

Definition of a data quality check with all possible parameters

range_type: str | None = None#
min: int | None = None#
max: int | None = None#
case_type: str | None = None#
missing_values_allowed: bool | None = None#
data_type: str | None = None#
precision: int | None = None#
scale: int | None = None#
length: int | None = None#
formats: List[str] | None = None#
unique: bool | None = None#
expression: str | None = None#
values: List[str] | None = None#
data_class: str | None = None#
allowed_values: List[str] | None = None#
parent_asset_id: str | None = None#
parent_column_name: str | None = None#
metric: List[str] | None = None#
class wxdi.dq_validator.provider.constraint_model.ConstraintMetadata(*args: Any, **kwargs: Any)#

Bases: BaseModel

Metadata for data quality constraints

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

type: CheckType#
check_id: str | None = None#
confirmed: bool | None = None#
hidden: bool | None = None#
dimension: str | None = None#
created_at: str | None = None#
description: str | None = None#
modified_at: str | None = None#
origin_type: str | None = None#
classmethod from_dict(data: Dict)#

Create ConstraintMetadata instance from dictionary

Return type:

ConstraintMetadata

Parameters:

data (Dict)

class wxdi.dq_validator.provider.constraint_model.DataQualityConstraint(*args: Any, **kwargs: Any)#

Bases: BaseModel

Data quality constraint model

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

metadata: ConstraintMetadata#
origin: List[dict]#
check: List[CheckConstraint]#
classmethod from_dict(data: Dict)#

Create DataQualityConstraint instance from dictionary

Return type:

DataQualityConstraint

Parameters:

data (Dict)

map_checks()#

Map check constraints to a CheckDefinition object

Return type:

CheckDefinition

to_check()#

Convert the constraint to a BaseCheck instance

Return type:

BaseCheck | None

Returns:

BaseCheck instance or None if the check type is not supported
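
map_checks is described only as mapping check constraints to a CheckDefinition. A minimal sketch of one way such a mapping could work follows, assuming each constraint's name matches a CheckDefinition attribute and that unknown names are skipped. CheckDefinitionSketch and map_checks_sketch are hypothetical stand-ins, and the input uses plain dictionaries instead of CheckConstraint instances.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class CheckDefinitionSketch:
    """Stand-in carrying a few of the documented CheckDefinition fields."""

    range_type: Optional[str] = None
    min: Optional[int] = None
    max: Optional[int] = None
    unique: Optional[bool] = None


def map_checks_sketch(checks: List[Dict[str, Any]]) -> CheckDefinitionSketch:
    """Copy each check's value onto the attribute named by the check."""
    known = {f.name for f in fields(CheckDefinitionSketch)}
    definition = CheckDefinitionSketch()
    for check in checks:
        # Assumption: names without a matching attribute are ignored.
        if check["name"] in known:
            setattr(definition, check["name"], check["value"])
    return definition


definition = map_checks_sketch(
    [
        {"name": "min", "value": 0},
        {"name": "max", "value": 120},
        {"name": "note", "value": "ignored"},
    ]
)
print(definition)
```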

Response Models#

class wxdi.dq_validator.provider.response_model.Metadata(*args: Any, **kwargs: Any)#

Bases: BaseModel

Base metadata model for artifacts

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

to_iso(dt: datetime)#
Parameters:

dt (datetime)

artifact_type: str#
artifact_id: str#
version_id: str#
source_repository_id: str#
global_id: str#
workflow_id: str | None = None#
draft_mode: str | None = None#
is_target_draft: bool | None = None#
effective_start_date: datetime | None = None#
created_by: str#
created_at: datetime#
modified_by: str#
modified_at: datetime#
revision: str#
state: str#
read_only: bool | None = None#
draft_ancestor_id: str | None = None#
name: str | None = None#
tags: List[str] | None = None#
steward_ids: List[str] | None = None#
steward_group_ids: List[str] | None = None#
workflow_state: str | None = None#
user_access: bool | None = None#
classmethod from_dict(data: Dict)#

Create Metadata instance from dictionary

Return type:

Metadata

Parameters:

data (Dict)

class wxdi.dq_validator.provider.response_model.ExtendedAttributeGroups(*args: Any, **kwargs: Any)#

Bases: BaseModel

Extended attribute groups containing data quality constraints

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

dq_constraints: List[DataQualityConstraint]#
classmethod from_dict(data: Dict)#

Create ExtendedAttributeGroups instance from dictionary

Return type:

ExtendedAttributeGroups

Parameters:

data (Dict)

class wxdi.dq_validator.provider.response_model.GlossaryTermEntity(*args: Any, **kwargs: Any)#

Bases: BaseModel

Main entity model for glossary term

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

extended_attribute_groups: ExtendedAttributeGroups | None = None#
classmethod from_dict(data: Dict)#

Create GlossaryTermEntity instance from dictionary

Return type:

GlossaryTermEntity

Parameters:

data (Dict)

class wxdi.dq_validator.provider.response_model.GlossaryTerm(*args: Any, **kwargs: Any)#

Bases: BaseModel

Root model for glossary term artifact

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

metadata: Metadata#
entity: GlossaryTermEntity#
classmethod from_dict(data: Dict)#

Create GlossaryTerm instance from dictionary

Return type:

GlossaryTerm

Parameters:

data (Dict)

classmethod from_json(json_str: str)#

Create GlossaryTerm instance from JSON string

Return type:

GlossaryTerm

Parameters:

json_str (str)

to_dict()#

Convert GlossaryTerm instance to dictionary

Return type:

Dict

to_json(indent: int | None = None)#

Convert GlossaryTerm instance to JSON string

Return type:

str

Parameters:

indent (int | None)
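
The four conversion methods above form a round trip: from_json and from_dict build an instance, while to_dict and to_json serialise it again. The sketch below demonstrates that round trip with a stand-in class, GlossaryTermSketch, which keeps metadata and entity as plain dictionaries instead of the nested Metadata and GlossaryTermEntity models; the sample JSON is invented and far smaller than a real glossary term response.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class GlossaryTermSketch:
    """Stand-in that keeps metadata and entity as plain dictionaries."""

    metadata: Dict[str, Any]
    entity: Dict[str, Any]

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "GlossaryTermSketch":
        return cls(metadata=data["metadata"], entity=data["entity"])

    @classmethod
    def from_json(cls, json_str: str) -> "GlossaryTermSketch":
        return cls.from_dict(json.loads(json_str))

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)

    def to_json(self, indent: Optional[int] = None) -> str:
        # indent=None yields compact output, as with json.dumps.
        return json.dumps(self.to_dict(), indent=indent)


raw = '{"metadata": {"artifact_id": "a1", "version_id": "v1"}, "entity": {}}'
term = GlossaryTermSketch.from_json(raw)
round_tripped = GlossaryTermSketch.from_json(term.to_json(indent=2))
print(round_tripped == term)
```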

wxdi.dq_validator.provider.response_model.example_usage()#

Demonstrates how to use the GlossaryTerm model classes

Usage Examples#

See REST API Integration for detailed usage examples.