Data Product Recommender Reference#

The data product recommender module provides query-log analysis and platform-specific parsers.

Core Classes#

Data Product Recommender - Core recommendation engine

wxdi.data_product_recommender.recommender.normalize_query_pattern(query_text: str)#

Normalize a SQL query into a pattern by replacing literals with placeholders.

This helps group similar queries together by removing variable parts such as:
  • Numbers (123 -> ?)

  • Quoted strings ('value' -> ?)

  • Date literals ('2024-01-01' -> ?)

Parameters:

query_text (str) – The SQL query text to normalize

Return type:

str

Returns:

Normalized query pattern with literals replaced by ?
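
The literal replacement described above can be illustrated with a short regex-based sketch. This is not the library's implementation: the function name normalize_pattern_sketch, the regular expressions, and the whitespace collapsing are all assumptions made for illustration.

```python
import re


def normalize_pattern_sketch(query_text: str) -> str:
    """Illustrative only: replace SQL literals with ? placeholders."""
    # Replace quoted strings first. This also covers date literals such as
    # '2024-01-01', so the digits inside quotes are never handled separately.
    pattern = re.sub(r"'(?:[^']|'')*'", "?", query_text)
    # Replace standalone integers and decimals. The word boundaries keep
    # identifiers such as t1 intact.
    pattern = re.sub(r"\b\d+(?:\.\d+)?\b", "?", pattern)
    # Collapse whitespace (an assumption) so formatting differences do not
    # split one logical pattern into several.
    return re.sub(r"\s+", " ", pattern).strip()


first = normalize_pattern_sketch(
    "SELECT * FROM orders WHERE id = 123 AND status = 'open'"
)
second = normalize_pattern_sketch(
    "SELECT *  FROM orders WHERE id = 7 AND status = 'closed'"
)
```

Both calls yield the same pattern, which is what allows queries that differ only in their literal values to be counted together.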

class wxdi.data_product_recommender.recommender.DataProductRecommender(parser: QueryLogParser)#

Bases: object

Analyzes query logs and recommends data products

Parameters:

parser (QueryLogParser)

ERROR_QUERY_LOGS_NOT_LOADED = 'Query logs not loaded'#

load_query_logs_from_json_file(file_path: str)#

Load and normalize query logs from a JSON file

Parameters:

file_path (str)

load_query_logs_from_csv_file(file_path: str)#

Load and normalize query logs from a CSV or JSON file

Parameters:

file_path (str)

calculate_metrics()#

Calculate metrics for each table

Return type:

DataFrame

score_tables(query_weight=0.375, user_weight=0.375, recency_weight=0.15, consistency_weight=0.1)#

Score tables based on weighted metrics

Parameters:
  • query_weight (default: 0.375) – Weight for query frequency

  • user_weight (default: 0.375) – Weight for user diversity

  • recency_weight (default: 0.15) – Weight for recent activity

  • consistency_weight (default: 0.1) – Weight for consistent usage over time

Returns:

DataFrame with scored tables sorted by recommendation_score

Note

Relationship metrics are not included, because standalone tables are packaged in isolation, without their related tables.
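
The weighting can be illustrated with a small sketch that uses the documented default weights, which sum to 1.0. It is not the library's implementation: it assumes each metric has already been normalized to the range 0 to 1 and that the result is scaled to 0-100 (suggested by the min_score range documented for recommend_data_products). The names weighted_score_sketch and DEFAULT_WEIGHTS, and the sample metric values, are hypothetical.

```python
# Documented default weights for score_tables.
DEFAULT_WEIGHTS = {
    "query": 0.375,
    "user": 0.375,
    "recency": 0.15,
    "consistency": 0.1,
}


def weighted_score_sketch(metrics: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Combine metrics normalized to [0, 1] into a 0-100 score (assumed scale)."""
    total = sum(weights[name] * metrics[name] for name in weights)
    return round(100 * total, 2)


# Hypothetical, already-normalized metrics for two tables.
tables = {
    "orders": {"query": 1.0, "user": 0.8, "recency": 0.5, "consistency": 0.9},
    "audit_log": {"query": 0.2, "user": 0.1, "recency": 1.0, "consistency": 0.3},
}

# Sort table names by score, highest first.
ranked = sorted(
    tables,
    key=lambda name: weighted_score_sketch(tables[name]),
    reverse=True,
)
```

With these weights, query frequency and user diversity together account for 75% of the score, so a recently touched but rarely queried table ranks below a widely used one.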

get_top_query_patterns(tables: List[str], top_n: int = 5)#

Get the most frequent query patterns for a set of tables.

Parameters:
  • tables (List[str]) – List of table names to analyze

  • top_n (int, default: 5) – Number of top patterns to return

Returns:

A list of dictionaries, one per pattern, each with these keys:
  • pattern: The normalized query pattern

  • count: Number of times this pattern appears

  • example: An actual query text example

  • tables_used: Tables from the group that appear in this pattern

Return type:

List[Dict]
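
The shape of the returned dictionaries can be illustrated with a counting sketch. This is an assumption about how such a result could be assembled, not the library's code. The input records, with their "text", "pattern" and "tables" keys, and the function name top_patterns_sketch are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def top_patterns_sketch(
    queries: List[dict], tables: List[str], top_n: int = 5
) -> List[Dict]:
    """Count normalized patterns among queries that touch any given table."""
    wanted = set(tables)
    counts: Counter = Counter()
    examples: Dict[str, str] = {}
    used: Dict[str, set] = {}
    for query in queries:
        overlap = wanted & set(query["tables"])
        if not overlap:
            continue  # The query touches none of the requested tables.
        pattern = query["pattern"]
        counts[pattern] += 1
        # Keep the first query text seen as the example for this pattern.
        examples.setdefault(pattern, query["text"])
        used.setdefault(pattern, set()).update(overlap)
    return [
        {
            "pattern": pattern,
            "count": count,
            "example": examples[pattern],
            "tables_used": sorted(used[pattern]),
        }
        for pattern, count in counts.most_common(top_n)
    ]
```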

identify_table_groups(min_cooccurrence=None)#

Identify groups of frequently co-occurring tables

Parameters:

min_cooccurrence (default: None) – Minimum number of times tables must appear together. If None, automatically calculated as 0.01% of total queries (minimum 2, maximum 100)

Returns:

(table_group, count) sorted by count descending

Return type:

List of tuples
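
The automatic threshold and the co-occurrence counting can be sketched as follows. Several points are assumptions rather than documented behavior: the 0.01% value is rounded down, groups are limited to pairs of tables, and the input is a list holding the table names referenced by each query. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def auto_min_cooccurrence(total_queries: int) -> int:
    """0.01% of total queries, clamped to the documented range 2 to 100."""
    # Integer division by 10,000 equals 0.01%, rounded down (an assumption).
    return max(2, min(100, total_queries // 10_000))


def table_groups_sketch(tables_per_query, min_cooccurrence=None):
    """Return (table_group, count) tuples sorted by count descending."""
    if min_cooccurrence is None:
        min_cooccurrence = auto_min_cooccurrence(len(tables_per_query))
    counts = Counter()
    for tables in tables_per_query:
        # Sort and de-duplicate so (a, b) and (b, a) count as one group.
        for pair in combinations(sorted(set(tables)), 2):
            counts[pair] += 1
    return [
        (group, count)
        for group, count in counts.most_common()
        if count >= min_cooccurrence
    ]
```

Under these assumptions, a log of 1,000 queries uses the floor of 2, a log of 500,000 queries uses 50, and any log of 1,000,000 queries or more is capped at 100.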

recommend_data_products(num_recommendations=10, min_score=None, min_frequency_threshold=0.1, min_group_size=2, max_group_size=10)#

Generate final data product recommendations using frequency-based clustering

Parameters:
  • num_recommendations (default: 10) – Maximum number of top recommendations to return

  • min_score (default: None) – Minimum recommendation score threshold (0-100). Tables below this score will be excluded from all recommendations. Also used to filter standalone (unclustered) tables. If None, no score filtering is applied.

  • min_frequency_threshold (default: 0.1) – Minimum join frequency for clustering (0.0-1.0)

  • min_group_size (default: 2) – Minimum tables in a cluster

  • max_group_size (default: 10) – Maximum tables in a cluster

Returns:

Dictionary with the keys 'individual_tables', 'table_groups', and optionally 'standalone_tables'

export_recommendations_markdown(recommendations: dict, output_file: str)#

Export recommendations to a Markdown file

Parameters:
  • recommendations (dict)

  • output_file (str)

export_recommendations_json(recommendations: dict, output_file: str)#

Export recommendations to a JSON file for agent consumption

Parameters:
  • recommendations (dict)

  • output_file (str)

Platform Parsers#

Platform-specific query log parsers for Snowflake, Databricks, BigQuery, and watsonx.data

Note: This module provides query parsing only. Database connections are not supported; use file-based input instead.

class wxdi.data_product_recommender.platforms.SnowflakeQueryParser#

Bases: QueryLogParser

Snowflake-specific query parser

normalize_columns(df: pandas.DataFrame)#

Normalize Snowflake column names

Return type:

DataFrame

Parameters:

df (pandas.DataFrame)

extract_tables(query_text: str)#

Extract table names from SQL query using regex patterns

Return type:

List[str]

Parameters:

query_text (str)
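
Regex-based table extraction can be illustrated with a generic sketch. The pattern below is an assumption and is not taken from any of the parsers documented here. It captures only unquoted, optionally dot-qualified names that follow FROM or JOIN, and the function name extract_tables_sketch is hypothetical.

```python
import re
from typing import List

# An identifier, optionally qualified with dots (database.schema.table),
# directly after FROM or JOIN. Quoted identifiers are not handled.
_TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w$]*(?:\.[A-Za-z_][\w$]*)*)",
    re.IGNORECASE,
)


def extract_tables_sketch(query_text: str) -> List[str]:
    """Return referenced table names in order of first appearance."""
    seen: List[str] = []
    for name in _TABLE_REF.findall(query_text):
        if name not in seen:
            seen.append(name)
    return seen


found = extract_tables_sketch(
    "SELECT * FROM sales.orders o JOIN sales.customers c ON o.cid = c.id"
)
```

A regex approach is fast but approximate. This sketch would report CTE names as tables and would misread expressions such as EXTRACT(year FROM col), which is the usual trade-off against a full SQL parser.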

class wxdi.data_product_recommender.platforms.DatabricksQueryParser#

Bases: QueryLogParser

Databricks-specific query parser

normalize_columns(df: pandas.DataFrame)#

Normalize Databricks column names

Return type:

DataFrame

Parameters:

df (pandas.DataFrame)

extract_tables(query_text: str)#

Extract table names from SQL query using regex patterns

Return type:

List[str]

Parameters:

query_text (str)

class wxdi.data_product_recommender.platforms.BigQueryQueryParser#

Bases: QueryLogParser

BigQuery-specific query parser

normalize_columns(df: pandas.DataFrame)#

Normalize BigQuery column names

Return type:

DataFrame

Parameters:

df (pandas.DataFrame)

extract_tables(query_text: str)#

Extract table names from BigQuery SQL query using regex patterns

Return type:

List[str]

Parameters:

query_text (str)

class wxdi.data_product_recommender.platforms.WatsonxDataQueryParser#

Bases: QueryLogParser

watsonx.data-specific query parser

normalize_columns(df: pandas.DataFrame)#

Normalize watsonx.data column names

Return type:

DataFrame

Parameters:

df (pandas.DataFrame)

extract_tables(query_text: str)#

Extract table names from SQL query using regex patterns

Return type:

List[str]

Parameters:

query_text (str)

Base Classes#

Abstract base class for query log parsers.

class wxdi.data_product_recommender.base.QueryLogParser#

Bases: ABC

Abstract base class for parsing query logs

abstractmethod normalize_columns(df: pandas.DataFrame)#

Normalize column names to standard format

Return type:

DataFrame

Parameters:

df (pandas.DataFrame)

abstractmethod extract_tables(query_text: str)#

Extract table names from SQL query

Return type:

List[str]

Parameters:

query_text (str)
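
The two abstract methods define what a parser for another platform has to provide. The sketch below uses a local stand-in class so that it runs without the package installed. QueryLogParserSketch, MyWarehouseParser, and every column name in COLUMN_MAP are hypothetical, and the standard column names the recommender expects are not stated in this reference.

```python
import re
from abc import ABC, abstractmethod
from typing import List


class QueryLogParserSketch(ABC):
    """Local stand-in mirroring the two documented abstract methods."""

    @abstractmethod
    def normalize_columns(self, df):
        """Return the frame with platform columns renamed."""

    @abstractmethod
    def extract_tables(self, query_text: str) -> List[str]:
        """Return the table names referenced by the query."""


class MyWarehouseParser(QueryLogParserSketch):
    # Hypothetical platform column names mapped to hypothetical standard names.
    COLUMN_MAP = {"SQL_TEXT": "query_text", "USERNAME": "user"}

    def normalize_columns(self, df):
        # pandas.DataFrame.rename returns a new frame with renamed columns.
        return df.rename(columns=self.COLUMN_MAP)

    def extract_tables(self, query_text: str) -> List[str]:
        # Simplified pattern: names that directly follow FROM or JOIN.
        return re.findall(r"\b(?:FROM|JOIN)\s+([\w.]+)", query_text, re.IGNORECASE)


parser = MyWarehouseParser()
```

Because both methods are abstract, Python raises TypeError when the base class, or a subclass that leaves either method unimplemented, is instantiated.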