Data Product Recommender Reference#
The data product recommender module provides query-log analysis and platform-specific parsers.
Core Classes#
Data Product Recommender - Core recommendation engine
- wxdi.data_product_recommender.recommender.normalize_query_pattern(query_text: str)#
Normalize a SQL query into a pattern by replacing literals with placeholders.
This helps group similar queries together by removing variable parts such as:
- Numbers (123 -> ?)
- Quoted strings (‘value’ -> ?)
- Date literals (‘2024-01-01’ -> ?)
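The literal replacement described above can be illustrated with a few regular expressions. This is a minimal sketch, not the library's implementation: the function name `sketch_normalize`, the exact patterns, and the whitespace collapsing are assumptions made for illustration.

```python
import re

# Assumption: these patterns are illustrative; the library's real rules may differ.
_QUOTED = re.compile(r"'(?:[^']|'')*'")      # single-quoted strings ('' is an escaped quote)
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")   # standalone integers and decimals
_SPACES = re.compile(r"\s+")                 # runs of whitespace


def sketch_normalize(query_text: str) -> str:
    """Replace literals with '?' so structurally identical queries compare equal."""
    pattern = _QUOTED.sub("?", query_text)   # quoted strings and date literals first
    pattern = _NUMBER.sub("?", pattern)      # then bare numbers
    # Extra step (assumption): collapse whitespace so formatting differences vanish.
    return _SPACES.sub(" ", pattern).strip()
```

Quoted literals are replaced before numbers so that digits inside a date string such as ‘2024-01-01’ collapse into a single placeholder rather than three.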
- class wxdi.data_product_recommender.recommender.DataProductRecommender(parser: QueryLogParser)#
Bases:
object
Analyzes query logs and recommends data products
- Parameters:
parser (QueryLogParser)
- ERROR_QUERY_LOGS_NOT_LOADED = 'Query logs not loaded'#
- load_query_logs_from_json_file(file_path: str)#
Load and normalize query logs from JSON file
- Parameters:
file_path (str)
- load_query_logs_from_csv_file(file_path: str)#
Load and normalize query logs from CSV or JSON file
- Parameters:
file_path (str)
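The constructor and loader signatures above suggest a workflow along the following lines. This is a hedged sketch: the helper name `run_recommendation` and the file path passed to it are hypothetical, and it assumes recommendations can be requested directly after loading a log file.

```python
from typing import Any


def run_recommendation(csv_path: str) -> Any:
    """Load a query-log export and return recommendations (sketch only)."""
    # Imported inside the function so this sketch can be loaded without the
    # package installed; in real code these would be top-level imports.
    from wxdi.data_product_recommender.platforms import SnowflakeQueryParser
    from wxdi.data_product_recommender.recommender import DataProductRecommender

    # The recommender takes a platform parser, which normalizes the log columns.
    recommender = DataProductRecommender(SnowflakeQueryParser())
    recommender.load_query_logs_from_csv_file(csv_path)
    # Assumption: no further setup is required before asking for recommendations.
    return recommender.recommend_data_products(num_recommendations=5)
```

The `ERROR_QUERY_LOGS_NOT_LOADED` constant indicates that one of the load methods has to be called before any analysis method.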
- score_tables(query_weight=0.375, user_weight=0.375, recency_weight=0.15, consistency_weight=0.1)#
Score tables based on weighted metrics
- Parameters:
query_weight (default: 0.375) – Weight for query frequency
user_weight (default: 0.375) – Weight for user diversity
recency_weight (default: 0.15) – Weight for recent activity
consistency_weight (default: 0.1) – Weight for consistent usage over time
- Returns:
DataFrame with scored tables sorted by recommendation_score
Note
Relationship metrics are not included, because standalone tables are packaged in isolation, without their related tables.
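The default weights sum to 1.0, so the score can be read as a weighted average of the four metrics. The sketch below shows that arithmetic only. It assumes each metric has already been normalized to the range 0 to 1 and that the result is scaled to 0-100; the names `sketch_score` and `DEFAULT_WEIGHTS` are hypothetical, and the library's own normalization is not documented here.

```python
from typing import Mapping, Optional

# Default weights copied from the score_tables signature; they sum to 1.0.
DEFAULT_WEIGHTS = {"query": 0.375, "user": 0.375, "recency": 0.15, "consistency": 0.1}


def sketch_score(
    metrics: Mapping[str, float],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Combine metrics normalized to [0, 1] into a 0-100 score (assumed scale)."""
    active = DEFAULT_WEIGHTS if weights is None else weights
    total = sum(active[name] * metrics[name] for name in active)
    return round(100 * total, 2)


# A table that is queried often, by a moderate number of users, very recently,
# but irregularly over time.
example = sketch_score({"query": 0.8, "user": 0.5, "recency": 1.0, "consistency": 0.2})
```

With these inputs the weighted sum is 0.3 + 0.1875 + 0.15 + 0.02 = 0.6575, which scales to a score of 65.75.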
- get_top_query_patterns(tables: List[str], top_n: int = 5)#
Get the most frequent query patterns for a set of tables.
- Parameters:
tables (List[str])
top_n (int, default: 5)
- Returns:
pattern: The normalized query pattern
count: Number of times this pattern appears
example: An actual query text example
tables_used: Tables from the group that appear in this pattern
- Return type:
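As an illustration of the fields listed above, the following sketch counts normalized patterns and keeps one example query per pattern. It is not the library's code: the function name `sketch_top_patterns`, the list-of-dictionaries result, and the substring test used to detect tables are all assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def sketch_top_patterns(
    queries: List[Tuple[str, str]],  # (normalized pattern, original query text)
    tables: List[str],
    top_n: int = 5,
) -> List[Dict[str, object]]:
    """Return the most frequent patterns that mention any of the given tables."""

    def tables_in(pattern: str) -> List[str]:
        # Assumption: a crude case-insensitive substring match stands in for SQL parsing.
        return [t for t in tables if t.lower() in pattern.lower()]

    counts: Counter = Counter()
    examples: Dict[str, str] = {}
    for pattern, text in queries:
        if not tables_in(pattern):
            continue                        # pattern does not touch the table set
        counts[pattern] += 1
        examples.setdefault(pattern, text)  # keep the first query seen as the example

    return [
        {"pattern": p, "count": c, "example": examples[p], "tables_used": tables_in(p)}
        for p, c in counts.most_common(top_n)
    ]
```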
- identify_table_groups(min_cooccurrence=None)#
Identify groups of frequently co-occurring tables
- Parameters:
min_cooccurrence (default: None) – Minimum number of times tables must appear together. If None, automatically calculated as 0.01% of total queries (minimum 2, maximum 100)
- Returns:
(table_group, count) sorted by count descending
- Return type:
List of tuples
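The automatic threshold and the co-occurrence count can be sketched as follows. This is an illustration under stated assumptions: it counts table pairs only (the library may form larger groups), it rounds the 0.01% threshold down, and the names `auto_min_cooccurrence` and `sketch_table_pairs` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Optional, Tuple


def auto_min_cooccurrence(total_queries: int) -> int:
    """0.01% of total queries, clamped to the range [2, 100]."""
    # Assumption: the fraction is rounded down; 0.01% equals 1 in 10,000.
    return max(2, min(100, total_queries // 10_000))


def sketch_table_pairs(
    query_tables: Iterable[Iterable[str]],
    min_cooccurrence: Optional[int] = None,
) -> List[Tuple[Tuple[str, str], int]]:
    """Count how often each pair of tables appears in the same query."""
    queries = [set(tables) for tables in query_tables]
    if min_cooccurrence is None:
        min_cooccurrence = auto_min_cooccurrence(len(queries))

    counts: Counter = Counter()
    for tables in queries:
        # Sorting makes ("a", "b") and ("b", "a") count as the same pair.
        for pair in combinations(sorted(tables), 2):
            counts[pair] += 1

    # most_common() yields pairs sorted by count, descending.
    return [(pair, n) for pair, n in counts.most_common() if n >= min_cooccurrence]
```

Under this rounding, a log of 50,000 queries gives a threshold of 5, while any log smaller than 30,000 queries falls back to the minimum of 2.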
- recommend_data_products(num_recommendations=10, min_score=None, min_frequency_threshold=0.1, min_group_size=2, max_group_size=10)#
Generate final data product recommendations using frequency-based clustering
- Parameters:
num_recommendations (default: 10) – Maximum number of top recommendations to return
min_score (default: None) – Minimum recommendation score threshold (0-100). Tables below this score will be excluded from all recommendations. Also used to filter standalone (unclustered) tables. If None, no score filtering is applied.
min_frequency_threshold (default: 0.1) – Minimum join frequency for clustering (0.0-1.0)
min_group_size (default: 2) – Minimum tables in a cluster
max_group_size (default: 10) – Maximum tables in a cluster
- Returns:
Dictionary with ‘individual_tables’, ‘table_groups’, and optionally ‘standalone_tables’
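Because ‘standalone_tables’ is optional, code that consumes the result should not index it directly. The sketch below shows one defensive way to read the dictionary. The helper name `summarize_recommendations` is hypothetical, and it assumes only that each value supports `len()`; the actual value types are not documented here.

```python
from typing import Any, Dict


def summarize_recommendations(result: Dict[str, Any]) -> Dict[str, int]:
    """Count entries per section of the returned dictionary."""
    return {
        "individual_tables": len(result["individual_tables"]),
        "table_groups": len(result["table_groups"]),
        # Optional key: fall back to an empty list when it is absent.
        "standalone_tables": len(result.get("standalone_tables", [])),
    }
```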
Platform Parsers#
Platform-specific query log parsers for Snowflake, Databricks, BigQuery, and watsonx.data.
Note: This module only provides query parsing functionality. Database connections are not supported; use file-based input instead.
- class wxdi.data_product_recommender.platforms.SnowflakeQueryParser#
Bases:
QueryLogParser
Snowflake-specific query parser
- normalize_columns(df: pandas.DataFrame)#
Normalize Snowflake column names
- Return type:
- Parameters:
df (pandas.DataFrame)
- class wxdi.data_product_recommender.platforms.DatabricksQueryParser#
Bases:
QueryLogParser
Databricks-specific query parser
- normalize_columns(df: pandas.DataFrame)#
Normalize Databricks column names
- Return type:
- Parameters:
df (pandas.DataFrame)
- class wxdi.data_product_recommender.platforms.BigQueryQueryParser#
Bases:
QueryLogParser
BigQuery-specific query parser
- normalize_columns(df: pandas.DataFrame)#
Normalize BigQuery column names
- Return type:
- Parameters:
df (pandas.DataFrame)
- class wxdi.data_product_recommender.platforms.WatsonxDataQueryParser#
Bases:
QueryLogParser
watsonx.data-specific query parser
- normalize_columns(df: pandas.DataFrame)#
Normalize watsonx.data column names
- Return type:
- Parameters:
df (pandas.DataFrame)
Base Classes#
Abstract base class for query log parsers.
- class wxdi.data_product_recommender.base.QueryLogParser#
Bases:
ABC
Abstract base class for parsing query logs
- abstractmethod normalize_columns(df: pandas.DataFrame)#
Normalize column names to standard format
- Return type:
- Parameters:
df (pandas.DataFrame)
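A parser for another platform would subclass the abstract base and implement `normalize_columns`. The sketch below uses a local stand-in for `QueryLogParser` so that it runs without the package installed. The source and target column names, the `DataFrame` return annotation, and the names `QueryLogParserStandIn`, `ExampleWarehouseParser`, and `build_rename_map` are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable


class QueryLogParserStandIn(ABC):
    """Local stand-in for wxdi.data_product_recommender.base.QueryLogParser."""

    @abstractmethod
    def normalize_columns(self, df: "pandas.DataFrame") -> "pandas.DataFrame":
        """Normalize column names to standard format."""


# Hypothetical source -> standard column names; the real standard names are
# not documented in this reference.
EXAMPLE_COLUMN_MAP = {
    "sql_text": "query_text",
    "executed_by": "user",
    "started_at": "timestamp",
}


def build_rename_map(columns: Iterable[str]) -> Dict[str, str]:
    """Map each recognized column (case-insensitively) to its standard name."""
    return {
        col: EXAMPLE_COLUMN_MAP[col.lower()]
        for col in columns
        if col.lower() in EXAMPLE_COLUMN_MAP
    }


class ExampleWarehouseParser(QueryLogParserStandIn):
    """Parser for a made-up platform, showing the shape of a subclass."""

    def normalize_columns(self, df: "pandas.DataFrame") -> "pandas.DataFrame":
        # DataFrame.rename leaves columns that are not in the mapping unchanged.
        return df.rename(columns=build_rename_map(df.columns))
```

Keeping the mapping in a plain dictionary separates the platform-specific knowledge from the rename call, so the mapping can be checked without constructing a DataFrame.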