Clustering
generate_clusters(df, field_name, sim_df, threshold=0.4, verbose=0, cluster_algorithm='greedy_incremental')
Generates clusters from a DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | DataFrame with entities to cluster. | required |
field_name | str | Name of the field with the entity information (e.g., …) | required |
threshold | float | Similarity value above which entities will be considered similar. Defaults to 0.4. | 0.4 |
sim_df | DataFrame | DataFrame with similarities (…) | required |
verbose | int | How much information will be displayed. Options: 0 (errors), 1 (warnings), 2 (all). Defaults to 0. | 0 |
cluster_algorithm | str | Clustering algorithm to use. Options: … | 'greedy_incremental' |
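
To make the expected inputs more concrete, the sketch below builds a small `df` and `sim_df` and keeps the pairs whose similarity exceeds `threshold`. The `sim_df` schema is not given here, so the column names `query`, `target`, and `metric`, the field name `sequence`, and the use of row positions as entity identifiers are all assumptions made for illustration.

```python
import pandas as pd

# Entities to cluster; "sequence" stands in for `field_name` (hypothetical name).
df = pd.DataFrame({"sequence": ["AAAA", "AAAT", "GGGG", "GGGC"]})

# Pairwise similarities between entities, identified by row position in `df`.
# The column names below are assumed, not taken from the library.
sim_df = pd.DataFrame({
    "query":  [0, 0, 0, 1, 1, 2],
    "target": [1, 2, 3, 2, 3, 3],
    "metric": [0.75, 0.10, 0.05, 0.20, 0.10, 0.75],
})

threshold = 0.4

# Pairs strictly above the threshold are the ones treated as similar.
similar_pairs = sim_df[sim_df["metric"] > threshold]
print(similar_pairs)
```

With these toy values only the pairs (0, 1) and (2, 3) pass the threshold, so a clustering step would be expected to group those entities together.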
Returns:
Type | Description |
---|---|
pd.DataFrame | DataFrame with entities and the cluster they belong to. |
Raises:
Type | Description |
---|---|
NotImplementedError | Clustering algorithm is not supported |
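
The following is a minimal sketch of one common reading of greedy incremental clustering, written as a stand-in to show the input and output shapes and the documented `NotImplementedError` behaviour. It is not the library's implementation: the function names `greedy_incremental_sketch` and `cluster_sketch`, the `sim_df` columns `query`, `target`, and `metric`, and the output column `cluster` are all hypothetical.

```python
import pandas as pd


def greedy_incremental_sketch(df, sim_df, threshold=0.4):
    """Illustrative stand-in: assign each entity to the first representative it resembles."""
    # Neighbours of each entity (by row position) with similarity above the threshold.
    neighbours = {i: set() for i in range(len(df))}
    for q, t, s in zip(sim_df["query"], sim_df["target"], sim_df["metric"]):
        if s > threshold:
            neighbours[int(q)].add(int(t))
            neighbours[int(t)].add(int(q))

    cluster_of = {}
    for i in range(len(df)):
        if i in cluster_of:
            continue  # already absorbed by an earlier representative
        cluster_of[i] = i  # entity i becomes the representative of a new cluster
        for j in neighbours[i]:
            cluster_of.setdefault(j, i)  # unassigned neighbours join cluster i

    out = df.copy()
    out["cluster"] = [cluster_of[i] for i in range(len(df))]  # assumed column name
    return out


def cluster_sketch(df, sim_df, threshold=0.4, cluster_algorithm="greedy_incremental"):
    """Dispatch on the algorithm name; unknown names raise NotImplementedError."""
    if cluster_algorithm == "greedy_incremental":
        return greedy_incremental_sketch(df, sim_df, threshold)
    raise NotImplementedError(
        f"Clustering algorithm {cluster_algorithm!r} is not supported"
    )


df = pd.DataFrame({"sequence": ["AAAA", "AAAT", "GGGG", "GGGC"]})
sim_df = pd.DataFrame({
    "query":  [0, 0, 0, 1, 1, 2],
    "target": [1, 2, 3, 2, 3, 3],
    "metric": [0.75, 0.10, 0.05, 0.20, 0.10, 0.75],
})

clusters = cluster_sketch(df, sim_df, threshold=0.4)
print(clusters)
```

Callers that accept the algorithm name from configuration may want to wrap the call in `try`/`except NotImplementedError` so that an unsupported name is reported clearly rather than surfacing as an unhandled exception.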