RecordLinkageManager

class RecordLinkageManager

Manages the Privacy-Preserving Record Linkage (PPRL) protocol. For an example of how to use this class, see examples/python/notebooks/17_Entity_resolution.ipynb.

apply_secret_key_to_records(self: pyhelayers.RecordLinkageManager, package_other: pyhelayers.RecordLinkagePackage) → None

Given a RecordLinkagePackage from the other participant in the protocol, multiplies each EC point in it by the secret key of this RecordLinkageManager.

Parameters:

package_other – the RecordLinkagePackage from the other participant
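Matching doubly encrypted values works because scalar multiplication commutes: a·(b·P) = b·(a·P), so it does not matter which party applies its key first. The sketch below illustrates only that property. It swaps the elliptic curve for a toy multiplicative group modulo the prime 2**127 - 1, where exponentiation plays the role of multiplying a point by a scalar. The modulus, `random_key`, and `apply_key` are illustrative assumptions, not pyhelayers internals, and the toy group is not secure.

```python
import secrets

P = 2**127 - 1  # Mersenne prime: toy group modulus standing in for the elliptic curve


def random_key() -> int:
    """Draw a toy secret key in [1, P - 2]."""
    return secrets.randbelow(P - 2) + 1


def apply_key(values: list, key: int) -> list:
    """Apply a secret key to every value; pow() stands in for EC scalar multiplication."""
    return [pow(v, key, P) for v in values]


key_a, key_b = random_key(), random_key()
points = [pow(3, 1234567, P), pow(5, 7654321, P)]  # e.g. hashed record values

# A applies its key first and B second, or the other way round
a_then_b = apply_key(apply_key(points, key_a), key_b)
b_then_a = apply_key(apply_key(points, key_b), key_a)
print(a_then_b == b_then_a)  # True: (x^a)^b == (x^b)^a mod P
```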

encrypt_fields_for_equal_rule(self: pyhelayers.RecordLinkageManager) → pyhelayers.RecordLinkagePackage

Concatenates the contents of the fields to which the RL_RULE_EQUAL rule type applies, hashes the result, maps the hash to an EC point, and multiplies that point by the secret key of this RecordLinkageManager. Returns a RecordLinkagePackage containing the encrypted data to be transferred to the other participant in the protocol.
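The concatenate, hash, map, and multiply steps can be sketched as follows. This is a minimal illustration under stated assumptions: the record is a dict, fields are joined with a "|" separator (so that ("an", "nlee") and ("ann", "lee") do not collide), SHA-256 reduced modulo a prime replaces a real hash-to-curve function, and exponentiation in that toy group replaces EC scalar multiplication. `hash_to_group` and `encrypt_equal_fields` are hypothetical helpers, not part of pyhelayers.

```python
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus standing in for the elliptic curve (not secure)


def hash_to_group(data: bytes) -> int:
    """Toy stand-in for hash-to-curve: SHA-256, reduce mod P, then square."""
    x = int.from_bytes(hashlib.sha256(data).digest(), "big") % P
    return pow(x, 2, P)


def encrypt_equal_fields(record: dict, fields: list, key: int) -> int:
    """Concatenate the equality-rule fields, hash to the group, apply the key."""
    joined = "|".join(record[f] for f in fields)  # separator is our choice
    point = hash_to_group(joined.encode("utf-8"))
    return pow(point, key, P)  # "multiply the point by the secret key"


key_a = secrets.randbelow(P - 2) + 1
r1 = {"first": "ann", "last": "lee", "city": "oslo"}
r2 = {"first": "ann", "last": "lee", "city": "rome"}
r3 = {"first": "bob", "last": "lee", "city": "oslo"}
c1, c2, c3 = (encrypt_equal_fields(r, ["first", "last"], key_a) for r in (r1, r2, r3))
print(c1 == c2, c1 == c3)  # True False
```

Records that agree on every equality-rule field produce the same ciphertext under the same key, while fields outside the rule (here, `city`) have no effect.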

encrypt_fields_for_similar_rule(self: pyhelayers.RecordLinkageManager) → pyhelayers.RecordLinkagePackage

If the current rule also contains fields with the RL_RULE_EQUAL rule type, the following steps are applied only to records that were matched in the preceding steps; otherwise, they are applied to all records that have not yet been matched. For each record, generates a set of shingles, computes min-hashes of that set, maps the hashes to EC points, and multiplies them by the secret key of this RecordLinkageManager. Returns a RecordLinkagePackage containing the encrypted data to be transferred to the other participant in the protocol.
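The shingling and min-hash steps can be sketched in plain Python. The choices here are assumptions for illustration only: character bigrams of the lowercased text as shingles, and 64 seeded SHA-256 hashes standing in for the min-hash function family. The sketch stops before mapping the hashes to EC points and applying the key; the helper names are hypothetical.

```python
import hashlib


def shingles(text: str, k: int = 2) -> set:
    """Character k-grams of the lowercased text (k = 2 is an illustrative choice)."""
    t = text.lower()
    if len(t) < k:
        return {t}
    return {t[i:i + k] for i in range(len(t) - k + 1)}


def seeded_hash(seed: int, s: str) -> int:
    """One member of a family of hash functions, derived from SHA-256 with a seed."""
    digest = hashlib.sha256(f"{seed}:{s}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(shingle_set: set, num_hashes: int = 64) -> list:
    """Min-hash signature: for each hash function, the minimum over the set."""
    return [min(seeded_hash(seed, s) for s in shingle_set) for seed in range(num_hashes)]


def estimated_jaccard(sig1: list, sig2: list) -> float:
    """The fraction of agreeing signature positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig1, sig2)) / len(sig1)


sig_1 = minhash(shingles("Jonathan Smith"))
sig_2 = minhash(shingles("Jonathon Smith"))
sig_3 = minhash(shingles("Maria Garcia"))
print(estimated_jaccard(sig_1, sig_2))  # an estimate of the true Jaccard index, 10/13
print(estimated_jaccard(sig_1, sig_3))  # 0.0: the shingle sets are disjoint
```

Because two min-hash values agree with probability equal to the Jaccard similarity of the shingle sets, near-duplicate values such as misspelled names share many signature positions even though their exact hashes differ.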

get_next_expected_function_name(self: pyhelayers.RecordLinkageManager) → str

Returns the name of the next function expected to be called, according to the protocol and the current state of this RecordLinkageManager.

get_num_matched_records(self: pyhelayers.RecordLinkageManager, report_blocked: bool = False) → Tuple[int, int]

Finds matching pairs of records from both parties’ tables. Call this only after all the preceding steps of the protocol have been carried out.

Parameters:

report_blocked – whether to include “blocked” records (records that were matched with more than one record) in the report
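The notion of a “blocked” record can be made concrete with a small sketch. It assumes matches are available as (own_index, other_index) pairs and treats a pair as blocked when either of its records appears in more than one pair; this is one reading of the description above, and `split_blocked` is a hypothetical helper. What the two integers in the returned Tuple[int, int] count is not stated in this reference, so the sketch does not try to reproduce them.

```python
from collections import Counter


def split_blocked(pairs):
    """Split (own_index, other_index) match pairs into one-to-one and blocked ones.

    A pair is treated as blocked when either of its records appears in more
    than one pair (an illustrative reading of "matched with more than one record").
    """
    own_counts = Counter(i for i, _ in pairs)
    other_counts = Counter(j for _, j in pairs)
    unique, blocked = [], []
    for i, j in pairs:
        if own_counts[i] > 1 or other_counts[j] > 1:
            blocked.append((i, j))
        else:
            unique.append((i, j))
    return unique, blocked


unique, blocked = split_blocked([(0, 10), (1, 11), (1, 12), (2, 13)])
print(unique)   # [(0, 10), (2, 13)]
print(blocked)  # [(1, 11), (1, 12)]
```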

get_num_of_records(self: pyhelayers.RecordLinkageManager) → int

Returns the number of processed records.

init_records_from_file(self: pyhelayers.RecordLinkageManager, csv_path: str, num_of_samples_to_take: int) → None

Reads records from a CSV file and performs pre-processing and encryption.

Parameters:
  • csv_path – path of the CSV file that contains the table records

  • num_of_samples_to_take – number of records to read. The default, -1, reads all records.
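The -1 convention for num_of_samples_to_take can be illustrated with the standard `csv` module. This sketch assumes the file has a header row and reads from an open text stream rather than a path so that it runs without a file; the strip-and-lowercase normalization is an illustrative assumption, not the library's own pre-processing, and `read_records` is a hypothetical helper.

```python
import csv
import io
from itertools import islice


def read_records(stream, num_of_samples_to_take: int = -1) -> list:
    """Read up to num_of_samples_to_take rows from a CSV with a header row; -1 reads all."""
    reader = csv.DictReader(stream)
    rows = reader if num_of_samples_to_take < 0 else islice(reader, num_of_samples_to_take)
    # Illustrative normalization only; the library's pre-processing is not shown here
    return [{k: v.strip().lower() for k, v in row.items()} for row in rows]


data = "first,last\nAnn , Lee\nBob,Ray\nCy,Fox\n"
print(len(read_records(io.StringIO(data))))     # 3
print(len(read_records(io.StringIO(data), 2)))  # 2
```

To read from a path instead, open it with `open(csv_path, newline="")` and pass the file object.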

match_records_by_equal_rule(self: pyhelayers.RecordLinkageManager, package_own: pyhelayers.RecordLinkagePackage, package_other: pyhelayers.RecordLinkagePackage) → None

Given two doubly encrypted RecordLinkagePackage objects, finds the matching records. If the current rule also contains fields with the RL_RULE_SIMILAR rule type, the resulting matches are kept in a temporary data structure and processed at the end of the iteration. Otherwise, they are used to update the Similarity Graph.

Parameters:
  • package_own – the RecordLinkagePackage that originated in this RecordLinkageManager, after the other participant applied its secret key to it

  • package_other – the RecordLinkagePackage that originated in the other participant’s RecordLinkageManager, after this RecordLinkageManager’s secret key was applied to it
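The whole equality exchange, from single encryption to matching on doubly encrypted values, can be sketched end to end in one process. As before, a toy multiplicative group modulo 2**127 - 1 stands in for the elliptic curve; keys are drawn coprime to P - 1 so that applying a key is a bijection on the group and cannot merge distinct values. Packages are modeled as plain lists indexed by record position, which is an assumption. `random_key`, `to_group`, `apply_key`, and `match_equal` are hypothetical helpers, not pyhelayers API.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # toy prime modulus standing in for the elliptic curve (not secure)


def random_key() -> int:
    """Toy secret key coprime to P - 1, so applying it is a bijection on the group."""
    while True:
        k = secrets.randbelow(P - 2) + 1
        if math.gcd(k, P - 1) == 1:
            return k


def to_group(value: str) -> int:
    """Toy stand-in for hashing a value to an EC point."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest(), "big") % P


def apply_key(values: list, key: int) -> list:
    """Apply a secret key to every value (pow() stands in for scalar multiplication)."""
    return [pow(v, key, P) for v in values]


def match_equal(own_double: list, other_double: list) -> list:
    """Return (own_index, other_index) pairs whose doubly encrypted values are equal."""
    index = {}
    for j, v in enumerate(other_double):
        index.setdefault(v, []).append(j)
    return [(i, j) for i, v in enumerate(own_double) for j in index.get(v, [])]


key_a, key_b = random_key(), random_key()
table_a = ["ann|lee", "bob|ray", "cy|fox"]  # concatenated equality-rule fields
table_b = ["dee|kim", "cy|fox", "ann|lee"]

pkg_a = apply_key([to_group(v) for v in table_a], key_a)  # A's package, sent to B
pkg_b = apply_key([to_group(v) for v in table_b], key_b)  # B's package, sent to A
own_double = apply_key(pkg_a, key_b)    # B applied its key to A's package
other_double = apply_key(pkg_b, key_a)  # A applied its key to B's package
print(match_equal(own_double, other_double))  # [(0, 2), (2, 1)]
```

Neither party ever sees the other's plaintext values or key; equal values meet only after both keys have been applied.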

match_records_by_similar_rule(self: pyhelayers.RecordLinkageManager, package_own: pyhelayers.RecordLinkagePackage, package_other: pyhelayers.RecordLinkagePackage) → None

Given two doubly encrypted RecordLinkagePackage objects, finds the matching records. If the current rule also contains fields with the RL_RULE_EQUAL rule type, the resulting matches are intersected with the matches from the RL_RULE_EQUAL part of the protocol. The final matches are used to update the Similarity Graph.

Parameters:
  • package_own – the RecordLinkagePackage that originated in this RecordLinkageManager, after the other participant applied its secret key to it

  • package_other – the RecordLinkagePackage that originated in the other participant’s RecordLinkageManager, after this RecordLinkageManager’s secret key was applied to it
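One way to turn doubly encrypted min-hash signatures into matches, and then intersect them with the equality matches, is sketched below. The agreement threshold and the `equal_matches` argument are hypothetical parameters introduced for illustration; how pyhelayers decides that two signatures are similar is not described in this reference. Plain integers stand in for the doubly encrypted min-hash values, which is valid because equal min-hashes remain equal after both keys are applied.

```python
def match_similar(own_sigs, other_sigs, threshold, equal_matches=None):
    """Return (own_index, other_index) pairs whose signatures agree often enough.

    own_sigs[i] and other_sigs[j] are equal-length lists of doubly encrypted
    min-hash values. If equal_matches (pairs from the equality part) is given,
    only pairs present in both result sets are kept.
    """
    matches = set()
    for i, s in enumerate(own_sigs):
        for j, t in enumerate(other_sigs):
            agreement = sum(a == b for a, b in zip(s, t)) / len(s)
            if agreement >= threshold:
                matches.add((i, j))
    if equal_matches is not None:
        matches &= set(equal_matches)
    return sorted(matches)


own = [[1, 2, 3, 4], [5, 6, 7, 8]]
other = [[1, 2, 3, 9], [5, 0, 0, 8]]
print(match_similar(own, other, threshold=0.75))                   # [(0, 0)]
print(match_similar(own, other, threshold=0.5))                    # [(0, 0), (1, 1)]
print(match_similar(own, other, 0.5, equal_matches=[(1, 1)]))      # [(1, 1)]
```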

report_matched_records(self: pyhelayers.RecordLinkageManager, report_blocked: bool = False) → Tuple[int, int]

Finds and prints matching pairs of records from both parties’ tables. Call this only after all the preceding steps of the protocol have been carried out.

Parameters:

report_blocked – whether to include “blocked” records (records that were matched with more than one record) in the report

report_matched_records_along_with_other_side_records(self: pyhelayers.RecordLinkageManager, rlm: pyhelayers.RecordLinkageManager, report_blocked: bool = False) → Tuple[int, int]

Finds and prints matching pairs of records from both parties’ tables. Call this only after all the preceding steps of the protocol have been carried out. The printout also uses the other side’s RecordLinkageManager to show the other side’s information for each matched record. This is for debugging purposes only, since neither side is expected to have access to both RecordLinkageManagers.

Parameters:
  • rlm – the “other” party’s RecordLinkageManager

  • report_blocked – whether to include “blocked” records (records that were matched with more than one record) in the report

set_current_rule(self: pyhelayers.RecordLinkageManager, rule: pyhelayers.RecordLinkageRule) → None

Sets the rule for the next iteration of the protocol. If the rule contains fields with the RL_RULE_EQUAL rule type, the user must first call the functions related to this rule type (encrypt_fields_for_equal_rule, apply_secret_key_to_records, match_records_by_equal_rule). In each iteration, the rule is applied only to records that have not yet been matched (as indicated by the Similarity Graph).

Parameters:

rule – the rule to apply in this iteration. The rule is also added to the output RecordLinkagePackage object and to this RecordLinkageManager object.
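The order of calls in one iteration can be sketched as a driver over two managers held in the same process, as in a demo notebook. The call order is inferred from the descriptions in this reference (equality part first, then the similarity part, each as encrypt, apply the other side's key, match) and should be treated as an assumption. The `has_equal_fields` and `has_similar_fields` flags are hypothetical caller-supplied parameters, since this reference does not say how to read them from a RecordLinkageRule. `CallRecorder` is a stand-in object that only logs method names, so the sketch runs without pyhelayers.

```python
def run_rule_iteration(rlm_a, rlm_b, rule, has_equal_fields, has_similar_fields):
    """Drive one protocol iteration for two managers held in the same process.

    has_equal_fields / has_similar_fields are hypothetical caller-supplied flags.
    """
    rlm_a.set_current_rule(rule)
    rlm_b.set_current_rule(rule)
    phases = []
    if has_equal_fields:  # the equality part runs first
        phases.append(("encrypt_fields_for_equal_rule", "match_records_by_equal_rule"))
    if has_similar_fields:
        phases.append(("encrypt_fields_for_similar_rule", "match_records_by_similar_rule"))
    for encrypt_name, match_name in phases:
        pkg_a = getattr(rlm_a, encrypt_name)()  # encrypted under A's key only
        pkg_b = getattr(rlm_b, encrypt_name)()  # encrypted under B's key only
        # In a two-party deployment, the packages would be exchanged here.
        rlm_b.apply_secret_key_to_records(pkg_a)  # pkg_a now carries both keys
        rlm_a.apply_secret_key_to_records(pkg_b)  # pkg_b now carries both keys
        getattr(rlm_a, match_name)(pkg_a, pkg_b)  # (package_own, package_other)
        getattr(rlm_b, match_name)(pkg_b, pkg_a)


class CallRecorder:
    """Stand-in for a RecordLinkageManager that only logs which methods are called."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def __getattr__(self, method):
        def record(*args):
            self.calls.append(method)
            return f"{self.name}-package" if method.startswith("encrypt") else None
        return record


a, b = CallRecorder("A"), CallRecorder("B")
run_rule_iteration(a, b, rule=None, has_equal_fields=True, has_similar_fields=True)
print(a.calls)
```

With real managers, get_next_expected_function_name() can be consulted between steps to confirm that each call matches what the protocol expects.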