RecordLinkageManager#
- class RecordLinkageManager#
RecordLinkageManager class for managing the Privacy-Preserving Record-Linkage (PPRL) protocol. For an example of how to use this class, see examples/python/notebooks/17_Entity_resolution.ipynb
- apply_secret_key_to_records(self: pyhelayers.RecordLinkageManager, package_other: pyhelayers.RecordLinkagePackage) None #
Given a RecordLinkagePackage from the other participant in the protocol, multiplies each EC point by the secret key of this RecordLinkageManager.
- Parameters:
package_other – the RecordLinkagePackage from the other participant
- encrypt_fields_for_equal_rule(self: pyhelayers.RecordLinkageManager) pyhelayers.RecordLinkagePackage #
Concatenates the content of the fields that RL_RULE_EQUAL applies to, hashes this value, transforms it to an EC point, and multiplies it by the secret key of this RecordLinkageManager. Returns a RecordLinkagePackage containing the encrypted data to be transferred to the other participant in the protocol.
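The hash-then-encrypt step described above can be illustrated with a minimal sketch. It is not the library's implementation: instead of elliptic-curve points it uses integers modulo a prime, where modular exponentiation plays the role of multiplying a point by the secret key. The names `hash_to_group` and `encrypt_record`, the `|` separator, the choice of SHA-256 and the modulus `P` are all assumptions made for illustration.

```python
import hashlib
import secrets

# Toy group (assumption): integers modulo the Mersenne prime 2**127 - 1.
# The library works with EC points; this group only mirrors the algebra.
P = 2**127 - 1


def hash_to_group(fields):
    """Concatenate the field values, hash them, and map the hash into the group."""
    joined = "|".join(fields)  # the separator is an assumption
    digest = hashlib.sha256(joined.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big") % P
    return value or 1  # avoid the degenerate element 0


def encrypt_record(fields, secret_key):
    """Raise the hashed record to the secret key (stand-in for EC scalar multiplication)."""
    return pow(hash_to_group(fields), secret_key, P)


# Usage: each participant draws its own random secret key.
secret_key = secrets.randbelow(P - 2) + 1
encrypted = encrypt_record(["alice", "smith", "1990-01-01"], secret_key)
```

Because exponentiation with two different keys commutes, two parties that each apply their key to the same hashed record end up with the same value regardless of the order, which is the property the equality matching relies on.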
- encrypt_fields_for_similar_rule(self: pyhelayers.RecordLinkageManager) pyhelayers.RecordLinkagePackage #
If the current rule contains fields with the RL_RULE_EQUAL rule type, the following steps are applied only to records with matches from the previous steps; otherwise they are applied to all non-matched records. Generates shingle sets for each record, generates min-hashes for these shingle sets, transforms the hashes to EC points, and multiplies them by the secret key of this RecordLinkageManager. Returns a RecordLinkagePackage containing the encrypted data to be transferred to the other participant in the protocol.
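The shingling and min-hash steps can be sketched as follows. This is a generic illustration of the technique, not the library's code: the shingle length `k`, the number of hash functions, the use of seeded SHA-256 as the hash family, and the function names are all assumptions. The encryption of the resulting hashes is omitted here.

```python
import hashlib


def shingles(text, k=2):
    """Return the set of overlapping character k-grams of a lowercased string."""
    text = text.lower()
    if len(text) < k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def min_hash_signature(shingle_set, num_hashes=16):
    """For each seeded hash function, keep the minimum hash over all shingles."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.sha256(f"{seed}:{s}".encode("utf-8")).digest()[:8], "big")
            for s in shingle_set))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


sig_1 = min_hash_signature(shingles("Jonathan Smith"))
sig_2 = min_hash_signature(shingles("Jonathon Smith"))
similarity = estimated_jaccard(sig_1, sig_2)
```

Records with similar field content share most shingles, so many positions of their signatures coincide. Comparing signature entries for equality is what allows the comparison to be carried out on encrypted values.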
- get_next_expected_function_name(self: pyhelayers.RecordLinkageManager) str #
Gets the name of the function that the protocol expects to be called next, given the current state of this RecordLinkageManager.
- get_num_matched_records(self: pyhelayers.RecordLinkageManager, report_blocked: bool = False) Tuple[int, int] #
Finds matching pairs of records from both parties’ tables and returns the number of matches and the number of blocked records. To be called after all the above steps of the protocol have been carried out.
- Parameters:
report_blocked – indicates whether to include “blocked” records (records that had been matched with more than one record) in the report
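To make the notion of a “blocked” record concrete, the sketch below counts matches and blocked records from a list of candidate pairs. How the library derives these numbers from its Similarity Graph is not documented here, so the counting rule (a pair counts as a match only if both of its records appear in exactly one pair) and the function name `count_matches_and_blocked` are assumptions.

```python
from collections import Counter


def count_matches_and_blocked(pairs):
    """Return (number of one-to-one matches, number of blocked records).

    `pairs` holds (own_record_id, other_record_id) candidate matches.
    A record is treated as blocked if it appears in more than one pair.
    """
    unique_pairs = set(pairs)
    own_degree = Counter(own for own, _ in unique_pairs)
    other_degree = Counter(other for _, other in unique_pairs)

    matches = [
        (own, other) for own, other in unique_pairs
        if own_degree[own] == 1 and other_degree[other] == 1
    ]
    blocked = [r for r, d in own_degree.items() if d > 1]
    blocked += [r for r, d in other_degree.items() if d > 1]
    return len(matches), len(blocked)


# Own record 2 was matched with two records of the other side, so it is blocked.
num_matches, num_blocked = count_matches_and_blocked(
    [(0, 10), (1, 11), (2, 12), (2, 13)])
```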
- get_num_of_records(self: pyhelayers.RecordLinkageManager) int #
Get the number of processed records.
- init_records_from_file(self: pyhelayers.RecordLinkageManager, csv_path: str, num_of_samples_to_take: int) None #
Reads records from a file and performs pre-processing and encryption.
- Parameters:
csv_path – File path of csv file that contains the table records
num_of_samples_to_take – Number of records to read. The default is -1 implying all records will be read.
- match_records_by_equal_rule(self: pyhelayers.RecordLinkageManager, package_own: pyhelayers.RecordLinkagePackage, package_other: pyhelayers.RecordLinkagePackage) None #
Given two doubly encrypted RecordLinkagePackage objects, finds the matching records. If the current rule contains fields with the RL_RULE_SIMILAR rule type, the resulting matches are kept in a temporary data structure, to be processed at the end of the iteration. Otherwise, they are used to update the Similarity Graph.
- Parameters:
package_own – the RecordLinkagePackage that originated from this RecordLinkageManager, after the other participant applied its secret key to it
package_other – the RecordLinkagePackage that originated from the other participant’s RecordLinkageManager, after this RecordLinkageManager’s secret key was applied to it
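The idea behind matching two doubly encrypted packages can be sketched with a toy commutative cipher. As an assumption for illustration, the sketch replaces EC points with integers modulo a prime and represents a package as a plain list of encrypted values; the helper names and the small fixed keys are hypothetical, and real keys would be large random numbers.

```python
import hashlib

# Toy group (assumption): integers modulo the Mersenne prime 2**127 - 1.
P = 2**127 - 1


def hash_to_group(value):
    """Hash a string into the toy group."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return (int.from_bytes(digest, "big") % P) or 1


def encrypt_values(values, key):
    """First encryption layer, applied by the owner of the records."""
    return [pow(hash_to_group(v), key, P) for v in values]


def apply_key(package, key):
    """Second encryption layer, applied by the other participant."""
    return [pow(x, key, P) for x in package]


def match_double_encrypted(package_own, package_other):
    """Pair up record indices whose doubly encrypted values coincide."""
    index_other = {x: j for j, x in enumerate(package_other)}
    return [(i, index_other[x])
            for i, x in enumerate(package_own) if x in index_other]


key_a, key_b = 5, 11  # illustrative keys only
records_a = ["alice|1990", "bob|1985", "carol|1970"]
records_b = ["dave|2000", "bob|1985", "alice|1990"]

package_own = apply_key(encrypt_values(records_a, key_a), key_b)
package_other = apply_key(encrypt_values(records_b, key_b), key_a)
matches = match_double_encrypted(package_own, package_other)
```

Since both packages carry both keys, equal records yield equal values even though neither party ever sees the other's plaintext fields or secret key.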
- match_records_by_similar_rule(self: pyhelayers.RecordLinkageManager, package_own: pyhelayers.RecordLinkagePackage, package_other: pyhelayers.RecordLinkagePackage) None #
Given two doubly encrypted RecordLinkagePackage objects, finds the matching records. If the current rule contains fields with the RL_RULE_EQUAL rule type, the resulting matches are intersected with the matches from the RL_RULE_EQUAL part of the protocol. The matches are then used to update the Similarity Graph.
- Parameters:
package_own – the RecordLinkagePackage that originated from this RecordLinkageManager, after the other participant applied its secret key to it
package_other – the RecordLinkagePackage that originated from the other participant’s RecordLinkageManager, after this RecordLinkageManager’s secret key was applied to it
- report_matched_records(self: pyhelayers.RecordLinkageManager, report_blocked: bool = False) Tuple[int, int] #
Finds and prints matching pairs of records from both parties’ tables and returns the number of matches and the number of blocked records. To be called after all the above steps of the protocol have been carried out.
- Parameters:
report_blocked – indicates whether to include “blocked” records (records that had been matched with more than one record) in the report
- report_matched_records_along_with_other_side_records(self: pyhelayers.RecordLinkageManager, rlm: pyhelayers.RecordLinkageManager, report_blocked: bool = False) Tuple[int, int] #
Finds and prints matching pairs of records from both parties’ tables. To be called after all the above steps of the protocol have been carried out. The printout also uses the other side’s RecordLinkageManager to extract the information about the other side’s matched records. This is for debugging purposes only, since no side is expected to have access to both RecordLinkageManagers.
- Parameters:
rlm – “other” party’s RecordLinkageManager.
report_blocked – indicates whether to include “blocked” records (records that had been matched with more than one record) in the report
- set_current_rule(self: pyhelayers.RecordLinkageManager, rule: pyhelayers.RecordLinkageRule) None #
Sets the rule for the following iteration of the protocol. If the rule contains fields with the RL_RULE_EQUAL rule type, the user must first call the functions related to this rule type (encrypt_fields_for_equal_rule, apply_secret_key_to_records, match_records_by_equal_rule). At each iteration, the given rule is applied only to records that are not yet matched (as indicated by the Similarity Graph).
- Parameters:
rule – the rule to apply in this iteration. This rule will also be added to the output RecordLinkagePackage object, and to this RecordLinkageManager object.
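One iteration for a rule that contains only RL_RULE_EQUAL fields might be driven as in the sketch below. The method names are the ones documented above, but the exact call order is inferred from their descriptions and is an assumption; get_next_expected_function_name can be used to confirm it at each step. The helper `run_equal_rule_iteration` is hypothetical, and holding both managers in one process is for illustration only, since in a real deployment each package would be transferred between the two participants.

```python
def run_equal_rule_iteration(rlm_a, rlm_b, rule):
    """Drive one RL_RULE_EQUAL iteration between two manager-like objects.

    rlm_a and rlm_b are expected to expose the RecordLinkageManager methods
    used below; rule is the rule both sides apply in this iteration.
    """
    # Both participants agree on the rule for this iteration.
    rlm_a.set_current_rule(rule)
    rlm_b.set_current_rule(rule)

    # Each side encrypts its own records with its own secret key.
    package_a = rlm_a.encrypt_fields_for_equal_rule()
    package_b = rlm_b.encrypt_fields_for_equal_rule()

    # Each side applies its secret key to the package it received.
    # The package is modified in place (the method returns None).
    rlm_b.apply_secret_key_to_records(package_a)
    rlm_a.apply_secret_key_to_records(package_b)

    # Each side matches its own doubly encrypted package against the other's.
    rlm_a.match_records_by_equal_rule(package_a, package_b)
    rlm_b.match_records_by_equal_rule(package_b, package_a)

    return rlm_a.get_num_matched_records(), rlm_b.get_num_matched_records()
```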
-
class RecordLinkageManager#
RecordLinkageManager class for managing the Privacy-Preserving Record-Linkage (PPRL) protocol.
For an example of how to use this class, see examples/cpp/er/er_basic_example.cpp
Subclassed by helayers::er::RecordLinkageMockManager
Public Functions
-
RecordLinkageManager(const RecordLinkageConfig &config)#
Constructs a new RecordLinkageManager object.
- Parameters:
config – The Record-Linkage configuration with record field definitions and various tunings of the Record-Linkage algorithm and heuristics.
- Throws:
runtime_error – if RecordLinkageConfig is not fully initialized
-
inline virtual ~RecordLinkageManager()#
-
virtual void initRecordsFromFile(std::string csvPath, int numOfSamplesToTake = -1)#
Reads records from a file and performs pre-processing.
- Parameters:
csvPath – File path of csv file that contains the table records
numOfSamplesToTake – Number of records to read. The default is -1 implying all records will be read.
-
virtual void initRecordsFromStream(std::istream &stream, int numOfSamplesToTake = -1)#
Reads records from a stream and performs pre-processing.
- Parameters:
stream – Stream to read records from
numOfSamplesToTake – Number of records to read. The default is -1 implying all records will be read.
-
inline int getNumOfRecords() const#
Get the number of processed records.
-
void setCurrentRule(const RecordLinkageRule &rule)#
Sets the rule for the following iteration of the protocol.
If the rule contains fields with the RL_RULE_EQUAL rule type, the user must first call the functions related to this rule type (encryptFieldsForEqualRule, applySecretKeyToRecords, matchRecordsByEqualRule). At each iteration, the given rule is applied only to records that are not yet matched (as indicated by the Similarity Graph).
- Parameters:
rule – the rule to apply in this iteration. This rule will also be added to the output RecordLinkagePackage object, and to this RecordLinkageManager object.
- Throws:
runtime_error – if called in the middle of an iteration run.
-
RecordLinkagePackage encryptFieldsForEqualRule()#
Concatenates the content of the fields that RL_RULE_EQUAL applies to, hashes this value, transforms it to an EC point, and multiplies it by the secret key of this RecordLinkageManager.
Returns a RecordLinkagePackage containing the encrypted data to be transferred to the other participant in the protocol.
- Throws:
runtime_error – if called when not expected by the protocol
-
void matchRecordsByEqualRule(const RecordLinkagePackage &packageOwn, const RecordLinkagePackage &packageOther)#
Given two doubly encrypted RecordLinkagePackage objects, finds the matching records.
If the current rule contains fields with the RL_RULE_SIMILAR rule type, the resulting matches are kept in a temporary data structure, to be processed at the end of the iteration. Otherwise, they are used to update the Similarity Graph.
- Parameters:
packageOwn – the RecordLinkagePackage that originated from this RecordLinkageManager, after the other participant applied its secret key to it
packageOther – the RecordLinkagePackage that originated from the other participant’s RecordLinkageManager, after this RecordLinkageManager’s secret key was applied to it
- Throws:
runtime_error – if the rule attached to the given RecordLinkagePackage doesn’t match the current rule in this RecordLinkageManager object.
runtime_error – if called when not expected by the protocol
-
RecordLinkagePackage encryptFieldsForSimilarRule()#
If the current rule contains fields with the RL_RULE_EQUAL rule type, the following steps are applied only to records with matches from the previous steps; otherwise they are applied to all non-matched records.
Generates shingle sets for each record, generates min-hashes for these shingle sets, transforms the hashes to EC points, and multiplies them by the secret key of this RecordLinkageManager. Returns a RecordLinkagePackage containing the encrypted data to be transferred to the other participant in the protocol.
- Throws:
runtime_error – if called when not expected by the protocol
-
void matchRecordsBySimilarRule(const RecordLinkagePackage &packageOwn, const RecordLinkagePackage &packageOther)#
Given two doubly encrypted RecordLinkagePackage objects, finds the matching records.
If the current rule contains fields with the RL_RULE_EQUAL rule type, the resulting matches are intersected with the matches from the RL_RULE_EQUAL part of the protocol. The matches are then used to update the Similarity Graph.
- Parameters:
packageOwn – the RecordLinkagePackage that originated from this RecordLinkageManager, after the other participant applied its secret key to it
packageOther – the RecordLinkagePackage that originated from the other participant’s RecordLinkageManager, after this RecordLinkageManager’s secret key was applied to it
- Throws:
runtime_error – if the rule attached to the given RecordLinkagePackage doesn’t match the current rule in this RecordLinkageManager object.
runtime_error – if called when not expected by the protocol
-
void applySecretKeyToRecords(RecordLinkagePackage &packageOther)#
Given a RecordLinkagePackage from the other participant in the protocol, multiplies each EC point by the secret key of this RecordLinkageManager.
- Parameters:
packageOther – the RecordLinkagePackage from the other participant
- Throws:
runtime_error – if the rule attached to the given RecordLinkagePackage doesn’t match the current rule in this RecordLinkageManager object.
runtime_error – if called when not expected by the protocol
-
std::pair<int, int> reportMatchedRecords(bool reportBlocked = false)#
Finds and prints matching pairs of records from both parties’ tables, as indicated by the current state of the Similarity Graph.
Returns the number of matches and number of blocked records.
- Parameters:
reportBlocked – indicates whether to include “blocked” records (records that had been matched with more than one record) in the report
-
std::pair<int, int> getNumMatchedRecords(bool reportBlocked = false)#
Finds matching pairs of records from both parties’ tables, as indicated by the current state of the Similarity Graph.
Returns the number of matches and number of blocked records.
- Parameters:
reportBlocked – indicates whether to include “blocked” records (records that had been matched with more than one record) in the report
-
std::pair<int, int> reportMatchedRecordsAlongWithOtherSideRecords(const RecordLinkageManager &rlmOther, bool reportBlocked = false)#
Finds and prints matching pairs of records from both parties’ tables.
To be called after all the above steps of the protocol have been carried out. Returns the number of matches and the number of blocked records. The printout also uses the other side’s RecordLinkageManager to extract the information about the other side’s matched records. This is for debugging purposes only, since no side is expected to have access to both RecordLinkageManagers.
- Parameters:
rlmOther – “other” party’s RecordLinkageManager.
reportBlocked – indicates whether to include “blocked” records (records that had been matched with more than one record) in the report
-
std::string getNextExpectedFunctionName() const#
Gets the name of the function that the protocol expects to be called next, given the current state of this RecordLinkageManager.
-