Ededup Python Transform
Please see the set of transform project conventions for details on general project conventions, transform configuration, testing and IDE setup.
Also see here for details of the implementation.
Summary
This is a Python version of the ededup transform.
Configuration and Command Line Options
See common ededup parameters
Running
Launched Command Line Options
The following command line arguments are available in addition to the options provided by the python launcher.
  --ededup_doc_column EDEDUP_DOC_COLUMN
                        name of the column containing the document
  --ededup_doc_id_column EDEDUP_DOC_ID_COLUMN
                        name of the column containing the document id
  --ededup_use_snapshot EDEDUP_USE_SNAPSHOT
                        flag to continue from a snapshot
  --ededup_snapshot_directory EDEDUP_SNAPSHOT_DIRECTORY
                        location of the snapshot files
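To make the shape of these options concrete, the following is a minimal standalone sketch of how the four flags could be declared and parsed with Python's standard `argparse` module. It is not the transform's actual launcher code: the `build_parser` and `parse_bool` names, the default values, and the boolean parsing rules are all assumptions made for illustration. Only the flag names and help texts come from the listing above.

```python
import argparse


def parse_bool(value: str) -> bool:
    """Hypothetical helper: interpret common true/false spellings."""
    lowered = value.strip().lower()
    if lowered in {"true", "t", "yes", "y", "1"}:
        return True
    if lowered in {"false", "f", "no", "n", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the ededup flags listed above.

    The defaults are placeholders chosen for this sketch, not the
    transform's real defaults.
    """
    parser = argparse.ArgumentParser(description="Illustrative ededup options")
    parser.add_argument(
        "--ededup_doc_column",
        type=str,
        default="contents",  # placeholder default (assumption)
        help="name of the column containing document",
    )
    parser.add_argument(
        "--ededup_doc_id_column",
        type=str,
        default="document_id",  # placeholder default (assumption)
        help="name of the column containing document id",
    )
    parser.add_argument(
        "--ededup_use_snapshot",
        type=parse_bool,
        default=False,
        help="flag to continue from snapshot",
    )
    parser.add_argument(
        "--ededup_snapshot_directory",
        type=str,
        default=None,
        help="location of snapshot files",
    )
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the sketch runs without a real command line.
    args = build_parser().parse_args(
        ["--ededup_doc_column", "text", "--ededup_use_snapshot", "true"]
    )
    print(vars(args))
```

Because `--ededup_use_snapshot` is shown with a value placeholder in the listing, the sketch treats it as an option that takes an explicit value such as `true` or `false` rather than as a bare switch. Check the transform's own help output for the accepted values.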
To use the transform image to transform your data, please refer to the running images quickstart, substituting the name of this transform image and runtime as appropriate.