Malware Transform
Please see the set of transform project conventions for details on general project conventions, transform configuration, testing and IDE set up.
Summary
This filter scans the 'contents' column of an input table with ClamAV and outputs a corresponding table that contains a 'virus_detection' column. Both column names are defaults and can be configured.
If a virus is detected, the 'virus_detection' column contains the name of the detected virus signature; otherwise, it is null.
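The behavior above is a per-row mapping from document bytes to either a signature name or null. The sketch below shows one way to implement that mapping. It assumes a clamd daemon listening on the unix socket used in the Mac set-up, and it talks to that daemon with clamd's zINSTREAM command. The function names (encode_instream, parse_reply, scan_bytes, detect_column) are hypothetical, and this is not necessarily how MalwareTransform itself calls ClamAV. detect_column takes the scanner as a parameter, so it can be exercised without a running daemon.

```python
import socket
import struct
from typing import Callable, Iterable, List, Optional

# Assumption: socket path taken from the Mac set-up described in this README.
CLAMD_SOCKET = "/var/run/clamav/clamd.ctl"


def encode_instream(data: bytes, chunk_size: int = 8192) -> bytes:
    """Frame a payload for clamd's zINSTREAM command.

    Each chunk is prefixed with its length as a 4-byte big-endian unsigned
    int; a zero-length chunk terminates the stream.
    """
    frames = [b"zINSTREAM\0"]
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        frames.append(struct.pack("!I", len(chunk)) + chunk)
    frames.append(struct.pack("!I", 0))
    return b"".join(frames)


def parse_reply(reply: bytes) -> Optional[str]:
    """Map a clamd reply to a signature name (infected) or None (clean)."""
    text = reply.rstrip(b"\0").decode()
    if text.endswith(" FOUND"):
        # "stream: <signature> FOUND" -> "<signature>"
        return text.split(": ", 1)[-1][:-len(" FOUND")]
    if text.endswith("OK"):
        return None
    raise RuntimeError(f"clamd error: {text}")


def scan_bytes(data: bytes, path: str = CLAMD_SOCKET) -> Optional[str]:
    """Scan one document through a running clamd on a unix socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(encode_instream(data))
        reply = b""
        # The z-prefixed command makes clamd NUL-terminate its reply.
        while not reply.endswith(b"\0"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            reply += chunk
    return parse_reply(reply)


def detect_column(
    contents: Iterable[bytes],
    scan: Callable[[bytes], Optional[str]] = scan_bytes,
) -> List[Optional[str]]:
    """Values for the output column: a signature name, or None (null) if clean."""
    return [scan(doc) for doc in contents]
```

With a daemon running, detect_column(rows) scans each row through the socket. In tests, pass a fake scanner such as lambda doc: None.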
Pre-requisites for Mac
To test and run this transform locally, we use a unix socket shared with a docker container.
However, Docker for Mac does not support sharing a unix socket with a container.
For Mac users, ClamAV is therefore set up locally by running make venv.
If that script does not work for you, make sure that the clamd command is installed and that it runs with a local unix socket at /var/run/clamav/clamd.ctl.
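To confirm that clamd is running and listening on /var/run/clamav/clamd.ctl, you can send it clamd's PING command, which a healthy daemon answers with PONG. The helper below is a hypothetical standard-library sketch and is not part of this transform. It assumes a unix-like OS where AF_UNIX sockets are available.

```python
import socket

# Assumption: socket path from the set-up described above.
CLAMD_SOCKET = "/var/run/clamav/clamd.ctl"


def clamd_ping(path: str = CLAMD_SOCKET, timeout: float = 5.0) -> bool:
    """Return True if a clamd daemon answers PING on the given unix socket."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect(path)
            # The 'z' prefix makes the command and its reply NUL-terminated.
            sock.sendall(b"zPING\0")
            reply = b""
            while not reply.endswith(b"\0"):
                chunk = sock.recv(64)
                if not chunk:
                    break
                reply += chunk
    except OSError:
        # Missing socket, refused connection, or timeout: treat as "not running".
        return False
    return reply.rstrip(b"\0") == b"PONG"


if __name__ == "__main__":
    if clamd_ping():
        print("clamd is up")
    else:
        print(f"clamd is not answering on {CLAMD_SOCKET}")
```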
Example of a manual set-up on Mac:
- Install ClamAV with Homebrew (brew install clamav)
- Copy and edit the config files:
    cp $(brew --prefix)/etc/clamav/clamd.conf.sample $(brew --prefix)/etc/clamav/clamd.conf
    sed -i '' -e 's/^Example/# Example/' $(brew --prefix)/etc/clamav/clamd.conf
    echo "DatabaseDirectory /var/lib/clamav" >> $(brew --prefix)/etc/clamav/clamd.conf
    echo "LocalSocket /var/run/clamav/clamd.ctl" >> $(brew --prefix)/etc/clamav/clamd.conf
    cp $(brew --prefix)/etc/clamav/freshclam.conf.sample $(brew --prefix)/etc/clamav/freshclam.conf
    sed -i '' -e 's/^Example/# Example/' $(brew --prefix)/etc/clamav/freshclam.conf
    echo "DatabaseDirectory /var/lib/clamav" >> $(brew --prefix)/etc/clamav/freshclam.conf
- Create a directory for the local unix socket (/var/run/clamav)
- Create a directory for the ClamAV database (/var/lib/clamav)
- Update the ClamAV database (freshclam)
- Edit venv/bin/activate and add lines that start clamd, so that clamd is started when you run source venv/bin/activate
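The sed and echo commands in the set-up list make two kinds of change to each sample config. They comment out the Example line, which ClamAV expects you to remove before it will use the file. They also append DatabaseDirectory and LocalSocket settings. The hypothetical helper below reproduces that edit on a string. It is useful for checking what the resulting config should contain, but it is only an illustration and is not part of the transform's tooling.

```python
import re
from typing import Dict


def patch_sample_conf(sample: str, settings: Dict[str, str]) -> str:
    """Mirror the set-up steps: comment out 'Example' lines, then append settings."""
    # Equivalent of: sed -e 's/^Example/# Example/' (applied to every line).
    patched = re.sub(r"^Example", "# Example", sample, flags=re.MULTILINE)
    # Make sure appended settings start on a fresh line.
    if patched and not patched.endswith("\n"):
        patched += "\n"
    # Equivalent of: echo "Key Value" >> file, once per setting.
    for key, value in settings.items():
        patched += f"{key} {value}\n"
    return patched


if __name__ == "__main__":
    sample = "# Comment or remove the line below.\nExample\n"
    print(patch_sample_conf(sample, {
        "DatabaseDirectory": "/var/lib/clamav",
        "LocalSocket": "/var/run/clamav/clamd.ctl",
    }), end="")
```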
Configuration and Command Line Options
The set of dictionary keys holding MalwareTransform configuration values is as follows:
- malware_input_column - specifies the name of the input column to scan (default: contents)
- malware_output_column - specifies the name of the output column that receives the detected virus signature name (default: virus_detection)
Metadata Fields
As shown in the output of a local run of the malware transform, the metadata contains the following statistics:
* Global statistics:
  * infected: total number of documents (rows) in which malware was detected.
  * clean: total number of documents (rows) in which no malware was detected.
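Both global counters can be derived from the output column alone, because every row is either infected (a signature name) or clean (null). The sketch below assumes the detections are available as a Python sequence. malware_stats is a hypothetical name, and the transform's real metadata format may differ.

```python
from typing import Dict, Optional, Sequence


def malware_stats(detections: Sequence[Optional[str]]) -> Dict[str, int]:
    """Count infected and clean rows; None in the output column means clean."""
    infected = sum(1 for signature in detections if signature is not None)
    return {"infected": infected, "clean": len(detections) - infected}


if __name__ == "__main__":
    print(malware_stats([None, "Eicar-Test-Signature", None]))
```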
Running
Launched Command Line Options
The following command line arguments are available in addition to the options provided by the python launcher.
--malware_input_column MALWARE_INPUT_COLUMN
input column name
--malware_output_column MALWARE_OUTPUT_COLUMN
output column name
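Here is a minimal argparse sketch of these two options. It assumes they map one-to-one onto the configuration keys described above, with the same defaults. parse_malware_args is hypothetical, and the real transform registers these options through its launcher, which is not shown here. The sketch uses parse_known_args so that options belonging to the launcher pass through without raising an error.

```python
import argparse
from typing import Dict, List, Optional


def parse_malware_args(argv: Optional[List[str]] = None) -> Dict[str, str]:
    """Parse the two malware options into a MalwareTransform-style config dict."""
    parser = argparse.ArgumentParser(description="malware transform options (sketch)")
    parser.add_argument("--malware_input_column", default="contents",
                        help="input column name")
    parser.add_argument("--malware_output_column", default="virus_detection",
                        help="output column name")
    # Unrecognized (e.g. launcher) options are returned in `_rest`, not rejected.
    args, _rest = parser.parse_known_args(argv)
    return {
        "malware_input_column": args.malware_input_column,
        "malware_output_column": args.malware_output_column,
    }
```

For example, parse_malware_args(["--malware_input_column", "text"]) sets malware_input_column to "text" and leaves malware_output_column at its default, "virus_detection".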
Running the samples
To run the samples, use the following make targets:
- run-cli-sample - runs src/malware_transform_python.py using command line args
- run-local-sample - runs src/malware_local.py
- run-local-python-sample - runs src/malware_local_python.py
These targets will activate the virtual environment and set up any configuration needed.
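Use the -n option of make to see the details of what is done to run a sample.
For example, make -n run-cli-sample prints the commands for the CLI sample without executing them.
Then run the target without -n to see the results of the transform.
Transforming data using the transform image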
To use the transform image to transform your data, please refer to the running images quickstart, substituting the name of this transform image and runtime as appropriate.