Malware Transform

Please see the set of transform project conventions for details on general project conventions, transform configuration, testing and IDE set up.

Summary

This transform scans the 'contents' column of each input table with ClamAV and outputs a corresponding table that contains a 'virus_detection' column. Both column names are defaults and can be changed.

If a virus is detected, the 'virus_detection' column contains the detected virus signature name; otherwise null.
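
The behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the transform's actual code: `add_detection_column`, `fake_scan` and the marker string are hypothetical names, the rows are plain dictionaries rather than a real table, and the scanner is a stand-in that does not call ClamAV. Keeping the other columns of each row unchanged is also an assumption.

```python
from typing import Callable, Optional

# Hypothetical scanner type: takes raw bytes, returns a signature name or None.
Scanner = Callable[[bytes], Optional[str]]


def add_detection_column(rows, scan: Scanner,
                         input_column="contents",
                         output_column="virus_detection"):
    """Return new rows with the scan result stored under output_column."""
    result = []
    for row in rows:
        data = row[input_column]
        if isinstance(data, str):
            data = data.encode("utf-8")
        # Copy the row so the input is left untouched.
        result.append({**row, output_column: scan(data)})
    return result


def fake_scan(data: bytes) -> Optional[str]:
    """Stand-in for ClamAV: flags a made-up marker string, nothing else."""
    return "Fake.Test.Signature" if b"FAKE-MALWARE-MARKER" in data else None


rows = [
    {"contents": "print('hello')"},
    {"contents": "xx FAKE-MALWARE-MARKER xx"},
]
scanned = add_detection_column(rows, fake_scan)
```

In a real run the scanner would be backed by clamd over the unix socket; the stand-in only makes the column logic easy to follow.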

Pre-requisites for Mac

For testing and running this transform locally, we use a unix socket shared with a docker container. However, Docker for Mac doesn't support a shared unix socket. For Mac users, ClamAV is set up by running make venv. If that script doesn't work for you, please ensure that the clamd command is installed and that it runs with a local unix socket: /var/run/clamav/clamd.ctl.

Example of a manual setup on Mac:

  1. Install ClamAV with Homebrew
    brew install clamav
    
  2. Copy and edit config files.
    cp $(brew --prefix)/etc/clamav/clamd.conf.sample $(brew --prefix)/etc/clamav/clamd.conf
    sed -i '' -e 's/^Example/# Example/' $(brew --prefix)/etc/clamav/clamd.conf
    echo "DatabaseDirectory /var/lib/clamav" >> $(brew --prefix)/etc/clamav/clamd.conf
    echo "LocalSocket /var/run/clamav/clamd.ctl" >> $(brew --prefix)/etc/clamav/clamd.conf
    cp $(brew --prefix)/etc/clamav/freshclam.conf.sample $(brew --prefix)/etc/clamav/freshclam.conf
    sed -i '' -e 's/^Example/# Example/' $(brew --prefix)/etc/clamav/freshclam.conf
    echo "DatabaseDirectory /var/lib/clamav" >> $(brew --prefix)/etc/clamav/freshclam.conf
    
  3. Create a directory for a local unix socket
    sudo mkdir -p /var/run/clamav
    sudo chown $(id -u):$(id -g) /var/run/clamav
    
  4. Create a directory for the ClamAV database
    sudo mkdir -p /var/lib/clamav
    sudo chown $(id -u):$(id -g) /var/lib/clamav
    
  5. Update the ClamAV database
    freshclam
    
  6. Edit venv/bin/activate and add the following lines so that clamd starts when you run source venv/bin/activate
    if [ ! -e /var/run/clamav/clamd.ctl ]; then
        clamd --config-file=$(brew --prefix)/etc/clamav/clamd.conf
    fi
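
After the steps above, one way to confirm that the socket is in place is a small Python check. This is a sketch under the assumption that the socket path is the one used in the steps; `is_unix_socket` is a hypothetical helper, and it only verifies that a unix socket exists at that path, not that clamd is answering on it.

```python
import os
import stat

CLAMD_SOCKET = "/var/run/clamav/clamd.ctl"  # path used in the setup steps


def is_unix_socket(path: str = CLAMD_SOCKET) -> bool:
    """Return True if path exists and is a unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        # Missing path or no permission: treat as "not ready".
        return False


if __name__ == "__main__":
    print("clamd socket present:", is_unix_socket())
```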
    

Configuration and Command Line Options

The following dictionary keys hold the MalwareTransform configuration values:

  • malware_input_column - specifies the input column's name to scan. (default: contents)
  • malware_output_column - specifies the output column's name of the detected virus signature name. (default: virus_detection)
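
As an illustration of how these keys and defaults fit together, the sketch below merges user-supplied values over the documented defaults. `resolve_config` and the column name `document_text` are hypothetical; rejecting unknown keys is a design choice of this sketch, not documented behaviour of the transform.

```python
DEFAULTS = {
    "malware_input_column": "contents",
    "malware_output_column": "virus_detection",
}


def resolve_config(overrides=None):
    """Merge user-supplied keys over the documented defaults."""
    config = {**DEFAULTS, **(overrides or {})}
    unknown = set(config) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown configuration keys: {sorted(unknown)}")
    return config


# Scan a differently named column, keep the default output column.
config = resolve_config({"malware_input_column": "document_text"})
```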

Metadata Fields

As shown in the output of a local run of the malware transform, the metadata contains the following global statistics:

  • infected: total number of documents (rows) in which malware was detected.
  • clean: total number of documents (rows) in which no malware was detected.
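
The two counters can be derived directly from the detection column. The following is a minimal sketch, assuming the column is available as a list in which `None` stands for null; `detection_stats` is a hypothetical helper, not part of the transform.

```python
def detection_stats(detections):
    """Count rows with and without a detected signature."""
    infected = sum(1 for d in detections if d is not None)
    return {"infected": infected, "clean": len(detections) - infected}


stats = detection_stats([None, "Fake.Test.Signature", None])
```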

Running

Launched Command Line Options

The following command line arguments are available in addition to the options provided by the python launcher.

  --malware_input_column MALWARE_INPUT_COLUMN
                        input column name
  --malware_output_column MALWARE_OUTPUT_COLUMN
                        output column name
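
To show how these two options behave, here is a standalone `argparse` sketch. It is not the launcher's own parser: `build_parser` is a hypothetical function, and the defaults are taken from the configuration section above.

```python
import argparse


def build_parser():
    """Parser with only the two malware options (launcher options omitted)."""
    parser = argparse.ArgumentParser(description="sketch of the malware transform options")
    parser.add_argument("--malware_input_column", default="contents",
                        help="input column name")
    parser.add_argument("--malware_output_column", default="virus_detection",
                        help="output column name")
    return parser


# Override the input column; the output column keeps its default.
args = build_parser().parse_args(["--malware_input_column", "document_text"])
config = vars(args)
```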

Running the samples

To run the samples, use the following make targets:

  • run-cli-sample - runs src/malware_transform_python.py using command line args
  • run-local-sample - runs src/malware_local.py
  • run-local-python-sample - runs src/malware_local_python.py

These targets activate the virtual environment and set up any configuration needed. Use the -n option of make to see the details of what is done to run the sample.

For example, run

    make run-cli-sample
    ...

and then list the output directory to see the results of the transform:

    ls output

Transforming data using the transform image

To use the transform image to transform your data, please refer to the running images quickstart, substituting the name of this transform image and runtime as appropriate.