Storage

Function Fuse Storage classes define how the results produced by Nodes are stored. A Storage class can be attached to any Workflow using the set_storage() method. We use storage_factory to instantiate a storage object from a configuration dictionary, where the “kind” key indicates the storage class to use and “options” provides class-specific options:

from functionfuse.storage import storage_factory

opt = {
    "kind": "file",
    "options": {
        "path": "storage"
    }
}

storage = storage_factory(opt)

The basic template for using a storage object looks as follows:

from functionfuse.backends.builtin.localback import LocalWorkflow

the_workflow_name = "operations"
local_workflow = LocalWorkflow(node1, workflow_name=the_workflow_name)

local_workflow.set_storage(storage)
_ = local_workflow.run()

This attaches the FileStorage object declared above to the LocalWorkflow; together, the backend and the storage determine how and where node outputs are saved.
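Here node1 is a named workflow Node. For context, a minimal sketch of how it might be defined, assuming the @workflow decorator and set_name() naming API from the Function Fuse quickstart:

from functionfuse import workflow

@workflow
def two_sum(a, b):
    return a + b

# The name assigned to the node is the key under which its result is stored
node1 = two_sum(1, 1).set_name("node1")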

After running a workflow, instantiating a storage class with identical options lets you list and read task results from storage:

from functionfuse.storage import storage_factory

opt = {
    "kind": "file",
    "options": {
        "path": "storage"
    }
}

storage = storage_factory(opt)
all_tasks = storage.list_tasks(workflow_name=the_workflow_name, pattern="*")
print("All graph node names: ", all_tasks)

node2_result = storage.read_task(workflow_name=the_workflow_name, task_name="node2")
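
Results can also be read in bulk; the loop below uses only the list_tasks() and read_task() calls shown above:

# Read every stored result for the workflow
for task_name in all_tasks:
    result = storage.read_task(workflow_name=the_workflow_name, task_name=task_name)
    print(task_name, "->", result)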

Serializers

Storage can utilize Serializers attached to the storage object. Serializers let you flexibly assign custom serialization approaches for particular object types to any storage object. For example, to serialize Dask arrays to HDF5 files (rather than pickling them to a file), register the DaskArraySerializer:

from functionfuse.storage import storage_factory
from functionfuse.serializers.daskarray import DaskArraySerializer

opt = {
    "kind": "file",
    "options": {
        "path": "storage"
    }
}
storage = storage_factory(opt)
storage.register_persistent_serializers(DaskArraySerializer)
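
To illustrate the shape of the interface, here is a hypothetical custom serializer; the class layout and method signatures below are assumptions for illustration only, not the actual Function Fuse serializer API. It follows the description in the Protocols section below: a serializer targets a specific object type and supplies pickle and unpickle functions that receive the storage's protocols dictionary.

import numpy as np

class NumpyArraySerializer:
    # Hypothetical interface: attribute and method names are illustrative
    type = np.ndarray  # the object type this serializer handles

    @staticmethod
    def pickle(obj, filename, protocols):
        # Write .npy data through the storage's file protocol (see Protocols below)
        with protocols["FILE_PROTOCOL"](filename, "wb") as f:
            np.save(f, obj)

    @staticmethod
    def unpickle(filename, protocols):
        with protocols["FILE_PROTOCOL"](filename, "rb") as f:
            return np.load(f)

storage.register_persistent_serializers(NumpyArraySerializer)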

Protocols

Different storage classes may use different Protocols for opening, reading from, and writing to files. To make serializers, and other functions that interface with a storage object, aware of the protocols a storage class uses, each storage class implements a protocols dictionary that is passed to the pickle and unpickle functions used by serializers. Currently this dictionary has two default keys (FILE_PROTOCOL and S3_PROTOCOL) that custom pickle functions can reference to access the file protocols used by the storage class.
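
A minimal sketch of a custom pickle function that consults the protocols dictionary; only the FILE_PROTOCOL and S3_PROTOCOL key names come from the description above, while the function signature, the use of the names as string keys, and the assumption that each key maps to an open()-like callable are hypothetical:

import pickle

def custom_pickle(obj, filename, protocols):
    # Prefer the S3 protocol if the storage class provides one (assumed
    # to behave like the builtin open); otherwise fall back to local files.
    opener = protocols.get("S3_PROTOCOL", protocols["FILE_PROTOCOL"])
    with opener(filename, "wb") as f:
        pickle.dump(obj, f)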

Storage Classes