Backend Development
Function Fuse backends need to define how the DAG specified by the frontend will be executed. We recommend that developers of a new backend first familiarize themselves with the source code for the Local Backend, then with the Ray Backend for an example based on asynchronous processing with futures.
Some of the basic functions and interfaces used by all of the backends are introduced below.
Basic Workflow Functionality
The BaseWorkflow class is the parent class for all backends, and provides functions for backends to rely on:
_find_roots() – called when initializing a backend. Given a set of leaf nodes (i.e. output nodes of a DAG), _find_roots() follows the chain of node inputs to find root nodes (i.e. input nodes to a DAG that do not have another node as an input).
graph_traversal() – traverses the DAG of nodes from the root nodes identified by _find_roots(), yielding each node in turn. This function is the basis for executing each node using a backend.
find_nodes(pattern) – traverses the DAG and returns a list of nodes with names matching the pattern provided.
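The three functions above can be illustrated with a small, self-contained sketch. It is not the library's BaseWorkflow: the Node and SketchBaseWorkflow classes are hypothetical stand-ins, the traversal is one possible topological ordering (Kahn's algorithm), and treating the pattern as a regular expression is an assumption.

```python
import re
from collections import deque


class Node:
    """Hypothetical stand-in for a frontend node: a name, a function, input nodes."""

    def __init__(self, name, func, inputs=()):
        self.name = name
        self.func = func
        self.inputs = list(inputs)


class SketchBaseWorkflow:
    """Illustrative only; the method names mirror the text, the bodies are assumed."""

    def __init__(self, *leaves):
        self.leaves = list(leaves)
        self.roots = self._find_roots()

    def _all_nodes(self):
        # Walk from the leaves back through node inputs, visiting each node once.
        seen, stack, order = set(), list(self.leaves), []
        while stack:
            node = stack.pop()
            if id(node) in seen:
                continue
            seen.add(id(node))
            order.append(node)
            stack.extend(node.inputs)
        return order

    def _find_roots(self):
        # Roots are nodes that do not have another node as an input.
        return [n for n in self._all_nodes() if not n.inputs]

    def graph_traversal(self):
        # Kahn's algorithm: a node is yielded only after all of its inputs.
        nodes = self._all_nodes()
        pending = {id(n): len(n.inputs) for n in nodes}
        children = {id(n): [] for n in nodes}
        for n in nodes:
            for parent in n.inputs:
                children[id(parent)].append(n)
        queue = deque(self.roots)
        while queue:
            node = queue.popleft()
            yield node
            for child in children[id(node)]:
                pending[id(child)] -= 1
                if pending[id(child)] == 0:
                    queue.append(child)

    def find_nodes(self, pattern):
        # Assumption: the pattern is a regular expression matched against names.
        return [n for n in self.graph_traversal() if re.search(pattern, n.name)]


a = Node("load_a", lambda: 1)
b = Node("load_b", lambda: 2)
total = Node("sum", lambda x, y: x + y, inputs=[a, b])

wf = SketchBaseWorkflow(total)
order = [n.name for n in wf.graph_traversal()]
loaders = [n.name for n in wf.find_nodes(r"^load_")]
```

In this sketch both load nodes are roots, and "sum" is always yielded last because it waits for both of its inputs.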
Workflow Interface
Backends can perform any operations on the nodes yielded by BaseWorkflow.graph_traversal(). Typically, an interface should implement at least two primary functions:
set_storage() – set a Storage class within the Workflow instance to be referenced during workflow execution
run() – call graph_traversal() and perform an operation on each node in the DAG, usually executing the node’s func()
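A minimal sketch of this interface is shown below. The Node and SketchWorkflow classes are hypothetical, the storage object is assumed to be a plain dict keyed by node name, and the traversal here is a simplified depth-first walk from the leaves rather than the library's root-based traversal.

```python
class Node:
    """Hypothetical node: a name, a callable, and the nodes that feed it."""

    def __init__(self, name, func, inputs=()):
        self.name = name
        self.func = func
        self.inputs = list(inputs)


class SketchWorkflow:
    """Illustrative workflow interface; not one of the library's backends."""

    def __init__(self, *leaves):
        self.leaves = list(leaves)
        self.storage = None

    def set_storage(self, storage):
        # Keep a reference to the storage object for use during run().
        self.storage = storage

    def graph_traversal(self):
        # Depth-first post-order: inputs are yielded before the nodes using them.
        seen = set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent in node.inputs:
                yield from visit(parent)
            yield node

        for leaf in self.leaves:
            yield from visit(leaf)

    def run(self):
        results = {}
        for node in self.graph_traversal():
            # Execute the node's function on the results of its input nodes.
            args = [results[id(parent)] for parent in node.inputs]
            results[id(node)] = node.func(*args)
            if self.storage is not None:
                # Assumption: storage behaves like a dict keyed by node name.
                self.storage[node.name] = results[id(node)]
        return [results[id(leaf)] for leaf in self.leaves]


two = Node("two", lambda: 2)
three = Node("three", lambda: 3)
product = Node("product", lambda x, y: x * y, inputs=[two, three])

wf = SketchWorkflow(product)
cache = {}
wf.set_storage(cache)
outputs = wf.run()
```

A real backend would replace the direct call to node.func with whatever execution mechanism it targets, for example submitting a remote task and keeping the returned future in place of the result.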
Queries
Query() – a backend may define a Query class that handles backend-specific assignment of information to the .backend_info field of individual Nodes. Typically, the backend defines a Workflow.query("pattern") function that uses BaseWorkflow.find_nodes() to find a list of Nodes with names matching the pattern and assigns them to a new Query object. Methods of the Query class can then set specific items within the .backend_info field of the Nodes assigned to that Query object.
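The query pattern described above might look roughly like the following sketch. All classes here are hypothetical stand-ins: the set_info() method name, the dict-shaped backend_info, and regular-expression matching are assumptions made for illustration, not the library's API.

```python
import re


class Node:
    """Hypothetical node carrying a name and a backend_info dict."""

    def __init__(self, name):
        self.name = name
        self.backend_info = {}


class Query:
    """Holds the nodes matched by a pattern and edits their backend_info."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def set_info(self, key, value):
        # Hypothetical setter: write one backend-specific item to every node.
        for node in self.nodes:
            node.backend_info[key] = value
        return self  # allow chained calls


class SketchWorkflow:
    """Illustrative workflow that keeps a flat list of nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def find_nodes(self, pattern):
        # Assumption: the pattern is a regular expression matched against names.
        return [n for n in self.nodes if re.search(pattern, n.name)]

    def query(self, pattern):
        return Query(self.find_nodes(pattern))


nodes = [Node("train_model"), Node("load_data"), Node("eval_model")]
wf = SketchWorkflow(nodes)
wf.query(r"model").set_info("num_cpus", 2)
```

After this call only the two nodes with "model" in their names carry the extra item, which a backend could read when it schedules those nodes.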
Plugins
Plugins are additional functions that are executed for each node during DAG execution, prior to the node’s main function call. Typical use includes state initialization on a remote machine (InitializerPlugin) and maintenance of random state across machines (RandomStatePlugin). We also implement PluginCollection as a wrapper for multiple plugins.
Plugins implement two functions, local_initialize() and remote_initialize(), that a backend can call at the appropriate time. For example, during graph traversal the RayWorkflow calls plugin.local_initialize() in the main process, then passes plugin.remote_initialize() as the plugin function to call as part of the remote function. The intention is that local_initialize() can access information in the main process and help synchronize state, and that this information can then be made available to remote_initialize() if required.
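The two-phase call order can be sketched as follows. SketchSeedPlugin, remote_call() and run_nodes() are hypothetical names, and the "remote" call is simulated by an ordinary function call in the same process; this shows only the ordering of the two hooks, not the behaviour of RandomStatePlugin or RayWorkflow.

```python
import random


class SketchSeedPlugin:
    """Hypothetical plugin that hands each node a seed chosen in the main process."""

    def __init__(self, seed):
        self._main_rng = random.Random(seed)  # state held by the main process
        self.node_seed = None

    def local_initialize(self):
        # Main process: draw the seed that the next node will use.
        self.node_seed = self._main_rng.randrange(2**32)

    def remote_initialize(self):
        # Worker side: seed the global generator before the node function runs.
        random.seed(self.node_seed)


def remote_call(initialize, func):
    # Stand-in for a remote function: run the plugin hook, then the node function.
    initialize()
    return func()


def run_nodes(funcs, plugin):
    results = []
    for func in funcs:
        plugin.local_initialize()  # called in the main process
        results.append(remote_call(plugin.remote_initialize, func))
    return results


first = run_nodes([random.random, random.random], SketchSeedPlugin(seed=0))
second = run_nodes([random.random, random.random], SketchSeedPlugin(seed=0))
```

Because every node's seed is derived in the main process, the two runs produce the same sequence of values regardless of where each node function would actually execute.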
Storage
Backends that interface with Storage should provide the option to add the Storage class to a Workflow instance using the set_storage() function. The backend can make use of functions in the Storage class for checking and searching storage, such as list_tasks() and file_exists(). The standard function for saving a node is save(), and the standard function for loading results from a stored node is read_task(). See the LocalWorkflow and RayWorkflow backend source code for examples that contrast local file storage with file storage on a remote machine, where access happens through futures for asynchronous computation.
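As an illustration of how a backend might use these functions, here is a sketch of a file-backed storage class and a helper that skips nodes whose results are already stored. The method names come from the text above, but their signatures, the pickle file layout, and the run_with_storage() helper are assumptions, not the library's Storage API.

```python
import pickle
import tempfile
from pathlib import Path


class SketchFileStorage:
    """Hypothetical file storage exposing the method names mentioned above."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, task_name):
        return self.root / f"{task_name}.pkl"

    def list_tasks(self):
        # Names of all tasks that currently have a stored result.
        return sorted(p.stem for p in self.root.glob("*.pkl"))

    def file_exists(self, task_name):
        return self._path(task_name).exists()

    def save(self, task_name, obj):
        with open(self._path(task_name), "wb") as f:
            pickle.dump(obj, f)

    def read_task(self, task_name):
        with open(self._path(task_name), "rb") as f:
            return pickle.load(f)


def run_with_storage(name, func, storage):
    # Reuse a stored result when present; otherwise compute and save it.
    if storage.file_exists(name):
        return storage.read_task(name)
    result = func()
    storage.save(name, result)
    return result


calls = []


def expensive():
    calls.append(1)  # record each real execution
    return 42


with tempfile.TemporaryDirectory() as tmp:
    storage = SketchFileStorage(tmp)
    first = run_with_storage("answer", expensive, storage)
    second = run_with_storage("answer", expensive, storage)
    tasks = storage.list_tasks()
```

The second call reads the stored result instead of running the function again. With a futures-based backend, the same check would sit inside the remote function so that the file is read on the machine that holds it.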