Schemas

For batch flows, a schema is attached to the link between two batch stages. It defines the structure of the data that travels over that link, including its columns and their data types.

To create a schema, you must first create a link between two batch stages, as explained in the Connecting Batch Stages section. Then use the Link.create_schema() method to create the schema object.

>>> row_gen = batch_flow.add_stage('Row Generator', 'Row_Generator_2')
>>> peek = batch_flow.add_stage('Peek', 'Peek_2')
>>> link = row_gen.connect_output_to(peek)
>>> link.name = 'Link_2'
>>> schema = link.create_schema()

Initializing a Schema Field

A schema is made up of fields. To add a field to a schema, use the Schema.add_field() method, which requires an odbc_type and a name. You can also pass optional keyword arguments, such as length and nullable, to set other field properties when the field is created.

>>> field1 = schema.add_field(odbc_type='CHAR', name='COLUMN_1')
>>> field2 = schema.add_field('CHAR', 'COLUMN_2', length=100, nullable=True)
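When a schema has many fields, one option is to keep the field definitions as plain data and add them in a loop. The sketch below assumes nothing beyond the add_field() keyword arguments shown above (odbc_type, name, length, nullable). `FakeSchema` and `add_fields` are hypothetical helpers: `FakeSchema` only records calls so the snippet runs without the SDK, and in a real flow you would pass the schema returned by link.create_schema() instead.

```python
# Field definitions kept as plain data. The keys match the Schema.add_field()
# arguments used in the example above.
FIELD_SPECS = [
    {"odbc_type": "CHAR", "name": "COLUMN_1"},
    {"odbc_type": "CHAR", "name": "COLUMN_2", "length": 100, "nullable": True},
]


class FakeSchema:
    """Hypothetical stand-in for an SDK schema: it only records add_field() calls."""

    def __init__(self):
        self.calls = []

    def add_field(self, odbc_type, name, **properties):
        # Record the call and return a simple placeholder for the new field.
        self.calls.append((odbc_type, name, properties))
        return (odbc_type, name)


def add_fields(schema, specs):
    """Call schema.add_field() once per spec and return the created fields in order."""
    return [schema.add_field(**spec) for spec in specs]


schema = FakeSchema()
fields = add_fields(schema, FIELD_SPECS)
```

Keeping the specs separate from the calls makes it easy to apply the same column layout to several links.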

Editing and Removing a Schema Field

To edit an existing schema field, set the corresponding attribute on the field object. For a list of all properties, see the parameters of the Schema.add_field() method. Some properties apply only to certain odbc_type values, so not every property can be set on every field.

To remove a field, use the Schema.remove_field() method, which takes the field name.

>>> field3 = schema.add_field(odbc_type='CHAR', name='COLUMN_3', length=100, nullable=True)
>>> field3.nullable = False
>>> schema = schema.remove_field('COLUMN_3')

Schemas determine what data is sent between stages. Because COLUMN_3 was removed, the schema in this example ends up with two CHAR fields, COLUMN_1 and COLUMN_2. As a result, the Row Generator outputs two CHAR columns, and those columns are sent to the Peek stage.
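The field bookkeeping in the example above can be traced with a small in-memory model. This is a hypothetical toy, not the SDK's Schema class: `ToySchema` and `ToyField` only keep the type, name, length and nullability of each field, and they mirror the add_field()/remove_field() call shapes from the example so the sequence of operations can be replayed without a flow.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyField:
    odbc_type: str
    name: str
    length: Optional[int] = None
    # Default chosen for this toy only; the SDK's default is not stated here.
    nullable: bool = False


@dataclass
class ToySchema:
    # Fields keyed by name; dicts keep insertion order, like a column list.
    fields: dict = field(default_factory=dict)

    def add_field(self, odbc_type, name, **props):
        new_field = ToyField(odbc_type, name, **props)
        self.fields[name] = new_field
        return new_field

    def remove_field(self, name):
        # Raises KeyError if no field has this name.
        del self.fields[name]
        # Returns itself so the `schema = schema.remove_field(...)` pattern
        # from the example also works with this toy.
        return self

    def column_names(self):
        return list(self.fields)


# Replay the operations from the example above.
schema = ToySchema()
schema.add_field('CHAR', 'COLUMN_1')
schema.add_field('CHAR', 'COLUMN_2', length=100, nullable=True)
field3 = schema.add_field('CHAR', 'COLUMN_3', length=100, nullable=True)
field3.nullable = False
schema = schema.remove_field('COLUMN_3')
print(schema.column_names())  # ['COLUMN_1', 'COLUMN_2']
```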

All schemas, whether they describe a stage's input or its output, are made up of data fields. Each field is defined by a data type (its odbc_type) and a name, as shown above. The supported data types are:

'BIGINT', 'BINARY', 'BIT', 'CHAR', 'DATE', 'DECIMAL', 'DOUBLE', 'FLOAT', 'INTEGER',
'LONGVARBINARY', 'LONGVARCHAR', 'NUMERIC', 'REAL', 'SMALLINT', 'TIME', 'TIMESTAMP',
'TINYINT', 'UNKNOWN', 'VARBINARY', 'VARCHAR', 'NCHAR', 'LONGNVARCHAR', 'NVARCHAR'
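If field definitions come from an external source, such as a config file, it can help to reject unsupported type names before calling add_field(). The helper below is a hypothetical sketch, not part of the SDK. It copies the supported types from the list above and assumes exact, case-sensitive matching, which may be stricter than what the SDK itself accepts.

```python
# Supported odbc_type values, copied from the list above.
SUPPORTED_ODBC_TYPES = frozenset({
    'BIGINT', 'BINARY', 'BIT', 'CHAR', 'DATE', 'DECIMAL', 'DOUBLE', 'FLOAT', 'INTEGER',
    'LONGVARBINARY', 'LONGVARCHAR', 'NUMERIC', 'REAL', 'SMALLINT', 'TIME', 'TIMESTAMP',
    'TINYINT', 'UNKNOWN', 'VARBINARY', 'VARCHAR', 'NCHAR', 'LONGNVARCHAR', 'NVARCHAR',
})


def check_odbc_type(odbc_type):
    """Return odbc_type unchanged if it is in the supported list, else raise ValueError."""
    if odbc_type not in SUPPORTED_ODBC_TYPES:
        raise ValueError(
            f"unsupported odbc_type {odbc_type!r}; "
            f"expected one of {sorted(SUPPORTED_ODBC_TYPES)}"
        )
    return odbc_type
```

Usage would look like `schema.add_field(check_odbc_type('VARCHAR'), 'COLUMN_4')`, where COLUMN_4 is an illustrative name.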

Note

Some stages (for example, Copy and Match Frequency) automatically populate certain schemas in the UI, and the SDK replicates this behavior. Specify only the schemas that you would need to specify yourself in the UI.

Working with Data Definitions

Schemas can import fields from DataDefinition objects, and can also export their fields to create new data definitions. This lets you reuse common data structures across multiple flows.

For detailed information about importing and exporting data definitions with schemas, see: