Batch Schemas

In a batch flow, a schema is defined on the link between two batch stages and is shared by both of them.

To create a schema, first create a link between two batch stages, as explained in the Connecting Batch Stages section. Then call the Link.create_schema() method to create the schema object.

>>> row_gen = batch_flow.add_stage('Row Generator', 'Row_Generator_2')
>>> peek = batch_flow.add_stage('Peek', 'Peek_2')
>>> link = row_gen.connect_output_to(peek)
>>> schema = link.create_schema()

Initializing a Schema Field

Schemas are populated with fields. To add a field to a schema, use the Schema.add_field() method, which requires an odbc_type and a name. You can also pass additional parameters to Schema.add_field() to initialize the field with specific properties.

>>> field1 = schema.add_field(odbc_type = 'CHAR', name = 'COLUMN_1')
>>> field2 = schema.add_field('CHAR', 'COLUMN_2', length = 100, nullable = True)
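When a schema has many columns, the add_field() calls above can be driven from a list of column specifications. The following is a minimal sketch and not part of the SDK: add_fields, COLUMN_SPECS and _RecordingSchema are hypothetical names introduced for illustration. The helper assumes only that the schema object accepts add_field(odbc_type, name, **properties) as shown above. _RecordingSchema is a stand-in that records calls so the sketch runs without the SDK; in a real flow you would pass the object returned by link.create_schema() instead.

```python
from typing import Any, Iterable, List, Mapping


def add_fields(schema: Any, specs: Iterable[Mapping[str, Any]]) -> List[Any]:
    """Call schema.add_field() once per spec and collect the return values.

    Hypothetical helper, not an SDK function. Each spec must contain
    'odbc_type' and 'name'; any other keys (for example length or
    nullable) are passed through as keyword arguments.
    """
    created = []
    for spec in specs:
        props = dict(spec)  # copy so the caller's spec is not mutated
        odbc_type = props.pop("odbc_type")
        name = props.pop("name")
        created.append(schema.add_field(odbc_type, name, **props))
    return created


class _RecordingSchema:
    """Stand-in for a real schema object; it only records add_field() calls."""

    def __init__(self) -> None:
        self.calls = []

    def add_field(self, odbc_type, name, **props):
        record = {"odbc_type": odbc_type, "name": name, **props}
        self.calls.append(record)
        return record


# Same two columns as in the example above.
COLUMN_SPECS = [
    {"odbc_type": "CHAR", "name": "COLUMN_1"},
    {"odbc_type": "CHAR", "name": "COLUMN_2", "length": 100, "nullable": True},
]

demo_schema = _RecordingSchema()
demo_fields = add_fields(demo_schema, COLUMN_SPECS)
print([field["name"] for field in demo_fields])
```

Keeping the column definitions in plain data makes them easy to review or reuse across links, while the calls made on the schema stay the same as writing each add_field() line by hand.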

Editing and Removing a Schema Field

To edit an existing schema field, assign a new value directly to the property you want to change. For the full list of properties, see the parameters of the Schema.add_field() method. Some properties depend on odbc_type and may not always be settable.

To remove a field, call the Schema.remove_field() method with the name of the field.

>>> field3 = schema.add_field(odbc_type = 'CHAR', name = 'COLUMN_3', length = 100, nullable = True)
>>> field3.nullable = False
>>> schema = schema.remove_field('COLUMN_3')

Schemas dictate what data is sent between stages. In this example, once COLUMN_3 has been removed, the schema has two CHAR fields called COLUMN_1 and COLUMN_2. As a result, the Row Generator outputs two CHAR columns, which are sent to the Peek stage.

All schemas use data fields for the input and output. Each field is defined by a data type (its odbc_type) and a name, as seen above. The supported data types are as follows:

'BIGINT', 'BINARY', 'BIT', 'CHAR', 'DATE', 'DECIMAL', 'DOUBLE', 'FLOAT', 'INTEGER',
'LONGVARBINARY', 'LONGVARCHAR', 'NUMERIC', 'REAL', 'SMALLINT', 'TIME', 'TIMESTAMP',
'TINYINT', 'UNKNOWN', 'VARBINARY', 'VARCHAR', 'NCHAR', 'LONGNVARCHAR', 'NVARCHAR'
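Because odbc_type is passed as a plain string, a typo in a type name is easy to make. As a minimal sketch, the list above can be turned into a local check that runs before any field is added. SUPPORTED_ODBC_TYPES and check_odbc_type are hypothetical names, not SDK members, and the check assumes type names are matched exactly as written in the list. Whether the SDK performs its own validation is not stated here.

```python
# Type names copied from the list above.
SUPPORTED_ODBC_TYPES = frozenset({
    "BIGINT", "BINARY", "BIT", "CHAR", "DATE", "DECIMAL", "DOUBLE", "FLOAT",
    "INTEGER", "LONGVARBINARY", "LONGVARCHAR", "NUMERIC", "REAL", "SMALLINT",
    "TIME", "TIMESTAMP", "TINYINT", "UNKNOWN", "VARBINARY", "VARCHAR",
    "NCHAR", "LONGNVARCHAR", "NVARCHAR",
})


def check_odbc_type(odbc_type: str) -> str:
    """Return odbc_type unchanged if it is in the list, else raise ValueError.

    Hypothetical helper; the comparison is exact (case-sensitive), which is
    an assumption of this sketch.
    """
    if odbc_type not in SUPPORTED_ODBC_TYPES:
        raise ValueError(
            f"Unsupported odbc_type {odbc_type!r}; expected one of "
            f"{sorted(SUPPORTED_ODBC_TYPES)}"
        )
    return odbc_type


# Example: validate a few names before using them in add_field() calls.
for candidate in ("CHAR", "VARCHAR", "STRING"):
    try:
        check_odbc_type(candidate)
        print(candidate, "is supported")
    except ValueError:
        print(candidate, "is not in the supported list")
```

A call such as schema.add_field(check_odbc_type('CHAR'), 'COLUMN_1') would then fail early, with a readable message, if the type name is misspelled.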

Note: Certain stages (e.g. Copy, Match Frequency) automatically populate some schemas in the UI, and the SDK replicates that behavior. Specify only the schemas that you would normally need to specify yourself.