Publishing Events With Complex Data Formats
Info
See
client.publish_events_batch_schema
for detailed information on function usage.
Using a mapping to transform event data
Fiddler supports publishing batches of events that are stored in unconventional formats.
These formats include:
To handle complex data formats, Fiddler offers the ability to transform event data prior to ingestion.
The function client.publish_events_batch_schema
accepts a "schema" containing a mapping which will be used to transform production data according to your needs.
Here's an example of what one of these schemas may look like:
publish_schema = {
"__static": {
"__project": "example_project",
"__model": "example_model"
},
"__dynamic": {
"__timestamp": "column0",
"feature0": "column1",
"feature1": "column2",
"feature2": "column3",
"model_output": "column4",
"model_target": "column5"
}
}
The above schema allows us to take unlabeled columns (named column0
through column4
) and map them to the names that Fiddler expects (specified in fdl.ModelInfo
).
Some notes about the above schema:
__static
fields are hard-coded. They do not reference anything within the structure.__dynamic
fields point to a location within the structure. We can use forward slashes (/) to indicate traversal of the nested structure.__project
refers to the project ID for the project to which we would like to publish events.__model
refers to the model ID for the model to which we would like to publish events.__timestamp
must refer to the timestamp field within the file structure. If it is not specified, you’ll need to include__default_timestamp
in the__static
section.- You can set the default timestamp to the current time by setting
"__default_timestamp": "CURRENT_TIME"
in the__static
section.
Once you have a schema, you can publish a batch of events using the client.publish_events_batch_schema
function:
client.publish_events_batch_schema(
publish_schema=publish_schema,
batch_source='example_batch.csv'
)
Unlabeled tabular data
You can use one of these schemas in the case where you have a CSV file with no column headers.
The mapping will allow you to reference columns by index rather than name.
Suppose we took the above example, except this time the columns had no headers.
We could use the following schema to map the columns to the necessary names.
publish_schema = {
"__static": {
"__project": "example_project",
"__model": "example_model"
},
"__dynamic": {
"__timestamp": "[0]",
"feature0": "[1]",
"feature1": "[2]",
"feature2": "[3]",
"model_output": "[4]",
"model_target": "[5]"
}
}
Nested data formats
Fiddler supports publishing batches of events that are stored in non-tabular formats.
For JSON/Avro nested structures, you can provide a mapping dictionary that will extract the fields you want to monitor and flatten them prior to upload.f
Suppose you have some nested data that’s structured as follows.
Here, value0
through value7
are the fields we want to monitor.
{
"data": {
"value0": 0,
"value1": 1,
"more_data": {
"value2": 2,
"value3": 3,
"even_more_data": [
{
"value4": 4,
"value5": 5
},
{
"value6": 6,
"value7": 7
}
]
}
}
}
For Fiddler to extract these six inputs, we can use the following mapping to flatten the data.
publish_schema = {
"__static": {
"__project": "example_project",
"__model": "example_model",
"__default_timestamp": "CURRENT_TIME"
},
"__dynamic": {
"value0": "data/value0",
"value1": "data/value1",
"value2": "data/more_data/value2",
"value3": "data/more_data/value3",
"value4": "data/more_data/even_more_data[0]/value4",
"value5": "data/more_data/even_more_data[0]/value5",
"value6": "data/more_data/even_more_data[-1]/value6",
"value7": "data/more_data/even_more_data[-1]/value7"
}
}
Using iterators for multiple events stored in the same row
We can also use mappings to extract multiple events contained in a single JSON/Avro row.
For this, we will look for "iterators" in the row, which are just lists/arrays containing the multiple events we would like to extract.
__iterator
fields point to the location of a list of subtrees we would like to iterate over to obtain multiple records. For each tree within the iterator, additional dynamic fields can be specified. Those fields will be joined with the fields outside of the iterator.- Note that an
__iterator_key
must be specified for iterators. This should contain the path to the list containing items to be iterated over.
- Note that an
Suppose you have some nested data that’s structured as follows.
Here, value0
through value5
are the fields we want to monitor.
{
"data": {
"value0": 0,
"value1": 1,
"more_data": [
{
"value2": 2,
"value3": 3,
"even_more_data": [
{
"value4": 4,
"value5": 5
},
{
"value4": 6,
"value5": 7
}
]
},
{
"value2": 8,
"value3": 9,
"even_more_data": [
{
"value4": 10,
"value5": 11
},
{
"value4": 12,
"value5": 13
}
]
}
]
}
}
Notice that we have four records contained within identical subtrees of the structure. Fiddler will perform a join on the values within the subtrees and the values outside of the subtrees.
publish_schema = {
"__static": {
"__project": "example_project",
"__model": "example_model",
"__default_timestamp": "CURRENT_TIME"
},
"__dynamic": {
"value0": "data/value0",
"value1": "data/value1"
},
"__iterator": {
"__iterator_key": "more_data",
"__dynamic": {
"value2": "value2",
"value3": "value3"
},
"__iterator": {
"__iterator_key": "even_more_data",
"__dynamic": {
"value4": "value4",
"value5": "value5"
}
}
}
}
To clarify, this is the output we will see once the values from above example are flattened.
Note that the outermost fields have been duplicated across all the records.
Publishing to multiple models from the same file
Fiddler allows you to publish a single file containing events for multiple models using one API call.
To do this, you can include conditional keys in the schema, which can be used to tell Fiddler which project/model to publish to.
Here's an example of what these conditionals looks like within a schema:
publish_schema = {
"__static": {
"__default_timestamp": "CURRENT_TIME"
},
"__dynamic": {
"__timestamp": "column0",
"__project": "column1"
"__model": "column2",
"!example_project_1,example_model_1": {
"feature0": "column3",
"feature1": "column4",
"model_output": "column6",
"model_target": "column7"
},
"!example_project_1,example_model_2": {
"feature0": "column4",
"feature1": "column5",
"model_output": "column6",
"model_target": "column7"
},
"!example_project_2,example_model_3": {
"feature0": "column3",
"feature1": "column5",
"model_output": "column6",
"model_target": "column7"
}
}
}
In the above schema, we use the "!example_project_1,example_model_1"
conditional to tell Fiddler to publish the events to the example_project_1
project and example_model_1
model, using the schema defined for that conditional.
Note
In order to use this conditional functionality, you'll need to specify the
__project
and__model
outside of the conditional.
Updated about 2 months ago