- Numeric: FLOAT, INTEGER - enable statistical analysis
- Categorical: BOOLEAN, CATEGORY - enable distribution analysis
- Textual: STRING - enable text-based monitoring
- Temporal: TIMESTAMP - enable time-based analysis
- Vector: VECTOR - enable embedding-based monitoring
Examples
Defining column data types in model schema:
Choose data types that accurately represent your data for optimal
monitoring and validation. Incorrect data types may lead to
inappropriate metrics or monitoring failures.
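As a rough, library-agnostic illustration of declaring column types, the sketch below maps column names to the string values documented in this section and checks them against the known set. The names `ALLOWED_TYPES`, `column_types`, and `validate_column_types`, as well as the example columns, are hypothetical and are not part of any schema API.

```python
# Hypothetical, library-agnostic sketch: not the actual schema API.
# The string values mirror the enum values documented in this section.
ALLOWED_TYPES = {"float", "int", "bool", "str", "category", "timestamp", "vector"}

# Hypothetical columns for an example model.
column_types = {
    "age": "int",
    "account_balance": "float",
    "is_active": "bool",
    "region": "category",
    "signup_time": "timestamp",
    "support_note": "str",
    "note_embedding": "vector",
}


def validate_column_types(columns):
    """Return the names of columns whose declared type is not recognized."""
    return sorted(name for name, dtype in columns.items() if dtype not in ALLOWED_TYPES)


unknown = validate_column_types(column_types)
```

Catching an unrecognized type at declaration time is cheaper than discovering later that a column was monitored with the wrong metrics.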
FLOAT = 'float'
Floating-point numerical values. Used for continuous numerical data with decimal precision. Enables comprehensive statistical analysis and numerical drift detection.
Characteristics:
- Decimal precision values
- Statistical distribution analysis
- Range and outlier detection
- Correlation analysis support
- Mean, median, standard deviation tracking
- Distribution drift detection (KS test, PSI)
- Range violation alerts
- Outlier detection and analysis
- Prices, costs, revenues
- Probabilities and confidence scores
- Measurements and sensor readings
- Performance metrics and ratios
- Model prediction scores
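To make the PSI-based drift detection mentioned above concrete, here is a minimal pure-Python sketch of the Population Stability Index. It assumes equal-width bins over the baseline range and a small floor `eps` to avoid division by zero. The function name, binning strategy, and defaults are illustrative assumptions, not a description of how any monitoring product computes PSI.

```python
import math


def population_stability_index(baseline, current, bins=10, eps=1e-6):
    """PSI between two samples using equal-width bins over the baseline range."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the baseline range are clamped into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each fraction at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    p = bin_fractions(baseline)
    q = bin_fractions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

Identical samples give a PSI of zero, and the value grows as the current distribution moves away from the baseline. Alerting thresholds depend on the application and are not implied here.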
INTEGER = 'int'
Integer numerical values. Used for whole number data without decimal places. Supports numerical analysis while recognizing the discrete nature of integer data.
Characteristics:
- Whole number values only
- Discrete distribution analysis
- Count-based statistics
- Range validation
- Count distribution tracking
- Range violation detection
- Discrete value frequency analysis
- Statistical drift detection
- Counts and quantities
- Age, years, days
- IDs and identifiers (when numeric)
- Ranking positions
- Categorical codes (when numeric)
BOOLEAN = 'bool'
True/false binary values. Used for binary flag data with exactly two possible values. Enables binary distribution analysis and proportion tracking.
Characteristics:
- Exactly two values (True/False, 1/0, Yes/No)
- Binary distribution analysis
- Proportion-based metrics
- Simple categorical handling
- True/False ratio tracking
- Binary distribution drift
- Proportion change detection
- Flag frequency analysis
- Feature flags and indicators
- Binary classifications
- Yes/No survey responses
- Membership status
- Activation states
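The True/False ratio tracking and proportion change detection listed above reduce to simple arithmetic. The sketch below uses hypothetical helper names and assumes the inputs are already real booleans or 1/0 values. String encodings such as "Yes"/"No" would need to be mapped first, because any non-empty string is truthy in Python.

```python
def true_ratio(flags):
    """Fraction of truthy values in a non-empty sequence of booleans or 1/0 values."""
    flags = list(flags)
    if not flags:
        raise ValueError("flags must not be empty")
    return sum(bool(f) for f in flags) / len(flags)


def proportion_shift(baseline_flags, current_flags):
    """Absolute change in the True ratio between two samples."""
    return abs(true_ratio(current_flags) - true_ratio(baseline_flags))
```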
STRING = 'str'
Text string values. Used for textual data of variable length. Supports text-based analysis and can be combined with text embeddings for advanced monitoring.
Characteristics:
- Variable length text
- Text-based analysis
- String pattern detection
- Encoding-aware processing
- Length distribution tracking
- Pattern and format analysis
- Text embedding integration
- String uniqueness analysis
- Names and descriptions
- Comments and reviews
- URLs and paths
- Free-form text inputs
- JSON or XML strings
- Can be converted to embeddings for semantic monitoring
- Supports text enrichment features
- May require text preprocessing
CATEGORY = 'category'
Categorical values with limited distinct options. Used for data with a finite set of possible values or categories. Enables categorical distribution analysis and new category detection.
Characteristics:
- Limited set of possible values
- Categorical distribution tracking
- Category frequency analysis
- New category detection
- Category distribution drift
- New/missing category alerts
- Frequency change detection
- Category proportion analysis
- Product categories
- Geographic regions
- Status codes
- Demographic categories
- Classification labels
- Use for data with < 1000 unique values
- Consider STRING type for high-cardinality categories
- Define expected categories during schema creation
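The new/missing category alerts and the cardinality guidance above can be sketched with plain set operations. The function name, the return shape, and the example values are hypothetical. The default threshold of 1000 simply follows the "< 1000 unique values" guidance in the list.

```python
def compare_categories(expected, observed, max_unique=1000):
    """Report new and missing categories, plus a high-cardinality flag."""
    expected_set = set(expected)
    observed_set = set(observed)
    return {
        "new": sorted(observed_set - expected_set),
        "missing": sorted(expected_set - observed_set),
        # Flag columns that reach the cardinality limit suggested above.
        "high_cardinality": len(observed_set) >= max_unique,
    }


report = compare_categories(
    expected=["gold", "silver", "bronze"],
    observed=["gold", "silver", "platinum", "gold"],
)
```

A column that trips the `high_cardinality` flag may be a better fit for the STRING type, as the guidance above suggests.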
TIMESTAMP = 'timestamp'
Date and time values. Used for temporal data including dates, times, and timestamps. Enables time-based analysis and temporal pattern detection.
Characteristics:
- Date/time information
- Temporal ordering
- Time-based aggregations
- Timezone awareness
- Temporal pattern analysis
- Time gap detection
- Seasonal trend monitoring
- Data freshness tracking
- Event timestamps
- Creation/modification dates
- Transaction times
- Log timestamps
- Scheduled events
- Unix timestamps
- ISO 8601 strings
- Pandas datetime objects
- Various date formats (with parsing)
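As an illustration of handling mixed timestamp inputs and of time gap detection, the sketch below normalizes Unix timestamps (in seconds) and ISO 8601 strings to timezone-aware UTC datetimes with the standard library, then reports gaps larger than a threshold. The helper names are hypothetical. Treating naive strings as UTC is an assumption made here for simplicity, and `datetime.fromisoformat` only accepts a trailing `Z` from Python 3.11 onward, so the usage below writes explicit offsets.

```python
from datetime import datetime, timedelta, timezone


def to_utc(value):
    """Normalize a Unix timestamp (seconds) or ISO 8601 string to an aware UTC datetime."""
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: naive strings are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def find_gaps(timestamps, max_gap):
    """Return (start, end) pairs where consecutive events are more than max_gap apart."""
    ordered = sorted(to_utc(t) for t in timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


events = [0, "1970-01-01T00:10:00+00:00", "1970-01-01T02:00:00"]
gaps = find_gaps(events, timedelta(hours=1))
```

Normalizing everything to one timezone before comparing keeps ordering and gap calculations consistent across input formats.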
VECTOR = 'vector'
Multi-dimensional numerical vectors (embeddings). Used for embedding vectors, feature vectors, and other multi-dimensional numerical data. Enables embedding-based drift detection and clustering analysis.
Characteristics:
- Fixed-dimension numerical arrays
- Embedding-based analysis
- Vector similarity metrics
- Clustering support
- Embedding drift detection
- Cluster analysis and visualization
- Vector similarity tracking
- Dimensionality validation
- Text embeddings (Word2Vec, BERT, etc.)
- Image embeddings (CNN features)
- User/item embeddings
- Feature vectors from neural networks
- Recommendation system embeddings
- Requires consistent vector dimensions
- Benefits from custom feature definitions
- Supports clustering and UMAP visualization
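Two of the items above, dimensionality validation and vector similarity, can be sketched in a few lines of pure Python. The helper names are hypothetical, and cosine similarity is used only as one common similarity metric. The source does not state which metric is applied in practice.

```python
import math


def validate_dimensions(vectors, expected_dim):
    """Return indices of vectors whose length differs from expected_dim."""
    return [i for i, v in enumerate(vectors) if len(v) != expected_dim]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

Checking dimensions before computing similarities surfaces malformed embeddings early, in line with the requirement for consistent vector dimensions.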
is_numeric()
Check if the data type is numeric.
Returns
True if data type is INTEGER or FLOAT
is_bool_or_cat()
Check if the data type is boolean or categorical.
Returns
True if data type is BOOLEAN or CATEGORY
is_vector()
Check if the data type is vector.
Returns
True if data type is VECTOR
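The three helper methods can be illustrated with a stand-in enum that mirrors the values and return conditions documented above. The class name `DataTypeSketch` and the `analysis_family` dispatcher are hypothetical. This is not the library's source, only a sketch of how such predicates might be used to pick an analysis path.

```python
from enum import Enum


class DataTypeSketch(str, Enum):
    """Stand-in enum mirroring the documented values (not the library source)."""

    FLOAT = "float"
    INTEGER = "int"
    BOOLEAN = "bool"
    STRING = "str"
    CATEGORY = "category"
    TIMESTAMP = "timestamp"
    VECTOR = "vector"

    def is_numeric(self):
        return self in (DataTypeSketch.INTEGER, DataTypeSketch.FLOAT)

    def is_bool_or_cat(self):
        return self in (DataTypeSketch.BOOLEAN, DataTypeSketch.CATEGORY)

    def is_vector(self):
        return self is DataTypeSketch.VECTOR


def analysis_family(dtype):
    """Hypothetical dispatcher following the type groups listed at the top of this section."""
    if dtype.is_numeric():
        return "statistical"
    if dtype.is_bool_or_cat():
        return "distribution"
    if dtype.is_vector():
        return "embedding-based"
    return "text-based" if dtype is DataTypeSketch.STRING else "time-based"
```

Branching on the predicates rather than on individual members keeps the dispatch aligned with the type groups instead of repeating membership checks at each call site.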