- Classification: Accuracy, precision, recall, F1, AUC, confusion matrix
- Regression: MAE, MSE, RMSE, R², residual analysis
- Ranking: NDCG, MAP, precision@k, and other ranking-specific metrics
- LLM: Token-based metrics, response quality, safety metrics
Examples
Configuring models for different tasks:
Task type cannot be changed after model creation. Choose carefully based on your model’s primary objective and output format.
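As a minimal sketch of the point above, the following uses a stand-in `ModelTask` enum that mirrors the values documented in this section, together with a hypothetical `ModelSpec` dataclass. Neither is the library's actual class; the frozen dataclass is only an assumption chosen to illustrate a task that is fixed once the model is created.

```python
from dataclasses import dataclass, FrozenInstanceError
from enum import Enum


class ModelTask(str, Enum):
    """Stand-in enum mirroring the task values documented in this section."""
    BINARY_CLASSIFICATION = "binary_classification"
    MULTICLASS_CLASSIFICATION = "multiclass_classification"
    REGRESSION = "regression"
    RANKING = "ranking"
    LLM = "llm"
    NOT_SET = "not_set"


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical model description; frozen so the task is fixed at creation."""
    name: str
    task: ModelTask


# Pick the task that matches each model's primary objective and output format.
fraud_model = ModelSpec(name="fraud_detector", task=ModelTask.BINARY_CLASSIFICATION)
price_model = ModelSpec(name="price_estimator", task=ModelTask.REGRESSION)

# Attempting to change the task afterwards fails in this sketch.
try:
    fraud_model.task = ModelTask.REGRESSION
except FrozenInstanceError:
    task_is_locked = True
else:
    task_is_locked = False
```

In this sketch, switching a model to a different task means creating a new `ModelSpec`, which is why the task is worth choosing deliberately up front.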
BINARY_CLASSIFICATION = 'binary_classification'
Two-class classification tasks. Used for models that predict one of two possible outcomes or classes. Enables binary classification metrics and threshold-based analysis.
Available metrics:
- Accuracy, Precision, Recall, F1-score
- AUC-ROC, AUC-PR curves
- Confusion matrix analysis
- Threshold optimization tools
Common use cases:
- Fraud detection (fraud/legitimate)
- Email spam filtering (spam/ham)
- Medical diagnosis (positive/negative)
- Credit approval (approve/deny)
- Churn prediction (churn/retain)
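To make the threshold-based analysis for binary classification concrete, here is a small self-contained sketch that computes confusion-matrix counts and the derived metrics at a chosen threshold. The function name `binary_metrics` and the sample data are illustrative assumptions, not part of the documented API.

```python
def binary_metrics(y_true, scores, threshold=0.5):
    """Compute confusion-matrix counts and derived metrics at one threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1

    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "accuracy": accuracy, "precision": precision,
        "recall": recall, "f1": f1,
    }


# Illustrative labels (1 = positive class) and model scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.7, 0.8, 0.1]

at_half = binary_metrics(y_true, scores, threshold=0.5)
at_quarter = binary_metrics(y_true, scores, threshold=0.25)
```

Lowering the threshold from 0.5 to 0.25 on this sample raises recall and lowers precision, which is the trade-off that threshold optimization explores.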
MULTICLASS_CLASSIFICATION = 'multiclass_classification'
Multi-class classification tasks. Used for models that predict one of multiple possible classes or categories. Supports comprehensive multiclass performance analysis and class-specific metrics.
Available metrics:
- Per-class precision, recall, F1-score
- Macro and micro-averaged metrics
- Confusion matrix with multiple classes
- Class distribution analysis
Common use cases:
- Document categorization (multiple topics)
- Image classification (multiple objects)
- Sentiment analysis (positive/neutral/negative)
- Product categorization
- Intent classification in chatbots
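The difference between macro and micro averaging is easier to see with numbers. The sketch below computes per-class F1, then both averages, for a single-label multiclass problem. The helper names and the sample labels are assumptions made for illustration only.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    """F1 expressed directly in counts: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred):
    tp, fp, fn = per_class_counts(y_true, y_pred)
    classes = sorted(set(y_true) | set(y_pred))
    per_class = {c: f1_from_counts(tp[c], fp[c], fn[c]) for c in classes}
    # Macro: unweighted mean of per-class scores (every class counts equally).
    macro = sum(per_class.values()) / len(classes)
    # Micro: pool the counts first (every prediction counts equally).
    micro = f1_from_counts(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return per_class, macro, micro


y_true = ["pos", "pos", "pos", "pos", "neu", "neg"]
y_pred = ["pos", "pos", "pos", "neu", "neu", "pos"]
per_class, macro_f1, micro_f1 = macro_micro_f1(y_true, y_pred)
```

On this imbalanced sample the rare `neg` class is never predicted correctly, so macro F1 drops well below micro F1. That gap is a common signal that minority classes need attention.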
REGRESSION = 'regression'
Continuous value prediction tasks. Used for models that predict numerical values on a continuous scale. Enables regression-specific metrics and residual analysis.
Available metrics:
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- R-squared (coefficient of determination)
- Residual distribution analysis
Common use cases:
- Price prediction
- Sales forecasting
- Risk scoring (continuous scores)
- Demand forecasting
- Performance rating prediction
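For reference, the regression metrics listed above can be computed from the residuals as in the following sketch. The function name `regression_metrics` and the sample values are illustrative assumptions.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R-squared from paired values."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)

    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)

    # R-squared compares the residual error to the variance of the targets.
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")

    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2,
            "residuals": residuals}


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]
metrics = regression_metrics(y_true, y_pred)
```

The returned `residuals` list is the input to residual distribution analysis, for example checking whether errors are centered on zero or grow with the predicted value.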
RANKING = 'ranking'
Ranking and recommendation tasks. Used for models that rank items or provide ordered recommendations. Supports ranking-specific metrics and list-wise evaluation.
Available metrics:
- Normalized Discounted Cumulative Gain (NDCG)
- Mean Average Precision (MAP)
- Precision@K, Recall@K
- Mean Reciprocal Rank (MRR)
- Hit Rate analysis
Common use cases:
- Search result ranking
- Product recommendations
- Content recommendation systems
- Information retrieval
- Personalized ranking
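The ranking metrics above operate on an ordered list of relevance labels for a single query. The sketch below shows Precision@K, reciprocal rank, and NDCG@K under two stated assumptions: an item is relevant when its label is greater than zero, and DCG uses the linear-gain form (some implementations use an exponential gain instead). All function names are illustrative.

```python
import math


def precision_at_k(relevance, k):
    """Fraction of the top-k positions that hold a relevant item."""
    return sum(1 for r in relevance[:k] if r > 0) / k


def reciprocal_rank(relevance):
    """1 / position of the first relevant item, or 0.0 if there is none."""
    for position, r in enumerate(relevance, start=1):
        if r > 0:
            return 1.0 / position
    return 0.0


def dcg_at_k(relevance, k):
    """Discounted cumulative gain with linear gain and a log2 discount."""
    return sum(r / math.log2(position + 1)
               for position, r in enumerate(relevance[:k], start=1))


def ndcg_at_k(relevance, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal else 0.0


# Graded relevance of the items in the order the model ranked them.
ranked_relevance = [0, 2, 1, 0, 1]

p_at_3 = precision_at_k(ranked_relevance, 3)
rr = reciprocal_rank(ranked_relevance)
ndcg_at_3 = ndcg_at_k(ranked_relevance, 3)
```

Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP) are the averages of the per-query values over all queries in the evaluation set.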
LLM = 'llm'
Large language model and generative AI tasks. Used for language models, chatbots, and generative AI applications. Enables LLM-specific monitoring including safety, quality, and performance metrics.
Available metrics:
- Response quality metrics
- Safety and toxicity detection
- Hallucination detection
- Token-based analysis
- Latency and throughput metrics
Common use cases:
- Chatbots and conversational AI
- Text generation models
- Question-answering systems
- Code generation models
- Content creation assistants
Supported features:
- Guardrails integration
- Safety monitoring
- Prompt and response analysis
- Token usage tracking
NOT_SET = 'not_set'
Placeholder for undefined or unspecified tasks. Used as a default value when the model task has not been explicitly defined. Should be replaced with an appropriate task type during model configuration. This value should not be used for production models as it limits available monitoring capabilities and metrics.
is_classification()
Check if the task is a classification type.
Returns
True if task is binary or multiclass classification
is_regression()
Check if the task is regression.
Returns
True if task is regression
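The two helper methods can be pictured with the following sketch. It defines a stand-in `ModelTask` enum with the documented values and implements `is_classification()` and `is_regression()` according to the descriptions above. This is an assumption about behavior, not the library's source code, and `metric_family` is a hypothetical helper added only to show usage.

```python
from enum import Enum


class ModelTask(str, Enum):
    """Stand-in enum; values follow this section, method bodies are assumed."""
    BINARY_CLASSIFICATION = "binary_classification"
    MULTICLASS_CLASSIFICATION = "multiclass_classification"
    REGRESSION = "regression"
    RANKING = "ranking"
    LLM = "llm"
    NOT_SET = "not_set"

    def is_classification(self) -> bool:
        """True if the task is binary or multiclass classification."""
        return self in (
            ModelTask.BINARY_CLASSIFICATION,
            ModelTask.MULTICLASS_CLASSIFICATION,
        )

    def is_regression(self) -> bool:
        """True if the task is regression."""
        return self is ModelTask.REGRESSION


def metric_family(task: ModelTask) -> str:
    """Hypothetical helper: pick a metric family based on the task checks."""
    if task.is_classification():
        return "classification"
    if task.is_regression():
        return "regression"
    return "other"


families = {task.name: metric_family(task) for task in ModelTask}
```

Branching on these checks avoids comparing against individual enum members throughout calling code, so both classification variants share one code path.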