Embedding Visualization with UMAP

Overview

Embedding visualization helps you explore and interpret relationships in high-dimensional data. Reducing embeddings to a 2D or 3D space makes it possible to identify patterns, clusters, and anomalies visually. This document describes how to use UMAP (Uniform Manifold Approximation and Projection) for embedding visualization within Fiddler's platform.

UMAP Features

Structure Preservation: UMAP preserves local neighborhood structure well and retains much of the global structure of the data, which is important for accurately interpreting the relationships in your embeddings.

Versatility: Supports both text and image embeddings, making it suitable for applications ranging from natural language processing to computer vision.

Scalability: Handles large datasets efficiently, making it appropriate for enterprise-scale applications.

Within Fiddler, UMAP is supported for both Text and Image embeddings in Custom Features.
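
The structure-preservation property described above can be checked by comparing each point's nearest neighbors before and after projection. The Python sketch below is illustrative only and is not part of Fiddler's API. The function names `knn_indices` and `neighbor_preservation` and all of the toy data are assumptions made for this example, and the hand-written 2D coordinates stand in for real UMAP output.

```python
import math


def knn_indices(points, k):
    """For each point, return the set of indices of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        # Sort all other points by distance to p and keep the k closest.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def neighbor_preservation(high_dim, low_dim, k=2):
    """Average fraction of each point's k nearest neighbors that is shared by both spaces."""
    if len(high_dim) != len(low_dim):
        raise ValueError("both spaces must contain the same number of points")
    hi = knn_indices(high_dim, k)
    lo = knn_indices(low_dim, k)
    shared = sum(len(a & b) for a, b in zip(hi, lo))
    return shared / (k * len(high_dim))


# Toy 4D "embeddings": two tight groups of three points each (hypothetical data).
high_dim = [
    (0.0, 0.0, 0.0, 0.0), (0.1, 0.0, 0.1, 0.0), (0.0, 0.1, 0.0, 0.1),
    (5.0, 5.0, 5.0, 5.0), (5.1, 5.0, 5.1, 5.0), (5.0, 5.1, 5.0, 5.1),
]

# A projection that keeps the two groups intact.
good_projection = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (9.0, 9.0), (9.2, 9.0), (9.0, 9.2)]

# A projection that mixes members of the two groups.
bad_projection = [(0.0, 0.0), (9.0, 9.0), (0.2, 0.0), (9.2, 9.0), (0.0, 0.2), (9.0, 9.2)]

good_score = neighbor_preservation(high_dim, good_projection, k=2)
bad_score = neighbor_preservation(high_dim, bad_projection, k=2)
print(f"good projection: {good_score:.2f}, bad projection: {bad_score:.2f}")
```

A score near 1 means that points which were close in the original space remain close in the 2D view. This measures local structure only; it says nothing about how well distances between far-apart groups are kept.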

Use Cases for Embedding Visualization

Beyond general data exploration, embedding visualization supports several specific tasks across domains:

Anomaly Detection: Quickly identify outliers or anomalous data points that deviate from normal patterns. This is particularly useful in fields like fraud detection, network security, and quality assurance.

Cluster Analysis: Discover natural groupings or clusters within your data. This application is vital in market segmentation, genetic research, and social network analysis.

Data Labeling: Assist in the labeling process by revealing groupings that can be manually tagged. This is beneficial in supervised learning scenarios where labeled data may be scarce or expensive to obtain.

Feature Selection and Engineering: Determine which features contribute most to data clustering and differentiation. This insight helps refine the feature set for machine learning models, which can improve model performance.

Comparative Analysis: Compare different sets of embeddings (such as pre- and post-training model states) to assess changes or improvements. This is useful in model development and tuning.
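
As an illustration of the anomaly detection use case, the Python sketch below scores points in an already-projected 2D space by their mean distance to their nearest neighbors and flags those that sit far from everything else. It is a minimal sketch under stated assumptions, not how Fiddler detects anomalies: the function names `knn_outlier_scores` and `flag_outliers`, the cutoff rule (a multiple of the median score), and the sample coordinates are all hypothetical.

```python
import math
import statistics


def knn_outlier_scores(points, k=3):
    """Score each point by its mean Euclidean distance to its k nearest neighbors."""
    if k >= len(points):
        raise ValueError("k must be smaller than the number of points")
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_outliers(points, k=3, factor=3.0):
    """Return indices of points whose score exceeds `factor` times the median score.

    The median-based cutoff is an arbitrary choice made for this example.
    """
    scores = knn_outlier_scores(points, k)
    cutoff = factor * statistics.median(scores)
    return [i for i, score in enumerate(scores) if score > cutoff]


# Hypothetical 2D coordinates, standing in for a UMAP projection:
# five points in a tight group and one point far away from it.
projected = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
    (4.0, 4.0),
]

outliers = flag_outliers(projected, k=3, factor=3.0)
print("outlier indices:", outliers)
```

Distances in a UMAP projection are not guaranteed to match distances in the original embedding space, so points flagged this way are best treated as candidates for inspection rather than confirmed anomalies.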

To create an embedding visualization chart

Follow the UI Guide on creating the embedding visualization chart here.