Training Pipeline

Configure the taxonomy, preprocess the data, and train the Logistic Regression model.

source_of_truth.json

This JSON file defines the category structure (the taxonomy). The pipeline uses it to generate synthetic labels for the initial supervised training set.
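As a rough illustration of how a taxonomy file could drive synthetic label generation, here is a minimal sketch. The schema (category name mapped to a list of merchant keywords), the `generate_samples` helper, and the descriptor template are all assumptions made for the example; the actual layout of source_of_truth.json is not shown here.

```python
import json
import random

# Hypothetical taxonomy: category -> merchant keywords (assumed schema).
TAXONOMY_JSON = """
{
  "Shopping": ["amazon", "walmart", "target"],
  "Dining": ["starbucks", "mcdonalds", "pizza hut"],
  "Transport": ["uber", "lyft", "shell"]
}
"""

def generate_samples(taxonomy, samples_per_class, seed=0):
    """Build (descriptor, label) pairs from keywords; illustrative only."""
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    samples = []
    for label, keywords in taxonomy.items():
        for _ in range(samples_per_class):
            merchant = rng.choice(keywords).upper()
            store_id = rng.randint(1000, 9999)
            # Mimic the noisy look of a card statement line.
            samples.append((f"POS {merchant} #{store_id}", label))
    return samples

taxonomy = json.loads(TAXONOMY_JSON)
samples = generate_samples(taxonomy, samples_per_class=5)
```

The `samples_per_class` argument plays the role of the "Training samples per class" setting: every category receives the same number of synthetic rows, which keeps the initial training set balanced.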

Training samples per class

Preprocessing: Regex Cleaning & Normalization

Feature Extraction: TF-IDF Vectorizer (n-gram 1,2)

Classification: Logistic Regression (L2 penalty)
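The three stages above map naturally onto a scikit-learn pipeline. The sketch below is one possible arrangement, not the dashboard's actual implementation: the `clean_descriptor` function, its regex rules, and the toy training rows are assumptions made for illustration.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def clean_descriptor(text):
    """Regex cleaning and normalization (assumed rules)."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)       # drop store ids and other digits
    text = re.sub(r"[^a-z\s]", " ", text)  # drop punctuation and symbols
    return re.sub(r"\s+", " ", text).strip()

model = make_pipeline(
    # Unigrams and bigrams, matching the "n-gram 1,2" setting.
    TfidfVectorizer(preprocessor=clean_descriptor, ngram_range=(1, 2)),
    # L2 regularization is scikit-learn's default for LogisticRegression.
    LogisticRegression(C=1.0, max_iter=1000),
)

# Toy training rows, invented for the example.
texts = [
    "AMAZON MKTP US #4411", "WALMART SUPERCENTER 0921",
    "STARBUCKS STORE 1183", "MCDONALDS F2231",
    "UBER TRIP 8841", "LYFT RIDE 5521",
]
labels = ["Shopping", "Shopping", "Dining", "Dining", "Transport", "Transport"]

model.fit(texts, labels)
classes = list(model.named_steps["logisticregression"].classes_)
proba = model.predict_proba(["amazon order 991"])[0]
```

`proba` holds one probability per entry in `classes`; the largest value can serve as the confidence score shown next to a predicted label.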

The inference engine is running on fallback logic. For accurate predictions, please complete the training process in the dashboard tab first.

Transaction Descriptor

Quick Test

Ready for analysis

Enter a transaction to see the prediction

Validate low-confidence predictions to improve model accuracy.

Active Learning

Transaction Descriptor	Predicted Label	Confidence	Key Terms	Validation

Shopping
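
The validation step described above, where low-confidence predictions are reviewed to improve the model, could be sketched as a simple uncertainty filter. The `Prediction` class, the `select_for_validation` helper, the 0.6 threshold, and the sample rows are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    descriptor: str
    label: str
    confidence: float  # top class probability, between 0 and 1

def select_for_validation(predictions, threshold=0.6, limit=10):
    """Return the least confident predictions first (assumed policy)."""
    uncertain = [p for p in predictions if p.confidence < threshold]
    return sorted(uncertain, key=lambda p: p.confidence)[:limit]

# Invented example rows.
predictions = [
    Prediction("AMAZON MKTP US", "Shopping", 0.94),
    Prediction("SQ *CORNER CAFE", "Dining", 0.52),
    Prediction("PAYPAL *TRANSFER", "Shopping", 0.38),
]

queue = select_for_validation(predictions)
for p in queue:
    print(f"{p.descriptor}\t{p.label}\t{p.confidence:.2f}")
```

Rows confirmed or corrected by a reviewer can then be appended to the training set before the next training run.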