Training Pipeline
Configure taxonomy, preprocess data, and train the Logistic Regression model.
Taxonomy Source
source_of_truth.json
This JSON file defines the category structure. The pipeline uses it to generate synthetic labels for the initial supervised training set.
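As a rough illustration of how a taxonomy file could drive synthetic labeling, here is a minimal sketch. The schema shown (a `categories` list with `name` and `keywords` fields) and the keyword-matching rule are assumptions made for this example; the actual layout of source_of_truth.json and the pipeline's labeling logic are not shown on this page.

```python
import json
from typing import Optional

# Hypothetical taxonomy content. The real schema of source_of_truth.json
# is not documented here; "categories", "name" and "keywords" are assumed.
TAXONOMY_JSON = """
{
  "categories": [
    {"name": "billing", "keywords": ["refund", "invoice"]},
    {"name": "technical", "keywords": ["crash", "login"]}
  ]
}
"""


def load_taxonomy(raw: str) -> dict:
    """Parse the taxonomy JSON into a {category name: [lowercased keywords]} map."""
    data = json.loads(raw)
    return {c["name"]: [k.lower() for k in c["keywords"]] for c in data["categories"]}


def synthetic_label(text: str, taxonomy: dict) -> Optional[str]:
    """Return the category with the most keyword hits, or None if nothing matches.

    Assumed rule: simple substring matching on the lowercased text.
    """
    lowered = text.lower()
    best_label, best_hits = None, 0
    for name, keywords in taxonomy.items():
        hits = sum(1 for k in keywords if k in lowered)
        if hits > best_hits:
            best_label, best_hits = name, hits
    return best_label


taxonomy = load_taxonomy(TAXONOMY_JSON)
labeled = [(t, synthetic_label(t, taxonomy)) for t in [
    "Please refund my invoice",
    "The app crashed at login",
    "Hello there",
]]
```

Texts that match no keyword come back with a label of `None`, so they can be dropped or reviewed by hand before training.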
Dataset Distribution
Training samples per class
Pipeline Steps
1. Preprocessing: Regex Cleaning & Normalization
2. Feature Extraction: TF-IDF Vectorizer (n-gram range 1,2)
3. Classification: Logistic Regression (L2 penalty)
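The three steps above can be sketched with scikit-learn, assuming that is the library in use (the page does not say). The specific cleaning rules, the toy training texts, and the names `clean_text` and `build_pipeline` are hypothetical and only illustrate the shape of such a pipeline.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def clean_text(text: str) -> str:
    """Step 1 (assumed rules): lowercase, replace non-alphanumerics, collapse spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def build_pipeline() -> Pipeline:
    """Chain TF-IDF feature extraction and a logistic regression classifier."""
    return Pipeline([
        # Step 2: TF-IDF over unigrams and bigrams, with the regex cleaner
        # plugged in as the vectorizer's preprocessor.
        ("tfidf", TfidfVectorizer(preprocessor=clean_text, ngram_range=(1, 2))),
        # Step 3: scikit-learn's LogisticRegression applies L2 regularization
        # by default; C controls its strength (smaller C = stronger penalty).
        ("clf", LogisticRegression(C=1.0, max_iter=1000)),
    ])


# Toy training set, invented purely for illustration.
texts = [
    "Refund my order!!",
    "Where is my refund?",
    "App crashes on login",
    "Login page crashes...",
]
labels = ["billing", "billing", "technical", "technical"]

model = build_pipeline()
model.fit(texts, labels)
prediction = model.predict(["refund order"])[0]
```

Keeping the cleaner inside the vectorizer means the same normalization is applied at training and prediction time, so callers can pass raw text to `model.predict`.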