🤖 AI Toolkit — Model Evaluation Tool Demo


Model Metrics Calculator

Enter predicted and actual labels to compute accuracy, precision, recall, F1 score, and a confusion matrix.

📥 Input Labels

🤖 Predicted Labels

Accuracy
Precision (macro)
Recall (macro)
F1 Score (macro)
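Accuracy is simply the fraction of predictions that match the actual labels, while the macro-averaged scores average each metric over classes so every class counts equally regardless of size. A minimal sketch of the accuracy calculation (the function name is illustrative, not the tool's API):

```python
def accuracy(actual, predicted):
    """Fraction of positions where the predicted label matches the actual one."""
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)

accuracy(["cat", "cat", "dog", "dog"], ["cat", "dog", "dog", "dog"])  # → 0.75
```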

📊 Confusion Matrix

Run evaluation to see matrix
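The confusion matrix tallies how often each actual class was predicted as each class: rows are actual labels, columns are predicted labels, and the diagonal holds the correct predictions. A plain-Python sketch (names are illustrative, not the tool's API):

```python
def confusion_matrix(actual, predicted):
    """Return (labels, matrix) where matrix[i][j] counts items whose
    actual class is labels[i] and predicted class is labels[j]."""
    labels = sorted(set(actual) | set(predicted))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return labels, matrix

labels, m = confusion_matrix(["cat", "cat", "dog", "dog"],
                             ["cat", "dog", "dog", "dog"])
# labels → ["cat", "dog"]; m → [[1, 1], [0, 2]]
# (one cat correctly classified, one cat misread as dog, both dogs correct)
```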

📋 Per-Class Metrics

Run evaluation to see breakdown
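The per-class breakdown computes precision, recall, and F1 for each label from its true-positive, false-positive, and false-negative counts; the macro scores above are the unweighted means of these per-class values. A self-contained sketch (function and key names are illustrative):

```python
def per_class_metrics(actual, predicted):
    """Precision, recall, F1, and support for each label, plus macro averages."""
    labels = sorted(set(actual) | set(predicted))
    rows = {}
    for label in labels:
        tp = sum(1 for a, p in zip(actual, predicted) if a == label and p == label)
        fp = sum(1 for a, p in zip(actual, predicted) if a != label and p == label)
        fn = sum(1 for a, p in zip(actual, predicted) if a == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows[label] = {"precision": precision, "recall": recall,
                       "f1": f1, "support": tp + fn}
    macro = {m: sum(r[m] for r in rows.values()) / len(labels)
             for m in ("precision", "recall", "f1")}
    return rows, macro

rows, macro = per_class_metrics(["cat", "cat", "dog", "dog"],
                                ["cat", "dog", "dog", "dog"])
# rows["cat"] → precision 1.0, recall 0.5; rows["dog"] → precision 2/3, recall 1.0
```

Macro averaging treats a rare class and a dominant class identically, which is why macro F1 can look much worse than accuracy on imbalanced data.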

Get the Full Model Evaluation Tool

Python + HuggingFace integration: BLEU/ROUGE scoring, side-by-side model comparison, evaluation datasets, and HTML report export.

Buy Full Version — $49.00