DIGITAL DOWNLOAD
Model Evaluation Tool
$4.99

Know if your LLM is actually working. This tool evaluates model outputs against reference answers or rubrics across multiple quality dimensions: accuracy, relevance, tone, conciseness, and safety. Run batch evaluations from a JSONL dataset and get a scored report in seconds.
What’s Included
- BLEU and ROUGE scoring for reference-based evaluation
- G-Eval — LLM-as-judge scoring on custom rubrics (accuracy, coherence, etc.)
- Keyword-presence and semantic similarity checks
- Safety classifier — flags toxic, harmful, or off-topic outputs
- Batch mode: evaluate a JSONL test set and export a scored report to CSV (dataset format sketched after this list)
- CLI interface with JSON and human-readable output modes
- Extensible scoring API — add custom Python scoring functions (see the scorer sketch after this list)
- Example datasets, README, and MIT License
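A rough sketch of what a batch-mode JSONL test set might look like. The field names ("prompt", "output", "reference") are assumptions for illustration; the bundled example datasets and README define the actual schema.

```python
import json

# Hypothetical test-set records -- field names are assumptions;
# check the bundled example datasets for the real schema.
records = [
    {
        "prompt": "Summarize: The cat sat on the mat.",
        "output": "A cat sat on a mat.",
        "reference": "The cat sat on the mat.",
    },
    {
        "prompt": "What is 2 + 2?",
        "output": "4",
        "reference": "4",
    },
]

# One JSON object per line, as JSONL expects.
with open("testset.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```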
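And a minimal sketch of the kind of custom scoring function the extensible API can accept. The function name, signature, and registration mechanism are assumptions; the README documents the actual hook.

```python
# A simple keyword-presence scorer. Name and signature are illustrative
# assumptions; how it registers with the tool's scoring API is covered
# in the README, not shown here.
def keyword_presence(output: str, keywords: list[str]) -> float:
    """Return the fraction of required keywords found in the model output."""
    if not keywords:
        return 1.0
    text = output.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)


if __name__ == "__main__":
    print(keyword_presence("The capital of France is Paris.", ["Paris", "France"]))  # 1.0
```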
Python 3.10+ · MIT License