DIGITAL DOWNLOAD
Model Evaluation Tool
$4.99

Know if your LLM is actually working. This tool evaluates model outputs against reference answers or rubrics across multiple quality dimensions: accuracy, relevance, tone, conciseness, and safety. Run batch evaluations from a JSONL dataset and get a scored report in seconds.
What’s Included
- BLEU and ROUGE scoring for reference-based evaluation
- G-Eval — LLM-as-judge scoring on custom rubrics (accuracy, coherence, etc.)
- Keyword-presence and semantic similarity checks
- Safety classifier — flags toxic, harmful, or off-topic outputs
- Batch mode: evaluate a JSONL test set and export a scored report to CSV (dataset format sketched after this list)
- CLI interface with JSON and human-readable output modes
- Extensible scoring API — add custom Python scoring functions (see the scorer sketch after this list)
- Example datasets, README, and MIT License
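A rough sketch of what a batch-mode JSONL test set might look like. The field names ("prompt", "output", "reference") are assumptions for illustration; the bundled example datasets and README define the actual schema.

```python
import json

# Hypothetical test-set records -- field names are assumptions;
# check the bundled example datasets for the real schema.
records = [
    {
        "prompt": "Summarize: The cat sat on the mat.",
        "output": "A cat sat on a mat.",
        "reference": "The cat sat on the mat.",
    },
    {
        "prompt": "What is 2 + 2?",
        "output": "4",
        "reference": "4",
    },
]

# One JSON object per line, as JSONL expects.
with open("testset.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```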
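And a minimal sketch of the kind of custom scoring function the extensible API can accept. The function name, signature, and registration mechanism are assumptions; the README documents the actual hook.

```python
# A simple keyword-presence scorer. Name and signature are illustrative
# assumptions; how it registers with the tool's scoring API is covered
# in the README, not shown here.
def keyword_presence(output: str, keywords: list[str]) -> float:
    """Return the fraction of required keywords found in the model output."""
    if not keywords:
        return 1.0
    text = output.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)


if __name__ == "__main__":
    print(keyword_presence("The capital of France is Paris.", ["Paris", "France"]))  # 1.0
```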
Python 3.10+ · MIT License