DIGITAL DOWNLOAD

Model Evaluation Tool

$4.99

Know if your LLM is actually working. This tool evaluates model outputs against reference answers or rubrics across multiple quality dimensions: accuracy, relevance, tone, conciseness, and safety. Run batch evaluations from a JSONL dataset and get a scored report in seconds.
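
For reference, a JSONL test set is just one JSON object per line. Below is a minimal sketch of building one in Python; the field names (prompt, output, reference) are illustrative assumptions, not the tool's documented schema, so check the bundled README for the exact format.

    import json

    # Hypothetical record schema: "prompt", "output", and "reference" are
    # placeholder field names for illustration; the tool's README documents
    # the actual expected keys.
    rows = [
        {
            "prompt": "Summarize: The cat sat on the mat.",
            "output": "A cat was sitting on a mat.",
            "reference": "A cat sat on a mat.",
        },
        {"prompt": "What is 2 + 2?", "output": "4", "reference": "4"},
    ]

    with open("testset.jsonl", "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")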

What’s Included

  • BLEU and ROUGE scoring for reference-based evaluation
  • G-Eval — LLM-as-judge scoring on custom rubrics (accuracy, coherence, etc.)
  • Keyword-presence and semantic similarity checks
  • Safety classifier — flags toxic, harmful, or off-topic outputs
  • Batch mode: evaluate a JSONL test set and export scored report to CSV
  • CLI interface with JSON and human-readable output modes
  • Extensible scoring API — add custom Python scoring functions (see the sketch after this list)
  • Example datasets, README, and MIT License
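
To give a feel for the extensible scoring API, here is a rough sketch of the shape a custom Python scoring function might take. The signature and registration mechanism are assumptions for illustration; the actual hook is documented in the bundled README.

    # Illustrative sketch only: the real scoring API may expect a different
    # signature or a registration decorator; consult the bundled README.
    def brevity_score(output: str, reference: str) -> float:
        """Return 1.0 when the output is no longer than the reference,
        decaying toward 0.0 as the output grows past it."""
        if len(output) <= len(reference):
            return 1.0
        return len(reference) / len(output)

A score in the 0.0 to 1.0 range like this one would slot naturally alongside the built-in dimensions in the exported CSV report.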

🔒 Secure checkout via Stripe · Instant digital download · Python 3.10+ · MIT License