
LLM Regression Testing with Promptfoo

Author: Venkata Sudhakar

ShopMax India's customer service chatbot went live with GPT-4o, but the team wants to evaluate whether switching to a cheaper model would degrade response quality. Without a structured regression testing process, every prompt change or model swap is a risk. Promptfoo is an open-source CLI tool for evaluating LLM outputs against defined test cases. It runs your prompts against multiple models, grades responses using custom assertions, and flags regressions before they reach production.

Promptfoo works from a YAML config file that defines prompts, providers (models), and test cases. Each test case has an input and one or more assertions - string matching, LLM-graded scoring, or custom JavaScript checks. When you run promptfoo eval, it executes every prompt/provider/test combination and produces a report showing pass/fail rates, latency, and cost per model. You can integrate it into CI/CD pipelines to block deployments when the pass rate drops below a threshold.
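A minimal config illustrating that structure might look like the sketch below; the provider names, question, and assertion values are placeholders, not the article's exact suite:

```yaml
# promptfooconfig.yaml - illustrative sketch
description: Return-policy regression suite
prompts:
  - "You are a returns assistant for an Indian e-commerce store. Answer briefly: {{question}}"
providers:
  - openai:gpt-4o
  - openai:gpt-4o-mini
tests:
  - vars:
      question: "How many days do I have to return a product?"
    assert:
      - type: icontains
        value: "return"
      - type: llm-rubric
        value: "States a concrete return window"
```

Each entry under tests is run against every provider, so one question with two providers yields two graded results.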

The example below shows a Promptfoo setup for ShopMax India's return policy assistant. A Python script generates the config, runs the evaluation across two models and three customer questions, and writes the results to a JSON report.
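A sketch of such a script follows, assuming the promptfoo CLI is installed (npm install -g promptfoo) and OPENAI_API_KEY is set in the environment. The provider names, questions, and assertion values are illustrative placeholders, and the layout of the JSON report that summarize() reads is an assumption about promptfoo's output format:

```python
import json
import subprocess
from pathlib import Path

# Promptfoo config for the return-policy assistant. The two providers
# and three questions below are illustrative, not the exact production suite.
CONFIG = """\
description: ShopMax India return-policy regression suite
prompts:
  - "You are ShopMax India's returns assistant. Answer briefly: {{question}}"
providers:
  - openai:gpt-4o
  - openai:gpt-4o-mini
tests:
  - vars:
      question: "How many days do I have to return a product?"
    assert:
      - type: icontains
        value: "return"
  - vars:
      question: "Can I return an item without its original packaging?"
    assert:
      - type: llm-rubric
        value: "Clearly states the packaging requirement"
  - vars:
      question: "How long does a refund take after pickup?"
    assert:
      - type: icontains
        value: "refund"
"""

def write_config(path: str = "promptfooconfig.yaml") -> str:
    """Write the YAML config that promptfoo eval will consume."""
    Path(path).write_text(CONFIG)
    return path

def run_eval(config: str, output: str = "results.json") -> None:
    """Invoke the promptfoo CLI; requires it on PATH plus an OpenAI key."""
    subprocess.run(
        ["promptfoo", "eval", "-c", config, "-o", output, "--no-cache"],
        check=True,
    )

def summarize(output: str = "results.json") -> float:
    """Print a pass/fail summary from the JSON report and return the pass rate.

    The per-test "success" flag and the nested "results" layout are
    assumed from promptfoo's JSON report format.
    """
    report = json.loads(Path(output).read_text())
    outcomes = [r["success"] for r in report["results"]["results"]]
    passed, total = sum(outcomes), len(outcomes)
    print(f"Tests run: {total}")
    print(f"Passed:    {passed}")
    print(f"Failed:    {total - passed}")
    print(f"Pass rate: {passed / total:.1%}")
    print("Regression test complete - check results.json for full report.")
    return passed / total
```

Calling write_config(), then run_eval("promptfooconfig.yaml"), then summarize() executes the three questions against both models (six graded results in total) and prints the summary.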


Running the script produces the following output:

Tests run: 6
Passed:    5
Failed:    1
Pass rate: 83.3%
Regression test complete - check results.json for full report.

Integrate Promptfoo into GitHub Actions to block merges when pass rate drops below 80%. Store baseline results in version control and use promptfoo eval --grader openai:gpt-4o for LLM-as-judge semantic scoring, not just string matching. For ShopMax India, maintain separate test suites per agent type - returns, pricing, and inventory - so regressions are caught at the module level before they affect the full customer experience.
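A workflow along these lines could gate merges; the file names, secret name, and the pass-rate check step are assumptions, and the JSON layout read by the final step matches the report format assumed earlier:

```yaml
# .github/workflows/llm-regression.yml - illustrative sketch
name: LLM regression tests
on: [pull_request]
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm install -g promptfoo
      - run: promptfoo eval -c promptfooconfig.yaml -o results.json --no-cache
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      - name: Fail if pass rate drops below 80%
        run: |
          python - <<'EOF'
          import json, sys
          results = json.load(open("results.json"))["results"]["results"]
          rate = sum(r["success"] for r in results) / len(results)
          print(f"pass rate: {rate:.1%}")
          sys.exit(0 if rate >= 0.8 else 1)
          EOF
```

A non-zero exit from the final step fails the workflow run, which blocks the merge when branch protection requires this check.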
