Evals
What is an Eval?
An eval (short for evaluation set) is a collection of questions, each with specific criteria for what constitutes a correct or high-quality answer. We use evals to automatically measure the quality of our system’s responses. By running our system on an eval, we can assess how well its answers meet the criteria defined for each question.
All decisions about rolling out new features or models are based on the results of evals.
Supported Criteria
Currently, we support the following criteria for evaluating answers:
- Target URL: The answer should rely on information from a specific reference document, provided as a URL.
- Instructions: A short high-level description of the correct answer.
Examples:
- Question: How do I install SerenityGPT?
  Target URL: https://docs.serenitygpt.com/deployment/overview/
- Question: What main features does SerenityGPT have?
  Instructions: The answer should mention custom integration and security.
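As a rough illustration, the two example entries above could be modeled as follows. This is only a sketch: the `EvalEntry` class and its `criteria` method are hypothetical names chosen for this example, not part of SerenityGPT, and treating the question as the only mandatory value is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvalEntry:
    """One eval question with its optional criteria (hypothetical type)."""

    question: str
    target_url: Optional[str] = None    # reference document the answer should rely on
    instructions: Optional[str] = None  # short description of the correct answer

    def __post_init__(self) -> None:
        # Assumption: the question is the only value that must be present.
        if not self.question.strip():
            raise ValueError("question must not be empty")

    def criteria(self) -> list[str]:
        """Return the names of the criteria that are set for this entry."""
        names = []
        if self.target_url:
            names.append("target_url")
        if self.instructions:
            names.append("instructions")
        return names


examples = [
    EvalEntry(
        question="How do I install SerenityGPT?",
        target_url="https://docs.serenitygpt.com/deployment/overview/",
    ),
    EvalEntry(
        question="What main features does SerenityGPT have?",
        instructions="The answer should mention custom integration and security.",
    ),
]

for entry in examples:
    print(entry.question, "->", entry.criteria())
```

An entry may carry one criterion, both, or (in this sketch) none; how the real system treats an entry with no criteria is not specified here.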
Eval Formats
We support CSV, JSON, and YAML files with the following fields (columns):
- question (values required)
- target_url (values optional)
- instructions (values optional)
All of the listed formats are supported, but YAML is the system’s native format. Files in any other format are automatically converted to YAML.
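To make the conversion step more concrete, here is a minimal sketch of how CSV rows with the three fields above might be turned into the nested layout used by the YAML example in this section. The function name `csv_to_evalset` and the `tenant` parameter are assumptions made for illustration; the actual converter may work differently. The sketch uses only the standard library and prints JSON for inspection; writing real YAML would require a YAML library.

```python
import csv
import io
import json

OPTIONAL_FIELDS = ("target_url", "instructions")


def csv_to_evalset(csv_text: str, tenant: str) -> dict:
    """Convert CSV text into a questions -> tenant -> entries mapping (hypothetical helper)."""
    entries = []
    for row_no, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        question = (row.get("question") or "").strip()
        if not question:
            # The question column requires a value in every row.
            raise ValueError(f"data row {row_no}: question is required")
        entry = {"question": question}
        for field in OPTIONAL_FIELDS:
            value = (row.get(field) or "").strip()
            if value:
                # Optional columns are omitted when empty.
                entry[field] = value
        entries.append(entry)
    return {"questions": {tenant: entries}}


sample_csv = """question,target_url,instructions
How do I install SerenityGPT?,https://docs.serenitygpt.com/deployment/overview/,
What main features does SerenityGPT have?,,The answer should mention custom integration and security.
"""

evalset = csv_to_evalset(sample_csv, tenant="tenant-name")
print(json.dumps(evalset, indent=2))
```

Empty optional cells are dropped rather than stored as empty strings, which keeps the converted entries close to the hand-written YAML form.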
Here is an example of a YAML file we use for evals.
questions:
  tenant-name:
    - question: How do I install SerenityGPT?
      target_url: https://docs.serenitygpt.com/deployment/overview/
    - question: What is SerenityGPT?
      instructions: The answer should mention that SerenityGPT is a RAG and Agentic AI framework
      target_url:
        - https://docs.serenitygpt.com/
        - https://docs.serenitygpt.com/product/overview/
    - question: What main features does SerenityGPT have?
      instructions: The answer should mention custom integration and security.
      target_url: ^docs.serenitygpt.com/.*
Here is the same configuration in table format:
| Tenant Name | Question | Target URL(s) | Instructions |
|---|---|---|---|
| tenant-name | How do I install SerenityGPT? | https://docs.serenitygpt.com/deployment/overview/ | - |
| tenant-name | What is SerenityGPT? | https://docs.serenitygpt.com/ https://docs.serenitygpt.com/product/overview/ | The answer should mention that SerenityGPT is a RAG and Agentic AI framework |
| tenant-name | What main features does SerenityGPT have? | ^docs.serenitygpt.com/.* | The answer should mention custom integration and security. |
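The examples above use three forms of `target_url`: a single URL, a list of URLs, and a pattern beginning with `^`. The sketch below shows one way such a criterion could be checked against the URLs an answer cites. It rests on several assumptions that the source does not confirm: that a leading `^` marks a regular expression, that patterns are matched against the URL without its scheme, that matching any one listed URL is sufficient, and that trailing slashes are ignored. The function `matches_target` is a hypothetical name.

```python
import re
from typing import Iterable, Union

TargetUrl = Union[str, list[str]]


def _strip_scheme(url: str) -> str:
    """Drop a leading http:// or https:// so scheme-less patterns can match."""
    return re.sub(r"^https?://", "", url)


def matches_target(cited_urls: Iterable[str], target_url: TargetUrl) -> bool:
    """Return True if any cited URL satisfies the target_url criterion (sketch)."""
    targets = [target_url] if isinstance(target_url, str) else list(target_url)
    cited = list(cited_urls)
    for target in targets:
        if target.startswith("^"):
            # Assumption: a leading "^" marks the value as a regular expression.
            pattern = re.compile(target)
            if any(pattern.match(_strip_scheme(url)) for url in cited):
                return True
        elif any(url.rstrip("/") == target.rstrip("/") for url in cited):
            # Plain URLs are compared exactly, ignoring a trailing slash.
            return True
    return False


print(matches_target(
    ["https://docs.serenitygpt.com/product/overview/"],
    ["https://docs.serenitygpt.com/", "https://docs.serenitygpt.com/product/overview/"],
))
print(matches_target(
    ["https://docs.serenitygpt.com/deployment/overview/"],
    "^docs.serenitygpt.com/.*",
))
```

Whether a list should mean "any of these documents" or "all of them" is a design choice; this sketch takes the more lenient reading.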