Evals

What is an Eval?

An eval (short for evaluation set) is a collection of questions, each with specific criteria for what constitutes a correct or high-quality answer. We use evals to automatically measure the quality of our system’s responses. By running our system on an eval, we can assess how well each response meets the criteria defined for its question.

All decisions about rolling out new features or models are based on the results of evals.

Supported Criteria

Currently, we support the following criteria for evaluating answers:

  • Target URL: The answer should rely on information from a specific reference document, provided as a URL.
  • Instructions: A short high-level description of the correct answer.

Examples:

  • Question: How do I install SerenityGPT?
    Target URL: https://docs.serenitygpt.com/deployment/overview/

  • Question: What main features does SerenityGPT have?
    Instructions: The answer should mention custom integration and security.
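As a rough illustration, the two criteria can be modeled as optional fields on each question. The sketch below is not SerenityGPT code: `EvalItem` and `cites_target_url` are hypothetical names, and it assumes the system reports the source URLs each answer relied on. The Instructions criterion is left out because it cannot be checked by simple string comparison.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EvalItem:
    """One question plus its optional criteria (hypothetical structure)."""
    question: str
    target_url: Optional[str] = None
    instructions: Optional[str] = None


def _normalize(url: str) -> str:
    # Ignore surrounding whitespace and a trailing slash when comparing URLs.
    return url.strip().rstrip("/")


def cites_target_url(item: EvalItem, source_urls: List[str]) -> bool:
    """Return True if the item has no target URL, or if the answer's
    sources (assumed to be reported by the system) include it."""
    if item.target_url is None:
        return True
    target = _normalize(item.target_url)
    return any(_normalize(url) == target for url in source_urls)
```

An item with neither criterion set passes this check trivially, so a real eval would normally set at least one of the two.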

Eval formats

We support CSV and JSON with the following fields (columns):

  • question (values required)
  • target_url (values optional)
  • instructions (values optional)
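The field rules above can be checked before an eval is run. This is a minimal sketch, assuming the CSV header row uses the field names exactly as listed (`question`, `target_url`, `instructions`); the `load_eval_csv` function and the sample data are illustrative and not part of SerenityGPT.

```python
import csv
import io
from typing import Dict, List, Optional

# Illustrative sample; the header names are an assumption based on the field list.
SAMPLE_CSV = """question,target_url,instructions
How do I install SerenityGPT?,https://docs.serenitygpt.com/deployment/overview/,
What is SerenityGPT?,,The answer should mention that SerenityGPT is a RAG and Agentic AI framework
"""


def _clean(value: Optional[str]) -> Optional[str]:
    # Treat missing or whitespace-only cells as "no value".
    text = (value or "").strip()
    return text or None


def load_eval_csv(text: str) -> List[Dict[str, Optional[str]]]:
    """Parse CSV text into rows, requiring a non-empty `question` in each row."""
    rows = []
    for row_no, raw in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        question = _clean(raw.get("question"))
        if question is None:
            raise ValueError(f"data row {row_no}: `question` is required")
        rows.append({
            "question": question,
            "target_url": _clean(raw.get("target_url")),      # optional
            "instructions": _clean(raw.get("instructions")),  # optional
        })
    return rows
```

Calling `load_eval_csv(SAMPLE_CSV)` returns two rows; optional cells that were left empty come back as `None`.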

Example of an eval in the CSV format

| Question | Target URL(s) | Instructions |
| How do I install SerenityGPT? | https://docs.serenitygpt.com/deployment/overview/ | |
| What is SerenityGPT? | | The answer should mention that SerenityGPT is a RAG and Agentic AI framework |

You can include follow-up questions by associating them with the same Chat ID:

| Chat ID | Question | Target URL(s) | Instructions |
| 1 | What is SerenityGPT? | | The answer should mention that SerenityGPT is a RAG and Agentic AI framework |
| 1 | How do I install it? | https://docs.serenitygpt.com/deployment/overview/ | |
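One way to picture how rows with a shared Chat ID form a conversation is the sketch below. It makes two assumptions that this page does not confirm: that the column is exposed to code as a `chat_id` key, and that rows without a Chat ID are treated as single-question chats. `group_by_chat` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def group_by_chat(rows: List[Dict[str, str]]) -> List[List[Dict[str, str]]]:
    """Group rows into conversations, keeping the original row order.

    Rows that share a `chat_id` (assumed key name) become one conversation;
    rows with no chat ID each become their own single-question conversation.
    """
    chats: Dict[Tuple[str, object], List[Dict[str, str]]] = {}
    for index, row in enumerate(rows):
        chat_id = (row.get("chat_id") or "").strip()
        # Tuple keys keep real chat IDs separate from per-row fallback keys.
        key = ("chat", chat_id) if chat_id else ("row", index)
        chats.setdefault(key, []).append(row)
    # Dicts preserve insertion order, so conversations appear in file order.
    return list(chats.values())
```

With the two rows from the table above, this yields one conversation in which "How do I install it?" follows "What is SerenityGPT?".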

Example of an eval in the YAML format

See YAML evals