# Eval Sets
Evaluating the quality of Serenity's configuration is what lets us achieve exceptional accuracy. At the heart of this evaluation process are eval sets: structured collections of questions and correct answers against which accuracy is measured.
## What is an Eval?
An eval is a curated dataset consisting of:
- A list of questions that are likely to be asked or have been asked before.
- Corresponding correct answers.
During evaluation, SerenityGPT is prompted with each question, and its response is compared to the correct answer using LLM-as-a-judge. The judge assesses how well the generated answer aligns with the expected response.
The performance on these eval sets drives all configuration decisions.
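As a concrete illustration, the loop below sketches how such an evaluation could run. It is a minimal sketch, not SerenityGPT's actual implementation: `ask_serenity` and `call_llm` are hypothetical placeholders for the system under test and the judge model.

```python
# Minimal sketch of an eval run with LLM-as-a-judge scoring.
# ask_serenity() and call_llm() are hypothetical placeholders,
# not SerenityGPT's actual API.
from dataclasses import dataclass

@dataclass
class EvalItem:
    question: str
    expected: str  # a reference link or answer highlights

def ask_serenity(question: str) -> str:
    """Placeholder: query SerenityGPT and return its answer."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Placeholder: call the judge LLM and return its raw reply."""
    raise NotImplementedError

def judge(generated: str, expected: str) -> float:
    """Ask the judge LLM to score alignment on a 0-to-1 scale."""
    prompt = (
        "On a scale from 0 to 1, how well does the answer match the "
        f"reference?\nReference: {expected}\nAnswer: {generated}\nScore:"
    )
    return float(call_llm(prompt))

def run_eval(items: list[EvalItem]) -> float:
    """Average judge score over the whole eval set."""
    scores = [judge(ask_serenity(i.question), i.expected) for i in items]
    return sum(scores) / len(scores)
```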
## Formats for Correct Answers
We support two formats for providing correct answers in an eval set:
- **Reference Link** (preferred and easiest)
  A link to the source document or page where the correct answer can be found.
- **Answer Highlights**
  A high-level summary of the answer, which can also mention what the answer should not contain.
Feel free to mix and match the formats, or specify both for the same question. You can provide the eval set as a CSV file or as a YAML file.
### CSV Example
| Question | Link | Answer |
|---|---|---|
| Can I travel with a dog | https://help.lyft.com/hc/en-us/all/articles/8559088908-pet-rides-for-riders | |
| How do I install Puppet Enterprise | | mention 2 installation modes: tarball and installation manager |
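On disk, those rows would be a plain CSV file whose header mirrors the table columns:

```csv
Question,Link,Answer
Can I travel with a dog,https://help.lyft.com/hc/en-us/all/articles/8559088908-pet-rides-for-riders,
How do I install Puppet Enterprise,,"mention 2 installation modes: tarball and installation manager"
```

The same eval set could be expressed in YAML. The field names below mirror the CSV columns and are illustrative, not a confirmed schema:

```yaml
# Illustrative YAML rendering of the same eval set.
- question: Can I travel with a dog
  link: https://help.lyft.com/hc/en-us/all/articles/8559088908-pet-rides-for-riders
- question: How do I install Puppet Enterprise
  answer: "mention 2 installation modes: tarball and installation manager"
```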