Choose Your Level
Pick the difficulty that matches where you are. You can come back and try a harder level later.
Beginner
Create a Golden Test Set
Create 10 test cases to evaluate a new customer support prompt.
1 to 2 hours · 3 criteria
Intermediate
Write an LLM-as-a-Judge Prompt
Write a prompt that has GPT-4 grade the outputs of a smaller, cheaper model.
2 to 3 hours · 3 criteria
Advanced
Design an A/B Testing Framework
Compare two prompts in production and use a statistical test to determine which one performs better.
3 to 5 hours · 3 criteria