Topic

AI Evaluation & Benchmarking

Create datasets that test whether an AI prompt actually works. This topic covers "LLM-as-a-judge" concepts, test-case creation, and scoring rubrics.

Test set creation, LLM-as-a-judge, Benchmarking, Quality assurance

Choose Your Level

Pick the difficulty that matches where you are. You can come back and try a harder level later.