AI & Prompting · Advanced · 3 to 5 hours

Design an A/B Testing Framework

Compare two different prompts in production and determine the winner statistically.

The Scenario

Prompt A is short and uses zero-shot prompting. Prompt B is long and uses few-shot prompting. Both are live in your app, and each receives 50% of traffic. You need to design a framework to decide which one is actually better for the business.
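A clean comparison depends on how traffic is split. If each user is routed to the same prompt on every request, the two samples stay independent and no user's experience mixes both variants. The following is a minimal sketch that assumes per-user assignment. The function name `assign_variant` and the salt `"prompt-ab-test"` are hypothetical. The user ID is hashed into 100 buckets, and buckets 0 to 49 go to Prompt A:

```python
import hashlib


def assign_variant(user_id: str, experiment: str = "prompt-ab-test") -> str:
    """Deterministically assign a user to prompt variant "A" or "B" (50/50 split).

    Hypothetical helper: hashing the user ID with a per-experiment salt keeps a
    user in the same arm across sessions, while a new salt reshuffles users for
    the next experiment.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return "A" if bucket < 50 else "B"


if __name__ == "__main__":
    counts = {"A": 0, "B": 0}
    for i in range(10_000):
        counts[assign_variant(f"user-{i}")] += 1
    print(counts)
```

Per-request random assignment is simpler. However, it lets a single user see both prompts, which can blur per-user metrics such as thumbs-up rate.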

The Brief

Write an evaluation strategy document. How will you measure success beyond just "the text looks nice"?

Deliverables

  • The core business metric you will track (e.g., thumbs up/down, copy-paste events, latency)
  • The automated evaluation metrics (e.g., JSON parse failure rate)
  • A strategy for handling latency vs. quality trade-offs (e.g., Prompt B produces better output but takes 3 seconds longer to generate)
  • The statistical threshold for declaring a "winner" (e.g., significance level and minimum sample size)
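One way to make the winner threshold concrete has three parts:

1. Run a two-proportion z-test on the core metric (here, thumbs-up rate).
2. Compute the minimum sample size before the test starts.
3. After the significance check, apply a latency guardrail to the apparent winner.

This is a sketch under stated assumptions. The `ArmStats` fields, the 0.05 significance level, 80% power, and the 2-second p95 latency guardrail are illustrative choices, not values given in the brief:

```python
import math
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class ArmStats:
    """Aggregated results for one prompt variant (hypothetical schema)."""
    name: str
    successes: int        # e.g., thumbs-up count
    trials: int           # e.g., responses that received a rating
    p95_latency_s: float  # 95th-percentile generation latency, seconds


def two_proportion_z_test(a: ArmStats, b: ArmStats) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: A and B have equal success rates."""
    p_a = a.successes / a.trials
    p_b = b.successes / b.trials
    pooled = (a.successes + b.successes) / (a.trials + b.trials)
    se = math.sqrt(pooled * (1 - pooled) * (1 / a.trials + 1 / b.trials))
    z = (p_b - p_a) / se  # positive z means B has the higher rate
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def required_sample_size_per_arm(p_baseline: float, min_lift: float,
                                 alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate per-arm sample size to detect an absolute lift of min_lift."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_new = p_baseline + min_lift
    variance = p_baseline * (1 - p_baseline) + p_new * (1 - p_new)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / min_lift ** 2)


def declare_winner(a: ArmStats, b: ArmStats, alpha: float = 0.05,
                   max_extra_latency_s: float = 2.0) -> str:
    """Significance test on quality first, then a latency guardrail on the winner."""
    z, p_value = two_proportion_z_test(a, b)
    if p_value >= alpha:
        return "no winner: difference is not statistically significant"
    winner, loser = (b, a) if z > 0 else (a, b)
    # Hypothetical guardrail: the better prompt may be at most
    # max_extra_latency_s slower at p95 than the other one.
    if winner.p95_latency_s - loser.p95_latency_s > max_extra_latency_s:
        return f"{winner.name} wins on quality but fails the latency guardrail"
    return f"{winner.name} wins"


if __name__ == "__main__":
    a = ArmStats("A", successes=600, trials=1000, p95_latency_s=1.2)
    b = ArmStats("B", successes=660, trials=1000, p95_latency_s=4.2)
    print(required_sample_size_per_arm(p_baseline=0.60, min_lift=0.05))
    print(declare_winner(a, b))  # B wins on quality but fails the latency guardrail
```

With these illustrative numbers, B's higher thumbs-up rate is statistically significant. However, its p95 latency is 3 seconds worse than A's, so the guardrail flags B for a human decision instead of shipping it automatically. Evaluate significance once, at the pre-computed sample size. Checking results repeatedly inflates the false-positive rate.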

Submission Guidance

Prompt engineering in production is an engineering discipline, not creative writing. Latency, token cost, and API failure rates matter just as much as prose quality.
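The operational metrics named above can be computed per variant from request logs. These include API failure rate, automated quality checks such as JSON parse failures, latency, and token cost. The following is a minimal sketch with two assumptions. First, the log schema (`LoggedResponse`) is hypothetical. Second, the per-token prices are placeholders; replace them with your provider's actual rates:

```python
import json
import math
from dataclasses import dataclass


@dataclass
class LoggedResponse:
    """One logged LLM call (hypothetical log schema)."""
    variant: str         # "A" or "B"
    raw_output: str      # model text, expected to be valid JSON
    latency_s: float
    input_tokens: int
    output_tokens: int
    api_error: bool = False


# Placeholder USD prices per 1K tokens; substitute your provider's real rates.
PRICE_PER_1K_INPUT = 0.0005
PRICE_PER_1K_OUTPUT = 0.0015


def p95(values: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def summarize(logs: list[LoggedResponse], variant: str) -> dict[str, float]:
    """Operational metrics for one variant; failed API calls are excluded
    from the parse-failure, latency, and cost figures."""
    rows = [r for r in logs if r.variant == variant]
    completed = [r for r in rows if not r.api_error]
    if not completed:
        raise ValueError(f"no completed requests for variant {variant!r}")

    parse_failures = 0
    for r in completed:
        try:
            json.loads(r.raw_output)
        except json.JSONDecodeError:
            parse_failures += 1

    total_cost = sum(
        r.input_tokens / 1000 * PRICE_PER_1K_INPUT
        + r.output_tokens / 1000 * PRICE_PER_1K_OUTPUT
        for r in completed
    )
    return {
        "requests": len(rows),
        "api_error_rate": (len(rows) - len(completed)) / len(rows),
        "json_parse_failure_rate": parse_failures / len(completed),
        "p95_latency_s": p95([r.latency_s for r in completed]),
        "mean_cost_usd": total_cost / len(completed),
    }


if __name__ == "__main__":
    logs = [
        LoggedResponse("A", '{"answer": "ok"}', 1.0, 200, 50),
        LoggedResponse("A", "Sure! Here is the JSON:", 2.0, 200, 60),
        LoggedResponse("A", "", 0.5, 200, 0, api_error=True),
    ]
    print(summarize(logs, "A"))
```

Prompt B carries few-shot examples, so it will typically use more input tokens per request, and therefore cost more. For that reason, cost belongs in the comparison alongside quality.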

Submit Your Work

Your submission is graded against the rubric. If you pass, you receive a public Badge URL that you can share on LinkedIn. There is no draft save, so work offline first and paste your finished response here.


Responses are limited to 20,000 characters, and Markdown is supported. You can attach up to 5 links, one per line or comma-separated.

By submitting, you agree that your submission text, name, and evaluation will appear on a public Badge URL.