The Scenario
You are building an AI HR bot using RAG. You have a 200-page PDF of the company handbook. If you embed whole pages, each embedding mixes several unrelated policies and retrieval becomes imprecise. If you embed single sentences, each chunk loses the surrounding context needed to interpret it.
The Brief
Write a strategy document detailing exactly how you will chunk the handbook text before sending it to the embedding model.
Deliverables
- The Chunk Size (e.g., 500 tokens) and Overlap Size (e.g., 50 tokens)
- The Chunking Method (e.g., Fixed-size, Sentence-aware, or Header-based) and why you chose it
- One example of a "bad" chunk (where context is lost) and how your strategy prevents it
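To make the chunk size and overlap concrete, here is a minimal sketch of fixed-size chunking with a sliding window. It assumes the text has already been tokenized; the `chunk_fixed` function and the placeholder token list are hypothetical, and a real pipeline would count tokens with the embedding model's own tokenizer.

```python
def chunk_fixed(tokens, chunk_size=500, overlap=50):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


# Placeholder tokens standing in for tokenized handbook text (assumption).
tokens = [f"w{i}" for i in range(1200)]
chunks = chunk_fixed(tokens)
print([len(c) for c in chunks])
```

Each chunk repeats the last 50 tokens of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.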
Submission Guidance
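For structured documents like handbooks, header-based chunking (splitting on Markdown headers such as `### Leave Policy`) usually keeps each policy intact and tends to retrieve better than naive character-count chunking, which can separate a rule from its heading or its exceptions.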
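Splitting on Markdown headers could be sketched as follows. This assumes the PDF has already been converted to Markdown with its headings preserved; the `chunk_by_headers` function and the sample handbook text are made up for illustration and are not taken from any real handbook or library.

```python
import re

HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_headers(markdown_text):
    """Return one chunk per heading, each carrying its heading trail as context."""
    chunks = []
    path = []  # stack of (level, title) for the current heading trail
    body = []  # lines collected under the current heading

    def flush():
        text = "\n".join(body).strip()
        if text:
            heading = " > ".join(title for _, title in path)
            chunks.append({"heading": heading, "text": text})
        body.clear()

    for line in markdown_text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Leave any sections at the same or a deeper level.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return chunks


# Invented sample text (assumption), not real policy wording.
sample = """## Time Off
### Leave Policy
Employees accrue 1.5 days of leave per month.
### Sick Leave
Notify your manager before your shift starts.
"""

header_chunks = chunk_by_headers(sample)
for chunk in header_chunks:
    print(chunk["heading"], "|", chunk["text"])
```

Storing the heading trail with each chunk means a sentence like "Notify your manager before your shift starts." is embedded together with "Time Off > Sick Leave", which is the kind of context a fixed-size split can lose. Sections longer than the target chunk size would still need a secondary split, for example with a fixed-size window.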
Submit Your Work
Your submission is graded against the rubric on the right. If you pass, you get a public Badge URL you can share on LinkedIn. There is no draft save, so work offline first and paste your finished response here.