Prompt for Metadata Extraction — RAG (Retrieval-Augmented Generation) Prep Intermediate Task | Graduates Hub

The Scenario

Your RAG system is returning bad results because it relies entirely on vector similarity. A search for "2024 budget" returns the "2022 budget" document because the two phrases are nearly identical, so their embeddings sit close together. You need to extract metadata (Year, Department, DocType) from every document *before* embedding so that queries can apply hard filters.
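
The filter-then-rank idea can be sketched as follows. This is a minimal illustration that assumes an in-memory list of chunks; `Chunk`, `cosine`, and `filtered_search` are hypothetical names, and a real vector database would apply the filter through its own query API.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional


@dataclass
class Chunk:
    text: str
    year: Optional[int]  # None when the source text never states a year
    department: str
    doc_type: str
    embedding: List[float]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(chunks: List[Chunk], query_embedding: List[float],
                    year: int, top_k: int = 3) -> List[Chunk]:
    # Hard filter first: chunks from other years never reach the ranking step.
    candidates = [c for c in chunks if c.year == year]
    # Then rank the survivors by vector similarity.
    ranked = sorted(candidates,
                    key=lambda c: cosine(c.embedding, query_embedding),
                    reverse=True)
    return ranked[:top_k]
```

With this ordering, a 2022 document cannot outrank a 2024 document for a 2024 query, however similar its embedding is.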

The Brief

Write the LLM prompt that will process raw document text and output a strict JSON block of metadata for the vector database.
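
One possible shape for such a prompt is sketched below. The wording, the `<document>` delimiters, and the helper name `build_extraction_prompt` are illustrative assumptions, not a reference answer; the key names come from the brief.

```python
PROMPT_TEMPLATE = """You extract metadata from one document.
Return only a JSON object with exactly these keys:
  "Year": a four-digit integer, or null
  "Department": a string, or null
  "DocType": a string, or null
  "Summary": one or two sentences
Use only information stated in the document.
If a value is not stated, return null for it. Do not guess.

<document>
{document}
</document>"""


def build_extraction_prompt(document_text: str) -> str:
    """Insert the raw document text into the fixed instruction template."""
    return PROMPT_TEMPLATE.format(document=document_text)
```

Keeping the instructions fixed and wrapping the document in delimiters makes it easier to tell instruction text from document text when reviewing outputs.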

Deliverables

The Metadata Extraction Prompt
The required JSON schema (Year, Department, DocType, Summary)
A fallback instruction (what the AI should do if the Year is not mentioned in the text)
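
Whatever schema you choose, the model's output should be checked before it reaches the vector database. The sketch below assumes that a missing Year, Department, or DocType is represented as JSON null; `parse_metadata` is a hypothetical helper name.

```python
import json

REQUIRED_KEYS = {"Year", "Department", "DocType", "Summary"}


def parse_metadata(raw: str) -> dict:
    """Parse the model output and reject anything that breaks the schema."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != REQUIRED_KEYS:
        raise ValueError(f"key mismatch: {sorted(set(data) ^ REQUIRED_KEYS)}")

    year = data["Year"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if year is not None and (isinstance(year, bool) or not isinstance(year, int)):
        raise ValueError("Year must be an integer or null")

    for key in ("Department", "DocType"):
        if data[key] is not None and not isinstance(data[key], str):
            raise ValueError(f"{key} must be a string or null")

    if not isinstance(data["Summary"], str):
        raise ValueError("Summary must be a string")
    return data
```

Rejecting the output outright, rather than repairing it, keeps malformed metadata out of the index; the caller can retry the extraction or route the document for review.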

Submission Guidance

Hybrid search (vector similarity + metadata filtering) is a widely used pattern for RAG. Your prompt must be robust enough to handle documents with missing information without hallucinating dates.
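
A prompt instruction alone may not stop a model from inventing a date, so a post-check can help. The sketch below is one assumed approach: it keeps an extracted year only if that number literally appears in the source text. The function names are hypothetical, and the check is deliberately strict, so a document that only says "FY24" would have its Year reset to null.

```python
import re
from typing import Optional


def year_is_grounded(year: Optional[int], document_text: str) -> bool:
    """Return True if the year is null or appears as a standalone number."""
    if year is None:
        return True
    # Lookarounds stop 2024 from matching inside a longer number like 120245.
    pattern = rf"(?<!\d){year}(?!\d)"
    return re.search(pattern, document_text) is not None


def enforce_year_fallback(metadata: dict, document_text: str) -> dict:
    """Return a copy of the metadata with an ungrounded Year set to None."""
    checked = dict(metadata)
    if not year_is_grounded(checked.get("Year"), document_text):
        checked["Year"] = None
    return checked
```

A null Year can then be excluded from year filters or flagged for manual tagging, which is safer than indexing a guessed date.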

Submit Your Work

Your submission is graded against the rubric on the right. If you pass, you get a public Badge URL you can share on LinkedIn. There is no draft save, so work offline first and paste your finished response here.