The Scenario
Your company ingests data from a third-party API every hour. Last week, the API provider added 3 new fields and renamed one existing field without warning. Your pipeline crashed, and the data warehouse was missing 6 hours of data before anyone noticed.
The Brief
Redesign the pipeline to handle schema evolution gracefully. The pipeline must detect changes, adapt where possible, alert on breaking changes, and never silently drop data.
Deliverables
- A schema detection strategy: how the pipeline discovers new, renamed, or removed fields
- Handling rules for each type of change: new fields (add column), removed fields (nullable/default), renamed fields (mapping table)
- An alerting mechanism that notifies the team of breaking changes without stopping the pipeline
Submission Guidance
The worst outcome is silent data loss. The second worst is a pipeline that crashes at 2am and nobody knows until Monday. Design for both.
Submit Your Work
Your submission is graded against the rubric on the right. If you pass, you get a public Badge URL you can share on LinkedIn. There is no draft save, so work offline first and paste your finished response here.