The Scenario
A hospital exported 18 months of patient admission records. The dataset has missing values scattered across columns like Age (3%), Blood Pressure (12%), Diagnosis Code (8%), and Admission Date (0.5%). Each column demands a different imputation strategy because the data is missing for different reasons.
The Brief
You do not have the real file. Describe the imputation strategy you would use for each column type, justify why, and explain the risks of getting it wrong in a healthcare context.
Deliverables
- A table listing each column, its missing rate, your chosen strategy (mean, mode, forward-fill, drop, flag, or model-based), and a one-sentence justification
- An explanation of the difference between MCAR, MAR, and MNAR and which category each column likely falls into
- One risk specific to healthcare data where a bad imputation could lead to a dangerous clinical decision
Submission Guidance
Do not default to "drop all rows with nulls". In healthcare, every row is a patient. Show that you understand the cost of discarding data versus the cost of inventing it.
Submit Your Work
Your submission is graded against the rubric on the right. If you pass, you get a public Badge URL you can share on LinkedIn. There is no draft save, so work offline first and paste your finished response here.