Data · Intermediate · 2 to 3 hours

Merge, Reshape, and Aggregate Multi-Source Data

Combine three CSVs with different schemas into one clean, analysis-ready DataFrame.

The Scenario

A retail chain stores its data across three separate systems: `customers.csv` (customer demographics), `orders.csv` (order headers), and `order_items.csv` (line items with product and price). Your analyst needs a single, denormalized DataFrame to work from.

The Brief

Write the pandas code to load, merge, reshape, and aggregate these three files into one clean DataFrame. Handle schema mismatches (different column names, date formats, missing keys).
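As a starting point, the schema clean-up step might look like the sketch below. The brief does not give the actual column names or date formats, so everything here (`CustomerID`, `cust_id`, the two date formats, the inline sample rows) is a hypothetical stand-in for whatever the real files contain.

```python
import io
import pandas as pd

# Hypothetical extracts: the real headers and formats are not specified in the brief.
customers_raw = io.StringIO(
    "CustomerID,Name,Region\n"
    "1,Ana,North\n"
    "2,Ben,South\n"
)
orders_raw = io.StringIO(
    "order_id,cust_id,order_date\n"
    "100,1,2024-01-05\n"
    "101,2,05/02/2024\n"
    "102,,2024-03-01\n"
)

# 1. Column renames: bring both files onto one naming convention.
customers = pd.read_csv(customers_raw).rename(
    columns={"CustomerID": "customer_id", "Name": "name", "Region": "region"}
)
orders = pd.read_csv(orders_raw).rename(columns={"cust_id": "customer_id"})

# 2. Type casting: the missing key makes pandas read cust_id as float.
#    The nullable Int64 dtype keeps the gap as <NA> and the rest as integers.
customers["customer_id"] = customers["customer_id"].astype("Int64")
orders["customer_id"] = orders["customer_id"].astype("Int64")

# 3. Date parsing: try each assumed format explicitly, then combine.
#    Rows matching neither format stay NaT so they can be inspected.
iso = pd.to_datetime(orders["order_date"], format="%Y-%m-%d", errors="coerce")
dmy = pd.to_datetime(orders["order_date"], format="%d/%m/%Y", errors="coerce")
orders["order_date"] = iso.fillna(dmy)

print(orders.dtypes)
print(orders)
```

Parsing each format explicitly, rather than letting pandas guess, avoids silently reading `05/02/2024` as May 2 when the source system means February 5.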

Deliverables

  • The complete Python script showing the merge strategy (inner, left, etc.) and join keys
  • How you handled schema mismatches (column renames, type casting, date parsing)
  • The final aggregated output: revenue per customer, with at least 2 derived columns
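One possible shape for the final deliverable is sketched below, using small in-memory frames in place of the three CSVs. The columns `quantity` and `unit_price` and the two derived columns (`avg_order_value`, `revenue_share`) are illustrative choices, not requirements taken from the brief.

```python
import pandas as pd

# Hypothetical, already-cleaned stand-ins for the three files.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
orders = pd.DataFrame({"order_id": [100, 101, 102], "customer_id": [1, 1, 2]})
items = pd.DataFrame({
    "order_id": [100, 100, 101, 102],
    "product": ["pen", "ink", "pad", "pen"],
    "quantity": [2, 1, 3, 5],
    "unit_price": [1.50, 4.00, 2.00, 1.50],
})

items["line_total"] = items["quantity"] * items["unit_price"]

# Left joins keep every customer, including those with no orders.
# validate= raises if a key that should be unique on the left is duplicated.
order_lines = orders.merge(items, on="order_id", how="left", validate="one_to_many")
full = customers.merge(order_lines, on="customer_id", how="left", validate="one_to_many")

summary = full.groupby(["customer_id", "name"], as_index=False).agg(
    revenue=("line_total", "sum"),
    order_count=("order_id", "nunique"),
)

# Derived columns. A customer with zero orders gets NaN for avg_order_value (0 / 0).
summary["avg_order_value"] = summary["revenue"] / summary["order_count"]
summary["revenue_share"] = summary["revenue"] / summary["revenue"].sum()

print(summary)
```

Customer 3 has no orders but still appears in `summary` with zero revenue, which is the behaviour an inner join would have hidden.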

Submission Guidance

Show that you understand the difference between inner, left, and outer joins. The wrong join type can silently drop rows (for example, an inner join discards customers with no orders) without raising any error.
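A small comparison makes the difference concrete. The frames below are invented for illustration: customer 3 has no orders, and order 102 points at a customer ID (9) that does not exist in the customer table.

```python
import pandas as pd

# Hypothetical data with one unmatched row on each side.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
orders = pd.DataFrame({"order_id": [100, 101, 102], "customer_id": [1, 2, 9]})

inner = customers.merge(orders, on="customer_id", how="inner")
left = customers.merge(orders, on="customer_id", how="left")
# indicator=True adds a _merge column showing where each row came from.
outer = customers.merge(orders, on="customer_id", how="outer", indicator=True)

print(len(inner), len(left), len(outer))  # 2 3 4
print(outer["_merge"].value_counts())
```

The inner join loses both customer 3 and order 102 without warning. Running the outer join with `indicator=True` first is one way to count unmatched keys before committing to a join type.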

Submit Your Work

Your submission is graded against the rubric on the right. If you pass, you get a public Badge URL you can share on LinkedIn. There is no draft save, so work offline first and paste your finished response here.