The Scenario
An e-commerce company wants real-time analytics on user behaviour. Currently, clickstream data is batch-loaded every 4 hours. Marketing wants to see what users are doing right now, so it can trigger personalised push notifications within 60 seconds of a key event.
The Brief
Design a real-time streaming pipeline. Choose the message broker (Kafka, Kinesis, Pub/Sub), the processing framework (Spark Streaming, Flink, or a simpler consumer), and the output sink (real-time dashboard, notification trigger, or both).
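Before committing to a stack, it can help to prototype the pipeline shape in-process. The sketch below is illustrative only: it uses `queue.Queue` as a stand-in for the broker and plain functions as stand-ins for producers and consumers; none of the names correspond to a real Kafka, Kinesis, or Pub/Sub client API.

```python
import queue
import threading
import time

broker = queue.Queue()  # stand-in for a broker topic/stream

def producer(n_events):
    # Emit clickstream events, each with a unique ID and an event-time timestamp.
    for i in range(n_events):
        broker.put({"event_id": i, "user": f"u{i % 3}", "ts": time.time(), "type": "click"})
    broker.put(None)  # sentinel: end of stream (a real broker has no such marker)

def consumer(sink):
    # Consume events and route key events to the notification sink.
    while True:
        event = broker.get()
        if event is None:
            break
        if event["type"] == "click":
            sink.append(event)

sink = []
t = threading.Thread(target=producer, args=(5,))
t.start()
consumer(sink)
t.join()
print(len(sink))  # → 5
```

The same producer → broker → consumer → sink shape should appear in your architecture diagram; the design questions are what replaces each stand-in and why.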
Deliverables
- An architecture diagram showing producers, broker, consumers, and output sinks
- Your technology choices, with a defence of each (why Kafka over Kinesis, why Flink over Spark Streaming, etc.)
- How you handle late-arriving events, duplicate events, and consumer failures
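One common pattern for the late-event and duplicate-event deliverables: event-time tumbling windows with a watermark and an allowed-lateness bound, plus deduplication on event ID inside each window. The sketch below is a hedged, self-contained illustration; the window size, lateness bound, and `Windower` class are all invented for this example, not a framework API.

```python
WINDOW = 60            # tumbling window size, in seconds of event time
ALLOWED_LATENESS = 30  # events older than (watermark - this) are rejected

def window_of(ts):
    # Start of the tumbling window containing event time ts.
    return ts - (ts % WINDOW)

class Windower:
    def __init__(self):
        self.watermark = 0.0  # highest event time seen so far
        self.windows = {}     # window start -> set of event_ids (set dedups retries)
        self.dropped = []     # too-late events; in practice, a dead-letter path

    def process(self, event):
        eid, ts = event["event_id"], event["ts"]
        self.watermark = max(self.watermark, ts)
        if ts < self.watermark - ALLOWED_LATENESS:
            self.dropped.append(eid)  # beyond allowed lateness: drop or side-channel
            return
        self.windows.setdefault(window_of(ts), set()).add(eid)

w = Windower()
events = [
    {"event_id": "a", "ts": 100.0},
    {"event_id": "a", "ts": 100.0},  # duplicate delivery: absorbed by the set
    {"event_id": "b", "ts": 170.0},  # advances the watermark to 170
    {"event_id": "c", "ts": 145.0},  # late, but within allowed lateness
    {"event_id": "d", "ts": 90.0},   # too late: 90 < 170 - 30
]
for e in events:
    w.process(e)
print(sorted(w.windows))  # → [60.0, 120.0]
print(w.dropped)          # → ['d']
```

Flink's watermarks and Kafka Streams' grace periods implement the same idea natively; in your write-up, say which mechanism you rely on and what happens to events that miss the lateness bound.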
Submission Guidance
This is a senior data engineering task. Focus on exactly-once vs at-least-once delivery semantics and how your architecture handles each.
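A framing worth addressing in your answer: most brokers offer at-least-once delivery cheaply, and "effective exactly-once" is often achieved at the sink by making the side effect idempotent, keyed on the event ID. The sketch below simulates a redelivery after a consumer crash; the `IdempotentSink` class and its in-memory set are illustrative stand-ins (a real deployment would back the processed-ID set with a keyed store and commit it atomically with the side effect).

```python
class IdempotentSink:
    """At-least-once delivery + idempotent writes ≈ exactly-once effects."""

    def __init__(self):
        self.processed_ids = set()   # stand-in for a durable keyed store
        self.notifications_sent = 0  # the side effect we must not duplicate

    def handle(self, event):
        if event["event_id"] in self.processed_ids:
            return  # redelivery after a crash/rebalance: safe to skip
        self.notifications_sent += 1          # side effect fires once per event
        self.processed_ids.add(event["event_id"])

sink = IdempotentSink()
# Simulate at-least-once delivery: the broker redelivers event 2.
deliveries = [{"event_id": 1}, {"event_id": 2}, {"event_id": 2}, {"event_id": 3}]
for e in deliveries:
    sink.handle(e)
print(sink.notifications_sent)  # → 3
```

If you instead claim true exactly-once (e.g. Kafka transactions or Flink two-phase-commit sinks), explain what it costs in latency and which failure modes it actually covers.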
Submit Your Work
Your submission is graded against the rubric on the right. Passing earns a public Badge URL you can share on LinkedIn. There is no draft save, so work offline first and paste your finished response here.