Real-time stream processing with Spark, Kafka, and Elasticsearch

In this hands-on course, you will build a real-time data pipeline that ingests data from Twitter, publishes it to Kafka, processes the stream with Spark Streaming, and indexes the processed stream in Elasticsearch.

Kafka

  • Setup and configuration
  • Topics and partitions
  • API
  • Connecting to Spark

Elasticsearch

  • Setup
  • API
  • Kibana
  • Marvel plugin

Real-time Data Pipeline

  • Twitter API
  • Kafka
  • Spark Streaming
  • Elasticsearch

After this course, you will

  • Understand the use of queuing for decoupling distributed systems
  • Know how to use Spark Streaming to filter and aggregate data
  • Know when and how to use Spark and Kafka together
  • Know how to explore datasets and build dashboards with Elasticsearch
  • Have a real-time pipeline running in AWS
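
To give a feel for the first two outcomes above, here is a minimal sketch of the queue-then-process pattern in plain Python. It is not the course code and uses none of the real Kafka, Spark, or Elasticsearch APIs: an in-process queue stands in for a Kafka topic, a simple loop stands in for Spark Streaming micro-batches, and a Counter stands in for the downstream index. The names topic, produce, drain_batch, process, and sample_tweets are all hypothetical, and the sample tweets are made up for illustration.

```python
import queue
from collections import Counter

# Stand-in for a Kafka topic: an in-process FIFO queue (assumption, not Kafka itself).
topic = queue.Queue()


def produce(tweets):
    """Producer side: publish each tweet and return without waiting on any consumer."""
    for tweet in tweets:
        topic.put(tweet)


def drain_batch(max_items):
    """Consumer side: pull up to max_items messages to form one micro-batch."""
    batch = []
    while len(batch) < max_items:
        try:
            batch.append(topic.get_nowait())
        except queue.Empty:
            break
    return batch


def process(batch, counts):
    """Filter to English tweets, then aggregate hashtag counts into `counts`."""
    for tweet in batch:
        if tweet["lang"] != "en":
            continue  # filter step
        for word in tweet["text"].split():
            if word.startswith("#"):
                counts[word.lower()] += 1  # aggregate step


# Hypothetical sample data, shaped loosely like a tweet record.
sample_tweets = [
    {"lang": "en", "text": "Learning #Spark and #Kafka"},
    {"lang": "fr", "text": "Bonjour #Spark"},
    {"lang": "en", "text": "More #spark streaming"},
]

# The producer finishes before any consumer runs: the queue decouples the two sides.
produce(sample_tweets)

# Stand-in for the downstream store; in the course this role is played by Elasticsearch.
hashtag_counts = Counter()
while True:
    batch = drain_batch(max_items=2)
    if not batch:
        break
    process(batch, hashtag_counts)
```

Because the producer only ever talks to the queue, the consumer can be slowed down, restarted, or replaced without changing the producer, which is the decoupling idea the course explores with Kafka at a larger scale.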