
Spark Kafka Streaming Application

[Flow diagram]

Overview:

This project is a Dockerised pub/sub application written in Python that publishes data to Kafka and consumes and aggregates it using the Spark processing framework.
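
To make the publishing side concrete, below is a minimal producer sketch assuming the kafka-python client listed under Architecture. The broker address, topic name and event fields are illustrative placeholders, not taken from this repository.

# Minimal producer sketch, assuming the kafka-python client.
# Broker address, topic name and event fields are illustrative placeholders.
import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",  # assumed broker address inside the Docker network
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),  # serialise dicts as JSON bytes
)

event = {"user_id": 42, "action": "click", "ts": int(time.time())}
producer.send("user-behaviour", value=event)  # publish one user-behaviour event
producer.flush()  # block until all buffered messages are delivered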

Requirements:

  1. A Dockerised Kafka application that generates user-behaviour data and publishes it to a Kafka topic (sketched above).
  2. A Dockerised PySpark application that reads the real-time data from Kafka and performs aggregations on the raw stream (see the sketch after this list).
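
The consumer side of requirement 2 could look like the following PySpark Structured Streaming sketch. The schema, topic and broker address are the same illustrative assumptions as in the producer sketch, and the Kafka source package must be on the Spark classpath.

# Minimal consumer sketch with PySpark Structured Streaming (Spark 2.4.x).
# Run with the Kafka source on the classpath, e.g.:
#   spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.4 consumer.py
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import LongType, StringType, StructField, StructType

spark = SparkSession.builder.appName("spark-kafka-consumer").getOrCreate()

# Assumed shape of the user-behaviour events published by the producer.
schema = StructType([
    StructField("user_id", LongType()),
    StructField("action", StringType()),
    StructField("ts", LongType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "kafka:9092")  # assumed broker address
       .option("subscribe", "user-behaviour")            # assumed topic name
       .load())

# Kafka delivers bytes; decode the JSON payload into typed columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", schema).alias("e"))
          .select("e.*"))

# Aggregate in real time: running count of events per action.
counts = events.groupBy("action").count()

query = (counts.writeStream
         .outputMode("complete")  # re-emit the full aggregate each trigger
         .format("console")
         .start())
query.awaitTermination()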

Architecture:

The architecture involves the following tech stack:

  • kafka-python v1.4.7
  • Spark v2.4.4
  • Python v3.7.2
  • Docker

What can be done better

  1. Push the data into a persistent store such as InfluxDB, MySQL or Postgres (see the sketch after this list).
  2. Stream real-time aggregations from the persistent store into dashboards built in Grafana.
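
As a sketch of improvement 1, the aggregated stream from the consumer example above could be persisted with foreachBatch (available since Spark 2.4) and a JDBC sink. The Postgres URL, table name and credentials are placeholder assumptions, and the Postgres JDBC driver would need to be on the Spark classpath. Grafana could then read its dashboards straight from that table.

# Sketch of improvement 1, assuming Postgres as the persistent store.
# URL, table name and credentials are placeholders.
def write_to_postgres(batch_df, batch_id):
    # Append each micro-batch of aggregates to a Postgres table.
    (batch_df.write
     .format("jdbc")
     .option("url", "jdbc:postgresql://postgres:5432/analytics")
     .option("dbtable", "action_counts")
     .option("user", "spark")
     .option("password", "spark")
     .mode("append")
     .save())

# `counts` is the aggregated stream from the consumer sketch above.
query = (counts.writeStream
         .outputMode("update")            # emit only changed aggregate rows
         .foreachBatch(write_to_postgres)
         .start())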

How to run this application

Clone the repository:

$ git clone https://github.com/ashshetty90/spark-kafka-application.git

After cloning, run the command below in the root directory of the project. Make sure Docker is installed on your machine.

$ docker-compose up -d

The above command runs the Kafka producer and the Spark consumer inside Docker containers.

$ docker container ps

This shows all the containers currently running inside Docker.

$ docker logs <container_id>

The logs for each container can be seen by running the above command.

Screenshots

Sample outputs:

[Kafka producer output]

[Spark consumer output]
