
Flink Variance

This project contains Flink job definitions that use synthetic data to create four different backpressure scenarios used in the Stream Processing Backpressure Smackdown.

Running Flink

There are a number of ways to install and run Flink depending on your available hardware. I am just going to cover running it locally on a Mac laptop.

Install Flink

Download the version in question from the Flink website. I used v1.8.1.
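As a rough sketch (assuming the Scala 2.12 binary from the Apache archive; adjust the URL for your chosen version), downloading and unpacking looks like:

curl -O https://archive.apache.org/dist/flink/flink-1.8.1/flink-1.8.1-bin-scala_2.12.tgz
tar -xzf flink-1.8.1-bin-scala_2.12.tgz
cd flink-1.8.1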

(Optional) Install InfluxDB and Chronograf

These can be installed via Brew and run as normal Brew services, e.g. brew services start influxdb and brew services start chronograf.

Then create a database for the metrics, and name it flink or something along those lines.
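A minimal sketch, assuming the InfluxDB 1.x influx CLI (the database and retention policy names just need to match the reporter settings shown later):

brew install influxdb chronograf
brew services start influxdb
brew services start chronograf
influx -execute 'CREATE DATABASE flink'
influx -execute 'CREATE RETENTION POLICY "one_hour" ON "flink" DURATION 1h REPLICATION 1'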

Configure Flink

When configuring Flink locally you have a couple of options:

  1. Use a single worker.
  2. Use multiple workers, and therefore multiple JVMs (more closely mimicking a distributed setup).

For the purposes of the smackdown, I found the results from both configurations to be the same.

The number of workers is configured in FLINK_HOME/conf/slaves. By default, it will run a single worker. If you want more, add one localhost line per worker. For four workers use:

localhost
localhost
localhost
localhost

You may also need to increase the number of task slots available to your workers, especially if using a single worker. Our scenarios require a minimum of four slots in total across our workers, due to the four parallel dataflow tasks.

The worker (Task Manager) slots can be configured via FLINK_HOME/conf/flink-conf.yaml. For example, to increase the slot count to four:

taskmanager.numberOfTaskSlots: 4

If using InfluxDB for metrics collection then the following lines need to be added to the flink-conf.yaml:

metrics.reporter.influxdb.class: org.apache.flink.metrics.influxdb.InfluxdbReporter
metrics.reporter.influxdb.host: localhost
metrics.reporter.influxdb.port: 8086
metrics.reporter.influxdb.db: flink
metrics.reporter.influxdb.retentionPolicy: one_hour
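Note that the InfluxDB reporter jar is not on Flink's classpath by default; assuming the stock 1.8.1 distribution layout, it ships under opt/ and needs copying into lib/ before the reporter can load:

cp FLINK_HOME/opt/flink-metrics-influxdb-1.8.1.jar FLINK_HOME/lib/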

Run Flink

Finally, we run up the Flink cluster on our local machine:

FLINK_HOME/bin/start-cluster.sh
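Once the cluster is up, you can sanity-check it via the Flink web UI, which is served on port 8081 by default and shows the registered Task Managers and available task slots:

open http://localhost:8081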

Running scenarios

First build this project using mvn clean package. Then run the appropriate scenario using:

flink run target/flink-variance-1.0-SNAPSHOT.jar --properties <scenario name>.properties
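When you are finished, shut the cluster back down with the companion script:

FLINK_HOME/bin/stop-cluster.sh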
