
leonasa/udacitySFCrimeStream

Latest commit 5a13270 by Leonardo Santana, Jan 22, 2021 (4 commits)

SF Crime Statistics with Spark Streaming

Questions

  1. How did changing values on the SparkSession property parameters affect the throughput and latency of the data? Changing the values of maxRatePerPartition and maxOffsetsPerTrigger can make processedRowsPerSecond vary considerably. Raising them increases processedRowsPerSecond up to a certain point, after which processedRowsPerSecond starts to decrease.

  2. What were the 2-3 most efficient SparkSession property key/value pairs? Through testing multiple variations on values, how can you tell these were the most optimal? The settings that produced the most variation were maxRatePerPartition and maxOffsetsPerTrigger. The optimal values can be identified by comparing processedRowsPerSecond (higher is better) and durationMs (lower is better) in the streaming query's progress report.
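The comparison described in the answers above can be sketched in plain Python. This is a minimal sketch, assuming the progress reports for each trial were collected as JSON (for example from the progress updates Spark logs for a running query, or from `query.lastProgress` in PySpark) and grouped under a label naming the settings used. The helper names (`summarize_progress`, `average_summary`, `best_setting`), the use of `durationMs.triggerExecution` as the latency measure, and the tie-break rule are hypothetical choices for illustration, not part of the original project.

```python
import json
from typing import Dict, List, Optional, Tuple


def summarize_progress(progress: Dict) -> Dict[str, float]:
    """Extract throughput and latency from one progress report (a parsed dict)."""
    duration = progress.get("durationMs", {})
    return {
        # Throughput: rows processed per second in this micro-batch.
        # A missing field (e.g. for an empty batch) is treated as 0.
        "processedRowsPerSecond": float(progress.get("processedRowsPerSecond", 0.0)),
        # Latency: total time spent on this trigger, in milliseconds.
        "triggerExecutionMs": float(duration.get("triggerExecution", 0.0)),
    }


def average_summary(reports: List[Dict]) -> Dict[str, float]:
    """Average the summaries of several batches run with the same settings."""
    summaries = [summarize_progress(r) for r in reports]
    n = len(summaries)
    return {
        key: sum(s[key] for s in summaries) / n
        for key in ("processedRowsPerSecond", "triggerExecutionMs")
    }


def best_setting(trials: Dict[str, List[Dict]]) -> Optional[str]:
    """Return the label with the highest mean throughput.

    Ties are broken by the lower mean trigger duration. Trials with no
    reports are skipped; returns None if no trial has any reports.
    """
    best_label: Optional[str] = None
    best_key: Optional[Tuple[float, float]] = None
    for label, reports in trials.items():
        if not reports:
            continue
        avg = average_summary(reports)
        # Higher throughput wins; negating duration makes "shorter" compare larger.
        key = (avg["processedRowsPerSecond"], -avg["triggerExecutionMs"])
        if best_key is None or key > best_key:
            best_label, best_key = label, key
    return best_label


if __name__ == "__main__":
    # Made-up progress reports for illustration only; substitute real ones.
    raw = {
        "config-A": ['{"processedRowsPerSecond": 80.0, "durationMs": {"triggerExecution": 1250}}'],
        "config-B": ['{"processedRowsPerSecond": 95.0, "durationMs": {"triggerExecution": 2100}}'],
    }
    trials = {label: [json.loads(s) for s in reports] for label, reports in raw.items()}
    print(best_setting(trials))  # prints: config-B
```

Ranking by mean throughput first and using trigger duration only as a tie-break is one reasonable reading of "check processedRowsPerSecond and durationMs"; if latency matters more for a given use case, the sort key can be swapped or weighted instead.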
