Set explicit guidelines when running large jobs #459

Open
williambrandler opened this issue Dec 7, 2021 · 0 comments
Spark jobs with a large number of partitions can crash if the driver is too small, failing with errors such as:

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 731259 tasks (4.0 GiB) is bigger than spark.driver.maxResultSize 4.0 GiB.
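As a reference point, here is a minimal PySpark sketch of what an explicit driver setup could look like for a job like this; the configuration values and input path are illustrative assumptions, not project defaults.

```python
from pyspark.sql import SparkSession

# Sketch of explicit driver sizing for a large job.
# Values are illustrative, not recommendations from this repo.
spark = (
    SparkSession.builder
    .appName("glow-large-job")
    # Raise the cap on serialized results returned to the driver
    # (setting it to 0 disables the limit, which is riskier than a sized value).
    .config("spark.driver.maxResultSize", "8g")
    # Give the driver enough heap to actually hold those results.
    .config("spark.driver.memory", "16g")
    .getOrCreate()
)

# Reducing the partition count before any action that pulls results to the
# driver also keeps the number of serialized task results down.
df = spark.read.parquet("/path/to/input")  # hypothetical path
df = df.coalesce(1024)
```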

We should set explicit cluster-setup guidelines for each job (driver size, spark.driver.maxResultSize, and partitioning).

This will come once the continuous integration pipeline is running on multi-task jobs with a different cluster setup for each use case (ingest vs. ETL vs. regression tests, etc.).
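To make the idea concrete, a hypothetical sketch of per-use-case guidelines that such a pipeline could apply is below; the use-case names mirror the ones above, but the settings and helper are assumptions, not existing code in this repository.

```python
from pyspark.sql import SparkSession

# Hypothetical per-use-case cluster guidelines; values are placeholders.
CLUSTER_GUIDELINES = {
    "ingest": {
        "spark.driver.memory": "8g",
        "spark.driver.maxResultSize": "4g",
        "spark.sql.shuffle.partitions": "2000",
    },
    "etl": {
        "spark.driver.memory": "16g",
        "spark.driver.maxResultSize": "8g",
        "spark.sql.shuffle.partitions": "4000",
    },
    "regressions": {
        "spark.driver.memory": "32g",
        "spark.driver.maxResultSize": "16g",
        "spark.sql.shuffle.partitions": "8000",
    },
}

def build_session(use_case: str) -> SparkSession:
    """Create a SparkSession with the explicit settings for one job type."""
    builder = SparkSession.builder.appName(f"glow-{use_case}")
    for key, value in CLUSTER_GUIDELINES[use_case].items():
        builder = builder.config(key, value)
    return builder.getOrCreate()

# Example: spark = build_session("etl")
```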
