Unable to run Spark jobs on an Azure Databricks cluster: java.lang.NullPointerException at org.apache.spark.util.LocalHadoopConfiguration.set(SparkHadoopConfiguration.scala:196) #102
Comments
Aha, I was able to get it back working again:
Result is that it started to work again, so I can tell you with 100% confidence that a new bug was introduced in commit 5f54f8d, but it only got exposed recently (due to an underlying Azure Databricks upgrade). Unfortunately, I cannot really tell you yet what exactly is wrong.
Hi @jochenhebbrecht, I'm going to try to reproduce the error using the latest version, v1.0.10. BTW, the last time I tested it in Azure was 7 months ago, so the version tested there should be 1.0.7.
Yes, sorry, we are using the latest version. In the meantime, Azure support gave us the possibility to revert to the previous Databricks environment, but according to them, their underlying upgrade merely exposes a bug on the library's side.
I was able to reproduce it in Azure. It is weird, because the line in question is just getting the file system, so nothing special. Also, the NPE is triggered in a non-reader-related class. It looks like the Hadoop configuration is not being broadcast properly.
Yes, it looks like that line. I noticed that this change was made in relation to a bug, but it's not clear what the cause was: #62
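If the connector does ship a raw Hadoop `Configuration` to executors, that would explain an executor-side NPE, since `Configuration` is not java-Serializable. A minimal sketch of the usual workaround (Spark keeps an equivalent `SerializableConfiguration` wrapper internally, but it is `private[spark]`; the class name here is hypothetical, not from the osm4scala codebase):

```scala
import java.io.{ObjectInputStream, ObjectOutputStream}
import org.apache.hadoop.conf.Configuration

// Hadoop's Configuration is a Writable, not java-Serializable, so it must be
// copied through its own write/readFields methods during Java serialization.
class SerializableHadoopConf(@transient var value: Configuration) extends Serializable {

  private def writeObject(out: ObjectOutputStream): Unit = {
    out.defaultWriteObject()
    value.write(out) // ObjectOutputStream implements DataOutput
  }

  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    value = new Configuration(false) // empty conf, no default resources
    value.readFields(in)             // ObjectInputStream implements DataInput
  }
}
```

With such a wrapper, the driver can broadcast `new SerializableHadoopConf(sc.hadoopConfiguration)` and executors call `.value` on it, instead of capturing the non-serializable `Configuration` in a closure.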
That fixed a backwards-compatibility problem with Spark 2, so the problem could be a combination of the Spark version and the Hadoop version. Could you try to run it using Spark 2.4? Just to know if it still works there.
In AWS EMR with Hadoop 3.2.1 and Spark 3.1.2, version 1.0.10 is working without problems. It looks like there is something weird in Azure Databricks.
Thanks Ángel for already having a look at this. When downgrading, it started working again (as noted above). We still have the support case open and Azure is also looking further into this problem on their side; I'll pass on this feedback as well. It is still perfectly possible that something is broken on the Azure Databricks side and that your code is actually perfectly fine. We are also still waiting for the source code on their side, to get smarter about where the NPE occurs.
I will research more using Azure, but I will not have time until the weekend. Also, this type of error requires infrastructure that is not free. Do you know if Microsoft or Databricks would give me resources for testing? That would be great. Could you ask them?
That sounds like a great idea, and it's definitely something I would like to achieve. I will contact the Azure support team and ask whether you could get some resource capacity to further investigate this bug.
I cannot reproduce the problem on my side.
Feedback from Azure
So I would not worry too much about your codebase; let's give the support team some time and freedom to first investigate what they potentially broke. With regards to asking for resources to reproduce it on your end:
I would suggest closing this ticket for now. We can always reopen it in case Azure Databricks comes back to us indicating something's not valid in the osm4scala codebase.
Hi @jochenhebbrecht, thanks for your update. I will close the ticket, but let's recap to keep it in mind. Maybe it will be necessary to change something in the connector related to this, like finding a way to avoid serializing the HadoopConfig object. Tested in cloud providers with Spark 3.
It looks like the error is related to Hadoop config serialization. That makes sense, because in single-node execution serialization is not needed, so it does not fail. FYI: I created ticket #103 to be able to test these types of problems quickly.
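One hedged way to "avoid serializing the HadoopConfig object", as suggested above, is to ship only its entries as a plain (and serializable) `Map` and rebuild the `Configuration` lazily on each executor. The helper names below are illustrative, not from the osm4scala codebase:

```scala
import scala.collection.JavaConverters._
import org.apache.hadoop.conf.Configuration

// Flatten a Configuration into a plain Scala Map, which serializes trivially.
def confToMap(conf: Configuration): Map[String, String] =
  conf.iterator().asScala.map(e => e.getKey -> e.getValue).toMap

// Rebuild a Configuration from the entries on the executor side.
def mapToConf(entries: Map[String, String]): Configuration = {
  val conf = new Configuration(false) // don't load default resources
  entries.foreach { case (k, v) => conf.set(k, v) }
  conf
}
```

The driver would broadcast `confToMap(sc.hadoopConfiguration)` and each executor task would call `mapToConf` once, so no `Configuration` object ever crosses the wire.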
Just wanted to confirm that Databricks fixed the issue on Azure.
Hi,
Since last Friday, 5th of November, we seem to be unable to use the osm4scala library on an Azure Databricks cluster with configuration 8.3 (includes Apache Spark 3.1.1, Scala 2.12). When we try to run this simple command in a notebook:
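The exact notebook command is not preserved in this thread; a typical osm4scala-spark read (format name per the project README, path and mount hypothetical) looks like:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Read an OpenStreetMap .pbf extract through the osm4scala Spark connector
val osmDf = spark.read
  .format("osm.pbf")
  .load("dbfs:/mnt/osm/extract.osm.pbf") // hypothetical mount/path

osmDf.printSchema()
```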
... we're getting the following exception:
As we have been running osm4scala so many times and we didn't change anything on our side, we escalated this as a critical support case to Azure. Unfortunately, this is the feedback we've received:
(c) Azure Support
If we follow the stack trace, you'll notice we end up at this part of your code:
... which is this line:
Are there any insights into what could be causing this very annoying behavior?
It is also very unclear where the following classes originate. I would assume they are Azure Databricks specific, but Azure support is not confirming this.