BigQuery writes with GenericRecord format don't support overridden non-String types #5644

Open
clairemcginty opened this issue Mar 17, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@clairemcginty
Contributor

Any BigQueryType (BQT) class that uses an OverrideTypeProvider to wrap a non-String type will fail when the GenericRecord format is used to write records to BQ. For example:

// sample of override type provider
import scala.collection.mutable
import scala.reflect.macros.blackbox
import scala.reflect.runtime.universe.{typeOf, Type}

case class NonNegativeInt(data: Int)

object NonNegativeInt {
  def parse(data: String): NonNegativeInt = NonNegativeInt(data.toInt)

  def stringType: String = "NONNEGATIVEINT"

  def bigQueryType: String = "INTEGER"
}

object Index {
  def getIndexCompileTimeTypes(c: blackbox.Context): mutable.Map[c.Type, Class[_]] = {
    import c.universe._
    mutable.Map[Type, Class[_]](
      typeOf[NonNegativeInt] -> classOf[NonNegativeInt]
    )
  }
  def getIndexClass: mutable.Map[String, Class[_]] =
    mutable.Map[String, Class[_]](
      NonNegativeInt.stringType -> classOf[NonNegativeInt]
    )
  def getIndexRuntimeTypes: mutable.Map[Type, Class[_]] =
    mutable.Map[Type, Class[_]](
      typeOf[NonNegativeInt] -> classOf[NonNegativeInt]
    )
}

// sample of job
@BigQueryType.ToTable
case class MyRecord(i: NonNegativeInt)

sc
  .parallelize(1 to 10)
  .map(i => MyRecord(NonNegativeInt(i)))
  .saveAsTypedBigQueryTable(...)

will fail with a class cast exception like:

[info]   at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:326)
[info]   at org.apache.beam.sdk.io.gcp.bigquery.AvroRowWriter.write(AvroRowWriter.java:58)
[info]   at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles.processElement(WriteBundlesToFiles.java:247)
[info]   ...
[info]   Cause: java.lang.ClassCastException: value 31 (a com.spotify.scio.example.NonNegativeInt) cannot be cast to expected type long at MyRecord.i
[info]   at org.apache.avro.path.TracingClassCastException.summarize(TracingClassCastException.java:79)
[info]   at org.apache.avro.path.TracingClassCastException.summarize(TracingClassCastException.java:30)
[info]   at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:84)
[info]   at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:323)
[info]   at org.apache.beam.sdk.io.gcp.bigquery.AvroRowWriter.write(AvroRowWriter.java:58)
[info]   at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles.processElement(WriteBundlesToFiles.java:247)
[info]   at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles$DoFnInvoker.invokeProcessElement(Unknown Source)
[info]   at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:212)
[info]   at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:186)
[info]   at org.apache.beam.runners.core.SimplePushbackSideInputDoFnRunner.processElementInReadyWindows(SimplePushbackSideInputDoFnRunner.java:88)

This is because toAvroInternal converts all overridden types to String: https://github.com/spotify/scio/blob/v0.14.14/scio-google-cloud-platform/src/main/scala/com/spotify/scio/bigquery/types/ConverterProvider.scala#L174. I think this was just copied from the toTableRow behavior, where it works fine because the JSON format accepts a stringified value for any type; Avro is stricter, and the converted avroSchema correctly expects an integer (Avro long) value.
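
To see Avro's strictness in isolation, here's a minimal sketch using the plain Avro API (the schema is hand-written to mirror what a BigQuery INTEGER field derives to; the failure is the same whether the stored value is a String or the wrapper type itself):

// minimal repro of the Avro-side strictness (plain Avro API, hand-written schema)
import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.file.DataFileWriter
import org.apache.avro.generic.{GenericData, GenericDatumWriter, GenericRecord}

val schema = new Schema.Parser().parse(
  """{"type":"record","name":"MyRecord","fields":[{"name":"i","type":"long"}]}"""
)

val record = new GenericData.Record(schema)
record.put("i", "31") // stringified value, like the current converter produces

val writer = new DataFileWriter(new GenericDatumWriter[GenericRecord](schema))
writer.create(schema, new ByteArrayOutputStream())
writer.append(record) // throws the ClassCastException from the trace above
writer.close()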

The workaround is to fall back to TableRow format.
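
Spelled out, the fallback looks roughly like this: a sketch, assuming the toTableRow and schema members that @BigQueryType.ToTable generates on the companion object, with a placeholder table spec:

// sketch of the TableRow fallback (placeholder table spec)
import com.spotify.scio.bigquery._

sc
  .parallelize(1 to 10)
  .map(i => MyRecord(NonNegativeInt(i)))
  .map(MyRecord.toTableRow) // generated companion converter
  .saveAsBigQueryTable(
    Table.Spec("my-project:my_dataset.my_table"), // placeholder
    schema = MyRecord.schema
  )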

@clairemcginty
Contributor Author

I think this is theoretically easy to fix (remove the .toString), but it would break existing implementations of OverrideTypeProvider, e.g. Elitzur. It may be simpler for those users to just fall back to TableRow format.
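
To make the compatibility concern concrete: providers like the one above only know how to round-trip through String (parse is String => NonNegativeInt). Dropping the .toString means the generated converter would hand Avro the wrapper itself, so providers would have to start unwrapping to the Avro-native type; a hypothetical shape, not the current OverrideTypeProvider API:

// hypothetical addition a provider would need if the converter stopped stringifying
def toAvroValue(v: NonNegativeInt): Long = v.data.toLong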
