Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: Incorrect mapping for timezone-less timestamp between Ibis schema and sqlglot #11062

Open
1 task done
h2o1 opened this issue Mar 31, 2025 · 0 comments
Open
1 task done
Labels
bug Incorrect behavior inside of ibis

Comments

@h2o1
Copy link

h2o1 commented Mar 31, 2025

What happened?

When converting an Ibis schema containing a timezone-less Timestamp column (i.e. dt.Timestamp with timezone set to None) via a Databricks backend, the column is mapped to the sqlglot TIMESTAMP (defined in _from_ibis_Timestamp in the SqlglotType class). However, in Databricks, TIMESTAMP corresponds to the timestamp type with timezone information (https://docs.databricks.com/aws/en/sql/language-manual/data-types/timestamp-type). The correct type to convert to should be TIMESTAMP_NTZ (https://docs.databricks.com/aws/en/sql/language-manual/data-types/timestamp-ntz-type).

Example:

>>> conn = ibis.databricks.connect(...)
>>> schema = ibis.schema([('ts_col', 'timestamp')])
>>> schema['ts_col']
Timestamp(timezone=None, scale=None, nullable=True)
>>> conn.create_table('ts_col_table', schema=schema)
DatabaseTable: catalog_test.default.ts_col_table
  ts_col timestamp('UTC')
>>> read_schema = conn.table('ts_col_table').schema()
>>> read_schema
ibis.Schema {
  ts_col  timestamp('UTC')
}

As a workaround, I was able to monkey patch this behavior locally by implementing a custom DatabricksTypeMapper with a def _from_ibis_Timestamp method that returns sge.DataType(this=sge.DataType.Type.TIMESTAMPNTZ) when the input timestamp's timezone is None and wiring this TypeMapper into the DatabricksCompiler:

class DatabricksTypeMapper(PySparkType):
  @classmethod
  def _from_ibis_Timestamp(cls, dtype: dt.Timestamp) -> sge.DataType:
    code = sge.DataType.Type.TIMESTAMPNTZ if dtype.timezone is None else sge.DataType.Type.TIMESTAMP
    if dtype.scale is not None:
      scale = sge.DataTypeParam(this=sge.Literal.number(dtype.scale))
      return sge.DataType(this=code, expressions=[scale])
    else:
      return sge.DataType(this=code)

DatabricksCompiler.type_mapper = DatabricksTypeMapper
ibis.backends.sql.compilers.databricks.compiler = DatabricksCompiler()
ibis.backends.sql.datatypes.TYPE_MAPPERS[ibis.backends.sql.compilers.databricks.compiler.dialect] = DatabricksTypeMapper

What version of ibis are you using?

10.3.1

What backend(s) are you using, if any?

Databricks

Relevant log output

Code of Conduct

  • I agree to follow this project's Code of Conduct
@h2o1 h2o1 added the bug Incorrect behavior inside of ibis label Mar 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior inside of ibis
Projects
Status: backlog
Development

No branches or pull requests

1 participant