Skip to content

Issues with quoted identifiers and CTEs #1559

@aguynamedryan

Description

@aguynamedryan

After completing #1558, I discovered there were some other issues with CTEs, namely, dbplyr quoted tables names inconsistently and incorrectly.

I created this test case to generate the two different issues I encountered.

test_that("CTE quoting works right", {
  lf1 <- lazy_frame(x = 1, y = 2, .name = "lf1")
  lf2 <- lazy_frame(x = 1, z = 2, .name = "lf2")

  # The query is nonsensical, because I took a failing query from the wild and
  # whiddled it down to the minimum amount of code required to reveal two
  # separate quoting issues when using CTEs

  double_it <- function(.data, column_name) {
    .data %>%
      mutate(
        "{ column_name }" := .data[[column_name]] * 2
      )
  }

  skinny <- function(column_name) {
    lf1 %>%
      double_it(column_name) %>%
      mutate(
        column_name = column_name,
        .keep = "none"
      )
  }

  tall_tbl <- purrr::map(c("x", "y"), skinny) %>%
    purrr::reduce(dplyr::union_all)

  query <- tall_tbl %>%
    left_join(tall_tbl, by = join_by(column_name)) %>%
    left_join(tall_tbl, by = join_by(column_name)) %>%
    sql_render(sql_options = sql_options(cte = TRUE)) %>%
    expect_snapshot()
})

The output produced for this snapshot is the following:

WITH `q01` AS (
    SELECT `x` * 2.0 AS `x`, `y`
    FROM `lf1`
  ),
  `q02` AS (
    SELECT 'x' AS `column_name`
    FROM `q01`
  ),
  `q03` AS (
    SELECT `x`, `y` * 2.0 AS `y`
    FROM `lf1`
  ),
  `q04` AS (
    SELECT 'y' AS `column_name`
    FROM `q03` AS `q01`
  ),
  `q05` AS (
    SELECT *
    FROM `q02`
  
    UNION ALL
  
    SELECT *
    FROM `q04`
  ),
  `q06` AS (
    SELECT 'x' AS `column_name`
    FROM ```q01``` AS `q01`
  ),
  `q07` AS (
    SELECT 'y' AS `column_name`
    FROM ```q03``` AS `q01`
  ),
  `q08` AS (
    SELECT *
    FROM `q06`
  
    UNION ALL
  
    SELECT *
    FROM `q07`
  ),
  `q09` AS (
    SELECT *
    FROM '`q06`'
  
    UNION ALL
  
    SELECT *
    FROM '`q07`'
  )
  SELECT `...1`.`column_name` AS `column_name`
  FROM `q05` AS `...1`
  LEFT JOIN `q08` AS `...2`
    ON (`...1`.`column_name` = `...2`.`column_name`)
  LEFT JOIN `q09` AS `...3`
    ON (`...1`.`column_name` = `...3`.`column_name`)

Here are the two issues I noticed:

  • If you look at the definition for CTE q06, you'll see the FROM clause has incorrectly triple-quoted the q03 identifier
  • If you look at the definition for CTE q09, you'll see the FROM clause has incorrectly applied single quotes around the q06 identifier

Either of these issues render the SQL statement invalid and unable to run on a database.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions