Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aggregation without group by produces incorrect results for scalars #1198

Merged
merged 11 commits into from
Apr 6, 2025

Conversation

ihorandrianov
Copy link
Contributor

Closes #954
Before:
Знімок екрана 2025-03-27 о 21 49 19

After:

Знімок екрана 2025-03-27 о 21 49 44

SQLite:

Знімок екрана 2025-03-27 о 21 50 04

@jussisaurio
Copy link
Collaborator

Hi, thanks! this fix needs to be a bit more general in my opinion - things like this still give incorrect results:

# notice id is not in the group by clause, but sqlite still returns the first id per group, whereas limbo returns NULL
select id, first_name, sum(age) from users group by first_name limit 1;
42|Aaron|2271

@ihorandrianov ihorandrianov marked this pull request as draft March 30, 2025 16:55
@ihorandrianov ihorandrianov marked this pull request as ready for review March 31, 2025 08:10
@ihorandrianov
Copy link
Contributor Author

Ready for second review try. Also fixed order by elimination when group by is present like:

select id, first_name, sum(age) from users group by first_name order by id limit 5;

@jussisaurio
Copy link
Collaborator

Hi @ihorandrianov can you please add at least TCL tests (*.test) that demonstrate the fix works?

@@ -63,6 +63,8 @@ pub struct TranslateCtx<'a> {
pub label_main_loop_end: Option<BranchOffset>,
// First register of the aggregation results
pub reg_agg_start: Option<usize>,
// Register to track if we set non aggregate cols to first encountered row in non group by agg statement
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I had no context at all and didn't know what this PR was about, I would have no idea what this means.

How about:

// In non-group-by statements with aggregations (e.g. SELECT foo, bar, sum(baz) FROM t), we want to emit the non-aggregate columns (foo and bar) only once. This register is a flag that tracks whether we have already done that.


let sort_cursor = program.alloc_cursor_id(None, CursorType::Sorter);

let reg_abort_flag = program.alloc_register();
let reg_group_exprs_cmp = program.alloc_registers(group_by.exprs.len());
let reg_group_exprs_acc = program.alloc_registers(group_by.exprs.len());
let reg_group_exprs_acc = program.alloc_registers(non_aggregate_count);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be renamed reg_non_aggregate_exprs_acc

.iter()
.map(|agg| agg.args.len())
.sum::<usize>();
let sorter_column_count = non_aggregate_count + agg_args_count;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment above should now say "all non-aggregate columns and all arguments of agg functions are in the sorter"

@@ -296,7 +311,7 @@ pub fn emit_group_by<'a>(
});

// Read the group by columns for a finished group
for i in 0..group_by.exprs.len() {
for i in 0..non_aggregate_count {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Read the non-aggregate columns for a finished group

.result_columns
.iter()
.filter(|rc| !rc.contains_aggregates)
.collect::<Vec<_>>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be called non_aggregate_result_columns and I don't think we need to .collect()

}
let reg = start_reg + num_aggs + i;

let if_label = if let Some(flag) = t_ctx.reg_agg_flag {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let label_emit_nonagg_only_once?

while i < o.len() {
let (key, order) = &o[i];

if matches!(order, Direction::Descending) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add comment

if pos != insert_pos {
let mut current_pos = pos;
while current_pos > insert_pos {
g.exprs.swap(current_pos, current_pos - 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add comment

@@ -108,6 +156,11 @@ fn eliminate_unnecessary_orderby(plan: &mut SelectPlan, schema: &Schema) -> Resu
return Ok(());
}

// if pk will be removed later
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does 'if pk' mean

@@ -25,7 +25,9 @@ pub fn emit_select_result(
}

let start_reg = t_ctx.reg_result_cols_start.unwrap();
for (i, rc) in plan.result_columns.iter().enumerate() {
for (i, rc) in plan.result_columns.iter().enumerate().filter(|(_, rc)| {
t_ctx.reg_agg_flag.is_some() && rc.contains_aggregates || t_ctx.reg_agg_flag.is_none()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is happening here and why

Copy link
Collaborator

@pereman2 pereman2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good for me, I cannot give better review than @jussisaurio

Comment on lines 250 to 257
if !plan.aggregates.is_empty()
&& plan.group_by.is_none()
&& plan.result_columns.iter().any(|c| !c.contains_aggregates)
{
let flag = program.alloc_register();
program.emit_int(0, flag);
t_ctx.reg_nonagg_emit_once_flag = Some(flag);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have comment here about the behaviour? I know it's in the reg_nonagg_emit_once_flag definition but it could be good to know here too.

Copy link
Collaborator

@jussisaurio jussisaurio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff, I left some very minor nits

@jussisaurio jussisaurio merged commit c19e4fc into tursodatabase:main Apr 6, 2025
38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Aggregation without group by produces incorrect results for scalars
3 participants