Skip to content

Conversation

polyominal
Copy link
Contributor

@polyominal polyominal commented Aug 16, 2025

Common sense:

auto KeyAt(int index) const -> const KeyType&;

is a strictly better interface than

auto KeyAt(int index) const -> KeyType;

(I thought it was obvious that these methods should return values whose lifetime is bound to the page object.)

FYI this is largely motivated by troubleshooting UB from writing something like

auto INDEXITERATOR_TYPE::operator*() -> std::pair<const KeyType &, const ValueType &> {
  return {page_guard.As<Page>()->KeyAt(index_), page_guard.As<Page>()->ValueAt(index_)};
}

in the B+tree iterator implementation 🫠.

Copy link
Member

@connortsui20 connortsui20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the most part I think this is fine to add, with a few caveats that may or may not be relevant, as my C++ is rusty (which is intentionally a pun here as what I'll say is true for Rust).

In BusTub there are no checks to see if the keys and values are properly aligned in the leaf pages (and if that were true, then I think accessing the references would be UB). I think this is not a problem because the templates give the key and value arrays in the leaves proper padding implicitly (which I think is also a problem, but one for another day).

On top of that, if people start passing references to keys all over the place, and the buffer pool manager decides to evict that memory and replace it with something else, that could also be UB (unfortunately we do not have lifetimes for references in C++ so there is nothing preventing that).

Regardless, I think I agree with you that we should probably have these return data with lifetimes bound to the scope of the page they live on rather than the function caller.

Could you document this in the header files? And also I think this might need to change for the other indexes (extendible hash table) to be consistent.

@polyominal
Copy link
Contributor Author

polyominal commented Aug 17, 2025

Makes sense to have consistent interfaces with documentation changes. I'll look into this after I finish having fun with the projects.

On top of that, if people start passing references to keys all over the place, and the buffer pool manager decides to evict that memory and replace it with something else, that could also be UB

I'm also bothered by how easy it is to create UAF in C++. A direct example in the BusTub context:

char *hacked = std::invoke([&] {
  WritePageGuard guard = bpm->WritePage(pid);
  return guard.AsMut<char>();
});
snprintf(hacked, 3, "hi");

In C++ it seems safety for such cases relies largely on user discipline and tooling, etc., which contrasts with rusty (the 🦀 sense) approaches.

Now that I think of it, the iterator operator*() example wasn't really an argument for this change. Having raw pointers/references to random stuff whose lifetime is not clear is trivially an evil practice, so it was just me whining about my own mistake. I guess the real motivation is that returning const references from accessor-like methods seems to be standard practice in most C++ codebases?

I'll continue working on the projects and see what feels right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants