
Make chunk map as global side metadata, and each space only lists its own chunks using chunk map #1304


Open · wants to merge 4 commits into master from chunkmap-ownership

Conversation

@qinsoon (Member) commented Apr 17, 2025

The chunk map records the lowest chunk and the highest chunk used by a space, and accesses all the chunk states (local side metadata) in the range. There are multiple issues when we use discontiguous spaces.

  1. Chunks in the range may be unused, and thus the local side metadata may be unmapped.
  2. Chunks in the range may belong to other spaces. If those spaces also use the chunk map, each space will list all the chunks in the range, even the ones that do not belong to it.

This PR makes the chunk state side metadata a global side metadata. Strictly, only plans that use the chunk map need it; but since we will use mark sweep or non-moving immix as the non-moving policy, both of those policies use the chunk map, and the non-moving policy is included in every plan, we simply include the side metadata for every plan.

This PR fixes #1197.
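For illustration, here is a minimal sketch of the shape this change implies: one byte of global side metadata per chunk, recording both whether the chunk is allocated and which space owns it, so that each space can list only its own chunks. The names and bit layout below are hypothetical, not the exact ones in this diff.

```rust
/// Hypothetical sketch of a per-chunk state byte stored in *global* side
/// metadata, so every space reads and writes the same table.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct ChunkState(u8);

impl ChunkState {
    /// Top bit: allocated flag. Low 7 bits: index of the owning space.
    const ALLOCATED_MASK: u8 = 0b1000_0000;

    pub fn allocated(space_index: u8) -> Self {
        debug_assert!(space_index < Self::ALLOCATED_MASK);
        ChunkState(Self::ALLOCATED_MASK | space_index)
    }

    pub fn free(space_index: u8) -> Self {
        debug_assert!(space_index < Self::ALLOCATED_MASK);
        ChunkState(space_index)
    }

    pub fn is_allocated(&self) -> bool {
        self.0 & Self::ALLOCATED_MASK != 0
    }

    pub fn space_index(&self) -> u8 {
        self.0 & !Self::ALLOCATED_MASK
    }
}
```

With an owner recorded per chunk, a space's iteration can filter on `space_index` instead of assuming that every chunk between its lowest and highest chunk belongs to it.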

@qinsoon qinsoon force-pushed the chunkmap-ownership branch from 6c236b8 to 4472263 on April 17, 2025 04:37
@qinsoon qinsoon added the PR-extended-testing (Run extended tests for the pull request) label on Apr 22, 2025
@qinsoon qinsoon marked this pull request as ready for review April 23, 2025 00:00
@qinsoon qinsoon requested review from wks and k-sareen April 23, 2025 00:00
@k-sareen (Collaborator) commented:

> Chunks in the range may be used, thus the local side metadata may be unmapped.

I don't quite follow what you mean. Do you mean that chunks may be used, so local metadata is mapped? Because how can it be unmapped if the chunk is used? Or do you mean that some chunks in the range may not be used, so the side metadata is unmapped?

@qinsoon (Member, Author) commented Apr 23, 2025

> > Chunks in the range may be used, thus the local side metadata may be unmapped.
>
> I don't quite follow what you mean. Do you mean that chunks may be used, so local metadata is mapped? Because how can it be unmapped if the chunk is used? Or do you mean that some chunks in the range may not be used, so the side metadata is unmapped?

Yeah. That is a typo. I meant chunks may not be used.

@k-sareen (Collaborator) left a comment


The major question I have with this PR is: why do we want to associate free chunks with a space? A free chunk is free, and to me it doesn't make sense to associate it with a space. In my branch I just used 0x0 to mark a chunk as completely free, and theoretically any space could use it.

When I implemented this in my branch I didn't do any performance evaluation or think about alternatives. I would at least like to make sure this doesn't degrade performance much for Immix or MarkSweep.

Also, I would like the PR title to be less verbose. Something like "Make chunk map a global metadata" or "Fix bugs with chunk map being shared across spaces" is sufficient, I think.

```diff
-/// Set chunk state
-pub fn set(&self, chunk: Chunk, state: ChunkState) {
+/// Set a chunk as allocated, or as free.
+pub fn set_allocated(&self, chunk: Chunk, allocated: bool) {
```
Collaborator:


I feel this function name is confusing. set_allocated to me sounds like this function will set the chunk to the allocated state, but it can actually set it to either allocated or free.

Member Author:


The boolean argument in set_allocated(bool) should make it pretty clear that you can set it either to 'allocated' or to 'not allocated'. Do you have a better suggestion?

Collaborator:


I'm not sure why set is not good enough? The previous signature is pretty much the same, except state is now a bool. Maybe @wks has a better idea. Or if he's fine with the new name, then that's also okay.

Member Author:


I felt that after changing the argument from an enum to a boolean, chunk_map.set(chunk, true) was not as clear as chunk_map.set_allocated(chunk, true), so I made the change.

@k-sareen (Collaborator) commented Apr 23, 2025


Fair enough 👍 I'll wait for @wks to see if he has a better suggestion.

```rust
}

/// Get chunk state, regardless of the space. This should always be private.
fn get_any(&self, chunk: Chunk) -> ChunkState {
```
Collaborator:


get_any is not really descriptive of what it's doing. It should be something like get_internal or get_private, I feel.

Comment on lines +177 to 189

```diff
 /// A range of all chunks in the heap.
-pub fn all_chunks(&self) -> RegionIterator<Chunk> {
+pub fn all_chunks(&self) -> impl Iterator<Item = Chunk> + '_ {
+    let chunk_range = self.chunk_range.lock();
+    RegionIterator::<Chunk>::new(chunk_range.start, chunk_range.end)
+        .filter(|c| self.get(*c).is_some())
+}
+
+/// A range of all chunks in the heap.
+pub fn all_allocated_chunks(&self) -> impl Iterator<Item = Chunk> + '_ {
+    let chunk_range = self.chunk_range.lock();
+    RegionIterator::<Chunk>::new(chunk_range.start, chunk_range.end)
+        .filter(|c| self.get(*c).is_some_and(|state| state.is_allocated()))
+}
```
Collaborator:


I don't really understand why we need this. The issue here is that when a chunk is freed, we still associate a space index with it. But why is that the case? We can just associate the 0x0 space descriptor with it (I think 0x0 is reserved for the empty space). This would mean we don't need two functions for iterating through the chunks of a space, since we would automatically skip the free ones.

Member Author:


I am just making sure all_chunks does the same thing as before, which is to list all the chunks for the space, whether they are allocated or free.

@k-sareen (Collaborator) commented Apr 23, 2025


I just did a grep of where the all_chunks function is used. Most of the time it's used as if it were all_allocated_chunks. The only times it's actually used as all chunks, regardless of allocation state, are when we iterate through the chunks to bulk clear/set metadata bits. But I think it doesn't make sense to do that for free chunks, since bulk clearing/setting metadata should happen when we either free or allocate a (new) chunk.

Collaborator:


Also, I don't think the above is a good idea anyway. We shouldn't be going through free chunks when clearing/setting metadata; it should almost always be the allocated chunks. Do you have a use-case in mind where it makes sense to have a space associated with a free chunk?

Member Author:


I am not sure. I will try replacing the current uses of all_chunks with all_allocated_chunks and see if it works.

```diff
 }

 /// A byte-map to record all the allocated chunks.
 /// A plan can use this to maintain records for the chunks that they used, and the states of the chunks.
-/// Any plan that uses the chunk map should include the `ALLOC_TABLE` spec in their local sidemetadata specs
+/// Any plan that uses the chunk map should include the `ALLOC_TABLE` spec in their local sidemetadata specs.
```
Collaborator:


Suggested change:

```diff
-/// Any plan that uses the chunk map should include the `ALLOC_TABLE` spec in their local sidemetadata specs.
+/// Any plan that uses the chunk map should include the `ALLOC_TABLE` spec in their global sidemetadata specs.
```

```diff
@@ -60,6 +60,8 @@ define_side_metadata_specs!(
     MS_ACTIVE_CHUNK = (global: true, log_num_of_bits: 3, log_bytes_in_region: LOG_BYTES_IN_CHUNK),
```
Collaborator:


In my branch I did a quick fix and didn't bother to coalesce MS_ACTIVE_CHUNK and CHUNK_MARK, but I think we should for this PR. It's not useful to have the same metadata under different names (they are basically the same metadata).

Collaborator:


MS_ACTIVE_CHUNK only cares about chunks with malloc in them. But with this PR, we should be able to distinguish between a chunk mapped by the non-moving immix space and the malloc space (for example). I feel like MS_ACTIVE_CHUNK can be subsumed by CHUNK_MARK now.

Member Author:


It might be possible. We could simply replace MS_ACTIVE_CHUNK in malloc mark sweep with CHUNK_MARK (which might be an issue if we have other spaces that use ChunkMap). Or we could start using ChunkMap in malloc mark sweep, but that may not be an easy refactoring. Either way, I will give it a try after this PR.

```diff
@@ -60,6 +60,8 @@ define_side_metadata_specs!(
     MS_ACTIVE_CHUNK = (global: true, log_num_of_bits: 3, log_bytes_in_region: LOG_BYTES_IN_CHUNK),
     // Track the index in SFT map for a chunk (only used for SFT sparse chunk map)
     SFT_DENSE_CHUNK_MAP_INDEX = (global: true, log_num_of_bits: 3, log_bytes_in_region: LOG_BYTES_IN_CHUNK),
+    // Mark chunks (any plan that uses the chunk map should include this spec in their local sidemetadata specs)
```
Collaborator:


Suggested change:

```diff
-// Mark chunks (any plan that uses the chunk map should include this spec in their local sidemetadata specs)
+// Mark chunks (any plan that uses the chunk map should include this spec in their global side-metadata specs)
```

@wks (Collaborator) commented Apr 23, 2025

> ... why do we want to associate free chunks with a space?

I have the same concern as @k-sareen's. But I'd like to make clear what purpose ChunkMap serves. There could be two answers.

  1. ChunkMap is for identifying (and iterating over) all chunks that belong to a space and hold meaningful data (Immix blocks and MarkSweep blocks).
  2. ChunkMap is for identifying chunks that the VMMap has given to a specific space.

The second purpose can be supported by VMMap. Map64 simply uses address ranges to identify allocated chunks, and chunks from different spaces don't overlap. Map32 maintains an array descriptor_map, which maps each Chunk index to a SpaceDescriptor that encodes the space ID. If we want to iterate through all chunks the VMMap has given to a space, we can use its underlying PageResource (note that each concrete space knows exactly which PageResource impl it is using).

The first purpose, however, cannot always be supported by VMMap. A Chunk can be given to a Space, and the space may never return the Chunk to the VMMap. Furthermore, Map64 doesn't have any per-chunk metadata at all, and it never shrinks the high watermark! I think that's why we introduced ChunkMap in the first place: to record whether each chunk is allocated or not, because Map64 won't tell us.

If we make use of VMMap, I think we can implement ChunkMap using only a one-bit-per-chunk metadata. 1=allocated, and 0=unallocated.

  • For Map32, we use the descriptor_map to identify chunks that belong to a given space, and use the one-bit-per-chunk ChunkMap to identify allocated chunks.
  • For Map64, we just iterate through all chunks between the base_address and high_water, and use the one-bit-per-chunk ChunkMap to identify allocated chunks.

I did some Git archaeology into the JikesRVM repo. @steveblackburn developed its predecessor ChunkList in 2008 for ImmixSpace alone, and it was for iterating all allocated blocks in allocated chunks in order to compute the histogram. But it was a linked list structure instead of bitmap or byte map. It looks like it was intended to be a way to find allocated chunks without scanning through all chunks. Maybe our work packet and GC worker architecture allowed us to shift our paradigm to a more parallelized approach, i.e. making one work packet for each chunk instead of doing the iteration in one single thread. This historical usage implies that it was for the first purpose, i.e. identifying allocated blocks in ImmixSpace.

@wks (Collaborator) commented Apr 23, 2025

After this PR, ChunkMap::all_chunks() is only used in two places, in ImmixSpace and MarkSweepSpace, respectively. Both use it to clear side log bits. Because that happens during Prepare, all objects with log bits are in "allocated" chunks, and we ensure there are no stale log bits after each GC, so ChunkMap::all_chunks() should be equivalent to ChunkMap::all_allocated_chunks().

So I think it should be sound to simply record the space ID in each ChunkState, and use 0 to identify unallocated chunks (i.e. chunks that don't belong to any space).

And if we can use both VMMap and ChunkMap, we just need one bit as ChunkState (i.e. just like the old ChunkState), and let Map32::descriptor_map (and the address range for Map64) identify the chunk owner.
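For concreteness, a minimal sketch of the "space ID per chunk, 0 = unallocated" idea above; the helper names are illustrative and assume the state is a plain byte, not code from the PR:

```rust
/// Sketch: one state byte per chunk; 0 means the chunk belongs to no space.
fn encode_owner(space_index: u8) -> u8 {
    space_index + 1 // shift so that space 0 does not collide with "free"
}

/// Returns the owning space index, or None for an unallocated chunk.
fn decode_owner(state: u8) -> Option<u8> {
    state.checked_sub(1)
}

/// Iterating a space's chunks then skips free chunks automatically.
fn chunks_of_space(states: &[u8], space_index: u8) -> impl Iterator<Item = usize> + '_ {
    states
        .iter()
        .enumerate()
        .filter(move |(_, &s)| decode_owner(s) == Some(space_index))
        .map(|(i, _)| i)
}
```

The +1 shift is only needed if space index 0 denotes a real space; if 0x0 is reserved for the empty space, as noted above, the descriptor could be stored directly.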

@k-sareen (Collaborator) commented:

> After this PR, ChunkMap::all_chunks() is only used in two places, in ImmixSpace and MarkSweepSpace, respectively. Both use it to clear side log bits.

I think I can answer this. In my branch, all_chunks only iterates through the chunks allocated for that space, since I use 0x0 for free chunks. This escaped my attention when I implemented it for master. I don't think we need to iterate through free chunks for that.

> And if we can use both VMMap and ChunkMap, we just need one bit as ChunkState (i.e. just like the old ChunkState), and let Map32::descriptor_map (and the address range for Map64) identify the chunk owner.

The old ChunkState was still a u8, because 1 byte per 4 MB chunk is trivial in terms of metadata overhead. Using 1 bit will result in unnecessary bit twiddling when trying to go through all chunks, unless we use an implementation that reads a byte and resolves all the allocated chunks in it.
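To illustrate the byte-at-a-time approach mentioned here, a sketch (not code from this PR) that reads a 1-bit-per-chunk bitmap one byte at a time and resolves all allocated chunk indices in each byte, so an all-zero byte costs one comparison instead of eight bit tests:

```rust
/// Sketch: find the indices of all allocated chunks in a 1-bit-per-chunk
/// bitmap, scanning one byte (8 chunks) at a time.
fn allocated_chunks(bitmap: &[u8]) -> Vec<usize> {
    let mut chunks = Vec::new();
    for (byte_index, &byte) in bitmap.iter().enumerate() {
        let mut bits = byte;
        while bits != 0 {
            let bit = bits.trailing_zeros() as usize; // lowest set bit
            chunks.push(byte_index * 8 + bit);
            bits &= bits - 1; // clear the lowest set bit
        }
    }
    chunks
}
```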

Successfully merging this pull request may close these issues:

  • ChunkMap is broken when using multiple spaces that access it