
Make chunk map as global side metadata, and each space only lists its own chunks using chunk map #1304


Open · wants to merge 4 commits into master from chunkmap-ownership

Conversation

@qinsoon (Member) commented Apr 17, 2025

The chunk map records the lowest chunk and the highest chunk used by a space, and accesses all the chunk states (local side metadata) in the range. There are multiple issues when we use discontiguous spaces.

  1. Chunks in the range may be unused, and thus the local side metadata may be unmapped.
  2. Chunks in the range may belong to other spaces. If those spaces also use the chunk map, each space will list all the chunks in the range, even the ones that do not belong to it.

This PR makes the chunk state side metadata a global side metadata. Strictly, only plans that use the chunk map need it; but since we will use mark sweep or non-moving immix as the non-moving policy, both of those policies use the chunk map, and the non-moving policy is included in every plan, we simply include the side metadata for every plan.

This PR fixes #1197.
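For illustration, here is a minimal sketch of the shape this change implies: one byte of global side metadata per chunk, recording both whether the chunk is allocated and which space owns it, so that each space can list only its own chunks. The names and bit layout below are hypothetical, not the exact ones in this diff.

```rust
/// Hypothetical sketch of a per-chunk state byte stored in *global* side
/// metadata, so every space reads and writes the same table.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct ChunkState(u8);

impl ChunkState {
    /// Top bit: allocated flag. Low 7 bits: index of the owning space.
    const ALLOCATED_MASK: u8 = 0b1000_0000;

    pub fn allocated(space_index: u8) -> Self {
        debug_assert!(space_index < Self::ALLOCATED_MASK);
        ChunkState(Self::ALLOCATED_MASK | space_index)
    }

    pub fn free(space_index: u8) -> Self {
        debug_assert!(space_index < Self::ALLOCATED_MASK);
        ChunkState(space_index)
    }

    pub fn is_allocated(&self) -> bool {
        self.0 & Self::ALLOCATED_MASK != 0
    }

    pub fn space_index(&self) -> u8 {
        self.0 & !Self::ALLOCATED_MASK
    }
}
```

With an owner recorded per chunk, a space's iteration can filter on `space_index` instead of assuming that every chunk between its lowest and highest chunk belongs to it.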

@qinsoon qinsoon force-pushed the chunkmap-ownership branch from 6c236b8 to 4472263 on April 17, 2025 04:37
@qinsoon qinsoon added the PR-extended-testing (Run extended tests for the pull request) label on Apr 22, 2025
@qinsoon qinsoon marked this pull request as ready for review April 23, 2025 00:00
@qinsoon qinsoon requested review from wks and k-sareen April 23, 2025 00:00
@k-sareen (Collaborator) commented:

> Chunks in the range may be used, thus the local side metadata may be unmapped.

I don't quite follow what you mean. Do you mean that chunks may be used, so local metadata is mapped? Because how can it be unmapped if the chunk is used? Or do you mean that some chunks in the range may not be used, so the side metadata is unmapped?

@qinsoon (Member, Author) commented Apr 23, 2025

> > Chunks in the range may be used, thus the local side metadata may be unmapped.
>
> I don't quite follow what you mean. Do you mean that chunks may be used, so local metadata is mapped? Because how can it be unmapped if the chunk is used? Or do you mean that some chunks in the range may not be used, so the side metadata is unmapped?

Yeah. That is a typo. I meant chunks may not be used.

@k-sareen (Collaborator) left a comment


The major question I have with this PR is: why do we want to associate free chunks with a space? A free chunk is free, and to me it doesn't make sense to associate it with a space. In my branch I just used 0x0 to mark a chunk as completely free, and theoretically any space could use it.

When I implemented this in my branch I didn't do any performance evaluation or think about alternatives. I would at least like to make sure this doesn't degrade performance much for Immix or MarkSweep.

Also, I would like the PR title to be less verbose. Something like "Make chunk map a global metadata" or "Fix bugs with chunk map being shared across spaces" is sufficient, I think.

```diff
-/// Set chunk state
-pub fn set(&self, chunk: Chunk, state: ChunkState) {
+/// Set a chunk as allocated, or as free.
+pub fn set_allocated(&self, chunk: Chunk, allocated: bool) {
```
Collaborator:


I feel this function name is confusing. set_allocated to me sounds like this function will set the chunk to the allocated state, but it can actually set it to either allocated or free.

Member Author:


The boolean argument in set_allocated(bool) should make it pretty clear that you can set it either to 'allocated' or to 'not allocated'. Do you have a better suggestion?

Collaborator:


I'm not sure why set is not good enough? The previous signature is pretty much the same, except state is now a bool. Maybe @wks has a better idea. Or if he's fine with the new name, then that's also okay.

Member Author:


I felt that after changing the argument from an enum to a boolean, chunk_map.set(chunk, true) was not as clear as chunk_map.set_allocated(chunk, true), so I made the change.

@k-sareen (Collaborator) commented Apr 23, 2025


Fair enough 👍 I'll wait for @wks to see if he has a better suggestion.

```rust
}

/// Get chunk state, regardless of the space. This should always be private.
fn get_any(&self, chunk: Chunk) -> ChunkState {
```
Collaborator:


get_any is not really descriptive of what it's doing. It should be something like get_internal or get_private, I feel.

Comment on lines +177 to 189

```diff
 /// A range of all chunks in the heap.
-pub fn all_chunks(&self) -> RegionIterator<Chunk> {
+pub fn all_chunks(&self) -> impl Iterator<Item = Chunk> + '_ {
+    let chunk_range = self.chunk_range.lock();
+    RegionIterator::<Chunk>::new(chunk_range.start, chunk_range.end)
+        .filter(|c| self.get(*c).is_some())
+}
+
+/// A range of all chunks in the heap.
+pub fn all_allocated_chunks(&self) -> impl Iterator<Item = Chunk> + '_ {
+    let chunk_range = self.chunk_range.lock();
+    RegionIterator::<Chunk>::new(chunk_range.start, chunk_range.end)
+        .filter(|c| self.get(*c).is_some_and(|state| state.is_allocated()))
+}
```
Collaborator:


I don't really understand why we need this. The issue here is that when a chunk is freed, we still associate a space index with it. But why is that the case? We can just associate the 0x0 space descriptor with it (I think 0x0 is reserved for the empty space). This would mean we don't need two functions for iterating through the chunks of a space, since we would automatically skip the free ones.

Member Author:


I am just making sure all_chunks does the same thing as before, which is to list all the chunks for the space, whether they are allocated or free.

@k-sareen (Collaborator) commented Apr 23, 2025


I just did a grep of where the all_chunks function is used. Most of the time it's used as if it were all_allocated_chunks. The only times it's actually used as all chunks, regardless of allocation state, are when we iterate through the chunks to bulk clear/set metadata bits. But I think it doesn't make sense to do that for free chunks, since bulk clearing/setting metadata should happen when we either free or allocate a (new) chunk.

Collaborator:


Also, I don't think the above is a good idea anyway. We shouldn't be going through free chunks when clearing/setting metadata; it should almost always be the allocated chunks. Do you have a use-case in mind where it makes sense to have a space associated with a free chunk?

Member Author:


I am not sure. I will try replacing the current uses of all_chunks with all_allocated_chunks and see if it works.

```diff
 }

 /// A byte-map to record all the allocated chunks.
 /// A plan can use this to maintain records for the chunks that they used, and the states of the chunks.
-/// Any plan that uses the chunk map should include the `ALLOC_TABLE` spec in their local sidemetadata specs
+/// Any plan that uses the chunk map should include the `ALLOC_TABLE` spec in their local sidemetadata specs.
```
Collaborator:


Suggested change:

```diff
-/// Any plan that uses the chunk map should include the `ALLOC_TABLE` spec in their local sidemetadata specs.
+/// Any plan that uses the chunk map should include the `ALLOC_TABLE` spec in their global sidemetadata specs.
```

```diff
@@ -60,6 +60,8 @@ define_side_metadata_specs!(
     MS_ACTIVE_CHUNK = (global: true, log_num_of_bits: 3, log_bytes_in_region: LOG_BYTES_IN_CHUNK),
```
Collaborator:


In my branch I did a quick fix and didn't bother to coalesce MS_ACTIVE_CHUNK and CHUNK_MARK, but I think we should for this PR. It's not useful to have the same metadata under different names (they are basically the same metadata).

Collaborator:


MS_ACTIVE_CHUNK only cares about chunks with malloc in them. But with this PR, we should be able to distinguish between a chunk mapped by the non-moving immix space and the malloc space (for example). I feel like MS_ACTIVE_CHUNK can be subsumed by CHUNK_MARK now.

Member Author:


It might be possible. We could simply replace MS_ACTIVE_CHUNK in malloc mark sweep with CHUNK_MARK (which might be an issue if we have other spaces that use ChunkMap). Or we could start using ChunkMap in malloc mark sweep, but that may not be an easy refactoring. Either way, I will give it a try after this PR.

```diff
@@ -60,6 +60,8 @@ define_side_metadata_specs!(
     MS_ACTIVE_CHUNK = (global: true, log_num_of_bits: 3, log_bytes_in_region: LOG_BYTES_IN_CHUNK),
     // Track the index in SFT map for a chunk (only used for SFT sparse chunk map)
     SFT_DENSE_CHUNK_MAP_INDEX = (global: true, log_num_of_bits: 3, log_bytes_in_region: LOG_BYTES_IN_CHUNK),
+    // Mark chunks (any plan that uses the chunk map should include this spec in their local sidemetadata specs)
```
Collaborator:


Suggested change:

```diff
-// Mark chunks (any plan that uses the chunk map should include this spec in their local sidemetadata specs)
+// Mark chunks (any plan that uses the chunk map should include this spec in their global side-metadata specs)
```

@wks (Collaborator) commented Apr 23, 2025

> ... why do we want to associate free chunks with a space?

I have the same concern as @k-sareen's. But I'd like to make clear what purpose ChunkMap serves. There could be two answers.

  1. ChunkMap is for identifying (and iterating over) all chunks that belong to a space and hold meaningful data (Immix blocks and MarkSweep blocks).
  2. ChunkMap is for identifying chunks that the VMMap has given to a specific space.

The second purpose can be supported by VMMap. Map64 simply uses address ranges to identify allocated chunks, and chunks from different spaces don't overlap. Map32 maintains an array descriptor_map, which maps each Chunk index to a SpaceDescriptor that encodes the space ID. If we want to iterate through all chunks the VMMap has given to a space, we can use its underlying PageResource (note that each concrete space knows exactly which PageResource impl it is using).

The first purpose, however, cannot always be supported by VMMap. A Chunk can be given to a Space, and the space may never return the Chunk to the VMMap. Furthermore, Map64 doesn't have any per-chunk metadata at all, and it never shrinks the high watermark! I think that's why we introduced ChunkMap in the first place: to record whether each chunk is allocated or not, because Map64 won't tell us.

If we make use of VMMap, I think we can implement ChunkMap using only a one-bit-per-chunk metadata. 1=allocated, and 0=unallocated.

  • For Map32, we use the descriptor_map to identify chunks that belong to a given space, and use the one-bit-per-chunk ChunkMap to identify allocated chunks.
  • For Map64, we just iterate through all chunks between the base_address and high_water, and use the one-bit-per-chunk ChunkMap to identify allocated chunks.

I did some Git archaeology into the JikesRVM repo. @steveblackburn developed its predecessor ChunkList in 2008 for ImmixSpace alone, and it was for iterating all allocated blocks in allocated chunks in order to compute the histogram. But it was a linked list structure instead of bitmap or byte map. It looks like it was intended to be a way to find allocated chunks without scanning through all chunks. Maybe our work packet and GC worker architecture allowed us to shift our paradigm to a more parallelized approach, i.e. making one work packet for each chunk instead of doing the iteration in one single thread. This historical usage implies that it was for the first purpose, i.e. identifying allocated blocks in ImmixSpace.

@wks (Collaborator) commented Apr 23, 2025

After this PR, ChunkMap::all_chunks() is only used in two places, in ImmixSpace and MarkSweepSpace, respectively. Both use it to clear side log bits. Because that happens during Prepare, all objects with log bits are in "allocated" chunks, and we ensure there are no stale log bits after each GC, so ChunkMap::all_chunks() should be equivalent to ChunkMap::all_allocated_chunks().

So I think it should be sound to simply record the space ID in each ChunkState, and use 0 to identify unallocated chunks (i.e. chunks that don't belong to any space).

And if we can use both VMMap and ChunkMap, we just need one bit as ChunkState (i.e. just like the old ChunkState), and let Map32::descriptor_map (and the address range for Map64) identify the chunk owner.
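For concreteness, a minimal sketch of the "space ID per chunk, 0 = unallocated" idea above; the helper names are illustrative and assume the state is a plain byte, not code from the PR:

```rust
/// Sketch: one state byte per chunk; 0 means the chunk belongs to no space.
fn encode_owner(space_index: u8) -> u8 {
    space_index + 1 // shift so that space 0 does not collide with "free"
}

/// Returns the owning space index, or None for an unallocated chunk.
fn decode_owner(state: u8) -> Option<u8> {
    state.checked_sub(1)
}

/// Iterating a space's chunks then skips free chunks automatically.
fn chunks_of_space(states: &[u8], space_index: u8) -> impl Iterator<Item = usize> + '_ {
    states
        .iter()
        .enumerate()
        .filter(move |(_, &s)| decode_owner(s) == Some(space_index))
        .map(|(i, _)| i)
}
```

The +1 shift is only needed if space index 0 denotes a real space; if 0x0 is reserved for the empty space, as noted above, the descriptor could be stored directly.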

@k-sareen (Collaborator) commented:

> After this PR, ChunkMap::all_chunks() is only used in two places, in ImmixSpace and MarkSweepSpace, respectively. Both use it to clear side log bits.

I think I can answer this. In my branch, all_chunks only iterates through the chunks allocated for that space, since I use 0x0 for free chunks. This escaped my attention when I implemented it for master. I don't think we need to iterate through free chunks for that.

> And if we can use both VMMap and ChunkMap, we just need one bit as ChunkState (i.e. just like the old ChunkState), and let Map32::descriptor_map (and the address range for Map64) identify the chunk owner.

The old ChunkState was still a u8, because 1 byte per 4 MB chunk is trivial in terms of metadata overhead. Using 1 bit will result in unnecessary bit twiddling when trying to go through all chunks, unless we use an implementation that reads a byte and resolves all the allocated chunks in it.
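To illustrate the byte-at-a-time approach mentioned here, a sketch (not code from this PR) that reads a 1-bit-per-chunk bitmap one byte at a time and resolves all allocated chunk indices in each byte, so an all-zero byte costs one comparison instead of eight bit tests:

```rust
/// Sketch: find the indices of all allocated chunks in a 1-bit-per-chunk
/// bitmap, scanning one byte (8 chunks) at a time.
fn allocated_chunks(bitmap: &[u8]) -> Vec<usize> {
    let mut chunks = Vec::new();
    for (byte_index, &byte) in bitmap.iter().enumerate() {
        let mut bits = byte;
        while bits != 0 {
            let bit = bits.trailing_zeros() as usize; // lowest set bit
            chunks.push(byte_index * 8 + bit);
            bits &= bits - 1; // clear the lowest set bit
        }
    }
    chunks
}
```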

Successfully merging this pull request may close these issues:

  • ChunkMap is broken when using multiple spaces that access it