Conversation
@tianleq tianleq commented Jul 28, 2025

This PR adds binding support for the new ConcurrentImmix plan introduced in mmtk-core in mmtk/mmtk-core#1355.

We implemented the SATB barrier fast paths in the OpenJDK binding, and refactored the barriers to support both pre- and post-barriers, as well as the (weak) reference-loading barrier. The OpenJDK binding is now aware of concurrent marking, too.
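For illustration, an object-remembering SATB pre-write fast path over per-object unlog bits can be sketched as follows. This is a minimal sketch, not the binding's actual code: `side_table`, `HEAP_BASE`, `satb_enqueue`, and `satb_pre_write` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of an object-remembering SATB pre-write barrier.
// The unlog bits live in a side table with one bit per 8 bytes of heap,
// so one metadata byte covers 64 heap bytes.

static uint8_t side_table[1024] = {0};  // stand-in for the global unlog-bit table
static uintptr_t HEAP_BASE = 0;         // stand-in heap base

static int slow_path_calls = 0;
static void satb_enqueue(uintptr_t src) {  // slow path: remember the object
    (void)src;
    ++slow_path_calls;
}

// Pre-write barrier: before a field of `src` is overwritten, check the
// object's unlog bit; if it is set (object not yet logged), take the
// slow path, which records `src` and clears the bit so the slow path
// runs at most once per object per marking phase.
static void satb_pre_write(uintptr_t src) {
    uintptr_t off   = src - HEAP_BASE;
    uint8_t   byte  = side_table[off >> 6];  // 1 bit per 8 bytes => 1 byte per 64 bytes
    uint8_t   shift = (off >> 3) & 7;        // bit index within that byte
    if ((byte >> shift) & 1) {
        satb_enqueue(src);
        side_table[off >> 6] &= ~(uint8_t)(1u << shift);  // log it (clear unlog bit)
    }
}
```

The fast path is just the bit test; the slow path call is the only plan-specific part, which is what the later refactoring discussion is about.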

@@ -0,0 +1,536 @@
#define private public // too lazy to change openjdk...
Collaborator Author

This is copied from the lxr branch. Basically, when code needs to be patched, we need to call a private method. A proper solution would be to define a friend class in OpenJDK.

Contributor

It works around the fact that LIRAssembler::as_Address is private in OpenJDK 11. It is a public method in OpenJDK 21. We may remove this workaround when porting.

@tianleq tianleq changed the title WIP Concurrent Immix Jul 28, 2025
Collaborator Author

So this mmtkSATBBarrier is also an object-remembering barrier. Theoretically, we should be able to reuse the existing mmtkObjectBarrier, but due to the current barrier API design, the mmtkObjectBarrier has the generational semantics baked in.

Contributor

Yes. Extracting the common code paths should be helpful from a software-engineering point of view. It will also help with porting this to the OpenJDK 21 binding, where fewer lines of code mean fewer places to merge. I attempted to extract some code, but found myself digging into a rat hole of new crashes. For now I'll make this PR stable and get it merged first; then we can refactor the barriers in mmtk-openjdk and refactor the pause/GC logic in mmtk-core in parallel. I'll create another PR for the barrier refactoring.

_pre_barrier_c1_runtime_code_blob = Runtime1::generate_blob(buffer_blob, -1, "mmtk_pre_write_code_gen_cl", false, &pre_write_code_gen_cl);
MMTkPostBarrierCodeGenClosure post_write_code_gen_cl;
_post_barrier_c1_runtime_code_blob = Runtime1::generate_blob(buffer_blob, -1, "mmtk_post_write_code_gen_cl", false, &post_write_code_gen_cl);
// MMTkBarrierCodeGenClosure write_code_gen_cl_patch_fix(true);
Collaborator Author

This was in the lxr branch. After discussing with Wenyu, we believe it is redundant: the code patching has already been dealt with, so there is no need to do anything special here.

@@ -70,13 +72,16 @@ class MMTkBarrierSetRuntime: public CHeapObj<mtGC> {
static void object_reference_array_copy_pre_call(void* src, void* dst, size_t count);
Collaborator Author
@tianleq tianleq Jul 29, 2025

In OpenJDK, array copy does not pass the base addresses of the src or dst arrays, so nothing can be remembered. I have a patch in Iso to pass down the base addresses of both the src and dst arrays (they are required by the publication semantics). I am not sure whether it is worthwhile to have that in our OpenJDK fork.


qinsoon commented Jul 29, 2025

All the tests failed. @tianleq Is there anything we need to do for running ConcurrentImmix?


tianleq commented Jul 29, 2025

All the tests failed. @tianleq Is there anything we need to do for running ConcurrentImmix?

I did not test compressed pointers; as in Iso, they are not supported yet. Other than that, the min heap is much larger because this is non-moving Immix.


qinsoon commented Jul 29, 2025

All the tests failed. @tianleq Is there anything we need to do for running ConcurrentImmix?

I did not test compressed pointers; as in Iso, they are not supported yet. Other than that, the min heap is much larger because this is non-moving Immix.

Thanks. I temporarily disabled compressed pointers for concurrent Immix. We should get compressed pointers working before merging the PR.


qinsoon commented Jul 29, 2025

OOM is expected, as concurrent Immix is non-moving.

Tianle and Wenyu clarified the issue about compressed pointers. Currently the SATB barrier computes the metadata address differently depending on whether compressed pointers are in use, which is incorrect. We should use the same approach as how the metadata address is computed in the object barrier. The current implementation of the SATB barrier is derived from lxr, which uses a field barrier and needs to deal with the difference in field slot size with/without compressed pointers.

@qinsoon qinsoon force-pushed the concurrent-immix branch 5 times, most recently from ea4c968 to fc9d143 Compare July 30, 2025 07:20
@qinsoon qinsoon force-pushed the concurrent-immix branch from fc9d143 to 5018b70 Compare July 30, 2025 08:00

qinsoon commented Jul 31, 2025

Current CI runs concurrent Immix with 4x min heap. There are still some benchmarks that ran out of memory. I will increase it to 5x.

There are a few correctness issues.

Segfault in mmtk_openjdk::abi::OopDesc::size: fastdebug h2, fastdebug jython

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007ff207be7228, pid=3100, tid=3110
#
# JRE version: OpenJDK Runtime Environment (11.0.19) (fastdebug build 11.0.19-internal+0-adhoc.runner.openjdk)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 11.0.19-internal+0-adhoc.runner.openjdk, mixed mode, tiered, third-party gc, linux-amd64)
# Problematic frame:
# C  [libmmtk_openjdk.so+0x1e7228]  mmtk_openjdk::abi::OopDesc::size::h58760be027304211+0x18
#
# Core dump will be written. Default location: Core dumps may be processed with "/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %h %d" (or dumping to /tmp/runbms-kt4q31ff/core.3100)
#
# An error report file with more information is saved as:
# /tmp/runbms-kt4q31ff/hs_err_pid3100.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
Current thread is 3110
Dumping core ...

old_queue is not empty: fastdebug pmd

thread '<unnamed>' panicked at /home/runner/.cargo/git/checkouts/mmtk-core-783748a1e19f117d/100c049/src/scheduler/scheduler.rs:719:13:
assertion failed: old_queue.is_empty()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at core/src/panicking.rs:221:5:
panic in a function that cannot unwind
stack backtrace:
   0:     0x7f441a685fca - std::backtrace_rs::backtrace::libunwind::trace::h5a5b8284f2d0c266
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/../../backtrace/src/backtrace/libunwind.rs:116:5
   1:     0x7f441a685fca - std::backtrace_rs::backtrace::trace_unsynchronized::h76d4f1c9b0b875e3
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x7f441a685fca - std::sys::backtrace::_print_fmt::hc4546b8364a537c6
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/sys/backtrace.rs:66:9
   3:     0x7f441a685fca - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::h5b6bd5631a6d1f6b
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/sys/backtrace.rs:39:26
   4:     0x7f441a6aa7d3 - core::fmt::rt::Argument::fmt::h270f6602a2b96f62
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/fmt/rt.rs:177:76
   5:     0x7f441a6aa7d3 - core::fmt::write::h7550c97b06c86515
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/fmt/mod.rs:1186:21
   6:     0x7f441a683693 - std::io::Write::write_fmt::h7b09c64fe0be9c84
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/io/mod.rs:1839:15
   7:     0x7f441a685e12 - std::sys::backtrace::BacktraceLock::print::h2395ccd2c84ba3aa
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/sys/backtrace.rs:42:9
   8:     0x7f441a686efc - std::panicking::default_hook::{{closure}}::he19d4c7230e07961
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:268:22
   9:     0x7f441a686d42 - std::panicking::default_hook::hf614597d3c67bbdb
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:295:9
  10:     0x7f441a6874d7 - std::panicking::rust_panic_with_hook::h8942133a8b252070
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:801:13
  11:     0x7f441a687336 - std::panicking::begin_panic_handler::{{closure}}::hb5f5963570096b29
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:667:13
  12:     0x7f441a6864a9 - std::sys::backtrace::__rust_end_short_backtrace::h6208cedc1922feda
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/sys/backtrace.rs:170:18
  13:     0x7f441a686ffc - rust_begin_unwind
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:665:5
  14:     0x7f4419ec8e2d - core::panicking::panic_nounwind_fmt::runtime::h1f507a806003dfb2
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/panicking.rs:112:18
  15:     0x7f4419ec8e2d - core::panicking::panic_nounwind_fmt::h357fc035dc231634
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/panicking.rs:122:5
  16:     0x7f4419ec8ec2 - core::panicking::panic_nounwind::hd0dad372654c389a
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/panicking.rs:221:5
  17:     0x7f4419ec9025 - core::panicking::panic_cannot_unwind::h65aefd062253eb19
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/panicking.rs:310:5
  18:     0x7f441a2a81ec - start_worker
                               at /home/runner/work/mmtk-openjdk/mmtk-openjdk/git/mmtk-openjdk/mmtk/src/api.rs:210:1
  19:     0x7f441c33492b - _ZN6Thread8call_runEv
  20:     0x7f441c0356c6 - _ZL19thread_native_entryP6Thread
  21:     0x7f441cc94ac3 - <unknown>
  22:     0x7f441cd26850 - <unknown>
  23:                0x0 - <unknown>

Segfault in mark_lines_for_object: release h2, release jython

This could be the same issue as the first one (mmtk_openjdk::abi::OopDesc::size). We need to get object size in mark_lines_for_object, which segfaults.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f4edd5e4d4d, pid=2945, tid=2952
#
# JRE version: OpenJDK Runtime Environment (11.0.19) (build 11.0.19-internal+0-adhoc.runner.openjdk)
# Java VM: OpenJDK 64-Bit Server VM (11.0.19-internal+0-adhoc.runner.openjdk, mixed mode, tiered, third-party gc, linux-amd64)
# Problematic frame:
# C  [libmmtk_openjdk.so+0x1e4d4d]  mmtk::policy::immix::line::Line::mark_lines_for_object::h6c322a8017999525+0xd
#
# Core dump will be written. Default location: Core dumps may be processed with "/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %h %d" (or dumping to /tmp/runbms-k75bcqcl/core.2945)
#
# An error report file with more information is saved as:
# /tmp/runbms-k75bcqcl/hs_err_pid2945.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#


tianleq commented Jul 31, 2025

Current CI runs concurrent Immix with 4x min heap. There are still some benchmarks that ran out of memory. I will increase it to 5x.

I am aware that 4x is not enough; jython OOMs even with 5x, and the same is true for stop-the-world non-moving Immix. The jython crash is what I saw when the barrier was incorrectly eliminated; I fixed that and never saw it again. Looking at my test runs, I only see OOMs on jython. I will try to reproduce those locally.

wks previously requested changes Jul 31, 2025

@wks wks left a comment

I guess the code was copied from the field-logging barrier in the lxr branch. When not using compressed oops, each field is 64 bits, and the granularity of the field-logging bits is one bit per 8 bytes. But when using compressed oops, fields become 32 bits, and the field-logging bits become one bit per 4 bytes. That's why the shift changes between 5 and 6 depending on whether compressed oops are enabled.

But here we are working on the object-remembering barrier, and using the regular global unlog bits. Its granularity is only related to the object alignment, not the field size. On 64-bit machines, objects are always 64-bit aligned. So we should always shift by 6 when computing the address of unlog bits. Similarly, when computing the in-byte shift, we should always shift by 3.

Change this and the segmentation fault disappears, even when using compressed oops.
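The shift arithmetic described above can be illustrated as follows. The helper names are hypothetical; the point is that the object-remembering barrier's shifts are fixed, while the field-logging shifts vary with compressed oops.

```cpp
#include <cassert>
#include <cstdint>

// Global unlog bits: one bit per 8 bytes, matching the minimum object
// alignment on 64-bit machines, so one metadata byte covers 64 heap
// bytes regardless of compressed oops.

// Byte offset into the unlog-bit table for a given object address.
static inline uintptr_t unlog_byte_offset(uintptr_t addr) { return addr >> 6; }

// Bit index within that metadata byte.
static inline unsigned unlog_bit_shift(uintptr_t addr) { return (addr >> 3) & 7; }

// Field-logging bits (as in lxr) follow the field size instead: one bit
// per 8-byte field without compressed oops (shift by 6), but one bit per
// 4-byte field with compressed oops (shift by 5). Applying this
// field-dependent shift to the object-remembering barrier is what caused
// the segfault.
static inline uintptr_t field_log_byte_offset(uintptr_t addr, bool compressed_oops) {
    return addr >> (compressed_oops ? 5 : 6);
}
```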


tianleq commented Jul 31, 2025

I guess the code was copied from the field-logging barrier in the lxr branch.

Yes, @qinsoon and I were aware of this.


wks commented Jul 31, 2025

I also observed a crash during a full GC (after patching the shifting operations for unlog bits as I suggested above).

[2025-07-31T03:38:33Z INFO  mmtk::util::heap::gc_trigger] [POLL] immix: Triggering collection (20484/20480 pages)
[2025-07-31T03:38:33Z INFO  mmtk::util::heap::gc_trigger] [POLL] immix: Triggering collection (20484/20480 pages)
[2025-07-31T03:38:33Z INFO  mmtk::policy::immix::defrag] Defrag: false
Full start
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007ffff4ffca18, pid=103107, tid=103139
#
# JRE version: OpenJDK Runtime Environment (11.0.19) (fastdebug build 11.0.19-internal+0-adhoc.wks.openjdk)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 11.0.19-internal+0-adhoc.wks.openjdk, mixed mode, tiered, compressed oops, third-party gc, linux-amd64)
# Problematic frame:
# C  [libmmtk_openjdk.so+0x1fca18]  mmtk_openjdk::abi::OopDesc::size::hacb6815c8bcd90cb+0x18
#
...
Thread 27 "MMTk Collector " received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff82be36c0 (LWP 103139)]
0x00007ffff7d6e5df in abort () from /usr/lib/libc.so.6
(gdb) bt
#28 0x00007ffff70f60e2 in signalHandler (sig=11, info=0x7fff82be10b0, uc=0x7fff82be0f80)
    at /home/wks/projects/mmtk-github/parallels/review/concurrent-immix/openjdk/src/hotspot/os/linux/os_linux.cpp:4917
#29 <signal handler called>
#30 mmtk_openjdk::abi::OopDesc::size<true> (self=0x401e93a8) at src/abi.rs:390
#31 0x00007ffff4ff1682 in mmtk_openjdk::object_model::{impl#0}::get_current_size<true> (object=...) at src/object_model.rs:66
#32 0x00007ffff4f4f578 in mmtk::policy::immix::line::Line::mark_lines_for_object<mmtk_openjdk::OpenJDK<true>> (object=..., state=92)
    at /home/wks/projects/mmtk-github/parallels/review/concurrent-immix/mmtk-core/src/policy/immix/line.rs:70
#33 0x00007ffff4f5a0a6 in mmtk::policy::immix::immixspace::ImmixSpace<mmtk_openjdk::OpenJDK<true>>::mark_lines<mmtk_openjdk::OpenJDK<true>> (self=0x7ffff00b11c0, object=...)
    at /home/wks/projects/mmtk-github/parallels/review/concurrent-immix/mmtk-core/src/policy/immix/immixspace.rs:790
#34 0x00007ffff4f5cb50 in mmtk::policy::immix::immixspace::ImmixSpace<mmtk_openjdk::OpenJDK<true>>::trace_object_without_moving<mmtk_openjdk::OpenJDK<true>, mmtk::plan::tracing::VectorQueue<mmtk::util::address::ObjectReference>> (self=0x7ffff00b11c0, queue=0x7fff82be2518, object=...) at /home/wks/projects/mmtk-github/parallels/review/concurrent-immix/mmtk-core/src/policy/immix/immixspace.rs:654
#35 0x00007ffff4f59656 in mmtk::policy::immix::immixspace::{impl#3}::trace_object<mmtk_openjdk::OpenJDK<true>, mmtk::plan::tracing::VectorQueue<mmtk::util::address::ObjectReference>, 0> (
    self=0x7ffff00b11c0, queue=0x7fff82be2518, object=..., copy=..., worker=0x7ffff020db70)
    at /home/wks/projects/mmtk-github/parallels/review/concurrent-immix/mmtk-core/src/policy/immix/immixspace.rs:244
#36 0x00007ffff52943d2 in mmtk::plan::concurrent::immix::global::{impl#10}::trace_object<mmtk_openjdk::OpenJDK<true>, mmtk::plan::tracing::VectorQueue<mmtk::util::address::ObjectReference>, 0> (
    self=0x7ffff00b11c0, __mmtk_queue=0x7fff82be2518, __mmtk_objref=..., __mmtk_worker=0x7ffff020db70)
    at /home/wks/projects/mmtk-github/parallels/review/concurrent-immix/mmtk-core/src/plan/concurrent/immix/global.rs:45
#37 0x00007ffff51c6f62 in mmtk::scheduler::gc_work::{impl#39}::trace_object<mmtk_openjdk::OpenJDK<true>, mmtk::plan::concurrent::immix::global::ConcurrentImmix<mmtk_openjdk::OpenJDK<true>>, 0> (
    self=0x7fff82be2500, object=...) at /home/wks/projects/mmtk-github/parallels/review/concurrent-immix/mmtk-core/src/scheduler/gc_work.rs:997
#38 0x00007ffff4fd39e8 in mmtk::vm::reference_glue::{impl#0}::keep_alive<mmtk::scheduler::gc_work::PlanProcessEdges<mmtk_openjdk::OpenJDK<true>, mmtk::plan::concurrent::immix::global::ConcurrentImmix<mmtk_openjdk::OpenJDK<true>>, 0>> (self=0x7fff2c018b60, trace=0x7fff82be2500) at /home/wks/projects/mmtk-github/parallels/review/concurrent-immix/mmtk-core/src/vm/reference_glue.rs:81
#39 0x00007ffff502c66e in mmtk::util::finalizable_processor::FinalizableProcessor<mmtk::util::address::ObjectReference>::forward_finalizable_reference<mmtk::util::address::ObjectReference, mmtk::scheduler::gc_work::PlanProcessEdges<mmtk_openjdk::OpenJDK<true>, mmtk::plan::concurrent::immix::global::ConcurrentImmix<mmtk_openjdk::OpenJDK<true>>, 0>> (e=0x7fff82be2500, finalizable=0x7fff2c018b60)
    at /home/wks/projects/mmtk-github/parallels/review/concurrent-immix/mmtk-core/src/util/finalizable_processor.rs:41
#40 0x00007ffff502c3a6 in mmtk::util::finalizable_processor::{impl#0}::forward_finalizable::{closure#0}<mmtk::util::address::ObjectReference, mmtk::scheduler::gc_work::PlanProcessEdges<mmtk_openjdk::OpenJDK<true>, mmtk::plan::concurrent::immix::global::ConcurrentImmix<mmtk_openjdk::OpenJDK<true>>, 0>> (f=0x7fff2c018b60)
    at /home/wks/projects/mmtk-github/parallels/review/concurrent-immix/mmtk-core/src/util/finalizable_processor.rs:92
#41 0x00007ffff50f7656 in core::slice::iter::{impl#190}::for_each<mmtk::util::address::ObjectReference, mmtk::util::finalizable_processor::{impl#0}::forward_finalizable::{closure_env#0}<mmtk::util::address::ObjectReference, mmtk::scheduler::gc_work::PlanProcessEdges<mmtk_openjdk::OpenJDK<true>, mmtk::plan::concurrent::immix::global::ConcurrentImmix<mmtk_openjdk::OpenJDK<true>>, 0>>> (self=..., f=...)
    at /home/wks/.rustup/toolchains/1.83.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/iter/macros.rs:254
#42 0x00007ffff502bdaf in mmtk::util::finalizable_processor::FinalizableProcessor<mmtk::util::address::ObjectReference>::forward_finalizable<mmtk::util::address::ObjectReference, mmtk::scheduler::gc_work::PlanProcessEdges<mmtk_openjdk::OpenJDK<true>, mmtk::plan::concurrent::immix::global::ConcurrentImmix<mmtk_openjdk::OpenJDK<true>>, 0>> (
    self=0x7ffff5950448 <<mmtk_openjdk::SINGLETON_COMPRESSED as core::ops::deref::Deref>::deref::__stability::LAZY+8>, e=0x7fff82be2500, _nursery=false)
    at /home/wks/projects/mmtk-github/parallels/review/concurrent-immix/mmtk-core/src/util/finalizable_processor.rs:90
#43 0x00007ffff502e8a9 in mmtk::util::finalizable_processor::FinalizableProcessor<mmtk::util::address::ObjectReference>::scan<mmtk::util::address::ObjectReference, mmtk::scheduler::gc_work::PlanProcessEdges<mmtk_openjdk::OpenJDK<true>, mmtk::plan::concurrent::immix::global::ConcurrentImmix<mmtk_openjdk::OpenJDK<true>>, 0>> (
    self=0x7ffff5950448 <<mmtk_openjdk::SINGLETON_COMPRESSED as core::ops::deref::Deref>::deref::__stability::LAZY+8>, tls=..., e=0x7fff82be2500, nursery=false)
    at /home/wks/projects/mmtk-github/parallels/review/concurrent-immix/mmtk-core/src/util/finalizable_processor.rs:74
#44 0x00007ffff5012e54 in mmtk::util::finalizable_processor::{impl#1}::do_work<mmtk::scheduler::gc_work::PlanProcessEdges<mmtk_openjdk::OpenJDK<true>, mmtk::plan::concurrent::immix::global::ConcurrentImmix<mmtk_openjdk::OpenJDK<true>>, 0>> (self=0x1, worker=0x7ffff020db70, mmtk=0x7ffff5950440 <<mmtk_openjdk::SINGLETON_COMPRESSED as core::ops::deref::Deref>::deref::__stability::LAZY>)
    at /home/wks/projects/mmtk-github/parallels/review/concurrent-immix/mmtk-core/src/util/finalizable_processor.rs:164
#45 0x00007ffff50427ad in mmtk::scheduler::work::GCWork::do_work_with_stat<mmtk::util::finalizable_processor::Finalization<mmtk::scheduler::gc_work::PlanProcessEdges<mmtk_openjdk::OpenJDK<true>, mmtk::plan::concurrent::immix::global::ConcurrentImmix<mmtk_openjdk::OpenJDK<true>>, 0>>, mmtk_openjdk::OpenJDK<true>> (self=0x1, worker=0x7ffff020db70, 
    mmtk=0x7ffff5950440 <<mmtk_openjdk::SINGLETON_COMPRESSED as core::ops::deref::Deref>::deref::__stability::LAZY>)
    at /home/wks/projects/mmtk-github/parallels/review/concurrent-immix/mmtk-core/src/scheduler/work.rs:45
#46 0x00007ffff505733f in mmtk::scheduler::worker::GCWorker<mmtk_openjdk::OpenJDK<true>>::run<mmtk_openjdk::OpenJDK<true>> (self=0x7ffff020db70, tls=..., 
    mmtk=0x7ffff5950440 <<mmtk_openjdk::SINGLETON_COMPRESSED as core::ops::deref::Deref>::deref::__stability::LAZY>)
    at /home/wks/projects/mmtk-github/parallels/review/concurrent-immix/mmtk-core/src/scheduler/worker.rs:257
#47 0x00007ffff5273065 in mmtk::memory_manager::start_worker<mmtk_openjdk::OpenJDK<true>> (
    mmtk=0x7ffff5950440 <<mmtk_openjdk::SINGLETON_COMPRESSED as core::ops::deref::Deref>::deref::__stability::LAZY>, tls=..., worker=0x7ffff020db70)
    at /home/wks/projects/mmtk-github/parallels/review/concurrent-immix/mmtk-core/src/memory_manager.rs:559
#48 0x00007ffff4fb6df3 in mmtk_openjdk::api::start_worker (tls=..., worker=0x7ffff020db70) at src/api.rs:213
#49 0x00007ffff7439130 in Thread::call_run (this=0x7ffff0239000) at /home/wks/projects/mmtk-github/parallels/review/concurrent-immix/openjdk/src/hotspot/share/runtime/thread.cpp:402
#50 0x00007ffff7105a2e in thread_native_entry (thread=0x7ffff0239000) at /home/wks/projects/mmtk-github/parallels/review/concurrent-immix/openjdk/src/hotspot/os/linux/os_linux.cpp:826
#51 0x00007ffff7dde7eb in ?? () from /usr/lib/libc.so.6
#52 0x00007ffff7e6220c in ?? () from /usr/lib/libc.so.6

It has something to do with finalizers.


qinsoon commented Jul 31, 2025

I also observed a crash during a full GC (after patching the shifting operations for unlog bits as I suggested above).

It has something to do with finalizers.

The initial implementation does not schedule finalization related packets in final pause. This commit should have fixed that: mmtk/mmtk-core@2bbb200

wks added 3 commits August 28, 2025 16:11
We will introduce it later when we implement a plan that needs it.
... instead of from a mutator instance.

wks commented Sep 17, 2025

The h2o benchmark got stuck for some reason. When given a 1000M heap, ConcurrentImmix should finish h2o in 20 seconds (tested locally).

@wks wks marked this pull request as ready for review September 17, 2025 09:40

wks commented Sep 17, 2025

I think this PR is stable enough. I labelled this PR as ready for review.

There are some things that I choose to do after this PR.

  • Extracting common code between the ObjectBarrier and the SATBBarrier. Both use the unlog bits, and the fast path that checks whether the unlog bit is set can be shared between them. However, that would result in radical changes to the class hierarchy, so I decided to make this PR stable, fix obvious bugs, and do the refactoring in the next PR.
  • Cleaning up code lines that are commented out. They may contain insights from the development of those barriers, and I currently don't understand all of them. It will take some time to understand and tidy them up. I'll probably do that together with the common-code refactoring.
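As a rough sketch of the first bullet (hypothetical names, not the planned design): the shared unlog-bit fast path could be hoisted into a base class, with each barrier supplying only its slow-path action.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of sharing the unlog-bit fast path between the generational
// object barrier and the SATB barrier. Names are illustrative only.

static uint8_t side_table[1024] = {0};  // stand-in unlog-bit table

// Shared fast-path check: returns true (and logs the object) exactly
// once per object while its unlog bit is set.
static bool test_and_log(uintptr_t addr) {
    uint8_t& b = side_table[addr >> 6];
    uint8_t  m = (uint8_t)(1u << ((addr >> 3) & 7));
    if (b & m) { b &= (uint8_t)~m; return true; }
    return false;
}

struct BarrierBase {
    virtual ~BarrierBase() = default;
    virtual void slow_path(uintptr_t src) = 0;  // plan-specific action
    void pre_write(uintptr_t src) {
        if (test_and_log(src)) slow_path(src);  // shared fast path
    }
};

struct SATBBarrier final : BarrierBase {
    int enqueued = 0;
    void slow_path(uintptr_t) override { ++enqueued; }    // SATB: enqueue for tracing
};

struct ObjectBarrier final : BarrierBase {
    int remembered = 0;
    void slow_path(uintptr_t) override { ++remembered; }  // generational: remset entry
};
```

Only the virtual slow path differs, which is why the extraction is attractive; the radical part is rearranging the existing class hierarchy around it.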

};

void MMTkSATBBarrierSetRuntime::object_probable_write(oop new_obj) const {
// We intentionally leave this method blank.
Collaborator

Wait, is this actually sound? I thought this could be called for de-optimizations as well?

Contributor

The doc comment of the SharedRuntime::on_slowpath_allocation_exit method describes it as:

  // Post-slow-path-allocation, pre-initializing-stores step for
  // implementing e.g. ReduceInitialCardMarks
  static void on_slowpath_allocation_exit(JavaThread* thread);

This means it is called before any initializing stores, i.e. all reference fields are still null.

I grepped the code and found that SharedRuntime::on_slowpath_allocation_exit is only called in the following places:

  • OptoRuntime::new_instance_C
  • OptoRuntime::new_array_C
  • OptoRuntime::new_array_nozero_C
  • and two more places in JVMCIRuntime.

Let's ignore JVMCIRuntime for now because it is for third-party compilers like Graal. Those methods in OptoRuntime are called by C2-compiled code after fast-path allocation fails. For example, new_instance_C does the following:

  • runs oop result = InstanceKlass::cast(klass)->allocate_instance(THREAD);, which will eventually enter MMTkHeap::mem_allocate and allocate a new object (or trigger GC), and
  • calls deoptimize_caller_frame(current, HAS_PENDING_EXCEPTION);, which sets up a return barrier so that after returning from new_instance_C, the caller (the C2-compiled Java method that failed the fast-path allocation) will not continue from the original PC, and
  • calls SharedRuntime::on_slowpath_allocation_exit(current);, which will enter MMTkSATBBarrierSetRuntime::object_probable_write(result).

From those code paths, we can see there are no intermediate writes between the allocation and the invocation of object_probable_write, so the fields of result must all be null.

Member

@qinsoon qinsoon left a comment

LGTM


qinsoon commented Sep 18, 2025

We probably want to make the heap size for the concurrent Immix tests smaller (currently 7x, which assumes concurrent Immix is non-moving), but we can do that after this PR.

@mmtkgc-bot mmtkgc-bot merged commit e970e68 into mmtk:master Sep 18, 2025
9 checks passed