-
Notifications
You must be signed in to change notification settings - Fork 38
Concurrent Immix #311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concurrent Immix #311
Conversation
openjdk/barriers/mmtkSATBBarrier.cpp
Outdated
@@ -0,0 +1,536 @@ | |||
#define private public // too lazy to change openjdk... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copied from lxr branch. Basically when code needs to be patched, we need to call a private method. A proper solution is to define friend class in OpenJDK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works around the fact that LIRAssembler::as_Address
is private in OpenJDK 11. It is a public method in OpenJDK 21. We may remove this workaround when porting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this mmtkSATBBarrier is also an object remembering barrier. Theoretically, we should be able to reuse the existing mmtkObjectBarrier. But due to the current barrier api design, the mmtkObjectBarrier has the generational semantic baked in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Extracting the common code paths should be helpful from the software engineering's point of view. It will also be helpful for porting this to the OpenJDK 21 binding, in which case fewer lines of code means fewer places to merge. I attempted to extract some code, but found myself digging into a rat hole of many new crashes. For now I'll make this PR stable and get it merged first, then we can refactor the barriers in mmtk-openjdk, and refactor the pause/GC things in mmtk-core in parallel. I'll create another PR for the barriers refactoring.
_pre_barrier_c1_runtime_code_blob = Runtime1::generate_blob(buffer_blob, -1, "mmtk_pre_write_code_gen_cl", false, &pre_write_code_gen_cl); | ||
MMTkPostBarrierCodeGenClosure post_write_code_gen_cl; | ||
_post_barrier_c1_runtime_code_blob = Runtime1::generate_blob(buffer_blob, -1, "mmtk_post_write_code_gen_cl", false, &post_write_code_gen_cl); | ||
// MMTkBarrierCodeGenClosure write_code_gen_cl_patch_fix(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was in the lxr branch and after I discussed with wenyu, we believe it is redundant. The code patching has already been dealt with, no need to do any special things here
openjdk/mmtkBarrierSet.hpp
Outdated
@@ -70,13 +72,16 @@ class MMTkBarrierSetRuntime: public CHeapObj<mtGC> { | |||
static void object_reference_array_copy_pre_call(void* src, void* dst, size_t count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In OpenJDK, array copy does not give base address of src or dst arrays, so nothing can be remembered. I have a patch in Iso to pass down base address of both src and dst arrarys (they are required by the publication semantics). I am not sure if it is worthwhile to have that in our OpenJDK fork
All the tests failed. @tianleq Is there anything we need to do for running ConcurrentImmix? |
I did not test compressed pointers as in Iso, it is not supported yet. Other than that, the minheap is much larger due to it being non-moving immix |
Thanks. I temporarily disabled compressed pointer for concurrent Immix. We should get compressed pointer working before merging the PR. |
OOM is expected, as concurrent Immix is non-moving. Tianle and Wenyu clarified the issue about compressed pointer. Currently the SATB barrier computes metadata address differently, based on whether compressed pointer is in used or not, which is incorrect. We should use the same approach as how metadata is computed in the object barrier. The current implementation of SATB barrier is derived from lxr which uses field barrier, and needs to deal with the difference of field slots with/without compressed pointer. |
ea4c968
to
fc9d143
Compare
fc9d143
to
5018b70
Compare
Current CI runs concurrent Immix with 4x min heap. There are still some benchmarks that ran out of memory. I will increase it to 5x. There are a few correctness issues. Segfault in mmtk_openjdk::abi::OopDesc::size: fastdebug h2, fastdebug jython
old_queue is not empty: fastdebug pmd
Segfault in mark_lines_for_object: release h2, release jythonThis could be the same issue as the first one (
|
I am aware of 4x is not enough, jython OOM even with 5x, this is true in the stop-the-world non-moving immix. The iython crash is what I saw when barrier is incorrectly eliminated. I fixed that and never saw it again. I also look at my test run, only see OOMs on jython. I will try to reproduce those locally |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the code were copied from the field logging barrier in the lxr
branch. When not using compressed oops, each field is 64 bits, and the granularity of field-logging bits is one bit per 8 bytes. But when using compressed oops, fields become 32 bits, and the field-logging bits becomes one bit per 4 bytes. That's why the shift changes between 5 or 6 depending on whether compressed oops is enabled.
But here we are working on the object-remembering barrier, and using the regular global unlog bits. Its granularity is only related to the object alignment, not the field size. On 64-bit machines, objects are always 64-bit aligned. So we should always shift by 6 when computing the address of unlog bits. Similarly, when computing the in-byte shift, we should always shift by 3.
Change this and the segmentation fault disappears, even when using compressed oops.
|
I also observed a crash during a full GC (after patching the shifting operations for unlog bits as I suggested above).
It has something to do with finalizers. |
The initial implementation does not schedule finalization related packets in final pause. This commit should have fixed that: mmtk/mmtk-core@2bbb200 |
We will introduce it later when we implement a plan that needs it.
... instead of from a mutator instance.
The h2o benchmark got stuck for some reason. When given a 1000M heap, ConcurrentImmix should finish h2o in 20 seconds (tested locally). |
I think this PR is stable enough. I labelled this PR as ready for review. There are some things that I choose to do after this PR.
|
}; | ||
|
||
void MMTkSATBBarrierSetRuntime::object_probable_write(oop new_obj) const { | ||
// We intentionally leave this method blank. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait is this actually sound? I thought this can be called for de-optimizations as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc comment of the SharedRuntime::on_slowpath_allocation_exit
method describes it as:
// Post-slow-path-allocation, pre-initializing-stores step for
// implementing e.g. ReduceInitialCardMarks
static void on_slowpath_allocation_exit(JavaThread* thread);
This means it is called before any initialization stores, i.e. all reference fields are null.
I grepped the code and found that SharedRuntime::on_slowpath_allocation_exit
is only called in the following places:
OptoRuntime::new_instance_C
OptoRuntime::new_array_C
OptoRuntime::new_array_nozero_C
- and two more places in
JVMCIRuntime
.
Let's ignore JVMCIRuntime
for now because it is for third-party compilers like Graal. Those methods in OptoRuntime
are called by C2-compiled code after fast-path allocation fails. For example, new_instance_C
does the following
- run
oop result = InstanceKlass::cast(klass)->allocate_instance(THREAD);
which will eventually enterMMTkHeap::mem_allocate
and allocate a new object (or trigger GC), and - call
deoptimize_caller_frame(current, HAS_PENDING_EXCEPTION);
which sets up a return barrier so that after returning fromnew_instance_C
, the caller (C2-compiled Java method that failed the fast path allocation) will not continue from the original PC, and - call
SharedRuntime::on_slowpath_allocation_exit(current);
which will enterMMTkSATBBarrierSetRuntime::object_probable_write(result)
.
From those code paths, we see there is no intermediate writes between the allocation and the invocation of object_probable_write
. So the fields of result
must be all null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
We probably want to make the heap size for concurrent immix tests smaller (currently 7x which assumes concurrent immix is non moving), but we can do that after this PR. |
We added support for the new plan ConcurrentImmix added to the mmtk-core in mmtk/mmtk-core#1355
We implemented the SATB barrier fast paths in the OpenJDK binding, and refactored the barriers to support both pre and post barriers, as well as (weak) reference loading barrier. The OpenJDK binding is now aware of concurrent marking, too.