@davidhozic davidhozic commented Jun 18, 2025

This introduces changes to the CMake files to allow static linking.
This is linked to #2618.

Update: It seems that starting from the main branch, libraries can be used with LTO without problems. Originally (3.3.3), I had to disable LTO for the dynamic library, which meant that static linking alone gave slightly higher performance (+8%) and static linking with LTO gave much higher performance (+33%). This no longer seems to be a problem, and the performance no longer differs.

Summary from #2618:

Benchmarks are in steps per second (calls to mj_step). Results are the 60-second mean +- standard error.

Prebuilt shared library

Prebuilt baseline: 116432.93 +- 151.63

Static linking (GCC):

102460.48 +- 44.18

Shared library (CLANG):

113133.28 +- 75.63

Shared library (GCC):

can't compile, not related to this PR.

Shared library (MSVC):

90465.95 +- 207.61

Other configuration variables:
-DCMAKE_C_COMPILER:STRING=clang-14 # Only when using clang
-DCMAKE_CXX_COMPILER:STRING=clang++-14 # Only when using clang
-DMUJOCO_HARDEN:BOOL=ON # Only when using clang
-DCMAKE_BUILD_TYPE:STRING=Release
-DCMAKE_INTERPROCEDURAL_OPTIMIZATION:BOOL=OFF
-DMUJOCO_BUILD_EXAMPLES:BOOL=OFF
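
To put the numbers above side by side, here is a small Python sketch that expresses each result as a percentage change relative to the prebuilt baseline. The dictionary keys simply repeat the labels used above; the helper name `relative_change` is made up for illustration. Note that the MSVC figure was measured on a different OS, so this is only a rough comparison.

```python
# Mean steps per second as reported in the summary above.
BASELINE = 116432.93  # prebuilt shared library
RESULTS = {
    "Static linking (GCC)": 102460.48,
    "Shared library (CLANG)": 113133.28,
    "Shared library (MSVC)": 90465.95,
}


def relative_change(value, baseline=BASELINE):
    """Percentage change of `value` relative to `baseline` (negative = slower)."""
    return 100.0 * (value - baseline) / baseline


if __name__ == "__main__":
    for label, mean in RESULTS.items():
        print(f"{label}: {relative_change(mean):+.1f}% vs. prebuilt")
```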

@davidhozic davidhozic marked this pull request as ready for review June 18, 2025 21:02
@davidhozic
Author

@saran-t

@davidhozic
Author

davidhozic commented Jun 18, 2025

A comment on lodepng. Everything seems to build when building the shared MuJoCo library, however for the static MuJoCo library I had to add lodepng to the install. Not sure if this is a valid solution; please let me know if there is a more correct way of fixing it.

@davidhozic davidhozic marked this pull request as draft June 19, 2025 00:28
@davidhozic
Author

davidhozic commented Jun 19, 2025

Is this a bug?

if(NOT CMAKE_INTERPROCEDURAL_OPTIMIZATION AND (CMAKE_BUILD_TYPE AND NOT CMAKE_BUILD_TYPE STREQUAL "Debug"))
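
To make the question concrete, here is a rough Python model of when that condition is true. It is only a sketch: `cmake_truthy` approximates how CMake's `if(<variable>)` treats unset variables and false constants, and the function names are made up for illustration. What the guarded block actually does is not shown here.

```python
# Values CMake's if() treats as false constants (compared case-insensitively).
FALSE_CONSTANTS = {"", "0", "OFF", "NO", "FALSE", "N", "IGNORE", "NOTFOUND"}


def cmake_truthy(value):
    """Approximate if(<variable>); None stands for an unset variable."""
    if value is None:
        return False
    upper = value.upper()
    return upper not in FALSE_CONSTANTS and not upper.endswith("-NOTFOUND")


def condition_holds(ipo, build_type):
    """Model of: NOT CMAKE_INTERPROCEDURAL_OPTIMIZATION AND
    (CMAKE_BUILD_TYPE AND NOT CMAKE_BUILD_TYPE STREQUAL "Debug")."""
    return (
        (not cmake_truthy(ipo))
        and cmake_truthy(build_type)
        and build_type != "Debug"
    )


if __name__ == "__main__":
    for ipo in (None, "OFF", "ON"):
        for build_type in (None, "Debug", "Release"):
            print(f"IPO={ipo!r:6} BUILD_TYPE={build_type!r:10} ->",
                  condition_holds(ipo, build_type))
```

In this model, an unset variable and an explicit -DCMAKE_INTERPROCEDURAL_OPTIMIZATION:BOOL=OFF are indistinguishable, so the branch is entered for a Release build in both cases.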

@davidhozic
Author

davidhozic commented Jun 19, 2025

So it seems that there weren't any speedups at all compared to the prebuilt.
When originally testing this, I had to comment out some parts of the CMake files, as it would otherwise not work, and one of them was:

if(NOT CMAKE_INTERPROCEDURAL_OPTIMIZATION AND (CMAKE_BUILD_TYPE AND NOT CMAKE_BUILD_TYPE STREQUAL "Debug"))

which basically disabled LTO. Not sure why it didn't work in the past, but I can now do that with no problem.
The benchmarks now actually show it's slower 😅, but that's probably due to different compilers and their optimization methods.

However, static linking will still be beneficial for those that don't want to move the shared library around and want things as a single executable.

@davidhozic davidhozic marked this pull request as ready for review June 19, 2025 10:42
@traversaro
Contributor

> A comment on lodepng. Everything seems to build when building the shared MuJoCo library, however for the static MuJoCo library I had to add lodepng to the install. Not sure if this is a valid solution; please let me know if there is a more correct way of fixing it.

What is the error if you do not add lodepng to the targets to install?

@davidhozic
Author

davidhozic commented Jun 19, 2025

> > A comment on lodepng. Everything seems to build when building the shared MuJoCo library, however for the static MuJoCo library I had to add lodepng to the install. Not sure if this is a valid solution; please let me know if there is a more correct way of fixing it.
>
> What is the error if you do not add lodepng to the targets to install?

CMake Error: install(EXPORT "mujoco" ...) includes target "mujoco" which requires target "lodepng" that is not in any export set.

@davidhozic
Author

davidhozic commented Jun 20, 2025

@traversaro So I'm currently trying to make a shared library using GCC. For some reason the library fails to link with the final program, saying the symbols are missing (e.g., undefined reference to `mj_loadXML'), and upon inspection with nm, I get this:

/.../mujoco/build/lib/libmujoco.so: plugin needed to handle lto object
00000000000037d9 b completed.0
                 w __cxa_finalize
0000000000001540 t deregister_tm_clones
00000000000015b0 t __do_global_dtors_aux
0000000000002630 d __do_global_dtors_aux_fini_array_entry
00000000000037b0 d __dso_handle
0000000000002640 d _DYNAMIC
000000000000152c t _fini
00000000000015f0 t frame_dummy
0000000000002638 d __frame_dummy_init_array_entry
000000000000050c r __FRAME_END__
00000000000037b8 d _GLOBAL_OFFSET_TABLE_
                 w __gmon_start__
00000000000037d8 B __gnu_lto_slim
0000000000001510 t _init
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
0000000000001570 t register_tm_clones
00000000000037b0 d __TMC_END__
00000000000037b0 d __TMC_LIST__

Do you by any chance know what needs to be done to make this work? It's clearly something to do with LTO but not sure what. It works fine when using CLANG.
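
For anyone wanting to run the same check on their own build, here is a small Python sketch that scans `nm` output for the `__gnu_lto_slim` marker and for the API symbols the linker complained about. The helper names `parse_nm` and `diagnose` are made up for illustration, and the sample is just a few lines from the listing above. As far as I understand, GCC emits `__gnu_lto_slim` in slim LTO objects (the ones built with -fno-fat-lto-objects), so seeing it in the final .so with no `mj_*` symbols would be consistent with the link step not having processed the LTO bytecode, but that is a guess.

```python
# A few lines taken from the nm listing above.
SAMPLE = """\
/.../mujoco/build/lib/libmujoco.so: plugin needed to handle lto object
00000000000037d9 b completed.0
                 w __cxa_finalize
00000000000037d8 B __gnu_lto_slim
0000000000001510 t _init
"""


def parse_nm(output):
    """Return {symbol name: type letter} from default (BSD-style) nm output."""
    symbols = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 3:      # "<address> <type> <name>"
            _, kind, name = parts
        elif len(parts) == 2:    # "<type> <name>" (no address, e.g. weak/undefined)
            kind, name = parts
        else:                    # notices such as "plugin needed to handle lto object"
            continue
        if len(kind) == 1:
            symbols[name] = kind
    return symbols


def diagnose(output, wanted):
    """Report the slim-LTO marker and which wanted symbols lack a global text ('T') entry."""
    symbols = parse_nm(output)
    return {
        "slim_lto_marker": "__gnu_lto_slim" in symbols,
        "missing": [name for name in wanted if symbols.get(name) != "T"],
    }


if __name__ == "__main__":
    print(diagnose(SAMPLE, ["mj_loadXML", "mj_step"]))
```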

@oursland

I think your problem may have something to do with this file: include/mujoco/mjexport.h

Try defining MJ_STATIC when compiling and see if that fixes the issue.

@traversaro
Contributor

> For some reason the library fails to link with the final program saying the symbols are missing (e.g., undefined reference to `mj_loadXML')

Can you share the exact error and the gcc version you are using?

@davidhozic
Author

davidhozic commented Jun 20, 2025

> Can you share the exact error and the gcc version you are using?

Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.3.0 (Ubuntu 13.3.0-6ubuntu2~24.04)
 note: /usr/bin/ld: simulation/target/release/examples/stepping2-1cbadef7d7a70dc1.stepping2.682662642ace0e14-cgu.0.rcgu.o: in function `stepping2::main':
          stepping2.682662642ace0e14-cgu.0:(.text._ZN9stepping24main17h94d4afc2a5c5b61fE+0x145): undefined reference to `mj_loadXML'
          /usr/bin/ld: stepping2.682662642ace0e14-cgu.0:(.text._ZN9stepping24main17h94d4afc2a5c5b61fE+0x32a): undefined reference to `mj_makeData'
          /usr/bin/ld: stepping2.682662642ace0e14-cgu.0:(.text._ZN9stepping24main17h94d4afc2a5c5b61fE+0x336): undefined reference to `mj_step'
          /usr/bin/ld: stepping2.682662642ace0e14-cgu.0:(.text._ZN9stepping24main17h94d4afc2a5c5b61fE+0x10de): undefined reference to `mj_deleteData'
          /usr/bin/ld: stepping2.682662642ace0e14-cgu.0:(.text._ZN9stepping24main17h94d4afc2a5c5b61fE+0x10e9): undefined reference to `mj_deleteModel'

@traversaro
Contributor

Can you also share the execution arguments passed to ld, for example by compiling with ninja -v or make VERBOSE=1? Is this happening only with this PR or also with stock MuJoCo?

@davidhozic
Author

davidhozic commented Jun 20, 2025

> Can you also share the execution arguments passed to ld, for example by compiling with ninja -v or make VERBOSE=1? Is this happening only with this PR or also with stock MuJoCo?

It doesn't seem to be happening on the main branch. The ld command doesn't seem to be called explicitly

@davidhozic
Author

@traversaro

Actually sorry again, I accidentally only compiled the library without linking, it is still happening and this includes the main branch. The command:

/usr/bin/g++ -O3 -DNDEBUG -flto=auto -fno-fat-lto-objects -Wl,--no-as-needed -fuse-ld=lld -Wl,--gc-sections CMakeFiles/simulate.dir/main.cc.o -o ../bin/simulate -Wl,-rpath,"\$ORIGIN/../lib" ../lib/libsimulate.a ../lib/libglfw3.a ../lib/liblodepng.a ../lib/libmujoco.so.3.3.4 /usr/lib/x86_64-linux-gnu/librt.a -lm -ldl /usr/lib/x86_64-linux-gnu/libX11.so

@davidhozic
Author

It's like it can't read the symbols in the file due to LTO, but this only happens with gcc, clang works. I'm not sure if I need to have anything else installed on my system to make this work

@traversaro
Contributor

> @traversaro
>
> Actually sorry again, I accidentally only compiled the library without linking, it is still happening and this includes the main branch. The command:
>
> /usr/bin/g++ -O3 -DNDEBUG -flto=auto -fno-fat-lto-objects -Wl,--no-as-needed -fuse-ld=lld -Wl,--gc-sections CMakeFiles/simulate.dir/main.cc.o -o ../bin/simulate -Wl,-rpath,"\$ORIGIN/../lib" ../lib/libsimulate.a ../lib/libglfw3.a ../lib/liblodepng.a ../lib/libmujoco.so.3.3.4 /usr/lib/x86_64-linux-gnu/librt.a -lm -ldl /usr/lib/x86_64-linux-gnu/libX11.so

If that happens in the main branch, could it make sense to have a separate issue for it, ideally with the full command required to reproduce the errors, the exact mujoco commit and the distro you are using?

@davidhozic
Author

> > @traversaro
> > Actually sorry again, I accidentally only compiled the library without linking, it is still happening and this includes the main branch. The command:
> > /usr/bin/g++ -O3 -DNDEBUG -flto=auto -fno-fat-lto-objects -Wl,--no-as-needed -fuse-ld=lld -Wl,--gc-sections CMakeFiles/simulate.dir/main.cc.o -o ../bin/simulate -Wl,-rpath,"\$ORIGIN/../lib" ../lib/libsimulate.a ../lib/libglfw3.a ../lib/liblodepng.a ../lib/libmujoco.so.3.3.4 /usr/lib/x86_64-linux-gnu/librt.a -lm -ldl /usr/lib/x86_64-linux-gnu/libX11.so
>
> If that happens in the main branch, could it make sense to have a separate issue for it, ideally with the full command required to reproduce the errors, the exact mujoco commit and the distro you are using?

Sure, I'll open one.

@davidhozic
Author

Alright, I don't think there's anything left for me to do on this. I've tested on Windows and Linux and it works. Someone has to test on Mac, @traversaro.

@saran-t
Member

saran-t commented Jun 24, 2025

LTO with static library is fairly unusual AFAICT. With LTO you emit compiler-specific bytecode which is only converted to actual machine code at link time, which only happens when you make the final binary or when you make a DSO.

Typically with LTO you wouldn't bother making a "static library" as such, you'd just compile everything to the IR "object files" temporarily and just link them straight to the final product.

@oursland

@davidhozic Can you document your test environment?

I have built and tested on an M2 MacBook Pro and found the benchmark results to be mixed. Not all benchmarks were improved, many got slightly worse, typically a change less than 1%. A microbenchmark for RotVecQuat operations did show a significant improvement (5x), but the effects did not translate into major changes in the simulation step benchmarks.

LTO-static-benchmark.txt

@davidhozic
Author

> LTO with static library is fairly unusual AFAICT. With LTO you emit compiler-specific bytecode which is only converted to actual machine code at link time, which only happens when you make the final binary or when you make a DSO.
>
> Typically with LTO you wouldn't bother making a "static library" as such, you'd just compile everything to the IR "object files" temporarily and just link them straight to the final product.

The reason why I'm compiling to a static library is I'm doing a project in Rust, which then links to the static lib. It's easier to statically link than having to configure Cargo to work with CMake. It's thus also useful if anyone wishes to link MuJoCo in other C-compatible languages.

@davidhozic
Author

davidhozic commented Jun 24, 2025

> > LTO with static library is fairly unusual AFAICT. With LTO you emit compiler-specific bytecode which is only converted to actual machine code at link time, which only happens when you make the final binary or when you make a DSO.
> > Typically with LTO you wouldn't bother making a "static library" as such, you'd just compile everything to the IR "object files" temporarily and just link them straight to the final product.
>
> The reason why I'm compiling to a static library is I'm doing a project in Rust, which then links to the static lib. It's easier to statically link than having to configure Cargo to work with CMake. It's thus also useful if anyone wishes to link MuJoCo in other C-compatible languages.

And it does seem like it still helps, regardless of whether I do LTO afterwards on the entire bin.

@davidhozic
Author

davidhozic commented Jun 24, 2025

> @davidhozic Can you document your test environment?
>
> I have built and tested on an M2 MacBook Pro and found the benchmark results to be mixed. Not all benchmarks were improved, many got slightly worse, typically a change less than 1%. A microbenchmark for RotVecQuat operations did show a significant improvement (5x), but the effects did not translate into major changes in the simulation step benchmarks.
>
> LTO-static-benchmark.txt

I originally tested these with LTO disabled (i.e., I edited the parts of the configuration where it is forcefully enabled in Release). The benchmarks simply measured the number of steps completed in one second, averaged over 60 samples (= 60 seconds).

The results I get now show no significant difference, as shown in this PR (I assume you're referring to #2618).

My CPU is R5 5600x.
The model was also my own custom MJCF.
OS: (K)Ubuntu 24.04.2 LTS for CLANG and GCC and Windows 10 for MSVC
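
The measurement described above (count steps per one-second window, then average over 60 windows) can be sketched roughly as follows in Python. This is not the actual benchmark code: `step` is a placeholder for whatever calls mj_step, the helper names are made up, and the "standard error" is assumed here to be the sample standard deviation divided by the square root of the number of windows.

```python
import math
import statistics
import time


def count_steps(step, window=1.0):
    """Call `step` repeatedly for `window` seconds and return how many calls completed."""
    deadline = time.perf_counter() + window
    steps = 0
    while time.perf_counter() < deadline:
        step()
        steps += 1
    return steps


def summarize(samples):
    """Return (mean, standard error of the mean) for per-window step counts."""
    mean = statistics.fmean(samples)
    # Assumed definition: sample standard deviation / sqrt(number of samples).
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, sem


if __name__ == "__main__":
    # Short windows and a no-op step keep this demo quick; the runs in this
    # thread used 60 one-second windows around mj_step.
    samples = [count_steps(lambda: None, window=0.05) for _ in range(10)]
    mean, sem = summarize(samples)
    print(f"{mean:.2f} +- {sem:.2f} steps per window")
```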

@davidhozic
Author

@oursland When running the same benchmarks as you, I get fairly similar results. So this won't really be a performance boost, but at least it will be easier to link with external projects.

static.txt
shared.txt

@davidhozic
Author

Any update on this? @traversaro
