An easy-to-use tool that applies many different code size optimizations.
The SLLIM tool takes code compiled with LLVM and applies a variety of code size optimizations, including LLVM's default optimizations, LLVM optimizations that aren't enabled by default, and optimizations from external research projects. Optimization results are cached to avoid redundant computation. We also provide scripts that make it easy to apply SLLIM to existing software projects without modifying their build systems.
SLLIM builds on the BCDB infrastructure project, and is currently located in a subdirectory of the BCDB repository. A Dockerfile is provided to install SLLIM in an Ubuntu container (recommended):
$ git clone https://github.com/yotann/bcdb.git
$ cd bcdb
$ docker build -t sllim-ubuntu -f experiments/dockerfiles/sllim-ubuntu.docker .
...
Successfully tagged sllim-ubuntu:latest
$ docker run -it sllim-ubuntu
root@...:/#
As an alternative, SLLIM can easily be installed on any Linux system after installing Nix. You can use sllim-ubuntu.docker as a starting point.
WARNING: SLLIM will start a database server (`memodb-server`) on 127.0.0.1. There's no access control, so you need to trust every process running on 127.0.0.1. This is safe in a SLLIM-specific container or VM; other uses are a bad idea for now.
$ git clone https://github.com/yotann/bcdb.git
$ cd bcdb
$ nix-env -f . -iA sllim
Here's an example using SLLIM to optimize LZ4. Run these commands in the Docker container.
root# git clone --depth=1 https://github.com/lz4/lz4.git
Cloning into 'lz4'...
...
root# cd lz4
root# sllim-env.sh
SLLIM overrides added.
sllim-env: root# make -j4 lz4
compiling static library
sllim-ld: optimizing lz4
sllim-env: root# size lz4
text data bss dec hex filename
172907 940 104 173951 2a77f lz4
The `optimizing lz4` line shows that SLLIM is actually working.
SLLIM is designed to work with any C/C++ code that can be built with Clang. To make it easy to use with existing projects and diverse build systems, there are three ways to invoke SLLIM:
- Use the `sllim` command directly. This is the most difficult option, since you have to make your build system work with LLVM bitcode.
- Configure your build system to use `sllim-cc` as the compiler, etc. These scripts should handle everything else automatically.
- Use `sllim-env.sh` (recommended) before you configure and build your project. Most of the time, this will work without any extra effort on your part.
Simply run `sllim input.bc -o output.o`. Note that the input is LLVM bitcode, while the output is an object file containing machine code.
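For example, a single C file could be optimized end-to-end like this (`foo.c` is a placeholder; the `-emit-llvm` step is standard Clang usage, not specific to SLLIM):
$ clang -c -emit-llvm -Oz foo.c -o foo.bc
$ sllim foo.bc -o foo.o
$ clang foo.o -o foo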
$ CC=sllim-cc CXX=sllim-c++ LD=sllim-ld ./configure
$ CC=sllim-cc CXX=sllim-c++ LD=sllim-ld make
If you know how to override the compiler and linker used by your build system, choosing `sllim-cc`, `sllim-c++`, and `sllim-ld` will generally allow you to apply SLLIM to your project without any further effort. For many projects, setting `CC`, `CXX`, and `LD` (as above) is enough.
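Build systems that ignore these environment variables usually have their own settings. With CMake, for example, something like this should have the same effect (these are standard CMake variables, but this is an untested sketch):
$ cmake -DCMAKE_C_COMPILER=sllim-cc -DCMAKE_CXX_COMPILER=sllim-c++ .
$ make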
The `sllim-cc` and `sllim-c++` wrapper scripts can be used as drop-in replacements for `clang` and `clang++`. They work by invoking the original `clang`/`clang++`, adding options to generate LLVM bitcode and use `sllim-ld`, and making some other tweaks. See the sllim-cc source and sllim-ld source for details.
The `sllim-ld` wrapper script can be used as a drop-in replacement for `ld`. It works by running the original `ld` normally to produce a program or shared library, then extracting the bitcode, optimizing it with `sllim`, and compiling and linking the result to produce a new, size-optimized program or shared library.
These scripts require the original `clang`, `clang++`, `ld`, `bc-imitate` (part of BCDB), and `sllim` itself to be in your `PATH`. Alternatively, you can set `SLLIM`, `SLLIM_BC_IMITATE`, `SLLIM_CLANG`, `SLLIM_CLANGXX`, and `SLLIM_LD` to the full paths of each program.
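For example, to point the wrappers at a specific toolchain installation (all paths below are hypothetical):
$ export SLLIM=/opt/sllim/bin/sllim
$ export SLLIM_BC_IMITATE=/opt/sllim/bin/bc-imitate
$ export SLLIM_CLANG=/opt/llvm/bin/clang
$ export SLLIM_CLANGXX=/opt/llvm/bin/clang++
$ export SLLIM_LD=/usr/bin/ld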
This is the easiest option; you just have to source `sllim-env.sh` in your shell, and then configure and build your project in the same shell. For most open source projects, this is all you need to do, and you don't need to deal with the build system at all.
The `sllim-env.sh` script tries to force build systems to use `sllim-cc` and the other scripts. Not only does it set variables like `CC`, but it also adds `cc`, `gcc`, `clang`, etc. wrapper scripts to your `PATH`, ensuring your build system will use `sllim-cc` even if it hardcodes the name of the compiler.
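As a quick sanity check, you can ask the shell which compilers it will actually run; each name should resolve to a SLLIM wrapper script rather than the real compiler (the exact wrapper paths depend on your setup, so no output is shown here):
sllim-env: root# which cc gcc clang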
This script requires SLLIM, `clang`, `clang++`, `ld`, `bc-imitate`, and the LLVM tools (such as `llvm-ar`) to be in your `PATH` before you start it. It works with Bash, Zsh, BusyBox Ash, Dash, and probably other shells too.
IMPORTANT: Configuration tools, like `./configure` and `cmake`, often try to compile lots of small test files to check which compiler features are available. It's normal for some of these checks to fail, and any error messages are usually hidden, but `sllim-env.sh` forces its error messages to be shown in order to help with debugging. You should generally ignore these `sllim-env.sh` errors as long as the configuration tool ignores them.
Currently, SLLIM is configured by setting the `SLLIM_LEVEL` environment variable before using it. `SLLIM_LEVEL` affects the `sllim` command, which is invoked when linking an executable or shared library; it has no effect when compiling an object file. With most build systems, if you want to change the optimization level, you just need to remove the executables/libraries, change `SLLIM_LEVEL`, and run `make` again.
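For example, to rebuild the `lz4` binary from the earlier example at a higher level:
sllim-env: root# rm lz4
sllim-env: root# SLLIM_LEVEL=7 make lz4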
Different size optimizations are enabled at different values of `SLLIM_LEVEL`:
- `SLLIM_LEVEL=0`: Avoid optimizations but still go through SLLIM; useful for testing or speed.
- `SLLIM_LEVEL=1`: Enable `StandardPass` (basically `opt -Oz`).
- `SLLIM_LEVEL=2`: Enable `ForceMinSizePass` (which enables optimization for all functions, even if they were compiled with `-O0`).
- `SLLIM_LEVEL=3`: Enable iterated machine outlining. LLVM already enables the machine outliner by default for common targets, but Uber added support for running this outliner multiple times in a row, getting additional benefits.
- `SLLIM_LEVEL=7`: Enable smout, our powerful (but slow) IR-level outliner.
- `SLLIM_LEVEL=8`: Make smout search more exhaustively for outlining candidates.
- `SLLIM_LEVEL=9`: Make smout compile all possible outlining candidates to determine their actual effects on code size, making it much more accurate.
- `SLLIM_LEVEL=10`: Enable Google's ML-based inlining heuristics. These are supposed to help reduce code size, but in the few tests we've done, we've observed them to increase code size, so they may be counterproductive. Currently disabled: our Nix expressions don't build LLVM with the right version of TensorFlow.
NOTE: when smout is enabled (`SLLIM_LEVEL` 7 and up), optimization times are much longer because smout extracts and evaluates huge numbers of outlining candidates. This process is highly parallel, so it's recommended to use SLLIM on a machine with many cores (for example, 32 cores).
You can use `SLLIM_LEVEL=0` for configuration (to speed up building test programs) and a higher level for the actual code:
sllim-env: $ export SLLIM_LEVEL=0
sllim-env: $ ./configure
sllim-env: $ export SLLIM_LEVEL=9
sllim-env: $ make -j10
Instead of opening a port on 127.0.0.1, we should make `memodb-server` and the clients support Unix sockets for access control. Time to implement: ~1 week.
There should be a way to configure SLLIM in more detail than just using `SLLIM_LEVEL`. Our current plan is to let the developer use a custom Python script that overrides `sllim.configuration.Config`. Time to implement: a few weeks.
We're interested in making SLLIM try multiple possible optimization sequences and choose the best result. It could either use brute force or some kind of optimization algorithm (like CompilerGym). Time to implement (brute force): <1 week.
We're interested in adding support for more types of optimization, such as "Function Merging by Sequence Alignment", TRIMMER (partial evaluation), Guided Linking, and so on. Some of these (like TRIMMER and Guided Linking) will need specific configuration for each project being optimized with SLLIM.
The `sllim` command itself should generally be correct for all kinds of code, just like normal LLVM passes. But note that (depending on `SLLIM_LEVEL`) it may force all code to be optimized, even if the developer tried to prevent it from being optimized. And note that smout can prevent backtraces from working normally.
The `sllim-cc` command has to remove some options in order to make `-fembed-bitcode` work; see the sllim-cc source for details.
An alternative would be to use `-flto` instead of `-fembed-bitcode`, which may be compatible with more of the options, but might confuse build systems because the object files aren't in the usual format.
Currently, `sllim-ld` works by running the linker command (with some inputs that have bitcode available and some that may not), extracting and optimizing bitcode from the result, and then running the linker command again with the optimized bitcode in addition to the original inputs. This means the same code is actually linked in twice; we use `-zmuldefs` to allow the duplicate definitions and `--gc-sections` to remove the extra, unoptimized definitions. This seems to work, but it's unsatisfying and will probably cause problems.
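Conceptually, the two links look something like this (a simplified sketch; file names are illustrative, the extraction of `extracted.bc` from the first link's output is omitted, and the real script reuses the project's exact linker command line):
$ ld <original inputs> -o program
$ sllim extracted.bc -o optimized.o
$ ld -zmuldefs --gc-sections optimized.o <original inputs> -o program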
There are a few options to do this more correctly:
- Process the linker command line options to figure out which inputs have bitcode, so we can extract it and remove them from the linker command. This would involve reimplementing part of the linker functionality (such as finding libraries and perhaps parsing linker scripts) in a shell script.
- Implement a custom linker plugin. This would give us exactly the interface we need. However, it would only work for GNU `ld` and `gold`, not `lld`, and would make SLLIM slightly more difficult to install. Time to implement: 1–2 weeks.
- Extend the linkers with custom code. This would need to be done separately for each linker, and it would be incompatible with any modified linkers developers might be using. Time to implement: a few weeks per linker.
SLLIM currently uses a lock to protect the MemoDB store, which means that only one `sllim` process can be active at once. All the code would actually work perfectly fine with multiple `sllim` processes connected to the same `memodb-server` instance; the lock only exists to simplify starting and stopping `memodb-server`. Time to implement: ~1 week.
SLLIM effectively uses normal LTO, because it links everything into one bitcode file before optimizing it. For large projects such as libLLVM, just running the simplest optimizations on this file can be very slow. It would be nice to support ThinLTO as a faster alternative, even if only a subset of our optimizations work. Time to implement: several weeks, depending on how many optimizations need support.
SLLIM's caching could be finer-grained; instead of caching the full `opt -Oz` invocation on a module, it could cache individual function passes on a per-function basis, for example. Time to implement: a few weeks.
Smout is designed to support our semantic outlining work, which means it has to extract every outlining candidate into a new function before determining whether any equivalent candidates exist. If we decide to forget about semantic equivalence and just rely on syntactic equivalence, we could avoid extracting candidates and instead use some sort of graph equivalence searching algorithm, which would be much faster. Smout also spends a lot of time compiling each candidate to determine its code size, but we could use heuristics to avoid doing this. Time to implement: several weeks or longer.
It'd be nice to use a log file for all the messages printed by different parts of SLLIM, rather than using stdout/stderr.
SLLIM depends on a variety of Python modules, custom LLVM plugins, and C++ libraries, and we feel that Nix is the best way to make these dependencies manageable. But installing Nix could still be difficult on old systems or when root access isn't available. There are a few ways we could make this easier:
- Use nix-bundle to create a single self-extracting executable that contains `sllim` and all its dependencies. This would be convenient, but it does require the kernel to support user namespaces (which aren't always enabled), and we haven't tested nix-bundle with large amounts of software.
- Use PRoot to install Nix without requiring root access or user namespaces.
- Run `sllim` on a separate server; put just the `sllim-*` scripts, which are highly portable, on the legacy system where software is being built, and have the scripts upload bitcode to the `sllim` server for optimization.
The `sllim` tool itself should work with any programming language that can be compiled into LLVM bitcode, but we currently only have scripts to do this conveniently for C and C++. It would be nice to add support for Swift, Rust, and perhaps other languages.
SLLIM currently only optimizes the final, fully linked executables and shared libraries, including any static libraries that they are linked against. If a static library project wants to use SLLIM, but the library will only be used by other projects that don't use SLLIM, there's currently no way to get the benefits of SLLIM optimization. We could try to add some support for this case, although optimization opportunities will be limited because we can't link everything into one module.