Simple tool for Linux/glibc hooking into OpenGL functions.
The main motivation for this was intercepting application calls to
glXSwapInterval[EXT|SGI|MESA] to override VSync settings if your GPU driver
doesn't allow you to override them.
With the proprietary NVIDIA Linux driver, the nvidia-settings configuration and
the __GL_SYNC_TO_VBLANK environment variable are actually overridden by an
application using the GLX_EXT_swap_control, GLX_SGI_swap_control or
GLX_MESA_swap_control extensions.
This tool works by exchanging the requested swap interval value (or silently
ignoring the calls altogether, so that your driver settings become effective).
To do so, it uses the (in)famous LD_PRELOAD approach.
There are also some more advanced features, notably a latency limiter and a frametime measurement mode, see the section Experimental Features below.
$ LD_PRELOAD=path/to/glx_hook.so GH_SWAP_MODE=$mode target_binary
or
$ LD_PRELOAD=path/to/glx_hook_bare.so GH_SWAP_MODE=$mode target_binary
where $mode controls how the values are exchanged. Valid modes are:
- nop: keep calls as intended by the application
- ignore: silently ignore the calls (return success to the application)
- clamp=$x,$y: clamp requested swap interval to [$x, $y]
- force=$x: always set swap interval $x
- disable: same as force=0
- enable: same as min=1
- min=$x: set interval to at least $x
- max=$x: set interval to at most $x
NOTE: This option only changes values forwarded to the swap interval
functions, or ignores these calls completely, but never adds new calls
to set the swap interval. If the app doesn't do it, this option does nothing.
To actually inject such calls, have a look at the experimental option
GH_INJECT_SWAPINTERVAL below.
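The mode semantics listed above can be summarised as a small shell function. This is only a model of the documented behavior, not code taken from glx_hook: the function name gh_model_swap_mode is made up for illustration, and negative (adaptive vsync) intervals are deliberately left out.

```shell
# Hypothetical model of the GH_SWAP_MODE semantics described above.
# Usage: gh_model_swap_mode MODE REQUESTED
# Prints the interval that would be forwarded to the driver, or
# "ignored" if the call would be dropped. Only non-negative intervals
# are handled here.
gh_model_swap_mode() {
    mode=$1
    req=$2
    case $mode in
        nop)      echo "$req" ;;
        ignore)   echo "ignored" ;;
        disable)  echo 0 ;;
        enable)   gh_model_swap_mode min=1 "$req" ;;
        force=*)  echo "${mode#force=}" ;;
        min=*)    lo=${mode#min=}
                  if [ "$req" -lt "$lo" ]; then echo "$lo"; else echo "$req"; fi ;;
        max=*)    hi=${mode#max=}
                  if [ "$req" -gt "$hi" ]; then echo "$hi"; else echo "$req"; fi ;;
        clamp=*)  range=${mode#clamp=}
                  lo=${range%,*}
                  hi=${range#*,}
                  if [ "$req" -lt "$lo" ]; then echo "$lo"
                  elif [ "$req" -gt "$hi" ]; then echo "$hi"
                  else echo "$req"; fi ;;
        *)        echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

For example, gh_model_swap_mode clamp=1,2 0 prints 1, since a requested interval of 0 is raised to the lower bound of the range.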
NVIDIA is promoting a feature called "adaptive vsync" where a "late" buffer
swap is done immediately instead of being delayed to the next sync interval.
This feature is exposed via the GLX_EXT_swap_control_tear extension. If this
is present, negative intervals enable adaptive vsync with the absolute
value being the swap interval. The GH_SWAP_TEAR environment variable can
be used to control this feature:
- raw: do not treat positive and negative intervals as different. This has the effect that you could, for example, use clamp=-1,1
- keep: keep the adaptive vsync setting, modify only the absolute value
- disable: always disable adaptive vsync
- enable: always enable adaptive vsync
- invert: enable adaptive vsync if the app disables it and vice versa (whatever this might be useful for...)
NOTE: we do not check for the presence of this extension. Negative swap intervals are invalid if the extension is not present. So if you enable adaptive vsync without your driver supporting it, the calls will fail. Most likely, the application won't care and no swap interval will be set.
Further environment variables controlling the behavior:
- GH_VERBOSE=$level: control level of verbosity (0 to 5)
- GH_VERBOSE_FILE=$file: redirect verbose output to $file (default is to use standard error stream); see section File Names for details about how the file name is parsed
The glx_hook.so version is the full version which tracks GL contexts, and
also allows for glXSwapBuffers manipulations (see below). However, the GL
context tracking adds a whole layer of complexity and might fail in some
scenarios. If you are only interested in the swap interval manipulations,
you can try to use the glx_hook_bare.so library, which only tries to deal
with the bare minimum of glX (and dlsym) functions.
If a GL symbol cannot be resolved, glx_hook tries to manually load the
OpenGL library via dlopen(3). This behavior can be controlled by the
GH_LIBGL_FILE environment variable: If it is not set, libGL.so is
used, but you might want to specify another one, potentially with
full path. If you set GH_LIBGL_FILE="", libGL loading is disabled.
The following features are only available in glx_hook.so (and not glx_hook_bare.so):
Set GH_INJECT_SWAPINTERVAL=$n to inject a SwapInterval call when a context
is made current for the first time. By default, this is disabled. The GH_SWAP_MODE
setting does not affect the operation of this option. This option is most
useful if the application never sets a swap interval, but it might be combined
with the other GH_SWAP_MODE settings, e.g. GH_SWAP_MODE=ignore to prevent
the app from changing the injected setting later on.
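The combination mentioned above could be set up as follows. This is just a sketch using the variables described in this document; the interval of 1 is an arbitrary choice.

```shell
# Inject a swap interval once and keep the application from changing
# it afterwards (values chosen for illustration only).
export GH_INJECT_SWAPINTERVAL=1   # injected when a context is made current for the first time
export GH_SWAP_MODE=ignore        # silently drop the application's own swap interval calls

# Afterwards start the program with the full library preloaded, e.g.:
#   LD_PRELOAD=path/to/glx_hook.so target_binary
```

Note that this needs glx_hook.so, as glx_hook_bare.so does not support the injection.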
Set GH_FRAMETIME=$mode to acquire frame timings. The following modes are
supported:
- 0: no frametime measurements (the default)
- 1: measure frametime on CPU only
- 2: measure frametimes on CPU and GPU (requires a context >= 3.3, or supporting the GL_ARB_timer_query extension)
Use GH_FRAMETIME_DELAY=$n to set the delay for the timer queries (default: 10 frames).
This controls the number of frames the GPU might lag behind the CPU. Setting the
number too low may result in performance degradation in comparison to not measuring
the frametimes. The implicit synchronizations can have a similar effect as the
GH_LATENCY setting, albeit only as an unintended side-effect,
and might completely invalidate the measurements. Just leave this value
at the default unless you know exactly what you are doing...
Use GH_FRAMETIME_FRAMES=$n to control the number of frames which are buffered
internally (default: 1000 frames). The results will be dumped to disk if the buffer is full. Setting
the value too low here might result in performance degradation due to the output.
Use GH_FRAMETIME_FILE=$name to control the output file name (default:
glx_hook_frametimes-ctx%c.csv). See section File Names
for details about how the file name is parsed.
The output will be one line per frame, with the following values:
frame_number CPU GPU latency CPU GPU latency
where CPU denotes timestamps on the CPU, GPU denotes timestamps on the GPU
and latency denotes the latency of the GPU. All values are in nanoseconds.
The first CPU/GPU/latency triple refers to the time directly before the buffer swap,
the second to directly after the swap. The CPU and GPU values are always
relative to the buffer swap of the previous frame, and latency is just
the observed latency at the respective timing probe. The data for frame 0 might
be useless.
Included is an example script for gnuplot, script.gnuplot,
to easily create some simple frame timing graphs. You can use it directly
on any frametime file by specifying the filename variable on the gnuplot command line:
gnuplot -e filename=\'glx_hook_frametimes-ctx1.csv\' script.gnuplot
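If you only want a quick number instead of a graph, something along these lines might do. It is a rough sketch and not part of glx_hook: the helper name gh_mean_column is made up, and it assumes that the fields are separated by whitespace and that any line not starting with a digit (such as a possible header) can be skipped.

```shell
# Hypothetical helper: print the mean of one column of a frametime file.
# Assumptions (not stated in the text above): fields are separated by
# whitespace, and lines not starting with a digit can be skipped.
# Frame 0 is skipped because its data might be useless.
# Usage: gh_mean_column FILE [COLUMN]
# COLUMN defaults to 5, the CPU timestamp taken directly after the
# buffer swap, which should roughly correspond to the CPU frame time.
gh_mean_column() {
    awk -v col="${2:-5}" '
        /^[0-9]/ && $1 != 0 { sum += $col; n++ }
        END { if (n > 0) printf "%.0f\n", sum / n; else print "no data" }
    ' "$1"
}
```

Calling gh_mean_column glx_hook_frametimes-ctx1.csv would then print the mean of column 5 in nanoseconds; pass 6 as second argument for the GPU timestamps.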
Use GH_LATENCY=$n to limit the number of frames the GPU lags behind. The following
values might be used:
- -2: no limit (the default)
- -1: limit to 0, force a sync right after the buffer swap
- 0: limit to 0, force a sync right before the buffer swap
- >0: limit the number of pending frames to $n (requires a context >= 3.2, or supporting the GL_ARB_sync extension)
This can be helpful in situations where you experience stuttering in a GL application. Preferably,
you should use GH_LATENCY=1 to not degrade performance too much.
Some GL drivers may use busy waiting when waiting for the sync objects, resulting in maxing out one CPU core for only very little of an advantage. However, sometimes you might even want to use busy waiting (even if the driver doesn't do it for you). Two different modes are implemented:
- the standard mode, which uses a single call to glClientWaitSync, which waits for either the completion of rendering of the relevant frame or the expiry of a timeout, whichever comes first. The timeout can be specified by setting GH_LATENCY_GL_WAIT_TIMEOUT_USECS=$n, where $n is the timeout in microseconds. Default is 1000000 (1 second), but you can turn this down significantly if you don't want to wait for long periods even in extreme cases. Setting this too low might result in incomplete synchronization.
- the manual mode, where the wait is performed in a loop until synchronization is achieved. Use GH_LATENCY_GL_WAIT_USECS=$n to set the wait timeout for each individual GL wait operation to $n microseconds (default: 0) and GH_LATENCY_WAIT_USECS=$n to add an additional sleep cycle of $n microseconds per loop iteration (default: 0).
The mode is selected by setting GH_LATENCY_MANUAL_WAIT=$n, where $n is
- -1: automatic mode selection (the default): enable manual mode if either of GH_LATENCY_GL_WAIT_USECS or GH_LATENCY_WAIT_USECS is set to a non-zero value
- 0: always use standard mode
- 1: always use manual mode (this allows explicit busy waiting by setting both wait usecs to 0)
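As an illustration, a latency limiter setup with explicit busy waiting might look like this. The values are only an example built from the options described above, not a recommendation.

```shell
# Limit the GPU to one pending frame and busy-wait for the sync object.
export GH_LATENCY=1                 # at most one pending frame
export GH_LATENCY_MANUAL_WAIT=1     # always use manual mode
export GH_LATENCY_GL_WAIT_USECS=0   # zero timeout for each individual GL wait operation
export GH_LATENCY_WAIT_USECS=0      # no additional sleep per loop iteration

# Afterwards start the program with the full library preloaded, e.g.:
#   LD_PRELOAD=path/to/glx_hook.so target_binary
```

Setting GH_LATENCY_WAIT_USECS to a non-zero value instead adds a sleep to every loop iteration, trading some reaction time for less CPU load.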
Set GH_SWAPBUFFERS=$n to only execute every $n-th buffer swap. This might be
useful for games where all game logic is implemented in the same loop as
the rendering, and you want vsync on but still a higher frequency for the loop.
In this mode, you need to reach a frame rate of $n times the refresh rate
to not miss any display frames.
There is also an experimental - and very crude - adaptive mode which can be enabled by setting
GH_MIN_SWAP_USECS to a value above zero. If enabled, the GH_SWAPBUFFERS setting is
ignored, and the swap buffer omission value is calculated based on the frame timings
of the previous frames. The idea is to set the value to the frame time corresponding to
your monitor's refresh rate, or somewhat lower, e.g. something in the range of 14000
to 16600 for a 60Hz display. You can set GH_SWAP_OMISSION_MEASURE to either 1 to use CPU
frame times, 2 to use GPU frame times, or 3 to use the maximum of both.
The actual swap buffer omission value is clamped between GH_SWAP_OMISSION_MIN (default
1, meaning no omission), and GH_SWAP_OMISSION_MAX (default 4). The measurement captures
the frame times over the last GH_SWAP_OMISSION_MEASURE_TOT frames (default: 6, min: 2,
max: 16) and uses the average of the oldest GH_SWAP_OMISSION_MEASURE_AVG frames of these
(default: 4, min: 1, max: total frames - 1).
Note that this mode can be very unstable, depending on the app and also the GL driver.
It might help to test it in combination with various latency limiter and swap omission
flush modes, and also different threshold durations as well as measurement modes.
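To pick a starting value for the threshold, the frame time can be derived from the refresh rate. The following is a small sketch: gh_frame_usecs is a made-up helper, and the factor of 0.9 is an arbitrary choice for "somewhat lower".

```shell
# Hypothetical helper: frame time in microseconds for a given refresh
# rate in Hz (integer division, so 60 Hz yields 16666).
gh_frame_usecs() {
    echo $(( 1000000 / $1 ))
}

# Example settings for a 60 Hz display, using a threshold somewhat
# below the full frame time.
export GH_MIN_SWAP_USECS=$(( $(gh_frame_usecs 60) * 9 / 10 ))  # 14999
export GH_SWAP_OMISSION_MEASURE=1   # base the decision on CPU frame times
```

For a 60Hz display this ends up inside the 14000 to 16600 range mentioned above; expect to experiment with the value.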
The interaction between the latency limiter and swap buffer omission can be controlled
by the GH_SWAP_OMISSION_LATENCY option as follows:
- 0: apply the latency limiter on every swapbuffer operation which is actually carried out (the default)
- 1: apply the latency limiter to every swapbuffer operation the application attempts to do, including the omitted ones
Furthermore, you can control the flush behavior at omitted swapbuffer operations via
the GH_SWAP_OMISSION_FLUSH variable:
- 0: do nothing
- 1: do a flush via glFlush (the default)
- 2: do a full sync via glFinish
When the latency limiter is enabled and GH_SWAP_OMISSION_LATENCY is set to 1, you probably
should set GH_SWAP_OMISSION_FLUSH to 0 to avoid additional syncs the latency limiter
already takes care of.
Frametime measurements will always measure each individual frame the application attempted to render, and are not (directly) affected by the swap buffer omission settings.
Set GH_SWAP_SLEEP_USECS=$n to force an additional sleep of that many microseconds
after each buffer swap. This might be useful if you want to reduce the framerate or simulate
a slower machine.
You can override the attributes for GL context creation. This will require the
GLX_ARB_create_context extension. The following overrides are defined:
- GH_FORCE_MIN_GL_VERSION_MAJOR: set the minimum GL major version number to request
- GH_FORCE_MIN_GL_VERSION_MINOR: set the minimum GL minor version number to request
- GH_FORCE_MAX_GL_VERSION_MAJOR: set the maximum GL major version number to request
- GH_FORCE_MAX_GL_VERSION_MINOR: set the maximum GL minor version number to request
- GH_FORCE_GL_VERSION_MAJOR: set the exact GL major version number to request
- GH_FORCE_GL_VERSION_MINOR: set the exact GL minor version number to request
- GH_FORCE_GL_CONTEXT_PROFILE_CORE: set to non-zero to force the creation of a core profile. (Requires GL version of at least 3.2)
- GH_FORCE_GL_CONTEXT_PROFILE_COMPAT: set to non-zero to force the creation of a compat profile. (Requires GL version of at least 3.2). Set to 1 to always force compat, and to 2 only if the app would be using legacy instead.
- GH_FORCE_GL_CONTEXT_FLAGS_NO_DEBUG: set to non-zero to disable debug contexts.
- GH_FORCE_GL_CONTEXT_FLAGS_DEBUG: set to non-zero to force debug contexts. GH_FORCE_GL_CONTEXT_FLAGS_DEBUG takes precedence over GH_FORCE_GL_CONTEXT_FLAGS_NO_DEBUG.
- GH_FORCE_GL_CONTEXT_FLAGS_NO_FORWARD_COMPAT: set to non-zero to disable forward-compatible contexts.
- GH_FORCE_GL_CONTEXT_FLAGS_FORWARD_COMPAT: set to non-zero to force forward-compatible contexts. GH_FORCE_GL_CONTEXT_FLAGS_FORWARD_COMPAT takes precedence over GH_FORCE_GL_CONTEXT_FLAGS_NO_FORWARD_COMPAT.
- GH_FORCE_GL_CONTEXT_FLAGS_NO_ERROR: set to non-zero to force a no-error context (as defined in GL_KHR_NO_ERROR).
- GH_FORCE_GL_CONTEXT_FLAGS_ERROR: set to non-zero to force removal of the no-error context flag if the application requests it.
The GL version overrides are applied in the order min,max,exact. Set a component to -1
for no override.
Note: it is advised to set both the major and minor version. The version comparison will then take
both major and minor into account (e.g. a minimum of 3.2 will not change a requested 4.0 to 4.2).
It is also possible to override only one component if you really want to (e.g. a minimum of -1.2 will
change a requested 4.0 to 4.2).
You can also directly specify the bitmasks for the context flags and profile mask (see the various GLX
context creation extensions for the actual values):
- GH_FORCE_GL_CONTEXT_FLAGS_ON: manually specify the bits which must be forced on in the context flags bitmask.
- GH_FORCE_GL_CONTEXT_FLAGS_OFF: manually specify the bits which must be forced off in the context flags bitmask.
- GH_FORCE_GL_CONTEXT_PROFILE_MASK_ON: manually specify the bits which must be forced on in the context profile bitmask.
- GH_FORCE_GL_CONTEXT_PROFILE_MASK_OFF: manually specify the bits which must be forced off in the context profile bitmask.
When setting these, they will override any settings by the other GH_FORCE_GL_* environment variables.
Note that the context profile is only relevant for GL version 3.2 and up. When forcing a GL version
of 3.2 or higher, the default profile is the core profile. You must explicitly request a compat profile
if the application would otherwise work with a legacy context (by not using GLX_ARB_create_context or
specifying an earlier version), or use GH_FORCE_GL_CONTEXT_PROFILE_COMPAT=2 to dynamically request
a compatibility profile only if legacy profiles were requested.
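For example, raising the requested version to at least 3.2 without breaking applications which rely on legacy GL might be set up like this. It is a sketch based on the options above; the version numbers are arbitrary.

```shell
# Request at least GL 3.2, setting both components as advised above.
export GH_FORCE_MIN_GL_VERSION_MAJOR=3
export GH_FORCE_MIN_GL_VERSION_MINOR=2
# Ask for a compat profile only if the app would have used a legacy context.
export GH_FORCE_GL_CONTEXT_PROFILE_COMPAT=2

# Afterwards start the program with the full library preloaded, e.g.:
#   LD_PRELOAD=path/to/glx_hook.so target_binary
```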
By setting GH_GL_DEBUG_OUTPUT to a non-zero value, GL debug output message callbacks will be
intercepted. The debug messages will be logged as INFO level messages in the GH log.
Set GH_GL_INJECT_DEBUG_OUTPUT to a non-zero value to inject a call to the
debug output functionality into the application. Note that to get debug output, you must
force the creation of a debug GL context if the app does not do it on its own.
See the example script setup_env_debug for a setup which tries to inject Debug Output
into an unspecified GL app.
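A minimal manual setup might look like the following. This is not the content of setup_env_debug, just a sketch put together from the options described in this document. The log file name is made up, and the verbosity is simply set to the maximum on the assumption that INFO level messages are shown at that level.

```shell
export GH_GL_DEBUG_OUTPUT=1               # intercept GL debug output callbacks
export GH_GL_INJECT_DEBUG_OUTPUT=1        # inject debug output into the application
export GH_FORCE_GL_CONTEXT_FLAGS_DEBUG=1  # make sure a debug context is created
export GH_VERBOSE=5                       # maximum verbosity (assumed to include INFO messages)
export GH_VERBOSE_FILE=glx_hook-%p.log    # hypothetical file name, %p is replaced by the PID

# Afterwards start the program with the full library preloaded, e.g.:
#   LD_PRELOAD=path/to/glx_hook.so target_binary
```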
Whenever an output file name is specified, special run-time information
can be inserted in the file name to avoid overwriting previous files in
complex situations (e.g. the application is using several processes).
A sequence of % followed by another character is treated depending
on the second character as follows:
- c: the GL context number (sequentially counted from 0). This is not available for the GH_VERBOSE_FILE output; the context number is always 0 there
- p: the PID of the process
- t: the current timestamp as <seconds_since_epoch>.<nanoseconds>
- %: the % sign itself
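The expansion rules above can be modelled in a few lines of shell. This is an illustration only, not the parser used by glx_hook: the function name gh_expand_name is made up, the values to insert are passed in as arguments, and copying unknown sequences such as %x unchanged is an assumption, as the text above does not say what happens to them.

```shell
# Hypothetical model of the file name expansion described above.
# Usage: gh_expand_name PATTERN CTX PID TIMESTAMP
gh_expand_name() {
    rest=$1
    out=
    while [ -n "$rest" ]; do
        case $rest in
            %c*) out="${out}$2"; rest=${rest#"%c"} ;;   # GL context number
            %p*) out="${out}$3"; rest=${rest#"%p"} ;;   # process ID
            %t*) out="${out}$4"; rest=${rest#"%t"} ;;   # timestamp
            %%*) out="${out}%";  rest=${rest#"%%"} ;;   # literal % sign
            *)   first=${rest%"${rest#?}"}              # first character of $rest
                 out="${out}${first}"
                 rest=${rest#?} ;;
        esac
    done
    printf '%s\n' "$out"
}
```

With this, gh_expand_name 'glx_hook_frametimes-ctx%c.csv' 1 4242 0.0 prints glx_hook_frametimes-ctx1.csv.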
To build, just type
$ make
(assuming you have a C compiler and the standard libs installed).
Finally copy the glx_hook.so to where you like it. For a debug build, do
$ make DEBUG=1
glx_hook requires glibc, as we rely on some glibc internals. Tested with glibc-2.13 (from debian wheezy), glibc-2.24 (from debian stretch) and glibc-2.28 (from debian buster).
glx_hook works by exporting all the relevant GL functions in the shared object,
and by hooking into dlsym() (and optionally also dlvsym()) as well as
glXGetProcAddress/glXGetProcAddressARB.
However, hooking dlsym()/dlvsym() can be done via different methods.
The method is selected at compile time via the METHOD variable:
$ make METHOD=2
The following methods are available:
- 1: Deprecated: Use the internal _dl_sym() function of glibc. However, this function is not exported any more since glibc-2.34, so this approach won't work with newer Linux distros beginning some time around autumn of 2021. Using this method allows for hooking dlsym() and dlvsym().
- 2: Use the dlvsym() function, which is an official part of the glibc API and ABI. To query the original dlsym via dlvsym, we need to know the exact version of the symbol, which in glibc is dependent on the platform. glx_hook currently supports the platforms x86_64 and i386 via this method, but other platforms can easily be added. Just do a grep 'GLIBC_.*\bdlsym\b' -r sysdeps in the root folder of the glibc source. Using this method allows for hooking dlsym(), but not dlvsym(). This is currently the default.
- 3: Use a second helper library dlsym_wrapper.so. That file will be automatically built if this mode is selected. It must be placed in the same folder where the glx_hook.so is located, and will be dynamically loaded at runtime when glx_hook.so initializes itself. Using this method allows for hooking dlsym() and dlvsym(). It is probably the most flexible approach, but it adds some complexity.
When using method 2, this means that we end up getting the symbol
from glibc even if another hooking library is injected to the same process.
By default, glx_hook plays nice and actually uses the dlsym() queried by
dlvsym() to again query for the unversioned dlsym. This behavior can
be prevented by setting the GH_ALLOW_DLSYM_REDIRECTION environment variable
to 0. It is only relevant for METHOD=2.
You can control whether we shall also hook the dlsym() and dlvsym() methods
dynamically, meaning an application calling (our) dlsym() to query for "dlsym" itself
should be redirected to our implementation. Use GH_HOOK_DLSYM_DYNAMICALLY=1 or
GH_HOOK_DLVSYM_DYNAMICALLY=1 to enable this. By default, this is disabled, as this
creates lots of shenanigans, especially if we are not the only dlsym/dlvsym hook
around. Use with care.
There are some example scripts to simplify the setup:
Use either of these to set up your current shell's environment for the use of glx_hook.so:
- cd /path/to/your/glx_hook/installation; source ./setup_env (assumes the setup_env and the glx_hook.so are in the same directory)
- source /path/to/your/glx_hook/scripts/setup_env /path/to/your/glx_hook/installation (directories might differ)
Have fun, derhass (derhass@arcor.de)