
VRT Pixel Functions: Add function to evaluate arbitrary expression #11209

Open · wants to merge 3 commits into OSGeo:master from dbaston:vrt-expression

Conversation

@dbaston (Member) commented Nov 5, 2024

What does this PR do?

Adds a C++ pixel function called "expression" that can evaluate an arbitrary expression (or indeed, a mini-program) using the exprtk library. This is a single MIT-licensed header that is easily integrated into GDAL. It appears to be quite widely used. However, given that a major purpose of this is to extend the utility of VRT to users that don't want to allow functions defined in Python (possibly because of security concerns), I'm a little uncomfortable with the scope of expressions allowed by this library. Some of these, such as file I/O, can be disabled (and I've done so.)
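For readers who haven't used exprtk, here is a minimal sketch of the kind of evaluation it performs. This is illustrative only, not the code added by this PR; the variable names simply mirror the VRT example further below.

#include <cstdio>
#include <exprtk.hpp>

int main()
{
    // Stand-ins for the source band values of a single pixel.
    double nir = 0.7;
    double red = 0.2;

    // Variables are bound by reference, so they can be updated per pixel
    // and the compiled expression re-evaluated.
    exprtk::symbol_table<double> symbols;
    symbols.add_variable("NIR", nir);
    symbols.add_variable("R", red);

    exprtk::expression<double> expression;
    expression.register_symbol_table(symbols);

    exprtk::parser<double> parser;
    if (!parser.compile("(NIR-R)/(NIR+R)", expression))
    {
        std::printf("compilation error: %s\n", parser.error().c_str());
        return 1;
    }

    std::printf("NDVI = %f\n", expression.value());
    return 0;
}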

I've also looked at:

  • tinyexpr, which seems easy to integrate but lacks conditionals
  • muparser, which is a little harder to integrate (it includes multiple header/object files) but supports a reasonable variety of functions, including a ternary conditional operator

I need to do more work to investigate details such as nodata handling, but wanted to share the concept sooner rather than later.

An alternative implementation would expose exprtk as an additional pixel function language alongside Python, instead of hooking into the existing C++ pixel function machinery.

An example VRT that is allowed is:

<VRTDataset rasterXSize="1" rasterYSize="1">
  <VRTRasterBand dataType="Float64" band="1" subClass="VRTDerivedRasterBand">
    <PixelFunctionType>expression</PixelFunctionType>
    <PixelFunctionArguments expression="(NIR-R)/(NIR+R)" />
    <SimpleSource name="NIR">
      <SourceFilename relativeToVRT="0">source_0.tif</SourceFilename>
      <SourceBand>1</SourceBand>
    </SimpleSource>
    <SimpleSource name="R">
      <SourceFilename relativeToVRT="0">source_1.tif</SourceFilename>
      <SourceBand>1</SourceBand>
    </SimpleSource>
  </VRTRasterBand>
</VRTDataset>
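(Assuming the two source rasters exist, such a VRT could then be rendered with, for example, gdal_translate ndvi.vrt ndvi.tif — filenames hypothetical — which evaluates (NIR-R)/(NIR+R) for each pixel with NIR and R bound to the corresponding source values.)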

@dbaston marked this pull request as draft on November 5, 2024 at 19:40
@rouault (Member) commented Nov 5, 2024

That's super cool, and so compact as an integration!

Any hints on the performance? In particular it would be cool if exprtk could use SIMD to vectorize computations, but I'm probably asking too much. Regarding performance, I'd suspect the biggest issue to be the GDALCopyWords() call done for each pixel. Creating a temporary array with all the double values and copying it in one go to the output data type would certainly be beneficial (this remark applies to existing pixel functions too).

Can exprtk be used with non-double variables? (In case that would make things faster for integer-only computations.)

Thinking out loud: for ultimate performance, we should probably use some LLVM-based solution where the expression would be just-in-time compiled to the target architecture, a bit like numba does.

#define exprtk_disable_rtl_io_file
#define exprtk_disable_rtl_vecops
#define exprtk_disable_string_capabilities
#include <exprtk.hpp>

Review comment from a Member:

Maybe the following, to avoid namespace collisions in case GDAL is integrated with other software that also uses exprtk?

Suggested change:
-#include <exprtk.hpp>
+#define exprtk gdal_exprtk
+#include <exprtk.hpp>
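(For context: that #define makes the preprocessor replace every occurrence of the exprtk token in the included header with gdal_exprtk, so GDAL's vendored copy ends up in a gdal_exprtk namespace and cannot clash with another copy of exprtk linked into the same process.)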

@dbaston (Member Author) commented Nov 5, 2024

Any hints on the performance?

Haven't looked into this yet. In particular, comparing to Python would be interesting.

Creating a temporary array with all the double values and copying it in one go to the output data type would certainly be beneficial (this remark applies to existing pixel functions too).

Maybe they could be rewritten as templates...

Can exprtk be used with non-double variables? (In case that would make things faster for integer-only computations.)

The docs note "Support for various numeric types (float, double, long double, MPFR/GMP)"
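(Presumably that means the exprtk::symbol_table<T>, exprtk::expression<T> and exprtk::parser<T> templates could simply be instantiated with float instead of double, should that ever prove faster for a given input type.)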

Thinking out loud: for ultimate performance, we should probably use some LLVM-based solution where the expression would be just-in-time compiled to the target architecture, a bit like numba does.

I also came across mathpresso, which does something along these lines.

@coveralls (Collaborator) commented Nov 5, 2024

Coverage Status

coverage: 70.096% (+0.003%) from 70.093% when pulling 54fd38f on dbaston:vrt-expression into ee04fce on OSGeo:master.

@dbaston (Member Author) commented Nov 8, 2024

@rouault A limitation of the pixel function integration is that we can only output a single band. exprtk would happily accept an expression like "var q[2] := {B2, B3}; B1 * q;" but we have no way to return the second band to the user. Do you think looking into a VRTProcessedDataset integration might make sense instead?

On performance, I've done basic tests like summing up a 24-band netCDF. Performance is roughly equivalent to Python with perf showing the following for exprtk:

  23.81%  gdal_translate  libgdal.so.36.3.11.0        [.] ExprPixelFunc
  15.50%  gdal_translate  libgdal.so.36.3.11.0        [.] exprtk::details::vec_add_op<double>::process
  12.32%  gdal_translate  libgdal.so.36.3.11.0        [.] (anonymous namespace)::GDALCopyWordsFromT<short>
  11.21%  gdal_translate  libgdal.so.36.3.11.0        [.] GDALCopyWords64
   6.31%  gdal_translate  libnetcdf.so.19             [.] ncx_getn_short_short
   4.06%  gdal_translate  [unknown]                   [k] 0xffffffff98a0109c
   3.31%  gdal_translate  libc.so.6                   [.] __memset_avx2_unaligned_erms
   2.47%  gdal_translate  [unknown]                   [k] 0xffffffff98a00158

and numpy:

  16.49%  gdal_translate  libc.so.6                                          [.] __memmove_avx_unaligned_erms
  13.34%  gdal_translate  _multiarray_umath.cpython-310-x86_64-linux-gnu.so  [.] DOUBLE_add_FMA3__AVX2
  12.76%  gdal_translate  libgdal.so.36.3.11.0                               [.] (anonymous namespace)::GDALCopyWordsFromT<short>
   6.90%  gdal_translate  [unknown]                                          [k] 0xffffffff98a00158
   6.82%  gdal_translate  libnetcdf.so.19                                    [.] ncx_getn_short_short
   4.28%  gdal_translate  [unknown]                                          [k] 0xffffffff98a0109c
   4.23%  gdal_translate  ld-linux-x86-64.so.2                               [.] do_lookup_x
   3.13%  gdal_translate  libc.so.6                                          [.] __memset_avx2_unaligned_erms
   1.98%  gdal_translate  libpython3.10.so.1.0                               [.] _PyEval_EvalFrameDefault

@rouault (Member) commented Nov 8, 2024

Do you think looking into a VRTProcessedDataset integration might make sense instead?

It accepts only a single (potentially multiband) input dataset. That said, the input can be a VRTDataset, so you can assemble several single-band datasets into a virtual multiband one. Hard to tell which way is the preferred one. Would having it available in both VRTDerivedRasterBand and VRTProcessedDataset be overkill? Having it available in VRTDerivedRasterBand probably makes it easier/more intuitive to write formulas that differ per output band, even if exprtk allows returning a tuple.

@dbaston (Member Author) commented Nov 8, 2024

Would having it available in both VRTDerivedRasterBand and VRTProcessedDataset be overkill?

Not necessarily. I'll look into VRTProcessedDataset some more.

Having it available in VRTDerivedRasterBand probably makes it easier/more intuitive to write formulas that differ per output band

On the other hand, this requires repeating all of the SimpleSource definitions (and re-reading the input pixels) for every output band...unless I'm missing something.

@rouault (Member) commented Nov 8, 2024

On the other hand, this requires repeating all of the SimpleSource definitions (and re-reading the input pixels) for every output band.

that's true

@dbaston force-pushed the vrt-expression branch 2 times, most recently from e8b10a9 to 98ceafa on November 11, 2024 at 18:41
),
pytest.param(
"""
var chunk[10];

@rouault (Member) commented Nov 12, 2024

That's interesting... from a security point of view.

  • Can that trigger arbitrary memory allocation?
  • Does exprtk do bounds checking when accessing the arrays?

I suspect you could also cause infinite or very long looping?

@rouault - Yes, ExprTk has runtime checks, one of which is explicitly for vector index access.

An example can be found here: https://www.partow.net/programming/exprtk/index.html#simple_example_20

More detailed documentation regarding the checks can be found in Section 24 of the readme: https://www.partow.net/programming/exprtk/readme.html#line_4696

@dbaston (Member Author):

Can that trigger arbitrary memory allocation?

Array sizes can be up to 2 billion, so you can cause a large allocation with:

var out[2000000000];
return [out];

Does exprtk do bounds checking when accessing the arrays?

Not by default, but enabled in fd94c9e

@rouault (Member):

Array sizes can be up to 2 billion, so you can cause a large allocation with:

could we improve exprtk to offer a way to limit the allocations?

@ArashPartow commented Nov 12, 2024

could we improve exprtk to offer a way to limit the allocations?

@rouault: Yes the maximum vector size that can be created in the expression can be set explicitly:

https://www.partow.net/programming/exprtk/readme.html#line_1473

parser_t parser; 
parser.settings().set_max_local_vector_size(12345); 
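(With such a limit in place, the var out[2000000000]; example above would presumably be rejected when the expression is compiled, rather than triggering a roughly 16 GB allocation.)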

third_party/exprtk/exprtk.hpp — outdated review thread, resolved

@dbaston force-pushed the vrt-expression branch 2 times, most recently from bcebd99 to 4bdc19d on November 13, 2024 at 02:14
@rouault (Member) commented Nov 13, 2024

The ASAN failures are unrelated to this PR and are avoided in master per fbaa99c. You can rebase on top of that.

Perhaps we could limit the amount of memory to some default value, let's say 10 MB, and if above that, fail with an error like "you need to set the GDAL_VRT_MAX_MEMORY_EXPR configuration option".
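A rough sketch of what such a guard could look like; the function name, the default limit and the message wording are illustrative assumptions, and only the configuration option name comes from the suggestion above:

#include <cstddef>
#include <cstdlib>
#include "cpl_conv.h"   // CPLGetConfigOption
#include "cpl_error.h"  // CPLError

// Returns true if an expression-local allocation of nRequestedBytes is allowed.
static bool ExpressionAllocationAllowed(size_t nRequestedBytes)
{
    size_t nLimit = 10 * 1024 * 1024;  // assumed 10 MB default
    const char *pszLimit = CPLGetConfigOption("GDAL_VRT_MAX_MEMORY_EXPR", nullptr);
    if (pszLimit != nullptr)
        nLimit = static_cast<size_t>(std::strtoull(pszLimit, nullptr, 10));

    if (nRequestedBytes > nLimit)
    {
        CPLError(CE_Failure, CPLE_AppDefined,
                 "Expression requires %.1f MB of memory; set the "
                 "GDAL_VRT_MAX_MEMORY_EXPR configuration option to allow this.",
                 static_cast<double>(nRequestedBytes) / (1024 * 1024));
        return false;
    }
    return true;
}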

frmts/vrt/vrtexpression.cpp — two outdated review threads, resolved
@dbaston force-pushed the vrt-expression branch 4 times, most recently from 57f8062 to 8e06984 on November 19, 2024 at 03:13
@dbaston force-pushed the vrt-expression branch 2 times, most recently from 1e8aa46 to 36c0665 on December 2, 2024 at 18:26
@rouault (Member) commented Dec 2, 2024

You'll need to rebase your PR on top of the latest master and run scripts/collect_config_options.py.

@rouault (Member) commented Dec 3, 2024

frmts/vrt/vrtprocesseddatasetfunctions.cpp — two outdated review threads, resolved
frmts/vrt/vrtexpression_muparser.cpp — review thread, resolved

@rouault (Member) commented Jan 3, 2025

Maybe there are workflows where the expression evaluation would be a more significant part of runtime?

the time spent in GDALCopyWords seems quite large. I'm wondering if it is expected or if there's some inefficiency we could solve. Do you have a link to the dataset you use here so I can have a look?

@dbaston (Member Author) commented Jan 3, 2025

@rouault (Member) commented Jan 3, 2025

the time spent in GDALCopyWords seems quite large. I'm wondering if it is expected or if there's some inefficiency we could solve

After investigation, we are probably close to optimal, or at least, I can't spot any low-hanging fruit.
Most of the time is spent acquiring the source data, converting from Int16 to Float64 and re-arranging it in a pixel-interleaved way.
python3 -c "from osgeo import gdal; ds = gdal.Open('era5_hour_202101.nc'); ds.ReadAsArray(buf_type = gdal.GDT_Float64, interleave = 'pixel')" just takes slightly less time than reading the VRT.
Acquiring in an interleave='band' way cuts the time in half, but that would require reworking the VRTProcessedDataset logic, and I'm not sure it would help at all since we would need to re-order the data anyway in the step processing functions.
What I noticed though with this particular dataset is that by default the processing chunk is the whole raster, given the netCDF block size is the raster dimensions, which thus consumes 1440 * 721 * 1488 * sizeof(double) = 11.5 GB to acquire the input data, and 1440 * 721 * 62 * sizeof(double) = 0.5 GB for the output data. In #11569, I've implemented logic to dynamically reduce the block size when it is too large compared to the amount of RAM. Of course this increases the processing time, but that's better than crashing the machine. With that particular dataset and the 40% threshold I chose, with 32 GB of RAM you can still process the dataset as a single block.

@rouault (Member) commented Jan 3, 2025

After investigation, we are probably close to optimal, or at least, I can't spot any low-hanging fruit.

Actually, I now realize I had already implemented a cache-friendly array transposition function for the multidim code. I've generalized it and used it in VRTProcessedDataset, and it gives a nice perf boost: #11572

@dbaston force-pushed the vrt-expression branch 2 times, most recently from e9ae7d7 to 0ec86a7 on January 7, 2025 at 20:59
@dbaston (Member Author) commented Jan 9, 2025

I think this is getting close. A few issues:

  • If exprtk is optional and non-default, do we want to continue to vendor it (1.6 MB), have CMake download it if the feature is enabled, or rely on the user to provide it? Or do we want to remove this support for the time being if we can't show a clear benefit? I'm inclined not to vendor it but I can be irrational about trying to keep the repo small.

  • I'm struggling to resolve the linking issue in the msys2 build. I have this reproduced locally and will continue flailing at it.

  • I'm also not sure how to resolve the Doxygen warnings. I would have expected them to be resolved by adding GDAL_VRT_ENABLE_MUPARSER and GDAL_VRT_ENABLE_EXPRTK to the PREDEFINED block in the Doxyfile, but this has no effect locally. The issue may also resolve itself depending on how we handle exprtk.

  • Still need doc updates to reflect the use of muparser as a default but I'll do this after we make a decision on how to handle exprtk.

@rouault (Member) commented Jan 9, 2025

  • If exprtk is optional and non-default, do we want to continue to vendor it (1.6 MB), have CMake download it if the feature is enabled, or rely on the user to provide it? Or do we want to remove this support for the time being if we can't show a clear benefit? I'm inclined not to vendor it but I can be irrational about trying to keep the repo small.

Also +1 not to vendor it, and rely on the CMake FindXXXX() logic (cf. cmake/helpers/CheckDependentLibraries.cmake) to detect an external exprtk library.

Regarding the msys2 linking issue: isn't it some issue related to __declspec(dllexport) / __declspec(dllimport) being inappropriately used in the context of a vendored library?

Regarding the Doxygen warnings: as it is about non-public API, you can just wrap the offending files between /*! @cond Doxygen_Suppress */ and /*! @endcond */.

@hobu (Contributor) commented Jan 9, 2025

+1 to requiring user-provided exprtk. It is available on conda-forge for our CI, and it looks like it is starting to be provided by some of the other packaging systems. If we vendor it, we will get complaints about it. We lose either way 😄

@rouault (Member) commented Jan 10, 2025

@dbaston I've pushed to your branch a commit (cherry-picked from master) that should hopefully fix the mysterious unrelated crash under the ASAN CI config (no idea why it just popped up on your pull request!)

@dbaston force-pushed the vrt-expression branch 2 times, most recently from 6e9f144 to 34f8c05 on January 10, 2025 at 14:13
@dbaston (Member Author) commented Jan 10, 2025

@dbaston I've pushed to your branch a commit (cherry-picked from master) that should hopefully fix the mysterious unrelated crash under the ASAN CI config (no idea why it just popped up on your pull request!)

Sorry, I didn't notice this message and apparently clobbered your commit with a force-push. I'll rebase from master again to pick it up.

Is the commit history in this PR important to you? I think the simplest way to remove the vendored exprtk from the history is to squash to a single commit. (Or maybe two commits, with the patch to the vendored muparser kept separate)

@rouault (Member) commented Jan 10, 2025

Is the commit history in this PR important to you?

Squashing into a single commit is probably going to be the simplest solution. I tend to prefer atomic history in general, like "Add function foo()", "Use function foo in driver A", "Use function foo in driver B", but only if each individual commit builds properly and doesn't break a git bisect session, and when doing week-long developments it can be a headache to reconstruct a clean history (I wish GitHub had a function where we could ask it to run CI on each commit to ensure the history is clean). We're not yet in a position to be as demanding as the Linux kernel project is towards its contributors :-)

@rouault (Member) commented Jan 10, 2025

We'll need to document the variables related to EXPRTK and MUPARSER dependencies in https://gdal.org/en/latest/development/building_from_source.html#cmake-package-dependent-options

frmts/vrt/vrtexpression.h — outdated review thread, resolved
@rouault (Member) commented Jan 11, 2025

dbaston#2: CI: add muparser to conda windows build

@rouault (Member) commented Jan 11, 2025

dbaston#3: ExprPixelFunc(): fix memory leak in error code path

@dbaston marked this pull request as ready for review on January 11, 2025 at 15:57
@rouault (Member) commented Jan 12, 2025

The PR LGTM. Does @hobu want to give it some additional review?
