Fix MMapDirectory performance issue, #1151 #1152
Draft
Fix MMapDirectory performance issue by using BufferedIndexInput instead of ByteBufferIndexInput.
Fixes #1151
Description
This is a draft PR to run tests on all platforms and gather feedback on this approach.
As noted in #1151, MMapDirectory has a severe performance problem: in my testing it is as much as 51x slower than SimpleFSDirectory. I was able to narrow down the root cause: reading a single byte from a MemoryMappedViewAccessor is very slow in .NET. I believe this is because each call has to do work such as acquiring and releasing a pointer, in addition to range checking. Reading a single byte at a time is very common in our codebase, for example in LZ4 decompression, which was a hotspot in the profile. The performance difference between memory-mapped files in .NET and Java disappears when you read multiple bytes at once via ReadArray.
While it might be possible to create a ByteBuffer implementation for memory-mapped files that also has an internal buffer to read 1kB at a time, I decided to take a stab at making MMapDirectory use an approach similar to SimpleFSDirectory: its IndexInput implementation inherits from BufferedIndexInput, which maintains a 1kB buffer around the input. When the buffer needs to be refilled, it reads from the memory-mapped file's view accessor. Because data is buffered 1kB at a time, ReadByte reads from the already-filled buffer, which is a fast array operation, resulting in a massive speed-up.
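For reviewers who want to see the idea in isolation, here is a minimal sketch of the buffering pattern described above. It is not the code in this PR: `BufferedMMapReader` is a hypothetical stand-alone class that does not inherit from BufferedIndexInput, and it assumes a non-empty file mapped as a single read-only view. It only illustrates how single-byte reads can be served from a 1kB array that is refilled with one `ReadArray` call.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Hypothetical illustration only; not the IndexInput implementation from this PR.
public sealed class BufferedMMapReader : IDisposable
{
    private const int BufferSize = 1024; // 1kB, matching the buffer size described above

    private readonly MemoryMappedFile file;
    private readonly MemoryMappedViewAccessor accessor;
    private readonly long length;
    private readonly byte[] buffer = new byte[BufferSize];

    private long bufferStart;   // file offset of buffer[0]
    private int bufferLength;   // number of valid bytes currently in the buffer
    private int bufferPosition; // index of the next byte to return

    public BufferedMMapReader(string path)
    {
        // Assumption: the file is non-empty; mapping an empty file with capacity 0 throws.
        length = new FileInfo(path).Length;
        file = MemoryMappedFile.CreateFromFile(path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read);
        accessor = file.CreateViewAccessor(0, length, MemoryMappedFileAccess.Read);
    }

    // Fast path: a plain array read. The accessor is only touched on refill.
    public byte ReadByte()
    {
        if (bufferPosition >= bufferLength)
        {
            Refill();
        }
        return buffer[bufferPosition++];
    }

    // Slow path: one bulk ReadArray call per 1kB instead of one accessor call per byte.
    private void Refill()
    {
        long start = bufferStart + bufferLength;
        if (start >= length)
        {
            throw new EndOfStreamException("Read past end of file.");
        }
        int count = (int)Math.Min(BufferSize, length - start);
        int read = accessor.ReadArray(start, buffer, 0, count);
        bufferStart = start;
        bufferLength = read;
        bufferPosition = 0;
    }

    public void Dispose()
    {
        accessor.Dispose();
        file.Dispose();
    }
}
```

With this shape, a loop that calls `ReadByte` many times in a row (as LZ4 decompression does) touches the view accessor roughly once per 1024 bytes rather than once per byte.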
Initial performance results (compare to results in #1151), macOS arm64, .NET 9: