Search performance issue with MMapDirectory under load #1151
Comments
The results posted in the original issue were with the beta 17 NuGet packages. Here is the latest master on macOS arm64:
I have determined that this is largely a concurrent load issue. When setting macOS 15, arm64 (MaxDegreeOfParallelism = 1):
I created an equivalent Java app using Lucene 4.8.1 on JDK 21. macOS 15, arm64, parallel:
This shows that we definitely have a performance issue with MMapDirectory. (Aside: the good news is that the Java code had no issue reading this index built with Lucene.NET.) Here are the serial results:
Another finding: this discrepancy evaporates when just using |
Hi all 👋 It's not entirely the same thing I'm experiencing... though I do see some performance improvements when using What I've been discussing with @Shazwazza has to do with query execution time when ramping up the number of concurrent users. My test bench runs Using
Nice work tracking this down. I know how hard this can be! First of all, the below is all up for debate and is just my opinion.
To ByteBuffer or Not To ByteBuffer?
I should preface this with my desire to eliminate
The In addition, J2N is a general purpose library. ICU4N currently depends on Combined with the fact that we would all like a production release of Lucene.NET as soon as possible, fixing the
One Byte at a Time
The purpose of reading one byte at a time in Java appears to be to eliminate allocating temporary buffers on the heap to convert primitive types such as
Span<byte> buffer = stackalloc byte[4];
But it does require there to be APIs to pass that span to without creating another allocation on the heap in order for it to be sensible, and I haven't run any benchmarks, but in Note however, that only the integral number conversions are available prior to .NET Core in the BCL. So, we will need to patch the others. If we follow the convention used for
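To make the stackalloc idea above concrete, here is a minimal sketch of reading a 4-byte integer through a stack-allocated span instead of a temporary heap array. The class and method names are hypothetical, the big-endian byte order is an assumption for illustration, and `Stream` stands in for whatever input type would actually receive the span overloads. It is not the J2N or Lucene.NET implementation.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Hypothetical helper, for illustration only.
public static class SpanReadSketch
{
    // Reads a big-endian Int32 (byte order is an assumption here)
    // without allocating a temporary byte[] on the heap.
    public static int ReadInt32BigEndian(Stream input)
    {
        // The 4-byte scratch buffer lives on the stack.
        Span<byte> buffer = stackalloc byte[4];

        // Stream.Read may return fewer bytes than requested, so loop until full.
        int total = 0;
        while (total < buffer.Length)
        {
            int read = input.Read(buffer.Slice(total));
            if (read == 0)
                throw new EndOfStreamException();
            total += read;
        }

        // BinaryPrimitives converts the span to an int without copying it.
        return BinaryPrimitives.ReadInt32BigEndian(buffer);
    }
}
```

This only pays off if the underlying input exposes a span-accepting read, which is the point made above about needing APIs to pass the span to.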
System.Memory Support
Buffering the For the
MemoryMappedViewByteBuffer Buffering Design
There are probably more things to consider and this plan is admittedly a bit half-baked, but this is a good start. |
@NightOwl888 I think that approach makes sense. It was an alternative I mentioned (although thought through far less than you have here) on my prototype PR #1152. In that PR I confirmed that if the mmap is buffered, the performance problems go away, and it performs better than the other two directory implementations, as it does in Java. Performance is actually slightly better than Java (as we would hope, but still great to see!) except in the NIOFS case, which is slower in Lucene.NET for different reasons. My PR still has a good number of failing unit tests that would need to be addressed if we want to take that approach, though. Regardless, we've demonstrated that buffering is the solution, because reading a single byte from a mmap in .NET is painfully slow. (Aside: my performance measuring was on macOS, which gets NIOFSDirectory from I'm good with your plan above, although I am less familiar with the J2N code than you are. If you wouldn't mind taking a first pass at that on the J2N side, we can then compare approaches and see how they fare in performance testing. IMO, if BufferedIndexInput comes out ahead, we should still consider it, although that approach may have concurrency/scale issues (including with concurrent writes) that I haven't thought through yet. Regardless, it sounds like for methods like ReadInt32, we should do the J2N change anyway. |
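For readers following along, the buffering idea can be sketched roughly as below: copy a block out of the memory-mapped view into a managed array once, then serve single-byte reads from that array. `BufferedMmapReaderSketch` and its members are hypothetical names, the 4096-byte default is an arbitrary assumption, and this is not the code from PR #1152 or from J2N.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Hypothetical sketch: serves ReadByte() from a managed buffer that is
// refilled from the mapped view in blocks. Not thread-safe.
public sealed class BufferedMmapReaderSketch
{
    private readonly MemoryMappedViewAccessor accessor;
    private readonly long length;      // logical length of the data in the view
    private readonly byte[] buffer;
    private long nextFilePosition;     // position in the view of the next refill
    private int bufferLength;          // number of valid bytes in buffer
    private int bufferPosition;        // next byte to hand out

    public BufferedMmapReaderSketch(MemoryMappedViewAccessor accessor, long length, int bufferSize = 4096)
    {
        this.accessor = accessor ?? throw new ArgumentNullException(nameof(accessor));
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException(nameof(bufferSize));
        this.length = length;
        this.buffer = new byte[bufferSize];
    }

    public byte ReadByte()
    {
        // Common case: an array index, with no call into the accessor.
        if (bufferPosition >= bufferLength) Refill();
        return buffer[bufferPosition++];
    }

    private void Refill()
    {
        long remaining = length - nextFilePosition;
        if (remaining <= 0) throw new EndOfStreamException();

        // One bulk copy from the mapped view replaces many single-byte reads.
        int toRead = (int)Math.Min(buffer.Length, remaining);
        bufferLength = accessor.ReadArray(nextFilePosition, buffer, 0, toRead);
        if (bufferLength <= 0) throw new EndOfStreamException();

        nextFilePosition += bufferLength;
        bufferPosition = 0;
    }
}
```

The length is passed in explicitly because a view's capacity can be rounded up past the size of the data. A real implementation would also need seeking, cloning, and per-thread state, none of which is shown here.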
@kjac Thanks for confirming that there is still a performance issue with MMapDirectory in your testing, even though it sounds like that's not the main problem you're experiencing. If you wouldn't mind filing a separate issue about that problem, that would be appreciated. A minimal reproduction would be awesome; failing that, an overview of which Lucene.NET types you're using and how you're using them, plus some profiling to show where the hotspots are, would be great. Thanks! |
Yeah, the
There is a pretty decent overview of how
I will add the span overloads to the base classes and backport the I think for the buffered implementation of
@NightOwl888 HeapByteBuffer looks like it could work, but unfortunately a lot of those methods are sealed, so I'm not sure at first glance how you'd override them to trigger refilling the buffer... Another thing to explore: I found this in the dotNext repo, about how to use unsafe code to get Memory/ReadOnlySequence from MemoryMappedFile: https://dotnet.github.io/dotNext/features/io/mmfile.html - that code is MIT-licensed so we could lift it (with license attribution) without having to add a dependency if we wanted. |
Well, the good news is nearly all of the classes that are part of the buffers API are internal, including The DotNext code looks interesting. Looks like they are handling Just a thought - it would be a lot simpler if the |
Is there an existing issue for this?
Describe the bug
I have discovered that MMapDirectory has serious performance problems under parallel load when searching (with no background writes), across all platforms. The demo repo to reproduce this issue is at https://github.com/paulirwin/LuceneAdventureWorks/
This demo builds a simple index from the AdventureWorks2022 database data, and does a load test against it with 100k requests, using Parallel.For to search in parallel as fast as possible.
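The shape of such a load test can be sketched roughly as follows. This is not the code from the demo repo: `LoadTestSketch`, `Run`, and the `search` delegate are hypothetical stand-ins, with the delegate representing whatever search call is being measured.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical harness, for illustration only.
public static class LoadTestSketch
{
    // Runs `requests` invocations of `search` via Parallel.For and reports
    // how many completed and how long the whole run took.
    public static (long Completed, TimeSpan Elapsed) Run(
        int requests, int maxDegreeOfParallelism, Action<int> search)
    {
        // -1 means no explicit limit; 1 forces the iterations to run serially.
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };

        long completed = 0;
        var stopwatch = Stopwatch.StartNew();

        Parallel.For(0, requests, options, i =>
        {
            search(i);
            // Iterations run on multiple threads, so count atomically.
            Interlocked.Increment(ref completed);
        });

        stopwatch.Stop();
        return (Interlocked.Read(ref completed), stopwatch.Elapsed);
    }
}
```

Calling `Run(100_000, -1, search)` and then `Run(100_000, 1, search)` with the same delegate gives a parallel and a serial timing to compare, which is one way to check whether a slowdown depends on concurrency.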
Of note: MMapDirectory is the default Directory implementation returned by FSDirectory.Open on Windows and Linux 64-bit platforms.
^ This was an outlier for NIOFS, and yes, it was identical to MMap. Also note that the hardware in these machines is not exactly comparable, but all are 6-8 performance-core machines with either 32GB or 64GB of RAM.
Given that MMapDirectory is the default Directory implementation returned by FSDirectory on Windows and Linux 64-bit (which is the majority of Windows and Linux use nowadays), we definitely need to resolve this, as MMapDirectory is supposed to be a fast implementation.
Expected Behavior
MMapDirectory performs well on all platforms.
Steps To Reproduce
See demo repo.
Exceptions (if any)
No response
Lucene.NET Version
4.8.0-beta00017
.NET Version
No response
Operating System
No response
Anything else?
No response