Performance Improvements #63
Conversation
Awesome, thanks for this! Will try to get to this later this evening.
A couple of questions below.
I'm a little nervous about these changes without good test coverage in place to guard against regressions.
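One lightweight way to get that guard rail (a sketch, not part of this PR: `parse_to_text` is a hypothetical helper wrapping the pystdf parser, and the golden files would have to be recorded from master first) is a golden-file regression test:

```python
import pathlib


def check_against_golden(stdf_path, golden_path, parse_to_text):
    """Compare parser output for stdf_path against a recorded snapshot.

    parse_to_text is a stand-in for whatever renders a parsed STDF file
    to a stable text form (e.g. pystdf's text dumper)."""
    actual = parse_to_text(stdf_path)
    expected = pathlib.Path(golden_path).read_text()
    # Any behavioral change in the parser shows up as a diff here.
    assert actual == expected, f"output drifted for {stdf_path}"
```

Running this over the sample lots before and after the refactor would catch silent behavior changes like the exception-swallowing question below.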
pystdf/IO.py
Outdated
@@ -193,17 +184,26 @@ def parse(self, count=0):
            raise

    def getFieldParser(self, fieldType):
Could you get the same effect with a memoizing decorator here?
Decorator has been added.
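For readers following along: the PR's decorator is named `memorize`, but its body is not shown in this thread, so the following is an assumed minimal implementation of a memoizing decorator of that shape:

```python
import functools


def memorize(func):
    """Cache results keyed by the call arguments (a sketch; the PR's
    actual memorize implementation may differ)."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper
```

Applied to `getFieldParser`, repeated lookups for the same `fieldType` then hit the cache instead of rebuilding the parser. (Caveat: keying on `*args` includes `self` when used on methods, which keeps instances alive; `functools.lru_cache` has the same property.)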
        try:
            for parse_field in field_parsers:
                fields.append(parse_field(parser, header, fields))
        except EndOfRecordException: pass
This seems like a change in behavior [swallowing the end of record exception, I mean]. Why was this added? Could it subtly change the behavior of how the parser is used in an unexpected way? I'm concerned that it might... can you provide more context here to derisk that concern?
I understand the concern. The EndOfRecordException handling in the createRecordParser method originates from the original appendFieldParser method, which served the purpose of organizing the order of actions and handling exceptions. Now the memorize decorator only provides the caching functionality; the organization of actions and the exception handling have been moved to the createRecordParser method (the original caller of appendFieldParser) for explicit implementation.
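The intended control flow, as I understand it, is the following (a standalone sketch using the names from this discussion, not the exact pystdf internals):

```python
class EndOfRecordException(Exception):
    """Raised by a field parser when the record's bytes run out early."""


def create_record_parser(field_parsers):
    def parse_record(parser, header):
        fields = []
        try:
            for parse_field in field_parsers:
                fields.append(parse_field(parser, header, fields))
        except EndOfRecordException:
            # A truncated record yields the fields parsed so far rather
            # than aborting -- the same behavior the original
            # appendFieldParser provided, just hoisted to the caller.
            pass
        return fields

    return parse_record
```

So the except clause is not new behavior; it is the old per-field handling made explicit in one place.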
The performance is further optimized: the time consumed is about 75% of master.

(pystdf) ➜ pystdf git:(speed_up) ✗ python test.py
Time taken to import data/lot3.stdf: 0.4064 seconds for 4.35 MB of data (lot3)
Time taken to import data/lot2.stdf: 0.4240 seconds for 4.21 MB of data (lot2)
Time taken to import data/demofile.stdf: 0.4097 seconds for 4.35 MB of data (demofile)
From a performance standpoint, batch processing of … would help as well. However, this PR is already quite complex, and introducing … would add further risk. To keep things manageable, I think it would be better to address the optimization of … separately. This way, we can focus on refining the current changes while ensuring clarity and maintainability. Let me know your thoughts!
By using closures instead of lambdas, caching the field-parser methods, and precompiling the regular-expression objects, a performance improvement of about 15% was achieved (mainly from the closures).
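To illustrate the closure change (names here are hypothetical, not pystdf's actual internals): the expensive setup work is done once when the parser is built and captured by the closure, instead of being re-evaluated on every call through a lambda.

```python
import struct


def make_u2_parser():
    # Compiled once at parser-construction time and captured by the
    # closure; a lambda built per call would redo this work each time.
    fmt = struct.Struct("<H")

    def parse_u2(buf, offset=0):
        """Read one little-endian unsigned 16-bit field from buf."""
        return fmt.unpack_from(buf, offset)[0]

    return parse_u2
```

The same shape applies to precompiled regular expressions: `re.compile(...)` in the factory, captured by the inner function.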
Here is my benchmark code:
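(The script itself was not captured in this thread; the harness below is a sketch that reproduces the timing-output format shown above, with `parse_stdf` as a stand-in for the real pystdf parsing call.)

```python
import os
import time


def benchmark(path, parse_stdf):
    """Time one import of an STDF file and format the result like the
    output quoted above. parse_stdf is assumed to fully consume the file."""
    start = time.perf_counter()
    parse_stdf(path)
    elapsed = time.perf_counter() - start
    size_mb = os.path.getsize(path) / (1024 * 1024)
    name = os.path.splitext(os.path.basename(path))[0]
    return (f"Time taken to import {path}: {elapsed:.4f} seconds "
            f"for {size_mb:.2f} MB of data ({name})")
```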