Performance issues on large files? #6

Open
Cobertos opened this issue Oct 17, 2021 · 2 comments

@Cobertos

I believe there's a performance issue when iterating over large tar files. I'm not sure whether the cause is the overall size or the number of entries.

Any chance this is fixable? Looking at the code, I don't think it's possible without writing a better native Python implementation of tar extraction.

Thanks for your library btw.

@beatsbears
Owner

Hey @Cobertos thanks for opening the issue. Off the top of my head I don't know of any obvious performance enhancements.

Any chance you have files where the performance is especially poor? I could potentially do some profiling to see which checks are the most expensive.
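One way such profiling could be started, as a rough sketch rather than anything from this library: the helper names (`build_archive`, `scan`, `profile_scan`) are hypothetical, the archive is synthetic, and only the standard-library `tarfile` scan is measured, so the safety checks themselves would still need to be profiled the same way.

```python
import cProfile
import io
import pstats
import tarfile


def build_archive(entries):
    # Hypothetical helper: an in-memory tar holding `entries` empty files.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for i in range(entries):
            tar.addfile(tarfile.TarInfo(f"file_{i}.txt"))
    buf.seek(0)
    return buf


def scan(buf):
    # One full pass over the archive headers.
    with tarfile.open(fileobj=buf, mode="r:") as tar:
        return len(tar.getmembers())


def profile_scan(entries=2000, top=10):
    # Profile a single scan and return (member count, text report).
    buf = build_archive(entries)
    profiler = cProfile.Profile()
    count = profiler.runcall(scan, buf)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return count, out.getvalue()


count, report = profile_scan()
print(report)
```

Running the same profile against a real slow archive, and against the library's own open call, should show whether the time goes to header parsing or to the checks.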

@catfag

catfag commented Oct 17, 2021

It's a bit sloppy, but it'd be possible to integrate the checks into CPython's tarfile implementation. I'm not sure it'll improve performance, but it might, since it would avoid iterating over the tar archive twice (which seems to be a very slow operation). It's probably sufficient to override next() in the tarsafe class and implement the checks there (near the bottom of the method); it seems that, because Python dispatches method calls dynamically, the parent class's methods will call this overridden next().
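A minimal sketch of that idea, under stated assumptions: `SinglePassTar`, `UnsafeMemberError`, `check_member`, and `build_demo_archive` are hypothetical names, and the single path check stands in for whatever checks the library actually performs. It relies on CPython's `tarfile.TarFile` reaching members through `self.next()`, which matches my reading of the standard library source but is not a documented guarantee.

```python
import io
import tarfile


class UnsafeMemberError(Exception):
    """Hypothetical error type for this sketch; not part of tarsafe."""


def check_member(member):
    # Placeholder check: reject absolute paths and ".." components.
    parts = member.name.replace("\\", "/").split("/")
    if member.name.startswith("/") or ".." in parts:
        raise UnsafeMemberError(f"unsafe path in archive: {member.name!r}")


class SinglePassTar(tarfile.TarFile):
    """Sketch: validate each member as the parent class reads it."""

    def next(self):
        # In CPython, TarFile.__init__, __iter__ and getmembers() read
        # headers via self.next(), so the check runs during that one scan.
        member = super().next()
        if member is not None:
            check_member(member)
        return member


def build_demo_archive(names):
    # Helper for the demo only: an in-memory tar of empty files.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            tar.addfile(tarfile.TarInfo(name))
    buf.seek(0)
    return buf


demo = build_demo_archive(["ok.txt", "dir/also-ok.txt"])
with SinglePassTar.open(fileobj=demo, mode="r:") as tar:
    print([m.name for m in tar])
```

One consequence of this design is that a bad member is only detected when the scan reaches it, so members extracted earlier in the same pass would already be on disk. Whether that trade-off is acceptable depends on how the checks are meant to be used.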
