ZIP exports to S3 are corrupt #475
Comments
@darm0ck 👋 |
Given that kobocat/onadata/settings/kc_environ.py Line 129 in 8bfaea5
...it makes sense that we have to write more than 50 MB to have an issue. The following code creates a valid ZIP:
stuffs = 'F' * 49 * 1024 * 1024
from onadata.libs.utils.export_tools import *
storage = get_storage_class()()
filename = 'aaa__test.zip'
out_f = storage.open(filename, 'wb')
import StringIO
sio = StringIO.StringIO(stuffs)
import zipfile
out_zip = zipfile.ZipFile(out_f, 'w', zipfile.ZIP_STORED, allowZip64=True)
out_zip.writestr('funstuff', stuffs)
out_zip.close()
out_f.seek(0)
out_f.close()
# test the generated zip
in_f = storage.open(filename, 'r')
in_zip = zipfile.ZipFile(in_f, 'r', allowZip64=True)
# "Read all the files in the archive and check their CRC’s and file headers. Return the name of the first bad file, or else return None."
if in_zip.testzip() is not None:
    print "yep, it's broken!"
storage.delete(filename)
...but changing the first line to
|
Uhh...
|
😢
|
I opened jschneier/django-storages#566. For now, I'll try to implement a wrapper class that forbids |
My fork needs to be updated for the Python 3 version of django-storages and boto3 |
using my fork of django-storages. See kobotoolbox/kobocat#475
@noliveleger, the fork has been updated: https://github.com/jnm/django-storages/commits/s3boto3_accurate_tell |
…oto3-accurate-tell Make `tell()` accurate when using s3boto3, by…
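To illustrate why an accurate `tell()` matters to `zipfile`, here is a toy model that runs without S3. It is not the code in the fork or in django-storages: `ResettingBuffer` and `CountingWriter` are hypothetical names, and the premise that the storage file's `tell()` restarts once a buffered part has been sent is an assumption inferred from the buffer-size threshold mentioned above.

```python
import io
import zipfile


class ResettingBuffer:
    """Hypothetical stand-in for a file that ships its buffer in parts.

    ASSUMPTION: tell() only reports the offset within the current part.
    """

    def __init__(self, part_size):
        self.part_size = part_size
        self.parts = []
        self._buf = io.BytesIO()

    def write(self, data):
        written = self._buf.write(data)
        if self._buf.tell() >= self.part_size:
            # Simulate uploading a part and starting a fresh buffer.
            self.parts.append(self._buf.getvalue())
            self._buf = io.BytesIO()
        return written

    def tell(self):
        return self._buf.tell()

    def flush(self):
        pass

    def getvalue(self):
        return b''.join(self.parts) + self._buf.getvalue()


class CountingWriter:
    """Hypothetical wrapper: counts bytes written so tell() stays accurate.

    It deliberately offers no seek(), so zipfile treats it as unseekable.
    """

    def __init__(self, raw):
        self._raw = raw
        self._pos = 0

    def write(self, data):
        self._raw.write(data)
        self._pos += len(data)
        return len(data)

    def tell(self):
        return self._pos

    def flush(self):
        self._raw.flush()


def write_zip(stream):
    with zipfile.ZipFile(stream, 'w', zipfile.ZIP_STORED, allowZip64=True) as zf:
        zf.writestr('funstuff', b'F' * 4096)


def zip_is_valid(data):
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return zf.testzip() is None
    except zipfile.BadZipFile:
        return False


raw = ResettingBuffer(part_size=1024)
write_zip(raw)  # zipfile records offsets taken from the misleading tell()
print('unwrapped valid:', zip_is_valid(raw.getvalue()))

raw = ResettingBuffer(part_size=1024)
write_zip(CountingWriter(raw))  # offsets now come from the byte counter
print('wrapped valid:', zip_is_valid(raw.getvalue()))
```

In this model the archive only goes bad once a single write pushes the buffer past `part_size`, which would be consistent with the 50 MB threshold observed earlier in the thread.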
This has reared its head again with the upgrade to Python 3. I think something about Python's zipfile module has changed: possibly it now attempts to seek back to the directory header to fill in previously-unknown CRC and length information after adding each file. I do notice flag 0x08 being set when the underlying file is not seekable; according to Wikipedia:
I originally said that I'd forbid
Internal discussion: https://chat.kobotoolbox.org/#narrow/stream/6-Support/topic/ZIP.20exports.20failing |
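The flag 0x08 observation can be checked locally without any storage backend. The sketch below assumes CPython 3.6 or later; `WriteOnlyStream` and `data_descriptor_flag` are hypothetical helpers written for this illustration. It writes the same one-member archive to a seekable and to a non-seekable target, then reads back the member's general-purpose flag bits. It shows only what `zipfile` does, not what goes wrong on S3.

```python
import io
import zipfile


class WriteOnlyStream:
    """Hypothetical stream exposing write() and flush() but no seek()/tell()."""

    def __init__(self):
        self._buf = io.BytesIO()

    def write(self, data):
        return self._buf.write(data)

    def flush(self):
        pass

    def getvalue(self):
        return self._buf.getvalue()


def data_descriptor_flag(stream):
    """Write a one-member archive to `stream`; report whether bit 0x08 is set."""
    with zipfile.ZipFile(stream, 'w', zipfile.ZIP_STORED, allowZip64=True) as zf:
        zf.writestr('funstuff', b'F' * 1024)
    with zipfile.ZipFile(io.BytesIO(stream.getvalue())) as zf:
        # Both archives should read back cleanly.
        assert zf.testzip() is None
        return bool(zf.getinfo('funstuff').flag_bits & 0x08)


print('seekable target:', data_descriptor_flag(io.BytesIO()))
print('non-seekable target:', data_descriptor_flag(WriteOnlyStream()))
```

When the bit is set, the CRC and sizes follow the file data in a data descriptor, so `zipfile` does not need to seek back to patch the local header.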
Not critical anymore; let's do future work in #743. |
To reproduce, enter the KC Django shell and run:
Will whittle this down in future comments.