Automate zipfile extraction #23

@cutright

It would be helpful to port the automated zip file extraction code from the previous version of this repo:

https://github.com/cutright/IMRT-QA-Data-Miner/blob/85abf9dc66a139c02574c386377f46f0944c5893/IQDM/utilities.py#L190-L208

import zipfile
from os import walk
from os.path import join, splitext


def extract_files_from_zipped_files(init_directory, extract_to_path, extension='.pdf'):
    """
    Function to extract files (.pdf by default) from zipped files
    :param init_directory: initial top-level directory to walk through
    :type init_directory: str
    :param extract_to_path: directory to extract files into
    :type extract_to_path: str
    :param extension: file extension of file type to extract, set to None to extract all files
    :type extension: str or None
    """
    for dirName, subdirList, fileList in walk(init_directory):  # iterate through files and all sub-directories
        for fileName in fileList:
            # lower must be called; comparing the bound method to a string is always False
            if splitext(fileName)[1].lower() == '.zip':
                zip_file_path = join(dirName, fileName)
                with zipfile.ZipFile(zip_file_path, 'r') as z:
                    for info in z.infolist():
                        # skip directory entries stored inside the archive
                        if not info.is_dir() and (extension is None or splitext(info.filename)[1].lower() == extension):
                            z.extract(info, path=extract_to_path)
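
For reference, here is a minimal sketch of how a ported version might look on Python 3 with `pathlib`. It is only an illustration under stated assumptions: the name `extract_matching_files` and the returned list of extracted paths are hypothetical and not part of IQDM, and members keep their archive-relative sub-paths under `extract_to_path`, as `ZipFile.extract` does by default.

```python
import zipfile
from pathlib import Path


def extract_matching_files(init_directory, extract_to_path, extension=".pdf"):
    """Extract files with a given extension from every .zip under init_directory.

    Hypothetical helper for illustration, not part of IQDM.
    Set extension to None to extract all files.
    Returns the list of paths that were extracted.
    """
    extracted = []
    wanted = extension.lower() if extension is not None else None
    # rglob("*") visits init_directory and all of its sub-directories
    for zip_path in sorted(Path(init_directory).rglob("*")):
        if not zip_path.is_file() or zip_path.suffix.lower() != ".zip":
            continue
        with zipfile.ZipFile(zip_path, "r") as z:
            for info in z.infolist():
                if info.is_dir():
                    continue  # skip directory entries stored in the archive
                if wanted is None or Path(info.filename).suffix.lower() == wanted:
                    # extract() returns the path of the file it created
                    extracted.append(z.extract(info, path=extract_to_path))
    return extracted
```

A call such as `extract_matching_files("reports", "extracted_pdfs")` (both directory names are placeholders) would then collect every PDF found in zip files below `reports`. Returning the extracted paths is a design choice of this sketch so the caller can pass them straight to a parser.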

Metadata

Labels: enhancement (New feature or request)