motion-blur-microscopy

Table of Contents

  1. Introduction
  2. Training Code/Data Navigation
  3. Analysis Code/Data Navigation

Introduction

Hello, and welcome to the GitHub page which houses the code and data used for the paper Motion Blur Microscopy by Goreke, Gonzales, Shipley, et al. In this repository, you will find a complete collection of the code used, along with data used or generated, in producing the results for the paper. In the paper, we developed a machine learning protocol for analysing adhered cells within the Motion Blur Microscopy framework. The relevant code and data can be largely split into two categories:

  1. Training Code/Data
  2. Analysis Code/Data

Code and data for the two categories can be found in the Training_Material and Analysis_Material directories, respectively. Please note that this GitHub repository only contains representative inputs and results. If a reader wants access to the complete data set for the project, please navigate to our OSF repository here. The following sections will guide readers on how to navigate both categories.

The most recent stable versions of libraries which can be used to run all of the code are as follows:

  • python 3.9.15
  • matplotlib 3.6.2
  • numpy 1.23.4
  • tensorflow 2.10.0
  • keras 2.10.0
  • opencv 4.6.0
  • pandas 1.5.2
  • scikit-image 0.18.1
  • scipy 1.9.3
  • scikit-learn 1.2.2
  • statsmodels 0.13.5
  • albumentations 1.3.1

Stable versions for each individual piece of code are also listed in the sections that follow, in case you are only interested in running parts of the code at a time.
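
As a convenience, the pinned versions listed above can be compared against a local environment with a short script. This is a minimal sketch rather than part of the repository: the PyPI distribution names (for example `opencv-python` for OpenCV) are assumptions, and the helper names `installed_version` and `find_mismatches` are hypothetical.

```python
from importlib import metadata

# Pinned versions from the list above. The PyPI distribution names are
# assumptions; OpenCV in particular is commonly installed as "opencv-python".
PINNED = {
    "matplotlib": "3.6.2",
    "numpy": "1.23.4",
    "tensorflow": "2.10.0",
    "keras": "2.10.0",
    "opencv-python": "4.6.0",
    "pandas": "1.5.2",
    "scikit-image": "0.18.1",
    "scipy": "1.9.3",
    "scikit-learn": "1.2.2",
    "statsmodels": "0.13.5",
    "albumentations": "1.3.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        # Some wheels append a build number (e.g. 4.6.0.66), so a version
        # that extends the pinned one with a further ".N" is accepted.
        matches = found is not None and (
            found == wanted or found.startswith(wanted + ".")
        )
        if not matches:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: wanted {wanted}, found {found or 'not installed'}")
```

Running the script prints one line per package that is missing or differs from the list, and prints nothing when the environment matches.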

Training Code/Data Navigation

When you enter the Training_Material directory, you will notice many sub-directories. The idea here is that we want to decompose the training process into pieces that are more accessible to the community. In each sub-directory, there will be a corresponding Jupyter notebook, as well as sub-sub-directories, which contain inputs to the code and outputs from the code that were used and produced in our work. The "official" order of sub-directories to be followed (with descriptions) is as follows:

Phase 1 Code/Data

Phase 1 of the machine learning workflow is a semantic segmentation network, whose job is to classify every pixel of an input motion blur microscopy image into one of two categories: background or adhered.

  • Complete Mask Coloring As a starting point, one or more training images, or frames from a training video, should be manually colored. The user should color over all of the "adhered" regions of the training images, and leave the rest of the image untouched. We personally labeled our images using the software GIMP. The code in this sub-directory will take an image, or images, labeled in this way and fill all non-colored regions as background, which is denoted by the color black [0,0,0] (RGB).

    • Input: Partially colored masks, where the "adhered" regions of the original images are colored.
    • Output: A fully colored mask, where the "adhered" regions of the original image are colored the same as the input, and the rest of the pixels in the image are colored in black for "background".

    This code was last run without errors with the following library versions:

    • python 3.9.15
    • matplotlib 3.6.2
    • numpy 1.23.4
  • Label And Layer Masks The code in this sub-directory will convert the completed colored masks into label encoded regions. On top of this, the code will convert the label encoded regions into layered one-hot encoded regions.

    • Input: Fully colored masks.
    • Output: Label encoded masks. Also, layered one-hot encoded masks.

    This code was last run without errors with the following library versions:

    • python 3.9.15
    • matplotlib 3.6.2
    • numpy 1.23.4
    • tensorflow 2.10.0
    • keras 2.10.0
  • Split Into Tiles The phase one network takes as input regions of a specific size, namely 128x128 pixels. The code here generates tiles of this size from our input images/colored masks/label encoded masks/one-hot encoded masks by first splitting each image into 150x150 pixel chunks, and then rescaling them to 128x128.

    • Input: Original images, colored masks, label encoded masks, and one-hot encoded masks.

    • Output: All of the input images/masks split into 150x150 pixel tiles and 128x128 pixel tiles.

    This code was last run without errors with the following library versions:

    • python 3.9.15
    • matplotlib 3.6.2
    • numpy 1.23.4
    • opencv 4.6.0
  • Extract Training Tiles Class Distributions The code in this sub-directory will take a collection of extracted tiles from different sources and, based on a user-specified cutoff, split them into a training set and a validating/testing set.

    • Input: A collection of extracted tiles from different sources.

    • Output: A .csv file which lists each of the tiles, and their corresponding category of training vs. validating/testing, as well as the number of adhered vs. not-adhered pixels.

    This code was last run without errors with the following library versions:

    • python 3.9.15
    • matplotlib 3.6.2
    • numpy 1.23.4
    • pandas 1.5.2
  • Input: A .csv file containing tile names with a classification of training vs. validating/testing as well as counts for each of adhered vs. not-adhered pixels.

  • Output: Three .csv files, one containing the training tile names and information, one containing the validating tile names and information, and one containing the testing tile names and information. The extracted tiles are now stratified with regards to adhered pixels vs. not-adhered pixels.

    This code was last run without errors with the following library versions:

    • python 3.9.15
    • numpy 1.23.4
    • pandas 1.5.2
  • Input: Three .csv files, one containing the training tile names and information, one containing the validating tile names and information, and one containing the testing tile names and information. The extracted tiles are now stratified with regards to adhered pixels vs. not-adhered pixels.

  • Output: Three .csv files, one containing the training tile names as .png and .npy files, one containing the validating tile names as .png and .npy files, and one containing the testing tile names as .png and .npy files.

    This code was last run without errors with the following library versions:

    • python 3.9.15
    • pandas 1.5.2
  • Optimize Phase One In this step, we optimize the hyper-parameters which define the phase one training. Namely, we optimize the loss function alpha value, the patience, and the learning rate of our optimizer.
    • Input: Original image tiles, as well as one-hot encoded mask tiles of size 128x128 pixels.
    • Output: An array of trained network outputs, where each output is the Jaccard loss on a set of testing tiles. The optimal hyper-parameters are those that minimize the Jaccard loss on the testing tiles after training.

    This code was last run without errors with the following library versions:

    • python 3.9.15
    • matplotlib 3.6.2
    • numpy 1.23.4
    • opencv 4.6.0
    • tensorflow 2.10.0
    • keras 2.10.0
    • pandas 1.5.2
    • albumentations 1.3.1
  • Train Phase One Here we actually train the phase one segmentation network. The network architecture is inspired by U-Net, and the Hinczewski Lab's previous work.

    • Input: Original image tiles, as well as one-hot encoded mask tiles of size 128x128 pixels.
    • Output: A trained network, as a .h5 file, which includes the network architecture and the associated weights, all in one.

    This code was last run without errors with the following library versions:

    • python 3.9.15
    • matplotlib 3.6.2
    • numpy 1.23.4
    • opencv 4.6.0
    • tensorflow 2.10.0
    • keras 2.10.0
    • pandas 1.5.2
    • albumentations 1.3.1
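
The tiling described in the Split Into Tiles step can be sketched as follows. This is an illustrative outline, not the notebook's code: the function name `tile_boxes` is hypothetical, and dropping partial tiles at the right and bottom edges is an assumption, since the README does not say how image borders are handled.

```python
def tile_boxes(height, width, tile=150):
    """Return (top, left, bottom, right) boxes that cover an image with
    non-overlapping square tiles.

    Partial tiles at the right and bottom edges are dropped, which is an
    assumption of this sketch.
    """
    return [
        (top, left, top + tile, left + tile)
        for top in range(0, height - tile + 1, tile)
        for left in range(0, width - tile + 1, tile)
    ]


# Example: a 480x640 image yields a 3x4 grid of 150x150 tiles.
boxes = tile_boxes(480, 640)
print(len(boxes), "tiles; first:", boxes[0], "last:", boxes[-1])
```

Each box can then be used to slice the image array, for example `image[top:bottom, left:right]`, and the crop rescaled to the network input size with a call such as `cv2.resize(crop, (128, 128))`. For label encoded masks, nearest-neighbour interpolation (`cv2.INTER_NEAREST`) avoids blending class values during rescaling.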

Phase 2 Code/Data

  • Extract Phase Two Regions In this sub-directory, you will find code which takes regions identified by the phase one segmentation network as adhered, and extracts a 40x40 pixel square centered on the adhered region. These regions will need to be manually classified by cell type by the user for use in the phase 2 network training. The purpose of this code is to speed up the process of identifying regions from the phase one network to be manually classified. In our analysis, we rescale the color of the inputs.

    • Input: Raw MBM images or frames from MBM videos.
    • Output: 40x40 pixel regions corresponding to areas of the raw images or frames classified as "adhered" by the phase one network.

    This code was last run without errors with the following library versions:

    • python 3.9.15
    • matplotlib 3.6.2
    • numpy 1.23.4
    • opencv 4.6.0
    • tensorflow 2.10.0
    • scikit-image 0.18.1
    • scipy 1.9.3
  • Create Train Split In this sub-directory, you will find code which splits manually labeled cells into a training set, validation set, and testing set. The splits can be adjusted as the user wants.

    • Input: Manually classified images, of size 40x40 pixels, of regions identified as "adhered" by the phase one network.

    • Output: The input images will be split into a training set, validation set, and testing set, used for training the phase 2 network.

    This code was last run without errors with the following library versions:

    • python 3.9.15
    • matplotlib 3.6.2
    • numpy 1.23.4
  • Optimize Phase Two In this sub-directory, you will find code which optimizes the hyper-parameters which define the phase two network. Namely, the architecture, patience, and learning rate.

    • Input: Manually classified regions identified from the phase one network split into training/validating/testing sets.

    • Output: An array of trained network outputs, where each output is the parameter set and the trained network's Jaccard loss on a testing data set.

  • Train Phase Two In this sub-directory, you will find code which uses transfer learning to train a ResNet-50 network architecture, with weights pre-trained on ImageNet, to distinguish cell types from one another.

    • Input: Manually classified regions identified from the phase one network split into training and validation sets.

    • Output: A trained VGG16 network, which can be used to classify adhered regions identified by the phase one segmentation network.

    This code was last run without errors with the following library versions:

    • python 3.9.15
    • tensorflow 2.10.0
    • keras 2.10.0
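
For the Extract Phase Two Regions step, the 40x40 pixel window centered on an adhered region can be computed as in the sketch below. The function name `crop_window` is hypothetical, and shifting the window inward when a region lies near the image border is an assumption; the notebook may pad or skip such regions instead.

```python
def crop_window(center_row, center_col, height, width, size=40):
    """Return (top, left, bottom, right) for a size x size crop centred on
    a region, shifted inward where it would cross the image border."""
    if size > height or size > width:
        raise ValueError("image is smaller than the requested crop")
    half = size // 2
    # Clamp the top-left corner so the whole window stays inside the image.
    top = min(max(int(round(center_row)) - half, 0), height - size)
    left = min(max(int(round(center_col)) - half, 0), width - size)
    return top, left, top + size, left + size


# A region in the interior, and one close to the top-right corner.
print(crop_window(100, 200, 480, 640))
print(crop_window(5, 635, 480, 640))
```

The center coordinates could come from the centroid of each connected region in the phase one output; `skimage.measure.regionprops`, for instance, reports centroids in (row, column) order.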

Analysis Code/Data Navigation

When you enter the Analysis_Material directory, you will notice many sub-directories. The idea here, just as with the Training_Material directory, is to decompose all of the analysis code and data into smaller chunks that are more easily understandable for a reader. In each sub-directory, you will find a Jupyter notebook, as well as sub-sub-directories, which contain inputs and outputs that we used/generated in our work. The "official" order to run the code in is as follows:

Results Generation

The code in this section of the readme is used to take raw inputs, with the trained machine learning networks, and generate data from them. These results might be counts of cells, or morphological features for static images, or dynamic quantities in the case of videos.

  • Extract Morphological Features This sub-directory will have two relevant scripts. The first script can be used to extract morphological features (size, eccentricity) of regions classified as "adhered" by the phase 1 network. The second script extracts the same features, but for input images that are color adjusted.

    • Input: Raw MBM images or MBM video frames.

    • Output: Two .npy numpy arrays containing all of the region sizes and eccentricities.

      These scripts were last run without errors with the following library versions:

      • python 3.9.15
      • matplotlib 3.6.2
      • numpy 1.23.4
      • opencv 4.6.0
      • tensorflow 2.10.0
      • scipy 1.9.3
      • scikit-image 0.18.1
  • Count Cells This sub-directory will have two relevant scripts. One of the scripts can be used to count cells for input MBM images or MBM frames using a size threshold. The second script can be used to count cells for input MBM images or MBM frames using a phase 2 classification network.

    • Input: Raw MBM images or MBM video frames.

    • Output: A .csv file which contains counts of relevant cells for all of the input images.

      These scripts were last run without errors with the following library versions:

      • python 3.9.15
      • matplotlib 3.6.2
      • numpy 1.23.4
      • opencv 4.6.0
      • tensorflow 2.10.0
      • keras 2.10.0
      • scipy 1.9.3
      • scikit-image 0.18.1
      • pandas 1.5.2
  • Complete F1 Analysis The code in this sub-directory can be used to complete an F1 analysis for the phase two classification network.

    • Input: The input will be "adhered" regions identified by the phase one segmentation network that were NOT used in the training or validation of the phase 2 classification network.

    • Output: The output will be three .npy numpy arrays containing the precision, recall, and F1 score values for a range of confidence thresholds.

      This code was last run without errors with the following library versions:

      • python 3.9.15
      • matplotlib 3.6.2
      • numpy 1.23.4
      • opencv 4.6.0
      • tensorflow 2.10.0
  • Video Analysis

    • Input: The Phase 1 model path in H5 format, video path in AVI or MP4, the cell type ("srbc", "cart", or "custom"), and whether or not to automatically take the convex hull of all regions found in segmentation ('y' or 'n'). OPTIONAL: the range of frames to analyze (this can be greater than the total frames, it will just stop analysis once all frames have been processed), and if a custom threshold is desired, the thresholds as: [minimum area to consider],[minimum area to begin adhesion event],[number of frames a cell may "disappear"].

VALID ARGUMENTS: If arguments are invalid, the program will either raise a warning or interrupt the process.

The requirements should be installed for local use:

pip install -r Analysis_Material/Video_Analysis/requirements.txt

To run this process from the command line, use the following format (once again, frames_range and custom_thresh are optional parameters and may be excluded):

python Analysis_Material/Video_Analysis/MBM_videprocessing.py model='model.h5' video_path='video_path.avi' celltype='cart' autoconvex='n' frames_range=1,10000 custom_thresh=40,40,10

There is also a Colab version of the file, Analysis_Material/Video_Analysis/MBM_videoprocessing_colab.ipynb, that has the user select these inputs. This version is recommended for anyone who would like to run the analysis without installing anything, and it has less of a learning curve. The downside is that less RAM is available, so you may have to process a long video in segments by defining the frames_range parameter. To use this code, navigate to the file on GitHub (here), click "Open in Colab" at the top of the page, and you should be able to run the program.

  • Output: By running either the Colab or local file, two CSV files will be produced:

    • The 'static' file: video_filename_static.csv allows the user to explore frame-by-frame cell properties.
    • The 'dynamic' file: video_filename_dynamic.csv gives an overview of each cell's behavior.
  • Transform Tracks Into Dynamic Data The code in this sub-directory will take tracks generated from the video analysis of MBM videos and transform the data into dynamics data.

    • Input: The input to the code will be tracks data generated from the video analysis of MBM videos.

    • Output: The output of the code will be dynamics data, which can be more easily used to generate results.

      This code was last run without errors with the following library versions:

      • python 3.9.15
      • numpy 1.23.4
      • pandas 1.5.2
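
The Complete F1 Analysis step reports precision, recall, and F1 score over a range of confidence thresholds. The sketch below shows that computation for a single cell type treated as a binary problem, which is a simplifying assumption; the function names are hypothetical and the notebook's own implementation may differ.

```python
def precision_recall_f1(true_labels, scores, threshold):
    """Score binary predictions made by thresholding confidence values."""
    truth = [bool(label) for label in true_labels]
    predicted = [score >= threshold for score in scores]
    tp = sum(p and t for p, t in zip(predicted, truth))
    fp = sum(p and not t for p, t in zip(predicted, truth))
    fn = sum(t and not p for p, t in zip(predicted, truth))
    # Report 0.0 instead of dividing by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def sweep_thresholds(true_labels, scores, thresholds):
    """Evaluate precision, recall, and F1 at each confidence threshold."""
    return [precision_recall_f1(true_labels, scores, t) for t in thresholds]


# Toy data: 1 marks a region that truly belongs to the cell type.
labels = [1, 1, 0, 0, 1]
confidences = [0.9, 0.6, 0.7, 0.2, 0.4]
for t, (p, r, f) in zip([0.5, 0.8], sweep_thresholds(labels, confidences, [0.5, 0.8])):
    print(f"threshold={t}: precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Raising the threshold typically trades recall for precision, which is why the F1 score is examined across a range of thresholds rather than at a single value.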

Results Analysis

The code in this section is used to take data generated from raw inputs and create plots, tables, or any other sort of important representation of the results for the paper.

  • Create Hexplots This sub-directory will have code which will convert input region areas and eccentricities into a hexplot.

    • Input: Areas and eccentricities of regions identified by the phase 1 segmentation network as "adhered".

    • Output: A hexplot of the region areas and eccentricities.

      This code was last run without errors with the following library versions:

      • python 3.9.15
      • matplotlib 3.6.2
      • numpy 1.23.4
      • pandas 1.5.2
  • Create Reproducibility Plots The code in this sub-directory can create inter- and intra-reproducibility plots.

    • Input: A .csv file containing counts generated from two different experimenters at different times of an MBM experiment.

    • Output: Two plots, one for inter-reproducibility, and another for intra-reproducibility.

      This code was last run without errors with the following library versions:

      • python 3.9.15
      • matplotlib 3.6.2
      • numpy 1.23.4
      • pandas 1.5.2
  • Create R Squared Plots The code in this sub-directory can be used to generate R-Squared comparison plots between manual and automated counts.

    • Input: The inputs are two .csv files, one containing manual counts, and the other containing automated counts, for a variety of input MBM images or MBM video frames.

    • Output: The output is a plot which compares the automated and manual counts, with a reported r-squared, which describes how closely the two datasets agree.

      This code was last run without errors with the following library versions:

      • python 3.9.15
      • matplotlib 3.6.2
      • numpy 1.23.4
      • pandas 1.5.2
      • scikit-learn 1.2.2
  • Create R Squared Plots With Groupings The code in this sub-directory can be used to generate R-Squared comparison plots between aggregated manual and aggregated automated counts.

    • Input: The inputs are two .csv files, one containing manual counts, and the other containing automated counts, for a variety of input MBM images or MBM video frames.

    • Output: The output is a plot which compares the aggregated automated and aggregated manual counts, with a reported r-squared, which describes how closely the two datasets agree.

      This code was last run without errors with the following library versions:

      • python 3.9.15
      • matplotlib 3.6.2
      • numpy 1.23.4
      • pandas 1.5.2
      • scikit-learn 1.2.2
  • Create Adhesion Time Probability Plots The script in this sub-directory can be used to generate adhesion time probability plots.

    • Input: The input here is a .csv file with dynamics data of MBM videos.

    • Output: The output of the script is an adhesion time probability plot.

      This code was last run without errors with the following library versions:

      • python 3.9.15
      • matplotlib 3.6.2
      • numpy 1.23.4
      • pandas 1.5.2
  • Create Eccentricity Vs Adhesion Time Plot The script in this sub-directory can be used to generate eccentricity vs. adhesion time plots.

    • Input: The input here is a .csv file with dynamics data of MBM videos.

    • Output: The output of the script is an eccentricity vs. adhesion time plot.

      This code was last run without errors with the following library versions:

      • python 3.9.15
      • matplotlib 3.6.2
      • numpy 1.23.4
      • pandas 1.5.2
      • statsmodels 0.13.5
  • Create Average Velocity Plots The script in this sub-directory can be used to generate a variety of average velocity plots. In particular, the user can generate with flow and cross flow average velocity probability plots, as well as with flow vs. adhesion time and cross flow vs. adhesion time plots.

    • Input: The input for the script is a .csv file with dynamics data of MBM videos.

    • Output: The script produces four plots: a with flow average velocity probability plot, a cross flow average velocity probability plot, a with flow average velocity vs. adhesion time plot, and a cross flow average velocity vs. adhesion time plot.

      This code was last run without errors with the following library versions:

      • python 3.9.15
      • matplotlib 3.6.2
      • numpy 1.23.4
      • pandas 1.5.2
      • scipy 1.9.3
  • Create Velocity Distribution And Track Plots The script in this sub-directory can be used to generate colored velocity distribution plots and also track plots, which show various adhesion event trajectories of MBM videos.

    • Input: The input for the script is a .csv file containing track data for each adhesion event in an MBM video.

    • Output: The output of the script is a colored with flow average velocity distribution plot, as well as a tracks plot showing trajectories of adhesion events.

      This code was last run without errors with the following library versions:

      • python 3.9.15
      • matplotlib 3.6.2
      • numpy 1.23.4
      • pandas 1.5.2
  • Create F1 Plot The script in this sub-directory can be used to generate an F1 plot for various confidence thresholds.

    • Input: The input for the script is a .csv file containing F1 values for various confidence thresholds.

    • Output: The output of the script is a plot of F1 score vs. confidence threshold.

      This code was last run without errors with the following library versions:

      • python 3.9.15
      • matplotlib 3.6.2
      • numpy 1.23.4
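
The r-squared value reported by the Create R Squared Plots step can be illustrated with the coefficient of determination, treating manual counts as the reference and automated counts as the prediction. This is one common definition and an assumption here: the notebooks may instead report the squared correlation of a fitted line. The function name `r_squared` is hypothetical.

```python
def r_squared(manual, automated):
    """Coefficient of determination of automated counts against manual
    counts: 1 - (residual sum of squares / total sum of squares)."""
    if len(manual) != len(automated) or not manual:
        raise ValueError("need two non-empty sequences of equal length")
    mean_manual = sum(manual) / len(manual)
    ss_res = sum((m - a) ** 2 for m, a in zip(manual, automated))
    ss_tot = sum((m - mean_manual) ** 2 for m in manual)
    if ss_tot == 0:
        raise ValueError("manual counts have zero variance")
    return 1.0 - ss_res / ss_tot


# Toy counts for four images; real values would come from the two .csv files.
manual_counts = [10, 20, 30, 40]
automated_counts = [12, 18, 33, 39]
print(f"r-squared = {r_squared(manual_counts, automated_counts):.3f}")
```

With this definition, a value of 1 means the automated counts reproduce the manual counts exactly, and lower values indicate larger disagreement relative to the spread of the manual counts. The same quantity is available as `sklearn.metrics.r2_score(manual, automated)`.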

Model download

Our models can be downloaded from Hugging Face at ayeshagonzales/MBM_Model.