
Sorghum Phenophase Bayesian Belief Network in R and Python


rbartelme/phenophasebbn


As of October 29, 2021, this project is no longer maintained in this repository. For further updates to this project, please see the GenoPhenoEnvo organization repository.

Sorghum bicolor bicolor Phenophase Bayesian Belief Network in R & Python

This project uses:

  • Rocker Group's Tidyverse R 4.0 Ubuntu 18 LTS docker container image
  • data from the TERRA-REF project accessed through the traits R package
  • JAGS for Gibbs-sampled MCMC modeling
  • causalnex to implement the NO TEARS directed acyclic graph structure learning algorithm as described here
  • pandas, scikit-learn, and igraph, which causalnex depends on

The goal is to develop a causal Bayesian network, also known as a Bayesian Belief Network, that predicts growth rate as a phenotype of the Sorghum bicolor biomass accumulation panel.

This analysis produces a causal inference Bayesian Belief Network in the spirit of Judea Pearl's work, where the nodes (vertices) of the network represent variables and the edges (arcs) represent dependencies quantified by conditional probabilities.
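Because each node carries a conditional probability table given its parents, a Bayesian Belief Network factorizes the joint distribution as P(X1, ..., Xn) = product of P(Xi | parents(Xi)). The minimal Python sketch below illustrates this on a hypothetical three-node network (temperature and genotype as parents of growth); the node names, states, and probabilities are invented for illustration and are not taken from this analysis.

```python
from itertools import product

# Hypothetical root-node priors (these nodes have no parents).
p_temp = {"low": 0.4, "high": 0.6}
p_genotype = {"A": 0.5, "B": 0.5}

# Hypothetical CPT for the child node: P(growth | temp, genotype).
p_growth = {
    ("low", "A"): {"slow": 0.7, "fast": 0.3},
    ("low", "B"): {"slow": 0.5, "fast": 0.5},
    ("high", "A"): {"slow": 0.4, "fast": 0.6},
    ("high", "B"): {"slow": 0.2, "fast": 0.8},
}


def joint(temp, genotype, growth):
    """P(temp, genotype, growth) = P(temp) * P(genotype) * P(growth | temp, genotype)."""
    return p_temp[temp] * p_genotype[genotype] * p_growth[(temp, genotype)][growth]


def marginal_growth(state):
    """P(growth = state), summing the joint over every parent configuration."""
    return sum(joint(t, g, state) for t, g in product(p_temp, p_genotype))


def posterior_temp(temp, growth):
    """P(temp | growth) by Bayes' rule, marginalizing out genotype."""
    numerator = sum(joint(temp, g, growth) for g in p_genotype)
    return numerator / marginal_growth(growth)


print(round(marginal_growth("fast"), 4))         # 0.58
print(round(posterior_temp("high", "fast"), 4))  # 0.7241
```

The same factorization is what lets a fitted network answer "what if" queries: observing fast growth raises the probability of high temperature from 0.6 to about 0.72 in this toy example.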


Methods

Docker Setup

To run any part of this analysis, it is recommended that Docker be installed on the host machine. Alternatively, use singularity-ce to run the containers on high-performance computing clusters.

Running the Analyses with Docker

  • All R scripts detailed below, including the growth rate modeling, can be run with the container image cyversevice/rstudio-bayes-cpu:4.0-ubuntu-jags
  • All Python code runs from the command line in the rbartelme/pytorch-causalnex:0.10.0 Docker container image and is written to expect this repository mounted as a volume at /work/phenophasebbn/
    • Ex. docker run --rm -it -v /local/path/to/phenophasebbn/:/work/phenophasebbn/ rbartelme/pytorch-causalnex:0.10.0 python /work/phenophasebbn/bbn/bbn_structure.py (See note below)
    • The current Dockerfile for this image is contained in this repository at /causal_nex/Dockerfile
  • A JupyterLab Docker container image has been created to facilitate the exploration of the python codebase

Initial Graph Embedding

[Figure: Initial Graph]

To speed up directed acyclic graph (DAG) generation for the Bayesian Belief Network, an initial graph was instantiated from lists of tuples that encode the node/edge connections and edge directions outlined in the conceptual diagram above.
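In this project the tuple lists are passed to causalnex's StructureModel(). As a standard-library sketch of the underlying idea, the code below treats each (parent, child) tuple as a directed edge and uses Kahn's topological sort to confirm that the expert-knowledge graph is actually acyclic before any structure learning starts. The node names are hypothetical placeholders, not the variables used in bbn_structure.py.

```python
from collections import defaultdict, deque

# Hypothetical expert-knowledge edges as (parent, child) tuples.
init_edges = [
    ("season", "temperature"),
    ("season", "precipitation"),
    ("temperature", "growth_rate"),
    ("precipitation", "growth_rate"),
    ("genotype", "growth_rate"),
]


def build_graph(edges):
    """Return an adjacency dict {parent: [children]} and the set of all nodes."""
    adj = defaultdict(list)
    nodes = set()
    for parent, child in edges:
        adj[parent].append(child)
        nodes.update((parent, child))
    return adj, nodes


def topological_order(edges):
    """Kahn's algorithm: return a topological order, or raise ValueError on a cycle."""
    adj, nodes = build_graph(edges)
    indegree = {n: 0 for n in nodes}
    for children in adj.values():
        for child in children:
            indegree[child] += 1
    # Start from root nodes (no incoming edges); sorted for a deterministic order.
    queue = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in adj[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(nodes):
        raise ValueError("edge list contains a cycle; it is not a DAG")
    return order


print(topological_order(init_edges))
# ['genotype', 'season', 'temperature', 'precipitation', 'growth_rate']
```

Adding an edge such as ("growth_rate", "season") would close a cycle, and topological_order would raise ValueError instead of returning an order.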

NOTE: Learning the graph structure with the causalnex NO TEARS implementation, without any expert-knowledge graph encodings and without GPU acceleration, is computationally intensive and may fail to find a graph structure for the Sorghum gene data included in these analyses.
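For context on that cost: in its linear form, NO TEARS casts structure learning as a continuous optimization over a weighted adjacency matrix W for d variables observed in an n × d data matrix X, with a smooth constraint that is zero exactly when W encodes a DAG (∘ is the elementwise product, λ an L1 penalty weight):

```latex
\min_{W \in \mathbb{R}^{d \times d}} \; \frac{1}{2n} \lVert X - XW \rVert_F^2 + \lambda \lVert W \rVert_1
\quad \text{subject to} \quad
h(W) = \operatorname{tr}\!\left(e^{W \circ W}\right) - d = 0
```

Every evaluation of h(W) and its gradient requires a d × d matrix exponential, so the work grows quickly as gene-level variables are added to d. Constraining the search with expert-knowledge edges and blacklisted edges shrinks the set of structures the optimizer has to consider.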


Network Workflow Description

How the contents of this repository were used to generate the analysis.

1. Processing raw data:

  • Weather & phenotype data processing:
    • Code: /bnprocess_functional.R
    • Exports (TSV):
      • /season4_combined.txt
      • /season6_combined.txt
      • /ksu_combined.txt (No longer used in final analysis)
  • Genomic Data:
    • Code to process the SNP frequency by Sorghum bicolor gene table from this repository can be found in /genomic_preprocessing/snp_normalization.R
    • Exports (TSV):
      • /genomic_preprocessing/genewise_snp_relative_abundance.txt, where the relative abundance of single nucleotide polymorphisms is calculated relative to the Sorghum bicolor biomass accumulation panel population
  • Development work:
    • notes and pseudocode are in /sandbox/ and /bnprocess_mac.R
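The exact normalization lives in snp_normalization.R and is not reproduced here. As a hedged illustration only, the sketch below assumes one plausible reading of "relative to the panel population": each cultivar's SNP count for a gene is divided by that gene's total SNP count across the whole panel, and the wide table is written out as TSV. The cultivar IDs, gene names, and counts are hypothetical.

```python
import csv
import io

# Hypothetical SNP counts: {cultivar: {gene: count}}.
snp_counts = {
    "PI_1": {"gene_a": 2, "gene_b": 0},
    "PI_2": {"gene_a": 6, "gene_b": 5},
    "PI_3": {"gene_a": 2, "gene_b": 5},
}


def relative_abundance(counts):
    """ASSUMED normalization: cultivar count / panel-wide total, per gene."""
    genes = sorted({g for per_gene in counts.values() for g in per_gene})
    totals = {g: sum(per_gene.get(g, 0) for per_gene in counts.values()) for g in genes}
    return {
        cultivar: {g: (per_gene.get(g, 0) / totals[g] if totals[g] else 0.0) for g in genes}
        for cultivar, per_gene in counts.items()
    }


def to_tsv(table):
    """Serialize the wide table (one row per cultivar, one column per gene) as TSV text."""
    genes = sorted(next(iter(table.values())))
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["cultivar", *genes])
    for cultivar in sorted(table):
        writer.writerow([cultivar, *(f"{table[cultivar][g]:.3f}" for g in genes)])
    return buf.getvalue()


print(to_tsv(relative_abundance(snp_counts)))
```

Under this assumption each gene's column sums to 1 across the panel, which puts genes with very different SNP densities on a comparable scale.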

2. Model Growth Rate by Sorghum bicolor Cultivar using JAGS in R:

  • /jags/ contains the development code for the growth rate modeling described below; these scripts and files are used in the BBN structure learning model
  • Full logistic growth rate modeling by Jessica Guo
  • Summary plots of the logistic growth models can be found in /data_figs/
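The Bayesian growth models themselves are written for JAGS. As an illustration of the kind of curve being fit, the Python sketch below uses a common three-parameter logistic form with asymptote K, rate parameter r, and inflection time t0; these parameter names, and the example values, are assumptions, and the models by Jessica Guo may use a different parameterization with priors placed on the parameters and sampled by JAGS.

```python
import math


def logistic(t, K, r, t0):
    """Logistic curve: asymptote K, growth-rate parameter r, inflection time t0."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def absolute_growth_rate(t, K, r, t0):
    """dy/dt of the logistic curve, r * y * (1 - y / K); peaks at t0 with value r * K / 4."""
    y = logistic(t, K, r, t0)
    return r * y * (1.0 - y / K)


def relative_growth_rate(t, K, r, t0):
    """(1 / y) * dy/dt = r * (1 - y / K); approaches r early in the season."""
    return r * (1.0 - logistic(t, K, r, t0) / K)


# Hypothetical parameters: K = 300 (e.g. cm of height), r = 0.1 per day, t0 = day 50.
print(logistic(50, 300, 0.1, 50))              # 150.0
print(absolute_growth_rate(50, 300, 0.1, 50))  # 7.5
```

Once per-cultivar posteriors for r (or a derived rate) are available, that rate can serve as the phenotype node the Bayesian Belief Network tries to predict.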

3. Prepare dataset for structure learning in R & Python:

  • Join genomic, environmental, and phenotypic data
    • This is done with the Rscript /bbn/join_datasets.R
  • Exports:
    • /bbn/rgr_snp_joined.csv
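The join itself is done in R by join_datasets.R. The standard-library Python sketch below only illustrates the inner-join logic, under the assumption that phenotype rows carry a cultivar and a season, environmental covariates are keyed by season, and genomic features are keyed by cultivar; all column names and values here are hypothetical.

```python
import csv
import io

# Hypothetical inputs: phenotype rows, environment keyed by season, genomics keyed by cultivar.
phenotypes = [
    {"cultivar": "PI_1", "season": "season_4", "rgr": 0.08},
    {"cultivar": "PI_2", "season": "season_6", "rgr": 0.11},
    {"cultivar": "PI_3", "season": "season_6", "rgr": 0.09},
]
environment = {"season_4": {"mean_temp_c": 24.1}, "season_6": {"mean_temp_c": 26.3}}
genomic = {"PI_1": {"gene_a": 0.2}, "PI_2": {"gene_a": 0.6}}


def inner_join(pheno_rows, env_by_season, snp_by_cultivar):
    """Attach environment and SNP columns to each phenotype row; drop rows lacking either."""
    joined = []
    for row in pheno_rows:
        env = env_by_season.get(row["season"])
        snp = snp_by_cultivar.get(row["cultivar"])
        if env is None or snp is None:
            continue  # inner-join semantics: keep only rows matched in every source
        joined.append({**row, **env, **snp})
    return joined


def to_csv(rows):
    """Write joined rows as CSV text, using the first row's keys as the header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(to_csv(inner_join(phenotypes, environment, genomic)))
```

Here PI_3 is dropped because it has no genomic record, which is the usual trade-off of an inner join: a complete feature set per row at the cost of fewer rows.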

4. BBN Structure Learning in Python with NO TEARS algorithm:

  • Ingests the joined data /bbn/rgr_snp_joined.csv and learns the structure with:
    • /bbn/bbn_structure.py
  • Processes categorical data with LabelEncoder from scikit-learn
  • Encodes expert knowledge into the graph structure via a list of tuples passed to the first invocation of StructureModel()
    • PNG exported as /bbn/init_graph.png (as of 10-25-2021, writing this PNG takes a long time, so the export is commented out)
  • Improves the graph structure with NO TEARS using the from_pandas function from causalnex, blacklisting spurious node and edge connections with a second list of tuples
  • Exports:
    • categorical label encodings for genotype (or cultivar) and season: /bbn/genotype_map.json and /bbn/season_map.json
    • pickle of structure model as /bbn/nt_sm
    • png of directed acyclic graph as /bbn/final_graph.png
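Structure learning needs numeric columns, so categorical values such as genotype are replaced with integer codes, and the code-to-label mapping is saved as JSON so the learned network can be read back in terms of cultivar and season names. The sketch below reproduces the sorted-order coding that scikit-learn's LabelEncoder uses for its classes_, but with the standard library only; the genotype IDs are hypothetical, and the label-to-code JSON layout is an assumption about what genotype_map.json contains.

```python
import json


def fit_label_encoding(labels):
    """Assign integer codes to distinct labels in sorted order.

    Mirrors the ordering of scikit-learn's LabelEncoder.classes_ (sorted unique
    values) using only the standard library; it is not that class.
    """
    return {label: code for code, label in enumerate(sorted(set(labels)))}


def encode(labels, mapping):
    """Replace each label with its integer code."""
    return [mapping[label] for label in labels]


def decode(codes, mapping):
    """Invert the mapping to recover the original labels."""
    inverse = {code: label for label, code in mapping.items()}
    return [inverse[code] for code in codes]


genotypes = ["PI_3", "PI_1", "PI_2", "PI_1"]  # hypothetical cultivar IDs
genotype_map = fit_label_encoding(genotypes)
codes = encode(genotypes, genotype_map)
print(codes)                                     # [2, 0, 1, 0]
print(json.dumps(genotype_map, sort_keys=True))  # {"PI_1": 0, "PI_2": 1, "PI_3": 2}
```

Persisting the mapping matters because the integer codes carry no meaning on their own; without the JSON file, node states in the fitted network could not be traced back to specific cultivars.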

5. Discretized Data Mapping & Conditional Probability Distribution Fitting:

  • Load the network structure from the structure model pickle (/bbn/nt_sm)
  • Instantiate Bayesian network with BayesianNetwork() function from causalnex
  • Map continuous variables into categories