BioLabSim: Data Analysis for Biotechnology

Introduction

BioLabSim is a Jupyter Book: a collection of workflows that simulate different steps of strain engineering and fermentation in industrial biotechnology. The data are generated by a virtual organism simulation that combines various models of microbial metabolism, genetics and physiology {cite:p}Liebal2023.

The compute environment works best with Python=3.9.21 (3.11.11 also works), Pandas=2.2.2, NumPy=1.26.4, COBRA=0.29.1 and Biopython=1.79 (as well as Pip=24.1.2, Jupyter-client=6.1.12, Jinja2=3.1.5, Bokeh=3.6.2, Openpyxl=3.1.5).
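
To see how an existing environment compares with the versions listed above, a small check along the following lines may help. It is a minimal sketch, not part of BioLabSim: the names `RECOMMENDED` and `compare_versions` are made up for illustration, and only the four main packages are included.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; the dictionary name is illustrative.
RECOMMENDED = {
    "pandas": "2.2.2",
    "numpy": "1.26.4",
    "cobra": "0.29.1",
    "biopython": "1.79",
}


def compare_versions(recommended, lookup=version):
    """Return {package: (recommended, installed or None)} for every mismatch."""
    mismatches = {}
    for name, wanted in recommended.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None  # the package is not installed at all
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in compare_versions(RECOMMENDED).items():
        print(f"{name}: recommended {wanted}, installed {found}")
```

An empty result means that the four packages match the listed versions. A mismatch does not necessarily mean that the notebooks will fail; the listed versions are simply the ones reported to work best.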

The BioLabSim workflows are designed to help students and researchers learn and practice data analysis in biotechnology. The workflows are based on the Python programming language and Jupyter Notebooks, which provide an interactive environment to write code, visualize data, and generate plots. They cover topics including genetics, fermentation, metabolism, and genetic regulation, and are accompanied by additional materials that deepen the understanding of the underlying concepts. By working through these workflows, users gain hands-on experience in data analysis and computational modeling in biotechnology.

How to run the simulations

The simulations can be run in GitHub Codespaces or Google Colab. For Codespaces, log in to GitHub with your account. Click the green Code button at the top of the repository page, select the Codespaces tab, and start a new instance. A codespace is a remote server environment with, for example, a VS Code interface that contains the full repository. You can navigate the folder structure on the left side and open files in the main view. The kernel can be selected at the top right of the main view.

For Google Colab, log in to your Google account and navigate to the main page at https://colab.research.google.com. In the Open notebook window, click on GitHub and enter the name BioLabSim in the search field. From the search results you can open individual Jupyter Notebooks in Google Colab. The kernel is already preset; Python=3.11.11 works fine. Because Colab retrieves notebooks from GitHub as single files, internal links will stop working. You may need to copy required files over manually, for example with

import os
os.system('wget https://raw.githubusercontent.com/biolabsim/BioLabSim/refs/heads/master/requirements.txt')
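
Where `wget` is not available, the same download can be done with the Python standard library. The following is a minimal sketch under that assumption: the helper names `raw_url` and `fetch` are hypothetical and not part of BioLabSim, and the base URL is the one from the command above.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL taken from the wget command above.
RAW_BASE = "https://raw.githubusercontent.com/biolabsim/BioLabSim/refs/heads/master/"


def raw_url(repo_path):
    """Build the raw-file URL for a path inside the repository."""
    return RAW_BASE + repo_path.lstrip("/")


def fetch(repo_path, target_dir="."):
    """Download one repository file into target_dir and return its local path."""
    local = Path(target_dir) / Path(repo_path).name
    local.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(raw_url(repo_path), str(local))
    return local
```

Calling `fetch("requirements.txt")` in a notebook cell would then place the file in the current working directory. Other files a notebook depends on could be fetched the same way, using their path relative to the repository root.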

Available workflows

| Name | Field | Content | Additional Material | Time (h) | Developer |
|---|---|---|---|---|---|
| Notebook Introduction | Programming | A tutorial to explore the important features of Jupyter Notebooks. | Introduction to Jupyter Notebooks | 0.5 | Stephan Palkovits (RWTH) |
| RecExpSim | Genetics | Simulation of recombinant protein expression and data analyses, including growth characterization, promoter sequence selection, cloning and expression. | $\cdot$ Properties of mesophilic organisms overview (p. 18) $\cdot$ Bacterial promoter architecture review (-10 and -35 box) $\cdot$ GC-content calculations | 3 | Ulf Liebal (RWTH), Iris Broderius (RWTH) |
| FermProSim | Fermentation | Parameter estimation of fermentation with the Monod equation. | $\cdot$ ... | 1.5 | Jonathan Sturm (WHS) |
| GroExpSim-Experiment | Fermentation | Setup of growth experiments to identify optimal temperature, substrate concentrations, cultivation time, and sampling period. | | 1.5 | Ulf Liebal (RWTH) |
| GroExpSim-Data Analysis | Fermentation | Data analysis of growth experiments to determine biomass and substrate rates and yields. | | 1.5 | Ulf Liebal (RWTH) |
| GSMM+FBA_Start | Metabolism | Introduction to genome-scale metabolic models and constraint-based reconstruction and analysis (COBRA). Investigating and modifying existing models. | | 1 | Ulf Liebal (RWTH), Brigida Fabry (RWTH) |
| GSMM+FBA_YieldsMutants | Metabolism | Analysis of models, FBA. | | 1 | Ulf Liebal (RWTH), Brigida Fabry (RWTH) |
| GSMM+FBA_GrowthCorr | Metabolism | Model growth correlation and visualization. | | 1 | Ulf Liebal (RWTH) |
| Genetic Logic Gates, vol1 | Genetics | Introduction to genetic regulation and mathematical modelling with the Hill equation. | Elowitz & Bois, Caltech | 1.5 | Ulf Liebal (RWTH) |
| Genetic Logic Gates, vol2 | Genetics | Investigation of feedforward loops, representation of logic gates with mathematical models. | Elowitz & Bois, Caltech | 1.5 | Ulf Liebal (RWTH) |
| Metabolic Engineering Simulation (MetEngSim) | Metabolism | Introduction to gene databases and metabolic maps. | | 1 | Paula Lanze (RWTH), Ulf Liebal (RWTH) |
| Industrial Fermentation Simulation (IndFermSim) | Metabolism | Introduction to gene databases and metabolic maps. | | 1 | Bhavya Dutta (HSRW), Joachin Fensterle (HSRW) |
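
Two of the models named in the table, the Monod equation (FermProSim) and the Hill equation (Genetic Logic Gates), can be written in a few lines. The sketch below uses the standard textbook forms, which are not necessarily the exact parameterization used in the notebooks. The function names and all parameter values are illustrative assumptions.

```python
def monod_growth_rate(substrate, mu_max, k_s):
    """Specific growth rate: mu = mu_max * S / (K_s + S)."""
    return mu_max * substrate / (k_s + substrate)


def hill_activation(x, k, n):
    """Fraction of maximal expression for an activator: x^n / (K^n + x^n)."""
    return x**n / (k**n + x**n)


if __name__ == "__main__":
    # Hypothetical values: at S = K_s the growth rate is half of mu_max.
    print(monod_growth_rate(substrate=2.0, mu_max=0.8, k_s=2.0))
    # At x = K the Hill function is at half-maximal activation for any n.
    print(hill_activation(x=3.0, k=3.0, n=2))
```

Both functions saturate: the growth rate approaches `mu_max` at high substrate concentrations, and the Hill function approaches 1 at high activator concentrations, with larger `n` giving a steeper switch.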

How Complex Data Permeates Biotechnology

Data analysis plays a crucial role in biotechnology because it provides the insights needed to optimize fermentation processes. Effective fermentation requires precise control of variables such as nutrient concentrations, pH, and temperature. Monitoring these variables with sensors and analytical tools generates data that inform decisions and process adjustments. In microbial fermentation, for instance, understanding the dynamics of metabolite production and consumption is pivotal for achieving high yields and product quality {cite:p}nielsen2016engineering. Accurate data analysis lets biotechnologists uncover patterns and correlations and use them to adjust fermentation conditions for higher productivity and efficiency {cite:p}Narayanan2020. Without robust data analysis, the potential of biotechnological processes could not be fully harnessed.

The integration of high-throughput measurement techniques {cite:p}Wehrs2020 and omics technologies, such as genomics, proteomics, and metabolomics, has significantly increased the complexity of data in biotechnology {cite:p}Pinu2019. These methods capture an immense volume of molecular information and offer a comprehensive view of biological processes. The expanded scope comes with a challenge: traditional analytical methods struggle with the volume of omics data and with the molecular interactions across biological networks that the data describe. This complexity demands the development and application of novel computational tools and algorithms to extract meaningful insights {cite:p}Volk2020. High-throughput and omics technologies have thus transformed biotechnological research and require a shift in data analysis approaches to fully exploit these data-rich techniques.

Improving Biotech with New Computational Methods

As biotechnological data become more complex, the demand for sophisticated analysis methods and advanced statistical techniques grows. Data from diverse sources such as omics technologies and high-throughput experiments require analytical approaches that can resolve patterns and correlations across many variables. Techniques like machine learning, network analysis, and multivariate statistics are now crucial for extracting meaningful insights from such datasets {cite:p}Oliveira2019. Moreover, adopting the FAIR (Findable, Accessible, Interoperable, and Reusable) standards is essential to keep biotechnological data comprehensible and shareable {cite:p}Rehnert2022. These standards facilitate data integration and collaboration, enabling researchers to address complex challenges together.

Paragraph about design-build-test-learn cycles and the automated strain design workflow.

:::{dropdown} Summary

Data analysis is crucial in biotechnology: precise measurement of variables such as nutrient levels and metabolite concentrations guides effective fermentation and the optimization of microbial production systems.
Integration of high-throughput measurement and omics technologies increases data complexity, challenging traditional methods and demanding novel computational approaches for meaningful insights.
The complex data of industrial biotechnology call for advanced analysis methods like machine learning and network analysis, while FAIR standards ensure accessible and collaborative data management.
:::

User Notes

To run BioLabSim directly choose among the following JupyterHubs:

RWTHjupyter: RWTH Aachen University JupyterHub:

It is possible to download the examples in BioLabSim and run them locally. This requires the installation of packages to do the simulations, see Developer Notes.

(DevopNotes)=

Developer Notes

Setup project for local development

# Set up the Python virtual environment next to the repository folder. (use Python 3.9)
python3.9 -m venv py39-env

# Activate your environment. (Broad topic that depends on what software and OS is used)
source py39-env/bin/activate

# Clone the repository to a nearby folder.
git clone https://git.rwth-aachen.de/ulf.liebal/biolabsim.git repo-biolabsim

# Enter the newly cloned repository.
cd repo-biolabsim

# Install all required python libraries.
pip install -r requirements.txt

# See the Notebook for examples on how to use the library.

Contacts

Ulf Liebal

Institute of Applied Microbiology-iAMB, Aachen Biology and Biotechnology-ABBT, RWTH Aachen University, Worringerweg 1, 52074 Aachen, Germany

Licence: see the LICENSE file at https://git.rwth-aachen.de/ulf.liebal/biolabsim or at https://github.com/uliebal/BioLabSim

References

:::{bibliography}
:::
