Team:British Columbia/Model

UBC iGEM 2021

MODELS

Computational & Mathematical Modelling

INTRODUCTION



To inform wet-lab and support the rest of our project, our dry-lab team focused their efforts on three specific tasks:

1) Leveraging mathematical models to infer light output from our split-lux operon system based off of biomarker concentrations and promoter activity.

2) Developing computer simulations and visualizations of the tumour microenvironment (TME).

3) Performing bioinformatic analysis of tumour transcriptomics data.


MATHEMATICAL MODELS



In our attempt at developing an in-vivo biosensor for detecting tumour immune activity, we quickly realized that we were limited in our ability to perform in-vivo testing of our system. To elucidate the expected performance of our lux operon system, we turned to robust models [1-2] based around promoter activity inference from green fluorescent protein (GFP) fluorometry and lux operon kinetics, in order to simulate light quanta output from Salmonella. These two models are coupled with a detailed 3D TME software simulation that has been modified to include Salmonella population density data based on how they are known to cluster in the TME.



These three models are brought together in order to reconstruct promoter activity of the system and then infer light output in the TME, providing valuable information for potential in-vivo imaging applications as discussed on our Implementation page.

Informed by experimental data examining dynamic RFP presence and biomass at a variety of inducer concentrations, a dataframe of the TME simulation is used to predict dynamic light output over time. Several assumptions are made in this approach, foremost being that inducer concentrations and cell density in discrete tumour regions of the TME are constant over the prediction interval.

Model #1: Fluorescence to Promoter Activity


The first model in our pipeline is a differential model constructed by Kannan et al [1], which we used to reconstruct promoter activity for our AND-gate plasmid system. We started by gathering fluorescence data from a range of inducer concentrations over 12 hours. This fluorescence data is run through the codebase to output promoter activity data over time for each inducer concentration.

Promoter activity units are dependent on the function inputs (biomass and fluorometry) to give units of relative fluorescence per unit biomass per unit time (RFU/OD600/h, in our case). This ability to utilize different inputs is crucial to the success of the model, as it allowed for the units to be tailored to the required promoter activity units needed for input into the next stage of the modelling approach. The model used for this first modelling phase is outlined below, and its codebase and outputted graphs can be found on our GitHub here.


  • R(t) is the amount of mRNA in the culture over time in units corresponding to measured units for protein amounts (ie. RFU)
  • N(t) is the amount of nonfluorescent protein in measured units
  • F(t) is the amount of fluorescent protein in measured units (ie. RFU)
  • b(t) is the measured biomass, measured in units of OD
  • P(t) is in units of RFU per unit biomass per unit time
  • β is translation rate in number of protein per mRNA per unit time
  • μ(t) is the growth rate
  • m is the maturation rate of the fluorescent protein
  • γr is the degradation rate of the mRNA
  • γn is the degradation rate of the nonfluorescent proteins
  • γf is the degradation rate of the fluorescent protein

Model #2: Promoter Activity to Light Output


With the curves generated for promoter activity from our previous model, we are able to sample datapoints for inducer concentrations at discrete regions of the TME and predict promoter activity over a 12-hour window. Once the appropriate curve is chosen to match the inducer concentration in the simulation, our second model, created by Iqbal et al [2], is used to simulate light quanta output from each region in the TME.

The light output from the lux system can be tracked in mice, and demonstrates promise for the use of our system to investigate immune activity in tumour microenvironments.

The differential system, originally developed by Professor Dov Stekel from the University of Nottingham in the UK, and later refined by Iqbal et al, is outlined below. The reaction velocities can be found in the paper [2] as well as in our Appendix. The codebase uses a robust Monte-Carlo Method approach to iterate and compute the resulting light output.



SOFTWARE SIMULATIONS



As we were unable to study the tumour microenvironment (TME) through means such as tumour organoids or mouse models, we relied on software simulations to help us mimic the TME and predict the efficacy of our system in-vivo. We discovered an experimentally-validated codebase, called "tumorcode", originally developed by Professor Heiko Reiger and his research group at Saarland University, that simulates the microenvironment of a developing tumour over time [3].

Alongside a cell-level simulation called "VBL" (Virtual Biology Laboratory) [4], the tumorcode program is able to simulate tumours in great detail, generating blood vessel networks and cells (cancerous and healthy) along with intra- and extra-cellular chemical profiles - e.g. intracellular ATP levels, extracellular oxygen and lactate levels, etc. The ultimate goal for our team was to integrate Salmonella colonization into the simulation in order to predict the activity of our engineered biosensor in a hypothetical tumour.

Initially, we attempted to generate simulations by compiling and running tumorcode ourselves through Amazon EC2 instances. However, it became apparent that generating a simulated tumour of sufficient size, and which included the details of the VBL cell-level simulation, would take a prohibitively long time.

Instead, we decided to make use of the simulated tumour data included with the paper on the integration of tumorcode and VBL [4]. Our first step was to understand the data format of this simulation output as well as the many variables it contained. The data is stored in a HDF5 [5] file format. We used a GUI application, HDFCompass [6], and a Python package, h5py [7], to investigate the data. We also reached out to the original authors on GitHub and they graciously helped us make sense of certain components of the data. Our conversation can be found here.


Parameter Slices
Graphical representation of various TME parameters across slices of our simulated tumour. Visualized using h5py and matplotlib.

The next step was to integrate our engineered bacteria into the simulation. The original software was only capable of modelling tumour cell type, so we focused on expanding the work by introducing Salmonella into the system, and modelling it’s growth, penetration, and colonization patterns with distinct regions of a multicellular tumour spheroid (MTS) over time, based off of experimental data from literature.

For this, we referred to a study by Ganai et al [8] that describes Salmonella's differential colonization of four different tumour regions. The four tumour regions described are the "Edge", "Body", "Transition Zone", and "Core". Based on the descriptions of these regions’ chemical profiles and locations within tumours, we used c-means clustering to classify each cell in our simulated tumour into one of these four regions.


Tumour Regions
The four tumour regions described in Ganai et al.

GIF of Tumour Regions
Showing, in order, the Core, Transition Zone, Body, and Edge regions of a cross-section of our simulated tumour. Visualized using ParaView.

Next, we needed to extract the data on overall and per-region Salmonella concentrations from figures included in Ganai et al because the raw data was not available. We did this with a tool that extracts data points from charts called WebPlotDigitzer [9]. With this data in hand, we were able to add Salmonella concentrations over a period of 0-100 hours after inoculation to each region in our simulation.


Salmonella Growth and Colonization in TME
Figure from Ganai et al, showing relative Salmonella concentrations in each of the four tumour regions.

Salmonella Colonization of Fake MTS
Relative Salmonella concentrations for each tumour region from 0-100 hours after inoculation.

So, what are our next steps with this project? Our future directions include:


  • Modelling bacteria-cancer interactions (ie. the impact of bacteria presence on tumour cells as Salmonella has been shown to induce apoptosis and delay tumour growth in certain studies [8]).

  • Currently, only our lactate biomarker is modeled within our simulation, as many cell types (such as immune or connective tissue cells) are not modeled by the program, and thus the concentration distribution of our second biomarker, TNFa, is not outputted by the system. As a proof-of-concept, we hope to model macrophages and their subsequent TNFa production and secretion within the TME, as they are reported to be one of the most abundant immune cell types, and a major source of TNFa, within the TME [10].

All the scripts, walkthroughs, and visualizations we developed are available on our GitHub here.

We hope that our added features and increased documentation will help make this unique computational tool more accessible and useful for others in providing an overall, yet detailed, and non-invasive way of studying the TME.


BIOINFORMATIC ANALYSIS



To support our literature search and confirm our decision of introducing tumour-associated macrophages (TAMs) as a proof-of-concept source of TNFa within our 3D software simulation, we conducted bioinformatic analysis of publicly available single-cell RNAseq data [11] sequenced from tumours of various cancer types, in order to see which cell types had the highest TNF gene expression level within the TME. We conducted the analysis in R and used data from a GitHub repository called TMExplorer [11].

Given that we only analyzed two datasets, the results are not conclusive, however the exploratory analysis did yield certain findings that support what we read in the literature. One dataset consisted of a melanoma tumour and the other, a sample of eleven breast cancer tumours.

As shown in the figure below, T cells, B cells, and tumour cells had the highest TNFa expression count for both cells with more than 3 transcripts, and amongst total cells.



When analyzing the breast cancer dataset, the highest TNF gene expression was seen in tumour cells for both metrics, as shown in the plots below. A limitation of this method of analysis is that both metrics are absolute counts of expression that depend highly on the number of cells in each category. Having more cells of a specific cell type generates a higher absolute expression count, even if a different cell type has cells with higher expression overall.



To counteract the flaw in our previous analysis, we next looked at the average TNFa expression per cell type. Looking at the rightmost figure below, average expression per cell type in melanoma was very similar across the board with Cancer-Associated Fibroblasts (CAFs), Natural Killer (NK), and tumour cells showing the highest expression levels. These results are drastically different from the absolute count analysis as CAF and NK cells had the lowest expression counts overall. In the leftmost figure below, we can see a similar disconnect in the breast cancer data as well. Stromal cells have a drastically higher average expression despite being the lowest on absolute expression. A limitation of seeing only average expression is that we cannot determine variance in expression within a cell type.



Based off of our initial literature search, the cell types we expected to have the highest TNF gene expression levels were immune cells. The possibility that stromal cells have the highest TNFa expression goes against our initial findings, however we would need to analyze more tumour samples to reach a more consistent and conclusive result.

All the scripts and visualizations we developed are available on our GitHub here.

APPENDIX





[1] Kannan, S., Sams, T., Maury, J., & Workman, C. T. (2018). Reconstructing dynamic promoter activity profiles from reporter gene data. ACS Synthetic Biology, 7(3), 832-841. https://doi.org/10.1021/acssynbio.7b00223

[2] Iqbal, M., Doherty, N., Page, A. M. L., Qazi, S. N. A., Ajmera, I., Lund, P. A., Kypraios, T., Scott, D. J., Hill, P. J., & Stekel, D. J. (2017). Reconstructing promoter activity from lux bioluminescent reporters. PLoS Computational Biology, 13(9), e1005731-e1005731. https://doi.org/10.1371/journal.pcbi.1005731

[3] Fredrich, T., Welter, M., & Rieger, H. (2017). Tumorcode - A framework to simulate vascularized tumors. https://doi.org/10.1101/216903

[4] Fredrich, T., Rieger, H., Chignola, R., & Milotti, E. (2019). Fine-grained simulations of the microenvironment of vascularized tumours. https://doi.org/10.1101/661603

[5] https://en.wikipedia.org/wiki/Hierarchical_Data_Format

[6] https://support.hdfgroup.org/projects/compass/

[7] https://www.h5py.org

[8] Ganai, S., Arenas, R. B., Sauer, J. P., Bentley, B., & Forbes, N. S. (2011). In tumors salmonella migrate away from vasculature toward the transition zone and induce apoptosis. Cancer Gene Therapy, 18(7), 457–466. https://doi.org/10.1038/cgt.2011.10

[9] https://automeris.io/WebPlotDigitizer/

[10] Chen, Y., Song, Y., Du, W., Gong, L., Chang, H., & Zou, Z. (2019). Tumor-associated macrophages: An accomplice in solid tumor progression. Journal of Biomedical Science, 26(1). https://doi.org/10.1186/s12929-019-0568-z

[11] Christensen, E., Naidas, A., Husic, M., & Shooshtari, P. (2020). TMExplorer: A tumour microenvironment single-cell rnaseq database and search tool. https://doi.org/10.1101/2020.10.31.362988