NavBar

Modeling

Introduction

The creation of models allows for the acquisition of data outside of the lab. Models can be predictive, illustrative, or even validative. The most important part about models is how they aide the user in seeing data or realizations that otherwise might not have been possible to see. For our project, we visualized the interaction of bacteria and phage populations to predict the amount of phage needed to control a bacterial population.

Docking Model

We didn’t actually complete this model. Dry Lab put lots of work into docking; however Austin Cool, Dr. Debnath, and Dr. Mandell eventually suggested it would not be in our best interest. We still wanted to talk about this model because of how much work was put into it and to help future teams understand more about docking.

Our anti-lipid A proteins can be split into two main categories: lipid A binding proteins and lipid A modifying proteins. Both were modeled to see how well they bonded to lipid A or how well they inhibited bonding to lipopolysaccharide binding protein (LBP, our body's lipid A detection protein). In doing so, we could predict which protein would work best in our system, by limiting our body’s detection of lipid A.

Binding Proteins

The binding proteins we modeled were lactoferrin (LF) and LptA. In order to model their binding with lipid A, we used the Autodoc Vina docking software. This software allows us to determine Gibbs Free energies associated with bringing the two molecules close together. In order to do this, we had to input a structured data format (SDF) file of a molecule (lipid A), a protein data bank (PDB) file of the substrate (protein), and a 2 nm cubed docking box (the site of binding). Examples of both a PDB and SDF file can be found below in figures 1 and 2 respectively.

LF PDB File — Figure 1. PDB File Example for LF

Lipid A SDF File — Figure 2. SDF File Example for Lipid A

We obtained the SDF by simply drawing lipid A in ChemDraw and exporting it as an SDF. The PDB files were all readily available online, mainly from the RCSB Protein Data Bank. The docking box was the most challenging of the inputs to attain. In fact, we stopped pursuing this model before we found the docking boxes. The way Austin suggested we go about finding docking boxes was through searching literature. If the protein is well known to bind to the molecule, most times, the sources associated with the PDB file will actually discuss the docking box. Because our proteins don’t all commonly bind to lipid A, these docking boxes were not so easily found. Had we continued with docking, we would find our docking boxes through expanded literature sources and maybe even eventually, best estimates.

Once we gathered all the inputs we would feed them into Autodoc Vina to get outputs of a docking score and a specific orientation of the molecule. We would conduct this process for each of the binding proteins stated above. To actually via the orientation of the molecules, we could use PyMol and input the structure outputted by Autodoc Vina. PyMol is a molecular visualization tool that aids in physically interpreting docking orientation in 3D. A typical output can be seen below for just the LF protein.

PyMol Demo of LF — Figure 3. PyMol Visualization Example of LF

A docking score is a measure of Gibbs free energy, thus the more negative, the more favorable the binding. Knowing this we could see if any of the proteins have favorable binding to lipid A. While these scores can determine if the binding is favorable or not, they cannot be compared between each other. That is, the docking scores for the same molecule binding to different proteins cannot be compared (the docking score for different molecules binding to the same protein can be compared however). This is because of the different surfaces of the proteins.The different protein surfaces offer different free energy contributions to potential ligands. Due to these differences, it is hard to compare scores because a score in the range of 5-6 may be good for one protein, but terrible for another. If the proteins have high sequence similarity and their docking boxes by one or two amino acids, it is more reasonable Using these results, contact maps showing where different residues bind to the molecule could be generated. Contact maps would be produced through a different program, Glide SP, with the same inputs, and show which parts of the protein are most likely binding where to the molecule. If the same residue from each protein is binding to the same portion of lipid A, it might be a characteristic of a good lipid A binding protein. Future proteins with the same residue and residue spacing could be hypothesized to perform as well as or better than our selected binding proteins.

Modifying Proteins

The modifying proteins we would model were PagL, LpxR, PagP, LpxE, and LpxF. Each of these proteins alter lipid-A in some way described below in table 1. In doing so, we hypothesized that they would decrease the binding favorability to the body's lipid A detection protein: lipopolysaccharide (LPS) binding protein (LBP).

Table 1. Anti-Lipid A Proteins and Their Effect on Lipid A
Protein	Function	Lipid A Structure
LF	Binds to lipid A but doesn't change lipid A structure
LptA	Binds to lipid A but doesn't change lipid A structure
PagL	Removes the 5^th acyl group (1 chain, 14 carbons and 2 oxygens)
LpxR	Removes 1^st acyl group (2 chains, 28 carbons and 3 oxygens)
PagP	Adds a 6^th acyl group (second to last chain, 16 carbons)
LpxE	Removes the phosphate group on the 1' end
LpxF	Removes the phosphate group on the 3' end

To test this hypothesis, we would repeat the same procedure as above, feeding the PDB file of LBP along with each of the altered lipid A molecules into Autodoc Vina. For the binding proteins, we had a single molecule (lipid A) and many substrates, so the docking scores couldn’t be compared; however, we now have a single substrate (LBP) and many molecules, so if completed, we could have compared docking scores.We would be able to say which one of the proteins creates the altered lipid A molecule with the least favorable binding to LBP. In doing so, we would classify that selected protein as the best modifying protein to be used.

Docking Complications

We ended up not proceeding with docking for multiple reasons. The first reason was our molecule. Oftentimes docking is done with 500 Da molecules. Lipid A is very large in comparison with a molecular weight of 1763 Da. This makes it hard for the program to orient and combine lipid A with the protein. Docking primarily uses Gibbs Free Energy calculations to determine the docking score. Lipid molecules (like lipid A) notoriously sidestep and disrupt these calculations due to their long carbon chains (leading to inaccurate results).

There is no method to know if the way the docking program orients and folds the proteins is accurate, since the proteins aren’t well documented binding to lipid A. Because of this, even if we got results, we would have no way to validate them and no way to do a rough check to see if the model results even make sense. We would just have to blindly accept them. With only seven proteins, it would be relatively easy to get experimental results and discover the best anti-lipid A protein to use. This would eliminate all the guesswork inherent to our model, effectively making the docking model not worthwhile.

Bacteria-Phage Population Model

If our project were to make it as an actual product there would need to be specific dosing instructions. In order to help visualize what they might be, we modeled a bacteria and phage population over time. This was done in a controlled environment with resources. We made the following assumptions to create a simplified, first estimates model:

Open, liquid, well-mixed habitat
Random interactions between resource, phage, and bacteria
Uninfected bacteria multiply via binary fission at a per capita rate that is a hyperbolic function of the resource concentration in the habitat
Phage encounter and irreversibly adsorb to uninfected bacteria at a per capita rate that is a linear function of the bacterial density
Each infection is lethal to a bacterium

We solved a system of ODEs, suggested by Richard E. Lenski¹. The pdf below houses our notation and parameters used. We first tried to solve this system in MATLAB using ODE45. The S prime and P prime terms caused us issues however. Lenski did not describe in depth how to handle these terms and them being themselves a function of S and P complicated things. We couldn’t figure out a way to have the primes reference S and P using the ODE solver. As a first guess we simply set S prime = S and P prime = P. Unfortunately this led to a nonsensical answer with concentrations barely changing over time. Seeing this as a limitation of using an ODE solver we turned to more basic methods, constructing a while loop that just stepped through time. In this way, we were able to specifically set S prime and P prime to previous values of S and P. Once again we had impossible outputs, the free phage concentration approached negative infinity while the infected bacteria approached positive infinity (figure 4).

Figure 4. Concentration Output for the Time Loop Method

After more failed debugging and literature searches, we reached out to Dr. Rathman to verify our ODE solver construction was sound. After confirming we had indeed coded the ODE solver correctly, he suggested a way to handle the primes. Since the primes themselves change based on time, we could approximate them as a function of change in S (for S prime) and P (for P prime) with respect to time. After a healthy amount of algebra and the use of a mass matrix, the ODEs could be solved again. Still, our code yielded unrealistic answers with the free phage concentration shooting up quickly and the infected bacteria going negative after just one hour (figure 5).

ODE Solver with Mass Matrix Method — Figure 5. Concentration Output for ODE Solver Using a Mass Matrix

Failing to figure out how to handle the prime terms led us to reconsider our model. We wanted to get some visual to show how adding phage into a system would affect the bacteria population. Ideally in a way that could somewhat represent a real life scenario where we would have to administer phage. If we assume that the infection is well-developed and look at a control volume of the body, it is reasonable to say that it could be at steady state. In essence, the bacteria have reached maximum concentration based on the available resources and these resources are being replenished as fast as they are consumed and/or leave. The system is at steady state. Our ODEs (equations 1-4 in the pdf) now have no time dependence and can be set equal to zero. Download pdf here.

In doing so we can look at two interesting cases, steady state bacteria concentration without phage present and with phage present. Without phage, the bacteria will be limited based on the amount of resources present. Setting equation 1 equal to zero, results in equation 5. We can now solve equation 5 for S in terms of C. Since we don’t have any phage, P = 0, and equation 2 can be set equal to zero (equation 6) and solved for C (the S terms will all cancel out). Finally, to arrive at equation 7, we simply plug our equation we just got for C, into our S equation we got from equation 5.

Similar to the resource limited case, we can solve for S from equation 1. Once again we have a C term present and can use equation 6 to eliminate it. However, the P term will not disappear like it does in equation 6 since we have phage present. Instead, it will stay and therefore, C will now be a function of P. Again, we plug equation 9 into 8 and get the phage limited concentration of bacteria.

Looking at the equations for the resource limited (RL) and phage limited (PL) bacteria concentrations shows some differences. We can see that RL is a function of the initial concentration along with bacteria and phage parameters. PL is also a function of the initial concentration and bacteria and phage parameters, but also P. As we might expect, the PL bacteria concentration is a function of the concentration of phage.

To better visualize these differences, we can also graph them in MATLAB. With a spread of initial concentrations and four different phage concentrations, we see some patterns. As one might expect, the elevated phage concentrations yield a lower bacterial concentration. Figure 6 also shows, that an increase in resource concentration gives rise to a higher bacterial concentration in both RL and PL scenarios.

Steady State Method — Figure 6. Concentration Output for the Steady State Assumption

When the phage concentration is low, there is little to no difference in the bacteria concentrations. As the phage concentration grows in magnitude, the bacteria concentration shrinks. If for a given case we need to reduce the bacterial density to below 4x10⁷ CFU/mL for an average resource concentration (100 μg/mL), 10⁶ PFU/mL of phage should be used. This is because it is the lowest amount of phage that reduces the bacteria to its necessary concentration.

Obviously this is a first estimates model. To get actual dosing for phage, we will need to be more precise and accurate with our assumptions. A future iteration of this model might expand the model to encompass changing concentrations. An unsteady state model could capture the evolving bacteria and phage concentrations over time. In doing so, the model would better show how a single injection of phage would develop in body where both the concentrations of bacteria and phage are constantly changing.