Modeling
Introduction
The creation of models allows for the acquisition of data outside of the lab. Models can be predictive, illustrative, or even validative. The most important part about models is how they aide the user in seeing data or realizations that otherwise might not have been possible to see. For our project, we visualized the interaction of bacteria and phage populations to predict the amount of phage needed to control a bacterial population.
Docking Model
We didn’t actually complete this model. Dry Lab put lots of work into docking; however Austin Cool, Dr. Debnath, and Dr. Mandell eventually suggested it would not be in our best interest. We still wanted to talk about this model because of how much work was put into it and to help future teams understand more about docking.
Our anti-lipid A proteins can be split into two main categories: lipid A binding proteins and lipid A modifying proteins. Both were modeled to see how well they bonded to lipid A or how well they inhibited bonding to lipopolysaccharide binding protein (LBP, our body's lipid A detection protein). In doing so, we could predict which protein would work best in our system, by limiting our body’s detection of lipid A.
Binding Proteins
The binding proteins we modeled were lactoferrin (LF) and LptA. In order to model their binding with lipid A, we used the Autodoc Vina docking software. This software allows us to determine Gibbs Free energies associated with bringing the two molecules close together. In order to do this, we had to input a structured data format (SDF) file of a molecule (lipid A), a protein data bank (PDB) file of the substrate (protein), and a 2 nm cubed docking box (the site of binding). Examples of both a PDB and SDF file can be found below in figures 1 and 2 respectively.
We obtained the SDF by simply drawing lipid A in ChemDraw and exporting it as an SDF. The PDB files were all readily available online, mainly from the RCSB Protein Data Bank. The docking box was the most challenging of the inputs to attain. In fact, we stopped pursuing this model before we found the docking boxes. The way Austin suggested we go about finding docking boxes was through searching literature. If the protein is well known to bind to the molecule, most times, the sources associated with the PDB file will actually discuss the docking box. Because our proteins don’t all commonly bind to lipid A, these docking boxes were not so easily found. Had we continued with docking, we would find our docking boxes through expanded literature sources and maybe even eventually, best estimates.
Once we gathered all the inputs we would feed them into Autodoc Vina to get outputs of a docking score and a specific orientation of the molecule. We would conduct this process for each of the binding proteins stated above. To actually via the orientation of the molecules, we could use PyMol and input the structure outputted by Autodoc Vina. PyMol is a molecular visualization tool that aids in physically interpreting docking orientation in 3D. A typical output can be seen below for just the LF protein.
A docking score is a measure of Gibbs free energy, thus the more negative, the more favorable the binding. Knowing this we could see if any of the proteins have favorable binding to lipid A. While these scores can determine if the binding is favorable or not, they cannot be compared between each other. That is, the docking scores for the same molecule binding to different proteins cannot be compared (the docking score for different molecules binding to the same protein can be compared however). This is because of the different surfaces of the proteins.The different protein surfaces offer different free energy contributions to potential ligands. Due to these differences, it is hard to compare scores because a score in the range of 5-6 may be good for one protein, but terrible for another. If the proteins have high sequence similarity and their docking boxes by one or two amino acids, it is more reasonable Using these results, contact maps showing where different residues bind to the molecule could be generated. Contact maps would be produced through a different program, Glide SP, with the same inputs, and show which parts of the protein are most likely binding where to the molecule. If the same residue from each protein is binding to the same portion of lipid A, it might be a characteristic of a good lipid A binding protein. Future proteins with the same residue and residue spacing could be hypothesized to perform as well as or better than our selected binding proteins.
Modifying Proteins
The modifying proteins we would model were PagL, LpxR, PagP, LpxE, and LpxF. Each of these proteins alter lipid-A in some way described below in table 1. In doing so, we hypothesized that they would decrease the binding favorability to the body's lipid A detection protein: lipopolysaccharide (LPS) binding protein (LBP).
Protein | Function | Lipid A Structure |
---|---|---|
LF | Binds to lipid A but doesn't change lipid A structure | |
LptA | Binds to lipid A but doesn't change lipid A structure | |
PagL | Removes the 5th acyl group (1 chain, 14 carbons and 2 oxygens) | |
LpxR | Removes 1st acyl group (2 chains, 28 carbons and 3 oxygens) | |
PagP | Adds a 6th acyl group (second to last chain, 16 carbons) | |
LpxE | Removes the phosphate group on the 1' end | |
LpxF | Removes the phosphate group on the 3' end |
To test this hypothesis, we would repeat the same procedure as above, feeding the PDB file of LBP along with each of the altered lipid A molecules into Autodoc Vina. For the binding proteins, we had a single molecule (lipid A) and many substrates, so the docking scores couldn’t be compared; however, we now have a single substrate (LBP) and many molecules, so if completed, we could have compared docking scores.We would be able to say which one of the proteins creates the altered lipid A molecule with the least favorable binding to LBP. In doing so, we would classify that selected protein as the best modifying protein to be used.
Docking Complications
We ended up not proceeding with docking for multiple reasons. The first reason was our molecule. Oftentimes docking is done with 500 Da molecules. Lipid A is very large in comparison with a molecular weight of 1763 Da. This makes it hard for the program to orient and combine lipid A with the protein. Docking primarily uses Gibbs Free Energy calculations to determine the docking score. Lipid molecules (like lipid A) notoriously sidestep and disrupt these calculations due to their long carbon chains (leading to inaccurate results).
There is no method to know if the way the docking program orients and folds the proteins is accurate, since the proteins aren’t well documented binding to lipid A. Because of this, even if we got results, we would have no way to validate them and no way to do a rough check to see if the model results even make sense. We would just have to blindly accept them. With only seven proteins, it would be relatively easy to get experimental results and discover the best anti-lipid A protein to use. This would eliminate all the guesswork inherent to our model, effectively making the docking model not worthwhile.
Bacteria-Phage Population Model
If our project were to make it as an actual product there would need to be specific dosing instructions. In order to help visualize what they might be, we modeled a bacteria and phage population over time. This was done in a controlled environment with resources. We made the following assumptions to create a simplified, first estimates model:
- Open, liquid, well-mixed habitat
- Random interactions between resource, phage, and bacteria
- Uninfected bacteria multiply via binary fission at a per capita rate that is a hyperbolic function of the resource concentration in the habitat
- Phage encounter and irreversibly adsorb to uninfected bacteria at a per capita rate that is a linear function of the bacterial density
- Each infection is lethal to a bacterium
We solved a system of ODEs, suggested by Richard E. Lenski1. The pdf below houses our notation and parameters used. We first tried to solve this system in MATLAB using ODE45. The S prime and P prime terms caused us issues however. Lenski did not describe in depth how to handle these terms and them being themselves a function of S and P complicated things. We couldn’t figure out a way to have the primes reference S and P using the ODE solver. As a first guess we simply set S prime = S and P prime = P. Unfortunately this led to a nonsensical answer with concentrations barely changing over time. Seeing this as a limitation of using an ODE solver we turned to more basic methods, constructing a while loop that just stepped through time. In this way, we were able to specifically set S prime and P prime to previous values of S and P. Once again we had impossible outputs, the free phage concentration approached negative infinity while the infected bacteria approached positive infinity (figure 4).
After more failed debugging and literature searches, we reached out to Dr. Rathman to verify our ODE solver construction was sound. After confirming we had indeed coded the ODE solver correctly, he suggested a way to handle the primes. Since the primes themselves change based on time, we could approximate them as a function of change in S (for S prime) and P (for P prime) with respect to time. After a healthy amount of algebra and the use of a mass matrix, the ODEs could be solved again. Still, our code yielded unrealistic answers with the free phage concentration shooting up quickly and the infected bacteria going negative after just one hour (figure 5).
Failing to figure out how to handle the prime terms led us to reconsider our model. We wanted to get some visual to show how adding phage into a system would affect the bacteria population. Ideally in a way that could somewhat represent a real life scenario where we would have to administer phage. If we assume that the infection is well-developed and look at a control volume of the body, it is reasonable to say that it could be at steady state. In essence, the bacteria have reached maximum concentration based on the available resources and these resources are being replenished as fast as they are consumed and/or leave. The system is at steady state. Our ODEs (equations 1-4 in the pdf) now have no time dependence and can be set equal to zero. Download pdf here.