Team:Tongji Software/Proof Of Concept

  • About Phage-MAP
  • Theoretical Support
  • Demonstration
  • Command line operation
  • Test and Validation


About Phage-MAP

Our project built an integrated MAP of phage-bacteria interaction information and, on top of it, a phage recommendation system. You can enter the name of a bacterium or a phage and obtain the phages or bacteria most likely to interact with it. These functions are integrated into the Phage-MAP website in three modules: Bacteriophage Bay, Phage Finder and Interactive MAP. Results are displayed visually, and all result files can be downloaded so that users can save them and carry out further analysis.

Theoretical Support

Sequence signals of bacteriophage-bacteria interactions

Currently, phage-bacteria interactions are still measured in the laboratory with plaque assays, which place high demands on phage purity and on the experimental environment. Experimental methods clearly cannot keep pace with the surge of microbial data in the metagenomic era, so computational methods for predicting phage-bacteria interactions are needed. Over a long process of coevolution, phages and their hosts have been engaged in an "arms race" that has locked them into an interdependent relationship. Molecular and ecological coevolution shape phage and bacterial genomes and leave signals in their sequences that allow us to predict phage-host interactions. Many phages actively or passively insert parts of their genomes into the bacterial genome, while some phages change their sequence characteristics in response to changes in the bacterial genome during infection. There is therefore an evolutionary basis for analyzing phage-host interactions from genome sequences. Current sequence-based bioinformatic predictions rely mainly on four kinds of signal: genetic homology, CRISPR spacers, exact sequence matches and oligonucleotide profiles. These approaches predict phage-host interactions with reasonable accuracy.
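As an illustration of the oligonucleotide-profile idea, the sketch below compares the tetranucleotide (k = 4) frequency profile of a phage genome with those of candidate hosts and ranks the hosts by Euclidean distance. It is a simplified, hypothetical example: the function names, k = 4 and the Euclidean distance are our own assumptions, not the scoring used by Phage-MAP, and published tools use more refined statistics.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Relative frequency of each of the 4**k k-mers over A/C/G/T, in a fixed order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Only k-mers made purely of A/C/G/T are counted; windows containing N etc. are skipped.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def profile_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two profiles; smaller means more similar composition."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_hosts(phage_seq: str, host_seqs: dict[str, str], k: int = 4) -> list[str]:
    """Return host names ordered from most to least similar k-mer composition."""
    phage = kmer_profile(phage_seq, k)
    dist = {name: profile_distance(phage, kmer_profile(s, k)) for name, s in host_seqs.items()}
    return sorted(dist, key=dist.get)


# Toy sequences for illustration only.
print(rank_hosts("ATATATATGC", {"host_GC": "GCGCGCGCGC", "host_AT": "ATATATATAT"}))
# ['host_AT', 'host_GC']
```

Real genomes are long enough for k = 4 profiles to be stable; on toy sequences like these the ranking only demonstrates the mechanics.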

Fig. ROC curves of computational methods for phage-bacteria interaction prediction

Effectiveness of phage therapy

Phages have strict host specificity. This is an advantage of phage therapy, but it also limits clinical use to some extent, especially for single-phage preparations. The effectiveness of phage therapy depends almost entirely on phage selection; the key factors include the strength of the interaction between phage and bacterium and the way the phage acts on the bacterium. In phage therapy, we want the phage to bind precisely to the target bacteria and lyse them within a short time. For this reason we paid attention to lysogenic phages (also known as temperate phages), a class of phages that integrate their genomes into the bacterial chromosome instead of lysing the cell. Such phages may be strongly associated with their hosts at the sequence level but are not suitable for phage therapy, so we screened them out in our project.
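The page does not state how temperate phages were identified. One simple heuristic is to look for lysogeny-associated genes, such as integrases, in a phage's annotation. The hypothetical sketch below drops candidates whose annotated gene products mention such markers; the marker list, function names and data layout are assumptions, and dedicated lifestyle classifiers are more reliable than keyword matching.

```python
# Illustrative, non-exhaustive keywords for lysogeny-associated genes (assumption).
LYSOGENY_MARKERS = ("integrase", "excisionase")


def looks_temperate(gene_products: list[str]) -> bool:
    """True if any annotated gene product mentions a lysogeny-associated marker."""
    return any(marker in product.lower()
               for product in gene_products
               for marker in LYSOGENY_MARKERS)


def screen_candidates(candidates: dict[str, list[str]]) -> list[str]:
    """Keep phages (name -> gene product annotations) that show no lysogeny marker."""
    return [name for name, products in candidates.items() if not looks_temperate(products)]


# Hypothetical phages with made-up annotations.
candidates = {
    "phage_A": ["major capsid protein", "endolysin", "holin"],
    "phage_B": ["tyrosine integrase", "portal protein"],
}
print(screen_candidates(candidates))  # ['phage_A']
```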


Demonstration

Bacteriophage Bay

Click the Bacteriophage Bay button on the home page to enter the database module. Here you can see how our data are organized and download them. The database provides 11 downloadable files: "bacteria.csv", "bacteria_phage_score.csv", "bacteria_spacer.csv", "bacteria_taxon.csv", "bug_score_with_name.csv", "phage.csv", "phage_bug.csv", "result.csv", "score_bug.csv", "score_with_name.csv" and "super_bug.csv".
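For subsequent analysis of a downloaded score table, a minimal sketch using Python's standard csv module is shown below. It assumes a file like "bacteria_phage_score.csv" with columns named bacteria, phage and score; these column names are hypothetical, so check the header of the real file and adjust the constants.

```python
import csv
import io
from typing import TextIO

# Hypothetical column names; replace them with the real header of the downloaded file.
BACTERIUM_COL, PHAGE_COL, SCORE_COL = "bacteria", "phage", "score"


def load_scores(handle: TextIO) -> list[dict]:
    """Parse a bacterium-phage score table, converting the score column to float."""
    rows = []
    for row in csv.DictReader(handle):
        row[SCORE_COL] = float(row[SCORE_COL])
        rows.append(row)
    return rows


def top_phages(rows: list[dict], bacterium: str, n: int = 20) -> list[tuple[str, float]]:
    """Return the n highest-scoring (phage, score) pairs for one bacterium."""
    hits = [r for r in rows if r[BACTERIUM_COL] == bacterium]
    hits.sort(key=lambda r: r[SCORE_COL], reverse=True)
    return [(r[PHAGE_COL], r[SCORE_COL]) for r in hits[:n]]


# An inline sample stands in for the real file in this example.
sample = io.StringIO(
    "bacteria,phage,score\n"
    "Bug_X,phage_1,0.42\n"
    "Bug_X,phage_2,0.91\n"
    "Bug_Y,phage_3,0.77\n"
)
rows = load_scores(sample)
print(top_phages(rows, "Bug_X", n=1))  # [('phage_2', 0.91)]
```

With a real download, open the file with `open("bacteria_phage_score.csv", newline="")` and pass the handle to `load_scores`.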

Fig. Bacteriophage Bay

Phage Finder

Click the Phage Finder entry on the home page to open the Phage Finder module. In the middle of this page is a gray panel containing several input fields. It is the core of Phage Finder and provides two functions: phage-bacteria search and data download.

Fig. Central part of the Phage Finder

In the search condition field, you can enter either a bacterium or a phage. If you are unsure what to enter, refer to the example in the tips.

In the score field, you can choose a comparison operator such as >, < or =. Scores range from 0 to 1; the closer a score is to 1, the more likely the phage and bacterium are to interact.

In the records field, you can set the number of records to display, from 0 to 50. Note that each record is one line on the final map, so this number equals the number of lines shown.

After setting all the parameters, click 'Search', and the map appears in the window on the left. The number on each line between two nodes is their interaction score; it ranges from 0 to 1 and corresponds to the value in the score field.
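The search parameters described above can be read as a filter over (bacterium, phage, score) records. The sketch below is a hypothetical model of that behaviour, not the website's code: the function name and record layout are our own, only the three operators named on this page are included, and returning the highest scores first is an assumption.

```python
import operator

# Operators named on the page; the site may offer more ("etc."), which could be added here.
OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}


def phage_finder_query(records, name, op, threshold, limit):
    """Filter (bacterium, phage, score) records the way the search form describes.

    name: bacterium or phage to search for; op and threshold: the score condition (0-1);
    limit: number of records, i.e. lines drawn on the map (0-50).
    Sorting by descending score is an assumption of this sketch.
    """
    if not 0 <= threshold <= 1:
        raise ValueError("score must be between 0 and 1")
    if not 0 <= limit <= 50:
        raise ValueError("number of records must be between 0 and 50")
    compare = OPS[op]
    hits = [r for r in records if name in (r[0], r[1]) and compare(r[2], threshold)]
    hits.sort(key=lambda r: r[2], reverse=True)
    return hits[:limit]


# Toy records for illustration.
records = [
    ("Escherichia coli", "T4", 0.995),
    ("Escherichia coli", "phage_X", 0.31),
    ("Bug_Y", "T4", 0.62),
]
print(phage_finder_query(records, "T4", ">", 0.5, 10))
```

Each returned tuple corresponds to one line between two nodes, labelled with its score.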

Fig. Search result in Phage Finder

Interactive MAP

Interactive MAP collects data on superbugs and shows some of their information in each entry, such as "Super Bug Sequence Id", "Super Bug Name", "Super Bug Id" and "Taxon Id".

Fig. Superbug data

Tick the entries you are interested in, and the gray window below updates the map in real time. As in the Phage Finder map, you can right-click a node to jump to NCBI for detailed information.
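The real-time update can be thought of as rebuilding the graph from the currently ticked entries. The sketch below is a hypothetical model, not the site's implementation: it keeps only edges whose superbug is selected and attaches an NCBI Taxonomy Browser link to each superbug node. Using the "Taxon Id" field as the link target is our assumption.

```python
# NCBI Taxonomy Browser URL pattern; linking by taxon ID is an assumption of this sketch.
NCBI_TAXONOMY_URL = "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id={}"


def build_map(selected: dict[str, int], edges: list[tuple[str, str, float]]):
    """Rebuild the map for the currently ticked superbugs.

    selected: superbug name -> NCBI taxon ID for each ticked entry.
    edges: (superbug, phage, score) records.
    Returns (nodes, links): nodes maps each node name to a link URL (None when unknown);
    links keeps only edges whose superbug is ticked.
    """
    links = [e for e in edges if e[0] in selected]
    nodes = {bug: NCBI_TAXONOMY_URL.format(taxid) for bug, taxid in selected.items()}
    for _, phage, _ in links:
        nodes.setdefault(phage, None)  # this sketch has no phage IDs, so no link
    return nodes, links
```

Calling `build_map` again whenever the selection changes mirrors the real-time update; a right-click handler would then open the URL stored for the clicked node.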

Fig. Interactive MAP

Command line operation

In addition to the web interface, we provide a command-line version of the data analysis pipeline; the code is available in our GitHub repository. With the virtual machine configuration shown below, the entire analysis pipeline runs smoothly.

Fig. Computer configuration: 2 GB memory, 256 GB hard disk

Fig. Docker installation succeeded.

Fig. Docker image pulled successfully.

Fig. Screenshot 1 of a successful run

Fig. Screenshot 2 of a successful run

Test and Validation

One thing a software team must do is continually validate the results of its software and improve it accordingly. Although our preliminary design did not perform well in testing, we were glad that our complete design did. The details are as follows.

Dataset Validation

We trained and validated the model on a benchmark dataset containing 312 positive and 312 negative samples (235 hosts and 304 phages), obtained from PHIAF. As Li M wrote in the paper "PHIAF: prediction of phage-host interactions with GAN-based data augmentation and sequence-based feature fusion": "The benchmark dataset is non-redundant and correctly labeled. Most importantly, all positive phage-host combinations have been verified in published literature or in NCBI records."

To simulate a situation in which users have no background knowledge, we predicted all 235 × 304 = 71,440 possible bacterium-phage combinations generated from this dataset, and took the top-ranked 5% (3,572 combinations; 113 positive, 3,412 negative) for demonstration.
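The selection step above can be sketched as follows: score every bacterium-phage pair, sort by score, keep the top 5% (rounded up), and count how many selected pairs are known positives. The function names and the toy scores are hypothetical; the example only reproduces the arithmetic 235 × 304 = 71,440 and 5% of 71,440 = 3,572, not Phage-MAP's actual predictions.

```python
import math


def top_percent(scored_pairs, percent=5):
    """Return the highest-scoring `percent`% of (bacterium, phage, score) pairs, rounded up."""
    k = math.ceil(len(scored_pairs) * percent / 100)
    return sorted(scored_pairs, key=lambda p: p[2], reverse=True)[:k]


def count_positives(selected, positives):
    """Count selected pairs whose (bacterium, phage) appears in the set of known positives."""
    return sum((b, ph) in positives for b, ph, _ in selected)


# Toy stand-in: 235 hosts x 304 phages with arbitrary scores (not the real predictions).
pairs = [(f"host_{i}", f"phage_{j}", (i * 304 + j) / 71440)
         for i in range(235) for j in range(304)]
print(len(pairs), len(top_percent(pairs)))  # 71440 3572
```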

Violin Plot

Figure: Violin Plot

Comparing the distributions and probability densities of the four groups of data shows that our final score performs better than the other three groups.

Chi-square Test

We also applied a chi-square test to compare scoring performance between the full set of combinations and the top-ranked subset. The test formulation, the contingency table and the test result are shown in the figures below.

Figure: Chi-square Test Formulation
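For reference, the standard Pearson statistic for a 2 × 2 contingency table is written out below, with observed counts O_ij, expected counts E_ij, row totals R_i, column totals C_j, grand total N, and cells labelled a, b (first row) and c, d (second row). This is the textbook form without continuity correction, with (2 - 1)(2 - 1) = 1 degree of freedom; the notation in the formulation figure may differ.

```latex
\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{2}\frac{(O_{ij}-E_{ij})^2}{E_{ij}},
\qquad E_{ij} = \frac{R_i\,C_j}{N}

% Equivalent closed form for the 2x2 table [[a, b], [c, d]], N = a + b + c + d:
\chi^2 = \frac{N\,(ad-bc)^2}{(a+b)(c+d)(a+c)(b+d)}
```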

Figure: Combination Table

Figure: The result of Chi-square Test

The p-value is far below 0.01, so at the 0.01 significance level we conclude that Phage-MAP performs well at ranking positive combinations near the top.
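A minimal sketch of a 2 × 2 Pearson chi-square test in plain Python is given below. The counts of our combination table are not reproduced here, so the example uses toy counts. The p-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so P(chi2 > x) = erfc(sqrt(x / 2)).

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test on the 2x2 table [[a, b], [c, d]] without continuity correction.

    Returns (statistic, p_value); the test has 1 degree of freedom.
    """
    observed = ((a, b), (c, d))
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    n = a + b + c + d
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column total must be positive")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, chi2 = Z**2, so P(chi2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p = chi_square_2x2(20, 0, 0, 20)  # toy counts
print(stat, p < 0.01)  # 40.0 True
```

Note that SciPy's `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2 × 2 tables by default, so its statistic will be somewhat smaller unless `correction=False` is passed.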

Literature Validation

For each of the 16 superbugs in our dataset, we predicted the 20 phages with the highest scores. To view these results, please visit Phage-MAP.

For example, for the superbug Acinetobacter baumannii, Phage-MAP assigns its combination with Acinetobacter phage Ab_SZ3 a score of 0.971375, placing this phage among the top 20 for A. baumannii. We therefore infer that this phage is a promising candidate for treating infections caused by Acinetobacter baumannii.

Figure: Predicting result of Phage-MAP

Encouragingly, a paper published in May 2021 supports this prediction. It reports an 86-year-old Chinese man with type 2 diabetes mellitus (T2D) who was hospitalized with chronic obstructive pulmonary disease (COPD) and acquired an Acinetobacter baumannii infection during his hospital stay. He was subsequently treated with phage therapy that included Acinetobacter phage Ab_SZ3, and his condition gradually stabilized.

Using a plaque formation assay, the researchers confirmed that Acinetobacter phage Ab_SZ3 can effectively infect Acinetobacter baumannii and is suitable for clinical application.

Figure: Phage plaques formed by phage Ab_SZ3 on Acinetobacter baumannii

Experiment Validation

It is also worth mentioning that we partnered with XJTLU-China, a team whose project has a theme similar to ours. Through friendly and productive exchanges we built a partnership, and this team provided important experimental validation results for us.

The combination of phage T4 and E. coli reaches a score of 0.995, which is among the top 5 predictions for Escherichia coli.

Figure: Predicting result of Phage-MAP

We were delighted that XJTLU-China's experiment verified this combination.

Figure: Phage plaques formed by phage T4 on E. coli

In summary, we validated the performance of Phage-MAP from three aspects: dataset validation, literature validation and wet-lab experimental validation. Together, this evidence indicates that Phage-MAP is a reliable tool for identifying phage-bacteria interactions to assist phage therapy.

If you have any questions, please contact us.

No.1239 Siping Road

Tongji University, Shanghai, China
