Team:Tongji Software/Engineering

  • Born
  • Preliminary Design
  • Build
  • Grow
  • Complete Design
  • Test and Validation
  • Improvement


This year, we achieved engineering success by following the engineering design cycle: Born → Design → Build → Test → Learn → Design...

Born

Drug-resistant bacteria are becoming a worldwide problem. In the early stage of our project, we saw phage therapy emerging as a promising new treatment for superbugs. At the same time, we found that selecting the right phage or phages is a challenge: scientists often spend a great deal of time searching the vast literature and running preliminary experiments. Compared with wet-lab experiments, computational methods for identifying phage-bacteria interactions are much more efficient, saving both time and cost. Our human practices work convinced us that this is a meaningful research direction. So, we started.

Preliminary Design

In the preliminary stage, we chose the CRISPR system as the basis of our design. CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is an adaptive immune system found in many bacteria. When phages invade, bacteria capture fragments of the phage genome and insert them into their own genome; these inserted fragments are called spacers. Spacers alternate with short direct repeat (DR) sequences at regular intervals, together making up a CRISPR array. Through CRISPR, bacteria form an immune memory of phages, so that they can react quickly and fight off the next invasion.

Figure: the CRISPR immune system of bacteria

Based on the principle of the CRISPR system, we designed our basic analysis method: first, find spacers in the bacterial genome; then, align the spacers to phage genomes; finally, calculate the strength of the interaction between each phage and bacterium from the number and scores of the alignment hits.

Build

Our initial build consisted of the following steps:

① Use the CRISPRDetect tool to obtain high-confidence spacer sequences from bacterial genomes;

② Align these short spacer sequences to phage genomes with BLAST;

③ Collect the BLAST scores of hits that pass a given e-value threshold into a 2D table;

④ Normalize the corresponding bit-scores to obtain the score Score_spacer_Blast.
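
Steps ③ and ④ can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Phage-MAP's actual code. It reads BLAST+ tabular output (`-outfmt 6`, whose default twelve columns end with `evalue` and `bitscore`), keeps hits below an e-value cut-off, sums the bit-scores for each bacterium-phage pair, and divides by the maximum. The page does not specify these details, so several choices here are hypothetical: the query-ID convention `bacteriumID|spacerN`, the cut-off of 1e-3, summation as the aggregation, and max-normalization.

```python
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_outfmt6(lines):
    """Yield one dict per hit from BLAST -outfmt 6 text lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(OUTFMT6_FIELDS, line.split("\t")))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        yield hit


def spacer_blast_table(lines, evalue_max=1e-3):
    """Sum bit-scores of hits passing the e-value cut-off into a
    {(bacterium, phage): summed bit-score} table.

    Hypothetical convention: spacer query IDs look like 'bacteriumID|spacerN'
    and subject IDs are phage genome IDs.
    """
    table = defaultdict(float)
    for hit in parse_outfmt6(lines):
        if hit["evalue"] > evalue_max:
            continue
        bacterium = hit["qseqid"].split("|", 1)[0]
        table[(bacterium, hit["sseqid"])] += hit["bitscore"]
    return dict(table)


def normalize_by_max(table):
    """Scale summed bit-scores into (0, 1] by dividing by the maximum
    (an assumed normalization)."""
    if not table:
        return {}
    top = max(table.values())
    return {pair: value / top for pair, value in table.items()}


if __name__ == "__main__":
    demo = [
        "ecoliA|spacer1\tphageX\t100.0\t32\t0\t0\t1\t32\t100\t131\t1e-10\t60.0",
        "ecoliA|spacer2\tphageX\t96.9\t32\t1\t0\t1\t32\t500\t531\t1e-08\t52.0",
        "ecoliA|spacer3\tphageY\t90.0\t30\t3\t0\t1\t30\t10\t39\t0.5\t30.0",
        "kpnB|spacer1\tphageY\t100.0\t30\t0\t0\t1\t30\t10\t39\t1e-09\t56.0",
    ]
    print(normalize_by_max(spacer_blast_table(demo)))
    # {('ecoliA', 'phageX'): 1.0, ('kpnB', 'phageY'): 0.5}
```

In the demo, the spacer3 hit (e-value 0.5) is discarded. Dividing by the maximum rather than min-max scaling keeps every retained pair above zero.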

Validation on our dataset showed that the results of the preliminary design were not satisfactory. No matter how we adjusted the threshold, the detection rate remained so low that we had to improve the method.

Grow

Later, through discussions with SJTU-Software, we further improved our analysis methods. On top of the alignment-based analysis from the early stage, we added alignment-free methods, namely k-mer frequency and Markov models, which evaluate sequence relationships from other perspectives. These models give us Score_kmer_freq and Score_Markov. More details can be found on the Model page.
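
The two alignment-free scores can be illustrated with the sketch below. The exact formulations are on the Model page. For illustration only, we assume the following:

- Score_kmer_freq is the cosine similarity between the tetranucleotide (k = 4) frequency vectors of phage and host.
- Score_Markov is the average per-base log-likelihood of the phage genome under a second-order Markov chain trained on the host genome with add-one smoothing.

The function names, k, the Markov order and the pseudocount are all hypothetical.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count overlapping k-mers that contain only A/C/G/T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in ALPHABET for base in kmer):
            counts[kmer] += 1
    return counts


def kmer_freq_score(phage, host, k=4):
    """Cosine similarity of k-mer count vectors (equivalently, frequency
    vectors); an assumed metric for Score_kmer_freq."""
    a, b = kmer_counts(phage, k), kmer_counts(host, k)
    dot = sum(count * b[kmer] for kmer, count in a.items())  # missing k-mers count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_markov(host, order=2, pseudocount=1.0):
    """Estimate P(next base | previous `order` bases) from the host genome."""
    counts = kmer_counts(host, order + 1)
    model = {}
    for context in map("".join, product(ALPHABET, repeat=order)):
        total = sum(counts[context + b] for b in ALPHABET) + 4 * pseudocount
        model[context] = {b: (counts[context + b] + pseudocount) / total for b in ALPHABET}
    return model


def markov_score(phage, model, order=2):
    """Average per-base log-likelihood of the phage under the host model
    (an assumed form of Score_Markov); higher means more host-like."""
    phage = phage.upper()
    total, n = 0.0, 0
    for i in range(order, len(phage)):
        context, base = phage[i - order:i], phage[i]
        if context in model and base in ALPHABET:  # skip ambiguous bases
            total += math.log(model[context][base])
            n += 1
    return total / n if n else float("-inf")
```

Both scores avoid alignment entirely, so they still produce a signal for pairs where no spacer hit exists. The first step of the sketch depended on such hits.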

This gives us three characteristic scores: Score_spacer_Blast, Score_kmer_freq and Score_Markov. We then apply the TOPSIS method to combine them into a composite index, Score, which we use to comprehensively evaluate candidate phages for phage therapy, and visualize the results on our web page.
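
TOPSIS ranks alternatives by how close they are to an ideal best solution and how far they are from an ideal worst one. For row i, the relative closeness is C_i = d_i^- / (d_i^+ + d_i^-). The sketch below is a generic textbook version, not necessarily the exact variant Phage-MAP uses. It assumes equal weights and treats all three characteristic scores as benefit criteria (larger is better). Each row is one candidate phage for a given bacterium; the demo values are hypothetical.

```python
import math


def topsis(matrix, weights=None):
    """Relative closeness C_i = d_i^- / (d_i^+ + d_i^-) for each row.

    Rows are alternatives (candidate phages), columns are benefit criteria
    (larger is better). Returns scores in [0, 1]; higher is better.
    """
    n_cols = len(matrix[0])
    if weights is None:
        weights = [1.0 / n_cols] * n_cols  # assumption: equal weights
    # 1. Vector-normalize each column and apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n_cols)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]
    # 2. Ideal best and ideal worst solutions, column by column.
    best = [max(col) for col in zip(*weighted)]
    worst = [min(col) for col in zip(*weighted)]
    # 3. Euclidean distances to both ideals, then relative closeness.
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.5)  # all rows identical
    return scores


if __name__ == "__main__":
    # Hypothetical columns: Score_spacer_Blast, Score_kmer_freq, Score_Markov,
    # each already scaled so that larger is better.
    candidates = [[0.9, 0.8, 0.7], [0.5, 0.5, 0.5], [0.1, 0.2, 0.3]]
    print(topsis(candidates))  # first candidate -> 1.0, last -> 0.0
```

Vector normalization makes the three scores comparable despite their different ranges. The weights are the natural place to encode how much each score should count.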

Complete Design

In addition, we did some extra processing on the superbug dataset.

Through discussions with Prof. Guo, we became aware of lysogenic phages in nature, which integrate into the host genome as prophages. Because this phenomenon is widespread and can destabilize our prediction results, we decided to account for it. Using PHASTER, a prophage prediction tool based on BLAST and the DBSCAN clustering algorithm, we identified prophage-bacteria pairs in which the phage matches a high-scoring prophage in the bacterial genome, and removed them directly from the superbug-phage dataset.
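
The filtering step can be sketched as follows. We assume, hypothetically, that PHASTER predictions have already been mapped to phage IDs in our dataset. We also assume they are summarized as a dictionary from bacterium ID to (matched phage ID, region score) tuples. The function name, this data structure and the cut-off of 90 are illustrative assumptions, not PHASTER's output format.

```python
def remove_prophage_pairs(pairs, prophage_hits, min_score=90):
    """Drop (bacterium, phage) pairs whose phage matches a high-scoring
    predicted prophage in that bacterium.

    prophage_hits: hypothetical summary of PHASTER results,
        {bacterium_id: [(matched_phage_id, region_score), ...]}
    min_score: assumed cut-off for a "high-score" prophage.
    """
    flagged = {
        (bacterium, phage)
        for bacterium, hits in prophage_hits.items()
        for phage, score in hits
        if score >= min_score
    }
    return [pair for pair in pairs if pair not in flagged]


if __name__ == "__main__":
    dataset = [("b1", "p1"), ("b1", "p2"), ("b2", "p1")]
    hits = {"b1": [("p1", 95), ("p2", 60)]}
    print(remove_prophage_pairs(dataset, hits))  # [('b1', 'p2'), ('b2', 'p1')]
```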

In real life, for severe bacterial infections, medical professionals use several specific bacteriophages to attack and kill the pathogenic bacteria. This method of treating disease with multiple phage preparations is called phage cocktail therapy, and it is also the most common form of phage therapy. On the advice of a doctor at Dongfang Hospital, we took this into consideration in our project. Based on genome similarity, we clustered the remaining phages in the superbug dataset into different phage clusters. In this way, users can choose phages from different clusters to make up a "specific phage cocktail".
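
The page does not specify the clustering algorithm, so the sketch below substitutes a simple, named stand-in. It uses single-linkage clustering with union-find: two phages join the same cluster when the Jaccard similarity of their 8-mer sets reaches a threshold. Choosing the top-scoring phage from each cluster then gives one possible cocktail. The similarity measure, k, the threshold and the selection rule are all assumptions.

```python
from itertools import combinations


def kmer_set(seq, k=8):
    """Set of overlapping k-mers in a genome sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two k-mer sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_phages(genomes, threshold=0.5, k=8):
    """Single-linkage clustering with union-find: phages whose k-mer Jaccard
    similarity reaches `threshold` are merged into the same cluster."""
    names = list(genomes)
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kmers = {name: kmer_set(genomes[name], k) for name in names}
    for a, b in combinations(names, 2):
        if jaccard(kmers[a], kmers[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return list(clusters.values())


def pick_cocktail(clusters, score):
    """Hypothetical rule: take the highest-scoring phage from each cluster."""
    return [max(members, key=lambda name: score[name]) for members in clusters]


if __name__ == "__main__":
    genomes = {
        "p1": "AAAAAAAAAACCCCCCCCCC",
        "p2": "AAAAAAAAAACCCCCCCCCG",  # near-identical to p1
        "p3": "GGGGGGGGGGTTTTTTTTTT",
    }
    groups = cluster_phages(genomes)
    print(groups)  # [['p1', 'p2'], ['p3']]
    print(pick_cocktail(groups, {"p1": 0.9, "p2": 0.4, "p3": 0.7}))  # ['p1', 'p3']
```

Single linkage needs only pairwise similarities, which keeps the sketch short. Drawing phages from different clusters is what makes the cocktail genetically diverse.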

Test and Validation

A software team must constantly validate the results of its software and improve it accordingly. The test results of our preliminary design were poor, so we improved our model based on these failures, and the test results of our complete design were finally satisfactory. The figure below summarizes the process; see Proof_of_Concept for more details about validation.

Figure: Summary diagram of the test and validation section

In short, we validated the performance of Phage-MAP in three ways: dataset validation, literature validation and wet-lab validation. Based on this evidence, we are confident that Phage-MAP is a reliable tool for identifying phage-bacteria interactions to assist phage therapy.

Improvement

Through project discussions and literature research, we have also proposed several improvements to address existing shortcomings; they can be applied in future updates of Phage-MAP.

The first concerns further expanding and improving the characteristic scores. We can investigate more analysis tools, then summarize and integrate their models together with their performance results. We could then adopt more effective models (such as convolutional neural networks) and develop new tools to analyze correlations between bacteria and phages.

The second concerns improving the multi-score evaluation method. We can apply other evaluation models, such as grey evaluation models. If we obtain more characteristic scores in the future, we can also apply feature engineering with machine learning methods, for example by constructing and tuning a neural network, to achieve a better comprehensive evaluation.

If you have any questions, please contact us.

No.1239 Siping Road

Tongji University, Shanghai, China