Traditional synthetic biology often relies on a large number of repetitive experiments to obtain results, an approach that demands considerable effort and cost. With the rapid development of data science, researchers can now use machine learning and deep learning to find patterns and discover new phenomena in large datasets, which has made these methods important tools for tasks such as search and prediction.
In fact, synthetic biology has accumulated a large body of data over many years. We have brought computer science into our synthetic biology work, which can benefit us greatly, for example by helping iGEMers design biological experiments and by predicting biological properties. A number of groups have already made successful attempts in this direction.
Synthetic biology is a multidisciplinary subject, and our inspiration came from other disciplines as well. While we were struggling to come up with a good idea, our university happened to host a presentation on the application of machine learning in materials science. This presentation gave us the idea of combining synthetic biology and machine learning. We also had an in-depth conversation with Prof. Haiyan Liu, who shared his perspective on machine learning and considered our idea promising and worthwhile. With that, we set the direction of our project and put it into action.
In the earliest stages of the project, we wanted to design a novel machine learning model to predict properties in a way that had not been done before. However, after a long period of research, data collection, and discussions with members of USTC and our faculty, we realized that the high cost and difficulty of that project would not allow us to complete it in the limited time available. We therefore did further research and found that there was no highly integrated, novice-friendly platform for machine learning models that predict protein properties from sequence. That is how we came up with our current project, CAT.
CAT is a user-friendly platform for people who would like to learn about machine learning and predict protein properties. It greatly reduces the time needed for retrieval by integrating machine learning prediction tools such as Jpred4, IPC2.0, DeepTMHMM, and Cell-PLoc 2.0, so that protein secondary structure, isoelectric point, transmembrane topology, and subcellular localization can all be predicted in one click. For beginners, we provide demonstration videos in the Demonstrate section. For iGEMers or students who are interested in machine learning, we designed an educational version that teaches the fundamentals of machine learning and synthetic biology.
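The "one click" idea can be sketched as a small dispatcher that cleans a single input sequence once and then sends it to several predictors. This is a minimal illustration under stated assumptions: each predictor is assumed to be wrapped as a Python function that takes a plain amino-acid string, and the names `clean_sequence`, `run_all`, and the placeholder predictors are hypothetical. The sketch does not call Jpred4, IPC2.0, DeepTMHMM, or Cell-PLoc 2.0, whose real interfaces differ.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# The 20 standard amino acids (one-letter codes). Assumption: real tools may
# also accept ambiguity codes such as X, which this sketch rejects.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Drop FASTA header lines and whitespace, uppercase, then validate residues."""
    lines = [ln for ln in raw.splitlines() if not ln.startswith(">")]
    seq = "".join("".join(lines).split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    invalid = sorted(set(seq) - STANDARD_AA)
    if invalid:
        raise ValueError(f"non-standard residues: {''.join(invalid)}")
    return seq


def run_all(raw: str, predictors: Dict[str, Callable[[str], object]]) -> Dict[str, object]:
    """Hypothetical one-click dispatcher: send one cleaned sequence to every predictor."""
    seq = clean_sequence(raw)
    # Run predictors concurrently, since real ones would mostly wait on remote servers.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, seq) for name, fn in predictors.items()}
        return {name: fut.result() for name, fut in futures.items()}


if __name__ == "__main__":
    # Placeholder predictors standing in for real tools (hypothetical).
    demo = {
        "length": len,
        "charged_residues": lambda s: sum(s.count(aa) for aa in "DEKR"),
    }
    print(run_all(">demo\nMKTAYIAKQR\nQISFVKSHFS", demo))
```

Validating the sequence once before dispatch means every tool receives identical input, and a malformed sequence fails immediately instead of after several slow remote jobs.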