At the very beginning of our project, we visited USTC, our school's wet-lab team, to learn about synthetic biology. During the visit, we found that traditional biological experiments are very time-consuming, especially when an experiment depends on protein properties that are unfamiliar to the researchers. Coming from an information technology background, we were curious about how computer science and biology could work together, in particular by applying machine learning to predict protein properties. We asked the wet team how convenient it was to access machine learning models and whether any integrated platform specialized in such models existed; their answer was no. So we decided to contribute in this area. With this belief, we designed the software CAT to serve as a sharing platform for machine learning models. Here are several of our contributions.

Integration of Models

For the reasons above, we want to give researchers a tool that uses machine learning to predict protein properties. To meet the needs of researchers working in different directions, we decided to design a platform that integrates various property prediction models. Researchers only need to input a protein sequence to get predicted results for the four important protein features we provide: subcellular localization, transmembrane topology, secondary structure and isoelectric point.
# Subcellular localization
Protein function, metabolism and interactions are closely related to the subcellular localization of proteins, and mature proteins must be located in specific subcellular structures to perform their correct biological functions. Therefore, subcellular localization is of great importance for protein synthesis and functional studies.
# Transmembrane topology
Membrane proteins play a crucial role in cellular processes: they bind ligand molecules and trigger a series of reactions that transmit information between cells. Studying membrane proteins is therefore vital to research on intercellular communication.
# Secondary structure
Secondary structure, chiefly alpha helices and beta sheets, provides key information about the conformation of a protein and helps bioinformaticians deduce its spatial structure.
# Isoelectric point
A protein, like a free amino acid, is typically least soluble at its isoelectric point, so knowing the isoelectric point is essential for protein purification and precipitation.
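
For intuition about what the isoelectric point measures, it can also be estimated directly from a sequence without machine learning. The following is a minimal sketch of that classical calculation, not CAT's prediction model: it sums Henderson-Hasselbalch charges over the ionizable groups and bisects for the pH at which the net charge is zero. The pKa values and the function names `net_charge` and `estimate_pi` are illustrative assumptions; published pKa scales differ and give somewhat different results.

```python
# Illustrative pKa values (an assumption; several published scales exist).
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0}               # basic side chains
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}      # acidic side chains
PKA_N_TERM = 9.0   # free amino terminus
PKA_C_TERM = 2.0   # free carboxyl terminus


def net_charge(sequence: str, ph: float) -> float:
    """Approximate net charge of a peptide at the given pH."""
    seq = sequence.upper()
    # Positive groups: protonated fraction is 1 / (1 + 10^(pH - pKa)).
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    for residue, pka in PKA_POSITIVE.items():
        charge += seq.count(residue) / (1.0 + 10 ** (ph - pka))
    # Negative groups: deprotonated fraction is 1 / (1 + 10^(pKa - pH)).
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for residue, pka in PKA_NEGATIVE.items():
        charge -= seq.count(residue) / (1.0 + 10 ** (pka - ph))
    return charge


def estimate_pi(sequence: str, tolerance: float = 1e-4) -> float:
    """Bisect on pH in [0, 14] for the point where the net charge crosses zero."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        # Net charge decreases monotonically as pH rises.
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0
```

Under these assumptions, `estimate_pi` returns a higher value for a lysine-rich peptide than for an aspartate-rich one, because basic side chains raise the pH at which the charges balance.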

Disseminate Knowledge

For learners who are interested in biology but have not studied it, an introduction to traditional biology gives them a deeper understanding of the field they love and a solid foundation for future study and research. For researchers in certain fields, the models we offer have limited capabilities, so we think it is necessary to disseminate knowledge of both biology and machine learning. We believe this helps them understand which machine learning models are useful to them and to what extent. That is why we built the education section on CAT.

The traditional biology part presents some traditional assays, such as determining the subcellular localization of proteins with fluorescent protein assays. We believe it gives students a brief overview of traditional biology and develops their interest in synthetic biology through our education section.

The machine learning part is designed to be easy to understand. It uses the simplest examples to explain the principles of machine learning so that users are not confused. We believe that practice is necessary for learning, so we provide a small test for users. In addition, we found that knowledge presented graphically is easier to absorb than knowledge presented as text, so we set up interactive animations to help learners master the basic principles of machine learning quickly and firmly.

We believe that the education section alone is not enough to popularize knowledge of synthetic biology and machine learning, so we also designed a brochure. It covers much the same content as the education section, but we believe that distributing it lets us contribute further to the popularization of science.

Human Practice

We held a meeting at the beginning of the project to educate each regular and reserve team member about synthetic biology and computer science, and to plan and organize the project.

After the initial research, we met with Prof. Haiyan Liu from the School of Life Sciences, University of Science and Technology of China, and discussed developments and possible problems in protein prediction, which was very helpful to us.

From August 27th to 29th, 2021, we participated in CCiC, the Conference of China iGEMer Community. We had in-depth and friendly exchanges with other teams and came to understand synthetic biology in more depth. After the conference, we continued our discussions with the SJTU-Software team from Shanghai Jiao Tong University and then started a collaboration between the two teams.

As mentioned above, we created a brochure. By handing out the brochures in science and technology museums, high schools, and university campuses, we promoted our project and taught the basics of synthetic biology and machine learning to students and community members at all levels, and we received a lot of positive feedback.

Contact Us

University of Science and Technology of China, No. 96, JinZhai Road, Baohe District, Hefei, Anhui, 230026, P.R. China