Mathematical modeling can save considerable material and labor in the laboratory. Through computation and simulation, limited experimental data can be used to infer the relationships underlying the data and to suggest a better strategy for follow-up experimental design, which significantly improves work efficiency. Because Transcription Factor (TF) databases have grown rapidly in recent years, and genetic diversity produces far more sequence information than can be screened by hand, computer simulation is essential. To better integrate the dry lab with our HP work and to get the most out of the dry experiments, we developed the following three models.

1 Software-NuPGO

Taking over from the wet lab, the dry lab further optimized the promoter. TFs and their binding sites, called TFBS or UAS, are the focus of our project and a field that has been actively explored over the past ten years. As their function and significance were discovered, TF databases expanded, and the motifs and functional roles of different TFs gradually became clear. Besides replacing and inserting more efficient UAS sequences, there is another factor that influences the effect of a TF: nucleosomes, which wrap DNA and package it into chromosomes, and which prevent TFs from binding to the UAS. The affinity of nucleosomes for DNA, that is, how tightly a nucleosome binds a given stretch of DNA, depends on the base sequence. In other words, decreasing the nucleosome affinity of a DNA sequence lets TFs reach the UAS with less steric hindrance and more opportunity to bind, which is an equally important route to improvement. So after the wet lab obtained the best-performing hybrid UAS promoter, we went on to reduce the nucleosome affinity of the promoter while keeping its core region and the key UAS motifs intact, in order to get a more potent promoter.
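The idea of lowering nucleosome affinity while leaving the core region and UAS motifs untouched can be sketched as a greedy search. This is only a minimal illustration under stated assumptions, not the NuPGO implementation: `toy_affinity` is a hypothetical placeholder score (plain GC fraction) standing in for a real nucleosome affinity predictor, and `optimize`, `protected`, `rounds` and `seed` are names chosen here for the example.

```python
import random


def toy_affinity(seq: str) -> float:
    """Hypothetical placeholder score: GC fraction of the sequence.

    This is NOT the NuPGO affinity model; swap in a real predictor here.
    """
    return sum(base in "GC" for base in seq) / len(seq)


def optimize(seq, protected, score=toy_affinity, rounds=200, seed=0):
    """Greedy single-base mutation that never touches protected positions.

    protected: list of (start, end) half-open intervals, e.g. the core
    promoter and the key UAS motifs.
    """
    rng = random.Random(seed)
    locked = {i for start, end in protected for i in range(start, end)}
    free = [i for i in range(len(seq)) if i not in locked]
    best, best_score = seq, score(seq)
    for _ in range(rounds):
        if not free:
            break  # everything is protected, nothing to mutate
        i = rng.choice(free)
        new_base = rng.choice([b for b in "ACGT" if b != best[i]])
        candidate = best[:i] + new_base + best[i + 1:]
        candidate_score = score(candidate)
        # Keep the mutation only if it lowers the predicted affinity.
        if candidate_score < best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


if __name__ == "__main__":
    promoter = "GCGCGCTATAAAGCGCGC"
    # Positions 6..11 (TATAAA) play the role of a protected motif here.
    improved, affinity = optimize(promoter, protected=[(6, 12)])
    print(improved, round(affinity, 3))
```

A greedy search is used here only because it is short; it can get stuck in local minima, and any other search strategy could be substituted without changing the constraint that protected positions stay fixed.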

It is worth noting that TFBS are very common in all kinds of promoters, which means that not only our promoter but any promoter with artificial or natural TFBS can be strengthened in the same way with our software, NuPGO. Excitingly, previous studies have shown that this kind of optimization can raise the expression level several-fold, which is a very significant improvement. That is why SCUT_China has developed a full-featured and user-friendly software package, so that subsequent iGEM teams and researchers can easily use the tool, or refine the source code, to optimize their own promoters. We sincerely recommend that any research team working with UAS give our optimizer a try!

2 Fermentation Simulation

Initially, we wanted to use mathematical equations to describe the actual conditions inside the fermenter, such as cell growth kinetics and product formation kinetics. But after consulting the literature and our professor, we realized that this design was not feasible: fermentation is a nonlinear, multivariable process with a large time delay, and the turbulence inside a real fermenter is even more difficult to describe with mathematical formulas. We therefore turned to education and science popularization instead, aiming to fit the fermentation process in the tank as closely as possible and then visualize it with a cellular automaton.

A set of ODEs was established to describe the growth of yeast in the fermenter over time, together with product formation and substrate consumption, as well as the effects of a variety of environmental parameters on the process. A cellular automaton is then built from the ODEs to visualize the process inside a fermenter. Most importantly, to support education and science popularization, users can observe the process under different conditions by adjusting input parameters such as the set ranges of pH, temperature, dissolved oxygen, and stirring intensity.
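As an illustration of what such an ODE set can look like, the sketch below integrates a textbook batch model with forward Euler steps. It assumes Monod growth kinetics, a constant biomass yield, and Luedeking-Piret product formation; these are common choices, not necessarily the equations used in our model. All parameter values are invented for the example, and `env_factor` is a hypothetical scalar between 0 and 1 standing in for the combined effect of pH, temperature, dissolved oxygen and stirring.

```python
def simulate(x0=0.1, s0=20.0, p0=0.0, mu_max=0.4, ks=0.5, yxs=0.5,
             alpha=2.0, beta=0.05, env_factor=1.0, dt=0.01, hours=24.0):
    """Batch fermentation sketch: biomass x, substrate s, product p (g/L).

    Assumed equations (illustrative only):
        mu    = env_factor * mu_max * s / (ks + s)   # Monod growth rate
        dx/dt = mu * x
        ds/dt = -(mu * x) / yxs                      # substrate consumption
        dp/dt = alpha * mu * x + beta * x            # Luedeking-Piret
    """
    x, s, p, t = x0, s0, p0, 0.0
    history = [(t, x, s, p)]
    for _ in range(int(round(hours / dt))):
        mu = env_factor * mu_max * s / (ks + s)
        dx = mu * x
        ds = -dx / yxs
        dp = alpha * dx + beta * x
        x += dx * dt
        s = max(s + ds * dt, 0.0)  # substrate cannot become negative
        p += dp * dt
        t += dt
        history.append((t, x, s, p))
    return history


if __name__ == "__main__":
    for env in (1.0, 0.5):
        t, x, s, p = simulate(env_factor=env)[-1]
        print(f"env_factor={env}: t={t:.1f} h  X={x:.2f}  S={s:.2f}  P={p:.2f}")
```

Each row of the returned history could then drive one frame of a visualization, for example by mapping biomass to the number of occupied cells in a grid.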

3 Machine Learning

One of the difficulties in predicting promoter strength is the complexity and nonlinearity of the information contained in genes, and machine learning is likely to be an appropriate tool for this. Inspired by the sequential nature of DNA, we tried the Recurrent Neural Network (RNN) algorithm, which is well suited to understanding text-like sequences, to learn the relationship between promoter sequence and promoter strength. Because of limited time and the very large training dataset, the training and the hyperparameter optimization have not been finished yet. Nonetheless, we believe this work offers a novel, sequence-based approach to predicting promoter strength with machine learning, and we present the training dataset and the adjusted initial parameters as a reference for future iGEM teams.
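To make the sequence-based idea concrete, the following sketch one-hot encodes a DNA sequence and runs it through a single-layer Elman RNN that outputs one number as a strength estimate. It is a minimal, untrained forward pass in pure Python under stated assumptions: the hidden size, the weight initialization and the names `one_hot`, `init_params` and `predict` are chosen for illustration and do not describe our actual network or its parameters.

```python
import math
import random

BASES = "ACGT"


def one_hot(seq):
    """Encode each base as a 4-element vector; unknown bases become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def init_params(hidden=8, seed=0):
    """Small random weights for an Elman RNN (illustrative, untrained)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_xh": matrix(hidden, len(BASES)),  # input -> hidden
        "W_hh": matrix(hidden, hidden),      # hidden -> hidden (recurrence)
        "b_h": [0.0] * hidden,
        "w_out": [rng.uniform(-0.1, 0.1) for _ in range(hidden)],
        "b_out": 0.0,
    }


def predict(seq, params):
    """Read the sequence base by base and map the final state to a scalar."""
    h = [0.0] * len(params["b_h"])
    for x in one_hot(seq):
        # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(params["W_xh"][j], x))
                + sum(w * hi for w, hi in zip(params["W_hh"][j], h))
                + params["b_h"][j]
            )
            for j in range(len(h))
        ]
    return sum(w * hi for w, hi in zip(params["w_out"], h)) + params["b_out"]


if __name__ == "__main__":
    params = init_params()
    print(predict("TATAAAGGCC", params))  # meaningless until the weights are trained
```

In practice the weights would be fitted to measured promoter strengths with a deep learning framework; the sketch only shows how a sequence of any length is turned into a single prediction.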