During the research for this project, we gained a deep understanding of both software and web tools, worked out how to use them together, and drew empirically supported conclusions from the analysis of the results. We believe that the overall workflow is general, and that the joint tool mapped out by SCUT-China can serve as a useful measurement guide. The tool is particularly helpful for experimental design and for reducing the experimental workload of projects that use transcription factors (TFs) and transcription factor binding sites (TFBSs). We therefore present this practical tool here in the hope that it will be of use to future iGEM teams and researchers.


Databases and software for biological research have developed rapidly in recent years. Given the explosion of biological information, effective and accurate measurement tools have become scientifically important. To advance biology in the face of a growing volume of experimental data, scientists are integrating data into databases for other researchers to use. In this setting, we prefer to make good use of existing software and databases to support our research rather than remain limited to our own lab and to approaches such as building gene libraries to obtain more data.

Chromosomes consist of DNA tightly bound to nucleosomes and have superhelical properties, and transcription factors are an important class of functional proteins that activate downstream transcription by binding to DNA. In such a strongly distorted state, the DNA is tightly wrapped around nucleosomes and cannot expose the sites that transcription factors bind specifically. It is therefore difficult for transcription factors (TFs) to bind to the DNA and act. Only in regions where nucleosomes are loosely bound can TFs bind and function readily. We therefore put forward two hypotheses: ‘transcription factor binding is strongly related to transcriptional strength’ and ‘nucleosome affinity of DNA has a strong influence on transcription factor binding’. Evaluating transcription factors and nucleosome affinity thus became a critical task. Before introducing our measurement, we describe the two main tools [1][2] used in our project.

2.1 TFBS database

Transcription factor binding sites have been determined and validated experimentally as motifs in recent years. Because motifs are diverse, they are often written with degenerate bases, so that one motif represents the class of binding sequences recognized by one transcription factor. Information on these motifs and on the functions of transcription factors is collected in databases. Because upstream activating sequences (UASs) have been studied most extensively in yeast, the classic databases are YeTFaSCo (The Yeast Transcription Factor Specificity Compendium), YEASTRACT+ and the Saccharomyces Genome Database (SGD). Most of them can calculate and predict the TFBSs of promoters.
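To make the idea of a motif that stands for a class of binding sequences concrete, here is a minimal Python sketch. It assumes the motif is written with standard IUPAC ambiguity codes and scans only the given strand; the motif and sequence are made up for illustration, and this is not how YEASTRACT+ or YeTFaSCo implement their searches.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def motif_to_regex(motif):
    """Turn an IUPAC consensus motif into a regular expression."""
    parts = []
    for base in motif.upper():
        bases = IUPAC[base]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)

def find_sites(sequence, motif):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A lookahead lets the scan report matches that overlap each other.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]

# Hypothetical motif and sequence, for illustration only.
print(motif_to_regex("CWTCC"))
print(find_sites("AACATCCGGCTTCCA", "CWTCC"))
```

A real search would also scan the reverse complement and would usually work from position weight matrices rather than a single consensus string.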

We used both the YEASTRACT+ and YeTFaSCo web servers to obtain the TFBS distribution figures and to collect information on the transcription factors.

2.2 dHMM

Hidden Markov models (HMMs) are tools for mining hidden information in sequence data (detailed on our Model webpage). They are suited to reading continuous information, linking information between different nodes, and inferring hidden information according to the Markov chain rule.

Gene sequences and transcription factor binding sites (TFBSs) both fit this profile, so some researchers have used a duration hidden Markov model (dHMM) to predict nucleosome occupancy along chromosomal DNA, with good results.

The proposed model is robust and has been developed into a user-friendly R package called NuPoP. NuPoP takes a sequence of arbitrary length as input and calculates the nucleosome occupancy and the nucleosome affinity score along that sequence.

We used it to predict nucleosome occupancy on promoters. The nucleosome affinity score output by NuPoP served as our measure of how strongly different positions of a gene bind nucleosomes.

2.3 Combination

To show the final result first: we found and verified a method that combines nucleosome affinity prediction with TFBS prediction in order to estimate or classify the strength and characteristics of a promoter.

During its development, we verified the accuracy of NuPoP and of the TFBS predictions. With the experience and conclusions gained in the process, this measurement can be applied to any other promoter to gain deeper insight and to roughly judge its capability.


If you would like to use this measurement, we suggest following the ‘method’ text of each step and taking the ‘In our program’ part as an example.

3.1 Promoter selection

In promoter engineering, it is critical to pick a promoter with high basal strength. An intuitive hypothesis is that promoters with naturally high transcription initiation strength may contain key UASs that contribute more to strength, while inducible promoters with more complex mechanisms may contain key UASs with stronger regulatory capacity.

In our program: We selected 14 well-studied and commonly used promoters in S. cerevisiae and divided them into four groups according to their metabolic pathways, taking into account that different metabolic pathways differ distinctly.

3.2 UAS location

Using the ‘Find TF Binding Sites’ function of the YEASTRACT+ database, you can predict the distribution of known UASs on the promoter sequence you submit and view the functional description of the transcription factor corresponding to each UAS, which helps with selection. You need to know the distribution of the UASs of interest on all promoters in your project. The UAS locations appear in the TFBS results predicted by YEASTRACT+; you can record them by position, or use software such as SnapGene to record them visually for subsequent comparative analysis. Because the visualization provided by the database is limited, this annotation may need to be done manually.

In our program: We submitted the 14 selected promoter sequences to the YEASTRACT+ database and obtained the prediction results. Based on the functional descriptions of the transcription factors corresponding to the predicted TFBSs, we screened for binding sites associated with ‘carbon source response’ or ‘glucose repression’. Based on the sequence matches, we used SnapGene to annotate the sequences with the locations of these UASs.

Figure 1. UAS distribution diagram

3.3 HMM prediction

Every promoter in your project should be run through NuPoP to obtain one affinity plot per promoter. When the value is positive, a higher affinity value means a stronger affinity of the sequence for the nucleosome: the nucleosome is bound more tightly, and it is more difficult for the TF to bind the UAS and function. When the value is negative, a lower value means that the sequence resists the nucleosome more strongly: the nucleosome is bound less tightly, and it is easier for the TF to bind the UAS and function. Regions that NuPoP predicts to have low nucleosome affinity are therefore the regions where the corresponding transcription factors are more likely to function.
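The sign rule above can be turned into a small helper that lists the stretches of a promoter where the affinity score is negative. This is only a sketch in Python: it assumes the per-position affinity scores have already been exported from NuPoP into a plain list of numbers (the export step is not shown), and the function name and example scores are hypothetical.

```python
def low_affinity_regions(scores, threshold=0.0):
    """Return (start, end) index pairs, end exclusive, of runs where score < threshold."""
    regions = []
    start = None
    for i, value in enumerate(scores):
        if value < threshold and start is None:
            start = i                      # a low-affinity run begins here
        elif value >= threshold and start is not None:
            regions.append((start, i))     # the run ended at the previous position
            start = None
    if start is not None:                  # a run that reaches the end of the sequence
        regions.append((start, len(scores)))
    return regions

# Made-up scores, one value per position, for illustration only.
scores = [1.2, 0.4, -0.3, -1.1, -0.2, 0.5, 0.9, -0.6, -0.8]
print(low_affinity_regions(scores))
```

The threshold of 0 follows the sign rule in the text; it is exposed as a parameter so a stricter cutoff can be tried.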

In our program: We submitted the 14 promoter sequences to NuPoP and obtained the 14 corresponding nucleosome affinity distribution curves.

Figure 2. Nucleosome affinity distribution diagram

3.4 Merge analysis

Each promoter should now have two figures: a TFBS distribution and a nucleosome affinity distribution curve. Place the two figures one above the other and align them as closely as possible along the sequence for the following analysis.

Figure 3.1 Joint diagram of glycolysis pathway promoters

Figure 3.2 Joint diagram of glycolysis pathway promoters

Figure 4. Joint diagram of pentose phosphate pathway promoters

Figure 5. Joint diagram of ethanol metabolic pathway promoters

Figure 6. Joint diagram of other pathway promoters

3.4.1 Double verification proof

After combining the TFBS distribution predicted by YEASTRACT+ with the nucleosome affinity distribution predicted by NuPoP, we found a very satisfactory pattern: regions with low affinity usually contain clusters of the more important UASs. This supports the accuracy of the TFBS distribution predicted by the transcription factor database and indicates that the nucleosome affinity distribution predicted by NuPoP is reasonably accurate. The two independent measurement tools each appeared accurate and valid, and their results agreed well in the combined analysis. We therefore believe that both are good measurement tools and that using them together adds value.

Figure 7. Regional preference in TFBS

As a second verification, we constructed expression vectors, experimentally characterized the strength of each promoter, labeled and ranked the promoters by strength based on the results, and explored the relationship between promoter strength and the combination of UASs and nucleosome affinity. Since these 14 promoters belong to different metabolic pathways and may have slightly different mechanisms of action, we discuss each metabolic pathway separately, and in addition we compare the best promoters selected from each pathway.

Table 1. Key transcription factors in project design

Taking the nucleosome mechanism into account, we define the parts of a promoter where the nucleosome affinity value is below 0, together with the troughs of the curve, as the ‘Valid Region’: UASs located in this kind of region are more likely to be bound by the corresponding transcription factor and to function. The more UASs and the fewer upstream repressing sequences (URSs) in the ‘Valid Region’, the higher the theoretical strength of the promoter, or of the TF function.
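As a rough illustration of the ‘Valid Region’ idea, the following Python sketch labels each predicted binding site by the affinity scores it overlaps. It makes two simplifying assumptions that are not part of our procedure: a site counts as valid when the mean score over its span is below 0, and the trough criterion is left out. The function name, site names and numbers are hypothetical.

```python
from statistics import mean

def classify_sites(scores, sites, threshold=0.0):
    """Label each site 'valid' or 'silent' from the mean affinity over its span.

    scores: one affinity value per position of the promoter.
    sites:  list of (name, start, length) tuples with 0-based start positions.
    """
    labels = []
    for name, start, length in sites:
        window = scores[start:start + length]
        if not window:
            raise ValueError("site " + name + " lies outside the scored sequence")
        # Assumption: the mean over the site decides; the text does not fix this rule.
        label = "valid" if mean(window) < threshold else "silent"
        labels.append((name, start, label))
    return labels

# Made-up scores and sites, for illustration only.
scores = [1.0, 0.8, -0.5, -1.0, -0.7, 0.2, 0.9, 1.1]
sites = [("TF_A", 2, 3), ("TF_B", 5, 3)]
print(classify_sites(scores, sites))
```

Counting the ‘valid’ labels per promoter then gives the number of UASs and URSs that fall inside the Valid Region.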

The experimental data, used as a test, are consistent with our hypothesis:

Table 2. Joint diagram of glycolysis pathway promoters

Table 3. Joint diagram of pentose phosphate pathway promoters

Table 4. Joint diagram of ethanol metabolic pathway promoters

From Tables 2 to 4, the experimental results agree with our UAS theory. Table 4 needs additional explanation: it contains two promoters from the ethanol metabolic pathway. Our wet-lab experiments verified that promoters in the ethanol metabolic pathway are strongly inhibited in the early period, while glucose is being consumed, a phenomenon called glucose repression. The UASs we recorded here are all related to glycolysis, which is why ALD3 contains so many Gcr1p binding sites but ranks low.

Figure 1. Glycolysis group in 14 promoters with key TFBS count (strength sorting)

Take the six promoters of the glycolysis pathway as an example. Three transcription factors related to carbon source response or glucose utilization were selected, because their plausible functions and their frequency of occurrence suggest that they are the main transcription factors.

First, most of their binding sites lie in regions with low nucleosome affinity, which means they are probably functional; only a few lie in regions with high affinity values, where it is inconvenient for TFs to bind. We regard the latter as ‘silent TFBSs’ of little value.

In addition, as shown in figure 1, the theory that more ‘Valid UASs’ matching the growth demand give a higher yield is consistent with the experimental data. PPDC1 has many Gcr1p, Mig1p and Rgt1p TFBSs and no Nrg1p TFBS. Although PCDC19 has more Gcr1p TFBSs than PPDC1, the Nrg1p site is a strong upstream repressing sequence (URS), which decreases the transcription strength of PCDC19. PENO1 has fewer valid UASs and one URS, and therefore ranks after PCDC19 both in theory and in the experiment. PTPI1 has fewer UASs still, and so on.
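The comparison above is qualitative, but the reasoning can be sketched as a simple score. The Python example below assumes a linear score (valid UAS count minus a weighted URS count); this weighting is an illustrative assumption and not a formula we fitted, and the promoter names and counts are placeholders rather than our data.

```python
def rank_promoters(counts, urs_weight=1.0):
    """Sort promoter names by valid UAS count minus urs_weight * URS count, highest first.

    counts: dict mapping promoter name -> (valid_uas_count, urs_count).
    """
    def score(name):
        valid_uas, urs = counts[name]
        return valid_uas - urs_weight * urs
    return sorted(counts, key=score, reverse=True)

# Placeholder promoters and counts, for illustration only.
counts = {"P1": (5, 0), "P2": (6, 2), "P3": (3, 1)}
print(rank_promoters(counts))                  # URS sites penalized
print(rank_promoters(counts, urs_weight=0.0))  # URS sites ignored
```

Raising urs_weight expresses the idea that a strong URS can outweigh several additional UASs, as in the comparison of the two promoters with the most Gcr1p sites.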

4. Result Analysis

Overall, our conclusion has practical reference value. We conclude that the more ‘Valid UASs’ are located in the ‘Valid Region’, the more pronounced the functional effect of these UASs. Under some conditions, this also carries over to measuring promoter strength. You can perform this measurement in silico as a pre-experiment to help with design, predicting the general strength of a promoter or comparing different promoters, not only in strength but also in specific function.

UASs are commonly found in eukaryotic promoters, so our combination of nucleosome affinity and TFBS prediction enables the measurement and analysis of many registered promoter parts. This step may need to be done manually, because the diversity of transcription factors has not yet been quantified. We believe that, combined with some experience, the method will allow better judgments about the characteristics of promoters. It is exciting that such a large repository is waiting to be explored with this measurement.