Difference between revisions of "Team:SCUT-China/Engineering"

Line 9: Line 9:
 
   <title></title>
 
   <title></title>
 
   <!-- 4444444444444444444444此处上传时需要删除44444444444444444 -->
 
   <!-- 4444444444444444444444此处上传时需要删除44444444444444444 -->
  <link rel="stylesheet" href="css/style.css">
+
 
  <link rel="stylesheet" href="css/style-content.css">
+
  <link rel="stylesheet" href="css/style_ttc_p.css">
+
 
   <!-- 4444444444444444444444此处上传时需要删除44444444444444444 -->
 
   <!-- 4444444444444444444444此处上传时需要删除44444444444444444 -->
  
Line 21: Line 19:
 
         #content { padding:0px; width:100%; margin-top:-7px; margin-left:0px; border:none;}
 
         #content { padding:0px; width:100%; margin-top:-7px; margin-left:0px; border:none;}
 
       body, html {background-color:#ececea; width: 100%; height: 100%;scroll-behavior: smooth;}
 
       body, html {background-color:#ececea; width: 100%; height: 100%;scroll-behavior: smooth;}
           body::-webkit-scrollbar{
+
 
            display: none;
+
/* 666666666666666666666666666666666666666666666666666666666666以下为加载动画样式66666666666666666666666666666666666666666 */
 +
        #gif
 +
           {
 +
            position: fixed;
 +
            top: 0;
 +
            left: 0;
 +
            width: 100%;
 +
            height: 100%;
 +
            z-index: 99999;
 +
            /* display: -webkit-box; */
 +
            -webkit-box-orient: horizontal;
 +
            -webkit-box-pack: center;
 +
            -webkit-box-align: center;
 +
            background: #fff;
 
         }
 
         }
  
 +
        #gif>img/*设置gif下的img元素样式*/
 +
        {
 +
 +
            /* width:40%; */
 +
        }
 +
          .gif2{
 +
            top: 25%;
 +
            height:auto;
 +
            margin-left:30%;
 +
            width:40%;
 +
          }
 +
          .gif1{
 +
            margin-left:35%;
 +
            top: 70%;
 +
            width:30%;
 +
            margin-top:-25px;
 +
 +
          }
 +
 +
/* 666666666666666666666666666666666666666666666666666666666666以上为加载动画样式66666666666666666666666666666666666666666 */
 
   /* 左侧导航栏样式 */
 
   /* 左侧导航栏样式 */
 
   #bodyContent h2{
 
   #bodyContent h2{
Line 63: Line 94:
 
     height:100%;
 
     height:100%;
 
     background-color:rgba(0,0,0,0);
 
     background-color:rgba(0,0,0,0);
     background:url("https://static.igem.org/mediawiki/2021/e/ea/T--SCUT-China--engineering-1.jpg") ;
+
     background:url("https://static.igem.org/mediawiki/2021/2/2f/T--SCUT-China--description-0.png") ;
 
     background-size: 100% 100%;
 
     background-size: 100% 100%;
 
     background-attachment: fixed;
 
     background-attachment: fixed;
Line 138: Line 169:
 
   }
 
   }
  
 
+
  /* #nav li a:hover > img{
 +
    visibility:visible;
 +
  } */
 
   #nav li.on{
 
   #nav li.on{
 
       background:#dad9d5;
 
       background:#dad9d5;
  }
+
  }
  
  
 
   #right-nav li a img{
 
   #right-nav li a img{
    width: 30px;
+
    width: 30px;
    height: auto;
+
    height: auto;
    float: right;
+
    float: right;
    margin-right: 10px;
+
    margin-right: 10px;
 
   }
 
   }
 
   #right-nav {
 
   #right-nav {
Line 157: Line 190:
 
   }
 
   }
 
   #right-nav li {
 
   #right-nav li {
        width: 167px;
+
        width: 167px;
 
         height: 32px;
 
         height: 32px;
 
         line-height: 32px;
 
         line-height: 32px;
Line 531: Line 564:
  
 
     <body  >
 
     <body  >
 +
 
       <!-- 顶部导航栏位置 -->
 
       <!-- 顶部导航栏位置 -->
 
   <div class="header" >
 
   <div class="header" >
Line 670: Line 704:
 
     </div>
 
     </div>
 
<!-- 444444444444444444444444444444444444444444444444444444444444444以下需要替换(直接替换即可)预览时字体效果不会出现44444444444444444444444444444444444444444444444444444444444444 -->
 
<!-- 444444444444444444444444444444444444444444444444444444444444444以下需要替换(直接替换即可)预览时字体效果不会出现44444444444444444444444444444444444444444444444444444444444444 -->
 +
      <!-- -0000000000000000000000000000000000000加载动图000000000000000000000000000000000000000 -->
 +
    <div id= "gif">
 +
      <p style="font-size:30px;margin-left:43%;margin-top:20px;">Loading...</p>
 +
        <img class="gif2"src = "https://static.igem.org/mediawiki/2021/b/b8/T--SCUT-China--model-gif-bool.gif"/>
 +
        <img class="gif1"src = "https://static.igem.org/mediawiki/2021/1/12/T--SCUT-China--model-gif-bar.gif"/>
 +
    </div>
 +
    <!-- -0000000000000000000000000000000000000以上加载动图000000000000000000000000000000000000000 -->
 +
 
     <!-- 111111111111111111111111111111111111111111111111左侧导航条,标准样式11111111111111111111111111111111111 -->
 
     <!-- 111111111111111111111111111111111111111111111111左侧导航条,标准样式11111111111111111111111111111111111 -->
  
<main class="main">
+
<main class="main">
      <div class="main_div" >
+
<div class="main_div" >
      <div class="left" >
+
  <div class="left" >
        <div class="siderBar">
+
    <div class="siderBar">
        <div class="menu">
+
    <div class="menu">
            <ul id="nav">
+
        <ul id="nav">
              <li style="font-size:22px;"><a class="a_nav">Engineering</a></li>
+
          <li style="font-size:26px;"><a class="a_nav">GA-HMM</a></li>
                <li ><a>Overview</a></li>
+
          <li ><a>Background</a></li>
                <hr class="hrmar">
+
          <li><a >Fundamental</a></li>
                <li><a >UAS adding location </a></li>
+
          <li><a >Strategy</a></li>
                <li><a >Add UAS </a></li>
+
          <li><a > Result Analysis</a></li>
                <li><a >Multi copy of UAS </a></li>
+
          <li><a >Availability</a></li>
                <hr class="hrmar">
+
          <li><a >References</a></li>
                 <li><a >References</a></li>
+
        </ul>
            </ul>
+
      </div>
 +
    </div>
 +
  </div>
 +
 
 +
  <div class="right" style="width: 75%;float: right;">
 +
    <div class="content-my">
 +
        <div id="wrap">
 +
          <hr class="hrmar" style="visibility:hidden;" id="content_part_h2">
 +
          <hr class="hrmar"style="visibility:hidden;" id="content_part_h2">
 +
          <div id="hrmar">
 +
            <div id="content_part">
 +
                 <h2  class="content_text_h2">Background</h2>
 +
                  <div class="content_div_text">
 +
                  <p  id="p1_1"align="left">  Most eukaryotic genomic DNA is wrapped in nucleosomes, which occlude and <span>strongly distort</span> the wrapped DNA. Transcription factor is a special protein that binds to a specific region of the gene to start transcription downstream. However, in the chromosomal state, the genomic DNA is tightly bound to the nucleosome. It can not reveal the <span>transcribed factor binding sites</span> (TFBS), the upstream activating sequence (UAS) for the promoter. Only when the nucleosome is <span>relaxed or not tightly bound</span> could TF have more opportunity to bind UAS and function effectively.                  </p>
 +
                  <p  id="p1_2"align="left">  The affinity of nucleosomes to genomic DNA is determined by the <span>arrangement and combination</span> of sequence bases, so the affinity will be different if the sequence is slightly altered. Therefore, we hope to decrease the histone binding affinity score by introducing mutations to the sequence bases except for UAS and the core region. For this reason, a tool that can calculate the nucleosome affinity as an <span>evaluation function</span> to score the effect of the introduction of base fine-tuning is <span>in urgent need.</span>                  </p>
 +
                  <p  id="p1_3"align="left">  Suppose there is a scoring tool that could calculate histone binding affinity theoretically. In that case, we could calculate nucleosome affinity for all the possible arrangements of bases in a <span>fixed length</span>. The promoter sequence in theory with the lowest nucleosome affinity sequence and highest strength could be <span>selected</span>. Of course, this approach has extremely high time complexity and low specificity; that's the reason we choose to combine genetic algorithm to optimize the promoter sequence iteratively.                  </p>
 +
                  </div>
 
             </div>
 
             </div>
        </div>
+
          </div>
      </div>
+
  
      <div class="right" style="width: 75%;float: right;">
+
          <hr class="hrmar"style="visibility:hidden;" id="content_part_h2">
        <div class="content-my">
+
          <div id="hrmar">
            <div id="wrap">
+
            <div id="content_part">
              <hr class="hrmar" style="visibility:hidden;" id="content_part_h2">
+
                <h2 class="content_text_h2">Fundamental</h2>
              <hr class="hrmar"style="visibility:hidden;" id="content_part_h2">
+
                <div class="content_div_text">
              <div id="hrmar">
+
                  <p  id="p2_1"align="left">   The hidden Markov model (HMM) has been known for decades<sup>[1]</sup>.<br>    Given the characteristic of DNA sequence, and the accumulated pieces of evidence that DNA sequence itself can highly predict the localization of nucleosome in vivo, also the function of chromatin is closely related to the localization of nucleosome, scientists proposed using a <span>duration Hidden Markov Model</span> (dHMM) to predict the occupation of the corresponding nucleosomes in chromosomal DNA: divided into the <span>nucleosome (N) state </span>and<span> the linker (L) state</span>, where the length of the nucleosome state is fixed at 147 bp and the linker state is variable<sup>[2]</sup>. It is assumed that the two states must be alternately connected. A complete chromatin sequence must begin and end in the linker state, training a 4th order time-dependent Markov chain for the N state and a homogeneous 4th order Markov chain for the L state.<br>    In the detailed method, they let <span>e</span> = e<sub>1</sub>, ..., e<sub>147</sub> be corresponding to the nucleosome DNA sequence, and let P<sub>N</sub> be the probability of observing <span>e</span> as a nucleosome, computed as the product of probabilities for both Watson and Crick strands under the 4th order Markov Chain model. They assume that the linker DNA length of a given species has an unknown distribution F<sub>L</sub> (k) defined for k = 1, ..., τ<sub>L</sub> (the maximum linker length allowed). An observed linker DNA sequence <span>e</span> = e<sub>1</sub>, ..., e<sub>k</sub> carries two pieces of information, the length is k bp, and given which, the emitted letters are e<sub>1</sub>, ..., e<sub>k</sub> . Let G<sub>L</sub> (<span>e</span>|k) denote the homogeneous Markov chain model for the linker DNA (again including both strands). Then observing <span>e</span> as a linker DNA has probability
                 <div id="content_part">
+
                  </p>
                    <h2 class="content_text_h2">1. Overview</h2>
+
                  <div class="content_div_img">
                      <div class="content_div_text">
+
                      <img style="width:50%;" src="img/model/HMM_function1.png">
                      <p  id="p1"align="left">     As a team participating in the synthetic Biology Competition, we plan to follow the engineering philosophy of "design-build-test-learn" to carry out our promoter engineering projects. In this process, we apply the “learning by doing” approach. Based on the experience learned from each round of modification, then we continue to design the next round of modification, and finally get the ideal result. We are good at summarizing experience from each modification and some exploring principle from existing research, so as to provide more experience for rational design of promoters in the future.
+
                  </div>
                      </p>
+
                  <p  id="p2_2"align="left">   Suppose <span>x</span> = x<sub>1</sub>, ..., x<sub>n</sub> is a genomic DNA sequence of length n, where x<sub>i</sub> = A/C/G/T. Let <span>z</span> = z<sub>1</sub>, ..., z<sub>n</sub> be the corresponding hidden state path, where z<sub>i</sub> = 1 if x<sub>i</sub> is covered by a nucleosome state, and 0 otherwise. Suppose that the path <span>z</span> = z<sub>1</sub>, ..., z<sub>n</sub> partitions <span>x</span> into k consecutive nucleosome or linker state blocks, in which the nucleosome blocks have a uniform length of 147 bp, whereas the length of linker blocks may vary. They denote these blocks as <span>y</span> = <span>y+</span>, ..., <span>y<sub>B</sub> </span>, and their state identification as <span>s</span> = s<sub>1</sub>, ..., s<sub>B</sub> , where s<sub>i</sub> = 1 if <span>y<sub>i</sub></span> is nucleosome state, and 0 otherwise. The probability of observing (<span>x</span>, <span>z</span>) is given by                 </p>
                      </div>
+
                  <div class="content_div_img">
                </div>
+
                      <img style="width:90%;"src="img/model/HMM_function2.png">
              </div>
+
                  </div>
 +
                  <p id="p2_3"align="left">  where π<sub>0</sub>(s<sub>1</sub>) and π<sub>e</sub>(s<sub>B</sub> ) stand for the probabilities that the chain initializes and ends with state s<sub>1</sub> and s<sub>k</sub> respectively, and I is an indicator function. Since they assume that a complete chromatin sequence must start with and end in a linker state, π<sub>0</sub>(s<sub>1</sub> = 0) = π<sub>e</sub> (s<sub>B</sub> = 0) = 1. Define the nucleosome occupancy at a specific position i, denoted o i , as the posterior probability that z<sub>i</sub> = 1, i.e.,                  </p>
 +
                  <div class="content_div_img">
 +
                      <img style="width:50%;"src="img/model/HMM_function3.png">
 +
                  </div>
 +
                  <p  id="p2_4"align="left">   Most importantly, the histone binding affinity score at position i is defined as the log likelihood ratio for the region x<sub>i-73</sub>, ... x<sub>i</sub> , ..., x<sub>i+73</sub> to be a nucleosome vs. a linker, i.e.,                  </p>
 +
                  <div class="content_div_img">
 +
                      <img  style="width:80%;"src="img/model/HMM_function4.png">
 +
                  </div>
 +
                  <p  id="p2_5"align="left">  Given the models P<sub>N</sub>, G<sub>L</sub> and F<sub>L</sub>, the optimal path <span>z</span> can be found by the standard Viterbi algorithm, and the nucleosome occupancy score can be estimated using forward and backward algorithms<sup>[2]</sup>.                  </p>
 +
                  <p  id="p2_6"align="left">  The model <span>has been trained</span> on a large number of yeast nucleosome DNA fragments dataset that have been pyrosequencing and on a species by species basis, and the test results are satisfactory<sup>[2]</sup>. They made the <span>nucleosome occupancy predicting tool</span> into a software tool called <span>NuPoP</span>, which means "nucleosome position predicting", as an R language package for researchers to use.                 </p>
 +
                </div>
 +
            </div>
 +
          </div>
  
              <hr class="hrmar" id="content_part_h2" style="visibility:hidden;">
+
          <hr class="hrmar"style="visibility:hidden;" id="content_part_h2">
              <div id="hrmar">
+
          <div id="hrmar">
                <div id="content_part">
+
            <div id="content_part">
                    <h2 class="content_text_h2">2. UAS adding location</h2>
+
              <h2 class="content_text_h2">Strategy</h2>
                    <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;"> 2.1 Design</h4>
+
                <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;"> Optimizing design</h4>
                    <div class="content_div_text">
+
                <div class="content_div_text">
                        <p id="p2_1"align="left">      The YeTFaSCo database(<a href="http://yetfasco.ccbr.utoronto.ca/scanSeqs.php">http://yetfasco.ccbr.utoronto.ca/scanSeqs.php</a>) was used to predict the transcription factor binding sites (TFBs) of PPDC1, PSED1L, PALD4. Selected version: 1.02 and resulted in more than 100 putative TFBSs with a matching score higher than 0.75. In order to avoid the incompleteness of using a single database and improve the accuracy of prediction, the YEASTRACT database (<a href="http://www.yeastract.com/formtfsbindingsites.php">http://www.yeastract.com/formtfsbindingsites.php</a>) also used for prediction and comparison. These TFBSs were classified into five groups according to the function (Table 1)
+
                  <p id="p3_1"align="left">      We designed to obtain the variants with different nucleosome affinity of the promoter sequence by introducing the limited amount of random bases <span>mutation</span> without destroying the UAS sequence and the core region of the promoter.
                      <br><br></p>
+
                  </p>
                        <p style="font-size:12px;text-align:center;color:#666666">
+
                  <p id="p3_2"align="left">     Actually, a <span>similar</span> iterative optimization model has been proposed, called <span>NucOpt Progs</span>. The theory is technically known as a “<span>Greedy algorithm</span>”, in which all possible random single mutations are <span>enumerated</span> on the initial promoter sequence, and every single sequence variant was calculated the nucleosome affinity by NuPoP to obtain a score. After that, the sequence with the lowest affinity was selected as the winner in this round, then taken as the parent placing into the next round, calculating all mutation possible cases enumeration on this winner and vote in a new champion variant candidate. Finally, the theoretical best optimal sequence which is difficult to gain higher score is obtained<sup>[3]</sup>.                 </p>
                            Table 1. Predicted putative TFBs on P<sub>PDC1</sub>, P<sub>SED1L</sub>, P<sub>ALD4</sub> by YeTFaSCo database and YEASTRACT database
+
                  <p id="p3_3"align="left">     However, after learning and testing NucOpt Progs, we found that this method has many <span>disadvantages</span>. First, the exhaustive enumeration method brings<span> immense time complexity</span>. It takes 36 hours to optimize a sequence of 800 bp length (based on the computer capacity). Secondly, single mutation only and greedy algorithm in use obviously determine that it's extraordinarily easy to get optimization fallen into a <span>local optimal solution</span>. For example, assuming that a single base change is a component of a group of mutations that significantly decrease histone affinity, but rarely calculating the benefit exactly this single base change could bring is unremarkable. In this case, the superior choice would easily be neglected. Thirdly, taking into account the biological effect, other published studies have shown that rarely low affinity is <span>not directly related</span> to the promoter's strength<sup>[3]</sup>. At the same time, the researchers speculate, the site distribution of nucleosome affinity may play a vital role in the promoter strength.                 </p>
                        </p>
+
                  <p id="p3_4"align="left">     Given the above three defects, we have designed a software <span>NuPGO</span> which can make up for these shortcomings and achieve a <span>better optimization effect</span>. Our new strategy is to combine NuPoP with the genetic algorithm(Fig.1) using the <span>roulette wheel operator</span> so that the base number of mutations can realize <span>adaptive</span> convergence according to probability instead of only introducing a single random mutation at a time, which not only effectively reduce the time complexity but also take much more possibilities into account, escaping local optimal solution. Considering the biological effect, we additionally set the <span>user-defined region weights</span> especially, which enable the algorithm to place larger or smaller weights on the specific sequence region.                 </p>
                      <div class="content_div_img">
+
                  <div class="content_div_img">
                          <img src="https://static.igem.org/mediawiki/2021/2/22/T--SCUT-China--engineering_1.png">
+
                      <img style="width:85%;"src="img/model/Fig1_genetic_algorithm.png">
                      </div>
+
                  </div>
                        <p  id="p2_2"align="left">According to the characteristics of P<sub>SED1L</sub> and P<sub>ALD4</sub> to control the synthesis of Valencene, which mainly expressed in the later stage of fermentation. We rationally selected the transcription factor binding sites of CAT8p and ADR1p for promoter modification.
+
                  <p style="font-size:16px;text-align:center;">Figure 1. Genetic algorithm schematic diagram</p>
                      </p>
+
                </div>
                      <p  id="p2_3"align="left"><br>    CAT8p is a Zinc cluster transcriptional activator, which regulate the expression include of most genes in gluconeogenesis, ethanol utilization and glyoxylate cycle<sup> [1]</sup>. CAT8p can bind to carbon source response elements and activate target genes after glucose consumption, the binding motif is 5′-YCCNYTNRKCCG-3′<sup> [2]</sup>.
+
            </div>
                      </p>
+
                      <p id="p2_4"align="left"><br>    ADR1p is a Carbon source-responsive zinc-finger transcription factor, which first identified as a transcription factor activate the transcription of the alcohol dehydrogenase gene ADH2. It also activates genes involved glucose fermentation, glycerol metabolism and fatty acid utilization. The binding motif is 5′-TTGGRG-3′. In addition<sup> [3-4]</sup>, ADR1p and CAT8p may interact when activating the transcription of certain genes<sup> [5]</sup>.
+
                      </p>
+
                      <p  id="p2_5"align="left"><br>    In cells, transcription is a process of three-dimensional regulation. The transcription factor binding promoter has a positional effect. When the transcription factor binding site is at a position with high nucleosome affinity, the transcription factor may not be able to bind to the site due to steric hindrance<sup> [6]</sup>. In order to find out which position of PDC1 can make the added UAS (upstream activation sequences) avoid the effect of steric hindrance. The reported UAS1 and the database predicted 100% binding UAS2 were added to the two key positions of PDC1, which can normally bind to transcription factors and activate. UAS1 is the CAT8 binding site of the FBP1 promoter, and UAS2 is the ADR1p binding site predicted to be 100% bound in the ALD4 promoter. Studies have shown that there may be a positive interaction between CAT8p and ADR1p, so we added UAS1 and UAS2 closer to each other. (Figure 1)
+
                      </p>
+
                      <p id="p2_5_1"align="left"><br>     Considering promoter architecture constraints, (a) avoiding a change on other Predicted putative TFBSs for the activation of PDC1 promoter, (b) determining positions for CAT8p and ADR1p binding sites that are close proximity to the core promoter, (c) existing binding sites were replaced with new ones, reduce the changes to the local sequence context.
+
                      </p>
+
                      <div class="content_div_img">
+
                            <img style="width:70%;"src="https://static.igem.org/mediawiki/2021/1/1a/T--SCUT-China--engineering_3.png">
+
                      </div>
+
                      <p style="font-size:12px;text-align:center;color:#666666">Figure 1. The addition of different positions of CAT8 and ADR1</p>
+
  
 +
            <div id="content_part">
 +
                <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;"> Greedy Algorithm vs Roulette Operator</h4>
 +
                <div class="content_div_text">
 +
                  <p id="p3_5"align="left">      Greedy algorithm and roulette wheel selection algorithm are both essentially <span>evolutionary algorithms</span>. The greedy algorithm also performs better than other evolutionary algorithms in some cases. Still, its efficiency advantage brought by enumeration couldn't display here, while the <span>multi-factor considering</span> ability in wheel selection algorithm stands out.
 +
                  </p>
 +
                </div>
 +
                <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;"> Greedy Algorithm</h4>
 +
                <div class="content_div_text">
 +
                <p id="p3_6"align="left">    The fatal disadvantage of the previous promoter optimizing algorithm is that only the<span> best performer</span> is selected in each round. Perhaps the author aimed to reduce the time complexity of exhaustive enumeration and simply pursue the best score. But such algorithms can easily trap the results into <span>local optima</span>. For example, the three-point mutation on site ABC decrease the histone affinity more than the three-point mutation on site DEF. However, the affinity decrease of site B mutation is smaller than that of site D mutation, so the greedy algorithm will choose site D mutation strategy as the winner, while site B strategy is obviously better actually.          </p>
 +
                <p id="p3_7"align="left">      Moreover, only a random single mutation is introduced in each round. In other words, the mutation number is set to 1, which leads to gigantic amounts of <span>enumerations</span>, and the time complexity has not been sufficiently reduced. The execution of the greedy algorithm is time-consuming. In our test, an 800bp length promoter had to cost 36 hours to run.                </p>
 +
                <div class="content_div_img">
 +
                      <img src="img/model/Fig2_Tradition_1080p.gif">
 +
                  </div>
 +
                  <p style="font-size:16px;text-align:center;">Figure 2. Schematic diagram of NucOpt Progs</p>
 +
                </div>
  
 +
                <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;"> Roulette Operator</h4>
 +
                <div class="content_div_text">
 +
                  <p id="p3_8">      The wheel selection method is also known as the fitness proportionate selection (FPS), which means that <span>the fitness of an individual is directly proportional to the probability of being selected</span>, similar to putting all the individuals on a wheel, the area occupied by each individual is directly proportional to its fitness, that is, the probability of being selected is determined by the area of the plate, namely the fittness of each individual when the wheel spins randomly.<br>      In other words, the fitness is directly proportional to the probability of being selected, turning the wheel N times to <span>select N </span><span>individuals</span>. Meanwhile, it also means that an individual can be selected repeatedly.
 +
                  </p>
 +
                  <div class="content_div_img">
 +
                      <img src="img/model/Fig3_GA_1080p.gif">
 +
                  </div>
 +
                  <p style="font-size:16px;text-align:center;">Figure 3. Schematic diagram of genetic algorithm applied to promoter optimizer NuPGO</p>
 +
                  <p id="p3_9">      For random bases mutations introduced in each round, we set the mutation amount as an <span>adaptive</span> parameter. Each base position in the sequence is a probability mutation rather than the unintelligent setting the amount as "1" from beginning to end. To ensure the rationality and accuracy of the optimization, we set the threshold of <span>mutation probability</span> to 0.0005 so that the number of base mutations in each round would<span> not exceed </span> 5. Of course, other choices are worth expanding. By doing this, the <span>time complexity</span> of the model is reduced to some degree, the curve will <span>converge faster</span> when performing the gradient descends, broadening the search, and <span>more combinations</span> would be explored to approach the global optimal solution. As the iteration converges, the mutation probability will gradually approach 0, that is, reaching the final optimized promoter sequence(Fig.3).
 +
                  </p>
 +
                </div>
 +
            </div>
  
                    </div>
+
            <div id="content_part">
                </div>
+
                <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;">Specific Weights</h4>
                <div id="content_part">
+
                <div class="content_div_text">
                    <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;"> 2.2 Build</h4>
+
                  <p id="p3_10"align="left">     The critical UAS are usually only about 10 bp in length, and the vital goal we want to achieve, precisely speaking, is to reduce the nucleosome affinity around the <span>UAS region</span> and to make the UAS region tend to become the linker region rather than nucleosome region.<br>      The final goal of the algorithm is to decrease the total affinity score of the whole promoter sequence as far as possible, which is <span>likely to ignore</span> the key UAS region mutation and choose other strategies with more significant benefit, the final optimized promoter obtained in such condition may not make use of the advantages of UAS. Specifically, the key UAS is still in the high histone affinity region. To compensate for this shortcoming, we decided to <span>artificially add a regional weight </span>to the score calculation process. According to the formula introduced above, dHMM calculates the nucleosome affinity score for every single site, and then calculates the area under the curve (AUC). <br>    To make some adjustments, we decide to compute the score of every single position with a custom zoom when calculating the specific area that we regard special, such as the area of the crucial UAS, in other words, multiplying the simply calculated score by the user-defined weight to get each final single point score.<br>
                    <div class="content_div_text">
+
                  </p>
                      <p id="p2_6"align="left">     Using Overlap PCR method, the sequence at the key position of the PDC1 promoter is replaced. Taking the construction of the M1 mutant strain as an example, the construction method of the M2 strain was the same. Find the CAT8 binding site of FBP1 promoter through the literature and the ADR1 binding site from ALD4 promoter predicted by the database (Table 2).<br><br>Design the binding site motifs on the homology arm, (1) Using primer M1-F/PDC1-R to amplify fragment one, primer M1-R/PDC1-F to amplify fragment two, (2) assemble fragment 1 and fragment 2 to obtain M1 promoter, (3) Using primer PDC1-F/PDC1-R amplifies M1 promoter. Using BamHI and XbaI restriction enzymes to digest the M1 promoter and YEp181-PDC1p-VS-SAG1t vector, and react at 37°C for 2h. After the reaction, the promoter and vector were purified and ligated at 16°C overnight. Transform into E. coli DH5α, pick the correct transformants, and extract the YEp181-M1-VS-SAG1t plasmid.<br><br>Then, use donor DNA primers (Table4) to amplify the M1-VS-SAG1t expression cassette with a homology arm next to the LEU2 site. Transform the constructed gRNA expression plasmid and M1-VS-SAG1t expression cassette into yeast cells which can express Cas9 protein, and use the method of auxotrophic medium selection and colony PCR verification to obtain the strains that successfully knocked into the M1-VS-SAG1t expression cassette.
+
                  <p id="p3_11"align="left">      Fortunately, the results of testing different weights on the NuPGO algorithm are <span>satisfactory</span>. After adding weights to a specific area, NuPGO will pay <span>more attention </span>to that area. In the final optimized result sequence, this specific region will get better performance of nucleosome affinity degradation.
                      </p>
+
                  </p>
 +
                </div>
 +
            </div>
  
                      <p style="font-size:12px;text-align:center;color:#666666"><br>Table 2. CAT8 and ADR1 binding sites</p>
+
            <div id="content_part">
                      <div class="content_div_img">
+
                <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;">Sustainable</h4>
                          <img style="width:46%;"src="https://static.igem.org/mediawiki/2021/4/43/T--SCUT-China--engineering_2.png">
+
                <div class="content_div_text">
 +
                  <p id="p3_12"align="left">    Ultimately, and most worth noting is that, neither the NuPoP software program which can evaluate DNA’s nucleosome affinity<sup>[2]</sup> nor the iterative optimization code<sup>[3]</sup> that uses the greedy algorithm, because of the <span>components missing </span>and internet limitation, following the tutorials provided in the papers, we couldn’t get the complete executable code file and failed to run the optimizing software with greedy algorithm directly, as well as other researchers. It means the use of NuPoP and greedy algorithm optimizer have been difficult for those who lack the computer or bioinformatics study infrastructure.<br>      Therefore, we <span>refactored</span> the implementation code almost completely according to some of its logic, and wrote the genetic algorithm using the roulette wheel operator, which is much easier to read and user-friendly much more than the previous code. In other words, we <span>fix the algorithm published before</span> and come up with a <span>more novel and intelligent</span> software. Both the greedy algorithm optimizer <span>NucOpt Progs</span> and genetic algorithm optimizer <span>NuPGO</span> are published by us for users’ convenient.<br>      This means that <span>anyone</span>, especially iGEM teams in the future and researchers in labs, can <span>easily get started</span> using our program based on a tutorial. <span>Input your initial data</span> and conditions visually and easily and it will be on. Those interested are also welcomed to read our source code and make some comments or modifications, and maybe a stronger NuPGO will be come up with next year.
 +
                  </p>
 +
                </div>
 +
            </div>
 +
          </div>
  
                      </div>
 
  
                      <!-- <p style="font-size:12px;text-align:center;color:#666666"><br>Table 3. Primers of M1 and M2 promoter</p>
+
          <hr class="hrmar"style="visibility:hidden;" id="content_part_h2">
                      <div class="content_div_img">
+
          <div id="hrmar">
                          <img style="width:65%;"src="img/Engineering/add_1.png">
+
            <div id="content_part">
                      </div>
+
              <h2 class="content_text_h2">Result Analysis</h2>
 +
                <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;"> Optimization result on our promoter</h4>
 +
                <div class="content_div_img">
 +
                    <img src="img/model/Fig4_PDC1-M8.gif">
 +
                </div>
 +
                <p style="font-size:16px;text-align:center;">Figure 4. NuPGO optimization on our project promoter nucleosome affinity</p>
 +
                <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;">  Time-consuming comparison</h4>
 +
                <div class="content_div_img">
 +
                    <img src="img/model/.png">
 +
                </div>
  
                      <p style="font-size:12px;text-align:center;color:#666666"><br>Table 4. Donor DNA primers</p>
+
                <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;"> Optimization effect comparison</h4>
                        <div class="content_div_img">
+
                <div class="content_div_img">
                          <img style="width:53%;"src="img/Engineering/add_2.png">
+
                    <img src="img/model/Fig5_traGA.png">
                      </div> -->
+
                </div>
                    </div>
+
                <p style="font-size:16px;text-align:center;">Figure 5. Result comparison between Greedy Algorithm and Genetic Algorithm, weights added result and result without weights.</p>
                </div>
+
                <div class="content_div_img">
                <div id="content_part">
+
                    <img src="img/model/Fig6_Descent_curve.png">
                    <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;"> 2.3 Test</h4>
+
                </div>
                    <div class="content_div_text">
+
                <p style="font-size:16px;text-align:center;">Figure 6. Descent curve of total nucleosome affinity score between Greedy Algorithm and Genetic Algorithm, weights added result and result without weights.</p>
                        <p id="p2_5_2">
+
                            The strain was inoculated into 10 ml of fermentation broth, the initial OD600 of the fermentation broth was controlled to be 0.05, and it was placed in a shaker at 30℃, 220 rpm for 64 hours. After 64 hours of shake flask fermentation, the concentration of valencene was detected by gas chromatography. (Figure 2)
+
                        </p>
+
  
                      <div class="content_div_img">
+
                <div class="content_div_img">
                            <img style="width:110%;"src="https://static.igem.org/mediawiki/2021/8/87/T--SCUT-China--engineering_4.png">
+
                    <img src="img/engineering_1.png">
                      </div>
+
                </div>
                      <p style="font-size:12px;text-align:center;color:#666666">Figure 2 Fermentation results of M1 and M2 mutant strains. (a) OD<sub>600</sub> of M1 and M2 mutant strain, (b) Valencene yield of M1 and M2 mutant strain. </p>
+
                <p style="font-size:16px;text-align:center;">Figure 7. Result comparison between Greedy Algorithm and Genetic Algorithm, weights added result and result without weights.</p>
 +
                <div class="content_div_img">
 +
                    <img src="img/engineering_1.png">
 +
                </div>
 +
                <p style="font-size:16px;text-align:center;">Figure 8. Descent curve of total nucleosome affinity score between Greedy Algorithm and Genetic Algorithm, weights added result and result without weights.</p>
  
                        <p id="p2_5_3"align="left">
 
                            The addition of CAT8 and ADR1 transcription factor binding sites at different positions shows different effects. No significant differences in Valencene production in M1, P<sub>PDC1</sub> were observed. The valencene production of the M2 mutant strain was increased by 18.9% compared to the original strain (P<sub>PDC1</sub>).
 
                        </p>
 
                    </div>
 
                  </div>
 
  
 +
                <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;"> NuPGO result visualization</h4>
 +
                <div class="content_div_img">
 +
                    <img src="img/model/Fig9_Comparison_all_1080p_english.png">
 +
                </div>
 +
                <p style="font-size:16px;text-align:center;">Figure 9. Comparison between initial and optimized project promoter M8 sequence</p>
 +
                <div class="content_div_img">
 +
                    <img src="img/model/Fig10_Comparison_GCR1_1080p_english.png">
 +
                </div>
 +
                <p style="font-size:16px;text-align:center;">Figure 10. Partial protected UAS region during optimization</p>
  
  
                <div id="content_part">
+
                <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;">NuPGO result verification</h4>
                    <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;"> 2.4 Learn</h4>
+
                <div class="content_div_text">
                    <div class="content_div_text">
+
                  <p id="p4_1"align="left">     Unfortunately, the NuPGO-optimized promoter has some mutation of around hundreds of bp, requiring chemical synthesis to obtain the new promoter gene fragment. Moreover, it is not statistically representative of testing only the final variant. To prove that the strength of the NuPGO optimized promoter sequence is consistent with the predicted results, we need to select at least 5 to 10 promoter variants, and conduct chemical synthesis and characterization experiments to verify their strength. Due to the lack of project funds and time constraints, we can not perform characterization experiments to ascertain the strength of the sequence obtained by the optimization algorithm. <br>     If conditions permit or any other team was interested in verifying the effect of NuPGO, we would like to or recommend selecting the promoter variants with 50, 100, 200, 400, 500bp mutations in the same run, also pick three variants at each of the position, such as 499, 500, 501bp. Then design the parallel characterization experiments to see the strength difference. The descent curve could also be consulted to obtain the selection choice strategy.<br>      Nevertheless, we have reason to believe that the results of the optimization are relatively reliable, because, in our references(Fig.6), the authors conducted experiments on the strength characterization of the promoter variants obtained by the greedy algorithm. We can conclude from the experimental results that nucleosome affinity optimization indeed can increase the promoter's strength and thus enhance expression.
                        <p id="p2_9"align="left">     In M2, UAS is added about 20 bp upstream of TATA-box located on P<sub>PDC1</sub>, and there are continuous 17 bp A/T near the position of TATA-box, which may cause the structure of this position to be relatively loose and the nucleosome affinity rate Low <sup>[7]</sup>, unable to form a nucleosome structure, transcription factors have a greater probability of binding to this site, and play a role in activating transcription <sup>[8]</sup>. In M1, UAS was added between -393 and -450 located in the upstream region of the PDC1 gene, and it did not work. The possible reason is that it is close to the original activation region, and there is dense transcription factor binding at this site, which hinders the combination of transcription factors to the added UAS. This suggests that we should transform on the basis of M2.
+
                  </p>
                    </div>
+
                  <div class="content_div_img">
                </div>
+
                      <img src="img/model/Fig11.png">
              </div>
+
                  </div>
 +
                  <p style="font-size:16px;text-align:center;">Figure 10. Redesign of native yeast promoters for increased expression by decreasing nucleosome affinity in reference article. <span>DOI: 10.1038/ncomms5002</span></p>
 +
                </div>
 +
            </div>
 +
          </div>
  
  
 +
          <hr class="hrmar"style="visibility:hidden;" id="content_part_h2">
 +
          <div id="hrmar">
 +
            <div id="content_part">
 +
              <h2 class="content_text_h2">Availability</h2>
 +
                <div class="content_div_text">
 +
                  <p id="5_1">      We publish our source code and packages on Github. We extraordinarily sincerely and strongly recommend that every future iGEM team or researcher using a eukaryotic promoter use our software to achieve a higher expressing level and rational use of UAS strong promoters.    </p>
 +
                  <a href="https://2021.igem.org/Team:SCUT-China">Github</a>
 +
                </div>
 +
            </div>
 +
          </div>
 +
          <hr class="hrmar"  id="content_part_h2" style="visibility:hidden;">
 +
          <div id="hrmar">
 +
          <div id="content_part">
 +
              <h2  class="content_text_h2">References</h2>
 +
              <div class="content_div_text">
 +
                <p>[1]L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," in Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989, doi: 10.1109/5.18626.    </p>
 +
                <p>[2]Xi, L., Fondufe-Mittendorf, Y., Xia, L. et al. Predicting nucleosome positioning using a duration Hidden Markov Model. BMC Bioinformatics <span>11,</span> 346 (2010). https://doi.org/10.1186/1471-2105-11-346  </p>
 +
                <p>[3] [3]Curran, K., Crook, N., Karim, A. et al. Design of synthetic yeast promoters via tuning of nucleosome architecture. Nat Commun <span>5,</span> 4002 (2014). https://doi.org/10.1038/ncomms5002    </p>
 +
                <p>[4]de Boer, C.G., Vaishnav, E.D., Sadeh, R. et al. Deciphering eukaryotic gene-regulatory logic with 100 million random promoters. Nat Biotechnol <span>38,</span> 56–65 (2020). https://doi.org/10.1038/s41587-019-0315-8.    </p>
 +
              </div>
 +
          </div>
 +
        </div>
  
  
 +
        </div>
 +
  </div>
 +
</div>
 +
</main>
 +
<!-- -0000000000000000000000000000000000000底栏000000000000000000000000000000000000000 -->
 +
<div id="bottom_div">
 +
  <div id="bottom_div_img">
 +
  <!-- <img class="" src="img/bottom.png" alt=""> -->
 +
  <div class="a_div">
 +
    <a href="https://2021.igem.org/Team:SCUT-China"><div class="a_div"></div></a>
 +
  </div>
 +
</div>
  
              <hr class="hrmar" id="content_part_h2" style="visibility:hidden;">
+
</body>
              <div id="hrmar">
+
                <div id="content_part">
+
                    <h2  class="content_text_h2">3. Add UAS</h2>
+
                    <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;"> 3.1 Design</h4>
+
                    <div class="content_div_text">
+
                        <p  id="p3_1"align="left">      The YeTFaSCo database and YEASTRACT database were used to predict the CAT8p binding site and ADR1p binding site of SED1L promoter, ALD4 promoter, and select data with a system score of 0.85 or more. According to the results of the first round of transformation, the CAT8p binding site was replaced with the predicted CAT8p binding site of P<sub>SED1L</sub> and P<sub>ALD4</sub> on the basis of M2, but the original ADR1p binding sequence was not changed. (Figure 3) </p>
+
                          <div class="content_div_img">
+
                              <img src="https://static.igem.org/mediawiki/2021/3/32/T--SCUT-China--engineering_6.png">
+
                          </div>
+
                          <p style="font-size:12px;text-align:center;color:#666666">
+
                              Figure 3. the additional add of CAT8p-3 binding site
+
                          </p>
+
                    </div>
+
  
                   
+
<!-- 加载进度条 -->
                </div>
+
<!-- <script type="text/javascript">
 +
    window.onload=function(){
 +
        var loading='<div class="loading"><div class="pic"></div></div>';
 +
        var varbody = document.getElementById("body");
 +
        var varloading = document.getElementById("loading");
 +
        varbody.append(loading);
 +
        setInterval(function(){
 +
            varloading.fadeOut();
 +
        },3000);
 +
    }
 +
</script> -->
  
                <div id="content_part">
+
<!-- 左侧导航栏固定与滑动 -->
                    <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;"> 3.2 Build</h4>
+
<!-- 左侧导航栏固定与滑动 -->
 +
<script type="text/javascript">
 +
var nav = document.getElementById("nav");
 +
var card = document.getElementById("card-holder");
 +
var divBox = document.getElementById("divBox_0");
 +
var navH =nav.offsetTop-80;
 +
window.addEventListener('scroll',function(e){
  
                    <div class="content_div_text">
+
var scroH = document.body.scrollTop;
                        <p id="p3_2_1"align="left">
+
  if(scroH>=navH){
                            Taking the construction of the M3 mutant strain as an example, the construction method of the M4-M7 strain was the same. Find the CAT8 binding site motifs and ADR1 binding site motif predicted by the database (Table 5).<br><br>Design the binding site motifs on the homology arm, (1) Using primer M3-F/PDC1-R to amplify fragment one, primer M3-R/PDC1-F to amplify fragment two, (2) assemble fragment 1 and fragment 2 to obtain M3 promoter, (3) Using primer PDC1-F/PDC1-R amplifies M3 promoter. Using BamHI and XbaI restriction enzymes to digest the M3 promoter and YEp181-PDC1p-VS-SAG1t vector, and react at 37°C for 2h. After the reaction, the promoter and vector were purified and ligated at 16°C overnight. Transform into E. coli DH5α, pick the correct transformants, and extract the YEp181-M3-VS-SAG1t plasmid.<br><br>Then, use donor DNA primers (Table 4) to amplify the M3-VS-SAG1t expression cassette with a homology arm next to the LEU2 site. Transform the constructed gRNA expression plasmid and M3-VS-SAG1t expression cassette into yeast cells which can express Cas9 protein, and use the method of auxotrophic medium selection and colony PCR verification to obtain the strains that successfully knocked into the M3-VS-SAG1t expression cassette.
+
    nav.style.position = 'fixed'
                        </p>
+
    nav.style.top = '50px'
                        <p style="font-size:12px;text-align:center;color:#666666">
+
    nav.style.width='23%'
                            Table 5. Predicted CAT8 and ADR1 binding site motif
+
    card.style.visibility = 'visible'
                        </p>
+
    divBox.style.visibility = 'visible'
                        <div class="content_div_img">
+
  }else{
                            <img style="width:57%;"src="https://static.igem.org/mediawiki/2021/d/d9/T--SCUT-China--engineering_5.png">
+
    // alert( scroH-navH);
                        </div>
+
    nav.style.position = 'static'
 +
    nav.style.width='98%'
 +
    card.style.visibility = 'hidden'//可以设置它隐藏
 +
    divBox.style.visibility = 'hidden'
 +
  }
 +
})
 +
</script>
 +
<!-- 左侧导航栏交互效果- -->
 +
  <script>
 +
                window.onload = function () {
 +
                    var allDiv = document.querySelectorAll("#wrap #content_part_h2");
 +
                    var allNavLi = document.querySelectorAll("#nav li");
 +
                    var Div = document.getElementById("divBox");
 +
                    /**
 +
                    * 滚动一定距离时,返回顶部按钮出现    滚动页面改变导航li的弹出
 +
                    */
 +
                    function toTopIcon() {
 +
                        window.onscroll = function () {
 +
                            var d = document.documentElement.scrollTop || document.body.scrollTop;//获取滚动条高度
 +
                            var viewd = document.documentElement.clientHeight;//获取可视区高度
  
                        <!-- <p style="font-size:12px;text-align:center;color:#666666">
+
                            for (var i = 0; i < allDiv.length; i++) {
                            <br>
+
                                allDiv[i].index = i;
                            Table 6. Primers of M3-M7 promoters
+
                                if (d >= allDiv[i].offsetTop && d < allDiv[i].offsetTop + 500) {
                        </p>
+
                                    for (var j = 0; j < allDiv.length; j++) {
                        <div class="content_div_img">
+
                                        allNavLi[j].className = "";
                            <img style="width:72%;"src="img/Engineering/add_3.png">
+
                                        allNavLi[i].className = "on";
                        </div> -->
+
                                    }
                      </div>
+
                                }
                </div>
+
                            }
 +
                            <!---->
  
 +
                            var d = document.documentElement.scrollTop || document.body.scrollTop;//获取滚动条高度
 +
                            var h = document.body.scrollHeight;//全文高
 +
                            var view = document.documentElement.clientHeight;//获取可视区高度
 +
                            if(d>view){
 +
                              Div.style.height =  (d-view)/h*1250 + "px";
 +
                              Div.style.visibility = "visible";
 +
                            }else {
 +
                                Div.style.visibility = "hidden";
 +
                            }
  
                <div id="content_part">
+
                        }
                    <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;">3.3 Test</h4>
+
                    }
                    <div class="content_div_text">
+
                      <p  id="p3_3_0"align="left">      The strain was inoculated into 10 ml of fermentation broth, the initial OD600 of the fermentation broth was controlled to be 0.05, and it was placed in a shaker at 30℃, 220 rpm for 64 hours. After 64 hours of shake flask fermentation, the concentration of valencene was detected by gas chromatography. (Figure 4)    </p>
+
                      <div class="content_div_img">
+
                          <img style="width:110%;"src="https://static.igem.org/mediawiki/2021/8/84/T--SCUT-China--engineering_7.png">
+
                      </div>
+
                      <p style="font-size:12px;text-align:center;color:#666666">
+
                          Figure 4. Fermentation results of M3-M7 mutant strains. (a) OD<sub>600</sub> of M3-M7 mutant strain, (b) Valencene yield of M3-M7 mutant strain.
+
                      </p>
+
  
                        <p id="p3_3"align="left">
+
                    /**
                          The modified promoters showed different intensities. The Valencene yield of M3, M4, M6, and M7 mutant strains was not significantly different from that of the control strain. But the Valencene yield of the M5 mutant strain was increased by 29.5% (5.58 mg/L) compared to the control strain.                  </p>
+
                    * 点击导航li li弹出  页面跳转到相应div位置
                     </div>
+
                    * scrollTo默认的是瞬间滚动到坐标位置,把behavior属性设置为smooth就可以支持平滑滚动了,不过缺点是无法设置滚动速率
                </div>
+
                     */
                <div id="content_part">
+
                    function navLiPop() {
                    <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;">3.4 Learn</h4>
+
                        for (var i = 0; i < allNavLi.length; i++){
                    <div class="content_div_text">
+
                            allNavLi[i].index = i;
                      <p  id="p3_4"align="left">      In M5, the CAT8 binding site-3 is derived from the SED1L promoter, which has superior performance compared to the CAT8 binding site from P<sub>FBP1</sub> reported in the article. And we observed that only CAT8 binding site-3 has such an effect, other CAT8 binding sites may be false positive data predicted by the database. This suggests that we can continue to add CAT8 binding site-3 in M5 to increase the expression strength of the promoter.    </p>
+
                            allNavLi[i].onclick = function () {
                      </div>
+
                                for (var j = 0; j < allNavLi.length; j++){
                </div>
+
                                    window.scrollTo({
              </div>
+
                                        top:allDiv[this.index].offsetTop-50,
 +
                                        behavior:"smooth"
 +
                                    });
 +
                                }
 +
                            }
 +
                        }
 +
                    }
  
                <hr class="hrmar"  id="content_part_h2" style="visibility:hidden;">
+
                    /**
                <div id="hrmar">
+
                     * 点击按钮回到顶部(定时器平滑滚动)
                <div id="content_part">
+
                     */
                     <h2  class="content_text_h2">4. Multi copy of UAS</h2>
+
                    function navToTop() {
                     <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;">4.1 Design</h4>
+
                    }
                    <div class="content_div_text">
+
                    navLiPop();
                      <p  id="p4_1"align="left">      Continue to add CAT8 binding site-3 from P<sub>SED1L</sub> on the basis of M5, add in two locations, the first is added upstream of the original ADR1 binding site, and the second is added to the CAT8 binding site-3 between ADR1 binding site. (Figure 5)   </p>
+
                    toTopIcon();
                      <div class="content_div_img">
+
                    navToTop();
                          <img src="https://static.igem.org/mediawiki/2021/d/dc/T--SCUT-China--engineering_8.png">
+
                }
                      </div>
+
            </script>
                      <p style="font-size:12px;text-align:center;color:#666666">
+
                          Figure 5. The additional add of CAT8p-3 binding site
+
                      </p>
+
                      </div>
+
                </div>
+
                <div id="content_part">
+
                    <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;">4.2 Build</h4>
+
                    <div class="content_div_text">
+
                    <p id="p4_2_1">
+
                        Taking the construction of the M8 mutant strain as an example, the construction method of the M9 strain was the same. M10 is constructed on the basis of M8 or M9.<br><br>Design the CAT8 binding site motif on the homology arm, (1) Using primer M8-F/PDC1-R to amplify fragment one, primer M8-R/PDC1-F to amplify fragment two (Table 7), (2) assemble fragment 1 and fragment 2 to obtain M8 promoter, (3) Using primer PDC1-F/PDC1-R amplifies M8 promoter. Using BamHI and XbaI restriction enzymes to digest the M8 promoter and YEp181-PDC1p-VS-SAG1t vector, and react at 37°C for 2h. After the reaction, the promoter and vector were purified and ligated at 16°C overnight. Transform into E. coli DH5α, pick the correct transformants, and extract the YEp181-M8-VS-SAG1t plasmid.<br><br>Then, use donor DNA primers (Table 4) to amplify the M8-VS-SAG1t expression cassette with a homology arm next to the LEU2 site. Transform the constructed gRNA expression plasmid and M8-VS-SAG1t expression cassette into yeast cells which can express Cas9 protein, and use the method of auxotrophic medium selection and colony PCR verification to obtain the strains that successfully knocked into the M8-VS-SAG1t expression cassette.
+
                    </p>
+
                    <!-- <p style="font-size:12px;text-align:center;color:#666666">
+
                        Table 7. Primers of M8 and M9 promoter
+
                    </p>
+
                    <div class="content_div_img">
+
                        <img style="width:68%;"src="img/Engineering/add_4.png">
+
                    </div> -->
+
  
  
                    </div>
+
            <!--  111111111111111111111111111111111111111111111切换语言 -->
                </div>
+
            <script type="text/javascript">
                <div id="content_part">
+
            var count=0;
                    <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;">4.3 Test</h4>
+
            // var oP=document.getElementsByClassName("content_bglt");
                    <div class="content_div_text">
+
          function changeLanguage(){
                        <p id="p4_3_1">
+
            // alert("点击事件发生");
                            The strain was inoculated into 10 ml of fermentation broth, the initial OD600 of the fermentation broth was controlled to be 0.05, and it was placed in a shaker at 30℃, 220 rpm for 64 hours. After 64 hours of shake flask fermentation, the concentration of valencene was detected by gas chromatography. (Figure 6)
+
            count++;
                        </p>
+
            if(count%2==1){
                      <div class="content_div_img">
+
              document.getElementById("p1_1").innerHTML="     染色体是由DNA与核小体缠绕紧密结合形成超螺旋而形成的,转录因子是一种特殊的蛋白质,会结合到基因的特定区域使下游开始转录,因此成为转录因子。而在染色体状态下的基因与核小体紧密地结合与盘旋,无法暴露出能够让核小体特异性结合的序列(transcriptional factor binding sites,TFBS),which对于启动子而言就是上游激活序列(upstream activating sequence,UAS)。只有在核小体松弛的时候或结合不紧密的区段,TF才能够更顺利地结合上UAS并发挥作用。";
                          <img style="width:110%;"src="https://static.igem.org/mediawiki/2021/0/0e/T--SCUT-China--engineering_9.png">
+
              document.getElementById("p1_2").innerHTML="      核小体对于基因的亲和力,是由基因的碱基排列组合决定的,因此碱基序列稍有不同,其对核小体的亲和力也会有不同。因此我们希望通过改变UAS以外的碱基序列,让启动子对核小体的亲和力降低。为此我们亟需一个工具 能够计算序列的核小体亲和力,作为评估函数,为引入的碱基微调的效果打分。";
                      </div>
+
              document.getElementById("p1_3").innerHTML="     假设拥有这样能够计算核小体亲和力的打分工具,理论上我们就能计算所有碱基排列的组合的核小体亲和力,选出亲和力最低的序列理论上就代表着强度最高的启动子。当然这样的思路拥有高时间复杂度和低特异性,我们选择结合遗传算法对启动子序列进行迭代优化。";
                        <p style="font-size:12px;text-align:center;color:#666666">
+
              document.getElementById("p2_1").innerHTML="     隐马尔科夫模型已经被世人研究经过了几十余年,鉴于DNA序列本身的特性,以及许多积累的证据都能够表明DNA序列本身就能高度预测核小体在体内的定位,而且染色质的功能也与核小体定位有密切关系,因此科学家提出了用dHMM模型预测染色体DNA对应的核小体占有情况:分为nucleosome(N)状态和linker(L)状态,N状态的长度固定为147bp,而L状态的长度是可变的。假定两个状态一定是交替相连的,并且一个完整的染色质序列以linker状态起始和结束。</br>"+"设 e = e<sub>1</sub>,... ,e<sub>147</sub>是一段核小体 DNA 序列。设 p<sub>n</sub> 是观察 e 属于核小体区域的概率,计算为4阶马尔可夫链模型下两个沃森与克里克的概率的乘积。</br>我们假设一个给定物种的linker DNA 长度可以遵循一个未知的分布,将f<sub> l </sub>(k)定义为 k = 1,... ,τ<sub>l</sub> (我们允许的最大linker度)。一个被观测的linker DNA 序列 e = e<sub>1</sub>,... ,e <sub>k</sub> 携带两段信息,长度是 k bp,给它定义的字母是 e<sub>1</sub>,... ,e <sub>k</sub>。设 G<sub>L</sub> (e | k)表示连接 DNA 的齐次马尔可夫链模型(同样包括两条链) 。然后观察 e 作为linker的可能性。";
                            Figure 6. Figure 6 Fermentation results of M8-M10 mutant strains. (a) OD<sub>600</sub> of M8-M10 mutant strain, (b) Valencene yield of M8-M10 mutant strain.
+
              document.getElementById("p2_2").innerHTML="     设 x = x<sub>1</sub>,... ,x<sub>n</sub> 是一个长度为 n 的 DNA 序列,其中 x<sub>i </sub>= A/C/G/T 设 z = z<sub>1</sub>,... ,z<sub>n</sub> 为对应的隐状态路径,其中如果 x<sub>i </sub>被一个nucleosome状态覆盖,z<sub> i</sub> = 1,否则为0。假设路径 z = z<sub>1</sub>,... ,z<sub>n</sub> 将 x 划分为 k 个连续的nucleosome或连接linker区,其中nucleosome区域的长度一致为147 bp,而linker的长度可能不同。我们把这些区块表示为 y = y + ,... ,y <sub>b</sub>,它们的状态标记为 s = s<sub>1</sub>,... ,s <sub>b</sub>,其中如果 y<sub> i</sub> 是nucleosome状态,s <sub>i</sub> = 1,否则为0。观测概率(x,z)计算方法为";
                        </p>
+
              document.getElementById("p2_3").innerHTML="      其中, π<sub>0</sub>(s<sub>1</sub>)和 π<sub>e </sub>(s <sub>b</sub>) 分别表示DNA链的初始和结束状态为 s<sub>1</sub>和 s<sub>k</sub> 的概率,I是指示函数。由于我们假设一个完整的染色质序列必须以linker状态开始和结束,π<sub>0</sub>(s<sub>1 </sub>= 0) = π<sub>e</sub> (s<sub>b</sub> = 0) = 1。我们定义了在一个特定位置 i 上的核小体占有,即 z<sub> i</sub> = 1的后验概率,";
                      <p  id="p4_2"align="left">     The Valencene yield of M9 and M10 mutant strains was not significantly improved compared to control strain。However, the Valencene yield of the M8 mutant strain was increased by 27.0% (6.75 mg/L) compared to M5 strain, and 56.6% (6.75 mg/L) compared with the unmodified strain.    </p>
+
              document.getElementById("p2_4").innerHTML="     位于 i 位置的组蛋白结合亲和力评分为区域 x<sub> i-73</sub>... x <sub>i</sub>,... ,x<sub> i + 73</sub>属于nucleosome区域和属于linker区域各自概率的对数似然比,即,";
                    </div>
+
              document.getElementById("p2_5").innerHTML="      给定模型 P<sub>N</sub>,G<sub>L</sub> 和 F<sub>L</sub>,最优路径 z 可以通过标准的维特比算法算法找到,核小体占用得分可以通过前向和后向算法估计。";
                </div>
+
              document.getElementById("p2_6").innerHTML="     该模型已经过根据大量经过焦磷酸测序的酵母核小体DNA片段数据以及按不同物种划分的数据进行训练,测试结果表现良好,并制作成了一个名为NuPoP的软件工具,作为R包供研究人员使用。";
<div id="content_part">
+
                    <h4 class="content_text_h2" style="font-size:18px;margin-top:20px;">4.4 Learn</h4>
+
                    <div class="content_div_text">
+
  
                      <p  id="p4_3"align="left">      In M8, CAT8 binding sites-3 was added between the original CAT8 binding sites-3 and ADR1 binding site, which may increase the interaction between CAT8 transcription factor and ADR1 transcription factor. In M9, CAT8 binding site-3 was added to the upstream of the original CAT8 binding site-3, The transcription factor cannot effectively bind to this site, or the add changes the surrounding sequence and may have some negative effects, such as destroying the original binding site of transcription factor with activation. As a result, the Valencene yield of the M9 mutant strain could not be further increased. In M10, the two CAT8 binding sites-3 was added around the ADR1 binding site, which may prevent the binding of ADR1 transcription factors and affect the interaction between CAT8 transcription factors and ADR1 transcription factors.    </p>
 
                     
 
                    </div>
 
                </div>
 
</div>
 
<hr class="hrmar"  id="content_part_h2" style="visibility:hidden;">
 
                <div id="hrmar">
 
                <div id="content_part">
 
                    <h2  class="content_text_h2">5. Summary</h2>
 
               
 
                    <div class="content_div_text">
 
                      <p  id="p4_4"align="left">      The above is the content of the promoter engineering in the wet lab. Through three rounds of "design-build-test-learn", we not only obtained M1-M10 promoters with different strengths, but also summarized the engineering experience from each learning. Our work is summarized as follows. The work presented here, and the promoters designed and engineered through it, represent an attempt to identify yeast regulatory elements that respond to a condition of interest by designing regulatory parts responsive to such a signal––diauxic shift in this case.    </p>
 
                      <div class="content_div_img">
 
                          <img src="https://static.igem.org/mediawiki/2021/d/d4/T--SCUT-China--Engineering-11.png">
 
                      </div>
 
                    </div>
 
                </div>
 
              </div>
 
</div>
 
                <hr class="hrmar"  id="content_part_h2" style="visibility:hidden;">
 
                <div id="hrmar">
 
                <div id="content_part">
 
                    <h2  class="content_text_h2">6. References</h2>
 
                    <div class="content_div_text">
 
                      <p>1. Haurie V, Perrot M, Mini T, Jenö P, Sagliocco F, Boucherie H. The transcriptional activator Cat8p provides a major contribution to the reprogramming of carbon metabolism during the diauxic shift in Saccharomyces cerevisiae[J]. The Journal of Biological Chemistry, 2001, 276(1): 76-85.    </p>
 
                      <p>2. Roth S, Kumme J, Schüller H-J. Transcriptional activators Cat8 and Sip4 discriminate between sequence variants of the carbon source-responsive promoter element in the yeast Saccharomyces cerevisiae[J]. Current Genetics, 2004, 45(3): 121-128.    </p>
 
                      <p>3. Young ET, Dombek KM, Tachibana C, Ideker T. Multiple pathways are co-regulated by the protein kinase Snf1 and the transcription factors Adr1 and Cat8[J]. The Journal of Biological Chemistry, 2003, 278(28): 26146-26158.    </p>
 
                      <p>4. Thukral SK, Eisen A, Young ET. Two monomers of yeast transcription factor ADR1 bind a palindromic sequence symmetrically to activate ADH2 expression[J]. Molecular and Cellular Biology, 1991, 11(3): 1566-1577.    </p>
 
                      <p>5. Walther K, Schüller HJ. Adr1 and Cat8 synergistically activate the glucose-regulated alcohol dehydrogenase gene ADH2 of the yeast Saccharomyces cerevisiae[J]. Microbiology (Reading, England), 2001, 147(Pt 8): 2037-2044.    </p>
 
                      <p>6. Segal E, Widom J. Poly(dA:dT) tracts: major determinants of nucleosome organization[J]. Current Opinion in Structural Biology, 2009, 19(1): 65-71.    </p>
 
                      <p>7. Anderson JD, Widom J. Poly(dA-dT) Promoter Elements Increase the Equilibrium Accessibility of  Nucleosomal DNA Target Sites[J]. Molecular and Cellular Biology, 2001, 21(11): 3830.    </p>
 
                      <p>8. Raveh-Sadka T, Levo M, Shabi U, Shany B, Keren L, Lotan-Pompan M, et al. Manipulating nucleosome disfavoring sequences allows fine-tune regulation of gene expression in yeast[J]. Nature Genetics, 2012, 44(7): 743-750.    </p>
 
  
                      </div>
+
              document.getElementById("p3_1").innerHTML="      我们设计在不破坏UAS序列和启动子核心区域的条件下,引入随机碱基突变,获得启动子序列的变体with不同的核小体亲和力。";
                </div>
+
              document.getElementById("p3_2").innerHTML="      之前已经有一个类似的迭代优化模型被提出,使用的方法在学术上称作“贪心算法”,具体步骤而言,是对初始启动子序列引入穷尽枚举的所有可能的随机单点突变,并利用NuPoP计算所有变体的核小体亲和力,将亲和力最低的一条序列作为优胜者筛出,作为母本进入到下一轮的随机突变枚举,依次循环迭代下去,最终得到分数难以再产生变化的理论最优序列。";
              </div>
+
              document.getElementById("p3_3").innerHTML="      但是这样的方法存在诸多弊端,首先1穷尽枚举法带来了非常大的时间复杂度,优化一条长度为800bp的序列需要耗时36个小时,其次2只采用单点突变以及贪心算法,非常容易使优化进入到局部最优解,假设某一碱基的突变是大幅度降低亲和力的一组突变中的其中一个碱基突变,但是单观察这个碱基突变所带来的收益是非常小的,这就容易导致这一更优情况被忽略,另外3已发表的研究指出,单纯的低亲和力并不与启动子的强度有着直接的联系,研究者推测,核小体亲和力的位置分布可能起到影响启动子强度的关键作用。";
            </div>
+
              document.getElementById("p3_4").innerHTML="      鉴于以上3点缺陷,我们设计出了能够弥补这些缺点,达到更好优化效果的NuPGO。我们的新策略是,将NuPoP与遗传算法with轮盘算子结合,突变的碱基数能够根据概率自收敛而不是每次只能引入单点随机突变。";
 +
              document.getElementById("p3_5").innerHTML="      贪婪算法和轮盘赌选择法,本质上都属于遗传算法,分别是其中的两种遗传算子(GA Operator),排序选择和轮盘选择(rank-based selection and roulette wheel selection)<br>     贪婪算法和轮盘赌遗传算法本质上都是进化算法。贪婪算法也在部分情况下表现得比其他进化算法更好。但是贪婪算法的优势其实在这里没有体现出来,简单枚举减少计算量但是增加了时间复杂度,而且得到的效果并不太好。遗传算法中的轮盘算子能够考虑多种进化方向,这一优点能够在启动子优化案例中很好地体现出来。";
 +
              document.getElementById("p3_6").innerHTML="      先前的算法的致命缺点就在于只选取了每轮中的表现最优者,作者设计时是为了减少穷尽枚举法的时间复杂度,以及追求最优的得分。但这样的算法非常容易让结果陷入局部最优解,例如,ABC三点突变比DEF三点突变所降低的亲和力要更多,但B位置的碱基单点突变所下降的核小体亲和力比D位置的碱基单点突变所下降的核小体亲和力少,贪婪算法会选择D位置的碱基突变策略。";
 +
              document.getElementById("p3_7").innerHTML="      而且在每轮中只引入随机单点突变,设置每轮的突变数为1,导致枚举的数量依旧非常大,时间复杂度没有得到足够的减轻。贪婪算法的执行非常耗费时间,在我们的测试中,一条长度为800bp的启动子就需要等待36小时的运行时间。";
 +
              document.getElementById("p3_8").innerHTML="      轮盘选择法也成为适应比例选择法(fitness proportionate selection,FPS),指的是个体的适应度直接根据正比关系得到被选择的概率,类似于将所有个体放在一个轮盘上,每个个体所占的面积与适应度成正比关系,即当轮盘转起来随机落在一个区域上时,被选中的概率由板块的面积所决定,即适应度与被选中的概率成正比关系,转动n次轮盘,选出n个个体,也意味着一个个体可以被重复选择。";
 +
              document.getElementById("p3_9").innerHTML="      对于每轮引入的随机碱基突变,我们将碱基突变数设置为自适应参数,使序列的每个碱基位置都是依概率突变,而不是从头到尾都是设定的1。为了保证优化的合理性和精确性,我们把突变概率的阈值设为0.0005,以让每轮中的碱基突变数不超过5。这样的设置使模型的时间复杂度大大削减,梯度下降时曲线会更快地收敛,还能够探索更多的组合可能,更逼近全局最优解。随着迭代的收敛,突变概率也会慢慢趋近于0,即达到最终的优化完成的序列。";
 +
              document.getElementById("p3_10").innerHTML="      起核心作用的UAS通常只有20bp左右的基序长度,我们最关键想要实现的,准确地说,应该是降低UAS所在区域的核小体亲和力,让UAS所在片段倾向于成为linker区而不是nucleosome区。</br>     算法的目标是尽可能降低整条启动子序列的核小体亲和力总分,有可能会忽略突变核心UAS周边而选择收益更大的突变,最终优化后得到的启动子可能并没有凸显UAS的优势。为了弥补这个缺点,我们决定人为地加入评分时的区域权重。根据上述公式我们可以得知,持续时间隐马尔可夫模型dHMM在计算核小体亲和力时是计算每一个单独碱基位置的核小体亲和力得分,再求取其AUC。</br>     因此我们可以在计算单点得分时让我们想要指定的特定区域,例如核心UAS所在的区域,获得一个自定义的缩放,分数与权重相乘后得到最终的单点位置得分。</br>";
 +
              document.getElementById("p3_11").innerHTML="    在NuPGO的算法上测试添加不同权重的结果是令人满意的,在给特定区域添加权重后,NuPGO会更重视这一区域,在最终的优化结果序列中这一特定区域会得到更好的核小体亲和力下降表现。";
 +
              document.getElementById("p3_11").innerHTML="      最终值得一提的是,无论是用于评估DNA的核小体亲和力的NuPoP软件程序,还是使用贪婪算法的迭代优化代码,由于网络限制以及贪婪算法部分内容缺失,根据论文提供的途径已经无法获取完整的可执行的代码文件,无法直接运行贪婪算法的优化程序,NuPoP的使用对于部分缺少计算机或生物信息学基础的人而言还是比较吃力的。/<br>      因此我们几乎完全重构了用于实现的代码,并写成了利用轮盘赌算子的遗传算法,并且比之前的代码更加易于阅读和user-friendly。无论是贪婪算法NucOpt Progs还是遗传算法NuPGO,我们都一并上传了完整的可执行代码集。</br>     这意味着没有生信基础的人、将来的iGEM团队或者是实验室里的研究人员,都可以根据教程轻易地上手开始使用我们的程序,直观且便捷地输入自己的初始数据和条件。";
  
            <div id="bottom_div">
 
              <div id="bottom_div_img">
 
              <!-- <img class="" src="img/bottom.png" alt=""> -->
 
              <div class="a_div">
 
                <a href="https://2021.igem.org/Team:SCUT-China"><div class="a_div"></div></a>
 
              </div>
 
            </div>
 
        </div>
 
          </div>
 
        </div>
 
      </div>
 
    </main>
 
  <!-- -0000000000000000000000000000000000000底栏000000000000000000000000000000000000000 -->
 
  <!-- 444444444444444444444444444444444444444444444444444444444444以上需要替换4444444444444444444444444444444444444444444444444444444444444444 -->
 
  
  <script src="https://2021.igem.org/Team:SCUT-China/Javascript/jquery?action=raw&ctype=text/javascript"></script>
+
              document.getElementById("p3_11").innerHTML="     非常可惜的是,经过NuPGO优化获得的启动子通常已经发生了几百bp的突变,需要通过化学合成方法得到新启动子基因片段。而且,只对最终的一条变体进行实验验证,在统计学上而言是不具有代表意义的。想要证明NuPGO所更新得到的启动子序列强度提升与预测的结果相吻合,需要挑选最少5~10条启动子变体序列,进行化学合成和表征实验验证其强度。由于项目经费的缺乏以及时间限制,我们无法对通过优化算法得到的序列进行表征实验验证其强度。</br>     如果条件允许,或者有其他的队伍对我们的模型感兴趣并想要验证NuPGO的效果,我们会采取或推荐在一次优化过程中,选取突变了50, 100, 200, 400, 500bp的变体进行表征实验测试启动子的强度,并且在500bp左右要多选取几条变体,如突变了499, 500, 501bp的3条启动子都挑选出来做表征实验,这样才有统计学意义。另外也可以根据软件生成的下降曲线,设定固定步长,如下降了50, 100, 150, … 500, 550核小体亲和力的变体序列,挑选出来进行表征实验。</br>      但我们有理由相信优化结果是相对可靠的,这是因为在我们的参考文献(图6)中作者进行了贪婪算法得到的启动子变体强度表征实验。我们可以从实验结果中得出结论,核小体亲和力优化确实能够提高启动子的强度从而增强表达。";
  <script type="text/javascript" src="https://2021.igem.org/Team:SCUT-China/Javascript/jquery/waypoints?action=raw&ctype=text/javascript"></script>
+
  
  <script type="text/javascript" src="https://2021.igem.org/Team:SCUT-China/Javascript/main-4?action=raw&ctype=text/javascript"></script>
+
              document.getElementById("p3_11").innerHTML="     我们把开源代码和软件包都放在了Github,由衷地并强烈地推荐每一个使用真核启动子的未来iGEM队伍或研究人员使用我们的傻瓜式软件获得更高表达、更合理利用UAS的强启动子。";
  
  
  </body>
+
              // -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 +
            }else{
 +
              document.getElementById("p1_1").innerHTML="      Most eukaryotic genomic DNA is wrapped in nucleosomes, which occlude and <span>strongly distort</span> the wrapped DNA. Transcription factor is a special protein that binds to a specific region of the gene to start transcription downstream. However, in the chromosomal state, the genomic DNA is tightly bound to the nucleosome. It can not reveal the <span>transcribed factor binding sites</span> (TFBS), the upstream activating sequence (UAS) for the promoter. Only when the nucleosome is <span>relaxed or not tightly bound</span> could TF have more opportunity to bind UAS and function effectively.";
 +
              document.getElementById("p1_2").innerHTML="      The affinity of nucleosomes to genomic DNA is determined by the <span>arrangement and combination</span> of sequence bases, so the affinity will be different if the sequence is slightly altered. Therefore, we hope to decrease the histone binding affinity score by introducing mutations to the sequence bases except for UAS and the core region. For this reason, a tool that can calculate the nucleosome affinity as an <span>evaluation function</span> to score the effect of the introduction of base fine-tuning is <span>in urgent need.</span>";
 +
              document.getElementById("p1_3").innerHTML="      Suppose there is a scoring tool that could calculate histone binding affinity theoretically. In that case, we could calculate nucleosome affinity for all the possible arrangements of bases in a <span>fixed length</span>. The promoter sequence in theory with the lowest nucleosome affinity sequence and highest strength could be <span>selected</span>. Of course, this approach has extremely high time complexity and low specificity; that's the reason we choose to combine genetic algorithm to optimize the promoter sequence iteratively.";
 +
              document.getElementById("p2_1").innerHTML="      The hidden Markov model (HMM) has been known for decades<sup>[1]</sup>.<br />"+"Given the characteristic of DNA sequence, and the accumulated pieces of evidence that DNA sequence itself can highly predict the localization of nucleosome in vivo, also the function of chromatin is closely related to the localization of nucleosome, scientists proposed using a <span>duration Hidden Markov Model</span> (dHMM) to predict the occupation of the corresponding nucleosomes in chromosomal DNA: divided into the <span>nucleosome (N) state </span>and<span> the linker (L) state</span>, where the length of the nucleosome state is fixed at 147 bp and the linker state is variable<sup>[2]</sup>. It is assumed that the two states must be alternately connected. A complete chromatin sequence must begin and end in the linker state, training a 4th order time-dependent Markov chain for the N state and a homogeneous 4th order Markov chain for the L state. <br />"+"In the detailed method, they let <span>e</span> = e<sub>1</sub>, ..., e<sub>147</sub> be corresponding to the nucleosome DNA sequence, and let P<sub>N</sub> be the probability of observing <span>e</span> as a nucleosome, computed as the product of probabilities for both Watson and Crick strands under the 4th order Markov Chain model. They assume that the linker DNA length of a given species has an unknown distribution F<sub>L</sub> (k) defined for k = 1, ..., τ<sub>L</sub> (the maximum linker length allowed). An observed linker DNA sequence <span>e</span> = e<sub>1</sub>, ..., e<sub>k</sub> carries two pieces of information, the length is k bp, and given which, the emitted letters are e<sub>1</sub>, ..., e<sub>k</sub> . Let G<sub>L</sub> (<span>e</span>|k) denote the homogeneous Markov chain model for the linker DNA (again including both strands). Then observing <span>e</span> as a linker DNA has probability.";
  
  <!-- 加载进度条 -->
+
              document.getElementById("p2_2").innerHTML="      Suppose <span>x</span> = x<sub>1</sub>, ..., x<sub>n</sub> is a genomic DNA sequence of length n, where x<sub>i</sub> = A/C/G/T. Let <span>z</span> = z<sub>1</sub>, ..., z<sub>n</sub> be the corresponding hidden state path, where z<sub>i</sub> = 1 if x<sub>i</sub> is covered by a nucleosome state, and 0 otherwise. Suppose that the path <span>z</span> = z<sub>1</sub>, ..., z<sub>n</sub> partitions <span>x</span> into k consecutive nucleosome or linker state blocks, in which the nucleosome blocks have a uniform length of 147 bp, whereas the length of linker blocks may vary. They denote these blocks as <span>y</span> = <span>y+</span>, ..., <span>y<sub>B</span></sub> , and their state identification as <span>s</span> = s<sub>1</sub>, ..., s<sub>B</sub> , where s<sub>i</sub> = 1 if <span>y<sub>i</span></sub> is nucleosome state, and 0 otherwise. The probability of observing (<span>x</span>, <span>z</span>) is given by";
  <!-- <script type="text/javascript" src="https://2021.igem.org/Template:SCUT-China/MyJs-right-card?action=raw&ctype=text/javascript"></script> -->
+
              document.getElementById("p2_3").innerHTML="      where π<sub>0</sub>(s<sub>1</sub>) and π<sub>e</sub>(s<sub>B</sub> ) stand for the probabilities that the chain initializes and ends with state s<sub>1</sub> and s<sub>k</sub> respectively, and I is an indicator function. Since they assume that a complete chromatin sequence must start with and end in a linker state, π<sub>0</sub>(s<sub>1</sub> = 0) = π<sub>e</sub> (s<sub>B</sub> = 0) = 1. Define the nucleosome occupancy at a specific position i, denoted o i , as the posterior probability that z<sub>i</sub> = 1, i.e.,";
  <!-- 避免JS运行时页面还没有加载出来,所以需要放在body后 -->
+
              document.getElementById("p2_4").innerHTML="      Most importantly, the histone binding affinity score at position i is defined as the log likelihood ratio for the region x<sub>i-73</sub>, ... x<sub>i</sub> , ..., x<sub>i+73</sub> to be a nucleosome vs. a linker, i.e.,";
 +
              document.getElementById("p2_5").innerHTML="      Given the models P<sub>N</sub>, G<sub>L</sub> and F<sub>L</sub>, the optimal path <span>z</span> can be found by the standard Viterbi algorithm, and the nucleosome occupancy score can be estimated using forward and backward algorithms<sup>[2]</sup>.";
 +
              document.getElementById("p2_6").innerHTML="      The model <span>has been trained</span> on a large number of yeast nucleosome DNA fragments dataset that have been pyrosequencing and on a species by species basis, and the test results are satisfactory<sup>[2]</sup>. They made the <span>nucleosome occupancy predicting tool</span> into a software tool called <span>NuPoP</span>, which means 'nucleosome position predicting', as an R language package for researchers to use.";
  
    <script type="text/javascript">
 
  var oB=document.getElementById("bg");
 
  var oL=document.getElementById("lt");
 
  // var oP=document.getElementsByClassName("content_div_text");
 
  var oP=document.querySelectorAll(".content_div_text p");
 
  var sum=15;
 
  function bigger(){
 
  sum+=2;
 
  var i;
 
      for (i = 0; i < oP.length; i++) {
 
          oP[i].style.fontSize=sum+"px";//在JS中不能出现“-”符号
 
      }
 
  }
 
  oB.onclick=bigger;
 
  
  function small(){
+
              document.getElementById("p3_1").innerHTML="      We designed to obtain the variants with different nucleosome affinity of the promoter sequence by introducing the limited amount of random bases <span>mutation</span> without destroying the UAS sequence and the core region of the promoter.";
  sum-=2;
+
              document.getElementById("p3_2").innerHTML="      Actually, a <span>similar</span> iterative optimization model has been proposed, called <span>NucOpt Progs</span>. The theory is technically known as a “<span>Greedy algorithm</span>”, in which all possible random single mutations are <span>enumerated</span> on the initial promoter sequence, and every single sequence variant was calculated the nucleosome affinity by NuPoP to obtain a score. After that, the sequence with the lowest affinity was selected as the winner in this round, then taken as the parent placing into the next round, calculating all mutation possible cases enumeration on this winner and vote in a new champion variant candidate. Finally, the theoretical best optimal sequence which is difficult to gain higher score is obtained<sup>[3]</sup>.";
  for (i = 0; i < oP.length; i++) {
+
              document.getElementById("p3_3").innerHTML="      However, after learning and testing NucOpt Progs, we found that this method has many <span>disadvantages</span>. First, the exhaustive enumeration method brings<span> immense time complexity</span>. It takes 36 hours to optimize a sequence of 800 bp length (based on the computer capacity). Secondly, single mutation only and greedy algorithm in use obviously determine that it's extraordinarily easy to get optimization fallen into a <span>local optimal solution</span>. For example, assuming that a single base change is a component of a group of mutations that significantly decrease histone affinity, but rarely calculating the benefit exactly this single base change could bring is unremarkable. In this case, the superior choice would easily be neglected. Thirdly, taking into account the biological effect, other published studies have shown that rarely low affinity is <span>not directly related</span> to the promoter's strength<sup>[3]</sup>. At the same time, the researchers speculate, the site distribution of nucleosome affinity may play a vital role in the promoter strength.";
      oP[i].style.fontSize=sum+"px";;//在JS中不能出现“-”符号
+
              document.getElementById("p3_4").innerHTML="     Given the above three defects, we have designed a software <span>NuPGO</span> which can make up for these shortcomings and achieve a <span>better optimization effect</span>. Our new strategy is to combine NuPoP with the genetic algorithm(Fig.1) using the <span>roulette wheel operator</span> so that the base number of mutations can realize <span>adaptive</span> convergence according to probability instead of only introducing a single random mutation at a time, which not only effectively reduce the time complexity but also take much more possibilities into account, escaping local optimal solution. Considering the biological effect, we additionally set the <span>user-defined region weights</span> especially, which enable the algorithm to place larger or smaller weights on the specific sequence region.";
      //alert(oP.length);
+
              document.getElementById("p3_5").innerHTML="      Greedy algorithm and roulette wheel selection algorithm are both essentially <span>evolutionary algorithms</span>. The greedy algorithm also performs better than other evolutionary algorithms in some cases. Still, its efficiency advantage brought by enumeration couldn't display here, while the <span>multi-factor considering</span> ability in wheel selection algorithm stands out.";
  }
+
              document.getElementById("p3_6").innerHTML="      The fatal disadvantage of the previous promoter optimizing algorithm is that only the<span> best performer</span> is selected in each round. Perhaps the author aimed to reduce the time complexity of exhaustive enumeration and simply pursue the best score. But such algorithms can easily trap the results into <span>local optima</span>. For example, the three-point mutation on site ABC decrease the histone affinity more than the three-point mutation on site DEF. However, the affinity decrease of site B mutation is smaller than that of site D mutation, so the greedy algorithm will choose site D mutation strategy as the winner, while site B strategy is obviously better actually.";
 +
              document.getElementById("p3_7").innerHTML="      Moreover, only a random single mutation is introduced in each round. In other words, the mutation number is set to 1, which leads to gigantic amounts of <span>enumerations</span>, and the time complexity has not been sufficiently reduced. The execution of the greedy algorithm is time-consuming. In our test, an 800bp length promoter had to cost 36 hours to run. ";
 +
              document.getElementById("p3_8").innerHTML="      The wheel selection method is also known as the fitness proportionate selection (FPS), which means that <span>the fitness of an individual is directly proportional to the probability of being selected</span>, similar to putting all the individuals on a wheel, the area occupied by each individual is directly proportional to its fitness, that is, the probability of being selected is determined by the area of the plate, namely the fittness of each individual when the wheel spins randomly.<br>      In other words, the fitness is directly proportional to the probability of being selected, turning the wheel N times to <span>select N </span><span>individuals</span>. Meanwhile, it also means that an individual can be selected repeatedly.";
 +
              document.getElementById("p3_9").innerHTML="      For random bases mutations introduced in each round, we set the mutation amount as an <span>adaptive</span> parameter. Each base position in the sequence is a probability mutation rather than the unintelligent setting the amount as '1' from beginning to end. To ensure the rationality and accuracy of the optimization, we set the threshold of <span>mutation probability</span> to 0.0005 so that the number of base mutations in each round would<span> not exceed</span> 5. Of course, other choices are worth expanding. By doing this, the <span>time complexity</span> of the model is reduced to some degree, the curve will <span>converge faster</span> when performing the gradient descends, broadening the search, and <span>more combinations</span> would be explored to approach the global optimal solution. As the iteration converges, the mutation probability will gradually approach 0, that is, reaching the final optimized promoter sequence(Fig.3).";
 +
              document.getElementById("p3_10").innerHTML="      The critical UAS are usually only about 10 bp in length, and the vital goal we want to achieve, precisely speaking, is to reduce the nucleosome affinity around the <span>UAS region</span> and to make the UAS region tend to become the linker region rather than nucleosome region.</br>      The final goal of the algorithm is to decrease the total affinity score of the whole promoter sequence as far as possible, which is <span>likely to ignore</span> the key UAS region mutation and choose other strategies with more significant benefit, the final optimized promoter obtained in such condition may not make use of the advantages of UAS. Specifically, the key UAS is still in the high histone affinity region. To compensate for this shortcoming, we decided to <span>artificially add a regional weight </span>to the score calculation process. According to the formula introduced above, dHMM calculates the nucleosome affinity score for every single site, and then calculates the area under the curve (AUC). </br>      To make some adjustments, we decide to compute the score of every single position with a custom zoom when calculating the specific area that we regard special, such as the area of the crucial UAS, in other words, multiplying the simply calculated score by the user-defined weight to get each final single point score.<br>";
 +
              document.getElementById("p3_11").innerHTML="      Fortunately, the results of testing different weights on the NuPGO algorithm are <span>satisfactory</span>. After adding weights to a specific area, NuPGO will pay <span>more attention </span>to that area. In the final optimized result sequence, this specific region will get better performance of nucleosome affinity degradation.";
 +
              document.getElementById("p3_12").innerHTML="      Ultimately, and most worth noting is that, neither the NuPoP software program which can evaluate DNA’s nucleosome affinity<sup>[2]</sup> nor the iterative optimization code<sup>[3]</sup> that uses the greedy algorithm, because of the <span>components missing </span>and internet limitation, following the tutorials provided in the papers, we couldn’t get the complete executable code file and failed to run the optimizing software with greedy algorithm directly, as well as other researchers. It means the use of NuPoP and greedy algorithm optimizer have been difficult for those who lack the computer or bioinformatics study infrastructure.</br>      Therefore, we <span>refactored</span> the implementation code almost completely according to some of its logic, and wrote the genetic algorithm using the roulette wheel operator, which is much easier to read and user-friendly much more than the previous code. In other words, we <span>fix the algorithm published before</span> and come up with a <span>more novel and intelligent</span> software. Both the greedy algorithm optimizer <span>NucOpt Progs</span> and genetic algorithm optimizer <span>NuPGO</span> are published by us for users’ convenient.</br>      This means that <span>anyone</span>, especially iGEM teams in the future and researchers in labs, can <span>easily get started</span> using our program based on a tutorial. <span>Input your initial data</span> and conditions visually and easily and it will be on. Those interested are also welcomed to read our source code and make some comments or modifications, and maybe a stronger NuPGO will be come up with next year.";
  
  }
+
                document.getElementById("p4_1").innerHTML="      Unfortunately, the NuPGO-optimized promoter has some mutation of around hundreds of bp, requiring chemical synthesis to obtain the new promoter gene fragment. Moreover, it is not statistically representative of testing only the final variant. To prove that the strength of the NuPGO optimized promoter sequence is consistent with the predicted results, we need to select at least 5 to 10 promoter variants, and conduct chemical synthesis and characterization experiments to verify their strength. Due to the lack of project funds and time constraints, we can not perform characterization experiments to ascertain the strength of the sequence obtained by the optimization algorithm. </br>     If conditions permit or any other team was interested in verifying the effect of NuPGO, we would like to or recommend selecting the promoter variants with 50, 100, 200, 400, 500bp mutations in the same run, also pick three variants at each of the position, such as 499, 500, 501bp. Then design the parallel characterization experiments to see the strength difference. The descent curve could also be consulted to obtain the selection choice strategy.</br>      Nevertheless, we have reason to believe that the results of the optimization are relatively reliable, because, in our references(Fig.6), the authors conducted experiments on the strength characterization of the promoter variants obtained by the greedy algorithm. We can conclude from the experimental results that nucleosome affinity optimization indeed can increase the promoter's strength and thus enhance expression.";
  oL.onclick=small;
+
  </script>
+
  
  <script type="text/javascript" src="https://2021.igem.org/Template:SCUT-China/Myjs-Lunbo?action=raw&ctype=text/javascript"></script>
+
              document.getElementById("p2_1").innerHTML="     We publish our source code and packages on Github. We extraordinarily sincerely and strongly recommend that every future iGEM team or researcher using a eukaryotic promoter use our software to achieve a higher expressing level and rational use of UAS strong promoters.";
 +
            }
  
  
<script type="text/javascript" src="https://2021.igem.org/Template:SCUT-China/Myjs-on-55?action=raw&ctype=text/javascript"></script>
+
          }
 +
          var btn3=document.getElementById("cl");
 +
          btn3.onclick=changeLanguage;
  
 +
        </script>
 +
        <!-- 避免JS运行时页面还没有加载出来,所以需要放在body后 -->
 +
          <script type="text/javascript">
  
 +
        var oB=document.getElementById("bg");
 +
        var oL=document.getElementById("lt");
 +
        var oP=document.getElementsByClassName("content_div_text");
 +
        var sum=15;
 +
        function bigger(){
 +
        sum+=2;
 +
        var i;
 +
            for (i = 0; i < oP.length; i++) {
 +
                oP[i].style.fontSize=sum+"px";//在JS中不能出现“-”符号
 +
            }
 +
        }
 +
        oB.οnclick=bigger;
  
<!--  以下需要替换144444444444444444444444444444444444444444444441切换语言,不能统一放到网页上444444444444444444444444444444444444444444444444444444444444444444444444444 -->
+
        function small(){
<script type="text/javascript">
+
        sum-=2;
var count=0;
+
        for (i = 0; i < oP.length; i++) {
// var oP=document.getElementsByClassName("content_bglt");
+
            oP[i].style.fontSize=sum+"px";;//在JS中不能出现“-”符号
function changeLanguage(){
+
            // alert(oP.length);
// alert("点击事件发生");
+
        }
count++;
+
if(count%2==1){
+
  document.getElementById("p1").innerHTML="      作为一只参与合成生物学竞赛的iGEM队伍 ,我们将按照“设计-建造-测试-学习”的工程理念开展我们的启动子改造工程。在这个过程中,我们采取了<span>“循环学习”</span>的改造思路,基于每一轮改造实验所学习到的经验,接着开展下一轮的改造,以此类推,最终得到理想的改造结果。我们善于从每一次的改造结果中总结经验,善于从已有研究中探寻原理,以便为之后的理性设计提供更多经验。";
+
  
  document.getElementById("p2_1").innerHTML="      利用YeTFaSCo数据库(http://yetfasco.ccbr.utoronto.ca/scanSeqs.php)预测PPDC1、P<sub>SED1L</sub>、P<sub>ALD4</sub>的转录因子结合位点(TFBs)。所选版本:1.02,结果有100多个假定的TFBSs,匹配分数高于0.75。为了避免使用单个数据库的不完全性,提高预测的准确性,还使用了YEASTRACT数据库(http://www.yeastract.com/formtfsbindingsites.php)进行预测和比较。根据功能将这些TFBSs分为五组。";
+
        }
  document.getElementById("p2_2").innerHTML="      根据P<sub>SED1L</sub>和P<sub>ALD4</sub>的特性来控制瓦伦西亚烯的合成,主要表达在发酵后期。我们合理选择了CAT8p和ADR1p的转录因子结合位点进行启动子修饰。";
+
        oL.οnclick=small;
  document.getElementById("p2_3").innerHTML="      CAT8p是一种锌簇转录激活因子,调节糖异生、乙醇利用和乙醛酸循环[1]中大多数基因的表达。CAT8p与碳源响应元件结合,葡萄糖消耗后激活靶基因,结合基序为5 ' -YCCNYTNRKCCG-3 '[2]。";
+
        </script>
  document.getElementById("p2_4").innerHTML="      ADR1p是碳源响应型锌指转录因子,首次被鉴定为激活乙醇脱氢酶基因ADH2转录的转录因子。它还能激活与葡萄糖发酵、甘油代谢和脂肪酸利用有关的基因。结合基序为5 ' -TTGGRG-3 '。此外[3-4],ADR1p和CAT8p可能在激活某些基因[5]的转录时相互作用。";
+
  <!-- 444444444444444444444444444444444444444444444444444444444444444444444以上需要替换444444444444444444444444444444444444444444444444444444444444444 -->
  document.getElementById("p2_5").innerHTML="       在细胞中,转录是一个三维调控的过程。转录因子结合启动子具有位置效应。当转录因子结合位点位于核小体亲和度高的位置时,由于空间位阻[6],转录因子可能无法与该位点结合。为了找出PDC1的哪个位置可以使添加的UAS(上游激活序列)避免空间位阻的影响。      报道的UAS1和数据库预测的100%结合UAS2被添加到PDC1的两个关键位置,它可以正常结合转录因子并激活。UAS1是FBP1启动子的CAT8结合位点,UAS2是预测在ALD4启动子中100%结合的ADR1p结合位点。研究表明,CAT8p和ADR1p之间可能存在积极的相互作用,所以我们增加了UAS1和UAS2,两者之间的距离更近。(图1)";
+
  <!-- 444444444444444444444444444444444444444444444444444444444444444444444以上需要替换444444444444444444444444444444444444444444444444444444444444444 -->
  document.getElementById("p2_5_1").innerHTML="     考虑到启动子结构的限制,(a)避免PDC1启动子激活的其他预测假定TFBSs的变化,(b)确定靠近核心启动子的CAT8p和ADR1p结合位点的位置,(c)将现有的结合位点替换为新的结合位点,减少对局部序列上下文的更改。";
+
  <!-- -0000000000000000000000000000000000000加载动图代码000000000000000000000000000000000000000 -->
  document.getElementById("p2_5_2").innerHTML="    利用重叠PCR方法替换PDC1启动子关键位置的序列。以M1突变株的构建为例,M2突变株的构建方法相同。通过文献找到FBP1启动子的CAT8结合位点和数据库预测的ALD4启动子的ADR1结合位点(表2)。";
+
<script type="text/javascript" src="https://2021.igem.org/Template:SCUT-China/Myjs-on-555?action=raw&ctype=text/javascript"></script>
  document.getElementById("p2_5_3").innerHTML="    设计同源臂上的结合位点,(1)利用引物M1- f /PDC1-R扩增片段1,引物M1- r /PDC1-F扩增片段2,(2)组装片段1和片段2得到M1启动子,(3)利用引物PDC1-F/PDC1-R扩增M1启动子。利用BamHI和XbaI酶切酶消化M1启动子和YEp181-PDC1p-VS-SAG1t载体,37℃反应2h。反应结束后,将启动子与载体进行纯化,16℃过夜连接。转化大肠杆菌DH5α,选择正确的转化子,提取YEp181-M1-VS-SAG1t质粒。";
+
  <!-- -0000000000000000000000000000000000000加载动图代码000000000000000000000000000000000000000 -->
  document.getElementById("p2_6").innerHTML="      然后,使用donor DNA引物(表4)扩增M1-VS-SAG1t表达盒,在LEU2位点旁边有一个同源臂。将构建的gRNA表达质粒和M1-VS-SAG1t表达盒转化到能表达Cas9蛋白的酵母细胞中,采用营养缺陷培养基选择和菌落PCR验证的方法,获得成功敲入M1-VS-SAG1t表达盒的菌株。";
+
  
  document.getElementById("p2_9").innerHTML="      在M2的改造中,UAS被添加在偏向于P<sub>PDC1</sub>上游,大约20 bp的位置,有连续17 bp / T着重的位置附近,这可能导致这个位置的结构相对松散,[7]核小体亲和率低,无法形成核小体结构,转录因子与该位点结合的可能性较大,并在激活转录[8]中发挥作用。在M1改造中,在PDC1基因上游区域的-393 ~ -450之间添加了UAS,未起作用。可能的原因是该位点靠近原始激活区,且有密集的转录因子结合,阻碍了转录因子与添加的UAS的结合。这意味着我们应该在M2的基础上进行进一步改造。";
 
  document.getElementById("p3_1").innerHTML="      利用YeTFaSCo数据库和YEASTRACT数据库预测SED1L启动子、ALD4启动子的CAT8p结合位点和ADR1p结合位点,并选择系统评分为0.85及以上的数据。根据第一轮转化的结果,在M2的基础上,将CAT8p结合位点替换为预测的P<sub>SED1L</sub>和P<sub>ALD4</sub>的CAT8p结合位点,但原ADR1p结合序列没有改变。(图3)";
 
  document.getElementById("p3_2_1").innerHTML="    以M3突变株的构建为例,M4-M7菌株的构建方法相同。查找数据库预测的CAT8结合位点motif和ADR1结合位点motif(表5)。<br><br>设计同源臂上的结合位点,(1)利用引物M3- f /PDC1-R扩增片段1,引物M3- r /PDC1-F扩增片段2,(2)组装片段1和片段2得到M3启动子,(3)利用引物PDC1-F/PDC1-R扩增M3启动子。利用BamHI和XbaI酶切酶消化M3启动子和YEp181-PDC1p-VS-SAG1t载体,37℃反应2h。反应结束后,将启动子与载体进行纯化,16℃过夜连接。转化大肠杆菌DH5α,选择正确的转化子,提取YEp181-M3-VS-SAG1t质粒。<br><br>然后,使用donor DNA引物(表4)扩增M3-VS-SAG1t表达盒,在LEU2位点旁边有一个同源臂。将构建的gRNA表达质粒和M3-VS-SAG1t表达盒转化到能表达Cas9蛋白的酵母细胞中,采用营养缺陷培养基选择和菌落PCR验证的方法,获得成功敲入M3-VS-SAG1t表达盒的菌株。";
 
 
  document.getElementById("p3_3_0").innerHTML="    该菌株接种到10 ml发酵液中,控制发酵液初始OD600为0.05,置于30、220 rpm的摇瓶中孵育64小时。摇瓶发酵64小时后,用气相色谱法检测瓦伦西亚烯的浓度。(图4)";
 
  document.getElementById("p3_3").innerHTML="      这轮改造后的的启动子表现出不同的强度。M3、M4、M6和M7突变株的瓦伦西亚烯产量与对照无显著差异。而M5突变株的瓦伦西亚烯产量比对照株提高29.5% (5.58 mg/L)。";
 
  document.getElementById("p3_4").innerHTML="      在M5中,CAT8结合位点-3来源于SED1L启动子,与本文报道的来自P<sub>FBP1</sub>的CAT8结合位点相比,其性能更优越。并且我们观察到只有CAT8结合位点-3具有这种作用,其他的CAT8结合位点可能是数据库预测的假阳性数据。这表明我们可以继续在M5中添加CAT8结合位点-3,以提高启动子的表达强度。";
 
  document.getElementById("p4_1").innerHTML="      在M5的基础上继续从P<sub>SED1L</sub>添加CAT8结合位点-3,在两个位点上添加,第一个位点添加在原ADR1结合位点的上游,第二个位点添加在ADR1结合位点之间的CAT8结合位点-3。(图5)";
 
  document.getElementById("p4_2_1").innerHTML="    以M8突变株的构建为例,M9菌株的构建方法与之相同。M10是在M8或M9的基础上构建的。在同源臂上设计CAT8结合位点motif,(1)利用引物M8- f /PDC1-R扩增片段1,引物M8- r /PDC1-F扩增片段2(表7),(2)组装片段1和片段2得到M8启动子,(3)利用引物PDC1-F/PDC1-R扩增M8启动子。利用BamHI和XbaI酶切酶消化M8启动子和YEp181-PDC1p-VS-SAG1t载体,在37℃反应2h。反应结束后,启动子和载体被纯化并在16c连接过夜。转化大肠杆菌DH5α,选择正确的转化子,提取YEp181-M8-VS-SAG1t质粒。然后,使用donor DNA引物(表4)扩增M8-VS-SAG1t表达盒,在LEU2位点旁边有一个同源臂。将构建的gRNA表达质粒和M8-VS-SAG1t表达盒转化到能表达Cas9蛋白的酵母细胞中,采用营养缺陷培养基选择和菌落PCR验证的方法,获得成功敲入M8-VS-SAG1t表达盒的菌株。";
 
  document.getElementById("p4_3_1").innerHTML="    将该菌株接种到10 ml发酵液中,控制发酵液初始OD600为0.05,置于30、220 rpm的摇瓶中孵育64小时。摇瓶发酵64小时后,用气相色谱法检测瓦伦西亚烯的浓度。(图6)";
 
  document.getElementById("p4_2").innerHTML="      M9和M10突变株的瓦伦西亚烯产量与对照相比没有显著提高。M8突变株的瓦伦烯产量比M5突变株提高了27.0% (6.75 mg/L),比未突变株提高了56.6% (6.75 mg/L)。";
 
  document.getElementById("p4_3").innerHTML="      在M8的改造中,原有的CAT8结合位点-3与ADR1结合位点之间增加了CAT8结合位点-3,这可能增加了CAT8转录因子与ADR1转录因子之间的相互作用。在M9的改造中,在原CAT8结合位点-3的上游添加了CAT8结合位点-3,导致转录因子不能有效地与该位点结合,或者添加改变了周围的序列,可能会产生一些负面影响,如激活后破坏了转录因子原有的结合位点。结果表明,M9突变株的瓦伦西亚烯产量不能进一步提高。在M10的改造中,在ADR1结合位点周围添加了两个CAT8结合位点-3,这可能会阻止ADR1转录因子的结合,影响CAT8转录因子与ADR1转录因子的相互作用。";
 
 
}else{
 
  document.getElementById("p1").innerHTML="      As a team participating in the synthetic Biology Competition, we plan to follow the engineering philosophy of "design-build-test-learn" to carry out our promoter engineering projects. In this process, we apply the “learning by doing” approach. Based on the experience learned from each round of modification, then we continue to design the next round of modification, and finally get the ideal result. We are good at summarizing experience from each modification and some exploring principle from existing research, so as to provide more experience for rational design of promoters in the future.";
 
  document.getElementById("p2_1").innerHTML="      The YeTFaSCo database were used to predict the transcription factor binding sites (TFBS) of P<SUB>PDC1</SUB>, P<SUB>SED1L</SUB>, P<sub>ALD4</sub>. Selected Version: 1.02 and resulted in more than 100 putative TFBSs with a matching score higher than 0.75. These TFBSs were classified into five groups according to the function (Table 1)";
 
  document.getElementById("p2_2").innerHTML="      According to the characteristics of P<sub>SED1L</sub> and P<sub>ALD4</sub> to control the synthesis of Valencene, which mainly expressed in the later stage of fermentation. We rationally selected the transcription factor binding sites of CAT8p and ADR1p for promoter modification.";
 
  document.getElementById("p2_3").innerHTML="      CAT8p is a Zinc cluster transcriptional activator, which regulate the expression include of most genes in gluconeogenesis, ethanol utilization and glyoxylate cycle<sup> [1]</sup>. CAT8p can bind to carbon source response elements and activate target genes after glucose consumption, the binding motif is 5′-YCCNYTNRKCCG-3′<sup> [2]</sup>. ";
 
  document.getElementById("p2_4").innerHTML="    ADR1p is a Carbon source-responsive zinc-finger transcription factor, which first identified as a transcription factor activate the transcription of the alcohol dehydrogenase gene ADH2. It also activates genes involved glucose fermentation, glycerol metabolism and fatty acid utilization. The binding motif is 5′-TTGGRG-3′. In addition<sup> [3-4]</sup>, ADR1p and CAT8p may interact when activating the transcription of certain genes<sup> [5]</sup>.";
 
  document.getElementById("p2_5").innerHTML="    In cells, transcription is a process of three-dimensional regulation. The transcription factor binding promoter has a positional effect. When the transcription factor binding site is at a position with high nucleosome affinity, the transcription factor may not be able to bind to the site due to steric hindrance<sup> [6]</sup>. In order to find out which position of PDC1 can make the added UAS (upstream activation sequences) avoid the effect of steric hindrance. The reported UAS1 and the database predicted 100% binding UAS2 were added to the two key positions of PDC1, which can normally bind to transcription factors and activate. UAS1 is the CAT8 binding site of the FBP1 promoter, and UAS2 is the ADR1p binding site predicted to be 100% bound in the ALD4 promoter. Studies have shown that there may be a positive interaction between CAT8p and ADR1p, so we added UAS1 and UAS2 closer to each other. (Figure 1)";
 
  document.getElementById("p2_5_1").innerHTML="    Considering promoter architecture constraints, (a) avoiding a change on other Predicted putative TFBSs for the activation of PDC1 promoter, (b) determining positions for CAT8p and ADR1p binding sites that are close proximity to the core promoter, (c) existing binding sites were replaced with new ones, reduce the changes to the local sequence context.";
 
  document.getElementById("p2_5_2").innerHTML="    The strain was inoculated into 10 ml of fermentation broth, the initial OD600 of the fermentation broth was controlled to be 0.05, and it was placed in a shaker at 30℃, 220 rpm for 64 hours. After 64 hours of shake flask fermentation, the concentration of valencene was detected by gas chromatography. (Figure 2)";
 
  document.getElementById("p2_5_3").innerHTML="    The addition of CAT8 and ADR1 transcription factor binding sites at different positions shows different effects. No significant differences in Valencene production in M1, P<sub>PDC1</sub> were observed. The valencene production of the M2 mutant strain was increased by 18.9% compared to the original strain (P<sub>PDC1</sub>).";
 
 
 
  document.getElementById("p2_6").innerHTML="      Using Overlap PCR method, the sequence at the key position of the PDC1 promoter is replaced. Taking the construction of the M1 mutant strain as an example, the construction method of the M2 strain was the same. Find the CAT8 binding site of FBP1 promoter through the literature and the ADR1 binding site from ALD4 promoter predicted by the database (Table 2).<br><br>Design the binding site motifs on the homology arm, (1) Using primer M1-F/PDC1-R to amplify fragment one, primer M1-R/PDC1-F to amplify fragment two, (2) assemble fragment 1 and fragment 2 to obtain M1 promoter, (3) Using primer PDC1-F/PDC1-R amplifies M1 promoter. Using BamHI and XbaI restriction enzymes to digest the M1 promoter and YEp181-PDC1p-VS-SAG1t vector, and react at 37°C for 2h. After the reaction, the promoter and vector were purified and ligated at 16°C overnight. Transform into E. coli DH5α, pick the correct transformants, and extract the YEp181-M1-VS-SAG1t plasmid.<br><br></br>Then, use donor DNA primers (Table4) to amplify the M1-VS-SAG1t expression cassette with a homology arm next to the LEU2 site. Transform the constructed gRNA expression plasmid and M1-VS-SAG1t expression cassette into yeast cells which can express Cas9 protein, and use the method of auxotrophic medium selection and colony PCR verification to obtain the strains that successfully knocked into the M1-VS-SAG1t expression cassette.";
 
  document.getElementById("p2_9").innerHTML="      In M2, UAS is added about 20 bp upstream of TATA-box located on P<sub>PDC1</sub>, and there are continuous 17 bp A/T near the position of TATA-box, which may cause the structure of this position to be relatively loose and the nucleosome affinity rate Low <sup>[7]</sup>, unable to form a nucleosome structure, transcription factors have a greater probability of binding to this site, and play a role in activating transcription <sup>[8]</sup>. In M1, UAS was added between -393 and -450 located in the upstream region of the PDC1 gene, and it did not work. The possible reason is that it is close to the original activation region, and there is dense transcription factor binding at this site, which hinders the combination of transcription factors to the added UAS. This suggests that we should transform on the basis of M2.";
 
  document.getElementById("p3_1").innerHTML="      The YeTFaSCo database and YEASTRACT database were used to predict the CAT8p binding site and ADR1p binding site of SED1L promoter, ALD4 promoter, and select data with a system score of 0.85 or more. According to the results of the first round of transformation, the CAT8p binding site was replaced with the predicted CAT8p binding site of P<sub>SED1L</sub> and P<sub>ALD4</sub> on the basis of M2, but the original ADR1p binding sequence was not changed. (Figure 3)";
 
  document.getElementById("p3_2_1").innerHTML="    Taking the construction of the M3 mutant strain as an example, the construction method of the M4-M7 strain was the same. Find the CAT8 binding site motifs and ADR1 binding site motif predicted by the database (Table 5).<br><br>Design the binding site motifs on the homology arm, (1) Using primer M3-F/PDC1-R to amplify fragment one, primer M3-R/PDC1-F to amplify fragment two, (2) assemble fragment 1 and fragment 2 to obtain M3 promoter, (3) Using primer PDC1-F/PDC1-R amplifies M3 promoter. Using BamHI and XbaI restriction enzymes to digest the M3 promoter and YEp181-PDC1p-VS-SAG1t vector, and react at 37°C for 2h. After the reaction, the promoter and vector were purified and ligated at 16°C overnight. Transform into E. coli DH5α, pick the correct transformants, and extract the YEp181-M3-VS-SAG1t plasmid.<br><br>Then, use donor DNA primers (Table 4) to amplify the M3-VS-SAG1t expression cassette with a homology arm next to the LEU2 site. Transform the constructed gRNA expression plasmid and M3-VS-SAG1t expression cassette into yeast cells which can express Cas9 protein, and use the method of auxotrophic medium selection and colony PCR verification to obtain the strains that successfully knocked into the M3-VS-SAG1t expression cassette.";
 
 
  document.getElementById("p3_3_0").innerHTML="    The strain was inoculated into 10 ml of fermentation broth, the initial OD600 of the fermentation broth was controlled to be 0.05, and it was placed in a shaker at 30℃, 220 rpm for 64 hours. After 64 hours of shake flask fermentation, the concentration of valencene was detected by gas chromatography. (Figure 4)";
 
  document.getElementById("p3_3").innerHTML="      The modified promoters showed different intensities. The Valencene yield of M3, M4, M6, and M7 mutant strains was not significantly different from that of the control strain. But the Valencene yield of the M5 mutant strain was increased by 29.5% (5.58 mg/L) compared to the control strain.";
 
  document.getElementById("p3_4").innerHTML="      In M5, the CAT8 binding site-3 is derived from the SED1L promoter, which has superior performance compared to the CAT8 binding site from P<sub>FBP1</sub> reported in the article. And we observed that only CAT8 binding site-3 has such an effect, other CAT8 binding sites may be false positive data predicted by the database. This suggests that we can continue to add CAT8 binding site-3 in M5 to increase the expression strength of the promoter.";
 
  document.getElementById("p4_1").innerHTML="      Continue to add CAT8 binding site-3 from P<sub>SED1L</sub> on the basis of M5, add in two locations, the first is added upstream of the original ADR1 binding site, and the second is added to the CAT8 binding site-3 between ADR1 binding site. (Figure 5)";
 
  document.getElementById("p4_2_1").innerHTML="    Taking the construction of the M8 mutant strain as an example, the construction method of the M9 strain was the same. M10 is constructed on the basis of M8 or M9.<br><br>Design the CAT8 binding site motif on the homology arm, (1) Using primer M8-F/PDC1-R to amplify fragment one, primer M8-R/PDC1-F to amplify fragment two (Table 7), (2) assemble fragment 1 and fragment 2 to obtain M8 promoter, (3) Using primer PDC1-F/PDC1-R amplifies M8 promoter. Using BamHI and XbaI restriction enzymes to digest the M8 promoter and YEp181-PDC1p-VS-SAG1t vector, and react at 37°C for 2h. After the reaction, the promoter and vector were purified and ligated at 16°C overnight. Transform into E. coli DH5α, pick the correct transformants, and extract the YEp181-M8-VS-SAG1t plasmid.<br><br>Then, use donor DNA primers (Table 4) to amplify the M8-VS-SAG1t expression cassette with a homology arm next to the LEU2 site. Transform the constructed gRNA expression plasmid and M8-VS-SAG1t expression cassette into yeast cells which can express Cas9 protein, and use the method of auxotrophic medium selection and colony PCR verification to obtain the strains that successfully knocked into the M8-VS-SAG1t expression cassette.";
 
 
  document.getElementById("p4_3_1").innerHTML="    The strain was inoculated into 10 ml of fermentation broth, the initial OD600 of the fermentation broth was controlled to be 0.05, and it was placed in a shaker at 30℃, 220 rpm for 64 hours. After 64 hours of shake flask fermentation, the concentration of valencene was detected by gas chromatography. (Figure 6)";
 
  document.getElementById("p4_2").innerHTML="      The Valencene yield of M9 and M10 mutant strains was not significantly improved compared to control strain。However, the Valencene yield of the M8 mutant strain was increased by 27.0% (6.75 mg/L) compared to M5 strain, and 56.6% (6.75 mg/L) compared with the unmodified strain.";
 
  document.getElementById("p4_3").innerHTML="      In M8, CAT8 binding sites-3 was added between the original CAT8 binding sites-3 and ADR1 binding site, which may increase the interaction between CAT8 transcription factor and ADR1 transcription factor. In M9, CAT8 binding site-3 was added to the upstream of the original CAT8 binding site-3, The transcription factor cannot effectively bind to this site, or the add changes the surrounding sequence and may have some negative effects, such as destroying the original binding site of transcription factor with activation. As a result, the Valencene yield of the M9 mutant strain could not be further increased. In M10, the two CAT8 binding sites-3 was added around the ADR1 binding site, which may prevent the binding of ADR1 transcription factors and affect the interaction between CAT8 transcription factors and ADR1 transcription factors.";
 
}
 
}
 
var btn3=document.getElementById("cl");
 
btn3.onclick=changeLanguage;
 
 
</script>
 
  <!-- 444444444444444444444444444444444444444444444444444444444444444444444以上需要替换444444444444444444444444444444444444444444444444444444444444444 -->
 
 
   </html>
 
   </html>

Revision as of 21:11, 21 October 2021


Loading...

Background

Most eukaryotic genomic DNA is wrapped in nucleosomes, which occlude and strongly distort the wrapped DNA. Transcription factor is a special protein that binds to a specific region of the gene to start transcription downstream. However, in the chromosomal state, the genomic DNA is tightly bound to the nucleosome. It can not reveal the transcribed factor binding sites (TFBS), the upstream activating sequence (UAS) for the promoter. Only when the nucleosome is relaxed or not tightly bound could TF have more opportunity to bind UAS and function effectively.

The affinity of nucleosomes to genomic DNA is determined by the arrangement and combination of sequence bases, so the affinity will be different if the sequence is slightly altered. Therefore, we hope to decrease the histone binding affinity score by introducing mutations to the sequence bases except for UAS and the core region. For this reason, a tool that can calculate the nucleosome affinity as an evaluation function to score the effect of the introduction of base fine-tuning is in urgent need.

Suppose there is a scoring tool that could calculate histone binding affinity theoretically. In that case, we could calculate nucleosome affinity for all the possible arrangements of bases in a fixed length. The promoter sequence in theory with the lowest nucleosome affinity sequence and highest strength could be selected. Of course, this approach has extremely high time complexity and low specificity; that's the reason we choose to combine genetic algorithm to optimize the promoter sequence iteratively.

Fundamental

The hidden Markov model (HMM) has been known for decades[1].
Given the characteristic of DNA sequence, and the accumulated pieces of evidence that DNA sequence itself can highly predict the localization of nucleosome in vivo, also the function of chromatin is closely related to the localization of nucleosome, scientists proposed using a duration Hidden Markov Model (dHMM) to predict the occupation of the corresponding nucleosomes in chromosomal DNA: divided into the nucleosome (N) state and the linker (L) state, where the length of the nucleosome state is fixed at 147 bp and the linker state is variable[2]. It is assumed that the two states must be alternately connected. A complete chromatin sequence must begin and end in the linker state, training a 4th order time-dependent Markov chain for the N state and a homogeneous 4th order Markov chain for the L state.
In the detailed method, they let e = e1, ..., e147 be corresponding to the nucleosome DNA sequence, and let PN be the probability of observing e as a nucleosome, computed as the product of probabilities for both Watson and Crick strands under the 4th order Markov Chain model. They assume that the linker DNA length of a given species has an unknown distribution FL (k) defined for k = 1, ..., τL (the maximum linker length allowed). An observed linker DNA sequence e = e1, ..., ek carries two pieces of information, the length is k bp, and given which, the emitted letters are e1, ..., ek . Let GL (e|k) denote the homogeneous Markov chain model for the linker DNA (again including both strands). Then observing e as a linker DNA has probability

Suppose x = x1, ..., xn is a genomic DNA sequence of length n, where xi = A/C/G/T. Let z = z1, ..., zn be the corresponding hidden state path, where zi = 1 if xi is covered by a nucleosome state, and 0 otherwise. Suppose that the path z = z1, ..., zn partitions x into k consecutive nucleosome or linker state blocks, in which the nucleosome blocks have a uniform length of 147 bp, whereas the length of linker blocks may vary. They denote these blocks as y = y+, ..., yB , and their state identification as s = s1, ..., sB , where si = 1 if yi is nucleosome state, and 0 otherwise. The probability of observing (x, z) is given by

where π0(s1) and πe(sB ) stand for the probabilities that the chain initializes and ends with state s1 and sk respectively, and I is an indicator function. Since they assume that a complete chromatin sequence must start with and end in a linker state, π0(s1 = 0) = πe (sB = 0) = 1. Define the nucleosome occupancy at a specific position i, denoted o i , as the posterior probability that zi = 1, i.e.,

Most importantly, the histone binding affinity score at position i is defined as the log likelihood ratio for the region xi-73, ... xi , ..., xi+73 to be a nucleosome vs. a linker, i.e.,

Given the models PN, GL and FL, the optimal path z can be found by the standard Viterbi algorithm, and the nucleosome occupancy score can be estimated using forward and backward algorithms[2].

The model has been trained on a large number of yeast nucleosome DNA fragments dataset that have been pyrosequencing and on a species by species basis, and the test results are satisfactory[2]. They made the nucleosome occupancy predicting tool into a software tool called NuPoP, which means "nucleosome position predicting", as an R language package for researchers to use.

Strategy

Optimizing design

We designed to obtain the variants with different nucleosome affinity of the promoter sequence by introducing the limited amount of random bases mutation without destroying the UAS sequence and the core region of the promoter.

Actually, a similar iterative optimization model has been proposed, called NucOpt Progs. The theory is technically known as a “Greedy algorithm”, in which all possible random single mutations are enumerated on the initial promoter sequence, and every single sequence variant was calculated the nucleosome affinity by NuPoP to obtain a score. After that, the sequence with the lowest affinity was selected as the winner in this round, then taken as the parent placing into the next round, calculating all mutation possible cases enumeration on this winner and vote in a new champion variant candidate. Finally, the theoretical best optimal sequence which is difficult to gain higher score is obtained[3].

However, after learning and testing NucOpt Progs, we found that this method has many disadvantages. First, the exhaustive enumeration method brings immense time complexity. It takes 36 hours to optimize a sequence of 800 bp length (based on the computer capacity). Secondly, single mutation only and greedy algorithm in use obviously determine that it's extraordinarily easy to get optimization fallen into a local optimal solution. For example, assuming that a single base change is a component of a group of mutations that significantly decrease histone affinity, but rarely calculating the benefit exactly this single base change could bring is unremarkable. In this case, the superior choice would easily be neglected. Thirdly, taking into account the biological effect, other published studies have shown that rarely low affinity is not directly related to the promoter's strength[3]. At the same time, the researchers speculate, the site distribution of nucleosome affinity may play a vital role in the promoter strength.

Given the above three defects, we have designed a software NuPGO which can make up for these shortcomings and achieve a better optimization effect. Our new strategy is to combine NuPoP with the genetic algorithm(Fig.1) using the roulette wheel operator so that the base number of mutations can realize adaptive convergence according to probability instead of only introducing a single random mutation at a time, which not only effectively reduce the time complexity but also take much more possibilities into account, escaping local optimal solution. Considering the biological effect, we additionally set the user-defined region weights especially, which enable the algorithm to place larger or smaller weights on the specific sequence region.

Figure 1. Genetic algorithm schematic diagram

Greedy Algorithm vs Roulette Operator

Greedy algorithm and roulette wheel selection algorithm are both essentially evolutionary algorithms. The greedy algorithm also performs better than other evolutionary algorithms in some cases. Still, its efficiency advantage brought by enumeration couldn't display here, while the multi-factor considering ability in wheel selection algorithm stands out.

Greedy Algorithm

The fatal disadvantage of the previous promoter optimizing algorithm is that only the best performer is selected in each round. Perhaps the author aimed to reduce the time complexity of exhaustive enumeration and simply pursue the best score. But such algorithms can easily trap the results into local optima. For example, the three-point mutation on site ABC decrease the histone affinity more than the three-point mutation on site DEF. However, the affinity decrease of site B mutation is smaller than that of site D mutation, so the greedy algorithm will choose site D mutation strategy as the winner, while site B strategy is obviously better actually.

Moreover, only a random single mutation is introduced in each round. In other words, the mutation number is set to 1, which leads to gigantic amounts of enumerations, and the time complexity has not been sufficiently reduced. The execution of the greedy algorithm is time-consuming. In our test, an 800bp length promoter had to cost 36 hours to run.

Figure 2. Schematic diagram of NucOpt Progs

Roulette Operator

The wheel selection method is also known as the fitness proportionate selection (FPS), which means that the fitness of an individual is directly proportional to the probability of being selected, similar to putting all the individuals on a wheel, the area occupied by each individual is directly proportional to its fitness, that is, the probability of being selected is determined by the area of the plate, namely the fittness of each individual when the wheel spins randomly.
In other words, the fitness is directly proportional to the probability of being selected, turning the wheel N times to select N individuals. Meanwhile, it also means that an individual can be selected repeatedly.

Figure 3. Schematic diagram of genetic algorithm applied to promoter optimizer NuPGO

For random bases mutations introduced in each round, we set the mutation amount as an adaptive parameter. Each base position in the sequence is a probability mutation rather than the unintelligent setting the amount as "1" from beginning to end. To ensure the rationality and accuracy of the optimization, we set the threshold of mutation probability to 0.0005 so that the number of base mutations in each round would not exceed 5. Of course, other choices are worth expanding. By doing this, the time complexity of the model is reduced to some degree, the curve will converge faster when performing the gradient descends, broadening the search, and more combinations would be explored to approach the global optimal solution. As the iteration converges, the mutation probability will gradually approach 0, that is, reaching the final optimized promoter sequence(Fig.3).

Specific Weights

The critical UAS are usually only about 10 bp in length, and the vital goal we want to achieve, precisely speaking, is to reduce the nucleosome affinity around the UAS region and to make the UAS region tend to become the linker region rather than nucleosome region.
The final goal of the algorithm is to decrease the total affinity score of the whole promoter sequence as far as possible, which is likely to ignore the key UAS region mutation and choose other strategies with more significant benefit, the final optimized promoter obtained in such condition may not make use of the advantages of UAS. Specifically, the key UAS is still in the high histone affinity region. To compensate for this shortcoming, we decided to artificially add a regional weight to the score calculation process. According to the formula introduced above, dHMM calculates the nucleosome affinity score for every single site, and then calculates the area under the curve (AUC).
To make some adjustments, we decide to compute the score of every single position with a custom zoom when calculating the specific area that we regard special, such as the area of the crucial UAS, in other words, multiplying the simply calculated score by the user-defined weight to get each final single point score.

Fortunately, the results of testing different weights on the NuPGO algorithm are satisfactory. After adding weights to a specific area, NuPGO will pay more attention to that area. In the final optimized result sequence, this specific region will get better performance of nucleosome affinity degradation.

Sustainable

Ultimately, and most worth noting is that, neither the NuPoP software program which can evaluate DNA’s nucleosome affinity[2] nor the iterative optimization code[3] that uses the greedy algorithm, because of the components missing and internet limitation, following the tutorials provided in the papers, we couldn’t get the complete executable code file and failed to run the optimizing software with greedy algorithm directly, as well as other researchers. It means the use of NuPoP and greedy algorithm optimizer have been difficult for those who lack the computer or bioinformatics study infrastructure.
Therefore, we refactored the implementation code almost completely according to some of its logic, and wrote the genetic algorithm using the roulette wheel operator, which is much easier to read and user-friendly much more than the previous code. In other words, we fix the algorithm published before and come up with a more novel and intelligent software. Both the greedy algorithm optimizer NucOpt Progs and genetic algorithm optimizer NuPGO are published by us for users’ convenient.
This means that anyone, especially iGEM teams in the future and researchers in labs, can easily get started using our program based on a tutorial. Input your initial data and conditions visually and easily and it will be on. Those interested are also welcomed to read our source code and make some comments or modifications, and maybe a stronger NuPGO will be come up with next year.

Result Analysis

Optimization result on our promoter

Figure 4. NuPGO optimization on our project promoter nucleosome affinity

Time-consuming comparison

Optimization effect comparison

Figure 5. Result comparison between Greedy Algorithm and Genetic Algorithm, weights added result and result without weights.

Figure 6. Descent curve of total nucleosome affinity score between Greedy Algorithm and Genetic Algorithm, weights added result and result without weights.

Figure 7. Result comparison between Greedy Algorithm and Genetic Algorithm, weights added result and result without weights.

Figure 8. Descent curve of total nucleosome affinity score between Greedy Algorithm and Genetic Algorithm, weights added result and result without weights.

NuPGO result visualization

Figure 9. Comparison between initial and optimized project promoter M8 sequence

Figure 10. Partial protected UAS region during optimization

NuPGO result verification

Unfortunately, the NuPGO-optimized promoter has some mutation of around hundreds of bp, requiring chemical synthesis to obtain the new promoter gene fragment. Moreover, it is not statistically representative of testing only the final variant. To prove that the strength of the NuPGO optimized promoter sequence is consistent with the predicted results, we need to select at least 5 to 10 promoter variants, and conduct chemical synthesis and characterization experiments to verify their strength. Due to the lack of project funds and time constraints, we can not perform characterization experiments to ascertain the strength of the sequence obtained by the optimization algorithm.
If conditions permit or any other team was interested in verifying the effect of NuPGO, we would like to or recommend selecting the promoter variants with 50, 100, 200, 400, 500bp mutations in the same run, also pick three variants at each of the position, such as 499, 500, 501bp. Then design the parallel characterization experiments to see the strength difference. The descent curve could also be consulted to obtain the selection choice strategy.
Nevertheless, we have reason to believe that the results of the optimization are relatively reliable, because, in our references(Fig.6), the authors conducted experiments on the strength characterization of the promoter variants obtained by the greedy algorithm. We can conclude from the experimental results that nucleosome affinity optimization indeed can increase the promoter's strength and thus enhance expression.

Figure 10. Redesign of native yeast promoters for increased expression by decreasing nucleosome affinity in reference article. DOI: 10.1038/ncomms5002

Availability

We publish our source code and packages on Github. We extraordinarily sincerely and strongly recommend that every future iGEM team or researcher using a eukaryotic promoter use our software to achieve a higher expressing level and rational use of UAS strong promoters.

Github

References

[1]L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," in Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989, doi: 10.1109/5.18626.

[2]Xi, L., Fondufe-Mittendorf, Y., Xia, L. et al. Predicting nucleosome positioning using a duration Hidden Markov Model. BMC Bioinformatics 11, 346 (2010). https://doi.org/10.1186/1471-2105-11-346

[3] [3]Curran, K., Crook, N., Karim, A. et al. Design of synthetic yeast promoters via tuning of nucleosome architecture. Nat Commun 5, 4002 (2014). https://doi.org/10.1038/ncomms5002

[4]de Boer, C.G., Vaishnav, E.D., Sadeh, R. et al. Deciphering eukaryotic gene-regulatory logic with 100 million random promoters. Nat Biotechnol 38, 56–65 (2020). https://doi.org/10.1038/s41587-019-0315-8.