*Quantifying the Stability of Different RNAs*

In the work of P. Cetnar et al, they construct 16 operons, each designed and inserted 5’ UTR sequences that contained between 0 and 40 polyA nucleotides, and used GFP as the report gene. During the characterizing experiments, they keep the promoter unchanged and maintain cells in the exponential growth phase for a long time period, so the changes in the mRNA levels can be primarily attributed to changes in mRNA stability[1].

The model focuses on the mRNA stability, considering that mRNA level is correlated with mRNA degradation rate, and thus, if we need to describe the RNA stability, we can assume the mRNA S’ stability is positively correlated with mRNA level T. We have

$$ S\mathrm{'}=kT $$For simplicity, here k is 1

After constructing the relationship between mRNA stability and mRNA level, by modeling the mRNA level can help us describe mRNA stability.

Obtain the following data in the experiment

Sequence Name | Relative mRNA Level |

0nt_5p | 0.397 |

1nt_5p | 1.000 |

2nt_5p | 0.995 |

4nt_5p | 0.848 |

5nt_5p | 0.550 |

6nt_5p | 0.344 |

8nt_5p | 0.261 |

10nt_5p | 0.336 |

12nt_5p | 0.311 |

15nt_5p | 0.146 |

16nt_5p | 0.135 |

20nt_5p | 0.106 |

25nt_5p | 0.145 |

30nt_5p | 0.158 |

35nt_5p | 0.164 |

40nt_5p | 0.254 |

Table 1. length of ssRNA in 5’UTR affect the mRMA level

According to the mRNA degradation rate model provided in the literature[1]

$$ dgrad_{mRNA}\left( L \right) =C\left( L-N \right) \exp \!\:\left( -\frac{L}{P} \right) ,L>20 $$ $$ dgrad_{mRNA}\left( L \right) =C\left( L-N \right) ,L\le 20 $$ $$ dgrad_{mRNA}Max=dgrad\left( 20 \right) =9.5058 $$Where L represents the length of ssRNA region, N represents the smallest possible binding site for RNases E/G[2]. P refers to the persistence length of ssRNA, which is a measure of its bending stiffness and normally ranging from 1 to 7 nm[3][4]. When the length of ssRNA region is less than 20, the activity coefficient C is 0.5281, and when it is greater than or equal to 20, C=0.975.

Considering that the length of the ssRNA sequence at 5’UTR is only one of the indicators to measure the stability of mRNA, in order to avoid the influence between different indicators, the obtained degradation needs to be dimensioned, that is, the relative degradation rate R is obtained.

$$ R_{UTR}\left( L \right) =\frac{dgrad_{mRNA}\left( L \right)}{dgrad_{mRNA}Max}\mathrm{ (}0\le R_{UTR}\left( L \right) \le 1) $$Considering that this model focuses on the stability of mRNA, which needs to be described by relative stability, so there is relative stability S(L)

$$ S_{UTR}\left( L \right) =-R_{UTR}\left( L \right) $$According to the above model and experimental data[1], the following results can be obtained:

length of ssRNA | Stability | length of ssRNA | Stability |

1 | -0.05556 | 21 | 0.717063 |

2 | 0 | 22 | 0.719703 |

3 | 0.055556 | 23 | 0.720546 |

4 | 0.111111 | 24 | 0.719754 |

5 | 0.166667 | 25 | 0.717478 |

6 | 0.222222 | 26 | 0.713857 |

7 | 0.277778 | 27 | 0.709022 |

8 | 0.333333 | 28 | 0.703092 |

9 | 0.388889 | 29 | 0.696181 |

10 | 0.444444 | 30 | 0.688392 |

11 | 0.5 | 31 | 0.679821 |

12 | 0.555556 | 32 | 0.67056 |

13 | 0.611111 | 33 | 0.660689 |

14 | 0.666667 | 34 | 0.650286 |

15 | 0.722222 | 35 | 0.639423 |

16 | 0.777778 | 36 | 0.628163 |

17 | 0.833333 | 37 | 0.616568 |

18 | 0.888889 | 38 | 0.604692 |

19 | 0.944444 | 39 | 0.592588 |

20 | 1 | 40 | 0.580302 |

Table 2. mRNA stability based on the model (5’UTR)

Daniel P. Cetnar et al. carried out work similar to Part 1, constructing 16 operons and inserting 0-40 nt adenylate into the internal region of the gene. Similarly, after inserting the reporter gene, GFP, a series of characterization tests were performed.

Sequence Name | Relative mRNA Level |

0nt_I | 1.000 |

1nt_I | 1.192 |

2nt_I | 1.406 |

4nt_I | 1.151 |

5nt_I | 1.045 |

6nt_I | 1.097 |

8nt_I | 1.091 |

10nt_I | 1.217 |

12nt_I | 0.839 |

15nt_I | 0.872 |

16nt_I | 0.898 |

20nt_I | 0.705 |

24nt_I | 0.622 |

30nt_I | 0.747 |

35nt_I | 0.613 |

40nt_I | 0.601 |

Table 3. length of ssRNA in intergenic region affect the mRMA level

According to the mRNA degradation rate model provided in the literature[1]

Since the culture conditions of the experiment are the same as Part 1, so the changes in the mRNA levels can be primarily attributed to changes in mRNA stability, so there is also a positive correlation between mRNA stability S’ and mRNA level T.

$$ S^{\mathrm{'}}=kT\text{，}k=1 $$According to the degradation rate model provided by the literature:

$$ dgrad_{mRNA}\left( L \right) =C\left( L-N \right) $$ $$ dgrad_{mRNA}Max=dgrad\left( 40 \right) $$The meaning of each parameter is the same as in Part 1, where C , the activity coefficient is 0.018.

We can also have the relative degradation rate and relative stability

$$ R_{intergenic}\left( L \right) =\frac{dgrad_{mRNA}\left( L \right)}{dgrad_{mRNA}Max}\mathrm{ (}0\le R_{intergenic}\left( L \right) \le 1) $$ $$ S_{intergenic}\left( L \right) =-R_{intergenic}\left( L \right) $$According to the above experimental data and model, the following results can be obtained:

length of ssRNA | Stability | length of ssRNA | Stability |

1 | -0.02632 | 21 | 0.5 |

2 | 0 | 22 | 0.526316 |

3 | 0.026316 | 23 | 0.552632 |

4 | 0.052632 | 24 | 0.578947 |

5 | 0.078947 | 25 | 0.605263 |

6 | 0.105263 | 26 | 0.631579 |

7 | 0.131579 | 27 | 0.657895 |

8 | 0.157895 | 28 | 0.684211 |

9 | 0.184211 | 29 | 0.710526 |

10 | 0.210526 | 30 | 0.736842 |

11 | 0.236842 | 31 | 0.763158 |

12 | 0.263158 | 32 | 0.789474 |

13 | 0.289474 | 33 | 0.815789 |

14 | 0.315789 | 34 | 0.842105 |

15 | 0.342105 | 35 | 0.868421 |

16 | 0.368421 | 36 | 0.894737 |

17 | 0.394737 | 37 | 0.921053 |

18 | 0.421053 | 38 | 0.947368 |

19 | 0.447368 | 39 | 0.973684 |

20 | 0.473684 | 40 | 1 |

Table 4. mRNA stability based on the model (intergenic region)

In the experiment of Daneil et al., it was found that the first three nucleotides in the 5'UTR affects the activity of the RppH enzyme, thereby affecting the degradation rate of mRNA. After constructing the cistron under the same experimental conditions as part 1 and 2 for characterization experiments[1], the following mRNA level data was obtained

Sequence Name | Relative mRNA Level |

RppH_AAA | 0.055 |

RppH_AAC | 0.524 |

RppH_AAG | 0.561 |

RppH_AAT | 0.409 |

RppH_ACA | 0.317 |

RppH_ACC | 0.613 |

RppH_ACG | 0.634 |

RppH_ACT | 0.397 |

RppH_AGA | 0.716 |

RppH_AGC | 0.755 |

RppH_AGG | 0.748 |

RppH_AGT | 1.000 |

RppH_ATA | 0.825 |

RppH_ATC | 0.604 |

RppH_ATG | 0.716 |

RppH_ATT | 0.446 |

Table 5. mRNA stability based on the model (first three nucleotides in 5’UTR)

According to the mRNA degradation rate model provided in the literature[1]

Since the culture conditions of the experiment are the same as Part 1, the mRNA level is also only affected by the degradation rate of mRNA, so there is also a positive correlation between mRNA stability S’ and mRNA level T.

$$ S^{\mathrm{'}}=kT\text{，}k=1 $$For simplicity, here k is 1

Therefore, the table data can be used as the relative stability value of mRNA.

After obtaining the RNA relative stability model under the influence of the three structural factors, which are 5′ untranslated region length, First Three Nucleotides in the 5′ Untranslated Region, Intergenic Region length, we plan to unify the three models into one model, so that the overall relative stability of the RNA can be estimated based on the various characteristics of the RNA.

According to the modeling rules, we observe that the relative stability of RNA affected by the three structural factors have a certain degree of contribution to the relative stability of the overall RNA, so we will build a model to assign weights to the relative stability of the three structural factors. Thus, the overall model is constructed as follows:

$$ Y=\sum_{i=1}^3{\mu _ix_i} $$Among them, Y is the relative stability of the total RNA, $x_1$ is the RNA stability estimated by the 5'UTR length, $x_2$ is the RNA stability estimated by the RNA mid-end UTR length, and $x_3$ is the first 5'UTR Estimated RNA stability of triplets. $\mu_i$ are the estimated stability of each factor and then used to calculate the total stability parameters. How to calculate the respective contributions of the three structural factors to the total stability of RNA will be our main task in the next modeling.

We considered two schemes to model and predict the contribution. The first method is to predict the contribution of the three factors through the entropy weight method, and the second method is to fit the model parameters based on the existing experimental data.

Entropy is a measure of the uncertainty of the system state, which can be used to measure the information contained in the index data in the evaluation index system and determine the weight of each index accordingly.

Before using the entropy method to find the relative weights, we need to normalize the existing index data from 0 to 1. There are two formulas as follows.

$$\begin{aligned} x_{i j}^{\prime} &=\frac{x_{i j}-\min \left(x_{j}\right)}{\max \left(x_{j}\right)-\min \left(x_{j}\right)} \\ x_{i j}^{\prime} &=\frac{\max \left(x_{j}\right)-x_{i j}}{\max \left(x_{j}\right)-\min \left(x_{j}\right)} \end{aligned}$$If the indicator is positive, the first formula is chosen, and if the indicator is negative, the second formula is chosen. min(xj) is the minimum value of the jth indicator and max(xj) is the maximum value of the jth indicator. The normalized matrix is thus obtained.

First, calculate the weight of the jth indicator of the ith object:

$$y_{i j}=\frac{x_{i j}^{\prime}}{\sum_{i=1}^{m} x_{i j}^{\prime}}$$According to the definition of information entropy, the information entropy of the jth indicator in the evaluation matrixY is:

$$E_{j}=-\frac{1}{\ln m} \sum_{i=1}^{m} y_{i j} \ln y_{i j}$$Calculate the weigh of the jth indicator:

$$w_{j}=\frac{1-e_{j}}{\sum_{j} 1-e_{j}}$$From this, we can get the contribution of the three structural factors

$$ \omega _1=0.524,\mathrm{ }\omega _2=0.0885,\mathrm{ }\omega _3=0.3876 $$we have obtained all the data of the experiment[1] to measure the influence of three structural factors on RNA stability.

In the case of having all the data of the three eigenvalues and the relative value of the corresponding RNA, we consider using these data for linear regression to predict the linear parameters of the relative content of the three eigenvalues and RNA, and then substitute them into the model.

Part of the data is as follows:

RNA features | RNA level(y) | Length of 5'UTR(x1) | Length of innerUTR(x2) | The first triplet codon(x3) |

0nt_5p | 0.397 | 1.0000 | 1.0000 | 0.755 |

1nt_5p | 1.000 | 1.0000 | 1.0000 | 0.755 |

2nt_5p | 0.995 | 1.0000 | 1.0000 | 0.561 |

4nt_5p | 0.848 | 0.8889 | 1.0000 | 0.055 |

5nt_5p | 0.550 | 0.8333 | 1.0000 | 0.055 |

6nt_5p | 0.344 | 0.7778 | 1.0000 | 0.055 |

8nt_5p | 0.261 | 0.6667 | 1.0000 | 0.055 |

10nt_5p | 0.336 | 0.5556 | 1.0000 | 0.055 |

12nt_5p | 0.311 | 0.4444 | 1.0000 | 0.055 |

15nt_5p | 0.146 | 0.2778 | 1.0000 | 0.055 |

16nt_5p | 0.135 | 0.2222 | 1.0000 | 0.055 |

20nt_5p | 0.106 | 0.0000 | 1.0000 | 0.055 |

25nt_5p | 0.145 | 0.2829 | 1.0000 | 0.055 |

30nt_5p | 0.158 | 0.3116 | 1.0000 | 0.055 |

35nt_5p | 0.164 | 0.3606 | 1.0000 | 0.055 |

40nt_5p | 0.254 | 0.4197 | 1.0000 | 0.055 |

0nt_I | 1.000 | 1.0000 | 1.0000 | 0.755 |

1nt_I | 1.192 | 1.0000 | 1.0000 | 0.755 |

2nt_I | 1.406 | 1.0000 | 1.0000 | 0.755 |

4nt_I | 1.151 | 1.0000 | 0.9474 | 0.755 |

5nt_I | 1.045 | 1.0000 | 0.9211 | 0.755 |

6nt_I | 1.097 | 1.0000 | 0.8947 | 0.755 |

8nt_I | 1.091 | 1.0000 | 0.8421 | 0.755 |

10nt_I | 1.217 | 1.0000 | 0.7895 | 0.755 |

12nt_I | 0.839 | 1.0000 | 0.7368 | 0.755 |

15nt_I | 0.872 | 1.0000 | 0.6579 | 0.755 |

16nt_I | 0.898 | 1.0000 | 0.6316 | 0.755 |

20nt_I | 0.705 | 1.0000 | 0.5263 | 0.755 |

24nt_I | 0.622 | 1.0000 | 0.3947 | 0.755 |

30nt_I | 0.747 | 1.0000 | 0.2632 | 0.755 |

35nt_I | 0.613 | 1.0000 | 0.1316 | 0.755 |

40nt_I | 0.601 | 1.0000 | 0.0000 | 0.755 |

RppH_AAA | 0.146 | 0.2778 | 1.0000 | 0.055 |

RppH_AAC | 1.392 | 0.2778 | 1.0000 | 0.524 |

RppH_AAG | 1.489 | 0.2778 | 1.0000 | 0.561 |

RppH_AAT | 1.085 | 0.2778 | 1.0000 | 0.409 |

RppH_ACA | 0.841 | 0.2778 | 1.0000 | 0.317 |

RppH_ACC | 1.628 | 0.2778 | 1.0000 | 0.613 |

RppH_ACG | 1.682 | 0.2778 | 1.0000 | 0.634 |

RppH_ACT | 1.053 | 0.2778 | 1.0000 | 0.397 |

RppH_AGA | 1.899 | 0.2778 | 1.0000 | 0.716 |

RppH_AGC | 2.005 | 0.2778 | 1.0000 | 0.755 |

RppH_AGG | 1.986 | 0.2778 | 1.0000 | 0.748 |

RppH_AGT | 2.655 | 0.2778 | 1.0000 | 1.000 |

RppH_ATA | 2.191 | 0.2778 | 1.0000 | 0.825 |

RppH_ATC | 1.603 | 0.2778 | 1.0000 | 0.604 |

RppH_ATG | 1.902 | 0.2778 | 1.0000 | 0.716 |

RppH_ATT | 1.184 | 0.2778 | 1.0000 | 0.446 |

Table 6. various factors affect the mRNA level

According to the mRNA degradation rate model provided in the literature[1]

In this modeling process, we choose to establish a multivariate linear model based on the data characteristics, and use Octave to calculate the linear regression. The ultimate goal is to make the following formula reach the minimum.

$$ J\left( \theta _1,\theta _2,\theta _3 \right) =\sum_{i=1}^m{\left( y^{\left( i \right)}-\hat{y}^{\left( i \right)} \right) ^2} $$Where $ y^{\left( i \right)} $ is the actual RNA level of the i-th group of data, and $ \widehat{\mathrm{ }y}^{\left( i \right)} $ is the predicted RNA level based on the characteristic value of the i-th group of data.

Calculated by Octave. The final result are as follow.

$$ \theta _1=-0.8124,\theta _2=0.8783,\theta _3=1.5902 $$Two models are used to fit the experimental data respectively, and then compared with the actual data. The following figure is obtained:

Fig 1
result of two models compared with the actual data

(* Is entropy method, ⚪ is linear regression)

It is clear that when the fitted data is less than 0.5, the entropy weight method has a better fitting effect; otherwise, the linear regression has a better effect.

So, the final model result we get is:

$$ Y=\mathrm{ }k_1\sum_{i=1}^3{\omega _ix_i}+k_2\sum_{i=1}^3{\theta _ix_i} \\ k_1=1,\mathrm{ }k_2=0\mathrm{ (}\sum_{i=1}^3{\theta _ix_i}<0.5) \\ k_1=0,\mathrm{ }k2=1\mathrm{ (}\sum_{i=1}^3{\omega _ix_i}>0.5) $$Previous studies have suggested that ribosomes can protect mRNA transcripts from RNase activity, possibly by covering RNase binding sites and limiting their accessibility[5][6].

Fig 2 Schematics illustrating how a mRNA’s translation rate controls the accessibility of RNase binding sites in an operon’s coding regions.[1]

In order to quantify the relationship between the translation rate and the decomposition rate of mRNA, we constructed a relevant model to simulate the process.

The key to modeling is to calculate how many bases are not covered by ribosomes and are exposed to the catalytic range of the RNase. Here, we designated α as the ratio of the ribosome’s translation initiation rate over its elongation rate over its elongation rate. We also designated F as the physical footprint of each ribosome in units of trinucleotides (amino acids). According to a TASEP (totally asymmetric exclusion process) model of ribosome dynamics that includes the ribosome’s footprint on the mRNA[7].

When translation initiation is the rate-limiting step (α is less than one), the steady-state ribosome density ρr is

$$ \rho _{\mathrm{r}}=\min \left( \frac{F\alpha}{1+(F-1)\alpha},1 \right) \tag{a} $$According to the definition,

$$ \rho r=\frac{\left( MF \right)}{N} $$(M is the number of ribosomes on RNA, and N is the number of codons on RNA)

The relationship defined by eq illustrates how increasing a mRNA’s translation initiation rate results in higher ribosome densities, up until the maximum possible value.

Next, we define the unprotected distance between two adjacent ribosomes, the number of trinucleotides between the end of one ribosome and the beginning of the next ribosome. We call this unprotected RNA ‘hole’, so the hole density ρh=1-ρr. According to the TASEP model，the probability that a bound ribosome has m free trinucleotides.

$$ P(D=m)\propto \frac{\rho _{\mathrm{r}}}{\rho _{\mathrm{r}}+\rho _{\mathrm{h}}}\left( \frac{\rho _{\mathrm{h}}}{\rho _{\mathrm{r}}+\rho _{\mathrm{h}}} \right) ^m\tag{b} $$The first half of the equation calculates the probability that a mRNA position is bound by a ribosome, whereas the second part calculates the probability that the next m adjacent positions on the mRNA all contain a hole, Substituting equation (a) and the definition of the hole density into equation (b), we get the probability distribution of the distance between ribosomes

$$ P(D=m)\propto \frac{\rho _{\mathrm{r}}\left( 1-F\rho _{\mathrm{t}} \right) ^m}{\left( 1+\rho _{\mathrm{r}}-F\rho _{\mathrm{r}} \right) ^{(m+1)}}\tag{c} $$We then determined the ribosomes’ average headway distance by calculating the first moment of the probability distribution in eq(c), using

$$ \langle D\rangle =\frac{1}{Z}\sum_{m=1}^{m=L}{P}(D=m)m\tag{d} $$Where $ Z=\sum\nolimits_{m=1}^{m=L}{P(D=m)} $

According to the model, we model the following results

Fig 3 the translation initial rate affects the rate of RNA decomposition(1/mRNA level)

It can be seen that with the increase of translation intensity, the rate of RNA decomposition continues to decrease, verifying the hypothesis proposed in the literature[1].

*Reference*

- 1. Cetnar DP, Salis HM. Systematic Quantification of Sequence and Structural Determinants Controlling mRNA stability in Bacterial Operons. ACS Synth Biol. 2021 Feb 19;10(2):318-332.
- 2. Laurence TA, Kong X, Jäger M, Weiss S. Probing structural heterogeneities and fluctuations of nucleic acids and denatured proteins. Proc Natl Acad Sci U S A. 2005 Nov 29;102(48):17348-53.
- 3. Chen, H., Meisburger, S. P., Pabit, S. A., Sutton, J. L., Webb, W. W., and Pollack, L. (2012) Ionic strength-dependent persistence lengths of single-stranded RNA and DNA. Proc. Natl. Acad. Sci. U. S. A. 109 (3), 799−804.
- 4. Doose, S., Barsch, H., and Sauer, M. (2007) Polymer properties of polythymine as revealed by translational diffusion. Biophys. J. 93 (4), 1224−1234.
- 5. Braun, F., Le Derout, J., and Regnier, P. (1998) Ribosomes ́ inhibit an RNase E cleavage which induces the decay of the rpsO mRNA of Escherichia coli. EMBO Journal 17 (16), 4790−4797.
- 6. Yarchuk, O., Jacques, N., Guillerez, J., and Dreyfus, M. (1992) Interdependence of translation, transcription and mRNA degradation in the lacZ gene. J. Mol. Biol. 226 (3), 581−596.
- 7. Shaw, L. B., Zia, R. K. P., and Lee, K. H. (2003) Totally asymmetric exclusion process with extended objects: A model for protein synthesis. Phys. Rev. E: Stat. Phys., Plasmas, Fluids, Relat. Interdiscip. Top. 68 (2), 021910.