Motivation

There is a growing need for rapid, non-invasive, yet accurate methods for cancer diagnosis: cancer is one of the leading causes of disease-related mortality worldwide, and current clinical diagnostic procedures are invasive, unpleasant and inconvenient, which limits their application.
Recent research has shown that the levels of multiple microRNAs (miRNAs) in serum are informative biomarkers for the early diagnosis of cancers.
Because DNA can interact with different molecules, transduce signals and report results in a programmable manner, DNA molecular computation provides powerful tools for analyzing miRNA profiles in clinical serum samples [1]. In contrast to traditional methods such as next-generation sequencing (NGS) and PCR, DNA computation is a probe-based method for detecting miRNA concentrations.

Data gathering and construction of in silico linear classifier

The Cancer Genome Atlas (TCGA) provides a large amount of publicly available clinical data, including the expression levels of multiple miRNAs in patients and healthy people. We use a web crawler to gather enough data for subsequent experiments.
We apply discrepancy analysis to the miRNA data, comparing expression levels between patients and healthy people, to screen for suitable miRNA targets. These miRNA targets are effective biomarkers for early cancer diagnosis.
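As a minimal sketch of such a screening step, the snippet below ranks miRNAs by the absolute value of Welch's t-statistic between the two groups. This is only one possible way to quantify the discrepancy and is an assumption on our part; the miRNA names (`miR-A`, `miR-B`, `miR-C`) and the expression values are made up for illustration.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples."""
    va, vb = variance(a), variance(b)
    return (mean(a) - mean(b)) / math.sqrt(va / len(a) + vb / len(b))


def rank_mirnas(tumor, normal):
    """Rank miRNA names by |t| between tumor and normal samples, largest first."""
    scores = {name: welch_t(tumor[name], normal[name]) for name in tumor}
    return sorted(scores, key=lambda name: abs(scores[name]), reverse=True)


# Hypothetical log-expression values, for illustration only.
tumor = {
    "miR-A": [8.1, 8.4, 7.9, 8.3],
    "miR-B": [5.0, 5.2, 4.9, 5.1],
    "miR-C": [3.0, 3.4, 2.9, 3.3],
}
normal = {
    "miR-A": [5.9, 6.2, 6.0, 6.1],
    "miR-B": [5.1, 4.9, 5.0, 5.2],
    "miR-C": [4.1, 4.0, 4.3, 3.9],
}

ranking = rank_mirnas(tumor, normal)
print(ranking)
```

miRNAs at the top of the ranking differ most between the groups and would be the first candidates to pass on to the classifier.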
A support vector machine (SVM) is a machine learning model that finds the separating hyperplane that correctly divides the training data set with the largest geometric margin. With the help of an SVM, we are able to construct an in silico linear classifier that can accurately judge whether a person has cancer.
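The idea of a linear SVM can be sketched in a few lines of plain Python. The example below fits the weights `w` and bias `b` by subgradient descent on the regularized hinge loss; it is an illustrative sketch rather than our actual training pipeline, and the function names, hyperparameters and the centered two-feature toy data are all assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit w and b by subgradient descent on the L2-regularized hinge loss.

    X is a list of feature vectors, y a list of labels in {+1, -1}.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # The regularizer always shrinks w a little.
            w = [wj - lr * lam * wj for wj in w]
            # The hinge term only acts on samples inside the margin.
            if margin < 1:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Return +1 (cancer) or -1 (healthy) from the sign of w.x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical centered expression levels of two miRNAs.
X = [[1.0, -0.5], [1.2, -0.6], [0.9, -0.4],
     [-1.0, 0.5], [-1.1, 0.6], [-0.9, 0.4]]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
print(w, b, predict(w, b, [0.8, -0.3]))
```

The learned weights are what the later DNA computation has to reproduce: each weight scales the contribution of one miRNA, and the sign of the weighted sum gives the diagnosis.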

Decoding in silico-trained classifier to a computational scheme

A winner-takes-all strategy can experimentally implement the classifier trained in silico.
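One way to picture such a readout in software is sketched below. It assumes that the positively and negatively weighted contributions are accumulated as two separate signals that annihilate each other one-for-one, so that only the larger signal (the winner) survives and is reported. This is a simplified model under our own assumptions, not a description of the molecular implementation, and `winner_take_all` is a hypothetical helper name.

```python
def winner_take_all(weights, levels, bias=0.0):
    """Report which signal survives mutual annihilation, and by how much.

    weights: classifier weights (positive or negative).
    levels:  non-negative miRNA levels, one per weight.
    A tie leaves no surplus and is reported as "negative".
    """
    pos = sum(w * x for w, x in zip(weights, levels) if w > 0)
    neg = sum(-w * x for w, x in zip(weights, levels) if w < 0)
    # The bias is added to whichever side it favors.
    if bias >= 0:
        pos += bias
    else:
        neg -= bias
    # Both signals are consumed one-for-one until the smaller runs out.
    annihilated = min(pos, neg)
    pos -= annihilated
    neg -= annihilated
    return ("positive", pos) if pos > 0 else ("negative", neg)


print(winner_take_all([2, -1], [3, 4]))
print(winner_take_all([1, -3], [2, 1]))
```

The surviving signal has the same sign as the classifier score w.x + b, so the readout agrees with the in silico decision.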
We integrated a visualization tool into our project to clearly show the structures of the miRNAs. Using the model described in the DNA computation section, we can suggest suitable binding sites on the miRNAs. We also offer some common strategies in DNA molecular computation for reference.

DNA computation

The core of DNA computation is the reaction between miRNAs and probes. This process is based on nucleic acid strand-displacement reactions, which are illustrated in [2].

Strand-displacement reactions generally proceed by three-way or four-way branch migration and were initially investigated for their relevance to genetic recombination. In DNA computation, strand-displacement reactions are mostly carried out as three-way branch migration, in which a DNA strand displaces one member of a DNA duplex.

Toeholds are short sections of single-stranded DNA that initiate branch migration through a hybridization reaction. The probes we design for DNA computation should include such a toehold, which interacts with the proper site on the miRNA and eventually binds to it.
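The relationship between a miRNA site and the probe strand that binds it can be sketched as follows. The snippet assumes that the probe strand is simply the DNA reverse complement of the chosen site and that the toehold pairs with the 5' end of the site; the sequence, the domain lengths and the function names are illustrative assumptions, not our actual design rules.

```python
COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def reverse_complement_dna(seq):
    """DNA strand (5'->3') that pairs antiparallel with an RNA or DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def design_probe_domains(mirna, site_start, toehold_len, branch_len):
    """Split the probe strand for one miRNA site into two domains.

    Returns (branch_domain, toehold_domain), both written 5'->3'.
    The toehold pairs with the first toehold_len bases of the site,
    so it sits at the 3' end of the probe strand.
    """
    site = mirna[site_start:site_start + toehold_len + branch_len]
    probe = reverse_complement_dna(site)
    return probe[:-toehold_len], probe[-toehold_len:]


# Illustrative 22-nt RNA sequence.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
branch, toehold = design_probe_domains(mirna, 0, toehold_len=6, branch_len=16)
print(branch, toehold)
```

In this sketch the miRNA first hybridizes to the toehold domain and then displaces the strand that was protecting the branch-migration domain.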

However, this process does not always proceed as expected. Two major concerns affect the effectiveness of the DNA computation process.

First and foremost, the probes should not form significant secondary structures. If the toehold site on a probe forms a secondary structure (that is, forms hydrogen bonds with other bases), the strand-displacement reaction is greatly hindered. We therefore need an accurate method for predicting RNA secondary structure in order to evaluate a probe. By building a deep learning model and training it on a large amount of data, we obtain an effective model for predicting the secondary structure of nucleic acids.
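To make the notion of secondary structure concrete, the sketch below uses the classical Nussinov dynamic programming recurrence to count the maximum number of base pairs a single strand can form with itself. This is a much simpler stand-in chosen for illustration, not the deep learning model described above; the pairing rules (Watson-Crick only) and the minimum loop length of 3 are assumptions.

```python
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov recurrence).

    min_loop is the minimum number of unpaired bases inside a hairpin loop.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


print(max_base_pairs("GGGAAAACCC"))  # a hairpin with a three-pair stem
print(max_base_pairs("AAAAAAAA"))    # no possible pairs
```

A probe whose toehold bases take part in many such pairs is a poor candidate, because the toehold would have to unfold before it can bind the miRNA.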
Moreover, spurious hybridization also affects the precision of DNA computation [3]. Because DNA computation is a quantitative process, each probe should react only with the proper site of its corresponding miRNA. However, with a large number of substances (different miRNAs, different probes and possible impurities) coexisting in the system, probes (specifically their toehold sites) may take part in unintended reactions (spurious hybridization). To address this problem, we developed a dynamic programming algorithm to measure the propensity of a probe to form spurious hybridization.
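A simple dynamic programming measure of this propensity is sketched below: the length of the longest contiguous stretch of a probe that is perfectly complementary to some stretch of another strand, computed as a longest-common-substring problem against the reverse complement. It is an illustrative proxy under our own assumptions and not necessarily the scoring used in our software; the sequences and function names are hypothetical.

```python
COMP = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def revcomp(seq):
    """Reverse complement, written in the DNA alphabet."""
    return "".join(COMP[base] for base in reversed(seq.upper()))


def longest_complementary_run(probe, other):
    """Longest contiguous stretch of probe that can pair with other.

    A stretch of probe pairs (antiparallel) with a stretch of other exactly
    when it equals a substring of revcomp(other), so this reduces to the
    longest common substring, solved row by row with dynamic programming.
    """
    probe = probe.upper().replace("U", "T")
    target = revcomp(other)
    best = 0
    prev = [0] * (len(target) + 1)
    for a in probe:
        cur = [0] * (len(target) + 1)
        for j, b in enumerate(target, 1):
            if a == b:
                cur[j] = prev[j - 1] + 1  # extend the run ending at (a, b)
                best = max(best, cur[j])
        prev = cur
    return best


probe = "AACTATACAACCTACTACCTCA"
print(longest_complementary_run(probe, "UGAGGUAGUAGGUUGUAUAGUU"))  # intended target
print(longest_complementary_run(probe, "GGGGGGGG"))                # off-target strand
```

A long run against the intended miRNA is desired, while long runs against any other strand in the mixture flag a probe that is prone to spurious hybridization.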

References

[1] Zhang, C., Zhao, Y., Xu, X. et al. Cancer diagnosis with DNA molecular computation. Nat. Nanotechnol. 15, 709–715 (2020). https://doi.org/10.1038/s41565-020-0699-0
[2] Simmel, F. C., Yurke, B., Singh, H. R. Principles and Applications of Nucleic Acid Strand Displacement Reactions. Chem. Rev. 119(10), 6326–6369 (2019). https://doi.org/10.1021/acs.chemrev.8b00580
[3] Zhang, D. Y. Towards Domain-Based Sequence Design for DNA Strand Displacement Reactions. In: Sakakibara, Y., Mi, Y. (eds) DNA Computing and Molecular Programming. DNA 2010. Lecture Notes in Computer Science, vol 6518. Springer, Berlin, Heidelberg (2011). https://doi.org/10.1007/978-3-642-18305-8_15
