The amount of digital data generated by humanity is growing exponentially, while the space available to store it is running out, overloading current storage media (1). In addition, the silicon used to build chips is a non-renewable resource whose extraction damages the environment, consumes large amounts of water, and involves labor analogous to slavery (2). The search for new data storage technologies is therefore urgent, and the solution may be just a few atoms away.
Molecules have advantages over current storage technologies: high density, durability, and low cost. DNA, for example, offers a density six orders of magnitude greater than the densest medium used today, about 10¹⁸ bytes per mm³ (1), and a half-life of over 500 years (3), in addition to a significant reduction in energy expenditure (4). With that, a few grams of DNA could store all the information on Earth!
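To make the durability figure concrete, the sketch below estimates the fraction of DNA that remains intact after a given time. It assumes simple first-order (exponential) decay and uses the 521-year half-life quoted in the title of reference (3); the function name `intact_fraction` is hypothetical, and real decay rates depend strongly on temperature and storage conditions.

```python
def intact_fraction(years: float, half_life_years: float = 521.0) -> float:
    """Fraction of DNA still intact after `years`.

    Assumption: first-order decay with a constant half-life
    (521 years, the value quoted in reference 3).
    """
    if years < 0 or half_life_years <= 0:
        raise ValueError("years must be >= 0 and half_life_years must be > 0")
    return 0.5 ** (years / half_life_years)


# Print the estimated intact fraction for a few storage times.
for years in (100, 521, 1000):
    print(f"{years:>5} years: {intact_fraction(years):.3f} intact")
```

After exactly one half-life the function returns 0.5 by construction; other values are only as good as the constant-half-life assumption.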
Inspired by Microsoft's DNA data storage and reading initiative (5), our team created Lovelace's Note in Gene, named after Ada Lovelace, the first programmer in history. Our genetic circuit is capable of generating a fluorescent binary code in Escherichia coli cells through the interaction of fluorophores with two RNA aptamers (Broccoli and Corn).
The Lovelace project aims to create a genetic device that stores information as binary code within the cell's DNA, so that the message is generated autonomously after the first input. The message will always be the same and easy to recognize.
Once created, our system can be adapted to other genetic devices with a wide range of applications, making it possible to create more complex messages. In the future, such a system could serve as one of the main storage media in technology systems such as computers or even data centers.
Our device will detect inducing molecules through promoters regulated by tetracycline and lactose. The circuit will start after the inputs are added, triggering the production of dCas9 and gRNAs that control the system at the transcriptional level through activation and repression (CRISPRa and CRISPRi). Once induced, the CRISPR control system will regulate output production autonomously, so no other stimulus will be needed for the binary message to be written. The transcription of the Broccoli and Corn RNAs is regulated over time, which makes it possible to distinguish the messages 0 (Broccoli) and 1 (Corn).
We designed the circuit around the project's two innovation pillars: the autonomous and the temporal regulation of the system. We divided the design into three modules: control, message, and response. These modules are responsible, respectively, for regulating the circuit, carrying out the progression and continuity of the code, and issuing the signals.
The control module consists of three elements: the TetR repressor protein, controlled by the J23100 promoter (a strong constitutive promoter of the Anderson family for E. coli); SpdCas9 fused to the bacteriophage T4 anti-sigma 70 protein, AsiA, controlled by the Tet promoter (regulated by TetR); and gRNA-S, controlled by the T7-lacO promoter (recognized by T7 RNA polymerase and carrying a regulatory site for the LacI protein). The image below illustrates the control module.
Once gRNA-S transcription begins, the Message Module starts. The Message Module is composed of seven blocks built from the following parts: an activation target site (Sa and 3a to 7a) for the SpdCas9_AsiA_gRNA complex, the PJ23117 promoter (a weak constitutive promoter of the Anderson family for E. coli), the ribozyme RiboJ, gRNAs 0 and 1, and gRNAs 3 to 7. The system works by transcribing each block at a different time. gRNAs 0 and 1 guide the activation complex to trigger the Response Module and thus the transcription of the fluorescent RNAs. gRNAs 3 to 8, on the other hand, determine the sequence of the encoded message: they activate the transcription of the next gRNA (0 or 1), which sets the order in which the gRNAs are synthesized. They guide the activation complex through parts of the Message Module (3 to 8), producing a temporal variation in the transcription of gRNAs 0 and 1 and hence a variation in the fluorescent RNAs.
The last part of our system is the Response Module. This module contains the parts needed to write the binary message through fluorescence. It has two parts carrying the genes for the Broccoli and Corn RNAs, which represent the numbers 0 and 1 of the code, each controlled by the PJ23117 promoter. In addition, there are four target sites: 0i, 0a, 1i, and 1a. The "a" sites activate gene transcription, and the "i" sites repress it. Thus, when gRNA-0 reaches a minimum concentration in the system, the Broccoli RNA is transcribed and the Corn RNA is repressed. When the system transcribes gRNA-1, the opposite happens. The message can therefore be read from the time needed for the concentrations of gRNAs 0 and 1 to change.
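The response logic described above can be illustrated with a toy model. This is only a sketch under stated assumptions: gRNA levels are in arbitrary units, the threshold value is hypothetical (the text gives no actual minimum concentration), and one bit is read per sampled time point. The names `response_bit` and `read_message` are illustrative.

```python
from typing import List, Optional, Tuple

# Assumption: arbitrary units; the real minimum concentration is not given in the text.
THRESHOLD = 1.0


def response_bit(grna0: float, grna1: float,
                 threshold: float = THRESHOLD) -> Optional[int]:
    """Return the bit emitted by the Response Module for one time point.

    0 -> gRNA-0 dominates: Broccoli activated (site 0a), Corn repressed (site 0i).
    1 -> gRNA-1 dominates: Corn activated (site 1a), Broccoli repressed (site 1i).
    None -> neither (or both) above threshold, so no clear signal.
    """
    zero_on = grna0 >= threshold
    one_on = grna1 >= threshold
    if zero_on and not one_on:
        return 0
    if one_on and not zero_on:
        return 1
    return None


def read_message(trace: List[Tuple[float, float]]) -> List[int]:
    """Convert a time series of (gRNA-0, gRNA-1) levels into a list of bits."""
    bits = []
    for grna0, grna1 in trace:
        bit = response_bit(grna0, grna1)
        if bit is not None:
            bits.append(bit)
    return bits


# Hypothetical trace: gRNA-0 high, then gRNA-1 high, a transition, then gRNA-0 again.
trace = [(2.0, 0.1), (0.2, 1.5), (0.5, 0.5), (1.8, 0.0)]
print(read_message(trace))
```

The transition point, where neither gRNA is above the threshold, is skipped, which mirrors the idea that the message is read from the time-separated changes in gRNA concentration.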
The fluorescence emitted by the cells will be decoded by a new device designed to be low-cost, small, and easily reproducible. This hardware will be a mini-spectrometer capable of identifying the emission wavelength in real time, ensuring that the message is transmitted accurately. See our Hardware page for a description.
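As a rough illustration of how spectrometer readings could be turned back into bits, the sketch below classifies each measured emission peak as 0 (Broccoli) or 1 (Corn). The peak wavelengths (about 507 nm for Broccoli with DFHBI-1T and about 545 nm for Corn with DFHO) are approximate literature values and not taken from this page, and the tolerance is hypothetical; all three should be replaced with values measured on the actual hardware.

```python
from typing import Iterable, Optional

# Assumptions: approximate emission peaks and an arbitrary tolerance window.
BROCCOLI_PEAK_NM = 507.0  # bit 0
CORN_PEAK_NM = 545.0      # bit 1
TOLERANCE_NM = 15.0


def classify_peak(wavelength_nm: float) -> Optional[int]:
    """Map one measured peak wavelength to a bit, or None if it matches neither."""
    if abs(wavelength_nm - BROCCOLI_PEAK_NM) <= TOLERANCE_NM:
        return 0
    if abs(wavelength_nm - CORN_PEAK_NM) <= TOLERANCE_NM:
        return 1
    return None


def decode(peaks_nm: Iterable[float]) -> str:
    """Decode a sequence of peak wavelengths into a bit string, skipping unknown peaks."""
    bits = (classify_peak(p) for p in peaks_nm)
    return "".join(str(b) for b in bits if b is not None)


# Hypothetical readings; 600 nm matches neither aptamer and is ignored.
print(decode([508.2, 543.0, 600.0, 505.5]))
```

With a 15 nm tolerance the two windows do not overlap, so each reading maps to at most one bit.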
The Ptet_SpdCas9 (4327 bp) and PJ23100_tetR (797 bp) sequences were amplified by PCR using specific oligonucleotide pairs and the enzyme Time Hifi DNA Polymerase (Images 4 and 5).
Cloning of the Ptet_SpdCas9 + PJ23100_tetR sequences (5100 bp) into plasmid pSB1A3 (2100 bp) was confirmed by digestion with the enzymes EcoRI and PstI (Image 6).
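The fragment sizes quoted above can be checked with a few lines of arithmetic. The sketch below sums the two amplified fragments and lists the bands one would expect on the gel, assuming that EcoRI and PstI cut only at the sites flanking the insert and that joining the fragments adds no significant extra length. Both are assumptions, not statements from the text.

```python
# Fragment lengths in base pairs, as given in the text.
fragments = {"Ptet_SpdCas9": 4327, "PJ23100_tetR": 797}
BACKBONE_BP = 2100  # pSB1A3, as given in the text

# Combined insert length (ignoring any scar or overlap sequences).
insert_bp = sum(fragments.values())

# Assumption: the double digest releases the insert from the backbone intact.
expected_bands = sorted([insert_bp, BACKBONE_BP], reverse=True)

print(f"Insert: {insert_bp} bp (about {round(insert_bp, -2)} bp)")
print(f"Expected bands after EcoRI + PstI digestion: {expected_bands}")
```

The sum, 5124 bp, is consistent with the rounded 5100 bp insert size reported for the cloned construct.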
Next, the gene for the AsiA activator protein (464 bp) had to be inserted at the end of the SpdCas9 sequence. However, the AsiA gBlock was ordered with an error: it had one extra nucleotide (a guanine), which shifted the reading frame and created a premature stop codon. An oligonucleotide was therefore needed to correct the sequence by PCR. The PCR was performed at a Tm of 55 °C, using 2 ng of template to keep the template concentration low, and was confirmed by electrophoresis (Image 7).
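The effect of a single extra guanine can be illustrated with a toy example. The sequence below is hypothetical and is not the AsiA gene; it was chosen only so that one inserted G shifts the reading frame and exposes a premature stop codon. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy gene (NOT the real AsiA sequence).
intended = "ATGGCTGACAAACTGGAATAA"

# One extra guanine after the start codon shifts every downstream codon.
with_extra_g = intended[:3] + "G" + intended[3:]

# Removing the extra base restores the intended frame.
corrected = with_extra_g[:3] + with_extra_g[4:]

print("intended: ", translate(intended))
print("extra G:  ", translate(with_extra_g))
print("corrected:", translate(corrected))
```

In the toy sequence, the shifted frame reads ATG GGC TGA, so translation ends after two amino acids instead of six. Deleting the extra base, which is what the correction step achieves, recovers the intended protein.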
We ligated the AsiA gene into the pSB1A3_tetR_spdcas9 vector at ratios of 1:3, 1:10, and 1:50 (vector:insert). After cloning and miniprep, we performed a PCR with specific oligonucleotides, followed by electrophoresis. Only one colony had a band close to the expected size (Image 8).
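For the ligation ratios above, the amount of insert is usually derived from the standard molar-ratio formula: insert mass = vector mass × (insert length / vector length) × (insert:vector ratio). The sketch below applies it to the sizes quoted in the text (464 bp insert, and roughly 2100 + 5100 = 7200 bp for the vector). The 50 ng of vector is a hypothetical amount chosen only for illustration.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   insert_per_vector: float) -> float:
    """Mass of insert (ng) needed for a given insert:vector molar ratio."""
    if vector_ng <= 0 or vector_bp <= 0 or insert_bp <= 0 or insert_per_vector <= 0:
        raise ValueError("all arguments must be positive")
    return vector_ng * (insert_bp / vector_bp) * insert_per_vector


VECTOR_BP = 2100 + 5100  # pSB1A3 backbone plus tetR/SpdCas9 insert, from the text
ASIA_BP = 464            # AsiA gene, from the text
VECTOR_NG = 50.0         # Assumption: hypothetical amount of vector per reaction

# Ratios from the text, written as vector:insert (1:3, 1:10, 1:50).
for ratio in (3, 10, 50):
    mass = insert_mass_ng(VECTOR_NG, VECTOR_BP, ASIA_BP, ratio)
    print(f"1:{ratio} (vector:insert) -> {mass:.1f} ng of insert")
```

Because the insert is much shorter than the vector, even a 1:50 molar ratio corresponds to a modest mass of insert.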
The sequences PT7-lacO_gRNA-0 (9.1 = 248 bp), PT7-lacO_RiboJ_gRNA-0 (9.2 = 322 bp), PT7-lacO_gRNA-1 (10.1 = 248 bp), and PT7-lacO_RiboJ_gRNA-1 (10.2 = 322 bp) were cloned and confirmed by digestion with the enzymes EcoRI and PstI (Image 9).
The sequences PT7-lacO_target-1a_PJ23117_target-0i_corn_gRNA-0 (9.1 + 11 = 572 bp) and PT7-lacO_target-1a_PJ23117_target-0i_corn_RiboJ_gRNA-0 (9.2 + 11 = 658 bp) were cloned and confirmed by digestion with the enzymes EcoRI and PstI (Image 10).
The sequences PT7-lacO_target-1a_PJ23117_target-0i_corn_gRNA-1 (10.1 + 11 = 578 bp) and PT7-lacO_target-1a_PJ23117_target-0i_corn_RiboJ_gRNA-1 (10.2 + 11 = 652 bp) were also cloned and confirmed by digestion with the enzymes EcoRI and PstI (Image 11).
The sequences PT7-lacO_target-0a_PJ23117_target-1i_broccoli_target-1a_PJ23117_target-0i_corn_gRNA-0 (9.1 + 11 = 877 bp) and PT7-lacO_target-0a_PJ23117-9.2 broccoli_target-1a_PJ23117_target-0i_corn_gRNA-0 (9.1 + 11 = 877 bp) were cloned and confirmed by digestion with the enzymes EcoRI and PstI (Images 12 and 13).
Alignment of the cloned sequences with the sequencing results. The top sequence is the designed sequence and the bottom is the sequencing read.
(1) Ceze, L., Nivala, J., & Strauss, K. (2019). Molecular digital data storage using DNA. Nature Reviews Genetics, 20(8), 456-466.
(2) Reporter, G. S. (2020, October 15). Is your phone tainted by the misery of the 35,000 children in Congo’s mines? The Guardian. https://www.theguardian.com/global-evelopment/2018/oct/12/phone-misery-children-congo-cobalt-mines-drc.
(3) Kaplan, M. (2012). DNA has a 521-year half-life [at 13.1 C]: genetic material can’t be recovered from dinosaurs–but it lasts longer than thought. Nature News, 10.
(4) Zhirnov, V., Zadegan, R. M., Sandhu, G. S., Church, G. M., & Hughes, W. L. (2016). Nucleic acid memory. Nature Materials, 15(5), 366-370.
(5) Langston, J. (2019). With a "hello," Microsoft and UW demonstrate first fully automated DNA data storage. Microsoft. https://news.microsoft.com/innovation-stories/hello-data-dna-storage/