Team:UPF Barcelona/Design

Team:UPF Barcelona - 2021.igem.org

Design

Each of our engineering cycles started with the design of one of ARIA’s three main components. This page presents that first stage of the cycle for each of the three.

Introduction

Our project consists of three modules that were developed in parallel. Two of them are based on Artificial Intelligence (AI) and the third is a wet laboratory module.

Alpha is a research AI built from a variety of tools. It can analyze thousands of genomes from public online databases, construct pangenomes and find differences between genomes. It can then determine whether a given DNA region is associated with antibiotic resistance or with other behaviors such as promiscuity and virulence. Finally, it can construct a set of gRNAs by analyzing word frequency in the sequences, producing a matrix organized by functional classes.

Alexandria is a library of self-growable biosensors engineered to target the specific antibiotic resistance sequences found by Alpha. This step is crucial because it allows us to detect individual bacterial resistance patterns. The library is deployed as a matrix in which each element contains a different biosensor, specific for the detection of a particular resistance sequence.

Omega is a second set of AI modules, designed for diagnostic prediction. Based on the markers found in the arrays, it can be trained to infer the resistance profiles of the pathogens submitted by users. The information reaches the system through a cross-platform application that captures the array and identifies the relevant information. To ensure good communication between the system and the user, a server architecture hosts the algorithms and makes them accessible to the public.

Each part is interconnected to the others as seen in Figure 1. In order to see the general pipeline of each part and their interconnections, mouse over the tags 'Alpha', 'Alexandria' and 'Omega' and the arrows 'Alpha → Alexandria' and 'Alpha → Omega'.

Figure 1: Scheme of the 3 modules that make up ARIA with their respective pipelines summarized in the most important steps.

Alpha is the first system of our pipeline. It plays a key role in the project, as it enables the computational analysis of whole bacterial genomes in order to find genetic hallmarks that describe the mechanisms behind antibiotic resistance, virulence and promiscuity. This information is then used in Alexandria, to guide the design of the best possible biosensors, and in Omega, where Alpha’s functional models and Alexandria’s detections are coupled to provide the final output.

Alpha is also a complex system built from multiple connected stages, each of which relies on distinct computational tools to complete a specific task.

Efficient and Flexible Genomic Analysis

To be, or not to be resistant? That is not the only question here, but it is a fundamental one. Antibiotic resistance is complex by nature, and as knowledge about it increases, the number of possibilities and mechanisms involved grows, especially regarding indirect interactions that may create the right conditions for resistance to emerge. In this context, it could be valuable to analyze the biological system from a bottom-up perspective, focusing on how the connections between simple elements bring out the dangerous capacities of resistant bacteria. It may therefore be worth using approaches that look, from scratch, for the first threads to pull: what may be laying the foundations for resistance?

That was the motivation to create AlphaMine, a modular, simple and flexible Python-based genomics software. To perform its task, AlphaMine does not depend on traditional sequence alignment but on a more efficient combination of word frequency-based methods and low-dimensional clustering. The software can be easily integrated into computational pipelines, and it is intuitive to use and configure. It focuses on the autonomous analysis of large collections of prokaryotic genomes from a comparative perspective and relies on set theory operations. AlphaMine is designed as a nested-class structure. Its core module, called Pangee, is the one that directly manipulates the genomes, performs set operations and constructs pangenomes. The rest of the system is built around Pangee with other parts, such as a preprocessor for data preparation, a command-line interface for better user control, or a managing mechanism that bridges the higher-level instructions and the internal processes.
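The two ideas named above, word-frequency profiling and set-theoretic comparison of genomes, can be illustrated with a minimal sketch. This is not AlphaMine's or Pangee's code: the function names, the word length `k=3` and the toy sequences are assumptions chosen only for illustration.

```python
from collections import Counter


def kmer_counts(sequence, k=3):
    """Count every overlapping word (k-mer) of length k in a DNA sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def core_and_accessory(genomes, k=3):
    """Split the k-mers of several genomes into core and accessory sets.

    core: words present in every genome (set intersection)
    accessory: words present in some genomes but not all (union minus core)
    """
    word_sets = [set(kmer_counts(seq, k)) for seq in genomes.values()]
    pan = set.union(*word_sets)
    core = set.intersection(*word_sets)
    return core, pan - core


# Toy "genomes" (hypothetical sequences, far shorter than real ones)
genomes = {
    "strain_a": "ATGCGTACG",
    "strain_b": "ATGCGTTTT",
}
core, accessory = core_and_accessory(genomes)
```

In this toy case the words shared by both strains end up in `core`, while the words unique to one strain end up in `accessory`, which is where differences such as resistance-associated regions would be looked for.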

AI-Powered Functional Labeling of DNA Sequences

Once all the hallmarks are found and isolated, we need to identify the function of each of them. In other words, we require a system that is able to read a genetic sequence and tell us whether it encodes information for resistance mechanisms, virulence factors or gene transfer phenomena.

To perform such a complex task, we decided to use an Artificial Intelligence tool: Deep Learning (DL) models. DL allows the fast analysis of massive amounts of data, which are used to train computational models that find patterns embedded in the sequences and that cannot be perceived by studying those sequences manually. In the end, the learned patterns allow the models to classify each sequence and indicate its mechanism of action.

If we dig a little deeper, our DL system is based on an organized series of Convolutional Neural Network (CNN) models. Each model fulfills a very specific duty, and together they can classify all the sequences according to their mechanism of action.

**Figure 1:** Architecture of our CNN models.
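CNNs operate on numeric arrays, so a DNA sequence has to be encoded before it can be classified. The page does not state which encoding our models use; the sketch below shows one-hot encoding with fixed-length padding, a common choice, purely as an assumption. The names `one_hot` and `pad_or_trim` are hypothetical.

```python
BASES = "ACGT"


def one_hot(sequence):
    """Encode a DNA sequence as a list of 4-element rows (A, C, G, T).

    Unknown symbols such as N become an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in sequence.upper()]


def pad_or_trim(rows, length):
    """Force a fixed number of rows, since CNN inputs need a constant shape."""
    rows = rows[:length]
    padding = [[0, 0, 0, 0] for _ in range(length - len(rows))]
    return rows + padding


# A 5-base toy sequence padded to 8 positions
encoded = pad_or_trim(one_hot("ACGTN"), 8)
```

Each row of `encoded` corresponds to one position of the sequence, so a convolutional filter sliding over the rows sees short motifs as local patterns.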

Biosensors library

1. General concept

Alexandria is a library of biosensors that aim to detect whether the pathogenic bacteria in a biological sample are resistant to specific antibiotics. To design it, we considered engineering competent E. coli cells to turn them into our biosensor factory. From this idea, we came up with self-growable specific biosensors. But how do we engineer them? The first step is to design the plasmids that encode all the elements required for our biosensors.

**Figure 1:** Biosensors growing in living bacteria.

2. CRISPR-Cas technology

The core technology of our biosensors is the well-known CRISPR-Cas machinery, which has been implemented in multiple plasmids. This technology relies on two main elements: the Cas (CRISPR-associated protein) endonuclease, which in our case is LbCas12a (from a Lachnospiraceae bacterium strain), and the specific gRNA that guides the protein to the target sequence [1]. Once the two elements are bound, and only if the gRNA recognizes its specific target DNA sequence, Cas12a becomes activated and is able to bind and cut the target specifically [2] (cis cleavage). Immediately afterwards, the protein starts to cut, in a collateral way, other nonspecific DNA sequences present in the medium. This key feature is known as collateral trans cleavage activity [2] and is crucial for reporting the biosensor decision. It is explained in more detail in the last steps of the engineering cycle: testing and learning (see Biosensor Library Testing).

**Figure 2:** gRNA-Cas12a DNA target recognition. Cis-cleavage activity (left), collateral trans cleavage activity (right).

3. LbCas12a protein

We chose the LbCas12a protein [3] since it has been the most widely used Cas enzyme for detection using the CRISPR-Cas12 mechanism [2]. The original plasmid employed for cloning Cas12a was taken from Addgene [4], but some modifications were made to optimize its functionality. These basically consisted of deleting purification sites and unneeded sequences coding for MBP (maltose binding protein). Moreover, the modified plasmid expresses our protein under the T7 promoter so that it is inducible with IPTG, and it carries ampicillin resistance to allow for its selection.

**Figure 3:** Trimming of MBP from LbCas12a plasmid.

4. Guide RNAs design

The most complex part of using CRISPR is the design of the guide RNA (gRNA), since it is subject to several rules and restrictions. In fact, even when these are followed, gRNAs do not always work as expected. To design gRNAs for LbCas12a, we followed the guidelines provided by New England Biolabs [5]. In this case, the artificially programmed gRNAs consist only of a crRNA (as Cas12a does not require a tracrRNA), made up of a common sequence called the repeat and a variable one known as the spacer. The former is shared by all guide RNAs, whereas the latter, with a length between 18 and 24 nucleotides, depends on the desired target site, since it has to be complementary to it for recognition to occur. However, target sequences must fulfill an important requirement in order to be detected: a protospacer adjacent motif (PAM) must lie immediately 5’ of the protospacer, on the strand complementary to the one bound by the gRNA. For our nuclease this short sequence is TTTV, where V is any nucleotide except T [5].
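The PAM and length rules described above can be turned into a simple candidate search. The following is a minimal sketch, not the procedure we actually followed: it scans one strand for TTTV and returns the bases immediately 3’ of each PAM as a candidate spacer. The default length of 20 is an assumption within the 18 to 24 range, the function names and the toy sequence are hypothetical, and no further design rules (secondary structure, off-targets) are checked.

```python
import re

# PAM for LbCas12a as stated above: TTT followed by A, C or G (V = not T).
# The lookahead lets overlapping PAM sites be reported as well.
PAM_PATTERN = re.compile(r"(?=(TTT[ACG]))")


def find_spacers(strand, spacer_length=20):
    """Return (pam_start, pam, protospacer) for each TTTV site on one strand.

    The protospacer is the spacer_length bases immediately 3' of the PAM.
    Sites too close to the end of the sequence are skipped.
    """
    strand = strand.upper()
    hits = []
    for match in PAM_PATTERN.finditer(strand):
        start = match.start()
        protospacer = strand[start + 4:start + 4 + spacer_length]
        if len(protospacer) == spacer_length:
            hits.append((start, match.group(1), protospacer))
    return hits


def reverse_complement(strand):
    """Reverse complement, used to scan the opposite strand as well."""
    return strand.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Toy target sequence (hypothetical, not one of our resistance genes)
target = "GGTTTACCGATTGCAAGCTTGGCATCCAGG"
candidates = find_spacers(target) + find_spacers(reverse_complement(target))
```

Scanning both the sequence and its reverse complement covers PAM sites on either strand of a double-stranded target.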

With all this in mind, we designed a total of 5 gRNAs, each targeting a gene that confers resistance to one of the following antibiotics: ampicillin, chloramphenicol, erythromycin, kanamycin and spectinomycin. After completing the full engineering cycle, in the testing and learning phases, we realized that our plasmids for gRNA transcription lacked a terminator, a crucial element for triggering the end of transcription [6]. Without a functional terminator, the transcript would extend beyond the gRNA coding region, resulting in a nonfunctional gRNA.

On another note, after consulting the literature, we learned that Cas12a is able to process its own crRNAs (CRISPR RNAs) [10]. This means that, once the succession of direct repeat (DR) and spacer sequences is transcribed, the resulting transcript (named pre-crRNA) can be processed into mature gRNAs thanks to the dual RNase/DNase activity of Cas12a [7]. For that to happen, the spacer sequence (necessary for target DNA recognition) should be flanked by a DR on each side. Specifically, Cas12a is known to cut the pre-crRNA 4 nucleotides upstream of the hairpin structures formed by the DR [8]. This should be taken into account when designing gRNAs, so that key nucleotides are not lost after Cas12a processing.

**Figure 4:** Expression of the LbCas12a plasmid and assembly of the protein (1). Transcription of the gRNA plasmid (2). Cas12a processing of the pre-crRNA, cutting 4 nucleotides upstream of the DR hairpin structure (3). Coupling of LbCas12a and the processed gRNA (4).

Taking all that into account, we adapted and improved the gRNA design. The new versions include two DRs separated by the spacer plus 4 additional base pairs downstream of it, as well as the L3S2P21 terminator at the end of the construct [9]. These modifications were made to ensure that gRNA transcription ends correctly and to prevent gRNA trimming from leaving the spacer shorter than recommended [8]. Both the first, inefficient gRNAs and the improved ones are available on the Parts page.
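The layout of the improved construct (DR, spacer, 4 extra bases, second DR, terminator) can be sketched as a small assembly helper. This is only an illustration of the ordering described above: the function name is hypothetical, and the DR and terminator strings used in the example are short placeholders, not the real LbCas12a repeat or the L3S2P21 sequence, which should be taken from the Parts page.

```python
def build_crrna_insert(direct_repeat, spacer, padding, terminator):
    """Concatenate DR + spacer + 4-nt padding + DR + terminator (5' to 3')."""
    if not 18 <= len(spacer) <= 24:
        raise ValueError("spacer should be 18 to 24 nucleotides long")
    if len(padding) != 4:
        raise ValueError("padding should be exactly 4 nucleotides")
    parts = [direct_repeat, spacer, padding, direct_repeat, terminator]
    for part in parts:
        if set(part.upper()) - set("ACGT"):
            raise ValueError(f"non-DNA character in {part!r}")
    return "".join(part.upper() for part in parts)


# Placeholder sequences only (NOT the real DR or terminator)
DR_PLACEHOLDER = "AAAACCCC"
TERMINATOR_PLACEHOLDER = "GGGGTTTT"

insert = build_crrna_insert(
    direct_repeat=DR_PLACEHOLDER,
    spacer="CCGATTGCAAGCTTGGCATC",  # hypothetical 20-nt spacer
    padding="ACGT",                 # 4 extra bases protecting the spacer
    terminator=TERMINATOR_PLACEHOLDER,
)
```

The 4 padding bases sit between the spacer and the second DR, so that the cut made 4 nucleotides upstream of the second hairpin does not remove spacer bases.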

Once designed, the gRNAs are cloned into a plasmid backbone also taken from Addgene [11], which was modified with a T insertion to obtain the proper spacer sequence. The promoter is again T7, and the selection marker is in this case kanamycin resistance.

**Figure 5:** Addition of a thymine nucleotide (T) to the direct repeat (DR) sequence of the gRNA plasmid backbone.

5. Transformation approaches

Since we are working with two different plasmids, their promoters and selection markers shape our transformation strategy. On the one hand, we considered co-transforming both plasmids into the same E. coli cells; here the different resistances are important to confirm that both plasmids have been taken up. On the other hand, each plasmid can be transformed separately into a different culture. In both cases, IPTG induction of the T7 promoter is suitable, since it induces both plasmids at the same time. In the end, we believe it is better to keep the two plasmids separate, since every biosensor contains Cas12a. This part remains constant, so it would not be necessary to co-transform new strains with the Cas12a plasmid for each new gRNA.

6. Cell lysis

The next design step was to think about how to extract the biosensor elements for detection from the engineered E. coli. The most intuitive and effective solution is to lyse the cells. Different lysis methods have to be considered according to their pros and cons.

Initially, both mechanical and enzymatic lysis were considered, as they are widely used and can be performed using commercial kits. More precisely, mechanical lysis uses a bead mill (bead beating method) to disrupt the cell membrane and release the internal components [12]. Enzymatic lysis relies on enzymatic machinery commonly used by living organisms to degrade biomolecules. Both approaches require laboratory procedures such as incubation or centrifugation, which makes them unfeasible for fast deployment on a prototype. This is why, after testing both of them as part of the engineering cycle, we decided to take a parallel path and return to design, the first step of the cycle.

The new design approach consisted of automating cell lysis so that it does not depend on laboratory-based procedures. Luckily, there is an existing part that programs bacterial cells for autolysis [13], which can be used for this purpose. Protein E, encoded by the lysis gene of bacteriophage PhiX174, causes the death of cells infected by this virus [14]. Inducing its expression in the biosensor-producing cells would therefore resolve the lysis issue.

The plasmid for Protein E was designed with a resistance gene different from the ones present in the other plasmids, so that they can be co-transformed into the cells and successful transformations can be distinguished. In addition, the PBAD promoter was chosen because it is repressed by glucose, which limits leaky expression. This factor is crucial, as leaky expression of Protein E would lead to lysis of the cells at undesired times (e.g. before induction of gRNA and LbCas12a expression).

**Figure 6:** Cloning of Protein E into pBAD promoter plasmid with chloramphenicol resistance (1). Transformation of protein E plasmid into *E. coli* cells (2). Induction of protein E transcription and consequent autolysis (3).
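The induction logic of the design (IPTG for the T7-driven parts, arabinose for PBAD-driven Protein E, glucose to repress PBAD) can be summarized as a truth table. The sketch below is a deliberately simplified Boolean approximation under those assumptions; real promoter responses are graded and leaky, and the function name is hypothetical.

```python
def expression_state(iptg, arabinose, glucose):
    """Boolean approximation of which biosensor parts are being expressed.

    T7-driven parts (LbCas12a, gRNA) are on when IPTG is present.
    PBAD-driven Protein E is on when arabinose is present and glucose is absent.
    """
    return {
        "cas12a_and_grna": iptg,
        "protein_e": arabinose and not glucose,
    }


# Production phase: build up the biosensor elements, keep the cells intact
production = expression_state(iptg=True, arabinose=False, glucose=True)

# Release phase: trigger autolysis to free the biosensor elements
release = expression_state(iptg=False, arabinose=True, glucose=False)
```

Separating the two phases in this way is what allows the cells to accumulate LbCas12a and gRNA before Protein E lyses them.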

Paper-based array

1. Basic design

Once our biosensors are ready, they will be placed on a cheap and simple paper-based structure. This will be the final architecture, and its output will be processed by Omega’s computational network.

2. Architecture approaches

More specifically, drawing on other recent work [15][16][17], we considered several approaches to ensure biosensor viability and detection success, depending also on the requirements of the user.

Ready-to-use array

On the one hand, the end user could be offered a ready-to-use array. Here there are two further options.

The first is to provide an array containing lyophilized cells with the biosensors inside. For this, the biosensor-producing cell cultures could be mass-produced and then deployed onto the paper device, for later addition of the patient's sample and fluorescence readout.

The procedure would consist of incubating the bacteria with high concentrations of glucose and IPTG, which induces the expression of the biosensor's functional elements while preventing leaky expression of the autolysis protein. Then, aliquots of each culture would be placed onto the paper-based array, dried at 37 °C in the presence of chemical desiccants, and finally distributed worldwide for clinical use. Before adding the patient’s sample, arabinose-rich medium should be added to induce autolysis and release the cells' internal components. Detection could then be carried out directly on the device.

The second option would be to supply an array with freeze-dried biosensors. For that, bacterial lysis could be induced before the material is deposited onto the paper. This way, the detection devices would be distributed ready to use, since there would be no barrier between the biosensors and the target DNA. Degradation over time and the stability of the gRNA-Cas12a complexes should be tested to assess the viability of this approach.

For this case, a design could be proposed in which the locations of the biosensors are delimited by wax printing, so that the reagents do not mix after the samples are added [18].

Aliquots with growing biosensors

On the other hand, the end user could be given aliquots of the specific growing biosensors, so that the array can be built in a customized way. In this case, since the cells are alive, it would be advisable to mix the cell cultures with liquid LB-Agar and then deposit a drop directly on the paper. Mixing with this culture medium would prevent the cell cultures from spreading over the surface. Furthermore, once the droplet solidifies, the cells would keep growing. This method would offer an additional advantage for detection: it would allow better control of the position of each cellular element, which would be useful when several samples have to be tested.

In this particular case, one could think of designing a hardware device with multiple channels to stamp the cells with the agar onto paper [16]. Indeed, this is an approach we want to develop after iGEM.

Detection

The most challenging aspect of working with this paper-based architecture would be to achieve an optimal output signal, readable by our Iris capture system. Recently, several paper-based models offering fluorescence-based solutions have emerged [16]. In our case, such a model could be coupled with the fluorophore-quencher reporter used in our proof of concept. So that the product does not require specialized equipment (such as a plate reader or a UV light source), our array design could be paired with a portable fluorometer. In 2019, the Lambert iGEM Team designed, for their Labyrinth project, a fluorometer called FluoroCent, which is portable, inexpensive and ready to use anywhere.

However, other reporter methodologies could be considered as future work, in order to achieve a more reliable and easy-to-quantify signal that could, for example, be seen by the naked eye, as with the chromoprotein AmilCP [19].

The engineering cycle for this section of the project has not been finished and remains as future work after the iGEM competition. But, as this design explanation shows, many ideas and possible implementations can be applied.

AI-Powered Analysis of Biological Systems

Alpha systems are intended to provide useful knowledge, but to fulfill our purpose we need a way to embed that knowledge in a platform that generalizes well enough to analyze real samples and predict their properties. This is the premise behind Omega, and OmegaCore is the component that embodies it in its purest form. OmegaCore is a system that seeks to capture, from a bottom-up approach, how the dangerous behaviors of resistant bacteria emerge through the potential interactions of their constituent mechanisms. For this, it focuses on generating lightweight Convolutional Neural Networks (CNNs), the so-called subunits, each trained to evaluate separately the absence or presence of one of the behaviors of interest. In other words, it probes whether or not the whole is the sum of its parts. To achieve this, the system will be fed with simulated detection matrices, obtained by probing the genomes of interest with the design provided by ARIABuilder. The result of the process will be a collection of subunits (one for each situation of interest) as self-contained models, which can then be incorporated into an inference module. By combining the verdicts of all these subunits, the inference module would determine the complete resistance profile of the sample in question.
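The way an inference module could combine the subunits can be sketched as follows. This is an illustration under stated assumptions, not OmegaCore's code: each subunit is represented by a plain function returning a score between 0 and 1 (a stand-in for a trained CNN), and the names, the threshold of 0.5 and the toy detection matrix are hypothetical.

```python
def infer_profile(detection_matrix, subunits, threshold=0.5):
    """Run every subunit on the same matrix and collect yes/no verdicts."""
    return {
        name: model(detection_matrix) >= threshold
        for name, model in subunits.items()
    }


def row_mean_subunit(row_index):
    """Stand-in for a trained CNN: scores a behavior from one matrix row."""
    def score(matrix):
        row = matrix[row_index]
        return sum(row) / len(row)
    return score


# One independent subunit per behavior of interest
subunits = {
    "ampicillin": row_mean_subunit(0),
    "kanamycin": row_mean_subunit(1),
}

# Toy detection matrix: 1 = biosensor fired, 0 = no signal
matrix = [[1, 1, 0], [0, 0, 1]]
profile = infer_profile(matrix, subunits)
```

Because every subunit is self-contained, a new behavior of interest can be covered by adding one more entry to `subunits` without retraining the others.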

References

[1] Schindele, P., & Puchta, H. (2020). Engineering CRISPR/LbCas12a for highly efficient, temperature-tolerant plant gene editing. Plant Biotechnology Journal, 18(5), 1118–1120. doi: 10.1111/pbi.13275

[2] Ramachandran, A., & Santiago, J. G. (2021). Enzyme kinetics of CRISPR molecular diagnostics. Analytical Chemistry. doi: 10.1021/acs.analchem.1c00525

[3] Part:BBa_K2927005 - parts.igem.org. (n.d.). Retrieved October 5, 2021, from iGEM.

[4] Addgene: pMBP-LbCas12a. (n.d.). Retrieved October 5, 2021, from Addgene.

[5] New England Biolabs. (n.d.). How do I design a guide RNA for use with EnGen Lba Cas12a? Retrieved October 5, 2021, from NEB.

[6] Gusarov, I., & Nudler, E. (1999). The mechanism of intrinsic transcription termination. Molecular Cell, 3(4), 495–504. doi: 10.1016/s1097-2765(00)80477-3

[7] Campa, C. C., Weisbach, N. R., Santinha, A. J., Incarnato, D., & Platt, R. J. (2019). Multiplexed genome engineering by Cas12a and CRISPR arrays encoded on single transcripts. Nature Methods, 16(9), 887–893. doi: 10.1038/s41592-019-0508-6

[8] Paul, B., & Montoya, G. (2020). CRISPR-Cas12a: Functional overview and applications. Biomedical Journal, 43(1), 8–17. doi: 10.1016/j.bj.2019.10.005

[9] Part:BBa_K2675031 - parts.igem.org. (n.d.). Retrieved October 5, 2021, from iGEM.

[10] Swarts, D. C., van der Oost, J., & Jinek, M. (2017). Structural basis for guide RNA processing and seed-dependent DNA targeting by CRISPR-Cas12a. Molecular Cell, 66(2), 221–233.e4. doi: 10.1016/j.molcel.2017.03.016

[11] pT7-G-LbCas12a crRNA-BsaI cassette (MSP3495). (n.d.). Retrieved October 5, 2021, from Addgene.

[12] Shehadul Islam, M., Aryasomayajula, A., & Selvaganapathy, P. (2017). A review on macroscale and microscale cell lysis methods. Micromachines, 8(3), 83. doi: 10.3390/mi8030083

[13] Part:BBa_K2500006 - parts.igem.org. (n.d.). Retrieved October 5, 2021, from iGEM.

[14] Witte, A., Bläsi, U., Halfmann, G., Szostak, M., Wanner, G., & Lubitz, W. (1990). PhiX174 protein E-mediated lysis of Escherichia coli. Biochimie, 72(2–3), 191–200. doi: 10.1016/0300-9084(90)90145-7

[15] Bhadra, S., Nguyen, V., Torres, J. A., Kar, S., Fadanka, S., Gandini, C., Akligoh, H., Paik, I., Maranhao, A. C., Molloy, J., & Ellington, A. D. (2021). Producing molecular biology reagents without purification. PLOS ONE, 16(6), e0252507. doi: 10.1371/journal.pone.0252507

[16] Mogas-Díez, S., Gonzalez-Flo, E., & Macía, J. (2021). 2D printed multicellular devices performing digital and analogue computation. Nature Communications, 12(1). doi: 10.1038/s41467-021-21967-x

[17] Struss, A., Pasini, P., Ensor, C. M., Raut, N., & Daunert, S. (2010). Paper strip whole cell biosensors: a portable test for the semiquantitative detection of bacterial quorum signaling molecules. Analytical Chemistry, 82(11), 4457–4463. doi: 10.1021/ac100231a

[18] Wynn, D., Raut, N., Joel, S., Pasini, P., Deo, S., & Daunert, S. (2018). Detection of bacterial contamination in food matrices by integration of quorum sensing in a paper-strip test. The Analyst, 143(19), 4774–4782. doi: 10.1039/c8an00878g

[19] Part:BBa_K592009 - parts.igem.org. (n.d.). Retrieved October 20, 2021, from iGEM.