Team:UPF Barcelona/Engineering

To reach the final implementation goal, several engineering cycles were required.

Engineering cycles

Our project development was carried out by parallel but complementary working teams, each of which (and their subgroups) completed the engineering cycle: design, build, test and learn. ARIA has been through this cycle several times, since after completing each iteration we obtained results in the form of a functional biological system or useful data.

The nature of an engineering cycle helped us realize which aspects of our project lacked accuracy or were not working as expected. When we initially debated which functions our systems should cover, we focused on the need for a fast, personalized, and reliable tool to assist medical decision-making. This led us to design three different components based on quite distinct technologies.

Soon after we consolidated the foundations of our project, the planning and building of each part began. We had to materialize what had seemed so easy in our heads, but thanks to extensive research and the invaluable help of our advisors we traced an action plan. Testing the effectiveness of what we had created then became a daily routine, helping us distinguish what worked as expected from what needed a little tweaking. This is where we analyzed which mistakes in our approaches had to be solved. A deeper understanding and re-design of each component led us to iterate again on the engineering cycle, improving the capabilities of our systems each time (or hoping to do so).

Next, we present some of the most important engineering cycles in detail:

1. Lysis type: EDTA / no EDTA / Mechanical / autolysis

1.1. Design

To achieve our goal of direct detection of resistance in cell lysates, we tested different lysis methods in order to find the most efficient one.

1.2. Build

We tested lysozyme lysis with a buffer containing EDTA, and mechanical lysis. We prepared two replicates of each liquid culture we wanted to lyse, and performed both types of lysis on each culture type. All cultures had previously been grown overnight under the same conditions.

1.3. Test

The results of the lysis were inconclusive. We then investigated the possible causes and realised that the EDTA was one possible explanation for the problem.

1.4. Learn

We learned that EDTA is a chelator that binds divalent metal ions such as calcium, magnesium, lead, and iron. It therefore reduces the free ion concentration in the medium, and without these ions the LbCas12a protein (a DNase) is not active. We realised that EDTA could thus interfere with the downstream detection reaction.

1.5. Re-design-build-test-learn

As a consequence, we decided to repeat the experiment. This time we followed the same protocol, but without adding EDTA to the lysis buffer. We observed that the lysozyme lysis now gave conclusive results.

Even though enzymatic lysis without EDTA worked well, it was not ideal for our final implementation, since the biosensor must be used immediately after lysis, and enzymatic lysis is not a fast process: lysozyme requires the presence of other detergents to break down the cell walls of Gram-negative bacteria [1].

The problem is that if cellular components are exposed to these detergents for a prolonged time, they also end up disintegrating and losing their three-dimensional structure. In conclusion, enzymatic lysis worked well in the lab, but it was not ideal for the final paper-based implementation.

1.6. Re-re-design-build-test

To be more careful and efficient, we realized that an autolysis method would interfere less with the biosensors and other intracellular components, while also saving time and work. Moreover, it would accelerate the final biosensor library preparation and permit smoother use of the final kit. A small test to prove its feasibility was performed.


2. IPTG concentration: 100 µM / 10 µM for induction

2.1. Design

To induce protein production in the E. coli cells, we needed to add IPTG, a lactose analog, to the bacterial liquid cultures.

2.2. Build

For each liquid culture to be induced, we prepared two 15 mL tubes, adding IPTG to 10 µM in one and to 100 µM in the other. In this way we could test which IPTG concentration was optimal for induction of the Cas12a protein and the gRNA.

2.3. Test

After inducing at both concentrations, we observed that induction was more efficient with 100 µM IPTG than with 10 µM.

2.4. Learn

We learned that induction with 10 µM IPTG was not enough to express the needed amount of protein and gRNA.

2.5. Re-design-build-test

In the following inductions we used the same IPTG induction protocol, but only at 100 µM IPTG, thus obtaining better results.

3. Cells OD normalization

3.1. Design

The detection experiments were designed in a way that lysed cell cultures were directly used for the Cas12a assay.

3.2. Build

Each detection reaction was carried out by adding the same quantity of bacterial lysate, each coming from a distinct culture. The cultures had been grown overnight under the same conditions, then equally diluted in fresh LB medium and induced with the chosen concentration of IPTG.

3.3. Test

The results of these preliminary experiments were quite disparate. Fluorescence measurements of the detection reactions were not comparable between biosensors, so reliability was compromised.

3.4. Learn

We learned that the bacterial cultures had different growth rates. Consequently, optical density (OD) could vary between them, and with it the synthesized quantity of gRNA-Cas12a biosensors.

3.5. Re-design-build-test

This is why we finally decided to read OD after overnight incubation, induce expression from the plasmids, and read OD again. After each OD measurement, we normalized the bacterial cultures to a similar concentration. This way we could ensure that differences in the fluorescence results were not due to different saturation levels of the cell cultures.
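The dilution arithmetic behind this normalization can be sketched in a few lines of Python (a minimal illustration; the target OD, volumes, and culture names are made-up examples, not our protocol values):

```python
def dilution_volumes(od_measured, target_od, final_volume_ml):
    """Volumes of culture and fresh LB needed to reach target_od.

    Assumes OD scales linearly with dilution, which holds in the
    typical working range of a spectrophotometer.
    """
    if od_measured < target_od:
        raise ValueError("culture is less dense than the target OD")
    culture_ml = final_volume_ml * target_od / od_measured
    lb_ml = final_volume_ml - culture_ml
    return culture_ml, lb_ml

# Example: normalize three overnight cultures to OD 0.5 in 10 mL
for name, od in [("culture-A", 2.0), ("culture-B", 1.25), ("culture-C", 0.8)]:
    c, lb = dilution_volumes(od, target_od=0.5, final_volume_ml=10.0)
    print(f"{name}: {c:.2f} mL culture + {lb:.2f} mL LB")
```

Each culture ends up at the same OD, so downstream fluorescence differences reflect biosensor activity rather than culture density.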

4. gRNAs for target recognition

4.1. Design

While designing the gRNA constructs to be transformed into E. coli, we took the following LbCas12a requirements into account: not to include the PAM sequence, to preserve the 20 nucleotides of the direct repeat (DR) sequence, and to keep the variable sequence between 18 and 24 nucleotides [1].
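These three rules can be captured in a small validation helper. This is a sketch, not our actual design script; the TTTV PAM check reflects the canonical LbCas12a PAM, and the function names are illustrative:

```python
def valid_spacer(spacer, direct_repeat):
    """Check a candidate gRNA spacer against the three design rules:
    spacer length 18-24 nt, no PAM included, 20-nt DR preserved."""
    spacer = spacer.upper()
    if not 18 <= len(spacer) <= 24:
        return False, "spacer must be 18-24 nt"
    # LbCas12a recognizes a TTTV PAM (V = A, C, or G); it must not
    # be carried into the spacer itself
    if spacer.startswith(("TTTA", "TTTC", "TTTG")):
        return False, "PAM must not be included in the spacer"
    if len(direct_repeat) != 20:
        return False, "direct repeat must keep its 20 nt"
    return True, "ok"

ok, msg = valid_spacer("ACGT" * 5, "A" * 20)  # 20-nt spacer, dummy DR
```

A checker like this makes the constraints explicit before any cloning work begins.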

4.2. Build

The vector used for gRNA expression already contained the direct repeat sequence. However, it was missing the first nucleotide, which had to be added via PCR followed by a simple ligation. The variable sequence of the gRNA was then ordered as two separate ssDNA oligonucleotides that were annealed to each other, generating the proper overhangs for ligation. The insert was subsequently cloned into the plasmid via Golden Gate assembly.

4.3. Test

Testing of the gRNA-Cas12a constructs did not go as expected. Variable and incoherent results led us to think that something was not working properly.

4.4. Learn

Further research and our advisors' feedback helped us realize that something was missing from the gRNA construct. As mentioned on the design page, Cas12a is able to process its own gRNAs. To achieve this successfully, it is recommended to build a transcript containing the conserved sequence (DR), followed by the proto-spacer, and another repeat sequence downstream [2]. Our construct lacked this last element, and a terminator was also missing downstream of the gRNA sequence to stop transcription at the end of the sequence of interest. Both problems could have contributed to an abnormal transcript length, resulting in an inefficient gRNA.

4.5. Re-design-build-test

These two elements were considered key for proper processing of the pre-crRNA, so new constructs were designed including them both. Also, since our spacer sequence was 20 nt long, we added 4 extra base pairs downstream. This would keep it as long as possible (within the 18-24 bp range) and prevent excessive trimming and loss of key nucleotides after Cas12a processing. This new design allowed us to build and test new constructs, which gave better results.

5. Reporters for gRNA-Cas12a activity read-out

5.1. Design

Taking as a reference other previously developed CRISPR-based detection models [3][4][7], we chose a commercially available fluorophore-quencher based DNase reporter [5]. This ssDNA is cleaved as a consequence of non-specific trans-cleavage activity of Cas12a, after site-specific recognition and cleavage of dsDNA [6].

5.2. Build

The reporter was commercially acquired and preserved in several aliquots.

5.3. Test

Multiple detection assays were performed using this DNase alert kit (reporter).

5.4. Learn

Due to the presence of endogenous nucleases in the bacterial lysates (which contained the specific gRNA and Cas12a), fluorescence measurements were high in every reaction, both in the detection assays and in the negative controls. This means that many DNases were cleaving the fluorophore from the reporter sequence, which made it very difficult to discriminate true positives in our detection reactions. We therefore aimed to find a reporter with greater susceptibility to cleavage by Cas12a.

5.5. Re-design-build-test

Based on very recent work on the optimization of CRISPR-Cas12 diagnostics [7], we found improved versions of the reporter we were using at the time. We chose two different reporters, based on a higher cytosine (C) content and an additional phosphorothioate modification. The first feature allows for greater Cas12a specificity, since it has been determined that the activated enzyme has a much higher affinity for C-rich reporters [7]. Due to shipping delays, these new reporters could not be tested yet, but they will be tested after the iGEM stage.

6. Detection sensitivity enhancement

6.1. Design

Our resistant-strain bacterial detection system initially used purified plasmids as dsDNA targets for biosensor testing.

6.2. Build

Assays were carried out with the LbCas12a and gRNA present in the bacterial lysates, purified plasmids carrying the antibiotic resistance gene to be detected, and the remaining components needed for the reaction.

6.3. Test

The detection reactions were systematically prepared following the previous considerations. Moreover, the optimal target plasmid quantity was assessed by preparing dilutions of various concentrations (90 ng/µL, 25 ng/µL, 2.5 ng/µL). No substantial differences in the detection results were found, and there were still inconsistencies when comparing with the negative controls (without dsDNA template, or without Cas12a).

6.4. Learn

After analyzing the collected results, we compared our methods to previously developed CRISPR-based detection assays. In these, an RPA (Recombinase Polymerase Amplification) or another amplification reaction was coupled to the gRNA-Cas12a assay. Also, those reactions were carried out in vitro, meaning there were no endogenous cellular components. These factors lead to more precise and sensitive systems, which we aim to achieve by perfecting our design.

6.5. Re-design-build-test

To achieve more specific collateral cleavage (for Cas12a activity read-out), we aim to find a more precise reporter, and we also propose coupling the detection reaction to a prior amplification of the dsDNA template material. These adjustments would be expected to improve our biosensors' performance. However, due to lack of time and resources, we could not implement these modifications yet.

1. AlphaMine

1.1. Design

The goal originally proposed was to find dangerous genetic components within bacterial genomes. For this, it was necessary to construct pangenomes that concentrated the properties of the analyzed strains, making it possible to establish comparisons that identify these elements of interest. As a first approach, we proposed a multiple-alignment system applied to groups of candidate sequences, looking for what the dangerous strains had in common.

1.2. Build

A prototype comparison system implementing these principles was built, selecting candidates based on sequence length similarity and evaluating their exclusivity by means of multiple alignments.

1.3. Test

When evaluating the performance of the system, the time required per genome was too long to make feasible the construction of pangenomes including at least 100 candidate strains. We therefore had to look closely at where the inefficiencies were.

1.4. Learn

After an in-depth analysis of the program workflow, we found that, in certain situations, each individual alignment was too time-consuming, delaying the entire system.

1.5. Re-design

Taking into account what we had learned, we rebuilt the mechanism without multiple alignment, deciding instead to use methods based on word frequencies. This overcame the bottleneck and accelerated the system by two orders of magnitude, reaching the current AlphaMine base.
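A word-frequency comparison of this kind can be sketched as k-mer counting plus a cosine similarity over the count profiles. This is an illustration of the general technique, assuming AlphaMine's words are genomic k-mers; the value of k and the similarity measure shown are example choices, not AlphaMine's exact parameters:

```python
import math
from collections import Counter

def kmer_profile(seq, k=4):
    """Count all overlapping k-mers in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def cosine_similarity(p, q):
    """Cosine similarity between two k-mer count profiles.
    1.0 for identical profiles, 0.0 for profiles sharing no k-mers."""
    shared = set(p) & set(q)
    dot = sum(p[m] * q[m] for m in shared)
    norm = math.sqrt(sum(v * v for v in p.values())) * \
           math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

a = kmer_profile("ATGGCGTACGTTAGC")
b = kmer_profile("ATGGCGTACGTTAGC")
c = kmer_profile("TTTTTTTTTTTTTTT")
# identical sequences give similarity 1.0; unrelated ones much lower
```

Unlike alignment, building and comparing these profiles is linear in sequence length, which is where the speed-up comes from.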

2. AlphaNeuro

2.1. Design

How can we find, analyse and understand the inner mechanisms of resistance, virulence, and gene transfer? Before starting to program the models, we needed to learn, understand and plan how our software had to be constructed. Therefore, the early stages of the project were based on researching studies that used artificial intelligence tools to examine genetic sequences. We found the work of Alexander Scarlat [8], who conducted a binary classification of gene sequences using supervised learning. From his project we learned two things: our sequences needed to be labeled, and a model based on deep learning networks could be a great tool.

With this in mind, we designed an ensemble of deep learning models using Convolutional Neural Networks (CNNs) that shared a common architecture. Each network had one unique job: classifying whether a sequence conferred resistance to a specific antibiotic.
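Before any such CNN can classify a sequence, the DNA string must be turned into a numeric tensor. A standard choice for this, sketched below, is one-hot encoding (we assume here that this matches what the networks consumed; the fixed input length is an illustrative parameter):

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq, length):
    """Encode a DNA sequence as a (length, 4) one-hot matrix,
    zero-padding or truncating to a fixed CNN input length."""
    mat = np.zeros((length, 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()[:length]):
        j = BASES.find(base)
        if j >= 0:          # ambiguous bases (e.g. N) stay all-zero
            mat[i, j] = 1.0
    return mat

x = one_hot("ACGTN", length=8)
# x.shape == (8, 4); the 'N' row and the padding rows are all zeros
```

The resulting matrices can be stacked into a batch and fed to any 1D-convolutional classifier.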

2.2. Build

How was this initial network constructed? The network was built and trained in an open Google Colaboratory notebook using Python. For more details about the construction of a CNN, check our Software page.

2.3. Test

Were the results good enough? After analysing the first results, we observed that the accuracies were insufficient: some of the CNNs made several misclassifications, and the predictions were inconsistent across models. The ensemble of CNNs was not able to give robust predictions, as in most cases the same sequence was classified as resistant to many antibiotic types.

2.4. Learn

Why wasn't the network classifying correctly? The strategy of building individual models for each antibiotic could not extract good enough patterns to correctly classify each sequence. Therefore, a new approach using a multi-class classification model could be a solution.

2.5. Re-design

How could we improve our initial design? With the first results and the lessons learned from them in mind, we decided to restructure the backbone of our system by designing a network in which the models were interconnected, so that the output of one CNN was the input of another. We also decided to include information about virulence and gene transfer at this point of development, as we learned that these types of genes have an impact on bacterial acquisition of antibiotic resistance.

The first layer of the system needed to be a multi-class CNN able to classify a sequence as a resistance, virulence, or promiscuity gene. Then, depending on the prediction, the sequence would be input to one of our specialized models: Resistance, Virulence, or Promiscuity. To learn more about this process, visit our Analysis-Focused Deep Learning site.
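The routing logic of this layered design can be sketched in plain Python. The branch names mirror the three classes above, but the `triage` and branch callables below are hypothetical stand-ins for the trained CNNs, not our actual models:

```python
def route_sequence(seq, triage_model, branch_models):
    """First-layer multi-class model decides the gene family, then
    the sequence is forwarded to the matching specialist model."""
    family = triage_model(seq)  # 'resistance' | 'virulence' | 'promiscuity'
    if family not in branch_models:
        raise ValueError(f"unknown family: {family}")
    return family, branch_models[family](seq)

# Illustrative stand-in models (NOT the trained networks)
triage = lambda s: "resistance" if "TEM" in s else "virulence"
branches = {
    "resistance": lambda s: {"ampicillin": 0.91},
    "virulence":  lambda s: {"toxin": 0.12},
    "promiscuity": lambda s: {"conjugation": 0.50},
}
family, scores = route_sequence("...TEM...", triage, branches)
```

The key design point is that each specialist model only ever sees sequences its triage layer assigned to it, so it can specialize on a narrower pattern space.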

3. ARIABuilder

3.1. Design

Assuming we had a set of candidate sequences that would be interesting to detect, the goal was to build a system that would find the spacer templates for gRNAs with the highest probability of success. The proposed strategy was to filter the sequences by exclusivity, locate the PAM sequences in them, and generate candidate templates from the adjacent nucleotides. These templates were then filtered according to their GC content, favoring sequences with intermediate values.
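The scanning and filtering steps can be sketched as follows. This is a simplified illustration: TTTV is the canonical LbCas12a PAM, but the spacer length and GC window used here are assumed example values, not ARIABuilder's exact thresholds:

```python
import re

def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def candidate_spacers(target, spacer_len=20, gc_range=(0.35, 0.65)):
    """Locate TTTV PAM sites and collect the adjacent spacer
    templates, keeping only those with intermediate GC content."""
    target = target.upper()
    out = []
    for m in re.finditer(r"TTT[ACG]", target):
        start = m.end()                      # spacer lies 3' of the PAM
        spacer = target[start:start + spacer_len]
        if len(spacer) == spacer_len and \
                gc_range[0] <= gc_content(spacer) <= gc_range[1]:
            out.append((m.start(), spacer))
    return out

hits = candidate_spacers("AATTTA" + "ACGT" * 5)
# one PAM at position 2 -> one 20-nt candidate spacer
```

A real pipeline would also scan the reverse complement and apply the exclusivity filter first; both are omitted here for brevity.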

3.2. Build

To implement this proposal, a first version of the program was built that took the sequences as input, sequentially applied the procedures mentioned above, and finally displayed a subset of the selected templates.

3.3. Test

To check the operation of the system, different sequences were introduced and the resulting templates were saved. When these were fed into external evaluation tools, the scores were relatively low.

3.4. Learn

These results showed that the single criterion used was not reliable, since the intrinsic properties of the target sequence were being ignored when determining its suitability.

3.5. Re-Design

In this context, a better criterion was needed when filtering the templates. The entire system was therefore redesigned to include the calculation of the Doench score, which takes into account the specific structure of the target sequence and the non-linear interactions of adjacent nucleotides. With this, the efficiency of the produced templates could finally be improved.

4. OmegaCore

4.1. Design

After working on the presence or absence of markers associated with resistance, it was important to propose a mechanism revealing how, from the bottom up, these elements make dangerous behaviors emerge. To do this, we proposed to build a 2D CNN that would analyze matrices like those our detections would produce and relate them to the risks their strains could present.

4.2. Build

We prepared a training notebook following these characteristics, and synthesized fictitious input data for a series of classes whose probability distributions we knew beforehand.

4.3. Test

When training with the simulated data, we observed that accuracy stabilized at mediocre values, and the confusion matrix was not as good as we expected.

4.4. Learn

We realized that, for the architecture we were working with, the search space was too large, which is why no amount of training improved the performance of our system. We needed a new way to approach the problem.

4.5. Re-Design

We finally determined that the best way to fix it was to break the problem down into simpler tasks. That is, instead of training one huge classifier covering all the possibilities, we could train a large set of small single-class CNNs and aggregate all their verdicts to evaluate a complex sample. This approach became the cornerstone of OmegaCore.
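The "divide and aggregate" idea can be sketched like this; the per-marker detectors below are hypothetical stand-ins for the small single-class CNNs, and the threshold is an example value:

```python
def aggregate_verdicts(sample, detectors, threshold=0.5):
    """Run every single-class detector on the sample and collect
    the markers whose score clears the threshold."""
    scores = {name: det(sample) for name, det in detectors.items()}
    positives = sorted(n for n, s in scores.items() if s >= threshold)
    return positives, scores

# Illustrative stand-in detectors (NOT trained networks)
detectors = {
    "blaTEM": lambda s: 0.93,
    "vanA":   lambda s: 0.10,
    "mecA":   lambda s: 0.71,
}
positives, scores = aggregate_verdicts("matrix-sample", detectors)
```

Each small network only has to answer one yes/no question, which shrinks the search space each model must cover during training.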


5.1. Design

For us, one key requirement was that ARIA be easily accessible to a hypothetical end user. This is why we designed a mechanism to analyze the samples produced by the biosensors automatically. For this, we designed a methodology that analyzed the images captured by the camera in search of basic geometric shapes, focusing on squares as indicators of the presence of objects of interest (the matrix and its cells).

5.2. Build

We built a prototype artificial vision system that carried out the analysis in two phases: first it looked for the matrix, and then it looked for each of its cells individually, scanning their content to determine whether they corresponded to positive or negative detections.

5.3. Test

When we tested the prototype, it appeared robust to lighting changes and rotations. However, it did not take long to verify that the system often failed to detect the cells near the edges.

5.4. Learn

After several experiments, our observations indicated that we could not trust the system to always find the small squares that made up the cells of the matrix, especially in the specific situations where we had seen the mechanism struggle.

5.5. Re-Design

To fix this problem, we rethought the second phase of the discovery process. Instead of searching for the cells individually, we would apply a real-time angle correction keeping the image aligned at all times, allowing us to impose a uniform grid. This methodology increased the robustness of the system and at the same time solved the edge problem, since the edge cells were automatically included when the grid was imposed.
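Once the image is angle-corrected, imposing the uniform grid reduces to simple arithmetic over the bounding box of the matrix. A sketch with numpy (the pixel coordinates and grid dimensions are example values; in the real pipeline they would come from the matrix-detection phase):

```python
import numpy as np

def grid_cells(x0, y0, x1, y1, rows, cols):
    """Split the axis-aligned bounding box of the detected matrix
    into a uniform rows x cols grid of cell bounding boxes."""
    xs = np.linspace(x0, x1, cols + 1)
    ys = np.linspace(y0, y1, rows + 1)
    return [
        (xs[c], ys[r], xs[c + 1], ys[r + 1])
        for r in range(rows)
        for c in range(cols)
    ]

cells = grid_cells(0, 0, 100, 100, rows=2, cols=2)
# 4 cells; edge cells are included by construction, which is
# exactly what the per-cell square search failed to guarantee
```

Each returned box can then be scanned for its positive/negative read-out without any further shape detection.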

6. Omega Architecture

6.1. Design

One of our last goals was to create a prototype of an internet-based system. To explore this possibility, we designed a simple client-server architecture in which the client sent the data in question so that the program on the server could post-process it.

6.2. Build

With this in mind, we created a structured communication based on sockets, in which the client, when activated, connected directly via IP to the server, which was in listening mode, and sent the data.

6.3. Test

In the first tests, we saw that messages were often not received correctly, depending on the run. We therefore evaluated point by point what was happening on both sides, looking for the source of the instability.

6.4. Learn

Analyzing the operation of the architecture on both sides, we realized it was key to understand how the encoding and decoding of messages depended on the type of information being sent or received (a string cannot be treated the same way as raw binary data). Therefore, communicating in a raw, unstructured way was not feasible, since the risk of information loss was very high.

6.5. Re-Design

To fix this problem, we designed a simple request-based communication protocol, in which client and server communicate in an orderly, bidirectional way to indicate to each other which operations they are going to execute. In this way, they can coordinate and exchange the necessary information in a secure and controlled manner.
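The core of such a protocol is framing: every message declares its length before the payload, so neither side has to guess where one message ends and the next begins. A minimal sketch (the 4-byte big-endian header shown is an illustrative layout, not necessarily the exact format we used):

```python
import struct

def pack_message(payload: bytes) -> bytes:
    """Prefix the payload with its length as a 4-byte big-endian int."""
    return struct.pack(">I", len(payload)) + payload

def unpack_messages(stream: bytes):
    """Split a received byte stream back into the original payloads."""
    out, i = [], 0
    while i < len(stream):
        (n,) = struct.unpack_from(">I", stream, i)
        out.append(stream[i + 4:i + 4 + n])
        i += 4 + n
    return out

wire = pack_message(b"hello") + pack_message(b"world!")
assert unpack_messages(wire) == [b"hello", b"world!"]
```

With explicit lengths on the wire, partial reads can be detected and retried instead of silently corrupting the data, which was exactly the failure mode we had observed.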


[1] Integrated DNA Technologies. (2021, July 30). Guide RNA design: Be on target!

[2] Campa, C. C., Weisbach, N. R., Santinha, A. J., Incarnato, D., & Platt, R. J. (2019). Multiplexed genome engineering by Cas12a and CRISPR arrays encoded on single transcripts. Nature Methods, 16(9), 887–893.

[3] East-Seletsky, A., O’Connell, M. R., Knight, S. C., Burstein, D., Cate, J. H. D., Tjian, R., & Doudna, J. A. (2016). Two distinct RNase activities of CRISPR-C2c2 enable guide-RNA processing and RNA detection. Nature, 538(7624), 270–273.

[4] Kellner, M. J., Koob, J. G., Gootenberg, J. S., Abudayyeh, O. O., & Zhang, F. (2019). SHERLOCK: nucleic acid detection with CRISPR nucleases. Nature Protocols, 14(10), 2986–3012.

[5] Nuclease detection and control. DNase alert kit. Integrated DNA Technologies.

[6] Chen, J. S., Ma, E., Harrington, L. B., da Costa, M., Tian, X., Palefsky, J. M., & Doudna, J. A. (2018). CRISPR-Cas12a target binding unleashes indiscriminate single-stranded DNase activity. Science, 360(6387), 436–439.

[7] Liang, M., Li, Z., Wang, W., Liu, J., Liu, L., Zhu, G., Karthik, L., Wang, M., Wang, K. F., Wang, Z., Yu, J., Shuai, Y., Yu, J., Zhang, L., Yang, Z., Li, C., Zhang, Q., Shi, T., Zhou, L., . . . Zhang, L. X. (2019). A CRISPR-Cas12a-derived biosensing platform for the highly sensitive detection of diverse small molecules. Nature Communications, 10(1).

[8] Scarlat, A. (2019). Predict antibiotic resistance with gene sequence.