Integrated Proof of Concept
Introduction
During the project, we started with a high-level idea and developed several systems, very different from each other, that would end up shaping it. In addition, we tested small interactions between these modules, trying to understand the best way to make them fit together. However, if our goal is to advance the system to a truly functional state, in which the end-to-end pipeline we propose becomes possible, we first need to pass a critical checkpoint: the integrated proof of concept.
As the name implies, this means concretely planning a pipeline from start to finish whose constituent units are only the systems we have developed to date. It is also necessary to specify the conditions under which each step is feasible. If completed successfully, this integrated proof of concept shows that the intended functionalities have been correctly transferred to specific mechanisms, and that we can therefore continue in this direction, both by increasing the number of embedded systems and by extending the capabilities of each one. On the other hand, and in an equally constructive way, the failures we detect will allow us to fix incompatibilities between systems and strengthen them in a much more secure and controlled environment than the future stage we are heading toward.
One of our main goals, however, was to prove that our modules work. For this reason, we present a specific proof of concept of the Alexandria biosensor library (see the Wetware proof of concept page).
Having said that, we present our integrated proof-of-concept proposal below. The numbers indicate the steps to be carried out sequentially, while the letters mark events that take place in parallel.
1. One of the most important problems we have faced during the computational part is obtaining the data needed to prepare our systems: it is scarce, difficult to access, of poor quality, or carries labels incompatible with our work. We have dedicated an enormous effort to finding, pre-processing, and curating this data, but for an integrated proof of concept, it has to go further.

Currently, we are trying to reach providers that can offer us two distinct elements. On the one hand, reliable genomes labeled according to their resistance or susceptibility to each of the antibiotics studied. On the other, a dataset of DNA sequences associated with a universal set of resistance mechanisms affecting those same antibiotics. If this condition is met, AlphaNeuro can be trained on the data that AlphaMine extracts, which is vital for the rest of the process.

As an extra requirement, the selected sequences must also be filtered: we can only work with those that the wet lab team can later integrate into plasmids; otherwise, there will be a breaking point in the subsequent steps.
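The exact delivery format will depend on the provider, so the layout below is purely hypothetical; it is only meant to make concrete what the two datasets look like in practice (the directory structure, file names, and TSV column order are illustrative, not a provider's actual format):

```python
# Hypothetical layout (the actual provider format is still open):
#   genomes/<antibiotic>/resistant/*.fasta
#   genomes/<antibiotic>/susceptible/*.fasta
#   mechanisms.tsv  ->  sequence_id <TAB> mechanism_label <TAB> sequence
import csv
from pathlib import Path

def load_mechanism_dataset(tsv_path):
    """Load (sequence, mechanism) pairs used to train AlphaNeuro."""
    pairs = []
    with open(tsv_path, newline="") as fh:
        for seq_id, label, sequence in csv.reader(fh, delimiter="\t"):
            pairs.append((sequence.upper(), label))
    return pairs

def list_genomes(root, antibiotic):
    """Return FASTA paths for one antibiotic, split by resistance label."""
    base = Path(root) / antibiotic
    return {label: sorted((base / label).glob("*.fasta"))
            for label in ("resistant", "susceptible")}
```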
2. Once the necessary genomes are obtained, the next step is to run AlphaMine on the pair of genome collections associated with each antibiotic. Although this system can calculate both the resistome and the vulnerome, for the rest of the approach we focus only on the former. In the end, we obtain a list of candidate genes acting as mechanisms associated with resistance.
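The details of AlphaMine are beyond this page; as an intuition-level sketch only, if each genome is reduced to a set of gene-family identifiers, the resistome/vulnerome distinction can be approximated with plain set algebra (the representation and function names below are ours, for illustration):

```python
def resistome(resistant, susceptible):
    """Gene families shared by every resistant strain and absent from all
    susceptible ones. Each genome is a set of gene-family identifiers."""
    return set.intersection(*resistant) - set.union(*susceptible)

def vulnerome(resistant, susceptible):
    """Mirror operation: gene families shared by every susceptible strain
    and absent from all resistant ones."""
    return set.intersection(*susceptible) - set.union(*resistant)

# Toy example: two resistant and two susceptible genomes.
r = [{"blaTEM", "gyrA", "recA"}, {"blaTEM", "gyrA", "rpoB"}]
s = [{"gyrA", "recA"}, {"gyrA", "rpoB"}]
print(resistome(r, s))  # {'blaTEM'}
```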
3. With the initial set of sequences in hand, they are passed through AlphaNeuro, which has been trained beforehand. Thanks to the generalization power of convolutional neural networks (CNNs), the system discards all the novel sequences that are innocuous, while the dangerous ones are classified according to the mechanisms with which they correlate. The result is a set of lists, each containing the sequences associated with a specific biomolecular mechanism.
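We do not reproduce AlphaNeuro's actual architecture here; the following is only a minimal sketch of the kind of 1D CNN involved, assuming one-hot-encoded DNA of a fixed length and a softmax over the mechanism classes plus one "innocuous" class (layer sizes are illustrative):

```python
import numpy as np
from tensorflow import keras

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}

def one_hot(seq, length=1000):
    """One-hot encode a DNA sequence, padded/truncated to a fixed length."""
    x = np.zeros((length, 4), dtype=np.float32)
    for i, base in enumerate(seq[:length]):
        if base in BASES:
            x[i, BASES[base]] = 1.0
    return x

def build_classifier(n_mechanisms, length=1000):
    """1D CNN: one softmax class per mechanism, plus one 'innocuous' class."""
    return keras.Sequential([
        keras.layers.Input(shape=(length, 4)),
        keras.layers.Conv1D(64, 12, activation="relu"),
        keras.layers.MaxPooling1D(4),
        keras.layers.Conv1D(128, 8, activation="relu"),
        keras.layers.GlobalMaxPooling1D(),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(n_mechanisms + 1, activation="softmax"),
    ])
```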
4. After the sequences have been classified, the lists are fed into ARIABuilder. For each class, this system selects a set of candidate sequences, following criteria of specificity with respect to individual mechanisms and of detection viability, such as the presence of a PAM. Once this is done, the PAMs are located in each of the selected sequences, and the corresponding candidate templates are generated.

Each of these templates is evaluated separately, keeping only the best one per sequence. Then, for each class, the group of sequences whose best template has the highest score is selected. In our current configuration, ARIA requires square arrays, so the number of sequences selected per class equals the total number of classes. The result is the design of the final array, with a gRNA spacer for each position.
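To make the PAM-location step concrete, here is a minimal sketch assuming an SpCas9-style NGG PAM and 20-nt spacers; the actual effector, PAM motif, and scoring criteria used by ARIABuilder may differ:

```python
import re

def candidate_spacers(target, spacer_len=20, pam=r"[ACGT]GG"):
    """Enumerate spacer candidates immediately 5' of each PAM occurrence.

    Scans the given strand only; a complete tool would also scan the
    reverse complement and score each candidate (GC content, secondary
    structure, off-targets, ...) to keep the best template per sequence.
    """
    candidates = []
    # Lookahead so that overlapping PAM sites are not missed.
    for m in re.finditer(r"(?=(" + pam + r"))", target):
        start = m.start() - spacer_len
        if start >= 0:
            candidates.append((target[start:m.start()], m.start()))
    return candidates

print(candidate_spacers("ACGTACGTACGTACGTACGTACGTAGGT"))
# [('ACGTACGTACGTACGTACGT', 24)]
```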
5. Once the best possible array has been designed and exported, all the available genomes are probed to evaluate which of its markers exist in each strain. In this way, arrays associated with the different combinations of mechanisms can be generated, each already labeled according to the antibiotic response of its reference genome.
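As a simplified illustration of this probing step, each genome can be reduced to a binary presence/absence fingerprint over the array's markers (a real pipeline would align with some mismatch tolerance rather than exact matching):

```python
def probe_genome(genome_seq, array_spacers):
    """Binary fingerprint: which of the array's markers occur in the genome.

    Naive exact matching; a real pipeline would allow mismatches.
    """
    return [1 if spacer in genome_seq else 0 for spacer in array_spacers]

# Each labeled training example for the next step is then simply
# (probe_genome(genome, spacers), antibiotic_response_of_that_genome).
```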
6. With the arrays thus separated, they are provided to OmegaCore, which distributes them according to the types of mechanisms present. From here, dozens of subunits, small specialized CNNs, are trained to detect the presence or absence of a specific microbial behavior: resistance to a specific antibiotic, intense pathogenic activity, a tendency to share genetic material, and so on. After training, the subunits are exported as h5 models and provided to the OmegaServer.
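As a sketch of what one such subunit could look like, assuming it consumes the n x n grid of array cell intensities as a single-channel image and emits a binary verdict (the architecture and file name are illustrative; only the h5 export format comes from the pipeline above):

```python
from tensorflow import keras

def build_subunit(array_side):
    """Tiny CNN: reads the array grid as a 1-channel image, binary verdict."""
    return keras.Sequential([
        keras.layers.Input(shape=(array_side, array_side, 1)),
        keras.layers.Conv2D(16, 3, activation="relu", padding="same"),
        keras.layers.MaxPooling2D(2),
        keras.layers.Conv2D(32, 3, activation="relu", padding="same"),
        keras.layers.Flatten(),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),  # behavior present / absent
    ])

model = build_subunit(array_side=8)  # side = number of classes (square array)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_arrays, train_labels, epochs=..., validation_split=0.1)
model.save("subunit_resistance_ampicillin.h5")  # h5 export for the OmegaServer
```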
A. In parallel, the gRNA spacers produced in step 4 are used as a reference to build the necessary biosensors following our laboratory protocols. As we have mentioned previously, in the final version of our system, the previous steps would have been carried out on a massive scale and in advance. This would imply that Alexandria, a huge set of generator cells that we can activate or deactivate to produce what we need, is already available. Unfortunately, for the integrated proof of concept we have not yet reached that stage, so we need to build and prepare a generating mechanism for each of the gRNAs we have extracted, which leads us to assemble a small, sui generis Alexandria.
B. Apart from the biosensors, the strains whose detection is to be emulated are selected. For each of them, a DNA fragment is synthesized and amplified that combines the target sequences of its array interspersed with stretches of random bases. These DNA chains will later play the role of the pathogen genome in a very simplified way. It is also possible to create chains that include positive target sequences from several arrays, thereby emulating multi-resistance.
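A minimal sketch of this surrogate construction, at the sequence-design level only (the gap length and function names are ours):

```python
import random

def build_surrogate(targets, gap=30, seed=None):
    """Concatenate the array's positive target sequences, interspersed with
    stretches of random bases, to emulate a pathogen genome in a very
    simplified way."""
    rng = random.Random(seed)
    def filler():
        return "".join(rng.choice("ACGT") for _ in range(gap))
    chunks = [filler()]
    for target in targets:
        chunks.extend([target, filler()])
    return "".join(chunks)

# Multi-resistance surrogate: pool positive targets from several arrays.
# build_surrogate(targets_array_1 + targets_array_2, seed=42)
```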
7. After allowing the generator cells to grow, the lysis mechanism is activated and the solutions containing each of the biosensors are purified. Subsequently, having printed the array grid on conventional paper, we apply a few drops of each biosensor-specific solution onto its array cell. We repeat this process as many times as pathogen surrogates have been prepared, so that there is one array per "strain" to analyze. Afterwards, we apply to each cell (and for all of the arrays) a few drops of the solution corresponding to its pathogen surrogate. We wait a few minutes, and the fluorescence reaction occurs.
8. Having reached this point, we start the OmegaServer on a computer A connected to a local network with internet access, so that the Omega Architecture is operational and ready to receive data. At the same time, we attach a simple UV filter to a smartphone and configure the device as a USB webcam by connecting it to a computer B on a different local network with internet access (two separate local networks are necessary to verify that the communication protocol is robust and works properly). On this last computer, we run the IRIS source code, set the IP of computer A, and indicate a contact email address. We prepare the lighting conditions and proceed to scan each of the arrays, which are automatically sent to the OmegaServer using the request protocol established by the Omega Architecture.
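The Omega Architecture's request protocol is specified elsewhere; purely for illustration, and assuming it rides on a plain HTTP POST, the client side could look like this (the endpoint path, form-field names, and addresses are placeholders):

```python
import requests  # assuming the Omega protocol rides on plain HTTP POST

OMEGA_SERVER = "http://192.0.2.10:8000/analyze"  # IP of computer A (example)
CONTACT = "team@example.org"                     # report destination

def send_array_scan(image_path):
    """Upload one scanned array; the server emails the report to CONTACT."""
    with open(image_path, "rb") as fh:
        resp = requests.post(
            OMEGA_SERVER,
            files={"array": fh},
            data={"contact_email": CONTACT},
            timeout=30,
        )
    resp.raise_for_status()
    return resp.json()  # assuming a JSON acknowledgement
```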
9. Every time the IRIS client makes a request, the array reaches the OmegaServer, and the system exposes it to each of the OmegaCore subunits, recording whether each verdict is positive or negative. Since every subunit is trained to report the presence or absence of a certain behavior, the end result is a profile of which antibiotics the pathogen will resist, what its level of virulence is, and how likely it is to share genetic material. After the CNN-based analysis, the majority resistance mechanisms in the sample are determined. From this, the known antibiotics with the least overlap are selected, that is, those least likely to be affected by the detected profiles. Finally, all this information is collected in a written report, which is sent by email to the stipulated contact address.
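A condensed sketch of this server-side loop, assuming the subunits are Keras h5 models and that each antibiotic is annotated with the mechanisms known to affect it (function names and the decision threshold are ours):

```python
import numpy as np
from tensorflow import keras

def profile_sample(array_image, subunit_paths, threshold=0.5):
    """Run every OmegaCore subunit on one scanned array and collect verdicts.

    array_image: np.ndarray of shape (side, side, 1), scaled to [0, 1].
    subunit_paths: {behavior_name: path_to_h5_model}.
    """
    x = array_image[np.newaxis, ...]  # batch of one
    verdicts = {}
    for behavior, path in subunit_paths.items():
        model = keras.models.load_model(path)  # cached once in a real server
        verdicts[behavior] = float(model.predict(x, verbose=0)[0, 0]) >= threshold
    return verdicts

def least_overlapping_antibiotics(verdicts, antibiotic_mechanisms, k=3):
    """Rank antibiotics by how few of the detected mechanisms affect them."""
    detected = {b for b, positive in verdicts.items() if positive}
    overlap = {ab: len(detected & set(mechs))
               for ab, mechs in antibiotic_mechanisms.items()}
    return sorted(overlap, key=overlap.get)[:k]
```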
10. After obtaining the results, the received report should be compared against the strains on which the pathogen surrogates were built. The objective is to evaluate the capacity of the entire system to infer the important information contained in the genetic material analyzed. This makes it possible to locate the sources of error and uncertainty so that we can fix them. In addition, this proof of concept reveals new facets of how the modules operate, which makes it easier to find new functions and approaches.
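For the comparison itself, even a simple per-behavior confusion count is enough to locate where the pipeline fails; a minimal sketch, assuming both the report and the ground truth are behavior-to-boolean mappings:

```python
def confusion_counts(predicted, ground_truth):
    """Per-behavior confusion counts between the system's verdicts and the
    known labels of the strains behind each pathogen surrogate. Both
    arguments map behavior names to booleans."""
    tp = sum(predicted[b] and ground_truth[b] for b in ground_truth)
    fp = sum(predicted[b] and not ground_truth[b] for b in ground_truth)
    fn = sum(not predicted[b] and ground_truth[b] for b in ground_truth)
    tn = sum(not predicted[b] and not ground_truth[b] for b in ground_truth)
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}
```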
Putting this entire structure together implies completing an engineering cycle at a new level, much more complete and complex than any of the others proposed. We therefore believe that making the end-to-end pipeline possible requires getting this proof of concept to work properly: it is the critical first step. That is why we have developed each of the modules mentioned with special care for flexibility, interaction, and adaptability, so that we can advance properly in the proposed direction.