What is Foldase?
Project Cargo's program is a simple-to-use, Python-based application that uses the open-source ViennaRNA package as its groundwork for folding and analyzing RNA secondary structures. The program can be set up by following the linked GitHub page, which gives a step-by-step guide to installing the required packages and includes an example for testing whether the program has been installed properly and is functioning.
Below we describe the functionality of the current Cargo program, version 1.0, in more detail and then discuss future iterations. So what is Cargo? As mentioned earlier, our software analyzes RNA sequences to identify the best possible position at which to insert an iron-responsive element (IRE) into a given 5’UTR. To do this, we used the ViennaRNA Python package to fold every candidate sequence produced by inserting the IRE at each possible position.
How it functions
This is done by breaking the sequence into three parts, the 5’UTR, the 3’UTR, and the rest of the sequence, all stored in FASTA format. Our program parses this file and stores each component, then iterates through the 5’ region, inserting the IRE at each possible position and storing each resulting sequence in a Python dictionary. This dictionary is used to generate a new FASTA file containing the modified sequences, each named after the position at which the IRE was inserted.
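A minimal sketch of this step follows. It assumes the input FASTA holds three records with the headers `5UTR`, `CDS`, and `3UTR` (hypothetical names chosen for illustration), that each full construct is assembled as modified 5’UTR + rest of the sequence + 3’UTR, and that candidates are named `pos_<i>`, where `i` is the number of 5’UTR nucleotides before the IRE. Cargo's actual file layout and naming may differ.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header (first word) -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(v).upper() for k, v in records.items()}


def insertion_candidates(utr5, ire, rest, utr3):
    """Insert the IRE at every position of the 5'UTR (0..len(utr5) inclusive)
    and return a dict mapping candidate name -> full construct sequence."""
    candidates = {}
    for pos in range(len(utr5) + 1):
        modified_utr5 = utr5[:pos] + ire + utr5[pos:]
        candidates[f"pos_{pos}"] = modified_utr5 + rest + utr3
    return candidates


def to_fasta(records, width=60):
    """Serialize a name -> sequence dict back to FASTA text."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    parts = parse_fasta(">5UTR\nGGAAC\n>CDS\nAUGGCC\n>3UTR\nUAAUUU\n")
    # "CCCC" is a placeholder, not a real IRE sequence.
    cands = insertion_candidates(parts["5UTR"], "CCCC", parts["CDS"], parts["3UTR"])
    print(to_fasta(cands))
```

A dictionary keeps the position-to-sequence mapping explicit, so the FASTA headers written out can be traced straight back to an insertion point.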
The resulting FASTA file is passed to Dr. Radecki's Fold-Fastalike program, which reformats ViennaRNA output for easier use. ViennaRNA writes the output of its functions as PostScript files, such as the structure graph and the base-pair probability plot, which we use to assess which candidate to choose. Fold-Fastalike takes in sequences in FASTA format, folds each one, and stores the results for each sequence in a separate folder containing a text version of the base-pair probabilities and a PDF version of the graph.
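Fold-Fastalike's internals are not described here, so the sketch below is an alternative, not its implementation: it calls the ViennaRNA Python bindings directly (`RNA.fold_compound`, `mfe`, `exp_params_rescale`, `pf`, `bpp`) to obtain base-pair probabilities as numbers instead of PostScript/PDF files. The function names and the 0.5 default threshold are assumptions for illustration. `bpp()` returns a 1-indexed, upper-triangular matrix, so the helper reads only entries with i < j.

```python
def fold_with_bpp(seq):
    """Fold one sequence with ViennaRNA and return (mfe_structure, mfe, bpp).

    Requires the ViennaRNA Python bindings. bpp is 1-indexed:
    bpp[i][j] is the probability that bases i and j pair (i < j).
    """
    import RNA  # imported lazily so high_prob_pairs works without ViennaRNA

    fc = RNA.fold_compound(seq)
    structure, mfe = fc.mfe()
    fc.exp_params_rescale(mfe)  # rescale Boltzmann factors to avoid numerical overflow
    fc.pf()                     # compute the partition function
    return structure, mfe, fc.bpp()


def high_prob_pairs(bpp, threshold=0.5):
    """Return {(i, j): p} for every pair with probability >= threshold.

    Only the upper triangle (i < j) is read; row/column 0 is unused.
    """
    n = len(bpp) - 1
    pairs = {}
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            p = bpp[i][j]
            if p >= threshold:
                pairs[(i, j)] = p
    return pairs
```

For example, `high_prob_pairs(fold_with_bpp(seq)[2], 0.8)` would list the strongly supported pairs of one candidate.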
We run the same process on the IRE itself, isolating its highest base-pair probabilities and recording which nucleotides pair together. For each candidate we then check those same pairs, with their positions shifted by the IRE's insertion point, to confirm that they line up, meaning the IRE's pairs are preserved in the full construct. We also focused on interactions between the 5’UTR and the 3’UTR, which we dubbed “global kissing”, that we observed in the Fold-Fastalike sequence graphs. A graph is generated in PDF format for every sequence, and the GitHub page outlines how to interpret them so that candidates with as few global interactions as possible can be selected. Through these pruning steps we isolated the 10 best candidates, which were sent to the wet lab team to make and work with, verifying the results of the dry lab pipeline.
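A minimal sketch of these pruning checks, assuming base-pair probabilities for the IRE and for each candidate are available as dictionaries mapping 1-indexed pairs `(i, j)` to probabilities. The rule used here (keep only candidates in which every IRE pair still has probability at least 0.5, then rank by the total probability of pairs joining the modified 5’UTR to the 3’UTR) and all function names are illustrative assumptions, not Cargo's exact criteria, which relied on inspecting the graphs.

```python
def shift_pairs(ire_pairs, insert_pos):
    """Map IRE-local 1-indexed pairs onto the construct, where the IRE was
    inserted after `insert_pos` nucleotides of the 5'UTR."""
    return {(i + insert_pos, j + insert_pos) for (i, j) in ire_pairs}


def ire_preserved(candidate_pairs, ire_pairs, insert_pos, min_prob=0.5):
    """True if every shifted IRE pair still pairs with probability >= min_prob."""
    return all(candidate_pairs.get(pair, 0.0) >= min_prob
               for pair in shift_pairs(ire_pairs, insert_pos))


def global_kissing_score(candidate_pairs, utr5_len, utr3_start):
    """Sum probabilities of pairs linking the modified 5'UTR (positions
    1..utr5_len, IRE included) to the 3'UTR (positions utr3_start..end)."""
    return sum(p for (i, j), p in candidate_pairs.items()
               if i <= utr5_len and j >= utr3_start)


def best_candidates(candidates, ire_pairs, top_n=10):
    """candidates: name -> (pairs, insert_pos, utr5_len, utr3_start).

    Drop candidates that break the IRE, then return up to top_n names
    ordered from the lowest to the highest global kissing score.
    """
    kept = []
    for name, (pairs, pos, utr5_len, utr3_start) in candidates.items():
        if ire_preserved(pairs, ire_pairs, pos):
            kept.append((global_kissing_score(pairs, utr5_len, utr3_start), name))
    kept.sort()
    return [name for _, name in kept[:top_n]]
```

Filtering first and ranking second mirrors the order described above: a candidate whose IRE no longer folds is discarded regardless of how little its UTRs interact.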
Future Directions
As for applying this program to other uses: with COVID-19 bringing mRNA vaccine research to the forefront of public interest, we believe our package could be used in such research. Because we build on the ViennaRNA library, we have access to a wide range of RNA-specific functions that future versions of our software can use for research purposes. Building on this, we intend to expand the current Cargo 1.0 software to support full automation and to let users tune hyperparameters to their needs. We believe in our software's ability to assist future mRNA research, and we plan to keep supporting and iterating on our work over the long term. We will use future data to improve the algorithm by adjusting the pruning parameters and by writing subroutines that parse each sequence's dot plot to produce numerical values for “global kissing”.
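One way the planned dot-plot subroutine might begin is sketched below. It assumes ViennaRNA's dot plot PostScript layout, in which each base-pair probability entry is a line of the form `i j value ubox`, with `value` equal to the square root of the pair probability; MFE entries (`lbox` lines) are skipped. Please check this against the dot plot files your ViennaRNA version writes. The function names and the 5’UTR/3’UTR boundary parameters are hypothetical.

```python
import re

# Matches data lines such as "12 87 0.316228 ubox" (i, j, sqrt(p), box type).
UBOX = re.compile(r"^\s*(\d+)\s+(\d+)\s+([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)\s+ubox\s*$")


def read_dot_plot(ps_text):
    """Return {(i, j): probability} from the 'ubox' lines of a dot plot file."""
    pairs = {}
    for line in ps_text.splitlines():
        m = UBOX.match(line)
        if m:
            i, j, sqrt_p = int(m.group(1)), int(m.group(2)), float(m.group(3))
            pairs[(i, j)] = sqrt_p ** 2  # stored value is sqrt(p)
    return pairs


def kissing_from_dot_plot(ps_text, utr5_len, utr3_start):
    """Total probability of pairs joining positions 1..utr5_len to utr3_start..end."""
    pairs = read_dot_plot(ps_text)
    return sum(p for (i, j), p in pairs.items() if i <= utr5_len and j >= utr3_start)
```

Reading the probabilities back out of the existing dot plots would turn the visual “global kissing” check into a number that pruning thresholds could be tuned against.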