Team:GO Paris-Saclay/Software

Software

Early in the project, while looking for a detection method, we had to choose between toehold switches and CRISPR/Cas. While that choice was still open, part of our team wrote software for toeholds. Although the project later changed direction, we decided to publish this code for other teams, and we also used it to analyze data for the iGEM UParis_BME team, evaluating the choice of toeholds for their system. The software was originally created to evaluate possible binding of microRNAs to each other before the reaction occurs. Because microRNAs are quite short, such binding can have a large impact on the overall response. We estimated the level of possible overlap in two situations: a simplified version with linear microRNA, and a version that takes secondary structure into account. Unfortunately, we could not run the secondary-structure estimate for lack of computing power (an estimated run time of about 1 year). If you have the computing power, you can test our program on your own microRNAs.

UParis_BME team testimony:

“GO_Paris_Saclay took the candidate libraries for detecting microRNA-21 and microRNA-141 that we utilised for our wet lab proof-of-concept cycle. Initially, we considered comparing the predictions provided by the Saclay colleagues with the fluorescence readouts produced by our bacterial system. Unfortunately, we haven’t had sufficient time to make our prototype work, so this part of the collaboration could not happen. Nevertheless, this algorithm may provide great help for teams designing their toehold switch candidate libraries in the future.”

Py-RNA: How does it work?

Py-arn is a set of functions for comparing RNAs. It can be used in two ways: through the CLI (command line interface) or through the web UI (web user interface).

What can you do with it?
- compare two miRNAs and visualize the differences between their sequences.

- compare two miRNAs and see how strongly they can stick together. The result is the percentage of nucleotides that can bind to each other at a given position (all positions are first tested linearly).

- the same as above, except that this time the program takes into account that RNA can form loops and "skip" nucleotides. You enter a threshold that you consider critical. For example, "it would be problematic if 80% of the nucleotides of the two sequences were bound at a given position": you enter threshold = 80%, and the program searches for positions at which this threshold is reached. When loops are considered, it stops at the first position found, to save time. This option is only available on the command line and has a functional problem that we describe in the Problem section below.

- the software can also calculate the probability that two miRNAs form hairpin (stem-loop) structures. In a diagnostic test, these structures could potentially interfere with our detection system. The program reports the positions at which the tested miRNA has the most nucleotides paired in a hairpin structure. A visualisation of this position is available in the graphical interface.
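The linear comparison in the list above can be sketched as follows. This is a minimal illustration, not the Py-arn code: the function names, the antiparallel orientation, the restriction to Watson-Crick pairs and the choice to express the percentage relative to the shorter sequence are all assumptions made for the example.

```python
# Watson-Crick pairs only (an assumption); replace "U" with "T" to handle DNA.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def binding_by_offset(seq1, seq2, pairs=PAIRS):
    """Slide seq2 (read antiparallel) along seq1 and score every offset.

    Returns a list of (offset, percent), where percent is the share of the
    shorter sequence's nucleotides that face a complementary base.
    """
    seq1, seq2 = seq1.upper(), seq2.upper()
    rev2 = seq2[::-1]  # the second strand runs in the opposite direction
    shortest = min(len(seq1), len(seq2))
    results = []
    # Test every alignment in which at least one nucleotide overlaps.
    for offset in range(-(len(rev2) - 1), len(seq1)):
        paired = 0
        for j, base2 in enumerate(rev2):
            i = offset + j
            if 0 <= i < len(seq1) and (seq1[i], base2) in pairs:
                paired += 1
        results.append((offset, 100.0 * paired / shortest))
    return results


def worst_offset(seq1, seq2):
    """Return the (offset, percent) with the highest pairing percentage."""
    return max(binding_by_offset(seq1, seq2), key=lambda item: item[1])
```

For example, `worst_offset("AUGC", "GCAU")` reports full pairing at offset 0, because GCAU is the reverse complement of AUGC. The cost grows only with the product of the two lengths, which is why this linear mode stays fast.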

The program works for both RNA and DNA; switching between them only requires changing the corresponding letter in the code. The third comparison (the one that accounts for loops) is limited by computation time, which grows as the threshold is raised or as the sequences get longer.
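The hairpin check mentioned in the list above could be approximated by a simple stem search. The sketch below is a simplified stand-in, not the Py-arn algorithm: it only looks for perfect stems (no bulges or mismatches), and the function name, the `min_loop` parameter and the pairing table are hypothetical.

```python
# Watson-Crick pairs only (an assumption); replace "U" with "T" to handle DNA.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def best_hairpin(seq, min_loop=3, pairs=PAIRS):
    """Find the longest perfect stem that closes a loop in a single sequence.

    Returns (stem_pairs, start, end), where start and end are the 0-based
    indices of the outermost paired bases, or (0, -1, -1) if nothing pairs.
    """
    seq = seq.upper()
    n = len(seq)
    best = (0, -1, -1)
    for left in range(n):
        # left and right are the innermost pair, leaving >= min_loop bases unpaired.
        for right in range(left + min_loop + 1, n):
            i, j, stem = left, right, 0
            # Grow the stem outwards while the facing bases are complementary.
            while i >= 0 and j < n and (seq[i], seq[j]) in pairs:
                stem += 1
                i -= 1
                j += 1
            if stem > best[0]:
                best = (stem, i + 1, j - 1)
    return best
```

For the sequence `GGGGAAAACCCC`, the function finds a four-pair stem spanning indices 0 to 11 around the AAAA loop.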

Here is the project link.

It is an open source project, so contributors can add or improve the features they would like to have in the software.
Py-arn is easy to use. With the web UI, you choose the comparison method, enter two RNAs (or one), click a button and get the result.

For the CLI, here are three examples:
python position.py --sequence1=ARN1 --sequence2=ARN2
The first runs the comparison by position and writes the analysis to the default file, /tmp/log.txt.
python position.py --sequence1=ARN1 --sequence2=ARN2 --verbose
The second runs the comparison by position, writes the analysis to the default file, /tmp/log.txt, and also prints it to stdout (it is displayed in the terminal).
python position.py --sequence1=ARN1 --sequence2=ARN2 --log_output=/path/output.txt
The third runs the comparison by position and writes the analysis to a file chosen by the user. Here the file is named output.txt and is created in /path/.
The other CLI comparisons follow the same logic.
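To make the option handling concrete, here is a minimal sketch of how a script could accept the three options shown above using Python's standard `argparse` module. It is not the actual `position.py`: the helper names and the toy `compare_by_position` report are assumptions; only the option names and the default log path come from the examples.

```python
import argparse

DEFAULT_LOG = "/tmp/log.txt"  # default log file named in the examples above


def build_parser():
    """Declare the three options used in the CLI examples."""
    parser = argparse.ArgumentParser(description="Compare two RNAs by position.")
    parser.add_argument("--sequence1", required=True, help="first RNA sequence")
    parser.add_argument("--sequence2", required=True, help="second RNA sequence")
    parser.add_argument("--verbose", action="store_true",
                        help="also print the analysis to stdout")
    parser.add_argument("--log_output", default=DEFAULT_LOG,
                        help="file that receives the analysis")
    return parser


def compare_by_position(seq1, seq2):
    """Stand-in report (hypothetical): one line per index where the sequences differ."""
    lines = []
    for i in range(max(len(seq1), len(seq2))):
        a = seq1[i] if i < len(seq1) else "-"
        b = seq2[i] if i < len(seq2) else "-"
        if a != b:
            lines.append(f"{i}: {a} != {b}")
    return "\n".join(lines)


def main(argv=None):
    """Parse the options, write the report to the log file, optionally echo it."""
    args = build_parser().parse_args(argv)
    report = compare_by_position(args.sequence1, args.sequence2)
    with open(args.log_output, "w") as handle:
        handle.write(report + "\n")
    if args.verbose:
        print(report)
    return report
```

Calling `main(["--sequence1=AUGC", "--sequence2=AUCC", "--verbose"])` would write the report to the default log file and print it to the terminal.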

Our results

The percentages of pairing between nucleotides remain reasonable. Our software confirms our choice of miRNA and can be applied to test other miRNA markers, notably those revealed by our mathematical model.

Problem

The big problem with py-arn is the loop comparison: it has to compare all the possible overlaps of the two RNAs.

Imagine two RNAs of 25 nucleotides each. The program then has to examine 25! × 25! possibilities, that is (1.5511210043330985984 × 10^25) × (1.5511210043330985984 × 10^25), or roughly 2.4 × 10^50. With the current implementation, this would have taken several thousand years, even with multithreading. In practice, this is impossible.
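The size of this search space is easy to check with a few lines of Python. The snippet only reproduces the count stated above (the product of the two factorials); the function name `search_space` is hypothetical.

```python
import math


def search_space(len1, len2):
    """Number of possibilities as counted above: len1! * len2!."""
    return math.factorial(len1) * math.factorial(len2)


per_rna = math.factorial(25)   # possibilities for one 25-nucleotide RNA
total = search_space(25, 25)   # possibilities for the pair

print(per_rna)           # exact integer value of 25!
print(f"{total:.3e}")    # order of magnitude of 25! * 25!
```

Because the count grows factorially, adding even one nucleotide to each sequence multiplies the work by 26 × 26 = 676, which is why exhaustive enumeration stops being practical very quickly.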

Improvement

One possible improvement would be to use algorithms such as ACO (Ant Colony Optimization) for the loop comparison. However, ACO algorithms are mainly used for pathfinding, so we are not sure they apply to our case, but it is an avenue worth exploring.
Another option would be a specialized algorithm that excludes cases according to the rules we already know.

Another big improvement would be to add API documentation (e.g. via Swagger). Py-arn exposes a REST API when the web interface is used. Users who want to build their own user interface need to know the paths, the endpoint parameters and the response format.

It would also help developers if a Dockerfile were added. A Dockerfile describes how to build a container image of a service or piece of software, together with the files it needs to run, and is widely used in computing. It makes it possible to quickly create an instance of a service in a group of services (a cluster).

The final improvement would be to split the application into two web services, one for the REST API and one for the web interface. Separating the two services would allow them to be installed on two different servers.
