Team:UIUC Illinois/Package

Package


We wanted to develop a machine learning model capable of generating novel amino acid sequences. During our research, we found that the 2019 Toronto iGEM team had attempted this using a recurrent neural network (RNN) together with UniRep, a deep learning model for protein engineering. Their pipeline did generate sequences, but we had a hard time evaluating its output and determining which sequences were viable candidates for validation.

In addition to reviewing some of the errors in their pipeline, we wanted to improve the model’s workflow by increasing the number of sequences in the RNN’s training dataset. We also made the output more selective by letting the end user choose how many sequences are generated. These reworks led us to a new avenue of contribution: a Python package. The package lets us extend the pipeline’s use cases and makes it applicable to projects beyond our own.

With this package, the pipeline’s functionality is expanded and fully customizable. End users can apply it to their own projects by supplying their own training data, filters, and stability indices, which lets them adapt the pipeline to enzymes or proteins other than PETase.
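To make the idea of user-supplied filters and stability indices concrete, here is a minimal sketch in plain Python. It does not show the package's actual interface: the names `is_valid_protein`, `hydrophobic_fraction`, and `select_sequences` are hypothetical, the candidate sequences are made up, and the scoring function is only a toy placeholder standing in for a real stability index.

```python
from typing import Callable, Iterable, List

# The 20 standard amino acids in one-letter code.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def is_valid_protein(seq: str) -> bool:
    """Hypothetical filter: non-empty and made only of standard residues."""
    return len(seq) > 0 and set(seq) <= VALID_RESIDUES


def hydrophobic_fraction(seq: str) -> float:
    """Toy placeholder score, NOT a real stability index.

    A user would swap in whatever stability measure suits their protein.
    """
    return sum(seq.count(r) for r in "AILMFVW") / len(seq)


def select_sequences(
    candidates: Iterable[str],
    filters: Iterable[Callable[[str], bool]],
    score: Callable[[str], float],
    n: int,
) -> List[str]:
    """Keep candidates that pass every filter, then return the n best by score."""
    filters = list(filters)
    kept = [s for s in candidates if all(f(s) for f in filters)]
    return sorted(kept, key=score, reverse=True)[:n]


# Made-up generated sequences; "MKXLA" contains a non-standard residue.
candidates = ["MKVLAAGIW", "MKXLA", "MSTNPKPQR", "AILMFVWAA"]

top = select_sequences(
    candidates,
    filters=[is_valid_protein, lambda s: len(s) >= 8],
    score=hydrophobic_fraction,
    n=2,  # the user chooses how many sequences to keep
)
print(top)
```

Passing the filters and the score as ordinary functions is one simple way to keep such a pipeline independent of any particular protein: adapting it to a different enzyme would only mean supplying different callables and training data.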

The package is available here.