See how we used Hidden Markov Models, Structure prediction, Molecular Dynamics simulations, and Molecular Docking to draw important conclusions and increase our knowledge of our project.

Searching for Novel Cyclotides Using Hidden Markov Models


Hidden Markov Models (HMMs) are probabilistic models which can be used for searching sequence databanks for homologous sequences to a set of given sequences. Using this technique, one can find related peptides to already known ones. It works by analyzing a set of sequences and then building a graph (also known as an HMM Profile) which includes the probabilities of a certain amino acid being at a certain place in the sequence. Using the resulting HMM profile, one can calculate the probability that a new sequence is homologous to the ones that were used as an input for the profile. It now becomes possible to search a sequence database through automated search.

We chose to use HMMs compared to other searching-techniques such as BLAST, because they give more control over which amino acids are most relevant in the desired sequences. E.g. all cyclotides have 6 cysteine residues at certain points in their sequence and therefore they are very important when searching for cyclotides. To ensure such motives were taken into account, we used HMMs.


We aimed to use Hidden Markov Models to find new related cyclotides to a set of antimicrobial cyclotides that fitted our criteria, namely high antimicrobial activity and optimally low toxicity. These cyclotides would then have been used in our expression system.

The original idea to use HMMs to search for homologous peptides was proposed by Prof. Nadine Ziemert, who kindly answered our questions about genome mining (the process of finding specific sequences in large genomes) and our project in general.

As guidance for our search, we were inspired by this paper 1 , in which the authors used translated genomes and HMMs to search for novel defensin-like genes, which are another class of antimicrobial peptides.


To prepare our search, we chose cyclotides for which antimicrobial activity had been tested and which optimally did not show toxicity for human cells 2 3 . The cyclotides are found in table 1.

Name Sequence
Table 1: Our input cyclotides.

As for the database to search for new cyclotides, we chose the genomes of the plants in which historically cyclotides had already been found 4 , as this would increase our chance of finding new ones. They can be found in table 2.

Family Name
Table 2: Plant families with proven cyclotide existence.

Another important assumption was that the genetic code for the cyclotides that we searched for did not have introns, otherwise, we would have run into problems when translating the genomes in all 6 reading frames and then performing the search over the resulting amino acid sequences, without first splicing out the introns. 5

We also searched using only the sequence of the cyclotides themselves and not the whole cyclotide precursor (which includes sequences that are relevant for processing in the cell). This would give our model greater range and flexibility.

Performing the Search

1. Finding Our Profile Cyclotides

To find suitable cyclotides for our profile (which we would use to search for homologous sequences), we searched different peptide databases. We found that the best resource to find cyclotides was the Cybase database, which specializes in cyclic peptides. Using their search tool and filtering for cyclotides on which antibacterial assays had been performed, we found the fitting candidates Circulin B and Cycloviolacin O2 . Using further literary research, we also found Cyclopsychotride A . 6

2. Finding Genomes and Proteomes

As a next step, we had to find the genomes and proteomes from members of the plant families listed in table 2. We got the genomes and available proteomes of the families Cucurbitaceae, Fabaceae, Rubiaceae, Solanaceae from NCBI Genome Datasets 7 . We also only found a single genome of the Violaceae family, which we also got from NCBI Genome Datasets.

In addition to the proteomes that were already available to download, we also used the genomes that we could find and translated them in 6 reading frames. For this purpose, we wrote a custom Python script that utilizes the Biopython 8 package. It takes the genomes as an input and outputs the translated protein sequences in all 6 reading frames. It can be found below.

from Bio.SeqIO.FastaIO import SimpleFastaParser
from Bio.Seq import Seq
from os.path import join
from os import getcwd
from pathlib import Path

# Finds all subfiles containing ".fna" in their name
def find_sequences(save_array, search_path):
    for path in Path(search_path).rglob('*.fna'):

# Find all sequences
print("Collecting all sequence files...")
dna_sequences = []
find_sequences(dna_sequences, join(getcwd(), "genomes"))

# Translate all sequences
s_i = 0
for sequence in dna_sequences:
    s_i += 1
    print("Sequence %d/%d:" % (s_i, len(dna_sequences)))
    print("Translating " + sequence + " in all 6 reading frames...")
    with open(sequence, 'r') as in_handle:
        with open(join("output", "all_sequences_6_frames.fasta"), 'a') as out_handle:
            for title, dna_letters_full in SimpleFastaParser(in_handle):
                # Split DNA into chunks of 260000 NTs to not reach HMM sequence length limit
                dna_letters_split = []
                for j in range(0, len(dna_letters_full), 250000):
                if len(dna_letters_split) > 2:
                    print("Split sequence of %d NTs into %d chunks." % (len(dna_letters_full), len(dna_letters_split)))
                # Translate all chunks
                chunk = 0
                for dna_letters in dna_letters_split:
                    seq = Seq(dna_letters)
                    reverse_seq = seq.reverse_complement()
                    for i in range(0,3):
                        out_handle.write(">%s\n%s\n\n" % (title + "_chunk_" + str(chunk) + "_F" + str(i), seq[i:].translate()))
                    for i in range(0,3):
                        out_handle.write(">%s\n%s\n\n" % (title + "_chunk_" + str(chunk) + "_R" + str(i), reverse_seq[i:].translate()))
                    chunk += 1

3. Building our HMM profile

To build our HMM profile from our cyclotides, we used the Clustal Omega 9 tool to create a Multiple Sequence Alignment of our cyclotides as seen in figure 1.

Multiple Sequence Alignment of our Cyclotides
Figure 1: Multiple Sequence Alignment of our Cyclotides

Following that, we used HMMer version 3.3.2, a command-line tool for performing HMM searches and building HMM profiles, to build our HMM profile.

4. Searching the Protein Sequences

To search the resulting protein sequences, we also wrote a python script that would collect all protein sequences and then call HMMer’s hmmsearch command to perform the actual search. The script can be found below.

from os import walk, getcwd
from os.path import join, split as split_path
import subprocess
from os.path import join
from os import getcwd
from pathlib import Path

# Finds all subfiles containing "protein" in their name
def find_proteins(save_array, search_path):
    for path in Path(search_path).rglob('protein*'):

# Lists all hmms in "./hmm_models"
def list_hmms():
    files = []
    for (dirpath, dirnames, filenames) in walk(join(getcwd(), "hmm_models")):
        for name in filenames:
            if name.endswith(".hmm"):
                files.append(join(getcwd(), "hmm_models", name))
    return files

# Find all protein sequences in subdirectories in "./sequences"
print("Collecting all protein files...")
protein_files = []
find_proteins(protein_files, join(getcwd(), "sequences"))

hmms = list_hmms()

# Concatenate for search
if (len(protein_files) > 1):
    # Concatenate all found protein sequences into one file
    print("Writing all protein sequences to single file...")
    with open(join(getcwd(), "output", "all_proteins.fasta"), 'w') as outfile:
        for fname in protein_files:
            with open(fname) as infile:
                for line in infile:
    # Run all hmms in "./hmm_models"
    print("Running all hmms...")
    for hmm in hmms:["hmmsearch", "-o", join(getcwd(), "output", (split_path(hmm)[-1] + ".out")), hmm, join(getcwd(), "output", "all_proteins.fasta")])
    print("Only single protein file found, no need to concatenate...")
    # Run all hmms in "./hmm_models"
    print("Running all hmms...")
    for hmm in hmms:["hmmsearch", "-o", join(getcwd(), "output", (split_path(hmm)[-1] + ".out")), hmm, protein_files[0]])

We used the HMM profile we built from our cyclotides and also the cyclotide family profile provided by PFAM 10 , a database of HMM models which are used to automatically categorize proteins to their protein families.


After performing our search, we found several potential cyclotide sequences in the translated genomes of the Solanaceae and Violaceae families. We categorized them on whether we were able to find their sequence in the Cybase database or not. The number of members in each category can be found in table 3.

Violaceae Solanaceae
Sequences already in Cybase (Our profile) 8 6
Sequences not yet in Cybase (Our profile) 88 49
Sequences already in Cybase (PFAM profile) 10 3
Sequences not yet in Cybase (Pfam profile) 202 14
Table 3: Amounts of found cyclotide sequences.

A file with all the found sequences we could find in Cybase and a file with the newly found ones are located here .

There is a great number of sequences we identified as potentially novel. Still, this does not mean that they actually are real cyclotides. Further possible filter criteria, like the existence of a signal sequence in close proximity to the cyclotide, could be applied in further steps (as was done in the paper that served as our inspiration 1 ). However, we did not do that, because, as described in the next section, we reevaluated our project and went in a different direction.

Effects on Our Project

Our original idea was to use the newly found cyclotides in our expression system to characterize them and to possibly find new antibacterial ones. However, since our vision for our project shifted more towards grafting existing antimicrobial peptides (AMPs) into the cyclotide framework, we no longer used/needed the found sequences. Still, it was great to show that we could find possible new cyclotides using this methodology.


1 Silverstein, Kevin A. T.; Graham, Michelle A.; Paape, Timothy D.; VandenBosch, Kathryn A. (2005): Genome organization of more than 300 defensin-like genes in Arabidopsis. In: Plant physiology 138 (2), S. 600–610. DOI: 10.1104/pp.105.060079.

2 Tam, J. P.; Lu, Y. A.; Yang, J. L.; Chiu, K. W. (1999): An unusual structural motif of antimicrobial peptides containing end-to-end macrocycle and cystine-knot disulfides. In: Proceedings of the National Academy of Sciences of the United States of America 96 (16), S. 8913–8918. DOI: 10.1073/pnas.96.16.8913.

3 Pränting, Maria; Lööv, Camilla; Burman, Robert; Göransson, Ulf; Andersson, Dan I. (2010): The cyclotide cycloviolacin O2 from Viola odorata has potent bactericidal activity against Gram-negative bacteria. In: The Journal of antimicrobial chemotherapy 65 (9), S. 1964–1971. DOI: 10.1093/jac/dkq220.

4 Troeira Henriques, Sónia; Craik, David J. (2017): Cyclotide Structure and Function: The Role of Membrane Binding and Permeation. In: Biochemistry 56 (5), S. 669–682. DOI: 10.1021/acs.biochem.6b01212.

5 Nguyen, Giang Kien Truc; Zhang, Sen; Nguyen, Ngan Thi Kim; Nguyen, Phuong Quoc Thuc; Chiu, Ming Sheau; Hardjojo, Antony; Tam, James P. (2011): Discovery and characterization of novel cyclotides originated from chimeric precursors consisting of albumin-1 chain a and cyclotide domains in the Fabaceae family. In: The Journal of biological chemistry 286 (27), S. 24275–24287. DOI: 10.1074/jbc.M111.229922.

6 Tam, J. P.; Lu, Y. A.; Yang, J. L.; Chiu, K. W. (1999): An unusual structural motif of antimicrobial peptides containing end-to-end macrocycle and cystine-knot disulfides. In: Proceedings of the National Academy of Sciences of the United States of America 96 (16), S. 8913–8918. DOI: 10.1073/pnas.96.16.8913.

7 Agarwala, Richa; Barrett, Tanya; Beck, Jeff; Benson, Dennis A.; Bollin, Colleen; Bolton, Evan et al. (2018): Database resources of the National Center for Biotechnology Information. In: Nucleic acids research 46 (D1), D8-D13. DOI: 10.1093/nar/gkx1095.

8 Cock, Peter J. A.; Antao, Tiago; Chang, Jeffrey T.; Chapman, Brad A.; Cox, Cymon J.; Dalke, Andrew et al. (2009): Biopython: freely available Python tools for computational molecular biology and bioinformatics. In: Bioinformatics (Oxford, England) 25 (11), S. 1422–1423. DOI: 10.1093/bioinformatics/btp163.

9 Madeira, Fábio; Park, Young Mi; Lee, Joon; Buso, Nicola; Gur, Tamer; Madhusoodanan, Nandana et al. (2019): The EMBL-EBI search and sequence analysis tools APIs in 2019. In: Nucleic acids research 47 (W1), W636-W641. DOI: 10.1093/nar/gkz268.

10 Mistry, Jaina; Chuguransky, Sara; Williams, Lowri; Qureshi, Matloob; Salazar, Gustavo A.; Sonnhammer, Erik L. L. et al. (2021): Pfam: The protein families database in 2021. In: Nucleic acids research 49 (D1), D412-D419. DOI: 10.1093/nar/gkaa913.

Structure Prediction of Our Grafted Cyclotides


To be able to run different models with our constructs, we needed the 3-dimensional structure of our grafted cyclotides. Now because the sequences we constructed have never existed anywhere before, we needed a way to create 3-dimensional structures from the sequences alone. Luckily, through the recent advance of protein folding structure prediction, it has become relatively easy to do this. We used different tools to create our structures that use different approaches. First, we used I-TASSER 11 , which works by searching for similar sequences with known structures and uses these “structural templates” to create the structure of the new sequence. Later on, we used a version of the new AlphaFold 12 , which leverages machine learning to predict structures and is considered highly accurate.


Our first aim was to see whether the sequences we were designing resulted in sensible structures, before ordering the sequences and beginning our work with them in the Wetlab.

Secondly, we needed the 3-dimensional structures of our peptides to run our Molecular Dynamics simulations and Molecular Docking.

Building the Structures

Wetlab Screening

To check whether our sequences would yield sensible structures while designing them, we predicted their structures using I-TASSER and compared them to the nuclear magnetic resonance (NMR) structure, an experimentally determined structure, of our scaffold MCoTI-II (which we got from the PDB (1IB9) 13 ) . We calculated the root-mean-square-deviation (RMSD), a measure as to how far the atoms of two structures are spaced apart at average, given in Ångström, using PyMOL 14 and its “align” command.

Structure Prediction for Our Models

To predict the final structures with which we would run our models, we used AlphaFold. After predicting the structures, we wrote a script that utilizes the Python package Modeller 15 to cyclize the structures, meaning it joins their termini. The final cyclized structures were compared to the wild-type version of MCoTI-II using the same procedure as above. For wild-type KR-12 we used a structure from PDB (2NA3) 16 and for wild-type CHEN we predicted the structure using I-TASSER.


Wetlab Screening

The RMSDs of the tested structures are listed in table 4. (Please note that the constructs are not yet cyclic in the calculations, meaning their termini are not yet joined, but rather the predicted structures themselves).

Construct RMSD vs. MCoTI-II (Å)
c_CHEN_1 (w/o His-tag) 1.096
c_CHEN_1 (w/o His-tag, w/o GSG-linkers) 1.426
c_CHEN_1 1.653
c_CHEN_6 (w/o His-tag) 1.193
c_2xCHEN_1,6 (w/o His-tag) 0.929
c_2xCHEN_1,6 1.274
c_CHEN_1_KR-12_6 (w/o His-tag) 0.831
c_CHEN_1_KR-12_6 1.101
c_2xKR-12_1,6 (w/o His-tag) 1.313
c_2xKR-12_1,6 1.910
c_CHEN_5 (w/o His-tag, w/o GSG-linkers) 1.058
Table 4: RMSDs of our constructs VS MCoTI-II in Ångström

Two of the structures are shown in figures 2 and 3.

Figure 2: c_CHEN_1 (w/o His-tag) (gray, CHEN pink) VS MCoTI-II (cyan)
Figure 3: c_CHEN_1_CHEN_6 (gray, CHEN pink, His-tag orange) VS MCoTI-II (cyan)

All the predicted structures were sensible regarding RMSD, as an RMSD in the range of 1-2 Å means that the structures are similar. Also, the relevant structures displayed the right amount of disulfide bonds.

Structure Prediction for Our Models

The RMSDs of our final structures compared to MCoTI-II can be found in table 5.

Construct RMSD vs. MCoTI-II (Å)
c_CHEN_1 1.432
c_CHEN_6 1.297
c_CHEN_1_KR-12_6 1.056
c_2xCHEN_1,6 7.180
c_KR-12_1 1.727
c_KR-12_6 1.250
c_KR-12_1_CHEN_6 1.832
c_2xKR-12_1,6 8.392
c_blank 1.320
Table 5: RMSDs of our final structures compared to MCoTI-II in Ångström

The structures for c_CHEN_1_CHEN_6 and c_KR-12_1_KR-12_6 did not yield sensible structures when compared to MCoTI-II and were therefore not used in our models. All other structures showed a sensible RMSD and the right number of disulfide bonds and were therefore used for our models. Two of the structures can be seen in figures 4 and 5.

Figure 4: c_CHEN_1 (gray, CHEN pink, His-tag orange) VS MCoTI-II (cyan)
Figure 5: c_CHEN_1_KR-12_6 (gray, CHEN pink, KR-12 blue, His-tag orange) VS MCoTI-II (cyan)

Effects on Our Project

Wetlab Screening

It was important for us to see that when we designed our construct sequences, they would yield promising structures. Therefore, using structure prediction as a way to back us up on choices such as inserting linker sequences between cysteines and the AMP-sequences, as well as the placement of the AMPs in the different loops of the scaffold, was a useful way to ensure this. Of course, the confidence we could draw from this process was limited, as there most likely exists a bias towards predicting similar structures to the natural ones when using natural structure templates (like I-TASSER does). Still, as information on the topic of grafting AMP sequences into cyclotides was and is limited, the procedure was of great use at the time. We chose to use the sequences with a His-tag in loop 5 and the AMP sequences in loops 1 and 6, flanked by linkers.

Structure Prediction for Our Models

Using structure prediction we removed non-sensible structures from our models and were now able to run them with the sensible structures of 7 of our constructs, therefore increasing the chances of representative models.


11 Yang, Jianyi; Zhang, Yang (2015): I-TASSER server: new development for protein structure and function predictions. In: Nucleic acids research 43 (W1), W174-81. DOI: 10.1093/nar/gkv342.

12 Jumper, John; Evans, Richard; Pritzel, Alexander; Green, Tim; Figurnov, Michael; Ronneberger, Olaf et al. (2021): Highly accurate protein structure prediction with AlphaFold. In: Nature. DOI: 10.1038/s41586-021-03819-2.

13 Felizmenio-Quimio, M. E.; Daly, N. L.; Craik, D. J. (2001): Circular proteins in plants: solution structure of a novel macrocyclic trypsin inhibitor from Momordica cochinchinensis. In: The Journal of biological chemistry 276 (25), S. 22875–22882. DOI: 10.1074/jbc.M101666200.

14 The PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC.

15 Sali, A.; Blundell, T. L. (1993): Comparative protein modelling by satisfaction of spatial restraints. In: Journal of Molecular Biology 234 (3), S. 779–815. DOI: 10.1006/jmbi.1993.1626.

16 Backbone-Cyclized Stable Peptide-Dimers Derived from the Human Cathelicidin ll-37 Mediate Potent Antimicrobial Activity.

Molecular Dynamics Simulations to Understand Membrane Interaction


Molecular Dynamics simulation is the process of computing the interactions of different atoms/molecules over a certain timeframe and then visualizing/analyzing the resulting data. The interactions between different atoms are computed using force fields, which are databases with definitions of how strong and with what kind of force certain atoms interact with each other. It is possible to compute all-atom systems, where all atoms are computed as individual entities, or to computes coarse-grained systems, where certain groups of atoms are united into one computational entity, and it becomes, therefore, easier and less calculation-expensive to compute the systems. Coarse-grained simulations are less accurate, however, because atoms no longer act as individual entities. We chose to leverage both coarse-grained and all-atom Molecular Dynamics simulations to gain insight into our project. This was made possible by the support of Dr. Philipp Thiel, who granted us access to the local supercomputer BinAC to compute our simulations.


The goal of our Molecular Dynamics analysis is to give us insight into whether AMPs that are grafted into our construct can still show membrane interaction. To test this, we simulated the interactions of the wild-type AMPs CHEN and KR-12 and afterward our grafted constructs with a model bacterial membrane. The simulations would show us the binding of the constructs during membrane perforation.


Our first assumption was that the AMPs interact with the bacterial membrane, as they were characterized to do so by the literature. They are effective against gram-positive and gram-negative bacteria 17 18 19 .

The second assumption our model includes is the use of a model gram-negative bacterial membrane containing a 3:1 ratio of zwitterionic to anionic phospholipids. The zwitterionic phospholipid is POPE (palmitoyloleoylphosphatidylethanolamine) and the anionic one is POPG (palmitoyloleoylphosphatidylglycerol). This choice was made by our partner team IISER Kolkata and is backed by the literature 20 .

Running the Model

We used the Molecular Dynamics simulation suite GROMACS 21 22 , which is used for preparing and running simulations.

All-Atom Systems

All our all-atom systems were built by our partner team IISER Kolkata as part of our partnership. After we provided them with the structures of our constructs built above, they used them to build the membrane systems and prepared the scripts for running those. After some slight modifications to make the simulations run-ready, we ran them using our supercomputer access.

Coarse-Grained Systems

The coarse-grained systems were built and prepared by us. They are using a simulation approach known as Umbrella Sampling, in which different systems with slightly different configurations, for example, the distance of the peptide from the membrane, are created and run. Afterward, statistical analysis is performed on all systems to gain insights.


To analyze our simulations we used “Visual Molecular Dynamics” 23 (VMD in short), Grace , as well as GROMACS itself.

Unfortunately, our coarse-grained simulations didn’t yield any useful results, which is why their analysis is omitted.

The following parameters were analyzed regarding our all-atom simulations:

  • Root Mean Square Deviation (RMSD): as we expected the membrane to be more stable than the protein, we expected the protein to be stabilized once binding occurs (blue line in the graph)
  • Solvent Accessible Surface (SASA): we expected the SASA to decrease once the protein binds to the membrane, as part of its surface gets blocked (orange line in the graph)
  • Hydrogen bonds (H-Bonds): we expected the number of hydrogen bonds to increase once the protein and membrane come into contact (black line in the graph)

To make it easier to compare the fluctuation of the parameters at different time points, the following graphs show the running averages (of 50 datapoints) of the mentioned parameters within one graph.

Parameters of CHEN
Figure 6: Parameters of CHEN
Parameters of c_blank
Figure 7: Parameters of c_blank
Parameters of KR-12
Figure 8: Parameters of KR-12
Parameters of c_KR-12_1_CHEN_6
Figure 9: Parameters of c_KR-12_1_CHEN_6

By comparing the graphs to the visualized simulations, it turns out that the number of hydrogen bonds gives a good idea at which timepoint the protein interacts with the membrane.

In contrast, RMSD, as well as SASA, don't provide useful conclusions, as they don’t align well with the number of hydrogen bonds.
This becomes clear not only through visual inspection of the simulation but also when calculating the correlation between them and the number of hydrogen bonds:

Construct H-Bonds ~ SASA H-Bonds ~ RMSD
c_KR-12_1_CHEN_6 -0.26219362 0.39548478
KR-12 -0.01997589 0.17944922
c_blank -0.02565356 0.03495232
CHEN 0.08947693 0.06257159
Table 6: Correlations between different parameters

As the values are quite close to 0 and therefore correlation is low, it can be said that no other parameter is as representative as the number of hydrogen bonds generated.

In case of RMSD this might be due to the fact that its values were already low even when the protein hadn’t bound to the membrane.
Furthermore, the membrane itself is flexible as well, not holding the protein in one place as initially thought.

For SASA, a possible explanation could be that the protein wasn’t close enough to the molecules of the membrane to effectively reduce the amount of surface reachable by water molecules.

To compare the constructs to each other, we computed the arithmetic mean of the number of hydrogen bonds created over the simulation:

Construct Average number of hydrogen bonds
c_KR-12_1_CHEN_6 5.87306347
KR-12 5.0225
c_blank 3.626
CHEN 5.82
Table 7: Average number of hydrogen bonds over the course of the simulation

In our simulations, c_KR-12_1_CHEN_6 performed best, closely followed by CHEN and KR-12.
c_blank, our scaffold, has the lowest score and therefore interacts the least with the membrane.

One of the simulations, showing CHEN interacting with the membrane, is shown below.

Figure 10: CHEN interacting with the membrane

Effects on Our Project

As we weren’t able to simulate a membrane perforation, we used the all-atom simulation to gather information regarding how well our construct would initiate a perforation.

Using the simulations, we could see a prediction about how well our constructs interact with a membrane. This validated that the binding of the membrane was still possible, even when the AMPs were grafted into our scaffold. Also, through using the empty scaffold as a control, we were able to validate that the grafting of the AMPs into the scaffold increased its membrane-binding ability.


17 Dong, Weibing; Dong, Zhe; Mao, Xiaoman; Sun, Yue; Li, Fei; Shang, Dejing (2016): Structure-activity analysis and biological studies of chensinin-1b analogues. In: Acta biomaterialia 37, S. 59–68. DOI: 10.1016/j.actbio.2016.04.003.

18 Wang, Guangshun (2008): Structures of human host defense cathelicidin LL-37 and its smallest antimicrobial peptide KR-12 in lipid micelles. In: The Journal of biological chemistry 283 (47), S. 32637–32643. DOI: 10.1074/jbc.M805533200.

19 Gunasekera, Sunithi; Muhammad, Taj; Strömstedt, Adam A.; Rosengren, K. Johan; Göransson, Ulf (2020): Backbone Cyclization and Dimerization of LL-37-Derived Peptides Enhance Antimicrobial Activity and Proteolytic Stability. In: Frontiers in microbiology 11, S. 168. DOI: 10.3389/fmicb.2020.00168.

20 Lee, Juho; Jung, Sang Won; Cho, Art E. (2016): Molecular Insights into the Adsorption Mechanism of Human β-Defensin-3 on Bacterial Membranes. In: Langmuir 32 (7), S. 1782–1790. DOI: 10.1021/acs.langmuir.5b04113.

21 Abraham, Mark James; Murtola, Teemu; Schulz, Roland; Páll, Szilárd; Smith, Jeremy C.; Hess, Berk; Lindahl, Erik (2015): GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. In: SoftwareX 1-2, S. 19–25. DOI: 10.1016/j.softx.2015.06.001.

22 Páll, Szilárd; Abraham, Mark James; Kutzner, Carsten; Hess, Berk; Lindahl, Erik (2015): Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS. In: Stefano Markidis und Erwin Laure (Hg.): Solving Software Challenges for Exascale, Bd. 8759. Cham: Springer International Publishing (Lecture Notes in Computer Science), S. 3–27.

23 Schulten, W. H. (1996). VMD - Visual Molecular Dynamics. Journal of Molecular Graphics, 33-38

Molecular Docking to Anionic Phospholipids


Molecular Docking is the process of finding the optimal docking position of a ligand to a receptor (usually an enzyme) in a 3-dimensional space. It is sometimes used to screen a library of ligands against each other to find the best-fitting ligand for an enzyme. The score of a certain docking position of a ligand is called the binding affinity and is calculated using an energy function in kcal/mol. The position with the lowest binding energy is the one with the highest binding affinity and therefore the optimal calculated position.


The idea of using Molecular Docking for our project was proposed by the iGEM Team IISER Kolkata, with whom we had a partnership. The idea was to use Molecular Docking to calculate the binding affinities of our grafted constructs with the anionic phospholipids of the bacterial membrane and then compare them to the affinities of the AMPs alone. This would give us the following insights:

  1. Screening for the most promising constructs to use/prioritize in the Wetlab. We predict that the constructs with the highest binding affinities will be the most likely to show activity.
  2. Validation that the binding ability of AMPs is retained after insertion into our scaffold.


To build our model, we assumed the following things:

  1. Our AMPs, namely KR-12 and CHEN, are both positively charged. Therefore, they should bind to the membrane by leveraging the opposing charge, meaning using anionic phospholipids.
  2. The anionic phospholipid POPG, also known as palmitoyloleoylphosphatidylglycerol, is an anionic phospholipid that is abundant in bacterial membranes. 20

Running Our Model

1. Preparing the .pdbqt Files

To run our molecular docking, we needed the structures of our ligands (AMPs, grafted constructs) and receptor (POPG). We used the same ligand structures as in all our simulations and got a POPG membrane structure from the GROMACS download site (file: 24 from which we extracted a single POPG molecule.

We used PyRx 25 to prepare the ligands and the receptor by automatically letting the program add hydrogens, choose polar hydrogens, add charges, and do necessary further preprocessing. This didn’t work for one structure and so we prepared it using AutoDockTools 26 27 .

2. Running Molecular Docking using AutoDock Vina

To calculate the docking of our constructs, we used AutoDock Vina 28 , a tool for running Molecular Docking. We ran the simulations on the supercomputer BinAC, to which we were kindly given access by Dr. Philipp Thiel.


The calculated binding affinities can be found in table 8. AutoDock Vina produces multiple modes of how the ligand could be docked to the receptor (we chose to produce 5 per ligand). We tried to choose the ones with the highest binding affinities for our model, while also filtering out impossible configurations which occurred because it seems that Vina cannot dock circular molecules and therefore breaks the bond between the termini (which wouldn’t happen in a natural system).

Construct Binding affinity (kcal/mol)
CHEN -3.2
KR-12 -3.3
c_blank -1.9
c_CHEN_1 -2.3
c_CHEN_6 No sensible docking produced
c_KR-12_1 -2.2
c_KR-12_6 -1.3
c_CHEN_1_KR-12_6 -2.3
c_KR-12_1_CHEN_6 -2.2
Table 8: Binding affinities of the different ligands

The binding modes can be found in the figures below. The bonds are colored according to the legend in figure 11. CHEN side chains are colored pink, KR-12 side chains are colored cyan, POPG is colored orange and the peptide itself is colored gray with dark-blue side chains. The visualizations and analysis of the bonds were created using the Protein-Ligand Interaction Profiler 29 .

Appearance of bonds
Figure 11: Appearance of bonds

Figure 12: CHEN docked to POPG
Figure 13: KR-12 docked to POPG
Figure 14: c_blank docked to POPG
Figure 15: c_CHEN_1 docked to POPG
Figure 16: c_KR-12_1 docked to POPG
Figure 17: c_KR-12_6 docked to POPG
Figure 18: c_CHEN_1_KR-12_6 docked to POPG
Figure 19: c_KR-12_1_CHEN_6 docked to POPG

Effects on Our Project

By using the highest binding affinities of each construct, we were able to mostly validate our Wetlab prioritization for expression/testing of our grafted constructs.

We also see that even though the binding affinity decreases for our constructs compared to the AMPs alone, the binding affinity of our constructs (excluding c_KR-12_6) is still higher than the binding affinity of the scaffold alone. This is promising, as it shows that the grafting of the AMPs into the scaffold increases the binding affinity of the scaffold.

What was also interesting to see was that even though the AMPs bind only to the membrane surface in our Molecular Dynamics simulations, their optimal binding position according to the docking is inserted into the hydrophobic tails of the lipids. This means that our docking model is more representative of a membrane after perforation, where the hydrophobic tails of the lipids would be accessible. However, the conclusion we drew from our model should still apply, as this might support the prevention of rebuilding the membrane after perforation.


20 Lee, Juho; Jung, Sang Won; Cho, Art E. (2016): Molecular Insights into the Adsorption Mechanism of Human β-Defensin-3 on Bacterial Membranes. In: Langmuir 32 (7), S. 1782–1790. DOI: 10.1021/acs.langmuir.5b04113.

24 Kukol, Andreas (2009): Lipid Models for United-Atom Molecular Dynamics Simulations of Proteins. In: Journal of chemical theory and computation 5 (3), S. 615–626. DOI: 10.1021/ct8003468.

25 Dallakyan, Sargis; Olson, Arthur J. (2015): Small-molecule library screening by docking with PyRx. In: Methods in molecular biology (Clifton, N.J.) 1263, S. 243–250. DOI: 10.1007/978-1-4939-2269-7_19.

26 Morris, Garrett M.; Huey, Ruth; Lindstrom, William; Sanner, Michel F.; Belew, Richard K.; Goodsell, David S.; Olson, Arthur J. (2009): AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. In: Journal of computational chemistry 30 (16), S. 2785–2791. DOI: 10.1002/jcc.21256.

27 Sanner, M. F. (1999). Python: a programming language for software integration and development. J Mol Graph Model, 17(1), 57-61.

28 Trott, O., & Olson, A. J. (2010). AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of computational chemistry, 31(2), 455-461.

29 Adasme, Melissa F.; Linnemann, Katja L.; Bolz, Sarah Naomi; Kaiser, Florian; Salentin, Sebastian; Haupt, V. Joachim; Schroeder, Michael (2021): PLIP 2021: expanding the scope of the protein-ligand interaction profiler to DNA and RNA. In: Nucleic acids research 49 (W1), W530-W534. DOI: 10.1093/nar/gkab294.

About Us

We are the iGEM Team Tuebingen, a group of motivated students who are working on creating a fast screening platform for stabilized peptides. We are aiming to provide a system that gives everyone the ability to stabilize peptides such as antimicrobial peptides to create better medical agents.

Follow Us

Affinity Designer
Affinity Photo
Affinity Publisher
Greiner Bio One
Toggl Plan
Twist Biosciences
Microsynth SEQLAB
NewEngland Biolabs