Introduction
Our detection platform was first built for SARS-CoV-2, and we want to adapt it to any pathogen that can be detected through its nucleic acids. The adaptation route is to swap in alternative zinc finger (ZF) proteins that target specific nucleic acid elements of each pathogen. We plan to develop software that makes it quick and precise to design a ZF protein against a specific sequence.
Taking SARS-CoV-2 as an example: it currently has more than 40,000 variants, and the most prevalent strain is Delta. We plan to conduct the following development:
Design a ‘Universal’ kit based on a conserved sequence of SARS-CoV-2, and use this kit to distinguish whether a person carries SARS-CoV-2 or another virus. Then modify the ZF protein according to the unique or mutated sequence of a particular variant, and design a ‘variant’-specific kit that can be used to determine the identity or subtype of that variant.
Similarly, we can apply this design to any pathogen. Because the ZF protein is relatively easy to change, we can quickly develop new kits for constantly emerging and changing pathogens.
Method
For a particular pathogen, the software scans the genomic sequence to identify potentially targetable sites as sets of 3-base-pair (bp) triplets with priority scores, chooses the most suitable ZF protein structure, and generates the corresponding amino acid sequence for the protein.
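As a minimal sketch of the scanning step, the Python below slides a window over a DNA sequence and reports every window whose triplets all have a known recognition helix. The `HELICES` table is only a small subset copied from the tables in this document, the name `scan_sites` and the test sequence are hypothetical, and priority scoring is left out because the specificity database is not reproduced here.

```python
# Triplet -> recognition helix; a small subset taken from the tables in this document.
HELICES = {
    "GTA": "QSSSLVR",
    "GGG": "RSDKLVR",
    "GAA": "QSSNLVR",
    "CTT": "TTGALTE",
    "CAC": "SKKALTE",
}


def scan_sites(sequence, n_fingers=4, helices=HELICES):
    """Return (offset, site) for every window fully covered by the helix table."""
    sequence = sequence.upper()
    width = 3 * n_fingers  # each finger reads one 3-bp triplet
    hits = []
    for start in range(len(sequence) - width + 1):
        site = sequence[start:start + width]
        triplets = [site[i:i + 3] for i in range(0, width, 3)]
        if all(t in helices for t in triplets):
            hits.append((start, site))
    return hits


# Hypothetical input: the example target site with two flanking bases on each side.
print(scan_sites("TTGAAGGGGGGGTACC"))  # prints [(2, 'GAAGGGGGGGTA')]
```

This sketch reads one strand only and treats every covered window as equally good; a real scan would presumably also check the reverse complement and rank the hits by score.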
The scoring function of the software is based on the ZF structure database. The database contains the results of multi-target specificity analysis of each ZF in the context of three-finger proteins targeting GCGNNNGCG, where NNN is the proposed ZF binding site. The user enters the DNA sequence to be targeted, and the software searches it for motifs that ZF proteins can target. For each 3-bp set in a ZF binding motif, the software assigns a score and exports a ZF protein sequence that specifically targets the site. Taking the binding site GTA as an example, the ZF sequence with the best specificity is -QSSSLVR-.
As shown in the figure, the black bars represent target oligonucleotides with different ZF binding sites. The white bars represent the oligonucleotide libraries in which only the 5' nucleotide of the binding site is specified: GNN, ANN, TNN and CNN. The height of each bar represents the relative specificity of the protein for each target, averaged over two independent experiments and normalized to the highest signal in the corresponding group. Error bars represent the deviation from the average for each sample.
Taking our ZF protein target sequence 5'-GAAGGGGGGGTA-3' as an example, we:
(1) input it into the software, which outputs several candidate ZFs with different scores for each 3-bp recognition site;
(2) select, for each recognition site, the ZF with the highest score, i.e. the one with the best predicted targeting specificity for that triplet;
(3) incorporate the selected ZFs into the protein scaffold to give the complete ZF protein sequence.
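The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's actual software: the scaffold segments (`N_TERM`, `LINKER`, `C_TERM`) are inferred from the protein sequences listed in this document, the helix table holds only the three triplets needed for this example, and selection by score is reduced to a direct lookup of the single helix listed per triplet.

```python
# Best-scoring helix per triplet, copied from the analysis table in this document.
BEST_HELIX = {
    "GTA": "QSSSLVR",
    "GGG": "RSDKLVR",
    "GAA": "QSSNLVR",
}

# Scaffold pieces inferred (assumption) from the full protein sequences in this document.
N_TERM = "LEPGEKPYKCPECGKSFS"
LINKER = "HQRTHTGEKPYKCPECGKSFS"
C_TERM = "HQRTHTGKKTS"


def design_zf(target, table=BEST_HELIX):
    """Build a ZF protein sequence for a target site written 5'->3'."""
    target = target.upper().replace(" ", "")
    if len(target) % 3 != 0:
        raise ValueError("target length must be a multiple of 3")
    triplets = [target[i:i + 3] for i in range(0, len(target), 3)]
    # Finger 1 reads the 3'-most triplet, so the triplets are taken in reverse order.
    helices = [table[t] for t in reversed(triplets)]
    return N_TERM + LINKER.join(helices) + C_TERM


print(design_zf("GAAGGGGGGGTA"))
```

For the example target, the output matches the four-finger sequence reported for our project.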
Analysis results for the ZF protein used in our project (fingers are numbered from the 3' end of the target site):
Finger | Triplet | Helix |
---|---|---|
1 | GTA | QSSSLVR |
2 | GGG | RSDKLVR |
3 | GGG | RSDKLVR |
4 | GAA | QSSNLVR |
The amino acid sequence of the most suitable ZF protein is:
LEPGEKPYKCPECGKSFSQSSSLVRHQRTHTGEKPYKCPECGKSFSRSDKLVRHQRTHTGEKPYKCPECGKSFSRSDKLVRHQRTHTGEKPYKCPECGKSFSQSSNLVRHQRTHTGKKTS
Result
We chose three well-known SARS-CoV-2 variants: B.1.1.7 (α), B.1.351 (β), and B.1.617.2 (δ). We analyzed the reverse-transcribed (double-stranded DNA) form of the SARS-CoV-2 genome and designed ZF proteins that specifically target each variant according to the mutation sites in its S gene. The results are as follows:
Original SARS-CoV-2
Target site: 5' CAG GAT GTT AAC 3'
Predicted zinc fingers:
Finger | Triplet | Helix |
---|---|---|
1 | AAC | DSGNLRV |
2 | GTT | TSGSLVR |
3 | GAT | TSGNLVR |
4 | CAG | RADNLTE |
The amino acid sequence of the ZF protein is:
LEPGEKPYKCPECGKSFSDSGNLRVHQRTHTGEKPYKCPECGKSFSTSGSLVRHQRTHTGEKPYKCPECGKSFSTSGNLVRHQRTHTGEKPYKCPECGKSFSRADNLTEHQRTHTGKKTS
B.1.1.7
Target site: 5' GTA GTA CAC CTT 3'
Predicted zinc fingers:
Finger | Triplet | Helix |
---|---|---|
1 | CTT | TTGALTE |
2 | CAC | SKKALTE |
3 | GTA | QSSSLVR |
4 | GTA | QSSSLVR |
The amino acid sequence of the ZF protein is:
LEPGEKPYKCPECGKSFSTTGALTEHQRTHTGEKPYKCPECGKSFSSKKALTEHQRTHTGEKPYKCPECGKSFSQSSSLVRHQRTHTGEKPYKCPECGKSFSQSSSLVRHQRTHTGKKTS
B.1.351
Target site: 5' GGT AGC AAA CCT 3'
Predicted zinc fingers:
Finger | Triplet | Helix |
---|---|---|
1 | CCT | TKNSLTE |
2 | AAA | QRANLRA |
3 | AGC | ERSHLRE |
4 | GGT | TSGHLVR |
The amino acid sequence of the ZF protein is:
LEPGEKPYKCPECGKSFSTKNSLTEHQRTHTGEKPYKCPECGKSFSQRANLRAHQRTHTGEKPYKCPECGKSFSERSHLREHQRTHTGEKPYKCPECGKSFSTSGHLVRHQRTHTGKKTS
B.1.617.2
Target site: 5' GGT GTT AAA GGT 3'
Predicted zinc fingers:
Finger | Triplet | Helix |
---|---|---|
1 | GGT | TSGHLVR |
2 | AAA | QRANLRA |
3 | GTT | TSGSLVR |
4 | GGT | TSGHLVR |
The amino acid sequence of the ZF protein is:
LEPGEKPYKCPECGKSFSTSGHLVRHQRTHTGEKPYKCPECGKSFSQRANLRAHQRTHTGEKPYKCPECGKSFSTSGSLVRHQRTHTGEKPYKCPECGKSFSTSGHLVRHQRTHTGKKTS
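A design can be sanity-checked by reading the recognition helices back out of the protein sequence and comparing them with the intended target site. The Python below is a hedged sketch of such a check, using the B.1.1.7 design as the example. The regular expression assumes the scaffold seen in the sequences in this document (each helix is the 7 residues between `CPECGKSFS` and `HQRTH`), and the names `helices_in` and `predicted_site` are hypothetical.

```python
import re

# Helix -> triplet, limited to the helices used in the B.1.1.7 design above.
TRIPLET_FOR_HELIX = {
    "QSSSLVR": "GTA",
    "SKKALTE": "CAC",
    "TTGALTE": "CTT",
}

# Assumed scaffold: each 7-residue helix sits between "CPECGKSFS" and "HQRTH".
HELIX_PATTERN = re.compile(r"CPECGKSFS([A-Z]{7})HQRTH")


def helices_in(protein):
    """Return the recognition helices in finger order (finger 1 first)."""
    return HELIX_PATTERN.findall(protein)


def predicted_site(protein, table=TRIPLET_FOR_HELIX):
    """Return the 5'->3' site implied by the helices (finger 1 reads the 3' end)."""
    triplets = [table[h] for h in helices_in(protein)]
    return "".join(reversed(triplets))


B117_PROTEIN = (
    "LEPGEKPYKCPECGKSFSTTGALTE"
    "HQRTHTGEKPYKCPECGKSFSSKKALTE"
    "HQRTHTGEKPYKCPECGKSFSQSSSLVR"
    "HQRTHTGEKPYKCPECGKSFSQSSSLVR"
    "HQRTHTGKKTS"
)
B117_TARGET = "GTAGTACACCTT"

print(helices_in(B117_PROTEIN))
print(predicted_site(B117_PROTEIN) == B117_TARGET)
```

A check like this makes copy-and-paste slips between a target site, its finger table, and its protein sequence easy to spot.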
Summary
In general, this part is designed to highlight the advantages of our project. To respond promptly to the current SARS-CoV-2 pandemic or to other pathogens, detection methods must be easy to redevelop. With this software, we can quickly and easily change the ZF protein sequence in the detection system to target newly emerged coronavirus strains. We can also quickly identify which SARS-CoV-2 strains are prevalent in a specific area and build an up-to-date picture of the COVID-19 epidemic in a short time.