Team:EPFL/Design

Design

Introduction

Driven by the wish to address one of the leading local concerns, the excessive copper contamination in the soils of local vineyards, we pooled our creativity and engineering intuition. Our first goal was to work out how to approach the problem so that every part of the design contributed as much as possible.

Figure 1. Dimer design process.

The first step was to choose the chassis that would host our solution. This took several rounds of discussion and exchange. After extensive literature research on bioaccumulation and bioremediation, and after consulting both synthetic biology experts and local producers, we narrowed down our selection and finally chose yeast as the chassis. The main question then became which solution to develop in order to optimize our results. Several approaches to heavy metal removal have already been developed and tested, many of them in yeast strains. Moreover, different metallothionein proteins have been studied and classified, mainly by size, binding affinity, number of copper ions bound, expression, folding, and biological activity. With this in mind, our next step was to find a suitable protein that would capture as much copper as possible and so help protect our soils.

Figure 2. Schematic representation of copper metabolism in a yeast cell and in our transgenic yeast strain.

CUP1

Several factors were considered before choosing the protein for our project. The initial goal was to assemble a complex able to accumulate copper both inside and outside the cell. This complex would include three main components: an intracellular protein for cytoplasmic storage of copper ions, a transmembrane protein that transports the ions in from the external environment, and a metallothionein expressed on the extracellular surface to bind ions. We then realized that the first two components are already endogenous functions of wild-type yeast, which can transport copper ions into the cell and store them there. This raised a question: why not also express the same intracellular copper-binding protein on the surface?

During this process, many other metal-binding proteins were considered, but none compared with the native intracellular protein in terms of stability, depth of characterization, and the ability to bind eight copper ions simultaneously. The protein we chose, responsible for copper tolerance, is CUP1, a polypeptide of 61 amino acids whose eight leading residues are cleaved off post-translationally, resulting in a 53-residue processed polypeptide [1]. Its sequence is particularly interesting: it contains 12 cysteine residues organized as Cys – X – Cys, Cys – Cys, and Cys – X – X – Cys motifs, which produce four binding sites for divalent metal ions and eight for monovalent ions, with high affinity for copper and cadmium [2].

In sum, CUP1 can bind eight copper ions through 12 cysteines per molecule.
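The cysteine motifs described above can be located in any protein sequence with a short script. The following is a minimal sketch, not part of our original pipeline: the function name `count_cys_motifs` is hypothetical, and the peptide used is an invented toy sequence, not the CUP1 sequence. "X" is taken to mean any residue, and overlapping motifs are counted separately.

```python
import re

def count_cys_motifs(seq: str) -> dict:
    """Count cysteines and the three motif types; X stands for any residue."""
    # Zero-width lookaheads let overlapping motifs (e.g. C-X-C-X-C) each count once.
    patterns = {
        "C-X-C": r"(?=C.C)",
        "C-C": r"(?=CC)",
        "C-X-X-C": r"(?=C..C)",
    }
    counts = {name: len(re.findall(pat, seq)) for name, pat in patterns.items()}
    counts["Cys"] = seq.count("C")
    return counts

# Invented toy peptide for illustration only (NOT the CUP1 sequence).
toy_peptide = "ACQCGGCCKCSTCA"
motifs = count_cys_motifs(toy_peptide)
print(motifs)
```

Running the same function on the processed CUP1 sequence taken from a curated database would be one way to check the 12-cysteine count quoted above.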

The processed CUP1 is therefore a 53-residue yeast metallothionein with a molecular weight of 5655 Da. Reconstitution studies of the apo-molecule revealed that 8 mol eq of Cu(I) depleted the zinc content of the zinc-saturated metallothionein and gave maximal stability against proteolysis. These assays therefore suggest that, as mentioned above, the protein has eight binding sites for Cu(I). This yeast metallothionein was also observed to coordinate two other ions, Cd(II) and Zn(II). In studies of direct binding, protection against proteolysis, and metal ion exchange, these divalent ions were found to associate with the protein with a maximal stoichiometry of 4 ions per molecule. Like the mammalian protein, yeast metallothionein therefore shows two distinct binding configurations, one for Cu(I) and one for Cd(II) [1].
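To put these stoichiometries in perspective, the theoretical metal load of a fully saturated CUP1 molecule can be worked out from the figures quoted above. This is a back-of-the-envelope sketch: the atomic masses are standard values, the function name `metal_per_protein` is hypothetical, and the result is an upper bound that ignores folding, accessibility and competition between ions.

```python
# Standard atomic masses in g/mol.
ATOMIC_MASS = {"Cu": 63.546, "Cd": 112.414, "Zn": 65.38}
# Maximal stoichiometries quoted in the text (ions per CUP1 molecule).
MAX_IONS = {"Cu": 8, "Cd": 4, "Zn": 4}
CUP1_MW = 5655.0  # Da, processed 53-residue polypeptide

def metal_per_protein(metal: str) -> float:
    """Theoretical mg of metal bound per mg of fully loaded CUP1."""
    return MAX_IONS[metal] * ATOMIC_MASS[metal] / CUP1_MW

for metal in MAX_IONS:
    print(f"{metal}: {metal_per_protein(metal):.3f} mg per mg of protein")
```

Under these assumptions, saturated CUP1 carries roughly 9% of its own mass in copper.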

A gene replacement experiment has shown that the CUP1 protein is essential in yeast for two main reasons. First, it protects the cells against the toxic effects of high concentrations of copper in the medium. Second, it represses the transcription of its own structural gene at low external copper concentrations. Transcription of the CUP1 gene is rapidly induced by copper. However, studies showed that, unlike the mammalian metallothionein genes, transcription of the yeast gene is not affected by other ions such as cadmium, zinc, or mercury [3].

In summary, copper resistance in Saccharomyces cerevisiae is controlled by the CUP1 locus. The locus has an open reading frame (ORF) capable of encoding a 61-amino-acid polypeptide that resembles mammalian metallothionein in its high proportion of cysteine residues and in the presence of a conserved six-amino-acid sequence. However, the predicted structure of the yeast protein differs from mammalian metallothionein: it includes two Phe residues, aromatic residues that are not found in any other metallothionein, it is rich in Glx, and it contains one His. The protein product of the CUP1 locus has therefore been termed either Cu-metallothionein or Cu-chelatin [1].

Expression system

Yeast surface display is an efficient technique for engineering antibodies with improved stability, affinity and specificity. Stability in particular can be engineered because surface expression is measured directly and has been shown to correlate with the stability of the displayed protein.

Yeast display has also been used to engineer several other proteins for a variety of applications [4]. It offers several advantages for directed protein evolution. First, it enables quantitative screening through fluorescence-activated cell sorting, which makes it possible to observe the population statistics and the equilibrium activity of the sample directly during screening. Furthermore, the protein-binding signal is normalized for expression, which eliminates artefacts due to expression bias and allows clear discrimination between mutants [4].

Finally, the displayed proteins are folded in the endoplasmic reticulum of the eukaryotic yeast cells, taking advantage of endoplasmic reticulum chaperones and quality-control machinery [4].
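The expression normalization mentioned above amounts to dividing, cell by cell, the binding fluorescence by the expression fluorescence. The sketch below illustrates the idea only: the function name `normalized_binding`, the `min_expression` cut-off and all fluorescence values are hypothetical, and real cytometry software applies its own gating and compensation.

```python
def normalized_binding(binding_signal, expression_signal, min_expression=1.0):
    """Per-cell binding/expression ratio; cells below min_expression are skipped."""
    ratios = []
    for binding, expression in zip(binding_signal, expression_signal):
        # Non-expressing cells would give meaningless (or infinite) ratios.
        if expression >= min_expression:
            ratios.append(binding / expression)
    return ratios

# Hypothetical per-cell fluorescence values (arbitrary units).
binding = [200.0, 400.0, 50.0, 5.0]
expression = [100.0, 200.0, 100.0, 0.5]
ratios = normalized_binding(binding, expression)
print(ratios)
```

In this toy data set the first two cells differ twofold in raw binding signal but have the same normalized value, which is the expression bias the normalization is meant to remove.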

For engineering a single protein sequence, the DNA of interest must first be cloned into a yeast surface display vector.

To express CUP1 on the cell surface, we used a yeast surface display strategy. In this system, CUP1 is fused to Aga2p, the adhesion subunit of the yeast agglutinin protein, which attaches to the yeast cell wall through disulfide bonds to Aga1p. Expression of the Aga2p – CUP1 fusion is under the control of a galactose-inducible promoter on the yeast display plasmid, which is maintained episomally in yeast with a nutritional marker, whereas Aga1p is expressed from a chromosomally integrated galactose-inducible expression cassette. Variations in surface expression can be measured through immunofluorescence labelling of the V5 epitope tag flanking CUP1. Our design is shown in Figure 3.

Figure 3. Yeast surface display. The CUP1 (cyan) is displayed on an Aga2 (pink) fusion protein on the surface of yeast. The expression can be detected by using fluorescent antibodies binding to the V5 tag (purple), and binding of the CUP1 to a biotinylated antigen (orange) can be detected using a fluorescent label.

Dimers

To improve on our original construct, we chose to join two units of CUP1 through different linking sequences. This design should significantly improve the copper absorption capability of our chassis: it effectively doubles the amount of CUP1 on the surface and ideally leads to a twofold improvement in copper uptake and resistance.

To build such a complex, we used two slightly different CUP1 sequences, to avoid any problems of homology, joined by a linker. The full nucleotide sequences of the CUP1 dimers were codon-optimized to maximise protein yield in yeast and to prevent self-hybridization of the RNA after transcription. Different linkers were modelled and tested in order to obtain the most stable and efficient structure. Once the modelling was finished, the seven resulting constructs were cloned and tested, with excellent cloning and expression results for each of them.

Linkers

Figure 4. Yeast surface display for linkers. One CUP1 (cyan) is dimerized with another CUP1 (yellow) differing in sequence. This is displayed on an Aga2 (pink) fusion protein on the surface of yeast. The expression can be detected by using fluorescent antibodies binding to the V5 tag (purple), and binding of the CUP1 to a biotinylated antigen (orange) can be detected using a fluorescent label.

The successful design of a fusion protein requires two essential elements: the component proteins and the linkers. The choice of component proteins is based on the desired functions of the fusion protein product and is typically relatively straightforward. The rational design or selection of a suitable linker to join the protein domains, on the other hand, can be arduous and is often the most critical step in designing fusion proteins. Direct fusion of functional domains without a linker may lead to several undesirable outcomes, including misfolding of the fusion protein, low yield in protein production, or impaired bioactivity. Moreover, linker peptides not only connect the protein moieties but also serve several other functions, such as maintaining cooperative inter-domain interactions, promoting proper folding of the whole complex, preserving the biological activity of the individual proteins, and sometimes adding new functions [5].

Indeed, by genetically fusing two or more protein domains, the fusion protein product may obtain many distinct functions derived from each of its component moieties.

In general, the presence of a suitable linker can:

  • Increase stability and folding
  • Increase expression
  • Increase biological activity

To fuse two different CUP1 units and develop the best dimer models, we designed and tested 7 different linkers, each of which eventually led to excellent results.

In particular, to give our dimers different properties, we chose a specific design for each linker. We varied four main factors: length, hydrophobicity, amino acid composition, and secondary structure. The last three can be grouped under the broader concept of flexibility, so length and flexibility were the two main parameters of our design.

Length determines the distance between the two proteins, and therefore how much they can interact. As a trade-off between avoiding interactions and limiting the size of the complex, we chose sequences of between 14 and 20 amino acids, a typical length at the border between intermediate and long linkers.

Flexibility (or rigidity) is a parameter used to describe empirically the ability of a protein sequence to maintain a specific conformation [6]. It is directly related to the amino acid composition, and thus to the structure, of the linker, which in turn affects the orientation of the fused proteins. Because the linker structure that determines this orientation is difficult to quantify, flexibility is used here as an index of the linker structure. Moreover, previous studies have shown that the flexibility of a linker is associated with the function of fusion proteins, which underlines the importance of linker flexibility in their construction.

In order to observe the behaviour of our complex in different conformations and structures, we decided to model, design and test flexible, semi-rigid and rigid linkers.

Flexible linkers

Flexible linkers are usually applied when the joined domains require a certain degree of movement or interaction. They are generally composed of small, non-polar (e.g. Gly) or polar (e.g. Ser or Thr) amino acids [7]. The small size of these amino acids provides flexibility and more degrees of freedom, allowing high mobility of the connected functional domains.

In addition, the incorporation of Ser or Thr can maintain the stability of the linker in aqueous solutions by forming hydrogen bonds with water molecules, and therefore reduces unfavourable interactions between the linker and the protein moieties [5].

The most commonly used flexible linkers have sequences consisting primarily of stretches of Gly and Ser residues (“GS” linkers). The most widely used example has the sequence (Gly – Gly – Gly – Gly – Ser)n. By adjusting the copy number “n”, the length of this GS linker can be optimized to achieve appropriate separation of the functional domains, maintain necessary inter-domain interactions, and allow proper folding of the fusion proteins [5].

Therefore, for the flexible linkers, we chose a combination of Gly and Ser in the form (GGGGS)n, with n = 3, 4.

Rigid linkers

While flexible linkers have the advantage of connecting the functional domains passively and permitting a certain degree of movement, their lack of rigidity can be limiting. There are several examples in the literature where flexible linkers led to loss of biological activity or poor expression yields. The inefficiency of flexible linkers in these instances was attributed to ineffective separation of the protein domains or insufficient reduction of their interaction. In these situations, rigid linkers have been successfully applied to keep a fixed distance between the domains and maintain their independent functions, separating the functional domains more efficiently than flexible linkers.

Rigid linkers exhibit relatively stiff structures by adopting α-helical structures or by containing multiple Pro residues. Pro is a unique amino acid with a cyclic side chain that causes a very restricted conformation. Furthermore, the lack of an amide hydrogen on Pro typically prevents the formation of hydrogen bonds with other amino acids, and therefore reduces the interaction between the linkers and the protein domains. As a result, the inclusion of Pro residues might increase the stiffness and structural independence of the linkers [8].

Furthermore, in this case, the length of the linkers can be easily adjusted by changing the copy number to achieve an optimal distance between domains. As a result, rigid linkers are chosen when the spatial separation of the domains is critical to preserve the stability or bioactivity of the fusion proteins.

The most common rigid linkers are the α-helix-forming linkers with the sequence (EAAAK)n, which show approximately 80% helicity with n = 3 [9].

Moreover, many natural linkers exhibit α-helical structures [10]. Generally, these are characterized by high rigidity and stability, with intra-segment hydrogen bonds and a closely packed backbone. Therefore, the stiff α-helical linkers may act as simple rigid spacers between protein domains.

Another type of rigid linker, as explained above, has a Pro-rich sequence, (XP)n, with X designating any amino acid, preferably Ala, Lys, or Glu. The presence of Pro in non-helical linkers can increase stiffness and also allows effective separation of the protein domains.

Therefore, for the rigid linkers, we used either repeats of EAAAK, for the resulting α-helical structure, or repeats of AP, for the properties of the proline-rich sequence. In particular, we chose (EAAAK)n with n = 3, 4 and (AP)n with n = 7.

Semiflexible linkers

Finally, recombining building units with these markedly different flexibilities can extend the range of linker mobility available for fusion protein design. Such an approach has several benefits. Using sequences with stable conformations as the building units removes unnecessary complexity in predicting the possible conformations of the linker and the perturbations between the linker region and the connected domains. In addition, since linker flexibility depends mainly on conformation, which as stated above stems from the primary sequence, a linker library built by recombining rigid and flexible building units of different lengths can be expected to have widely controllable flexibility, ranging from rigid to flexible [9].

Therefore, for the semi-flexible (or semi-rigid) conformation, we combined the design choices used for flexible and rigid linkers. The main idea is to keep the two functional domains far enough apart to prevent any interaction between them and maintain their independent functions, while allowing high flexibility in the regions close to the C- and N-termini of the fused proteins. In this way, the complex is rigid in the centre and flexible in the vicinity of the two protein termini, which allows further degrees of freedom and proper folding, and hopefully further facilitates copper uptake. Accordingly, the structures used for the semi-rigid linkers are:

GGGGS(EAAAK)nGGGS, with n = 1, 2, and α-helicity of around 15-20% and 25-30% respectively [9].

In summary, linkers can adopt various structures and exert diverse functions to fulfil the application of fusion proteins. The flexible linkers are often rich in small or hydrophilic amino acids such as Gly or Ser to provide structural flexibility and have been applied to connect functional domains that favour interdomain interactions or movements. In cases where sufficient separation of protein domains is required, rigid linkers may be preferable. By adopting α-helical structures or incorporating Pro, the rigid linkers can efficiently keep protein moieties at a distance. Both flexible and rigid linkers are stable in vivo and do not allow joined proteins to separate.
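The seven linker designs described in this section can be written out explicitly to check their lengths against the 14 to 20 amino acid range stated earlier. The sketch below only expands the repeat notation used in the text; the helper name `repeat` and the dictionary labels are hypothetical and are not the names of our actual constructs.

```python
def repeat(unit: str, n: int) -> str:
    """Expand the (UNIT)n notation into a plain amino acid string."""
    return unit * n

LINKERS = {
    "flexible (GGGGS)3": repeat("GGGGS", 3),
    "flexible (GGGGS)4": repeat("GGGGS", 4),
    "rigid (EAAAK)3": repeat("EAAAK", 3),
    "rigid (EAAAK)4": repeat("EAAAK", 4),
    "rigid (AP)7": repeat("AP", 7),
    "semi-rigid GGGGS(EAAAK)1GGGS": "GGGGS" + repeat("EAAAK", 1) + "GGGS",
    "semi-rigid GGGGS(EAAAK)2GGGS": "GGGGS" + repeat("EAAAK", 2) + "GGGS",
}

for name, seq in LINKERS.items():
    print(f"{name:30s} {len(seq):2d} aa  {seq}")

lengths = sorted(len(seq) for seq in LINKERS.values())
```

The shortest designs, (AP)7 and the n = 1 semi-rigid linker, come out at 14 residues, and the longest, the n = 4 repeats, at 20.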

Copper assay design

To test whether our engineered yeasts could remove copper from the surrounding environment, we designed a tailored assay.

The principle is simple: measure the amount of copper in a solution before adding the genetically modified organism and at intervals up to 2-3 hours afterwards. To obtain the copper concentration, we performed a colorimetric assay and measured the absorbance with a spectrophotometer.

To analyze the effect of our engineered organism on copper concentration over time, we grew our engineered yeast strains in liquid media containing copper and collected 2 ml aliquots of the culture at different time points after adding the yeast to the media (t0, 15 min, 60 min, 90 min, 120 min). The yeast cultures were prepared in 500 ml Erlenmeyer flasks and incubated at 30°C with shaking at 200 rpm. Each collected sample was then centrifuged gently (200 × g for 10 minutes) to separate the yeast cells from the media and copper sulfate solution.

The centrifugation was performed at low speed to avoid cell lysis and so keep yeast components, whether free or bound to copper ions, out of the supernatant. There are two reasons for this. First, yeast and copper absorb in the same range of the spectrum, so the measurement cannot distinguish them, and any yeast contamination of the supernatant would distort the OD readings and give an apparent concentration higher than the real one. Second, if yeast fragments bound to copper ions ended up in the supernatant, those ions would also be detected by the spectrophotometer, so the measured residual concentration would be too high and the amount of copper removed by our yeast would be underestimated.

After centrifugation, the pellet, consisting mainly of yeast cells, and the supernatant, consisting mainly of media and copper sulfate, were separated and collected for different downstream chemical processing. The supernatant was used to quantify the residual metal. The pellet was saved to determine the amount of ions released.
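The conversion from supernatant absorbance to copper removed can be sketched as follows. This is an illustration under stated assumptions, not our actual analysis script: it assumes a linear standard curve fitted by ordinary least squares, all function names (`fit_line`, `to_concentration`, `percent_removed`) are hypothetical, and every absorbance value is invented for the example.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def to_concentration(absorbance, slope, intercept):
    """Invert the standard curve to get a concentration in mg/l."""
    return (absorbance - intercept) / slope

def percent_removed(c0, ct):
    """Copper removed at time t, relative to the t0 concentration."""
    return 100.0 * (c0 - ct) / c0

# Invented standards: known copper concentrations (mg/l) and their absorbances.
std_conc = [0.0, 0.5, 1.0, 2.0, 4.0]
std_abs = [0.02, 0.07, 0.12, 0.22, 0.42]
slope, intercept = fit_line(std_conc, std_abs)

# Invented supernatant readings at the sampling times used in the assay.
timepoints_min = [0, 15, 60, 90, 120]
supernatant_abs = [0.22, 0.20, 0.17, 0.15, 0.14]
conc = [to_concentration(a, slope, intercept) for a in supernatant_abs]
removal = [percent_removed(conc[0], c) for c in conc]

for t, c, r in zip(timepoints_min, conc, removal):
    print(f"t = {t:3d} min  {c:.2f} mg/l  {r:.1f}% removed")
```

A real standard curve should be checked for linearity over the working range before it is inverted in this way.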

In designing these steps, we evaluated five main parameters.

  • Time – The first parameter to consider was the time needed by our yeast to perform its function. After literature research and the analysis of many experimental results, we found that the optimal time required by the wild-type yeast was a little over 2 hours.

  • Copper concentration – After testing several concentrations, we decided to perform each experiment at the copper concentration closest to actual soil conditions, to be as rigorous and realistic as possible. Analyzing our measurements of copper concentration in the soil, we observed that the average converged to about 2 mg/l. Consequently, we chose 2 mg/l of copper sulfate as the final value for our experiments. More details on how we obtained this value are on the Background page.

  • Yeast concentration – Different concentrations of yeast were tested to observe empirically the proportionality between the number of yeast cells and the fraction of copper removed from the solution. The data showed that removal efficiency increased with biomass concentration, possibly because more cells provide more uptake sites.

  • Media – We analyzed the behaviour of our organism in three main media, SGCAA, SDCAA and YPD+galactose, and observed different growth and proliferation patterns in each. Given this diversity, seen especially in growth curves and colony-forming assays, we decided to test whether the ability of our organism to capture copper depended on the medium in which it was grown or the medium in which the experiments were performed.

  • Clones – The backbone and all clones, subclones and dimers were tested so that we could compare them and identify the organism best able to capture copper.
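Because the copper concentration parameter above is quoted in mg/l, it can help to express it as a molar copper concentration, which depends on whether the mass refers to elemental copper, anhydrous copper sulfate or the pentahydrate. The text does not specify which form was weighed, so the sketch below simply computes all three cases; the function name `copper_micromolar` is hypothetical and the molar masses are standard values.

```python
# Standard molar masses in g/mol; each formula unit contains one Cu.
MOLAR_MASS = {
    "Cu": 63.546,
    "CuSO4": 159.609,
    "CuSO4.5H2O": 249.685,
}

def copper_micromolar(mg_per_l: float, species: str) -> float:
    """Copper concentration in µmol/l for a mass concentration of `species`."""
    return mg_per_l / MOLAR_MASS[species] * 1000.0

for species in MOLAR_MASS:
    print(f"2 mg/l as {species:11s} = {copper_micromolar(2.0, species):5.1f} µM Cu")
```

The three interpretations differ by almost a factor of four, so stating the weighed species alongside the mass concentration avoids ambiguity when comparing experiments.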

All measurements were performed with three biological replicates and three technical replicates.
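One common way to summarize such a design is to average the technical replicates within each biological replicate first, and then report the mean and standard deviation across the biological replicates. The sketch below assumes that convention, which may differ from the one used for our plots; the function name `summarize` and the readings are hypothetical.

```python
from statistics import mean, stdev

def summarize(replicates):
    """replicates: one list of technical readings per biological replicate."""
    # Technical replicates estimate measurement noise, so collapse them first.
    bio_means = [mean(technical) for technical in replicates]
    # Spread is then reported across biological replicates (sample SD, n - 1).
    return mean(bio_means), stdev(bio_means)

# Invented residual copper concentrations in mg/l (3 biological x 3 technical).
readings = [
    [1.20, 1.22, 1.18],
    [1.30, 1.28, 1.32],
    [1.10, 1.12, 1.08],
]
avg, sd = summarize(readings)
print(f"{avg:.2f} ± {sd:.2f} mg/l (n = {len(readings)} biological replicates)")
```

Pooling all nine readings as if they were independent would understate the variability between cultures.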