Team:UZurich/Measurement

Best Measurement

Good Practice in Data Analysis

Biology heavily depends on statistics as a tool to attach significance to findings. As a result, data analysis is an essential skill for natural scientists. Nevertheless, it is sometimes regarded as a chore and can be performed mindlessly, following conventions. But because data analysis is the foundation on which we build our conclusions, it is important to take time over the analysis and to think about the effect of every step.
This is why we in the dry lab focused solely on the task of data analysis. To ensure that we draw valid conclusions and communicate them properly, we put great care into every step of the process. Perhaps it comes from our initial inexperience, but we questioned every step we took and consulted our supervisors whenever we were uncertain. We took many measures to ensure that our data analysis was sound, so that our team could contribute to the world with confidence.


Note

Because the Biology major at our university includes a mandatory data analysis course in which we used RStudio extensively, we also used it as our coding environment (RStudio 1.4.1106, R 4.0.4) for iGEM. The basis of our knowledge was also gained in that course.

Reduction of Variability

For our project, we often performed ROS assays: plant immunity assays that measure the reactive oxygen species (ROS) burst in response to an elicitor. When we did our ROS analysis for the first time, we noticed that the responses to the treatments differed greatly among repetitions (Fig. 1 & 2). Worried that we had done something wrong or that the experiment had not worked, we asked the wet lab whether this was normal.



Fig. 1: Graphs showing ROS burst of each repetition and treatment (Col-0)


Fig. 2: Graphs showing ROS burst of each repetition and treatment (efr1)


They assured us that ROS assays are intrinsically variable, which is why they performed 12 repetitions per sample in each experiment. This is a practice we placed a lot of value on throughout our experiments, since biological systems are inherently variable. Repetitions are vital to reduce noise and allow us to draw more reliable conclusions. ROS assays are an extreme case regarding variability, so it would be even better to use more repetitions, since that would give us more confidence in our findings.
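To illustrate why repetitions help, here is a minimal R sketch with simulated numbers (not our actual data): the standard error of the mean shrinks with the square root of the number of repetitions.

```r
# Standard error of the mean: se = sd / sqrt(n), so more repetitions
# mean a more precise estimate of the treatment mean
set.seed(1)                                   # reproducible simulation
burst <- rnorm(12, mean = 5000, sd = 1500)    # 12 simulated ROS readings
mean(burst)                                   # estimated treatment mean
sd(burst) / sqrt(length(burst))               # its standard error
# With only 3 repetitions, the same noise level would give a standard
# error twice as large: sd / sqrt(3) vs. sd / sqrt(12)
```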


Data Wrangling

During our analyses, we often encountered statistical outliers that came from contamination of the sample. The wet lab always clearly marked such samples, which made the decision of whether to keep or discard them much easier. We often created boxplots and histograms to see whether there were any strong outliers (Fig. 3). When we stumbled on measurements that stood out, we could check the original dataset to see whether the particular measurement was contaminated. When we were unsure whether to keep a measurement, because the deviation was not large or the contamination was not strong, we kept it in the dataset. It is important to alter the collected data as little as possible, as changes could bias the results. Handling outliers is a delicate issue, and we also consulted our supervisors when in doubt, as they are much more experienced than we are.



Fig. 3: Example histogram from an SGI assay in which one Mock sample was rotten.
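As a flavour of the kind of check we ran, here is a minimal R sketch on simulated data; the dataset `sgi` and its columns `treatment` and `weight` are hypothetical stand-ins, not our real files.

```r
# Hypothetical example data standing in for an SGI dataset
set.seed(2)
sgi <- data.frame(
  treatment = rep(c("Mock", "flg22"), each = 12),
  weight    = c(rnorm(12, mean = 10, sd = 1), rnorm(12, mean = 6, sd = 1))
)
sgi$weight[3] <- 2   # plant an artificial 'rotten' outlier in the Mock group

# Visual screen for strong outliers
hist(sgi$weight, breaks = 20,
     main = "Seedling weights", xlab = "Weight [mg]")
boxplot(weight ~ treatment, data = sgi, ylab = "Weight [mg]")

# Inspect the rows behind suspicious values before deciding anything
sgi[sgi$weight < 4, ]
```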

Because we were not experienced with data analysis in the beginning, we often worked on separate computers and performed the analyses in parallel. This served as a safety net and ensured that differences in operating systems had no effect on the execution of the code. When we reached different results, we discussed our approaches to find the more adequate one for a particular analysis. This system allowed us to double-check our steps and catch our mistakes, as well as to optimise our approach.
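One way to make such a comparison systematic is sketched below; `result_a` and `result_b` are illustrative stand-ins for the outputs of two parallel runs, and `sessionInfo()` documents the environment each run used.

```r
sessionInfo()   # record R version, OS and loaded packages for each run

# Illustrative stand-ins for the outputs of two parallel analyses
result_a <- data.frame(group = c("Mock", "flg22"), mean_rlu = c(1.02, 2.31))
result_b <- data.frame(group = c("Mock", "flg22"), mean_rlu = c(1.02, 2.31))

all.equal(result_a, result_b)   # TRUE when the parallel runs agree
```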

Balanced designs are a prerequisite for an ANOVA with an interaction. However, we were sometimes confronted with an unbalanced design. In such cases, we carefully considered what our research question was and removed categories from the dataset so that the design was balanced, while still making sure that the relevant information could be gained.
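A minimal R sketch of such a balance check on simulated data; the dataset `ros` and its categories are hypothetical, not our real design.

```r
# Hypothetical two-way design: the 'chitin' cells lost four repetitions,
# making the design unbalanced
ros <- expand.grid(rep       = 1:12,
                   genotype  = c("Col-0", "efr1"),
                   treatment = c("Mock", "flg22", "chitin"))
ros <- ros[!(ros$treatment == "chitin" & ros$rep > 8), ]

table(ros$genotype, ros$treatment)   # unequal cell counts = unbalanced

# Remove the category that breaks the balance, keeping the research
# question answerable with the remaining cells
ros_bal <- droplevels(subset(ros, treatment != "chitin"))
table(ros_bal$genotype, ros_bal$treatment)   # equal counts: balanced
```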

The removed data can sadly not be used further, unless it can be evaluated completely separately from the other measured data. We also cannot reuse the remaining measurements for an additional analysis, because this would imply several assumptions and amount to a theoretical duplication or increase of data points that we never actually collected. Though we are unsure of the exact principles behind this, we believe it is a complex combination of the inherent variability of nature, stochastics, and the nature of data sets.


Controls

Controls are an essential part of experimental design. We always included a positive and a negative control where it made sense. Though we showed the positive controls in the plots to inform the viewer that nothing was faulty in our setup, we decided to exclude them from our statistical tests. The reason for this is that we used an ANOVA for most of our analyses: the means of the treatment groups are compared to each other, and the result indicates whether at least one of the means differs from the reference category (the negative control). The problem with the positive control in such a setup is that it is intrinsically different from the negative control and would thus dominate the outcome of the statistical test. In effect, this would render the ANOVA useless, and we would only have the post-hoc test (TukeyHSD) left for evaluating the data. Post-hoc tests, however, can only be used to formulate new hypotheses, so we would not be able to draw conclusions from our experiments. As scientists, this would have been an impossible sacrifice to make, so we removed the positive controls from the statistical tests.
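A minimal sketch of this exclusion on simulated data; the dataset and its level names (`neg_ctrl`, `pos_ctrl`, `construct`) are hypothetical.

```r
# Hypothetical ROS data with a positive control that is, by design,
# far from everything else
set.seed(4)
ros <- data.frame(
  treatment = factor(rep(c("neg_ctrl", "pos_ctrl", "construct"), each = 12)),
  total_fluorescence = exp(rnorm(36, mean = rep(c(6, 10, 7), each = 12)))
)

# Drop the positive control before testing so it cannot dominate the F-test
ros_test <- droplevels(subset(ros, treatment != "pos_ctrl"))
fit <- aov(log(total_fluorescence) ~ treatment, data = ros_test)
summary(fit)   # ANOVA table: F-statistic and p-value
```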

Fitting Models

Fitting linear models comes with assumptions that need to be met:

  1. The expected value of the residuals is 0
  2. All residuals have the same variance
  3. All residuals are normally distributed
  4. The residuals are independent

To ensure that the models we fitted were appropriate for the data, we always checked the diagnostic plots to verify that all modelling assumptions were met (Figures 4-7). Because the assumption of normality was not met for the ROS assays, we transformed the response variable of our model (total fluorescence) with the natural logarithm. For legibility, however, we used the untransformed data in the plots.


Fig. 4: Tukey-Anscombe plot


Fig. 5: QQ-plot


Fig. 6: Scale-location plot


Fig. 7: Leverage plot
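For reference, a minimal R sketch of how such diagnostics can be produced on simulated data; calling `plot()` on a fitted model in base R yields exactly the four plots shown above. The dataset and variable names are hypothetical.

```r
# Hypothetical balanced ROS dataset
set.seed(5)
ros <- expand.grid(rep       = 1:12,
                   genotype  = c("Col-0", "efr1"),
                   treatment = c("Mock", "elicitor"))
ros$total_fluorescence <- exp(rnorm(nrow(ros), mean = 7, sd = 0.5))

# Log-transform the response to address non-normal residuals
fit <- lm(log(total_fluorescence) ~ treatment * genotype, data = ros)

par(mfrow = c(2, 2))   # arrange the four diagnostics in a 2x2 grid
plot(fit)              # Tukey-Anscombe, QQ, scale-location, leverage
par(mfrow = c(1, 1))
```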


Drawing Conclusions

We made sure not only to focus on and report p-values, but also F-statistics and R² values, since they provide valuable information. While R² values show how well our explanatory variables explain the variability in the response variable, the F-statistic shows how large the variability between groups is compared to the variability within groups. A low F-statistic means that the variability between groups is not much larger than the variability within groups, which implies that the categories we tested are not that different from each other. Since we mainly performed ANOVAs, both values tell us how good our explanatory variables are at explaining how the response variable fluctuates.
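For a one-way ANOVA with k groups and N observations in total, this intuition corresponds to the standard definition of the F-statistic:

\[
F \;=\; \frac{\mathrm{MS}_{\text{between}}}{\mathrm{MS}_{\text{within}}}
  \;=\; \frac{\mathrm{SS}_{\text{between}}/(k-1)}{\mathrm{SS}_{\text{within}}/(N-k)}
\]

The larger F is, the more of the total variability is explained by the grouping rather than by noise within the groups.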

When the result of an ANOVA is significant, we cannot say which group is significantly different. This is why post-hoc tests are often performed. As mentioned above, post-hoc tests do not allow us to draw statistical conclusions either, but they can be used to formulate new hypotheses. They can thus hint at the more detailed 'mechanisms' behind the significance of the ANOVA, but their results are to be handled with care. One of our advisors uses Dunnett's tests in the Prism software, but we found that Dunnett's tests compare all means to the reference category. This does not fit our use case, because we want to compare all groups to one another. We talked to Prof. Owen Petchey, the lecturer of our data analysis course, who suggested we use TukeyHSD. It is a test that compares all means to one another and controls the family-wise error rate to correct for multiple testing. This means that our confidence level across all pairwise tests is set at 95%. Because this test ticked all our boxes, we used it for our analyses.
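A minimal sketch of this workflow on simulated data (the dataset and level names are hypothetical): fit the ANOVA and pass it to `TukeyHSD()`.

```r
# Hypothetical three-group ROS dataset
set.seed(6)
ros <- data.frame(
  treatment = factor(rep(c("neg_ctrl", "constructA", "constructB"),
                         each = 12)),
  total_fluorescence = exp(rnorm(36, mean = rep(c(6, 7, 6.5), each = 12)))
)

fit <- aov(log(total_fluorescence) ~ treatment, data = ros)
summary(fit)    # overall ANOVA first

# All pairwise comparisons at a 95% family-wise confidence level
TukeyHSD(fit, conf.level = 0.95)
```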

Communication of Results

It is easy to get lost in the heaps of numbers created during data analysis. It was important to us that we could always close the loop to biological significance. Looking at the p-values and R² values, we asked ourselves what each result meant for us and whether it made sense at all. For the plots, we focused on adding as much information as possible while maintaining legibility; the aim was to give the viewer all the information needed to understand what our data shows. For the error bars, we decided to use standard errors, since they provide the most easily interpretable representation of the uncertainty. In some experiments, the setup was so big that we had to use several plates, which also meant that we had several controls. Although it was not strictly necessary to standardise the data from different plates to ensure comparability, because all the conditions for counting as one experiment were met (i.e. plants of the same age, same time of treatment and measurement, etc.), we decided to do it in some cases to simplify comparison by eye. To do so, we standardised every plate by its respective negative control, which yields relative results.
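A minimal base-R sketch of this per-plate standardisation on simulated data; the column names (`plate`, `treatment`, `total_fluorescence`) are hypothetical stand-ins for our real structure.

```r
# Hypothetical two-plate experiment with one negative control per plate
set.seed(7)
ros <- data.frame(
  plate     = rep(c("P1", "P2"), each = 18),
  treatment = rep(rep(c("neg_ctrl", "constructA", "constructB"),
                      each = 6), 2),
  total_fluorescence = exp(rnorm(36, mean = 7))
)

# Mean of the negative control, computed separately for each plate ...
ctrl <- tapply(ros$total_fluorescence[ros$treatment == "neg_ctrl"],
               ros$plate[ros$treatment == "neg_ctrl"], mean)

# ... then divide every measurement by its own plate's control mean,
# giving values relative to the negative control
ros$rel_fluorescence <- ros$total_fluorescence / ctrl[ros$plate]
```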

Facilitating Data Analysis

While we worked on our project, we came to realise how time-consuming and difficult performing a data analysis can be. Our wet lab team performed the same assays repeatedly, and we believe that being consistent in the evaluation of the same type of experiment is of great importance for reporting consistent results. Therefore, we have written code containing many functions that make evaluating a dataset with a specific structure more efficient. We are sure that we are not the only ones who are not well versed in data analysis at the start of iGEM, because universities often tend to focus on practical work.
When we were looking for ways to analyse reactive oxygen species (ROS) burst assays, we could not find much information. That is why we have made our code for analysing ROS data in RStudio available to everyone as a text file, together with a file in which every function is documented. For easier use, we also contribute an example dataset (and an annotated version) with the correct structure for the code (you can download everything here). You can find our short and simple protocol for a ROS assay on our protocols page. This way, future iGEM teams that work on plant immunity can perform their ROS assays more easily and hopefully benefit from the work we did!
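To give a flavour of the kind of helper such a script can contain, here is a hypothetical sketch; the column names (`well`, `time`, `rlu`) are stand-ins, and the actual structure is documented in the downloadable files.

```r
# Sketch of a helper in the spirit of our script: sum the fluorescence
# curve of each well over time to obtain total fluorescence per sample.
# Column names are hypothetical stand-ins for the documented structure.
total_fluorescence <- function(d) {
  aggregate(rlu ~ well, data = d, FUN = sum)
}

# Usage with a tiny made-up plate reading
reads <- data.frame(well = rep(c("A1", "A2"), each = 3),
                    time = rep(1:3, 2),
                    rlu  = c(10, 40, 25, 8, 12, 9))
total_fluorescence(reads)   # one total per well
```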