AstroBio Overview
AstroBio is a two-year iGEM project. The first year, 2020, consisted of building a gene expression database for organisms experiencing microgravity. The second year consisted of populating the database and building a small web application that generates interactive visuals and automated gene set enrichment analysis. The database will soon host fifteen studies, twelve of which are compatible with the web application.
AstroBio Explorer
The AstroBio Explorer is a small web application for analyzing a curated list of differential gene expression experiments in which organisms were exposed to either spaceflight or simulated microgravity. Built with several R packages and the Shiny web development framework, it lets users examine results in a variety of interactive plots and download them as Excel spreadsheets. Through the g:Profiler API, users can also run automated gene set enrichment analysis to find differentially regulated functional groups across many of the most widely used gene annotation databases (Raudvere et al., 2019). The enrichment results are displayed in several interactive plots and tables and are also available for download.
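The Explorer itself is built in R; purely to illustrate what an automated enrichment request to g:Profiler looks like, the Python sketch below builds a request body for the g:GOSt REST service. The endpoint URL, the parameter names (`organism`, `query`, `user_threshold`, `significance_threshold_method`, `no_iea`) and the `result` response key are assumptions based on the public g:Profiler API and should be checked against its documentation; `build_gost_payload` and `run_gost` are hypothetical helper names. Running the script only builds and prints the payload, so it works offline.

```python
import json
import urllib.request

# Assumed g:GOSt REST endpoint; verify against the official g:Profiler API docs.
GOST_URL = "https://biit.cs.ut.ee/gprofiler/api/gost/profile/"


def build_gost_payload(genes, organism="hsapiens", user_threshold=0.05, no_iea=False):
    """Build the JSON body for a g:GOSt enrichment request (hypothetical helper)."""
    return {
        "organism": organism,                    # g:Profiler organism ID
        "query": sorted(set(genes)),             # deduplicated gene identifiers
        "user_threshold": user_threshold,        # significance cut-off for reported terms
        "significance_threshold_method": "fdr",  # assumed choice of multiple-testing correction
        "no_iea": no_iea,                        # True excludes electronic (IEA) GO annotations
    }


def run_gost(payload):
    """POST the payload and return the enriched terms (needs network access; not called here)."""
    request = urllib.request.Request(
        GOST_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["result"]  # assumed response key


payload = build_gost_payload(["SOD1", "HSPA1A", "SOD1"], no_iea=True)
print(json.dumps(payload, indent=2))
```

Keeping payload construction separate from the network call makes the request easy to inspect or test before anything is sent.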
The GitHub release can be found here.
Guide
Users can select up to three studies from the table to view simultaneously, and can perform gene set enrichment (GSE) analysis provided that g:Profiler supports the organism (the GSE_Analysis_Compatible column should list yes). Hovering over a row shows a summary of that experiment and a contrast table below. The contrast table shows the linear model used with the R package limma to generate the differential gene expression values (Smyth, 2005). After selecting the studies and checking the Perform Gene Enrichment Analysis option, users can edit the filter thresholds for the p-value (adjusted for false discovery rate) and for the log2 fold change (genes are kept when the absolute log2 fold change is greater than or equal to the threshold). Excluding IEA (Inferred from Electronic Annotation) GO annotations removes Gene Ontology annotations that were assigned electronically, for example via comparative analysis, and keeps only manually curated ones. Selecting this option can keep background annotations from cluttering the analysis. After pressing Import and waiting a few moments, the data is ready to view.
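The two filter thresholds described above can be sketched as a single predicate applied to each gene's limma result. This is a minimal illustration, not the Explorer's code: the `GeneResult` record, the `filter_genes` helper, the default thresholds of 0.05 and 1.0, and the choice to keep a gene whose adjusted p-value is exactly at the cut-off are all assumptions, and the genes are toy values.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    log2fc: float  # log2 fold change from the limma contrast
    adj_p: float   # FDR-adjusted p-value


def filter_genes(results, max_adj_p=0.05, min_abs_log2fc=1.0):
    """Keep genes whose adjusted p-value is at or below max_adj_p
    and whose absolute log2 fold change is at or above min_abs_log2fc."""
    return [r for r in results
            if r.adj_p <= max_adj_p and abs(r.log2fc) >= min_abs_log2fc]


toy = [
    GeneResult("geneA", 2.0, 0.01),   # passes both filters
    GeneResult("geneB", -1.0, 0.04),  # passes: |log2FC| equals the threshold
    GeneResult("geneC", 0.5, 0.001),  # fails the fold change filter
    GeneResult("geneD", 3.0, 0.2),    # fails the p-value filter
]
kept = filter_genes(toy)
print([r.gene for r in kept])  # ['geneA', 'geneB']
```

Using the absolute value means strongly down-regulated genes such as `geneB` pass the same threshold as up-regulated ones; the surviving list is what would be submitted for enrichment.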
Gene Expression Plot
The volcano plot shows the log2 fold change on the x-axis and the -log10 p-value on the y-axis. By clicking the options in the header, users can subset the genes by their membership in an enriched group category. Users can further narrow the genes to a particular enriched group by selecting a row in the Enriched Gene Functional Group table and clicking Reset\Show Genes on Plot while the volcano plot is open.
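The volcano plot coordinates and the group-based subsetting can be sketched as follows. The helper names `volcano_point` and `subset_by_group`, the `"up"`/`"down"` direction labels, and the toy gene values are hypothetical; the sketch only shows the mapping x = log2 fold change, y = -log10(p-value), plus a membership filter.

```python
import math


def volcano_point(log2fc, p_value):
    """Return (x, y) coordinates: x = log2 fold change, y = -log10(p-value)."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    return log2fc, -math.log10(p_value)


def subset_by_group(points, group_genes, direction=None):
    """Keep genes that belong to an enriched group, optionally only 'up' or 'down'."""
    kept = {}
    for gene, (x, y) in points.items():
        if gene not in group_genes:
            continue
        if direction == "up" and x <= 0:
            continue
        if direction == "down" and x >= 0:
            continue
        kept[gene] = (x, y)
    return kept


toy = [("geneA", 2.0, 1e-3), ("geneB", -1.5, 0.01), ("geneC", 0.3, 0.5)]
points = {gene: volcano_point(lfc, p) for gene, lfc, p in toy}
print(subset_by_group(points, {"geneA", "geneB"}, direction="down"))
```

Smaller p-values land higher on the plot, so strongly significant genes with large fold changes sit in the upper left and upper right corners.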
Figure 1: Volcano Plot and Gene Selection
Manhattan Plot
Enriched gene functional groups are displayed in a Manhattan plot provided by g:Profiler, in which the functional groups are arranged along the x-axis by semantic similarity and the y-axis shows the -log10 adjusted p-value for each group. Circle size reflects the term size, the number of genes annotated to that term in the organism in question.
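A rough sketch of how such a plot's coordinates could be computed is shown below. It does not reproduce g:Profiler's semantic ordering: as a plainly swapped-in stand-in, it groups terms by data source and orders them by term ID. The log-scaled circle radius, the `Term` record, the `manhattan_layout` helper, and the term IDs and sizes are all hypothetical toy choices, not g:Profiler's implementation.

```python
import math
from collections import namedtuple

# Hypothetical record for one enriched term.
Term = namedtuple("Term", "term_id source adj_p term_size")


def manhattan_layout(terms, source_order, max_radius=10.0):
    """Return (term_id, x, y, radius) tuples.

    x: consecutive positions, grouped by source (stand-in for semantic ordering);
    y: -log10 adjusted p-value;
    radius: scaled by log term size so the largest term gets max_radius.
    Terms whose source is not listed in source_order are skipped.
    """
    largest = max(t.term_size for t in terms)
    points = []
    x = 0
    for source in source_order:
        block = sorted((t for t in terms if t.source == source), key=lambda t: t.term_id)
        for t in block:
            y = -math.log10(t.adj_p)
            radius = max_radius * math.log1p(t.term_size) / math.log1p(largest)
            points.append((t.term_id, x, y, radius))
            x += 1
    return points


toy_terms = [
    Term("GO:term2", "GO:BP", 1e-4, 500),
    Term("GO:term1", "GO:BP", 1e-2, 50),
    Term("KEGG:term1", "KEGG", 0.03, 170),
]
for point in manhattan_layout(toy_terms, ["GO:BP", "KEGG"]):
    print(point)
```

Log-scaling the radius keeps very large terms from visually swamping small, specific ones; a linear scale would be an equally valid choice.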
Figure 2: Manhattan Plot
Biofabric Plot
The Biofabric plot is a network plot that can show gene memberships in any combination of enriched groups or subset categories (Longabaugh, 2012). Biofabric plots are useful because they can grow arbitrarily large without becoming too computationally expensive, while still revealing interesting connections within smaller regions. In a Biofabric plot, nodes are drawn as horizontal lines and edges as vertical lines, each ending in a small square where it meets a node. The plot can be subset by functional group category and fold change direction in the same way as the volcano plot. Users can also generate custom Biofabric plots by selecting rows in the Enriched Gene Functional Group table and clicking Reset\Show Genes on Plot to show the genes associated with the selected groups. Make sure the group category and fold change direction selected in the options match the selected groups; otherwise the plot may show no genes. Enabling Display shadow links in the options draws every edge twice. This helps when highly connected nodes would otherwise be obscured by being sorted into a wedge and printed in a small font (Longabaugh, 2012).
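The layout idea described above (one row per node, one column per edge, with optional shadow links) can be sketched in a few lines. This is a minimal illustration under stated assumptions: `biofabric_layout` is a hypothetical helper, nodes are ordered simply by descending degree and then by name (not necessarily the ordering BioFabric itself uses), and the gene/group edges are toy values.

```python
from collections import Counter


def biofabric_layout(edges, shadows=False):
    """Minimal BioFabric-style layout.

    Each node gets its own row (a horizontal line); each edge gets its own
    column (a vertical line) spanning the rows of its two endpoints. A node's
    line runs from its first to its last edge column. With shadows=True every
    edge is drawn a second time, anchored at its other endpoint.
    Returns (node_lines, edge_lines) where node_lines maps
    node -> (row, first_col, last_col) and edge_lines holds
    (col, top_row, bottom_row, is_shadow).
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Simple stand-in ordering: highest degree first, ties broken by name.
    nodes = sorted(degree, key=lambda n: (-degree[n], n))
    row = {n: i for i, n in enumerate(nodes)}

    # Anchor each link at its upper endpoint; a shadow copy is anchored at the lower one.
    links = []
    for a, b in edges:
        top, bottom = sorted((row[a], row[b]))
        links.append((top, bottom, False))
        if shadows:
            links.append((bottom, top, True))
    links.sort()  # group links by anchor row, then by the far endpoint's row

    edge_lines = []
    span = {}  # row -> (first_col, last_col)
    for col, (anchor, other, is_shadow) in enumerate(links):
        top, bottom = min(anchor, other), max(anchor, other)
        edge_lines.append((col, top, bottom, is_shadow))
        for r in (top, bottom):
            first, last = span.get(r, (col, col))
            span[r] = (min(first, col), max(last, col))
    node_lines = {n: (row[n], *span[row[n]]) for n in nodes}
    return node_lines, edge_lines


toy_edges = [("geneA", "group2"), ("geneB", "group1"), ("geneA", "group1")]
node_lines, edge_lines = biofabric_layout(toy_edges)
print(node_lines)
print(edge_lines)
```

Because every node and edge gets its own row or column, the drawing never has overlapping edges, which is why the layout stays readable as the network grows; shadow links trade extra width for showing each node's complete set of edges next to its own row.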
Figure 3: Biofabric Plot and Gene Selection
Sources
Longabaugh, W. J. (2012). Combing the hairball with BioFabric: A new approach for visualization of large networks. BMC Bioinformatics, 13(1), 275. https://doi.org/10.1186/1471-2105-13-275
Raudvere, U., Kolberg, L., Kuzmin, I., Arak, T., Adler, P., Peterson, H., & Vilo, J. (2019). g:Profiler: A web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Research, 47(W1), W191-W198. https://doi.org/10.1093/nar/gkz369
Smyth, G. K. (2005). limma: Linear Models for Microarray Data. In R. Gentleman, V. J. Carey, W. Huber, R. A. Irizarry, & S. Dudoit (Eds.), Bioinformatics and Computational Biology Solutions Using R and Bioconductor (pp. 397-420). Springer. https://doi.org/10.1007/0-387-29362-0_23