Overview: Developing an End-to-End Organoid Analysis Pipeline - Organoid Profiler
Bioimage segmentation is a common but challenging task in synthetic biology and the life sciences, particularly for highly dense systems with large variation in shape, size, and morphology. This challenge is magnified when working with organoids, where one of the major bottlenecks in analysis is processing large amounts of microscopy image data. Because the sizes and shapes of organoids vary greatly across time points, drug treatments, and cell types, it has been difficult to develop a high-throughput screening workflow that generalizes to all types of organoids.
The need for an accurate and generalizable pipeline to analyze and extract quantitative measurements from organoid scans motivated us to develop Organoid Profiler, a deep learning (DL)-based suite for complete end-to-end organoid analysis. Organoid Profiler is composed of two major components: a trained segmentation model, Brain OS (Organoid Segmentation), and a trained classifier model, Brain OC (Organoid Classifier). We have currently developed Brain OS, a DL-based model for organoid segmentation that allows the user to extract quantitative measurements from the images, such as the surface area of each individual organoid. Brain OC, the second part of Organoid Profiler, will perform classification tasks to label the segmented organoids according to their viability. During the second phase of our project, we will develop the Brain OC classifier model and complete the Organoid Profiler pipeline. Once completed, Organoid Profiler will allow us to quickly screen the results of glioma drug screens and identify promising candidates for clinical trials. In this section, we focus on the segmentation pipeline. To download and access our software, go to our GitHub repository or download the repository.
Figure 1. Proposed Workflow for Organoid Profiler.
Part 1: Brain OS (Organoid Segmentation)
Approach
While deep learning provides an opportunity to tackle previously intractable bioimage analysis tasks (e.g., organoid segmentation), large amounts of annotated data are generally required to train a reliable network. However, curating a sizable and diverse training dataset is labor intensive and challenging, which limits the development of high-accuracy segmentation models. As phenotype and density vary greatly with cell type, growth stage, and treatment drug, it is especially challenging to curate an adequate amount of training data for a generalizable model.
Therefore, we tackle the issue by:
- Curating a diverse training dataset of organoid scans. Organoid scans of four different cell lines at three different growth stages were obtained throughout the project.
- Optimizing the network structure and training principles to efficiently utilize a limited dataset.
Dataset
Sixty 4× brightfield scans of size 1296 × 966 pixels were obtained across varying conditions, including different cell types and different stages of growth (3, 4, and 5 days).
Due to this variation in conditions, the organoids in the images had a wide range of shapes, sizes, and densities. Such variety is necessary to train a generalizable model.
Background
Conventional image analysis tools (ImageJ) and existing deep learning tools (CellPose (Stringer et al., 2021), StarDist (Schmidt et al., 2018), OrganoSeg (Borten et al., 2018)) largely failed at segmenting our dataset because its variation is so high. This was expected, as organoids created under different conditions exhibit very different phenotypes. Although these tools successfully segmented the larger organoids, they tended to over- or under-segment the smaller ones, since multiple focal planes are superimposed in a single image. Because the shapes of these organoids are highly curved and uneven, conventional image analysis tools also had trouble accurately detecting organoid boundaries. Lastly, these pre-existing tools are not trained to ignore the scale bar at the bottom of every image, a surprising but vital problem in organoid analysis.
Segmentation results from pre-existing image analysis tools. (Top row: predictions generated using ImageJ. Bottom left: prediction generated by CellPose. Bottom right: prediction generated by StarDist.)
Methods
Thirty representative images were chosen and split into 256 × 256 pixel tiles. The tiles were then divided into training (80%) and validation (20%) sets using a stratified Monte Carlo sampling method, as sketched below.
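To make the preprocessing concrete, here is a minimal Python sketch of the tiling and split; the folder name, file format, and the per-scan 20% assignment are our illustrative assumptions rather than the exact project code.

```python
# Hypothetical tiling and split; "scans/" and the per-scan 20% routing
# are illustrative assumptions.
from pathlib import Path

import numpy as np
from skimage import io

TILE = 256

def tile_image(img: np.ndarray, tile: int = TILE) -> list:
    """Split a 2-D image into non-overlapping tile x tile patches,
    dropping the ragged border (1296 x 966 is not a multiple of 256)."""
    h, w = img.shape[:2]
    return [
        img[y:y + tile, x:x + tile]
        for y in range(0, h - tile + 1, tile)
        for x in range(0, w - tile + 1, tile)
    ]

rng = np.random.default_rng(seed=0)
train_tiles, val_tiles = [], []
for path in sorted(Path("scans").glob("*.tif")):  # hypothetical folder
    img = (io.imread(path, as_gray=True) * 255).astype(np.uint8)
    # Randomly route ~20% of each scan's tiles to validation so every
    # imaging condition is represented in both sets.
    for t in tile_image(img):
        (val_tiles if rng.random() < 0.2 else train_tiles).append(t)

print(f"{len(train_tiles)} training tiles, {len(val_tiles)} validation tiles")
```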
In order to generate the “ground truth”, the pretrained CellPose model was utilized, as manually labeling all organoids in the images would be extremely labor intensive and inefficient. Although these labels are predictions and not entirely accurate, we presumed that if the model learns efficiently from the given dataset, it would be robust to the small fraction of false positives/negatives.
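As a rough illustration, generating such pseudo ground truth with the pretrained CellPose model might look like the following; the diameter and channel settings are assumptions to be tuned to the data.

```python
# Pseudo ground truth from the pretrained CellPose model; diameter and
# channel settings are assumptions to be tuned to the data.
from cellpose import models

model = models.Cellpose(model_type="cyto")  # generalist pretrained model
# train_tiles: grayscale tiles from the tiling sketch above
masks, flows, styles, diams = model.eval(
    train_tiles, diameter=None, channels=[0, 0]  # grayscale, auto diameter
)
# Collapse instance labels into a binary organoid-vs-background mask to
# match the per-pixel segmentation target.
binary_masks = [(m > 0).astype("uint8") for m in masks]
```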
To account for the variation in organoid phenotypes and for the artificially generated ground truth, a series of image augmentations (rotation, horizontal and vertical flips, and zooming in and out) was performed, as sketched below. In addition, L2 regularization was applied to prevent the model from overfitting.
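A minimal sketch of these augmentations using the Keras ImageDataGenerator (our assumed framework; the exact parameter ranges are illustrative). The L2 regularization is applied inside the model layers, as shown in the U-Net sketch below.

```python
# Illustrative augmentation setup; parameter ranges are assumptions.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

aug = ImageDataGenerator(
    rotation_range=90,     # random rotations
    horizontal_flip=True,  # horizontal flips
    vertical_flip=True,    # vertical flips
    zoom_range=0.2,        # zoom in and out by up to 20%
)

# Tiles and masks from the sketches above, normalized for training.
x = np.stack(train_tiles)[..., None].astype("float32") / 255.0
y = np.stack(binary_masks)[..., None].astype("float32")
seed = 42  # identical seed keeps image and mask transforms in sync
train_gen = zip(aug.flow(x, batch_size=16, seed=seed),
                aug.flow(y, batch_size=16, seed=seed))
```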
A deep learning model based on the U-Net framework (Ronneberger et al., 2015) was developed to segment each individual pixel as either organoid or background. While a more complex model utilizing MobileNetV2 (Sandler et al., 2018), a CNN, as the encoder and Pix2Pix (Isola et al., 2018), a GAN, as the decoder was also developed, we observed that the simple U-Net framework is sufficient for reliable model performance. Therefore, we use the simpler framework and train the corresponding model.
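For reference, a compact Keras U-Net along these lines might look like the following; the depth, filter counts, regularization strength, and optimizer are illustrative assumptions rather than the exact Brain OS configuration.

```python
# Compact U-Net sketch; depth, filter counts, and optimizer are
# illustrative assumptions, not the exact Brain OS configuration.
from tensorflow.keras import Model, layers, regularizers

def conv_block(x, filters):
    for _ in range(2):  # two 3x3 convolutions per resolution level
        x = layers.Conv2D(filters, 3, padding="same", activation="relu",
                          kernel_regularizer=regularizers.l2(1e-4))(x)
    return x

inputs = layers.Input((256, 256, 1))
skips, x = [], inputs
for f in (16, 32, 64):  # encoder (downsampling path)
    x = conv_block(x, f)
    skips.append(x)
    x = layers.MaxPooling2D(2)(x)
x = conv_block(x, 128)  # bottleneck
for f, skip in zip((64, 32, 16), reversed(skips)):  # decoder (upsampling)
    x = layers.Conv2DTranspose(f, 2, strides=2, padding="same")(x)
    x = layers.concatenate([x, skip])  # skip connection
    x = conv_block(x, f)
outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)  # organoid probability

unet = Model(inputs, outputs)
unet.compile(optimizer="adam", loss="binary_crossentropy",
             metrics=["accuracy"])
```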
Figure 2. Segmentation Pipeline
Results
The model achieved 0.93 accuracy on the training set and 0.90 accuracy on the validation set. Although these metrics are not as high as we would prefer, model performance is still reliable given the high variation in the training and validation data. The model with the lowest validation loss was then tested on an unseen, independent set of brightfield scans. Model performance was evaluated both visually and with quantitative metrics, confirming that the model is capable of segmenting highly variable organoid image datasets.
Model prediction results from Brain OS
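For readers reproducing this evaluation, per-pixel accuracy can be computed as below; the Dice coefficient is our added illustrative metric, not necessarily part of the original evaluation.

```python
# Illustrative evaluation helpers; the Dice coefficient is our added
# example metric, not necessarily part of the original evaluation.
import numpy as np

def pixel_accuracy(pred: np.ndarray, truth: np.ndarray) -> float:
    """Fraction of pixels where the binary prediction matches the mask."""
    return float((pred == truth).mean())

def dice(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7) -> float:
    """Overlap-based score that is robust to class imbalance."""
    inter = float(np.logical_and(pred, truth).sum())
    return (2.0 * inter + eps) / (float(pred.sum()) + float(truth.sum()) + eps)
```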
Conclusion
We applied data augmentation strategies to express class invariances, along with feature extraction techniques, and optimized the network structure to effectively utilize a highly variable dataset. We also demonstrated that, with an adequate deep learning framework, high-throughput organoid analysis is possible. The workflow enables the extraction of quantitative measurements from images and supports analysis that goes beyond simply counting the total number of organoids. Organoid Profiler streamlines the process of brain organoid image analysis, and we hope that the workflow we developed will be helpful to other researchers working with organoids.
User Manual: A quick guide for using Brain OS
GitHub repository and link to download the source code
Running the pretrained model
1. Download the pretrained model from https://drive.google.com/file/d/1Dl7BczvxStkL0pjoeS2R42kFP-9Kv860/view?usp=sharing
2. Preprocess the images that you would like to segment into 256 × 256 tiles
3. Using gen_pred.ipynb, generate predictions on the images (a minimal sketch of this step follows)
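In essence, the prediction step performs something like the following; the model filename and the 0.5 threshold are illustrative assumptions, and gen_pred.ipynb remains the authoritative version.

```python
# What the prediction step amounts to; the model filename and the 0.5
# threshold are illustrative assumptions (gen_pred.ipynb is authoritative).
import numpy as np
from skimage import io
from tensorflow import keras

model = keras.models.load_model("brain_os_pretrained.h5")  # downloaded model
tile = io.imread("my_tile.png", as_gray=True).astype("float32")  # values in [0, 1]
prob = model.predict(tile[None, ..., None])[0, ..., 0]  # per-pixel probability
mask = (prob > 0.5).astype(np.uint8)  # binary organoid mask
io.imsave("my_tile_mask.png", mask * 255)
```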
Training your own model
1. Prepare the dataset and the corresponding masks. You can either generate your ground truth annotations manually (we recommend the Piximi annotator, https://www.piximi.app/, or other annotation tools such as ImageJ or QuPath), or create “ground truth” labels using pre-trained models; for the latter, we suggest looking into StarDist, CellPose, or CellProfiler depending on the characteristics of your specific dataset.
2. Preprocess the images and masks into 256 × 256 tiles and generate a .pytable for training and validation using make_hdf5.ipynb (see the sketch after this list).
3. Train the model using organoids_unet.ipynb. The code is optimized for the Google Colab environment, which we recommend.
4. Process the test dataset and apply the trained model to generate predictions on new data. You’re done!
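For orientation, the .pytable from step 2 might be assembled along these lines; the array names and storage layout are our assumptions, and make_hdf5.ipynb is the authoritative version.

```python
# Hypothetical layout of the training .pytable; array names and the
# storage scheme are assumptions (make_hdf5.ipynb is authoritative).
import tables

# train_tiles / binary_masks: uint8 arrays from the preprocessing step
with tables.open_file("organoids_train.pytable", mode="w") as f:
    filters = tables.Filters(complevel=6, complib="zlib")
    img_arr = f.create_earray(f.root, "img", tables.UInt8Atom(),
                              shape=(0, 256, 256), filters=filters)
    mask_arr = f.create_earray(f.root, "mask", tables.UInt8Atom(),
                               shape=(0, 256, 256), filters=filters)
    for img, mask in zip(train_tiles, binary_masks):
        img_arr.append(img[None, ...])   # extend along the first axis
        mask_arr.append(mask[None, ...])
```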
*For further assistance in using the software or data specific issues, please contact us!
Part 2: Brain OC (Organoid Classification) - Coming soon!
During the second phase of our project, we will build upon Brain OS and develop a classifier model to label each organoid depending on its viability. As such a task is challenging even for a trained human, we hope to utilize machine learning to extract subtle phenotypic and distribution characteristics of organoids. With Brain OC, the Organoid Profiler pipeline will be complete, enabling end-to-end processing of organoid images and allowing the user to extract valuable quantitative measurements.
Figure 3. Classification Pipeline
References
Stringer, C., Wang, T., Michaelos, M. et al. Cellpose: a generalist algorithm for cellular segmentation. Nat Methods 18, 100–106 (2021). https://doi.org/10.1038/s41592-020-01018-x
Schmidt, U. et al. Cell Detection with Star-convex Polygons. arXiv:1806.03535 (2018).
Borten, M.A., Bajikar, S.S., Sasaki, N. et al. Automated brightfield morphometry of 3D organoid populations by OrganoSeg. Sci Rep 8, 5319 (2018). https://doi.org/10.1038/s41598-017-18815-8
Ronneberger, O., Fischer, P., Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv:1505.04597 (2015).
Isola, P. et al. Image-to-Image Translation with Conditional Adversarial Networks. arXiv:1611.07004 (2018). http://arxiv.org/abs/1611.07004
Sandler, M. et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018). https://doi.org/10.1109/CVPR.2018.00474