Team:HK GTC/Deep learning


Revision as of 05:42, 24 September 2021


Deep Learning

Detection of Plastic Bottles

Plastic pollution is a severe problem that impacts both ecosystems and our daily lives. The ocean carries much of this waste: an estimated 5.25 trillion pieces of plastic and microplastic are currently floating in it, and about 15% of them will eventually wash up on beaches[1]. In response, we set out to develop a deep learning PET bottle detection model for mapping plastic pollution on beaches. The resulting data gives governments, councils, and NGOs an overview of the current situation and a way to estimate how effective their proposals for reducing plastic waste are, and it helps researchers decide where to focus prevention and cleanup efforts. Our ultimate goal is to help reduce the amount of plastic pollution in the ocean.

Our Workflow

Photo Taking

Using our drone and phones, we took 718 images along the coastlines of Hong Kong beaches, including Cheung Chau and Cheung Sha, and uploaded them to CVAT (Computer Vision Annotation Tool)[2], a platform provided to us by Clearbot, a company that builds marine plastic-clearing robots, to create ground-truth instances for the training process.

Fig 1. Students using a drone to capture images on beaches

Volunteer Plastic Tagging

Together with the human practices team, we organized a plastic-tagging activity within our school, inviting around 5 students from each class for a total of 60 students. During the activity, we taught them how to trace polygons around plastic objects in images in CVAT. With the help of some of our team members, they generated the training data.


We then used the annotated data to train plastic-detection models with Detectron2 from Facebook AI Research.
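As a rough sketch of how this data feeds into training, and assuming the CVAT annotations are exported as COCO-format JSON (the file paths and dataset names below are placeholders, not our actual repository layout), the dataset can be registered with Detectron2 like this:

```python
from detectron2.data import MetadataCatalog
from detectron2.data.datasets import register_coco_instances

# Placeholder paths: CVAT annotations exported in COCO JSON format
register_coco_instances("pet_bottles_train", {}, "annotations/train.json", "images/train")
register_coco_instances("pet_bottles_val", {}, "annotations/val.json", "images/val")

# Our only object category
MetadataCatalog.get("pet_bottles_train").thing_classes = ["PET bottle"]
MetadataCatalog.get("pet_bottles_val").thing_classes = ["PET bottle"]
```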

Model Description

The detection algorithm we use is Mask R-CNN[3], an instance segmentation algorithm developed by Facebook. It extends Faster R-CNN, which only draws bounding boxes around detected objects, by also predicting a segmentation mask for each object. We use Detectron2[4] as the framework to build the model, starting from the baselines in Detectron2's Model Zoo[5] and applying transfer learning to improve performance. To get an intuitive understanding of the model structure, suppose we have a video in which we want to detect PET bottles. Each frame of the video is passed through a Convolutional Neural Network, which applies filters (kernels) to extract features from the image, such as shapes, reflections, and highlights, producing a feature map.
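As a minimal illustration of this first stage, the sketch below runs one frame through a torchvision ResNet-50 used as a stand-in for the real backbone (the frame here is random data; the actual Detectron2 model uses a ResNet/ResNeXt backbone with an FPN):

```python
import torch
import torchvision

# Stand-in backbone: ResNet-50 up to its last convolutional stage
backbone = torchvision.models.resnet50(pretrained=True)
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2])
feature_extractor.eval()

frame = torch.rand(1, 3, 800, 1067)          # one video frame: (batch, channels, H, W)
with torch.no_grad():
    feature_map = feature_extractor(frame)   # convolution filters produce a feature map
print(feature_map.shape)                     # torch.Size([1, 2048, 25, 34])
```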

Fig.2 Structure of feature extracting backbone

The feature map is then passed to a Region Proposal Network (RPN), a small neural network that generates region proposals as bounding boxes together with a score for whether each box contains an object. Using the feature map from the previous stage, ROI Align is applied to each region of interest to produce a fixed-size input for the next stage. The output of the ROI Align layer is fed into two heads: fully connected layers that classify the object and refine its bounding box, and a small fully convolutional network that predicts its segmentation mask.
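A small sketch of the ROI Align step using torchvision; the proposal boxes here are made up for illustration, whereas in the real model they come from the RPN:

```python
import torch
from torchvision.ops import roi_align

feature_map = torch.rand(1, 256, 25, 34)     # (batch, channels, H, W) from the backbone
# Two hypothetical proposals, each (batch_index, x1, y1, x2, y2) in image coordinates
proposals = torch.tensor([[0., 100., 150., 300., 400.],
                          [0., 500., 200., 650., 350.]])

# spatial_scale maps image coordinates onto the feature map (stride 32 here)
pooled = roi_align(feature_map, proposals, output_size=(7, 7), spatial_scale=1 / 32)
print(pooled.shape)                          # torch.Size([2, 256, 7, 7]): fixed size per region
```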

Fig.3 Structure of RPN, ROI Align and following two neural network layers

Training

To optimize the results, we train the network until its output on the training data is close to the ground truth. In other words, the training objective is to minimize the difference between the two. This difference is defined by a loss function; in the case of Mask R-CNN, it combines the errors of bounding-box prediction, classification, and mask prediction. For validating the training results, we compute the COCO mAP[6].
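To see how these loss terms show up in practice, the sketch below builds a Mask R-CNN model in Detectron2 and notes the loss dictionary it returns in training mode; the data loading is omitted and only indicated in comments:

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.modeling import build_model

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
model = build_model(cfg)
model.train()

# batched_inputs would come from a Detectron2 DataLoader over our annotated images:
# loss_dict = model(batched_inputs)
# Typical keys: loss_cls, loss_box_reg, loss_mask, loss_rpn_cls, loss_rpn_loc
# total_loss = sum(loss_dict.values())   # the quantity minimized during training
```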

Model Usage

Requirements: a local Linux environment with:

- Detectron2
- Jupyter Notebook
- PyTorch 1.8
- torchvision
- OpenCV
- NumPy

or a Google Colab notebook.

Files and guidelines for the code can be found in our Github(Link To Github).
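As a hedged example of running a trained model (the weights path, score threshold, and video filename below are placeholders, not the exact settings in our repository), inference on beach footage could look like this:

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1            # single class: PET bottle
cfg.MODEL.WEIGHTS = "output/model_final.pth"   # placeholder path to trained weights
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5    # placeholder confidence threshold
predictor = DefaultPredictor(cfg)

cap = cv2.VideoCapture("beach_footage.mp4")    # placeholder video file
while True:
    ok, frame = cap.read()
    if not ok:
        break
    instances = predictor(frame)["instances"].to("cpu")   # boxes, masks and scores
    print(len(instances), "PET bottles detected in this frame")
cap.release()
```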

Model Results

Training / Validation Configurations

We applied data augmentations to the training images, including random flipping and random brightness and contrast scaling between 0.9 and 1.1. Each iteration uses a mini-batch of 2 training images for stochastic gradient descent. Our dataset contains 718 images in total, split into training and validation sets in an 8:2 ratio: 574 images (1145 instances) for training and 144 images (193 instances) for validation. We trained the models on both Google Colaboratory and Kaggle, using Nvidia P100 and K80 GPUs, for 1000 iterations. For validation, we recorded the Mean Average Precision on the validation dataset and the training losses every 20 iterations and plotted them as line graphs. We trained models on different baselines from the Detectron Model Zoo[5], including X101-FPN, R101-FPN, and R50-FPN. We also randomly subsampled the training dataset to fractions of 0.25, 0.5, and 0.75 to investigate how much training data is needed to maximize model performance.
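A minimal sketch of this training setup in Detectron2, reusing the placeholder dataset names from earlier; the augmentation and solver settings mirror the description above, while everything else stays at the baseline defaults:

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data import DatasetMapper, build_detection_train_loader
from detectron2.data import transforms as T
from detectron2.engine import DefaultTrainer
from detectron2.evaluation import COCOEvaluator

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("pet_bottles_train",)    # placeholder dataset names
cfg.DATASETS.TEST = ("pet_bottles_val",)
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml")  # transfer learning
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
cfg.SOLVER.IMS_PER_BATCH = 2                   # 2 images per iteration
cfg.SOLVER.MAX_ITER = 1000
cfg.TEST.EVAL_PERIOD = 20                      # validate every 20 iterations

class AugTrainer(DefaultTrainer):
    """DefaultTrainer with random flip and brightness/contrast scaling in [0.9, 1.1]."""
    @classmethod
    def build_train_loader(cls, cfg):
        augs = [T.RandomFlip(),
                T.RandomBrightness(0.9, 1.1),
                T.RandomContrast(0.9, 1.1)]
        return build_detection_train_loader(
            cfg, mapper=DatasetMapper(cfg, is_train=True, augmentations=augs))

    @classmethod
    def build_evaluator(cls, cfg, dataset_name):
        return COCOEvaluator(dataset_name, output_dir=cfg.OUTPUT_DIR)

trainer = AugTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```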

Baseline results

Table 1: mAP results on the validation dataset for models with different baselines. X101-FPN outperforms the other models we tested.

Table 2: mAP results reported in the Mask R-CNN paper.

As shown in Table 1, the X101-FPN backbone, which combines concepts from ResNet and Inception, outperforms the R50-FPN and R101-FPN baselines in overall Mean Average Precision (+3.7 and +2.5, respectively). Compared with the results reported in the Mask R-CNN paper, whose models were trained on the COCO dataset of about 330k images across 80 object categories, our models have a clearly higher mAP. Sample detection images are shown below.

Fig. 4: Three sample images from the validation set, as detected by the model with the X101-FPN baseline.

Potential reasons for high AP despite inaccurate detections

Although the Average Precision is high, the detections are not perfect; there are clearly some false positives in the sample images. The first likely reason is that our model detects only one object category, PET bottles, whereas the Mask R-CNN benchmark covers 80 categories in total, which drags down its AP. There are also flaws in our dataset. The most obvious is the shortage of both training and validation data compared with large-scale datasets such as Pascal VOC, ImageNet, and COCO, which contain more than 10,000 instances per category; this limits the model's ability to learn a wider range of PET bottle features. Moreover, our dataset contains images of the same objects taken from different angles, so the model leans on these near-duplicate data and can only detect bottles with similar features, introducing bias. Finally, we did not use more advanced validation methods such as K-fold cross-validation, in which the dataset is divided into k sections and the AP obtained by validating on each distinct section is averaged, as sketched below.
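For reference, a minimal sketch of how K-fold cross-validation could be organized over our 718 annotated images; train_and_evaluate is a placeholder standing in for our actual training and COCO mAP evaluation pipeline:

```python
import numpy as np
from sklearn.model_selection import KFold

def train_and_evaluate(train_ids, val_ids):
    """Placeholder: register the two splits, train Mask R-CNN on train_ids,
    and return the COCO mAP measured on val_ids."""
    return 0.0

image_ids = np.arange(718)                     # indices of all annotated images
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

fold_aps = []
for fold, (train_idx, val_idx) in enumerate(kfold.split(image_ids)):
    ap = train_and_evaluate(image_ids[train_idx], image_ids[val_idx])
    fold_aps.append(ap)
    print(f"fold {fold}: AP = {ap:.2f}")

print("cross-validated AP:", float(np.mean(fold_aps)))
```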

Fig 5. The training curves of X101-FPN, R101-FPN, and R50-FPN (left to right)

Fig. 5 shows the training curves for the models, where the blue line represents the mAP and the purple line represents the loss. The mAP of the R50-FPN and R101-FPN models stabilizes at around 500 iterations, while the X101-FPN model stabilizes at around 400 iterations, earlier than the rest. Their losses all level off at around 600 iterations, and beyond that point there is no sudden drop in mAP, so the models were not overfitting.
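For reference, a hedged sketch of how such curves can be drawn from the metrics.json log that Detectron2 writes during training; the output path is a placeholder, and the "segm/AP" key assumes the COCO evaluator was run on the validation set:

```python
import json
import matplotlib.pyplot as plt

# Placeholder path to the training log written by Detectron2
entries = [json.loads(line) for line in open("output/metrics.json")]

loss = [(e["iteration"], e["total_loss"]) for e in entries if "total_loss" in e]
val_ap = [(e["iteration"], e["segm/AP"]) for e in entries if "segm/AP" in e]

fig, ax1 = plt.subplots()
ax1.plot(*zip(*val_ap), color="blue", label="validation mAP")
ax1.set_xlabel("iteration")
ax1.set_ylabel("mAP")
ax2 = ax1.twinx()                              # second y-axis for the loss curve
ax2.plot(*zip(*loss), color="purple", label="training loss")
ax2.set_ylabel("loss")
plt.show()
```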

Power estimation of the dataset

As clearly shown in the graphs, our small dataset is far from enough to achieve maximum performance. The plot of AP against the fraction of training images used shows a steep, still-rising trend, which suggests that further expanding the dataset would continue to improve the AP.
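A hedged sketch of the random subsampling behind this estimate, reusing the placeholder dataset name introduced earlier; each registered subset is then used as the training set for a separate run:

```python
import random
from detectron2.data import DatasetCatalog, MetadataCatalog

def register_subset(fraction, seed=0):
    """Register a random subset containing the given fraction of the training images."""
    full = DatasetCatalog.get("pet_bottles_train")        # list of per-image records
    random.seed(seed)
    subset = random.sample(full, int(len(full) * fraction))
    name = f"pet_bottles_train_{int(fraction * 100)}"
    DatasetCatalog.register(name, lambda subset=subset: subset)
    MetadataCatalog.get(name).thing_classes = ["PET bottle"]
    return name

for frac in (0.25, 0.5, 0.75):
    subset_name = register_subset(frac)
    print(subset_name, "registered")           # used as cfg.DATASETS.TRAIN in its own run
```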

Future Plans/Implementations

Plans

- Increase the size of the training and validation data to improve model accuracy and ensure a reliable mAP.
- Train other detection algorithms, e.g. YOLOv3, on our dataset.

References
[1] https://www.condorferries.co.uk/marine-ocean-pollution-statistics-facts
[2] https://github.com/openvinotoolkit/cvat
[3] https://arxiv.org/abs/1703.06870
[4] https://github.com/facebookresearch/Detectron
[5] https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md#coco-instance-segmentation-baselines-with-mask-r-cnn
[6] https://jonathan-hui.medium.com/map-mean-average-precision-for-object-detection-45c121a31173
[7] https://www.itc.gov.hk/en/fund_app/patent_app_grant.html
