Deep Learning

Detection of Plastic Bottles

As we know, plastic pollution is a severe problem that impacts both the ecosystem and our daily lives. A huge amount of plastic ends up in the ocean: an estimated 5.25 trillion pieces of plastic and microplastic are currently floating in the ocean, and about 15% of them will eventually land on our beaches[1]. In response to this problem, we set out to develop a deep learning PET bottle detection model for mapping plastic pollution on beaches. The resulting data allows governments, councils, and NGOs to get an overview of the current situation and to estimate how effective their proposals are at reducing the impact of plastic waste. It also helps researchers prepare and implement effective prevention and cleanup plans and decide where to focus them. Our ultimate goal is to help reduce the amount of plastic pollution in the ocean.

Our Workflow

Photo Taking

Using our drone and phones, we took 718 images along the coastlines of Hong Kong beaches, including Cheung Chau and Cheung Sha, and uploaded them to CVAT (Computer Vision Annotation Tool)[2], a web tool provided by Clearbot, a company that builds marine plastic-clearing robots, to create ground-truth instances for the training process.

Fig 1. Students using a drone to capture images on beaches

Volunteer Plastic Tagging

Together with the human practices team, we organized a plastic-tagging activity at our school, inviting around 5 students from each class for a total of about 60 students. During the activity, we taught them how to trace polygons around plastic objects in images in CVAT. With the help of some of our team members, we generated the training data.

We then trained plastic-detection models on this data using Detectron2 from Facebook AI Research.

Model Description

The detection algorithm we use is Mask R-CNN[3], an instance segmentation algorithm developed by Facebook. It extends Faster R-CNN, which only produces bounding boxes around detected objects, by also predicting a segmentation mask for each detection. We use Detectron2[4] as the framework to build the model, starting from the baselines in the Detectron2 Model Zoo[5] for transfer learning to improve model performance.
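
As a rough sketch of this setup (not our exact script), the snippet below registers a COCO-format annotation export, such as the one CVAT can produce, and loads an X101-FPN Mask R-CNN baseline from the Detectron2 Model Zoo for transfer learning. The dataset names and file paths are placeholders.

  # Sketch: register a COCO-format export and load a Model Zoo baseline.
  from detectron2 import model_zoo
  from detectron2.config import get_cfg
  from detectron2.data.datasets import register_coco_instances

  # Training/validation sets (annotations exported from CVAT as COCO JSON).
  register_coco_instances("pet_train", {}, "annotations/train.json", "images/train")
  register_coco_instances("pet_val", {}, "annotations/val.json", "images/val")

  # Start from a COCO-pretrained X101-FPN Mask R-CNN baseline.
  BASELINE = "COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml"
  cfg = get_cfg()
  cfg.merge_from_file(model_zoo.get_config_file(BASELINE))
  cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(BASELINE)  # pretrained weights
  cfg.DATASETS.TRAIN = ("pet_train",)
  cfg.DATASETS.TEST = ("pet_val",)
  cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # a single category: PET bottle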

Structure

To get an intuitive understanding of the model structure, suppose we have a video in which we want to detect PET bottles. Each frame of the video is passed through a Convolutional Neural Network, which uses learned filters as kernels to extract features from the image, such as shapes, reflections, and highlights, and produces a feature map.

Fig.2 Structure of feature extracting backbone
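
To make the feature-map idea concrete, here is a minimal, hypothetical sketch of the feature-extraction step in isolation, using a torchvision ResNet-50 as a stand-in for the detector's backbone; the frame size is arbitrary and the weights are left at their defaults.

  # Sketch: a frame passed through a convolutional backbone becomes a feature map.
  import torch
  import torchvision

  backbone = torchvision.models.resnet50()  # in the real model this is COCO-pretrained
  # Keep everything up to the last convolutional stage (drop avgpool and fc).
  feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2])
  feature_extractor.eval()

  frame = torch.randn(1, 3, 480, 640)  # one video frame: (batch, channels, height, width)
  with torch.no_grad():
      feature_map = feature_extractor(frame)
  print(feature_map.shape)  # torch.Size([1, 2048, 15, 20]): a grid of feature vectors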

The feature map is then passed into a Region Proposal Network (RPN), a small neural network that generates region proposals as bounding boxes together with a score for whether each one contains an object. Using the feature map from the previous stage, ROI Align is applied to each region of interest to produce a fixed-size input for the next stage. The output of the ROI Align layer is fed into two networks, a fully connected layer and a small fully convolutional network, which together classify the object and predict its mask.

Fig.3 Structure of RPN, ROI Align and following two neural network layers
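
As an illustration of the ROI Align step alone, the following sketch uses torchvision.ops.roi_align with made-up proposal boxes to show how every region of interest, whatever its size, is pooled to the same fixed shape.

  # Sketch: ROI Align crops a fixed-size feature window for each proposed box.
  import torch
  from torchvision.ops import roi_align

  # A feature map downsampled 8x from a hypothetical 400x400 image.
  feature_map = torch.randn(1, 256, 50, 50)
  # Two proposal boxes in image coordinates (x1, y1, x2, y2); the values are made up.
  boxes = [torch.tensor([[40.0, 60.0, 120.0, 200.0],
                         [300.0, 100.0, 380.0, 260.0]])]
  pooled = roi_align(feature_map, boxes, output_size=(7, 7),
                     spatial_scale=1 / 8, sampling_ratio=2, aligned=True)
  print(pooled.shape)  # torch.Size([2, 256, 7, 7]): one fixed-size tensor per region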

Training

To optimize the results, we need to train the network until its output on the training data is close to the ground truth; in other words, the training objective is to minimize the difference between them. This difference is defined by a loss function, which for Mask R-CNN combines the errors of bounding-box prediction, classification, and mask prediction. To validate the training results, we compute the COCO mAP[6].
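
Continuing the earlier sketch, training and COCO-mAP evaluation in Detectron2 can be driven by DefaultTrainer and COCOEvaluator. The learning rate below is a placeholder; the batch size and iteration count follow the configuration described under Training / Validation Configurations.

  # Sketch: train on the registered dataset, then report COCO mAP on the validation split.
  import os
  from detectron2.engine import DefaultTrainer
  from detectron2.evaluation import COCOEvaluator, inference_on_dataset
  from detectron2.data import build_detection_test_loader

  cfg.SOLVER.IMS_PER_BATCH = 2   # 2 training images per iteration (mini-batch SGD)
  cfg.SOLVER.BASE_LR = 0.00025   # placeholder learning rate
  cfg.SOLVER.MAX_ITER = 1000
  cfg.OUTPUT_DIR = "./output"
  os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

  trainer = DefaultTrainer(cfg)  # builds the model, optimizer and data loader from cfg
  trainer.resume_or_load(resume=False)
  trainer.train()                # minimizes the combined box / class / mask loss

  evaluator = COCOEvaluator("pet_val", output_dir=cfg.OUTPUT_DIR)
  val_loader = build_detection_test_loader(cfg, "pet_val")
  print(inference_on_dataset(trainer.model, val_loader, evaluator))  # COCO-style mAP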

Model Usage

Requirements:

Local Linux environment with:

Detectron2
Jupyter Notebook
PyTorch 1.8
torchvision
OpenCV
NumPy

or a Google Colab notebook

Files and guidelines for the code can be found in our GitHub repository: https://github.com/IGEM-TEAM-HK-GTC/HK_GTC.
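
As a hedged example of how a trained checkpoint might be applied to a new beach photo (continuing the cfg from the sketches above; the weights path, image path, and score threshold are placeholders):

  # Sketch: run a trained model on one image and save a visualization.
  import cv2
  from detectron2.engine import DefaultPredictor
  from detectron2.utils.visualizer import Visualizer
  from detectron2.data import MetadataCatalog

  cfg.MODEL.WEIGHTS = "output/model_final.pth"   # checkpoint produced by training
  cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5    # confidence cutoff for reported detections
  predictor = DefaultPredictor(cfg)

  image = cv2.imread("beach_photo.jpg")          # BGR, as OpenCV loads it
  outputs = predictor(image)                     # dict with an "instances" field (boxes, masks, scores)

  # Draw the predicted masks and boxes for inspection.
  vis = Visualizer(image[:, :, ::-1], MetadataCatalog.get("pet_train"))
  drawn = vis.draw_instance_predictions(outputs["instances"].to("cpu"))
  cv2.imwrite("beach_photo_detected.jpg", drawn.get_image()[:, :, ::-1])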

Model Results

Training / Validation Configurations

We applied data augmentations to the training images, including random flipping and random brightness and contrast scaling between 0.9 and 1.1. Each iteration processes 2 images from the training data for mini-batch stochastic gradient descent. Our dataset contains 718 images in total, split in an 8:2 ratio into 574 training images (1145 instances) and 144 validation images (193 instances). We trained the models on Google Colaboratory and Kaggle using Nvidia P100 and K80 GPUs for 1000 iterations. For validation, we recorded the mean Average Precision of the model on the validation dataset and the training loss every 20 iterations, and plotted them as line graphs.

We trained models with different baselines from the Detectron2 Model Zoo[5], including X101-FPN, R101-FPN, and R50-FPN. We also performed power estimations by randomly subsampling the training dataset to fractions of 0.25, 0.5, and 0.75 to find out how much training data is needed to maximize model performance.
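
The sketch below shows one way the augmentations above and the fractional subsampling for the power estimation could be wired into Detectron2; the mapper arguments and the helper function are illustrative assumptions rather than a copy of our training notebook.

  # Sketch: random flip plus brightness/contrast in [0.9, 1.1] via a custom trainer,
  # and a helper that registers a random fraction of the training set for power estimation.
  import random
  import detectron2.data.transforms as T
  from detectron2.data import DatasetCatalog, DatasetMapper, build_detection_train_loader
  from detectron2.engine import DefaultTrainer

  class AugTrainer(DefaultTrainer):
      @classmethod
      def build_train_loader(cls, cfg):
          mapper = DatasetMapper(cfg, is_train=True, augmentations=[
              T.RandomFlip(),
              T.RandomBrightness(0.9, 1.1),
              T.RandomContrast(0.9, 1.1),
          ])
          return build_detection_train_loader(cfg, mapper=mapper)

  def register_subset(full_name, subset_name, fraction, seed=0):
      """Register a random fraction of an already-registered dataset."""
      records = DatasetCatalog.get(full_name)
      subset = random.Random(seed).sample(records, int(len(records) * fraction))
      DatasetCatalog.register(subset_name, lambda s=subset: s)
      # (a full setup would also copy metadata such as thing_classes)

  # e.g. a 0.5-fraction power-estimation run:
  # register_subset("pet_train", "pet_train_50", 0.5)
  # cfg.DATASETS.TRAIN = ("pet_train_50",)
  # AugTrainer(cfg).train()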

Baseline results

Table 1: mAP results on the validation dataset for models with different baselines. X101-FPN outperforms the other models we tested.

Table 2: mAP results reported in the Mask R-CNN paper.

As shown in Table 1, the X101-FPN backbone, which combines ideas from ResNet and InceptionNet, outperforms the R50-FPN and R101-FPN baselines in overall mean Average Precision (+3.7 and +2.5 respectively). Compared with the results from the Mask R-CNN paper, whose models were trained on the COCO dataset of roughly 330k images across 80 object categories, our models clearly reach a higher mAP. Sample detection images are shown below.

Fig. 4: Three sample images from the validation set, detected by the model with the X101-FPN baseline.

Potential reasons for high AP and inaccurate performance

Although we achieved a high Average Precision, the detections were not perfect; there were clearly some false positives in the sample images.

The first reason is probably that our model only detects one object category, PET bottles, while the Mask R-CNN benchmark covers 80 categories in total, which drags its AP down.

Besides, there are flaws in our dataset. The most obvious is the shortage of both training and validation data compared to large-scale datasets such as Pascal VOC, ImageNet, and COCO, which contain more than 10,000 instances per category; this limits the model's ability to learn more features of PET bottles. Moreover, our dataset contains images of the same objects from different angles, which causes the model to rely on these near-duplicate data and to detect only bottles with similar features, introducing bias into the model.

Finally, we did not use more rigorous validation methods such as k-fold cross-validation, in which the dataset is divided into k sections and the AP obtained with each distinct section held out for validation is averaged.
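
For illustration, a k-fold evaluation over our annotated images could be organized roughly as follows; scikit-learn's KFold is used purely as an example and was not part of our pipeline, and train_and_eval stands for a hypothetical callback that trains on one split and returns its AP.

  # Sketch: k-fold cross-validation over a list of per-image annotation records.
  from sklearn.model_selection import KFold

  def cross_validate(records, train_and_eval, k=5, seed=0):
      """records: list of per-image annotation dicts.
      train_and_eval(train_records, val_records) -> AP for that fold."""
      kfold = KFold(n_splits=k, shuffle=True, random_state=seed)
      aps = []
      for fold, (train_idx, val_idx) in enumerate(kfold.split(records)):
          train_records = [records[i] for i in train_idx]
          val_records = [records[i] for i in val_idx]
          ap = train_and_eval(train_records, val_records)
          print(f"fold {fold}: AP = {ap:.2f}")
          aps.append(ap)
      return sum(aps) / len(aps)  # the cross-validated AP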

Fig 5. The training curves of X101-FPN, R101-FPN, and R50-FPN (top to bottom)

Fig. 5 shows the training curves for the models, where the blue line represents the mAP and the purple line represents the loss. The mAP of the R50-FPN and R101-FPN models stabilized at around 500 iterations, while the X101-FPN model stabilized at around 400 iterations, earlier than the rest. Their losses all leveled off at around 600 iterations, and after the 600-iteration mark there is no sudden drop in mAP, so the models were not overfitting.

Power Estimations of the dataset

As clearly shown in the graphs, our small dataset is not nearly enough to reach maximum performance. The AP keeps rising steeply as the fraction of training images increases, which suggests that further expanding the dataset would continue to improve the AP.

Future Plans/Implementations

Plans

  • Increase the size of training and validation data to improve model accuracy and ensure a reliable mAP
  • Train our dataset with other algorithms, e.g. YOLOv3.

Implementations

  • Verify the effect of actions taken against coastline plastic pollution by different stakeholders (e.g. producer responsibility schemes by the government, or how long the effect of beach cleanups by NGOs lasts)
  • Use the model as the backbone of a plastic cleanup robot operating along the coastline.
  • Map plastic waste around the coastline using drones to generate a cleanup plan.

References
[1] https://www.condorferries.co.uk/marine-ocean-pollution-statistics-facts
[2] https://github.com/openvinotoolkit/cvat
[3] https://arxiv.org/abs/1703.06870
[4] https://github.com/facebookresearch/Detectron
[5] https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md#coco-instance-segmentation-baselines-with-mask-r-cnn
[6] https://jonathan-hui.medium.com/map-mean-average-precision-for-object-detection-45c121a31173
[7] https://www.itc.gov.hk/en/fund_app/patent_app_grant.html
