Team:UESTC-China/Model2


Problem Restatement

In the experimental stage, each deinking experiment requires a standard for measuring, by eye, the degree of residual ink spots on the paper. If the residual amount is small enough, the deinking is classified as complete and the paper continues to the next step; if the residual ink spots are notable, the deinking experiment is repeated until the paper reaches the output standard. The key point is this 'standard' and how to measure it objectively. Judging the residual ink on paper by experience obviously runs counter to accuracy, so a quantitative indicator is urgently needed to judge the amount of ink left on the paper.
We proposed a solution to this problem. In the experimental test phase, we printed a regular distribution of English periods, namely '.', on the test A4 paper. At the end of each deinking test, the number of remaining periods is counted manually. If the count exceeds a threshold value, the paper is judged not to be deinked cleanly; if it does not exceed the threshold, the paper has reached the output standard. At the same time, using the quantity of residual ink, we can easily set up multiple experiments to verify the effects of the different enzyme solutions we used. The indicator is therefore the number of periods '.' left on the paper after deinking.
But this approach has obvious drawbacks. Counting periods works well for verifying the deinking effect of different solutions, and its error is relatively low. In practice, however, pages do not carry regularly spaced periods: sentences and punctuation appear most frequently, along with charts and Arabic numerals. Therefore, we use the point-counting method only to verify the effect of the experiments; for the mathematical model of distribution control in the deinker, we adopt another form, namely computer vision, to help us measure.
Table 1. Advantages, disadvantages, and application scenarios of the Counting Quantization Method and the CV Evaluation Method

Counting Quantization Method. Advantages: it effectively quantifies the indicator, presenting each experimental paper in digital form, and it clearly reflects the difference between two sheets of the same paper after reacting with different solutions and environments; it is well suited to verifying the effect of enzymatic deinking. Disadvantages: it lacks practicability, being suitable only for the laboratory environment, not for the deinker environment.

CV Evaluation Method. Advantages: its application range is very wide; it suits all kinds of paper and is also suitable for in-process judgment. Disadvantages: the camera is not accurate enough, so it can only identify whether ink remains in each area of the paper and cannot express the result numerically; it is not suitable for accurately measuring the deinking effect.
We used deep learning models from computer vision to help us determine whether any ink spots are left on a piece of paper. In choosing a suitable model, we faced several problems. We list the issues that need to be addressed below; each requires a reasonable solution.
1. It is difficult for deep learning models to quantify the amount of residual ink on a piece of paper with a numerical metric. With conventional instruments (considering only 720p, 1080p, and 2K cameras), computers can usually only tell us whether there are ink spots in an area, not how many there are;
2. After the paper is processed, the impurities in the paper gradually appear as the thickness decreases. Especially under black-box lighting conditions, residual ink and paper impurities are easily confused;
3. After the paper is wet, how many deinking operations the sticky cylinder should perform on the paper can, at present, only be decided by experience.
In essence, the first two problems concern the deinker's ability to judge the ink, and the third concerns its ability to operate on the ink. With a combination of clear identification and accurate deinking, we can complete the deinking task without significantly damaging the paper. We even added some extra functionality to the basic recognition system. But all of this rests on solving these three problems, and below are our ideas for each of them.

Pre-training on Paper

In view of the first problem, equipment limitations make it impossible to deploy a large scanner in a small deinking machine to obtain the various attributes of the paper to be deinked. It may be possible to measure the ink content of the paper directly in certain ways, but that is not practical for our instrument, so we do not consider direct quantification of residual ink. Instead, we perform the following steps on the standard A4-size input paper:
1. Camera photography requires adjusting the lighting and the flatness of the paper; these are assisted by other hardware structures of the deinker;
2. Divide the A4 paper in the photo evenly into several rectangles;
3. Divide the central rectangular part of the paper more finely; the degree of detail can be adjusted to facilitate adding new functions later;
4. Use a convolutional neural network to identify whether each small rectangle in the central rectangle contains ink, and set a threshold on the number of ink-containing rectangles. If the count exceeds the threshold, continue the deinking experiment; if it is below the threshold, end the experiment and output the paper (see the sketch after this list);
5. Delete the photos after the end to keep the document confidential.
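A minimal sketch of steps 2-4 is shown below. It is illustrative rather than our production pipeline: `classifier` stands in for a trained CNN that labels one block as inked (1) or clean (0), and the grid size, margin, and threshold values are placeholders.

```python
import numpy as np

def split_into_blocks(image, rows, cols):
    """Evenly divide an image (H x W x C array) into rows*cols blocks (step 2)."""
    h, w = image.shape[0] // rows, image.shape[1] // cols
    return [image[r*h:(r+1)*h, c*w:(c+1)*w]
            for r in range(rows) for c in range(cols)]

def needs_more_deinking(photo, classifier, rows=12, cols=8, threshold=5):
    """Steps 3-4: subdivide the central rectangle finely, classify each
    small rectangle, and compare the count of inked blocks to a threshold.
    Returns True if the paper should go through another deinking pass."""
    h, w = photo.shape[:2]
    core = photo[h//10:9*h//10, w//10:9*w//10]   # drop the blank margins
    blocks = split_into_blocks(core, rows, cols)
    ink_count = sum(int(classifier(block)) for block in blocks)
    return ink_count > threshold
```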
Figure 1. Random distribution diagram of ink blocks and normal blocks. The blue parts are normal blocks, and the orange parts are blocks with ink
Figure 2. Random distribution diagram of impurity blocks and normal blocks. The blue parts are normal blocks and the green parts are blocks containing impurities
We randomly selected two maps of the distribution of ink blocks and impurity blocks against normal blocks. Figures 1 and 2 are both extracted from the edge rectangle area, where we must consider ink staining caused by the paper getting wet, so our x-coordinate starts at -0.5.
The first step above is preparatory work that provides a good base picture for the subsequent recognition system. The purpose of the second and third steps is to divide the paper into blocks; since the four margins of A4 paper are left largely blank when printing, we place the core blocks in the central part. The fourth step is the main recognition part of the algorithm, and the fifth step completes the auxiliary function of confidentiality.
From this operation we obtain many small rectangles, each with or without ink. In fact, if prediction accuracy were not a concern, the model's decision-making process would be finished once the threshold on the number of inked rectangles is set. In practice, however, we encountered other problems, such as the second problem mentioned above.

Network Structure Selection

For the second problem, it is sometimes difficult to distinguish impurities from residual ink at a distance, even with the human eye and without computer-vision prediction. As a result, the convolutional neural networks we commonly use often overfit, treating both impurities and ink in the paper as ink. Under this decision, the model will always determine that the machine needs to deink the paper. But in many cases the deinking effect has already reached the standard, and there is no need to repeat the deinking operation. The main harm of such behavior is that excessive deinking seriously hurts the paper, causing paper damage and so on [1][2].
Before we had thought this through, we used VGG, which can be called directly in Keras [3]. In reality, however, under the guidance of the VGG network we found that several deinking operations did not affect the ink; there was even the problem that clean parts of the paper were deinked so many times that they broke, while the ink remained. We first suspected the neural network model itself: perhaps our computer simulation differs too much from reality, perhaps our training set is too small, perhaps the network parameters are not tuned reasonably... So we deepened the network structure, adjusted the parameters, and increased the training data. The recognition accuracy remained very low, so we concluded that the problem might lie in the network structure.
We decided to use a network with an architecture similar to Inception (Figure 3 illustrates the architecture of the Inception network). Unlike a common convolutional neural network, the Inception module computes several different transformations over the same input map in parallel and concatenates their results into a single output. After careful consideration, we chose one 1×1 and two 3×3 convolution kernels in each module. The reason is cost: without controlling the sizes, the computation becomes very expensive. Say there are M input maps; adding one filter means convolving over M more maps, and adding N filters means N×M more convolutions. In other words, as the authors point out, any uniform increase in the number of filters results in a quadratic increase in computation, so a naive Inception module that simply triples or quadruples the number of filters is a disaster in terms of cost [4].
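A minimal Keras sketch of such a module, under our reading of that design, might look as follows. The 1×1 convolutions placed in front of the 3×3 branches are the usual dimensionality reductions that keep the cost of the concatenation under control; the filter counts are illustrative.

```python
from tensorflow.keras import layers

def inception_like_block(x, filters):
    """One 1x1 branch and two 3x3 branches computed in parallel on the same
    input map, concatenated into a single output along the channel axis."""
    b1 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)   # reduce first
    b2 = layers.Conv2D(filters, 3, padding="same", activation="relu")(b2)
    b3 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)   # reduce first
    b3 = layers.Conv2D(filters, 3, padding="same", activation="relu")(b3)
    return layers.Concatenate()([b1, b2, b3])
```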
Figure 3. Inception network structure
As we continued to deepen the network structure, problems still arose; in particular, deepening the network accelerated overfitting and gradient explosion. To sum up, we needed to adjust the network structure rather than merely increase its depth or breadth: what we need is a network that distinguishes impurities from ink with high accuracy. According to the actual results, however, our new Inception network still failed to solve the misrecognition problem, so we had to consider other network structures that can compare different stages of change. This led us to residual connections.
We therefore modified the network by means of residual connections, using the structure in Figure 4; that is, we considered ResNet [5][6].
Figure 4. Graph representation of a residual connection
This network structure works surprisingly well in practice. Previously, deep neural networks often suffered from vanishing gradients: the gradient signal from the error function shrinks exponentially as it propagates back toward the earlier layers, and by the time the error signal reaches those layers it is too small for the network to learn from. Because ResNet's shortcut connections deliver the gradient signal directly to earlier layers, we can build much deeper networks that still perform well [7][8][9].
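A residual block of the kind sketched in Figure 4 can be written in a few lines of Keras. This is a generic sketch: it assumes the input already has `filters` channels; otherwise a 1×1 projection would be added to the shortcut.

```python
from tensorflow.keras import layers

def residual_block(x, filters):
    """Two convolutions plus an identity shortcut: the input is added back
    to the branch output, so the gradient can flow directly to earlier
    layers instead of vanishing through the stack."""
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([x, y])              # the residual connection
    return layers.Activation("relu")(y)
```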
Figure 5. Accuracy of VGG, Inception, and ResNet on inkblot recognition
As shown in the figure, when a certain number of experimental papers are given as the training set, ResNet is significantly better than the other two networks at recognizing ink. Therefore, we believe that, after some modification, the ResNet network can meet our requirements for recognition accuracy.

Reinforcement Learning on Decision-making

We now turn to the third question above: how many times should we deink an ink stain? We need a neural network to answer that question, but the structure optimized for the second problem obviously cannot make such a decision; we lack an architecture with strong robustness and decision-making ability. With this requirement in mind, we decided to introduce reinforcement learning into the ResNet network we had already built [10][11], forming a new network structure. Our fusion method is to use the convolutional neural network to fit the Q function of reinforcement learning, i.e., a deep Q-network (DQN). Let us first explain the relevant concepts of reinforcement learning [12].
First we introduce the Markov reward process, through the following equations:
$$ V_t(s)= E[G_t|s_t=s] $$
$$G_t=R_{t+1}+\gamma R_{t+2}+\gamma ^2 R_{t+3}+...+\gamma ^{T-t-1}R_T$$
where s denotes a state, G_t the return (accumulated reward), and γ the discount factor, which down-weights rewards further in the future and prevents infinite returns in cyclic Markov chains.
Next we introduce the Bellman equation. Its characteristic is to relate the value of a state to the values of its successor states, where R(s) represents the current reward:
$$V(s)=R(s)+\gamma \sum_{s'\in S}P(s'|s)V(s')$$
Let's write it in vector form:
$$ V=R+\gamma PV $$
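For a finite state space this vector equation has the closed-form solution V = (I − γP)⁻¹R, which is easy to verify numerically. The 3-state chain below is an arbitrary toy example, not data from our experiments.

```python
import numpy as np

# Toy Markov reward process: 3 states, row-stochastic transition matrix.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.0, 1.0]])
R = np.array([1.0, 0.5, 0.0])   # reward per state
gamma = 0.9

# Solve (I - gamma*P) V = R; the matrix is invertible for gamma < 1.
V = np.linalg.solve(np.eye(3) - gamma * P, R)
assert np.allclose(V, R + gamma * P @ V)   # V satisfies the Bellman equation
```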
Next, we introduce the equations of the complete Markov decision process, which involve variables such as state, action, transition probability, reward, and policy:
$$P^{\pi}(s'|s)=\sum_{a\in A}\pi(a|s)P(s'|s,a)$$
$$R^{\pi}(s)=\sum_{a\in A}\pi(a|s)R(s,a)$$
From these multi-variable equations we need to obtain the state-value function and the action-value function.
State-value function:
$$v^{\pi}(s)=E_{\pi}[G_t|s_t=s]$$
Action-value function:
$$q^{\pi}(s,a)=E_{\pi}[G_t|s_t=s,A_t=a]$$
The two functions are related by:
$$v^{\pi}(s)=\sum_{a\in A}\pi(a|s)q^{\pi}(s,a)$$
Expanding, we obtain the Bellman equations:
$$v^{\pi}(s)=\sum_{a\in A}\pi(a|s)\left(R(s,a)+\gamma \sum_{s'\in S}P(s'|s,a)v^{\pi}(s')\right)$$
$$q^{\pi}(s,a)=R(s,a)+\gamma \sum_{s'\in S}P(s'|s,a)\sum_{a'\in A}\pi(a'|s')q^{\pi}(s',a')$$
Our subsequent task is to solve for the value function; the solving process splits into two cases, according to whether the policy π is given.
If the policy π and the related state, action, and probability characteristics are given, we can easily find the value function under π: we simply iterate the Bellman equation to obtain the result.
$$v_{t+1}(s)=\sum_{a\in A}\pi(a|s)\left(R(s,a)+\gamma\sum_{s'\in S}P(s'|s,a)v_t(s')\right)$$
Without a clear policy π, however, the situation resembles the problem we encounter in the deinking experiment. We can write the problem as follows:
$$v^*(s)=\max_{\pi}v^{\pi}(s)$$
$$\pi^*(s)=\arg\max_{\pi}v^{\pi}(s)$$
To solve the above problem, we carry out policy iteration in two alternating steps, evaluation and improvement.
Evaluate: evaluate the value function of the given π. The computation proceeds as π → v → q → max q → π′.
Improve: update π with the greedy algorithm, then go back to evaluating the value function.
$$\pi'=\text{greedy}(v^{\pi})$$
$$q^{\pi_i}(s,a)=R(s,a)+\gamma\sum_{s'\in S}P(s'|s,a)v^{\pi_i}(s')$$
$$\pi_{i+1}(s)=\arg\max_a q^{\pi_i}(s,a)$$
After the iteration converges, we obtain the Bellman optimality equation:
$$v^*(s)=\max_{a\in A}q^*(s,a)$$
In a word, the value iteration method we use takes the Bellman optimality equation as the iteration rule.
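A compact sketch of this value iteration with tabular inputs is shown below; the array shapes (P[a, s, s′] = P(s′|s, a), R[s, a] = R(s, a)) are our own convention for illustration.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-6):
    """Iterate v(s) <- max_a [R(s,a) + gamma * sum_s' P(s'|s,a) v(s')]
    until convergence; returns the optimal values and the greedy policy."""
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)
    while True:
        q = R + gamma * np.einsum("ast,t->sa", P, v)   # q[s, a]
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=1)
        v = v_new
```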

Deep Reinforcement Learning

If we make reasonable use of the decision-making ability of reinforcement learning while retaining the accurate prediction ability of ResNet, we can build a complete ink recognition system. In other words, the system can recognize ink with strong independent decision-making and adjustment ability: after a period of learning, the machine can actively adjust the number of deinking passes according to the ink it identifies on the paper, achieving thorough deinking while avoiding surface damage. We therefore shifted the focus of the modeling work to combining the two models.
We have described the Markov decision process and the Bellman equation in detail above. Following that mathematical derivation, we decided to use the previously designed convolutional neural network to fit the Q function of reinforcement learning, with the weights of each network layer serving as the corresponding value-function parameters. We take the paper images captured in 4 consecutive frames as the state and output the Q value corresponding to each action; more than one frame is used so that the network can perceive a dynamic environment, as in the original DQN. To perform a Q update or to select the action with the highest Q value, we pass through the network once and immediately obtain the Q value of every action.
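A sketch of such a Q-network in Keras is given below. The input is the 4-frame stack along the channel axis and the output is one Q value per deinking action; the resolution, layer sizes, and action count are placeholders, and the convolutional stack could equally be built from the residual blocks described earlier.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_q_network(frame_shape=(84, 84), n_frames=4, n_actions=4):
    """Map a state (4 stacked paper images) to a vector of Q values,
    one per action, so a single forward pass scores every action."""
    inputs = keras.Input(shape=(*frame_shape, n_frames))
    x = layers.Conv2D(32, 8, strides=4, activation="relu")(inputs)
    x = layers.Conv2D(64, 4, strides=2, activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation="relu")(x)
    q_values = layers.Dense(n_actions)(x)   # Q(s, a) for every action a
    return keras.Model(inputs, q_values)
```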
However, there is an obvious problem in this process: as a form of supervised learning, neural network training requires the data to be independent and identically distributed, whereas the data we collect through reinforcement learning are strongly correlated. Training the model on such data causes the network to oscillate, and the result is unsatisfactory. We therefore used the experience replay mechanism to break this correlation.
The main approach is to store the transitions obtained by reinforcement learning in a dedicated buffer and then let the neural network sample a certain amount of data at random for learning, which solves the strong-correlation problem to a certain extent.
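A minimal replay buffer illustrating this mechanism (the capacity and naming are ours, not a fixed design):

```python
import random
from collections import deque

class ReplayBuffer:
    """Store (state, action, reward, next_state, done) transitions and
    sample them uniformly at random, breaking the temporal correlation
    between consecutive deinking steps."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)   # old transitions drop out

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
```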
Meanwhile, in the deep reinforcement learning network, we set up an independent target network to stabilize the TD target in the temporal-difference algorithm. When the neural network approximates the value function, the parameter θ is updated by gradient descent at each step, so the value-function update effectively becomes a supervised-learning update; after modification, the gradient descent step can be expressed as follows:
$$ \theta_{t+1} = \theta_{t} + \alpha[r+\gamma \max_{a'}Q(s',a';\theta^{-})-Q(s,a;\theta)]\nabla Q(s,a;\theta) $$
where θ⁻ denotes the separately maintained parameters of the target network used to compute the TD target.
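One gradient step implementing this update in TensorFlow might read as follows; `q_net` holds θ, `target_net` holds θ⁻ (periodically copied from `q_net`), and the batch layout is an assumption on our part.

```python
import tensorflow as tf

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.9):
    """One step of the update rule above: the TD target uses the frozen
    target network (theta-minus); only the online parameters are trained."""
    states, actions, rewards, next_states, dones = batch
    # r + gamma * max_a' Q(s', a'; theta-minus); no gradient flows here.
    next_q = tf.reduce_max(target_net(next_states), axis=1)
    targets = rewards + gamma * (1.0 - dones) * next_q
    with tf.GradientTape() as tape:
        q = q_net(states)                                  # Q(s, .; theta)
        q_sa = tf.reduce_sum(q * tf.one_hot(actions, q.shape[-1]), axis=1)
        loss = tf.reduce_mean(tf.square(targets - q_sa))   # squared TD error
    grads = tape.gradient(loss, q_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, q_net.trainable_variables))
    return loss
```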
After this combination, the recognition system is complete in theory. With the help of reinforcement learning, the recognition accuracy of the original network structure improves; more surprisingly, the network converges noticeably faster, and the model becomes more stable after some training. As we work with more and more different paper materials, our model will become more comprehensive and its prediction accuracy will approach 100% [13].
Figure 6. Comparison of recognition performance between DQN and the other three neural networks

Function Expansion

As mentioned earlier in this article, we can add extra features on top of the basic ones. One new feature is point-to-point recognition and deinking. Considering document confidentiality, we do not scan the document content with text recognition; instead we use a numbered box-selection method, which works as follows:
When we partitioned the A4 paper above, we mentioned that we divided the central rectangle into several smaller rectangles. To achieve point-to-point recognition and deinking, we need to set as many small rectangles as possible. A selection box covers a certain area of the paper, and in essence it covers many little rectangles: the more small rectangles per unit area of paper, the higher the accuracy and the smaller the region of point-to-point deinking we can achieve. As for the other hardware structures, we only need to spray enzyme solution and apply ink adhesion in the specific areas given by the selection box (a sketch of the box-to-cell mapping follows).
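The mapping from a selection box to the small rectangles it covers is then simple index arithmetic; a sketch (coordinates in pixels, cell sizes taken from the grid chosen earlier):

```python
def cells_under_box(box, cell_w, cell_h):
    """Return the (row, col) indices of every small rectangle covered by a
    selection box (x0, y0, x1, y1); the deinker sprays enzyme solution
    only onto these cells. A finer grid gives a tighter fit to the box."""
    x0, y0, x1, y1 = box
    return [(r, c)
            for r in range(int(y0 // cell_h), int(y1 // cell_h) + 1)
            for c in range(int(x0 // cell_w), int(x1 // cell_w) + 1)]
```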

References

[1] D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov. Scalable object detection using deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[2] R. B. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[3] F. Chollet. Keras. https://github.com/fchollet/keras, 2015.
[4] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–9, 2015.
[5] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.
[6] J. Jin, A. Dundar, and E. Culurciello. Flattened convolutional neural networks for feedforward acceleration. arXiv preprint arXiv:1412.5474, 2014.
[7] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[8] M. Wang, B. Liu, and H. Foroosh. Factorized convolutional neural networks. arXiv preprint arXiv:1608.04337, 2016.
[9] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In Computer Vision – ECCV 2014, pages 818–833. Springer, 2014.
[10] L. Baird. Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the 12th International Conference on Machine Learning (ICML), pages 30–37. Morgan Kaufmann, 1995.
[11] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[12] A. Graves, A. Mohamed, and G. E. Hinton. Speech recognition with deep recurrent neural networks. In Proceedings of ICASSP, 2013.
[13] P. Sermanet, K. Kavukcuoglu, S. Chintala, and Y. LeCun. Pedestrian detection with unsupervised multi-stage feature learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
