Team:UESTC-China/Model1


Problem Restatement and Background Introduction

Before the deinking efficiency validation experiment, we need a statistical method to measure how much ink remains on the enzyme-treated paper. As Figure 1 shows, the residual ink on treated paper is irregular, so it is hard to distinguish and count by eye. A reliable counting method is therefore essential.

Figure 1. Deinked paper

Shrink characters to dots

The main obstacle to counting is that a character is not always deinked completely. For example, the character h might be partially deinked to n or to l. This makes it unclear how much of the h was deinked.

Figure 2. Deinked "h" character

Under an intention-to-treat style analysis, a residual l or n is counted as not deinked at all. That is, if the word home is deinked to 'lone', the deinking efficiency is 0%, because no character was completely deinked. If the word home is deinked to 'on', the deinking efficiency is 50%, because two of the four characters were completely deinked.
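The all-or-nothing counting rule above can be sketched in a few lines of Python. This is only an illustration: the function name `all_or_nothing_efficiency` and the per-character flags are our assumptions, and the flags are assigned by hand rather than detected from an image.

```python
from typing import Sequence


def all_or_nothing_efficiency(original: str, fully_removed: Sequence[bool]) -> float:
    """Fraction of characters that were completely deinked.

    fully_removed[i] is True only if character i vanished entirely;
    a partially deinked character (e.g. h -> l) counts as not deinked.
    """
    if not original or len(original) != len(fully_removed):
        raise ValueError("need one flag per character of a non-empty word")
    return sum(fully_removed) / len(original)


# "home" -> "lone": h and m are only partially deinked, nothing is fully removed.
print(all_or_nothing_efficiency("home", [False, False, False, False]))  # 0.0

# "home" -> "on": h and e are fully removed, o is intact, m is partial.
print(all_or_nothing_efficiency("home", [True, False, False, True]))  # 0.5
```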

Figure 3. Deinked "home" words

However, this method has low accuracy. If no character is completely deinked, the deinking efficiency is 0%, even though we can see by eye that a lot of ink has been removed. The accuracy depends on the size of the characters: the smaller the characters, the more accurate the method. In the limit, each character shrinks to a small dot, and that is the basis of our statistical model.

Figure 4. Deinked dots and their efficiency

Two-sample T test or nonparametric test

To count ink dots more conveniently, we divided a sheet of A4 paper into small cells of equal size, each containing the same 15 ink dots. Each cell is treated as one sample. Before treatment, every sample has 15 ink dots. After each treatment, we counted the residual dots in a large number of samples and plotted the sample distribution for each treatment. Finally, a t-test was performed to show whether the treatments differ significantly.
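As a rough sketch of how one cell's count might be turned into an efficiency, the snippet below assumes that efficiency means the fraction of the 15 dots that were removed. That definition, the name `cell_efficiency`, and the residual counts in the example are illustrative assumptions, not measured data.

```python
DOTS_PER_CELL = 15  # every cell starts with the same 15 ink dots


def cell_efficiency(residual_dots: int) -> float:
    """Fraction of dots removed from one cell (assumed definition)."""
    if not 0 <= residual_dots <= DOTS_PER_CELL:
        raise ValueError("residual count must be between 0 and 15")
    return (DOTS_PER_CELL - residual_dots) / DOTS_PER_CELL


# Hypothetical residual dot counts for four cells, for illustration only.
residual_counts = [15, 12, 9, 0]
efficiencies = [cell_efficiency(r) for r in residual_counts]
print(efficiencies)
```

Each cell then contributes one number to the sample for its treatment, and those per-treatment samples are what the statistical test compares.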

Figure 5. Paper partition

Before choosing the test, we need to check whether the population from which the samples are drawn follows a normal distribution.

If the data follow a normal distribution, a two-sample t-test is suitable[1]. If they do not, we use a nonparametric test instead[1]. A t-test can also be used when the number of samples is large (as a rule of thumb, more than 100)[1].
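The selection rule above can be written as a small decision function. This is a sketch only: the name `choose_test`, the boolean `looks_normal` (the outcome of whatever normality check is used), and the default threshold of 100 are assumptions taken from the rule of thumb in the text.

```python
def choose_test(looks_normal: bool, n_per_group: int, large_n: int = 100) -> str:
    """Pick a two-sample test following the rule described in the text."""
    if looks_normal:
        return "two-sample t-test"
    if n_per_group > large_n:
        # Non-normal data, but the sample is large enough for a t-test.
        return "two-sample t-test (large sample)"
    return "nonparametric test"


print(choose_test(looks_normal=True, n_per_group=30))
print(choose_test(looks_normal=False, n_per_group=30))
print(choose_test(looks_normal=False, n_per_group=120))
```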

Application to the deinking efficiency validation experiment

In our deinking experiment, the data do not follow a normal distribution. We therefore collected more than 100 samples and used a t-test to check for a significant difference between the treatments with active and inactive enzyme.
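A minimal sketch of such a large-sample comparison is given below, using only the Python standard library. It assumes the unequal-variance (Welch) form of the t statistic and approximates the p-value with the standard normal distribution, which is reasonable only when both groups are large. The text does not say which variant of the t-test was used, so treat both choices as assumptions. The residual dot counts are simulated with arbitrary retention probabilities and are not experimental data.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance


def large_sample_two_sample_test(a, b):
    """Welch-type t statistic with a normal-approximation two-sided p-value."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


def simulate_cells(rng, n_cells, keep_prob, dots=15):
    """Simulated residual dot counts per cell (illustrative, not real data)."""
    return [sum(rng.random() < keep_prob for _ in range(dots)) for _ in range(n_cells)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
inactive = simulate_cells(rng, n_cells=120, keep_prob=0.9)  # assumed: most dots remain
active = simulate_cells(rng, n_cells=120, keep_prob=0.5)    # assumed: about half remain

t_stat, p_value = large_sample_two_sample_test(inactive, active)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
```

A small p-value would suggest that the mean residual dot count differs between the two treatments. With fewer samples per group, the normal approximation should be replaced by the t distribution or by a nonparametric test.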

References

[1] Mao Shisong, Wang Jinglong, Pu Xiaolong. "Advanced Mathematical Statistics", Second Edition. Higher Education Press.
