Problem Restatement and Background Introduction
Before the deinking efficiency validation experiment, we need a statistical method to measure how much ink remains on the enzyme-treated paper. As Figure 1 shows, the residual ink on the treated paper is scattered irregularly, so it is hard to distinguish and count by eye. A reliable way to quantify it is essential.
Figure 1. Deinked paper
Shrink characters to dots
The key obstacle to counting is that a character is not always deinked completely. For example, the character "h" might be partially deinked so that it looks like "n" or "l". Such partial results make it unclear how much of the "h" was deinked.
Figure 2. Deinked "h" character
Under intention-to-treat analysis, a partially deinked character such as "l" or "n" is counted as not deinked at all. For instance, if the word "home" is deinked to "lone", the deinking efficiency is 0%, because no character was completely deinked. If "home" is deinked to "on", the deinking efficiency is 50%, because two of the four characters were completely deinked.
Figure 3. Deinked "home" words
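The character-level counting rule above can be sketched in a few lines of Python. This is only an illustration: the function name `character_efficiency` and the way the residue is encoded (one entry per original character, with an empty string meaning the character is gone) are assumptions made for this sketch, not part of our protocol.

```python
def character_efficiency(original, remaining):
    """Fraction of characters that were completely deinked.

    `remaining` lists, position by position, what is still visible of each
    original character. An empty string means the character is gone.
    A partially deinked character (e.g. "h" reduced to "n") counts as
    not deinked at all.
    """
    if len(original) != len(remaining):
        raise ValueError("one entry per original character is required")
    fully_removed = sum(1 for r in remaining if r == "")
    return fully_removed / len(original)


# "home" deinked to "lone": h -> l, o -> o, m -> n, e -> e
print(character_efficiency("home", ["l", "o", "n", "e"]))  # 0.0
# "home" deinked to "on": h gone, o -> o, m -> n, e gone
print(character_efficiency("home", ["", "o", "n", ""]))    # 0.5
```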
However, this method has low accuracy. If no character is completely deinked, the deinking efficiency is 0%, even though we can see by eye that a lot of ink has been removed. The accuracy depends on the size of the characters: the more the characters shrink, the more accurate the method becomes. In the limit, each character shrinks to a small dot, and that is the basis of our statistical model.
Figure 4. Deinked dots and their deinking efficiency
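Once every character is a dot, efficiency can be computed per cell from the number of dots that remain. The sketch below assumes that efficiency is the fraction of the original dots that were removed, which is our reading of the dot model rather than a formula stated elsewhere on this page. The residual counts are made up for illustration.

```python
DOTS_PER_CELL = 15  # each cell starts with 15 ink dots


def dot_efficiency(residual_dots, dots_per_cell=DOTS_PER_CELL):
    """Assumed definition: fraction of the original dots that were removed."""
    if not 0 <= residual_dots <= dots_per_cell:
        raise ValueError("residual count must lie between 0 and dots_per_cell")
    return (dots_per_cell - residual_dots) / dots_per_cell


residual_counts = [15, 12, 9, 3, 0]  # hypothetical counts, not measured data
print([round(dot_efficiency(r), 2) for r in residual_counts])
# [0.0, 0.2, 0.4, 0.8, 1.0]
```

Unlike the character-level rule, a cell with partial deinking now gets a value between 0 and 1 instead of 0.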
Two-sample t-test or nonparametric test
To count ink dots more conveniently, we divided a sheet of A4 paper into small cells of equal size, each containing the same 15 ink dots. Each cell is treated as one sample. Before treatment, every sample has 15 residual ink dots. Under the different treatments, a large number of samples were counted and the distribution of the counts was plotted for each treatment. Finally, a t-test was performed to show whether the treatments differ significantly.
Figure 5. Paper partition
Before choosing the test, we need to check whether the population from which the samples are drawn follows a normal distribution.
If the data are normally distributed, a two-sample t-test is appropriate [1]. If they are not, we use a nonparametric test instead [1]. A t-test can sometimes still be used when the number of samples is large (more than 100, as a rule of thumb) [1].
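The text does not name a specific nonparametric test. One common choice for comparing two independent samples is the Mann-Whitney U test, sketched below under that assumption. Dot counts are small integers, so ties are frequent; the sketch therefore uses average ranks and the tie-corrected normal approximation (without continuity correction), which is only reasonable for fairly large samples. The function name and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (tie-corrected normal approximation).

    Returns (U statistic of x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples and remember which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                      # size of this group of tied values
        avg_rank = (i + 1 + j) / 2     # average of ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:                     # all values identical
        return u, 1.0
    z = (u - mean_u) / sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p


# Hypothetical residual dot counts per cell, for illustration only.
active = [3, 4, 4, 5, 6, 2, 3, 5]
inactive = [13, 14, 15, 15, 12, 14, 15, 13]
print(mann_whitney_u(active, inactive))
```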
Application to the deinking efficiency validation experiment
In our deinking experiment, the data are not normally distributed. We therefore collected more than 100 samples and used a t-test to check for a significant difference between the treatments with active and inactive enzyme.
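A minimal sketch of such a large-sample comparison follows. It assumes the unequal-variance (Welch) form of the t statistic and, because each group has more than 100 samples, takes the p-value from the standard normal distribution as an approximation to the t distribution. The counts are simulated purely for illustration and are not our measurements; the removal probabilities 0.6 and 0.1 are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance


def large_sample_t_test(a, b):
    """Welch t statistic with a two-sided p-value from the normal approximation.

    Intended for large samples (more than 100 per group), where the
    t distribution is close to the standard normal.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(a) - mean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


def simulate_cells(rng, n_cells, removal_prob, dots_per_cell=15):
    """Simulated residual dot counts: each dot is removed independently."""
    return [
        sum(1 for _ in range(dots_per_cell) if rng.random() >= removal_prob)
        for _ in range(n_cells)
    ]


rng = random.Random(0)                      # fixed seed for reproducibility
active = simulate_cells(rng, 120, 0.6)      # hypothetical active enzyme
inactive = simulate_cells(rng, 120, 0.1)    # hypothetical inactive enzyme

t_stat, p_value = large_sample_t_test(active, inactive)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
```

A negative `t_stat` here means the active-enzyme cells retain fewer dots on average than the inactive-enzyme cells.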
References
[1] Mao Shisong, Wang Jinglong, Pu Xiaolong. "Advanced Mathematical Statistics", Second Edition. Higher Education Press.