Not surprisingly, recognition of pre-printed
numerics was far superior to that of hand-written
numerics. At comparable rejection rates of around
12%, for the RE engine with image enhancement,
the digit substitution rate for hand-written
numerics was 1.41% compared to the rate of 0.16%
for pre-printed numerics.
For the RE engine on enhanced images, we inspected
all 43 cases for 'district' or 'subdistrict'
where the keyed and recognised responses differed
and the recognised response would have been
automatically accepted with a confidence threshold
of 650 for all digits. In none of the cases was the
response on the image pre-printed and whole.
| Hand-written | Pre-printed - Cutting Error | Key Entry Error |
|---|---|---|
| 17 | 24 | 2 |
As can be seen from the table above, in 40%
of the cases viewed the 'district' or 'subdistrict'
field had actually been filled out by hand.
In 56% of the cases viewed the pre-printed 'district'
or 'subdistrict' had the tops of the characters
cut off. This occurred only on individual form
fronts (IF's) where, for some reason, the printing
in the top right-hand corner of the form had
been cut from the image. We presume that some
sort of registration error also caused part
of the 'district' and 'subdistrict' responses
to be cut out.
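
The percentages quoted above follow directly from the counts in the table. A minimal Python sketch of the arithmetic is given below; the names `causes` and `shares` are chosen here purely for illustration.

```python
# Counts of the 43 inspected mismatches, taken from the table above.
causes = {
    "Hand-written": 17,
    "Pre-printed - Cutting Error": 24,
    "Key Entry Error": 2,
}

total = sum(causes.values())  # all inspected cases

# Share of each cause, rounded to a whole-number percentage.
shares = {cause: round(100 * count / total) for cause, count in causes.items()}

for cause, pct in shares.items():
    print(f"{cause}: {pct}%")
```
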
For the RE engine on enhanced images, we never
found an example of a whole pre-printed digit
being substituted for another digit.
Rejection rates for pre-printed numerics were
higher than expected. For the RE engine
on enhanced images, even when the confidence
threshold for all digits was 0, 11% of the data was
still rejected. Upon discovering the cutting error
referred to above, we decided to analyse the
rejection rates separately for DF's and IF's
(expecting the rates to be worse on IF's).
| Form | 'District' Rejection Rate | 'Subdistrict' Rejection Rate |
|---|---|---|
| DF | 51.75% | 25.30% |
| IF | 13.13% | 11.36% |
Surprisingly, the rejection rates were worse
for the DF's. For DF's, if the 'subdistrict'
had been rejected, there was an 86% chance that
the 'district' had also been rejected. In the
few cases viewed, the 'district' and 'subdistrict'
were clearly pre-printed. So why were the rejection
rates so high? We suspect that this relates
back to the original difficulty that both engines
had in registering the DF's at all.
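
The 86% figure above is a conditional proportion: among DF's whose 'subdistrict' was rejected, the share whose 'district' was rejected as well. The sketch below shows one way such a figure could be computed. It assumes each form is summarised as a pair of booleans; the function name and the data layout are hypothetical and are not taken from the study.

```python
from typing import Iterable, Tuple


def conditional_rejection_rate(pairs: Iterable[Tuple[bool, bool]]) -> float:
    """Proportion of forms with a rejected 'district' among those
    whose 'subdistrict' was rejected.

    Each pair is (district_rejected, subdistrict_rejected) for one form.
    This layout is an assumption made for illustration.
    """
    # Keep the district flag only for forms whose subdistrict was rejected.
    district_flags = [district for district, subdistrict in pairs if subdistrict]
    if not district_flags:
        raise ValueError("no forms with a rejected 'subdistrict'")
    return sum(district_flags) / len(district_flags)
```
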
For the RE engine on enhanced images, we also
looked at the proportion of rejected 'district'
and 'subdistrict' responses that were rejected
only because they contained an excessive number
of digits, and not because one or more of the
digits had a confidence score below 650.
| Variable | Proportion of Rejections Due to Length Only |
|---|---|
| District | 17.82% |
| Subdistrict | 31.83% |
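
The distinction between the two causes of rejection can be sketched as follows. This is an illustration under stated assumptions, not the engine's actual logic: the function name `rejection_reason`, the representation of a response as `(character, confidence)` pairs and the `max_length` parameter are all hypothetical. Only the threshold of 650 comes from the text.

```python
from typing import List, Optional, Tuple


def rejection_reason(
    chars: List[Tuple[str, int]],
    max_length: int,
    threshold: int = 650,
) -> Optional[str]:
    """Return None if the response would be accepted, else the reason.

    Assumed rule: a response is rejected if it has more characters than
    the field allows, or if any character scores below the threshold.
    """
    too_long = len(chars) > max_length
    low_confidence = any(conf < threshold for _, conf in chars)
    if too_long and low_confidence:
        return "length and confidence"
    if too_long:
        return "length only"
    if low_confidence:
        return "confidence"
    return None
```

For example, with a hypothetical two-digit field, `rejection_reason([("1", 900), ("-", 800), ("2", 950)], max_length=2)` returns `"length only"`, since every character clears the threshold but the response has one character too many.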
There were 228 recognised 'district' and 'subdistrict'
responses rejected only because they were too
long, where the keyed response was available
for comparison. All of these were too long because
they contained one or more recognised hyphens,
"-". Had the hyphen not been there, every recognised
response would have been correct. That is, the
recognised digits were all correct, but a hyphen
had been inserted somewhere in the recognised
response. The recognition library should not
have included a hyphen as a valid numeric character.
Presumably, different libraries were used for
hand-written and pre-printed numerics, because
hyphens featured as characters only in the recognised
'district' and 'subdistrict' responses.
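
The check described above, that a too-long response is correct once the hyphen is removed, can be sketched in a few lines. This assumes the recognised and keyed responses are both available as plain strings; the function name and the sample values in the usage note are hypothetical.

```python
def matches_without_hyphens(recognised: str, keyed: str) -> bool:
    """True if the recognised response equals the keyed response
    once every hyphen has been removed from the recognised response."""
    return recognised.replace("-", "") == keyed
```

For instance, `matches_without_hyphens("1-2", "12")` is true, whereas `matches_without_hyphens("1-3", "12")` is false because a digit differs as well.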
|