This study compared verified
key entry data with the Intelligent Character
Recognition (ICR) of pre-printed and constrained
hand-written numerics in census questionnaires.
Two recognition engines were used:
the RE engine and the MITEK engine. We found significant
differences between the two engines, in terms
of both digit substitution error and bias. The
RE engine was more accurate. For this engine,
we also studied the effect of image enhancement
techniques. The results showed that the use
of image enhancement leads to much improved
performance.
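As an illustration of the two measures named above, the following
minimal sketch compares recognised values with their verified key entry
counterparts. The definitions are assumptions made for the example, not
the study's own: substitution error is taken as the share of digit
positions where the recognised digit differs from the keyed digit, and
bias as the mean of recognised value minus keyed value. The function
names and the sample data are hypothetical.

```python
def digit_substitution_rate(pairs):
    """Share of digit positions where the recognised digit differs
    from the keyed digit (assumed definition).

    Pairs whose lengths differ are skipped, because a length mismatch
    is not a pure substitution.
    """
    total = 0
    wrong = 0
    for recognised, keyed in pairs:
        if len(recognised) != len(keyed):
            continue
        for r, k in zip(recognised, keyed):
            total += 1
            wrong += r != k
    return wrong / total if total else 0.0


def mean_bias(pairs):
    """Mean of (recognised value - keyed value) (assumed definition)."""
    diffs = [int(recognised) - int(keyed) for recognised, keyed in pairs]
    return sum(diffs) / len(diffs) if diffs else 0.0


# Made-up (recognised, keyed) pairs, not data from the study.
pairs = [("42", "42"), ("17", "11"), ("35", "36"), ("08", "08")]

substitution_rate = digit_substitution_rate(pairs)
bias = mean_bias(pairs)
```

Under these assumed definitions, two engines can have similar
substitution rates but different bias, because bias depends on the size
and direction of each error, not only on how many digits are wrong.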
This study showed that numeric recognition
technology is capable of delivering vastly superior
results to those achieved in processing the
1996 Census. In 1996 the RE engine was used
with no image enhancement. All of the recognition
systems on trial gave results that were superior
to those achieved in 1996.
The most effective way to lower error rates,
both at the digit and variable level, is to
reject a higher proportion of the recognised
responses (rejected responses go to an operator
for key entry). Rejection is primarily based
on the level of confidence the engine assigns
to each recognised digit.
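The trade-off described above can be sketched as follows. This is an
illustrative example only: it assumes each recognised digit carries a
confidence between 0 and 1, that a response is rejected when any of its
digits falls below a single threshold, and that the verified key entry
value is the truth. The class, the function names and the sample data
are hypothetical and do not come from either engine.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecognisedField:
    digits: str               # digits output by the engine
    confidences: List[float]  # one confidence per digit, assumed in [0, 1]
    keyed: str                # verified key entry value, treated as truth


def accept(field, threshold):
    """Accept a response only if every digit meets the threshold."""
    return min(field.confidences) >= threshold


def evaluate(fields, threshold):
    """Return (rejection rate, error rate among accepted responses)."""
    accepted = [f for f in fields if accept(f, threshold)]
    rejection_rate = 1 - len(accepted) / len(fields)
    errors = sum(f.digits != f.keyed for f in accepted)
    error_rate = errors / len(accepted) if accepted else 0.0
    return rejection_rate, error_rate


# Made-up responses, not results from the study.
sample = [
    RecognisedField("42", [0.99, 0.97], "42"),
    RecognisedField("17", [0.95, 0.60], "11"),
    RecognisedField("08", [0.90, 0.92], "08"),
    RecognisedField("35", [0.85, 0.80], "36"),
]

low = evaluate(sample, 0.5)   # lenient threshold
high = evaluate(sample, 0.9)  # strict threshold
```

In this made-up sample, raising the threshold from 0.5 to 0.9 sends half
of the responses to an operator and removes both errors from the
accepted data, which is the pattern the paragraph above describes.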
However, it is also possible to reduce error
rates, with little increase in the amount of
data rejected, by using 'smarter' rejection
strategies. This means treating different variables
in different ways and having different confidence
thresholds for different recognised digits.
Rejection can also be based on simple edit rules
and size checks that are designed to detect
recognised responses that are highly unlikely
to be correct.
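A 'smarter' strategy of this kind might look like the sketch below. All
of the variable names, thresholds, ranges and digit adjustments are
hypothetical values chosen for illustration; they are not settings from
the study. The sketch assumes that a stricter threshold is wanted for
digits believed to be confused more often, and that a simple range check
stands in for the edit rules and size checks.

```python
# Hypothetical per-variable settings (illustrative values only).
VARIABLE_RULES = {
    "age": {"threshold": 0.80, "min": 0, "max": 120},
    "income": {"threshold": 0.95, "min": 0, "max": 999999},
}

# Hypothetical per-digit adjustment: these digits are assumed to be
# confused more often, so they must meet a stricter threshold.
DIGIT_THRESHOLD_BOOST = {"1": 0.10, "7": 0.10}


def should_reject(variable, digits, confidences):
    """Return True if the response should go to an operator."""
    if not digits:
        return True  # nothing recognised, so nothing to accept

    rule = VARIABLE_RULES[variable]

    # Confidence check, with the threshold depending on both the
    # variable and the digit that was recognised.
    for digit, confidence in zip(digits, confidences):
        boost = DIGIT_THRESHOLD_BOOST.get(digit, 0.0)
        needed = min(1.0, rule["threshold"] + boost)
        if confidence < needed:
            return True

    # Simple edit rule: reject values outside the plausible range.
    value = int(digits)
    return not (rule["min"] <= value <= rule["max"])


plausible_age = should_reject("age", "34", [0.85, 0.90])
risky_digit = should_reject("age", "17", [0.85, 0.85])
implausible_age = should_reject("age", "234", [0.99, 0.99, 0.99])
strict_variable = should_reject("income", "34", [0.85, 0.90])
```

The same confidences are accepted for one variable and rejected for
another, and a confidently recognised but implausible value is still
rejected, so the extra rejections are concentrated where errors are
most likely.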
Determining the rejection strategy that will
maximise the volume and quality of recognised
data depends on the questionnaire design, the
recognition engine, and any image enhancement
being used. Therefore, in order to allow us
time to determine the optimal recognition strategy
for the 2001 Census, it is important to collaborate
with the chosen contractor as early as possible.
US Census research suggests that superior recognition
results are possible when working directly with
recognition software developers. SNZ should
strongly encourage the external contractor to
work with manufacturers of the recognition engine
to be used in the 2001 Census, in order to optimise
recognition results.
This study was invaluable for increasing our
understanding of the whole recognition process
and has placed us in a much stronger position
to negotiate our requirements for the 2001 Census.
We have decided to build on our experience
with the 1996 Census and use imaging technology
for the 2001 Census.