The Third Meeting of the Working Party on the Application of New Technology to Population Data
Bali, 7-9 January 1999

STAT/WPA(3)/11
7 January 1999
ENGLISH ONLY

ECONOMIC AND SOCIAL COMMISSION FOR ASIA AND THE PACIFIC

Numeric Recognition in the 1996 Census of Population and Dwellings - A Review
Andrea Piesse, Carolina Kol and Craig Lange
Statistics New Zealand
Summary of Findings

This study compared verified key entry data with the Intelligent Character Recognition (ICR) of pre-printed and constrained hand-written numerics in census questionnaires.

Two different recognition engines were used: the RE engine and the MITEK engine. We found significant differences between the two engines, in terms of both digit substitution error and bias. The RE engine was more accurate. For this engine, we also studied the effect of image enhancement techniques. The results showed that image enhancement leads to much improved performance.

This study showed that numeric recognition technology is capable of delivering vastly superior results to those achieved in processing the 1996 Census. In 1996 the RE engine was used with no image enhancement. All of the recognition systems on trial gave results that were superior to those achieved in 1996.

The most effective way to lower error rates, both at the digit and variable level, is to reject a higher proportion of the recognised responses (rejected responses go to an operator for key entry). Rejection is primarily based on the level of confidence the engine assigns to each recognised digit.

However, it is also possible to reduce error rates, with little increase in the amount of data rejected, by using 'smarter' rejection strategies. This means treating different variables in different ways and having different confidence thresholds for different recognised digits. Rejection can also be based on simple edit rules and size checks that are designed to detect recognised responses that are highly unlikely to be correct.
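The rejection rules described above can be sketched in a few lines of Python. This is only an illustration, not the logic used by either engine: the names `RecognisedDigit` and `reject_response`, the threshold values, the choice of which digits get a stricter threshold, and the assumption that confidence lies between 0 and 1 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RecognisedDigit:
    value: str         # digit reported by the engine, "0" to "9"
    confidence: float  # engine confidence, assumed here to lie in [0, 1]


# Hypothetical thresholds: a general cut-off plus stricter cut-offs for
# digits that are treated (purely for illustration) as easier to confuse.
DEFAULT_THRESHOLD = 0.80
DIGIT_THRESHOLDS = {"1": 0.90, "7": 0.90}


def reject_response(digits: Sequence[RecognisedDigit],
                    max_value: Optional[int] = None) -> bool:
    """Return True if the response should go to an operator for key entry."""
    # Assumption: an empty recognition result is always sent to an operator.
    if not digits:
        return True
    # Confidence check, with a different threshold for different digits.
    for d in digits:
        if d.confidence < DIGIT_THRESHOLDS.get(d.value, DEFAULT_THRESHOLD):
            return True
    # Simple size check: reject values that are highly unlikely to be correct.
    if max_value is not None:
        if int("".join(d.value for d in digits)) > max_value:
            return True
    return False
```

For example, a two-digit age recognised as "4" and "2" with confidences 0.95 and 0.85 would be accepted with `max_value=120`, while a single "1" recognised with confidence 0.85 would be rejected because of its stricter threshold.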

The rejection strategy that will maximise the volume and quality of recognised data depends on the questionnaire design, the recognition engine, and any image enhancement being used. Therefore, to allow ourselves time to determine the optimal recognition strategy for the 2001 Census, it is important to collaborate with the chosen contractor as early as possible.

US Census research suggests that superior recognition results are possible when working directly with recognition software developers. SNZ should strongly encourage the external contractor to work with manufacturers of the recognition engine to be used in the 2001 Census, in order to optimise recognition results.

This study was invaluable for increasing our understanding of the whole recognition process and has placed us in a much stronger position to negotiate our requirements for the 2001 Census. We have made the decision to build on our experience with the 1996 Census and use imaging technology for the 2001 Census.

Introduction

Statistics New Zealand (SNZ) used imaging and Intelligent Character Recognition (ICR) for the first time in the processing of the 1996 Census of Population and Dwellings. Although the numeric recognition results achieved were within contractual limits, the data quality, for certain variables in particular, was not as high as anticipated. To some extent, these deficiencies in the data quality stemmed from our lack of understanding of the total recognition process.

The specific aim of this research was to recommend whether or not to use numeric recognition for the 2001 Census.

The research was a joint venture between SNZ and Datamail, the imaging contractor. The study used images retained for research purposes from the 1996 Census. Data that had been key entered and verified was compared with recognised data. Datamail was responsible for performing the numeric recognition and also provided the software for capturing the key entry data. SNZ designed the study and carried out all aspects of the analysis.

To investigate possible differences between recognition engines we included two engines in the trial - RE and MITEK. For the RE engine, we also assessed the benefit of applying image enhancement techniques prior to recognition. The RE engine is Datamail's current engine, while the MITEK engine is regularly used by their associate company in Australia.

In order to look at the trade-off between cost and quality, we established the relationship between the proportion of data rejected (requiring key entry) and the associated error rate in the data. We considered the digit error rate and the variable error rate in the data, both before and after simple edits.
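As a rough illustration of how such a relationship might be tabulated, the Python sketch below sweeps a confidence threshold and reports the proportion rejected alongside the digit and variable error rates in the accepted data. It rests on several assumptions that are not taken from the study: each response carries a single minimum digit confidence, error rates are measured among accepted responses only, and the verified key entry value is treated as the truth. The function name `trade_off` and the sample data are hypothetical.

```python
def trade_off(responses, thresholds):
    """Tabulate rejection share against error rates.

    responses:  list of (recognised, keyed, min_confidence) tuples, where
                recognised and keyed are digit strings of equal length.
    thresholds: confidence cut-offs to evaluate.
    Returns a list of (threshold, rejected_share, digit_error_rate,
    variable_error_rate); the error rates are None if nothing is accepted.
    """
    rows = []
    n = len(responses)
    for t in thresholds:
        # Responses at or above the threshold are accepted without key entry.
        accepted = [(rec, key) for rec, key, conf in responses if conf >= t]
        rejected_share = (n - len(accepted)) / n
        if not accepted:
            rows.append((t, rejected_share, None, None))
            continue
        # A variable is in error if any part of the response differs.
        variable_errors = sum(rec != key for rec, key in accepted)
        # Digits are compared position by position.
        digit_total = sum(len(key) for _, key in accepted)
        digit_errors = sum(a != b
                           for rec, key in accepted
                           for a, b in zip(rec, key))
        rows.append((t,
                     rejected_share,
                     digit_errors / digit_total,
                     variable_errors / len(accepted)))
    return rows


# Made-up sample: (recognised value, verified key entry value, confidence).
sample = [("42", "42", 0.95), ("17", "11", 0.60),
          ("08", "08", 0.85), ("35", "36", 0.90)]
for row in trade_off(sample, [0.0, 0.7, 0.92]):
    print(row)
```

Raising the threshold in this made-up sample increases the share of responses sent to key entry while lowering both error rates, which is the cost and quality trade-off in question.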

Note: There is a Glossary of technical terms at the end of this report.


 