The Third Meeting of the Working Party on the Application of New Technology to Population Data
Bali, 7-9 January 1999
Numeric Recognition in the 1996 Census of Population and Dwellings - A Review
Analysis - Pre-printed Numerics

Not surprisingly, recognition of pre-printed numerics was far superior to that of hand-written numerics. For the RE engine with image enhancement, at comparable rejection rates of around 12%, the digit substitution rate was 1.41% for hand-written numerics compared with 0.16% for pre-printed numerics.

For the RE engine on enhanced images, we inspected all 43 cases for 'district' or 'subdistrict' where the keyed and recognised responses differed and the recognised response would have been automatically accepted with a confidence threshold of 650 for all digits. In none of the cases was the response on the image both pre-printed and whole.
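The selection rule for the inspected cases can be sketched as follows. This is only an illustration: the function names are hypothetical, the confidence scores are assumed to be integers where higher means more confident, and a score exactly equal to the threshold is assumed to pass.

```python
from typing import Sequence

CONFIDENCE_THRESHOLD = 650  # threshold quoted in the text


def auto_accept(digit_confidences: Sequence[int],
                threshold: int = CONFIDENCE_THRESHOLD) -> bool:
    """Accept a field only if every recognised digit meets the threshold."""
    # Assumption: a score equal to the threshold counts as accepted.
    return len(digit_confidences) > 0 and all(
        c >= threshold for c in digit_confidences
    )


def is_inspection_case(keyed: str, recognised: str,
                       digit_confidences: Sequence[int]) -> bool:
    """True when the response would be accepted automatically
    but differs from the keyed response."""
    return auto_accept(digit_confidences) and keyed != recognised
```

For example, `is_inspection_case("123", "128", [800, 800, 800])` flags a response for inspection, while the same response with any digit scoring below 650 would be rejected rather than inspected.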

Hand-written    Pre-printed - Cutting Error    Key Entry Error
17              24                             2

As can be seen from the table above, in 40% of the cases viewed (17 of 43) the 'district' or 'subdistrict' field had actually been filled out by hand. In 56% of the cases viewed (24 of 43) the pre-printed 'district' or 'subdistrict' had the tops of the characters cut off. The remaining 2 cases were key entry errors. The cutting error only occurred on individual form fronts (IF's) where, for some reason, the printing in the top right-hand corner of the form had been cut from the image. We presume that some sort of registration error caused part of the 'district' and 'subdistrict' responses to be cut off as well.
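The percentages quoted above follow directly from the counts in the table. A short check, using only the three counts reported there:

```python
# Counts of inspected cases by cause, taken from the table above.
counts = {
    "Hand-written": 17,
    "Pre-printed - Cutting Error": 24,
    "Key Entry Error": 2,
}

total = sum(counts.values())  # all inspected cases

# Share of each cause, rounded to the nearest whole percent.
shares = {cause: round(100 * n / total) for cause, n in counts.items()}

for cause, pct in shares.items():
    print(f"{cause}: {counts[cause]} of {total} ({pct}%)")
```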

For the RE engine on enhanced images, we never found an example of a whole pre-printed digit being substituted for another digit.

Rejection rates for pre-printed numerics were higher than might be expected. For the RE engine on enhanced images, 11% of the data was still rejected even when the confidence threshold for all digits was 0. Upon discovering the cutting error referred to above, we decided to analyse the rejection rates separately for DF's and IF's (expecting the rates to be worse on IF's).

Form    'District' Rejection Rate    'Subdistrict' Rejection Rate
DF      51.75%                       25.30%
IF      13.13%                       11.36%

Surprisingly, the rejection rates were worse for the DF's. For DF's, if the 'subdistrict' had been rejected, there was an 86% chance that the 'district' had also been rejected. In the few cases viewed, the 'district' and 'subdistrict' were clearly pre-printed. So why were the rejection rates so high? We suspect that this relates back to the original difficulty that both engines had in registering the DF's at all.
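The 86% figure is a conditional proportion: among forms whose 'subdistrict' was rejected, the share whose 'district' was rejected too. A minimal sketch of that calculation is given below. The record layout (`district_rejected`, `subdistrict_rejected`) is hypothetical, and the sample records are invented purely to exercise the function; they are not census data.

```python
from typing import Dict, List, Optional


def district_rejected_given_subdistrict(
        forms: List[Dict[str, bool]]) -> Optional[float]:
    """Proportion of forms with a rejected 'district' among those
    whose 'subdistrict' was rejected. Returns None if no form had
    a rejected 'subdistrict'."""
    sub_rejected = [f for f in forms if f["subdistrict_rejected"]]
    if not sub_rejected:
        return None
    both = sum(1 for f in sub_rejected if f["district_rejected"])
    return both / len(sub_rejected)


# Invented records, for illustration only.
sample = [
    {"district_rejected": True, "subdistrict_rejected": True},
    {"district_rejected": True, "subdistrict_rejected": True},
    {"district_rejected": False, "subdistrict_rejected": True},
    {"district_rejected": True, "subdistrict_rejected": False},
]
rate = district_rejected_given_subdistrict(sample)  # 2 of 3 here
```

A high value of this proportion means that the two fields tend to fail together on the same form, which is consistent with a form-level cause such as a registration problem rather than independent digit-level failures.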

For the RE engine on enhanced images, we also looked at the proportion of rejected 'district' and 'subdistrict' responses that were rejected only because they contained an excessive number of digits, and not because one or more of the digits had a confidence score below 650.

Variable       Proportion of Rejections Due to Length Only
District       17.82%
Subdistrict    31.83%
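The distinction between the two causes of rejection can be sketched as below. The function names and the `max_length` parameter are hypothetical (the permitted field lengths are not stated here), and a confidence score below the threshold of 650 is assumed to be what triggers a confidence-based rejection.

```python
from typing import List, Optional, Sequence


def rejection_reason(recognised: str,
                     confidences: Sequence[int],
                     max_length: int,
                     threshold: int = 650) -> Optional[str]:
    """Classify a recognised response as rejected for length only,
    rejected for low confidence, or accepted (None)."""
    too_long = len(recognised) > max_length
    low_confidence = any(c < threshold for c in confidences)
    if low_confidence:
        return "low_confidence"
    if too_long:
        return "length_only"
    return None


def length_only_share(reasons: List[Optional[str]]) -> Optional[float]:
    """Proportion of rejections that were due to length only."""
    rejected = [r for r in reasons if r is not None]
    if not rejected:
        return None
    return sum(r == "length_only" for r in rejected) / len(rejected)
```

Under these assumptions, a response such as `"12-3"` with all scores above 650 in a three-character field would be counted as rejected for length only.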

There were 228 recognised 'district' and 'subdistrict' responses rejected only because they were too long, where the keyed response was available for comparison. All of these were too long because they contained one or more recognised hyphens, "-". Had the hyphen not been there, every recognised response would have been correct. That is, the recognised digits were all correct, but a hyphen had been inserted somewhere in the recognised response. The recognition library should not have included a hyphen as a valid numeric character. Presumably different libraries were used for hand-written and pre-printed numerics, because hyphens featured as characters only in the recognised 'district' and 'subdistrict' responses.
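The comparison described above, checking whether a recognised response matches the keyed response once hyphens are removed, can be sketched as follows. The function names are hypothetical, and this is a post-hoc check for illustration, not a substitute for removing the hyphen from the recognition library.

```python
def strip_hyphens(recognised: str) -> str:
    """Remove every hyphen from a recognised response."""
    return recognised.replace("-", "")


def correct_apart_from_hyphens(keyed: str, recognised: str) -> bool:
    """True when the recognised response contains at least one hyphen
    and equals the keyed response once the hyphens are removed."""
    return "-" in recognised and strip_hyphens(recognised) == keyed
```

For instance, `correct_apart_from_hyphens("123", "1-23")` holds, whereas a response with a wrong digit, such as `"1-28"`, would not pass the check.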


