ESCAP Statistics Division
The Third Meeting of the Working Party on the Application of New Technology to Population Data
Bali, 7-9 January 1999
STAT/WPA(3)/10
7 January 1999
ENGLISH ONLY
ECONOMIC AND SOCIAL COMMISSION FOR ASIA AND THE PACIFIC

Working Party on the Application of  New Technology to Population Data
Third meeting
7-9 January 1999
Bali

Prototyping a Distributed Data-Capture System Using a Scanner
Toshio Shigematsu
JICA Expert at BPS-Indonesia
1. Introduction

1. A distributed data-capture system is an approach a census planner may choose when a centralized data-capture approach cannot be used. This can happen because of a lack of work space for equipment or storage space for questionnaires, or for other administrative or logistical reasons.

2. The proposed prototype of a distributed data-capture system will be built around an OCR/OMR software engine and a medium-speed scanner. The system will process a single-sheet, two-sided form and will be capable of processing 8,000 to 12,000 questionnaire forms per site per day.
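
The short calculation below turns the throughput target into concrete terms. It converts 8,000 to 12,000 sheets per site per day into page images per day and the sustained scanning rate this requires. Because every sheet is two-sided, each sheet yields two page images. The length of the working day (one 8-hour shift here) is an assumption and is not given in this paper.

```python
# Convert the daily throughput target into a sustained scanning rate.
# ASSUMPTION: one 8-hour shift per day; the paper does not state shift length.
SHIFT_HOURS = 8
IMAGES_PER_SHEET = 2  # each questionnaire sheet is scanned on both sides


def required_rates(sheets_per_day: int, shift_hours: float = SHIFT_HOURS) -> dict:
    """Return images per day and the sheets/images per minute needed."""
    minutes = shift_hours * 60
    return {
        "images_per_day": sheets_per_day * IMAGES_PER_SHEET,
        "sheets_per_minute": sheets_per_day / minutes,
        "images_per_minute": sheets_per_day * IMAGES_PER_SHEET / minutes,
    }


for target in (8_000, 12_000):
    r = required_rates(target)
    print(f"{target:>6} sheets/day -> {r['images_per_day']} images/day, "
          f"{r['sheets_per_minute']:.1f} sheets/min, "
          f"{r['images_per_minute']:.1f} images/min")
```

With an 8-hour shift, the target range works out to roughly 17 to 25 sheets (33 to 50 page images) per minute of sustained throughput. This figure makes no allowance for paper jams, rescans or operator breaks.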

3. Several integrated form-processing software packages are available, such as NCS ACCRA, MS&I's MATRA, and TiS's AFPS. These packages provide comprehensive functionality and flexibility. However, they are expensive, overly complex for many users, and often difficult to optimize for a particular form or application. Customizing them, where this is possible at all, requires significant application development effort before they are ready to process a census.

2. Requirements

4. The basic requirements for the prototype system are as follows:

  1. Able to process two-sided, legal-size questionnaire sheets whose two sides may or may not be identical. One side of a questionnaire sheet will hold information on four individual persons.
  2. Usable by relatively inexperienced personnel on a day-to-day basis.
  3. Throughput for a single installation in the range of 8,000 to 12,000 sheets per day.
  4. All components of the system localized for the Indonesian language.
  5. Data will consist of both hand-printed numerals and mark-sense entries.
  6. A user-friendly verification module.
  7. A system monitor will keep track of the progress of each work item (TIFF image, document, etc.) in the system, and will collect and record statistics.
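
Requirement 7 can be pictured as a small bookkeeping component. The sketch below is a hypothetical illustration and not the paper's design. The stage names (`scanned`, `recognized`, `verified`, `output`) and the `WorkMonitor` class are assumptions chosen to mirror the work sequence. Each work item is identified by its TIFF file name.

```python
from collections import Counter
from dataclasses import dataclass, field

# HYPOTHETICAL stage names mirroring the work sequence; not defined in the paper.
STAGES = ("scanned", "recognized", "verified", "output")


@dataclass
class WorkMonitor:
    """Track which stage each work item (e.g. a TIFF image) has reached."""
    stage_of: dict = field(default_factory=dict)

    def advance(self, item: str, stage: str) -> None:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        current = self.stage_of.get(item)
        # Items may only move forward through the pipeline.
        if current is not None and STAGES.index(stage) <= STAGES.index(current):
            raise ValueError(f"{item} is already at {current}")
        self.stage_of[item] = stage

    def statistics(self) -> dict:
        """Number of items currently at each stage."""
        counts = Counter(self.stage_of.values())
        return {stage: counts.get(stage, 0) for stage in STAGES}


monitor = WorkMonitor()
monitor.advance("ED0001_0001.tif", "scanned")
monitor.advance("ED0001_0002.tif", "scanned")
monitor.advance("ED0001_0001.tif", "recognized")
print(monitor.statistics())
```

Counting items per stage in this way gives the simple numeric progress displays a direct data source. Recording each `advance` call with a timestamp would be one way to collect further statistics.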

3. Work sequence (see Figure 1)

5. The prototype system will be designed to handle the simple work sequence described below:

(1) Operator starts the system.
(2) Operator loads questionnaires into the scanner.
(3) Scanner creates an image in the Recognition-ready Folder.
(4) Image is submitted to the Recognition program.
(5) Recognition results are sent to the Recognition Results Analyzer.

Figure 1. The data flow for the prototype system

(6) The Recognition Results Analyzer determines whether human intervention is required, based on the confidence level of the recognition and on other factors.

(6.1) Human intervention is required:

  • The image and the recognition results are loaded for verification.
  • Upon completion of verification, the Verify program sends the results to the Verified Folder.

(6.2) Human intervention is not required:

  • The Verify program sends the results to the Verified Folder.

(6.3) The results are written in the specified format to the file system.

  • The Output program sends the verified data to the Output Folder.

6. In this sequence, once the system has been started, human intervention is required only at the scanner and at the Verify program. All other processing is executed automatically.
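
The branching in steps (4) to (6.3) can be sketched as a single routing function. This is an illustrative sketch only. `recognize`, `needs_review`, `verify` and `write_output` are hypothetical callables standing in for the OCR engine, the Recognition Results Analyzer, the Verify program and the Output program. Plain Python lists stand in for the Verified and Output Folders.

```python
from typing import Callable, List


def route_image(
    image: str,
    recognize: Callable[[str], dict],
    needs_review: Callable[[dict], bool],
    verify: Callable[[str, dict], dict],
    write_output: Callable[[dict], str],
    verified_folder: List[dict],
    output_folder: List[str],
) -> None:
    """Carry one scanned image through steps (4) to (6.3) of the work sequence."""
    results = recognize(image)                     # (4)-(5) recognition
    if needs_review(results):                      # (6) analyzer decision
        results = verify(image, results)           # (6.1) operator corrects results
    verified_folder.append(results)                # (6.1)/(6.2) Verified Folder
    output_folder.append(write_output(results))    # (6.3) formatted output


# Stand-in callables: a result with a rejected character ("?") goes to
# verification, while a clean result passes straight through.
verified, output = [], []
fake_ocr = {"img1.tif": {"age": "3?"}, "img2.tif": {"age": "41"}}
for name in ("img1.tif", "img2.tif"):
    route_image(
        name,
        recognize=fake_ocr.get,
        needs_review=lambda r: "?" in r["age"],
        verify=lambda img, r: {"age": "35"},       # pretend the operator typed 35
        write_output=lambda r: r["age"],
        verified_folder=verified,
        output_folder=output,
    )
print(output)
```

Only the `verify` callable stands for human work. Every other call runs unattended, in line with paragraph 6.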

4. Design

7. The prototype system will consist of a single integrated GUI that handles scanning, recognition, and final output. This GUI will run a separate thread for each function (scanning, recognition, and output).
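
A common way to run a separate thread for each function is a pipeline of worker threads joined by queues. The sketch below uses Python's standard `threading` and `queue` modules purely for illustration, since the paper does not say how its GUI threads communicate. The stage functions are hypothetical placeholders, and a `None` sentinel shuts the pipeline down when scanning stops.

```python
import queue
import threading


def stage(work, inbox: queue.Queue, outbox: queue.Queue) -> threading.Thread:
    """Start a thread that applies `work` to each item until it sees None."""
    def run():
        while True:
            item = inbox.get()
            if item is None:          # sentinel: pass it on and stop
                outbox.put(None)
                return
            outbox.put(work(item))
    t = threading.Thread(target=run, daemon=True)
    t.start()
    return t


# HYPOTHETICAL stage functions standing in for scanning, recognition and output.
def scan(sheet_no):
    return f"sheet{sheet_no:04d}.tif"


def recognize(image):
    return (image, "recognized")


def output(result):
    return f"{result[0]}: {result[1]}"


to_scan, scanned, recognized, done = (queue.Queue() for _ in range(4))
threads = [stage(scan, to_scan, scanned),
           stage(recognize, scanned, recognized),
           stage(output, recognized, done)]

for n in range(1, 4):        # the operator feeds three sheets ("Start")
    to_scan.put(n)
to_scan.put(None)            # "Stop"
for t in threads:
    t.join()

results = []
while (line := done.get()) is not None:
    results.append(line)
print(results)
```

Each stage has exactly one thread and the queues are first-in, first-out, so sheets leave the pipeline in the order they were scanned. Meanwhile, a slow recognition step does not block the scanner.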

8. Scanning controls will be as simple as possible, possibly limited to a "Start/Stop" button on the main GUI panel. The progress of work through the system will be shown by simple numeric displays. Work will be passed internally between scanning and recognition. A file-based workflow will be used between recognition, the Recognition Results Analyzer, the Verifier, and the output function.
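
In the file-based part of the workflow, each program can watch its input folder and claim files by moving them. The sketch below illustrates one such convention and is not the paper's implementation. The folder names and the `.txt` extension are assumptions. Each file is first written under a temporary `.part` name and then moved into place with `os.replace`, so a stage polling the folder only sees complete files. On POSIX systems this rename is atomic when the source and destination are on the same file system.

```python
import os
import tempfile
from pathlib import Path


def publish(folder: Path, name: str, text: str) -> Path:
    """Write a file under a temporary name, then rename it into place, so a
    consumer polling `folder` never picks up a partially written file."""
    tmp = folder / (name + ".part")
    tmp.write_text(text)
    final = folder / name
    os.replace(tmp, final)
    return final


def claim_next(inbox: Path, workdir: Path, suffix: str = ".txt"):
    """Move the oldest finished file from `inbox` to `workdir`; None if empty."""
    ready = sorted(inbox.glob("*" + suffix), key=lambda p: p.stat().st_mtime)
    if not ready:
        return None
    target = workdir / ready[0].name
    os.replace(ready[0], target)   # take it out of the inbox
    return target


# Usage in a scratch directory (HYPOTHETICAL folder names).
root = Path(tempfile.mkdtemp())
results_dir, verified_dir = root / "RecognitionResults", root / "Verified"
results_dir.mkdir()
verified_dir.mkdir()

publish(results_dir, "ED0001_0001.txt", "age=35\n")
claimed = claim_next(results_dir, verified_dir)
print(claimed.name, sorted(p.name for p in results_dir.iterdir()))
```

This sketch assumes one consumer per folder. If several Verifier stations shared one folder, two stations could race for the same file. The losing `os.replace` call would then fail and would need to be handled.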

a. Scanner

9. A TWAIN-compliant SCSI scanner (e.g. Fujitsu M3099GX) will be used for the prototype. Relying on the TWAIN standard avoids being tied to proprietary driver systems.

b. Recognition Module

10. For the prototype system, the NCS NestorReader® OCR engine will be used to recognize the images produced by the scanner.

c. Recognition Results Analyzer

11. The Recognition Results Analyzer, a program written in Visual Basic®, will read the recognition results file, which is in text format, and count the recognition errors. Based on a user-defined acceptable error level, it will determine whether the recognition results need to be verified. The error statistics for each Enumeration District will be displayed and printed.
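
The analyzer's decision can be sketched as follows. The paper does not describe the layout of the recognition results file, so the format here is purely hypothetical. It has one line per field, `ED,sheet,field,value`, and every character the engine could not read is replaced by `?`. Each `?` counts as one error. A sheet is sent for verification when its error rate exceeds the user-defined acceptable level. Errors are also totalled per Enumeration District.

```python
import csv
import io
from collections import defaultdict

# HYPOTHETICAL results format (not specified in the paper):
#   ED,sheet,field,value   with "?" marking each rejected character.
SAMPLE = """\
ED0001,0001,P1_AGE,35
ED0001,0001,P1_SEX,1
ED0001,0002,P1_AGE,4?
ED0002,0001,P1_AGE,??
"""


def analyze(results_text: str, acceptable_error_rate: float):
    """Return (sheets needing verification, per-ED error statistics)."""
    chars = defaultdict(int)     # characters read, keyed by (ED, sheet)
    errors = defaultdict(int)    # rejected characters, keyed by (ED, sheet)
    for ed, sheet, _field, value in csv.reader(io.StringIO(results_text)):
        chars[ed, sheet] += len(value)
        errors[ed, sheet] += value.count("?")

    # Error rate above the acceptable level, written without division so
    # that a sheet with no characters cannot divide by zero.
    to_verify = sorted(key for key in chars
                       if errors[key] > acceptable_error_rate * chars[key])

    per_ed = defaultdict(lambda: {"sheets": 0, "errors": 0})
    for (ed, _sheet), n in errors.items():
        per_ed[ed]["sheets"] += 1
        per_ed[ed]["errors"] += n
    return to_verify, dict(per_ed)


to_verify, stats = analyze(SAMPLE, acceptable_error_rate=0.10)
print(to_verify)
print(stats)
```

In this sketch the acceptable level is a fraction of characters per sheet. A real analyzer might instead weight errors by field or use the engine's confidence values, as step (6) of the work sequence suggests.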

d. Verifier program

12. The Verifier is an application program written in Visual Basic®. It allows the user to view the original image and the recognition results simultaneously and, where a recognition error was made, to correct the results to agree with the image. The Verifier can operate on a single recognition results file, on a list of results files, or on the Recognition Results Folder while new results files are still being added to it.
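
The Verifier accepts three kinds of input: a single file, a list of files, or a folder that is still being filled. All three can be reduced to one stream of results files. The sketch below assumes, hypothetically, that results files end in `.txt`. In folder mode it polls at a fixed interval and stops after a chosen number of empty polls. This simplifies a Verifier that keeps watching until the operator closes it.

```python
import tempfile
import time
from pathlib import Path
from typing import Iterator, List, Union


def results_files(source: Union[str, Path, List[Union[str, Path]]],
                  poll_seconds: float = 1.0,
                  max_idle_polls: int = 3) -> Iterator[Path]:
    """Yield results files from a single file, a list of files, or a folder."""
    if isinstance(source, list):                  # a list of results files
        yield from (Path(p) for p in source)
        return
    path = Path(source)
    if path.is_file():                            # a single results file
        yield path
        return
    seen = set()                                  # a folder still being filled
    idle = 0
    while idle < max_idle_polls:
        new = sorted(p for p in path.glob("*.txt") if p not in seen)
        if new:
            idle = 0
            for p in new:
                seen.add(p)
                yield p
        else:
            idle += 1
            time.sleep(poll_seconds)


folder = Path(tempfile.mkdtemp())
for name in ("ED0001_0002.txt", "ED0001_0001.txt"):
    (folder / name).write_text("")
names = [p.name for p in results_files(folder, poll_seconds=0.01, max_idle_polls=1)]
print(names)
```

Because all three modes yield the same kind of item, the rest of the Verifier can process results files one at a time without knowing where they came from.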

13. The Verifier consists of a main window and one or more sub-windows, each displaying a different aspect of the image together with the recognition results for the current zone.

14. Typically, a sub-window shows the portion of the image that contains the zone being verified. Next to this view of the zone is a text field where any corrections are entered. In the image display, zones that need to be verified are highlighted in color.
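
Showing "a portion of the image that includes a view of the zone" amounts to cutting a window around the zone's bounding box. The sketch below computes that window and where the colored highlight falls inside it. The margin size and the (left, top, right, bottom) pixel-box convention are assumptions. The window is clipped so it never extends past the edges of the page image.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Pixel rectangle as (left, top, right, bottom), right/bottom exclusive."""
    left: int
    top: int
    right: int
    bottom: int


def zone_view(zone: Box, page_width: int, page_height: int,
              margin: int = 40) -> Box:
    """Window to show in a sub-window: the zone plus a margin of context,
    clipped to the page. ASSUMPTION: the margin is in pixels, for illustration."""
    return Box(max(zone.left - margin, 0),
               max(zone.top - margin, 0),
               min(zone.right + margin, page_width),
               min(zone.bottom + margin, page_height))


def highlight_offset(zone: Box, view: Box) -> Box:
    """Position of the zone inside the displayed window, i.e. where to draw
    the colored highlight over the cropped image."""
    return Box(zone.left - view.left, zone.top - view.top,
               zone.right - view.left, zone.bottom - view.top)


# A zone near the top-left corner of a 2550 x 4200 pixel page image
# (legal size at 300 dpi; the scan resolution is an assumption).
zone = Box(20, 100, 220, 160)
view = zone_view(zone, 2550, 4200)
print(view, highlight_offset(zone, view))
```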

e. Output

15. The prototype will output ASCII data as a comma-separated values (CSV) file or a text file, with one record per person.
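
Writing one record per person means expanding each sheet into its person records. The sketch below uses Python's standard `csv` module. The column names, the `person` numbering and the rule of skipping blank person slots are hypothetical choices for illustration. The paper specifies only ASCII CSV or text output with one record per person.

```python
import csv
import io

# HYPOTHETICAL layout: verified data for one sheet side, keyed by person slot.
sheet = {
    "ed": "ED0001",
    "sheet": "0001",
    "persons": [
        {"age": "35", "sex": "1"},
        {"age": "33", "sex": "2"},
        {"age": "", "sex": ""},      # unused person slot on the form
        {"age": "", "sex": ""},
    ],
}


def person_rows(sheet: dict):
    """Yield one output record per person actually recorded on the sheet."""
    for number, person in enumerate(sheet["persons"], start=1):
        if not any(person.values()):          # skip blank slots
            continue
        yield {"ed": sheet["ed"], "sheet": sheet["sheet"],
               "person": number, **person}


buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["ed", "sheet", "person", "age", "sex"],
                        lineterminator="\n")
writer.writeheader()
writer.writerows(person_rows(sheet))
print(buffer.getvalue(), end="")
```

When writing to a file rather than an in-memory buffer, the `csv` module documentation recommends opening the file with `newline=''`.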

5. Conclusion

16. The prototype system will be a simple OMR/OCR and data-entry solution that includes the customized functionality needed to support a population census application. Every process in the system will be optimized for the census application at the design stage, including:

  • Scanner interface
  • Network image workflow
  • Key verification
  • Output of the resolved, edited and formatted ASCII data 

 