ESCAP Statistics Division
The Third Meeting of the Working Party on the Application of New Technology to Population Data
Bali, 7-9 January 1999

STAT/WPA(3)/8
7 January 1999
ENGLISH ONLY

ECONOMIC AND SOCIAL COMMISSION FOR ASIA AND THE PACIFIC

Working Party on the Application of New Technology to Population Data
Third meeting
7-9 January 1999
Bali

Integrated Approach to Data Capture using OMR and ICR Technology
Contents
  1. Introduction
  2. Experience with OMR IBM-3881 in Population Census 1981
  3. Experience with OMR OPScan 21 for processing Population Census 1991
  4. Integrated Approach to OMR and ICR for data capture in 2000 round of Population Census
  5. Benefit
  6. Appendix
    • Annex-A: Short Questionnaire for Population Census 2001
    • Annex-B: Long Questionnaire for Population Census 2001
1. Introduction

Bangladesh first applied OMR in 1981, processing the Population Census documents with the OMR IBM 3881. The same machines were used in 1983-84 to process agriculture census documents. For the Population Census 1991, Bangladesh used the OPScan 21. In all these censuses the numbers and the codes were captured successfully. Throughout, a short single-sheet OMR questionnaire was used to collect basic information on each household and its individual members. The detailed socio-economic information was collected and processed using data entry technology. The main census and the detailed sample survey used to be conducted about 9 months apart. As a result, the time frame for the census activities as a whole could not be maintained.

Bangladesh is on the doorstep of the 2000 round of censuses. On the basis of past experience and the demand for timely census results, the Government of Bangladesh has decided in principle to adopt the 3:2 years timeframe for the 2000 round of censuses. To share resources, it plans to conduct three national censuses in succession, i.e. the Economic Census in 2000, the Population and Housing Census in 2001 and the Agricultural Census in 2002. Bangladesh is also contemplating the adoption of a civil registration system through local government in the near future. In view of users' demand, it is felt essential to find technology which will ensure rapid capture of numbers as well as characters from multi-sheet questionnaires. With this in view, questionnaires have been designed in both OMR and ICR format, in single as well as multiple sheets. To make the operation a success, 3 important requirements were identified:

  • availability of resource persons to manage printing of questionnaire, data collection and data capture from both single sheet and also multiple sheet questionnaires;
  • availability of maintenance person; and
  • distributive data capture with back-up system.

Incidentally, we have been blessed by the advancement of technology. OMR is now available with both digit and character recognition capability at high speed. Bangladesh has ordered 5 OMRs of model DRS 800, with a throughput of 8000 A4 sheets per hour, together with large host computers. As back-up it has ordered 5 ICRs with a throughput of 42 ppm. Software for the OMR and ICR will be configured so that both machines can be used interchangeably and can produce output in ASCII format, which can be processed with our own application program.

2. Experience with OMR IBM 3881 in Population Census 1981
OMR operation

Operation of the OMR started on 20th October, 1981 on an experimental basis, and actual production work was taken up from 3rd November, 1981. During the first week it was run in one shift of 7 hours' duration. The next week, work was done in 2 shifts. From the third week 3 shifts were introduced. The decision to run 3 shifts was extremely risky. But considering the need for quick data entry, and the fact that 8 months had already elapsed since the census, 3 shifts were necessary to make up time. The decision was, however, taken in consultation with IBM Engineers, who agreed to help with maintenance and offered service every day without any extra charge. It was also expected that the other OMR would become operational after receipt of the necessary spare parts, and that it would then be sufficient to run the two OMRs in 2 shifts only.

Only 3 to 4 thousand sheets (without splitting a Union) were taken at a time on one tape. This was done to avoid large-scale re-runs due to (i) frequent power failure, (ii) machine breakdown and (iii) eventual data check on tapes. The tapes of one Thana were later merged into a single tape containing all questionnaires of that Thana.

Every sheet was assigned a 7-digit serial number generated by the OMR and recorded on the tape. The same serial was also printed on the sheet. This serial, read together with the Geo-code, could identify an individual household for later reference. The leftmost 2 digits indicated the Geo-code of the Thana and the remaining 5 digits indicated the running serial within the Thana. After a successful run, the tape number, the serial numbers of the sheets, the date and other particulars were written on the external label of the tape and also recorded in a Register. These tapes were then taken to a mainframe computer for further processing.
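
The serial layout described above (2-digit Thana Geo-code followed by a 5-digit running serial) can be illustrated with a minimal Python sketch. The function and type names here are hypothetical and only illustrate the layout; they are not part of the original IBM 3881 processing software.

```python
from typing import NamedTuple


class SheetSerial(NamedTuple):
    thana_code: str      # leftmost 2 digits: Geo-code of the Thana
    running_serial: int  # remaining 5 digits: running serial within the Thana


def parse_sheet_serial(serial: str) -> SheetSerial:
    """Split a 7-digit sheet serial into Thana code and running serial."""
    if len(serial) != 7 or not (serial.isascii() and serial.isdigit()):
        raise ValueError(f"expected a 7-digit serial, got {serial!r}")
    return SheetSerial(thana_code=serial[:2], running_serial=int(serial[2:]))


# Illustrative serial only, not taken from census data.
example = parse_sheet_serial("1200345")
print(example.thana_code, example.running_serial)
```

The Thana code is kept as a string so that a leading zero is not lost.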

Reasons for lower average speed

Some of the difficulties faced at the time of operation are:

  1. Sometimes recording on tape was found to be erratic. This necessitated a re-run of the entire batch.
  2. Sudden power failure necessitated a re-run of the batch. Such failure was almost a daily affair. On average about 2 hours were lost daily to this.
  3. Two to three hundred sheets are kept on the hopper at a time. When these are processed the machine automatically stops. The operator has to bring down the hopper, feed the next batch of 2-3 hundred sheets and start again. These sheets needed proper alignment at the edges. These actions took 2 to 3 minutes.
  4. After processing 5 to 6 thousand sheets, the hopper area, carriage area, mark sensing area, etc. had to be cleaned manually with clean dusters and a vacuum cleaner. Similarly, the read-write head of the tape drive had to be cleaned 2 or 3 times daily.
  5. The FCS was designed in such a way as to reject the Tally sheet if there was any blank or double mark on a row. If a Tally sheet is rejected, the operator must stop the machine, look for the error and correct it before re-feeding.
  6. The OMR machine itself broke down several times during the period of census processing. While the IBM Engineers were available all the time, it was very difficult to get the necessary spare parts from abroad. Initially the other OMR was cannibalised, but later the whole processing remained suspended for a considerable period of time.
Actual Performance

As stated earlier, we had to settle for only one OMR. The other OMR could not be made operational till the end of census processing, mainly because of the non-availability of spare parts.

One OMR machine was put into operation from 3rd November, 1981 and preparation of data tapes for the Census of Population, 1981 continued till 17th August, 1983. Though the time taken for completion of the work was about 21 1/2 months (655 days), actual operation was much less than that. Out of 655 days, 197 days were lost to machine breakdown and 12 days were observed as holidays. The machine was operated on a test basis in a single shift from 3rd November to 8th November, 1981 and in 2 shifts from 9th to 12th November, 1981. Thereafter operation continued in 3 shifts per day till August, 1982, and for the remaining year the machine was operated in 2 shifts per day. Generally 2 operators worked in one shift. For the 446 actual working days available, a total of 1,166 shifts were planned. But 169 shifts could not be utilized due to either minor mechanical troubles or power failure. About 15 million documents were read through the OMR in 997 shifts, each of 7 hours' duration. Thus, the average number of sheets read per shift stands at approximately 15,000.

But 145 tapes, each containing about 4,000 households, had to be re-run because they could not be read by the computer. As such, the average number of sheets actually fed per shift was slightly higher and stood at roughly 16,000. However, experience showed that a maximum of 20-22 thousand sheets can be read in a single shift of 7 hours' duration.
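
The arithmetic behind these averages can be reproduced with a short Python sketch. All inputs are the rounded figures quoted in the two paragraphs above; treating each re-run household as one sheet is an assumption made here for illustration.

```python
# Figures quoted in the text (rounded in the source).
TOTAL_DAYS = 655
BREAKDOWN_DAYS = 197
HOLIDAYS = 12
SHIFTS_WORKED = 997
SHEETS_READ = 15_000_000        # "about 15 million documents"
RERUN_TAPES = 145
SHEETS_PER_RERUN_TAPE = 4_000   # assumption: one sheet per household

# Days actually available for operation.
working_days = TOTAL_DAYS - BREAKDOWN_DAYS - HOLIDAYS

# Average sheets per shift, counting each document once.
net_per_shift = SHEETS_READ / SHIFTS_WORKED

# Average sheets per shift when re-run tapes are counted as extra feeding work.
gross_per_shift = (SHEETS_READ + RERUN_TAPES * SHEETS_PER_RERUN_TAPE) / SHIFTS_WORKED

print(working_days)            # 446
print(round(net_per_shift))    # 15045, i.e. approximately 15,000
print(round(gross_per_shift))  # 15627, i.e. roughly 16,000
```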

| Total time required | Total working days availed | Shifts planned | Shifts actually worked | Shifts lost | Shifts worked per day | Machine breakdown (days) | Holidays | No. of tapes re-run |
|---|---|---|---|---|---|---|---|---|
| Nov. 3, 1981 to Aug. 17, 1983 (655 days) | 446 | 1166 | 997 | 169 | 2.23 | 197 | 12 | 145 |
Details are given below:
OMR Operation by Month
 
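| Month | Working days | Working shifts | No. of shifts worked | No. of tapes re-run | Machine breakdown time (days) | Holidays |
|---|---|---|---|---|---|---|
| November, 1981 | 27 | 69 | 69 | 20 | - | 1 |
| December, 1981 | 27 | 81 | 78 | 33 | 4 | - |
| January, 1982 | 29 | 87 | 83 | 4 | 2 | - |
| February, 1982 | 26 | 78 | 64 | 4 | 3 | - |
| March, 1982 | 25 | 75 | 49 | 2 | 6 | - |
| April, 1982 | 15 | 45 | 44 | 4 | 15 | - |
| May, 1982 | 17 | 51 | 49 | 5 | 14 | - |
| June, 1982 | 15 | 45 | 40 | 2 | 15 | - |
| July, 1982 | 26 | 78 | 72 | 4 | - | 5 |
| August, 1982 | 19 | 57 | 48 | 7 | 12 | - |
| September, 1982 | 29 | 58 | 46 | 9 | 1 | - |
| October, 1982 | 26 | 52 | 47 | 5 | 2 | 3 |
| November, 1982 | 29 | 58 | 48 | 16 | 1 | - |
| December, 1982 | 27 | 54 | 53 | 1 | 4 | - |
| January, 1983 | 27 | 54 | 51 | 3 | 4 | - |
| February, 1983 | 27 | 54 | 54 | 1 | 1 | - |
| March, 1983 | 14 | 28 | 23 | 4 | 17 | - |
| April, 1983 | 13 | 26 | 26 | - | 17 | - |
| May, 1983 | - | - | - | - | 31 | - |
| June, 1983 | - | - | - | - | 30 | - |
| July, 1983 | 11 | 22 | 20 | 11 | 18 | 2 |
| August, 1983 | 17 | 33 | 33 | 10 | - | 2 |
| Total = | 655 | 1166 | 997 | 145 | 197 | 12 |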
3. Experience with OMR OPScan 21 for Processing Population Census 1991
OMR Operation:
Two OMR OPSCAN 21/75 machines were run in two shifts. In each shift 16 men worked, as follows:
 
Supervisor - 2x1 = 2
Operator - 2x1 = 2
Geo-code checking = 2
Tearing of questionnaire = 8
MLSS = 1
Officer = 1
The morning shift started at 7:30 a.m. and continued up to 2:00 p.m., and the evening shift started at 2:00 p.m. and continued up to 8:30 p.m. In addition, one maintenance engineer remained on standby from 9:00 a.m. to 5:00 p.m. every day to hunt down problems and to restore the machine to operation immediately whenever there was any disturbance.
OMR Environment
The OMR room was kept relatively cool. Dehumidifiers were used to control humidity. Blowers and vacuum cleaners were used for dust control. In addition, jugglers were used to remove dust from the documents before these were run in the OMR. Before data capture the questionnaires were kept in the controlled air for 48 hours. The procedures followed in OMR operation were:
 
Step-1: Documents were received in batches of thanas and entered into the control register;
Step-2: Census books were taken out of the envelope, the geo-code was checked, and the filled-in questionnaires were torn off and kept above the envelope for seasoning;
Step-3: After 48 hours these documents were taken to the OMR room and juggled before running in the OMR;
Step-4: After data capture the checker inserted the questionnaires in their respective envelopes. When all the documents of a thana were complete, they were returned to the store;
Step-5: Once the hard disk attached to the OMR was full, the data image of a thana (batch) was transferred to the micro-processing system through the LAPLINK procedure.
Some observations of OMR operation are given below:
  • OMR stops running if there are mistakes in
    • Geo-code
    • Household number
    • Time mark
    • Continuation number
    • RMO code
  • Initially, rejection rate was 50% but ultimately it varied from 1 to 3%.
  • Rejection rate observed in running the documents of one thana without editing was 20%.
  • A batch is rejected if the error rate is more than 2%.
  • The OMR converts each sheet into one big record of 223 characters. This record consists of some internal information, a sheet number and the household information. A special continuation character indicates overflow. The product of the OMR is a file consisting of an EA tally record followed by many household and personal records.
  • The hard disk capacity of the OMR host computer (Everax) is 40 Mbyte. It can easily accommodate the records of two thanas at a time. When the hard disk of the OMR was full, the data of a thana was transferred to the microcomputer system in either of two ways: (1) by Laplink or (2) by diskette.
  • The initial plan to transfer data through a LAN was abandoned. Instead both Brooklyn Bridge and Laplink software were tried for transmission of data. Laplink worked well and has been installed. Currently it is experiencing a cable problem. The cipher tape drives that came with the OMR were disconnected and moved to the computer rooms. They now serve to off-load intermediate and final data sets.
  • The rated machine speed is 7000 sheets per hour, but the speed achieved is 4000-5000 sheets per hour.
  • Daily output varies from 80,000 to 100,000 sheets.
  • The machine stops, loses lifetime and produces less output because of the following problems:
    • Dust;
    • Environment; Seasoning;
    • Maintenance;
    • Juggling;
    • Physical quality of questionnaire;
    • Manual editing.
  • For every stoppage of the OMR, the running time of 100 to 150 documents is lost. The following messages were shown on the monitor for the above causes:
    • Output stacker problem;
    • Input hopper problem;
    • Deskew station jam;
    • Printer problem;
    • Read check.
  • Probable solutions to the above problems:
    • Maintenance of a dust-free environment and cleaning by blower and vacuum cleaner;
    • Seasoning of documents for 48 hours;
    • Running the dehumidifier for 24 hours during rainy seasons and 12 hours during winter;
    • Regular maintenance of the machine;
    • Adequate stock of spare parts;
    • Better editing and no marking over the time mark and skunk mark areas;
    • Regular use of the juggler;
    • Incentives for operators.
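
The batch rule noted in the observations above (a batch is rejected if the error rate is more than 2%) can be sketched in Python as follows. The function name is hypothetical, and defining the error rate as rejected sheets divided by sheets in the batch is an assumption; the source does not state how the rate was computed.

```python
def batch_is_rejected(sheets_in_batch: int, sheets_with_errors: int,
                      threshold_percent: int = 2) -> bool:
    """Return True when the batch error rate exceeds the threshold.

    Integer arithmetic is used so that a batch at exactly 2% is not
    rejected because of floating-point rounding.
    """
    if sheets_in_batch <= 0:
        raise ValueError("a batch must contain at least one sheet")
    if not 0 <= sheets_with_errors <= sheets_in_batch:
        raise ValueError("error count must lie between 0 and the batch size")
    return sheets_with_errors * 100 > threshold_percent * sheets_in_batch


# Illustrative batch sizes, not census data.
print(batch_is_rejected(4000, 60))   # 1.5% -> False
print(batch_is_rejected(4000, 80))   # exactly 2% -> False
print(batch_is_rejected(4000, 100))  # 2.5% -> True
```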

OMR operation started in May 1991. Data capture from tally sheets was completed in June 1991. Data capture from about 30 million main census questionnaires started in July 1991 and was completed on 15th October 1992. The monthly progress report of OMR operation is given in table T1.

T1 Progress Report of OMR Operation
| Month | Type of document | Working days | Shifts | Machine breakdown | No. of sheets run | Average output per machine per hour |
|---|---|---|---|---|---|---|
| August'91 | Census Book | 24 | 2 | 146 | 867055 | 2596 |
| Sep'91 | " | 22 | 2 | 155 | 809205 | 2839 |
| Oct'91 | " | 26 | 2 | 73 | 1648972 | 3689 |
| Nov'91 | " | 24 | 2 | 209 | 1028766 | 3796 |
| Dec'91 | " | 25 | 2 | 375 | 579300 | 4634 |
| Jan'92 | " | 22 | 2 | 172 | 1273257 | 4751 |
| Feb'92 | " | 24 | 2 | 137 | 1514377 | 4415 |
| March'92 | " | 26 | 2 | 116 | 1518264 | 3758 |
| April'92 | " | 20 | 2 | 49 | 1641635 | 4677 |
| May'92 | " | 25 | 2 | 164 | 1855737 | 5522 |
| June'92 | " | 24 | 2 | 168 | 1843948 | 5910 |
| July'92 | " | 25 | 2 | 128 | 1863965 | 5011 |
| August'92 | " | 27 | 2 | 261 | 1281079 | 4592 |
| Sep'92 | " | 25 | 2 | 223 | 1044951 | 3772 |
| Oct'92 | " | 12 | 2 | 40 | 633183 | 3165 |
Error Statistics of OMR Operation are given in tables T2, T3 and T4.
T2 Sample Statistics of Machine Error
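| Date | Run | Machine problem | Timing mark deviation | Deskew station jam | Printer problem | Read check | Thickness | Electricity |
|---|---|---|---|---|---|---|---|---|
| 17.2.92 | 4:01 | - | 11 | 20 | - | - | 55 | - |
| 18.2.92 | 4:31 | - | 6 | - | - | - | 91 | 20 |
| 19.2.92 | 1:53 |  | 36 | - | - | - | 30 |  |
| 22.2.92 |  |  |  |  |  |  | 22 |  |
| 22.2.92 | 3:30 | 2:30 | 18 | 22 | - | 8 | - |  |
| 23.2.92 | 3:30 | 2:30 | 22 | - | - | - |  |  |
| 24.2.92 | 4:30 |  | 21 | 50 | - | - | 26 |  |
| 26.2.92 | 4:52 |  | 25 | 6 | - | - | 40 |  |
| " | 3:30 | 2:00 | 41 | 38 | - | - | 45 |  |
| 29.2.92 |  |  | 45 | 15 | - | - | - |  |
| 1.3.92 | 3:00 |  | 230 | - | - | - | 10 |  |
| 2.3.92 | 3:49 |  | 215 | - | - | - | 45 |  |
| 3.3.92 |  |  | 35 | - | - | - | 35 |  |
| 4.3.92 |  |  | 21 | - | - | - | 20 |  |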
T3 Sample Statistics of Editing Error
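| Date | Hours worked | Hours lost: machine problem | Hours lost: electric problem | HH blank or duplicate | Continuation blank | Mark on TM | Geo code wrong | Sheet torn or folded | Thickness | Documents run |
|---|---|---|---|---|---|---|---|---|---|---|
| 17.2.92 | 4:01 |  |  | 230 | 6 | 11 | 35 | 105 | 55 | 3500 |
| 18.2.92 | 4:30 |  | 20 | 191 | 11 | 11 | 41 | 57 | 91 | 4000 |
| 2.2.92 | 4:52 |  |  | 176 | 27 | 30 | 47 | 85 | 45 | 4500 |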
T4 Error Statistics (Summary)
| Type of error | Percentage |
|---|---|
| Household | 51.19 |
| Continuation Mark | 4.35 |
| Timing Mark | 16.31 |
| Geo code | 5.50 |
| Deskew station jam | 17.19 |
| Printer problem | 0.93 |
| Read check | 2.63 |
| RMO | 1.90 |
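
As a quick consistency check on table T4, the following Python sketch totals the percentages and ranks the error types. The values are copied from the table; the variable names are illustrative only.

```python
from decimal import Decimal

# Percentages as listed in table T4.
ERROR_SHARE = {
    "Household": Decimal("51.19"),
    "Continuation Mark": Decimal("4.35"),
    "Timing Mark": Decimal("16.31"),
    "Geo code": Decimal("5.50"),
    "Deskew station jam": Decimal("17.19"),
    "Printer problem": Decimal("0.93"),
    "Read check": Decimal("2.63"),
    "RMO": Decimal("1.90"),
}

# Decimal avoids binary floating-point drift when summing two-decimal values.
total = sum(ERROR_SHARE.values())

# Error types ordered from largest to smallest share.
ranked = sorted(ERROR_SHARE.items(), key=lambda item: item[1], reverse=True)

print(total)                             # 100.00
print([name for name, _ in ranked[:3]])  # largest three error types
```

Household-number errors, deskew station jams and timing-mark errors together account for 84.69% of all errors in the table.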
4. Integrated Approach to OMR and ICR for data capture in 2000 round of Population Census

Five pragmatic steps have been recommended by the National Statistical Council and are being adopted:

  1. Both the short and the long questionnaires have to be designed in OMR/ICR format so that they can be read by both machines;
  2. Data capture has to be carried out at 4 Divisional offices, with high-speed communication to the headquarters and with the essential technical and maintenance personnel;
  3. Data capture must be completed within 5 to 6 months without fail so that reports can be produced within 2 years of conducting the census. Necessary arrangements will therefore be made for backup machines, reinforcement of manpower and high-speed data communication;
  4. Most of the data to be collected will be numeric codes and numbers, and only one or two fields will be in short form with block English characters, so that both the ICR and the dual-recording OMR can capture the characters easily;
  5. Sufficient fast-moving spares will be made available to keep the machines running till the end of data capture.

To ensure timeliness, quality paper of 95 GSM and a web press have been procured. For the ICR machine, OCR for Forms software, and for the OMR machine, SOSkit and SOSRes, have been procured so that the captured data/image data can be stored in ASCII format. Host computers with 512 MB RAM, 9 GB hard disk and 400 MHz processing speed have been ordered, along with sufficient fast-moving spares essential for data capture from 30 million documents. We have hired maintenance engineers to be stationed in all four Divisional centres.

There will be four regional data capture centres. Each centre will have one OMR and one ICR machine. One set of OMR and ICR will be used as a mobile unit.
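
A rough capacity check on this plan can be sketched in Python. The inputs are figures quoted in this paper (30 million documents, 5 OMRs rated at 8000 A4 sheets per hour, and the 1991 experience of 4000 sheets per hour achieved against a rated 7000). The estimate assumes one sheet per document and that the lower 1991 ratio of achieved to rated speed carries over to the new machines; both are assumptions for illustration, not planning figures from the source.

```python
DOCUMENTS = 30_000_000          # documents to be captured
OMR_MACHINES = 5                # four regional centres plus one mobile set
RATED_SHEETS_PER_HOUR = 8_000   # DRS 800 rated throughput

# 1991 experience with OPScan 21: rated 7000, achieved 4000-5000 sheets per hour.
RATED_1991 = 7_000
ACHIEVED_1991_LOW = 4_000

# Hours each machine must run if all five work in parallel at rated speed.
hours_at_rated_speed = DOCUMENTS / (OMR_MACHINES * RATED_SHEETS_PER_HOUR)

# Same estimate, derated by the lower achieved/rated ratio observed in 1991.
hours_derated = (DOCUMENTS * RATED_1991) / (
    OMR_MACHINES * RATED_SHEETS_PER_HOUR * ACHIEVED_1991_LOW
)

print(hours_at_rated_speed)  # 750.0
print(hours_derated)         # 1312.5
```

Multi-sheet long questionnaires, breakdowns and re-runs would add to these figures, which is why backup ICR machines and spares matter for the 5 to 6 month target.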

The Population and Housing Census 2001 will have a short, single-sheet questionnaire (Appendix-A) which will be canvassed to all households and the whole population. It contains the basic characteristics of each household and its individual members. The long questionnaire (Appendix-B) will contain detailed questions on different socio-economic characteristics of the household and its individual members.

The long questionnaire comprises 6 sheets. The short questionnaire has 28 questions, of which 27 will be numerically coded or numbered. Only the names of individual members will be written, in a maximum of 10 English capital characters. Occupation and economic activity, with names and codes, may be captured from the long OMR questionnaire.

5. Benefit

Five types of benefit can be ensured from the integration of OMR and ICR technology:

  1. It will develop a master identification database with the unique name and numerical address of each individual in the country. This database can be used for demographic analysis, a better sampling frame and identification numbers;
  2. OMR speed will ensure timeliness, and ICR will ensure completeness without fail;
  3. Substantial cost savings can be made if it can be used for issuance of identification cards, civil registration, etc.;
  4. Savings of manpower will be ensured;
  5. All types of reports can be published within 2 years from the date of completion of enumeration.

 