Annex-A: Short Questionnaire for Population Census 2001
Annex-B: Long Questionnaire for Population Census 2001
1. Introduction
Bangladesh first applied OMR in 1981, when the Population
Census documents were processed with the IBM 3881 OMR. The
same machines were used in 1983-84 to process the agriculture
census documents. Bangladesh used the OPScan 21 to process
the documents of the 1991 population census. In all these
censuses the machines successfully captured the numbers and
the codes. Throughout, a short single-sheet OMR questionnaire
was used to collect basic information on each household and
its individual members. The detailed socio-economic
information was collected and processed using data entry
technology. The main census and the detailed sample survey
used to be conducted about 9 months apart. As a result, the
time frame for the census activities as a whole could not be
maintained.
Bangladesh is at the doorstep of the 2000 round of censuses.
On the basis of past experience and the demand for timely
census results, the Government of Bangladesh has decided, in
principle, to adopt the 3:2 years timeframe for the 2000
round of censuses. To share resources, it has planned to
conduct three national censuses in succession, i.e. the
Economic Census in 2000, the Population and Housing Census in
2001 and the Agricultural Census in 2002. Bangladesh is also
contemplating the adoption of a civil registration system
through local government in the near future. In view of
users' demand, it is felt essential to search for technology
which will ensure rapid capture of numbers as well as
characters from multi-sheet questionnaires. With these in
view, questionnaires have been designed in both OMR and ICR
format, in single as well as multiple sheets. To make the
operation a success, 3 important requirements were
identified:
(i) availability of resource persons to manage printing of
questionnaires, data collection and data capture from both
single-sheet and multiple-sheet questionnaires;
(ii) availability of maintenance personnel; and
(iii) distributed data capture with a back-up system.
Incidentally, we have been blessed by the advancement of
technology. High-speed OMR is now available with both digit
and character recognition capability. Bangladesh has ordered
5 OMRs of model DRS 800, with a throughput of 8000 A4 sheets
per hour, together with large host computers. As back-up it
has ordered 5 ICRs with a throughput of 42 ppm. The software
for the OMR and ICR will be configured so that both machines
can be used interchangeably and can produce output in ASCII
format, which can be processed with our own application
program.
2. Experience with OMR IBM 3881 in Population Census 1981
OMR operation
Operation of the OMR started on 20th October, 1981 on an
experimental basis, and actual production work was taken up
from 3rd November, 1981. During the first week it was run in
one shift of 7 hours' duration. The next week, work was done
in 2 shifts. From the third week 3 shifts were introduced.
The decision to run 3 shifts was extremely risky. But
considering the need for quick data entry, and the fact that
8 months had already elapsed since the census, 3 shifts were
necessary to make up time. The decision was, however, taken
in consultation with IBM engineers, who agreed to help with
maintenance and offered service every day without any extra
charge. It was also expected that the other OMR would become
operational after receipt of the necessary spare parts, and
that it would then be sufficient to run the two OMRs in 2
shifts only.
Only 3 to 4 thousand sheets (without splitting a Union) were
taken at a time on one tape. This was done to avoid
large-scale re-runs due to (i) frequent power failure, (ii)
machine breakdown and (iii) eventual data checks on tapes.
The tapes of one Thana were later merged into one tape
containing all questionnaires of that Thana. Every sheet was
assigned a 7-digit serial number generated by the OMR and
recorded on the tape. The same serial was also printed on the
sheet. This serial, read with the Geo-code, could identify an
individual household for later reference. The leftmost 2
digits indicated the Geo-code of the Thana and the remaining
5 digits the running serial within the Thana. After a
successful run, the tape number, serial numbers of the
sheets, date and other particulars were written on the
external label of the tape and also recorded in a Register.
These tapes were then taken to a mainframe computer for
further processing.
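The serial-number scheme described above can be illustrated
with a short sketch in Python. It splits a 7-digit sheet
serial into the 2-digit Thana Geo-code and the 5-digit
running serial. The names SheetSerial and parse_sheet_serial
are hypothetical and the sample serial is invented for
illustration; the 1981 processing itself was done on the OMR
and a mainframe, not in Python.

```python
from typing import NamedTuple


class SheetSerial(NamedTuple):
    """A 7-digit sheet serial split into its two parts (hypothetical type)."""
    thana_code: str       # leftmost 2 digits: Geo-code of the Thana
    running_serial: int   # remaining 5 digits: running serial within the Thana


def parse_sheet_serial(serial: str) -> SheetSerial:
    """Split a 7-digit serial as described in the text.

    The Thana code is kept as a string so that a leading zero is preserved.
    """
    if len(serial) != 7 or not (serial.isascii() and serial.isdigit()):
        raise ValueError(f"expected exactly 7 digits, got {serial!r}")
    return SheetSerial(thana_code=serial[:2], running_serial=int(serial[2:]))


if __name__ == "__main__":
    # Invented example serial: Thana code "42", sheet number 17 in that Thana.
    print(parse_sheet_serial("4200017"))
```

Keeping the Thana code as text rather than a number is a
design choice of this sketch, so that codes such as "07" are
not shortened.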
Reasons for lower average speed
Some of the difficulties faced at the time of operation were:
(i) Sometimes recording on tape was found to be erratic. This
necessitated a re-run of the entire batch.
(ii) Sudden power failure necessitated a re-run of the batch.
Such failure was almost a daily affair. On average about 2
hours were needed daily for this.
(iii) Two to three hundred sheets were kept on the hopper at
a time. When these had been processed the machine stopped
automatically. The operator had to bring down the hopper,
feed the next batch of 2-3 hundred sheets and start again.
These sheets needed proper alignment at the edges. These
actions took 2 to 3 minutes.
(iv) After processing 5 to 6 thousand sheets, the hopper
area, carriage area, mark sensing area, etc. had to be
cleaned manually with clean dusters and a vacuum cleaner.
Similarly, the read-write head of the tape drive had to be
cleaned 2 or 3 times daily.
(v) The FCS was designed to reject the Tally sheet if there
was any blank or double mark on a row. If a Tally sheet was
rejected, the operator had to stop the machine, look for the
error and correct it before re-feeding.
(vi) The OMR machine itself broke down several times during
the period of census processing. While the IBM engineers were
available all the time, it was very difficult to get the
necessary spare parts from abroad. Initially the other OMR
was cannibalised, but later the whole processing remained
suspended for a considerable period of time.
Actual Performance
As stated earlier, we had to settle for only one OMR. The
other OMR could not be made operational till the end of
census processing, mainly because of non-availability of
spare parts.
One OMR machine was put into operation from 3rd November,
1981, and preparation of data tapes for the Census of
Population, 1981 continued till 17th August, 1983. Though the
time taken to complete the work was about 21 1/2 months (655
days), actual operation was much less than that. Out of 655
days, 197 days were lost to machine breakdown and 12 days
were observed as holidays. The machine was operated on a test
basis in a single shift from 3rd November to 8th November,
1981 and in 2 shifts from 9th to 12th November, 1981.
Thereafter operation continued in 3 shifts per day till
August, 1982, and for the remaining year the machine was
operated in 2 shifts per day. Generally 2 operators worked in
one shift. For the 446 actual working days available, a total
of 1,166 shifts were planned. But 169 shifts could not be
utilized, due either to minor mechanical troubles or to power
failure. About 15 million documents were read through the OMR
in 997 shifts, each of 7 hours' duration. Thus the average
number of sheets read per shift stands at approximately
15,000.
But 145 tapes, each containing about 4,000 households, had to
be re-run because they could not be read by the computer. As
such, the average number of sheets actually run per shift was
slightly higher, at roughly 16,000. However, experience
showed that a maximum of 20-22 thousand sheets can be read in
a single shift of 7 hours' duration.
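The arithmetic behind these figures can be retraced with a
small Python sketch. All input numbers are the ones quoted in
the text; the variable and function names are illustrative,
and the treatment of each re-run tape as about 4,000 sheets
(one sheet per household) is an assumption of this sketch.

```python
# Figures quoted in the text for the 1981 census run on the IBM 3881.
TOTAL_DAYS = 655          # 3 Nov 1981 to 17 Aug 1983, as stated
BREAKDOWN_DAYS = 197
HOLIDAYS = 12
SHIFTS_WORKED = 997
SHIFTS_LOST = 169
SHEETS_READ = 15_000_000  # "about 15 million documents"
TAPES_RERUN = 145
SHEETS_PER_TAPE = 4_000   # assumption: one sheet per household on a tape


def working_days(total_days: int, breakdown_days: int, holidays: int) -> int:
    """Days on which the machine was actually available."""
    return total_days - breakdown_days - holidays


def sheets_per_shift(sheets: int, shifts: int) -> float:
    """Average number of sheets handled per shift."""
    return sheets / shifts


days = working_days(TOTAL_DAYS, BREAKDOWN_DAYS, HOLIDAYS)
shifts_planned = SHIFTS_WORKED + SHIFTS_LOST
shifts_per_day = SHIFTS_WORKED / days

# Net average: only the sheets that ended up on usable tapes.
net_average = sheets_per_shift(SHEETS_READ, SHIFTS_WORKED)
# Gross average: also counts the sheets that were fed a second time.
gross_average = sheets_per_shift(
    SHEETS_READ + TAPES_RERUN * SHEETS_PER_TAPE, SHIFTS_WORKED
)

if __name__ == "__main__":
    print(f"working days        : {days}")
    print(f"shifts planned      : {shifts_planned}")
    print(f"shifts worked / day : {shifts_per_day:.2f}")
    print(f"net sheets / shift  : {net_average:,.0f}")
    print(f"gross sheets / shift: {gross_average:,.0f}")
```

Under these assumptions the net average is about 15,000
sheets per shift and the gross average about 15,600, in line
with the rounded figures quoted above.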
Total time required          Nov. 3, 1981 to Aug. 17, 1983 (655 days)
Total working days availed   446
Number of shifts
    Planned                  1166
    Actually worked          997
    Lost                     169
    Worked per day           2.23
Machine break down (days)    197
Holidays                     12
No. of Tapes Re-run          145
Details are given below:
OMR Operation by Month

Month             Working  Working  Shifts  Tapes   Machine           Holidays
                  days     shifts   worked  re-run  breakdown (days)
November, 1981    27       69       69      20      -                 1
December, 1981    27       81       78      33      4                 -
January, 1982     29       87       83      4       2                 -
February, 1982    26       78       64      4       3                 -
March, 1982       25       75       49      2       6                 -
April, 1982       15       45       44      4       15                -
May, 1982         17       51       49      5       14                -
June, 1982        15       45       40      2       15                -
July, 1982        26       78       72      4       -                 5
August, 1982      19       57       48      7       12                -
September, 1982   29       58       46      9       1                 -
October, 1982     26       52       47      5       2                 3
November, 1982    29       58       48      16      1                 -
December, 1982    27       54       53      1       4                 -
January, 1983     27       54       51      3       4                 -
February, 1983    27       54       54      1       1                 -
March, 1983       14       28       23      4       17                -
April, 1983       13       26       26      -       17                -
May, 1983         -        -        -       -       31                -
June, 1983        -        -        -       -       30                -
July, 1983        11       22       20      11      18                2
August, 1983      17       33       33      10      -                 2
Total             655      1166     997     145     197               12
3. Experience with OMR OPScan 21 for Processing Population Census 1991
OMR Operation:
Two OPSCAN 21/75 OMRs were run in two shifts. In each shift
16 men worked, as follows:
Supervisor                 2 x 1 = 2
Operator                   2 x 1 = 2
Geo-code checking                  2
Tearing of questionnaire           8
MLSS                               1
Officer                            1
The morning shift started at 7.30 a.m. and continued up to
2.00 p.m., and the evening shift started at 2.00 p.m. and
continued up to 8.30 p.m. In addition, one maintenance
engineer remained on standby from 9.00 a.m. to 5.00 p.m.
every day to trace problems and to make the machine
operational again immediately whenever there was any
disturbance in the machine.
OMR Environment
The OMR room was kept relatively cool. Dehumidifiers were
used to control humidity. Blowers and vacuum cleaners were
used for dust control. In addition, jugglers were used to
remove dust from the documents before these were run in the
OMR. Before data capture the questionnaires were kept in
controlled air for 48 hours. The procedures followed in OMR
operation were:
Step-1: Documents were received in batches by thana and
entered into the control register;
Step-2: Census books were taken out of the envelope, the
geo-code was checked, and the filled-in questionnaires were
torn off and kept on top of the envelope for seasoning;
Step-3: After 48 hours these documents were taken to the OMR
room and juggled before running in the OMR;
Step-4: After data capture the checker inserted the
questionnaires in their respective envelopes. When all the
documents of a thana were complete, they were returned to the
store;
Step-5: Once the hard disk attached to the OMR was full, the
data image of a thana (batch) was transferred to the
micro-processing system through the LAPLINK procedure.
Some observations on OMR operation are given below:
(i) The OMR stops running if there are mistakes in any of the
following:
    Geo-code;
    Household number;
    Time mark;
    Continuation number;
    RMO code.
(ii) Initially the rejection rate was 50%, but ultimately it
varied from 1 to 3%.
(iii) The rejection rate observed in running the documents of
one thana without editing was 20%.
(iv) A batch is rejected if the error rate is more than 2%.
(v) The OMR converts one sheet into one big record of 223
characters. This record consists of some internal
information, a sheet number and the household data. A special
continuation character indicates overflow. The product of the
OMR is a file consisting of an EA tally record followed by
many household and personal records.
(vi) The hard disk capacity of the OMR host computer (Everax)
is 40 Mbyte. It can easily accommodate the records of two
thanas at a time. When the hard disk of the OMR was full, the
data of a thana was transferred to the microcomputer system
in either of two ways: (1) by Laplink or (2) by diskette.
(vii) The initial plan to transfer data through a LAN was
abandoned. Instead, both the Brooklyn Bridge and Laplink
software were tried for transmission of data. Laplink worked
well and has been installed. Currently it is experiencing a
cable problem. The Cipher tape drives that came with the OMR
were disconnected and moved to the computer rooms. They now
serve to off-load intermediate and final data sets.
(viii) The machine speed is 7000 sheets per hour, but the
speed achieved is 4000-5000 sheets per hour.
(ix) Daily output varies from 80,000 to 100,000 sheets.
(x) The machine stops, loses life time and gives reduced
output because of problems with the following:
    Dust;
    Environment;
    Seasoning;
    Maintenance;
    Juggling;
    Physical quality of questionnaire;
    Manual editing.
(xi) For any stoppage of the OMR, the running time of 100 to
150 documents is lost. The following messages were shown on
the monitor for the above causes:
    Output stacker problem;
    Input hopper problem;
    Deskew station jam;
    Printer problem;
    Read check.
(xii) Probable solutions to the above problems:
    Maintenance of a dust-free environment and cleaning by
    blower and vacuum cleaner;
    Seasoning of documents for 48 hours;
    Running the dehumidifier for 24 hours during the rainy
    season and 12 hours during winter;
    Regular maintenance of the machine;
    Adequate stock of spare parts;
    Better editing and no marking over the time mark and
    skunk mark areas;
    Regular use of the juggler;
    Incentives for operators.
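The batch rule noted among the observations above (a batch is
rejected if the error rate is more than 2%) can be sketched
in Python as follows. The text does not define how the error
rate was measured; this sketch assumes it is the number of
rejected sheets divided by the number of sheets run, and the
function name is hypothetical.

```python
THRESHOLD_PERCENT = 2  # a batch is rejected above this error rate


def batch_is_rejected(sheets_rejected: int, sheets_run: int,
                      threshold_percent: int = THRESHOLD_PERCENT) -> bool:
    """Return True if the batch error rate exceeds the threshold.

    Assumption: error rate = rejected sheets / sheets run.
    Integer arithmetic is used so that a rate of exactly 2% is
    compared without floating-point rounding.
    """
    if sheets_run <= 0:
        raise ValueError("sheets_run must be positive")
    if not 0 <= sheets_rejected <= sheets_run:
        raise ValueError("sheets_rejected must be between 0 and sheets_run")
    return sheets_rejected * 100 > threshold_percent * sheets_run


if __name__ == "__main__":
    # Invented batch sizes, for illustration only.
    for rejected in (50, 80, 120):
        print(rejected, batch_is_rejected(rejected, 4000))
```

With a batch of 4,000 sheets, 80 rejected sheets is exactly
2% and the batch passes; 81 or more would cause rejection.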
OMR operation started in May, 1991. Data capture from tally
sheets was completed in June, 1991. Data capture from about
30 million main census questionnaires started in July, 1991
and was completed on 15th October, 1992. The monthly progress
report of the OMR is given in table T1.
T1 Progress Report of OMR Operation

Month      Type of       Working  Working  Machine    No. of      Average output
           document      days     shifts   breakdown  sheets run  per machine per hour
August'91  Census Book   24       2        146        867055      2596
Sep'91     Census Book   22       2        155        809205      2839
Oct'91     Census Book   26       2        73         1648972     3689
Nov'91     Census Book   24       2        209        1028766     3796
Dec'91     Census Book   25       2        375        579300      4634
Jan'92     Census Book   22       2        172        1273257     4751
Feb'92     Census Book   24       2        137        1514377     4415
March'92   Census Book   26       2        116        1518264     3758
April'92   Census Book   20       2        49         1641635     4677
May'92     Census Book   25       2        164        1855737     5522
June'92    Census Book   24       2        168        1843948     5910
July'92    Census Book   25       2        128        1863965     5011
August'92  Census Book   27       2        261        1281079     4592
Sep'92     Census Book   25       2        223        1044951     3772
Oct'92     Census Book   12       2        40         633183      3165
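The monthly figures in T1 can be totalled with a few lines of
Python. The working days and sheets run are copied from the
table; the names T1_ROWS, totals and sheets_per_working_day
are illustrative and not part of the original processing
system.

```python
# (month, working days, sheets run), copied from table T1.
T1_ROWS = [
    ("August'91", 24, 867055),
    ("Sep'91", 22, 809205),
    ("Oct'91", 26, 1648972),
    ("Nov'91", 24, 1028766),
    ("Dec'91", 25, 579300),
    ("Jan'92", 22, 1273257),
    ("Feb'92", 24, 1514377),
    ("March'92", 26, 1518264),
    ("April'92", 20, 1641635),
    ("May'92", 25, 1855737),
    ("June'92", 24, 1843948),
    ("July'92", 25, 1863965),
    ("August'92", 27, 1281079),
    ("Sep'92", 25, 1044951),
    ("Oct'92", 12, 633183),
]


def totals(rows):
    """Return (total working days, total sheets run) over all months."""
    total_days = sum(days for _, days, _ in rows)
    total_sheets = sum(sheets for _, _, sheets in rows)
    return total_days, total_sheets


def sheets_per_working_day(rows):
    """Average sheets run per working day, month by month (both machines)."""
    return {month: sheets / days for month, days, sheets in rows}


if __name__ == "__main__":
    total_days, total_sheets = totals(T1_ROWS)
    print(f"{total_days} working days, {total_sheets:,} sheets run")
    for month, rate in sheets_per_working_day(T1_ROWS).items():
        print(f"{month:<10} {rate:>10,.0f} sheets per working day")
```

For the fifteen months listed, this gives 351 working days
and 19,403,694 sheets run in total.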
Error Statistics of OMR Operation are given
in tables T2, T3 and T4.
T2 Sample Statistics of Machine Error

Columns: Date, Run, Machine problem, Timing mark deviation,
Deskew station jam, Printer problem, Read check, Thickness,
Electricity

17.2.92   4:01   -   11   20   -   -   55   -
18.2.92   4:31   -   6   -   -   -   91   20
19.2.92   1:53   36   -   -   -   30
22.2.92   22
22.2.92   3:30   2:30   18   22   -   8   -
23.2.92   3:30   2:30   22   -   -   -
24.2.92   4:30   21   50   -   -   26
26.2.92   4:52   25   6   -   -   40
"         3:30   2:00   41   38   -   -   45
29.2.92   45   15   -   -   -
1.3.92    3:00   230   -   -   -   10
2.3.92    3:49   215   -   -   -   45
3.3.92    35   -   -   -   35
4.3.92    21   -   -   -   20
T3 Sample Statistics of Editing Error

Columns: Date, Hours (Worked, Machine problem, Electric
problem), HH blank duplicate, Continuation blank, Mark on TM,
Geo code wrong, Sheet torn folded, Thickness, Document run

17.2.92   4:01   230   6   11   35   105   55   3500
18.2.92   4:30   20   191   11   11   41   57   91   4000
2.2.92    4:52   176   27   30   47   85   45   4500
T4 Error Statistics (Summary)

Type of Error         Percentage
Household             51.19
Continuation Mark     4.35
Timing Mark           16.31
Geo code              5.50
Deskew station jam    17.19
Printer problem       0.93
Read check            2.63
RMO                   1.90
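The shares in T4 can be ranked and accumulated to show which
error types dominate. The following Python sketch uses the
percentages exactly as given in T4; the names ERROR_SHARES
and ranked_with_cumulative are illustrative.

```python
# Percentage share of each error type, copied from table T4.
ERROR_SHARES = {
    "Household": 51.19,
    "Continuation Mark": 4.35,
    "Timing Mark": 16.31,
    "Geo code": 5.50,
    "Deskew station jam": 17.19,
    "Printer problem": 0.93,
    "Read check": 2.63,
    "RMO": 1.90,
}


def ranked_with_cumulative(shares):
    """Sort error types by share, largest first, with a running total."""
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    running = 0.0
    result = []
    for name, percent in ranked:
        running += percent
        result.append((name, percent, round(running, 2)))
    return result


if __name__ == "__main__":
    for name, percent, cumulative in ranked_with_cumulative(ERROR_SHARES):
        print(f"{name:<20} {percent:>6.2f} {cumulative:>7.2f}")
```

The eight shares add up to 100%, and the three largest
categories (household, deskew station jam and timing mark)
together account for about 85% of all errors.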
4. Integrated Approach to OMR and ICR for data capture in 2000 round of Population Census
Five pragmatic steps have been recommended by the National
Statistical Council and are being adopted:
(i) Both the short and the long questionnaires have to be
designed in OMR/ICR format so that they can be read by both
machines;
(ii) Data capture has to be carried out at 4 Divisional
offices, staffed with essential technical and maintenance
personnel and connected to the headquarters by high-speed
communication;
(iii) Data capture must be completed within 5 to 6 months
without fail so that the report can be produced within 2
years of conducting the census. To this end, the necessary
arrangements for backup machines, reinforcement of manpower
and high-speed data communication will be made;
(iv) Most of the data to be collected will be numeric codes
and numbers, and only one or two fields will be in short form
in block English characters, so that both the ICR and the
dual-recording OMR can capture the characters easily;
(v) Sufficient fast-moving spares will be made available to
keep the machines running till the end of data capture.
To ensure timeliness, quality paper of 95 GSM and a web press
have been procured. The OCR for Forms software for the ICR
machine, and SOSkit and SOSRes for the OMR machine, have been
procured so that the captured data and image data can be
stored in ASCII format. Host computers with 512 MB RAM, a 9
GB hard disk and 400 MHz processing speed have been ordered,
with sufficient fast-moving spares, which are essential for
data capture from 30 million documents. We have hired
maintenance engineers to be stationed in all four Divisional
centres.
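As a rough planning illustration, the machine time implied by
the figures quoted in this paper (30 million documents, 5
OMRs rated at 8000 A4 sheets per hour) can be estimated as
below. This is only a sketch under stated assumptions: each
document is counted as one sheet, so the 6-sheet long
questionnaire is ignored, and the achieved speed is assumed
to bear the same ratio to rated speed as in 1991 (4000-5000
against 7000 sheets per hour). The function name is
hypothetical.

```python
def machine_hours_each(documents: int, rated_sheets_per_hour: float,
                       machines: int, efficiency: float = 1.0) -> float:
    """Running hours needed per machine to read all documents.

    efficiency is the assumed ratio of achieved speed to rated speed.
    """
    if documents < 0 or rated_sheets_per_hour <= 0 or machines <= 0:
        raise ValueError("inputs must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return documents / (rated_sheets_per_hour * efficiency * machines)


DOCUMENTS = 30_000_000   # quoted in the text; one sheet each is an assumption
RATED_SPEED = 8000       # DRS 800 throughput, A4 sheets per hour
MACHINES = 5             # OMRs ordered

at_rated = machine_hours_each(DOCUMENTS, RATED_SPEED, MACHINES)
# Efficiency range carried over from the 1991 experience (an assumption).
slow_case = machine_hours_each(DOCUMENTS, RATED_SPEED, MACHINES, 4000 / 7000)
fast_case = machine_hours_each(DOCUMENTS, RATED_SPEED, MACHINES, 5000 / 7000)

if __name__ == "__main__":
    print(f"at rated speed      : {at_rated:,.0f} hours per machine")
    print(f"at 1991 efficiencies: {fast_case:,.0f} to {slow_case:,.0f} hours per machine")
```

At rated speed this works out to 750 running hours per
machine, and at the 1991 efficiency ratios to roughly 1,050
to 1,300 hours per machine, before allowing for breakdowns
and stoppages.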
There will be four regional data capture centres. Each centre
will have one OMR and one ICR machine. One further set of OMR
and ICR will be used as a mobile unit.
Population and Housing Census 2001 will have a short
single-sheet questionnaire (Annex-A), which will be canvassed
to all households and population. It covers the basic
characteristics of each household and its individual members.
The long questionnaire (Annex-B) will contain detailed
questions on different socio-economic characteristics of the
household and its individual members.
The long questionnaire comprises 6 sheets. The short
questionnaire has 28 questions. Of these, 27 questions will
be numerically coded or numbered. Only the name of each
individual member will be written, in a maximum of 10 English
capital characters. The names and codes of occupation and
economic activity may be captured from the long OMR
questionnaire.
5. Benefit
Four types of benefit can be ensured from the integration of
OMR and ICR technology:
(i) It will develop a master identification database with the
unique name and numerical address of each individual in the
country. This database can be used for demographic analysis,
as a better sampling frame and for identification numbers;
(ii) OMR speed will ensure timeliness and ICR will ensure
completeness without fail;
(iii) Substantial cost savings can be made if it is also used
for the issuance of identification cards, civil registration,
etc.;
(iv) Savings of manpower will be ensured.
All types of reports can be published within 2 years from the
date of completion of enumeration.