Appendix 1: Topics Investigated in the 1990 and 1995 Population Census in Japan
Appendix 2: Historical Development of OMR Used for Census Tabulation
Appendix 3: Results of OMR's Reading in the 1990 and 1995 Population Census
Appendix 4: System Configuration of the OCR for the 1998 Housing and Land Survey
Appendix 5: Accuracy of OCR's Reading -- Results of the Pilot Survey for the 1998 Housing and Land Survey
Appendix 6: Tabulation and Release Schedule for the 1995 Population Census
Introduction
1. The Population Census of Japan, authorized as Designated Statistics No.1, has been conducted in the years ending in 0 and 5. The former is called the full-scale Census and the latter the simplified Census. The two Censuses differ in the number of topics surveyed. In the last several Censuses, the full-scale one covered 22 topics and the simplified one 17 topics. Even in the full-scale Census, the topics are restricted to basic items in the demographic, economic and social fields, and housing items are limited. Detailed housing items are covered by the Housing and Land Survey, which is taken with a large sample in the years ending in 3 and 8. That is why the Census of Japan has not been named the Population and Housing Census.
2. The next Population Census of Japan is scheduled to be taken as of 1 October 2000. The data from the 2000 Census are essential to national planning toward the 21st century. Less than two years remain before it is taken. The Statistics Bureau and the Statistics Center (hereinafter referred to as "the Bureau" and "the Center" respectively) have made continuous efforts to draw up the basic design covering all stages from preparatory work to tabulation and dissemination for the 2000 Census. In addition, the Bureau and the Center have held several meetings with users such as ministries and academic researchers to grasp what data they need, and have examined what methods should be adopted for smooth field enumeration through discussions with the persons in charge at local organizations. Three pilot surveys have already been carried out to examine questionnaire forms, methods of enumeration, the performance of several types of OCR equipment, etc. The full-dress rehearsal will be held in June this year.
3. This paper shows the experience we acquired
in the use of OMRs in the past Censuses and
the preparations we made for the new OCR equipment
for the 2000 Census.
Use of OMRs in the Past Population Censuses
4. In the 1965 Population Census, small IBM OCRs were used for the first time, as mark readers for a limited number of items for 100% tabulation. In the 1970 Census, larger NEC OMRs were introduced to replace key entry entirely. In both censuses, questionnaires had to be transcribed manually onto OCR/OMR sheets.
5. Mark sheet type questionnaires printed on
one side were introduced in the 1975 Census,
where households entered marks directly on the
sheets and no transcription was needed. Mark
sheet type questionnaires printed on both sides
were used in the 1980 Census. The OMRs used
in the 1980 Census were designed to read the
marks written on both sides of the questionnaire
simultaneously.
6. For the 1990 Census, OMIRs (Optical Mark and Image Readers) were developed. The OMIRs could not only read marks but also capture images of specified areas on the questionnaire. Since hand-written responses on industry and occupation were captured by computer using OMIRs, it was expected that clerical workers would no longer need to refer to the original questionnaires in the coding and editing processes. However, captured images were sometimes ambiguous because the machines were not always checked and adjusted properly for image reading. Therefore, the image functions were not fully utilized in the coding work.
7. The latest model of OMIR, used for the 1990 and 1995 Censuses, was capable of reading mark sheets of up to 257 mm × 364 mm. Its reading speed is 150 sheets per minute in continuous operation. To avoid interruptions caused by refilling the stacker with questionnaire sheets, dual stackers with an automatic changer were installed.
Introduction of OCRs for the 2000 Census
8. Although the Bureau and the Center used NEC OMRs without any serious problems through the 1995 Census, the manufacturer found it difficult to continue technical support and the production of spare parts for the current model. Therefore, the Bureau and the Center had to abandon the use of OMRs for the 2000 Census. In addition, since today's OCRs give wider flexibility in designing questionnaires and place fewer restrictions on paper quality, it was the right time for us to seek the most appropriate OCRs for the 2000 Census.
9. OCRs are about to be used for data capture for the 1998 Housing and Land Survey conducted in October 1998. The results of the OCR reading test using the questionnaires of the pilot survey for the 1998 Housing and Land Survey are shown in Appendix 5. The two pilot surveys for the Census showed similar results in the accuracy of reading by OCRs. We are about to conclude that OCRs are applicable to the coming Census because very few reading errors were found.
10. However, there still remains the problem that some particular numerals are apt to be misread more often than others. These reading errors, which often occur systematically, may considerably affect the quality of the Census output, and two countermeasures are being considered. One is to improve the instructions to respondents, enumerators and supervisors on how to write numerals in the questionnaire and on what kinds of written numerals need to be rewritten. The third pilot survey, conducted in November 1998, is expected to show how much improvement this method will bring. The other measure is to improve the OCRs' capability. Specifically, the numeral patterns used for recognition by OCRs will be improved through the development of the prototype OCR equipment for the 2000 Census.
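One way to see which numerals are misread systematically is to compare the OCR readings from a pilot survey with manually keyed values and tally the errors per digit. The following is only a minimal sketch of such a tally; the function name `misread_rates` and the sample pairs are invented for illustration and do not come from the pilot surveys.

```python
from collections import Counter

def misread_rates(pairs):
    """Return {true_digit: (share misread, most frequent wrong reading)}.

    `pairs` is an iterable of (true_digit, ocr_reading) strings.
    """
    totals = Counter()      # how often each true digit occurs
    confusions = Counter()  # how often (true, read) differs
    for truth, read in pairs:
        totals[truth] += 1
        if read != truth:
            confusions[(truth, read)] += 1

    report = {}
    for digit in sorted(totals):
        wrong = {r: n for (t, r), n in confusions.items() if t == digit}
        errors = sum(wrong.values())
        worst = max(wrong, key=wrong.get) if wrong else None
        report[digit] = (errors / totals[digit], worst)
    return report

# Invented sample data: (manually keyed value, OCR reading)
sample = [("1", "1"), ("1", "7"), ("7", "7"), ("7", "1"),
          ("7", "1"), ("0", "0"), ("0", "0"), ("0", "6")]
report = misread_rates(sample)
for digit, (rate, worst) in report.items():
    print(f"digit {digit}: {rate:.0%} misread, most often as {worst}")
```

A digit whose errors concentrate on one wrong reading is a candidate both for clearer writing instructions and for improved recognition patterns.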
Questionnaire Design and Printing
11. The size of the questionnaire was changed from B4 to A4 at the 1995 Census for easier handling of questionnaires and speedier data capture. There has been a complaint about the A4 size questionnaire, however, that the letters in it are too small to read. This complaint will be eased by the introduction of OCRs, which allow more space in the questionnaire and make it possible to print larger letters in a darker color. Taking these circumstances into account, the A4 size questionnaire will be used again for the 2000 Census.
Improvement of the Tabulation System
12. The tabulation and release plan for the 1995 Census is given in Appendix 6. In the 1995 Census, the Bureau was able to release the complete counts on industry six months earlier than in the 1990 Census, since the coding of the industrial classification at the major group level was decentralized and performed in 3,365 municipal offices all over Japan. This decentralized coding of the industrial classification will be continued in the forthcoming 2000 Census.
13. The personnel for Census tabulation work have been decreasing considerably over the past 30 years in accordance with the policy of the Japanese Government, while the number of questionnaires to be processed has been increasing and users request that the Census results be released earlier and earlier. Therefore, the tabulation work has to be far more efficient than in the last Census.
14. Among the various measures to cope with this matter, first, computer imputation should be utilized more than in the previous Census so as to reduce manual work. The data-editing and imputation process needs a lot of manual work to scrutinize responses that fail the consistency checks. If this work can be handled by computer imputation with almost the same accuracy as manual work, it will help reduce manpower.
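The idea of replacing manual scrutiny with computer imputation can be sketched as follows. This is only an illustration under stated assumptions, not the Bureau's or the Center's actual method: the consistency rule (persons under 15 are not recorded as married), the field names `age` and `marital`, and the choice of a sequential hot-deck donor are all hypothetical.

```python
def age_group(age):
    """Hypothetical imputation class: donors come from the same age group."""
    return "under15" if age < 15 else "15plus"

def is_consistent(rec):
    """Hypothetical consistency check, for illustration only."""
    return not (rec["age"] < 15 and rec["marital"] == "married")

def hot_deck(records):
    """Impute failing values from the most recent consistent record
    in the same age group; flag records for which no donor exists yet."""
    last_donor = {}   # age group -> marital value of latest consistent record
    result, flagged = [], []
    for i, rec in enumerate(records):
        rec = dict(rec)                      # do not modify the input
        group = age_group(rec["age"])
        if is_consistent(rec):
            last_donor[group] = rec["marital"]
        elif group in last_donor:
            rec["marital"] = last_donor[group]
        else:
            flagged.append(i)                # left for manual review
        result.append(rec)
    return result, flagged

records = [
    {"age": 10, "marital": "never married"},
    {"age": 12, "marital": "married"},       # fails the check
    {"age": 40, "marital": "married"},
]
edited, flagged = hot_deck(records)
print(edited[1], flagged)
```

Only records without a suitable donor remain for clerical workers, which is where the reduction in manual work would come from.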
15. Second, image data should be utilized more. Descriptive responses such as place of work, industry and occupation have to be referred to in the coding and editing work. Once the Census questionnaires are input into the computer system using OCRs, the image data of the hand-written answers to those questions can be displayed on PCs. Clerical workers can then assign classification codes or correct errors by viewing the contents of the questionnaires on their PCs. They need not take the original paper questionnaires out of storage or return them.
Closing Remarks
16. The Bureau and the Center have been taking measures to enhance the efficiency of the Census tabulation work, particularly with respect to manual processing. As mentioned above, the Bureau and the Center will adopt A4 size questionnaires, decentralized industrial classification coding, OCRs, image data, computer imputation with accuracy comparable to manual work, etc. as much as possible for the forthcoming Population Census in 2000.
APPENDIX 1: Topics Investigated in the 1990 and 1995 Population Census in Japan
Topics investigated in the 1995 population census:
(For household members)
Name
Sex
Year and month of birth
Relationship to the head of household
Marital status
Nationality
Labour force status
Name of establishment and kind of business (industry)
Kind of work (occupation)
Status in employment
Place of work or school
(For households)
Type of household
Number of household members
Type and tenure of dwelling
Number of dwelling rooms
Area of floor space of dwelling
Type of building and number of stories
The following five topics were also investigated in the 1990 population census:
(For household members)
Place of residence 5 years ago
Education
Transportation to the place of work or school
Commuting time to the place of work or school
(For households)
Source of household income
APPENDIX 2: Historical Development of OMRs (Optical Mark Readers) Used for Census Tabulation
APPENDIX 3: Results of OMR's Reading in the 1990 and 1995 Population Census
(1990)
Type of Tabulation             Inputting Period            Inputted questionnaires  Rejected        Jammed          Doubly fed
1st Basic Complete Tabulation  1990.11-1991.8 (10 months)  50,930,567               23,338 (0.05%)  8,072 (0.02%)   91,423 (0.18%)
2nd Basic Complete Tabulation  1991.8-1992.5 (16 months)   50,861,415               10,216 (0.02%)  4,602 (0.01%)   59,212 (0.12%)
3rd Basic Complete Tabulation  1992.5-1993.7 (15 months)   50,855,386               3,262 (0.01%)   3,669 (0.01%)   37,752 (0.07%)
Detailed Sample Tabulation     1993.4-1994.6 (15 months)   7,410,425                594 (0.01%)     925 (0.01%)     3,503 (0.05%)
Notes: Rejected and jammed questionnaires are excluded from inputted questionnaires. Double-fed questionnaires are re-inputted.
(1995)
Type of Tabulation                     Inputting Period            Inputted questionnaires  Rejected        Jammed          Doubly fed
1st Basic Complete Tabulation          1995.11-1996.8 (10 months)  54,639,491               34,242 (0.06%)  12,741 (0.02%)  99,178 (0.18%)
2nd and 3rd Basic Complete Tabulation  1996.9-1997.11 (14 months)  53,709,613               7,880 (0.01%)   4,438 (0.01%)   37,316 (0.07%)
Notes: Rejected and jammed questionnaires are excluded from inputted questionnaires. Double-fed questionnaires are re-inputted.
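The percentages in the tables above appear to be each count divided by the number of inputted questionnaires. The denominator is an assumption here, since the notes do not state it. The short check below reproduces the printed rates for the 1st Basic Complete Tabulation of 1990 under that assumption; the function and variable names are illustrative only.

```python
def rate_percent(count, inputted):
    """Share of `count` in `inputted`, in percent, rounded to two decimals."""
    return round(100 * count / inputted, 2)

# Figures for the 1st Basic Complete Tabulation, 1990 (from the table)
first_1990 = {
    "inputted": 50_930_567,
    "rejected": 23_338,
    "jammed": 8_072,
    "doubly_fed": 91_423,
}

rates = {
    name: rate_percent(count, first_1990["inputted"])
    for name, count in first_1990.items()
    if name != "inputted"
}
print(rates)  # {'rejected': 0.05, 'jammed': 0.02, 'doubly_fed': 0.18}
```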
APPENDIX 4: System Configuration of the OCR for the 1998 Housing and Land Survey
Hardware
Scanning and recognition device
This device consists of a scanning part and
a character recognition part as well as a part
for controlling the device itself, magnetic
disks that can store character and image data
input in a day, a keyboard, a display, a mouse
and a 3.5 inch floppy disk drive.
Device for correcting characters that could not be read by the OCR
This device includes the parts needed for correcting characters that could not be read by the OCR, for example a keyboard, a mouse, and a TFT color liquid crystal display of 13.3 inches or larger.
Software
Operating system
The OCR is equipped with Windows NT Workstation
4.0 or Windows NT Server 4.0 as an operating
system.
Character recognition software
The character recognition device is equipped with software for recognizing characters. If the recognition is performed by hardware, this software is not needed.
Software for correcting characters that could not be read by the OCR
The device for correcting characters is equipped with software for correcting characters that could not be read by the OCR.
Utility software for file management
Both the controlling unit in the recognition device and the device for correcting characters are equipped with utility software that has functions for creating, deleting, copying, sorting and transmitting data.
Software for client PCs to correct characters that could not be read by the OCR
This software is used on the client PCs in the LAN of the Statistics Bureau and the Statistics Center. The client PCs are NEC PC-9821Nr166/X30N machines equipped with Windows NT Workstation 4.0. The OCR is connected to the LAN using Ethernet and TCP/IP.
Performance of the OCR
The OCR can read both sides of the questionnaires of the 1998 Housing and Land Survey simultaneously.
By operating three sets of OCRs, 48,000 questionnaires can be processed per day, including the recognition of characters and marks and the storing of data both as characters and as image data. The operation time of the OCR is 12 hours per day with a net working rate of 0.9.
The resolution for reading, storing and outputting image data is 200 dpi or over.
The data for characters and marks can be output in formats such as CSV or ASCII.
Image data can be output in TIFF format with MMR data compression.
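The processing capacity stated above implies a sustained reading rate per OCR unit, which can be worked out as follows. This is a rough calculation, assuming that the net working rate of 0.9 means the equipment is actually reading for 90% of the 12 operating hours; the constant names are illustrative.

```python
UNITS = 3                 # sets of OCRs operated together
SHEETS_PER_DAY = 48_000   # questionnaires processed per day by all units
HOURS_PER_DAY = 12        # operation time per day
NET_WORKING_RATE = 0.9    # assumed: fraction of operating time spent reading

effective_hours = HOURS_PER_DAY * NET_WORKING_RATE       # 10.8 hours
per_unit_per_day = SHEETS_PER_DAY / UNITS                 # 16,000 sheets
per_unit_per_hour = per_unit_per_day / effective_hours    # about 1,481.5
per_unit_per_minute = per_unit_per_hour / 60              # about 24.7

print(f"{per_unit_per_day:,.0f} questionnaires per unit per day")
print(f"{per_unit_per_hour:,.1f} per unit per effective hour")
print(f"{per_unit_per_minute:.1f} per unit per minute")
```

Under this reading, each unit needs to sustain roughly 25 double-sided questionnaires per minute, including recognition and image storage.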
APPENDIX 5: Accuracy of OCR's Reading -- Results of the Pilot Survey for the 1998 Housing and Land Survey
APPENDIX 6: Tabulation and Release Schedule for the 1995 Population Census
All the questionnaires
were submitted from the prefectural governments
and accumulated at the Statistics Center for
processing. They were read by optical mark
readers (OMRs), and tabulated by computer.
The tabulations and releases of the results
were done in the following groups.
Preliminary Counts of the Population;
Prompt Sample Tabulation;
The First Complete-Count Tabulation;
The Second Complete-Count Tabulation;
The Third Complete-Count Tabulation;
Detailed Sample Tabulation;
Tabulation on Place of Work or Schooling;
Tabulation by BUB and Administrative District.
The Preliminary Counts, which were based on the summary sheets prepared at the municipalities, gave the population by sex and the number of households by municipality, by prefecture, and for the whole country. The Preliminary Counts were released at the end of December 1995.
The Prompt Sample Tabulation, which was based on 1% sample questionnaires, gave an overview of the population structure of the country and the prefectures. However, detailed regional data for individual municipalities were not made available because of sampling errors. The results were released in June 1996.
The Complete Count Tabulation was performed in three phases. In the first phase, the topics of tabulation were limited to basic demographic and household characteristics such as sex, age, marital status, nationality, household type and housing conditions. These topics were all coded by the households themselves or by the enumerators, and were thus processed without any manual coding by the central staff. The results of the first phase tabulation were released in November 1996.
The second phase included
tabulation by industry (major group), labour
force status, status in employment, place
of work or schooling, etc. And the third phase
included tabulation by occupation (major group).
The second and the third phases required manual
coding on industry and occupation, and took
much time. The Second Complete-Count Tabulation
was released in January 1997, and the Third
Complete-Count Tabulation was released in
March 1998.
The whole tabulation
of the 1995 Population Census is to be completed
by July 1999. The Census tabulation work is
thus distributed in the five-year inter-census
period in Japan according to priorities, as
most clerical workers are permanent employees.