ESCAP Statistics Division
The Third Meeting of the Working Party on the Application of New Technology to Population Data
Bali, 7-9 January 1999

STAT/WPA(3)/12
7 January 1999
ENGLISH ONLY

ECONOMIC AND SOCIAL COMMISSION FOR ASIA AND THE PACIFIC


Experience in Using OMR/OCR in the Population Census of Japan
Akihito Yamauchi
Population Census Division, Statistics Bureau
Management and Coordination Agency, Government of Japan

Introduction

1. The Population Census of Japan, authorized as Designated Statistics No.1, has been conducted in the years ending with 0 and 5. The former is called the full-scale Census and the latter the simplified Census. The difference between the two Censuses is the number of topics surveyed. In the last several Censuses, the full-scale one covered 22 topics and the simplified one 17 topics. Even in the full-scale Census, the topics are restricted to basic items in the demographic, economic and social fields, and housing items are limited. Detailed housing items are covered by the Housing and Land Survey, which is taken with a large sample in the years ending with 3 and 8. That is why the Census of Japan has not been named the Population and Housing Census.

2. The next Population Census of Japan is scheduled to be taken as of 1 October 2000. The data from the 2000 Census are essential to national planning for the 21st century. Less than two years remain before it is taken. The Statistics Bureau and the Statistics Center (hereinafter referred to as "the Bureau" and "the Center" respectively) have made continuous efforts to draw up the basic design for the 2000 Census, covering all stages from preparatory work to tabulation and dissemination. The Bureau and the Center have also held several meetings with users, such as ministries and academic researchers, to learn what data they need, and have examined what methods should be adopted for smooth field enumeration through discussions with the persons in charge at local organizations. Three pilot surveys have already been carried out to examine questionnaire forms, methods of enumeration, the performance of several types of OCR equipment, etc. The full-dress rehearsal will be held in June this year.

3. This paper describes the experience we acquired in using OMRs in past Censuses and the preparations we have made for the new OCR equipment for the 2000 Census.

Use of OMRs in the Past Population Censuses

4. In the 1965 Population Census, small IBM OCRs were used for the first time, as mark readers for a limited number of items for 100% tabulation. In the 1970 Census, larger NEC OMRs were introduced to replace key entry entirely. In both censuses, questionnaires had to be transcribed manually to OCR/OMR sheets.

5. Mark sheet type questionnaires printed on one side were introduced in the 1975 Census, where households entered marks directly on the sheets and no transcription was needed. Mark sheet type questionnaires printed on both sides were used in the 1980 Census. The OMRs used in the 1980 Census were designed to read the marks written on both sides of the questionnaire simultaneously.

6. For the 1990 Census, OMIRs (Optical Mark and Image Readers) were developed. The OMIRs could not only read marks but also capture images of specified areas on the questionnaire. Since hand-written responses on industry and occupation were captured by computer using OMIRs, it was expected that clerical workers would not need to refer to the original questionnaires in the coding and editing processes. However, the captured images were sometimes unclear because machine checking and adjustment for image reading were not always carried out properly. Therefore, the image functions were not fully utilized in the coding work.

7. The latest model of OMIR, used for the 1990 and 1995 Censuses, was capable of reading mark sheets of up to 257 mm × 364 mm. The reading speed was 150 sheets per minute when operated continuously. To avoid interruptions caused by refilling the stacker with questionnaire sheets, dual stackers with an automatic changer were installed.

Introduction of OCRs for the 2000 Census

8. Although the Bureau and the Center used NEC OMRs without any serious problems until the 1995 Census, the manufacturer found it difficult to continue technical support and the production of spare parts for the current model. Therefore, the Bureau and the Center had to abandon the use of OMRs for the 2000 Census. In addition, as today's OCRs give wider flexibility in designing questionnaires and impose fewer restrictions on paper quality, it was the right time for us to seek the most appropriate OCRs for the 2000 Census.

9. OCRs are about to be used for data capture for the 1998 Housing and Land Survey, conducted in October 1998. The results of the OCR reading test using the questionnaires of the pilot survey for the 1998 Housing and Land Survey are shown in Appendix 5. The two pilot surveys for the Census showed similar results in the accuracy of reading by OCRs. We are about to conclude that OCRs are applicable to the coming Census because very few reading errors were found.

10. However, there still remains a problem that some particular numerals are apt to be misread more than others. These reading errors, which often occur systematically, may considerably affect the quality of the Census output, and two countermeasures are being considered. The first is to improve the instructions to respondents, enumerators and supervisors on how to write numerals in the questionnaire and on what kinds of written numerals need to be rewritten. The third pilot survey, conducted in November 1998, is expected to show how much improvement this method will bring.

The second measure is to improve the capability of the OCRs. One step is to improve the numeral patterns used for recognition by the OCRs, which will be done through the development of the prototype OCR equipment for the 2000 Census.
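
Systematic misreadings of the kind described in paragraph 10 can be located by comparing OCR output with manually verified values, digit by digit. The following is a minimal sketch in Python and is not the Bureau's procedure; the function name `tally_confusions` and the sample digit strings are hypothetical and serve only as illustration.

```python
from collections import Counter

def tally_confusions(pairs):
    """Count (true_digit, read_digit) mismatches over (truth, ocr) string pairs.

    Pairs whose lengths differ are skipped, because the digits cannot be
    aligned position by position in that case.
    """
    confusions = Counter()
    for truth, ocr in pairs:
        if len(truth) != len(ocr):
            continue
        for t, r in zip(truth, ocr):
            if t != r:
                confusions[(t, r)] += 1
    return confusions

# Hypothetical sample: manually verified value vs. value read by the OCR.
sample = [("1971", "1977"), ("07", "01"), ("1947", "1941"), ("12", "12")]
confusions = tally_confusions(sample)
for (true_digit, read_digit), n in confusions.most_common():
    print(f"{true_digit} read as {read_digit}: {n}")
```

A tally of this kind shows which numeral pairs are confused most often, which is the information needed both for the writing instructions and for tuning the numeral patterns.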

Questionnaire Design and Printing

11. The size of the questionnaire was changed from B4 to A4 at the 1995 Census for easier handling of questionnaires and speedier data capture. There has been a complaint about the A4 size questionnaire, however, that its letters are too small to read. This complaint will be eased by the introduction of OCRs, which allow more space in the questionnaire and make it possible to print larger letters in a darker color. Taking this into account, the A4 size questionnaire will be used again for the 2000 Census.

Improvement of the Tabulation System

12. The tabulation and release plan for the 1995 Census is listed in Appendix 6. In the 1995 Census, the Bureau was able to release the complete counts on industry six months earlier than in the 1990 Census, since the coding work of the industrial classification at the major group level was decentralized and performed in 3,365 municipal offices all over Japan. This decentralization of the coding work of the industrial classification will be continued in the forthcoming 2000 Census.

13. The personnel available for Census tabulation work have been decreasing considerably over the past 30 years in accordance with the policy of the Japanese Government, while the number of questionnaires to be processed has been increasing, and users request that the Census results be released earlier and earlier. Therefore, the tabulation work has to be far more efficient than in the last Census.

14. Among the various measures to cope with this matter, first, computer imputation should be utilized more than in the previous Census so as to reduce manual work. The process of data editing and imputation requires a lot of manual work to scrutinize responses that fail consistency checks. If this work is handled by computer imputation with almost the same accuracy as manual work, it will help reduce manpower.
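
As an illustration of what a consistency check followed by computer imputation might look like, the sketch below applies one edit rule and a simple sequential hot-deck. It is only an assumption-laden example, not the method used by the Bureau or the Center: the rule (persons under 15 are recorded as "never married"), the 5-year age grouping for donors, the field names and the function `edit_and_impute` are all hypothetical.

```python
def edit_and_impute(records):
    """Apply one consistency rule and a sequential hot-deck to a list of dicts.

    Hypothetical rule: a person under 15 is recorded as "never married".
    A missing marital status (None) is filled from the most recent earlier
    record in the same 5-year age group (the "donor"), if one exists.
    """
    last_donor = {}   # age group -> last reported (non-imputed) marital status
    cleaned = []
    for rec in records:
        rec = dict(rec)  # work on a copy; leave the input untouched
        group = rec["age"] // 5
        if rec["age"] < 15:
            if rec["marital"] != "never married":
                rec["marital"] = "never married"
                rec["imputed"] = True
        elif rec["marital"] is None:
            donor = last_donor.get(group)
            if donor is not None:
                rec["marital"] = donor
                rec["imputed"] = True
        # Only values actually reported by respondents may serve as donors.
        if rec["marital"] is not None and not rec.get("imputed"):
            last_donor[group] = rec["marital"]
        cleaned.append(rec)
    return cleaned

sample = [
    {"age": 42, "marital": "married"},
    {"age": 9, "marital": "married"},    # fails the consistency rule
    {"age": 44, "marital": None},        # missing; donor is the first record
    {"age": 71, "marital": None},        # missing; no donor yet, stays missing
]
result = edit_and_impute(sample)
```

Records that remain unresolved, like the last one in the sample, would still go to clerical staff, so the manual workload shrinks to the cases the rules cannot settle.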

15. Second, image data should be utilized more. Descriptive responses such as place of work, industry and occupation have to be referred to in the coding and editing work. Once the Census questionnaires have been input into the computer system using OCRs, the image data of the hand-written answers to those questions can be displayed on PCs. Clerical workers can then assign classification codes or correct errors by viewing the contents of the questionnaires on their PCs. They need not take the original paper questionnaires out of storage or return them.

Closing Remarks

16. The Bureau and the Center have been taking measures to enhance the efficiency of the Census tabulation work, particularly with respect to manual processing. As mentioned above, for the forthcoming Population Census in 2000 the Bureau and the Center will make as much use as possible of A4 size questionnaires, decentralized industrial classification coding, OCRs, image data, computer imputation with accuracy comparable to manual work, etc.

APPENDIX 1: Topics Investigated in the 1990 and 1995 Population Census in Japan

Topics investigated in the 1995 population census:

(For household members)

  1. Name
  2. Sex
  3. Year and month of birth
  4. Relationship to the head of household
  5. Marital status
  6. Nationality
  7. Labour force status
  8. Name of establishment and kind of business (industry)
  9. Kind of work (occupation)
  10. Status in employment
  11. Place of work or school
(For households)
  1. Type of household
  2. Number of household members
  3. Type and tenure of dwelling
  4. Number of dwelling rooms
  5. Area of floor space of dwelling
  6. Type of building and number of stories
The following five topics were also investigated in the 1990 population census:

(For household members)

  1. Place of residence 5 years ago
  2. Education
  3. Transportation to the place of work or school
  4. Commuting time to the place of work or school
(For households)
  1. Source of household income
APPENDIX 2: Historical Development of OMR Used for Census Tabulation

Historical Development of OMRs (Optical Mark Readers) Used for Census Tabulation

APPENDIX 3: Results of OMR's Reading in the 1990 and 1995 Population Census
(1990)
Type of Tabulation             Inputting Period            Inputted questionnaires  Rejected        Jammed         Doubly fed
1st Basic Complete Tabulation  1990.11-1991.8 (10 months)  50,930,567               23,338 (0.05%)  8,072 (0.02%)  91,423 (0.18%)
2nd Basic Complete Tabulation  1991.8-1992.5 (16 months)   50,861,415               10,216 (0.02%)  4,602 (0.01%)  59,212 (0.12%)
3rd Basic Complete Tabulation  1992.5-1993.7 (15 months)   50,855,386               3,262 (0.01%)   3,669 (0.01%)  37,752 (0.07%)
Detailed Sample Tabulation     1993.4-1994.6 (15 months)   7,410,425                594 (0.01%)     925 (0.01%)    3,503 (0.05%)
Notes: Rejected and jammed questionnaires are excluded from inputted questionnaires.
          Double-fed questionnaires are re-inputted.
(1995)
Type of Tabulation                     Inputting Period            Inputted questionnaires  Rejected        Jammed          Doubly fed
1st Basic Complete Tabulation          1995.11-1996.8 (10 months)  54,639,491               34,242 (0.06%)  12,741 (0.02%)  99,178 (0.18%)
2nd and 3rd Basic Complete Tabulation  1996.9-1997.11 (14 months)  53,709,613               7,880 (0.01%)   4,438 (0.01%)   37,316 (0.07%)
Notes: Rejected and jammed questionnaires are excluded from inputted questionnaires.
          Double-fed questionnaires are re-inputted.
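
The percentages in the 1995 figures above can be reproduced from the counts. The short Python sketch below assumes that each percentage is taken relative to the number of inputted questionnaires, which the table does not state explicitly; the names `counts_1995` and `failure_rates` are illustrative only.

```python
# Counts copied from the 1995 figures above: (inputted, rejected, jammed, doubly fed).
counts_1995 = {
    "1st Basic Complete Tabulation": (54_639_491, 34_242, 12_741, 99_178),
    "2nd and 3rd Basic Complete Tabulation": (53_709_613, 7_880, 4_438, 37_316),
}

def failure_rates(inputted, rejected, jammed, doubly_fed):
    """Return each failure count as a percentage of inputted questionnaires."""
    return tuple(round(100 * n / inputted, 2) for n in (rejected, jammed, doubly_fed))

rates_1995 = {name: failure_rates(*row) for name, row in counts_1995.items()}
for name, (rej, jam, dbl) in rates_1995.items():
    print(f"{name}: rejected {rej}%, jammed {jam}%, doubly fed {dbl}%")
```

Under that assumption the computed rates agree with the published ones to two decimal places.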
APPENDIX 4: System Configuration of the OCR for the 1998 Housing and Land Survey


Hardware

Scanning and recognition device
This device consists of a scanning part and a character recognition part, together with a part for controlling the device itself, magnetic disks that can store the character and image data input in a day, a keyboard, a display, a mouse and a 3.5-inch floppy disk drive.
Device for correcting characters that could not be read by the OCR
This device includes the parts needed for correcting characters that could not be read by the OCR, for example a keyboard, a mouse, and a TFT color liquid crystal display of 13.3 inches or larger.

Software

  1. Operating system
    The OCR is equipped with Windows NT Workstation 4.0 or Windows NT Server 4.0 as an operating system.
  2. Character recognition software
    The character recognition device is equipped with software for recognizing characters. If the recognition is performed by hardware, this software is not needed.
  3. Software for correcting characters that could not be read by the OCR
    The device for correcting characters is equipped with software for correcting characters that could not be read by the OCR.
  4. Utility software for file management
    Both the controlling unit in the recognition device and the device for correcting characters are equipped with the utility software that has functions of creating, deleting, copying, sorting and transmitting data.
  5. Software for client PCs to correct characters that could not be read by the OCR
    This software is used on the client PCs in the LAN of the Statistics Bureau and the Statistics Center. The client PCs are NEC PC-9821Nr166/X30N machines equipped with Windows NT Workstation 4.0. The OCR is connected to the LAN using Ethernet and TCP/IP.
  6. Performance of the OCR

The OCR can read both sides of the questionnaires of the 1998 Housing and Land Survey simultaneously.

By operating three sets of OCRs, 48,000 questionnaires can be processed per day, including the recognition of characters and marks and the storing of data both as characters and as image data. The operation time of the OCR is 12 hours per day, with a net working rate of 0.9.

The resolution for reading, storing and outputting image data is 200 dpi or higher.

The data for characters and marks can be output in formats such as CSV or ASCII text. Image data can be output in TIFF format with MMR data compression.
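
The performance figures above imply an average reading rate per machine. The following Python sketch works through the arithmetic, on the assumption that the net working rate is the fraction of the operating time during which a device is actually reading; the function name `sheets_per_minute` is illustrative.

```python
def sheets_per_minute(total_per_day, machines, hours_per_day, working_rate):
    """Average per-machine reading rate implied by a daily processing target."""
    effective_minutes = hours_per_day * working_rate * 60
    return total_per_day / machines / effective_minutes

# Figures from the text: 48,000 questionnaires, 3 OCRs, 12 hours, rate 0.9.
rate = sheets_per_minute(48_000, 3, 12, 0.9)
print(f"{rate:.1f} questionnaires per minute per machine")
```

Under this reading, each OCR handles 16,000 questionnaires in 10.8 effective hours, or roughly 25 questionnaires per minute, both sides included.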

APPENDIX 5: Accuracy of OCR's Reading -- Results of the Pilot Survey for the 1998 Housing and Land Survey
Results of OCR's Reading
APPENDIX 6: Tabulation and Release Schedule for the 1995 Population Census
  1. All the questionnaires were submitted from the prefectural governments and accumulated at the Statistics Center for processing. They were read by optical mark readers (OMRs), and tabulated by computer. The tabulations and releases of the results were done in the following groups.
    1. Preliminary Counts of the Population;
    2. Prompt Sample Tabulation;
    3. The First Complete-Count Tabulation;
    4. The Second Complete-Count Tabulation;
    5. The Third Complete-Count Tabulation;
    6. Detailed Sample Tabulation;
    7. Tabulation on Place of Work or Schooling;
    8. Tabulation by BUB and Administrative District.
  2. The Preliminary Counts, which were based on the summary sheets prepared at the municipalities, gave the population by sex and the number of households by municipality, by prefecture, and for the whole country. The Preliminary Counts were released at the end of December 1995.
  3. The Prompt Sample Tabulation, which was based on 1% sample questionnaires, gave an overview of the population structure of the country and the prefectures. However, detailed regional data for individual municipalities were not made available because of sampling errors. The results were released in June 1996.
  4. The Complete Count Tabulation was performed in three phases. In the first phase, the topics of tabulation were limited to such basic demographic and household characteristics as sex, age, marital status, nationality, household type and housing conditions. These topics were all coded by the households themselves or by the enumerators, and thus processed without any manual coding by the central staff. The results of the first phase tabulation were released in November 1996.
  5. The second phase included tabulation by industry (major group), labour force status, status in employment, place of work or schooling, etc. The third phase included tabulation by occupation (major group). The second and third phases required manual coding of industry and occupation, and took much time. The Second Complete-Count Tabulation was released in January 1997, and the Third Complete-Count Tabulation was released in March 1998.
  6. The whole tabulation of the 1995 Population Census is to be completed by July 1999. The Census tabulation work is thus distributed in the five-year inter-census period in Japan according to priorities, as most clerical workers are permanent employees.

 