In the 1965 Population Census, small
IBM OCRs were used for the first time
as mark readers for a limited number of items
for 100% tabulation. In the 1970 Census, larger
NEC OMRs were introduced to replace key entry
entirely. In both censuses, questionnaires
had to be transcribed manually to OCR/OMR sheets.
Mark sheet type questionnaires printed on one
side were introduced in the 1975 Census, where
households entered marks directly on the sheets
and no transcription was needed. Mark sheet
type questionnaires printed on both sides were
used in the 1980 Census. The OMRs used in the
1980 Census were designed to read the marks
written on both sides of the questionnaire simultaneously.
For the 1990 Census, OMIRs (Optical Mark and
Image Readers) were developed. OMIRs can read
not only marks but also
images in specified areas of the questionnaire.
Since handwritten responses about industry
and occupation were captured by the computer
using OMIRs, clerical workers were expected
not to need to refer to the original questionnaires
in the coding and editing processes.
1-2
OCR for the 2000 Population Census
The Statistics Bureau is planning to adopt
OCRs for the 2000 Census. Although the Statistics
Bureau used NEC OMRs without any serious
problems through the 1995 Census, the manufacturer
of the OMRs has decided to discontinue technical
support and the production of spare parts for
the current model, which is essentially the same
as the 1975 model.
As today's OCRs give wider flexibility in designing
questionnaires and impose fewer restrictions
on paper quality, we have been seeking manufacturers who
can provide appropriate OCRs for the 2000 Census.
1-3
Considerations
Considerations when using OMR/OCR are:
the quality of paper acceptable for OMR/OCR sheets;
the printing colour for OMR/OCR sheets, which should be a "drop-out colour" that the OMR/OCR cannot recognize as marks or letters;
restrictions in designing OMR/OCR sheets, such as entry positions and the intervals between entry positions;
the need for respondents and interviewers to write marks and letters that the OMR/OCR can recognize;
cost, performance and maintainability.
2
Use of CMS
The Census Mapping System (CMS) is a geographic
information system (GIS) first developed by the Statistics
Bureau for the 1990 Population Census.
CMS contains boundaries and statistical
data for all the Enumeration Districts (EDs)1)
and Basic Unit Blocks (BUBs)2),
and can produce statistical maps based on EDs
and BUBs. The original purpose of developing
CMS was to improve the efficiency of production
of ED maps and small area statistics. It is
becoming more and more important as spatial
data infrastructure. Described below are some
of the applications of CMS in Japan.
1)
The EDs are designed to define areas assigned
to enumerators and cover about 50 households on
average. The EDs also serve as sampling
units for various sample surveys. In the
1995 Census, about 881 thousand EDs were established.
2)
The BUBs were first introduced for demarcating EDs in
the 1990 Census. BUBs
correspond to area blocks separated by clearly
identifiable and permanent geographical objects,
such as roads, railways, rivers, etc. BUBs
are more or less permanent and correspond
to the address designations. They are used
as the smallest area units for compiling small
area statistics. In the 1995 Census, there
were about 1,742 thousand BUBs.
2-1
Use of CMS for Demarcation of DIDs
Densely Inhabited Districts (DIDs) were introduced to demarcate urbanized
areas, since rural areas had been absorbed into cities
through the policy of merging cities, towns
and villages during the 1950s. In the 1995 Census,
CMS played a very important role in the work
of demarcating DIDs.
DIDs have been designated by the Statistics
Bureau in every census since 1960. Statistics
for DIDs and their area demarcation are used
for various official and analytical purposes,
e.g. for grant fund allocation from the national
government to local governments, for urban planning,
etc. A DID is defined as a contiguous area of
BUBs whose population density at the BUB level
is 4,000 persons/km² or more, and whose total
population is 5,000 persons or more.
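
The density, contiguity and total-population conditions in this definition can be checked mechanically. The following is a minimal sketch under stated assumptions, not the CMS implementation: the input layout (a dictionary of BUB populations and areas, plus an adjacency list of neighbouring BUBs) and the function name `candidate_dids` are hypothetical, and any criteria beyond these three conditions are ignored, so the result is only a list of candidate areas.

```python
from collections import deque

DENSITY_MIN = 4000.0    # persons per km2, from the DID definition
POPULATION_MIN = 5000   # persons, from the DID definition


def candidate_dids(bubs, adjacency):
    """Return sorted lists of BUB ids that form candidate DIDs.

    bubs:      dict of BUB id -> (population, area_km2)   (assumed layout)
    adjacency: dict of BUB id -> iterable of neighbouring BUB ids (assumed)
    """
    # Step 1: keep only BUBs that meet the density threshold on their own.
    dense = {
        bub_id
        for bub_id, (population, area) in bubs.items()
        if area > 0 and population / area >= DENSITY_MIN
    }

    seen = set()
    candidates = []
    for start in sorted(dense):
        if start in seen:
            continue
        # Step 2: breadth-first search over dense neighbours only,
        # which collects one contiguous group of dense BUBs.
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            current = queue.popleft()
            component.append(current)
            for neighbour in adjacency.get(current, ()):
                if neighbour in dense and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        # Step 3: apply the total-population threshold to the group.
        total = sum(bubs[bub_id][0] for bub_id in component)
        if total >= POPULATION_MIN:
            candidates.append(sorted(component))
    return candidates
```

For example, two adjacent BUBs with 3,000 and 2,500 persons on 0.5 km² each would form one candidate, while an isolated dense BUB with 4,500 persons would not.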
In past censuses, the demarcation of DIDs
was done manually by measuring the area and
computing the population density for EDs. In
the 1995 Census, however, CMS automatically computed
the area and the population density of every
BUB, and drew maps showing the areas that
could meet the criteria for
a DID. As there are some additional criteria other
than population density and contiguity, it was
still necessary for the cartographic staff to
visually examine the maps and obtain some additional
information. For this reason, the demarcation
of DIDs could not be fully automated, but the
workload of DID demarcation was significantly
reduced by CMS (cf. App. 2).
2-2
CMS for Production of Grid-Square Statistics
CMS has also contributed to reducing the
workload of producing grid-square statistics
and thus to speeding up their release by almost six
months. Grid squares are 1 km squares
uniformly defined for the whole country on the
basis of longitude and latitude lines. Grid-square
statistics are compiled from ED statistics by
establishing correspondence between EDs and
grid squares. In past censuses, establishing
this correspondence required a great deal of
manpower because the work was done manually.
In the 1995 Census, CMS established the
correspondence automatically, and
the manual workload was significantly reduced.
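
As an illustration of how a correspondence between EDs and grid squares can be derived from coordinates, here is a minimal sketch. It rests on assumptions that the text does not state: each ED is represented by a single point (latitude, longitude), a grid square spans 30 arc-seconds of latitude by 45 arc-seconds of longitude (roughly 1 km at Japan's latitudes), and the function names are hypothetical. The method actually used in CMS is not described here and may differ, for example by apportioning an ED across several squares.

```python
import math
from collections import defaultdict

# Assumed cell size: 30 arc-seconds of latitude and 45 arc-seconds of
# longitude, i.e. 120 cells per degree of latitude and 80 per degree
# of longitude.
CELLS_PER_DEGREE_LAT = 120
CELLS_PER_DEGREE_LON = 80


def grid_cell(lat, lon):
    """Return the (row, column) index of the grid square containing a point."""
    return (
        math.floor(lat * CELLS_PER_DEGREE_LAT),
        math.floor(lon * CELLS_PER_DEGREE_LON),
    )


def grid_totals(eds):
    """Sum ED populations by grid square.

    eds: iterable of (lat, lon, population), one entry per ED, where the
         point is an assumed representative location of the ED.
    """
    totals = defaultdict(int)
    for lat, lon, population in eds:
        totals[grid_cell(lat, lon)] += population
    return dict(totals)
```

Assigning each ED to one square by a representative point is the simplest possible rule; it is chosen here only to keep the sketch short.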
2-3
Comparison of Boundaries of EDs between Censuses
To reduce the response burden of various
sample household surveys, the Statistics Bureau
makes it a rule to avoid selecting the same
EDs as sample areas in different statistical
surveys within a certain period of time. For
example, in the Family Income and Expenditure
Survey, the sampling method is designed so that
the same ED may not be selected twice within
a period of five years or less. For this purpose,
the correspondence of the EDs between two censuses
is required. Previously, this work was done by manually
comparing the ED maps of the two censuses. In the 1995
Census, the boundaries of EDs in 1990 and 1995
were compared by CMS, and the correspondence
list of EDs between the two censuses was made automatically.
2-4
Production of Maps
2-4-1 Production of the Enumeration Summary Map (Chosaku Youzu)
Drawing an Enumeration Summary Map is the
enumerator's work. Enumerators visit the ED
assigned to them, confirm the borders of the
ED and BUBs, draw an Enumeration Summary Map
and mark the location of each household
on it. It is a time-consuming job for the enumerator.
The Statistics Bureau is now studying the possibility
of producing base maps for the Enumeration Summary
Map by CMS for the 2000 Census (cf. App. 3).
2-4-2 Automated Production of ED Maps
In the 1995 Census, the production of ED
maps was automated for seven major cities by
using the GIS that the respective cities had
developed for their own purposes. In the 2000
Census, it will be necessary to increase such
automated production of ED maps. The problem
is that CMS contains the ED and BUB boundaries
but not background topographical maps, so
another GIS holding the background digital maps
is needed for drawing ED maps. CMS had to be
designed in this way because its development
and maintenance would have been enormously costly if
it had to cover base topographical maps as well
(cf. App. 4).
2-5
Expansion of CMS to cover the Establishment Census
The application
of GIS to the production of statistics is not limited
to the Population Census, but extends to other
censuses. The most important one is the Establishment
Census. Traditionally, the EDs of the Establishment
Census have been demarcated independently of
those of the Population Census, because the
geographical distributions of households and
establishments are quite different. However,
as the BUBs adopted in the Population Census
are relatively compatible or homogeneous with
the EDs of the Establishment Census, it
became possible in the 1996 Establishment Census
to establish a close linkage between the BUBs
and the EDs of the Establishment Census. By
integrating the data of the EDs of the Establishment
Census into CMS, the capabilities of CMS will
be greatly enhanced. As small area data on both
households and establishments can be used in
combination (e.g. for computing the day-time
population), there will be more applications
of CMS for various purposes.
2-6
Dissemination of Small-Area Statistics with CMS Data
From the 1995 Census on, the use of small-area
statistics of the Census will become easier
than before owing to CMS. In the past, information
on the locations of EDs could not be easily disseminated,
because ED maps were large and voluminous. But
the CMS data on the locations of BUBs and EDs can
be disseminated as computer files. For the 1995
Census, the following machine-readable files
containing geographic reference information
have been disseminated: i) boundaries of BUBs,
ii) correspondence between BUBs and cho-aza
(area section) names, and iii) the longitude
and latitude of the central point of each BUB. For
statistical users with good GIS expertise,
the data on the boundaries of BUBs will be most
useful for identifying the locations. For statistical
users without GIS, the cho-aza names
file will be a simple but useful tool to identify
the locations of BUBs.
2-7
Privacy Consideration in Disseminating CMS data
Although the dissemination of CMS data is
expected to open up a new field of use of small-area
statistics, there are some risks of privacy
leakage. As a BUB is an area covering only
about 25 households on average, there is always
some possibility that the existence of a person
having a rare characteristic is recognized
in a particular BUB. In order to avoid such
risks, the detailed BUB statistics are provided
only to the national and local governments,
while the private users are provided with only
the total population, its breakdown by sex and
age (3 groups of 0-14, 15-64, and 65+), and
the total households at the BUB level. Although
some private users may not be satisfied with
the lack of detailed characteristics, it is
more important for the Statistics Bureau to
keep the public confidence in privacy protection
in the Population Census. To compensate for the
restriction on BUB statistics for private users,
statistics of detailed characteristics are made
available for cho-aza areas, which are
much broader than BUBs but smaller than municipalities.
In the future, if the necessity of more detailed
BUB statistics and their safety in terms of privacy
protection are well understood by the public,
it will become possible to disseminate more
detailed BUB statistics to private users.
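
The restricted BUB-level table described in this section (total population, breakdown by sex and three age groups, total households) can be illustrated with a short sketch. The record layout (one `(sex, age)` pair per person) and the function names are assumptions made for illustration only and do not describe the Statistics Bureau's tabulation system.

```python
from collections import Counter


def age_group(age):
    """Map an age in years to one of the three published age groups."""
    if age <= 14:
        return "0-14"
    if age <= 64:
        return "15-64"
    return "65+"


def public_bub_summary(persons, household_count):
    """Build the reduced table for one BUB.

    persons:         iterable of (sex, age) pairs (assumed layout)
    household_count: number of households in the BUB
    """
    persons = list(persons)
    # Count persons by sex and coarse age group only; no other
    # characteristics are carried into the output.
    by_sex_and_age = Counter((sex, age_group(age)) for sex, age in persons)
    return {
        "total_population": len(persons),
        "by_sex_and_age": dict(by_sex_and_age),
        "total_households": household_count,
    }
```

Collapsing ages into three broad groups keeps each published cell coarse, which is the point of the restriction.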