The First Meeting of the Working Party on the Application of New Technology to Population Data
Bangkok, 24-26 September 1997
STAT/WPA.1/3.3
24 September 1997
ENGLISH ONLY
ECONOMIC AND SOCIAL COMMISSION FOR ASIA AND THE PACIFIC
Recent Developments in the Application of Information Technology to Population Data Collection, Processing, and Dissemination in the Central Bureau of Statistics of the Republic of Indonesia
The Central Bureau of Statistics of the Republic of Indonesia (CBS) has used information technology to process population data since the 1970s, when a mainframe computer was used to process the 1971 Population Census data. In fact, that computer was equipped with an optical mark reader (OMR) to speed up data capture. The computer has since been replaced by successively more advanced machines: the ICL 1900 in the 1970s, the NEC Acos 500 in the 1980s, and the NEC Acos 1502 in the 1990s.
Meanwhile, microcomputer technology was developing very quickly, and CBS adopted it as well. Because the work of CBS is organized around a central office supported by regional offices, microcomputer technology enabled CBS to distribute data processing work to the regional offices. The decreasing size and, especially, the falling prices of computers allowed CBS to decentralize its processing by placing computers in its 27 provincial regional offices from the 1980s and in its 304 Kabupaten regional offices from the 1990s. Currently, those computers are being linked into Local Area Networks (LANs) in each province and also connected to a national network to give them a synergetic capability.
With these computing capabilities, all regional offices participated in processing the complete enumeration data of the 1990 Population Census, and several selected regional offices also participated in processing the sample enumeration data. The 1995 Intercensal Population Survey data were also processed with the involvement of regional offices.
Information technology has had a clear impact on statistical activities: great progress has been achieved in the various stages of population censuses and surveys, specifically in collecting, processing, and disseminating population data. However, with the increasing demand for better quality and faster results in population statistics, data processing has become more complex. For that reason, CBS has always tried to examine how information technology can improve its statistical operations.
In the following sections, we discuss recent developments in information technology for population data collection, processing, and dissemination. To give a background to the discussion, we first describe population data in CBS.
2.
Population data in CBS
Population information is collected by CBS through several sources, namely, the Population Census, the Intercensal Population Survey, the Demographic and Health Survey, and Civil Registration. In each of these, data are collected mostly from households by asking the head of household about several characteristics of the household members. Data collectors go to respondents' homes and ask for the information required in the questionnaires. Questions are usually grouped into blocks, such as identity, household information, and individual member information blocks.
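The block structure of a questionnaire maps naturally onto a hierarchical record: one identity block and one household block per questionnaire, plus one record per household member. The Python sketch below illustrates that structure under stated assumptions; all field names, codes, and the consistency check are hypothetical and are not taken from any CBS questionnaire.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical field names; a real questionnaire defines its own items and codes.

@dataclass
class IdentityBlock:
    province: str
    kabupaten: str
    kecamatan: str
    village: str
    household_no: int

@dataclass
class HouseholdBlock:
    head_name: str
    household_size: int

@dataclass
class MemberRecord:
    line_no: int
    name: str
    sex: str   # "M" or "F" (hypothetical coding)
    age: int

@dataclass
class Questionnaire:
    identity: IdentityBlock
    household: HouseholdBlock
    members: List[MemberRecord] = field(default_factory=list)

    def is_consistent(self) -> bool:
        """Check that the reported household size matches the member records."""
        return self.household.household_size == len(self.members)


q = Questionnaire(
    identity=IdentityBlock("31", "71", "010", "001", household_no=1),
    household=HouseholdBlock(head_name="Head A", household_size=2),
    members=[MemberRecord(1, "Head A", "M", 40), MemberRecord(2, "Member B", "F", 38)],
)
print(q.is_consistent())  # True
```

Keeping member records as a list inside the questionnaire mirrors how the individual member block repeats once per person, while identity and household items appear only once.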
In the previous population censuses, there were two ways of collecting population information, namely, complete enumeration and sample enumeration. Complete enumeration collects basic items such as name, sex, and age, while more detailed information such as relationship to the head of the household, education, fertility, mobility, and housing conditions is collected in the sample enumeration. However, for the 2000 Population Census there is a plan to collect data only through complete enumeration, because most information from past population censuses could not be presented for small administrative areas, except for the items collected through complete enumeration. At the same time, demand for such small-area statistics has been increasing. To meet this demand, CBS concluded that core information, namely, sex, age, marital status, religion, citizenship, place of birth, education, labor force, and fertility, should be collected in the complete enumeration of the 2000 Population Census, but with simplified questions. Detailed information about housing units, which was not very comprehensive in past censuses, will be collected through a household survey.
A population census is governed by Census Law No. 7 of 1960, which stipulates that a population census should be conducted every ten years, in years ending in zero. Planning of a population census is not done by CBS alone but with the participation of other institutions. Census planning is therefore guided by an interdepartmental census committee that gives direction on the general objectives of the census and is in charge of the design of census questionnaires, training manuals, census methodology, the tabulation plan, and data processing. Field tests and general rehearsals are a prerequisite before the whole census plan is finalized.
Information collected in the Intercensal Population Survey (Supas) is very similar to that collected in the sample enumeration of the Population Census, but Supas concentrates mainly on fertility and mortality. Similar information is also collected in the Indonesia Demographic and Health Survey (SDKI). SDKI collects information on fertility, mortality, health, and family planning, for example, respondent background, birth history, fertility preferences, breastfeeding, family planning, and employment. In addition, information such as maternal care and the health and immunization of children under five years old is also collected in SDKI. The current SDKI has been expanded to collect information on knowledge of AIDS and maternal mortality, as well as household expenditure and the services available for family planning and health.
Civil registration deals with information about vital events such as births, deaths, and migration, which are reported to the village authorities. However, the quality of information collected through civil registration is very poor, so it is used basically for comparison purposes.
3.
Data Collection
In fact, the use of information technology to capture data with OMR started with the processing of the 1971 Population Census. However, some weaknesses of the technology forced CBS to abandon it. The main reasons for the change were the high cost in the country of good-quality paper meeting the manufacturer's requirements for the OMR machine and the lack of high-precision printing facilities in the country (Suharto, 1993). As a result, CBS had to have questionnaires printed in other countries such as Australia. Another disadvantage was the need to keep the paper neat and clean, which is very difficult for data collectors in the remote areas of Indonesia.
The emergence of electronic data entry stations that key data directly onto computer media was another reason why CBS abandoned OMR. In the 1980 Population Census, CBS used a number of microcomputers dedicated as data entry machines in the central office. As microcomputers flourished, the decentralization of computing capabilities allowed data entry for the 1990 Population Census complete enumeration to be done in the provincial regional offices. As a result, the total population could be announced by the President of the Republic of Indonesia less than six months after the census date. Besides the decentralization of data entry, the use of communication facilities also helped speed up data processing: data entered in a regional office were sent to the central office over a communication line and processed further to obtain national-level data.
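Consolidating regional submissions into a national figure involves at least one check before summing: every expected province must have reported, and no unknown codes may appear. The Python sketch below shows that step under stated assumptions; the function name, the two-digit province codes, and the counts are hypothetical and do not describe the actual CBS transfer procedure.

```python
from typing import Dict, Set


def national_total(provincial_counts: Dict[str, int], expected: Set[str]) -> int:
    """Sum provincial population counts (hypothetical helper).

    Refuses to produce a national figure while any expected province is
    missing, or when a submission carries an unknown province code.
    """
    received = set(provincial_counts)
    missing = expected - received
    if missing:
        raise ValueError(f"provinces not yet received: {sorted(missing)}")
    unexpected = received - expected
    if unexpected:
        raise ValueError(f"unexpected province codes: {sorted(unexpected)}")
    return sum(provincial_counts.values())


counts = {"11": 4_000, "12": 10_000, "13": 4_300}   # made-up figures
print(national_total(counts, {"11", "12", "13"}))   # 18300
```

Raising an error instead of returning a partial sum keeps an incomplete national total from being released by mistake.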
On the other hand, because the sample enumeration data were more complex than the complete enumeration data, they were entered in the central office and in some selected provincial regional offices. By decentralizing data entry, CBS processed the 1990 Population Census in less time than previous censuses.
Data entry work was also distributed to regional offices when processing the 1995 Intercensal Population Survey. Using a data entry program prepared with the ISSA system, provincial regional offices entered data from questionnaires directly onto computer media. This capability allowed CBS to finish entering the data of 260,000 documents in less than six months.
However, even though decentralizing data entry has reduced entry time, transferring data onto computer media is still the critical stage of processing. This drives CBS to keep assessing the possibility of using a new data collection system. Fortunately, current OMR/OCR facilities have improved greatly compared with their capability in the 1970s. One advantage is that special paper, which was mandatory in the old version, is no longer required. In line with this improvement, and in preparation for the 2000 Population Census, CBS, with the assistance of JICA, is currently studying the possibility of using OCR/OMR facilities in the census. One outcome of the study is the procurement of an OCR set, which includes OCR software and a scanner.
One OCR software package being considered is NCS ACCRA. Statistics Canada has redesigned some of its capture, verification, and report processing using NCS ACCRA. This system has shown its capability in document scanning, imaging, and recognition processing for millions of statistical source documents. The recognition technologies that have been implemented include optical character recognition (OCR), intelligent character recognition (ICR), bar codes, and multiple-choice marks (Bookbinder, 1996). Procurement of this OCR/OMR set is expected to be finalized at the end of this year, and the study of its capabilities can start in January 1998. The study of this OCR/OMR set is one of the proposed applications to be included in the pilot project (Agenda 7); the proposal is discussed in more detail in a separate paper.
Following up this study, for the 2000 Population Census all provincial and Kabupaten regional offices, in addition to the head office, will be equipped with OMR/OCR sets. To capture the data of about 200 million people in three months, it has been calculated that about 531 machines will need to be installed.
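The paper does not state the per-machine throughput behind the figure of 531 machines, but it can be backed out from the other two numbers. The Python sketch below does this arithmetic, assuming one record per person and 90 working days for "three months"; both are assumptions, and a shorter working calendar would raise the required daily rate.

```python
import math


def implied_daily_rate(total_records: int, days: int, machines: int) -> float:
    """Records each machine must capture per day to finish on time."""
    return total_records / (days * machines)


def machines_needed(total_records: int, days: int, rate_per_machine_day: float) -> int:
    """Smallest number of machines that can capture all records within `days`."""
    return math.ceil(total_records / (days * rate_per_machine_day))


# Assumptions: ~200 million person records, 90 working days.
rate = implied_daily_rate(200_000_000, 90, 531)
print(round(rate))                              # 4185
print(machines_needed(200_000_000, 90, 4185))   # 531
```

Under these assumptions each machine would need to capture roughly 4,185 person records per working day; rounding up in `machines_needed` ensures the deadline is met rather than just missed.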
The data collection stage does not deal only with data capture; it also includes the process of collecting data from respondents. Here, IT in the form of GIS has indirectly made the task of data collectors easier. CBS has been using GIS for more than three years. Digital sketch maps, easily produced with GIS, are very useful for data collectors in the field. In previous censuses, data collectors had to draw maps from scratch before starting to collect data, whereas GIS allows collectors to use printed maps and make corrections when needed.
Up until now, questionnaires have always been mandatory in the data collection stage; for that reason, questionnaires should be designed carefully. Software with color and graphics capabilities, especially desktop publishing software, greatly improves the design of the questionnaire.
4.
Data Processing
The aim of data processing is to edit and clean raw data before they are tabulated. Processing of population data was originally carried out on the mainframe computer; with this method, data were edited using validation programs and tabulated using tabulation programs. The Concor program for data editing and imputation and the Cocents tabulation program were heavily used at the processing stage. Thanks are due to the US Bureau of the Census for providing these systems for public use. The Cocents programs have contributed to the successful processing of all rounds of the population census and the Intercensal Population Surveys since 1971.
However, as microcomputer technology and its software improved very quickly, CBS was able to work with a wide variety of software in the microcomputer environment. Therefore, CBS was not strictly dependent on the programs mentioned above; sometimes it developed custom-made editing and imputation programs or used other public domain systems.
The availability of the Integrated Microcomputer Processing System (IMPS) and the Integrated System for Survey Analysis (ISSA) has strongly supported CBS in the data processing stages. IMPS, developed by the International Programs Center, US Bureau of the Census, consists of all the modules needed for data processing, namely, data dictionary, data entry, edit and computation, frequency and cross-tabulation, and data capture management and control. IMPS has been continually improved to meet the needs of statistical operations. In fact, its new release, IMPS 4.1, will contain two new modules: MapView, to view data in the form of thematic maps, and DataSort, to sort data files (Dataline, August 1997).
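Frequency and cross-tabulation, one of the modules listed above, reduces to counting records by combinations of classifying variables. The Python sketch below illustrates the idea only; it is not how IMPS implements tabulation, and the age groups and records are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def age_group(age: int) -> str:
    """Hypothetical broad age groups used only for this illustration."""
    if age < 15:
        return "0-14"
    if age < 65:
        return "15-64"
    return "65+"


def cross_tab(records: Iterable[Tuple[str, int]]) -> Counter:
    """Count person records (sex, age) by the cell (sex, age group)."""
    return Counter((sex, age_group(age)) for sex, age in records)


table = cross_tab([("M", 5), ("F", 7), ("M", 30), ("F", 70), ("M", 12)])
print(table[("M", "0-14")])  # 2
```

A one-way frequency is the same operation with a single classifying variable; empty cells simply return zero from the `Counter`.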
ISSA, developed by the Institute for Resource Development Inc., was used in processing the 1987, 1991, 1994, and 1997 SDKI data as well as the 1995 Intercensal Population Survey (Supas). Using ISSA in a microcomputer environment, CBS could finish processing about 30,000 SDKI documents in about three months and 260,000 Supas documents in six months.
In some cases, CBS developed its own custom-made editing programs. For example, the editing and imputation programs for the 1990 Population Census were developed by CBS using a COBOL compiler, because several look-up tables were used in the editing stages and that capability was not available in the IMPS programs.
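Editing with a look-up table means checking each coded answer against a reference table and correcting it when it is invalid or inconsistent with other answers. The Python sketch below shows a simple deterministic version of such a rule; the education codes, the minimum ages, and the choice of imputing "the highest level consistent with age" are all hypothetical, since the paper does not describe the actual CBS tables or rules.

```python
from typing import Tuple

# Hypothetical look-up table: minimum plausible age for each education code.
# Codes: 0 = none, 1 = primary, 2 = junior secondary, 3 = senior secondary,
# 4 = university.
MIN_AGE_FOR_EDUCATION = {0: 0, 1: 12, 2: 15, 3: 18, 4: 22}


def highest_consistent_level(age: int) -> int:
    """Highest education code whose minimum age the person has reached."""
    return max(
        (code for code, min_age in MIN_AGE_FOR_EDUCATION.items() if age >= min_age),
        default=0,
    )


def edit_education(age: int, edu: int) -> Tuple[int, bool]:
    """Return (edited code, changed?) for one person record."""
    if edu not in MIN_AGE_FOR_EDUCATION or age < MIN_AGE_FOR_EDUCATION[edu]:
        # Invalid or age-inconsistent code: impute the highest level
        # that the look-up table allows for this age.
        return highest_consistent_level(age), True
    return edu, False


print(edit_education(20, 3))  # (3, False)
print(edit_education(16, 4))  # (2, True)
```

Returning a "changed" flag alongside the edited value lets an edit run report how many records each rule touched, which is useful for judging data quality.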
The availability of public domain programs such as IMPS supports the distribution of editing work to regional offices. When the work is distributed, questionnaires no longer need to be sent to the central office, which saves money and, more importantly, time. In addition, questionnaires can be kept close to the respondents, so when regional office staff need to contact a respondent, they can easily do so.
Another impact of IT on the statistical operations of CBS that should be noted is the use of networking. Currently, computers at the central office and several regional offices have been furnished with LAN capability, and LANs for the remaining regional offices are in progress. Since the computers are also connected nationally through the Internet, cleaned data in the regional offices are sent to the central office over the Internet. With the widespread use of the Internet in Indonesia, some Internet Service Providers (ISPs) allow regional offices to use the Internet at local call charges. An ISP used intensively by regional offices is Wasantara Net.
Besides sending information to the central office, regional offices also maintain their own home pages on the central office's server, as discussed in the next section.
5.
Data Dissemination
CBS realizes that data dissemination is very important, since the final output of a statistical activity reaches its users only if the method of dissemination is correct and effective. This is also true for disseminating population data.
At present, CBS disseminates data using four techniques. The most common technique is printed publication, which has been used for a long time. Fortunately, progress in microcomputer graphics and word processing capabilities has contributed to the good appearance of publications. Furthermore, subject matter staff are able to prepare publications themselves. This not only improves the appearance of the publications but also reduces their preparation time.
Better-quality printed publications are produced not only by subject matter staff in the central office but also in the provincial regional offices and, furthermore, in the Kabupaten regional offices. Yearly statistics can be published, better and faster, by regional offices at every level. For that reason, all regional offices are equipped with software suites of various capabilities in addition to modern hardware. To handle this software and hardware, IT staff are developed continually through formal IT training and also informally through monthly IT newsletters, monthly IT seminars, and consultancies.
The second dissemination technique is to disseminate the image of a publication on computer media. With such media, users do not have to retype particular tables if they want to process the data with a computer analysis program.
The third technique is disseminating individual (unit-record) data on computer media. Researchers need this kind of data if they want to analyze the data in more depth than the tables in the printed publications allow. For that reason, CBS provides various alternative storage media from which users can choose: diskettes, magnetic tapes, optical discs, and Zip disks.
The fourth technique of data dissemination is the Internet. In this technique, CBS stores population data in the form of home pages on its main server. At present, the address of the CBS home page is http://www.bps.go.id, as shown in Appendix 1. The page shows that users can access information about CBS and its statistical information. Statistical information is grouped into population and employment statistics, as well as other information, namely, social welfare statistics, wage statistics, agricultural statistics, industrial statistics, mining statistics, energy statistics, construction statistics, foreign trade statistics, transportation and communication statistics, price statistics, and national and regional account statistics.
A wide range of information is thus available to Internet users around the world, who can access the data directly and easily from their own locations. That is why so many users around the world use this facility. In August 1997, there were 108,945 hits to this home page, an average of 3,514 hits per day. Of these hits, 45% came from Indonesia and the rest from other countries, including Australia, the US, Singapore, and Japan.
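The daily average quoted above follows from dividing the monthly total by the 31 days of August; the same figures also give the approximate number of hits from Indonesia. The short Python check below uses only the numbers stated in the text.

```python
hits_august_1997 = 108_945
days_in_august = 31
indonesia_share = 0.45  # share of hits from Indonesia, as stated in the text

average_per_day = hits_august_1997 / days_in_august
print(f"{average_per_day:.1f}")                       # 3514.4
print(round(hits_august_1997 * indonesia_share))      # 49025
```

This confirms the reported average of 3,514 hits per day and implies roughly 49,000 hits from Indonesia during the month.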
Access to the population data alone amounted to 192 hits in August 1997. The population data include the number of people by province, growth, density, sex ratio, infant mortality rate, total fertility rate, and lifetime migration.
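Several of these indicators are simple ratios of published counts. The Python sketch below uses standard textbook definitions; the paper does not state which formulas CBS uses, so the choice of a geometric growth model in particular is an assumption (an exponential model, `ln(P_end / P_start) / years`, is a common alternative).

```python
def sex_ratio(males: int, females: int) -> float:
    """Number of males per 100 females."""
    return 100.0 * males / females


def density(population: int, area_km2: float) -> float:
    """Persons per square kilometre."""
    return population / area_km2


def annual_growth_rate(p_start: float, p_end: float, years: float) -> float:
    """Average annual growth rate under an assumed geometric growth model."""
    return (p_end / p_start) ** (1.0 / years) - 1.0


print(sex_ratio(102, 100))             # 102.0
print(density(5_000, 50))             # 100.0
```

With made-up populations of 100 and 121 ten years apart... more precisely, two years apart, `annual_growth_rate(100, 121, 2)` gives approximately 0.10, that is, 10% per year.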
In addition to the home pages prepared by CBS, regional offices are asked to prepare their own home pages, so that the burden of disseminating data can be shared with them. As seen in Appendix 3, users can access the regional offices' home pages. The involvement of regional offices is supported by Internet Service Providers such as Wasantara Net, which allow regional offices to access the Internet at local call charges.
Another technique being developed is CD-ROM. CD-ROM is becoming more popular for dissemination because it can hold a large amount of data and is very secure compared with other media such as diskettes. That is why CD-ROM will be an ideal solution for disseminating population data in the future. The portability and large capacity of CD-ROM allow a user simply to put the CD-ROM into the drive, after which the program guides the user in accessing the data. Hence, at this meeting, CBS offers to develop such a system, allowing users to access data on CD-ROM (Agenda 7). This proposal is discussed in more detail in a separate paper.
6.
Geographical Information System
The development of GIS in CBS was supported by the availability of sketch maps of administrative areas and census blocks produced for the population censuses. Village maps and census block (CB) maps were prepared at the early stage of census fieldwork. In the 1971, 1980, and 1990 population censuses, these maps were used not only as the basis for sample selection and for estimating the number of documents needed, but also as guidance for data collectors in carrying out the household listing and the census enumeration. Experience in the fieldwork of the 1980 census showed that many village and CB maps needed to be updated regularly.
CBS started its GIS by digitizing the administrative areas in 1994, and this work was completed in early 1997. Indonesia's administrative areas are categorized into provinces, Kabupaten (regencies/municipalities), Kecamatan (districts), and villages. Therefore, the smallest geographic unit in the GIS is the village. The digitized base maps were mainly based on sketch maps produced in the 1990 Population Census. Because the sketch maps do not have geographic coordinates, the GIS at present contains administrative areas only. Features such as buildings, roads, and rivers are planned to be included.
The GIS application is equipped with standard administrative codes, which serve as the keys for joining the spatial data and the statistical attribute data. The population information is stored in a large database, and the spatial data are stored separately. Geographic information (spatial data) is joined with attribute data (population data) using the administrative area codes as the key. This gives flexibility in choosing the administrative level to present: data can be presented at any administrative level down to the village level. GIS is useful for descriptive analysis because it provides an easy way to visualize the spread of data over a particular area.
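Because the administrative codes are hierarchical, attribute data held at village level can be rolled up to any higher level by truncating the code, then joined to the map features for that level. The Python sketch below illustrates this; the 10-digit layout (2-digit province, 2-digit Kabupaten, 3-digit Kecamatan, 3-digit village), the codes, the counts, and the polygon placeholders are all assumptions made for the example, not the CBS coding scheme or GIS software.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Assumed (illustrative) code layout: 2-digit province + 2-digit Kabupaten
# + 3-digit Kecamatan + 3-digit village = 10-digit village code.
PREFIX_LENGTH = {"province": 2, "kabupaten": 4, "kecamatan": 7, "village": 10}


def aggregate(village_pop: Dict[str, int], level: str) -> Dict[str, int]:
    """Roll village counts up to a higher level by truncating the code."""
    n = PREFIX_LENGTH[level]
    totals: Dict[str, int] = defaultdict(int)
    for code, pop in village_pop.items():
        totals[code[:n]] += pop
    return dict(totals)


def join(spatial: Dict[str, str], attributes: Dict[str, int]) -> Dict[str, Tuple[str, int]]:
    """Inner join of map features and statistics on the administrative code."""
    return {code: (shape, attributes[code])
            for code, shape in spatial.items() if code in attributes}


villages = {"1101010001": 1200, "1101010002": 800,
            "1102020001": 500, "1201010001": 300}   # made-up counts
provinces = aggregate(villages, "province")
print(provinces)                 # {'11': 2500, '12': 300}
shapes = {"11": "polygon-11", "12": "polygon-12", "13": "polygon-13"}
print(join(shapes, provinces))   # {'11': ('polygon-11', 2500), '12': ('polygon-12', 300)}
```

Storing the statistics only at the lowest level and deriving the higher levels on demand is what gives the flexibility described above; a feature with no matching statistics (here `"13"`) simply drops out of the inner join.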
For the 2000 Population Census, the methods of village mapping and the formation of Enumeration Areas (EAs) will be reviewed, considering that the EA map is used not only for the census enumeration but also for constructing sampling frames for future household sample surveys. Ideally, EAs should vary as little as possible from one another in the number of households, and they should have clear and permanent boundaries. A large variation in area and household size means that the EAs would not be practical for use as the lowest-level sampling unit.
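One common way to quantify how much EAs vary in size is the coefficient of variation of the number of households per EA: the standard deviation divided by the mean, where zero means perfectly equal EAs. The Python sketch below computes it; the paper does not say which measure CBS will use, and the household counts are made up.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(households_per_ea: Sequence[int]) -> float:
    """Population standard deviation of EA sizes divided by their mean."""
    return pstdev(households_per_ea) / mean(households_per_ea)


uniform = [100, 100, 100, 100]   # made-up household counts
uneven = [40, 100, 160, 300]     # made-up household counts
print(coefficient_of_variation(uniform))            # 0.0
print(round(coefficient_of_variation(uneven), 2))   # 0.64
```

A lower value indicates EAs closer to the ideal of similar sizes, so the measure can be used to compare alternative ways of forming EAs within the same area.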
7.
Concluding Remarks
Given the complexity and high cost of population statistics, especially a population census, where the number of records to be processed is very large, the application of technology, especially information technology, should always be explored. CBS has always explored the possibilities of this technology, and in many cases progress has been achieved. In the end, these achievements will benefit the users of population data.
However, there are still many ways to improve the performance of data collection, processing, and dissemination. In data collection, the need is to speed up data capture. For that reason, OMR/OCR is being examined to see how it improves the performance of data capture in a large census such as a population census. In data processing, there is still a need to improve the way the processing work is decentralized so that the available hardware can be used optimally. The dissemination process also needs to be improved, since the technologies offer many possibilities for improvement, whether through CD-ROM, the Internet, or other means.
References
Bookbinder, Michael. "A Matter of Image: Statistics Canada Boosts Data Accuracy." Government Computer Magazine, November 1996.
Suharto, Sam. "Innovative Techniques in the 1970 Round of Population and Housing Censuses: Possibilities for the Future." UN Technical Notes.