ESCAP Statistics Division
The First Meeting of the Working Party on the Application of New Technology to Population Data
Bangkok, 24-26 September 1997

STAT/WPA.1/3.3
24 September 1997
ENGLISH ONLY

ECONOMIC AND SOCIAL COMMISSION FOR ASIA AND THE PACIFIC

Working Party on Application of  New Technology to Population Data
First Meeting
24-26 September 1997
Bangkok

Recent Developments in the Information Technology to Population Data Collection, Processing, and Dissemination in the Central Bureau of Statistics of the Republic of Indonesia
Sihar Lumbantobing
Contents
  1. Introduction
  2. Population data in CBS
  3. Data Collection
  4. Data Processing
  5. Data Dissemination
  6. Geographical Information System
  7. Concluding Remarks
Reference

1. Introduction

The Central Bureau of Statistics of the Republic of Indonesia (CBS) has used information technology to process population data since the 1970s, when a mainframe computer was used to process the 1971 Population Census data. That computer was equipped with an optical mark reader (OMR) to speed up data capture. The mainframe has been upgraded several times since, from the ICL 1900 in the 1970s to the NEC Acos 500 in the 1980s and the NEC Acos 1502 in the 1990s.

Meanwhile, microcomputer technology was developing very fast, and CBS adopted it as well. Because CBS is organized as a central office supported by regional offices, microcomputers enabled CBS to distribute data processing work to the regional offices. The falling size and, especially, price of computers allowed CBS to decentralize its processing by placing computers in its 27 Provincial regional offices since the 1980s and in its 304 Kabupaten regional offices since the 1990s. Currently, those computers are being linked into Local Area Networks (LANs) in each province and connected in a national network so that they can work together.

With these computer capabilities, all regional offices participated in processing the 1990 Population Census complete enumeration data, and several selected regional offices also participated in processing the sample enumeration data. The 1995 Intercensal Population Survey data was also processed with the involvement of regional offices.

Information technology has brought great progress to the various stages of population censuses and surveys, specifically the collection, processing, and dissemination of population data. However, with the increasing demand for better quality and faster results in population statistics, processing of the data has become more complex. For that reason, CBS has always examined how information technology could improve its statistical operations. In the following sections, we discuss recent developments in the application of information technology to population data collection, processing, and dissemination. To give background to the discussion, we first describe population data in CBS.

2. Population data in CBS

Population information is collected by CBS through several data collections, namely the Population Census, the Intercensal Population Survey, the Demographic and Health Survey, and Civil Registration. In each of these, data is collected mostly from households by asking the head of the household about characteristics of the household members. Data collectors visit respondents' homes and ask for the information required by the questionnaires. Questions are usually grouped into blocks, such as identity, household information, and individual member information blocks.

In previous population censuses, there were two ways of collecting population information, namely complete enumeration and sample enumeration. Complete enumeration collected basic items such as name, sex, and age, while more detailed information, such as relationship to the head of the household, education, fertility, mobility, and housing conditions, was collected in the sample enumeration. However, the plan for the 2000 Population Census is to collect data only through complete enumeration, because most information in past population censuses could not be presented for small administrative areas, except for the items collected through complete enumeration, while the demand for such statistics has been increasing. To meet the demand, CBS concluded that core information, namely sex, age, marital status, religion, citizenship, place of birth, education, labor force, and fertility, should be collected in the complete enumeration of the 2000 Population Census, but that the questions should be simplified. Detailed information about housing units, which was not very comprehensive in past censuses, will be collected through a household survey.

A population census is governed by Census Law No. 7 of 1960, which stipulates that a population census should be conducted every ten years, in years ending in zero. Planning of a population census is not done by CBS alone but with the participation of other institutions. Census planning is therefore guided by an interdepartmental census committee that gives direction on the general objectives of the census and is in charge of the design of census questionnaires, training manuals, census methodology, the tabulation plan, and data processing. Field tests and general rehearsals are prerequisites before the whole census plan is finalized.

Information collected in the Intercensal Population Survey (Supas) is very similar to the information collected in the sample enumeration of the Population Census, but Supas focuses mainly on fertility and mortality. Similar information is also collected in the Indonesia Demographic and Health Survey (SDKI). SDKI collects information on fertility, mortality, health, and family planning, for example respondent background, birth history, fertility preferences, breastfeeding, family planning, and employment. In addition, some information such as maternal care and the health and immunization of children under five years old is also collected in SDKI. The current SDKI has been expanded to collect information on knowledge of AIDS and maternal mortality, as well as household expenditure and the services available for family planning and health.

Civil registration deals with information about vital events such as births, deaths, and migration, which are reported to the village authorities. However, the quality of the information collected in civil registration is very poor, so it is used basically for comparison purposes.

3. Data Collection

In fact, the use of information technology for data capture began with OMR during the processing of the 1971 Population Census. However, weaknesses of the technology forced CBS to abandon it. Among the main reasons were the high cost in the country of good quality paper that met the manufacturer's requirements for the OMR machine, and the lack of high precision printing facilities in the country (Suharto, 1993). The technology required CBS to have questionnaires printed in other countries such as Australia. Another disadvantage was the requirement to keep the paper neat and clean, which is very difficult for data collectors in the remote areas of Indonesia.

The emergence of electronic data entry stations that key data directly to computer media was another reason why CBS abandoned OMR. In the 1980 Population Census, CBS used a number of microcomputers dedicated as data entry machines in the central office. As microcomputers flourished, the decentralization of computer capabilities allowed the data entry of the 1990 Population Census complete enumeration data to be done in the provincial regional offices. As a result, the total population could be announced by the President of the Republic of Indonesia less than six months after the census date. Besides the decentralization of data entry, the use of communication facilities also contributed to speeding up data processing: data entered in the regional offices was sent to the central office over communication lines and processed further to obtain national level data.

On the other hand, since the sample enumeration data was more complex than the complete enumeration data, entry of the sample enumeration data was performed in the central office and some selected provincial regional offices. By decentralizing data entry capabilities, the processing time of the 1990 Population Census was shorter than that of previous censuses.

Data entry work was also distributed to regional offices in processing the 1995 Intercensal Population Survey. Using a data entry program prepared under the ISSA system, provincial regional offices entered data from questionnaires directly onto computer media. This capability allowed CBS to finish entering data from 260,000 documents in less than six months.

However, even though decentralizing data entry has reduced entry time, transferring data onto computer media is still a critical step in processing. This drives CBS to keep assessing the possibility of using new data capture systems. Fortunately, current OMR/OCR facilities have improved considerably compared with their capability in the 1970s. One advantage is that special paper, which was mandatory with the older machines, is no longer required.

In line with this improvement, CBS, with the assistance of JICA, is currently studying the possibility of using OCR/OMR facilities in the 2000 Population Census. One part of the study is the procurement of an OCR set, which includes OCR software and a scanner. One OCR software package being considered is NCS ACCRA. Statistics Canada has redesigned some of its capture, verification, and report processing using NCS ACCRA. This system has shown its capability in document scanning, imaging, and recognition processing for millions of statistical source documents. The recognition technologies that have been implemented include Optical Character Recognition (OCR), Intelligent Character Recognition (ICR), bar codes, and multiple choice marks (Bookbinder, 1996). The procurement of this OCR/OMR set is expected to be completed at the end of this year, and the study of its capabilities can start in January 1998. The study of this OCR/OMR set is one of the applications proposed for inclusion in the pilot project (Agenda 7). More detail about this proposal will be given in a separate paper.

Following this study, for the 2000 Population Census all Provincial and Kabupaten regional offices, in addition to the head office, will be equipped with OMR/OCR sets. To capture the data of about 200 million people in three months, it is calculated that about 531 machines will need to be installed.
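The sizing above can be checked with a short calculation. The sketch below is not the method CBS used, which the paper does not describe. It assumes one record per person and takes three months as 90 calendar days; the function names and the per-machine throughput are illustrative assumptions.

```python
import math


def implied_daily_throughput(total_records: int, days: int, machines: int) -> float:
    """Records each machine must capture per day to finish on time."""
    return total_records / (machines * days)


def machines_needed(total_records: int, days: int, records_per_machine_per_day: float) -> int:
    """Smallest whole number of machines that can capture total_records in the given days."""
    capacity_per_machine = days * records_per_machine_per_day
    return math.ceil(total_records / capacity_per_machine)


# Figures from the text: about 200 million records, three months, about 531 machines.
# Assumption: three months = 90 calendar days, one record per person.
rate = implied_daily_throughput(200_000_000, 90, 531)
print(round(rate))                                # about 4,185 records per machine per day
print(machines_needed(200_000_000, 90, 4185))     # back to 531 machines
```

Under these assumptions, each machine would need to sustain roughly 4,185 records per day; a different working calendar or record definition would change the figure.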

Data collection does not deal only with data capture; it also includes the process of collecting data from respondents. Here, the application of IT in the form of GIS has indirectly made the task of data collectors easier. CBS has been using GIS for more than three years. Digital sketch maps, which are easily produced with GIS, are very useful for data collectors when they are in the field. In the previous census, data collectors had to draw maps from scratch before they started collecting data, whereas the availability of GIS allows collectors to use printed maps and make corrections when needed.

Up to now, questionnaires have always been mandatory in the data collection stage; for that reason, questionnaires should be designed carefully. Software with color or graphics capabilities, especially desktop publishing software, makes the appearance of the questionnaire much better.

4. Data Processing

The aim of data processing is to edit and clean raw data before they are tabulated. Processing of population data was originally carried out on the mainframe computer; with this method, data was edited using validation programs and tabulated using tabulation programs. Concor programs for data editing and imputation and Cocents programs were heavily used at the processing stage. Thanks are due to the US Bureau of the Census for providing these systems for public use. The Cocents programs have contributed to the successful processing of all rounds of the population census and the Intercensal Population Survey since 1971.
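The edit-then-tabulate flow described above can be illustrated with a small sketch. This is not Concor or Cocents code; the record fields, the code values, the edit rules, and the function names are all hypothetical, chosen only to show a range check, a consistency check, and a cross tabulation.

```python
from collections import Counter

# Hypothetical person records. Assumed coding: sex 1 = male, 2 = female;
# marital_status 1 = never married, 2 = married.
records = [
    {"sex": 1, "age": 34, "marital_status": 2},
    {"sex": 2, "age": 7, "marital_status": 2},    # inconsistent: married child
    {"sex": 3, "age": 51, "marital_status": 1},   # invalid sex code
    {"sex": 2, "age": 29, "marital_status": 1},
]


def edit_errors(rec):
    """Return the list of edit failures for one record (empty if clean)."""
    errors = []
    if rec["sex"] not in (1, 2):                         # range check
        errors.append("sex out of range")
    if not 0 <= rec["age"] <= 120:                       # range check
        errors.append("age out of range")
    if rec["age"] < 10 and rec["marital_status"] != 1:   # consistency check
        errors.append("child recorded as ever married")
    return errors


def tabulate(recs):
    """Cross tabulate records by sex and 10-year age group."""
    table = Counter()
    for rec in recs:
        age_group = rec["age"] // 10 * 10
        table[(rec["sex"], age_group)] += 1
    return table


clean = [r for r in records if not edit_errors(r)]
rejected = [r for r in records if edit_errors(r)]
table = tabulate(clean)
print(len(clean), len(rejected))
print(sorted(table.items()))
```

In a production edit, rejected records would normally be corrected or imputed rather than dropped; the sketch only separates them to keep the example short.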

However, as microcomputer technology and its software improved very fast, CBS was able to work with a wide variety of software in the microcomputer environment. CBS was therefore not strictly dependent on the programs mentioned above; sometimes CBS developed its own custom-made editing and imputation programs or used other public domain systems. The availability of the Integrated Microcomputer Processing System (IMPS) and the Integrated System for Survey Analysis (ISSA) greatly supported CBS in the data processing stages. IMPS, developed by the International Programs Center, US Bureau of the Census, consists of the complete set of modules needed for data processing, namely data dictionary, data entry, edit and imputation, frequency and cross tabulation, and data capture management and control. IMPS has been improved continually to meet the needs of statistical operations. In its new release 4.1, IMPS will contain two new modules: MapView, to view data in the form of thematic maps, and DataSort, to sort data files (Dataline, August 1997).

ISSA, developed by the Institute for Resource Development Inc., was used in processing the 1987, 1991, 1994, and 1997 SDKI data as well as the 1995 Intercensal Population Survey (Supas). By using ISSA in the microcomputer environment, CBS can finish processing about 30,000 SDKI documents in about three months and 260,000 Supas documents in six months.

In some cases, CBS developed its own custom-made editing programs. For example, the editing and imputation programs for the 1990 Population Census were developed by CBS using a COBOL compiler. Several look-up tables were used in the editing stages, and that capability was not available in the IMPS programs.
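A look-up table edit of the kind mentioned above might look like the following sketch. It is not the CBS COBOL program, and the paper does not say what the tables contained; area codes are used here only as an example, and the codes, table contents, and function name are hypothetical.

```python
# Hypothetical look-up table: valid Kabupaten codes within each province code.
VALID_KABUPATEN = {
    "11": {"01", "02", "03"},
    "12": {"01", "02"},
}


def check_area_codes(province: str, kabupaten: str) -> str:
    """Classify a record's area codes against the look-up table."""
    if province not in VALID_KABUPATEN:
        return "unknown province"
    if kabupaten not in VALID_KABUPATEN[province]:
        return "kabupaten not in province"
    return "ok"


print(check_area_codes("11", "02"))   # valid combination
print(check_area_codes("12", "03"))   # Kabupaten code not listed under the province
print(check_area_codes("99", "01"))   # province code not in the table
```

Keeping the valid combinations in a table rather than in program logic means the edit can be updated when codes change without rewriting the checking routine.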

The availability of public domain programs such as IMPS supports the distribution of editing work to regional offices. With the work distributed, questionnaires do not need to be sent to the central office, which saves money and, more importantly, time. In addition, questionnaires can be kept close to the respondents, so when regional office staff need to contact a respondent, they can easily do so.

One impact of IT on CBS's statistical operations that should be noted is the use of networking. Currently, computers at the central office and several regional offices have been furnished with LAN capability, and the remaining regional offices are on the way. Since the computers are also connected nationally through the Internet, cleaned data in the regional offices are sent to the central office over the Internet. With the widespread use of the Internet in Indonesia, some Internet Service Providers (ISPs) allow regional offices to use the Internet at a local charge. An ISP that is used intensively by regional offices is Wasantara Net.

As discussed in the next section, besides sending data to the central office, regional offices also maintain their own home pages on the central office's server.

5. Data Dissemination

CBS realizes that data dissemination is very important, since the final output of a statistical activity reaches its users only if the method of dissemination is correct and effective. This is also true of disseminating population data.

At present, CBS disseminates data using four techniques. The most common technique is printed publication, which has been used for a long time. Progress in the graphics presentation and word processing capabilities of microcomputers has contributed to the good appearance of the publications. Furthermore, subject matter staff are able to prepare the publications themselves. This not only improves the appearance of the publications but also shortens their preparation time.

Better quality printed publications are produced not only by the subject matter staff in the central office but also in the Provincial regional offices and, furthermore, in the Kabupaten regional offices. Yearly statistics can be published by regional offices at every level, better and faster. For that reason, all regional offices are equipped with software suites with various capabilities in addition to modern hardware. To handle these software and hardware facilities, IT human resources are developed continually through formal IT training and through informal development in the form of monthly IT newsletters, monthly IT seminars, and consultancies.

The second technique is to disseminate an image of the publication on computer media. By obtaining the publication in this form, users do not have to retype particular tables if they want to process the data with a computer analysis program.

The third technique is disseminating individual data on computer media. Researchers need this kind of data if they want to analyze the data intensively, beyond the tables given in the printed publications. For that reason, CBS provides a choice of storage media: diskettes, magnetic tapes, optical discs, and ZIP disks.

The fourth technique of data dissemination is the Internet. With this technique, CBS stores population data in the form of home pages on its main server. At present, the address of the CBS home page is http://www.bps.go.id, as shown in appendix-1. From this page, a user can access information about CBS and its statistical information. Statistical information is grouped into Population and employment statistics, as well as other information, namely Social welfare statistics, Wage statistics, Agricultural statistics, Industrial statistics, Mining statistics, Energy statistics, Construction statistics, Foreign trade statistics, Transportation and communication statistics, Price statistics, and National and regional account statistics.

A wide range of information is thus available to Internet users around the world, who can easily access the data directly from wherever they are. That is why so many users around the world make use of this facility. In August 1997, there were 108,945 hits to this home page, an average of 3,514 hits per day. These hits came from Indonesia (45%) and from other countries, including Australia, the US, Singapore, and Japan.

Population data alone received 192 hits in August 1997. The population data includes information on the number of population by province, growth, density, sex ratio, infant mortality rate, total fertility rate, and lifetime migration.

In addition to the home pages prepared by CBS, regional offices are asked to prepare their own home pages, so that the burden of disseminating data can be shared by the regional offices. As seen in appendix-3, users can access the regional offices' home pages. The involvement of regional offices is supported by Internet Service Providers such as Wasantara Net that allow regional offices to access the Internet at a local charge.

Another technique that is being developed is CD-ROM. CD-ROM is becoming more popular for dissemination because it can hold a large amount of data and is very secure compared with other media such as diskettes. That is why CD-ROM will be an ideal solution for disseminating population data in the future. The portability and large capacity of CD-ROM allow a user simply to put the CD-ROM into the drive, after which a program will guide the user in accessing the data. Hence, in this meeting, CBS offers to develop such a system that allows a user to access data on CD-ROM (Agenda 7). More detail about this proposal will be discussed in a separate paper.

6. Geographical Information System

The development of GIS in CBS was supported by the availability of sketch maps of administrative areas and census blocks produced in the population censuses. Village maps and census block (CB) maps were prepared at an early stage of census fieldwork. In the 1971, 1980, and 1990 population censuses, these maps were used not only as the basis for sample selection and for estimating the number of documents needed, but also as guidance for data collectors in carrying out the household listing and the census enumeration. Experience in the fieldwork of the 1980 census showed that many village and CB maps needed to be updated regularly.

CBS started its GIS by digitizing the administrative areas in 1994, and this work was finalized in early 1997. Indonesia's administrative areas are categorized into Provinces, Kabupaten (Regencies/Municipalities), Kecamatan (districts), and Villages. The smallest geographic unit in the GIS is therefore the village. The digitized base maps were mainly based on sketch maps produced in the 1990 Population Census. The sketch maps do not have geographic coordinates, and at this moment the GIS contains administrative areas only. Features such as buildings, roads, and rivers are planned to be included.

The GIS application uses standard administrative codes, which are the keys for joining the spatial data and the attribute statistical data. The population information is stored in a large database, and the spatial data are stored separately. Geographic information (spatial data) is joined with the attribute data (population data) using the administrative area codes as the key. This gives flexibility as to which administrative level can be presented: data can be presented at all administrative levels down to the village level. GIS is useful in descriptive analysis because it provides an easy way to visualize the distribution of data over a particular area.
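The key-based join and the choice of presentation level can be sketched as follows. This is an illustration, not the CBS system: the 10-digit code layout (2 digits for Province, 2 for Kabupaten, 3 for Kecamatan, 3 for village), the sample values, and the function names are assumptions, and polygons are reduced to text labels.

```python
from collections import defaultdict

# Attribute data (population), keyed by a hypothetical village-level code.
population = {
    "1101001001": 1200,
    "1101001002": 800,
    "1101002001": 1500,
    "1102001001": 950,
}

# Spatial data, stored separately; real polygons are replaced by labels here.
boundaries = {
    "1101001001": "polygon A",
    "1101001002": "polygon B",
    "1101002001": "polygon C",
    "1102001001": "polygon D",
}

# Assumed number of leading code digits that identify each administrative level.
LEVEL_DIGITS = {"province": 2, "kabupaten": 4, "kecamatan": 7, "village": 10}


def join_layers(spatial, attributes):
    """Join spatial and attribute data on the shared administrative code."""
    return {code: (shape, attributes[code])
            for code, shape in spatial.items() if code in attributes}


def aggregate(attributes, level):
    """Sum an attribute up to a higher administrative level via the code prefix."""
    totals = defaultdict(int)
    for code, value in attributes.items():
        totals[code[:LEVEL_DIGITS[level]]] += value
    return dict(totals)


joined = join_layers(boundaries, population)
print(joined["1101001001"])
print(aggregate(population, "kabupaten"))
print(aggregate(population, "province"))
```

Because the higher-level codes are prefixes of the village code in this assumed layout, the same attribute table can serve every presentation level without storing separate totals.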

For the 2000 Population Census, the methods of village mapping and the formation of Enumeration Areas (EAs) will be reviewed, considering that the EA map is used not only for census enumeration but also for the construction of sampling frames for future household sample surveys. Ideally, EAs should vary as little as possible from one another in the number of households, and moreover should have clear and permanent boundaries. A large variation in area and household size means that the EAs would not be practical for use as the lowest sampling unit.
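One simple way to quantify how much EAs differ in household counts is the coefficient of variation (standard deviation divided by the mean). The paper does not name a specific measure, so this is only an illustrative choice; the household counts and the function name below are hypothetical.

```python
import statistics


def coefficient_of_variation(household_counts):
    """Population standard deviation of EA household counts divided by their mean."""
    mean = statistics.mean(household_counts)
    return statistics.pstdev(household_counts) / mean


# Hypothetical household counts per EA.
uniform_eas = [100, 100, 100, 100]
uneven_eas = [50, 150]

print(coefficient_of_variation(uniform_eas))   # no variation between EAs
print(coefficient_of_variation(uneven_eas))    # EAs differ widely in size
```

A value near zero indicates EAs of similar size, which is the property the text describes as desirable for a lowest-level sampling unit.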

7. Concluding Remarks

Given the complexity and expense of population statistics, especially a population census where the number of records to be processed is huge, the application of technology, especially information technology, should always be explored. CBS has always explored the possibilities of this technology, and in many cases progress has been achieved. In the end, these achievements will benefit the users of population data.

However, there are still many ways to improve the performance of data collection, processing, and dissemination. In data collection, the need is to speed up data capture. For that reason, OMR/OCR is being examined to see how it can improve the performance of data capture in a big census such as a population census.

In some areas of data processing, there is still a need to improve the way processing work is decentralized so that the available hardware can be used optimally. The dissemination process also needs to be improved, since the technologies offer many possibilities for improvement, whether through CD-ROM, the Internet, or other means.

Reference
Bookbinder, Michael, A Matter of Image: Statistics Canada Boosts Data Accuracy, Government Computer Magazine, November 1996
Suharto, Sam, Innovative Techniques in the 1970 Round of Population and Housing Censuses: Possibilities for the Future, UN Technical Notes, 1993

 