ESCAP Statistics Division
The First Meeting of the Working Party on the Application of New Technology to Population Data
Bangkok, 24-26 September 1997

STAT/WPA.1/3.7
24 September 1997
ENGLISH ONLY

ECONOMIC AND SOCIAL COMMISSION FOR ASIA AND THE PACIFIC

Working Party on Application of New Technology to Population Data
First Meeting
24-26 September 1997
Bangkok

Recent Developments in the Application of Information Technology (IT) to Population Data Collection, Processing and Dissemination
Department of Statistics, Singapore
Contents
1: Introduction
2: Data collection
3: Data processing
4: Data dissemination
5: Conclusion

1: INTRODUCTION

The past decade has seen a quantum leap in Information Technology (IT) which, coupled with improved survey methodologies and procedures, will greatly enhance data quality and timeliness as well as reduce manpower needs. IT also opens up new methods of data dissemination, which facilitate analysis and wider circulation.

Responding to the new opportunities available, the Department of Statistics has undertaken a thorough re-examination of its Census and population surveys in order to exploit IT to its fullest potential. This paper discusses Singapore's plans to apply the latest IT for efficient field operations and data processing in the upcoming Census 2000. It begins with a brief description of the technologies used in previous censuses, followed by possible ways to incorporate the latest IT innovations in Census 2000. IT plans in data collection, data processing and data dissemination are then discussed.

1A: Use of Technology in Previous Population Censuses and Surveys

The 1980 Census adopted the traditional method of census-taking and used mainframe computers extensively for data processing and tabulation. The 1990 Census was a bold experiment in data collection methodology. Administrative records were merged through unique identification numbers, reticulation of census districts was computerised and a pre-census household database was created to facilitate information collection from households. An integrated database system was used to capture and update data, while fieldwork was monitored through hand-held computers. In 1995, the mid-decade General Household Survey successfully adopted Computer Assisted Telephone Interviewing (CATI) as the main mode of data collection from households.

Census 2000 will most likely exploit IT innovations and cutting-edge technologies in data collection and processing. Several developments now under way make this possible. Firstly, the PC penetration rate has reached about one-third of total households, many of which have Internet access. With the government's policy of increasing IT literacy and promoting the use of the Internet in schools, PC and Internet penetration levels are likely to be much higher by 2000. Secondly, in line with Singapore's vision of transforming the Republic into "an intelligent island" by the year 2000, an island-wide multimedia broadband network will be in place by 1998. This nation-wide computer network will give Singaporeans access to a wide range of services like high-speed Internet access, teleshopping, video-conferencing, entertainment-on-demand, electronic libraries and government services from the comfort of their homes.

1B: Approach to Census 2000

The starting point of Census 2000 will be the HR (Household Registration) Database, which is maintained by the Department of Statistics (DOS). This database contains basic particulars of all citizens and permanent residents in Singapore. It is regularly updated with records from various administrative sources. Basic personal and some socio-demographic information on individuals will therefore be available for Census 2000. However, data items that are not available in any government source (e.g. occupation and transport mode) need to be collected during the census.

By further merging the HR Database with telephone numbers and foreigners' data from the respective authorities, a pre-census database will be formed. This will act as a live database, continuously updated as households respond through the various data collection modes, including Internet submission, CATI, CAPI and mail or fax-back methods.
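The merging and live updating described above can be pictured with a minimal sketch. It assumes records keyed on a unique person identifier and a telephone list keyed on household; the field names (`person_id`, `household_id`, `responded_via`) and the function names are hypothetical and are not taken from the Department's actual system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PersonRecord:
    person_id: str                       # hypothetical unique identification number
    household_id: str
    name: str
    telephone: Optional[str] = None      # None when the household has no listed number
    responded_via: Optional[str] = None  # e.g. "internet", "cati", "capi", "mail"
    answers: dict = field(default_factory=dict)


def build_pre_census_db(hr_records, phone_by_household):
    """Merge HR records with telephone numbers into one keyed database."""
    db = {}
    for rec in hr_records:
        db[rec["person_id"]] = PersonRecord(
            person_id=rec["person_id"],
            household_id=rec["household_id"],
            name=rec["name"],
            telephone=phone_by_household.get(rec["household_id"]),
        )
    return db


def record_response(db, person_id, mode, answers):
    """Update the live database when a response arrives through any mode."""
    person = db[person_id]
    person.responded_via = mode
    person.answers.update(answers)


def outstanding(db):
    """Return the IDs of persons who have not yet responded through any mode."""
    return sorted(pid for pid, p in db.items() if p.responded_via is None)
```

Because every collection mode writes to the same keyed store, a list of outstanding cases can be drawn at any time regardless of how earlier responses arrived.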

2: DATA COLLECTION

Singapore can no longer afford to collect data using the traditional approach of full fieldwork enumeration. This is because of the tight labour situation we are facing. The 2000 Census will learn from and advance the experiences of the 1990 Census and the 1995 mini-Census (General Household Survey) data collection methodologies.

The 1990 Census adopted the pre-census database approach and collected other data through field enumeration. The 1995 mini-Census exploited IT further. Not only were records of individuals extracted from administrative databases, they were channelled to a Computer-Assisted Telephone Interviewing (CATI) system. The 1995 mini-Census is believed to be the first large-scale survey in the region to be conducted with the help of computers and telephones. The interviewing process was re-engineered to improve the survey operational efficiency and to protect the privacy of the homes of respondents.

For the Census 2000, relevant data on individuals from various sources, which are merged into the HR Database, will be pre-printed onto Census forms for verification by households. Only new data items or those not available in HR Database require responses from the households. This will result in significant savings in time and effort on the part of enumerators in form filling and on the part of coders and data-entry operators.

2A: CATI

Instead of interviewing and collecting information from the field for the 1995 mini-Census, data were obtained through telephone interviews and entered directly into the computer by the interviewers. Simple editing checks were also built into the system for direct on-line correction or verification with the respondents. The need to verify particulars with the respondents at a later date was greatly minimised.

The CATI system in 1995 was built from scratch, using Microsoft Visual Basic 3.0 (VB), together with Microsoft Access 1.1 as the database engine. Each PC was also fitted with a PhoneQuest card. The PCs were connected on a Novell local area network (LAN) with token ring architecture. Each PC had access to its own set of files, which were stored centrally on the LAN server.

Each interviewer was able to perform multiple tasks: interviewing the respondent on the telephone, entering the data into the computer and, at the same time, correcting obvious errors while still connected to the respondent. This improved the quality of the survey results and reduced the number of re-calls. The CATI system had several other innovative features. The more important ones were:
  • the dialling and scheduling were automated by a computerised system. To dial a household, all the interviewer needed to do was click the dialling button on the screen, and the system searched for the next available telephone number, based on a priority rule. The dialling was done by a PhoneQuest card installed within each PC. If the call was not answered, the system re-scheduled the interview automatically to another session and dialled another household. If the interview could not be completed, CATI allowed interviewers to re-schedule the appointment to a date and time preferred by the respondent.
  • the work allocation was handled fully by the system. Each PC was allocated a fixed number of cases to call each day. This allocation was done by a built-in scheduler. If the workload could not be cleared, the system re-scheduled the remaining cases to the following day, based on allocation rules.
  • it provided streamlined questioning. The system prompted and guided the interviewer to ask relevant questions, based on responses entered earlier. The automatic branching of questions skipped those that did not pertain to certain categories of persons. For example, a full-time student was asked the level of education attending. The system then skipped all questions on economic activity, and prompted the interviewer to ask the question on transport mode to school. This feature was a great help to the interviewers. It also ensured that all relevant data items were answered by the respondents.
  • the interviewer did not need to key in descriptions, other than for occupation and industry items. For all other data items, the interviewer simply selected the appropriate descriptive response from a "pull-down" menu. There was also no need to code, because once a response had been selected, it was automatically coded at the front-end.
  • it considered the language spoken by a household and assigned the case to the appropriate interviewer.
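The streamlined questioning feature listed above can be illustrated with a minimal sketch of automatic branching. The 1995 system was written in Visual Basic; the Python below is only an illustration, and the question names, answer categories and routing rules are hypothetical, modelled on the full-time student example given in the text.

```python
# Hypothetical questionnaire. Each entry pairs a question name with a rule that
# decides, from the answers entered so far, whether the question applies.
QUESTIONS = [
    ("activity_status", lambda a: True),
    ("education_level_attending",
     lambda a: a.get("activity_status") == "full_time_student"),
    ("occupation", lambda a: a.get("activity_status") == "working"),
    ("industry", lambda a: a.get("activity_status") == "working"),
    ("transport_mode",
     lambda a: a.get("activity_status") in ("full_time_student", "working")),
]


def next_question(answers):
    """Return the next applicable unanswered question, or None when done."""
    for name, applies in QUESTIONS:
        if name not in answers and applies(answers):
            return name
    return None
```

For a full-time student, the sketch asks for the level of education attending, skips the occupation and industry questions, and then moves on to transport mode, so no applicable item is left unanswered.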

The CATI method greatly reduced the printing of voluminous questionnaires as well as the time and effort needed for filling in forms and coding. The number of enumerators required and the transport costs were also significantly lower than if the survey had been conducted using the traditional method of fieldwork. One important consideration in using CATI was that close supervision could be exercised to ensure good work, as the interviewers were stationed in the office.

Reports on the number of households and persons interviewed were generated daily to monitor the progress of the survey. Statistics on the duration of each interview indicated a higher yield for the CATI method compared with fieldwork interviewing. Fewer staff were required and the interviewing time was shorter. The time taken to interview a household of four persons was about 20 minutes, a saving of about 33% on the 30 minutes taken by the conventional field method.

For the 2000 Census, further innovative methods and the latest advances in IT will be applied to help in the collection of census data. A tri-modal data collection strategy will be adopted: improved CATI for households with listed telephone numbers, mail-out/mail-in-or-transmit-back for those with unlisted telephone numbers, and Computer-Assisted Personal Interviewing (CAPI) for non-respondents. In addition, Computer-Assisted Self-Interviewing (CASI) will be used to allow the population to enter their information through the electronic superhighway without the intervention of a census enumerator.
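The tri-modal strategy amounts to a simple assignment rule, sketched below. The function name, its parameters and the notion of a closed "first round" are assumptions made for illustration; the paper does not specify when a household is treated as a non-response.

```python
from typing import Optional


def collection_mode(has_listed_phone: bool,
                    responded: bool,
                    first_round_closed: bool) -> Optional[str]:
    """Pick the collection mode for a household under the tri-modal strategy.

    Assumption: a household that has not responded by the end of the first
    round (CATI or mail) is treated as a non-response and followed up by CAPI.
    """
    if responded:
        return None          # nothing further to collect
    if first_round_closed:
        return "CAPI"        # personal interview for non-responses
    return "CATI" if has_listed_phone else "MAIL"
```

A household that submits through the Internet (CASI) would simply be marked as responded before this rule is applied.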

2B: Internet Submission

The Department of Statistics will take advantage of the ever-increasing popularity of the Internet to collect data from households which have Internet access. The households will be supplied with passwords, which will enable them to enter the Department's Census web-site, retrieve their household record and input their individual and household particulars. The data will then be transmitted, via the Internet, to the Census Office to update the live database.

This electronic data interchange (EDI) approach would further alleviate the administrative burden of respondents, reduce manpower required to conduct the Census, improve statistical processing time and further increase the efficiency of internal operations. The Department views this restructuring in data processing methodology as necessary, in the light of new IT developments and the technology push. What is required then is for the Department to position Internet Submission within the coherent system of computer-based tools which have already been developed to increase productivity and efficiency as well as improve data timeliness.

2C: Mail-Out/Mail-In-Or-Transmit Back Electronically

For households with unlisted telephone numbers, the mail-out/mail-back approach can be adopted. Forms with pre-printed personal particulars could be sent out to the households. Apart from households mailing the completed forms back, we are studying several options. These include the use of dedicated digitised fax machines which read in the images and convert them to codes, interactive multimedia, and of course providing a hotline for such households to be interviewed immediately through CATI.

2D: CAPI

The CAPI system could be used in the 2000 Census. The smaller group of enumerators could each be equipped with a notebook computer to enter information on the spot. The interviewing process, including routing and checking, would be guided by the program in the enumerator's computer. This system of computer-assisted personal interviewing allows for the integration of various traditional steps, such as data collection, data entry and data editing, into one interactive cycle. Hence, a clean, machine-readable record will be produced directly after the completion of the interview.

CAPI would also ensure streamlined questioning. The automatic branching into relevant questions would be of tremendous help to the interviewers. Furthermore, it ensures that all relevant data items are answered by the respondents. Selection of appropriate descriptive responses from a "pull-down" menu during interviewing eradicates coding errors later in the data processing stage, as these are automatically coded at the front-end.

3: DATA PROCESSING

Owing to the huge number of documents involved and the considerable amount of manpower time that has to be devoted to handling them, the traditional approach of processing data has to be further improved upon. Further use of IT in data processing would help alleviate the manpower shortage problem and ensure speedy and reliable results.

It is planned for data processing to be undertaken concurrently with the data collection stage, especially for data obtained through the CATI, CAPI and the Internet. These systems would automatically screen for obvious errors, omissions and glaring inconsistencies during the interviewing stage with the respondents, so that these can be corrected on the spot. This process greatly reduces the need for data entry operators during the data processing stage, as evident in the 1995 mini-Census.
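A minimal sketch of such on-the-spot screening follows. The required items, the age limits and the consistency rule are illustrative assumptions only and do not reproduce the edit specifications of the 1995 mini-Census or Census 2000.

```python
REQUIRED_ITEMS = ("age", "sex", "activity_status")  # hypothetical item names


def edit_checks(person):
    """Return a list of problems to resolve while the respondent is on the line."""
    problems = []

    # Omissions: a required item left blank.
    for item in REQUIRED_ITEMS:
        if person.get(item) in (None, ""):
            problems.append(f"missing: {item}")

    age = person.get("age")
    if isinstance(age, int):
        # Obvious errors: a value outside a plausible range (assumed 0 to 120).
        if not 0 <= age <= 120:
            problems.append("age out of range")
        # Glaring inconsistencies between items (assumed minimum working age of 15).
        if age < 15 and person.get("activity_status") == "working":
            problems.append("inconsistent: working but under 15")

    return problems
```

An empty list means the record can be accepted as entered; otherwise the interviewer, or the respondent in the case of Internet submission, is prompted to correct each flagged item before moving on.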

3A: Imaging and Intelligent Character Recognition

Census forms returned through the "mail or transmit-back" approach during Census 2000 would be designed for direct imaging to create electronic documents. Intelligent character recognition will then be used to convert the responses for each data item from an image into character format, which can then be processed by the computer. As much information as possible in the census forms would be converted into computer files with minimum human intervention. In addition to being machine-readable, the census forms would be designed to be "user-friendly".

Optical mark recognition (OMR) can be adopted to capture self-coded responses where the number of possible answers to a question is limited or a large proportion of responses falls into a few categories. Intelligent character recognition (ICR), on the other hand, can capture the remaining write-in responses, e.g. occupation, place of work and industry.

3B: Automatic Coding

The 1990 Census and 1995 mini-Census made use of automatic coding in the first instance to code occupation and industry at detailed levels. This process involves the matching of the name of firm/organisation for industry coding or occupational description for occupation coding with computer data dictionaries. The industry data dictionary contains the names of single-establishment companies as well as multi-establishment companies with at least 20 employees and having the specific five-digit Singapore Standard Industrial Classification (SSIC) codes. Common abbreviations or synonyms of companies' names are added to the dictionary to increase the matching rate.

The occupation data dictionary is created from the Singapore Standard Occupational Classification (SSOC) and contains occupational titles and synonyms, alternative occupational titles and other related terms. The coding system is designed to bypass superfluous words and characters that do not elaborate or explain the job content. The SSOC data dictionary is enhanced as and when new occupational titles and descriptions or new synonyms and abbreviations are encountered, so that the automatic coding rate can be improved for subsequent rounds of matching.
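The dictionary matching described in this section can be sketched as follows. The list of superfluous words, the synonym table and the codes (written here as placeholder values such as "X0001", not real SSOC codes) are all hypothetical; the actual coding system's rules are not given in the paper.

```python
import re

# Hypothetical words that do not describe the job content.
SUPERFLUOUS = {"a", "an", "the", "full", "part", "time"}

# Hypothetical synonyms mapped to the term used in the dictionary.
SYNONYMS = {"cab": "taxi"}

# Hypothetical data dictionary: normalised title -> placeholder code.
DICTIONARY = {
    "primary school teacher": "X0001",
    "taxi driver": "X0002",
}


def normalise(description):
    """Lower-case, strip punctuation, drop superfluous words, apply synonyms."""
    words = re.findall(r"[a-z]+", description.lower())
    kept = [SYNONYMS.get(w, w) for w in words if w not in SUPERFLUOUS]
    return " ".join(kept)


def auto_code(description):
    """Return the matching code, or None if the case must go to manual coding."""
    return DICTIONARY.get(normalise(description))
```

Descriptions that return None are the ones that would be passed on for coding by a human coder, and each new title or synonym added to the tables raises the match rate in later rounds.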

3C: Computer-Assisted Coding

In the 1990 Census and the 1995 mini-Census, occupation or industry descriptions, which could not be automatically coded by the system, were batched for Computer-Assisted Coding (CAC). This involved manual effort in searching for the correct code associated with the descriptive answer. For the coding of industry, the SSIC data dictionary, which contains three fields, namely, activity of company, main product of company and the corresponding SSIC code, was matched with the descriptions of the "product" and the "activity/service" captured. For the coding of occupation, the coders studied the occupation description available to them together with other pertinent information extracted for each working person on the screen. They then referred to a computerised alphabetical index of occupational description through a "pull-down" menu and selected the appropriate response. Once selected, the system stored the 5-digit SSOC code that corresponds to the description.
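The lookup in a computerised alphabetical index can be sketched as a prefix search over a sorted list, as below. The index entries and codes are placeholders invented for illustration, and the use of binary search is an assumption about how such a menu might be filled, not a description of the actual CAC system.

```python
from bisect import bisect_left

# Hypothetical alphabetical index: (occupational description, placeholder code).
INDEX = sorted([
    ("accountant", "X1001"),
    ("accounts clerk", "X1002"),
    ("actor", "X1003"),
    ("baker", "X1004"),
])
TITLES = [title for title, _ in INDEX]


def candidates(prefix, limit=10):
    """Return up to `limit` index entries whose description starts with `prefix`."""
    prefix = prefix.lower()
    start = bisect_left(TITLES, prefix)  # first entry not less than the prefix
    matches = []
    for title, code in INDEX[start:]:
        if not title.startswith(prefix) or len(matches) == limit:
            break
        matches.append((title, code))
    return matches
```

The coder would type the first few letters of a description, choose one entry from the returned list, and the system would store the code attached to it.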

For the 2000 Census, these systems could well serve as prototypes to be further improved upon. The Batch-Editing sub-system to check intra- and inter-record consistencies, the Housekeeping sub-system to check for duplicate records and the Derivation sub-system to derive information not explicitly collected from the households would also be enhanced.
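Two of these sub-systems can be illustrated with a short sketch: a housekeeping check for duplicate records and a derivation of an item not asked directly. The field names are hypothetical, and household size is used only as an example of derived information; the paper does not list the items actually derived.

```python
from collections import Counter


def duplicate_ids(records):
    """Housekeeping: return person IDs that appear in more than one record."""
    counts = Counter(r["person_id"] for r in records)
    return sorted(pid for pid, n in counts.items() if n > 1)


def household_sizes(records):
    """Derivation: count persons per household from the person records."""
    return dict(Counter(r["household_id"] for r in records))
```

In practice the duplicate check would run first, since any duplicated person record would otherwise inflate the derived household size.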

3D: Data Warehousing and Data Mining

The data warehousing concept will be used to manage and store the vast quantity of data efficiently. Data warehousing is a major driver in IT at present. It offers a data storage architecture for collating, processing and managing data from different sources and databases in a single repository, so that analysis can be performed through a user-friendly interface.

With the data warehouse, related data could be grouped into subject matter "data marts" for easy access. Furthermore, data collected from subsequent surveys or administrative sources could be easily matched with Census records for more detailed analysis and comparison. The data warehouse also supports the use of multiple processors in processing the vast volume of data, speeding up the access of Census data significantly.

A new tabulation package, FASTAB, would be used to tabulate the massive amount of information stored in the data warehouse. The Department of Statistics, in collaboration with the Information Technology Institute, is currently developing FASTAB, which offers a user-friendly windows interface to cross-tabulate data fields extremely quickly. In addition, FASTAB provides good presentation of tabulated data and enables automatic transfer of tabulations into the Microsoft Office environment for further manipulation and analysis.
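For readers unfamiliar with cross-tabulation, the operation itself can be sketched in a few lines. This is a generic illustration with hypothetical field names; it says nothing about how FASTAB is implemented or how it achieves its speed.

```python
from collections import Counter


def cross_tab(records, row_field, col_field):
    """Count records for every (row value, column value) combination."""
    return Counter((r[row_field], r[col_field]) for r in records)
```

Each cell of the resulting table is the number of records sharing one combination of the two fields, for example the number of persons of a given sex using a given transport mode. Combinations that never occur simply count as zero.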

Data mining tools will be used during the analysis stage to automate the process of finding key trends and results from the vast volume of data collected in the Census. With the rapid changes in IT, it will be prudent to keep abreast of the latest developments in new tools and programs and to finalise the strategies nearer the end of the data processing stage.

4: DATA DISSEMINATION

All the tabulations generated for Census 2000 will be of postscript quality to be printed on desk-top laser printers for publication in hard copies. This traditional paper publication method will still retain its importance in providing official statistics to a wide range of users. However, advances in IT are providing more opportunities for data dissemination. Census results can be disseminated in other electronic media such as diskettes and CD-ROMs. This will be of particular interest to researchers.

Providing on-line access is a popular method of information dissemination that is gaining greater acceptance. A database containing census data can be created to provide on-line access to interested users. Subscribers of the Time Series Retrieval and Dissemination (TREND) System, a windows-based on-line system developed by the Department of Statistics, are able to obtain time-series data on economic and social topics. Internet users would also be able to access census data through the Statistics Singapore Home Page.

5: CONCLUSION

The Department of Statistics is continually exploring ways to improve survey operations and enhance the quality and timeliness of its products and services. Wherever feasible, IT advances are incorporated to achieve the objectives. Census 2000 will showcase some of the innovative solutions. These include database merging, pre-printing of particulars on census forms and the use of Internet, CATI and CAPI. New IT tools are sought to enhance and strengthen the data collection, processing and dissemination processes, while keeping in perspective the need to moderate cost increases and improve data quality and timeliness.

Department of Statistics
Singapore
September 1997


 