The past decade has seen a quantum leap in
Information Technology (IT) which, coupled with
improved survey methodologies and procedures,
will greatly enhance data quality and timeliness
as well as reduce manpower needs. IT also opens
up new methods of data dissemination, which
facilitate analysis and wider circulation.
Responding to these new opportunities,
the Department of Statistics has undertaken
a thorough re-examination of its Census and
population surveys in order to exploit IT
to its fullest potential. This paper discusses
Singapore's plans to apply the latest IT
for efficient field operations and data processing
in the upcoming Census 2000. It begins with
a brief description of the technologies used
in previous censuses, followed by possible ways
to incorporate the latest IT innovations in
Census 2000. IT plans in data collection, data
processing and data dissemination are then discussed.
1A:
Use of Technology in Previous Population Censuses
and Surveys
The 1980 Census adopted the traditional method
of census-taking and used mainframe computers
extensively for data processing and tabulation.
The 1990 Census was a bold experiment in data
collection methodology. Administrative records
were merged through unique identification numbers,
reticulation of census districts was computerised
and a pre-census household database was created
to facilitate information collection from households.
An integrated database system was used to capture
and update data, while fieldwork was monitored
through hand-held computers. In 1995, the mid-decade
General Household Survey successfully adopted
Computer Assisted Telephone Interviewing (CATI)
as the main mode of data collection from households.
Census 2000 will most likely exploit IT innovations
and cutting-edge technologies in data collection
and processing. Several developments now under way
make this possible.
Firstly, PC ownership has reached
about one-third of all households, many of
which have Internet access. With the government's
policy of increasing IT literacy and promoting
the use of the Internet in schools, PC and Internet
penetration levels are likely to be much higher
by 2000. Secondly, in line with Singapore's
vision of transforming the Republic into "an
intelligent island" by the year 2000, an island-wide
multimedia broadband network will be in place
by 1998. This nation-wide computer network will
give Singaporeans access to a wide range of
services like high-speed Internet access, teleshopping,
video-conferencing, entertainment-on-demand,
electronic libraries and government services
from the comfort of their homes.
1B:
Approach to Census 2000
The starting point of Census 2000 will be
the HR (Household Registration) Database, which
is maintained by the Department of Statistics (DOS). This database contains
basic particulars of all citizens and permanent
residents in Singapore. It is regularly updated
with records from various administrative sources.
Basic personal and some socio-demographic information
on individuals will be available for Census
2000. However, data items that are not available
in any government source (e.g. occupation and
transport mode) need to be collected during
the census.
By further merging the HR Database with telephone
numbers and foreigners' data from the respective
authorities, a pre-census database will be formed.
This will act as a live database that is
continually updated as households respond
through the various data collection modes, including
Internet submission, CATI, CAPI and mail or
fax-back methods.
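The merge-then-update idea can be sketched as follows. This is a minimal illustration only. It assumes that each source is a mapping keyed on the unique identification number and that every response simply overwrites the fields it supplies. The function names (`build_precensus_db`, `record_response`) and all field names are hypothetical, and the paper does not describe the actual database design.

```python
from dataclasses import dataclass, field


@dataclass
class PersonRecord:
    """One row of a hypothetical pre-census database."""
    person_id: str
    household_id: str
    fields: dict = field(default_factory=dict)
    modes: list = field(default_factory=list)  # collection modes that updated this row


def build_precensus_db(hr_records, foreigner_records, phone_numbers):
    """Merge HR and foreigner records by unique ID, then attach telephone numbers.

    hr_records, foreigner_records: dict of person_id -> dict of particulars
        (each dict is assumed to carry a "household_id" key).
    phone_numbers: dict of household_id -> telephone number.
    """
    db = {}
    for source in (hr_records, foreigner_records):
        for pid, particulars in source.items():
            rec = db.setdefault(pid, PersonRecord(pid, particulars["household_id"]))
            rec.fields.update(particulars)
    for rec in db.values():
        phone = phone_numbers.get(rec.household_id)
        if phone is not None:
            rec.fields["telephone"] = phone
    return db


def record_response(db, person_id, mode, answers):
    """Keep the database 'live': apply answers received through any mode."""
    rec = db[person_id]
    rec.fields.update(answers)
    rec.modes.append(mode)
    return rec


db = build_precensus_db(
    {"S001": {"household_id": "H1", "name": "Tan", "age": 34}},
    {"F001": {"household_id": "H1", "name": "Lee", "age": 29}},
    {"H1": "6123 4567"},
)
record_response(db, "S001", "CATI", {"occupation": "teacher"})
record_response(db, "F001", "Internet", {"transport_mode": "MRT"})
```

Keying every source on the same identifier is what allows a response from any collection mode to land on the correct pre-existing record.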
2:
DATA COLLECTION
Singapore can no longer afford to collect
data using the traditional approach of full
fieldwork enumeration, because of the
tight labour situation it faces. The 2000
Census will learn from and build on the
data collection methodologies of the 1990 Census
and the 1995 mini-Census (General Household Survey).
The 1990 Census adopted the pre-census database
approach and collected other data through field
enumeration. The 1995 mini-Census exploited
IT further. Not only were records of individuals
extracted from administrative databases, they
were also channelled to a Computer-Assisted Telephone
Interviewing (CATI) system. The 1995 mini-Census
is believed to be the first large-scale survey
in the region to be conducted with the help
of computers and telephones. The interviewing
process was re-engineered to improve operational
efficiency and to protect the privacy of
respondents' homes.
For Census 2000, relevant data on individuals
from various sources, which are merged into
the HR Database, will be pre-printed onto Census
forms for verification by households. Only new
data items, or those not available in the HR Database,
require responses from the households. This
will result in significant savings in time and
effort for enumerators in form filling, and
for coders and data-entry operators.
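The pre-printing rule works like this: fields already held in the HR Database are printed for the household to verify, and only the remaining fields are left blank for the household to answer. The sketch below is minimal. The field list and the sample record are hypothetical. They were chosen only to echo the text's examples, such as occupation and transport mode, which no government source holds.

```python
# Hypothetical list of census form fields, for illustration only.
FORM_FIELDS = ["name", "sex", "date_of_birth", "address",
               "highest_qualification", "occupation", "transport_mode"]


def prepare_form(hr_record):
    """Split form fields into pre-printed values and items still to collect."""
    preprinted = {f: hr_record[f] for f in FORM_FIELDS
                  if hr_record.get(f) is not None}
    to_collect = [f for f in FORM_FIELDS if f not in preprinted]
    return preprinted, to_collect


record = {"name": "Tan Ah Kow", "sex": "M", "date_of_birth": "1960-05-01",
          "address": "10 Example Road", "highest_qualification": None}
preprinted, to_collect = prepare_form(record)
print("Verify:", preprinted)
print("Ask:", to_collect)
```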
2A: CATI
Instead of interviewing and collecting information
from the field for the 1995 mini-Census, data
were obtained through telephone interviews and
entered directly into the computer by the interviewers.
Simple editing checks were also built into the
system for direct on-line correction or verification
with the respondents. The need to verify particulars
with the respondents at a later date was greatly
minimised.
The CATI system in 1995 was built from scratch,
using Microsoft Visual Basic 3.0 (VB), together
with Microsoft Access 1.1 as the database engine.
Each PC was also fitted with a PhoneQuest card.
The PCs were connected on a Novell local area
network (LAN) with token ring architecture.
Each PC had access to its own set of files,
which were stored centrally on the LAN server.
Each interviewer was able to interview the
respondent on the telephone, enter the data into
the computer and, at the same time, correct obvious
errors while still connected to the respondent.
This improved the quality of the survey results
and reduced the number of call-backs.
The CATI system had several other innovative features.
The more important ones are as follows.
Firstly, dialling and scheduling were automated
by a computerised system. To dial a household,
all the interviewer needed to do was click the
dialling button on the screen; the system then
selected the next available telephone number
according to a priority rule. The dialling was
done by a PhoneQuest card installed in each PC.
If the call was not answered, the system
automatically re-scheduled the interview to
another session and dialled another household.
If the interview could not be completed, CATI
allowed interviewers to re-schedule the
appointment to a date and time preferred by
the respondent.
Secondly, work allocation was handled fully
by the system. Each PC was allocated a fixed
number of cases to call each day. This
allocation was done by a built-in scheduler.
If the workload could not be cleared, the
system re-scheduled the remaining cases to the
following day, based on allocation rules.
Thirdly, the system streamlined questioning.
It prompted and guided the interviewer to ask
relevant questions based on responses entered
earlier. Automatic branching skipped questions
that did not pertain to certain categories of
persons. For example, a full-time student was
asked the level of education being attended.
The system then skipped all questions on
economic activity and prompted the interviewer
to ask about the mode of transport to school.
This feature was a great help to the
interviewers. It also ensured that all relevant
data items were answered by the respondents.
Fourthly, the interviewer did not need to key
in descriptions, other than for the occupation
and industry items. For all other data items,
the interviewer simply selected the appropriate
descriptive response from a "pull-down" menu.
There was also no need for separate coding,
because responses were automatically coded at
the front-end once selected.
Finally, the system took into account the
language spoken by a household and assigned
the case to an appropriate interviewer.
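The sketch below shows how the dialling, re-scheduling, daily allocation and language-matching features might fit together. It is a toy and is not the 1995 system, which was built in Visual Basic and Access. Several details are assumptions. Calling sessions are represented as integers. The "priority rule" is a number where lower is more urgent. Cases are queued separately for each language, so an interviewer is only offered households whose language he or she speaks. All class and function names are hypothetical.

```python
import heapq
from collections import defaultdict


class CallScheduler:
    """Toy CATI call scheduler (hypothetical design).

    Each queued case is keyed by (session, priority, household_id): a case
    becomes eligible once the current session reaches `session`; among
    eligible cases the earliest session is called first, then the lower
    priority value.
    """

    def __init__(self):
        self.queues = defaultdict(list)  # language -> heap of (session, priority, household_id)

    def add_case(self, household_id, language, priority, session=0):
        heapq.heappush(self.queues[language], (session, priority, household_id))

    def next_case(self, languages, current_session):
        """Pop the next eligible case in any language the interviewer speaks."""
        eligible = [(self.queues[lang][0], lang) for lang in languages
                    if self.queues[lang] and self.queues[lang][0][0] <= current_session]
        if not eligible:
            return None
        (_, priority, household_id), lang = min(eligible)
        heapq.heappop(self.queues[lang])
        return household_id, lang, priority

    def no_answer(self, household_id, language, priority, current_session):
        """Unanswered call: re-queue the case for the next calling session."""
        self.add_case(household_id, language, priority, current_session + 1)

    def make_appointment(self, household_id, language, priority, session):
        """Interview not completed: call back in the session the respondent chose."""
        self.add_case(household_id, language, priority, session)


def allocate_day(pool, n_pcs, quota):
    """Give each PC a fixed quota of case IDs; the rest wait for a later day."""
    allocation = {pc: pool[pc * quota:(pc + 1) * quota] for pc in range(n_pcs)}
    return allocation, pool[n_pcs * quota:]


def carry_over(allocation, completed, waiting):
    """Uncleared cases are re-scheduled to the next day, ahead of waiting cases."""
    uncleared = [cid for cases in allocation.values() for cid in cases
                 if cid not in completed]
    return uncleared + waiting


sched = CallScheduler()
sched.add_case("H1", "English", priority=2)
sched.add_case("H2", "Mandarin", priority=1)
sched.add_case("H3", "English", priority=1)
print(sched.next_case(["English"], current_session=0))  # ('H3', 'English', 1)
sched.no_answer("H3", "English", 1, current_session=0)
print(sched.next_case(["English"], current_session=0))  # ('H1', 'English', 2)
```

A heap for each language keeps the "next available number" lookup cheap. The carry-over step puts uncleared cases ahead of new ones, so that no household is starved of calls.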
The CATI method greatly reduced the printing
of voluminous questionnaires, as well as the time
and effort spent filling in forms and coding.
The number of enumerators required and transport
costs were also significantly lower than they
would have been under the traditional method
of fieldwork. Another important advantage of
CATI was that close supervision could
be exercised to ensure good work, as the interviewers
were stationed in the office.
Reports on the number of households and persons
interviewed were generated daily to monitor
the progress of the survey. Statistics on the
duration of each interview indicated a higher
yield for the CATI method than for fieldwork
interviewing. Fewer staff were required and the
interviewing time was shorter for CATI. The
time taken to interview a household of four
persons was about 20 minutes, a significant
saving of 33% on the 30 minutes taken by
the conventional field method.
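The 33% figure follows directly from the two interview durations. The same numbers also indicate the staffing effect. Counting interviewing time alone, an interviewer completes 3 four-person households per hour rather than 2. This assumes that travel between homes and unanswered calls are ignored.

```python
# Interview durations quoted in the text for a four-person household.
cati_minutes = 20
field_minutes = 30

saving = (field_minutes - cati_minutes) / field_minutes
print(f"Time saving per interview: {saving:.0%}")          # 33%

# Assumption: interviewing time only (no travel, no unanswered calls).
cati_per_hour = 60 / cati_minutes     # 3.0 households per interviewer-hour
field_per_hour = 60 / field_minutes   # 2.0 households per interviewer-hour
print(f"Throughput gain: {cati_per_hour / field_per_hour - 1:.0%}")  # 50%
```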
For the 2000 Census, further innovative methods
and the latest advances in IT will be applied to
help in the collection of census data. A tri-modal
data collection strategy will be adopted: improved
CATI for households with listed telephone numbers,
mail-out/mail-in-or-transmit-back for those with
unlisted telephone numbers, and Computer-Assisted
Personal Interviewing (CAPI) for non-respondents.
In addition, Computer-Assisted Self-Interviewing
(CASI) will be used to allow the population to
enter their information through the electronic
superhighway without the intervention of a census
enumerator.
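The tri-modal strategy reduces to a simple routing rule, sketched below under stated assumptions. A household that has already responded, for example by CASI over the Internet, needs no further contact. In the initial stage, listed telephone numbers go to CATI and the rest go to mail-out. In the follow-up stage, any remaining non-respondents are visited under CAPI. The `Household` fields and the mode labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Household:
    household_id: str
    telephone: Optional[str]   # None if no listed number is held
    responded: bool = False    # set once any mode (including CASI) yields a response


def assign_mode(hh: Household, followup_stage: bool) -> Optional[str]:
    """Pick the collection mode for a household (hypothetical rules)."""
    if hh.responded:
        return None            # already enumerated; no further contact
    if followup_stage:
        return "CAPI"          # non-respondents receive a field visit
    if hh.telephone is not None:
        return "CATI"          # listed telephone number
    return "MAIL"              # mail-out/mail-in-or-transmit-back
```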
2B:
Internet Submission
The Department of Statistics would take advantage
of the ever-increasing popularity of the Internet
to collect data from households which have Internet
access. The households will be supplied with
passwords, which will enable them to enter the
Department's Census web-site, retrieve their
household record and input their individual
and household particulars. The data will then
be transmitted, via the Internet, to the Census
Office to update the live database.
This electronic data interchange (EDI) approach
would further alleviate the administrative burden
of respondents, reduce manpower required to
conduct the Census, improve statistical processing
time and further increase the efficiency of
internal operations. The Department views this
restructuring in data processing methodology
as necessary, in the light of new IT developments
and the technology push. What is required, then,
is for the Department to integrate Internet Submission
into the coherent system of computer-based
tools which it has already developed to increase
productivity and efficiency as well as improve
data timeliness.
2C:
Mail-Out/Mail-In-Or-Transmit Back Electronically
For households with unlisted telephone numbers,
the mail-out/mail-back approach can be adopted.
Forms with pre-printed personal particulars
could be sent out to the households. Apart from
households mailing the completed forms back,
we are studying several options. These include
the use of dedicated digitised fax machines
which read in the images and convert them to
codes, interactive multimedia, and of course
providing a hotline for such households to be
interviewed immediately through CATI.
2D: CAPI
The CAPI system could be used in the 2000
Census. A smaller team of enumerators could
each be equipped with a notebook computer to
enter information on the spot. The interviewing
process, including routing and checking, would
be guided by the program in the enumerator's
computer. This system of computer-assisted
personal interviewing allows for the integration
of various traditional steps, such as data collection,
data entry and data editing, into one interactive
cycle. Hence, a clean, machine-readable record
is produced directly upon completion of the
interview.
CAPI would also ensure streamlined questioning.
The automatic branching into relevant questions
would be of tremendous help to the interviewers.
Furthermore, it ensures that all relevant data
items are answered by the respondents. Selection
of appropriate descriptive responses from a
"pull-down" menu during interviewing eliminates
coding errors at the later data processing stage,
as responses are automatically coded at the front-end.
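Automatic branching and front-end coding can be sketched as a routing function plus menu lookups. The routing encodes the example given earlier for the CATI system. A full-time student is asked the level of education being attended, skips economic activity and moves straight to transport to school. The question IDs, code lists and the branch for working persons are hypothetical. They are not the actual census code frames.

```python
# Hypothetical code lists; real census code frames are not given in the text.
ACTIVITY_STATUS = {"1": "Working", "2": "Full-time student", "3": "Others"}
EDUCATION_LEVEL = {"1": "Primary", "2": "Secondary", "3": "Tertiary"}
TRANSPORT_MODE = {"1": "MRT", "2": "Bus", "3": "Car", "4": "Walk"}


def next_question(answers):
    """Return the next question ID given the answers so far, or None when done."""
    status = answers.get("activity_status")
    if status is None:
        return "activity_status"
    if status == "2":                    # full-time student: skip economic activity
        for q in ("education_attending", "transport_to_school"):
            if q not in answers:
                return q
        return None
    if status == "1":                    # working person (hypothetical branch)
        for q in ("occupation", "industry", "transport_to_work"):
            if q not in answers:
                return q
    return None


def select(menu, label):
    """Front-end coding: the chosen menu label is stored as its code."""
    codes = {text: code for code, text in menu.items()}
    return codes[label]


answers = {"activity_status": select(ACTIVITY_STATUS, "Full-time student")}
print(next_question(answers))   # education_attending
answers["education_attending"] = select(EDUCATION_LEVEL, "Tertiary")
print(next_question(answers))   # transport_to_school
```

The interviewer only ever selects a label from a closed menu and the code is stored directly. This is why no separate coding pass is needed for these items.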
3:
DATA PROCESSING
Owing to the huge number of documents involved
and the considerable amount of manpower time
that has to be devoted to handling them, the
traditional approach of processing data has
to be further improved upon. Further use of
IT in data processing would help alleviate the
manpower shortage problem and ensure speedy
and reliable results.
It is planned for data processing to be undertaken
concurrently with the data collection stage,
especially for data obtained through the CATI,
CAPI and the Internet. These systems would automatically
screen for obvious errors, omissions and glaring
inconsistencies during the interviewing stage
with the respondents, so that these can be corrected
on the spot. This process greatly reduces the
need for data entry operators during the data
processing stage, as was evident in the 1995 mini-Census.
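The three kinds of on-the-spot screening named above are omissions, obvious errors and inconsistencies. They can be expressed as small edit rules run after each record is entered, with any failures shown to the interviewer while the respondent is still available. The rules below (the core items, the age range and the marriage-age check) are hypothetical examples and not the Department's actual edit specifications.

```python
def edit_checks(person):
    """Return the edit failures for one person's record (hypothetical rules)."""
    problems = []
    # Omission check: every core item must be answered.
    for item in ("sex", "age", "marital_status"):
        if person.get(item) in (None, ""):
            problems.append(f"missing {item}")
    age = person.get("age")
    # Range check: catch obviously invalid values.
    if age is not None and not 0 <= age <= 120:
        problems.append("age out of range")
    # Consistency check between two items in the same record.
    if age is not None and age < 15 and person.get("marital_status") == "Married":
        problems.append("married person aged under 15")
    return problems


print(edit_checks({"sex": "M", "age": 12, "marital_status": "Married"}))
```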
3A:
Imaging and Intelligent Character Recognition
Census forms returned through the "mail or
transmit-back" approach during Census 2000 would
be designed for direct imaging to create electronic
documents. Intelligent character recognition
will then be used to convert the responses for
each data item from an image into character
format, which can then be processed by the computer.
As much information as possible in the census
forms would be converted into computer files
with minimum human intervention. In addition
to being machine-readable, the census forms
would be designed to be "user-friendly".
Optical mark recognition (OMR) can be adopted to capture
self-coded responses where the number of possible
answers to a question is limited or a large proportion
of responses fall into a few categories. On the other
hand, intelligent character recognition (ICR) can
capture the remaining write-in responses, e.g.
occupation, place of work and industry.
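The decision step of OMR for a single question can be sketched as follows. The scanner is assumed to report, for each answer box, the fraction of dark pixels inside it. A box at or above a threshold counts as marked. A question with no marked box, or with more than one, is routed for manual verification instead of being guessed. The input format and the 0.35 threshold are assumptions for illustration only.

```python
def read_mark(fill_ratios, threshold=0.35):
    """Decide which box in an OMR question was marked.

    fill_ratios: dict of answer code -> fraction of dark pixels in that box
        (hypothetical scanner output).
    Returns (code, None) on a clean read, or (None, reason) when the
    question should be referred for manual verification.
    """
    marked = [code for code, ratio in fill_ratios.items() if ratio >= threshold]
    if not marked:
        return None, "no mark"
    if len(marked) > 1:
        return None, "multiple marks"
    return marked[0], None


print(read_mark({"1": 0.05, "2": 0.62, "3": 0.08}))
```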
3B:
Automatic Coding
The 1990 Census and 1995 mini-Census made
use of automatic coding in the first instance
to code occupation and industry at detailed
levels. This process involves the matching of
the name of firm/organisation for industry coding
or occupational description for occupation coding
with computer data dictionaries. The industry
data dictionary contains the names of single-establishment
companies as well as multi-establishment companies
with at least 20 employees, together with their specific
five-digit Singapore Standard Industrial Classification
(SSIC) codes. Common abbreviations or synonyms
of companies' names are added to the dictionary
to increase the matching rate.
The occupation data dictionary is created from
the Singapore Standard Occupational Classification
(SSOC) and contains occupational titles and
synonyms, alternative occupational titles and
other related terms. The coding system is designed
to bypass superfluous words and characters that
do not elaborate or explain the job content.
The SSOC data dictionary is enhanced as and
when new occupational titles and descriptions
or new synonyms and abbreviations are encountered,
so that the automatic coding rate can be improved
for subsequent rounds of matching.
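The automatic coding step can be sketched as a dictionary lookup preceded by normalisation. The description is lower-cased and stripped of punctuation, synonyms and abbreviations are expanded, and superfluous words are bypassed. What remains is matched exactly against the dictionary, and anything unmatched is referred to Computer-Assisted Coding. The dictionary entries, synonym list, word list and codes are all hypothetical placeholders and are not real SSOC or SSIC codes.

```python
import re

# Illustrative entries only; the codes are placeholders, NOT real SSOC codes.
OCCUPATION_DICT = {
    "secondary school teacher": "OCC-A",
    "taxi driver": "OCC-B",
    "accounts clerk": "OCC-C",
}
# Abbreviations/synonyms mapped to dictionary wording (hypothetical).
SYNONYMS = {"cab": "taxi", "acct": "accounts", "sec": "secondary"}
# Words that do not describe job content and are bypassed (hypothetical).
SUPERFLUOUS = {"a", "an", "the", "i", "am", "as", "job"}


def normalise(description):
    """Lower-case, strip punctuation, expand synonyms, drop superfluous words."""
    words = re.findall(r"[a-z]+", description.lower())
    words = [SYNONYMS.get(w, w) for w in words]
    return " ".join(w for w in words if w not in SUPERFLUOUS)


def auto_code(description, dictionary=OCCUPATION_DICT):
    """Exact match after normalisation; None means refer the case to CAC."""
    return dictionary.get(normalise(description))


print(auto_code("I am a Cab Driver!"))   # OCC-B
```

Adding a newly encountered synonym to `SYNONYMS`, or a new title to the dictionary, raises the match rate for the next round. This mirrors the enhancement cycle described above.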
3C:
Computer-Assisted Coding
In the 1990 Census and the 1995 mini-Census,
occupation or industry descriptions, which could
not be automatically coded by the system, were
batched for Computer-Assisted Coding (CAC).
This involved manual effort in searching for
the correct code associated with the descriptive
answer. For the coding of industry, the SSIC
data dictionary, which contains three fields,
namely, activity of company, main product of
company and the corresponding SSIC code, was
matched with the descriptions of the "product"
and the "activity/service" captured. For the
coding of occupation, the coders studied the
occupation description available to them together
with other pertinent information extracted for
each working person on the screen. They then
referred to a computerised alphabetical index
of occupational description through a "pull-down"
menu and selected the appropriate response.
Once selected, the system stored the 5-digit
SSOC code that corresponds to the description.
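The computerised alphabetical index behind the pull-down menu can be sketched with a sorted list and binary search. As the coder types the start of a description, the menu shows the matching entries in alphabetical order. Once the coder picks an entry, the system stores its code. The index entries and codes below are hypothetical placeholders and not real five-digit SSOC codes.

```python
import bisect

# Alphabetical index of occupational descriptions -> placeholder codes
# (illustrative only; NOT actual SSOC codes).
INDEX = sorted([
    ("accounts clerk", "OCC-C"),
    ("accountant", "OCC-D"),
    ("taxi driver", "OCC-B"),
    ("teacher, primary school", "OCC-E"),
    ("teacher, secondary school", "OCC-A"),
])
TITLES = [title for title, _ in INDEX]


def pull_down(prefix, limit=10):
    """Index entries whose description starts with what the coder has typed."""
    prefix = prefix.lower()
    start = bisect.bisect_left(TITLES, prefix)
    choices = []
    for title, code in INDEX[start:start + limit]:
        if not title.startswith(prefix):
            break
        choices.append((title, code))
    return choices


def select_code(title):
    """Store the code for the description the coder selected."""
    i = bisect.bisect_left(TITLES, title)
    if i < len(TITLES) and TITLES[i] == title:
        return INDEX[i][1]
    raise KeyError(title)


print(pull_down("account"))
```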
For the 2000 Census, these systems could well
serve as prototypes to be further improved upon.
The Batch-Editing sub-system to check intra-
and inter-record consistencies, the Housekeeping
sub-system to check for duplicate records and
the Derivation sub-system to derive information
not explicitly collected from the households
would also be enhanced.
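The housekeeping, batch-editing and derivation sub-systems can each be illustrated with a small function over a batch of person records. The sketch flags duplicate person IDs, applies one inter-record edit and derives household size from the member records. The inter-record edit used here, that each household should have exactly one head, is a hypothetical example rule. The record layout is also assumed.

```python
from collections import Counter, defaultdict


def find_duplicates(records):
    """Housekeeping: person IDs that appear more than once."""
    counts = Counter(r["person_id"] for r in records)
    return sorted(pid for pid, n in counts.items() if n > 1)


def check_one_head(records):
    """Inter-record edit (hypothetical rule): exactly one head per household."""
    heads = Counter(r["household_id"] for r in records if r.get("relationship") == "Head")
    households = {r["household_id"] for r in records}
    return sorted(h for h in households if heads[h] != 1)


def derive_household_size(records):
    """Derivation: household size is not asked directly but counted from members.

    Duplicate person records are counted once.
    """
    members = defaultdict(set)
    for r in records:
        members[r["household_id"]].add(r["person_id"])
    return {hid: len(pids) for hid, pids in members.items()}


recs = [
    {"person_id": "P1", "household_id": "H1", "relationship": "Head"},
    {"person_id": "P2", "household_id": "H1", "relationship": "Spouse"},
    {"person_id": "P2", "household_id": "H1", "relationship": "Spouse"},
    {"person_id": "P3", "household_id": "H2", "relationship": "Son"},
]
print(find_duplicates(recs), check_one_head(recs), derive_household_size(recs))
```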
3D:
Data Warehousing and Data Mining
The data warehousing concept will be used to
manage and store the vast quantity of data efficiently.
Data warehousing is currently a major driver in IT
and offers a data storage architecture
for collating, processing and managing data
from different sources and databases into a
single repository so that analysis can be performed
with a user-friendly interface.
With the data warehouse, related data could
be grouped into subject matter "data marts"
for easy access. Furthermore, data collected
from subsequent surveys or administrative sources
could be easily matched with Census records
for more detailed analysis and comparison. The
data warehouse also supports the use of multiple
processors in processing the vast volume of
data, speeding up the access of Census data
significantly.
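The data-mart and record-matching ideas can be illustrated with Python's built-in `sqlite3` module. It stands in here for whatever warehouse product is actually used, and it does not show the multi-processor aspect. The sketch defines a subject-matter "data mart" as a view over the Census table. It then matches a later survey against Census records by person ID to find people whose transport mode has changed. All table names, columns and rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical Census table held in the warehouse
    CREATE TABLE census (person_id TEXT PRIMARY KEY, age INTEGER,
                         sex TEXT, occupation TEXT, transport_mode TEXT);
    -- A subject-matter 'data mart' exposed as a view
    CREATE VIEW transport_mart AS
        SELECT person_id, age, transport_mode FROM census;
    -- Records from a later (hypothetical) survey, keyed on the same ID
    CREATE TABLE survey_later (person_id TEXT PRIMARY KEY, transport_mode TEXT);
""")
con.executemany("INSERT INTO census VALUES (?, ?, ?, ?, ?)", [
    ("P1", 34, "F", "teacher", "Bus"),
    ("P2", 41, "M", "taxi driver", "Car"),
])
con.executemany("INSERT INTO survey_later VALUES (?, ?)",
                [("P1", "MRT"), ("P2", "Car")])

# Match the later survey with Census records to see who changed mode.
changed = con.execute("""
    SELECT m.person_id, m.transport_mode, s.transport_mode
    FROM transport_mart AS m
    JOIN survey_later AS s ON m.person_id = s.person_id
    WHERE m.transport_mode <> s.transport_mode
""").fetchall()
print(changed)  # [('P1', 'Bus', 'MRT')]
```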
A new tabulation package, FASTAB, would be
used to tabulate the massive amount of information
stored in the data warehouse. The Department
of Statistics, in collaboration with the Information
Technology Institute, is currently developing
FASTAB, which offers a user-friendly Windows
interface to cross-tabulate data fields extremely
quickly. In addition, FASTAB provides good presentation
of tabulated data and enables automatic transfer
of tabulations into the Microsoft Office environment
for further manipulation and analysis.
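The paper does not describe FASTAB's interface, so the sketch below does not attempt to model it. It only illustrates what a two-way cross-tabulation computes: counts for every combination of two variables, with row and column totals. The records and variable names are hypothetical.

```python
from collections import Counter


def crosstab(records, row_var, col_var):
    """Count records for every (row, column) combination, with margins."""
    cells = Counter((r[row_var], r[col_var]) for r in records)
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    table = {r: {c: cells[(r, c)] for c in cols} for r in rows}
    for r in rows:
        table[r]["Total"] = sum(table[r][c] for c in cols)
    table["Total"] = {c: sum(table[r][c] for r in rows) for c in cols}
    table["Total"]["Total"] = sum(cells.values())
    return table


people = [
    {"sex": "F", "transport": "Bus"},
    {"sex": "M", "transport": "Car"},
    {"sex": "F", "transport": "MRT"},
    {"sex": "F", "transport": "Bus"},
]
t = crosstab(people, "sex", "transport")
print(t["Total"])
```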
Data mining tools will be used during the analysis
stage to automate the process of finding key
trends and results from the vast volume of data
collected in the Census. With the rapid changes
in IT, it will be prudent to keep
abreast of the latest developments in new tools
and programs and to finalise the strategies
nearer the end of the data processing stage.
4:
DATA DISSEMINATION
All the tabulations generated for Census
2000 will be of PostScript quality, to be printed
on desk-top laser printers for publication in
hard copies. This traditional paper publication
method will still retain its importance in providing
official statistics to a wide range of users.
However, advances in IT are providing more opportunities
for data dissemination. Census results can be
disseminated in other electronic media such
as diskettes and CD-ROMs. This will be of particular
interest to researchers.
Providing on-line access is a popular method
of information dissemination that is gaining
greater acceptance. A database containing census
data can be created to provide on-line access
to interested users. Subscribers of the Time
Series Retrieval and Dissemination (TREND) System,
which is a Windows-based on-line system developed
by the Department of Statistics, are able to
obtain time-series data on economic and social
topics. Internet users would also be able to
access census data through the Statistics Singapore
Home Page.
5:
CONCLUSION
The Department of Statistics is continually
exploring ways to improve survey operations
and enhance the quality and timeliness of its
products and services. Wherever feasible,
IT advances are incorporated to achieve the
objectives. Census 2000 will showcase some of
the innovative solutions. These include database
merging, pre-printing of particulars on census
forms and the use of Internet, CATI and CAPI.
New IT tools are sought to enhance and strengthen
the data collection, processing and dissemination
processes, while keeping in perspective the
need to moderate cost increases and improve
data quality and timeliness.