| Workshop on
Application of New Information Technology to Population
Data |
| Bangkok,
12-20 October 1999 |
STAT/WNIT/Rep
16 June 2000
ENGLISH ONLY
ECONOMIC AND SOCIAL COMMISSION FOR ASIA AND THE
PACIFIC
Workshop on Application of New Information
Technology to Population Data
12-20 October 1999
Bangkok |
| Report on
the Workshop on Application of New Information
Technology to Population Data |
The designations employed and the presentation
of the material in this report do not imply the
expression of any opinion whatsoever on the part
of the Secretariat of the United Nations concerning
the legal status of any country, territory, city
or area, or of its authorities, or concerning
the delimitation of its frontiers or boundaries.
Mention of any firm, licensed process or product
does not imply endorsement by the United Nations.
This report has been issued without formal editing.
|
| Contents |
Abbreviations
and Descriptions
- ORGANIZATION
OF THE WORKSHOP
- Attendance
- Opening
of the Workshop
- Workshop
arrangements
- Documentation
- INTRODUCTION
TO INFORMATION TECHNOLOGY IN CENSUS OPERATIONS
- Project
RAS/96/P12
- Objective
of the Workshop
- Census
processes
- Technology
applied in recent censuses and surveys
- IT
trends
- Quality
management
- Expectations
for the Workshop
- PAPER
BASED DATA COLLECTION AND CAPTURE
- Optical
Mark Recognition (OMR)
- Demonstration
of Optical Mark Reader (OMR)
- Optical
Character Recognition (OCR)
- OCR
technology for the Indonesian Census
2000
- Demonstration
of OCR cluster
- New
Zealand experience in 1996
- Observations
and recommendations on OCR
- Archiving
of census forms
- NON-PAPER
BASED DATA COLLECTION AND CAPTURE
- Computer
Assisted Telephone Interviewing
- Internet
and CATI in Singapore Census 2000
- Computer
Assisted Personal Interviewing
- IMPLICATIONS
FOR THE GUIDELINES ON THE APPLICATION OF NEW
TECHNOLOGY TO POPULATION DATA COLLECTION AND
CAPTURE
- ADDING
VALUE TO CENSUS DATA THROUGH DATA WAREHOUSING
AND DATA MINING
- DATA
DISSEMINATION
- Implications for the guidelines on the Application of New Information Technology to Population Data Dissemination
- GEOGRAPHIC
INFORMATION SYSTEMS
- Implications
for the guidelines on the Application
of Geo-Positioning Systems and Geographic
Information Systems for Digital Mapping
and Statistical Management
- RECOMMENDATIONS
OF THE WORKSHOP
- General,
IT management
- Data
collection and capture
- Guidelines
- Data
warehousing, databases, data archiving
- Data
Dissemination
- Mapping
and GIS
- Follow
up
Annex I: List
of Participants
Annex
II: Tentative Time Schedule
Annex
III: List of Documents |
| ABBREVIATIONS
AND DESCRIPTIONS |
| AFPS Pro |
|
Comprehensive application
for high-volume forms processing based on
advanced imaging technologies. |
| ArcInfo |
|
Comprehensive GIS software
for a variety of computing environments. |
| ArcView |
|
Desktop mapping and GIS
software. |
| Blaise |
|
A survey data collection
and processing system. |
| CAPI |
|
Computer Assisted Personal
Interviewing. |
| CARS |
|
Classifications and Related
Systems. |
| CATI |
|
Computer Assisted Telephone
Interviewing. |
| CGI |
|
Common Gateway Interface
facilitating dynamic content provision from
web servers to client computers. |
| CSV |
|
"Comma Separated
Value" format. An ASCII file that
is commonly used as an intermediate format
when transferring files between databases
and spreadsheets of different makes.
Values are enclosed in quotation marks and
separated by commas. |
| dpi |
|
dots per inch. |
| EA |
|
Enumeration Area. |
| FLY (fly) |
|
C program that creates
GIF image files on the fly from CGI and
other programs. |
| GIS |
|
Geographic Information
System. |
| GPS |
|
Global Positioning System. |
| HTML |
|
HyperText Markup Language. |
| ICR |
|
Intelligent Character
Recognition. |
| IMPS |
|
Integrated Microcomputer
Processing System. |
| IT |
|
Information technology. |
| KFI |
|
Keying-from-image. |
| KFP |
|
Keying-from-paper. |
| LAN |
|
Local Area Network. |
| MapInfo |
|
Software product for mapping,
data visualization and GIS. |
| NCS Nestor Reader |
|
Development tool for building
forms processing or automatic data capture/entry
applications. |
| NSO(s) |
|
National Statistical Office(s). |
| OCR |
|
Optical Character Recognition. |
| OLAP |
|
Online Analytical Processing. |
| OMR |
|
Optical Mark Recognition/Reader. |
| PC |
|
Personal computer. |
| PDF |
|
Portable Document Format. |
| PopMap |
|
Integrated geographical
software providing maps and a graphics database. |
| PQM |
|
Process Quality Management. |
| SAS |
|
Statistical Analysis System. |
| SIAP |
|
Statistical Institute
for Asia and the Pacific. |
| SPSS |
|
Statistical Package for the
Social Sciences. |
| SQL |
|
Structured Query Language. |
| SQM |
|
Statistical Quality Management. |
| SuperCROSS |
|
Fast cross-tabulation
software. |
| SuperMAP |
|
Mapping software. |
| TCDC |
|
Technical Cooperation
among Developing Countries. |
| TIFF format |
|
Tagged Image File Format. |
| TREND |
|
Time Series Retrieval
and Dissemination Database. |
| UNFPA |
|
United Nations Population
Fund. |
| UNFPA/CST |
|
United Nations Population
Fund /Country Support Team. |
|
|
|
|
| I.
ORGANIZATION OF THE WORKSHOP |
| A.
Attendance |
| 1. |
The Workshop on Application of New Information
Technology to Population Data, funded by the
United Nations Population Fund (UNFPA) under
the project RAS/96/P12, was held in Bangkok
from 12 to 20 October 1999. It was organized
by the secretariat of the Economic and Social
Commission for Asia and the Pacific of the United
Nations (ESCAP) with active support of the Working
Party on the Application of New Technology to
Population Data.
|
| 2. |
The Workshop was attended by thirty-one participants
from nineteen selected countries/areas in the
Asian and Pacific region: Bangladesh; Fiji;
Hong Kong, China; India; Indonesia; Islamic
Republic of Iran; Kazakhstan; Malaysia; Maldives;
Mongolia; Myanmar; Nepal; Pakistan; Philippines;
Republic of Korea; Samoa; Sri Lanka; Thailand
and Viet Nam.
|
| 3. |
The members of the Working Party,
consisting of nine experts from Australia; Bangladesh;
Indonesia; Japan; Macao, China; New Zealand; Philippines;
Singapore and Thailand; and representatives of
the Statistical Institute for Asia and the Pacific
(SIAP), and UNFPA Country Support Teams for East
Asia, and Central and South Asia participated
as resource persons. Invited private sector
companies also participated as observers and made
presentations. |
| 4. |
The list of participants is
attached as Annex I. |
| B.
Opening of the Workshop |
| 5. |
The Workshop was inaugurated
by Ms Kayoko Mizuta, the Deputy Executive Secretary
of ESCAP. In her opening statement, Ms Mizuta
welcomed the participants and thanked the donor
agency and resource persons for their roles in, and
commitment to, the organization and
funding of the Workshop. She appreciated
the cooperation extended by private sector organizations
to the Workshop. She noted that the Workshop
was one of the outputs of the ESCAP project RAS/96/P12
and that it was organized under the guidance of
the Working Party on the Application of New Technology
to Population Data. Apart from the Workshop,
other major outputs of the Working Party included
three guidelines on (a) population data collection
and capture; (b) modern mapping and GIS; and (c)
population data dissemination. |
| 6. |
In noting the benefits of new
technology to statistical services in the region,
Ms Mizuta emphasized the role information technology
(IT) played in reducing costs of census and survey
operations. While it was not possible to
present the full spectrum of technological innovations
in just one Workshop, she hoped that, by sharing
information and experiences in significant areas
of IT, participants would enrich and further improve
their understanding of new technologies relevant
for their operations. Ms Mizuta closed her opening
statement by highlighting that the Workshop materials
would be made available through the project web
site and by wishing the Workshop success. |
| C.
Workshop arrangements |
| 7. |
The Workshop noted that the
time schedule (see Annex II) prepared by the secretariat
was based on the tentative agenda, and agreed
to proceed accordingly in six modules as follows:
|
| Module |
Organizer |
| 1. |
Introduction to IT in
census operations |
ESCAP secretariat |
| 2. |
Paper based data collection
and capture |
Indonesia and Japan |
| 3. |
Non-paper based data collection
and capture |
Singapore and Australia |
| 4. |
Adding value to census
data through data warehousing and data mining |
ESCAP secretariat |
| 5. |
Data dissemination |
New Zealand |
| 6. |
Geographic information
systems |
Bangladesh |
|
| 8. |
The Workshop acknowledged with
thanks the following presentations and support
by private sector companies: |
| Topic |
|
Presenter |
| 2.3 |
Is OMR technology still
feasible? |
|
DRS Data and Research
Services plc, United Kingdom |
| 2.4 |
Census Success Story:
US Census |
|
Kodak (United States) |
| 2.6 |
Imaging for Census Data
Capture |
|
Kodak Philippines Ltd. |
| 2.8 |
Demonstration of pilot
application in Statistics Indonesia (hardware
support) |
|
Fujitsu, Thailand |
| 2.9 |
Integrated demonstration
on forms |
|
Co-ordinated by Scientific
Digital Business, Thailand |
|
- Forms capture |
|
Kodak |
|
- Forms recognition |
|
Top Image Systems. |
| 4.1 |
Data warehouse implementation
approach and methodology |
|
Unisys Thailand Ltd. |
| 4.2 |
SAS approach and fitness
to data warehouse processes |
|
SAS Institute Pte Ltd,
Bangkok, Thailand |
| 4.3 |
SAS demonstration |
|
SAS Institute Pte Ltd,
Bangkok, Thailand |
| 6.2 |
Production of quality
maps for censuses |
|
Kevron Pty. Ltd, Australia |
|
| D.
Documentation |
| 9. |
The documents presented at the
Workshop are listed in Annex III to the report. |
| II.
INTRODUCTION TO INFORMATION TECHNOLOGY IN CENSUS
OPERATIONS |
|
|
| A.
Project RAS/96/P12 |
|
|
| 10. |
The Workshop noted the extensive
activities and outputs of the UNFPA-funded project
RAS/96/P12, entitled the Application of New Technology
in Population Data Collection, Processing, Dissemination
and Presentation, and its Working Party on Application
of New Technology to Population Data. The
project had been initiated in April 1997 with
the objective of improving the capabilities of
member and associate member countries/areas of
ESCAP in the application of modern information
technology (IT) in population statistics production
and dissemination. |
| 11. |
The Workshop reiterated the
importance of providing valid, reliable and timely
data for developing population policies and programmes.
The application of modern IT would be more important
than ever in achieving that goal. |
| 12. |
It was noted that the ability
to exploit modern IT varied greatly in the region,
but that diversity also offered an opportunity
for intra-regional cooperation. Thus, the
basic thrust of the project was to share the experiences
of NSOs that had made significant progress in
exploiting new technology. At the beginning
of project implementation, a Working Party was
established with experts from nine countries to
identify priorities, to provide guidance in the
systematic application of IT, to consolidate the
experience of the countries and to share those
experiences within the region. |
| 13. |
Since 1997, the Working Party
had met four times to identify and discuss the
topics of principal interest to the project.
Each meeting had focused on one of the technology
areas for which members had contributed a large
number of technical papers. Other project
outputs included self-contained guidelines on
the application of new technology to three important
aspects of census processing, namely (a) population
data collection and capture; (b) mapping and geographic
information systems; and (c) population data dissemination.
The Working Party also guided the implementation
of three pilot projects under RAS/96/P12, one
each by the NSOs of Bangladesh, Indonesia and
Philippines, to test such new technologies.
Each project would produce a report at the Workshop
describing the technologies piloted and experiences
gained. |
| 14. |
The Workshop noted that further
outputs of the project included five newsletters,
a web site containing documents of the Working
Party meetings, an awareness package to promote
effective and efficient utilization of IT in population
census and survey processing, and a survey on
the application of IT within the region. |
|
|
| B.
Objective of the Workshop |
|
|
| 15. |
The participants noted that
the overall objective of the Workshop was to sensitize
participants to the opportunities that modern
information technology provided in population
data operations. Immediate objectives of
the Workshop were (a) to provide information that
would improve the basic understanding of new technologies
relevant to population censuses and surveys; (b)
to discuss advantages and constraints of important
new information technologies; (c) to consider
strategic implications that information technology
would have on the planning, conduct and processing
of population censuses and surveys; and (d) to
facilitate the understanding of the overall role
of new technology in conducting censuses and surveys. |
|
|
| C.
Census processes |
|
|
| 16. |
The Workshop reviewed major
processes and activities associated with the conduct
of censuses or large-scale population surveys.
Three distinct phases were identified. The
pre-enumeration stage included census planning,
census organization, questionnaire design, forms
and manuals drafting, cartography, publicity,
data processing system design and development,
and the conduct of the pilot census. The
census planning entailed obtaining legal and financial
support from the Government, estimating resource
requirements, preparing budgets and scheduling
the event. The census organization established
central and field offices, created national and
regional committees and co-ordinated with other
Government offices. The questionnaire design
required dialogue with potential users and was
a precursor to developing the tabulation plan.
The questionnaire, forms, manuals and the data
processing system were tested during the pilot
census. The enumeration stage included the
recruitment and training of field workers, the
establishing of house listings, the actual enumeration
and the post-enumeration survey. The post-enumeration
stage included the data processing from data capture
to final tabulations, the analysis of results,
the evaluation of the census process, and the
dissemination of reports. |
| 17. |
The Workshop noted that, during
the previous round of censuses, countries of the
region had needed from 3 to 7 years in order to
complete a census programme from the initial planning
stage until the basic results were disseminated. |
|
|
| D.
Technology applied in recent censuses and surveys
|
|
|
| 18. |
The Workshop reviewed the results
of the ESCAP Survey on Application of New Technology
in Population Data Collection, Processing and
Dissemination, conducted in April 1998.
The questionnaire had been sent to 56 national
statistical offices in the Region and 29 responses
were returned. The report was published
as document STAT/WNIT/1 and was made available
to the participants of the Workshop. |
| 19. |
The survey had revealed a broad
infrastructure gap among the countries of the
region. Technologically advanced offices
provided network-connected PCs for every staff
member, including individual e-mail addresses
and instant Internet connections. Offices
with the weakest IT infrastructure had practically
no internal or global network connectivity available
for general use and as many as 15 persons had
to share a PC. |
| 20. |
According to the Survey, on
average it took 17 months from the beginning of
data collection to the tabulation and analysis
of results. In some cases, up to four years
were needed. |
| 21. |
The Workshop noted that technologically
advanced NSOs developed applications in-house
and used IT across all operations. Such
custom-made applications were typically developed
in areas of data scrutiny, data editing, data
estimation and tabulation, whereas data analysis
was usually conducted with commercially available
statistical software packages. Overall,
a significant use was indicated of off-the-shelf
software packages, but there was no significant
difference in the prevalence of brand names between
developed and developing countries. |
|
|
| E.
IT trends |
|
|
| 22. |
The Workshop reviewed recent
trends in information technology and noted that
hardware and software developments produced data
processing systems with ever increasing power,
capacity and complexity which at the same time
had become easier to use and cheaper to acquire. |
| 23. |
Chip processing speeds commonly
available were 400 MHz or better, while RAM sizes
mostly exceeded 32 MB. Together with graphics
accelerators and other technical features, that
configuration translated into substantial processing
power which in turn was a basis for the development
of increasingly capable software systems.
Disk storage systems of 6 GB or more and with
random access times of a few milliseconds came
as standard equipment with current desktop computers
and were sufficient to store the entire census
data files for a medium size country of 100 million
people. Optical storage media with 5 to
18 GB capacities were readily available and could
be used for the long-term storage of census data.
Processing and storage/retrieval speed was no
longer a constraint when scheduling the data processing
operations. Rather, delays caused by slow
human interventions were very often responsible
for the overall processing elapsed time. |
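The storage claim above can be checked with simple arithmetic. The sketch below uses only the two figures given in the paragraph (6 GB of disk and 100 million people); treating 1 GB as 10**9 bytes is an assumption, and the function name is illustrative only.

```python
# Back-of-envelope check of the storage claim: how many bytes per person
# are available if a 6 GB disk holds a census of 100 million people?
# Assumption: 1 GB is counted as 10**9 bytes.
DISK_BYTES = 6 * 10**9
POPULATION = 100_000_000


def bytes_per_person(disk_bytes: int, population: int) -> float:
    """Return the average storage budget per person record, in bytes."""
    return disk_bytes / population


budget = bytes_per_person(DISK_BYTES, POPULATION)
print(f"{budget:.0f} bytes available per person record")
```

Under those assumptions the budget is 60 bytes per person, which is plausible for a compactly coded individual record but leaves little room for uncoded textual responses.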
| 24. |
Various versions of the Microsoft
Windows operating system were currently being
used on a large majority of all desktop computers.
General purpose and dedicated software were widely
available for the Windows platform, some obtainable
at low cost or no cost at all, and sufficed to
manage most data processing tasks at the statistical
office. |
| 25. |
While individual desktop computers
already had substantial and often sufficient
processing power, using local area networks with
a dedicated file server further enhanced the efficiency
of the entire operation by pooling resources,
reducing or eliminating redundancies, and centrally
managing common tasks such as data back-up.
Where infrastructure permitted, wireless communications
were becoming an important tool for the interfacing
between various computer components. The
Internet with features such as e-mail and World
Wide Web had gained importance firstly for the
dissemination of information about the statistical
office and its products and secondly for collecting
data from respondents. |
| 26. |
Thus, virtually all phases of
the census process could benefit from the latest
technologies. Those would include project
planning software, geographic information systems,
paperless data capture methods, scanning with
mark, character and intelligent recognition techniques,
automatic or computer assisted coding and editing
methods, metadata systems, CD/DVD and Internet/World
Wide Web media, etc. |
|
|
| F.
Quality management |
|
|
| 27. |
The Workshop noted that quality
control during all census phases posed a major
challenge from data collection to data validation
and editing, tabulation and dissemination.
Process quality management (PQM) focused on careful
planning and efficient implementation of the census
process, including human resource management and
the management of production means. Statistical
quality management (SQM) related to the management
of the metadata database and the integrity of
the data during the entire process of transformation
from raw data to publishable micro databases and
statistical tables. A better quality of
the end product would assure greater user satisfaction. |
| 28. |
The Workshop noted further that
quality management issues were often underestimated.
The introduction of new technologies could provide
an opportunity to give special consideration to
the application of quality management principles
for the entire census operations. Census
managers were urged to assess each new application
in respect of its potential capability to control
process as well as statistical qualities.
They also needed to assess the impact of the new
technology on non-computerized statistical, management
and administrative processes and organization
structures. However, as each application
could interfere with others, special attention
to interoperability needed to be paid. |
| 29. |
The Workshop considered that
many new technologies might be presented during
the course of the Workshop that would be of interest
to IT management involved in the planning and
processing of the forthcoming census. This
wealth of new information posed another considerable
challenge to IT management who would be required
to select a combination of IT solutions that fits
the existing infrastructure. In that selection
process, IT management should not overlook the
effect those new technology solutions would have
on the ability to maintain or improve both process
and statistical quality management. |
|
|
| G.
Expectations for the Workshop |
|
|
| 30. |
The participants were invited,
based on the agenda and without having yet heard
the presentations, to rate their interest in the
various Workshop topics. Six work groups
were created to deliberate on the question.
The findings for each group were presented to
the other participants. It appeared that
Module 2, paper based data collection and capture,
received the highest interest from participants,
probably due to the proximity for many countries
of the next census date prior to which solutions
needed to be found soon. The respondents
also expressed high interest in the topics of
dissemination and geographic information systems.
However, non-paper based data capture methods and
data warehousing received lower advance interest,
probably because those technologies required sufficiently
developed infrastructure and general technological
advancement which only the most advanced countries
had. |
| 31. |
The Workshop agreed that one
of the important expectations for the 2000 rounds
of censuses was to significantly reduce the time
needed for the entire census process, from planning
to final reporting, by employing some of these
new technologies in the various stages of census
data processing. Also, the final quality
of processed data could be improved by better
quality control throughout the process.
Furthermore, a wider and more targeted audience
could be reached by employing better dissemination
methods utilizing effective application of IT.
Significant quality and timeliness gains could
be achieved by improving data collection and capture
methods and much effort could be spared when preparing
census maps by using Geographic Information Systems.
Finally, where possible, increased use of the
Internet, including the World Wide Web, showed
great promise for more efficient information exchange. |
| 32. |
However, the Workshop emphasized
that individual countries would have to consider
the level of local infrastructure and resource
availability when deciding on the use of any of
the available technologies. The availability
of technical support and maintenance was of crucial
importance to the successful utilization of new
technologies. |
|
|
| III.
PAPER BASED DATA COLLECTION AND CAPTURE |
|
|
| 33. |
The Workshop was presented with
an overview of paper based data collection and
capture technologies. It was noted that
traditional key-to-disk methods were time consuming,
demanded a large quantity of equipment and personnel
and were, due to the human factor, not always
fully reliable. Employing technology-assisted
solutions would improve efficiency, economy and
reliability in the data capture process.
Optical mark and character recognition systems
were well tested, had become increasingly versatile
and reliable, and could therefore significantly
reduce the time needed for data capture and make
subsequent processing more flexible. Particularly
the imaging technology promised improved efficiency
by largely eliminating the need to return at later
processing stages to paper based documents that
were always cumbersome to handle. Experience
showed that keying from image could be more efficient
than keying from paper, which could particularly
benefit the coding and editing tasks. |
|
|
| A.
Optical Mark Recognition (OMR) |
|
|
| 34. |
Based on the example of Japan,
the Workshop received a detailed introduction to
optical mark reader (OMR) technology. The
various hardware components of an OMR system comprised
a feeding unit, a photoelectric conversion unit,
and a recognition control unit. The feeding
unit consisted of a hopper for documents to be
read and several stackers for accepted and rejected
documents. The photoelectric conversion
unit used sensors to convert marks on the document
to electric signals and forwarded the signals
to the image memory. Finally, the recognition
control unit read those images and stored recognized
marks onto a magnetic medium. Marks could
be recognized in "alternative mode", i.e. only
one mark was expected for one question and the
darkest mark was selected if by chance there were
several marks found, and in "bit mode", i.e.,
plural marks were expected for one question and
all recognized marks were stored in file. |
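The difference between the two recognition modes described above can be sketched in a few lines. This is an illustration only, not the logic of any actual OMR device: the darkness scale (0.0 to 1.0), the threshold value and the function names are all assumptions.

```python
from typing import Dict, List, Optional

# Hypothetical darkness threshold (0.0 = blank, 1.0 = fully black).
# The report does not state an actual value.
MARK_THRESHOLD = 0.5


def read_alternative(darkness: Dict[str, float],
                     threshold: float = MARK_THRESHOLD) -> Optional[str]:
    """Alternative mode: one answer expected; keep only the darkest mark."""
    marked = {box: d for box, d in darkness.items() if d >= threshold}
    if not marked:
        return None  # no mark detected for this question
    return max(marked, key=lambda box: marked[box])


def read_bit(darkness: Dict[str, float],
             threshold: float = MARK_THRESHOLD) -> List[str]:
    """Bit mode: several answers allowed; keep every detected mark."""
    return [box for box, d in darkness.items() if d >= threshold]


# Two boxes marked by mistake on a single-answer question.
sex = {"male": 0.9, "female": 0.6}
print(read_alternative(sex))

# A multiple-answer question with two genuine marks.
languages = {"a": 0.8, "b": 0.1, "c": 0.7}
print(read_bit(languages))
```

In alternative mode the stray lighter mark is discarded in favour of the darkest one, while in bit mode both detected marks are retained.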
| 35. |
The Workshop noted the high
quality requirements for OMR forms, which needed
to be carefully designed in order to improve processing
and recognition reliability. Paper and printing
quality had to be high, dropout colours had to
be used for lead text and mark boxes, the shape
and size of the mark boxes had to be carefully
designed and sufficient distance had to be maintained
between the mark boxes. The OMR form needed
also to include timing marks along the aligning
edge in the direction of reading. Finally,
it was important that the mark boxes were completely
filled with a soft black pencil and that wrong
marks were erased completely. Since
OMR forms were designed to be readable by the
equipment, staff designated to handle the forms
needed special training to fully understand the
content. |
| 36. |
The Workshop noted that OMR
equipment had to be tested for reliability and
recognition stability at least three times daily,
namely, before, during and after the operation.
Failing those tests, the equipment needed to be
cleaned, adjusted or repaired, as the case might
be. In addition, the equipment needed to
be cleaned daily by removing paper powder from
the mark and image heads, feeding unit and other
susceptible parts. Normally, a monthly maintenance
service was to be scheduled by the vendor. |
| 37. |
The Workshop agreed that OMR
technology was a reliable and economical choice
for censuses and surveys if the responses could
be pre-coded. However, it acknowledged that
the particular requirements for questionnaire
design and paper and printing quality were the
main drawbacks of the technology. For instance,
enumerators, respondents and editors could have
difficulties in using the questionnaires due to
their highly machine-oriented layout. Therefore
it was necessary to allocate sufficient time and
funds for training the enumerators and the OMR
operating personnel. The Workshop noted
that leasing was one way to reduce cost. |
|
|
| B.
Demonstration of Optical Mark Reader (OMR) |
|
|
| 38. |
Data & Research Services
(DRS) plc, a British company manufacturing OMR
equipment and operating a data capture service
bureau, provided the Workshop with an overview
of OMR products and services and highlighted some
of OMR's advantages and disadvantages compared
with key-to-disk data capture. The Workshop
was informed that OMR was capable of capturing
7,000 forms per hour, a huge improvement over
manual key entry. Optical reading also improved
data quality. It was pointed out that as
data volumes increased the use of OMR became more
economical than key-to-disk data capture, particularly
where predominantly pre-coded tick-box responses
could be used. Some disadvantages of OMR
were mentioned, including the need for specially
designed and accurately printed, and therefore
more costly, questionnaires and the difficulty
of capturing subjective data, i.e. textual responses.
The Workshop heard that OMR would be more efficient
and cheaper than optical character recognition
systems (OCR) as long as the majority of responses
could be pre-coded. |
| 39. |
Recognizing that a census questionnaire
often had to include some textual responses, DRS
had developed a new generation of OMRs that added
an image recognition unit. The captured
images would be stored in a file and could be
viewed by coding and editing operators who would
key-in information from the image, possibly assisted
by a computerized table-lookup system. However,
the bulk of the information would still be captured
using the significantly more efficient mark reading
technology. |
| 40. |
A demonstration of a small-capacity
desktop OMR reading actual Greek census forms
concluded the presentation by DRS, which the Workshop
found most useful. |
|
|
| C.
Optical Character Recognition (OCR) |
|
|
| 41. |
The Workshop noted that in some
contexts the recognition of handwritten numerals
and letters was referred to as Intelligent Character
Recognition (ICR) to distinguish that technology
from the recognition of printed text and numbers.
This report, however, uses the term OCR to
cover all character recognition. |
| 42. |
Kodak (United States) had been
invited to introduce to the Workshop optical character
recognition (OCR) technology as used in the 1990
United States census. The Workshop was informed
that to obtain maximum reliability in the scanning
process, special care had to be taken when designing
and printing the questionnaires. The measures
included the use of non-carbon based ink and dropout
colours. As with OMR forms, the OCR form
design had to be a compromise between maximizing
the ease of use by the enumerators, coders and
editors on the one hand and optimizing the efficiency
of the recognition software on the other.
Experience showed that the best recognition rates
for handwritten responses were achieved at a
scanning resolution of 200 dots per inch (dpi)
or lower; higher resolutions generally worsened
the recognition rates. |
| 43. |
It was explained that the confidence
level of character recognition was user definable
and was dependent on the overall document quality,
i.e. questionnaire design and clarity of
handwritten responses. However, setting
the confidence level too high, e.g. above 90 per
cent, could result in excessive numbers of rejects,
while setting the level much lower could jeopardize
the quality of the output data. The Workshop
noted that one of the major problems in character
recognition was the acceptance of positively but
wrongly identified characters. In consequence,
reduction of the number of "false positives" would
have the most benefit for the overall quality
of the captured data. |
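The trade-off between rejects and false positives can be illustrated with a small test deck for which the true characters are known. The sketch below is hypothetical throughout: the sample values, the confidence scale and the names are assumptions, not figures or interfaces from any recognition product.

```python
from typing import List, NamedTuple, Tuple


class Recognized(NamedTuple):
    guess: str         # character proposed by the recognition engine
    confidence: float  # engine's confidence in the guess, 0.0 to 1.0
    truth: str         # actual character, known only for a test deck


def evaluate(results: List[Recognized], threshold: float) -> Tuple[int, int]:
    """Return (rejects, false_positives) for a given confidence threshold.

    A reject is a character below the threshold, sent for manual keying.
    A false positive is a character accepted by the system but wrong.
    """
    rejects = sum(1 for r in results if r.confidence < threshold)
    false_positives = sum(
        1 for r in results
        if r.confidence >= threshold and r.guess != r.truth
    )
    return rejects, false_positives


# Invented test deck of five handwritten digits.
sample = [
    Recognized("7", 0.98, "7"),
    Recognized("1", 0.93, "7"),  # confidently wrong
    Recognized("4", 0.85, "4"),
    Recognized("9", 0.70, "4"),  # wrong, moderate confidence
    Recognized("3", 0.55, "3"),
]

for threshold in (0.60, 0.80, 0.95):
    print(threshold, evaluate(sample, threshold))
```

On this invented deck, raising the threshold from 0.60 to 0.95 removes both false positives but increases the rejects from one to four, which mirrors the balance described in the paragraph above.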
| 44. |
On a unit cost basis, the economics
of keying-from-paper (KFP) and keying-from-image
(KFI) were compared. With the selected labour
cost the calculations suggested that the break-even
point was at about 400,000 census forms, i.e.,
beyond that number KFI would become more economical.
It was pointed out that KFI might be feasible
even with a lesser number of forms, if improved
data quality at the data capture stage, reduced
costs for the additional processing steps and
increased capture speed resulting in earlier completion
of the entire census process were taken into account. |
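A break-even point of this kind can be reproduced with a simple linear cost model. The report does not give the underlying cost figures, so the model and every number below are assumptions, chosen only so that the result matches the quoted 400,000 forms: KFP is taken to cost a fixed amount per form, while KFI adds a one-off equipment cost but a lower cost per form.

```python
def break_even_forms(kfi_fixed_cost: float,
                     kfp_cost_per_form: float,
                     kfi_cost_per_form: float) -> float:
    """Number of forms at which total KFI cost equals total KFP cost.

    Assumed model (not taken from the report):
        KFP total = kfp_cost_per_form * n
        KFI total = kfi_fixed_cost + kfi_cost_per_form * n
    """
    saving_per_form = kfp_cost_per_form - kfi_cost_per_form
    if saving_per_form <= 0:
        raise ValueError("KFI never breaks even unless it is cheaper per form")
    return kfi_fixed_cost / saving_per_form


# Hypothetical costs in arbitrary currency units.
forms = break_even_forms(kfi_fixed_cost=100_000,
                         kfp_cost_per_form=0.50,
                         kfi_cost_per_form=0.25)
print(f"Break-even at {forms:,.0f} forms")
```

In such a model, a lower labour cost narrows the per-form saving and pushes the break-even point higher, which is why the result depends on the labour cost selected.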
|
|
| OCR
technology for the Indonesian Census 2000 |
|
|
| 45. |
The Workshop was informed about
the background and rationale based on which Indonesia
selected OCR as the data capture method for the
year 2000 census. Major considerations had
been (a) the very large number of forms to be
processed for a population of more than 200 million;
(b) the need to produce small area statistics
based on the many island areas; and (c) the possibility
of publishing basic results within 3 to 6 months.
Helpful in the decision had also been the availability
of external assistance in the form of equipment,
software and expertise. |
| 46. |
The OCR system and the questionnaire
design had been assessed and tuned in several
pilot tests. The changes in the questionnaire
design had improved the recognition results significantly.
Further improvements had been achieved by replacing
the built-in western character set in the recognition
engine with a localized version of the character
map. The local version had been developed
from writing samples submitted by 5,000 different
persons. However, it was eventually decided
that it was better to omit the recognition of
alpha characters and to concentrate on maximizing
the performance of numeric recognition and mark
reading. |
| 47. |
The Workshop was given an overview
of the processing flow of an OCR based system
in Indonesia. The OCR system consisted of
three steps, namely scanning, recognition and
verification. The scanning of questionnaires
produced an image file in TIF format. That
was compared to a template file containing information
about the relative locations of input in the questionnaire.
The resulting digital output file was then submitted
to the verification process in order to produce
a clean data file. |
| 48. |
The Workshop learned about the
issues and principles involved in the OCR questionnaire
design in Indonesia. It was noted that OCR equipment
required less stringent paper quality and printing
accuracy than did OMR. Instead, four rectangular
registration markers were placed near the corners
of the questionnaire page to define the location
of individual fields relative to these registration
markers, thus providing greater tolerance for
misaligned forms being fed through the scanner.
Data fields were placed on the page as boxes of
sufficient size to allow clear handwriting, with
appropriate distance between them to minimize
the risk of misinterpretation.  Depending
on the use, field types could be defined as containing
marks or textual information.  For textual
boxes, it was recommended to print two vertical dots
within each character box to guide
the respondent or enumerator and thus improve
the quality of handwriting.  Standard form-processing
tools could normally be used for developing the
questionnaire. Once the design was complete,
the questionnaire was scanned to produce an image
file that was input to the NCS Nestor Reader editing
function in order to create the above mentioned
master questionnaire in ZDF format. The
questionnaires used for data collection were printed
with dropout colours. |
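The idea of defining field locations relative to registration markers can be sketched as below. This is a simplified illustration, not the Nestor Reader method: it assumes a single detected marker and corrects only for a shift of the page, whereas a real system using four markers could also correct rotation and scale. The field names and coordinates are hypothetical.

```python
from typing import Dict, Tuple

Point = Tuple[int, int]           # (x, y) in pixels on the scanned image
Box = Tuple[int, int, int, int]   # (x, y, width, height)

# Hypothetical template: field boxes measured from the top-left registration marker.
TEMPLATE: Dict[str, Box] = {
    "age": (120, 40, 60, 30),
    "sex_mark": (120, 90, 20, 20),
}


def locate_fields(template: Dict[str, Box], marker: Point) -> Dict[str, Box]:
    """Translate template boxes into image coordinates using the detected marker position."""
    mx, my = marker
    # Because fields are stored relative to the marker, a page fed in slightly
    # off position only changes the marker coordinates, not the template.
    return {name: (x + mx, y + my, w, h) for name, (x, y, w, h) in template.items()}
```

For a page whose marker is detected at (15, 22), the "age" box would be read at (135, 62) with the same width and height.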
| 49. |
The Workshop was given a hands-on
demonstration of developing an OCR questionnaire
using the Visio Technical software. The
form design included text, recognition mark and
check boxes. It was thus shown that the
questionnaire design could be developed by the
user without assistance from the software company.
In contrast, the validation and editing rules
were programmed in Visual Basic and were linked
to the Nestor Reader software, a more difficult
task that perhaps needed assistance from the vendor. |
| 50. |
The Workshop also observed a
practical demonstration of a less powerful but
similar system to the one that Indonesia was planning
to use, showing the scanning and recognition of
characters and marks, and the output of questionnaire
data to a digital file. |
| 51. |
The Workshop heard that Indonesia
was planning to deploy for its 2000 census some
80 OCR systems, consisting of Fujitsu M3099GX scanners,
NCS Nestor Reader 5.0, Visio Technical,
the scanning software Scan All, and Fujitsu PCs.
The systems would be distributed across the country,
allocated to provinces according to their population
size. After the census, those systems would
be allocated for long-term use at smaller regional
offices. The Workshop heard that greater
emphasis would be placed on enumerator training,
particularly on the writing of numbers.
Statistics Indonesia had chosen to use cardboard
boxes for storing and transporting the questionnaires
instead of plastic satchels. The boxes were
designed to serve the dual purpose of better protecting
the forms in the humid climate and providing writing
support for form filling to be done by the enumerator. |
| 52. |
For the Indonesian census, coding
would be done in the office before the forms were
scanned. The Workshop discussed the feasibility
of reversing the sequence, i.e. of subjecting
the forms first to scanning and then only to computer
assisted coding from the scanned images.
It was concluded that the feasibility depended
on the availability of suitably trained staff. |
|
|
| Demonstration
of OCR cluster |
|
|
| 53. |
The Workshop observed a practical
demonstration by Top Image System (TIS) of the
TIS AFPS Pro recognition cluster that used a Kodak
scanner with a controlled station linked to six
Pentium PC stations in the following functions:
(1) processing; (2) tile; (3) completion; (4)
exception handling; (5) archive and export; and
(6) controlling. It noted the flexibility
to inspect recognition results by character (tile
mode) and appreciated the system's simplicity
and efficiency in facilitating the identification
of visibly misinterpreted characters. |
| 54. |
Depending on the overall workload,
the number of computers for each processing step
could be increased or decreased.  Depending
on current workflow conditions, such as bottlenecks,
any computer could be temporarily
or permanently reassigned to another function
in order to keep the overall system performance
well balanced. |
| 55. |
To highlight the efficiency
of the modular approach, the example of the 1997
Turkish Census was cited. In that census,
questionnaires for 62 million people were scanned
and recognized in 30 days, albeit only for a subset
of variables.  The Workshop noted that the
processing time was inversely related to the number of available
scanning and recognition clusters.  It was
informed that TIS had achieved alpha recognition
rates as high as 94 per cent (in Brazil) and 98 per
cent (in Germany), although the latter case involved
less elaborate forms than census questionnaires. |
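The inverse relationship between processing time and the number of clusters can be expressed in a short sketch. It assumes that all clusters work in parallel at the same constant rate; the throughput figure in the usage note is hypothetical and is not the rate achieved in any census mentioned in this report.

```python
def processing_days(total_forms: int, clusters: int, forms_per_cluster_per_day: int) -> int:
    """Whole days needed when identical clusters process forms in parallel."""
    if clusters <= 0 or forms_per_cluster_per_day <= 0:
        raise ValueError("clusters and daily throughput must be positive")
    daily_capacity = clusters * forms_per_cluster_per_day
    # Integer ceiling division: a partly used day still counts as a day.
    return -(-total_forms // daily_capacity)
```

For example, 6,000,000 forms at an assumed 20,000 forms per cluster per day would take 30 days with 10 clusters and 15 days with 20 clusters.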
| 56. |
Improvements in recognition
rates achieved by the TIS software were attributed
to several advanced techniques, including (a)
image enhancement; (b) form identification and
removal (lift-off); (c) use of several recognition
engines with voting algorithms; (d) trainable
recognition algorithms, including local writing
styles; (e) validation function and rules; (f)
automatic coding; and (g) visual inspection in
tile mode. |
| 57. |
The Workshop heard that the
form identification and removal feature eliminated
the need for dropout colours and would significantly
reduce the required storage space. The voting
algorithms would evaluate the results of several
recognition engines and select the best answer
according to pre-defined rules.  The tile
mode would show, for each character from 0 to 9
and A to Z, one at a time, a table containing
all images that had been interpreted as
that character.  That feature provided an
efficient means of visually inspecting all images
at a glance and easily identifying those images
that did not correspond to the character under
review. |
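A voting step over several recognition engines can be sketched as follows. The rule used here (simple majority with a minimum number of agreeing engines, and rejection on a tie) is only one possible pre-defined rule and is an assumption; the report does not describe the rules implemented in the TIS software.

```python
from collections import Counter
from typing import List, Optional


def vote(candidates: List[Optional[str]], min_votes: int = 2) -> Optional[str]:
    """Return the character most engines agree on, or None to send it to manual review.

    Each entry is one engine's answer for the same character image;
    None means that engine gave no answer.
    """
    counts = Counter(c for c in candidates if c is not None)
    if not counts:
        return None
    ranked = counts.most_common(2)
    best, best_votes = ranked[0]
    if best_votes < min_votes:
        return None  # not enough agreement between engines
    if len(ranked) > 1 and ranked[1][1] == best_votes:
        return None  # tie between two different characters
    return best
```

Here `vote(["7", "7", "1"])` returns `"7"`, while `vote(["7", "1", None])` returns `None` and the character would be passed to an operator.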
|
|
| New
Zealand experience in 1996 |
|
|
| 58. |
The Workshop learned that for
the 1996 New Zealand Population Census imaging
and character recognition were used to capture
the data.  Benefits compared with the 1991
census included: results released 5 months earlier;
cost savings for data capture estimated at 9 per
cent; a noticeable reduction in paper handling and
storage (particularly after the capture); and
easier access to forms during coding and editing.
In addition, better quality control was gained,
fewer staff needed to be recruited and trained,
and access to census data for comparison with the
post-enumeration survey was easier. |
| 59. |
The following lessons were learned
from the 1996 New Zealand Population Census use
of imaging and character recognition: (a) systematic
recognition errors for certain characters rendered
biased results; (b) the use of images for coding
and editing was a distinct advantage; (c) more
data validation during data capture would improve
overall data quality; and (d) high-priority variables
could easily be processed first.  The Workshop
was informed that further contracting out of the
data capture process might yield significant long-term
economic benefits and, last but not least, that imaging
should not be used merely as a replacement for traditional
data capture methods; rather, its introduction was an occasion
to re-think the entire census process. |
|
|
| Observations
and recommendations on OCR |
|
|
| 60. |
The Workshop noted that recognition
engines could be expensive and therefore the use
of multiple engines had to be carefully evaluated.
However, it was also recognized that no single
recognition engine would give 100 per cent correct results
in all circumstances and that different engines
had different strengths and weaknesses.
Thus, using several recognition engines with a
voting mechanism could significantly improve the
overall recognition rate. |
| 61. |
The Workshop recommended that
users should demand that competing vendors of
census data capture systems demonstrate that the
promised capabilities of their system would work
under local circumstances, i.e. in the physical
and infrastructure environment of the user as
well as with the specific forms as developed by
the user. |
| 62. |
The Workshop noted that the use of
technologically advanced solutions should not
be an end in itself and that consideration should be given
to local circumstances, e.g. to the constraints
imposed by limited financial, technical and
personnel resources. |
| 63. |
The Workshop also noted that
paper based methods continued to be used for data
collection, particularly when the general public
was filling in the questionnaires. It was
noted that non-response remained one of the main
problems in census taking. |
| 64. |
The Workshop discussed the benefits
and drawbacks of paper based data collection and
capture methods. Considerable interest was
shown in the topic and the following were the
observations by the Workshop: |
|
- improved technology had
helped the census process in many developing
countries;
- operational issues for
data capture had to be considered in conjunction
with the entire survey process;
- the choice between OMR
and OCR/ICR needed careful consideration.
Questions arising were whether alpha recognition
was already well enough proven and whether
scanning of occupation and industry would
be viable;
- a further question was whether imaging-type
data capture was really viable for all countries,
especially the smaller developing countries
in the Pacific with correspondingly small
budgets;
- in the context of censuses,
the simpler OMR technology with maximum utilization
of pre-coded variables could possibly be the
most efficient option;
- the low literacy level
in some countries might prove a problem when
using questionnaires for image scanning;
- the number of different
languages or dialects might prove a potential
problem with image recognition systems;
- the locally available
expertise in handling forms and in interviewing,
and the level of computer literacy, were issues to be
considered;
- the statistically less
developed countries could learn from the experience
of more developed countries that had already
successfully used sophisticated data capture
technologies.
|
| D.
Archiving of census forms |
| 65. |
The Workshop was informed about
an often-overlooked aspect of census data processing,
namely, the long-term archiving of census forms.
It was noted that some countries required census
documents to be discarded immediately while, in
contrast, others had legal stipulations demanding
the retention of original documents for decades
or centuries. The simplest archiving method
would be to store the original questionnaires.
However, transferring images of the forms to a more efficient
storage medium could be considered because of
the significant space and environmental requirements
of paper documents.  Obviously, when scanning
was part of the data capture system, the scanned
images could conveniently be stored on electronic
media (tapes, disks, CD-ROMs). The Workshop noted,
however, that the rapid evolution of storage formats
and hardware could make digitized information of that
kind inaccessible in the long
term.  Therefore, it recommended giving due
consideration to simple, stable and space efficient
microfilm technology as a long-term storage solution
for images. |
|
|
| IV.
NON-PAPER BASED DATA COLLECTION AND CAPTURE |
|
|
| 66. |
The technologies for direct
electronic data capture were becoming an alternative
or at least a complement to the use of paper forms.
The most common non-paper based data capture methods
were computer assisted personal interviewing (CAPI),
computer assisted telephone interviewing (CATI),
and submission of questionnaires through the Internet. |
|
|
| A.
Computer Assisted Telephone Interviewing |
|
|
| Internet
and CATI in Singapore Census 2000 |
|
|
| 67. |
The Workshop heard that the
year 2000 Census in Singapore would mark a significant
step towards a paperless census. The main
technology blocks that the Department of Statistics
was building on were the utilization of available
administrative records (for pre-filled personal
and household information), CATI and Internet
form submission. CATI was expected to be
the main mode of data collection, to be used for
60 to 80 per cent of the households. There
was no precedent for a large scale Internet submission
and therefore it was difficult to estimate its
popular acceptance beforehand. Personal
interviewers would be sent to households that
could not be reached by phone or that did not
submit their response through the Internet.
Their forms would be scanned and OCR/ICR would
be used to capture the results. |
| 68. |
Apart from its advanced technology
solution, the Singapore census was unique in the
sense that most of the data collection would be
through outsourcing and that multiple vendors
would be involved. The Singapore experience
showed that measures were required to prevent
conflicts between different vendors involved in
the census project. The measures included
procedures for keeping all parties informed about
decisions made and progress achieved, establishment
of conflict resolution procedures, and the use
by each vendor of their own servers and their
own licenses for their software. The Workshop
noted that end-users of complex applications were
not in a position to identify the causes of system
problems; for that purpose a separate help desk
was needed. |
| 69. |
The Department of Statistics
of Singapore had previous experience in using
the Internet for data collection, but that was
restricted to the transmission of survey information
from about 1,000 large companies. The year
2000 census would be an exercise of a completely
different scale, and therefore the challenges
were unprecedented. Although the technology
in the submitter's environment was beyond the
data collector's control, the standardization
of browsers and the availability of the Java language
and a secure data transfer protocol made it possible
to use the Internet for large scale data collection.
Post census surveys would be used to verify the
results and to assess possible biases in the various modes
of data capture. |
| 70. |
The Workshop agreed that data
protection was perceived as a major consideration
in the Internet census submission and that major
publicity campaigns were needed to promote that
mode of submission. The Workshop was informed
that although it was always possible that data
might get into the wrong hands during the Internet
submission, that risk was actually rather small.
In fact, it was much easier to eavesdrop on CATI
interviews than to intercept and decrypt secure
data transfers over the Internet.  However,
attacks by hackers on Internet servers were indeed
a major security consideration. The server
side design should include industry standard firewalls
to allow only authorized traffic; it was also
important to implement immediately any security
related patches that were frequently announced
by the suppliers of operating and database management
systems. A key precaution was to minimize
the data holdings on any server that was connected
to the Internet, i.e., to frequently move the
data to a non-connected system. In addition,
rapid response teams should be on stand-by to
identify and tackle any intrusion as soon as it
occurred. |
| 71. |
The management and integration
of the diverse data capture systems required a
well-designed centralized tracking system.
In order to minimize duplicate responses for the
same household, such as a daughter submitting
an Internet response and a father being simultaneously
interviewed by a CATI operator, the progress of
returns by each capture mode needed to be updated
and checked frequently. It was also important
to design the overall system such that a failure
in one capture mode did not bring down the rest
of the operation.  A centralized backup system
that allowed a complete rollback to any point
in time during the previous few days was even
more important than in a conventional database
system.  In any case, returning to the respondent
to repeat the interview should be avoided at all
costs by drawing on available back-up information,
including voice recordings of telephone interviews. |
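The role of a centralized tracking system in preventing duplicate responses can be sketched as below. The class `ResponseTracker`, the household identifiers and the mode names are hypothetical; a production system would use a shared database with transactional updates rather than an in-memory dictionary.

```python
from typing import Dict


class ResponseTracker:
    """Central register of which capture mode has already covered each household."""

    def __init__(self) -> None:
        self._claimed: Dict[str, str] = {}

    def claim(self, household_id: str, mode: str) -> bool:
        """Record a response; return False if the household has already responded."""
        if household_id in self._claimed:
            return False
        self._claimed[household_id] = mode
        return True

    def mode_for(self, household_id: str) -> str:
        """Return the mode that covered the household, or 'outstanding' if none did."""
        return self._claimed.get(household_id, "outstanding")
```

In the example given in the text, the daughter's Internet submission would claim the household first, and the CATI operator's system would then be told that no interview is required.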
| 72. |
The development of a multiple-technology
and multiple-vendor system required excellent
coordination between all partners. The user
acceptance testing had to be rigorous, first for
each system component and then for the whole system
in integrated and simultaneous use in order to
discover design flaws and bugs that required rectification. |
|
|
| B.
Computer Assisted Personal Interviewing |
|
|
| 73. |
The Workshop was also given
an overview of the CAPI system as used by the
Australian Bureau of Statistics. It was
stated that CAPI had the potential for significantly
improving the quality of data and timeliness of
processing. It would also help to achieve
cost effectiveness, particularly if the required
equipment could be utilized for other applications
after the first data collection. |
| 74. |
The improved quality was achieved
through computer-assisted filling of the questionnaire,
thereby avoiding omissions and/or superfluous
responses, while on-line editing would reduce
the number of erroneous responses and permit more
detailed probing through the questionnaire.
Improvements in the timing of data release were
possible due to elimination of a separate data
capture phase (key-to-disk or OMR/OCR scanning),
implementation of a field coding system, use of
on-line derivation of output variables and electronic
data transfer from the enumerator's computer to
a central facility. Cost effectiveness might
be judged by less tangible results such as improved
coding effectiveness, reduced interview time,
streamlined processing, reduced reliance on clerical
procedures and printed material, etc. |
| 75. |
The Workshop noted, however,
that CAPI involved considerable set-up cost for
hardware and application development. The
availability of communications infrastructure
in the field was an important factor in reducing
data transfer times and making the most efficient
use of the expensive equipment. |
| 76. |
The Workshop was given a presentation
of the survey processing system Blaise developed
by Statistics Netherlands. That software
was specifically designed in support of computer-assisted
data capture, i.e. to be used by field enumerators
with a laptop computer or from the office when
interviewing by telephone, but could equally well
be used for key-from-paper data entry operations.
Form-based data entry, complex routing and checking,
interactive coding and data editing, strong data
manipulation and tabulation capabilities, as well
as survey management and export to other statistical
and database formats were features that made the
Blaise software a very useful tool for statistical
offices. The Workshop noted that Blaise
was commercial software but hoped that statistical
offices in developing countries could obtain it
at a lower cost if not free of charge. |
| 77. |
The Workshop drew the conclusion
that CAPI was a very useful technology but would
be less feasible for full scale census operations
until such time that the necessary equipment had
become significantly cheaper, smaller, easily
portable, more robust, and powered with long-lasting
batteries. The Workshop noted further that
non-paper data capture methods such as CATI and
electronic form submission would not be feasible
in many countries due to the insufficiently developed
communications infrastructure. |
| V.
IMPLICATIONS FOR THE GUIDELINES ON THE APPLICATION
OF NEW TECHNOLOGY TO POPULATION DATA COLLECTION
AND CAPTURE |
|
|
| 78. |
At the completion of modules
2 and 3, the Workshop reviewed the draft guidelines
on the Application of New Technology to Population
Data Collection and Capture in the light of the
Workshop proceedings. It was emphasized
that the guidelines were based on voluntary contributions
from the Working Party members and that, given
the urgency of publishing the project outputs, it
was not feasible to cover in the guidelines
all possible aspects related to the application
of IT. |
| 79. |
The coordinator of the guidelines
noted that certain concepts and terminology needed
updating. They included, among others, the
latest in character recognition innovations (multiple-engine
recognition and voting system) and some Internet
data collection and security issues. He
agreed that it would be useful to add information
on lessons learned from some of the unsuccessful
high-tech solutions, and invited contributions
from all participating statistical and census
offices on such experiences. Additional
information on public domain software, and examples
of census and survey forms used in connection
with the latest data capture technologies would
further enhance the guidelines. |
| 80. |
The Workshop identified several
areas where the guidelines could be improved and
requested the Working Party to implement the changes
where possible. Those included the technological
implications arising from the high confidentiality
requirements for census and survey data, quantification
of savings that had been obtained through the
application of new technology, and special training
requirements for each featured technology in the
guidelines. A technology update was required
on the recognition technology involving the combination
of OMR and OCR/ICR technologies. |
| 81. |
The Workshop noted that the
guidelines still lacked an introductory section
that explained their purpose and coverage and,
equally important, what they did not include.
Finally, the Workshop agreed that the guidelines
would be easier to read if the various sections
were structured in a similar fashion. |
|
|
| VI.
ADDING VALUE TO CENSUS DATA THROUGH DATA WAREHOUSING
AND DATA MINING |
|
|
| 82. |
Presentations were made by the
representatives of two local vendors (Unisys and
SAS Institute) that provided data warehousing,
online analytical processing and data mining solutions
for various businesses, including statistical
offices. Although some of the most advanced
statistical offices had been experimenting with
those technologies (e.g. common data dissemination
platform in the Australian Bureau of Statistics),
they were relatively unknown to most Workshop
participants. Therefore, the presentations
and the consequent discussion centred on the key
concepts and terminology, and their differences
from traditional relational databases and analytical
tools. |
| 83. |
Data warehousing technologies
typically involved several separate databases
on various platforms from which data were extracted
and cleansed into a normalized data warehouse.
The Workshop was informed of the analogy between
the evolution of database technology and data
warehousing technology. Relational database modelling
and SQL had changed little since the 1970s.
However, huge improvements to the hardware had
allowed the development of user friendly design
tools for databases to the extent that knowledge
of SQL was no longer needed in order to develop
and run simple database systems. It was
pointed out that data modelling for data warehouses
was still very challenging and laborious and that
design tools needed considerable improvement.
Also, the query times and other performance factors
were not always satisfactory. Nevertheless,
it was expected that data warehousing technology
would go through an evolution similar to that of database
systems, and would eventually become much easier
to implement. |
| 84. |
The Workshop agreed that data
warehousing and related downstream technologies
offered a great potential for integrating data
derived from administrative records, various censuses
and surveys, and for different points of time.
It noted that setting up a full-blown data warehouse
system was not easy and required significant resources
for standardization of concepts and metadata,
for data modelling, for data cleansing and for
the rest of the implementation. Therefore
it was important that the organization was clear
about the business objectives that data warehousing
would help to achieve. Data warehouses were
typically built with a long-term goal in mind
and with scope for future growth. Noting
that the specification and use of a correct data
model was the single most crucial success factor
in the implementation of a data warehouse, the
Workshop strongly recommended the sharing of data
models among statistical offices, rather than
"reinventing the wheel" alone. |
| 85. |
The Workshop noted that data
mining was often related to data warehousing.
It could be implemented within or above the data
warehouse, but also outside and independent of
it. Data mining tools were used without
defining any test hypothesis in advance.
They involved mathematical algorithms that could
reveal hidden interdependencies in the data, thus
producing unexpected results and insights.
Online analytical processing (OLAP) was based
on a more traditional analytical approach with
an advance hypothesis setting. OLAP could
be used in a data warehouse environment.
The Workshop cautioned that an elaborate and nice-looking
interface of an OLAP or data mining tool did not
guarantee that the related data warehouse would
necessarily be implemented properly. In
fact, full-fledged data warehousing systems (top-down
developed systems) were so large and involved
so many different types of tools that there was
no single company offering all required products.
However, there were providers for smaller systems,
data marts, which were designed and developed
from the bottom up. |
| 86. |
The Workshop recommended that
data warehouses be developed in a modular fashion,
keeping the long-term needs in mind: "Start small,
think big". It noted that the Internet was
increasingly used for data transfers in data warehousing
solutions. |
| 87. |
At the end of the module, the
SAS Institute demonstrated an OLAP interface using
a Web browser and Java applets to create user-end
(thin clients) graphics. |
|
|
| VII.
DATA DISSEMINATION |
|
|
| 88. |
The Workshop noted that the
traditional way to disseminate census results
was in the form of tabulations, i.e. a listing
of the number of occurrences for individual or
grouped values of one or more variables.
Census publications usually comprised a set of
core tables presented in hierarchical, geographic
breakdowns and aggregations. Additionally,
they were increasingly complemented by custom
designed tables based on client specifications. |
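A tabulation in the sense of paragraph 88, i.e. a count of occurrences for each combination of values of one or more variables, can be sketched as follows. The records and variable names are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical unit records; a real census file would hold one record per person.
RECORDS = [
    {"province": "A", "sex": "F"},
    {"province": "A", "sex": "M"},
    {"province": "A", "sex": "F"},
    {"province": "B", "sex": "M"},
]


def tabulate(records: Iterable[Dict[str, str]], *variables: str) -> Counter:
    """Count records for each combination of values of the given variables."""
    # Each key is a tuple of values, one per variable, in the order requested.
    return Counter(tuple(record[v] for v in variables) for record in records)
```

`tabulate(RECORDS, "province", "sex")` gives a count of 2 for `("A", "F")`, and `tabulate(RECORDS, "province")` gives the geographic aggregation with 3 records for province A.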
| 89. |
In the past, when a client approached
the statistical office to obtain specific information,
the request was handed to the programming department
where a tailor-made query was developed and run
and the results were verified by a statistician
for correctness and consistency. That was
often a lengthy and costly procedure and was prone
to mistakes when involving both programming and
statistical staff. Today's technology allowed
statisticians to take on the entire task of designing
and producing the output without involving the
programming department, thus reducing significantly
the response time to a client's request.
Also, the risk of misinterpreting the client's
data request was reduced, which otherwise often
resulted in rerunning the job, thus wasting valuable
resources as well as delaying the delivery of
the data to the client. |
| 90. |
Electronic output from a data
extraction phase was best delivered in standard
file formats such as spreadsheet files, comma-separated-value
files (CSV format, usable by database and statistical
standard software) or tab-delimited files (TXT
format, usable by text processing software), so
that they could be used for further processing. |
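Writing the same table in comma-separated and tab-delimited form can be sketched with the Python standard library `csv` module. The helper name `to_delimited` and the sample table are hypothetical.

```python
import csv
import io
from typing import List, Sequence


def to_delimited(header: Sequence[str], rows: List[Sequence[object]],
                 delimiter: str = ",") -> str:
    """Serialize a table as CSV (default) or, with delimiter='\\t', as tab-delimited text."""
    buffer = io.StringIO()
    # The csv module quotes a value only when it contains the delimiter,
    # a quote character or a line break.
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


TABLE_HEADER = ["area", "population"]
TABLE_ROWS = [["North", 1200], ["South, Lower", 850]]
```

Using the library rather than joining strings by hand means that an area name containing a comma is quoted automatically in the CSV output and left unquoted in the tab-delimited output.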
| 91. |
For distribution of electronic
output, several media types were available, each
one having advantages and disadvantages.
Commonly available diskettes were easy and safe
to use and were ideal for storing small files
due to their portability and reliability.
However, large files had to be compressed or split
into smaller sections. It was noted that,
apart from their small capacity, a major disadvantage
of using diskettes was that, once infected, they
were prone to spreading boot-sector viruses. |
| 92. |
The Workshop noted that the
cost of producing individual copies of compact
disks (CD) had not reduced drastically.
Nevertheless, that medium was suitable for storing
or disseminating both large and small amounts
of data and information. |
| 93. |
The Workshop also noted the
benefits of data dissemination by electronic mail;
it was very fast, ideal for transferring
small data sets to users, and efficient when the
same material was disseminated simultaneously
to several recipients.  However, it was not
suitable for very large files. Another drawback,
shared with ordinary mail, was that senders
could not be certain that users had indeed received
the files.  Also, ordinary file attachments
carried a threat of virus infection which mail
gateways and virus protection software could not
always detect. The Workshop heard that at
Statistics New Zealand the use of e-mail for dissemination
purposes had increased from 25 per cent in 1998
to 60 per cent in 1999, with diskette delivery
dropping considerably. |
| 94. |
The Workshop noted that the
World Wide Web was increasingly used by statistical
offices to make globally available information
about the office and its activities as well as
statistical information of major interest
to the general public.  It agreed that the
web offered a great tool and an opportunity for
improving customer relations and public perception
about statistical offices. During the Workshop
the participants had the opportunity to visit
the web sites of various statistical offices and
review the key aspects to be considered when developing
a web site for an NSO. The Workshop agreed
that clarity and ease of use were important design
objectives for a web site. Special consideration
should be given to the fact that many visitors
to the web site had slow Internet connections,
which put restrictions on the use of large files
and graphics intensive designs. |
| 95. |
The Workshop noted that nowadays
many national statistical offices provided the
possibility of retrieving data dynamically through
the Internet. Using standard web browsers
to formulate queries, users were able to obtain
data corresponding to their individual needs.
The implementation of that kind of service required
a relatively high degree of technological know-how
and the implementation of industry standard security
mechanisms to prevent intrusion and to safeguard
the confidentiality of data. The users needed
to have a relatively high degree of familiarity
with the data to ensure they were extracting the
correct variables to satisfy their request. |
| 96. |
The Workshop noted that magnetic
tapes had lost much of their attractiveness as
a storage and dissemination medium. While
they could store substantial amounts of data,
very few of the current microcomputer-based systems
had a tape drive attached. Also, access
to data on tapes was cumbersome and time-consuming. |
| 97. |
The Workshop acknowledged that
hardcopy output still had advantages. It
did not require any technology at all to be used
and therefore could be read anywhere. Full
portability was also achieved if users had the
opportunity to print electronically disseminated
data. The Workshop noted the advantages
of the Portable Document Format (PDF) in that
regard. Disadvantages of hardcopy output
included the facts that the data could not easily
be manipulated or presented in a different form
and that the storage of bulky publications could
cause a problem. The Workshop also noted
that fax machines provided a feasible alternative
for sending small amounts of data to a limited
number of customers. |
| 98. |
The Workshop agreed that users
needed to have access to the information about
data collection methods, sources, definitions,
and terminology used. Additional statements
could be included on the quality of data as well
as sample error tables. Advice on the use
of data with low value cells, subject to sample
errors, might also be given. Disseminated
products should include the terms and conditions
of data supply, explaining to the users how they
could use the data and the rules that governed
the transmission of data to third parties.
The terms and conditions were required to protect
the statistical office from liability should
users misuse the information or
should the data contain errors.
Every distribution of information should include a statement
specifying the confidentiality
provisions that applied to the data. Finally,
the supplied data should always be accompanied
by details on whom to contact for queries. |
| 99. |
While tabulations were the most
condensed format for presenting statistical output,
the Workshop encouraged the use of graphics in
order to make information easier to understand.
In particular, graphs could quickly convey
trends or relationships by visually portraying
the underlying data. Graphs were
used to support written commentary on statistical
results and were ideal for press and media releases. |
| 100. |
However, graphs should be designed
carefully so as not to defeat the principal reason
for their use, namely, clarity of presentation.
Graphs should not be overloaded with information,
should always clearly identify their purpose and
the origin of data, and should identify the variables
included. The Workshop agreed that a key
to good graphical presentation was the selection
of the correct form of graph (single, multiple,
vertical and horizontal bar graphs, line and pie
graphs, two and three dimensional graphs, etc.).
It was noted that graphs could be produced by
commonly available spreadsheet programs (Excel,
Lotus) as well as by general-purpose statistical
software packages (SAS, SPSS) and by some specially
developed data extraction software used at statistical
offices (IMPS, PopGraph, SuperCROSS). |
| 101. |
Another method of portraying
statistical information was thematic mapping.
With the availability of specially developed mapping
software or an industry-strength GIS, statistical
data could be linked to geographic areas and displayed
with great efficiency and clarity. In particular,
thematic maps could show at a glance regional
differences or similarities in indicators
such as population densities, fertility rates,
health service coverage, etc. As in any
data release, the confidentiality of data had to
be maintained, especially in maps covering small
areas. |
| 102. |
The Workshop saw demonstrations
of several software products for mapping and tabulation,
namely PopMap, IMPS, SuperCROSS, SuperSTAR and
SuperMap. In addition, a small group of
Workshop participants used SuperCROSS for constructing
simple cross tabulations from a synthetic database
derived from perturbed data originating from the
New Zealand Population Census. Based on
the demonstrations, the Workshop discussed the
criteria to be considered when selecting tabulation
and mapping software and agreed that important
aspects were: (a) the capability to handle large
data sets; (b) the availability of statistical
calculation functions; (c) the possibility to
compile camera-ready tabulations; (d) the suitability
for dissemination use with newer media such as
CD-ROM or the Internet; and, most importantly,
(e) the user-friendliness of the software and
(f) its cost. In addition, for many countries
the ability to handle non-Latin character sets
was important. |
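The kind of simple cross tabulation constructed in the hands-on session can be sketched in a few lines. This is only an illustration in Python; the function name `cross_tabulate`, the variable names and the unit records are hypothetical and are unrelated to SuperCROSS or to the synthetic New Zealand database used at the Workshop.

```python
from collections import Counter


def cross_tabulate(records, row_var, col_var):
    """Count unit records for each (row category, column category) pair."""
    cells = Counter((rec[row_var], rec[col_var]) for rec in records)
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    # Fill empty cells with zero so that every row has every column.
    table = {r: {c: cells.get((r, c), 0) for c in cols} for r in rows}
    return rows, cols, table


# Hypothetical, made-up unit records (not census data).
sample_records = [
    {"sex": "F", "age_group": "0-14"},
    {"sex": "F", "age_group": "15-64"},
    {"sex": "F", "age_group": "15-64"},
    {"sex": "M", "age_group": "0-14"},
    {"sex": "M", "age_group": "15-64"},
    {"sex": "M", "age_group": "65+"},
]

rows, cols, table = cross_tabulate(sample_records, "sex", "age_group")
for r in rows:
    print(r, [table[r][c] for c in cols])
```

A production tabulation package would add weighting, totals and confidentiality checks on small cells; none of these is attempted here.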
| 103. |
The Workshop agreed that when
evaluating data extraction software packages the
NSOs needed to pay particular attention to the
ability of suppliers to demonstrate that they
(a) could support the software; (b) could provide
training, supply manuals and on-line help files;
and (c) were prepared to let the software be tested
thoroughly within the working environment where
it would be operating. |
| 104. |
In summary, the Workshop emphasized
that the modernization of dissemination methods
and the creation of products for new media were
essential in order to reach a wider audience.
At the same time, new technology allowed the production
and dissemination of information of special interest,
customized for narrow target groups. The
Workshop recognized that ultimately it was the
users of statistics who would determine how the
data were to be presented and the manner in which
they were to be delivered. |
| 105. |
At the end of the module, the
Workshop reviewed an application that was equally
useful in the production and dissemination of
statistics. It learned from Statistics New
Zealand that a Classification and Related Standards
System (CARS) had been implemented with the aim
of providing a centralized storage, maintenance
and access facility for all classification data
used in the input and output systems of the organization.
CARS contained historical classifications, code
files and concordances and information relating
to them; all economic, social and geographic standard
classifications; survey specific classifications;
and all classification categories used for coding
survey data at the input stage and their descriptions
and labels used in the presentation of output
data. |
| 106. |
The implementation of CARS in
Statistics New Zealand had reduced the time and
resources needed in developing new surveys; the
quality of surveys had also improved. In
addition, comparison and analysis of data were
facilitated by retaining concordances. Classification
information stored in CARS was accessible to a
large number of staff in their day-to-day work.
A smaller number of more highly qualified
staff had access to the system for the maintenance
of the information. CARS was particularly
useful in a statistical agency as it standardized
all code files and descriptions used within all
surveys or censuses conducted. The major
advantage was the ability to compare the use of
variables between data sets. For example,
occupations within a labour force survey could
be compared with occupations collected from the
population census. According to the information
available, this was the first agency-wide implementation
of such a system. |
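How a stored concordance makes two collections comparable can be sketched as follows. This is a minimal illustration in Python under stated assumptions: the code values, the mapping `SURVEY_TO_STANDARD` and the function `recode_counts` are hypothetical and do not reflect the actual design of CARS.

```python
from collections import Counter

# Hypothetical concordance: survey-specific occupation codes mapped to
# codes of a standard classification (all values invented for illustration).
SURVEY_TO_STANDARD = {"A1": "1", "A2": "1", "B1": "2"}


def recode_counts(counts, concordance):
    """Re-express counts keyed by source codes in terms of target codes.

    Returns the recoded counts and a sorted list of source codes that
    have no entry in the concordance, so that gaps are not hidden.
    """
    recoded = Counter()
    unmapped = []
    for code, n in counts.items():
        target = concordance.get(code)
        if target is None:
            unmapped.append(code)
        else:
            recoded[target] += n
    return dict(recoded), sorted(unmapped)


survey_counts = {"A1": 10, "A2": 5, "B1": 7, "Z9": 1}
print(recode_counts(survey_counts, SURVEY_TO_STANDARD))
```

Once both data sets are expressed in the same standard codes, their distributions can be compared category by category; reporting unmapped codes separately helps keep the concordance itself under maintenance.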
|
|
| Implications
for the guidelines on the Application of New Information
Technology to Population Data Dissemination |
|
|
| 107. |
At the completion of module
5, the Workshop reviewed the draft guidelines
on the Application of New Information Technology
to Population Data Dissemination in the light
of the proceedings, which were accessible through
the Internet at
http://www.unescap.org/stat/pop-it/pop-it5/meet_5.asp
or
http://www.unescap.org/stat/pop-it/pop-wit/pop-wit.asp |
| 108. |
The Workshop identified several
areas where the guidelines could be improved and
requested the Working Party to implement the changes
where possible. The Workshop agreed that
the guidelines would be easier to read if the
various sections were consistently structured,
and noted also that these guidelines required
an introductory section and disclaimers regarding
the intended coverage, as described in paragraph
81 of this report. |
|
|
| VIII.
GEOGRAPHIC INFORMATION SYSTEMS |
|
|
| 109. |
The Workshop noted that a Geographic
Information System (GIS) was a computerized database
system for storing, manipulating, retrieving,
displaying and printing spatial and non-spatial
geographic data and their attributes. GIS
was especially useful for statistical offices
in the preparation of enumeration area maps and
in illustrating census and survey results through
thematic maps. Several comprehensive GIS
software products such as MapInfo and ArcInfo
were available, with functionality far exceeding
the immediate needs of statistical offices in
developing countries. However, low- or no-cost
software solutions for mapping were also available,
such as PopMap developed and distributed by the
United Nations. |
| 110. |
The Workshop was informed that
to create a GIS database, paper-based maps needed
to be digitized and geo-coded, either manually
or by scanning. The map information could
also be imported from existing map data files.
If maps did not exist, methods such as aerial
photography or remote sensing could be used to
create them. However, often those options
were costly and beyond the means of the statistical
office to implement entirely from its own resources.
The Workshop heard that Global Positioning System
(GPS) receivers, which had recently become popular and affordable
for navigational purposes, could be beneficially
used to create detailed enumeration area maps. |
| 111. |
The Workshop was informed about
the components, features and limitations of GPS.
The GPS was based on 24 operational satellites,
which were orbiting at 20,200 km above ground
and were controlled, monitored and synchronized
from five ground stations. With the help
of a cheap, handheld mobile GPS unit, longitude,
latitude and altitude co-ordinates could be calculated
by receiving signals from at least three satellites.
The system provided an inherent accuracy of 5
metres or better. However, the Workshop
was informed that the launcher of the satellites,
the United States Department of Defense, intentionally
manipulated 1/ the data sent by satellites so that
the actual accuracy in civilian use was no better
than 100 metres. The Workshop heard that
to overcome that limitation the industry had developed
a so-called Differential GPS, which relied on nearby
fixed ground units with which the mobile unit
communicated in order to receive updated corrections
to measurements calculated from the satellite
information. With that correction an accuracy
of better than 2 metres could be achieved, which
should be good enough for any application the
statistical office might have. |
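The principle behind the differential correction can be sketched as follows. This is a deliberately simplified illustration in Python: it applies the correction to the computed position, whereas operational Differential GPS normally corrects the individual satellite range measurements. The names `Fix` and `dgps_correct` and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fix:
    """A position in decimal degrees."""
    lat: float
    lon: float


def dgps_correct(mobile_measured, base_measured, base_known):
    """Apply the base station's observed error to the mobile unit's fix.

    Assumes both receivers are close enough to experience nearly the
    same error at the same moment (a simplification).
    """
    dlat = base_known.lat - base_measured.lat
    dlon = base_known.lon - base_measured.lon
    return Fix(mobile_measured.lat + dlat, mobile_measured.lon + dlon)


# Invented coordinates: the base station's true position is known from a survey.
base_known = Fix(13.7500, 100.5000)
base_measured = Fix(13.7504, 100.4997)    # what the base receiver reports
mobile_measured = Fix(13.7604, 100.5097)  # what the handheld unit reports

print(dgps_correct(mobile_measured, base_measured, base_known))
```

The correction is only as good as the assumption that both receivers see the same error, which is why the fixed ground unit needs to be nearby.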
| 112. |
The Workshop was given an outdoor
demonstration of a handheld GPS (manufactured
by Magellan). Coordinates were continuously
recorded while Workshop participants walked around
the block. Back in the meeting room, the
list of coordinates was transferred from the
GPS unit to a computer and the MSTAR software
by Magellan was used to convert the coordinates
into plots and graphic images. |
| 113. |
The Workshop was also given
a demonstration of the ArcView software, a module
of the ArcInfo GIS. Using a Bangladesh ward
map showing several enumeration areas, various
components were displayed, such as the map viewer, table
display, layout composer, chart tool
and script editor. |
| 114. |
The Workshop heard presentations
of the two pilot projects that were implemented
by Bangladesh and the Philippines as components
of the UNFPA-funded project RAS/96/P12.
The Bangladesh pilot project concentrated on the
use of GPS for the creation and updating of enumeration
area maps. The Philippines pilot project
developed a census operations management system,
called Quick Count, to be used in the 2000 population
and housing census. The application was
based on the use of GIS and the World Wide Web.
The intention was to provide managers at all levels
with up-to-date information throughout the 30-day
enumeration period; the information would be available through
the Internet to anyone who was pre-authorized
to access it. Managers above a certain level of access
authority would be allowed to update
the information. It was expected that the
Quick Count system would report preliminary census
results very soon after enumeration was completed.
Due to budgetary constraints, the National Statistics
Office of the Philippines elected to develop its
own GIS solution based on the FLY shareware obtained
via the Internet. It was expected that the
Quick Count system would be tested in connection
with the forthcoming enumeration for the pilot
census. |
| 115. |
The Workshop concluded that
GIS as well as GPS were valuable tools for the
statistical office to better cope with the cumbersome
mapping task. |
|
|
| Implications
for the guidelines on the Application of Geo-Positioning
Systems and Geographic Information Systems for
Digital Mapping and Statistical Management |
|
|
| 116. |
The Workshop noted that the
lessons learned from the two pilot projects would
be reflected in the guidelines and hoped that
a complete draft version of the guidelines would
be swiftly made available on the Internet. |
|
|
| IX.
RECOMMENDATIONS OF THE WORKSHOP |
|
|
| General,
IT management |
|
|
| 117. |
The Workshop agreed that the
conduct of censuses and surveys was necessarily
becoming increasingly technology intensive.
It recommended that national statistical offices
keep abreast of the latest information technology
by continuously monitoring technology evolution
and by upgrading production and office systems
periodically. |
| 118. |
Appreciating the excellent cooperation
and contribution received during the project,
the Workshop recommended that technologically
advanced offices continue to share with others
their experiences in adopting new information
technologies. |
| 119. |
Noting that modern data capture
technologies (OMR, OCR/ICR, CAPI, CATI, Internet
data collection) had uses in many sectors, the
Workshop recommended that in order to keep IT
applications cost-effective, census and survey
organizations should collaborate, among themselves
and with other agencies, in the procurement and
post-census use of the equipment and software. |
| 120. |
The Workshop noted that for
many countries budgetary constraints hampered
the effective application of new technology.
It requested the bilateral and multilateral donor
agencies to increase their assistance to developing
countries for IT applications, and recommended
that the technical cooperation among developing
countries (TCDC) modality be promoted for an enhanced
sharing of IT experience and skills through expert
visits and study tours. |
| 121. |
The Workshop recommended that
statistical offices upgrade their organizational
IT knowledge and create a modern IT culture, and
develop prudent procurement methods to match the
skilful and articulate marketing techniques of
private sector vendors. |
| 122. |
The Workshop recommended that
Governments should take into account in their
procurement rules the overall costs and benefits
that each technology alternative offered in the
long term, and not take decisions solely on the
bid price for a particular application. |
| 123. |
The Workshop emphasized that
it was crucial for senior management in the national
statistical offices to increase its awareness
of trends in information technology and the associated
costs and benefits, and to improve related
management skills. |
| 124. |
The Workshop recommended that
national statistical offices should ensure that
any vendor being considered for the supply of
new technology systems was able to substantiate
its claims. Statistical offices should have
a benchmark drawn up addressing their requirements,
before the commencement of the evaluation process.
They should also ensure that staff evaluating
potential systems and vendors have a good knowledge
of the requirements and of the technology being
evaluated. |
| 125. |
The Workshop recommended that
NSOs and census organizations make full use of
the opportunities that new information technology
offered in the conduct of censuses.
They should bear in mind that no stage of a census
could now be planned and executed without taking
technology into account and that new technology
had merged certain stages in the census operation.
The Workshop recommended that census organizations
make corresponding changes in their organizational
and management structures, and adjust the resources
available for IT procurement, recruitment of skilled
staff, and training of existing staff. |
| 126. |
Given that statistical offices
had to take the whole range of census operations
into consideration while assessing the implementation
of new technology applications, the Workshop recommended
the application of quality management strategies
as a useful method for control of the whole process.
Further, the interoperability of the various components
to be chosen required special attention, not only
with regard to the operational aspects but also
in terms of the integrity of the huge masses of
data to be processed. |
| 127. |
Recognizing that many developing
countries were using public domain software packages,
the Workshop recommended that ESCAP should promote
sharing of experiences on the use of such packages
with a view to maximizing the benefits of those
applications. |
| 128. |
The Workshop recommended that
statistical offices should avoid procuring hardware
and software that did not run under common operating
systems, did not provide integration with
other systems, were not easily extendable,
had no indication of long-term support or
were likely to lead to dependency on one vendor. |
| 129. |
Noting that electronic format
had many advantages over hard copy format, the
Workshop recommended that statistical offices
should aim at digitizing census and survey information
as early as possible. That would involve
greater utilization of existing electronic records
(administrative records), adoption of computer-aided
interview technologies, and scanning of census
forms immediately after enumeration. Electronic
format minimized manual handling of forms and
allowed maximum flexibility in data verification
and editing. |
| 130. |
The Workshop noted that it was
essential for statistical offices to ensure, as
part of the evaluation process, that selected
vendors had the commitment and capacity to train
the statistical office staff in the hardware or
software, and to provide continuing service and
support. |
| 131. |
Considering scarce resources,
especially in the small developing countries and
areas, the Workshop recommended that on a subregional
level governments should find ways to cooperate
in the purchase and utilization of expensive current
technology, e.g., by sharing the cost of acquisition
and responsibility for operation and maintenance
of such equipment. Further, the Workshop
recommended that governments of developed countries
and areas, which operate such advanced-technology
systems, should make their use available to developing
countries in the region, preferably at nominal
or no cost. |
| 132. |
The Workshop noted that language
capabilities of data capture and dissemination
software were important in many countries.
In the area of data capture, the OCR/ICR engines
achieved very high recognition rates for handwritten
characters in a limited number of languages; that
efficiency was not matched for numerous other
languages in Asia. Similarly, many NSOs
required bi- or multilingual capabilities for
data tabulation and dissemination software.
The Workshop recommended that the NSOs express
language capability as one of the prerequisites
for software acquisition, and recommended that
the software developers expend efforts in incorporating
local language and multilingual capabilities in
their products. In that regard, it was noted
that the Workshop had provided an excellent opportunity
for the vendors to better understand the needs
of the NSOs and also explain some of the features
of their products which were of interest to the
NSOs. |
| 133. |
The Workshop recommended that
further technical meetings be held after the 2000
and 2001 censuses to share information on technology
lessons learned, and to promote effective data
utilization and dissemination. |
| 134. |
To facilitate exchange of experiences,
ideas and information on resourcing and other
topics, it was recommended that an e-mail based
discussion group be established. |
|
|
| Data
collection and capture |
| 135. |
As the current data capture
technology provided increasingly powerful means
of handling data on numerous topics for large
collections, the pressure for expanding the scope
of the census was mounting. The Workshop cautioned
that in considering those demands, census statisticians
must not ignore the operational aspects of actual
data collection in the field, the skill levels
required for data collection and handling, and
the technical requirements. |
| 136. |
The application of IT would
also assist countries in improving the management
of errors and coding of captured information from
censuses and surveys. The Workshop recommended
that greater sharing of information should be
promoted in those areas, including computer-assisted
coding. |
| 137. |
The Workshop recognized that
selection of data capture technology was a crucial
success factor in census taking. It advised
census organizations to assess carefully all costs,
including the implications for various census
operations, involved in the selection, procurement,
operation, maintenance and management of capture
technology. |
| 138. |
The Workshop recommended the
conduct of at least one and preferably two major
tests using real forms, real enumerators and real
respondents to test systems. Testing was
needed for:
- the selection of the
preferred technology
- refinement and improvements
in the technology
- development of procedures
and arrangements related to the implementation
of the technology
- the building of awareness
within management about how the new technology
should be handled
- calculation of the resources
needed for the main event
- preparation of the content,
schedule, and methodology of training to be
carried out.
|
| 139. |
The Workshop recommended that
census organizations make full use of the flexibility
that was offered by new imaging and recognition
technologies, for instance by planning for an
early release of results for the most important
topics. |
| 140. |
The Workshop recommended that
census organizations evaluate data capture solutions
carefully, taking into account country circumstances.
Evaluation results obtained elsewhere were not
necessarily directly applicable, due to differences
in handwriting patterns, questionnaire design,
and availability of quality paper, ink and printing
facilities. It noted that competitive benchmark
testing had become a standard evaluation method
in large census organizations all over the world. |
| 141. |
Noting that the available character
recognition software was developed for universal
use and that the turn-key OCR/ICR solutions were
restricted to data capture (and did not cover
the whole census operation), the Workshop recommended
that software developers incorporate in character
recognition applications statistical features,
such as classifications that assisted in data
coding. |
| 142. |
The Workshop recommended that
statistical organizations planning to use OCR/ICR
should develop procedures to control the quality
of recognition. It was particularly
important to search and check for non-random bias
caused by systematic recognition errors. |
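One way to look for such systematic errors is to tally, on a manually verified sample, which characters are repeatedly misread as the same wrong character. The following Python sketch illustrates the idea only; the function name `systematic_confusions`, the thresholds and the sample data are hypothetical and are not taken from any OCR/ICR product.

```python
from collections import Counter


def systematic_confusions(pairs, min_errors=3, min_share=0.5):
    """Flag (true, recognized) substitutions that dominate the errors.

    pairs: iterable of (true_char, recognized_char) from a verified sample.
    A substitution is flagged when it occurs at least `min_errors` times
    and accounts for at least `min_share` of all errors for that true
    character, which suggests a non-random recognition bias.
    """
    errors = Counter((t, r) for t, r in pairs if t != r)
    errors_by_true = Counter()
    for (t, _), n in errors.items():
        errors_by_true[t] += n
    return sorted(
        (t, r, n)
        for (t, r), n in errors.items()
        if n >= min_errors and n / errors_by_true[t] >= min_share
    )


# Invented verification sample: "7" is repeatedly read as "1".
sample = [("7", "1")] * 4 + [("7", "7")] * 10 + [("3", "8")] + [("5", "5")] * 5
print(systematic_confusions(sample))
```

Isolated misreads fall below the thresholds and are ignored, whereas a recurring substitution in a numeric field such as age would be flagged for follow-up before it distorts the tabulated results.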
| 143. |
The Workshop agreed that imaging
should not be used simply as a data capture replacement
technology, and recommended that statistical organizations
identify which other census processes were affected
and determine how they could be made more efficient
and cost-effective. |
| 144. |
The Workshop recommended that
census and survey offices should consider outsourcing
as an option for implementing elements of censuses
and surveys. It noted that the feasibility
of outsourcing depended on national circumstances,
the organization's own resources and skills, and
the availability of external partners. It
heard of the experiences of the Singapore Department
of Statistics in developing an innovative multi-modal
data capture system for the year 2000 census by
using several external developers.
The Workshop noted that the multi-vendor approach
required clear delineation of responsibilities
for system development and support, which could
be conveniently achieved by using a prime contractor
approach. |
|
|
| Guidelines |
|
|
| 145. |
The Workshop identified several
areas where the guidelines on data collection
and capture could be improved and requested the
Working Party to implement the changes where possible. |
|
|
| Data
warehousing, databases, data archiving |
|
|
| 146. |
The Workshop noted that data
warehousing was a new technology with high potential
for increasing the value of census and survey
data by linking them to other data holdings.
Data warehouses provided access to a variety of
different databases and created the possibility
of combining statistical data from various statistical
surveys. The Workshop, however, recommended
that NSOs develop these data warehouses in a modular
fashion and keep long-term needs prominently in
mind: "Start small, but think big". |
| 147. |
Noting that getting the data
models correct was probably the most important
success factor in the implementation of a data
warehouse, the Workshop strongly recommended the
sharing of data models amongst the statistical
offices. |
|
|
| Data
Dissemination |
| 148. |
The Workshop recognized that
the evolution of information technology was not
only continuously offering opportunities for increasing
operational efficiency, but was also affecting
the requirements of data users. The Workshop recommended
that statistical offices should periodically assess
the needs and perceptions of the users in order
to be able to deliver census and survey results
through channels and formats that customers expected. |
| 149. |
Noting that the Internet was
a cost-effective dissemination mode both for data
providers and users, the Workshop recommended
that NSOs should establish and develop web sites
as a major data and information dissemination
channel. |
| 150. |
The Workshop noted the variety
of web sites available from statistical organizations
and recommended that offices investigating the
option of setting up a site of their own should
evaluate the features of other sites. |
| 151. |
The Workshop recommended that
NSOs start the development of web sites from simple
structures and designs that allowed expansion
of the site in a modular fashion, and provided
accessibility to users with narrow bandwidth. |
| 152. |
The Workshop noted that small
island countries did not have the skills as yet
to develop their own web sites; the cost was also
a major factor. The Workshop recommended
that countries which did not yet have their own
web site should look at the feasibility of acquiring
space on another organization's server, or on
a server in another country. The Workshop
recommended that this information be included
in the draft guidelines. |
|
|
| Mapping
and GIS |
|
|
| 153. |
Noting that maps were the best
way to illustrate spatial features of population,
the Workshop recommended that statistical offices
create new products that utilize digitized maps.
It also noted that maps were essential in census
planning, field work and operations monitoring,
and that in the long run, geographic information
systems were a feasible option for creating accurate
multi-purpose maps. |
| 154. |
Noting that GPS (Global Positioning
System) offered a cost-effective option for determining
spatial coordinates, the Workshop recommended
that NSOs should consider this technology option
for improving the accuracy of area maps required
in census and survey field work. |
| 155. |
The Workshop emphasized the
need for promoting training on special topics
related to the application of IT to census and
survey operations. |
|
|
| Follow
up |
|
|
| 156. |
At the end of the Workshop,
the participants proposed that a follow-up workshop
should be organized during the second half of
2000 to exchange information about the technological
successes and failures in the data capture and
data processing of the year 2000 round of censuses
in the region. That workshop could also
cover issues of data dissemination and data use. |
| |
|
| Annex
I |
| LIST
OF PARTICIPANTS |
| |
| Annex
II |
| TENTATIVE
TIME SCHEDULE |
| |
| Annex
III |
| LIST OF DOCUMENTS |
| Symbol |
|
Title |
|
| STAT/WNIT/L.1 |
|
Provisional agenda |
|
|
|
| Module
1: Introduction to IT in census operations |
| Module 1.1 |
|
Objectives of the Workshop* |
| Module 1.3 |
|
- An overview of
the project RAS/96/P12
- Project RAS/96/P12*
|
| Module 1.4 |
|
Introduction to Census
Operations* |
|
|
|
| Module 1.5 |
|
- Result of the ESCAP
Survey on Applications of Information
Technology to Population Data
- Presentation paper*
|
| Module 1.6 |
|
Information Technology
Trends and their impact on Census Data Processing* |
| Module 1.7 |
|
IT Management Challenges* |
| Module 1.9 |
|
Expectations for the year
2000 rounds of censuses* |
|
|
|
| Module
2: Paper based data collection and capture |
| Module 2.1 |
|
An Overview of Paper Based
Data Collection and Capture Technologies* |
| Module 2.2 |
|
An Overview of the OMR
Technology (Based on the experiences in
Japan)* |
| Module 2.3 |
|
The Use of Optical Mark
Reading (OMR) for Census Data Collection** |
| Module 2.5 |
|
OCR Questionnaire* |
| Module 2.7 |
|
OCR Technology Selection
for 2000 Population Census in Indonesia* |
| Module 2.8 |
|
Application of Imaging
Technology for Capturing Population Census
Data |
| Module 2.9 |
|
- Recent Experience
in Using New Technologies for Census**
- AFPSPRO - modules
description**
- Configuration for
UN's Demo (Census)**
|
| Module 2.10 |
|
Improving Work flows by
using Imaging for the New Zealand Population
Census |
|
|
|
| Module
3: Non-paper based data collection and capture |
| Module 3.1 |
|
Introduction to non-paper
based data collection and capture technologies
- CAPI* |
| Module 3.2 |
|
Efficient Computer Aided
Telephone Interview (CATI)* |
| Module 3.3 |
|
- Computer Assisted
Personal Interviewing Solutions in Australia
- Attachment 1: CAI
Manual Outline
- Attachment 2: Diary
and Office Processing: Integrating Blaise
with Other Facilities
- Attachment 3: Sample
Business Case for the use of Computer
Assisted Interviewing in Household Surveys
|
| Module 3.4 |
|
Data Collection Through
the Internet - IT Design & Security
Issues* |
| Module 3.5 |
|
Blaise: A survey processing
system* |
| Module 3.6 |
|
Integration of Different
Modes of Data Capture* |
| Module 2&3 |
|
- Guidelines on the
Application of New Technology to Population
Data Collection and Capture
|
| Module 3.8 |
|
- Guidelines on the
Application of New Technology to Population
Data Collection and Capture (Presentation
paper)*
|
|
|
|
| Module
4: Adding value to census data through data
warehousing and data mining |
| Module 4.1 |
|
- Adding Value to
Census Data through Data Warehousing**
- Stranded
on Islands of Data**
|
| Module 4.2 |
|
Data Warehousing** |
| Module 4.3 |
|
SAS demonstration** |
|
|
|
| Module
5: Data dissemination |
| Module 5 |
|
Guidelines on the Application
of New Information Technology to Population
Data Dissemination |
| Module 5.1-5.5 |
|
Data Dissemination* |
| Module 5.6 |
|
- PopMap*
- Use of IMPS for
Census and Survey Data Dissemination
in the Philippines*
|
| Module 5.10 |
|
Graphs* |
| Module 5.11 |
|
Maps* |
| Module 5.14 |
|
Interesting features of
Statistical Office web sites in the ESCAP
Region |
| Module 5.16 |
|
- Statistics New
Zealand's Classifications and Related
Standards (CARS) System
- Classification
and related Standards system (CARS)*
|
|
|
|
| Module
6: Geographic information systems |
| Module 6 |
|
Guidelines on the Application
of GPS and GIS Technologies for Digital
Mapping and Statistical Management |
| Module 6.3 |
|
- Demonstration of
GPS and its Applications in Digital
mapping
- Theory of DGPS
- DGPS Survey Manual
- Demonstration on
Application of Arc/Info, Arcview and
ERDAS Imagine Softwares in Digital Mapping
and GIS
|
| Module 6.4 |
|
- Application of
GPS for Digital Mapping and GIS
- Application of
Modern Mapping and GIS Technology to
Census
- Use of GPS for
Preparation of Census Enumeration Area
Maps and Mauza Database
|
| Module 6.5 |
|
- Pilot Application
of GIS to the Philippines Census 2000
Operations
- Presentation paper*
|
Background
papers
- Data Processing
for Demographic Censuses and Surveys
- Report of the Workshop
on Computer-Assisted Coding New Zealand,
17-21 April 1989
- Kazakhstan, 1999
Census
- Statistics New
Zealand
- Link to other government
statistical offices
- Distribution of
Household, L2+KBL2 Form
|
*
PowerPoint or other computer-based presentation.
** Vendor
material.
|
|