ESCAP Statistics Division
Pop-IT Newsletter    
Application of New Technology in Population Data Newsletter
Newsletter No. 4, December 2001

Organized by the ESCAP secretariat with the active support of the Working Party on the Application of New Technology to Population Data, the ESCAP Workshop on Population Data Analysis, Storage and Dissemination Technologies (Bangkok, 27-30 March 2001) was attended by 38 participants from Armenia, Bangladesh, Brunei Darussalam, Cambodia, China, India, Indonesia, Japan, Kiribati, Malaysia, Maldives, Mongolia, Nepal, Pakistan, Papua New Guinea, Philippines, Republic of Korea, Samoa, Sri Lanka, Thailand and Viet Nam. The members of the Working Party from Australia; Bangladesh; Indonesia; Japan; Macao, China; New Zealand; Philippines; Singapore; and Thailand shared their respective country experiences. Representatives of the Statistical Institute for Asia and the Pacific (SIAP), the UNFPA Country Technical Services Team in Bangkok, the United Nations Statistics Division (UNSD) and the United States Census Bureau (USCB) participated actively as resource persons. The Workshop also benefited from presentations by invited private sector companies.

The Workshop report, documents and presentations are available at http://www.unescap.org/stat/pop-it/pop-wdt/pop-wdt.asp

"I encourage you in your invaluable work of making census data as easily available as possible to your clients.  That goal cannot be achieved without application of modern information technology."

Mr Kim Hak-Su, Executive Secretary of ESCAP, in his inaugural address to the Workshop.

State of Information Technology in National Statistical and Census Offices in June 2001

This article provides an update of selected aspects of ESCAP's 1998 survey on Application of New Technology in Population Data Collection, Processing, Dissemination and Presentation, the results of which were published in June 1998 (see http://www.unescap.org/stat/pop-it/pop-itnl/news_03.asp).  It is based on responses to an email questionnaire sent in June 2001 to the statistical and census offices that had responded to the first survey.  Only a small number of questions were included from the previous round in order to make responding easy.

A comparison between the 1998 and 2001 results indicates that during the past three years, the PC, LAN and Internet infrastructure has been upgraded in all responding offices. The increase in the number of computers in use has been remarkable, exceeding 50 per cent in many responding offices. The staff-to-PC ratio has come down because of the increase in the number of PCs, although staff reduction was also a factor in some offices.

Table 1. The number of staff and PCs in selected statistical offices, 1998 and 2001 (sorted by the staff/PC ratio in 1998)

| Country/area | Total staff 1998 | Total staff 2001 | Staff/PC 1998 | Staff/PC 2001 | PCs in LAN 1998, per cent | PCs in LAN 2001, per cent | Staff change 1998-2000, per cent | PC change 1998-2000, per cent |
|---|---|---|---|---|---|---|---|---|
| New Zealand | 729 | 900 | 0.8 | 0.8 | 100 | 100 | 23.5 | 19.3 |
| Australia | 2845 | 3140 | 0.9 | 0.8 | 100 | 100 | 10.4 | 19.3 |
| Japan | 1823 | 1788 | 0.9 | 0.8 | 100 | 100 | -1.9 | 10.0 |
| Republic of Korea | 1281 | 1568 | 1.2 | 0.9 | 99 | 100 | 22.4 | 57.6 |
| Hong Kong, China | 1495 | 1505 | 1.9 | 1.0 | 59 | 49 | 0.7 | 81.5 |
| Lao PDR | 50 | 30 | 1.9 | 0.7 | 85 | 78 | -40.0 | 76.9 |
| Samoa | 32 | 36 | 2.7 | 2.0 | 0 | 100 | 12.5 | 50.0 |
| Philippines | 3131 | 3554 | 3.3 | 3.1 | 26 | 47 | 13.5 | 21.3 |
| Turkey | 2741 | 3079 | 3.8 | 2.2 | 5 | 71 | 12.3 | 91.8 |
| Armenia | 61 | 226 | 4.4 | 1.5 | 71 | 32 | 270.5 | 1000.0 |
| Myanmar | 311 | 302 | 8.9 | 5.3 | 29 | 28 | -2.9 | 62.9 |
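
The staff change column in Table 1 can be reproduced from the two staff totals in the same row. The short Python sketch below does that arithmetic for three offices; the function name `percent_change` is illustrative and not part of the survey.

```python
def percent_change(old: float, new: float) -> float:
    """Return the change from old to new as a percentage of old, to one decimal."""
    return round((new - old) / old * 100, 1)


# (total staff 1998, total staff 2001), copied from Table 1
staff = {
    "New Zealand": (729, 900),
    "Lao PDR": (50, 30),
    "Armenia": (61, 226),
}

for office, (staff_1998, staff_2001) in staff.items():
    print(office, percent_change(staff_1998, staff_2001))
```

The three results agree with the published staff change figures for those offices (23.5, -40.0 and 270.5 per cent).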

While technologically advanced offices provide email and web connection for all staff, and on every PC, some offices still have to manage with a couple of email accounts and a dismal connection speed, which makes web browsing barely feasible (see Table 2). Where a viable Internet connection is missing, field operations have to rely on conventional means of communication, with data being transferred on paper, on diskettes or through direct telephone connections. Web sites, if they exist, are likely to be the result of efforts by dedicated individuals using infrastructure outside their offices.

Table 2. Typical PC configuration and Internet connection in selected NSOs in June 2001

| Country | Typical PC processor, MHz | Typical PC RAM, MB | Typical PC hard disk, GB | Type of Internet connection | Speed of Internet connection, kbps | Share of PCs that can send email, % | Share of PCs that can browse web, % |
|---|---|---|---|---|---|---|---|
| Armenia | 100 | 16 | 1 | Radio modem | 4 | 1.9 | 6.5 |
| Australia | 484* | 137* | 10* | Frame relay, full duplex | 1000 | 82.1 | 87.2 |
| Hong Kong, China | 333 | 32 | 3.2 | T1 | 1544 | 21.3 | 21.3 |
| Japan | 667 | 256 | 15 | Dual T1 | 3000 | 100.0 | 15.9 |
| Lao PDR | 600 | 64 | 10 | Cable modem | 56 | 4.3 | 0.0 |
| Myanmar | 166 | 16 | 1 | None | - | 3.5 | 0.0 |
| New Zealand | 200 | 128 | 6 | Frame relay | 2000 | 90.9 | 90.9 |
| Philippines | 500 | 64 | 8 | Leased line | 64 | 20.0 | 20.0 |
| Republic of Korea | 700 | 64 | 10 | T1 | 2048 | 100.0 | 100.0 |
| Samoa | 333 | 64 | 6 | Dial-up | . | 38.9 | 38.9 |
| Turkey | 200 | 32 | 6 | Leased line | 128 | 46.4 | 46.4 |

* Weighted average of PCs and notebooks

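
To put the connection speeds in Table 2 in perspective, the following Python sketch estimates how long a single web page would take to load on an otherwise idle line. The 50-kilobyte page size is an assumption chosen for illustration, the speeds are read as kilobits per second, and protocol overhead and the sharing of the line among staff are ignored.

```python
def transfer_seconds(size_kilobytes: float, speed_kbps: float) -> float:
    """Ideal transfer time: 1 kilobyte = 8 kilobits, speed in kilobits per second."""
    return size_kilobytes * 8 / speed_kbps


PAGE_KB = 50  # assumed page size, not a survey figure

# Connection speeds in kbps, copied from Table 2
speeds = {"Armenia": 4, "Lao PDR": 56, "Philippines": 64, "Hong Kong, China": 1544}

for office, kbps in speeds.items():
    print(f"{office}: {transfer_seconds(PAGE_KB, kbps):.1f} s")
```

Under these assumptions the page needs 100 seconds on a 4 kbps radio modem, which illustrates why browsing is barely feasible on the slowest connections.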
Web site analysis

Outside the survey, a separate technology review[1] was made of known web sites of statistical and census offices.

In June 2001, a little over half of ESCAP's regional members and associate members, i.e. 32 of 57, had a national statistical web site. Most of the NSO and census web servers were located in the respective capitals. Six of the 35 sites investigated were hosted abroad (statistical offices of Azerbaijan, Fiji, Islamic Republic of Iran, Marshall Islands, Federated States of Micronesia, and the Office of the Registrar General of India).

Figure 1 shows the results of an experimental test of how fast individual web sites responded to a series of ping requests. In the test done from Bangkok, the fastest average response was received from the nearest web site, that of the National Statistical Office of Thailand, followed by sites located in the United States and ASEAN countries. The slowest responses generally came from the most distant sites in the Eurasian continent. The fast responses from .fj (Fiji) and .fm (Micronesia) are due to their location in Seattle and Honolulu, respectively, which in the Internet topology are advantageous locations in relation to Thailand. If the same test were conducted from a third party server located somewhere else, the results would be different.


[1] A complete version of the review is available in the paper (http://www.stat.go.jp/english/iaos/paper/survo.pdf) presented at the IAOS Satellite Meeting on Statistics for the Information Society, Tokyo, 30-31 August 2001.
Figure 1. Average response time of statistical web sites to ICMP ping requests from Bangkok (milliseconds).

The technology of each web server was further investigated through Netcraft's detection service (http://www.netcraft.com).  Apache and Microsoft Internet Information Server were by far the most common servers (see Table 3).  In addition, Netscape Enterprise and Lotus Domino servers were hosting three and two web sites, respectively.  The Apache web servers were running on various Unix derivatives, the most popular being Linux (5 servers) and Solaris (4 servers).  All fourteen MS-IIS servers, as well as the two Lotus Domino servers, ran on Windows NT4.  The Netscape Enterprise servers were on Solaris.  Judging from the name of the net block owner, three quarters of statistical and census web servers were maintained externally.

Table 3. Statistical and census web servers by type and location of hosting, June 2001

| Server | Net block owned by NS/census office | Net block owned by outsider | Total | Share, per cent |
|---|---|---|---|---|
| Apache 1.3.x | 1 | 14 | 15 | 44 |
| Microsoft-IIS/4.0 | 4 | 10 | 14 | 41 |
| Netscape Enterprise 3.6 - 4.1 | 2 | 1 | 3 | 9 |
| Lotus Domino 5.0.x | 2 | - | 2 | 6 |
| Total number of servers | 9 | 25 | 34 | 100 |
| Share, per cent | 26 | 74 | 100 | |

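
Netcraft's exact detection method is not described in this article. One signal commonly available for such a survey is the `Server` header of an HTTP response, and the Python sketch below shows how banner strings of that kind could be reduced to a server family and how the percentage shares in Table 3 follow from the totals. The banner strings in the usage note and the helper names are illustrative assumptions.

```python
def server_family(banner: str) -> str:
    """Reduce a Server header value such as 'Apache/1.3.19 (Unix)' to 'Apache'."""
    product = banner.strip().split(" ", 1)[0]  # drop trailing comments like '(Unix)'
    return product.split("/", 1)[0]            # drop the version number


def percentage_shares(counts: dict) -> dict:
    """Return each count as a whole-number percentage of the grand total."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


# Totals per server type, copied from Table 3
TABLE3_TOTALS = {
    "Apache": 15,
    "Microsoft-IIS": 14,
    "Netscape-Enterprise": 3,
    "Lotus-Domino": 2,
}

print(percentage_shares(TABLE3_TOTALS))
```

For example, `server_family("Microsoft-IIS/4.0")` yields `Microsoft-IIS`, and the computed shares match the last column of Table 3.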
Data capture technologies

The results from the 2000/2001 round of censuses are being tabulated faster than ever before.  The March 2001 ESCAP Workshop concluded that data capture through OCR/ICR had become a proven technology that could make significant cost, timeliness and accuracy improvements in census data capture.  Several countries that were using OCR or ICR technology for the first time had released preliminary results (based on the whole population) in a matter of a couple of months.

Although the learning curve to master OCR/ICR is relatively steep, the technology has lowered the total cost of census taking, in some countries by 50 per cent or more.  The scanners and recognition software are rather expensive, but the cost can be moderated by using the same technology in several censuses and surveys and by sharing it with other agencies.

Twelve of the 24 offices participating in the Workshop indicated that they still relied on keyboard entry; two used OMR and nine used OCR/ICR. The only country to offer the possibility of submitting information through the Internet was Singapore, where eventually 15 per cent of the population chose this option (see the article "Singapore enumerates through the Internet" below). Other Singaporeans responded either to computer-aided telephone interviews (CATI) or to person-to-person interviews.

Data capture technology in the 2000 round of censuses in selected ESCAP members and associate members

| Keyboard entry | OMR | OCR/ICR | Internet+CATI+OCR |
|---|---|---|---|
| Brunei Darussalam | Bangladesh | Australia | Singapore |
| Cambodia | Pakistan | Bangladesh | |
| Indonesia | | China | |
| Kiribati | | India | |
| Malaysia | | Indonesia | |
| Mongolia | | Macao, China | |
| Nepal | | New Zealand | |
| Papua New Guinea | | Philippines | |
| Republic of Korea | | Thailand | |
| Samoa | | | |
| Sri Lanka | | | |
| Viet Nam | | | |

The 'beauty' of optical recognition technologies is that after the questionnaire forms have been scanned into images, they can be split into pieces, question by question or character by character, for recognition in a priority order.  Thus, data tabulation and analysis can be started from the most important information and almost immediately after imaging.  That is a major advantage over manual keyboard entry, which normally progresses form by form.  Handwritten open responses and questions requiring manual coding can be dealt with later as experts and verifiers working on them make progress.
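
The priority-ordered recognition described above can be sketched with a priority queue. In the Python example below, scanned forms are assumed to have already been split into per-question image snippets; the field names, priority values and function name are hypothetical and only illustrate the scheduling idea, not any particular OCR/ICR product.

```python
import heapq
from itertools import count


def recognition_order(snippets, priority):
    """Yield (form_id, field) pairs so that high-priority fields of ALL forms
    are recognized before any low-priority field (lower number = earlier)."""
    tie = count()  # keeps scan order stable among snippets of equal priority
    heap = [(priority[field], next(tie), form_id, field) for form_id, field in snippets]
    heapq.heapify(heap)
    while heap:
        _, _, form_id, field = heapq.heappop(heap)
        yield form_id, field


# Snippets in scanning order, form by form (hypothetical data)
snippets = [(1, "occupation"), (1, "sex"), (2, "occupation"), (2, "sex")]
priority = {"sex": 0, "occupation": 1}  # key counts first, hand-coded items later

print(list(recognition_order(snippets, priority)))
```

With keyboard entry the work would follow the scanning order, form by form; here the `sex` fields of both forms are processed before either `occupation` field.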

Comparison of two data capture strategies

The ESCAP Workshop agreed that data capture through OCR/ICR has become a proven technology that can make significant cost, timeliness and accuracy improvements in census data processing.   Below is a comparison of two experiences shared at the Workshop:

Philippines
  • Optical numeric recognition.
  • Four regional data capture centres, each having a Windows NT network with five mid-volume scanners (Kodak 3510), fifteen Pentium III workstations, three magneto-optical disk drives, three CD-writers, a network printer and a 500 MHz Pentium III server with 90 GB hard disk capacity.
  • Software: Kodak MVCS for scanning, Eyes and Hands for Forms for ICR, and a tailor-made Census Progress Monitoring System.
  • The four data capture centres were operated by a total of 146 persons, in two shifts, six days a week.
  • A work shift was staffed by a shift supervisor, four data controllers (preparing forms for scanning and checking the validity of geographic codes), five scanner operators, four verifier operators and an operator for file preparation and transfer.
Results
  • Over 15 million forms scanned.
  • Reduction of staff required for capturing the data from 600 persons in 1995 to 146 in 2000.
  • Nearly perfect recognition rate for OMR fields.
  • For handwritten fields a much lower rate.
  • Average recognition rate of 90-95 per cent.
  • Average speeds for interpretation and verification 3,400-3,500 and 270-320 forms per hour, respectively.
Main problems
  • The configuration had too few (only four) software licences for data verification; 8-10 verification licences would have been optimal.
  • Uneven quality of the printed forms.
  • Handwriting entries illegible or too faint, which increased the work needed before scanning and at the verification stage.
  • Some forms had to be enhanced or rewritten before scanning.
For more information, please see http://www.unescap.org/stat/pop-it/pop-wdt/ericta1.pdf

Indonesia
  • Optical numeric recognition.
  • Decentralized data capture in 41 centres having a total of 79 scanners at their disposal.
  • Scanning, recognition, verification and editing stages.
  • Kodak DS Scanners 3500 in the central office and in provincial offices.
Results
  • 55 million double-sided household forms (representing the number of households in Indonesia) scanned.
  • Nearly perfect OMR recognition rate.
  • Recognition of numbers at a lower rate.
  • Human intervention by enhancing the quality of numbers did not markedly improve the recognition results.
Main problems
  • Handwriting illegible or too faint.
  • Use of unapproved or dull pencils.
  • The quality of the drop-out colour varied too much in the printed forms.
  • Sometimes the guiding colour marks were not omitted as expected, requiring manual entry of the data.
  • The allocated training budget for the enumerators was not available at an optimal point of time.
For more information, please see http://www.unescap.org/stat/pop-it/pop-wdt/wdt-05.asp

Singapore enumerates through the Internet

The Singapore Department of Statistics implemented a ground-breaking Internet census information submission in its 2000 census.  Of all census respondents, 15 per cent chose to submit their information through the Internet while others responded either to computer-aided telephone interviews (CATI) or to person-to-person interviews.

The system was available in the English language only and represented the second generation of Internet data collection systems in Singapore.  The first one, for the Business Expectation Survey, was launched in March 1998.

The Singaporean Internet data collection system was designed keeping in mind nine target features, namely (i) fast performance, (ii) user-friendliness, (iii) security, (iv) stability, (v) compatibility with a large number of browser platforms, (vi) possibility to continue form completion in another user session, (vii) integration with other data collection modes, (viii) intelligent branching of questions, and (ix) verification during and after completion of the form.  Given the existing technology, many of those requirements are still in obvious contradiction with each other.

The Department of Statistics used prototyping and intensive user-acceptance testing to fine-tune the system. The front page of the census site was made small in size (kilobytes) and the web form was split into many parts in order to achieve satisfactory performance for users. For the same reason, the number of automated checks that were first built into the form was reduced, and checks were moved to the server side. Special attention was paid to the clarity of the form layout, questions and definitions. During the enumeration period, hotline telephone support was available, and frequent system upgrades were made in response to feedback. High-level security was maintained at all times, with escalation procedures and plans for contingencies in place.
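
As an illustration of moving checks to the server side and of intelligent branching between form parts, here is a minimal Python sketch. It is not the Singapore system: the field names, the age limits and the branching rule are hypothetical assumptions made only to show the pattern.

```python
def next_part(answers: dict) -> str:
    """Intelligent branching: choose the next form part from earlier answers."""
    if answers.get("age", 0) < 15:
        return "education"   # assumed rule: skip employment questions for children
    return "employment"


def validate_part(answers: dict) -> list:
    """Server-side checks run when a form part is submitted.
    Returns a list of error messages; an empty list means the part is accepted."""
    errors = []
    age = answers.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        errors.append("age: enter a whole number between 0 and 120")
    elif age < 15 and answers.get("occupation"):
        errors.append("occupation: leave blank for persons under 15")
    return errors


submitted = {"age": 34, "occupation": "teacher"}
print(validate_part(submitted), next_part(submitted))
```

Keeping such rules on the server keeps each downloaded form part small, at the cost of one round trip before the respondent sees an error.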

Some other countries, including Australia and Switzerland, used the Internet for census data collection in 2001. If well implemented, the technology platform is not the main obstacle to Internet collection. The main concerns relate to perceived data security and the potential bias that the collection method could cause.

Put census data on the Internet

The ESCAP Workshop agreed that the ultimate goal was that all publishable census data should be made available on the Internet. That goal is now well within the reach of available technology, as large volumes of data can be made accessible more easily and cheaply than ever before.

In a modern Internet development strategy, the same facility is designed to cater for the needs of both internal and external users.  Well-designed web sites could deliver data both to general data users, such as students, pensioners, libraries, and small businesses, and to analysts with more complex and often voluminous requirements including an interest in detailed metadata.

Statistics New Zealand is planning to expand the use of intermediaries in connection with its 2001 census, including the media, libraries, information brokers and bundlers, channel managers of high speed networks, community organizations and government organizations that already have close contacts with user groups. The agency will pay significant attention to improving the navigation of the census web site, thereby helping users to serve themselves. A more user-friendly site is expected to be achieved, among other things, by using common language; removing or explaining census jargon; increasing the ways to access data, terminology, area breakdowns and maps; and improving sorting-by-topic and other features of the search facility.

Compact disks ideal for delivering volumes of data

The ESCAP Workshop was given some demonstrations of user-friendly CDs that were developed with public domain software. The Cambodian 1998 census is available on four CDs, containing priority tables at country, province and district levels; a mapping and graphing database based on PopMap; a very large REDATAM-based database containing microdata of all person and housing records; and aggregated data for Cambodia's 13,339 villages in six DBF databases, each covering a different topic. The visual effectiveness and user friendliness of the PopMap-based CD was particularly noted by the Workshop. The GIS application consists of detailed maps for Cambodia, its provinces, districts and communes, with line layers for the main routes and rivers and point layers for the villages and schools. A total of 123 different indicators down to the commune level form the heart of the application. The Viet Nam census CD is based on the IMPS suite, including its database, cross tabulation, and table and map viewer components.

Also presented at the Workshop were three leading commercial data dissemination tools. The Beyond 20/20, PC-Axis (Statistics Sweden) and SuperSTAR (Space-Time Research) software suites are suitable for small and large data sets, and have powerful desktop data manipulation facilities and web-based detailed data access facilities. Their performance, especially in terms of retrieval and tabulation speeds, and their flexibility and ease of control are impressive and go well beyond what off-the-shelf database packages and some of the public domain packages provide. Although sophistication and performance naturally carry a price tag, commercial dissemination packages are worth evaluating when creating dissemination strategies. Prices for statistical and census offices generally depend on the population of the country, the size of the data sets involved and the volume of dissemination, and are usually subject to one-to-one negotiation.

Data warehouse with a browser interface is today's mass storage solution

In a traditional census database model each census year has formed a dedicated database with specialized codes and definitions.  In a modern warehouse approach, data from censuses conducted at different times are combined with other data.  The ESCAP Workshop recognized, however, that setting up a data warehouse is a challenging process and involves a lot of preparatory work, including standardization of codes and definitions and cleaning of data.

Compared to conventional data warehouses holding transaction and business data, statistical data warehouses have to facilitate more elaborate data analysis.  Statisticians and analysts require that data warehouses facilitate highly flexible data analysis, display metadata dynamically during analysis, and allow the customization of reports and other outputs.

The ESCAP Workshop noted that a thin-client design, where most processing is done at the server end, is preferred for warehouses that store huge volumes of census data. In the system design, special attention needs to be paid to the integration of data extraction and data analysis tools, since statistical analysis is usually an iterative process, requiring testing of a large number of variables.

In its evaluation, the Singapore Department of Statistics considered a hierarchical drill-down a suitable method for selecting data from a warehouse, especially when business metadata are dynamically displayed. The ability to save previously selected items is very important for queries that are needed frequently or repeatedly. A "drag and drop" type of interface would make statistical analysis convenient: calculating statistical parameters, such as the mean and standard deviation, could be achieved by 'dropping them into' data items (records or variables), or vice versa, data items could be 'dropped into' statistical parameters. Another criterion that Singapore set for a data warehouse package is the possibility of making revisions to data both locally (affecting only the analyst) and globally (affecting all users of the data warehouse).
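
The distinction between local and global revisions can be illustrated with a layered lookup. The Python sketch below uses the standard library's `collections.ChainMap`, in which writes go to the first (analyst-owned) layer while reads fall through to the shared data. The district names and figures are hypothetical, and a real warehouse package would implement this quite differently.

```python
from collections import ChainMap

# Shared warehouse values, visible to every user (hypothetical figures)
global_data = {"district_A": 1200, "district_B": 950}

# An analyst's private layer of revisions sits on top of the shared data
analyst_view = ChainMap({}, global_data)

analyst_view["district_A"] = 1250  # local revision: stored in the analyst's layer only
global_data["district_B"] = 975    # global revision: seen through every view

print(analyst_view["district_A"], global_data["district_A"])  # analyst's value, shared value
print(analyst_view["district_B"])                             # globally revised value
```

Other analysts, each holding their own `ChainMap` over the same `global_data`, would still see the unrevised figure for `district_A`.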

The Workshop agreed that graphical and topographical tools, with integration to tabulation and drill-down possibility into points of interest in a graph or map, are also desirable features in a census data warehouse.  A good data warehouse system supports saving of data outputs, including data extracts, tabulations, analytical and other reports, or graphs, in common data formats which could be read by third party software.
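
A hierarchical drill-down and the saving of its output in a common format can be sketched in a few lines of Python. The records, the two-level province and district hierarchy, and the function names below are hypothetical; CSV is used simply as one example of a format that third party software can read.

```python
import csv
import io
from collections import defaultdict

# Hypothetical records: (province, district, population)
records = [
    ("North", "N1", 1200),
    ("North", "N2", 800),
    ("South", "S1", 1500),
]


def drill_down(rows, path=()):
    """Total population one level below `path` in the province > district
    hierarchy; `path` must be shorter than the hierarchy itself."""
    depth = len(path)
    totals = defaultdict(int)
    for *areas, population in rows:
        if tuple(areas[:depth]) == tuple(path):
            totals[areas[depth]] += population
    return dict(totals)


def to_csv(table):
    """Serialize an {area: value} table as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["area", "population"])
    writer.writerows(table.items())
    return buffer.getvalue()


print(drill_down(records))               # province totals
print(drill_down(records, ("North",)))   # districts within North
print(to_csv(drill_down(records)))
```

Calling `drill_down` with a longer path corresponds to clicking on a point of interest in a map or graph to see the next level of detail.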

GIS for effective presentation

Information presented on maps is essential at almost all stages of census operations. Therefore, one cannot do without georeferenced databases. A grid square database is a low-cost alternative for presenting small area data. It could be considered by census organizations that do not have the resources and expertise required for digitizing the enumeration boundaries. The allocation of households to grid squares is resource consuming and requires fairly detailed maps. There are simpler techniques that could be used for allocating complete enumeration districts to grid squares.
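
Once a household has map coordinates, allocating it to a grid square is simple arithmetic. The Python sketch below assumes projected coordinates in metres and a 1 km cell size; both are assumptions for illustration, as are the household locations.

```python
import math
from collections import Counter


def grid_square(easting_m: float, northing_m: float, cell_m: float = 1000.0):
    """Return the (column, row) index of the grid square containing a point."""
    return (math.floor(easting_m / cell_m), math.floor(northing_m / cell_m))


# Hypothetical household locations (easting, northing) in metres
households = [
    (500250.0, 1520900.0),
    (500900.0, 1520100.0),
    (501100.0, 1520500.0),
]

# Number of households per grid square, ready for mapping
counts = Counter(grid_square(x, y) for x, y in households)
print(counts)
```

The costly part in practice is obtaining the coordinates, which is why allocating whole enumeration districts, for example by a representative point, is the simpler technique.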

The Workshop was given an overview of how the United States Census Bureau uses georeferenced data to display census results. The Bureau's GIS presentation system builds on the TIGER (Topologically Integrated Geographic Encoding and Referencing) database, which contains detailed geographic features for the United States. TIGER mapping was used at all stages of the 2000 census, from enumeration to reporting of results.

The American FactFinder is a web-based system for access and dissemination of Census Bureau data on the Internet, built from TIGER boundaries and other geographic information, census data and metadata. The current elaborate online version is a result of incremental work over the past two decades, responding to the legislative mandate to provide the public with full and free access to census statistics. In the FactFinder, it is possible to drill down through the maps (which are based on vector graphics) from the country level to the census block level.

Feedback from data users is a cornerstone of census dissemination strategies in Australia and New Zealand

The Australian Bureau of Statistics and Statistics New Zealand are among census offices that pay significant and continuous attention to evaluating their products and consulting with users.  Internal and external users are also involved in prototype and acceptance testing.  The user feedback forms a basis for their proactive product development strategy.

The web has emerged as the main dissemination channel for the 2001 Australian and New Zealand censuses, and their development efforts are focused accordingly. By providing self-service and dynamic access to data, they plan to make data users more self-reliant and to lower the overall dissemination cost. At the same time, the role of printed material is changing. Statistics New Zealand, for instance, is phasing out some of the 'traditional' publications and developing a capability to print any electronic publication, on an individual basis, as and when needed.

Another significant advantage of the Internet is that it shortens the delivery time of census data to users.  Internet technology makes the time of data release more predictable than in conventional hard copy dissemination, as the printing process and distribution often take a longer time than expected.

Australia and New Zealand have decided to continue publishing community profiles of key data from their 2001 censuses as such products have proved effective in raising public awareness of census data and in increasing its use.

Elsewhere in the region, the ESCAP Workshop observed that the participation of the private sector in user consultations was often sporadic and in some countries absent altogether.  Therefore, it encouraged census offices to contact potential clients in the private sector and involve them in producer-user consultations and other promotional activities as equal customers.  The public at large and children were also recognized as important clients.

Just as important as good design is the users' awareness of available census products and services.  Census offices should establish marketing strategies to inform established and potential users about the benefits of census products.  Those strategies might use several modes of communication and include visible product launches.  Maintaining ongoing awareness during and between the census cycles is an important part of the strategy.

ESCAP Workshop on Population Data Analysis, Storage and Dissemination Technologies, Bangkok, 27-30 March 2001
Recommendations

The ESCAP Workshop made over 50 specific recommendations regarding census data collection and capture, data storage and analysis, and data dissemination strategies and technologies.  The topics on which recommendations were made are listed below; the detailed recommendations can be found in the Workshop report, http://www.unescap.org/stat/pop-it/pop-wdt/wdt-rep.asp

Related to census data collection and capture, recommendations were made on:
  • the importance of careful questionnaire form design in successful character recognition
  • just-in-time training of enumerators in filling out OCR/ICR forms
  • the use of proper pencils or pens in marking OCR/ICR forms
  • the maintenance of scanners
  • the robustness of the file management component of the data capture chain
  • the testing of the proposed data capture configurations in real situations and making necessary modifications to them
  • bandwidth, security and other considerations in Internet data collection systems
  • the testing of Internet data collection forms in different bandwidths and improving the real and perceived performance
  • data collection control when Internet collection was accompanied by other collection methods
Recommendations regarding census data storage and analysis related to:
  • using new technologies to link census data longitudinally and with other data sets
  • reviewing the applicability of data warehousing technology when new storage systems were considered
  • starting the building of a data warehouse in a modular fashion and with manageable data content, with business and statistical considerations in mind
  • the high cost and effort involved in setting up a data warehouse and cleaning the data
  • building a central system for maintaining statistical metadata
When considering data users' needs and dissemination strategies, census offices were recommended to:
  • adopt a proactive strategy towards the improvement of data dissemination
  • diversify data dissemination strategies and technology solutions according to the needs of different types of users
  • utilize the possibility offered by optical recognition to capture and release census data gradually, starting from key information
  • use prototyping and vigorous testing to perfect dissemination products
  • use modern marketing techniques to increase data use
  • choose hardware and software platforms that are compatible with standard technologies
  • provide web links to national counterpart sites and other sites containing useful census information
  • consider creating community profiles of census data to increase their use
With regard to data dissemination through the Internet, the Workshop recommended that statistical and census offices:
  • adopt the Internet as part of their dissemination strategy, use hypertext interface on CD-ROM, and use email for data promotion and for disseminating summary results
  • develop an internal policy on the utilization of the Internet in general and include the production of web material in training programmes
  • create functional coordinating mechanisms for web site management
  • improve internal web site management skills through recruitment and training
  • design census dissemination sites for relatively low bandwidths by using various page authoring and data access techniques
  • provide file formats and scripts that all common browsers could handle
  • include in web sites census metadata in an easily accessible format
  • consider features that help clients service themselves when accessing census data
  • monitor the web site traffic and adjust the site content and navigation as the reports might suggest
  • pay special attention to the clarity of information and test the individual pages and the whole site thoroughly
  • provide the most popular content in static HTML in order to improve the site performance
  • be prepared to adjust the number of servers and balance the load as the traffic increases
  • use separate servers for resource-consuming tasks to ensure the uptime of the public web site
  • keep production servers isolated from the Internet
  • consider using XML to code structured data pages
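
As an illustration of the last recommendation, a small census table could be coded in XML as sketched below in Python, using the standard library's `xml.etree.ElementTree`. The element names (`table`, `row`, `area`, `population`) and the figures are hypothetical; the Workshop did not prescribe any particular vocabulary.

```python
import xml.etree.ElementTree as ET


def table_to_xml(title, rows):
    """Encode (area, population) rows as a small XML document string."""
    root = ET.Element("table", attrib={"title": title})
    for area, population in rows:
        row = ET.SubElement(root, "row")
        ET.SubElement(row, "area").text = area
        ET.SubElement(row, "population").text = str(population)
    return ET.tostring(root, encoding="unicode")


xml_text = table_to_xml("Population by area", [("North", 2000), ("South", 1500)])
print(xml_text)

# Any XML-aware client can read the structure back without screen scraping
parsed = ET.fromstring(xml_text)
print([r.find("area").text for r in parsed.findall("row")])
```

Because the structure is explicit, the same page can feed a browser display, a spreadsheet import or another office's database.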
Recommendations on using geographical information systems included:
  • starting the application of GIS from low-cost alternatives and moving to advanced GIS technology when skills improved
  • considering grid square GIS as an alternative for presenting census data on maps
  • the visually effective use of low-end GIS and high-end GIS

 