The First Meeting of the Working Party on the Application of New Technology to Population Data
Bangkok, 24-26 September 1997

STAT/WPA.1/3.1
24 September 1997
ENGLISH ONLY

ECONOMIC AND SOCIAL COMMISSION FOR ASIA AND THE PACIFIC

Working Party on Application of  New Technology to Population Data
First Meeting
24-26 September 1997
Bangkok

Recent developments in the application of information technology to population data collection, processing and dissemination at the Australian Bureau of Statistics
Dr Rob Edmondson
Director, Technology Application, Population Statistics Group
Australian Bureau of Statistics
rob.edmondson@abs.gov.au
September 1997

General Background

The application of information technology to population data collection, processing and dissemination at the Australian Bureau of Statistics (ABS) can be conveniently divided into three roughly equal parts: the Population Census (the Census), with its size and public profile; the household survey program, with many intertwining strands of work; and 'the rest', such as some labour force collections based on employer respondents, and some demography and crime collections based on administrative by-product data.

Population Census

There were a number of significant IT developments in the last Census.  Personal Computers (PCs) were provided to field managers, Geographic Information Systems (GIS) were used for Collection District design, coding was performed using a PC based system, and tabulation and dissemination were largely performed using PC platforms. In the next Census, in addition to further developing these systems, the use of Imaging and Optical Character Recognition (OCR) is under active consideration. Each of these is briefly discussed below.

For the first time, PCs equipped with modems were provided to 145 field managers working from their homes around the country. Secure communications facilities were provided for frequent data exchange. This environment was used to make more information available to key personnel, and to provide it in a timely and relevant manner. Information could flow between the field managers and more questions could be resolved locally. Overall, this was a successful use of new technology that will probably be refined in the next Census.

Another first in the Census was the move to GIS technology for Collection District (CD) design. The most significant obstacle to adopting this approach was the availability of suitable digitised geographic data, and a contract for the provision of this data was signed some years ago. Once available, the digitised information is a valuable resource for a range of purposes, both within the statistical agency and outside. Within the ABS, the digitised geographic information is being used in the household surveys program, and it forms the basis for GIS based dissemination products. The basic concept of using GIS technology for CD design worked well, and is likely to be retained and enhanced for the next Census. Maintenance of the information in the intercensal period is an issue under consideration.

The third major change in the last Census was the move from mainframe based coding to PC based coding for those fields not captured by Optical Mark Recognition (OMR) processing. Together with a very effective system to manage paper flows, this proved very worthwhile. The PC based coding facility used a general-use coding engine that accommodates a wide range of coding indexes and a few styles of coding. To further reduce costs and improve timeliness, imaging and OCR are being considered for the next Census. It is hoped that OCR will permit the automatic coding of a significant fraction of responses, and that imaging will significantly reduce the need for paper handling. The concept is that when automatic coding of an OCRed field fails, the image, rather than the paper form, will be presented for further processing. At this stage we are trialling OCR and automatic coding systems to test viability and refine estimates.
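The 'automatic coding with fallback to the image' concept can be illustrated with a minimal sketch in Python. This is an illustration only, not the system being trialled: the coding index entries and codes, the confidence and similarity thresholds, and the names `auto_code` and `CodingResult` are all assumptions made for the example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical coding index: response text -> classification code.
# Entries and codes are invented for illustration only.
CODING_INDEX = {
    "registered nurse": "2323",
    "primary school teacher": "2412",
    "truck driver": "7311",
}


@dataclass
class CodingResult:
    code: Optional[str]        # assigned code, or None if not auto-coded
    needs_image_review: bool   # True when an operator should view the image


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def auto_code(ocr_text: str, ocr_confidence: float,
              min_confidence: float = 0.90,
              min_similarity: float = 0.92) -> CodingResult:
    """Try to code an OCRed response; fall back to image review on failure.

    The thresholds are assumed values, not measured ones.
    """
    text = normalise(ocr_text)
    if ocr_confidence >= min_confidence:
        # Exact match against the coding index.
        if text in CODING_INDEX:
            return CodingResult(CODING_INDEX[text], False)
        # Otherwise accept the closest index entry if it is similar enough.
        best_entry, best_score = None, 0.0
        for entry in CODING_INDEX:
            score = SequenceMatcher(None, text, entry).ratio()
            if score > best_score:
                best_entry, best_score = entry, score
        if best_entry is not None and best_score >= min_similarity:
            return CodingResult(CODING_INDEX[best_entry], False)
    # Low OCR confidence or no acceptable match: route the image to an operator.
    return CodingResult(None, True)
```

In this sketch a response is auto-coded only when the OCR engine is confident and the text matches the index exactly or very closely. Everything else is queued for an operator, who would work from the stored image rather than the paper form.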

The last area worth commenting on is the use of PC based tabulation and dissemination facilities. These facilities are essentially improved versions of the products used in the preceding Census. All tabulations, including the provision of data to the information warehouse, were done using Supercross. A number of CDROM based products have been released with software that provides simple 'browse, manipulate, export' functionality. A cut down version of Supercross is being considered as an alternative to the current product. The successful CDATA91 GIS product was improved for the '96 release, and there are some pilot projects using the Internet for dissemination of some Census data, including online (or offline from CDROM) area selection from maps. This is consistent with a general move to using internet interface and software technology for dissemination. Though based on internet software, the products are often packaged on CDROM and can be used offline, or they can be easily incorporated into an internal "intranet". Moves in this direction are greatly assisted by the incorporation of internet format output options in many of the packages used for dissemination in the Bureau.

Household Surveys

Household Surveys are predominantly interviewer administered, either face to face or (increasingly) by telephone, typically with data entered on an OMR form that is mailed back for scanning and processing. The main processing system is currently mainframe based and is reaching the end of its effective life. It is likely to be progressively replaced with new components that operate on the PC or in client-server mode. The processing systems are largely being replaced by SAS (available on mainframe, Unix, and PC) and the tabulation systems are to be largely replaced by Supercross (available on NT servers and workstations). Dissemination is predominantly by paper publication, by tailored data services (paper or floppy disk), and by confidentialised unit record data. There is increasing use of the Information Warehouse to hold data and metadata in a form fit for dissemination, and there are some successful pilot projects that move parts of the survey design stage activities, dispatch and collection control facilities, and management information systems into Lotus Notes (Notes).  Some surveys have used other data capture techniques: computer assisted personal interviewing (CAPI), telephone interviewing, and OCR have been or will be used in various surveys and are discussed below.

CAPI has been successfully used in a number of surveys, though the cost of in-field notebooks has made the cost-effectiveness of this approach questionable. The use of Blaise software (from Statistics Netherlands) proved very effective for population surveys, and the current DOS based Blaise software will be available in a Windows based edition before long. Blaise has enabled the fielding of more complex instruments, with in-field editing to improve data quality, and early transmission of relatively clean data. Experience to date indicates that the processing time needed to clean unit record data has been reduced, while edits are applied consistently to the data. A number of surveys have been and will be conducted using the existing stock of notebooks. It is not clear what use will be made of CAPI after this period, though smaller handheld in-field devices hold some promise.
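The idea of in-field editing, where the same edit rules are applied to every record at the point of capture, can be sketched as follows. This is a generic Python illustration and not Blaise syntax; the field names (`age`, `labour_force_status`, `hours_worked`) and the three rules are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
# Each edit is (message, rule); the rule returns True when the record passes.
Edit = Tuple[str, Callable[[Record], bool]]

# Hypothetical edit rules, invented for illustration.
EDITS: List[Edit] = [
    ("age must be between 0 and 120",
     lambda r: 0 <= r["age"] <= 120),
    ("persons under 15 are not asked labour force questions",
     lambda r: r["age"] >= 15 or r["labour_force_status"] is None),
    ("hours worked requires employed status",
     lambda r: r["hours_worked"] is None
     or r["labour_force_status"] == "employed"),
]


def run_edits(record: Record, edits: List[Edit] = EDITS) -> List[str]:
    """Return the messages of all edits the record fails (empty list = clean)."""
    return [message for message, rule in edits if not rule(record)]
```

An interviewer's notebook would run such checks as each answer is keyed, so that inconsistencies can be resolved with the respondent present. Running the identical rule list again in the office is one way to keep the application of edits consistent.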

Telephone interviewing has been successfully used in the main Labour Force collection to conduct the second and subsequent interviews. For these interviews, the interviewer enters the responses onto an OMR form, and the forms are processed in the usual way. As the telephone interviewers work from home, IT assistance has been limited, though centralised and computer assisted telephone interviewing (CATI) is used in a number of economic collections.

Optical Character Recognition is also starting to be used in some surveys. Some economic collections have been using OCR for a few years, but the greater need for alphabetic rather than numeric character recognition in population statistics has slowed its adoption for population collections. The institutional mailback component of the Survey of Disability and Ageing will trial OCR, and the ongoing Survey of Income and Housing Costs will probably convert to OCR. If successful, these pilot projects, together with complementary work in Census, may see widespread adoption of OCR in the future, perhaps displacing OMR as the normal data capture vehicle. Indeed, OCR may be more suitable for interviewer completed questionnaires than respondent completed questionnaires, as the interviewers may be able to complete the forms with higher recognition rates.

To complete the picture, some household collections have used more traditional computer assisted data entry (CADE) systems. CADE systems have been based on a number of software products: older style desktop database software; the internally developed client-server Input Processing System (IPS); the DOS based Blaise system; and more recently, the Windows based Notes system. Notes is usually associated with messaging and groupware applications, but it now contains enough functionality and programmability to make it a useful data entry platform. Blaise proved particularly suitable for some innovative collections, such as the CADE system for capturing information from Time Use Diaries. Other CADE systems are mostly well established, but there are moves to take advantage of emerging opportunities offered by the Internet and related electronic data interchange initiatives.

Processing of the captured data is still largely done in the traditional manner using aging mainframe processes. There have been various approaches to rejuvenating these systems. In the main, these have emphasised the use of portable SAS for processing, the use of Supercross to replace Table Production Language (TPL), the use of input processing facilities packaged with data capture systems, and the use of Notes and other client server technologies. The ABS has contracted with the supplier of Supercross to incorporate various 'TPL' functionality into their product, particularly for processing the kind of hierarchical unit structures found in household surveys. Once the functionality has been delivered, we expect a substantial shift of processing from the mainframe SAS/TPL/PL1 environment to a client server SAS/Supercross/SQLWindows environment. These systems are being integrated with downstream dissemination initiatives, and upstream survey development processes to provide the next generation of household survey systems. Many of the systems and products can be used with the 'server' on the 'client' PC, though in the ABS we tend to field the systems using shared access UNIX and NT servers. A number of specialised statistical processing sub-systems are being or will be integrated into the new environment, including seasonal analysis/trending facilities and some general use statistical sampling and weighting systems that are tailored to the family of statistical methodologies used in the ABS by most (95%?) household surveys.
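The 'hierarchical unit structures' referred to above are records in which persons are nested within households, so that a table may count either kind of unit. A minimal Python sketch of tabulating such a file is given below. It is not Supercross or TPL functionality; the class and field names are hypothetical, and it makes the simplifying assumption that a single household weight applies to every person in the household.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Person:
    sex: str
    labour_force_status: str


@dataclass
class Household:
    region: str
    weight: float                       # assumed: one weight per household
    persons: List[Person] = field(default_factory=list)


def tabulate_households(households: List[Household]) -> Dict[str, float]:
    """Weighted household counts by region (household-level table)."""
    table: Dict[str, float] = defaultdict(float)
    for hh in households:
        table[hh.region] += hh.weight
    return dict(table)


def tabulate_persons(households: List[Household]) -> Dict[Tuple[str, str], float]:
    """Weighted person counts by region and labour force status.

    The region is a household-level item and the status a person-level item,
    so the table has to walk both levels of the hierarchy.
    """
    table: Dict[Tuple[str, str], float] = defaultdict(float)
    for hh in households:
        for person in hh.persons:
            table[(hh.region, person.labour_force_status)] += hh.weight
    return dict(table)
```

The point of the sketch is that a cross-classification can mix items from different levels of the hierarchy, which a flat-file tabulator cannot do without first duplicating household items onto each person record.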

Increasing volumes of data are being made available for electronic dissemination. Some older dissemination techniques are being phased out in favour of newer technologies. Fiche is being displaced by CDROM based facilities providing much improved location, manipulation and display capabilities. Dissemination using older electronic messaging facilities is being replaced with Internet facilities and even floppy disks are being displaced by internet email. The internet presents many opportunities, and the ABS is quite well placed to move in this area with a large volume of data cleared for electronic release, and the ability to associate a steadily increasing collection of electronic documentation with such data.

Other Population Collections

As might be expected, it is difficult to generalise about the 'other' category. In the main, employer based surveys tend to adopt processing systems resembling those used by economic surveys. Some Labour Force employer surveys are starting to use OCR, and have used administrative by-product data and electronic capture for some time. Crime and Demography collections do not have employer respondents, but most information is collected by other government instrumentalities (often State based). Administrative by-product data is typically collected from each supplier in a different format and run through a specialised processing system.

Impact of new technologies on the operations of the national statistical office, benefits drawn, and issues encountered.

In the last few years the most significant technology changes have been a move to roughly one networked PC per employee, the installation of a much improved wide area network and the deployment of relational database technology (Oracle/Unix) and groupware technology (Lotus Notes). As a direct result, the dedicated mainframe terminal network has been removed, the centralised print service has shrunk to one printer part time, and the support cost of the mainframe has shrunk to very low levels.

Relational database systems have been deployed primarily in economic collection areas, but they have also been used for input processing and final dissemination (via the information warehouse) in population surveys, and the Census DPC had a dedicated Unix server. They have also been used for some administrative systems, particularly financial and personnel systems.

Lotus Notes has been used for electronic mail, discussion databases, and the automation of many administrative systems (leave, acquisition, recruitment, staff movements, planning etc). It has also been used increasingly for a range of statistical processes, providing opportunities for improvements from the earliest survey design stages to final dissemination. The earliest successful systems used the flexible document structures to develop systems for query tracking and resolution, structured survey documentation, and management information systems. Administrative processes have also been automated, with most paper forms eliminated and electronic forms routing and processing automated so that only the originating and approving officers need view the electronic form in almost all cases.

Not only have processing systems been modernised, often with improved output and better timeliness, but the quantity, quality, and accessibility of documentation have improved. This has not been without cost, and there has been a substantial shift in resources from people to technology. This was in part the result of a change to full cost recovery of all IT operations, and of giving users the freedom to move money between various IT and non-IT expenditure items. To date, the IT organisation has retained a monopoly on the provision of IT services (subject to demonstrable unit-cost decreases year on year), though it has increasingly used external service providers to deliver the service.

The network has provided the environment to host a range of servers and services available from every machine on the network. Banyan/Vines servers provide basic file, print, and communications services; Notes servers provide a range of messaging and document-database services; Unix servers carry a significant proportion of the data entry, analysis, and dissemination load; and NT servers provide specialised application services such as timeseries manipulation (FAME) and tabulation (Supercross). Most servers have no dedicated input/output devices, relying on general network services instead. Much more mainframe output is now sent to Banyan network printers (or to Notes databases) than is printed on centralised printers. As well as servers, there is a range of general services available over the network: OMR and OCR scanners feed data in, computer fax gateways receive and send faxes without paper being generated, internet gateways and firewalls provide internet email and limited internet browsing, file transfer machines provide secure transmission facilities for PCs in the field, and special devices such as CDROM cutters and plotters are available to all (or just to selected individuals).

Experiences in the implementation of applications
Client Server Technology

The large number of server types listed above indicates that we have had considerable success with client server systems overall. However, there have been a number of difficult implementations along the way, and some systems can only be classed as marginally successful when compared to the original expectations. As a general rule, economic collections have moved more rapidly to use client server platforms, but population collections are starting to catch up.

A number of early systems used PC database technology with multiuser database backends residing on Banyan servers. These were successful, but unpleasantly network intensive, and the remainder are being eliminated. We rarely develop server applications for these platforms now except when the software being used requires a shared file system. Blaise processes, some SAS systems etc are run from Banyan servers.

Notes is based on a non-relational database engine, and this can be used to develop applications of various kinds. Starting with documentation centred applications and moving into workflow, task tracking, planning and management information systems, these have been very successful. With the increase in the programmability of Notes, we are now developing much more traditional database applications using Notes.

The earliest UNIX/RDBMS systems were a CATI application and Financial and Personnel management. These were successful, though the required CPU power was underestimated at the start (a problem we have had with most client server applications). Financial and Personnel Management is still successfully based on a UNIX server, but most subsequent applications have been statistical applications, including some general use environments within which particular survey applications are constructed. General use environments include an Oracle based input processing system, an Oracle/SAS based processing environment, and an Oracle based information warehouse system. There have also been a number of Unix hosted but more specialised applications including a GIS server, OCR engines, and a new Business Register.

We have just started to deploy some NT servers for general use (as host environments for Notes, they have been used for some time). So far, these have been used to provide particular third party applications: FAME and Supercross. It is not clear how widely we may end up using NT servers, and whether they will supplement or displace existing Unix servers.

The Internet

The ABS has had an Internet site for some time providing "subscription" based access to a range of statistics. More recently we have provided a Web site for public good information. We currently maintain several thousand pages of information using Lotus Domino technology, which essentially allows us to put nominated Notes databases on the Web. Maintenance of the information is straightforward as it only uses the usual office documentation tool, Notes. Some more specialised initiatives are underway, including map-based drill down area selection to public good Census information. This will add a few thousand more pages to the site.

The ABS will continue to enhance the content of the site, and when appropriate third party charging arrangements are available, we expect to sell data over the Web on a self-serve basis. Even when direct use of the Web is not expected, for example when releasing CDROM material, we are increasingly using software and interfaces associated with the Web. There is also interest in collecting data using the internet once suitable security arrangements have been agreed. This can range from data capture using electronic forms on the Web to moving existing electronic providers to a better communications medium.

Data capture using OCR, OMR and other technology

OMR is well established as the main data capture technology in Census and Household surveys. More recently, the use of OCR has been growing, particularly in business mail back surveys. As outlined above, OCR is now being actively investigated by Census and some other population collections, and has a number of attractions - printing requirements are not as stringent, recognition rates for alphabetic characters are improving, and implementation costs are reducing.

Other data capture options also have their attractions. CAPI enables more complex questionnaires to be used and improves data quality and timeliness, but hardware costs will limit its use. This situation may change with hand held devices becoming available at significantly lower prices. CATI has also been used with significant benefits, though in population statistics, telephone interviewing has usually been used with paper OMR forms. Administrative by-product capture is also used by several collections, and opportunities in this area are increasing.

The use of GIS software packages for data collection and analysis

In the ABS, GIS are mainly used for 'frame' creation and selection, and for various publication and dissemination initiatives. The Integrated Regional Database (IRDB) and CDATA have been successful GIS dissemination initiatives, and the social atlases and other map products sell well.

Strategies adopted to address various issues and problems
Year 2000

Many population collections are cross sectional, substantially modified each time they are run, and are unlikely to have significant year 2000 problems. The next Census will be in 2001 and will be extensively retested before production use. Thus the scale of the problem is manageable, but there are still significant areas of risk.

The ABS has evaluated all its systems and identified its more significant and risky applications, and various external dependencies. External dependencies include external organisations providing electronic data, and third party hardware and software. External hardware and software providers have been approached about year 2000 compliance and plans. External data providers are being identified and approached about file format and data provision risks. The highest priority application systems are being modified and/or redeveloped as part of this year's work program. This includes the significant sub-annual Labour Force and Demography collections, and the household surveys processing environment. Test environments in which dates can be set forward are being progressively made available, and will be used to verify that the remaining systems do not have significant problems, and/or to correct any minor problems discovered. Outstanding problems, and any more significant problems discovered, will be remedied as part of next year's work program.
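The kind of fault that a test environment with dates set forward is meant to expose can be shown with a small Python sketch. It is a generic illustration of the two-digit year problem and of one common remedy (a fixed pivot 'window'), not a description of any ABS system; the function names and the pivot value of 30 are assumptions. Each function takes the current date as a parameter so that the clock can be set forward in a test.

```python
from datetime import date


def age_two_digit(birth_yy: int, today: date) -> int:
    """Legacy-style calculation that keeps only two digits of each year."""
    return (today.year % 100) - birth_yy


def expand_year(yy: int, pivot: int = 30) -> int:
    """Windowing: two-digit years below the pivot are read as 20yy, others as 19yy.

    The pivot of 30 is an assumed value chosen for the example.
    """
    return 2000 + yy if yy < pivot else 1900 + yy


def age_windowed(birth_yy: int, today: date, pivot: int = 30) -> int:
    """Year-difference age using a four-digit year recovered by windowing."""
    return today.year - expand_year(birth_yy, pivot)
```

With the date set to 1997 the legacy calculation gives a plausible age for a person born in 1960, but with the date set forward to 2001 it returns a negative number, while the windowed version does not. Windowing has its own limit: with a pivot of 30 it cannot distinguish a birth year of 1925 from 2025, which matters for population data covering the very old.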

Cost Recovery

The ABS has fully cost recovered all internal technology services, both applications and infrastructure services, for some years. The first couple of years were difficult, but the system is now reasonably well understood and works reasonably well. The main benefits have flowed from resource shifts resulting from the provision of better cost and consumption data, more direct pricing signals, and more freedom for line areas to capture benefits. The pricing structure is reasonably detailed, and each item fully recovers the costs, including all overheads, that are attributed to it.
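The principle that each priced item recovers its own costs plus the overheads attributed to it can be expressed as simple arithmetic. The Python sketch below is an illustration only: the item names and the rule of attributing overhead in proportion to direct cost are assumptions, and the paper does not state the attribution basis actually used.

```python
from typing import Dict


def unit_prices(direct_costs: Dict[str, float],
                overhead: float,
                volumes: Dict[str, float]) -> Dict[str, float]:
    """Full-cost unit price per item.

    Assumption: overhead is attributed in proportion to each item's
    direct cost. Other bases (headcount, floor space) are equally possible.
    """
    total_direct = sum(direct_costs.values())
    prices = {}
    for item, direct in direct_costs.items():
        attributed = overhead * direct / total_direct
        prices[item] = (direct + attributed) / volumes[item]
    return prices
```

Whatever attribution basis is used, the check on such a scheme is that price multiplied by volume, summed over all items, equals total direct cost plus total overhead.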

In the infrastructure services area, cost recovery has resulted in demonstrably lower unit costs each year, and in higher overall technology expenditure as a result of a significant rise in demand for (and supply of) infrastructure services. The significant rise in the number of PCs, and the consequential demand for network and server capacity, was not a direct result of executive decision making. It was largely driven by individual managers shifting available resources into these areas. It occurred during a period when overall financial resources available to the ABS were subject to an annual reduction. The last mainframe and all the servers have been acquired using existing budget allocations rather than requiring additional funds.

In the applications area, there has been a fairly steady demand for services, but there has been more flexibility in deployment, and more attention paid to the cost effectiveness of requested developments. The applications dollar could be spent on PCs, on subject matter people, on travel or other items.

Technology services have undergone a number of external benchmarking studies and have, as a rule, performed at the standard of world best practice. This is probably due in part to the cultural and financial effects of cost recovery.

Outsourcing and Market Testing

Australian Government policy is followed, and we have market tested and/or outsourced in a number of areas: help desk provision, provision of GIS services, provision of support for financial management software, provision of field support, etc. The result of market testing has not always been outsourcing, but outsourcing often occurs when the required skills or expertise are peripheral to mainstream statistical processing. The Census and Statistics Act places limits on our ability to outsource unit record processing facilities, particularly for population data.

Areas where it might be possible for the NSO to make contributions to facilitating the transfer of technology and/or exchange of information to developing countries.

Some of the areas that seem to offer possibilities for transfer of technology and/or exchange of information of immediate relevance to population statistics include the use of Notes, the use of Blaise, computer assisted and/or automatic coding, exploiting various Internet technologies and opportunities, the use of Supercross, OMR/OCR experiences and developments, and exploiting GIS.


 