The Third Meeting of the
Working Party on the Application of New Technology
to Population Data
Bali, 7-9 January 1999
STAT/WPA(3)/16
7 January 1999
ENGLISH ONLY
ECONOMIC AND SOCIAL COMMISSION FOR ASIA AND THE
PACIFIC
CAPI in Australia
Dr Rob Edmondson
Director, Technology Services Division
Client Relations Manager for Population Statistics
Group and
Methodology Division, Australian Bureau of Statistics
rob.edmondson@abs.gov.au
The ABS has used CAPI over the last three
years for a number of significant surveys including
two waves of the longitudinal Survey of Employment
and Unemployment Patterns, the Survey of Mental
Health and Well-being and the Survey of Disability,
Ageing and Carers. The CAPI software used (Blaise)
has also been used for straight data entry of
Time Use diaries and Household Expenditure Survey
diaries. CAPI will be used in the forthcoming
Housing survey, but that may be the last CAPI
survey for some time.
CAPI in a Blaise/Notebook environment has provided
significant gains in the complexity and nature
of instruments that can be fielded, and in the
quality of data captured. The fleet eventually
purchased was sufficient for instruments in
the special supplementary survey (SSS) program
- typically a couple of large surveys a year,
but not for the general Labour Force program.
Under the full cost recovery model employed
in the ABS, this has come at significant cost
and the current notebook fleet is reaching the
end of its useful life. Given that the alternative
OMR technology is quite fast and efficient,
the cost-benefit equation is difficult - real
costs against other gains.
Our interviewer workforce adapted rapidly to
CAPI collection, and acceptance of it was high.
This probably reflects, in part, early consideration
of interviewer-related issues and action taken
to address these. Interviewer training for new
CAPI instruments may be slightly longer than
non-CAPI instruments, but given that the instruments
themselves are typically more complex, this
is not entirely unexpected. Respondent reactions
were a little more mixed: many were favourably
disposed but some were less happy.
Consideration is being given to future CAPI
options including notebook or palmtop based
options, SSS or Labour Force fleet size, and
doing without CAPI for a while. A current review
of the household survey program in the ABS canvasses
the better exploitation of the wide range of
information held in administrative datasets.
It also questions whether the marginal benefits
in terms of data better suited to user needs
warrant the added complexity that has increased
costs and time taken to develop and process
each survey. However, the review recognises the
desirability of a range of different data collection
options including computer assisted interviewing.
Very useful material about computer assisted
interviewing is increasingly becoming available
in the public arena. A recent book, "Computer
Assisted Survey Information Collection", edited
by Couper, Baker, Bethlehem, Clark, Martin,
Nicholls and O'Reilly (part of the Wiley Series
in Probability and Statistics, 1998), covers many
topics including instrument design, survey design,
and case management. This paper will only address
the way that the ABS has implemented CAPI.
Broad Architecture
Sample selection
CAPI requirements led to the development
of a new sample and workload formation system.
Workload formation is the allocation of households
to interviewers. These facilities are not, for
the most part, CAPI specific, and they are used
for the general household survey program. However
the redeveloped system provided a convenient
way to "add" electronic generation and transmission
of workloads to CAPI machines. The sample selection
facilities are based on standard RDBMS client-server
technologies and are integrated with the standard
ABS applications environment. These facilities
were designed to provide the basis for later
standardised estimation components.
Office Management systems
Processing required before dispatch and after
receipt requires a range of office facilities,
including instrument development, initial notebook
configuration, resolution of edits, additional
editing, reallocation of incomplete interviews,
backup and recovery, etc. This heterogeneous
collection of facilities has little in common except
that they are centralised PC based systems.
Transmission
The transmission to and from the field uses
encrypting modems, authentication fobs and specialised
secure servers. On top of this hardware layer,
a simple but robust software system provides
a convenient transmission scheme. The field
notebooks have automated dialup connection from
the PC to a server (various states have a server).
To establish a connection one must not only
identify oneself, but provide a 'password' generated
by and frequently changed by an authentication
'fob'. This essentially checks that the dialup
party has a currently authorised fob in their
possession, as well as being a recognised field
operative using an encrypting modem. Having
established a connection, we transfer the content
of a person-specific directory in each direction
(using FTP). Any special 'automatic execution'
files delivered to the PC will execute after
closing the link. The server provides for periodic
transmission of collected files to the ABS network.
The only significant developments over time
were an increasingly structured subdirectory
structure, and a move to fully exchange the
workload state(s) each transmission. Early developments
emphasised reducing connection time by reducing
the amount of data transferred.
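The paper does not say how the authentication fob derives its frequently
changing 'password'. Purely as an illustration of the idea, the sketch below
uses the publicly documented HMAC-based one-time password scheme (RFC 4226)
driven by a time step. The shared secret, the 60-second step and the function
names fob_code and server_accepts are assumptions made for this example and
are not a description of the ABS equipment.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password as described in RFC 4226."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def fob_code(secret: bytes, now: float, step: int = 60) -> str:
    """Code a (hypothetical) fob would show for the time step containing `now`."""
    return hotp(secret, int(now // step))


def server_accepts(secret: bytes, offered: str, now: float, step: int = 60) -> bool:
    """Accept the current code or the previous one, to tolerate clock drift."""
    current = int(now // step)
    return any(
        hmac.compare_digest(hotp(secret, counter), offered)
        for counter in (current, current - 1)
        if counter >= 0
    )
```

In this sketch a dialup session would only proceed to the file exchange once
server_accepts returns True for the code typed in by the interviewer.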
Remote device
CAPI only required low end DOS based notebooks,
but Occupational Health and Safety considerations
(weight, keyboard, screen readability in field
etc), and the available notebooks at the time
of purchase, meant that we actually use reasonably
capable IBM colour notebooks. While these were
quite capable of running Windows, the CAPI system
only requires DOS. The machines included disk
encryption and access control software, and
the modems plugged in 'at home'. Extra batteries
and 'door stop' interviewing stands were included.
The stands were rarely needed - most interviewers
are invited in and often permitted to use mains
power.
Survey Software
Software on the Notebook is essentially based
on Blaise and DOS. Other software can
be loaded and used, for instance some specialised
software was used during the Survey of Mental
Health and Well-being. Blaise has proved to
be a useful and powerful environment for CAPI
applications (as well as some straight data
entry applications). Its entry, coding, sequencing
and editing capabilities are well suited to
the needs of household CAPI. It is also used
for field management software (contact lists,
workloads), as well as household instruments
and individual instruments. This gives substantially
the same look and feel for most operations,
reducing training and support loads. The basic
survey instruments were generally developed
by subject matter processing people, though
specialist IT programmers wrote, integrated
and maintained substantial Blaise programs.
Over time, particular 'styles' for writing Blaise
programs evolved that both avoided some common
problems, and reused standard ways of doing
certain types of operation. These would become
better documented 'standards' if Blaise
continued to be used.
Workflow
The following diagram shows the core processes
used during CAPI survey operation. This is followed
by a more narrative account of the main workflow
path. Various precursor activities (such as
instrument development, field trials, and interviewer
training) and later processing steps are not
covered.
Sample allocation (Assignment
of sample batch to interviewer)
This process deals with taking the sample
selected and allocating work to interviewers
in accordance with certain rules. ABS-standard
processes in ABS-standard processing environments
were used (though the software was redeveloped),
with additional processes to prepare the information
for electronic dispatch to CAPI machines. Instruments
can be in the field for various lengths of time
- up to a year, and workload allocation has
to take into account field trials of proposed
instruments. The workload for each interviewer
may need to be merged with already captured
information for some interviews, for example
for incomplete response or multiple interview
instruments. Problems usually occurred when
the workloads were not quite standard - e.g.
when the household system was 'adapted' for
the longitudinal person-based Survey of Employment
and Unemployment Patterns.
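The allocation rules themselves are not described in this paper. A minimal
sketch of workload formation, assuming two invented rules (an interviewer only
takes households in the area he or she covers, and no interviewer exceeds a
fixed maximum), might look like the following. The function name, the data
layout and both rules are assumptions for illustration.

```python
from collections import defaultdict


def form_workloads(households, interviewers, max_per_interviewer):
    """Allocate households to interviewers working in the same area.

    households: list of (household_id, area) tuples.
    interviewers: dict of interviewer_id -> area covered.
    Returns (workloads, unallocated).
    """
    by_area = defaultdict(list)
    for interviewer_id, area in sorted(interviewers.items()):
        by_area[area].append(interviewer_id)

    workloads = {interviewer_id: [] for interviewer_id in interviewers}
    unallocated = []
    for household_id, area in households:
        candidates = [
            i for i in by_area.get(area, [])
            if len(workloads[i]) < max_per_interviewer
        ]
        if not candidates:
            unallocated.append(household_id)
            continue
        # Give the household to the least loaded eligible interviewer.
        chosen = min(candidates, key=lambda i: len(workloads[i]))
        workloads[chosen].append(household_id)
    return workloads, unallocated
```

Households left in the unallocated list correspond to the 'not quite standard'
cases that need manual attention before dispatch.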
Instrument distribution
Instrument distribution used two techniques,
either preloading instruments to disk before
dispatching to interviewers, or transmitting
the new files via the modem based file transfer
system. It is desirable to cleanly separate
the transmission of software and the data. Retransmission
of an instrument is sometimes necessary, and
care needs to be taken that previously captured
data is not lost if this occurs. A register
of software (and data) present on each field
machine would have simplified matters
somewhat.
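The register suggested above was not built, so the following is only a sketch
of what it might have recorded: for each field machine, which instrument
versions are installed, so that the office can work out what still needs to
be sent. The machine identifiers, instrument names and version strings are
hypothetical.

```python
def instruments_to_send(register, machine_id, required):
    """Return instruments whose required version is not on the machine.

    register: dict of machine_id -> {instrument_name: installed_version}.
    required: dict of instrument_name -> version the office wants in the field.
    """
    installed = register.get(machine_id, {})
    return sorted(
        name for name, version in required.items()
        if installed.get(name) != version
    )


def record_install(register, machine_id, name, version):
    """Update the register once the field machine confirms an install."""
    register.setdefault(machine_id, {})[name] = version
```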
Workload transmission (Prepare
and Receive transmission)
A workload is a set of interviews to be performed
by an interviewer. An interviewer may
have several workloads at the same time. Transmission
is by the modem based file transfer mechanism,
though care needs to be taken when handling
multiple workloads and/or surveys. Mechanisms
that risk overwriting already collected information
should be avoided, and the central servers should
be capable of acting as a remote backup device.
However data separation need not be complete
as on occasion it is desirable to use data collected
by one instrument in processing another. This
would have been common in the ABS if the monthly
labour force had gone CAPI, but as it turned
out data sharing was only between household
and individual survey instruments.
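One way to avoid overwriting already collected information is to merge an
incoming workload into what is on the notebook rather than replace it. The
sketch below assumes each case is a record with an "answers" entry that is
empty until an interview has captured something. That record layout and the
function name are assumptions for illustration, not the ABS file format.

```python
def merge_workload(local, incoming):
    """Merge an incoming workload into local records without losing data.

    Both arguments map case_id -> record dict. A record with a non-empty
    "answers" entry counts as already collected and is never replaced.
    Returns the merged dict and the case ids whose incoming copy was skipped.
    """
    merged = dict(local)
    skipped = []
    for case_id, record in incoming.items():
        existing = merged.get(case_id)
        if existing is not None and existing.get("answers"):
            skipped.append(case_id)  # keep the interviewer's captured data
        else:
            merged[case_id] = record
    return merged, sorted(skipped)
```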
Software patch distribution
and update
Patching and upgrading is not desirable when
instruments are in the field, but it is sometimes
necessary. When needed, we used the file exchange
mechanism and 'automatic execution' feature.
Synchronising software release with necessary
data patches is particularly error prone.
Some means of 'switching off' an instrument
till the completion and success of software
installation has been checked would be useful.
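Such a 'switch' did not exist in the system described here. As a sketch of how
one could work, the example below keeps an instrument disabled until every
file listed in a manifest is present with the expected checksum. The manifest
format, the file names and the use of SHA-256 are assumptions chosen for the
example.

```python
import hashlib


def file_digest(content: bytes) -> str:
    """SHA-256 hex digest of a file's content."""
    return hashlib.sha256(content).hexdigest()


def instrument_enabled(manifest, installed_files):
    """Enable the instrument only if every manifest file is present and intact.

    manifest: dict of file name -> expected SHA-256 hex digest.
    installed_files: dict of file name -> file content as bytes.
    """
    for name, expected in manifest.items():
        content = installed_files.get(name)
        if content is None or file_digest(content) != expected:
            return False
    return True
```

The front end program would call instrument_enabled before launching an
instrument, so a half-applied patch blocks interviewing instead of corrupting
data.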
Interview scheduling and
followup
A Blaise based system was provided, but its
use was not mandatory - some interviewers used
it, some didn't. Typically newer interviewers
found it useful but more established interviewers
had already established effective work practices.
Uptake would probably have been higher if all
surveys had been available and each interviewer
always had a device.
Case selection (Interviewer
Management System)
Case selection was provided by a Blaise-based
front end program whose use was mandatory -
essentially it launched the Blaise instrument
on the appropriate record(s). This part of the
system records the basic status information
and can hold commentary useful for subsequent
visits/contacts, office editing, etc.
Field capture, coding and
verification (Electronic Questionnaire)
Field capture was essentially by direct entry
at time of interview, but sequencing was always
fully automated and this provided the ability
to field instruments with much more complex
sequencing paths (though this causes some 'instrument
validation' problems). For coding, picklists
were used where practical, otherwise either
Blaise based trigram coding was used, or free
text was captured for later resolution. Verification
was at time of interview when possible, improving
data quality and reducing downstream problems,
but edit override and later editing were also
enabled on some fields. Remarks could be inserted
against any fields, and post-interview editing
could make use of these comments.
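Trigram coding matches a typed response against a codeframe by comparing
three-character fragments, which makes it tolerant of spelling errors and word
order. The sketch below illustrates the general technique only. It is not the
Blaise implementation, and the padding, the Dice score and the example codes
are assumptions.

```python
def trigrams(text: str) -> set:
    """Set of three-character substrings of the padded, lower-cased text."""
    padded = "  " + " ".join(text.lower().split()) + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def rank_codes(response: str, codeframe: dict, top: int = 3) -> list:
    """Rank codeframe entries by trigram overlap (Dice coefficient)."""
    wanted = trigrams(response)
    scored = []
    for code, label in codeframe.items():
        found = trigrams(label)
        score = 2 * len(wanted & found) / (len(wanted) + len(found))
        if score > 0:
            scored.append((round(score, 3), code, label))
    # Best score first; ties broken by code so the order is repeatable.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top]
```

With a hypothetical codeframe containing "Accountant", the misspelt response
"acountant" still shares most of its trigrams with the correct entry, so that
entry is offered first for the interviewer to confirm.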
Support for field personnel
A help desk number was provided, with third
line support available for the relatively few
'technical' problems. Most problems were with
instrument interpretation and relatively simple
technical problems. Incorrect early 'finalisation'
of an interview (blocking further data entry
and editing) was a common non-technical problem.
Transmission problems were the most common technical
problem.
Transmission of (partial)
status and results (Send and Process received
transmissions)
Returning data used the above file exchange
mechanism. Before dialling up the server a program
is run to extract data to be transmitted, the
data is 'zipped' and encrypted. The ability
to take a full snapshot of status at transmission
time proved a useful backup and recovery mechanism.
Data integrity needs careful thought when the
data can reside on central servers and one or
more remote machines. Unlocking field records
for further work requires care that the office
record (and any changes made to it) be properly
handled. Transferring interviews between interviewers
has the potential to cause similar integrity
problems particularly in longitudinal surveys,
but also in incomplete response situations,
or when an interviewer could not complete a
workload.
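The 'full snapshot' idea can be sketched as follows: all workload files are
zipped into one archive with a checksum, so the office copy can be verified
and unpacked later as a backup. The file names are hypothetical, the checksum
is an addition for this example, and encryption is deliberately left out of
the sketch.

```python
import hashlib
import io
import zipfile


def make_snapshot(files: dict) -> tuple:
    """Zip a full set of workload files; return (archive bytes, SHA-256 digest).

    files maps a relative file name to its content as bytes.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(files):
            archive.writestr(name, files[name])
    data = buffer.getvalue()
    return data, hashlib.sha256(data).hexdigest()


def restore_snapshot(data: bytes, digest: str) -> dict:
    """Check the digest, then unpack the archive back into a dict of files."""
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("snapshot is corrupt or incomplete")
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}
```

Because each snapshot is complete, recovery in this sketch needs only the
most recent archive that passes the digest check, not a chain of partial
transfers.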
Office editing and manipulation
(various 'Office System' processes)
We mostly started by using the original Blaise
instrument and progressively extended the program
over time to incorporate derivations, new edits
etc. Sometimes it was useful to run a 'batch' operation,
opening and reprocessing/resaving each interview
record, though Blaise had some limitations in
this regard, being oriented to on-line processing.
Extraction of data (Export
for further processing)
We mostly used Manipula (a standard part
of the Blaise environment) to extract data.
Typically a SAS based environment was the target
for further processing. Blaise field naming
conventions caused a few problems in transferring
the data to SAS, but the process is quite manageable.
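The paper does not detail the naming problems. One plausible kind, assumed
here for illustration, is that long or structured field names do not fit the
short variable names accepted by older SAS releases. The sketch below maps
names to unique, valid identifiers. The 8-character limit is a parameter, and
the dotted, indexed field names in the usage example are hypothetical.

```python
import re


def sas_names(blaise_fields, max_len=8):
    """Map source field names to unique names that fit SAS naming rules.

    Characters other than letters, digits and underscore are dropped, a
    leading digit gets an underscore prefix, and clashes after truncation
    are resolved with a numeric suffix.
    """
    mapping = {}
    used = set()
    for field in blaise_fields:
        base = re.sub(r"[^A-Za-z0-9_]", "", field) or "_"
        if base[0].isdigit():
            base = "_" + base
        candidate = base[:max_len].upper()
        n = 1
        while candidate in used:
            suffix = str(n)
            candidate = (base[:max_len - len(suffix)] + suffix).upper()
            n += 1
        used.add(candidate)
        mapping[field] = candidate
    return mapping
```

For example, sas_names(["Household.Person[1].Age", "Household.Person[2].Age"])
gives HOUSEHOL and HOUSEHO1. Keeping the returned mapping alongside the
extract lets later processing trace each SAS variable back to its source
field.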
For next time
Three facilities would, with the benefit
of hindsight, have been very useful. The first, providing
"remote debugging", would provide the ability
to control a PC from a remote site (ie. the
in-field machine from a central help desk).
This would have greatly simplified problem resolution
at times. The second is the provision
of fairly (Windows-)standard news/mail/groupware
facilities. At the time Notes and Windows were
available but the training overheads, particularly
given the difference between BLAISE/DOS and
Notes/Windows, were thought too high. With Blaise
for Windows now available, and a better appreciation
of the potential benefits of additional facilities,
the decision would probably be reversed today.
The third is the provision of additional management
information systems - particularly around tracking
hardware location, help desk queries and hardware
failure records, software instruments downloaded
to notebooks, and some exception reports - treatment
of special dwellings etc.