The ABS has decided to use OCR for the 2001
Census unless unexpected circumstances arise.
We are currently negotiating with suppliers
for OCR equipment for the 2001 Census, so I
propose to send some more details once a contract
has been signed. Broadly the plan is to replace
OMR with Imaging with relatively few changes
to the rest of the 1996 OMR-based system. We
expect to use a combination of OCR and Automatic
coding to reduce both publication timeframes
and processing costs, and expect to replace
paper distribution with image distribution thereby
reducing paper handling overheads. We may or
may not use 'repair' processes to increase automatic
coding as the existing computer assisted coding
facilities can easily be modified to enable
computer assisted coding from images. We will
also be considering the extent to which 'OMR-like'
questions and responses should be used.
We currently use OCR for a number of small
collections - mainly business collections but
including some employer-based labour surveys
and a social survey. Overall, the results have
been encouraging, but in a qualified and marginal
way. The setup overheads of small surveys have
been significant, and the social survey (disability)
had a number of alphabetic fields that required
coding, and repair overheads were higher than
expected. At current ABS cost-recovery
rates the net cost of OCR compared with more
traditional capture options varies from small
(unquantified) gains to small (though significant)
losses. Part of the problem has been the evolving
nature of the OCR service, part has been the
usual learning curves, and part has been the
difficulty interfacing the OCR subsystems with
the rest of our statistical processing infrastructure.
All these factors can be expected to improve
over time, and there are a number of interesting
developments in commercial OCR systems.
The Labour Statistics Centre have used existing
OCR facilities for two surveys (Major Labour
Costs and Employment Earnings and Hours), and
are considering using it in a number of other
surveys. A recent report on their experiences
suggests that OCR is suitable for surveys with
a large number of data items and forms, though
significant effort should be expended in the
initial stages on tolerances and form definitions
as these produce savings during operation. It
also suggests that forms should be designed
(differently) for OCR. The movement to image
based rather than paper based form storage was
appreciated, and staff adapted well to the new way
of working. However, cost savings are only
expected once things have settled down.
In contrast to OCR, the ABS has successfully
used OMR for many years and the processes, issues
and costs are relatively well understood. OMR
has been used primarily in Household and Census
applications. Household use is essentially continuous
as it is used for the monthly Labour Force questionnaire
as well as a range of supplementary surveys.
Labour Force data is collected for two weeks
each month and published on a tight timetable
- OMR processing is primarily in the second
and third weeks of the monthly cycle, followed
by late returns and final cleansing for up
to a week and publication towards the end of
the week following. OMR equipment is distributed
in the various state offices. The OMR (LAN based)
capture system essentially just scans the data in
and then reformats it to meet the needs of a
(previously developed) editing system. This (mainframe
based) editing system uses a parameterised set
of edit rules to validate the data, and amendments
are applied until it is 'clean'. Some coding (particularly
industry and occupation) is needed at times
and this is usually done manually, with the
codes marked onto the OMR forms in State offices
before scanning. Once clean data is achieved,
aggregation, weighting, item derivation, seasonal
adjustment, trend calculation, analytical reports,
and publication tables are prepared in 2-3 hours.
Final clearance and publication takes a few
more days. The workflow associated with labour
force OMR is essentially a reduced and simplified
version of the Census cycle, with receipt registration,
precapture operations, scanning, then transmission.
Edit failures often require the location of
the original OMR form, and amendment is essentially
a manual operation. Output processing for labour
force is fully automated, but output processing
for supplementaries is not.
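
As an illustration only, the edit-and-amend cycle
described above (a parameterised set of edit rules,
with amendments applied until the data is 'clean')
could be sketched as follows. This is not the ABS
editing system: the field names, rule builders and
the amend callback are all hypothetical, and the
mainframe system may work quite differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


@dataclass(frozen=True)
class EditRule:
    """One parameterised edit: a named check applied to a single field."""
    name: str
    field: str
    check: Callable[[object], bool]


def range_rule(field: str, low: int, high: int) -> EditRule:
    """Build a rule requiring an integer value between low and high."""
    return EditRule(
        name=f"{field} in {low}..{high}",
        field=field,
        check=lambda v: isinstance(v, int) and low <= v <= high,
    )


def allowed_rule(field: str, allowed: Iterable[str]) -> EditRule:
    """Build a rule requiring the value to be one of a fixed set of codes."""
    allowed_set = frozenset(allowed)
    return EditRule(
        name=f"{field} in allowed set",
        field=field,
        check=lambda v: v in allowed_set,
    )


def failed_edits(record: Record, rules: List[EditRule]) -> List[str]:
    """Return the names of all rules that the record fails."""
    return [r.name for r in rules if not r.check(record.get(r.field))]


def apply_until_clean(
    record: Record,
    rules: List[EditRule],
    amend: Callable[[Record, List[str]], Record],
    max_passes: int = 5,
) -> Record:
    """Re-run the edits after each amendment until no rule fails.

    `amend` stands in for the manual step (for example, checking the
    original form); it receives the record and the failed rule names.
    """
    current = dict(record)
    for _ in range(max_passes):
        failures = failed_edits(current, rules)
        if not failures:
            return current
        current = amend(current, failures)
    failures = failed_edits(current, rules)
    if failures:
        raise ValueError(f"still failing after {max_passes} passes: {failures}")
    return current
```

In this sketch the rules are plain data, so the set
of edits can be changed without touching the loop
that applies them, which is the point of a
parameterised approach.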
The household survey systems will be revamped
progressively over the next few years. The data
capture part of the process will be one of the
last to be revamped, permitting this key family
of collections to take advantage of current
developments without being exposed to undue
risks. Early developments will focus on
rejuvenating and integrating elements of the
output processing systems of labour force and
related supplementary surveys.