| Workshop on Application
of New Information Technology to Population Data |
| Bangkok, 12-20 October
1999 |
|
| Information Technology
Trends and their impact on Census Data Processing |
| (Presentation paper)
|
| Curiosity is probably a human
trait that developed very early. The wish to walk
faster and to travel further, to till more land
and to lift more weight, has resulted in the recent exponential
development in most technical areas. |
|
| General
IT trends
- Faster
- Smaller
- Cheaper
- Handier
- Better
|
|
Like the general human progress,
also . are characterized by . solutions
|
- 1890 first electro-mechanical
tabulator, Hollerith
- 1940's first electronic
computer, vacuum tubes, UNIVAC, punch tapes
and cards, machine code programming, occupied
a huge hall
- 1950/60's transistors,
mini computers, magnetic tapes, line printers,
higher level programming languages (Cobol,
Fortran)
- 1970's solid state
integrated circuits, LSI, VLSI, micro chips,
hard disks, diskettes, disk operating systems
DOS, transaction based online systems, object
oriented programming
- 1980's, micro computers,
networks (LAN, WAN), laser printers, CD's,
scanners, relational databases, OMR, GIS,
standard software packages for micro computers
(word processor, spread sheet, database),
desktop publishing
- 1990's color VDUs and
printers, DVD, PDAs, OCR/ICR, VoiceR, enterprise
computing, intranet, internet, warehousing,
expert systems.
|
|
| Principal
trend setters
- Statistical organization
- Private sector,
government sector, universities, home
entertainment
- Manufacturers
|
|
It is interesting to note how the
driving forces behind technological development have shifted
|
- Often, the first computing
resource in a country was at the statistical organization.
The resource was also used by others (finance, administration)
- With improved technical
infrastructure, other organizations became
avant-garde users and statistical offices
are adapting the evolving technology to their
needs
- Lately, it is the computer
manufacturing industry which drives technological
development and pushes it onto the user community
|
|
| Relevant
hardware technologies
- High-performance
and high-capacity stationary and mobile
micro computers
- High-capacity
fixed and exchangeable hard disk storage
devices
- Color VDUs and
printers
- Optical scanning
devices
- Writeable CDs,
Digital Video Disks
- Local and wide-area
networks
- Remote sensing,
Global Positioning System (GPS)
|
|
- Computers: 400-600 MHz,
64-256 MB RAM
- Disk storage: 6GB and
up, Zip drive 200 MB
- Printers: 5-8 ppm for
personal, combi-features for home office use
- Scanners: very cheap
home use, high-capacity industrial use
- CD 600 MB, DVD 5-18
GB, impressive retrieval speed good for video
replay
- Networks: provides
for work groups, organization-wide data sharing
- Remote sensing: for
cartography, accuracy can be better than 1 m.
GPS: for mapping of enumeration areas
|
|
| Relevant
software technologies
- Graphic interface
operating systems
- Hierarchical
and relational database systems
- Metadata systems
- Statistical analysis
tools
- Optical and intelligent
character recognition
- Geographic information
systems
|
|
- Graphic interface:
first developed by Xerox, adapted by Apple,
appropriated by Windows
- Databases: for analytical
processing: square files, transposed files
(Redatam); for transaction processing: relational
databases (Access, dBase, Oracle)
- Metadata: data about
meaning, content, organization and purpose
of data
- Statistical analysis
s/w: SPSS, SAS, special demographic s/w
- OCR/ICR: improved processing
power gives better results: Uruguay 1996:
preprinted numeric 99.98%, marks 99.7%, handwritten
numeric 98.9%, handwritten alpha 97.4% (but
about 15% of forms had to be manually improved
before submitting to the scanner)
- GIS: for cartography:
ArcInfo, MapInfo (commercial); for thematic
mapping: PopMap (free), SuperMap (commercial)
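
The per-character recognition rates quoted above for Uruguay 1996 imply lower rates at the level of a whole field, because one misread character spoils the field. The following is a rough sketch only: it assumes that character errors are independent of each other, and the field lengths and form counts are hypothetical, not taken from the Uruguay census.

```python
# Per-character recognition rates quoted in the text (Uruguay 1996).
RATES = {
    "preprinted_numeric": 0.9998,
    "marks": 0.997,
    "handwritten_numeric": 0.989,
    "handwritten_alpha": 0.974,
}


def field_accuracy(char_rate: float, n_chars: int) -> float:
    """Probability that every character of a field is read correctly,
    assuming (a simplification) that character errors are independent."""
    return char_rate ** n_chars


def expected_char_errors(char_rate: float, n_chars: int, n_forms: int) -> float:
    """Expected number of misread characters over n_forms forms,
    each carrying n_chars characters of the given type."""
    return (1.0 - char_rate) * n_chars * n_forms


if __name__ == "__main__":
    # Hypothetical field lengths, for illustration only.
    for label, kind, length in [
        ("2-digit handwritten age", "handwritten_numeric", 2),
        ("10-letter handwritten name", "handwritten_alpha", 10),
    ]:
        share = field_accuracy(RATES[kind], length)
        print(f"{label}: {share:.3f} of fields read without error")
```

Under these assumptions, longer handwritten fields are the ones most likely to need manual verification, which is consistent with the remark that numbers are recognized better than alphabetic text.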
|
| Relevant
software technologies (continued)
- Integrated office
management
- Project planning
and management tools
- Typesetting
- On-line services,
bulletin boards
- E-mail, internet,
world wide web
|
|
- Integrated office management:
inter-office access to common information,
document sharing
- Project planning: MS
Project, Timeline, Primavera, critical path
resource planning
- Typesetting: transfer
of printed output in digital form to printing
house
- On-line service: external
end-user access to basic information, BBS:
internal access to instructions, documents
- E-mail and internet:
efficient correspondence (with audit trail),
dissemination of reports.
|
|
| Anticipated
future Trends
- Improved hardware
price/performance ratio
- Continued miniaturization
- Mobile computing,
incl. wireless communication
- Expanded world
wide web, E-commerce
- Improved expert
systems (ICR, voice recognition)
- Warehousing,
data mining
- Multimedia
|
|
| Improvements of current technologies
will have noticeable effects on the efficiency,
timeliness, quality and visibility of census processing.
Concerning completely new technologies, I cannot
see any on the horizon apart from robotics or
a fully developed Orwellian environment where
every citizen is watched and controlled at all
times. But if we get that far, then we will not need
any population census anymore. |
|
- Improved hardware:
(a) Faster and cheaper equipment, more affordable,
better performing, better quality, more throughput;
(b) Increasingly powerful software, greater
sophistication and complexity of problem solving,
more timely, more relevant and more useful
results; (c) Improved user-friendliness; (d)
Better targeting of results
- Continued miniaturization:
smaller and sturdier equipment, ever increasing
storage capacity
- Better mobility: hand-held
PDAs with WIN-CE and wireless transmission
for intelligent data collection (CAPI). Mobile
phone for voice transmissions from remote
areas
- Expanded WWW: improved
dissemination efficiency, dynamic data retrieval,
income generating
- Expert systems: advanced
knowledge based software solutions for: ICR
at image processing or directly at point of
data collection (write pad), voice recognition,
data mining, (far in future: enumeration by
robots?)
- Warehousing: currently
mainly for commercial use to identify consumer
preferences, trends and unexpected relationships
contained in large and varied data sets
- Multimedia: driven
by home entertainment, dissemination of dry
statistics can perhaps be made more intriguing
for the end-user
|
| IT supported
elements of census processing
| |
- Mapping
- Forms
and manuals
- Data
collection
- Data
capture
- Coding
|
- Error
checking
- Editing
- Output
- Dissemination
- Analysis
|
|
|
| These are the various steps
in the census process which can be supported by
IT. Three areas will be covered in detail during
the workshop, namely data capture, dissemination
and mapping. |
| Planning
and Management
| |
|
- Work
flow
- questionnaires
- data
files
- data
back-up
|
|
|
|
- MS Project
- Timeline
- Quicken
|
|
|
- Spreadsheet
- IMPS/Centrack
|
|
|
|
- Critical path: important
to plan from the start, use available means
to define activities and resource requirements
and obtain critical path, manage the implementation
of the plan, assure feed back from line offices
to keep progress up-to-date
- Budget control: important
to control the budget at project level, even
if the Ministry of Finance is perhaps responsible
for the official records
- Workflow control: needs
clear advance definition (EA list) with count,
processing plan, tracking of EA folders and
data files through various stages of processing
- Back-up system: must
be effective and assure safety of data
- Virus protection
- Challenge is to keep
implementation plan updated and document and
data flow under control
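
The critical path mentioned above can be obtained from a list of activities, their durations and their predecessors. The sketch below is only an illustration of the general critical path method, not of how MS Project, Timeline or Primavera work internally. The activity names and durations are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical census activities: name -> (duration in weeks, predecessors).
ACTIVITIES = {
    "mapping":       (20, []),
    "questionnaire": (8,  []),
    "printing":      (6,  ["questionnaire"]),
    "training":      (4,  ["questionnaire", "mapping"]),
    "enumeration":   (3,  ["printing", "training"]),
    "data_capture":  (16, ["enumeration"]),
    "tabulation":    (10, ["data_capture"]),
}


def critical_path(activities):
    """Return (total duration, activities on the critical path, in order)."""
    graph = {name: set(preds) for name, (_, preds) in activities.items()}
    order = TopologicalSorter(graph).static_order()  # predecessors come first

    earliest_finish = {}
    latest_pred = {}  # the predecessor that determines each activity's start
    for name in order:
        duration, preds = activities[name]
        start = 0
        latest_pred[name] = None
        for pred in preds:
            # An activity can start only when its slowest predecessor is done.
            if earliest_finish[pred] > start:
                start = earliest_finish[pred]
                latest_pred[name] = pred
        earliest_finish[name] = start + duration

    # Walk back from the activity that finishes last.
    node = max(earliest_finish, key=earliest_finish.get)
    total = earliest_finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = latest_pred[node]
    path.reverse()
    return total, path


if __name__ == "__main__":
    weeks, path = critical_path(ACTIVITIES)
    print(weeks, "weeks:", " -> ".join(path))
```

Any delay in an activity on the returned path delays the whole project, which is why feedback from line offices on exactly these activities matters most for keeping the plan up to date.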
|
|
| Mapping
- Geographic Information
Systems
- vector
- raster
|
|
- GIS: Vector: efficient,
space saving, elegant scale change. Raster:
sufficient for EA maps but space-consuming
storage of graphic image
- Commercial systems:
ArcInfo and MapInfo are industrial strength
products, perhaps unnecessarily powerful for
census mapping needs. SDBQ, SuperMap, Redatam
are specially developed for statistical use
- Free software: UN/Vietnam
developed PopMap, IMPS, MapView
|
|
| Forms and
manuals
- Questionnaires
- Control forms
- Manuals
- Tabulation plan
- Census
design system
- Word
processor
- Form
maker
- Spreadsheet
|
|
|
- Numerous documents
to be prepared:
- questionnaires,
- control forms,
- preliminary manual
count forms,
- batch transfer forms,
- manuals for enumerator
and supervisor,
- editing and coding
rules and instructions,
- tabulation plan and
table definitions,
- analytical and administrative
reports,
- regular office communications.
- Census Design System
by US Bureau of the Census. First mentioned
in 1996, but development seems delayed (funding
problems?)
|
|
| Data collection
- Paper
questionnaires
- door-to-door
enumeration
- mail-in
|
| |
- CAPI
- fixed
collection points
- door-to-door
|
|
| |
| |
|
|
| This is an area where improvement
would have the most benefit to reliability and
timeliness of the further processing of census
data and to the overall quality of results. |
|
- Paper questionnaire:
unreliable, individual interpretation by enumerator
or respondent
- CAPI, CATI and E-form
should minimize such variations due to computer
validation. Door-to-door used for surveys
- Fixed points such as
customs, magistrate
- PDA already successfully
used as enterprise platform, great hope for
future, as these would bring significant improvements
to reliability, quality and timeliness of
census processing. Problem might be typing,
but write pad capability of PDAs will improve.
Slow-down due to error checking compensated
by reply sensitive guidance through questionnaire
- Voice recognition still
far off, but could start playing a role in
a few years
- E-form could be efficient
but too few respondents with access to internet,
even in highly developed countries
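
The computer validation and reply-sensitive guidance mentioned above for CAPI can be pictured as a questionnaire in which every question carries a check and a routing rule. The following is a minimal sketch under stated assumptions: the questions, age limits and routing rules are hypothetical and do not come from any particular census or CAPI package.

```python
def next_after_sex(answers):
    # Hypothetical routing: fertility question only for women aged 15+,
    # employment question only for persons aged 10+.
    age = int(answers["age"])
    if answers["sex"] == "F" and age >= 15:
        return "children_born"
    return "employed" if age >= 10 else None


# Each question has a text, a validity check and a rule giving the next question.
QUESTIONS = {
    "age": {
        "text": "Age in completed years",
        "valid": lambda v, a: v.isdecimal() and int(v) <= 120,
        "next": lambda a: "sex",
    },
    "sex": {
        "text": "Sex (M/F)",
        "valid": lambda v, a: v in ("M", "F"),
        "next": next_after_sex,
    },
    "children_born": {
        "text": "Number of children ever born",
        # Cross-field check against the age already entered.
        "valid": lambda v, a: v.isdecimal() and int(v) <= int(a["age"]) - 12,
        "next": lambda a: "employed",
    },
    "employed": {
        "text": "Worked during the last week (Y/N)",
        "valid": lambda v, a: v in ("Y", "N"),
        "next": lambda a: "occupation" if a["employed"] == "Y" else None,
    },
    "occupation": {
        "text": "Main occupation",
        "valid": lambda v, a: bool(v.strip()),
        "next": lambda a: None,
    },
}


def interview(get_answer, questions=QUESTIONS, first="age"):
    """Run one interview; get_answer(question_id, text) returns the typed reply.
    Returns the accepted answers and the number of rejected entries."""
    answers = {}
    rejected = 0
    current = first
    while current is not None:
        question = questions[current]
        value = get_answer(current, question["text"])
        if not question["valid"](value, answers):
            rejected += 1  # refused on the spot; the same question is asked again
            continue
        answers[current] = value
        current = question["next"](answers)  # skip questions that do not apply
    return answers, rejected
```

On a real device `get_answer` would read from the keyboard or write pad. A child's record ends after two questions, which illustrates how routing can compensate for the time spent on error checking.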
|
| Data capture
- Key-to-disk data
entry
- OMR
- Image scanning
with OCR/ICR
|
|
Of course, if we have to have
paper based data collection, then improvement
of data capture will have significant benefits
in time and accuracy
|
- Key-to-disk is still the preferred
mode for many developing countries; additional
advantages are equipment influx and DP training
- OMR was already used
successfully in the 1980s (Caribbean, Bangladesh)
but has stringent paper quality and environmental
demands
- OCR can also read marks;
ICR interprets handwritten characters
(numbers better than alpha). Stored images
can be consulted during coding and editing
without further reference to the questionnaires
|
| Coding
- Manual (before
data capture)
- Computer assisted
(after data capture)
- Automatic
|
|
- Manual coding very slow,
cumbersome and rather unreliable
- Computer assisted coding
gives significant gains in consistency due
to look-up tables
- Automatic coding after
OCR/ICR is feasible only for certain variables
such as gender and age, but less so for occupation
and industry
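
The combination of look-up tables and coder assistance described above could look roughly as follows. This is a sketch, not the method of any specific coding package: the code list is a hypothetical extract with illustrative codes, and the similarity cutoff of 0.75 is an arbitrary assumption.

```python
import difflib

# Hypothetical extract of an occupation code list (codes are illustrative only).
OCCUPATION_CODES = {
    "farmer": "611",
    "primary school teacher": "234",
    "nurse": "222",
    "taxi driver": "832",
    "shop assistant": "522",
}


def normalize(text: str) -> str:
    """Lower-case the reply and collapse repeated white space."""
    return " ".join(text.lower().split())


def code_response(text, table=OCCUPATION_CODES, cutoff=0.75):
    """Return (code, status, candidates) for one written-in reply.

    status is "automatic" when the look-up table contains the reply,
    otherwise "assisted": the coder chooses from the suggested entries
    (or codes by hand if the list is empty).
    """
    key = normalize(text)
    if key in table:
        return table[key], "automatic", [key]
    candidates = difflib.get_close_matches(key, list(table), n=3, cutoff=cutoff)
    return None, "assisted", candidates


if __name__ == "__main__":
    for reply in ["  Farmer ", "farmr", "astronaut"]:
        print(repr(reply), code_response(reply))
```

Because every coder is offered the same table entries, identical replies receive identical codes, which is where the gain in consistency comes from.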
|
| Perhaps a combination of computer
assisted and automatic coding is most feasible |
| Error checking
- Manually
- Automatic, with
error listings
- Automatic, including
imputation
- pre-determined
- hot-deck
- undetermined
|
|
Even with much improved or almost
perfect data collection and capture techniques,
this processing step will always have to be performed
|
- Manual checking. Some
basic checks always required before data capture,
such as geographic code and presence of essential
fields
- Trend is toward imputation:
- pre-defined, a fixed
value depending on some indicators within
record,
- hot-deck, copy value
from another record with similar characteristics
- undetermined, category
for clearly out-of-range and inconsistent
values
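
The three imputation approaches listed above can be combined in one pass over the records. The sketch below is only an illustration under stated assumptions: the variable (marital status), the age limit of 12 for the pre-defined rule, and the donor classes (sex and ten-year age group) are hypothetical choices, and sex and age are assumed to have been checked already. Here the "undetermined" category is used when no donor is available.

```python
VALID_STATUS = {"never married", "married", "widowed", "divorced"}


def age_group(age: int) -> int:
    """Ten-year age groups: 0-9, 10-19, ..., with 70+ as the last group."""
    return min(age // 10, 7)


def impute_marital_status(records):
    """Return copies of the records with 'marital' filled in and a 'flag'
    that documents how each value was obtained."""
    deck = {}  # hot deck: last valid value seen per (sex, age group)
    result = []
    for record in records:
        record = dict(record)  # leave the caller's data untouched
        key = (record["sex"], age_group(record["age"]))
        if record.get("marital") in VALID_STATUS:
            deck[key] = record["marital"]        # refresh the donor value
            record["flag"] = "reported"
        elif record["age"] < 12:
            record["marital"] = "never married"  # pre-defined: fixed by age
            record["flag"] = "pre-defined"
        elif key in deck:
            record["marital"] = deck[key]        # hot-deck: copy from donor
            record["flag"] = "hot-deck"
        else:
            record["marital"] = "undetermined"   # no rule and no donor
            record["flag"] = "undetermined"
        result.append(record)
    return result


if __name__ == "__main__":
    sample = [
        {"sex": "F", "age": 34, "marital": "married"},
        {"sex": "F", "age": 31, "marital": None},
        {"sex": "M", "age": 5, "marital": "??"},
        {"sex": "M", "age": 47, "marital": None},
    ]
    for row in impute_marital_status(sample):
        print(row)
```

Keeping a flag for every imputed value makes it possible to report imputation rates afterwards and to exclude imputed values from analyses where they would mislead.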
|
|
|
|
- Manual editing: like all manual
processes, unreliable and time consuming, error prone
and inconsistent
- Computer assisted editing
results in improved speed, consistency and
accuracy; can be done automatically in connection
with validation
|
|
| Output
- Database
- Tabulation
- Thematic maps,
graphs, census atlas
- Administrative
reports
- Analytical reports
- IMPS/Cents
- Redatam
Plus
- PC-Axis
- PopMap
- SDBQ,
SuperMap
|
|
|
A variety of output possibilities exist;
some are essential, i.e. database and tabulation
|
- Databases: microdata
stored as square hierarchical file or transposed
indexed file; macrodata in table format (printout
copy or aggregate data); integrated metadata
systems in preparation for warehousing
- Tabulations are primarily
on paper, lowest unit: village level
- Maps and graphs: help
to visualize results
- Administrative reports:
provide full documentation of the entire census
undertaking, including lessons learned
- Analytical reports:
extensive analysis usually by outside organizations
after census project is completed
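
The difference between a square file, a transposed file and macrodata, as mentioned above under databases, can be shown with a few records. This is a simplified sketch with hypothetical data. It ignores the hierarchical and indexing aspects and does not reproduce the actual file format of Redatam or any other package.

```python
from collections import Counter

# Square (record-oriented) microdata: one row per person. Hypothetical values.
SQUARE = [
    {"village": "A", "sex": "F", "age": 34},
    {"village": "A", "sex": "M", "age": 36},
    {"village": "A", "sex": "F", "age": 8},
    {"village": "B", "sex": "M", "age": 61},
    {"village": "B", "sex": "F", "age": 59},
]


def transpose(rows):
    """Turn a square file into a transposed file: one list per variable."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name in columns:
            columns[name].append(row[name])
    return columns


def crosstab(columns, row_var, col_var):
    """Aggregate microdata into macrodata: a count for each pair of categories.
    Only the two variables involved are read."""
    return Counter(zip(columns[row_var], columns[col_var]))


if __name__ == "__main__":
    transposed = transpose(SQUARE)
    table = crosstab(transposed, "village", "sex")
    for (village, sex), count in sorted(table.items()):
        print(village, sex, count)
```

A tabulation on the transposed form touches only the variables it needs, which is the reason such files suit analytical processing, whereas the square form suits record-by-record work such as editing.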
|
|
| Dissemination
- Printed
reports
- Microfilm
- Disk media
- On-line
(BBS)
- world wide
web
|
|
|
This is an interesting area,
because by selecting proper dissemination methods
the user base can be dramatically enlarged
|
- Printed reports: traditional
printing facilities, directly from hard copy
printout, or, better, from tabulation data
file
- Microfilm: requires
special equipment (inexpensive) but somewhat
uncomfortable to operate and read, has waned
in popularity due to available electronic
means
- For all digitally distributed
information: confidentiality is an important
issue. Diskettes for subset of data, CD can
be used with dynamic retrieval software (SDBQ)
when entire census macro data are stored.
Cheap desk top systems exist for CD-ROM recording
- On-line: Phone/modem
access required to obtain pre-defined tables
or dynamically generated output from microdata,
used for domestic consumption. Diminishing
importance for bulletin boards
- WWW similar but with
larger global audience, may include remote
tabulation requests, delivery possibly against
payment like E-commerce
|
|
| Analysis
- Demographic
analysis software
- General purpose
analysis software
- PAS
- MortPak,
Qfive
- Fertility
estimates
- PANDEM
|
- DemProj
- People
and Workers
- FIVFIV
- LIPRO
|
|
|
Here come the magicians who
can manipulate the diligently collected, processed
and presented census data. However, analysis is
usually an activity beyond the actual census operation.
|
- Relevant analysis s/w
has been available, with or without cost,
for a long time; some has been adapted to more
recent computing environments, while other packages remain
DOS based.
- Very powerful commercial
analysis s/w such as SAS and SPSS was
developed for mainframe computers but has
been adapted successfully for the micro computer
environment
|
| Conclusion
- Continued accelerated
technological development
- Improved reliability,
quality and timeliness
Depending on local
infrastructure:
- Improvement of
traditional methods
- Increased use
of OCR/ICR, CAPI, E-form
- Increased use
of GIS and thematic mapping
- Increased use
of CD/DVD, (BBS) and WWW
|
|
| |
- Accelerated technological
development: smaller, faster, cheaper, easier
to use hardware and software
- Improved reliability,
quality and timeliness: might be achieved
with improvements in the area of data collection
- Some of the recent
or forthcoming technologies may have limited
use for countries without appropriate infrastructure.
Implementation of proven technology should
therefore be carefully considered
- Improvement of traditional
methods: paper based data collection but with
better designed forms, better methods for
planning and control, better mapping, etc.
- Increased use of OCR/ICR
for data capture. CAPI (also with PDAs), CATI,
E-form for data collection, resulting in better
quality of data and more timely reports
- GIS: increased affordability,
cooperation with other government offices (Cadaster),
easy digitizing, powerful presentation s/w
(PopMap, Redatam, SDBQ)
- CD/DVD: cheap CD cutting,
efficient storage and dynamic tabulation s/w
(SDBQ), improved communications infrastructure
(BBS), globalisation with WWW
|
|
|
|