Workshop on Application of New Information Technology to Population Data
Bangkok, 12-20 October 1999
Information Technology Trends and their impact on Census Data Processing
(Presentation paper)
Curiosity is probably a very early developed human trait. The wish to walk faster and travel further, to till more land and lift more weight, has resulted in recent exponential development in most technical areas.

General IT trends

  • Faster
  • Smaller
  • Cheaper
  • Handier
  • Better
Like human progress in general, IT trends are characterized by faster, smaller, cheaper, handier and better solutions

  1. 1890 first electro-mechanical tabulator, Hollerith
  2. 1940's first electronic computer, vacuum tubes, UNIVAC, punch tapes and cards, machine code programming, occupied a huge hall
  3. 1950/60's transistors, mini computers, magnetic tapes, line printers, higher level programming languages (Cobol, Fortran)
  4. 1970's solid state integrated circuits, LSI, VLSI, micro chips, hard disks, diskettes, disk operating systems DOS, transaction based online systems, object oriented programming
  5. 1980's, micro computers, networks (LAN, WAN), laser printers, CD's, scanners, relational databases, OMR, GIS, standard software packages for micro computers (word processor, spread sheet, database), desktop publishing
  6. 1990's color VDUs and printers, DVD, PDAs, OCR/ICR, VoiceR, enterprise computing, intranet, internet, warehousing, expert systems.

Principal trend setters

  • Statistical organization
  • Private sector, government sector, universities, home entertainment
  • Manufacturers
It is interesting to note how the driving forces behind technological development have shifted

  1. Often, first computing resource in country at statistical organization. Resource was used by others (finance, administration)
  2. With improved technical infrastructure, other organizations became avant-garde users and statistical offices are adapting the evolving technology to their needs
  3. Lately, it is the computer manufacturing industry which drives technological development and pushes it onto the user community

Relevant hardware technologies

  • High-performance and high-capacity stationary and mobile micro computers
  • High-capacity fixed and exchangeable hard disk storage devices
  • Color VDUs and printers
  • Optical scanning devices
  • Writeable CDs, Digital Video Disks
  • Local and wide-area networks
  • Remote sensing, Global Positioning System (GPS)
  1. Computers: 400-600 MHz, 64-256 MB RAM
  2. Disk storage: 6GB and up, Zip drive 200 MB
  3. Printers: 5-8 ppm for personal, combi-features for home office use
  4. Scanners: very cheap home use, high-capacity industrial use
  5. CD 600 MB, DVD 5-18 GB, impressive retrieval speed good for video replay
  6. Networks: provides for work groups, organization-wide data sharing
  7. Remote sensing: for cartography, accuracy can be better than 1 m. GPS: for mapping of enumeration areas

Relevant software technologies

  • Graphic interface operating systems
  • Hierarchical and relational database systems
  • Metadata systems
  • Statistical analysis tools 
  • Optical and intelligent character recognition
  • Geographic information systems
  1. Graphic interface: first developed by Xerox, adapted by Apple, appropriated by Windows
  2. Databases: for analytical processing: square files, transposed files (Redatam); for transaction processing: relational databases (Access, dBase, Oracle)
  3. Metadata: data about meaning, content, organization and purpose of data
  4. Statistical analysis s/w: SPSS, SAS, special demographic s/w
  5. OCR/ICR: improved processing power gives better results: Uruguay 1996: preprinted numeric 99.98%, marks 99.7%, handwritten numeric 98.9%, handwritten alpha 97.4% (but about 15% of forms had to be manually improved before submitting to the scanner)
  6. GIS: for cartography ArcInfo, MapInfo (commercial), for thematic mapping PopMap (free), Supermap (commercial)
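
To put the Uruguay recognition rates in note 5 into perspective, the rough sketch below converts a per-character rate into the share of forms expected to contain at least one misread character. It assumes, purely for illustration, that the quoted rate applies per character, that errors are independent, and that a form carries 40 handwritten digits; none of these assumptions comes from the Uruguay results.

```python
# Rough arithmetic only: the per-character reading of the rate, the
# independence of errors and the field count are illustrative assumptions.

def share_of_forms_with_error(char_accuracy: float, chars_per_form: int) -> float:
    """Probability that at least one character on a form is misread."""
    return 1.0 - char_accuracy ** chars_per_form

HANDWRITTEN_NUMERIC = 0.989   # rate quoted for Uruguay 1996
DIGITS_PER_FORM = 40          # hypothetical form size

share = share_of_forms_with_error(HANDWRITTEN_NUMERIC, DIGITS_PER_FORM)
print(f"Forms with at least one misread digit: {share:.1%}")
```

Under these assumptions somewhat more than a third of the forms would contain an error, which suggests why error checking and editing remain necessary even with high recognition rates.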

Relevant software technologies (continued)

  • Integrated office management
  • Project planning and management tools
  • Typesetting
  • On-line services, bulletin boards
  • E-mail, internet, world wide web
  1. Integrated office management: inter-office access to common information, document sharing
  2. Project planning: MS Project, Timeline, Primavera, critical path resource planning
  3. Typesetting: transfer of printed output in digital form to printing house
  4. On-line service: external end-user access to basic information, BBS: internal access to instructions, documents
  5. E-mail and internet: efficient correspondence (with audit trail), dissemination of reports.

Anticipated future trends

  • Improved hardware price/performance ratio
  • Continued miniaturization
  • Mobile computing, incl. wireless communication
  • Expanded world wide web, E-commerce
  • Improved expert systems (ICR, voice recognition)
  • Warehousing, data mining
  • Multimedia
Improvements to current technologies will have noticeable effects on the efficiency, timeliness, quality and visibility of census processing. As for completely new technologies, I cannot see any on the horizon apart from robotics or a fully developed Orwellian environment where every citizen is watched and controlled at all times. But if we get that far, then we will not need a population census anymore.

  1. Improved hardware: (a) Faster and cheaper equipment, more affordable, better performing, better quality, more throughput; (b) Increasingly powerful software, greater sophistication and complexity of problem solving, more timely, more relevant and more useful results; (c) Improved user-friendliness; (d) Better targeting of results
  2. Continued miniaturization: smaller and sturdier equipment, ever increasing storage capacity
  3. Better mobility: hand-held PDAs with WIN-CE and wireless transmission for intelligent data collection (CAPI). Mobile phone for voice transmissions from remote areas
  4. Expanded WWW: improved dissemination efficiency, dynamic data retrieval, income generating
  5. Expert systems: advanced knowledge based software solutions for: ICR at image processing or directly at point of data collection (write pad), voice recognition, data mining, (far in future: enumeration by robots?)
  6. Warehousing: currently mainly for commercial use to identify consumer preferences, trends and unexpected relationships contained in large and varied data sets
  7. Multimedia: driven by home entertainment, dissemination of dry statistics can perhaps be made more intriguing for the end-user

IT supported elements of census processing

  • Planning and Management
  • Mapping
  • Forms and manuals
  • Data collection
  • Data capture
  • Coding
  • Error checking
  • Editing
  • Output
  • Dissemination
  • Analysis
These are the various steps in the census process which can be supported by IT. Three areas will be covered in detail during the workshop, namely data capture, dissemination and mapping.

Planning and Management

  • Process and resources
    • critical path
    • budget
  • Work flow
    • questionnaires
    • data files
    • data back-up
  • MS Project
  • Timeline
  • Quicken
  • Spreadsheet
  • IMPS/Centrack
  1. Critical path: important to plan from the start, use available means to define activities and resource requirements and obtain critical path, manage the implementation of the plan, assure feedback from line offices to keep progress up-to-date
  2. Budget control: important to control the budget at project level, even if the Ministry of Finance is perhaps responsible for the official records
  3. Workflow control: needs clear advance definition (EA list) with count, processing plan, tracking of EA folders and data files through various stages of processing
  4. Back-up system: must be effective and assure safety of data 
  5. Virus protection
  6. Challenge is to keep implementation plan updated and document and data flow under control
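
The critical path mentioned in note 1 is what tools such as MS Project compute from a list of activities, durations and dependencies. A minimal sketch of that calculation follows; the activity names, durations and dependencies are hypothetical and serve only to illustrate the method.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical activities: name -> (duration in weeks, prerequisites).
ACTIVITIES = {
    "mapping":       (20, []),
    "questionnaire": (8,  []),
    "printing":      (6,  ["questionnaire"]),
    "enumeration":   (4,  ["mapping", "printing"]),
    "data_capture":  (16, ["enumeration"]),
    "editing":       (10, ["data_capture"]),
    "tabulation":    (6,  ["editing"]),
}

def critical_path(activities):
    """Return (total duration, activities on the longest dependency chain)."""
    order = TopologicalSorter(
        {name: preds for name, (_, preds) in activities.items()}
    ).static_order()
    finish = {}     # earliest finish time of each activity
    previous = {}   # prerequisite that determines each start time
    for name in order:
        duration, preds = activities[name]
        start = max((finish[p] for p in preds), default=0)
        previous[name] = max(preds, key=lambda p: finish[p]) if preds else None
        finish[name] = start + duration
    end = max(finish, key=finish.get)
    total = finish[end]
    path = []
    node = end
    while node is not None:      # walk back along the determining prerequisites
        path.append(node)
        node = previous[node]
    return total, path[::-1]

total_weeks, path = critical_path(ACTIVITIES)
print(total_weeks, " -> ".join(path))
```

In this made-up plan a delay in mapping delays the whole census, while questionnaire design and printing have slack. Re-running the calculation with feedback from line offices is one way to keep the plan up to date.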

Mapping

  • Geographic Information Systems
  • vector
  • raster
    • ArcInfo
    • MapInfo
    • PopMap
  1. GIS: Vector - efficient, space saving, elegant scale change. Raster - sufficient for EA maps but space-consuming storage of graphic image
  2. Commercial systems: ArcInfo and MapInfo are industrial strength products, perhaps unnecessarily powerful for census mapping needs. SDBQ, SuperMap, Redatam are specially developed for statistical use
  3. Free software: UN/Vietnam developed PopMap, IMPS, MapView
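
The storage difference between vector and raster in note 1 can be illustrated with a small sketch. The rectangular enumeration area, its dimensions and the 1 m cell size are hypothetical, and real GIS formats store additional attributes.

```python
# Hypothetical rectangular enumeration area, coordinates in metres.
ea_boundary = [(0.0, 0.0), (500.0, 0.0), (500.0, 300.0), (0.0, 300.0)]

def vector_values(polygon):
    """Numbers stored for a vector outline: two coordinates per vertex."""
    return 2 * len(polygon)

def raster_cells(width_m, height_m, cell_size_m):
    """Cells stored for a raster image covering the same rectangle."""
    return round(width_m / cell_size_m) * round(height_m / cell_size_m)

def rescale(polygon, factor):
    """Scale change for vector data: multiply every coordinate."""
    return [(x * factor, y * factor) for x, y in polygon]

print(vector_values(ea_boundary))      # 8 numbers
print(raster_cells(500, 300, 1))       # 150000 cells at 1 m resolution
print(rescale(ea_boundary, 0.5)[2])    # (250.0, 150.0)
```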

Forms and manuals

  • Questionnaires
  • Control forms
  • Manuals
  • Tabulation plan
  • Census design system
  • Word processor
  • Form maker
  • Spreadsheet
  1. Numerous documents to be prepared:
    1. questionnaires,
    2. control forms,
    3. preliminary manual count forms,
    4. batch transfer forms,
    5. manuals for enumerator and supervisor,
    6. editing and coding rules and instructions,
    7. tabulation plan and table definitions,
    8. analytical and administrative reports,
    9. regular office communications.
  2. Census Design System by US Bureau of the Census. First mentioned in 1996, but development seems delayed (funding problems?)

Data collection

  • Paper questionnaires
    • door-to-door enumeration
    • mail-in
 
  • CAPI
    • fixed collection points
    • door-to-door
  • PDAs
  • CATI
  • E-form
This is an area where improvement would have the most benefit to reliability and timeliness of the further processing of census data and to the overall quality of results.

  1. Paper questionnaire: unreliable, individual interpretation by enumerator or respondent
  2. CAPI, CATI and E-form should minimize such variations thanks to computer validation. Door-to-door used for surveys
  3. Fixed points such as customs, magistrate
  4. PDA already successfully used as enterprise platform, great hope for future, as these would bring significant improvements to reliability, quality and timeliness of census processing. Problem might be typing, but write pad capability of PDAs will improve. Slow-down due to error checking compensated by reply sensitive guidance through questionnaire
  5. Voice recognition still far off, but could start playing a role in a few years
  6. E-form could be efficient but too few respondents with access to internet, even in highly developed countries
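
The computer validation and reply-sensitive guidance described in notes 2 and 4 might look roughly like the sketch below. The questions, valid ranges and skip rule are hypothetical and are not taken from any actual CAPI package.

```python
# Hypothetical questions: (key, prompt, validator, skip rule).
# The skip rule returns True when the question does not apply.
QUESTIONS = [
    ("age", "Age in completed years",
     lambda v: v.isdigit() and 0 <= int(v) <= 120,
     lambda r: False),
    ("sex", "Sex (M/F)",
     lambda v: v in ("M", "F"),
     lambda r: False),
    ("children_born", "Number of children ever born",
     lambda v: v.isdigit() and int(v) <= 25,
     lambda r: r.get("sex") != "F" or int(r.get("age", 0)) < 12),
]

def interview(answers):
    """Walk through the questionnaire, validating each reply on entry.

    `answers` maps question keys to a list of attempted replies, standing in
    for what an enumerator would type. Returns the accepted record and the
    list of rejected replies.
    """
    record, rejected = {}, []
    for key, _prompt, is_valid, skip in QUESTIONS:
        if skip(record):
            continue                       # question does not apply
        for reply in answers.get(key, []):
            if is_valid(reply):
                record[key] = reply
                break
            rejected.append((key, reply))  # would trigger an on-screen prompt
    return record, rejected

record, rejected = interview(
    {"age": ["230", "23"], "sex": ["M"], "children_born": ["2"]}
)
print(record, rejected)
```

Here the mistyped age is caught at entry and the fertility question is skipped for a male respondent, so neither problem reaches the later error-checking stage.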

Data capture

  • Key-to-disk data entry
  • OMR
  • Image scanning with OCR/ICR
  • IMPS/Centry
Of course, if we have to have paper based data collection, then improvement of data capture will have significant benefits in time and accuracy

  1. Key-to-disk is still the preferred mode for many developing countries; additional advantages are the equipment influx and DP training
  2. OMR was already used successfully in the 80s (Caribbean, Bangladesh) but has stringent paper quality and environmental demands
  3. OCR can also read marks; ICR interprets handwritten characters (numbers better than alpha); stored images can be consulted during coding and editing without further reference to the questionnaires

Coding

  • Manual (before data capture)
  • Computer assisted (after data capture)
  • Automatic
  1. Manual coding very slow, cumbersome and rather unreliable
  2. Computer assisted coding gives significant gains in consistency due to look-up tables
  3. Automatic coding after OCR/ICR: feasible only for certain variables such as gender and age, less so for occupation and industry
Perhaps a combination of computer assisted and automatic coding is most feasible
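
A combination of automatic and computer assisted coding could work along the lines of the sketch below: exact matches against a look-up table are coded automatically, and anything else is passed to a coder with a short list of suggestions. The code list is made up for illustration and does not follow any real classification; the fuzzy matching uses Python's standard `difflib` module.

```python
import difflib

# Hypothetical code list; a real census would use its national classification.
OCCUPATION_CODES = {
    "farmer": "61",
    "primary school teacher": "23",
    "nurse": "32",
    "taxi driver": "83",
}

def code_occupation(response, cutoff=0.6):
    """Return (code, candidates).

    An exact match is coded automatically. Otherwise the closest entries
    are returned as candidates for a coder to choose from.
    """
    text = " ".join(response.lower().split())   # normalise case and spacing
    if text in OCCUPATION_CODES:
        return OCCUPATION_CODES[text], []
    candidates = difflib.get_close_matches(
        text, list(OCCUPATION_CODES), n=3, cutoff=cutoff
    )
    return None, candidates

print(code_occupation("  Nurse "))         # coded automatically
print(code_occupation("school teacher"))   # left for the coder, with suggestions
```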

Error checking

  • Manually
  • Automatic, with error listings
  • Automatic, including imputation
    • pre-determined
    • hot-deck
    • undetermined
 
  • IMPS/Concor
Even with much improved or almost perfect data collection and capture techniques, this processing step will always have to be performed

  1. Manual checking. Some basic checks always required before data capture, such as geographic code and presence of essential fields
  2. Trend is toward imputation:
    • pre-defined: a fixed value depending on some indicators within the record
    • hot-deck: copy the value from another record with similar characteristics
    • undetermined: a category for clearly out-of-range and inconsistent values
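
The hot-deck and undetermined options can be sketched as follows. This is a simple sequential hot deck under stated assumptions: the matching variables (sex and relationship to head), the valid age range and the sample records are all hypothetical, and production packages apply far more elaborate rules.

```python
def impute_age(records, valid_range=(0, 120)):
    """Fill missing or out-of-range ages with a sequential hot deck.

    The deck keeps the last valid age seen for each (sex, relationship)
    group, and a bad value is replaced by the deck value for its group.
    Records with no donor yet are set to the undetermined category (None).
    """
    deck = {}
    low, high = valid_range
    result = []
    for rec in records:
        rec = dict(rec)                     # leave the input untouched
        group = (rec["sex"], rec["relationship"])
        age = rec.get("age")
        if age is not None and low <= age <= high:
            deck[group] = age               # valid record becomes the donor
        elif group in deck:
            rec["age"] = deck[group]
            rec["age_imputed"] = True       # flag the change for the audit trail
        else:
            rec["age"] = None               # undetermined
        result.append(rec)
    return result

households = [
    {"sex": "F", "relationship": "head", "age": 41},
    {"sex": "F", "relationship": "head", "age": None},
    {"sex": "M", "relationship": "child", "age": 999},
]
cleaned = impute_age(households)
print(cleaned)
```

The second record receives the age of the preceding female head, while the out-of-range child has no donor yet and falls into the undetermined category.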

Editing

  • Manual
  • Computer assisted
  1. Like all manual processes, unreliable and time consuming, error prone and inconsistent
  2. Computer assisted editing results in improved speed and consistency, accuracy, can be done automatically in connection with validation

Output

  • Database
  • Tabulation
  • Thematic maps, graphs, census atlas
  • Administrative reports
  • Analytical reports
  • IMPS/Cents
  • Redatam Plus
  • PC-Axis
  • PopMap
  • SDBQ, SuperMap
A variety of output possibilities, some are essential, i.e. database, tabulation

  1. Databases: microdata stored as square hierarchical file, transposed indexed file, macrodata in table format (printout copy or aggregate data), integrated metadata systems in preparation for warehousing
  2. Tabulations are primarily on paper, lowest unit: village level
  3. Maps and graphs: help to visualize results better
  4. Administrative reports: provide full documentation of the entire census undertaking, including lessons learned
  5. Analytical reports: extensive analysis usually by outside organizations after census project is completed
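
The difference between a square file and a transposed file in note 1 can be shown with a toy example. The sketch illustrates the layout idea only, with made-up records; it does not reproduce the actual file format of Redatam or any other package.

```python
from collections import Counter

# Square file: one record (row) per person. Values are made up.
square = [
    {"sex": "F", "age": 34, "literate": True},
    {"sex": "M", "age": 12, "literate": True},
    {"sex": "F", "age": 71, "literate": False},
]

def transpose(records):
    """Turn a list of records into one list of values per variable."""
    return {name: [rec[name] for rec in records] for name in records[0]}

transposed = transpose(square)

# A frequency count touches only the variable it needs.
sex_counts = Counter(transposed["sex"])
print(sex_counts["F"], sex_counts["M"])   # 2 1
```

Because a tabulation reads only the variables it uses, a transposed layout tends to suit analytical processing, while the square layout keeps each person's record together.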

Dissemination

  • Printed reports
  • Microfilm
  • Disk media
  • On-line (BBS)
  • World wide web
    • pre-defined (push)
    • dynamic (pull)
This is an interesting area, because by selecting proper dissemination methods the user base can be dramatically enlarged

  1. Printed reports: traditional printing facilities, directly from hard copy printout, or, better, from tabulation data file
  2. Microfilm: requires special equipment (inexpensive) but somewhat uncomfortable to operate and read, has waned in popularity due to available electronic means
  3. For all digitally distributed information: confidentiality is an important issue. Diskettes for subset of data, CD can be used with dynamic retrieval software (SDBQ) when entire census macro data are stored. Cheap desk top systems exist for CD-ROM recording
  4. On-line: Phone/modem access required to obtain pre-defined tables or dynamically generated output from microdata, used for domestic consumption. Diminishing importance for bulletin boards
  5. WWW similar but with larger global audience, may include remote tabulation requests, delivery possibly against payment like E-commerce
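
Dynamically generated output, as in notes 4 and 5, means that a table is built from the microdata at the moment a user asks for it. A minimal sketch follows; the records, variable names and request are invented, and it is not a description of how SDBQ or any other product works.

```python
from collections import Counter

# Made-up microdata; a real system would read the census data files.
MICRODATA = [
    {"district": "North", "sex": "F", "age_group": "0-14"},
    {"district": "North", "sex": "M", "age_group": "15-64"},
    {"district": "South", "sex": "F", "age_group": "15-64"},
    {"district": "South", "sex": "F", "age_group": "65+"},
]

def tabulate(records, row_var, col_var, where=None):
    """Cross-tabulate two variables, optionally for a subset of records."""
    where = where or {}
    table = Counter()
    for rec in records:
        if all(rec[k] == v for k, v in where.items()):
            table[(rec[row_var], rec[col_var])] += 1
    return dict(table)

# A user request arriving on-line: sex by age group for one district.
south = tabulate(MICRODATA, "sex", "age_group", where={"district": "South"})
print(south)
```

Only counts are returned to the user, so individual records stay with the statistical office, although very small cells could still reveal individuals and may need protection.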

Analysis

  • Demographic analysis software
  • General purpose analysis software
  • PAS
  • MortPak, Qfive
  • Fertility estimates
  • PANDEM
  • DemProj
  • People and Workers
  • FIVFIV
  • LIPRO
Here come the magicians who can manipulate the diligently collected, processed and presented census data. However, analysis is usually an activity beyond the actual census operation.

  1. Relevant analysis s/w has been available, with or without cost, for a long time; some has been adapted to more recent computing environments, while other packages remain DOS based.
  2. Very powerful commercial analysis s/w such as SAS and SPSS was developed for mainframe computers but has been adapted successfully to the micro computer environment

Conclusion

  • Continued accelerated technological development
  • Improved reliability, quality and timeliness

Depending on local infrastructure:

  • Improvement of traditional methods
  • Increased use of OCR/ICR, CAPI, E-form
  • Increased use of GIS and thematic mapping
  • Increased use of CD/DVD, (BBS) and WWW
 
  1. Accelerated technological development: smaller, faster, cheaper, easier to use hardware and software
  2. Improved reliability, quality and timeliness: might be achieved with improvements in the area of data collection
  3. Some of the recent or forthcoming technologies may have limited use for countries without appropriate infrastructure. Implementation of proven technology should therefore be carefully considered
  4. Improvement of traditional methods: paper based data collection but with better designed forms, better methods for planning and control, better mapping, etc.
  5. Increased use of OCR/ICR for data capture. CAPI (also with PDAs), CATI, E-form for data collection, resulting in better quality of data and more timely reports
  6. GIS: increased affordability, cooperation with other government offices (Cadaster), easy digitizing, powerful presentation s/w (PopMap, Redatam, SDBQ)
  7. CD/DVD: cheap CD cutting, efficient storage and dynamic tabulation s/w (SDBQ), improved communications infrastructure (BBS), globalisation with WWW

