The Second Meeting of the Working Party on the Application of New Technology to Population Data
Singapore, 1-3 April 1998

STAT/WPA(2)/5(UNSD)
27 March 1998

ECONOMIC AND SOCIAL COMMISSION FOR ASIA AND THE PACIFIC


Using Population Census Data Modules to Produce Census Statistics for Internet
Robert Mayo
United Nations Statistics Division
Contents
Summary 
  I. Introduction to the development of World Wide Web (WWW) Internet dissemination of population and census data
  II. Modules for user access to statistical data
  III. The output modules for data
  IV. Supporting information modules
  V. The module approach to WWW site development
  VI. Population of capital cities and cities of over 100,000
References

Summary
This paper provides an overview of the options and strategies used to disseminate population and other statistics via the Internet World Wide Web (WWW). The core features of a population census statistics Web site are user access to the statistics at various levels of detail, census outputs, and the necessary supporting information, including metadata. Various models used by national and international statistics offices for WWW statistics are presented. The benefits of a modular approach to developing a WWW-based population census statistics dissemination strategy are discussed, with emphasis on building a dissemination system starting from limited experience and resources.
I. Introduction to the development of World Wide Web (WWW) Internet dissemination of population and census data
The rapid and unexpected development of the Internet as a dissemination medium over the past few years has provided statisticians with very powerful and efficient mechanisms to provide data and related information to users. The Internet, however, posed many challenges to statisticians, as established mechanisms for dissemination were no longer considered the most efficient, and there was little or no experience in disseminating publications or statistical series via the WWW. In addition, rapidly changing technology led many statistical offices to be concerned that investments in specific technologies or approaches would turn out to be misdirected and that development would have to start again from the beginning. We all found ourselves with a technology that we were convinced was the future for statistical offices, but there were a number of different directions we could take and little or no experience or expertise in developing or using them.
Now, two years later, we have a clearer view of the capabilities of the various approaches and of the skills and resources needed to produce specific results. We know where the major problems are likely to occur and which strategies produce the most efficient results.
One of the early lessons we learned at the United Nations Statistics Division was to base the development strategy on current outputs (publications) when moving statistics to the WWW. We found it easier and more productive to move from an established print or CD-ROM output to the WWW than to take an output concept, design the total output and then develop it for the WWW. Our experience in producing WWW outputs over the past few years has reinforced this point time and again. Once the first version of the WWW output has been developed from the print or CD-ROM output, it is time to add the extra features that the WWW makes possible.
The Internet provides us with a number of opportunities and options for the dissemination of population and other statistics. Many statistical offices have adopted a user orientation in developing their Web sites, and this has led to similarities in the basic structural elements of national statistical Web sites. These basic elements include modules to:
Access the statistics (via indexes, searches, filters);
Display or provide the statistical outputs;
Provide support information (including metadata).
A general WWW statistics dissemination model can be seen in more detail in figure 1. The model is built with a number of modules to provide access, output display and supporting information.
Figure 1: A general WWW statistics dissemination model
II. Modules for user access to statistical data
Users generally have three main requirements when trying to access statistical data on the WWW. They want to find data on a general or specific topic, they have some geographical or other grouping by which they want to sort the data, and they want the data for a time period they are interested in. In our experience, the series or topic is the first aspect by which a user wants to search for data, but alternative approaches are also important.
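The three access requirements described above (topic or series, geographical or other grouping, and time period) can be sketched as optional filters over a list of observations. This is only an illustrative sketch: the Observation record, the select function and all figures are hypothetical and do not describe any statistical office's actual system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    series: str   # topic or series name (hypothetical labels)
    area: str     # country, region or other grouping
    period: int   # reference year
    value: float


def select(observations, series=None, areas=None, periods=None):
    """Return the observations matching each filter; None means 'no filter'."""
    return [
        o for o in observations
        if (series is None or o.series == series)
        and (areas is None or o.area in areas)
        and (periods is None or o.period in periods)
    ]


# Made-up figures, for illustration only.
DATA = [
    Observation("Total population", "Region A", 1980, 1200.0),
    Observation("Total population", "Region A", 1991, 1500.0),
    Observation("Total population", "Region B", 1991, 800.0),
    Observation("Surface area", "Region A", 1991, 350.0),
]

# Series first, then period, with no restriction on area.
result = select(DATA, series="Total population", periods={1991})
```

Because every filter is optional, the same function can serve a site that pre-selects regions and periods for the user as well as one that exposes all three choices.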
Series access can be achieved through a series/topic index filter used in conjunction with geographical country/area filters. An example of a series/topic index filter is provided by the Argentine statistical office, Instituto Nacional De Estadística y Censos (INDEC) <http://www.indec.mecon.ar/Anuario/default.htm>. The index is grouped by series within topics, and in this case the regions and periods are pre-selected for the user. In other cases, filters may be provided. (Note, however, that no name is provided on the page, which could confuse some users.)
The use of Web search engines is becoming more popular with national statistics offices. An example is provided by the United States Census Bureau at <http://www.census.gov/main/www/srchtool.html>. The Census Bureau's search facility allows for searches by word, place, map, and staff. The user enters a keyword or phrase and is presented with the resulting series. To select specific population data, the user generally first selects the regions/areas of interest, followed by the series and, if the option is offered, the time periods.
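As a rough illustration of keyword access, the sketch below matches a query against a list of series titles. It is a deliberately simple, hypothetical example and does not describe how the Census Bureau's search facility works; the function name and the titles are assumptions made for illustration.

```python
def search_series(titles, query):
    """Return the titles that contain every word of the query (case-insensitive)."""
    words = query.lower().split()
    return [t for t in titles if all(w in t.lower() for w in words)]


# Hypothetical series titles.
TITLES = [
    "Population by age and sex",
    "Population of capital cities",
    "Surface area by region",
]

hits = search_series(TITLES, "population cities")
```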
III. The output modules for data
Statistics offices have various options for providing the user with population statistics. Users can be given statistics that are viewable on screen and printable, or statistics in electronic file formats that are not viewed on the Web but are transferred to the user for further action (analysis or printing).
A. Static census data
There are various screen viewable and printable options for presenting population census statistics. The most basic method for disseminating population statistics is a static html file. This option is used extensively by national statistics offices; an example is provided by the Brazilian Statistical and Geographic Foundation (IBGE) <http://www.ibge.org/english/Brasil/e-pop.htm>. This example provides users with detailed population and surface area statistics for 1980 and 1991, broken down by region. In addition, a brief commentary and a graph are provided on population statistics relating to children and adolescents. This is a good example of a well-designed and structured static html page. The static html page can provide the user with a traditional statistical table with full referencing, sources and other metadata such as technical notes. This is very useful from the statistical office's perspective, as it lowers the chance of problems in the interpretation and citing of the population statistics. The user has the option of saving the file onto their computer or printing the page for further reference or analysis.
Over the past couple of years the Adobe portable document format (pdf) file has become popular with national and international statistics offices as a method of disseminating statistics. The pdf file has a number of benefits to offer. It provides a mechanism for disseminating population and other statistics to multi-platform users in a single fixed format. Too often we see a well-designed statistical table layout that is not displayed or printed as intended because the user has a different Web browser, or a different version of the browser, from the one the table was designed for. The other main advantage of the pdf file to the statistics provider is that the user is not able to change or modify the content of the document. The user can obtain the Adobe viewer free of charge.
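A static html table that carries its own source and technical notes could be generated along the following lines. This is a minimal sketch under stated assumptions: the function name, page layout and sample values are hypothetical and are not taken from the IBGE page or any other site.

```python
from html import escape


def render_static_table(title, columns, rows, source, notes):
    """Build a self-contained html page: a table plus its source and notes."""
    head = "".join(f"<th>{escape(c)}</th>" for c in columns)
    body = "\n".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    notes_html = "".join(f"<li>{escape(n)}</li>" for n in notes)
    return (
        f"<html><head><title>{escape(title)}</title></head><body>\n"
        f"<h1>{escape(title)}</h1>\n"
        f'<table border="1">\n<tr>{head}</tr>\n{body}\n</table>\n'
        f"<p>Source: {escape(source)}</p>\n"
        f"<ul>{notes_html}</ul>\n"
        "</body></html>\n"
    )


# Made-up values, for illustration only.
page = render_static_table(
    title="Population by region",
    columns=["Region", "1980", "1991"],
    rows=[["Region A", 1200, 1500], ["Region B", 700, 800]],
    source="Population & Housing Census (hypothetical)",
    notes=["Figures are in thousands."],
)
```

Keeping the source line and notes in the same file as the table means they travel with the data when the user saves or prints the page.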
B. Dynamic census data
Access to population census data via dynamic interface models has become popular in some national statistics offices. This approach offers users more flexibility in selecting population census data, and thus the ability to create population census statistics tables that better reflect their needs. Two examples of this approach are:
These two systems provide very structured methods for accessing, selecting and displaying population census statistics. They provide the user with the ability to be more selective in the population statistics and have more control over the display of the statistics than the static html model does. This method does however require substantial resources to be implemented.
The choice of which one or combination of static or dynamic options to offer relies on a number of factors such as size of data files, data and file structures, user requirements, resources etc.
C. Electronic file downloads
Viewing a static or dynamic population Web output page involves the html file being transferred from the server (the national statistics office) to the client (the user). There is also, however, the option of transferring the population statistics in electronic format without viewing them. This can be done via file transfer protocol (ftp), or by using the Web browser as a transfer device and saving the files to the user's computer. The ftp solution is recommended for high volumes or large files and is therefore more likely to be used by the professional or specialist user. The Web browser solution suits small file sizes and serves both the professional or specialist user and the more general user.
The electronic files can be in various formats. The most common are ASCII comma separated values (csv), spreadsheets, or fixed format files for SAS/SPSS. The user can select the file format that is appropriate for their analysis software and download it.
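The difference between the csv and fixed format options can be sketched as two ways of writing the same table. The function names, column widths and figures below are assumptions made for illustration; an office would choose widths to match the record layout it documents for SAS or SPSS users.

```python
import csv
import io


def to_csv(columns, rows):
    """Write a header row and data rows as comma separated values."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


def to_fixed_width(rows, widths):
    """Write each cell padded (or cut) to a fixed column width, without a header."""
    lines = [
        "".join(str(cell).ljust(w)[:w] for cell, w in zip(row, widths))
        for row in rows
    ]
    return "\n".join(lines) + "\n"


# Made-up figures, for illustration only.
COLUMNS = ["area", "year", "population"]
ROWS = [["Region A", 1991, 1500], ["Region B", 1991, 800]]

csv_text = to_csv(COLUMNS, ROWS)
fixed_text = to_fixed_width(ROWS, widths=[12, 4, 8])
```

A fixed format file carries no delimiters, so the column positions have to be published alongside it as part of the supporting information.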
IV. Supporting information modules
The final elements in the model are the supporting information modules. These modules can include a wide variety of information under the general term metadata, as well as copyright information, references to statistical and related publications, contact information for staff, a site map, links to other sites, service centers, and so on. The Statistics Canada web site <http://www.statcan.ca/english/concepts/> has a good example of detailed supporting information modules. Under each of these elements there should be detailed information to explain the data and their sources.
Metadata add considerably to the value of data and the WWW provides statisticians with the means to provide detailed metadata to users of population census statistics. Recommended metadata are:
A. Organization of the population census
This would include information on the general organization of the population census: who is conducting the census, how it was organized and implemented, a statement of the purpose of the census, and the legal authority for it. The date and duration of the census should also be included in the metadata.
B. Description of the coverage
An exact description should be given of the geographic regions or other categories of constituent parts covered by the population census. For example, it is necessary to specify whether such categories as persons without fixed abode or military personnel were included and to indicate the order of magnitude of the categories omitted.
C. Collection of information
The nature of the information collected should be reported in considerable detail, including a statement of items of information collected but not reported on.
D. Numerical results
A general indication should be given of the methods followed in the derivation of numerical results. Particulars should be given of any methods used to check and correct for under- or over-enumeration and to make small-area estimates. Any methods of analyzing and adjusting for non-response should also be described.
E. Accuracy
A general analysis of the accuracy attained should be given. If any sampling was used, it should be described and analyzed, and a distinction should be made between sampling and non-sampling errors.
F. Assessment
The extent to which the purposes of the census were fulfilled should be assessed.
G. References
References should be given to any reports or papers relating to the population census.
H. Statistical analysis and computational procedures
The statistical methods followed in the compilation of the final summary tables from the primary data should be described. If any more elaborate processes of estimation than simple totals and means have been used, the methods followed should be explained, the relevant formulae being reproduced where necessary.
I. Accuracy, completeness and adequacy of the enumeration coverage
The accuracy of the enumeration can and should be checked and corrected in the course of the census. Its completeness and adequacy cannot be judged by internal evidence alone. Thus, complete omission of a geographic region or omission of people cannot be discovered by the inquiry itself and auxiliary investigations have often to be made. These should be put on record, indicating the extent of inaccuracy which may be ascribable to such defects.
J. Comparisons with other sources of information
Every reasonable effort should be made to provide comparisons with other independent sources of information. Such comparisons should be reported along with other results, and significant differences should be discussed. The object of this is not to throw light on sampling error, since a well designed census provides adequate internal estimates of such errors, but rather to gain knowledge of biases and other non-random errors.
K. Questionnaires and coding systems
The inclusion of copies of the questionnaires or other schedules, and related parts of the instructions used in the population census (including special rules for coding and classifying) is of great value and should be included as metadata.
These items of metadata add considerable value to the population census data and to the web site in general. In many cases the necessary html files can be prepared with very limited resources. Linking these metadata elements to the appropriate data is a straightforward procedure. The same population census metadata will be usable across many of an office's population census statistical outputs. Thus a small effort can have wide application in a web site.
The detail of the metadata should always be commensurate with the detail and expected uses of the data to which it relates on the site. However, a clear advantage of the Internet is that much more metadata can be provided, for various levels of users, at low cost.
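The idea that one set of census metadata can serve many outputs can be sketched as a single metadata record, keyed by the recommended items A to K above, that several tables point to. All identifiers and texts below are hypothetical placeholders, not an established metadata standard.

```python
# One key per recommended metadata item (A to K above); names are hypothetical.
METADATA_ITEMS = [
    "organization", "coverage", "collection", "numerical_results",
    "accuracy", "assessment", "references", "statistical_procedures",
    "enumeration_quality", "comparisons", "questionnaires",
]

# A single metadata record, written once (placeholder text only).
CENSUS_METADATA = {
    "census_example": {
        "organization": "Who conducted the census, its purpose and legal authority.",
        "coverage": "Geographic regions and population categories covered.",
        "questionnaires": "Copies of the questionnaires and coding rules.",
    }
}

# Many tables link to the same record instead of repeating it.
TABLE_METADATA = {
    "population_by_region": "census_example",
    "population_by_age": "census_example",
}


def metadata_for(table_id):
    """Return the shared metadata record linked to a table."""
    return CENSUS_METADATA[TABLE_METADATA[table_id]]


def missing_items(record):
    """List the recommended items that are absent or empty in a record."""
    return [item for item in METADATA_ITEMS if not record.get(item, "").strip()]


todo = missing_items(metadata_for("population_by_region"))
```

A check such as missing_items gives the office a simple list of the metadata still to be written before a table is released.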
V. The module approach to WWW site development
The use of a modular approach to WWW site development has been a successful strategy for us at the United Nations Statistics Division, and one that offers national agencies with limited resources a strategy for disseminating population census statistics via the Internet. The modular approach gives statistical WWW site developers the opportunity to start with a small number of statistical series and add more as resources become available. This allows the statistical office to select the series it considers most important; these may be series that are in great demand from users, or perhaps series that have a short "shelf-life" and are revised regularly. The series ranked as top priority are the first to be put on the WWW site, with the remaining series added at a later date. Since self-contained modules are used, new statistical series, tables or data can be linked to the established modules.
The various modules in the site are supplemented with additional information as new series are added. The addition of new series therefore does not require a complete infrastructure to be developed.
This approach allows various modules to be reused for other topic areas, thus providing considerable resource savings. We have found that the filtering systems for selecting series and countries/areas lend themselves to this reuse. For example, country/area lists or map interfaces can be used for many topics. Once they have been developed, a few minor changes to titles or to links to other pages enable the pages to be used for other series or topics.
It is important to develop a WWW model for each group of statistical data you want to put on the WWW. The models may be essentially the same, with minor variations, or they may have few similarities. This will depend upon factors such as whether the output pages are static or dynamically produced, the complexity of the data, the data audience, etc.
The Statistics Division disseminates population census statistics on its web site in the United Nations Monthly Bulletin of Statistics On-line (MBS On-line) and the Population and Vital Statistics Report. These web outputs are updated on a monthly and quarterly basis respectively. We have just developed a new web output based on population census data from the United Nations Demographic Yearbook.
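The reuse of a country/area list module can be illustrated with a small page generator in which only the title and the link targets change from one topic to another. This is a hypothetical sketch: the function, the three-letter codes used as file names and the directory layout are assumptions and do not describe the Division's actual pages.

```python
from html import escape

# A short illustrative list; a real module would hold the full list of areas.
AREAS = [("ARG", "Argentina"), ("BRA", "Brazil"), ("CAN", "Canada")]


def area_index(topic_title, topic_dir, areas=AREAS):
    """Build a country/area selection page for one topic.

    Only the title and the directory in the links differ between topics,
    so the same module serves any number of outputs.
    """
    items = "\n".join(
        f'<li><a href="{topic_dir}/{code.lower()}.htm">{escape(name)}</a></li>'
        for code, name in areas
    )
    return f"<h1>{escape(topic_title)}</h1>\n<ul>\n{items}\n</ul>\n"


# The same module reused for two different topics.
cities_page = area_index("Population of cities", "cities")
vital_page = area_index("Vital statistics", "vital")
```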
VI. Population of capital cities and cities of over 100,000
This WWW project grew from the numerous requests the Statistics Division has received for population statistics on cities. The data are supplemented with coordinate information. For this output the WWW model shown in figure 2 was adopted. This model differs from other Division models mainly in the selection of countries/areas. In other respects it re-uses modules from other outputs.
Figure 2
This model uses a map/list approach for users to select the countries/areas; see <http://www.un.org/Depts/unsd/demog/index.html>. The other modules of the model, such as technical notes, sources and help, are standard files. This approach is very well suited to population census data because users are focused on specific series and areas, so a visual map interface is an efficient way to filter the data they need.
The main advantage to the Division in developing this WWW output is that it provides us with a new module to use for population census WWW outputs. The Demographic Yearbook has a number of tables that could use this same model. We can then develop the WWW population outputs as required. In addition, the model uses previously developed modules from our other statistical outputs such as MBS On-line. We have found that this re-use of modules is an approach that saves time and resources. The modular approach allows additional modules to be added as they are developed. The Division has developed a WWW module for the United Nations Statistics Division's Standard Country or Area Codes for Statistical Use. This module will be linked to a number of the Division's outputs and will be a useful resource for the international statistical community.
A further module under development is a data dictionary covering all row and column labels in the Division's World Statistics Pocketbook. An extended version of the dictionary will cover all series published on the Division's Web site <http://www.un.org/Depts/unsd/>, which includes the Monthly Bulletin of Statistics On-line, Social Indicators and selected tables from its 1995 publication The World's Women --Trends and Statistics. The data dictionary quotes verbatim the internationally recommended definition for each label. Footnotes accompanying data in the tables are used to indicate differences from the international recommendations. The complete and extended version will be released soon on the Division's Web site and will become an additional module to supplement our current and future outputs.
References
  • Brazilian Statistical and Geographic Foundation (IBGE) <http://www.ibge.org/english/Brasil/e-pop.htm>
  • Instituto Nacional De Estadística y Censos (INDEC) <http://www.indec.mecon.ar/Anuario/default.html>
  • Statistics Canada, <http://www.statcan.ca/english/census96/list.htm>, <http://www.statcan.ca/english/concepts/index.html>
  • United Nations, 1995 Demographic Yearbook, United Nations publication, Series R, No. 26, Sales No. E/F.97.XIII.1
  • United Nations, Population and Vital Statistics Report, United Nations publication, Series A, Vol. L, 1998 (E) quarterly. (print & Internet)
  • United Nations, Statistics Division (1997/8), United Nations publication, Monthly Bulletin of Statistics On-line (MBS On-line), <http://www.un.org/Depts/unsd>
  • United Nations, Statistics Division (1996), Standard Country or Area Codes for Statistical Use, United Nations publication, ST/ESA/STAT/SER.M/49/Rev.3
  • United Nations, Statistics Division, Recommendations for the Preparation of Sample Survey Reports (Provisional Issue), Series C, No.1, Rev.2
  • United Nations, World Statistics in Brief, United Nations publication, Series V, No. 17, Sales No. E.97.XVII.5
  • United Nations, The World's Women --Trends and Statistics, United Nations publication, Series K, No. 12, Sales No. E.95.XVII.2
  • United States Census Bureau, 1990 Census Lookup, <http://venus.census.gov/cdrom/lookup>
  • The Urban Information Center, University of Missouri, Basic Tables: 1990 Demographic Profile Generator, <http://www.oseda.missouri.edu/uic/uicapps/xtabs3.html>

 