| The Second
Meeting of the Working Party on the Application
of New Technology to Population Data |
| Singapore, 1-3 April
1998 |
| |
STAT/WPA(2)/5(UNSD)
27 March 1998
ECONOMIC AND SOCIAL COMMISSION FOR ASIA AND THE
PACIFIC
Working Party on the Application of New Technology
to Population Data
Second Meeting
1-3 April 1998
Singapore |
| Using Population Census
Data Modules to Produce Census Statistics for
Internet |
Robert Mayo
United Nations Statistics Division |
| Contents |
Summary
- Introduction
to the development of World Wide Web (WWW)
Internet dissemination of population and census
data
- Modules
for user access to statistical data
- The
output modules for data
- Supporting
information modules
- The
module approach to WWW site development
- Population
of capital cities and cities of over 100,000
References |
|
| Summary
|
| This paper provides an overview
of various options and strategies used in the
dissemination of population and other statistics
via the Internet World Wide Web (WWW). The core
features of a population census statistics Web
site are user access to population census statistics
at various levels of detail, census outputs, and
the necessary supporting information, including metadata.
Various models used by national and international
statistics offices for WWW statistics are presented.
The benefits of a modular approach to developing
a WWW based population census statistics dissemination
strategy are discussed with emphasis on building
a dissemination system starting with limited experience
and resources. |
|
| I.
Introduction to the development of World Wide
Web (WWW) Internet dissemination of population
and census data |
| The rapid and unexpected development
of the Internet as a dissemination medium over
the past few years has provided statisticians
with very powerful and efficient mechanisms to
provide data and related information to users.
The Internet, however, posed many challenges to
statisticians, as established mechanisms for dissemination
were no longer considered the most efficient,
yet there was little or no experience in disseminating
existing publications or statistical series via the
WWW. In addition, rapidly changing technology
led many statistical offices to be concerned
that investments in specific technologies or
approaches would turn out to be misdirected and
would force them to restart the development phase
from the beginning. The situation that we all
found ourselves in was that we had a technology
which we were convinced was the future for Statistical
Offices, but there were a number of different
directions we could go in and little or no experience
or expertise in developing or using them. |
| Now, two years later, we have
a clearer vision of the capabilities of various
approaches and of the skills and resources needed
to produce specific results. We know where the
major problems are likely to occur and which strategies
produce the most efficient results. |
| One of the early lessons we
learned at the United Nations Statistics Division
was to base the development strategy on use of
current outputs (publications) as the basis for
movement of statistics to the WWW. We found it
easier and more productive to move from an established
print or CD-ROM output to the WWW than
to take an output concept, design the total output
and then develop it for the WWW. Our experience
in producing WWW outputs over the past few years
has reinforced this point time and again. Once the
first version of the WWW output is developed from
the print or CD-ROM output, then it is time to
add the extra features that are possible via the
WWW. |
| The Internet provides us with
a number of opportunities and options for the
dissemination of population and other statistics.
Many statistical offices have adopted a user orientation
in developing their Web sites, and this has led
to similarities in the basic elements of the structure
of national statistical Web sites. These basic
elements include modules to: |
- Access the statistics (via indexes, searches, filters);
- Display or provide the statistical outputs;
- Provide support information (including metadata). |
| A general WWW statistics dissemination
model can be seen in more detail in figure 1.
The model is built with a number of modules to
provide access, output display and supporting
information. |
| figure 1. |
 |
|
| II.
Modules for user access to statistical data |
| Users generally have three main
requirements when trying to access statistical
data on the WWW. They want to find data on a general
or specific topic, they have some geographical
or other grouping they want to sort the data by,
and they want data for the time period they are interested
in. In our experience, the series
or topic is the first aspect by which a user wants to search
for data, but alternative approaches are
also important. |
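The three requirements above (topic or series, geographical grouping, time period) can be read as three optional filters over the same set of observations. The following is a minimal sketch of that idea in Python, not a description of any office's actual system; the `Observation` record, the `select` function and the sample figures are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Observation:
    series: str   # topic or series name, e.g. "Total population"
    area: str     # country, region or other grouping
    year: int     # reference period
    value: float


def select(observations: Iterable[Observation],
           series: Optional[str] = None,
           area: Optional[str] = None,
           year: Optional[int] = None) -> List[Observation]:
    """Return the observations matching every filter that was supplied."""
    return [
        obs for obs in observations
        if (series is None or obs.series == series)
        and (area is None or obs.area == area)
        and (year is None or obs.year == year)
    ]


# Made-up figures, for illustration only.
SAMPLE = [
    Observation("Total population", "Region A", 1980, 1000.0),
    Observation("Total population", "Region A", 1991, 1200.0),
    Observation("Total population", "Region B", 1991, 800.0),
    Observation("Surface area", "Region A", 1991, 50.0),
]

if __name__ == "__main__":
    # Series first, then period; the area filter is left open.
    for obs in select(SAMPLE, series="Total population", year=1991):
        print(obs.area, obs.value)
```

Because each filter is optional, the same function serves a user who starts from the series as well as one who starts from an area or a period.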
| Series access can be achieved
through a series/topic index filter used in
conjunction with geographical country/area filters.
An example of a series/topic index filter is provided
by the Argentina statistical office, Instituto
Nacional De Estadística y Censos (INDEC)
<http://www.indec.mecon.ar/Anuario/default.htm>.
The index is grouped by series within topics and
in this case the regions and periods are pre-selected
for the user. In other cases, filters may be provided.
(Note, however, that no name is provided on the
page, which could confuse some users.) |
| The use of Web search engines
is becoming more popular with national statistics
offices. An example is provided by the United
States Census Bureau at <http://www.census.gov/main/www/srchtool.html>.
The Census Bureau's search facility allows for
searches by word, place, map, and staff. The user
enters a keyword or phrase and is presented with the
resulting series. For selections of specific population
data the user first generally selects regions/areas
of interest, followed by series and time periods,
if there is an option. |
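A keyword search of this kind can be approximated, at its simplest, by matching the words of the query against series titles. The sketch below assumes a small in-memory catalogue; the `CATALOGUE` contents, the page names and the `search` function are hypothetical and are not taken from the Census Bureau's facility.

```python
from typing import Dict, List

# Hypothetical catalogue: series title -> page holding the table.
CATALOGUE: Dict[str, str] = {
    "Population by age and sex": "pop_age_sex.html",
    "Population of capital cities": "pop_capitals.html",
    "Surface area by region": "area_region.html",
}


def search(catalogue: Dict[str, str], query: str) -> List[str]:
    """Return titles containing every word of the query, ignoring case."""
    words = query.lower().split()
    if not words:
        return []
    return sorted(
        title for title in catalogue
        if all(word in title.lower() for word in words)
    )


if __name__ == "__main__":
    for title in search(CATALOGUE, "population capital"):
        print(title, "->", CATALOGUE[title])
```

A production search engine would add indexing, stemming and ranking; the point here is only that the user supplies words and receives a list of matching series.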
|
| III.
The output modules for data |
| Statistics offices have various
options to provide the user with population statistics.
Users can be provided with population statistics
that are screen viewable and printable, or that
are in an electronic file format that is not viewable
on the Web but is available for transfer to the user
for further action (analysis or printing). |
| A. Static census data |
| There are various screen viewable
and printable options for presenting population
census statistics. The most basic method for disseminating
population statistics is in a static html file.
This option is used extensively by national statistics
offices; an example is provided by the Brazilian Statistical
and Geographic Foundation (IBGE) <http://www.ibge.org/english/Brasil/e-pop.htm>.
This example provides users with detailed population
and surface area statistics for 1980 and 1991,
broken down by region. In addition, a brief commentary
and a graph are provided on population statistics
relating to children and adolescents. This is
a good example of a well-designed and structured
static html page. The static html page can provide
the user with a traditional statistical table
with full referencing, sources and other metadata
such as technical notes. This is very useful from
the statistical office's perspective as it lowers
the chance of problems in the interpretation
and citing of the population statistics. The user
has the option of saving the file onto their computer
or printing the page for further reference or
analysis. |
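A static html page of the kind described, carrying the table together with its source and technical notes, can be generated from tabular data with very little code. The following is a minimal sketch under stated assumptions: the `render_table` function, its parameters and the page layout are hypothetical, and the figures in the usage example are made up.

```python
from html import escape
from typing import List, Sequence


def render_table(title: str, headers: Sequence[str],
                 rows: Sequence[Sequence[object]],
                 source: str, notes: Sequence[str] = ()) -> str:
    """Build a complete static html page for one statistical table."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    body = "\n".join(
        "<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    note_items = "".join(f"<li>{escape(n)}</li>" for n in notes)
    parts: List[str] = [
        "<html><head><title>" + escape(title) + "</title></head><body>",
        "<h1>" + escape(title) + "</h1>",
        '<table border="1">',
        "<tr>" + head + "</tr>",
        body,
        "</table>",
        # The source line travels with the table, so it is kept when
        # the user saves or prints the page.
        "<p>Source: " + escape(source) + "</p>",
    ]
    if note_items:
        parts.append("<ul>" + note_items + "</ul>")
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    page = render_table(
        "Population by region",
        ["Region", "1980", "1991"],
        [["Region A", 1000, 1200], ["Region B", 700, 800]],
        source="Population census (illustrative figures)",
        notes=["Figures are made up for this example."],
    )
    with open("population_by_region.html", "w", encoding="utf-8") as f:
        f.write(page)
```

Keeping the source and notes inside the same file is what gives the static page its advantage for interpretation and citation.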
| Over the past couple of years,
the Adobe portable document format (pdf) file
has become popular with national and international
statistics offices as a method of disseminating
statistics. The pdf file has a number of benefits
to offer in disseminating statistics. It
provides a mechanism for disseminating population
and other statistics to multi-platform users in
a single fixed format. Too often we see a well
designed statistical table layout not being displayed
or printed as intended because a different web browser or version
of a web browser is used to display the table.
The other main advantage to
the statistics provider of the pdf file is that
the user is not able to change or modify the content
of the document. The user can obtain the Adobe
viewer free of charge. |
| B. Dynamic census data |
| Access to population census
data via dynamic interface models has become popular
in some national statistics offices. This approach
is one that offers the users more flexibility
in population census data selection and thus the
ability to create population census statistics
tables that better reflect their needs. Two examples
of this approach are:
|
| These two systems provide very
structured methods for accessing, selecting and
displaying population census statistics. They
provide the user with the ability to be more selective
in the population statistics and have more control
over the display of the statistics than the static
html model does. This method does however require
substantial resources to be implemented. |
| The choice of which one or combination
of static or dynamic options to offer relies on
a number of factors such as size of data files,
data and file structures, user requirements, resources
etc. |
| C. Electronic file downloads |
| Viewing a static or dynamic
population web output page involves the html file
being transferred from the server (national statistics
office) to the client (user). There is, however, also
the option of transferring the population statistics
in electronic format without viewing them. This
can be done via file transfer protocol (ftp),
or by using the web browser as a transfer device and
saving the files to the user's computer. The ftp
dissemination solution is recommended for high
volumes or large files and is therefore more likely
to be used by the professional or specialist user.
The web browser solution suits small
files and suits both the professional
or specialist user and the more general
user. |
| The electronic files can be
in various formats. The most common are ASCII
comma-separated values (csv), spreadsheets or
fixed format SAS/SPSS files. The user can select
the file format which is appropriate for their
analysis software and download it. |
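Of the formats listed, csv is the simplest to produce from a table held in memory. The sketch below uses Python's standard `csv` module; the `to_csv` helper, the column names and the figures are hypothetical.

```python
import csv
import io
from typing import Sequence


def to_csv(headers: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Serialise one table as comma-separated text, quoting where needed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    text = to_csv(
        ["Area", "Year", "Population"],
        [["Region A", 1991, 1200], ["City, capital", 1991, 300]],
    )
    with open("population.csv", "w", encoding="ascii", newline="") as f:
        f.write(text)
```

Using a csv writer rather than joining values by hand matters for area names that themselves contain commas, which the writer quotes automatically.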
|
| IV.
Supporting information modules |
| The final elements in the model
are the supporting information modules. These
modules can include a wide variety of information
under the general term metadata, as well as copyright
information, references to statistical and related
publications, contact information for staff, site
map, links to other sites, service centers, and
so on. The Statistics Canada web site <http://www.statcan.ca/english/concepts/>
has a good example of detailed supporting information
modules. Under each of these elements there should
be detailed information to explain the data and
their sources. |
| Metadata add considerably to
the value of data and the WWW provides statisticians
with the means to provide detailed metadata to
users of population census statistics. Recommended
metadata are: |
| A. Organization of the
population census |
| This would include information
on the general organization of the population
census, including who conducted
the census, how it was organized and implemented,
a statement of the purpose of the population census, and
the legal authority for the census. The date and
duration of the census should be included in the
metadata. |
| B. Description of the coverage |
| An exact description should
be given of the geographic regions or other categories
of constituent parts covered by the population
census. For example, it is necessary to specify
whether such categories as persons without fixed
abode or military personnel were included and
to indicate the order of magnitude of the categories
omitted. |
| C. Collection of information |
| The nature of the information
collected should be reported in considerable detail,
including a statement of items of information
collected but not reported on. |
| D. Numerical results |
| A general indication should
be given of the methods followed in the derivation
of numerical results. Particulars should be given
of methods, if any, of checking and correcting
for under- or over-enumeration and for making small-area
estimates. Any methods of analyzing and adjusting
for non-response should also be described. |
| E. Accuracy |
| A general analysis of the accuracy
attained should be given, and a distinction should
be made between sampling and non-sampling errors.
If any sampling was used, a description
and analysis of the sampling errors should be included. |
| F. Assessment |
| The extent to which the purposes
of the population census were fulfilled should be assessed. |
| G. References |
| References should be given to
any reports or papers relating to the population
census. |
| H. Statistical analysis
and computational procedures |
| The statistical methods followed
in the compilation of the final summary tables
from the primary data should be described. If
any more elaborate processes of estimation than
simple totals and means have been used, the methods
followed should be explained, the relevant formulae
being reproduced where necessary. |
| I. Accuracy, completeness
and adequacy of the enumeration coverage |
| The accuracy of the enumeration
can and should be checked and corrected in the
course of the census. Its completeness and adequacy
cannot be judged by internal evidence alone. Thus,
complete omission of a geographic region or omission
of people cannot be discovered by the inquiry
itself and auxiliary investigations have often
to be made. These should be put on record, indicating
the extent of inaccuracy which may be ascribable
to such defects. |
| J. Comparisons with other
sources of information |
| Every reasonable effort should
be made to provide comparisons with other independent
sources of information. Such comparisons should
be reported along with other results, and significant
differences should be discussed. The object of
this is not to throw light on sampling error,
since a well designed census provides adequate
internal estimates of such errors, but rather
to gain knowledge of biases, and other non-random
errors. |
| K. Questionnaires and coding
systems |
| The inclusion of copies of the
questionnaires or other schedules, and related
parts of the instructions used in the population
census (including special rules for coding and
classifying) is of great value and should be included
as metadata. |
| These items of metadata add
considerable value to the population census data
and the web site in general. In many cases the
necessary html files can be prepared with very
limited resources. The linking of these metadata
elements to the appropriate data is a straightforward
procedure. The same population census
metadata will be usable across many of an office's
population census statistical outputs. Thus a
small effort can have wide application in a web
site. |
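One way to picture the reuse described above is a single set of metadata pages, written once, whose links are appended to every census table page. The sketch below is illustrative only; the `CENSUS_METADATA` mapping, the file paths and the `metadata_links` function are hypothetical.

```python
from html import escape
from typing import Dict, List

# Hypothetical metadata pages, written once and shared by all census tables.
CENSUS_METADATA: Dict[str, str] = {
    "Organization of the census": "meta/organization.html",
    "Coverage": "meta/coverage.html",
    "Questionnaires and coding": "meta/questionnaires.html",
}


def metadata_links(metadata: Dict[str, str]) -> str:
    """Return an html list of links that can be appended to any table page."""
    items: List[str] = [
        f'<li><a href="{escape(href, quote=True)}">{escape(label)}</a></li>'
        for label, href in metadata.items()
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(metadata_links(CENSUS_METADATA))
```

Since the block of links is generated from one mapping, adding a new metadata page updates every table page the next time the pages are rebuilt.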
| The detail of metadata information
should always be commensurate with the detail
and expected uses of the data it relates to on
the site. However, a major advantage of the Internet
is that much more metadata can be provided for
various levels of users at low cost. |
|
| V.
The module approach to WWW site development |
| The use of a modular approach
to WWW site development has been a successful
strategy for us to use at the United Nations Statistics
Division and one that offers national agencies
with limited resources a strategy for disseminating
population census statistics via the Internet.
The modular approach provides statistical WWW
site developers with the opportunity to start
with a small number of statistical series and
add to them as resources become available. This allows
the statistical office to select the series
that it considers the most important. These may
be the specific series that are in great demand
from users, or perhaps series that have a
short "shelf-life" and are revised regularly.
The series that are ranked as top priority are
the first to be put on the WWW site, with the
remaining series being added at a later date.
Since self-contained modules are used, the new
statistical series, tables or data can be linked
to the established modules. |
| The various modules in the site
are supplemented with additional information as
new series are added. The addition of new series
therefore does not require a complete infrastructure
to be developed. |
| This approach allows for various
modules to be reused for other topic areas, thus
providing considerable resource savings. We have
found that the filtering systems for selecting
series and countries/areas allow this reusability.
For example, country/area lists or map interfaces
can be used for many topics. Once country/area
lists or map interfaces have been developed,
a few minor changes in links to other
pages or titles enable the pages to be used
for other series or topics. |
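The reuse of a country/area list with only its title and links changed can be sketched as a single page template filled in per topic. This is a hedged illustration rather than the Division's implementation; the `AREA_LIST_PAGE` template, the `area_list_page` function, the link scheme and the area names are all hypothetical.

```python
from html import escape
from string import Template
from typing import Sequence

# One template for every topic; only the title and link targets change.
AREA_LIST_PAGE = Template(
    "<html><head><title>$title</title></head><body>\n"
    "<h1>$title</h1>\n<ul>\n$items\n</ul>\n</body></html>"
)


def area_list_page(title: str, link_prefix: str, areas: Sequence[str]) -> str:
    """Render the shared area list for one topic."""
    lines = []
    for area in areas:
        slug = area.lower().replace(" ", "_")
        lines.append(
            f'<li><a href="{link_prefix}/{slug}.html">{escape(area)}</a></li>'
        )
    return AREA_LIST_PAGE.substitute(title=escape(title), items="\n".join(lines))


if __name__ == "__main__":
    areas = ["Region A", "Region B"]
    # The same module serves two different outputs.
    print(area_list_page("Population of cities", "cities", areas))
    print(area_list_page("Labour force", "labour", areas))
```

The area list itself is maintained in one place, which is where the resource saving comes from.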
| It is important to develop a
WWW model for each group of statistical data you
want to put on the WWW. The models may be essentially
the same, but with minor variations, or they may
have few similarities. This will all depend upon
factors such as whether the output pages
are static or dynamically produced, the complexity
of the data, the data audience, etc. |
| The Statistics Division disseminates population
census statistics on its web site in the United
Nations Monthly Bulletin of Statistics On-line
(MBS On-line) and the Population and Vital Statistics
Report. These web outputs are updated on a monthly
and quarterly basis respectively. We have just
developed a new web output based on population
census data from the United Nations Demographic Yearbook. |
|
| VI.
Population of capital cities and cities of over
100,000 |
| This WWW project grew from the
numerous requests the Statistics Division has
received for population statistics on cities.
This data is supplemented with coordinate information.
In this situation the WWW model shown in figure
2 was adopted. This model varies from other Division
models mainly in the area of selection of countries/areas.
In the other respects it re-uses modules from
other outputs. |
| Figure 2 |
 |
| This model uses a map/list approach
for users to select the countries/areas (see
<http://www.un.org/Depts/unsd/demog/index.html>).
The other modules of the model, such as technical
notes, sources and help, are standard files.
This approach is very well suited to population
census data because users are focused on specific series
and areas, so a visual map interface
is an efficient way to filter the data they need. |
| The main advantage to the Division
in developing this WWW output is that it provides
us with a new module to use for population census
WWW outputs. The Demographic Yearbook has a number
of tables that could use this same model. We can
then develop the WWW population outputs as required.
In addition the model uses previously developed
modules from our other statistical outputs such
as MBS On-line. We have found that this re-use
of modules is an approach that saves time and resources.
This modular approach allows for additional models
to be added as they are developed. The UNSD has
developed a WWW module of the United Nations
Statistics Division's Standard Country or Area
Codes for Statistical Use. This module will be
linked to a number of the Division's outputs and
be a useful resource for the international statistical
community. |
| A further module under development
is a data dictionary covering all row and column
labels in its World Statistics Pocketbook. An
extended version of the dictionary will cover
all series published on the Division's Web site
<http://www.un.org/Depts/unsd/>,
which includes the Monthly Bulletin of Statistics
On-line, Social Indicators and selected tables
from its 1995 publication The World's Women --Trends
and Statistics. The data dictionary quotes verbatim
the internationally recommended definition for
each label. Footnotes accompanying data in the
tables are used to indicate differences from the
international recommendations. The complete and
extended version will be released soon on the
Division's Web site and will become an additional
module which will supplement our current and future
outputs. |
|
| References
|
- Brazilian Statistical
and Geographic Foundation (IBGE) <http://www.ibge.org/english/Brasil/e-pop.htm>
- Instituto Nacional De Estadística
y Censos (INDEC) <http://www.indec.mecon.ar/Anuario/default.html>
- Statistics Canada, <http://www.statcan.ca/english/census96/list.htm>,
<http://www.statcan.ca/english/concepts/index.html>
- United Nations, 1995 Demographic
Yearbook, United Nations publication, Series
R, No. 26, Sales No. E/F.97.XIII.1
- United Nations, Population
and Vital Statistics Report, United Nations
publication, Series A, Vol. L, 1998
(E) quarterly. (print & Internet)
- United Nations, Statistics
Division (1997/8), United Nations publication,
Monthly Bulletin of Statistics On-line
(MBS On-line), <http://www.un.org/Depts/unsd>
- United Nations, Statistics
Division (1996), Standard Country or Area
Codes for Statistical Use, United Nations
publication, ST/ESA/STAT/SER.M/49/Rev.3
- United Nations, Statistics
Division, Recommendations for the Preparation
of Sample Survey Reports (Provisional Issue),
Series C, No.1, Rev.2
- United Nations, World
Statistics in Brief, United Nations publication,
Series V, No. 17, Sales No. E.97.XVII.5
- United Nations, The World's
Women --Trends and Statistics, United Nations
publication, Series K, No. 12, Sales No. E.95.XVII.2
- United States Census Bureau,
1990 Census Lookup, <http://venus.census.gov/cdrom/lookup>
- The Urban Information
Center, University of Missouri, Basic Tables:
1990 Demographic Profile Generator, <http://www.oseda.missouri.edu/uic/uicapps/xtabs3.html>
|
|