ESCAP Statistics Division
The Fourth Meeting of the Working Party on the Application of New Technology to Population Data
Manila, 6-9 July 1999

STAT/WPA(4)/Australia(A)
ENGLISH ONLY

ECONOMIC AND SOCIAL COMMISSION FOR ASIA AND THE PACIFIC

Working Party on the Application of New Technology to Population Data
Fourth meeting
6-9 July 1999
Manila

GIS for the integration of social and economic statistics: its benefits and limitations
Frank J Blanchfield
Australian Bureau of Statistics

ABSTRACT

Population and economic data are often collected or published on seemingly incomparable areal units.  Even in fortunate countries where a standard geographical classification is used for all national statistical data, changes to the boundaries of the areal units have traditionally been considered as a break in the time series.  Can GIS successfully solve the problem of comparing data collected on different areal units or on a classification which has changed over time?  Some simple examples will show the benefits and the limitations of the GIS solution.

INTRODUCTION

GIS is a tool widely used in many different disciplines: cartography, environment, geology, meteorology and statistics, to name but a few.  It must be remembered, however, that even though they share common tools, some of these disciplines have quite different views of the world.  A GIS is a powerful and intuitive tool to model the world around us, but it has its origin in topographic mapping and there are some fundamental differences between the topographic and statistical models.  The two diagrams below illustrate this.

Figure 1: Grape Vines in the SLA of Swan(S), The Statistical Model
Figure 2: Grape Vines in the SLA of Swan(S), The Mapping Model

In Figure 2 the small black shapes represent grape vines.  These shapes are mutually exclusive of any other land use and the only descriptive information required is "grape vine" (which would normally appear as a label on the map or in the map legend).  The accuracy of the information is entirely dependent on the accuracy with which the areas of vine are mapped, and the distribution of the vineyards is obvious.  This is a cartographer's view of the world, but statisticians rarely use this model.

Statistical organisations, such as the Australian Bureau of Statistics (ABS), gather information about various populations of people or businesses, and attribute that information to an area or region.  In Figure 1, for example, the area is the Statistical Local Area (SLA) of Swan (S), to which has been attached the attribute of area of grape-bearing vines.  In an ABS dataset, such as the Agricultural Census of 1997, there would of course be hundreds of other attributes attached to the SLA.  The ABS uses a common geographical classification for most collections, both social and economic, so, in fact, within the ABS databases there would be many thousands of other attributes, or statistics, also attributed to this same area.

The statistical model of the world consists of areal units which contain a population about which some information has been gathered.  The difficulty arises when data are collected on different, incomparable areal units.  GIS is often touted as a means to compare data collected on different areal units.  This paper seeks to show, by some simple examples, the limitations of the GIS approach and to offer an imperfect but workable alternative.

GIS BOUNDARY OVERLAY

Where data are available for overlapping but different areal units, a GIS can very graphically show the relationship between areas.  Sometimes this is all that is needed to give a reasonable picture of the relationship between two otherwise unrelated variables.  The example below shows the population density for part of Australia in the form of a dot density map overlaid on recent rainfall data.  The correlation between rainfall and settlement patterns is clearly visible and there is really no need to quantify this further by transforming one or other of the two data sets to a common set of boundaries.  In other words, the GIS picture speaks for itself.

Figure 3

Where it is necessary to transform data to a common set of boundaries, various techniques can be used, including area overlays in a GIS.  Although some GIS can apply a level of sophistication (see "Handbook on GIS and digital mapping for population and housing census", UN Statistical Division January 1999, Sec 14.1.2), this technique basically relies on the population being uniformly distributed within the target areal unit.  Consider the example below.  The boundary of a river catchment cuts through a number of local government areas.  In the case of Holbrook (A) the catchment boundary cuts the area exactly in half.  If the population of Holbrook (A) is apportioned according to the area ratios, then half the population would be attributed to the northern catchment area and the other half to the southern catchment area.

Figure 4

In this case, however, population data is available at a higher level of geographical disaggregation, i.e. a more detailed picture of the population distribution is available.  The underlying enumeration districts reveal that the small town of Holbrook, which is the administrative centre for the local government area of Holbrook (A) comprising three enumeration districts, lies just to the south of the river catchment boundary.  The population figures for the enumeration districts show that in fact there are five times as many people in the southern half of the local government area as in the northern half.  This example shows that area proportion, the basis of most GIS overlay techniques, should not be used unless the population in question is distributed very smoothly.  The ideal solution is of course to have point referenced data which shows precisely the spatial distribution of the population, i.e. the position of individual dwellings, farms or businesses, but most countries are far from achieving this.  In the meantime, area overlay as a technique for comparing data on dissimilar boundaries has very limited application.
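
The contrast between the two apportionment methods can be sketched in a few lines of Python.  The paper gives only the ratios for Holbrook (A), not the counts, so the population figures below are hypothetical and were chosen simply to match the stated ratios (area split in half, five times as many people in the south).

```python
# Minimal sketch: area-based apportionment versus enumeration-district counts.
# All population figures are HYPOTHETICAL; only the ratios come from the text.

# The catchment boundary cuts Holbrook (A) exactly in half by area (from the text).
area_share = {"north": 0.5, "south": 0.5}

# Hypothetical enumeration-district populations on each side of the boundary,
# chosen so that the south holds five times as many people as the north.
ed_population = {"north": [400], "south": [900, 700, 400]}

total_population = sum(sum(counts) for counts in ed_population.values())

# Method 1: assume people are spread evenly, so population follows area.
by_area = {side: total_population * share for side, share in area_share.items()}

# Method 2: add up the enumeration districts that fall on each side.
by_ed = {side: sum(counts) for side, counts in ed_population.items()}

# People placed in the wrong catchment by the even-distribution assumption.
misallocated = abs(by_area["north"] - by_ed["north"])

print(by_area, by_ed, misallocated)
```

With these illustrative figures the area method places a third of the population in the wrong catchment, which is the kind of error the even-distribution assumption can hide.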

Figure 5
BOUNDARIES WHICH CHANGE OVER TIME

A special case of incomparable boundaries is a geographical classification where the areal units change over time.  The ABS uses a single classification, the Australian Standard Geographical Classification (ASGC), for most data collections.  Thus it is generally possible to compare social and economic data from different censuses and surveys on a common set of boundaries.  The areal units of the ASGC have a direct relationship to local government areas.  By including local government in the geographical classification, the ABS is able to provide local administrations with a range of data vitally important to good governance at the local level.  Unfortunately, local government boundaries have changed considerably over time.  Over the years, large areas have been split and rearranged as population growth has occurred.  Conversely, in these times of economic rationalism, federal and state governments have encouraged local governments to amalgamate and merge to form larger administrative regions capable of delivering a wider range of services to their ratepayers.  This has meant that, while it is possible to integrate statistics from various collections, it is often not possible to compare data, even from the same collection, across time.

Consider the following example.  In the early 1990s, to allow for the expansion of the city of Townsville, the three administrative areas shown below became four areas.

Figure 6: Before the change
Figure 7: After the change

In statistical terms the new areal units are different to the old and this constitutes a break in the statistical time series.  A time series of agricultural data showing the number of cattle and calves in these areas would look as follows.

Area                                     1991     1992     1993     1994     1995     1996
Burdekin (S) [ASGC Ed2.3]              85,602   79,379   79,430   74,420
Dalrymple (S) [ASGC Ed2.3]            482,086  521,660  499,530  413,669
Thuringowa (C) - Pt B [ASGC Ed2.3]     26,977   28,415   26,401   32,664
Total                                 594,665  629,454  605,361  520,753

Area                                     1991     1992     1993     1994     1995     1996
Burdekin (S) [ASGC Ed2.4]                                                  72,791   73,611
Dalrymple (S) [ASGC Ed2.4]                                                395,052  424,125
Thuringowa (C) - Pt B [ASGC Ed2.4]                                         30,575   29,134
Townsville (C) - Pt B [ASGC Ed2.4]                                             75    1,439
Total                                                                     498,493  528,309

Table 1

It is not possible to examine the growth or decline in cattle production in these areas because it is not known how much is real growth and how much is a result of boundary changes.  A simple GIS overlay, however, reveals the relationship between old and new boundaries.

Table 2 below shows the ratio of the area of change to the area of the old unit.
Area  From                    To                      Area (square kilometres)  Proportion (of old area)
a     Dalrymple (S)           Thuringowa (C) - Pt B                       16.9                         0
b     Thuringowa (C) - Pt B   Thuringowa (C) - Pt B                     1676.6                      0.42
c     Thuringowa (C) - Pt B   Townsville (C) - Pt B                     1556.7                      0.39
d     Thuringowa (C) - Pt B   Burdekin (S)                                 280                      0.07
e     Burdekin (S)            Burdekin (S)                              4840.4                      0.97
f     Burdekin (S)            Dalrymple (S)                              161.9                      0.03
g     Thuringowa (C) - Pt B   Dalrymple (S)                              490.8                      0.12

This ratio of areas can be applied to the pre-1995 data on cattle and calves as follows:

Thuringowa (C) - Pt B [ASGC Ed2.4], 1994  =  32,664 + (0 x 413,669) - (0.39 x 32,664) - (0.07 x 32,664) - (0.12 x 32,664)

Thus a time series based on the new boundaries  results in the following table.

Area                                     1991     1992     1993     1994     1995     1996
Burdekin (S) [ASGC Ed2.4]              84,922   78,987   78,895   74,474   72,791   73,611
Dalrymple (S) [ASGC Ed2.4]            487,891  527,451  505,081  419,821  395,052  424,125
Thuringowa (C) - Pt B [ASGC Ed2.4]     11,330   11,934   11,088   13,719   30,575   29,134
Townsville (C) - Pt B [ASGC Ed2.4]     10,521   11,082   10,296   12,739       75    1,439
Total                                 594,665  629,454  605,361  520,753  498,493  528,309

Table 3
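
The 1994 column of Table 3 can be reproduced from Tables 1 and 2 with a short Python sketch.  The function and variable names are illustrative only.  Table 2 does not list the part of Dalrymple (S) that stays in Dalrymple (S); a share of 1.00 is assumed here, since the piece transferred out rounds to 0.

```python
# Minimal sketch of area-proportion reapportionment, using the rounded
# proportions in Table 2 and the old-boundary counts in Table 1.

# Cattle and calves on the old (ASGC Ed2.3) boundaries, from Table 1.
OLD = {
    "Burdekin (S)": {1991: 85602, 1992: 79379, 1993: 79430, 1994: 74420},
    "Dalrymple (S)": {1991: 482086, 1992: 521660, 1993: 499530, 1994: 413669},
    "Thuringowa (C) - Pt B": {1991: 26977, 1992: 28415, 1993: 26401, 1994: 32664},
}

# Share of each old area's land that falls in each new (Ed2.4) area.
SHARES = {
    "Burdekin (S)": {"Burdekin (S)": 0.97, "Dalrymple (S)": 0.03},
    # ASSUMPTION: the Dalrymple -> Dalrymple share is not in Table 2.
    "Dalrymple (S)": {"Dalrymple (S)": 1.00, "Thuringowa (C) - Pt B": 0.00},
    "Thuringowa (C) - Pt B": {
        "Thuringowa (C) - Pt B": 0.42,
        "Townsville (C) - Pt B": 0.39,
        "Burdekin (S)": 0.07,
        "Dalrymple (S)": 0.12,
    },
}


def reapportion(year):
    """Spread each old area's count over the new areas in proportion to land area."""
    new = {}
    for old_area, targets in SHARES.items():
        for new_area, share in targets.items():
            new[new_area] = new.get(new_area, 0.0) + share * OLD[old_area][year]
    return new


est_1994 = {area: round(value) for area, value in reapportion(1994).items()}
print(est_1994)
```

Because the shares for each old area sum to 1, the bounding-region total is preserved; only its split between the new areas depends on the even-distribution assumption.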

Using the area proportion technique, it is estimated that the Townsville (C) - Pt B area would have contained over 12,000 cattle (12,739 in Table 3) had it existed in 1994.  This area is known to have contained only 75 cattle in 1995 and 1,439 in 1996, so the estimate is most unlikely to be correct.  The GIS technique does, of course, calculate the area of land moved from one unit to another very precisely.  The error in this methodology lies in the assumption that the population is evenly distributed over the original area.  The GIS, however, shows not only how much land is transferred but also which land.  If additional information is available about that land, it is possible to improve the estimate.  For example, an overlay which shows urban areas, national parks or forests could be used to weight the estimate by eliminating some land as unlikely to contain cattle.
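
One way such a weighting might work is sketched below in Python.  The piece areas come from Table 2, but the grazing shares are entirely hypothetical: the paper does not report any land-use figures, and the values are chosen only to illustrate the mechanics.

```python
# Minimal sketch: weight area proportions by the share of land that could
# plausibly carry cattle.  Piece areas (sq km) are from Table 2; the grazing
# shares are HYPOTHETICAL and stand in for a real land-use overlay.

# Pieces of the old Thuringowa (C) - Pt B, keyed by the new area they join.
PIECE_AREA = {
    "Thuringowa (C) - Pt B": 1676.6,  # piece b
    "Townsville (C) - Pt B": 1556.7,  # piece c
    "Burdekin (S)": 280.0,            # piece d
    "Dalrymple (S)": 490.8,           # piece g
}

# HYPOTHETICAL fraction of each piece left after excluding urban land,
# national park and forest.
GRAZING_SHARE = {
    "Thuringowa (C) - Pt B": 0.90,
    "Townsville (C) - Pt B": 0.05,
    "Burdekin (S)": 0.90,
    "Dalrymple (S)": 0.90,
}


def weighted_shares(areas, usable):
    """Proportion of usable land in each piece, relative to all usable land."""
    effective = {name: areas[name] * usable[name] for name in areas}
    total = sum(effective.values())
    return {name: value / total for name, value in effective.items()}


shares = weighted_shares(PIECE_AREA, GRAZING_SHARE)

# 1994 count for the old Thuringowa (C) - Pt B, from Table 1.
cattle_1994 = 32664
estimate = {name: share * cattle_1994 for name, share in shares.items()}
print(estimate)
```

The quality of the result depends entirely on the overlay supplying the grazing shares; with no such information the method falls back to plain area proportion.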

AN ALTERNATIVE ESTIMATION

The problem of boundaries which change over time is very common in any statistical analysis.  Statistical agencies are continually confronted by the conflict between the desire for currency with real world areal units and the desire for stability of existing statistical boundaries.  It can be seen from the above that GIS offers only a limited solution.  In this case, however, there is an alternative approach.  Reviewing the example above, three areas became four but the outside perimeter of the total area has not changed.  This is quite often the case, except where a mass redesign of administrative units has occurred (such as in the Australian State of Victoria in 1993/94).  For the purpose of this discussion this outside perimeter which has not changed will be termed the bounding region.  Where data is available before and after the change, a simple estimate can be made as follows.

Figure 8

The same GIS-generated area proportions, discussed in the Holbrook example above, can be applied to these boundaries.  Because of the necessity to assume an evenly distributed population, the same limitations also apply.  Cattle may be more evenly distributed than people, but without some underlying information about topography and land use it is dangerous to make this assumption.  For example, Table 1 above shows that in 1995 the number of cattle in the newly created area of Townsville (C) - Pt B is very low compared to the adjacent areas.

The GIS approach to the problem of showing this year's data on last year's boundaries, or vice versa, is to divide the old and new areas into constituent parts which are common to both.  The diagram below illustrates this process.  The areas of change lie in the north east corner of the total area.  The GIS can determine, quite precisely, by spatial overlay the area which has moved from one unit in one year to another unit in the next.

Figure 9: Areas of change
V_y  =  Total_y x V_{y+1} / Total_{y+1}

where
y is the year preceding the change,
y+1 is the year immediately after the change,
V_y is the value of the data item for year y, for an area as it exists after the change,
Total_y is the sum of the data item for year y for all the areas which make up the bounding region,
V_{y+1} is the value of the data item for year y+1, for an area as it exists after the change, and
Total_{y+1} is the sum of the data item for year y+1 for all the areas which make up the bounding region.

A feature of this estimation procedure is that, in apportioning this year's data to last year's boundaries or vice versa, the actual data item being studied is used in the calculation.  The distribution of the population is not assumed to be even but is assumed to be the same before and after the change in boundaries.  In other words, if the new area of Townsville (C) - Pt B has only 0.015% of the cattle and calves in 1995, it is assumed that this area, had it existed in 1994, would have contained 0.015% of the total cattle and calves in the bounding region.
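
The estimation procedure can be sketched in Python using the figures from Table 1.  The function name and data layout are illustrative assumptions; the calculation multiplies the bounding-region total for year y by each new area's share of the total in year y+1.

```python
# Minimal sketch of the bounding-region estimate:
#   estimate for year y = (bounding total for y) x (value in y+1) / (bounding total in y+1)
# Data are from Table 1; the helper name `backcast` is an illustrative choice.

# Cattle and calves on the new (ASGC Ed2.4) boundaries in 1995, the first
# year after the change.
NEW_1995 = {
    "Burdekin (S)": 72791,
    "Dalrymple (S)": 395052,
    "Thuringowa (C) - Pt B": 30575,
    "Townsville (C) - Pt B": 75,
}

# Bounding-region totals on the old boundaries.
BOUNDING_TOTAL = {1991: 594665, 1992: 629454, 1993: 605361, 1994: 520753}


def backcast(total_y, values_after):
    """Split an earlier bounding-region total using each area's later share."""
    total_after = sum(values_after.values())
    return {area: total_y * value / total_after for area, value in values_after.items()}


est_1994 = backcast(BOUNDING_TOTAL[1994], NEW_1995)

# Share of the bounding region held by the new Townsville area in 1995.
townsville_share = NEW_1995["Townsville (C) - Pt B"] / sum(NEW_1995.values())

for area, value in est_1994.items():
    print(f"{area}: {value:,.0f}")
```

On these figures the 1994 estimate for Townsville (C) - Pt B is roughly 78 head, compared with 12,739 from the area proportion technique, while the bounding-region total of 520,753 is unchanged.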

For the area in the above example this method of estimating gives the following results for the old data on the new boundaries.  It is of course equally simple to estimate the 1995 and 1996 years on the pre-1995 boundaries.


 