Population and economic data are often
collected or published on seemingly incomparable
areal units. Even in fortunate countries
where a standard geographical classification
is used for all national statistical data, changes
to the boundaries of the areal units have traditionally
been considered a break in the time series.
Can GIS successfully solve the problem of comparing
data collected on different areal units, or
on a classification which has changed over time?
Some simple examples will show the benefits
and the limitations of the GIS solution.
INTRODUCTION
GIS is a tool widely used in many different
disciplines: cartography,
environment, geology, meteorology and statistics,
to name but a few. It must be remembered,
however, that even though they share common
tools, some of these disciplines have quite
different views of the world. A GIS is
a powerful and intuitive tool to model the world
around us, but it has its origin in topographic
mapping and there are some fundamental differences
between the topographic and statistical models.
The two diagrams below illustrate these differences.
Figure 1: Grape Vines in the SLA of Swan (S) - The Statistical Model
Figure 2: Grape Vines in the SLA of Swan (S) - The Mapping Model
In figure 2 the small black shapes represent
grape vines. These shapes are mutually
exclusive of any other land use and the only
descriptive information required is "grape vine"
(which would normally appear as a label on the
map or in the map legend). The accuracy
of the information is entirely dependent on
the accuracy with which the areas of vine are
mapped, and the distribution of the vineyards
is obvious. This is a cartographer's view
of the world, but statisticians rarely use this
model.
Statistical organisations, such as the Australian
Bureau of Statistics (ABS), gather information
about various populations of people or businesses,
and attribute that information to an area or
region. In Figure 1, for example,
the area is the Statistical Local Area (SLA)
of Swan (S), to which has been attached the attribute
of area of grape-bearing vines. In an
ABS dataset, such as the Agricultural Census
of 1997, there would of course be hundreds of
other attributes attached to the SLA.
The ABS uses a common geographical classification
for most collections, both social and economic,
so, in fact, within the ABS databases there
would be many thousands of other attributes,
or statistics, also attributed to this same
area.
The statistical model of the world consists
of areal units which contain a population about
which some information has been gathered.
The difficulty arises when data are collected
on different, incomparable areal units.
GIS is often touted as a means to compare
data collected on different areal units, and
this paper seeks to show, by some simple examples,
the limitations of the GIS approach and to offer
an imperfect but workable alternative.
GIS BOUNDARY OVERLAY
Where data are available for overlapping
but different areal units, a GIS can very graphically
show the relationship between areas. Sometimes
this is all that is needed to give a reasonable
picture of the relationship between two otherwise
unrelated variables. The example below
shows the population density for part of Australia
in the form of a dot density map overlaid on
recent rainfall data. The correlation
between rainfall and settlement patterns is
clearly visible and there is really no need
to quantify this further by transforming one
or other of the two data sets to a common set
of boundaries. In other words, the GIS
picture speaks for itself.
Figure 3
Where it is necessary to transform data to
a common set of boundaries, various techniques
can be used, including area overlays in a GIS.
Although some GIS can apply a level of sophistication
(see "Handbook on GIS and digital mapping for
population and housing census", UN Statistical
Division, January 1999, Sec 14.1.2), this technique
basically relies on the population being uniformly
distributed within the areal unit being divided.
Consider the example below. The boundary of
a river catchment cuts through a number of local
government areas. In the case of Holbrook
(A) the catchment boundary cuts the area exactly
in half. If the population of Holbrook
(A) is apportioned according to the area ratios,
then half the population would be attributed
to the northern catchment area and the other
half of the population to the southern catchment
area.
Figure 4
In this case, however, population data are
available at a higher level of geographical
disaggregation, i.e. a more detailed picture of
the population distribution is available.
The underlying enumeration districts reveal
that the small town of Holbrook, the
administrative centre for the local government
area of Holbrook (A), comprising three enumeration
districts, lies just to the south of the river
catchment boundary. The population figures
for the enumeration districts show that in fact
there are five times as many people in the southern
half of the local government area as in the
northern half. This example shows that
area proportion, the basis of most GIS overlay
techniques, should not be used unless the population
in question is distributed very smoothly.
The ideal solution is of course to have point
referenced data which show precisely the spatial
distribution of the population, i.e. the position
of individual dwellings, farms or businesses,
but most countries are far from achieving this.
In the meantime, area overlay as a technique
for comparing data on dissimilar boundaries
has very limited application.
Figure 5
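The contrast drawn in the Holbrook example can be sketched in a few lines of Python. The population figures below are hypothetical, chosen only to mirror the five-to-one ratio described in the text; the zone names and function names are likewise illustrative assumptions, not part of any ABS dataset.

```python
def area_weighted_split(total, area_shares):
    """Apportion a total across zones in proportion to each zone's area share."""
    return {zone: total * share for zone, share in area_shares.items()}


def aggregate_districts(districts):
    """Sum enumeration-district populations by the zone each district falls in."""
    totals = {}
    for zone, population in districts:
        totals[zone] = totals.get(zone, 0) + population
    return totals


# The catchment boundary cuts the area exactly in half (from the text).
area_shares = {"north": 0.5, "south": 0.5}

# HYPOTHETICAL enumeration-district populations with a 5:1 south-to-north ratio.
districts = [("north", 100), ("south", 150), ("south", 200), ("south", 150)]
total_population = sum(population for _, population in districts)

by_area = area_weighted_split(total_population, area_shares)
by_district = aggregate_districts(districts)

print("Area proportion:     ", by_area)
print("Enumeration districts:", by_district)
```

Under these assumed figures the area-proportion estimate places half the population on each side of the boundary, while the finer-grained data place most of it in the south.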
BOUNDARIES WHICH CHANGE OVER TIME
A special case of incomparable boundaries
is a geographical classification where the areal
units change over time. The ABS uses a
single classification, the Australian Standard
Geographical Classification (ASGC), for most
data collections. Thus it is generally
possible to compare social and economic data
from different censuses and surveys on a common
set of boundaries. The areal units of
the ASGC have a direct relationship to local
government areas. By including local government
in the geographical classification, the ABS
is able to provide local administrations with
a range of data vitally important to good governance
at the local level. Unfortunately local government
boundaries have changed considerably over time.
Over the years, large areas have been split
and rearranged as population growth has occurred.
Conversely, in these times of economic rationalism,
federal and state governments have encouraged
local governments to amalgamate and merge to
form larger administrative regions capable of
delivering a wider range of services to their
ratepayers. This has meant that, while
it is possible to integrate statistics from
various collections, it is often not possible
to compare data, even from the same collection,
across time.
Consider the following example. In the
early 1990s, to allow for the expansion of
the city of Townsville, the three administrative
areas shown below became four areas.
Figure 6: Before the change
Figure 7: After the change
In statistical terms the new areal units
are different to the old and this constitutes
a break in the statistical time series.
A time series of agricultural data showing the
number of cattle and calves in these areas would
look as follows.
| Area | 1991 | 1992 | 1993 | 1994 |
|---|---|---|---|---|
| Burdekin (S) [ASGC Ed2.3] | 85,602 | 79,379 | 79,430 | 74,420 |
| Dalrymple (S) [ASGC Ed2.3] | 482,086 | 521,660 | 499,530 | 413,669 |
| Thuringowa (C) - Pt B [ASGC Ed2.3] | 26,977 | 28,415 | 26,401 | 32,664 |
| Total | 594,665 | 629,454 | 605,361 | 520,753 |

| Area | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 |
|---|---|---|---|---|---|---|
| Burdekin (S) [ASGC Ed2.4] | | | | | 72,791 | 73,611 |
| Dalrymple (S) [ASGC Ed2.4] | | | | | 395,052 | 424,125 |
| Thuringowa (C) - Pt B [ASGC Ed2.4] | | | | | 30,575 | 29,134 |
| Townsville (C) - Pt B [ASGC Ed2.4] | | | | | 75 | 1,439 |
| Total | | | | | 498,493 | 528,309 |

Table 1
It is not possible to examine the growth
or decline in cattle production in these areas
because it is not known how much is real growth
and how much is a result of boundary changes.
A simple GIS overlay, however, reveals the relationship
between old and new boundaries.
Table 2 below shows the
ratio of the area of change to the area of the
old unit.
| Area | From | To | Area (square kilometres) | Proportion (of old area) |
|---|---|---|---|---|
| a | Dalrymple (S) | Thuringowa (C) - Pt B | 16.9 | 0 |
| b | Thuringowa (C) - Pt B | Thuringowa (C) - Pt B | 1676.6 | 0.42 |
| c | Thuringowa (C) - Pt B | Townsville (C) - Pt B | 1556.7 | 0.39 |
| d | Thuringowa (C) - Pt B | Burdekin (S) | 280 | 0.07 |
| e | Burdekin (S) | Burdekin (S) | 4840.4 | 0.97 |
| f | Burdekin (S) | Dalrymple (S) | 161.9 | 0.03 |
| g | Thuringowa (C) - Pt B | Dalrymple (S) | 490.8 | 0.12 |

Table 2
This ratio of areas can be applied to the
pre-1995 data on cattle and calves as follows:
Thuringowa (C) - Pt B [ASGC Ed2.4], 1994
= 32,664 + (0 x 413,669) - (0.39 x 32,664) - (0.07 x 32,664) - (0.12 x 32,664)
= 13,719 (rounded)
Thus a time series based on the new boundaries
results in the following table.
| Area | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 |
|---|---|---|---|---|---|---|
| Burdekin (S) [ASGC Ed2.4] | 84,922 | 78,987 | 78,895 | 74,474 | 72,791 | 73,611 |
| Dalrymple (S) [ASGC Ed2.4] | 487,891 | 527,451 | 505,081 | 419,821 | 395,052 | 424,125 |
| Thuringowa (C) - Pt B [ASGC Ed2.4] | 11,330 | 11,934 | 11,088 | 13,719 | 30,575 | 29,134 |
| Townsville (C) - Pt B [ASGC Ed2.4] | 10,521 | 11,082 | 10,296 | 12,739 | 75 | 1,439 |
| Total | 594,665 | 629,454 | 605,361 | 520,753 | 498,493 | 528,309 |

Table 3
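The area-proportion calculation behind the 1994 column can be sketched as follows. This is a minimal illustration, not the ABS procedure: the function name is hypothetical, the 1994 values and proportions are taken from Tables 1 and 2, and one row is an assumption, namely that Dalrymple (S) retains a proportion of 1.0 of its old area, since the share it loses (row a) rounds to 0 and no retained row is listed.

```python
# 1994 cattle and calves on the old (ASGC Ed2.3) boundaries, from Table 1.
old_1994 = {
    "Burdekin (S)": 74_420,
    "Dalrymple (S)": 413_669,
    "Thuringowa (C) - Pt B": 32_664,
}

# (old unit, new unit, proportion of the old unit's area), from Table 2.
transfers = [
    ("Dalrymple (S)", "Thuringowa (C) - Pt B", 0.00),          # a
    ("Thuringowa (C) - Pt B", "Thuringowa (C) - Pt B", 0.42),  # b
    ("Thuringowa (C) - Pt B", "Townsville (C) - Pt B", 0.39),  # c
    ("Thuringowa (C) - Pt B", "Burdekin (S)", 0.07),           # d
    ("Burdekin (S)", "Burdekin (S)", 0.97),                    # e
    ("Burdekin (S)", "Dalrymple (S)", 0.03),                   # f
    ("Thuringowa (C) - Pt B", "Dalrymple (S)", 0.12),          # g
    # ASSUMPTION: not a row of Table 2; Dalrymple (S) keeps all of its area.
    ("Dalrymple (S)", "Dalrymple (S)", 1.00),
]


def reallocate(old_values, transfers):
    """Move each old unit's value to new units in proportion to area."""
    new_values = {}
    for old_unit, new_unit, proportion in transfers:
        share = proportion * old_values[old_unit]
        new_values[new_unit] = new_values.get(new_unit, 0.0) + share
    return new_values


estimate_1994 = {
    unit: round(value) for unit, value in reallocate(old_1994, transfers).items()
}
for unit, value in sorted(estimate_1994.items()):
    print(f"{unit}: {value:,}")
```

Summing what each new unit receives is equivalent to the subtraction form used in the text, because the proportions leaving each old unit add up to one.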
Using the area proportion technique it is
estimated that the Townsville (C) - Pt B area
would have contained more than 12,000 cattle had
it existed in 1994. This area is known
to have contained only 75 cattle in 1995 and
1,439 in 1996, so the estimate is most unlikely
to be correct. The GIS technique does
of course calculate the area of land moved from
one unit to another very precisely. The
error in this methodology is in the assumption
that the population is evenly distributed over
the original area. The GIS, however, shows
not only how much land is transferred
but also which land. If additional
information is available about that land, it
is possible to improve the estimate. For
example, an overlay which shows urban area or
national park or forest could be used to weight
the estimate by eliminating some land as unlikely
to contain cattle.
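One way such a weighting might work is sketched below: land judged unlikely to carry cattle is subtracted from each piece before the proportions are calculated. The piece areas come from Table 2, but the excluded area is entirely hypothetical (no land-use figures are given here), and the function name is illustrative.

```python
# Areas (square kilometres) of the pieces of the old Thuringowa (C) - Pt B,
# keyed by the row letters of Table 2.
piece_area = {"b": 1676.6, "c": 1556.7, "d": 280.0, "g": 490.8}

# HYPOTHETICAL: land in each piece masked out as urban, national park or
# forest. These figures are invented purely for illustration.
excluded_area = {"b": 0.0, "c": 1500.0, "d": 0.0, "g": 0.0}


def usable_area_shares(piece_area, excluded_area):
    """Return each piece's share of the land that remains after masking."""
    usable = {
        piece: area - excluded_area.get(piece, 0.0)
        for piece, area in piece_area.items()
    }
    usable_total = sum(usable.values())
    return {piece: area / usable_total for piece, area in usable.items()}


shares = usable_area_shares(piece_area, excluded_area)

# Apply the masked shares to the 1994 value for the old unit (Table 1).
cattle_1994 = 32_664
weighted = {piece: share * cattle_1994 for piece, share in shares.items()}

for piece in sorted(weighted):
    print(f"piece {piece}: share {shares[piece]:.3f}, estimate {weighted[piece]:,.0f}")
```

With most of piece c masked out, its share falls well below the unweighted 0.39, which moves the Townsville (C) - Pt B estimate towards the low counts actually observed after the change.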
AN ALTERNATIVE ESTIMATION
The problem of boundaries which change over
time is very common in any statistical analysis.
Statistical agencies are continually confronted
by the conflicting desires for currency with
real world areal units and stability of
existing statistical boundaries. It can
be seen from the above that GIS offers only
a limited solution. In this case, however,
there is an alternative approach. Reviewing
the example above, three areas became four but
the outside perimeter of the total area has
not changed. This is quite often the case,
except where a mass redesign of administrative
units has occurred (such as the Australian State
of Victoria in 1993/94). For the purpose
of this discussion this outside perimeter which
has not changed will be termed the bounding
region. Where data is available before
and after the change a simple estimate can be
made as follows.
Figure 8
The same GIS-generated area proportions
discussed in the Holbrook example above can
be applied to these boundaries.
Because of the necessity to assume an evenly distributed
population, the same limitations also apply.
Cattle may be more evenly distributed than people,
but without some underlying information about
topography and land use it is dangerous to make
this assumption. For example, Table 1
above shows that in 1995 the number of cattle
in the newly created area of Townsville (C) -
Pt B is very low compared to the adjacent areas.
The GIS approach to the problem of showing
this year's data on last year's boundaries, or
vice versa, is to divide the old and new areas
into constituent parts which are common to both.
The diagram below illustrates this process.
The areas of change lie in the north east corner
of the total area. The GIS can determine,
quite precisely, by spatial overlay the area
which has moved from one unit in one year to
another unit in the next.
Figure 9: Areas of change
$V_y = \mathrm{Total}_y \times V_{y+1} / \mathrm{Total}_{y+1}$
Where
$y$ is the year preceding the change
$y+1$ is the year immediately after the change
$V_y$ is the value of the data item for year $y$ but for an area as it exists after the change
$\mathrm{Total}_y$ is the sum of the data item for year $y$ for all the areas which make up the bounding region
$V_{y+1}$ is the value of the data item for year $y+1$ for an area as it exists after the change
$\mathrm{Total}_{y+1}$ is the sum of the data item for year $y+1$ for all the areas which make up the bounding region
A feature of this estimation procedure is that,
in apportioning this year's data to last year's
boundaries or vice versa, the actual data item
being studied is used in the calculation.
The distribution of the population is not assumed
to be even but is assumed to be the same before
and after the change in boundaries. In
other words, if the new area of Townsville (C)
- Pt B has only 0.015% of the cattle and calves
in 1995, it is assumed that this area, had it
existed in 1994, would have contained 0.015%
of the total cattle and calves in the bounding
region.
For the area in the above example this method
of estimating gives the following results for
the old data on the new boundaries.
It is of course equally simple to estimate the
1995 and 1996 years on the pre-1995 boundaries.
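The bounding-region estimate described above can be sketched directly from its formula. The function name is hypothetical; the inputs are the 1994 bounding-region total and the 1995 values on the new boundaries, both taken from Table 1.

```python
def bounding_region_estimate(total_y, values_y1):
    """Estimate V_y = Total_y * V_(y+1) / Total_(y+1) for each new area.

    total_y   -- sum of the data item over the bounding region in year y
    values_y1 -- year y+1 values for each area as it exists after the change
    """
    total_y1 = sum(values_y1.values())
    return {area: total_y * value / total_y1 for area, value in values_y1.items()}


# 1995 cattle and calves on the new (ASGC Ed2.4) boundaries, from Table 1.
values_1995 = {
    "Burdekin (S)": 72_791,
    "Dalrymple (S)": 395_052,
    "Thuringowa (C) - Pt B": 30_575,
    "Townsville (C) - Pt B": 75,
}

# 1994 total for the bounding region on the old boundaries, from Table 1.
total_1994 = 520_753

estimate_1994 = bounding_region_estimate(total_1994, values_1995)
for area, value in estimate_1994.items():
    share = values_1995[area] / sum(values_1995.values())
    print(f"{area}: {value:,.0f} ({share:.3%} of the bounding region)")
```

Because every area is scaled by the same factor, the estimates always sum to the known total for the bounding region, and an area with very few cattle after the change, such as Townsville (C) - Pt B, is assigned correspondingly few before it.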