Workshop on Population Data Analysis, Storage
and Dissemination Technologies
Bangkok, 27-30 March 2001
ECONOMIC AND SOCIAL COMMISSION FOR ASIA AND THE PACIFIC
21 March 2001
of Collection, Analysis, Storage and Dissemination
of 1998 Census Data of Pakistan1/
By: Muhammad Aslam Chaudhry
& Rana Insaf Alikhan
Information about data processing of 1998
This paper has been reproduced as submitted.
It has been issued without formal editing.
1. The census history of Pakistan dates back to 1881, when
the first scientific census was undertaken by the British
regime in the area now comprising Pakistan. From that time
a regular decennial census was conducted. Since the inception
of Pakistan, the first census was conducted in 1951, followed
by censuses in 1961, 1972 (due in 1971), 1981 and 1998
(due in 1991).
2. The planning of the 5th Census was initiated in 1987 with
the consultation of users and the delimitation of the country
into census areas for enumeration purposes. At the initial
stage many users, including federal and provincial
governments, universities, research organizations and some
NGOs, were contacted through correspondence. Individual users
were also given an opportunity for face-to-face discussion,
in which they were apprised of the scope of the population
and housing census and of the inability of the Population
Census Organization (PCO) to meet some demands beyond that
scope; they were guided to contact the concerned agencies to
meet such demands. More than 100 users were also invited to
the Census Advisory Committee (CAC) after the questionnaires
had been designed and tested in the field, so that the
questionnaires and the field observations could be discussed
in the meeting. In the light of the recommendations of the
CAC, the questionnaires (forms) were finalized and submitted
to the Cabinet for approval.
3. There were 4 questionnaires: house listing operation, big
count, sample count, and collection of information on housing
and population from FATA. The house listing questionnaire
contained 10 questions, the housing census questionnaire 12,
the big count questionnaire also 12 and the sample count
questionnaire 38. The objectives of the house listing
operation were to prepare an inventory of housing
units/households so that the subsequent census operation
could be carried out without omission or duplication of any
building/housing unit, to provide guidelines to the field
staff for identifying housing units in their assigned areas,
to supervise the enumeration operation and to ensure adequate
logistic supplies.
4. The main or short count questionnaire contained core
questions, viz. name, relation to the head of household,
residential status, sex, age, marital status, nationality,
religion, mother tongue, literacy level, educational
attainment and issue of National Identity Cards. It also
contained questions on individual households, such as number
of rooms, type of tenure, period since construction,
construction material of outer walls and roofs, and source of
potable water, light and cooking fuel; on housing facilities,
such as presence and type of kitchen, bathroom and latrine;
and on the source of information (TV, radio or newspapers)
used to keep aware of day-to-day developments. The sample or
long questionnaire, in addition to the above information,
contained sensitive and difficult questions demanding
detailed probing, such as district of birth, previous
district of residence, period since living in the district,
reasons for migration, field of education, occupation of
individuals, industries where they are working, their
employment status, unemployed persons and reasons for their
unemployment, disability, children ever born, children still
living, children born during the one year preceding the
census date and children still living, by sex, and
inoculation of children against six fatal diseases, namely
tuberculosis, diphtheria, pertussis, tetanus, polio and
measles.
DATA COLLECTION METHODOLOGY
5. For carrying out the census operation the entire country
was divided into small areas, called census blocks,
comprising 200-250 households or 1,000 to 1,500 persons,
without combining the area of a Mauza/Deh/Village with the
area of another Mauza/Deh (in the case of revenue settled
areas) or village (in the case of unsettled areas). Any
Mauza/Deh/Village smaller than the above size was declared an
independent census area. On average 5 to 7 blocks were merged
to form a census circle, but within the limits of a Patwar
Halqa. Similarly, 5 to 7 census circles were combined within
a Qanungo Halqa, depending upon its size, to form a census
charge. In turn, census charges were combined within an
administrative district (excluding big urban areas and
cantonments) to form separate census districts. All big urban
areas and all cantonments were notified as independent census
districts. The delimitation work was completed by mid 1990,
followed by the preparation of maps of census areas, their
tracing and reproduction. Because of the 7 years' delay in
the census operation, the delimitation work was updated from
time to time to accommodate likely population expansion.
Accordingly all field use material and logistics
6. In the entire country (including FATA and Islamabad), Azad
Kashmir and the Northern Areas, over 103 thousand blocks were
delimited, falling in over 16,600 census circles, nearly
2,500 census charges and 180 census districts. For coverage
of these areas over 114 thousand enumerators, comprising
low-paid staff of the provincial revenue and education
departments, around 18,600 circle supervisors, nearly 2,700
charge superintendents and 180 census district officers were
appointed with the assistance of the concerned provincial
governments. Circle Supervisors were mostly senior Patwaries
and teachers; Charge Superintendents were senior Qanungos and
senior high school teachers; and Census District Officers
were the Deputy Commissioners of the concerned census
districts, the Municipal Chiefs of the Municipal
Committees/Corporations, the Mayors of the Metropolitans and
the Cantonment Executive Officers of the Cantonment Boards.
They were given complete count training for three days, in
batches; the training continued for one month and ended a
week before the census operation. Those who for some reason
could not attend the regular sessions were called for
training during the last week preceding the census date.
Before the training of field staff, trainers were given a
week-long training by the Master Trainers, including one day
of field visits and one day for review of field experience
and discussion of the problems faced. Master Trainers were
trained by the subject-matter specialists of the PCO. In all,
27 master trainers and 824 trainers were trained, who
imparted training to the field staff at 824 venues at 129
places throughout Pakistan, including Azad Kashmir and the
Northern Areas. The trainers were also sent to the field to
fill in questionnaires themselves. These questionnaires were
evaluated, and the errors committed were discussed and
rectified on the last day of their session. The 824 trainers
were selected from 1,525 persons on the basis of their
performance in the field and in a quiz test. Of the 824
trainers, the top 83 were allowed to give training for the
sample count. This training was also given in batches of 30
persons each and lasted six days, including field visits etc.
At each venue audio and video aids were provided to assist
training and to maintain uniformity in clarifying the
concepts. Because the census was postponed several times, the
field staff were trained several times, so that before they
were engaged in the census operation about 95 per cent of
them had been trained at least seven times. The cost per head
obviously went up, but with better census returns.
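As a rough consistency check, the counts quoted in paragraph 6
can be compared with the 5-to-7 grouping rule described in
paragraph 5. The sketch below is illustrative only: it treats
the rounded counts ("over 103 thousand", "over 166 hundred",
"nearly 25 hundred") as exact, and the names LEVELS and
units_per_parent are assumptions of this example, not part of
any PCO system.

```python
# Rounded counts as quoted in paragraph 6. The true figures are
# "over" or "nearly" these values, so the ratios are approximate.
LEVELS = [
    ("block", 103_000),
    ("circle", 16_600),
    ("charge", 2_500),
    ("district", 180),
]


def units_per_parent(levels):
    """Return {(child, parent): average number of child units per parent unit}."""
    ratios = {}
    for (child, n_child), (parent, n_parent) in zip(levels, levels[1:]):
        ratios[(child, parent)] = n_child / n_parent
    return ratios


ratios = units_per_parent(LEVELS)
for (child, parent), value in ratios.items():
    print(f"{child}s per {parent}: {value:.1f}")
```

On these rounded figures the average number of blocks per
circle and of circles per charge both fall within the stated
range of 5 to 7.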
7. The assistance of the Armed Forces was sought for the
smooth running of the census operation; they helped the
census field staff by facilitating logistic supplies and the
safe carriage of documents. No major complaint was lodged
from any quarter, apart from delays in snow-bound and
inaccessible areas and failure to carry out census work in
some pockets of Balochistan, which were minor in nature. The
exception was a part of Quetta city, the population of which
was estimated later for releasing the provisional results.
8. Besides the engagement of Armed Forces personnel, staff of
the Federal Bureau of Statistics, local administration,
Registration Department and judiciary were engaged to
supervise/check the field operation. Census control rooms
were established in census offices, and Monitoring Cells were
created at the provincial capitals by the respective
provincial governments for this purpose. These control rooms
functioned 24 hours a day during the census operation. During
this operation Census headquarters maintained close liaison
with the concerned Armed Forces offices and the
administrative hierarchy for attending to complaints. The
list of control rooms was also published twice in leading
newspapers, besides being published in the Census journal
called 'Hum Loag', copies of which were widely distributed
throughout the country. The man-to-man engagement of Armed
Forces personnel with census field staff also ensured
door-to-door visits by enumerators for the collection of
census information.
9. A foolproof arrangement was ensured through the sealing of
census documents at the level of enumerators and their
supervisors in the presence of Armed Forces representatives,
minimising, if not eliminating, the chances of tampering with
census information in the field. Another stage where errors
are likely to creep into the census information is data
entry. Such chances were altogether eliminated this time
through direct scanning of filled-in questionnaires by
Optical Mark Reader (OMR), which also reduced data entry
time. This was the first experience of the PCO in using OMR
for data entry in Pakistan; therefore, the questionnaire was
designed for OMR with the help of UNFPA advisors who visited
the PCO from time to time. Thirty million forms for the big
count were printed in the U.S.A. with the financial
assistance of UNFPA. However, with growing knowledge and
experience, the sample count and FATA questionnaires were
designed by the staff of the PCO and printed locally. Around
15 per cent of the filled-in forms were rejected and
transcribed onto blank forms because of mishandling in the
field, multiple marking, blank entries, poor quality paper
and carbonated ink used for printing, improper cutting of the
critical edge of these forms, etc.
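To give a sense of scale, the transcription workload implied
by a given rejection rate can be estimated by multiplying the
rate by the number of forms. The sketch below is a rough
illustration only: it combines the "about 24 million"
filled-in forms reported in section C of the annex with the
rejection rates quoted in this paper, and the function name is
an assumption of this example.

```python
TOTAL_FORMS = 24_000_000  # "about 24 million" filled-in forms (annex, section C)


def forms_to_transcribe(total_forms, rejection_rate):
    """Estimated number of rejected forms that need manual transcription."""
    return round(total_forms * rejection_rate)


# 15 per cent is the overall figure in paragraph 9; 10 and 30 per cent
# are the lowest and highest rates quoted in section D of the annex.
for rate in (0.10, 0.15, 0.30):
    print(f"{rate:.0%} rejected -> {forms_to_transcribe(TOTAL_FORMS, rate):,} forms")
```

At the overall rate of around 15 per cent this corresponds to
roughly 3.6 million forms, subject to the tentative nature of
the underlying figures.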
10. Our analysis ranges from simple techniques, such as
working out rates, ratios and percentages, to the application
of sophisticated and complex techniques. For regular reports
like District Census Reports and Provincial and National
Census Reports, the entire presentation of data, in condensed
format, mostly depends upon simple analysis. However,
detailed analysis of the data is planned for presentation in
a series of census monographs, handbooks, a Children's
Profile and a Women's Profile. For projection of population,
model life tables and the logistic curve will be used. For
working out drop-out, wastage, etc., and for the study of
economically active life and nuptiality, decrement life
tables will also be prepared.
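The paper does not say how the logistic curve is to be fitted.
One simple possibility, shown here purely as an assumption, is
to pass the curve P(t) = K / (1 + a exp(-b t)) through three
equally spaced population counts, using the fact that
1/P(t) - 1/K is then a geometric sequence. The function names
and the input values are hypothetical; the inputs are synthetic
and are not census figures.

```python
import math


def logistic(t, K, a, b):
    """Logistic curve P(t) = K / (1 + a * exp(-b * t))."""
    return K / (1 + a * math.exp(-b * t))


def fit_logistic_three_points(p0, p1, p2):
    """Fit (K, a, b) so that the curve passes through P(0), P(1), P(2).

    With y_t = 1 / P(t) = k + c * r**t (k = 1/K, r = exp(-b)), the
    three equations give k = (y0*y2 - y1**2) / (y0 + y2 - 2*y1).
    """
    y0, y1, y2 = 1 / p0, 1 / p1, 1 / p2
    denom = y0 + y2 - 2 * y1
    if denom == 0:
        raise ValueError("points are not consistent with a logistic curve")
    k = (y0 * y2 - y1 * y1) / denom
    c = y0 - k
    if k <= 0 or c == 0 or (y1 - k) / c <= 0:
        raise ValueError("points are not consistent with a logistic curve")
    r = (y1 - k) / c
    return 1 / k, c / k, -math.log(r)


# Synthetic check: generate three points from a known curve, refit,
# then project one interval ahead (t is measured in equal intervals).
points = [logistic(t, 100.0, 9.0, 0.5) for t in (0, 1, 2)]
K, a, b = fit_logistic_three_points(*points)
print(f"K={K:.2f}, a={a:.2f}, b={b:.2f}, P(3)={logistic(3, K, a, b):.2f}")
```

A three-point fit is sensitive to errors in any one count, so
in practice more points and a least-squares fit would usually
be preferred.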
11. For the computation of age at marriage, the singulate
mean age at marriage (SMAM) will be estimated by applying
the U.N. method and
12. For the estimation of migration statistics our main
reliance will be on simple rates, ratios, etc. However, net
migration will also be estimated indirectly by applying the
national growth rate as a cross-check.
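The paragraph names the national growth rate approach without
giving a formula. A minimal sketch, assuming exponential
growth, is to project an area's earlier population forward at
the national rate and take the difference from the enumerated
population as net migration. The function name and the
district populations below are hypothetical; only the 2.69 per
cent national rate for 1981-98 comes from this paper.

```python
import math


def net_migration_national_growth(pop_start, pop_end, national_rate, years):
    """Indirect net migration estimate for one area.

    expected = population the area would have had if it had grown at
               the national rate (exponential growth is assumed here)
    result   = enumerated end population minus expected population;
               positive means net in-migration, negative net out-migration.
    """
    expected = pop_start * math.exp(national_rate * years)
    return pop_end - expected


# Hypothetical district: 1,000,000 persons in 1981 and 1,700,000 in 1998.
estimate = net_migration_national_growth(1_000_000, 1_700_000, 0.0269, 17)
print(f"estimated net migration: {estimate:,.0f} persons")
```

The estimate attributes any difference in natural increase
between the area and the nation to migration, which is why it
serves only as a cross-check on the direct measures.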
13. Fertility will be estimated by the Brass method, the
reverse survival method and the Relational Gompertz Model.
Regression analysis and path analysis techniques will also be
applied to fertility and some of its linked variables to
study the relationship and contribution of such variables to
fertility behaviour. For the estimation of mortality two
methods are likely to be used, that is, Trussell's method for
measuring child mortality and the widowhood method for the
estimation of adult
STORAGE OF DATA
14. Since the data was processed at two different places,
i.e. the Population Census Organization Head Office and the
Computer Centre of the Federal Bureau of Statistics, it was
stored at these two places. After scanning of the filled-in
census forms at Population Census Organization Headquarters,
the data was stored on magnetic tapes, on the hard disks of
4 PCs, each of 4 gigabytes (GB) capacity as given in Annex-A,
and on 20 Zip disks of 100 MB each. To protect these storage
devices from dust and humidity, a protected environment was
created at both places. Further safety measures were adopted,
such as heat blowers for reducing the humidity level,
fumigation for preventing pest attack, and an automatic alarm
for indicating smoke and gases injurious to both documents
and workers. (For further details please see Annex-A.)
15. These storage media were sent in a continuous flow to the
Computer Centre of the Federal Bureau of Statistics, which,
after further processing, tabulation and analysis of the
data, stored it on magnetic tapes, Zip disks and CDs. For
this purpose 250 tapes of 2400 ft length, or 7 MB capacity,
each, 65 Zip disks of 10 MB each and 77 CDs of 650 MB each
were used.
16. We are also using the Census Library for storage of data
in book form, with one published copy of each report supplied
to our National Archive Department, Islamabad, which is
responsible for storing all important national documents on
various storage media including microfiche.
a. Dissemination Programme
17. As planned, the last census report was scheduled to be
published before December 2000. However, transcription
problems, late installation of some OMR machines, frequent
electricity fluctuations, etc. caused a slight delay in
releasing the last publication of the census report, which is
now expected to be released a few months after its scheduled
time.
b. Dissemination Methods
18. Census results are widely disseminated to all kinds of
data users through print media. Census bulletins/reports such
as the District Census Reports, Provincial Census Reports and
National Census Report of Pakistan are published and sent to
Federal Ministries, Provincial Departments, Universities,
main libraries and NGOs for their use. The census data is
also provided through electronic media in the shape of CDs,
tapes, etc. if required by any Government Department, Private
Agency, International Organization and person
19. A census is a gigantic operation, so it is difficult to
claim 100 per cent accuracy. However, experience, improved
methodologies and control of anticipated problems, especially
field-related ones, are the endeavours and efforts that
support a claim of improvement in data through successive
censuses. The ensuing paragraphs present some main findings
in this
20. The total area of Pakistan is 796,095 square kilometres.
The total population as enumerated in 1951 was 33,816,555
persons, which grew at an average annual rate of 2.4 per cent
from 1951-61, 3.7 per cent from 1961-72, 3.1 per cent from
1972-81 and 2.69 per cent from 1981-98. The population as
enumerated in 1998 was over 132 million, of which 56.1 per
cent lived in Punjab, 22.6 per cent in Sindh, 13.1 per cent
in NWFP, 5.1 per cent in Balochistan, 2.6 per cent in FATA
and 0.4 per cent in Islamabad.
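The paper does not state which growth formula underlies the
intercensal rates. As an illustration, and assuming the
exponential form r = ln(P2/P1)/t, the average annual rate over
the whole period 1951-98 can be worked out from the two
population totals quoted above. The 1998 figure is taken as
132 million although the text says "over 132 million", and the
interval is taken as 47 years, so the result is approximate.

```python
import math


def annual_growth_rate(pop_start, pop_end, years):
    """Average annual exponential growth rate: r = ln(P_end / P_start) / years."""
    return math.log(pop_end / pop_start) / years


POP_1951 = 33_816_555   # enumerated 1951 population (paragraph 20)
POP_1998 = 132_000_000  # "over 132 million" in 1998, rounded down here

rate = annual_growth_rate(POP_1951, POP_1998, 1998 - 1951)
print(f"average annual growth 1951-98: {rate * 100:.2f} per cent")
```

The result, about 2.9 per cent a year, lies within the range
of the four intercensal rates reported in the paragraph.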
21. The population is unevenly distributed: it is sparsely
settled in Balochistan, with 19 persons per square kilometre,
and densely settled in Punjab and Islamabad, with 231 and 376
persons per square kilometre respectively. The population
density of the country has increased from 42 persons per
square kilometre in 1951 to 166 persons in 1998.
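The national density figures follow directly from the area and
population totals given in paragraph 20. The short sketch
below reproduces them; the 1998 population is taken as 132
million although the text says "over 132 million", and the
function name is an assumption of this example.

```python
AREA_SQ_KM = 796_095  # total area of Pakistan (paragraph 20)


def density(population, area_sq_km=AREA_SQ_KM):
    """Persons per square kilometre."""
    return population / area_sq_km


density_1951 = density(33_816_555)   # enumerated 1951 population
density_1998 = density(132_000_000)  # approximate 1998 population
print(f"1951: {density_1951:.1f} persons per sq km")
print(f"1998: {density_1998:.1f} persons per sq km")
```

Rounded to whole persons, the results agree with the 42 and
166 persons per square kilometre quoted above.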
22. The urban proportion has also increased, from 17.8 per
cent in 1951 to 32.5 per cent in 1998. The sex ratio has
decreased during this period, from 116 males for every 100
females in 1951 to 108 males in 1998, which is evidence of
better coverage of females in the latest census when compared
with the earlier
23. The proportion of infants (under one) has decreased too,
from 2.6 per cent in 1972 to 2.3 per cent in 1998. The
proportion of never married females seems to be continuously
increasing: 14.8 per cent in 1972, 17.4 per cent in 1981 and
25.3 per cent in 1998. The infant mortality rate is also
declining rapidly, from over 140 per thousand in the early
1950s to 80 or below at the time of the last census. The
proportions of children below 15 and of the aged population
(65 and over) are also decreasing. The declining proportion
of children, particularly of infants, the rising percentage
of never married females and the falling infant mortality
rate are indications of fertility decline, though fertility
started declining a bit later than mortality. The decline in
the proportion of children and aged population has increased
the age dependency ratio from 84.8 per cent in 1972 to above
90 per cent in 1998.
24. The decline in the infant mortality rate has, in fact,
resulted from the launching of the immunization programme
against six fatal diseases. Though health facilities have
increased considerably, the incidence of some diseases, such
as cardiovascular and respiratory diseases, cancer and
gastro-enteritis, has increased over time, which might have
affected the aged population (65 and over), or it could be
due to differences in content errors in the successive
censuses or
25. The literacy ratio has also increased, from a very low
level of 16.7 per cent in 1961 to 45.0 per cent in 1998. But
we are still far below the desired level of literacy, even
amongst the less developed countries of the world. The gap
would perhaps have been much wider if no conceptual changes
had been introduced in the census over the period. There is a
sharp difference between the literacy ratios of the male and
female populations. The female ratio has increased from a
very low level of 6.7 per cent in 1961 to 32.6 per cent in
1998, while the male ratio has doubled during this period.
Similarly, there is a wide gap between the literacy ratios of
urban and rural areas: 64.7 per cent in urban against 34.4
per cent in rural areas. Punjab has the highest level of
literacy among the provinces, followed by Sindh, NWFP and
Balochistan. However, the proportion of literate population
is the highest in Islamabad, as registered in the last
census.
26. Though a lot of improvements have been made in the
collection, processing and dissemination of data over the
past censuses, we have yet to achieve the desired accuracy in
data collection. We used OMR technology for the first time in
the 1998 Census, with almost no prior exposure of any staff
member of the Population Census Organization to the design of
OMR questionnaires or to scanning through the machine; thus
we faced problems, including those due to frequent
electricity failure. Printing questionnaires in the local
market to meet additional demand beyond the 30 million forms
printed in the USA caused scanning problems, which pushed up
the rejection rate.
27. Another problem was misreporting of ages, especially of
children, young unmarried females and older people. That was
mainly due to ignorance of actual age, compounded by
illiteracy. Even now 55 per cent of the population aged 10
years and over is not literate.
28. Accurate head counting will remain a problem as long as
the constitutional linkage of the census population to the
determination of National/Provincial Assembly seats, the
allocation of national funds to Provinces and the civil
servant recruitment quota remains intact. Political gains and
ethnic rifts will be additional problems affecting the
accuracy of head counting. Seasonal migrants will also remain
a problem in producing accurate census results at district
and probably at provincial level. Counting the mobile
population, especially at or near coastal areas, is also a
problem in the enumeration of individuals.
29. For electricity control, the Population Census
Organization needs to install its own independent
transformer, properly hooked up with each OMR system, for a
controlled supply to the system.
30. The Population Census Organization may install its own
roller printing machine for printing OMR questionnaires and
other documents meant for scanning of filled-in
forms/questionnaires. Purchase of paper in the required
quantity and to the required specifications may be ensured
ahead of schedule to avoid any unforeseen problem.
31. Though eradication of illiteracy is the main solution for
better age reporting, until a 100 per cent literacy level is
achieved the Population Census Organization may explore and
depend upon other efforts for creating awareness amongst the
masses regarding the benefits of accurate
32. The three incentives affecting census results may be
delinked through the necessary amendments in the legislation,
which is perhaps a very difficult job. Alternatively, these
incentives may be frozen for a certain period as
33. The problem of seasonal migrants can be controlled to
some extent by asking the migration questions on a 100 per
cent basis, permitting readjustment of seasonal migrants at
their present as well as their usual place of residence.
INFORMATION ABOUT DATA PROCESSING OF 1998 CENSUS
A) STORAGE OF CENSUS DOCUMENTS:
The registers & books of filled-in Census Forms retrieved
from the field were stored in 554 steel racks, pre-labelled
for each Admn./census district and properly arranged in 6
stores.
B) DATA ENTRY AND STORAGE
Four OMRs, OPSCAN-21 Model 75/100 of National Computer
Systems (NCS), USA, were used for capturing the data from the
filled-in Census Forms. The data was stored on the hard disks
(HDs) of the Pentium PCs attached to the OMR machines. The
storage capacity of each HD is 4.0 gigabytes (GB).
C) TOTAL RECORDS:
The total records fed for data production are about 24
million, equal to the total filled-in forms (Long & Short).
The record length of the Short Form is 204 characters and
that of the Long Form is 347 characters. The detail of
separate Population & Housing records will be provided after
the tabulation for Pakistan becomes available from the
Federal Bureau of Statistics.
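Since the split between Short and Long Forms is not given,
the raw data volume can only be bounded. The sketch below
computes a lower bound (all records short) and an upper bound
(all records long), assuming one byte per character and
ignoring any record separators or file overhead; these
assumptions and the function name are illustrative and are not
stated in this annex.

```python
TOTAL_RECORDS = 24_000_000  # "about 24 million" records
SHORT_FORM_CHARS = 204      # record length of the Short Form
LONG_FORM_CHARS = 347       # record length of the Long Form


def volume_bounds_bytes(n_records, short_len, long_len):
    """Return (lower, upper) bounds on raw data volume in bytes.

    Assumes one byte per character; the lower bound treats every
    record as a Short Form and the upper bound as a Long Form.
    """
    return n_records * short_len, n_records * long_len


low, high = volume_bounds_bytes(TOTAL_RECORDS, SHORT_FORM_CHARS, LONG_FORM_CHARS)
print(f"between {low / 1e9:.1f} and {high / 1e9:.1f} GB (1 GB = 10**9 bytes)")
```

Under these assumptions the raw records occupy somewhere
between about 4.9 and 8.3 GB.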
D) REJECTION RATE:
The rejection rate of data scanning remained about 10 to 15
per cent for the forms printed by NCS, USA and 15 to 30 per
cent for locally printed forms. The reasons are as under:
- General Reasons:
  - Improper manual editing of the Forms
  - Seasonal impacts, i.e. humidity, extreme cold/heat
  - Dust particles
  - Voltage problems between Phase, Neutral and Ground
- Reasons for American Printed Forms:
  - Failure of edit checks due to multiple marking and
    blankness
- Reasons for Locally Printed Forms:
  - System errors, i.e. misalignment of timing marks & ovals
    and faulty
  - Printing errors, i.e. light/dark printing and use of
    improper
  - Cutting errors
  - Paper errors, i.e. low/high
  - Failure of edit checks
E) TRANSFER OF DATA:
Most of the data was transferred on 767 magnetic tapes of
2400 feet and 3600 feet in length, which accommodated data
files of approx. 7 MB and 11 MB respectively. However, the
data files of some districts were also transferred on 6 Zip
disks of 100 MB each. Moreover, 13 Zip disks of the same
capacity were used for back-up when the hard disks of the PCs
overflowed. Also, 264 floppy diskettes (1.44 MB each) were
used for supplementary data files.
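The nominal capacity of the transfer media can be totalled
from the figures in this section. Because the number of
2400-foot and 3600-foot tapes is not given separately, the
tape capacity is shown only as a range (all tapes at 7 MB, or
all at 11 MB). The sketch is illustrative; the names are
assumptions of this example, and nominal capacity is not the
same as the volume of data actually written.

```python
N_TAPES = 767
TAPE_MB_2400_FT = 7    # approx. capacity of a 2400-foot tape
TAPE_MB_3600_FT = 11   # approx. capacity of a 3600-foot tape


def tape_capacity_bounds_mb(n_tapes, small_mb, large_mb):
    """Return (lower, upper) nominal tape capacity in MB.

    The split between tape lengths is unknown, so the bounds assume
    that all tapes are of the smaller or of the larger capacity.
    """
    return n_tapes * small_mb, n_tapes * large_mb


tape_low_mb, tape_high_mb = tape_capacity_bounds_mb(
    N_TAPES, TAPE_MB_2400_FT, TAPE_MB_3600_FT
)
zip_transfer_mb = 6 * 100   # Zip disks used for district data files
zip_backup_mb = 13 * 100    # Zip disks used for hard disk overflow back-up
floppy_mb = 264 * 1.44      # floppy diskettes for supplementary files

print(f"tapes: {tape_low_mb:,} to {tape_high_mb:,} MB")
print(f"Zip (transfer): {zip_transfer_mb} MB, Zip (back-up): {zip_backup_mb} MB")
print(f"floppies: {floppy_mb:.2f} MB")
```

On these figures the tapes alone provide a nominal capacity of
roughly 5.4 to 8.4 GB.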
F) SAFETY/SECURITY MEASURES:
The following safety measures were taken for storage of
census documents:
- Census Forms:
  - The sensors of the Smoke/Fire Alarm System were installed
    in each store
  - Smoking was strictly prohibited in the store & OMR rooms.
  - The issuing of documents from the store was under the
    direct control of a responsible officer.
  - No outsider was allowed to enter the store and OMR rooms
    without
  - The documents were fully protected from dust and humidity
    for easy scanning.
- Input/Output Storage Devices:
  - The input/output storage devices, i.e. tapes, Zip disks
    and floppy diskettes, were in the safe custody of
    responsible officers.
  - These devices were stored in the required/proper
    environment.
  - The data files were created with unique names, and every
    data tape was allotted a unique serial number, to avoid
    any loss of data.
  - The data file names were anonymous and the files were
    stored in an undisclosed directory to prevent access by
    unauthorized persons.
  - The backup of all software & applications pre-loaded on
    the hard disks of the PCs attached to the OMR machines
    has been preserved on 2 Zip disks of 100 MB each.
NOTE: The quoted figures are tentative and are likely to
change slightly.