User’s Manual
for the
National Digital Base Map of
Public Library Locations
Version 1.0
written by
Dean K. Jue
Christine M. Koontz
GeoLib Program
Florida Resources and Environmental Analysis Center
Florida State University
Tallahassee, FL 32306
Funded by
National Center for Education Statistics
U.S. Department of Education
1990 K St.
Washington, D.C. 20006
Introduction
The National Center for Education Statistics (NCES)
initiated a formal public library statistics program in 1989. Nationwide descriptive statistics on public libraries
are collected and disseminated annually through a voluntary census called the
Public Libraries Survey (PLS). The survey is conducted by the NCES through the
Federal‑State Cooperative System (FSCS) for public library data. Statistics are collected from nearly 9,000
public library entities. Data are available for individual public libraries
and are also aggregated to state and national levels.
The collected data are
available for download from the NCES web site. The downloaded file is compressed in WinZip format
to speed downloading. When
uncompressed, the single file becomes three separate files. The general names of the files are:
SumChrXX - summary statistics for libraries aggregated to the state
level
PubLibXX - library statistics for library entities
PloutXX - library statistics for each library outlet
where ‘XX’ represents the last two digits of the calendar year to which
the data pertain.
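As a minimal sketch of the naming convention above, the following Python helper builds the three expected base names for a given data year. The function name `pls_file_names` is hypothetical and is not part of any NCES tool; file extensions are omitted because they depend on the format in which the files were obtained.

```python
def pls_file_names(year):
    """Return the three PLS base file names for a four-digit calendar year."""
    suffix = f"{year % 100:02d}"  # last two digits of the year, zero-padded
    return [f"{prefix}{suffix}" for prefix in ("SumChr", "PubLib", "Plout")]


print(pls_file_names(1998))
```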
These three files have
been maintained by the NCES in either a dBase III or Microsoft Access
format. Although the maintenance of the
files in these tabular formats makes the data readily accessible to a wide
audience, the growth of geographic information system (GIS) software has led
the NCES to seek development of the PubLibXX and the PloutXX files into a
format that could be displayed in a GIS environment (i.e., in a mapped
format). This would allow library locations
to be displayed on a map on a computer screen relative to a large number of
other geographic data features that may have already been developed in a GIS
environment (e.g., roads, water features, hospitals, voting districts, school
district boundaries). It would also
allow libraries to better analyze the socioeconomic and demographic
characteristics of the population near each library outlet by mapping the
libraries relative to the U.S. Census Bureau geographic boundaries that are
already available in a GIS format (e.g., Census-defined geographic areas such
as places, counties, and school districts).
In late May of 2001,
the NCES, in collaboration with the National Commission on Libraries and
Information Science (NCLIS), contracted with Dr. Christie Koontz and Mr. Dean
K. Jue of the GeoLib Program at Florida State University to develop the 1998
PLS data files into a format suitable for use in a GIS. Although Version 1.0 of the GIS data files
utilizes 1998 PLS data, plans are to continue maintaining and updating these
GIS data files so they will, in short order, depict an accurate and up-to-date
location for all public library outlets and entities throughout the U.S.
Questions about NCES,
the FSCS, the PLS data files themselves, or interpretations of the library
statistics that are contained in those files should be addressed to Ms.
Adrienne Chute of the NCES at:
Library Statistics Program
National Center for Education Statistics
U.S. Department of Education
1990 K St., NW
Washington, D.C. 20006
Phone: (202) 502-7300
E-mail: Adrienne.Chute@ed.gov
Questions about the locations of library outlets as depicted in the GIS
data files, the boundaries of the legal service areas depicted in the GIS data
files, or the suitability of use of these GIS data files in a specific GIS
analysis or application should be directed to Mr. Dean Jue at the GeoLib
Program at Florida State University at:
GeoLib Program
Florida State University
C2200 University Center
Tallahassee, Florida 32306
Phone: (850) 644-2007
E-mail: djue@admin.fsu.edu
Development of the GIS Data Sets
The three GIS data
sets were developed using the 1998 PLS data files. Uncompressing the WinZip
file yielded three database files:
SumChr98 - 1998 summary statistics for libraries aggregated to the
state level
PubLib98 - 1998 library statistics for library entities
Plout98 - 1998 library statistics for each library outlet
The SumChr98 data file was not used in the development of the GIS data
sets. There were 16,994 library outlets
in the initial Plout98 data file and 8,966 library entities in the initial
PubLib98 data file prior to processing.
The NCES and GeoLib
agreed that bookmobiles were not to be mapped for the purposes of these GIS
data sets since bookmobiles did not have a fixed location. The library outlets and entities that were
outside of the 50 U.S. states and the District of Columbia were also excluded.
After removing the
bookmobiles and the library entities and outlets outside of the 50 U.S. states
and the District of Columbia from the original data files, the modified
PubLib98 and the Plout98 data files were sent to a geocoding vendor. Utilizing the address, city, and zip code
fields for each record in the files, the geocoding vendor assigned a latitude
and longitude location to each library entity and library outlet along with a
geocoding matching/accuracy code. The
library entities and outlets that were assigned a high geocoding matching and
accuracy code (i.e., an “AS0,” an “AS1,” or an “AS2”) were deemed to
already be of high-enough geographic accuracy for the public library GIS data
sets.
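The screening step described above can be sketched as follows. This is an illustration only: the record layout, the `match_code` key, and the `UNMATCHED` placeholder are assumptions made for the example and do not reflect the geocoding vendor's actual output format.

```python
# Codes that the manual treats as accurate enough without further research.
HIGH_ACCURACY_CODES = {"AS0", "AS1", "AS2"}


def split_by_geocode_quality(records):
    """Split records into (accepted, needs_research) lists by vendor code."""
    accepted, needs_research = [], []
    for record in records:
        # "match_code" is a hypothetical key name for the vendor's code.
        code = record.get("match_code", "").strip().upper()
        if code in HIGH_ACCURACY_CODES:
            accepted.append(record)
        else:
            needs_research.append(record)
    return accepted, needs_research


sample = [
    {"id": "A", "match_code": "AS0"},
    {"id": "B", "match_code": "UNMATCHED"},  # placeholder, not a real vendor code
    {"id": "C", "match_code": "as2"},
]
accepted, needs_research = split_by_geocode_quality(sample)
print(len(accepted), len(needs_research))
```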
For the remaining
outlets and entities that did not receive a high geocoding accuracy code,
additional research was conducted into determining their exact geographic
location. This research could take
any one of several paths:
1) other researchers may have already done some work
on determining a more accurate location for an outlet or an entity. For instance, it is not possible to utilize a
post office box mailing address to determine the physical location of a
library. But if a physical address can
be determined for that library, then it may be possible to utilize the
newly-obtained physical address for geocoding.
2) some libraries have web sites where they have maps
showing the locations of their outlets.
If a library-developed map of their location was deemed to be accurate
enough to display on their web page, then it was deemed accurate enough for the
GIS data sets.
3) if neither of the above two methods resulted in an
accurate location for a public library entity or outlet, then a phone call was
made to the entity or outlet. Using
either a printed map or a digital map on a computer screen, GeoLib staff would
discuss the location of the entity or outlet relative to geographic features or
roads with a librarian or a library assistant and determine the best point
location for the entity or outlet.
Once the improved
geographic location for a library entity or outlet was obtained, the digital US
Bureau of the Census 2000 TIGER street line files were used as the reference
map to which the improved library entity or outlet locations were then
re-mapped. This process was used to
develop both the library entity and the library outlet GIS data sets.
The development of the
legal services area GIS data set required using the geographic code attribute
field contained in the original library entity file (i.e., the PubLib98
file). This geographic code can be any
one of eleven values. The assigned value
indicated whether the legal service area of a library entity corresponded to a
city, county, a metropolitan area, a multi-county area, a school district, or
some other unit of geography, either exactly or most nearly. The boundaries of many of these geographic area
features are maintained in the network of lines in the U.S. Census Bureau’s
TIGER files. For consistency’s sake, the
boundaries of these features are maintained using a standardized Federal
Information Processing Standards (FIPS) code that is assigned to each county,
place, metropolitan area, or school district. Using this linear network from
the TIGER files and knowing the appropriate FIPS code for the geographic area
associated with a library entity’s legal service area then allows a GIS program
to generate the appropriate polygon representing the legal service area for a
particular library entity.
The FIPS code for most
geographic areas can be readily obtained from the FIPS web site on the Internet. For example,
the geographic code for the Juneau Public Library in Alaska is “CO1” in
the PubLib98 data file, which means that its legal service area boundaries
exactly match those of a county. Using
the FIPS web site, one learns that the FIPS code for the Juneau Borough
(county) is “02110.” A GIS program can
then extract the boundaries for the geographic area that corresponds to FIPS
code 02110, the Juneau Borough in Alaska, from the 2000 TIGER files.
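A five-digit county FIPS code such as 02110 combines a two-digit state code with a three-digit county code. The short sketch below, with a hypothetical helper name, splits the code into those two parts. It keeps the code as text because storing it as a number would drop the leading zero.

```python
def split_county_fips(fips):
    """Split a five-digit county FIPS code into (state, county) strings."""
    if len(fips) != 5 or not fips.isdigit():
        raise ValueError(f"expected five digits, got {fips!r}")
    return fips[:2], fips[2:]


state, county = split_county_fips("02110")
print(state, county)  # state part "02" (Alaska), county part "110"
```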
This procedure was
utilized for developing the legal service areas for library entities whose
geographic code indicated that their legal service area boundaries conformed
exactly or most nearly to a place (city), county, or metropolitan area. For entities whose legal service areas
conformed exactly or most nearly to a multi-county area or a school district, a
phone call was made to the library entity to determine exactly which counties
or school district(s) were in its legal service areas. From the information provided over the phone,
those library entities’ legal service areas could then be derived using either
a combination of several county FIPS codes or through the FIPS code value(s) for
the school district(s) in the entity’s legal service area.
More specific details
about the development of each of the specific GIS data sets are provided in the
following section.
Description of GIS Data Sets
This section describes
each of the three GIS data sets in more detail.
Each of the three data sets is maintained in a shapefile (.shp) format
and should be capable of being imported into any GIS software package that can
read and import shapefiles.
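Because each data set is delivered as three companion files, it can be useful to confirm that all three are present before importing. The following is a minimal sketch under the assumption that the files sit together in one directory; the function name `missing_components` is hypothetical.

```python
from pathlib import Path

# The three companion files that make up each data set in this release.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_components(directory, base_name):
    """Return the required extensions for base_name that are absent on disk."""
    folder = Path(directory)
    return [
        ext
        for ext in REQUIRED_EXTENSIONS
        if not (folder / f"{base_name}{ext}").is_file()
    ]


# Check the three data sets in the current directory.
for name in ("all_entities", "all_outlets", "all_lsa"):
    print(name, missing_components(".", name))
```

An empty list means the data set is complete; any listed extension identifies a file that still needs to be copied alongside the others.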
Library Entity GIS Data Set
This is a point data
set composed of three separate files entitled
all_entities.shp
all_entities.shx
all_entities.dbf
and requires approximately 9 megabytes of disk space.
The original library
entity data file from the 1998 PLS contained 8,966 library entities. The GIS data set contains 8,959
entities. See Appendix A for a
discussion of this difference and the specific library entities that are not
part of the GIS data set.
With the exception of
the addition of the .shp and .shx files and one additional attribute field
(named “Accuracy”) in the all_entities.dbf file, there is no other difference
between the GIS data set for library entities and the 1998 PLS data for library
entities obtained directly from NCES.
The .shp and .shx files
and the Accuracy attribute field are the only information that is
unique to the GIS environment. The .shp
and .shx files maintain the geographic location of the library entities
relative to the computer-generated map on which the entities will be displayed
in a GIS environment. The Accuracy field
maintains the information on how accurate the mapped geographic location of a
particular library entity should be.
The value of the
Accuracy field can be any one of five different values. These values are “AS0” (5,898 entities),
“AS1” (19 entities), “AS2” (6 entities), “AC1” (1,268 entities),
and “AC2” (1,768 entities). These
values represent the following levels of geographic accuracy:
AS0 - This address location code indicates an
address range geocode and is the most accurate computer-automated geocode
available. The street segment was found
in the city/zip code TIGER file that matched that in the library entity’s
address. The street segment also had a
from/to address range for both the left and right-hand side of the segment into
which the numeric portion of the entity’s address fell.
AS1 - Same as AS0 except that the street
side is unknown.
AS2 - Same as AS0, with the address
interpolated onto a TIGER street segment on which the address range was added.
AC1 - The library entity point was located
through information obtained directly by GeoLib staff. This usually meant a
direct telephone conversation with a representative from the library
entity. During such conversations, an
attempt was always made to obtain a location accurate to within 100 feet or
less. Other sources of information that
could be used for a library entity to receive this accuracy code included very
specific and detailed maps or instructions on how to get to the library that
may be posted on the library entity’s web page.
See Appendix D for further information on this accuracy classification
code.
AC2 - The library entity point was located
through information obtained directly by GeoLib staff. However, due to the remoteness of the library
entity or contradictory information between the digital TIGER map and the
information provided by the library entity representative, the accuracy of the
library entity location to within one-quarter mile or closer cannot be
ascertained without an on-site visit.
Accuracy should still be to within one-half mile or less. Library entity locations
taken from secondary sources that are believed to be highly reliable but that
were not independently verified by GeoLib staff were also given this accuracy
classification. See Appendix D for
further information on this accuracy classification code.
The highest level of confidence in
placement of a point for a library entity is represented by the AC1 accuracy
classification, with a locational accuracy of one-quarter mile or closer,
whether rural or urban. The remaining
accuracy classifications are all approximately equal in accuracy but the method
of locating the particular point is different for the different accuracy
codes. Accuracy in urban areas should be
to within one or two city blocks but accuracy may be less for entities in rural
areas due to the longer distances between street intersections. Accuracy for these other classifications
should be within one-half mile or (usually) closer.
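As a rough illustration of how the Accuracy field might be checked and summarized, the sketch below tallies a list of codes and rejects any value outside the five valid ones. The function name `tally_accuracy` is hypothetical. The second half simply re-adds the entity counts published above to confirm that they account for all 8,959 mapped entities.

```python
from collections import Counter

VALID_CODES = ("AS0", "AS1", "AS2", "AC1", "AC2")


def tally_accuracy(codes):
    """Count Accuracy values, raising ValueError on any unrecognized code."""
    counts = Counter(code.strip().upper() for code in codes)
    unknown = sorted(set(counts) - set(VALID_CODES))
    if unknown:
        raise ValueError(f"unrecognized accuracy codes: {unknown}")
    return {code: counts.get(code, 0) for code in VALID_CODES}


# Entity counts as published in this manual.
ENTITY_COUNTS = {"AS0": 5898, "AS1": 19, "AS2": 6, "AC1": 1268, "AC2": 1768}

total = sum(ENTITY_COUNTS.values())
vendor_geocoded = sum(n for c, n in ENTITY_COUNTS.items() if c.startswith("AS"))
staff_located = total - vendor_geocoded
print(total, vendor_geocoded, staff_located)
```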
Users should keep in mind that many
library entities are also central library outlets of a larger library
“system.” Consequently, a library entity
can and often does appear in the public library outlet file as well.
Library Outlet GIS Data Set
This is a point data set composed of three separate
files entitled
all_outlets.shp
all_outlets.shx
all_outlets.dbf
and requires between four and five megabytes of disk
space.
The
original library outlet data file from the 1998 PLS data file contained 16,994
library outlets. The GIS data set
contains 16,112 outlets. See Appendix B
for a discussion on this difference and the specific library outlets that are
not part of the GIS data set.
With the exception of the addition of the .shp and .shx files and one additional
attribute field (named “Accuracy”) in the all_outlets.dbf file, there is no
other difference between the GIS data set for library outlets and the 1998 PLS
data for library outlets obtained directly from NCES.
The .shp and .shx files and the Accuracy attribute field are the only
information that is unique to the GIS environment. The .shp and .shx files maintain the
geographic location of the library outlets relative to the computer-generated
map on which the outlets will be displayed in a GIS. The Accuracy field maintains the information
on how accurate the mapped geographic location of a particular library outlet should be.
The
Accuracy field can take on any one of five different values. These values are “AS0” (9,809 outlets),
“AS1” (66 outlets), “AS2” (35 outlets), “AC1” (3,783 outlets),
and “AC2” (2,419 outlets). These
values represent the following levels of geographic accuracy:
AS0 - This
address location code indicates an address range geocode and is the most
accurate geocode available. The street
segment was found in the city/zip code TIGER file that matched that in the
library outlet’s address. The street
segment also had a from/to address range for both the left and right-hand side
of the segment into which the numeric portion of the outlet’s address fell.
AS1 - Same as AS0 except that the street
side is unknown.
AS2 - Same as AS0, with the address interpolated
onto a TIGER street segment on which the address range was added.
AC1 - The library outlet point was located
through information obtained directly by GeoLib staff. Most of the time this meant a direct
telephone conversation with a representative from the library outlet. During such conversations, an attempt was
always made to obtain a location accurate to within 100 feet or less. Other sources of information that could be
used for a library outlet to receive this accuracy code included very specific
and detailed maps or instructions on how to get to the library that may be
posted on the library outlet’s web page.
See Appendix D for further information on this accuracy classification
code.
AC2 - The library outlet point was located
through information obtained directly by GeoLib staff. However, due to the remoteness of the library
outlet or contradictory information between the digital TIGER map and the
information provided by the library outlet representative, the accuracy of the
library outlet location to within one-quarter mile or closer cannot be
ascertained without an on-site visit.
Accuracy should still be to within one-half mile or less. Library outlet locations
taken from secondary sources that are believed to be highly reliable but that
were not independently verified by GeoLib staff were also given this accuracy
classification. See Appendix D for
further information on this accuracy classification code.
The highest level of confidence in
placement of a point for a library outlet is represented by the AC1 accuracy
classification, with a locational accuracy of one-quarter mile or closer. The remaining accuracy classifications are
all approximately equal in accuracy but the method of locating the particular
point is different for the different accuracy codes. Accuracy in urban areas should be to within
one or two city blocks but accuracy may be less for those outlets in rural
areas due to the longer distances between street intersections. Accuracy for these other classifications
should be within one-half mile or (usually) closer.
Users should keep in mind that some
library outlets are also library entities of a larger library “system.” Consequently, a library outlet may also
appear in the public library entity file.
Legal Service Areas GIS Data Set
This data set is composed of three files named:
all_lsa.shp
all_lsa.shx
all_lsa.dbf
and requires about 28 megabytes of disk storage.
There
is one legal service area for each library entity. Thus, there could be a maximum of 8,959 legal
service areas in this GIS data set. An
entity’s particular service area could have a partial or 100% overlap with the
service area of another entity. For this
release of the legal service areas GIS data set, there is a total of 6,646
legal service areas, or just over 74% of the possible total.
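The coverage figure quoted above follows directly from the two counts. A short check of the arithmetic, using only the numbers stated in this manual:

```python
possible = 8959   # library entities in the GIS data set
developed = 6646  # legal service areas in this release

coverage = developed / possible
deferred = possible - developed

print(f"{coverage:.1%} developed, {deferred} deferred to a later release")
```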
It
was agreed upon between the NCES and GeoLib that the first release of the legal
services area GIS data set would only contain those areas that either exactly
matched or closely matched that of a US
Census-defined boundary (e.g., a place, county, metropolitan area, or school
district). Any library entity that
selected the “Other” category for its legal service area was immediately
relegated to a future data release for the development of its legal service
area boundaries. In addition to those
library entities whose legal service areas fell into the “Other” category,
some library entities had geographic
codes of “CI1” or “CI2” (i.e., their legal service areas either
exactly or most nearly matched that of a city) for which the digital boundary
data for the corresponding city limits did not exist in the 2000 TIGER
files. The development of the legal
service areas for those library entities was also postponed until the next
release of this data set. See Appendix C for a detailed discussion of which
library entities had the development of their legal service area boundaries
postponed until future data set releases.
In
order to develop as many legal service area boundaries as possible for this GIS
data set, the NCES requested that the distinction between exactly or most
nearly matching a particular Census geography be ignored. For instance, regardless of whether a library
entity indicated that its legal service area either exactly matched or most
nearly matched a city, the geographic area for that city as depicted in the
2000 TIGER files would be used as the entity’s legal service area. This approximation of a legal service area
for those library entities that do not exactly match that assigned geographic
area will be fine-tuned in future data releases of this data set.
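The effect of ignoring the exact/most-nearly distinction can be sketched as a simple lookup that discards the trailing digit of the geographic code. This is an illustration under stated assumptions: only the “CI” and “CO” prefixes, which appear in this manual's examples, are included, and the reading of the trailing digit (1 for an exact match, 2 for a nearest match) is inferred from those examples. The function name `area_type_for` is hypothetical.

```python
# Only the prefixes shown in this manual's examples; other geographic
# codes are deliberately left out of this sketch.
AREA_TYPES = {"CI": "city", "CO": "county"}


def area_type_for(geographic_code):
    """Return the Census area type for a code, ignoring the exact/nearest digit.

    Returns None for any code this sketch does not cover.
    """
    code = geographic_code.strip().upper()
    prefix, match_flag = code[:2], code[2:]
    if prefix not in AREA_TYPES or match_flag not in ("1", "2"):
        return None
    return AREA_TYPES[prefix]


for code in ("CI1", "CI2", "CO1"):
    print(code, area_type_for(code))
```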
The
reason for developing the legal service area boundaries is to link the
geographic area inside the developed boundaries with the Census 2000 population
and demographic data. This linkage
between the geographic area and the Census 2000 data was performed for all
library entities for which either an exact or an approximate legal service area
boundary could be developed. See the
following section on the database structure of the legal service area GIS data
set to obtain a detailed description of the Census 2000 data that has been
developed for each library entity’s legal service area.
Consideration and Limitation in Using the GIS Data Sets
These
GIS data sets were developed from a nationwide database maintained by a federal
agency after the data was received from a state coordinator that standardized
the local library information.
Consequently, there are issues of timeliness and accuracy that are
common to all such databases that need to be considered in evaluating whether
or not these GIS data sets are suitable for use in a particular analysis.
Accuracy of Locations
It is
believed that these GIS data sets are the first and only nationwide coverage of
public library entities, public library outlets, and the associated legal
service area of library entities that is worthy of the term “digital nationwide
GIS base map of public libraries.” That
is, these data sets were developed using a standardized methodology in which a
library outlet or entity’s location is documented to a known level of accuracy
such that the data sets can be used as a map reference upon which other data
sets (e.g., 2000 Census data) can be overlaid with a high degree of accuracy
and confidence.
Although
a systematic study of the distance library points were moved was not
undertaken as part of the data development process, library locations that were
not geocoded with an AS0, AS1, or AS2 were routinely being moved two to three
miles from their original geocoded location as part of this data set
development process. Movement of
libraries to new locations that were 10 or more miles from their original
location was not uncommon in rural areas.
The maximum distance one library was moved during the development of
these data sets is believed to be about 74 miles in the state of Alaska.
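A user who holds both an original geocoded coordinate and a corrected coordinate for a library could estimate the distance moved with the haversine great-circle formula. The sketch below is not the method used by GeoLib (no systematic measurement was undertaken); it assumes latitude and longitude in decimal degrees and a spherical Earth with a mean radius of 3,958.8 miles, and the coordinates in the usage line are arbitrary illustrative values.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean radius; a spherical approximation


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points (decimal degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Two points one degree of latitude apart on the same meridian.
print(round(miles_between(30.0, -84.0, 31.0, -84.0), 1))
```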
The
placement of a library location to within approximately one hundred feet of its
actual location is sufficient for local library directors evaluating the
adequacy of their local library services to their local population or for
regional, statewide, or national library research and policymaking. However, without actual on-site
“ground truthing”, the point locations in the GIS data sets for the library
entities and outlets are NOT accurate to a level of precision that is
suitable for detailed local government applications requiring extremely high levels of
accuracy. In those particular
situations, being off by one or two city blocks can present major problems (e.g.,
for a traffic planner modeling transportation flows).
Some
specific caveats relative to the library entity and outlet locations that users
of these data sets should remember include:
1) geocoding vendors assign latitude/longitude
locations based on the existence of a street segment with an appropriate
address range that matches a correct address provided to them. If there is both a 500 N. Main St. and a 500
S. Main St. separated by one mile in the same town with the same zip code and
the provided address read “North Main” where it should have been “South Main”,
then the geocoded location would be off by one mile. The accuracy code would still be recorded as
AS0 though because there is no way the vendor (i.e., the computer) could have
known about the error in the provided address.
2) distance estimates by individuals are not
necessarily accurate. During this
project, library representatives would estimate distances to be between “one or
two miles” when the 2000 TIGER files would show that the total distance between
the two points in question is less than one mile. The GeoLib staff making the telephone call
would try to bracket the location of the library outlet or entity in between
two streets (e.g., “So your location is between Second and Third St. then, is
that correct?”) to ensure accurate placement of the point in the GIS data set
but this was not always possible, especially towards the periphery of rural
towns.
3) the locations of the entities and outlets are only
as good as the description provided to the GeoLib staff by the library director
or library representative. Questionable
entity or outlet locations were assigned an AC2 code and GeoLib staff did not
assign AC1 codes if there was any ambiguity.
Users of these data sets must remember that no on-site visits were made
to check the accuracy of the information provided over the phone. Especially in rural areas, street names are
oftentimes not used at all and the sole library worker/director may be very
familiar with the area but not know any of the street names in the area as
provided in the 2000 TIGER files.
Accuracy of the Database Information
Most
of the attribute data in the GIS data sets are taken directly from the 1998 PLS
data. However, there are several
database accuracy issues that are specific to the GIS databases that users
should remember:
1) a procedural question that needed to be answered
when telephone calls were being made to the library representatives is: If a
library location, name, address, or phone number has changed relative to the
1998 PLS data that was originally provided to GeoLib, what level of correction
should be made to the original database?
After careful consideration, the following decisions were made:
A) The GIS data set, since it is a newly-developed point
data set, will reflect the most current location possible of a library entity
or outlet. That is, if a library entity
or outlet has moved since its address in the 1998 PLS data, then it will be
mapped to its most recent (latest) location.
B) Any other changes relative to the 1998 PLS library
attribute data will NOT be incorporated into the GIS data sets. Rather, the original data from the 1998 PLS
data will be used in the GIS data set. Otherwise a data integrity problem will
develop relative to the original 1998 PLS data.
As a result of this decision, the mapped location of
all public library entities and outlets in the GIS data set is correct as of
June 1, 2001 (when telephone calls were first initiated) or later. The attribute data that is associated with
the entities and outlets however may not be correct, including possibly the
address. It will be correct (i.e., the
same) relative to the 1998 PLS data. In
the situation where the mapped location of an outlet or entity does not match
the physical address listed in the attribute database, a user should
assume that the mapped location is more current and correct.
2) The GIS data sets were developed using the data
in the 1998 PLS data. Thus, any library
entity or outlet in that database that GeoLib determined to have closed since its original
incorporation into the PLS data is reflected in the GIS data set by its absence;
it was simply
not mapped in the GIS data set. By the
same token, any entity or outlet relocations that may have occurred since 1998
will also be reflected in the GIS data sets.
A completely new outlet or entity that opened since the development of
the 1998 PLS data, however, will NOT be incorporated into the current GIS data
sets because the entity or outlet was not listed in the 1998 PLS data. Such entities or outlets will be incorporated
into future releases of the GIS data sets.
Database Structure of the GIS Data Sets
This
section describes the unique attribute fields for the GIS data sets. The attribute fields that are developed by
the FSCS and maintained in the original NCES 1998 PLS data are NOT described
here. The reader is referred to the NCES
web page for information about obtaining documentation on
those attribute fields.
Library Entity and Library Outlet GIS Data Sets
Field name is the name of the attribute field in the
.dbf file. The field length is the
length of the attribute field in bytes.
The data type can be character or numeric. If it is numeric, it is followed by the
overall length of the field and the number of decimal places in the
number. The description discusses what
is contained in the attribute field and the valid values for the field if appropriate.
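The field name, length, data type, and decimal count described above are all recorded in the header of a dBase-format .dbf file. The sketch below reads them directly from the raw bytes. It assumes the standard dBase III header layout (a 32-byte file header followed by 32-byte field descriptors ending with a 0x0D byte); the function name `read_dbf_fields` is hypothetical, and most GIS packages display the same information without any programming.

```python
import struct


def read_dbf_fields(data):
    """Return (name, type, length, decimals) for each field in a .dbf header.

    `data` is the raw content of the file as bytes. Type "C" is character
    and "N" is numeric.
    """
    # Bytes 8-9 of the file header hold the total header length.
    header_length = struct.unpack_from("<H", data, 8)[0]
    fields = []
    offset = 32  # field descriptors start after the 32-byte file header
    while offset < header_length and data[offset] != 0x0D:
        # 11-byte name, 1-byte type, 4 reserved bytes, length, decimal count.
        raw_name, field_type, length, decimals = struct.unpack_from(
            "<11sc4xBB", data, offset
        )
        name = raw_name.split(b"\x00", 1)[0].decode("ascii")
        fields.append((name, field_type.decode("ascii"), length, decimals))
        offset += 32
    return fields


# Typical use, once the data set is on disk:
#     with open("all_entities.dbf", "rb") as handle:
#         for field in read_dbf_fields(handle.read()):
#             print(field)
```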
For both of these data sets, there is only one
additional attribute field added to the original PLS data. This attribute field is as follows:
| FIELD NAME | LENGTH | DATA TYPE | DESCRIPTION |
| ACCURACY | 4 | Character | Maintains record of how point location for entity or outlet was derived; valid values are AS0, AS1, AS2, AC1, or AC2 |
<All other fields in the all_outlets.dbf and all_entities.dbf files are maintained
and documented through the NCES, U.S. Department of Education>
Legal Service Areas GIS Data Set
Field name is the name of the attribute field in the .dbf file. The field length is the length of the
attribute field in bytes. The data type
can be character or numeric. If it is
numeric, it is followed by the overall length of the field and the number of
decimal places in the number. The
description discusses what is contained in the attribute field and the valid
values for the field if appropriate.
All the demographic and population numbers are from the 2000 Census
Redistricting data and represent the totals within the legal service area
identified for the library entity.
The database structure for the all_lsa.dbf attribute file is as
follows:
| FIELD NAME | FIELD LENGTH | DATA TYPE | DESCRIPTION |