User’s Manual

 

for the

 

National Digital Base Map of

Public Library Locations

 

Version 1.0

 

 

 

 

written by

 

Dean K. Jue

Christine M. Koontz

 

GeoLib Program

Florida Resources and Environmental Analysis Center

Florida State University

Tallahassee, FL 32306

 

 

Funded by

National Center for Education Statistics

U.S. Department of Education

1990 K St.

Washington, D.C. 20006

 

 


Introduction


 


          The National Center for Education Statistics (NCES) initiated a formal public library statistics program in 1989.  Nationwide descriptive statistics on public libraries are collected and disseminated annually through a voluntary census called the Public Libraries Survey (PLS).  The survey is conducted by the NCES through the Federal-State Cooperative System (FSCS) for public library data.  Statistics are collected from nearly 9,000 public library entities.  Data are available for individual public libraries and are also aggregated to state and national levels.

 

            The collected data are available for download from the NCES web site.  The downloaded file is compressed in WinZip format to speed downloading.  When uncompressed, the single file yields three separate files.  The general file names are:

 

            SumChrXX      - summary statistics for libraries aggregated to the state level

            PubLibXX        - library statistics for library entities

            PloutXX           - library statistics for each library outlet

 

where ‘XX’ represents the last two digits of the calendar year to which the data pertain.
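
The naming convention above can be sketched in a few lines of Python.  This is only an illustration: the function name pls_file_names is hypothetical, and no file extension is assumed.

```python
def pls_file_names(year: int) -> list[str]:
    """Return the three PLS base file names for a four-digit data year."""
    # Keep only the last two digits of the year, e.g. 1998 -> "98".
    suffix = f"{year % 100:02d}"
    return [f"SumChr{suffix}", f"PubLib{suffix}", f"Plout{suffix}"]


print(pls_file_names(1998))
```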

 

            These three files have been maintained by the NCES in either a dBase III or Microsoft Access format.  Although maintaining the files in these tabular formats makes the data readily accessible to a wide audience, the growth of geographic information system (GIS) software has led the NCES to seek development of the PubLibXX and PloutXX files into a format that can be displayed in a GIS environment (i.e., in a mapped format).  This would allow library locations to be displayed on a map on a computer screen relative to a large number of other geographic data features that may have already been developed in a GIS environment (e.g., roads, water features, hospitals, voting districts, school district boundaries).  It would also allow libraries to better analyze the socioeconomic and demographic characteristics of the population near each library outlet by mapping the libraries relative to the U.S. Census Bureau geographic boundaries that are already available in a GIS format (e.g., Census-defined geographic areas such as places, counties, and school districts).

 

            In late May of 2001, the NCES, in collaboration with the National Commission on Libraries and Information Science (NCLIS), contracted with Dr. Christie Koontz and Mr. Dean K. Jue of the GeoLib Program at Florida State University to develop the 1998 PLS data files into a format suitable for use in a GIS.  Although Version 1.0 of the GIS data files uses 1998 PLS data, the plan is to continue maintaining and updating these GIS data files so that they will, in short order, depict an accurate and up-to-date location for all public library outlets and entities throughout the U.S.

 

            Questions about NCES, the FSCS, the PLS data files themselves, or interpretations of the library statistics that are contained in those files should be addressed to Ms. Adrienne Chute of the NCES at:

                                                                                   

 

Library Statistics Program

National Center for Education Statistics

U.S. Department of Education

1990 K St., NW

Washington, D.C. 20006

Phone: (202) 502-7300

E-mail: Adrienne.Chute@ed.gov

 

Questions about the locations of library outlets as depicted in the GIS data files, the boundaries of the legal service areas depicted in the GIS data files, or the suitability of use of these GIS data files in a specific GIS analysis or application should be directed to Mr. Dean Jue at the GeoLib Program at Florida State University at:

 

GeoLib Program

Florida State University

C2200 University Center

Tallahassee, Florida 32306

Phone: (850) 644-2007

E-mail: djue@admin.fsu.edu

 


Development of the GIS Data Sets

 

            The three GIS data sets were developed using the 1998 PLS data files.  After uncompressing the WinZip compressed file, there were three database files:

 

            SumChr98        - 1998 summary statistics for libraries aggregated to the state level

            PubLib98         - 1998 library statistics for library entities

            Plout98                        - 1998  library statistics for each library outlet

 

The SumChr98 data file was not used in the development of the GIS data sets.  There were 16,994 library outlets in the initial Plout98 data file and 8,966 library entities in the initial PubLib98 data file prior to processing.

 

            The NCES and GeoLib agreed that bookmobiles were not to be mapped for the purposes of these GIS data sets since bookmobiles did not have a fixed location.  The library outlets and entities that were outside of the 50 U.S. states and the District of Columbia were also excluded.

 

            After removing the bookmobiles and the library entities and outlets outside of the 50 U.S. states and the District of Columbia from the original data files, the modified PubLib98 and Plout98 data files were sent to a geocoding vendor.  Using the address, city, and zip code fields for each record in the files, the geocoding vendor assigned a latitude and longitude location to each library entity and library outlet along with a geocoding matching/accuracy code.  The library entities and outlets that were assigned a high geocoding matching and accuracy code (i.e., an “AS0,” an “AS1,” or an “AS2”) were deemed to already be of high enough geographic accuracy for the public library GIS data sets.
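
The triage step described above can be sketched as follows.  This is a minimal illustration, not the vendor's or GeoLib's actual procedure; the record layout and the low-accuracy code "ZZ9" are made-up placeholders, since the vendor's other codes are not listed in this manual.

```python
# Codes the manual treats as already accurate enough for the GIS data sets.
HIGH_ACCURACY_CODES = {"AS0", "AS1", "AS2"}


def needs_follow_up(geocode_quality: str) -> bool:
    """True when a record did not receive one of the high-accuracy codes."""
    return geocode_quality.strip().upper() not in HIGH_ACCURACY_CODES


# Hypothetical records; "ZZ9" stands in for any lower-accuracy vendor code.
records = [
    {"name": "Library A", "code": "AS0"},
    {"name": "Library B", "code": "ZZ9"},
]
follow_up = [r["name"] for r in records if needs_follow_up(r["code"])]
print(follow_up)
```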

 

            For the remaining outlets and entities that did not receive a high geocoding accuracy code, additional research was conducted to determine their exact geographic location.  This research could take any one of several paths:

 

1) other researchers may have already done some work on determining a more accurate location for an outlet or an entity.  For instance, it is not possible to utilize a post office box mailing address to determine the physical location of a library.  But if a physical address can be determined for that library, then it may be possible to utilize the newly-obtained physical address for geocoding.


2) some libraries have web sites with maps showing the locations of their outlets.  If a library-developed map of its location was deemed accurate enough to display on the library’s web page, then it was deemed accurate enough for the GIS data sets.

3) if neither of the above two methods resulted in an accurate location for a public library entity or outlet, then a phone call was made to the entity or outlet.  Using either a printed map or a digital map on a computer screen, GeoLib staff would discuss the location of the entity or outlet relative to geographic features or roads with a librarian or a library assistant and determine the best point location for the entity or outlet.

 

            Once the improved geographic location for a library entity or outlet was obtained, the digital US Bureau of the Census 2000 TIGER street line files were used as the reference map to which the improved library entity or outlet locations were then re-mapped.  This process was used to develop both the library entity and the library outlet GIS data sets.

 

            The development of the legal service areas GIS data set required using the geographic code attribute field contained in the original library entity file (i.e., the PubLib98 file).  This geographic code can take any one of eleven values.  The assigned value indicates whether the legal service area of a library entity corresponds to a city, a county, a metropolitan area, a multi-county area, a school district, or some other unit of geography, either exactly or most nearly.  The boundaries of many of these geographic area features are maintained in the network of lines in the U.S. Census Bureau’s TIGER files.  For consistency’s sake, these features are identified using a standardized Federal Information Processing Standards (FIPS) code that is assigned to each county, place, metropolitan area, or school district.  Using this linear network from the TIGER files and knowing the appropriate FIPS code for the geographic area associated with a library entity’s legal service area allows a GIS program to generate the appropriate polygon representing the legal service area for a particular library entity.

 

            The FIPS code for most geographic areas can be readily obtained from the Internet.  For example, the geographic code for the Juneau Public Library in Alaska is “CO1” in the PubLib98 data file, which means that its legal service area boundaries exactly match those of a county.  Using such a web site, one learns that the FIPS code for the Juneau Borough (county) is “02110.”  A GIS program can then extract the boundaries for the geographic area that corresponds to FIPS code 02110, the Juneau Borough in Alaska, from the 2000 TIGER files.
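
A five-digit county FIPS code is the two-digit state code followed by the three-digit county code, so the Juneau example can be broken down as sketched below.  The function name split_county_fips is hypothetical.  The code is handled as text because a numeric type would drop the leading zero.

```python
def split_county_fips(code: str) -> tuple[str, str]:
    """Split a five-digit county FIPS code into (state, county) parts."""
    if len(code) != 5 or not code.isdigit():
        raise ValueError(f"expected five digits, got {code!r}")
    return code[:2], code[2:]


state, county = split_county_fips("02110")
print(state, county)  # state part "02" is Alaska; "110" is the Juneau Borough
```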

 

            This procedure was used to develop the legal service areas for library entities whose geographic code indicated that their legal service area boundaries conformed exactly or most nearly to a place (city), county, or metropolitan area.  For entities whose legal service areas conformed exactly or most nearly to a multi-county area or a school district, a phone call was made to the library entity to determine exactly which counties or school district(s) were in its legal service area.  From the information provided over the phone, those library entities’ legal service areas could then be derived using either a combination of several county FIPS codes or the FIPS code value(s) for the school district(s) in the entity’s legal service area.

 

            More specific details about the development of each of the specific GIS data sets are provided in the following section.

 


Description of GIS Data Sets

 

            This section describes each of the three GIS data sets in more detail.  Each of the three data sets is maintained in a shapefile (.shp) format and should be capable of being imported into any GIS software package that can read and import shapefiles.
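
For readers who want to confirm what kind of geometry a shapefile holds before loading it into a GIS package, the fixed 100-byte header of a .shp file can be inspected directly.  The sketch below follows the published shapefile header layout and uses only the Python standard library.  The function name read_shp_header and the synthetic header are illustrative assumptions, not part of the delivered data sets.

```python
import struct

# Subset of the shape type codes defined by the shapefile format.
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def read_shp_header(header: bytes) -> dict:
    """Decode the fixed 100-byte header at the start of a .shp file."""
    if len(header) < 100:
        raise ValueError("a shapefile header is 100 bytes long")
    (file_code,) = struct.unpack(">i", header[0:4])  # big-endian
    if file_code != 9994:
        raise ValueError("not a shapefile: unexpected file code")
    (length_words,) = struct.unpack(">i", header[24:28])  # length in 16-bit words
    version, shape_type = struct.unpack("<ii", header[28:36])  # little-endian
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])
    return {
        "bytes": length_words * 2,
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, str(shape_type)),
        "bbox": (xmin, ymin, xmax, ymax),
    }


# Synthetic header built only for demonstration (made-up bounding box).
demo = (
    struct.pack(">i", 9994) + bytes(20) + struct.pack(">i", 50)
    + struct.pack("<ii", 1000, 1)
    + struct.pack("<4d", -180.0, 18.0, -66.0, 72.0)
    + bytes(32)
)
info = read_shp_header(demo)
print(info)
```

With a real file, the same function could be applied to the first 100 bytes read in binary mode.  A point data set would normally report a shape type of Point.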

 

Library Entity GIS Data Set

 

            This is a point data set composed of three separate files entitled

 

            all_entities.shp

            all_entities.shx

            all_entities.dbf

 

and requires approximately 9 megabytes of disk space.

 

            The original library entity data file from the 1998 PLS contained 8,966 library entities.  The GIS data set contains 8,959 entities.  See Appendix A for a discussion of this difference and the specific library entities that are not part of the GIS data set.

 

            With the exception of the addition of the .shp and .shx files and one additional attribute field (named “Accuracy”) in the all_entities.dbf file, there is no difference between the GIS data set for library entities and the 1998 PLS data for library entities obtained directly from NCES.

 

            The .shp and .shx files and the Accuracy attribute field are the only information maintained that is unique to the GIS environment.  The .shp and .shx files maintain the geographic location of the library entities relative to the computer-generated map on which the entities will be displayed in a GIS environment.  The Accuracy field records how accurate the mapped geographic location of a particular library entity should be.

 

            The value of the Accuracy field can be any one of five different values.  These values are “AS0” (5,898 entities), “AS1” (19 entities), “AS2” (6 entities), “AC1” (1,268 entities), and “AC2” (1,768 entities).  These values represent the following levels of geographic accuracy:

 

AS0 -   This address location code indicates an address range geocode and is the most accurate computer-automated geocode available.  The street segment was found in the city/zip code TIGER file that matched that in the library entity’s address.  The street segment also had a from/to address range for both the left and right-hand side of the segment into which the numeric portion of the entity’s address fell.

AS1 -   Same as AS0 except that the street side is unknown.

AS2 -   Same as AS0, with the address interpolated onto a TIGER street segment on which the address range was added.

AC1 -   The library entity point was located through information obtained directly by GeoLib staff. This usually meant a direct telephone conversation with a representative from the library entity.  During such conversations, an attempt was always made to obtain a location accurate to within 100 feet or less.  Other sources of information that could be used for a library entity to receive this accuracy code included very specific and detailed maps or instructions on how to get to the library that may be posted on the library entity’s web page.  See Appendix D for further information on this accuracy classification code.

AC2 -   The library entity point was located through information obtained directly by GeoLib staff.  However, due to the remoteness of the library entity or contradictory information between the digital TIGER map and the information provided by the library entity representative, accuracy to within one-quarter mile or closer cannot be confirmed without an on-site visit.  Accuracy should still be to within one-half mile or less.  Library entity locations taken from secondary sources that are believed to be highly reliable, but that were not independently verified by GeoLib staff, were also given this accuracy classification.  See Appendix D for further information on this accuracy classification code.

 

            The highest level of confidence in placement of a point for a library entity is represented by the AC1 accuracy classification, with a locational accuracy of one-quarter mile or closer, whether rural or urban.  The remaining accuracy classifications are all approximately equal in accuracy but the method of locating the particular point is different for the different accuracy codes.  Accuracy in urban areas should be to within one or two city blocks but accuracy may be less for entities in rural areas due to the longer distances between street intersections.  Accuracy for these other classifications should be within one-half mile or (usually) closer.
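
As a quick consistency check, the entity counts listed above for the five accuracy codes can be tallied.  The sketch below simply adds the published figures.  Grouping the “AS” codes as automated geocodes and the “AC” codes as GeoLib-researched locations follows the code descriptions above.

```python
# Entity counts per accuracy code, as listed in this section.
entity_counts = {"AS0": 5898, "AS1": 19, "AS2": 6, "AC1": 1268, "AC2": 1768}

total = sum(entity_counts.values())
# "AS" codes came from the geocoding vendor; "AC" codes from GeoLib research.
automated = sum(n for code, n in entity_counts.items() if code.startswith("AS"))
researched = total - automated

print(total, automated, researched)
print(f"{researched / total:.1%} of entities were located through GeoLib research")
```

The total agrees with the 8,959 entities stated for this GIS data set.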

 

            Users should keep in mind that many library entities are also central library outlets of a larger library “system.”  Consequently, a library entity can and often does appear in the public library outlet file as well.

 

Library Outlet GIS Data Set

 

            This is a point data set composed of three separate files entitled

 

            all_outlets.shp

            all_outlets.shx

            all_outlets.dbf

 

and requires between four and five megabytes of disk space.

 

            The original library outlet data file from the 1998 PLS contained 16,994 library outlets.  The GIS data set contains 16,112 outlets.  See Appendix B for a discussion of this difference and the specific library outlets that are not part of the GIS data set.

 

            With the exception of the addition of the .shp and .shx files and one additional attribute field (named “Accuracy”) in the all_outlets.dbf file, there is no difference between the GIS data set for library outlets and the 1998 PLS data for library outlets obtained directly from NCES.

 

            The .shp and .shx files and the Accuracy attribute field are the only information maintained that is unique to the GIS environment.  The .shp and .shx files maintain the geographic location of the library outlets relative to the computer-generated map on which the outlets will be displayed in a GIS.  The Accuracy field records how accurate the mapped geographic location of a particular library outlet should be.

 

            The Accuracy field can take on any one of five different values.  These values are “AS0” (9,809 outlets), “AS1” (66 outlets), “AS2” (35 outlets), “AC1” (3,783 outlets), and “AC2” (2,419 outlets).  These values represent the following levels of geographic accuracy:

 

AS0 -   This address location code indicates an address range geocode and is the most accurate geocode available.  The street segment was found in the city/zip code TIGER file that matched that in the library outlet’s address.  The street segment also had a from/to address range for both the left and right-hand side of the segment into which the numeric portion of the outlet’s address fell.

AS1 -   Same as AS0 except that the street side is unknown.

AS2 -   Same as AS0, with the address interpolated onto a TIGER street segment on which the address range was added.

AC1 -   The library outlet point was located through information obtained directly by GeoLib staff.  Most of the time this meant a direct telephone conversation with a representative from the library outlet.  During such conversations, an attempt was always made to obtain a location accurate to within 100 feet or less.  Other sources of information that could be used for a library outlet to receive this accuracy code included very specific and detailed maps or instructions on how to get to the library that may be posted on the library outlet’s web page.  See Appendix D for further information on this accuracy classification code.

AC2 -   The library outlet point was located through information obtained directly by GeoLib staff.  However, due to the remoteness of the library outlet or contradictory information between the digital TIGER map and the information provided by the library outlet representative, accuracy to within one-quarter mile or closer cannot be confirmed without an on-site visit.  Accuracy should still be to within one-half mile or less.  Library outlet locations taken from secondary sources that are believed to be highly reliable, but that were not independently verified by GeoLib staff, were also given this accuracy classification.  See Appendix D for further information on this accuracy classification code.

 

            The highest level of confidence in placement of a point for a library outlet is represented by the AC1 accuracy classification, with a locational accuracy of one-quarter mile or closer.  The remaining accuracy classifications are all approximately equal in accuracy but the method of locating the particular point is different for the different accuracy codes.  Accuracy in urban areas should be to within one or two city blocks but accuracy may be less for outlets in rural areas due to the longer distances between street intersections.  Accuracy for these other classifications should be within one-half mile or (usually) closer.
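
When deciding whether a point is precise enough for a given analysis, the nominal figures stated above can be kept in a small lookup table.  The sketch below is an assumption-laden convenience: the table and function names are hypothetical, and the values are the manual’s stated “should be within” distances, not measured errors.

```python
FEET_PER_MILE = 5280

# Nominal accuracy bounds in miles, taken from the paragraph above:
# one-quarter mile for AC1, one-half mile for the other classifications.
NOMINAL_ACCURACY_MILES = {
    "AC1": 0.25,
    "AS0": 0.5,
    "AS1": 0.5,
    "AS2": 0.5,
    "AC2": 0.5,
}


def nominal_accuracy_feet(code: str) -> float:
    """Return the stated nominal accuracy bound for a code, in feet."""
    return NOMINAL_ACCURACY_MILES[code] * FEET_PER_MILE


print(nominal_accuracy_feet("AC1"), nominal_accuracy_feet("AC2"))
```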

 

            Users should keep in mind that some library outlets are also library entities of a larger library “system.”  Consequently, a library outlet may also appear in the public library entity file.

 

 

Legal Service Areas GIS Data Set

 

            This data set is comprised of three files named:

 

            all_lsa.shp

            all_lsa.shx

            all_lsa.dbf

 

and requires about 28 megabytes of disk storage.

 

            There is one legal service area for each library entity.  Thus, there could be a maximum of 8,959 legal service areas in this GIS data set.  An entity’s particular service area could have a partial or 100% overlap with the service area of another entity.  For this release of the legal service areas GIS data set, there is a total of 6,646 legal service areas, or just over 74% of the possible total. 
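
The coverage figure quoted above follows directly from the two counts, as this short calculation shows.

```python
possible = 8959   # one legal service area per library entity
released = 6646   # legal service areas in this release

share = released / possible
not_yet_developed = possible - released

print(f"{share:.1%} developed, {not_yet_developed} not yet developed")
```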

 

            The NCES and GeoLib agreed that the first release of the legal service areas GIS data set would contain only those areas that either exactly or closely matched a US Census-defined boundary (e.g., a place, county, metropolitan area, or school district).  Any library entity that selected the “Other” category for its legal service area was deferred to a future data release for the development of its legal service area boundaries.  In addition, some library entities had geographic codes of “CI1” or “CI2” (i.e., their legal service areas either exactly or most nearly matched that of a city) for which digital boundary data for the corresponding city limits did not exist in the 2000 TIGER files.  The development of the legal service areas for those library entities was also postponed until the next release of this data set.  See Appendix C for a detailed discussion of which library entities had the development of their legal service area boundaries postponed until future data set releases.

           

            In order to develop as many legal service area boundaries as possible for this GIS data set, the NCES requested that the distinction between exactly or most nearly matching a particular Census geography be ignored.  For instance, regardless of whether a library entity indicated that its legal service area either exactly matched or most nearly matched a city, the geographic area for that city as depicted in the 2000 TIGER files would be used as the entity’s legal service area.  This approximation of a legal service area for those library entities that do not exactly match that assigned geographic area will be fine-tuned in future data releases of this data set.
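
Ignoring the exact versus most-nearly distinction amounts to dropping the trailing digit of the geographic code.  The sketch below assumes, based only on the “CO1”, “CI1”, and “CI2” examples in this manual, that codes consist of two letters followed by a 1 (exact) or a 2 (most nearly).  The function name geography_type is hypothetical, and codes that do not fit that pattern are returned unchanged.

```python
def geography_type(geo_code: str) -> str:
    """Collapse exact/nearest variants, e.g. "CI1" and "CI2" both give "CI"."""
    code = geo_code.strip().upper()
    # Assumed pattern: two letters plus a trailing "1" or "2".
    if len(code) == 3 and code[2] in ("1", "2"):
        return code[:2]
    return code  # anything else is left untouched


print(geography_type("CI1"), geography_type("CI2"), geography_type("CO1"))
```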

 

            The reason for developing the legal service area boundaries is to link the geographic area inside the developed boundaries with the Census 2000 population and demographic data.  This linkage between the geographic area and the Census 2000 data was performed for all library entities for which either an exact or an approximate legal service area boundary could be developed.  See the following section on the database structure of the legal service area GIS data set to obtain a detailed description of the Census 2000 data that has been developed for each library entity’s legal service area.

 

 


Considerations and Limitations in Using the GIS Data Sets

 

            These GIS data sets were developed from a nationwide database maintained by a federal agency after the data was received from a state coordinator that standardized the local library information.  Consequently, there are issues of timeliness and accuracy that are common to all such databases that need to be considered in evaluating whether or not these GIS data sets are suitable for use in a particular analysis.

 

Accuracy of Locations

 

            It is believed that these GIS data sets are the first and only nationwide coverage of public library entities, public library outlets, and the associated legal service area of library entities that is worthy of the term “digital nationwide GIS base map of public libraries.”  That is, these data sets were developed using a standardized methodology in which a library outlet or entity’s location is documented to a known level of accuracy such that the data sets can be used as a map reference upon which other data sets (e.g., 2000 Census data) can be overlaid with a high degree of accuracy and confidence.

 

            Although a systematic study of the distance library points were moved was not undertaken as part of the data development process, library locations that were not geocoded with an AS0, AS1, or AS2 were routinely moved two to three miles from their original geocoded location.  Movement of libraries to new locations that were 10 or more miles from their original location was not uncommon in rural areas.  The maximum distance one library was moved during the development of these data sets is believed to be about 74 miles, in the state of Alaska.
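
Users who wish to measure such displacements themselves (for example, between a vendor-geocoded point and a corrected point) could use a great-circle distance.  The sketch below uses the haversine formula with a mean Earth radius of about 3,958.8 miles.  It is an illustration only, not the method used by GeoLib, and the coordinates are made up.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def miles_between(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Hypothetical points one degree of latitude apart (roughly 69 miles).
moved = miles_between(30.0, -84.0, 31.0, -84.0)
print(round(moved, 1))
```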

 

            The placement of a library location to within approximately one hundred feet of its actual location is sufficient for local library directors evaluating the adequacy of their local library services to their local population or for regional, statewide, or national library research and policymaking.  However, without actual on-site “ground truthing,” the point locations in the GIS data sets for the library entities and outlets are NOT located to a level of precision suitable for detailed local government applications that require extremely high levels of accuracy.  In those situations, being off by one or two city blocks can present major problems (e.g., for a traffic planner modeling transportation flows).

 

            Some specific caveats relative to the library entity and outlet locations that users of these data sets should remember include:

 

1) geocoding vendors assign latitude/longitude locations based on the existence of a street segment with an appropriate address range that matches a correct address provided to them.  If there is both a 500 N. Main St. and a 500 S. Main St. separated by one mile in the same town with the same zip code and the provided address read “North Main” where it should have been “South Main”, then the geocoded location would be off by one mile.  The accuracy code would still be recorded as AS0 though because there is no way the vendor (i.e., the computer) could have known about the error in the provided address.

2) distance estimates by individuals are not necessarily accurate.  During this project, library representatives would estimate distances to be between “one or two miles” when the 2000 TIGER files would show that the total distance between the two points in question is less than one mile.  The GeoLib staff making the telephone call would try to bracket the location of the library outlet or entity in between two streets (e.g., “So your location is between Second and Third St. then, is that correct?”) to ensure accurate placement of the point in the GIS data set but this was not always possible, especially towards the periphery of rural towns.

3) the locations of the entities and outlets are only as good as the descriptions provided to the GeoLib staff by the library director or library representative.  Questionable entity or outlet locations were assigned an AC2 code, and GeoLib staff did not assign AC1 codes if there was any ambiguity.  Users of these data sets must remember that no on-site visits were made to check the accuracy of the information provided over the phone.  Especially in rural areas, street names are often not used at all, and the sole library worker/director may be very familiar with the area but not know any of the street names in the area as given in the 2000 TIGER files.

 

Accuracy of the Database Information

 

            Most of the attribute data in the GIS data sets are taken directly from the 1998 PLS data.  However, there are several database accuracy issues that are specific to the GIS databases that users should remember:

                                                                       


1) a procedural question that needed to be answered when telephone calls were being made to the library representatives is: If a library location, name, address, or phone number has changed relative to the 1998 PLS data that was originally provided to GeoLib, what level of correction should be made to the original database?  After careful consideration, the following decisions were made:

 

A)                The GIS data set, since it is a newly-developed point data set, will reflect the most current location possible of a library entity or outlet.  That is, if a library entity or outlet has moved since its address in the 1998 PLS data, then it will be mapped to its most recent (latest) location.

B)                 Any other changes relative to the 1998 PLS library attribute data will NOT be incorporated into the GIS data sets.  Rather, the original data from the 1998 PLS data will be used in the GIS data set. Otherwise a data integrity problem will develop relative to the original 1998 PLS data.

 

As a result of this decision, the mapped location of all public library entities and outlets in the GIS data sets is correct as of June 1, 2001 (when telephone calls were first initiated) or later.  The attribute data associated with the entities and outlets, however, may not be current, possibly including the address; it will instead match the 1998 PLS data.  Where the mapped location of an outlet or entity does not match the physical address listed in the attribute database, a user should assume that the mapped location is more current and correct.


2) The GIS data sets were developed using the 1998 PLS data.  Any library entity or outlet in that database that has closed since its original incorporation into the PLS data is reflected in the GIS data sets by its absence: once GeoLib determined that an entity or outlet was closed, it was simply not mapped.  By the same token, any entity or outlet relocations that have occurred since 1998 are also reflected in the GIS data sets.  A completely new outlet or entity that opened after the development of the 1998 PLS data, however, is NOT incorporated into the current GIS data sets because it was not listed in the 1998 PLS data.  Such entities or outlets will be incorporated into future releases of the GIS data sets.
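
The update rules described in items 1 and 2 above can be summarized in a short sketch: the mapped point follows the latest known location, the attribute data stay exactly as in the 1998 PLS data, and closed entities or outlets are not mapped.  The function name, the field names (“ADDRESS”, “lat”, “lon”), and the sample values are all hypothetical.

```python
def build_gis_record(pls_record: dict, current_lat_lon: tuple, is_closed: bool = False):
    """Combine unchanged 1998 PLS attributes with the most current location."""
    if is_closed:
        return None  # closed entities or outlets are simply not mapped
    record = dict(pls_record)  # attributes are copied unchanged (rule B)
    record["lat"], record["lon"] = current_lat_lon  # latest location (rule A)
    return record


# Made-up example: the library moved after 1998, so only the point changes.
pls = {"NAME": "Example Branch", "ADDRESS": "100 Old St."}
mapped = build_gis_record(pls, (30.45, -84.28))
print(mapped)
```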

 


Database Structure of the GIS Data Sets

 

            This section describes the unique attribute fields for the GIS data sets.  The attribute fields that are developed by the FSCS and maintained in the original NCES 1998 PLS data are NOT described here.  The reader is referred to the NCES web page for information about obtaining documentation on those attribute fields.

 

Library Entity and Library Outlet GIS Data Sets

 

Field name is the name of the attribute field in the .dbf file.  The field length is the length of the attribute field in bytes.  The data type can be character or numeric.  If it is numeric, it is followed by the overall length of the field and the number of decimal places in the number.  The description discusses what is contained in the attribute field and the valid values for the field if appropriate.                                           

For both of these data sets, there is only one additional attribute field added to the original PLS data.  This attribute field is as follows:

           

            FIELD NAME      LENGTH      DATA TYPE      DESCRIPTION

            ACCURACY        4           Character      Maintains record of how point location for entity or outlet was derived; Valid values are AS0, AS1, AS2, AC1, or AC2

 

 

            <All other fields in the all_outlets.dbf and all_entities.dbf files are maintained and documented

              through the NCES, U.S. Department of Education>
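
The field name, length, and data type described above are stored in the header of every .dbf file, so they can be listed without GIS software.  The sketch below follows the standard dBase III header layout (a 32-byte file header followed by 32-byte field descriptors ending with a 0x0D byte).  The function name read_dbf_fields and the synthetic one-field header are illustrative assumptions, not a copy of the delivered files.

```python
import struct


def read_dbf_fields(data: bytes) -> list[tuple[str, str, int, int]]:
    """List (name, type, length, decimals) for each field in a dBase III header."""
    (header_len,) = struct.unpack("<H", data[8:10])
    fields = []
    pos = 32  # field descriptors start right after the 32-byte file header
    while pos < header_len and data[pos] != 0x0D:  # 0x0D ends the descriptor list
        raw = data[pos:pos + 32]
        name = raw[:11].split(b"\x00", 1)[0].decode("ascii")
        field_type = chr(raw[11])  # "C" = character, "N" = numeric
        length, decimals = raw[16], raw[17]
        fields.append((name, field_type, length, decimals))
        pos += 32
    return fields


# Synthetic header with a single ACCURACY field, built only for demonstration.
file_header = struct.pack("<B3BIHH", 0x03, 101, 6, 1, 0, 65, 5) + bytes(20)
descriptor = b"ACCURACY".ljust(11, b"\x00") + b"C" + bytes(4) + bytes([4, 0]) + bytes(14)
demo = file_header + descriptor + b"\r"

print(read_dbf_fields(demo))
```

Applied to the header bytes of a real .dbf file read in binary mode, the same function would list every attribute field, including those documented by the NCES.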

 

 


Legal Service Areas GIS Data Set

 

Field name is the name of the attribute field in the .dbf file.  The field length is the length of the attribute field in bytes.  The data type can be character or numeric.  If it is numeric, it is followed by the overall length of the field and the number of decimal places in the number.  The description discusses what is contained in the attribute field and the valid values for the field if appropriate.

 

All the demographic and population numbers are from the 2000 Census Redistricting data and represent the totals within the legal service area identified for the library entity.

 

The database structure for the all_lsa.dbf attribute file is as follows:

 

FIELD NAME

FIELD LENGTH

DATA TYPE

DESCRIPTION