Geog5190/6190 GIS and Environmental Health
Exercise 2: Prevalence of Lyme disease lab
Goal: In this exercise, you will get some hands-on
experience with spatial data, particularly with handling data from
different projections. You will also continue your adventure into GIS by
trying out some simple spatial operations to select and manipulate data.
We will consider a hypothetical health problem: Lyme
disease in
STEP 1
Download the following datasets to your computer (remember to use the N:\GeogHealth\<your name> directory as in exercise 1):
Link to data here: exercise2.zip
Here's a description of the shapefiles:
roads - roads from the Census for
censustracts_UTM12 - census tracts, 2000 for
censustracts_NAD83 - census tracts, 1990 for
city - city boundaries for
cases - hypothetical disease case locations
cntry00 - world country map
You will need to uncompress the zip file using the Winzip program.
STEP 2
Start up ArcCatalog.
From the ArcCatalog menu, select File-Connect Folder. Select the directory where you stored the shapefiles you downloaded.
STEP 3
Start ArcGIS - ArcMap. You can do this from the Windows start button, or if you already have ArcCatalog running, you can select the menu item, Tools-ArcMap.
You have two files for census tracts. Both are valid; they simply use different projections and coordinate systems. One is unprojected, in geographic coordinates with units of decimal degrees, and is based on the North American Datum 1983 (NAD83), which is one specific model of the shape of the earth. The other is projected, with units of meters, and is based on the WGS84 datum, a different model of the earth. Let's try to add these two to the same map:
From ArcCatalog drag and drop the two Census tract shapefiles (censustracts_UTM12 and censustracts_NAD83) into ArcMap. Note that you may get an error message that one or more layers don't have correct spatial data. What happens?
Right-click on each layer, and select "zoom to layer" to zoom to each layer. Note the difference in coordinates at the bottom of the screen for each layer. When you zoom to each layer, you can see that they are there, but they just don't line up with one another. That's what can happen when you don't explicitly set the coordinate system for each spatial dataset that you use.
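To see why the two layers land in different places, here is a minimal sketch (not part of the ArcGIS workflow) that projects a latitude/longitude pair to UTM meters using the standard transverse Mercator series on the WGS84 ellipsoid. The function names and the sample point are hypothetical, and it handles the northern hemisphere only. The same location is roughly (-111, 40) in decimal degrees but about (500000, 4430000) in UTM zone 12 meters, so the layers cannot overlap unless the software knows which system each one uses.

```python
import math

# WGS84 ellipsoid constants (standard published values).
A_AXIS = 6378137.0                 # semi-major axis, meters
FLATTENING = 1 / 298.257223563
K0 = 0.9996                        # UTM scale factor on the central meridian


def utm_zone(lon_deg):
    """Return the UTM zone number (1-60) containing a longitude."""
    return int((lon_deg + 180) // 6) % 60 + 1


def geographic_to_utm(lat_deg, lon_deg):
    """Project latitude/longitude (degrees) to UTM easting/northing (meters).

    Illustrative sketch: northern hemisphere only, no polar special cases.
    """
    zone = utm_zone(lon_deg)
    lon0 = math.radians((zone - 1) * 6 - 180 + 3)    # central meridian
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)

    e2 = FLATTENING * (2 - FLATTENING)                # eccentricity squared
    ep2 = e2 / (1 - e2)                               # second eccentricity squared
    n = A_AXIS / math.sqrt(1 - e2 * math.sin(lat) ** 2)
    t = math.tan(lat) ** 2
    c = ep2 * math.cos(lat) ** 2
    a = (lon - lon0) * math.cos(lat)

    # Meridian arc length from the equator to this latitude.
    m = A_AXIS * (
        (1 - e2 / 4 - 3 * e2**2 / 64 - 5 * e2**3 / 256) * lat
        - (3 * e2 / 8 + 3 * e2**2 / 32 + 45 * e2**3 / 1024) * math.sin(2 * lat)
        + (15 * e2**2 / 256 + 45 * e2**3 / 1024) * math.sin(4 * lat)
        - (35 * e2**3 / 3072) * math.sin(6 * lat)
    )

    easting = 500000 + K0 * n * (
        a
        + (1 - t + c) * a**3 / 6
        + (5 - 18 * t + t**2 + 72 * c - 58 * ep2) * a**5 / 120
    )
    northing = K0 * (
        m
        + n * math.tan(lat) * (
            a**2 / 2
            + (5 - t + 9 * c + 4 * c**2) * a**4 / 24
            + (61 - 58 * t + t**2 + 600 * c - 330 * ep2) * a**6 / 720
        )
    )
    return zone, easting, northing


# Hypothetical sample point on the central meridian of zone 12.
zone, easting, northing = geographic_to_utm(40.0, -111.0)
print(zone, round(easting), round(northing))
```

In practice you would let ArcGIS (or a projection library) do this conversion; the point of the sketch is only that degrees and meters are very different numbers for the same place.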
Let's set the coordinate system for each data set. First, right-click on each layer and select "remove" to remove them from the map.
Then in ArcCatalog, right-click
on the censustracts00_UTM12 shapefile, select
properties. Under the "fields" tab, you will see a list of the
fields for the shapefile. Select the
"shape" field. Note that under "field
properties" the spatial reference is "unknown" or “WGS72, UTM
zone
Similarly, in ArcCatalog, right-click on the censustracts_NAD83 shapefile and select properties. Under the "fields" tab, you will see a list of the fields for the shapefile. Select the "shape" field. Note that under "field properties" the spatial reference is "GCS Assumed Geographic" or “NAD_1983 Geographic”. This means that ArcGIS "guessed" that the units were geographic, decimal degree units. However, we should set this to coordinates based on the same projection as the censustracts00. So, press the "..." button to the right of the spatial reference property. Then in the Spatial Reference Properties window, press "select", "projected coordinate systems", "utm", "wgs84", "wgs 1984 utm zone 12N.prj". Then OK out of the windows to finish setting it.
Now try adding the two shapefiles to ArcMap. Do they line up now? They should.
Hence, in the future, if you download data from the internet, keep an eye out for any information about coordinate systems, datums, units, projections, etc. because you'll need to specify these for your data layers in ArcGIS.
STEP 4
So we've seen in the previous step how ArcGIS can register layers with different coordinate systems as long as we've told ArcGIS what these coordinate systems are. There's an added twist to all this projection stuff. It turns out that once the data is in ArcGIS, we can choose to output maps in any number of different projections.
Create a new blank map file by pressing the button on the ArcMap toolbar that looks like a blank sheet of paper (upper left corner of ArcMap). Then, let's load in the cntry00 (world country map) by simply dragging and dropping the cntry00 file from ArcCatalog into ArcMap.
To select a projection for the map output, in ArcMap, right-click on the main Display area and select properties. Under the "coordinate system" tab, go to where it says "select coordinate system". In that box, select "predefined", "projected coordinate systems", "world", and then try any of the world projection systems listed. Click "apply" to redraw the data in the selected projection system. Try a few and see how they vary.
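If you are curious why the world map changes shape from one projection to the next, the sketch below compares two simple spherical projections, Plate Carrée (equirectangular) and Mercator. These two are chosen only for illustration and may not be the ones you pick in ArcMap; the function names and the spherical earth radius are assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean spherical radius, an approximation


def plate_carree(lat_deg, lon_deg):
    """Equirectangular projection: x and y are simply scaled degrees."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.radians(lat_deg)
    return x, y


def mercator(lat_deg, lon_deg):
    """Spherical Mercator: y is stretched more and more toward the poles."""
    lat = math.radians(lat_deg)
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


# Same input coordinates, different map positions.
for lat in (0, 30, 60):
    print(lat, plate_carree(lat, 0)[1], mercator(lat, 0)[1])
```

Both projections agree along the equator, but Mercator places high-latitude features much farther from it, which is why countries near the poles look enlarged in that projection.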
STEP 5
OK, now that we've got some sense of the importance of
coordinate systems and projections, let's turn to GIS operations on our spatial
data. To illustrate some operations, we'll consider a hypothetical health
problem. The problem goes like this, recently
First, create a new blank map file by pressing the button on the ArcMap toolbar that looks like a blank sheet of paper (upper left corner of ArcMap). Then, let's load in data by dragging and dropping the following files from ArcCatalog into ArcMap:
roads
censustracts_NAD83
city
cases
Take a look at the various layers to be sure you understand each one.
First let's figure out where
"NAME" = ‘
Then press "Apply" and then "Close". You should find that the campus is now shown as selected on the map.
Let's save this selection as its own shapefile so that we can concentrate on a shapefile with just the city. To do that, right-click on the city layer, and choose "Data", "export data". Then export "selected features", using the "same coordinate system as the layer's source data", and save it to a meaningful filename. When asked if you want to load the new data, say yes, and it will show up on the TOC as a new layer.
Notice how the cases surround the city.
STEP 6
Now let's compute the prevalence rate of Lyme disease for each census
tract. To do this we need to know how many cases are in each tract and
divide that count by the tract population. This can be done with a Spatial
Join. Remember in exercise 1 we did a
regular relational database join, where we matched on identifiers in different
tables. Make sure that the attribute table of the census tract layer has a “Pop
After you run the spatial join, notice that you get a new layer in the TOC. It should match your census tract layer. However, if you right-click on the new layer and open its attribute table, you will notice a new column at the end called "Count_". That column holds the number of Lyme disease cases in each tract.
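Conceptually, this kind of spatial join asks, for every case point, which tract polygon contains it, and then tallies the answers. The sketch below illustrates that idea with a basic ray-casting point-in-polygon test. It is not how ArcGIS implements the join, and the tract IDs, square polygons, and case coordinates are made up for the example.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_cases_per_tract(tracts, cases):
    """Return {tract_id: number of case points falling inside that tract}."""
    counts = {tract_id: 0 for tract_id in tracts}
    for x, y in cases:
        for tract_id, polygon in tracts.items():
            if point_in_polygon(x, y, polygon):
                counts[tract_id] += 1
                break  # a case belongs to at most one tract
    return counts


# Hypothetical tracts (two adjacent squares) and case locations.
tracts = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
cases = [(2, 3), (5, 5), (15, 5), (25, 5)]
print(count_cases_per_tract(tracts, cases))
```

The last case falls outside both squares, so it is not counted for any tract, just as a case outside every census tract would not contribute to any tract's count.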
Let's do a choropleth map of the number of cases in each census tract. Do you remember how to create a choropleth map? Here's a hint: right-click on your new spatial joined layer, and select properties. Then in the properties window, select the "symbology" tab, followed by quantities on the left side of the window.
How can we use the count to compute the prevalence rate? What we will do is add a new column in the attribute table and use a formula like in Excel to compute the prevalence. Right-click on the new spatial joined layer, and open the attribute table. On the table, press the "Options" button and select "Add field". For name, enter "Prevalence". Then choose the "float" data type, with precision "5" and scale "2". After you press OK, notice that there's a new column called Prevalence at the end.
Use your mouse and click on the column heading "Prevalence" to highlight the entire column. Then right-click on it and select "Calculate values". Press "yes" to the field calculator warning. Then in the box where it says "Prevalence = " build the following formula (you can click on the fields above to help out):
10000* [Count_]/ [POP2000]
This is the basic formula for prevalence... it's the number of cases divided by the underlying population at risk times 10,000. Press OK to run the calculation.
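As a quick check on the arithmetic, here is a small sketch of the same calculation outside ArcGIS. The function name and the sample numbers are hypothetical; the guard against a zero population is an added assumption, since dividing by zero would otherwise fail.

```python
def prevalence_per_10000(case_count, population):
    """Cases per 10,000 people, rounded to two decimals like the new field."""
    if population <= 0:
        return None  # no population at risk, so prevalence is undefined
    return round(10000 * case_count / population, 2)


# Hypothetical tract: 3 cases among 6,000 residents.
print(prevalence_per_10000(3, 6000))
```

A tract with 3 cases and 6,000 residents works out to 5 cases per 10,000 people.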
Go back and create the choropleth map for prevalence.
STEP 7
Let's do another select by attribute now to find those census tracts with prevalence greater than 3 per 10,000. This is an arbitrary cut-off; in a realistic situation, perhaps we would use the state prevalence rate as a reasonable cut-off. From the select menu, choose "select by attributes". In the select by attributes window, choose layer as the spatial joined layer you created and method as "create new selection". Then, you will see a list of fields within the layer that you can build a query on. In this case, double click on "Prevalence" to move it to the query window. Follow that by a click of the ">" button, and type in 3. The select statement should look like this:
"Prevalence" >3
Then press "Apply" and then "Close". You should find that the high prevalence tracts are now shown as selected on the map. Leave these items selected because the next step will rely on selected features.
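A select by attribute is simply a filter over the rows of the attribute table. The sketch below shows the same "Prevalence" > 3 query on a small list of records; the tract IDs and prevalence values are invented for illustration, and the helper name is hypothetical.

```python
def select_by_attribute(records, field, threshold):
    """Return the records whose value in `field` is greater than `threshold`."""
    return [
        record for record in records
        if record[field] is not None and record[field] > threshold
    ]


# Hypothetical attribute table rows.
tracts = [
    {"TRACT": "A", "Prevalence": 5.0},
    {"TRACT": "B", "Prevalence": 3.0},
    {"TRACT": "C", "Prevalence": 0.0},
    {"TRACT": "D", "Prevalence": None},
]
high = select_by_attribute(tracts, "Prevalence", 3)
print([row["TRACT"] for row in high])
```

Note that the comparison is strict, so a tract with a prevalence of exactly 3 is not selected.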
Now let's do a GIS spatial operation to clip the roads to the selected high prevalence tracts.
If you use an ArcGIS version of 8.3 or lower: To do that we will use a tool called the Geoprocessing Wizard. From the tools menu, select Geoprocessing Wizard. Notice the different operations you can do with this tool: dissolve, merge, clip, intersect and union.
If you use an ArcGIS version of 9.0 or higher: Open ArcToolbox and click on “Analysis Tools”. There is a function called “Clip” under “Extract”. Enter the corresponding input features, clip features and output features, following the instructions below.
If you click on an operation, you get a description of what the operation does. For this example, select "clip", and the next button. The input layer we will clip is "roads". The polygon clip layer is the spatial joined census tract layer with the high prevalence rates selected. Since we only want to clip on the selected high prevalence tracts, make sure there's a checkmark on "use selected features only". Then give a meaningful name to the clipped road output, and run the clip operation. You should end up with a new layer on the TOC with the clipped roads.
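To get a feel for what a clip does, the sketch below trims a single road segment to a rectangular area using the Liang-Barsky line clipping method. This is a deliberate simplification: the real clip operation works against arbitrary tract polygons, whereas this example assumes an axis-aligned box, and the function name and coordinates are hypothetical.

```python
def clip_segment_to_box(p0, p1, box):
    """Clip the segment p0-p1 to box = (xmin, ymin, xmax, ymax).

    Returns the clipped segment as a pair of points, or None if the
    segment lies entirely outside the box (Liang-Barsky method).
    """
    (x0, y0), (x1, y1) = p0, p1
    xmin, ymin, xmax, ymax = box
    dx, dy = x1 - x0, y1 - y0
    t_enter, t_exit = 0.0, 1.0

    # One (p, q) pair per box edge: left, right, bottom, top.
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0),
                 (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:
                return None      # parallel to this edge and outside it
            continue
        t = q / p
        if p < 0:                # segment is entering across this edge
            if t > t_exit:
                return None
            t_enter = max(t_enter, t)
        else:                    # segment is leaving across this edge
            if t < t_enter:
                return None
            t_exit = min(t_exit, t)

    return ((x0 + t_enter * dx, y0 + t_enter * dy),
            (x0 + t_exit * dx, y0 + t_exit * dy))


# Hypothetical road crossing a 10 x 10 area from west to east.
print(clip_segment_to_box((-5, 5), (15, 5), (0, 0, 10, 10)))
```

Only the part of the road inside the box is kept, which is the same outcome you should see for the roads inside the selected high prevalence tracts.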
As one last step, let's export the table of clipped street names to a file that we can load into Excel, so that field personnel have a tabular checklist of the roads where they've posted educational flyers. To do this, right-click on the clipped road layer and open its table. On the table, press the "Options" button, and select "export". Export the data to a meaningful filename. Notice that the default format of data tables in ArcGIS is the dBase file format (.dbf). This format can be read and modified in Excel, which you might want to try.
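If you ever need the same checklist outside ArcGIS, a comma-separated (CSV) file is another table format that Excel opens directly. The sketch below writes a small table with Python's standard csv module. The field names "NAME" and "POSTED" and the street names are assumptions made up for this example, not the actual fields of the roads layer.

```python
import csv
import io


def export_table(rows, fieldnames, stream):
    """Write a list of dict rows to an open text stream as CSV."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical clipped road names with an empty column for field staff.
roads = [
    {"NAME": "Main St", "POSTED": ""},
    {"NAME": "Oak Ave", "POSTED": ""},
]
buffer = io.StringIO()
export_table(roads, ["NAME", "POSTED"], buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, you could pass a file opened with open("checklist.csv", "w", newline="") as the stream; the filename here is only a placeholder.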
If you want, export a jpg map.
Done!
STEP 8
Some of you have asked, "where did the census data come from?" Census data is usually distributed by the US Census in their own TIGER format. However, since we are using the industry-standard ArcView (ArcGIS), we have the luxury of downloading shapefile versions of the TIGER data. To get the census data:
Start ArcGIS - ArcMap
From the menu, select File-Add data from Geography Network. You will get ESRI's online data search engine called the Geography Network. From "Content Type" select "<All Content Types>". From "Content Theme" select "<All Content Themes>". In "Keyword", enter in "TIGER" and press the search button.
When you get the results, under "Downloadable data", you will see:
Publisher:
Content Title: TIGER/Line Files, Redistricting
Census 2000
Coverage Area:
If you press "link to data", you will reach a download site for census data in shapefile format.
Congratulations, you are all done! Enjoy the week!