Geog5190/6190 GIS and Environmental Health

Exercise 2:  Prevalence of Lyme disease lab

Goal:  In this exercise, you will get some hands-on experience with spatial data, particularly with handling data that come in different projections.  You will also continue your adventure into GIS by trying out some simple spatial operations to select and manipulate data.  We will consider a hypothetical health problem: Lyme disease in Salt Lake City!

STEP 1

Download the following datasets to your computer (remember to use the N:\GeogHealth\<your name> directory as in exercise 1):

Link to data here: exercise2.zip

Here's a description of the shapefiles:

    roads - roads from the Census for Salt Lake County
    censustracts_UTM12 - census tracts, 2000, for Salt Lake County, projected in WGS84, UTM zone 12N, with units of meters.
    censustracts_NAD83 - census tracts, 1990, for Salt Lake County, in unprojected geographic coordinates (NAD83) with units of decimal degrees.
    city - city boundaries for Salt Lake County
    cases - hypothetical disease case locations
    cntry00 - world country map

You will need to uncompress the zip file using the Winzip program.

 

STEP 2

Start up ArcCatalog.  

From the ArcCatalog menu, select File-Connect Folder.  Select the directory where you stored the shapefiles you downloaded.

 

STEP 3

Start ArcGIS - ArcMap.  You can do this from the Windows start button, or if you already have ArcCatalog running, you can select the menu item, Tools-ArcMap.

You have two files for census tracts.  Both of them are valid, just with different projections and coordinate systems.  One is unprojected in geographic coordinates with units of decimal degrees, and is based on the North American Datum 1983 (NAD83), which is one specific model of the spherical shape of the earth.  The other one is projected to units of meters, and is based on the WGS84 datum, a different model of the earth.  Let's try and add these two to the same map:

From ArcCatalog drag and drop the two Census tract shapefiles (censustracts_UTM12 and censustracts_NAD83) into ArcMap.  Note that you may get an error message that one or more layers don't have correct spatial data.  What happens?

Right-click on each layer, and select "zoom to layer" to zoom to each layer.  Note the difference in coordinates at the bottom of the screen for each layer: one layer is in decimal degrees and the other is in meters.  When you zoom to each layer, you can see that they are there, but they just don't line up with one another.  That's what can happen when you don't explicitly set the coordinate system for each spatial dataset that you use.
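To see why the two layers cannot overlay until their coordinate systems are known, it may help to compare the numbers involved.  The sketch below converts a longitude/latitude pair into UTM zone 12N meters on the WGS84 ellipsoid, using the standard series approximation for the Transverse Mercator projection.  The function name geographic_to_utm12n and the sample point (an approximate location for downtown Salt Lake City) are assumptions made for illustration; this is not how ArcGIS is implemented, only a way to see the scale of the difference.

```python
import math

# WGS84 ellipsoid constants
A_AXIS = 6378137.0                    # semi-major axis, meters
FLATTENING = 1 / 298.257223563
E2 = FLATTENING * (2 - FLATTENING)    # first eccentricity squared
EP2 = E2 / (1 - E2)                   # second eccentricity squared

# UTM zone 12 north parameters
K0 = 0.9996                           # scale factor on the central meridian
CENTRAL_MERIDIAN_DEG = -111.0         # central meridian of zone 12
FALSE_EASTING = 500000.0              # meters


def geographic_to_utm12n(lon_deg, lat_deg):
    """Convert decimal degrees to (easting, northing) in meters, UTM 12N."""
    phi = math.radians(lat_deg)
    dlam = math.radians(lon_deg - CENTRAL_MERIDIAN_DEG)

    n = A_AXIS / math.sqrt(1 - E2 * math.sin(phi) ** 2)
    t = math.tan(phi) ** 2
    c = EP2 * math.cos(phi) ** 2
    a = dlam * math.cos(phi)

    # Meridian arc length from the equator to latitude phi
    m = A_AXIS * (
        (1 - E2 / 4 - 3 * E2**2 / 64 - 5 * E2**3 / 256) * phi
        - (3 * E2 / 8 + 3 * E2**2 / 32 + 45 * E2**3 / 1024) * math.sin(2 * phi)
        + (15 * E2**2 / 256 + 45 * E2**3 / 1024) * math.sin(4 * phi)
        - (35 * E2**3 / 3072) * math.sin(6 * phi)
    )

    easting = FALSE_EASTING + K0 * n * (
        a
        + (1 - t + c) * a**3 / 6
        + (5 - 18 * t + t**2 + 72 * c - 58 * EP2) * a**5 / 120
    )
    northing = K0 * (
        m
        + n * math.tan(phi) * (
            a**2 / 2
            + (5 - t + 9 * c + 4 * c**2) * a**4 / 24
            + (61 - 58 * t + t**2 + 600 * c - 330 * EP2) * a**6 / 720
        )
    )
    return easting, northing


# Assumed approximate location of downtown Salt Lake City (lon, lat)
lon, lat = -111.89, 40.76
easting, northing = geographic_to_utm12n(lon, lat)
print("degrees:", lon, lat)
print("meters: ", round(easting), round(northing))
```

The same place is described by numbers of roughly a hundred in one system and by hundreds of thousands to millions in the other, which is why the two tract layers land far apart when ArcMap does not know which system each one uses.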

Let's set the coordinate system for each data set.  First, right-click on each layer and select "remove" to remove them from the map. 

Then in ArcCatalog, right-click on the censustracts_UTM12 shapefile, select properties.  Under the "fields" tab, you will see a list of the fields for the shapefile.  Select the "shape" field.  Note that under "field properties" the spatial reference is "unknown" or "WGS72, UTM zone 12".  We need to set this to WGS84, UTM zone 12.  Press on the "..." button to the right of the spatial reference property.  Then in the Spatial Reference Properties window, press "select", "projected coordinate systems", "utm", "wgs84", "wgs 1984 utm zone 12N.prj".  Then OK out of the windows to finish setting it.

Similarly for the censustracts_NAD83 shapefile: in ArcCatalog, right-click on the censustracts_NAD83 shapefile, select properties.  Under the "fields" tab, you will see a list of the fields for the shapefile.  Select the "shape" field.  Note that under "field properties" the spatial reference is "GCS Assumed Geographic" or "NAD_1983 Geographic".  This means that ArcGIS "guessed" that the units were geographic, decimal degree units.  The guess about the units is right, but we should set the coordinate system explicitly to NAD83 geographic coordinates.  Do not set this one to UTM: this dialog only labels the coordinates already stored in the file and does not convert them, so the label has to match the data.  So, press on the "..." button to the right of the spatial reference property.  Then in the Spatial Reference Properties window, press "select", "geographic coordinate systems", "north america", and pick the NAD 1983 entry (the exact file name, for example "North American Datum 1983.prj", depends on your ArcGIS version).  Then OK out of the windows to finish setting it.

Now try adding the two shapefiles to ArcMap.  Do they line up now?  They should.

Hence, in the future, if you download data from the internet, keep an eye out for any information about coordinate systems, datums, units, projections, etc. because you'll need to specify these for your data layers in ArcGIS.

 

STEP 4

So we've seen in the previous step how ArcGIS can register layers with different coordinate systems as long as we've told ArcGIS what these coordinate systems are.  There's an added twist to all this projection stuff.  It turns out that once the data is in ArcGIS, we can choose to output maps in any number of different projections.

Create a new blank map file by pressing the button on the ArcMap toolbar that looks like a blank sheet of paper (upper left corner of ArcMap).  Then, let's load in cntry00 (the world country map) by simply dragging and dropping the cntry00 file from ArcCatalog into ArcMap.

To select a projection for the map output, in ArcMap, right-click on the main Display area, select properties.  Under the "coordinate system" tab, go to where it says "select coordinate system".  In that box, select "predefined", "projected coordinate systems", "world", and then try any of the world projection systems listed.  Click "apply" to redraw the data in the selected projection system.  Try a few and see how they vary.
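Part of the reason the world projections look so different is that each one stretches the globe in its own way.  As one example, here is a minimal sketch of the spherical Mercator projection, a common world projection, showing how the east-west stretch grows with latitude.  The function names and the sample latitudes are illustrative assumptions, and the sphere radius is a commonly used approximation rather than a value from this exercise.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean radius of a spherical earth, meters


def mercator_xy(lon_deg, lat_deg):
    """Project decimal degrees to spherical Mercator x, y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


def mercator_scale_factor(lat_deg):
    """How much Mercator stretches distances at a given latitude (1 = none)."""
    return 1.0 / math.cos(math.radians(lat_deg))


# Compare the stretch at a few latitudes, from the equator toward the pole
for lat in (0, 40, 60, 80):
    print(lat, round(mercator_scale_factor(lat), 2))
```

The stretch factor is 1 at the equator and keeps growing toward the poles, which is why high-latitude countries look oversized in Mercator compared with the other world projections in the list.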

 

STEP 5

OK, now that we've got some sense of the importance of coordinate systems and projections, let's turn to GIS operations on our spatial data.  To illustrate some operations, we'll consider a hypothetical health problem.  The problem goes like this: recently Salt Lake City has been notified that there has been an increase in cases of Lyme disease in the community around the campus.  The Salt Lake City folks get all worried, and decide they need to do some health education within the most heavily hit neighborhoods.  So what we'll do is compute the prevalence in each census tract, figure out which census tracts have a prevalence higher than 3 per 10,000, and clip the street map to these tracts so that educational flyers can be posted in the neighborhoods.

First, create a new blank map file by pressing the button on the ArcMap toolbar that looks like a blank sheet of paper (upper left corner of ArcMap).  Then, let's load in data by dragging and dropping the following files from ArcCatalog into ArcMap:

    roads
    censustracts_NAD83
    city
    cases

Take a look at the various layers to be sure you understand each one.

First let's figure out where Salt Lake City is on the map.  Salt Lake City is one of the cities in the city layer.  In exercise 1 we used the Find tool to search for data.  This time we will do a more precise Select Query to find the city.  In the "Select" menu, choose "select by attributes".  In the select by attributes window, choose layer as "city" and method as "create new selection".  Then, you will see a list of fields within the layer that you can build a query on.  In this case, double click on "NAME" to move it to the query window.  Follow that by a click of the "=" button, and a double-click on "Salt Lake City" from the list of field values.  The select statement should look like this:

    "NAME" = 'Salt Lake City'

Then press "Apply" and then "Close".  You should find that Salt Lake City is now shown as selected on the map.
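If it helps to think of the query outside of ArcMap, a select by attributes is just a filter over the rows of the attribute table.  Here is a minimal sketch in Python.  The rows are made-up stand-ins for the city layer's table (only the NAME field comes from this exercise), and select_by_attribute is a hypothetical helper, not an ArcGIS function.

```python
# Hypothetical attribute rows standing in for the "city" layer's table
cities = [
    {"FID": 0, "NAME": "Salt Lake City"},
    {"FID": 1, "NAME": "West Valley City"},
    {"FID": 2, "NAME": "Sandy"},
]


def select_by_attribute(rows, field, value):
    """Return the rows whose field equals value exactly (case-sensitive)."""
    return [row for row in rows if row[field] == value]


# Equivalent in spirit to the query  "NAME" = 'Salt Lake City'
selected = select_by_attribute(cities, "NAME", "Salt Lake City")
print([row["FID"] for row in selected])
```

Note that this sketch matches the text exactly, so a misspelled or differently capitalized name selects nothing.  Picking the value from the list in the dialog, as described above, avoids typing mistakes.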

Let's save this selection as its own shapefile so that we can concentrate on a shapefile with just the city.  To do that, right-click on the city layer, and choose "Data", "export data".  Then export "selected features", using the "same coordinate system as the layer's source data", and save it to a meaningful filename.  When asked if you want to load the new data, say yes, and it will show up on the TOC as a new layer.

Notice how the cases surround the city.

 

STEP 6

Now let's compute the prevalence rate of Lyme disease for each census tract.  To do this we need to know how many cases are in each tract, and then divide that count by the tract population.  First, make sure that the attribute table of the census tract layer has a "Pop2000" field.  The case counts can be obtained with a Spatial Join.  Remember in exercise 1 we did a regular relational database join, where we matched on identifiers in different tables.  With a spatial join, we instead match on features that share the same spatial location.  To do a spatial join, right-click on the census tract layer, and select "joins and relates", "join".  Where it asks "what do you want to do to this layer", select "join data from another layer based on spatial location" (i.e. do a spatial join).  Then select the layer to join as "cases".  Note how you can then pick how you want to summarize the case points to each tract.  In this case, we simply want to sum up the number of cases in each tract, so select "sum".  Then give a meaningful name to the output layer that will receive the summed data.

After you run the spatial join, notice that you get a new layer in the TOC.  It should match your census tract layer.  However, if you right-click on the new layer and open its attribute table, you will notice there's a new column at the end called "Count_".  That column holds the number of Lyme disease cases in each tract.
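Under the hood, counting cases per tract comes down to asking, for every case point, which polygon it falls inside.  The sketch below shows one common way to do that, the ray-casting point-in-polygon test, on two made-up square "tracts" and four made-up case points.  All names and coordinates here are assumptions for illustration, and ArcGIS may well use a different algorithm internally.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test.  polygon is a list of (x, y) vertices, not closed."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_per_polygon(polygons, points):
    """Return {polygon id: number of points falling inside that polygon}."""
    counts = {}
    for polygon_id, ring in polygons.items():
        counts[polygon_id] = sum(
            1 for (px, py) in points if point_in_polygon(px, py, ring)
        )
    return counts


# Two hypothetical square tracts sharing an edge, and four hypothetical cases
tracts = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
cases = [(0.5, 0.5), (1.5, 1.0), (3.0, 1.0), (5.0, 5.0)]

case_counts = count_points_per_polygon(tracts, cases)
print(case_counts)
```

The last point lies outside both squares, so it is not counted anywhere, just as a case outside every census tract would not add to any tract's count.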

Let's do a choropleth map of the number of cases in each census tract.  Do you remember how to create a choropleth map?  Here's a hint:  right-click on your new spatial joined layer, and select properties.  Then in the properties window, select the "symbology" tab, followed by quantities on the left side of the window.

How can we use the count to compute the prevalence rate?  What we will do is add a new column in the attribute table and use a formula like in Excel to compute the prevalence.  Right-click on the new spatial joined layer, and open the attribute table.  On the table, press the "Options" button and select "Add field".  For name, enter "Prevalence".  Then choose the "float" data type, with precision "5" and scale "2".  After you press OK, notice that there's a new column called Prevalence at the end.

Use your mouse and click on the column heading "Prevalence" to highlight the entire column.  Then right-click on it and select "Calculate values".  Press "yes" to the field calculator warning.  Then in the box where it says "Prevalence = " build the following formula (you can click on the fields above to help out): 

   10000* [Count_]/ [POP2000]

This is the basic formula for prevalence... it's the number of cases divided by the underlying population at risk times 10,000.  Press OK to run the calculation.
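As a quick check on the arithmetic, the same formula can be sketched in Python.  The function name and the sample numbers are made up for illustration.  The guard for a zero population is an added assumption: the division is undefined for a tract with no residents, so such tracts need to be handled separately.  Rounding to two decimals mirrors the scale of 2 chosen for the Prevalence field.

```python
def prevalence_per_10000(case_count, population):
    """Cases per 10,000 people, or None when the population is zero."""
    if population <= 0:
        return None  # prevalence is undefined without a population at risk
    return round(10000 * case_count / population, 2)


# Hypothetical tracts: (number of cases, population)
examples = [(2, 5000), (3, 4321), (0, 1000), (1, 0)]
for count, pop in examples:
    print(count, pop, prevalence_per_10000(count, pop))
```

For instance, 2 cases in a tract of 5,000 people works out to 10000 * 2 / 5000 = 4 cases per 10,000.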

Go back and create the choropleth map for prevalence.

 

STEP 7

Let's do another select by attribute now to find those census tracts with prevalence greater than 3 per 10,000.  This is an arbitrary cut-off; in a realistic situation perhaps we would use the state prevalence rate as a reasonable cut-off.  From the select menu, choose "select by attributes".  In the select by attributes window, choose layer as the spatial joined layer you created and method as "create new selection".  Then, you will see a list of fields within the layer that you can build a query on.  In this case, double click on "Prevalence" to move it to the query window.  Follow that by a click of the ">" button, and type in 3.  The select statement should look like this:

    "Prevalence" > 3

Then press "Apply" and then "Close".  You should find that the high prevalence tracts are now shown as selected on the map.  Leave these items selected because the next step will rely on selected features.

Now let's do a GIS spatial operation to clip the roads to the selected high prevalence tracts. 

If you use an ArcGIS version of 8.3 or lower: To do that we will use a tool called the Geoprocessing Wizard.  From the tools menu, select Geoprocessing Wizard.  Notice the different operations you can do with this tool:  dissolve, merge, clip, intersect and union. 

If you use an ArcGIS version of 9.0 or higher: Open ArcToolbox and click on "Analysis Tools".  Under "Extract" there is a function called "Clip".  Fill in the corresponding input features, clip features and output features, following the instructions below.

If you click on an operation, you get a description of what the operation does.  For this example, select "clip", and the next button.  The input layer we will clip is "roads".  The polygon clip layer is the spatial joined census tract layer with the high prevalence rates selected.  Since we only want to clip on the selected high prevalence tracts, make sure there's a checkmark on "use selected features only".  Then give a meaningful name to the clipped road output, and run the clip operation.  You should end up with a new layer on the TOC with the clipped roads.
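To get a feel for what a clip does to a road, here is a deliberately simplified sketch.  Instead of clipping against real tract polygons, it clips a single straight road segment against a rectangular window using the Liang-Barsky line clipping algorithm.  The rectangle, the coordinates and the function name are all assumptions for illustration; the ArcGIS clip works on arbitrary polygons and is not implemented this way.

```python
def clip_segment(x0, y0, x1, y1, xmin, ymin, xmax, ymax):
    """Clip the segment (x0, y0)-(x1, y1) to a rectangle (Liang-Barsky).

    Returns the clipped segment as (x0, y0, x1, y1), or None when the
    segment lies entirely outside the rectangle.
    """
    dx, dy = x1 - x0, y1 - y0
    t_enter, t_exit = 0.0, 1.0
    # One (p, q) pair per rectangle side: left, right, bottom, top
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0),
                 (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:
                return None  # parallel to this side and outside it
        else:
            t = q / p
            if p < 0:        # segment is entering across this side
                if t > t_exit:
                    return None
                t_enter = max(t_enter, t)
            else:            # segment is leaving across this side
                if t < t_enter:
                    return None
                t_exit = min(t_exit, t)
    return (x0 + t_enter * dx, y0 + t_enter * dy,
            x0 + t_exit * dx, y0 + t_exit * dy)


# A hypothetical road crossing a unit-square "tract" from west to east
print(clip_segment(-1, 0.5, 2, 0.5, 0, 0, 1, 1))
# A hypothetical road that never touches the square
print(clip_segment(2, 2, 3, 3, 0, 0, 1, 1))
```

Only the part of the road inside the window survives, and roads entirely outside are dropped.  That is the same outcome you should see in the clipped roads layer, just with tract boundaries in place of the rectangle.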

As one last step, let's export the table of clipped street names to a file that we can load into Excel, so that field personnel have a tabular checklist of the roads where they've posted educational flyers.  To do this, right-click on the clipped road layer and open its table.  On the table, press the "Options" button, and select "export".  Export the data to a meaningful filename.  Notice that the default format of data tables in ArcGIS is the dBase file format (.dbf).  This format can be read and modified in Excel, which you might want to try.
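If you ever need to build a similar checklist outside of ArcGIS, a plain comma-separated (CSV) file is another format that Excel opens directly.  Here is a minimal sketch using Python's standard csv module.  The road names, the field names and the extra POSTED column are all made up for illustration, and the sketch writes to text in memory rather than to a file.

```python
import csv
import io

# Hypothetical rows standing in for the clipped roads table
clipped_roads = [
    {"FID": 0, "NAME": "Example St"},
    {"FID": 1, "NAME": "Sample Ave"},
]


def rows_to_csv(rows, fieldnames):
    """Return the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # fields missing from a row are left blank
    return buffer.getvalue()


# POSTED is an empty column for field personnel to tick off
checklist_csv = rows_to_csv(clipped_roads, ["FID", "NAME", "POSTED"])
print(checklist_csv)
```

To save the checklist to disk, the same text could be written to a file with a .csv extension.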

If you want, export a jpg map.

Done!

 

STEP 8

Some of you have asked, "where did the census data come from?"  Census data is usually distributed by the US Census in their own TIGER format.  However, since we are using industry standard Arcview (ArcGIS), we have the luxury of downloading shapefile versions of the TIGER data.  To get the census data:

Start ArcGIS - ArcMap

From the menu, select File-Add data from Geography Network.  You will get ESRI's online data search engine called the Geography Network.  From "Content Type" select "<All Content Types>". From "Content Theme" select "<All Content Themes>".  In "Keyword", enter in "TIGER" and press the search button.

When you get the results, under "Downloadable data", you will see:

    Publisher: U.S. Bureau of the Census
    Content Title: TIGER/Line Files, Redistricting Census 2000
    Coverage Area: United States

If you press "link to data", you will reach a download site for census data in shapefile format.

Congratulations, you are all done!  Enjoy the week!