Exercise 4: Spatial Statistics Lab
Goal: In this exercise, you will explore several
spatial statistics. You will get hands-on experience with
CrimeStat (http://www.icpsr.umich.edu/NACJD/crimestat.html),
a free program originally designed for analyzing crime locations (as its
name suggests). The program is convenient because it reads and writes shapefiles for ArcGIS. We
will consider another hypothetical problem: racial/ethnic differences in
infant mortality in
Context: According to the
You will calculate statistics that characterize the
location of infant deaths in
STEP 1
Download the following datasets to your computer (remember to use the C:\GeogHealth\<your name> directory as before):
Link to data here: exercise4.zip
Here's a description of the shapefiles:
whiteinfantdeaths - infant death
locations in
blackinfantdeaths - infant death
locations in
Note: the infant death data are fake, though a county health department would have similar real data.
You will need to uncompress the zip file using the WinZip program.
STEP 2
Start up ArcCatalog.
From the ArcCatalog menu, select File-Connect Folder. Select the directory where you stored the shapefiles you downloaded.
STEP 3
Let's first explore the racial distributions of white and
black populations in
Start up ArcMap with a new blank
map. Drag the
Where do you think most blacks live? How about
whites? Create a choropleth map for the black
population (do you remember how from exercises 1 and 2?). The
attribute field for the black population is called
"Black". After you've created the choropleth
map for blacks, try creating one for whites. How are they distributed
across
Let's now compute central tendency (spatial mean) and
dispersion statistics (spatial standard deviation and standard deviational
ellipse) for the two populations. We will use
the centroid of each census tract as a
"point" estimate of its population. We will do this in CrimeStat, but first we need to generate x,y coordinates for the census
tract centroids. Right-click on the
You should now have two new columns at the end of the table called latitude and longitude. Right-click the "longitude" heading in the table, select "calculate values", and press "yes" to ignore the warning. This opens the field calculator, which lets you compute the values of the longitude field. Put a checkmark where it says "advanced". Then copy and paste the following calculation into the "pre-logic VBA script code" window:
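' Despite its name, dblArea holds the centroid's x coordinate
Dim dblArea As Double
Dim pArea As IArea
' Treat the tract polygon as an IArea so its centroid can be read
Set pArea = [shape]
dblArea = pArea.Centroid.X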
Then in the window where it says "longitude =", type this:
dblArea
Then press OK to compute the longitudes. The code you copied and pasted is Visual Basic for Applications (VBA). ArcGIS has an embedded programming language that allows for all sorts of programming trickery. Here we used it to compute the x location of the centroid of each census tract polygon. It's a bit beyond the level of the class to expect you to learn VBA in addition to using a GIS, so don't hesitate to ask me for help when you need it.
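If you are curious what a polygon centroid calculation looks like, here is a minimal sketch in Python. It assumes a simple (non-self-intersecting) polygon treated as flat, and uses the standard shoelace-based centroid formula. The function name polygon_centroid and the rectangle coordinates are made up for illustration, and ArcGIS's own implementation may differ in detail.

```python
def polygon_centroid(ring):
    """Return the (x, y) area centroid of a simple polygon.

    ring is a list of (x, y) vertices in order around the boundary;
    the first vertex does not need to be repeated at the end.
    """
    n = len(ring)
    twice_area = 0.0  # twice the signed area (shoelace formula)
    cx = 0.0
    cy = 0.0
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area), and 6 * area = 3 * twice_area
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


# Hypothetical rectangular "tract" (illustrative coordinates only)
print(polygon_centroid([(0, 0), (4, 0), (4, 2), (0, 2)]))  # → (2.0, 1.0)
```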
Repeat the field calculation step for the latitude field, but this time use "y" instead of "x" in the code, like this:
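' Despite its name, dblArea holds the centroid's y coordinate
Dim dblArea As Double
Dim pArea As IArea
' Treat the tract polygon as an IArea so its centroid can be read
Set pArea = [shape]
dblArea = pArea.Centroid.Y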
After making the field calculations, you should see that the longitude and latitude columns have been filled in with decimal degrees. Let's now export the table to a dbf file that can be read by CrimeStat. While the attribute table is still open, click on options and select export. Save the table under a meaningful file name in your GeogHealth folder. When asked whether you want to add the table to ArcMap, select No.
Now, start CrimeStat by double-clicking the crimestat.exe file in your data files. The lab computers may not allow you to run it; if not, save the dbf table to a removable disk so that you can continue at home. Click past the title screen to get to the main interface. Notice how the functionality in CrimeStat is arranged into tabs. First you set up your data file, then you can run different analyses: spatial description and/or spatial modeling.
Where it says primary file, press "select files". Set the type to "dbf", give it the filename of the table you just saved, and press OK. Then where it asks for X and Y, set the file to your dbf file, and set the column for X to longitude and the column for Y to latitude. Where it asks for Z, give it the "White" population field. That way the spatial mean we're about to calculate will be weighted by the white population. Set the coordinate system to "spherical", since we are supplying angular coordinates in decimal degrees.
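To see what weighting by a Z field does, here is a minimal sketch of a weighted mean center in Python. The function name weighted_mean_center and the sample numbers are made up for illustration; this is the textbook weighted average of the coordinates, and is not taken from CrimeStat's source.

```python
def weighted_mean_center(xs, ys, weights):
    """Return the mean center of points, weighting each point by a value
    (for example, the population of the tract the centroid represents)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    mean_x = sum(w * x for x, w in zip(xs, weights)) / total
    mean_y = sum(w * y for y, w in zip(ys, weights)) / total
    return mean_x, mean_y


# Three hypothetical centroids; the third has twice the population,
# so it pulls the mean center toward itself.
print(weighted_mean_center([0, 10, 20], [0, 0, 10], [1, 1, 2]))  # → (12.5, 5.0)
```

With equal weights, the same function gives the ordinary (unweighted) mean center.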
Then click on the "spatial description" tab at
the top. Under "spatial distribution", put checkmarks next
to: mean center, standard deviational
ellipse, Moran's I, and Geary's
C.
For mean center and standard deviational ellipse, press
"save result to". Save the output to "Arcview Shp" and give the two
results names like mcsd_white and sde_white.
This will tell CrimeStat to make shapefiles
of the mean center and ellipse for us to view in ArcMap
afterwards.
Then click compute.
You should get a text listing of the results:
Mcsd = mean center, standard
deviation
sde = standard deviational ellipse
Moran's I and Geary's C
Can you figure out the x and y location of the mean center? How about the standard deviation in the x and y directions? What is the value of Moran's I? What does it mean? How about Geary's C?
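To help with interpreting the two autocorrelation statistics, here is a minimal sketch of the textbook formulas in Python. It assumes inverse-distance weights between points with no two points at the same location; the function names and sample data are made up for illustration, and CrimeStat's exact weighting options are described in its manual.

```python
import math


def inverse_distance_weights(points):
    """Build a weight matrix with w[i][j] = 1 / distance(i, j), and 0 on the diagonal.
    Assumes no two points share the same location."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dist = math.hypot(points[i][0] - points[j][0],
                                  points[i][1] - points[j][1])
                weights[i][j] = 1.0 / dist
    return weights


def morans_i(values, weights):
    """Moran's I: values above the expected -1/(N-1) suggest that
    similar values sit near each other (positive autocorrelation)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_w = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / total_w) * cross / sum(d * d for d in dev)


def gearys_c(values, weights):
    """Geary's C: values below 1 suggest positive autocorrelation,
    values above 1 suggest negative autocorrelation."""
    n = len(values)
    mean = sum(values) / n
    total_w = sum(sum(row) for row in weights)
    sq_diff = sum(weights[i][j] * (values[i] - values[j]) ** 2
                  for i in range(n) for j in range(n))
    denom = sum((v - mean) ** 2 for v in values)
    return ((n - 1) / (2.0 * total_w)) * sq_diff / denom


# Four hypothetical centroids in a row, with population rising left to right
pts = [(0, 0), (1, 0), (2, 0), (3, 0)]
pop = [10, 20, 30, 40]
w = inverse_distance_weights(pts)
print(morans_i(pop, w), gearys_c(pop, w))
```

Note that the two statistics run in opposite directions: a high Moran's I and a low Geary's C both point to neighboring tracts having similar values.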
Repeat the CrimeStat analysis for the black population, using the "Black" field for Z. When you save the mean center and standard deviational ellipse, be sure to give them different names, like mcsd_black and sde_black.
Now let's see what these look like in ArcMap.
In ArcCatalog, look for the shapefiles
you just created in CrimeStat. CrimeStat adds a prefix (MC, SDD, or SDE) to the name you entered, so they should look
something like:
MCmcsdwhite.shp (mean center for the white population)
MCmcsdblack.shp (mean center for the black population)
SDDmcsdwhite.shp (standard deviation for the white population)
SDDmcsdblack.shp (standard deviation for the black population)
SDEwhite.shp (standard deviational ellipse for the white population)
SDEblack.shp (standard deviational ellipse for the black population)
Drag them into ArcMap's Table of
Contents.
Describe the differences between the statistics for the two populations. Are the centers located
where you thought they would be based on the choropleth
maps you saw earlier?
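If you want a feel for what the dispersion statistics measure, here is a minimal sketch in Python. It assumes unweighted points and the plain N divisor, and it derives the ellipse from the covariance of the coordinates. CrimeStat applies its own divisors and scaling (see its manual), so expect its numbers to differ somewhat; the function name dispersion_stats and the sample points are made up for illustration.

```python
import math


def dispersion_stats(points):
    """Return a dict with the standard deviation in x and y, the standard
    distance, and a covariance-based ellipse (axis lengths and orientation)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Variances and covariance about the mean center (divisor N)
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / n
    syy = sum((p[1] - mean_y) ** 2 for p in points) / n
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    # Eigenvalues of the 2x2 covariance matrix give the spread along
    # the ellipse's major and minor axes
    mean_var = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    major = math.sqrt(mean_var + root)
    minor = math.sqrt(max(0.0, mean_var - root))
    # Orientation of the major axis, counterclockwise from the x axis
    angle_deg = math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))
    return {
        "sd_x": math.sqrt(sxx),
        "sd_y": math.sqrt(syy),
        "standard_distance": math.sqrt(sxx + syy),
        "ellipse_major": major,
        "ellipse_minor": minor,
        "ellipse_angle_deg": angle_deg,
    }


# Hypothetical points spread twice as far east-west as north-south
print(dispersion_stats([(0, 0), (4, 0), (0, 2), (4, 2)]))
```

A long, narrow ellipse means the population is stretched out along one direction, which the single standard distance value cannot show.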
STEP 4
Now take a look at the whiteinfantdeaths and blackinfantdeaths shapefiles in ArcMap. Do the patterns of mortality seem to match the distributions of their underlying populations?
Let's compute the same central tendency and dispersion
statistics in Crimestat for these cases.
Where it says primary file, press
"select files". Set type to
"shp", and give it the filename whiteinfantdeaths, then press OK. Then where it asks
for X and Y, set the file to whiteinfantdeaths,
and set column to X and Y, respectively. This time, do not give it a Z
field since we want Crimestat to use just the
locations of the cases, and not weight them by any value. Coordinate
system is "Projected (Euclidean)" since the shapefile
is spatially referenced to UTM.
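The coordinate system setting matters because it determines how distances are measured. Here is a minimal sketch in Python contrasting the two cases. It assumes a spherical Earth with a mean radius of 6371 km and uses the haversine formula for angular coordinates; the function names are made up for illustration and are not CrimeStat's.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for a spherical model


def great_circle_km(lon1, lat1, lon2, lat2):
    """Distance in km between two points given in decimal degrees
    (the "spherical" case), using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def euclidean(x1, y1, x2, y2):
    """Straight-line distance between two projected points (the
    "Projected (Euclidean)" case), in the units of the projection,
    e.g. meters for UTM."""
    return math.hypot(x2 - x1, y2 - y1)


# One degree of longitude is much shorter at 60 degrees north than at the equator
print(great_circle_km(0, 0, 1, 0), great_circle_km(0, 60, 1, 60))
print(euclidean(0, 0, 3, 4))  # → 5.0
```

This is why decimal degrees cannot simply be treated as flat x,y values, while UTM coordinates can.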
Then compute the mean center, standard deviation, and standard deviational ellipse. Be sure to press "save result to" and save the results to a meaningful shapefile name. Note that you cannot run Moran's I and Geary's C this time, because we are looking purely at location: we are not providing any attribute value for which to measure spatial autocorrelation.
Repeat for blackinfantdeaths.
After you've computed the statistics, load the shapefiles into ArcMap. Compare the statistics to those previously computed for the underlying population.
STEP 5
Let's now look at clustering of the mortality cases. Let's do a Nearest Neighbor Analysis.
Where it says primary file, press "select files". Set type to "shp", and give it the filename whiteinfantdeaths, then press OK. Then where it asks for X and Y, set the file to whiteinfantdeaths, and set column to X and Y, respectively. This time, do not give it a Z field since we want Crimestat to use just the locations of the cases, and not weight them by any value. Coordinate system is "Projected (Euclidean)" since the shapefile is spatially referenced to UTM.
If you have anything checked on the "spatial distribution" tab, uncheck it. Then click on the "distance analysis" tab. Check "nearest neighbor" and choose to compute only 1 nearest neighbor (if you entered a larger number here, you'd be doing the K-th order nearest neighbor analysis we talked about in lecture). Let's not worry about border correction.
Press "compute" to run the statistics.
What's the nearest neighbor result? Does it indicate clustering or not?
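As a reminder of how to read the result, here is a minimal sketch of the first-order nearest neighbor index in Python. It assumes projected coordinates, no border correction, and, when no study area is given, the area of the bounding rectangle of the points. The function name nearest_neighbor_index and the sample points are made up for illustration.

```python
import math


def nearest_neighbor_index(points, area=None):
    """Ratio of the observed mean nearest neighbor distance to the
    distance expected under complete spatial randomness.

    Below 1 suggests clustering, about 1 a random pattern,
    above 1 a dispersed pattern.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    if area is None:
        # Assumption: fall back to the bounding rectangle of the points
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    # Observed: average distance from each point to its closest other point
    nearest = []
    for i, (xi, yi) in enumerate(points):
        nearest.append(min(math.hypot(xi - xj, yi - yj)
                           for j, (xj, yj) in enumerate(points) if j != i))
    observed = sum(nearest) / n
    # Expected mean distance for a random pattern of n points in this area
    expected = 0.5 * math.sqrt(area / n)
    return observed / expected


# Two tight pairs of hypothetical cases far apart: strongly clustered
clustered = [(0, 0), (0.1, 0), (10, 10), (10.1, 10)]
print(nearest_neighbor_index(clustered))  # well below 1
```

The index is sensitive to the study area you assume: the same points in a larger area will look more clustered.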
Repeat for blackinfantdeaths.
Email me your findings.
Congratulations, you are all done! Enjoy the week!
When you have a chance you might want to go to the Crimestat website listed above, and check out the pdf manual. It's a great intro primer on spatial stats, with examples (albeit, crime examples).