Analytic Techniques

Adventures in Transportation and Realtional Procedures

This tutorial serves to introduce a useful resource: The Census Transportation Planning Package; a very useful tool: Relational Database Procedures; and a neat way to save abd document complex chains of processing steps: ArcGIS Model Builder

References

Here is a listing of the majoe concepts covered in this tutorial and some places where you can find further documentation:


The Data

The core dataset used in this tutorial is the 2010 Census Transportation Planning Package, which provides an Origin-Destination matrix that describes how many people travel from any census tract to any other census tract for their daily commute. I downloaded the CTPP for the boston metro area from the Bureau of Transportation Statistics. You can Download the data for this project by clicking here. Open the ArcMap.mxd file to explore the data. Be sure to click the Source tab at the bottom of the table of contents so you can see the table.

This tutorial begins with two tables. One a table of census tracts, another, a table representing the number of trips between census tracts for commuters using different modes of transportation.

Our census tracts table is taken from the ESRI Maps and Data Collection. It represents the geometry and selected demographic statistics from the 2000 census.

Lets Get Relational

A large part of this tutorial is about relational databases. The fascinating transportation data we have here, only provides an interesting vehicle for learning to use the logic of Relational Database Management Systems to help us make new information from these data.

To begin, we will examine our tables. According to the axioms of relational database design, a Table, stores relations between some Entity and its Attributes there should be one row in the table for each entity. The Tracts Table provides an good example of this. Examine tyhe attributes of the tract layer. you will see one row per tract, each tract having a set of attributes that convey demographic data for that tract.

New information can be learned from this table by associating the rows in various ways. We can perform Sorts on the rows to based on specific attributes. We can procude Summary Statistics on the attributes of all of the rows in the table. And we can do Queries on the table to select rows, according to their attributes. It is very useful that with the ArcGIS interface, that the Summary Statistics calculated for a table are calculated only for the selected rows, so even more information can be learned by selecting specific sets of rows and calculatiing statistics for these associations of rows.

About FIPS Codes

The geographic areas in our tracts table come from the Federal Information Processing Standard (FIPS) codes. For example the FIPS State code for Massachusetts is 25. The FIPS County Code for Middlesex County is O17 Therefore to identify Middlesex County distinctly, we would use the FIPS State-County Code 25017.

Associative Operations on a Single Table

The following procedures can be found by rioght-clicking on the columns names in the Tracts table.

  1. Open the Attributes Table for Tracts.
  2. View the most populayed tracts by sorting on the Pop2000 field.
  3. Calculate Statistics on the Pop_2000 field to see the sum of all of the population in covered by the se tracts.
  4. From the Options menu at the bottom of the table choose Select by Attributes and select all of the tracts where where the State is Massachusetts and the County is Middlesex. Hint: [STATE_FIPS] = '25' and [CNTY_FIPS] = '017'
  5. Now calculate the statistics for Pop_2000 again. You can now see how many tracts are in middlesex county and what the total population is.
  6. Note how these functions operate only on selected records. This is useful, but can also be hazardous if we think we are operating on the whole table when we catually only have a selection of the records!
  7. Clear the selection using the apropriate command from the Options menu.
  8. Now lets examine how the Field->Summarize operation automates the query and statistics operations that we just did, for every unique value in a field. In other words, we can calculate population statistics for each county in the tracts table. See Picture. This operation produces a new table that has a single record for every FIPS State and County code that occurs in our tracts table. In addition to the unique stcnty fips code, this table has an attrribute summarizing the sum of the tract populations for that county.

Joins: Associating Records Between Different Tables

The new County_Population_Summary table we created in the exercise above has one row per county, and the sum of population over all tracts should be the sum of population for the each county (at least for the counties of Massachusetts. We have a feature layer that has the geometry for Massachusetts counties, and this table has the FIPS code for each county, and this same FIPS code identifies our county population data. We should now be able to associate the apropriate row from the population table with each row from the Counties table.

Setting up A Join

  1. Open the attribute table for the Counties layer
  2. Open the table for your County_Pop_Sum
  3. Examine how the STCOFIPS and the FIPS columns from these tables form a linking field.
  4. The Counties Attribute Table will be the Target Tahble for our join. Right-Click on it and choose Joins->Add Join
  5. Choose the Join Table and the linking fields apropriately See Picture.
  6. Now note that you have joined the columns from the County_pop_Sum table to your Counties Attribute table.
  7. You can make a thematic map of counties based on population density!

Part 2: Creating a Repeatable Model

IN the steps above, you have made an Attribute Selection, Calculated Summary Statistics, and Added a Join. All of this has required many mouse-clicks and over a page of text to describe. If you wanted to do it again, all of these would have to be repeated. This Ad-Hoc way of working is good for exploring data and creating one-off maps. But once we know what we want, especially if we intend to share our work with others we should learn a more efficient way to document our processing steps that saves everything we do. In arcGIS, the mechanism for this is known as Geoprocessing Models.

Open your ArcGIS toolbox icon to add the toolbox panel to your map. You have probably used tools in this before. In this project we are using, you can see there is a special toolbox named GSD_CTPP This toolbox has several models in it that we will use later. For now we will just look at one of them.

Examine a Model

  1. Right-click on the model, GSD_CTPP->County_Pop_Sum and choose Edit
  2. The blue ovals represent input data. The yellow boxes represent procedures, and the green ovals represent results.
  3. Double-click on the Select Layer by Attributes box. Note that this wizard lets you specify an input layer a Selection Type and an Expression.
  4. Note that this wizard is all filled in the way I left it! How convenient!
  5. Click on the Show Help button to see how this wizard works. Click on each field in the wizard, notice how the help panel is context sensitive.
  6. Pull down the pick-list next to Input Layer Take a look at the selections here. The blue icons are references to datasets that are already in the model. Yello icons are in the map but not in the model. You can also select input datasets that are not in the map at all.
  7. You can right-click on the Select By Attributes box and click Run to make it work.
  8. look at and run the Summary Statistics process. Note that it creates a County_Pop_Sum table, and deletes the old one if it needs to.
  9. Look at and run the Add Join process. If this wizard gives you an error, it is because you already have a join on your Counties table. Remove the Join by right-clicking the Counties Layer in your map and choose Joins->Remove and rerun the model.
  10. Close this Model

Running a Model and Setting Parameters

Models work like the other tools in the Toolbox. If you duuble-click or right-click->open a model it will present you with a list of parameters that you can set and an OK button. If you do this to our County_Sum model, we can see that our model does not yet have any parameters. Our next step will be to alter our model so that it has parameters that we can set from outside.

Setting Parameters in a Model

  1. Double-click on the County_Pop_Sum model and observe the parameters sheet.
  2. Run The model. If you get an error, remove the joins from the Counties layer and run it again.
  3. Now Right-Click on the model and choose Edit.
  4. Double-click on the Select by Attributes process and note the parameters that this wizard takes -- especially Expression. Now close this wizard.
  5. Right-click on the Select by Attributes Wizard and choose Make Variable->From Parameter->Expression.
  6. Now you have a new blue oval. Double-click on it, and you will see that you have pulled this parameter out of the Select by Attributes wizard. You can now set this parameter outside of the wizard.
  7. Now save your model and open it from the toolbox window. See now you can change the experssion when you run the model!

Making your own Toolbox and Model

As you have seen: Models can be made inside toolboxes, and these models create new data based on existing data layers. The models also include procedures from other toolboxes. The next several steps will show you how to create your own toolboxes and models and to set up the default file-system locations for the inpout and output data. More detailed understanding of this can be found in the Geoprocessing in ArcGIS User Guide referenced at the top of this page.

Setting up a ToolBox and A Model

  • In ArcMap, choose Tools-Options and click the Geoprocessing tab. Take a look at all of the geoprocessing options here.
  • We want our toolbox to travel with the data directory we have set up, so choose the outer directory containing the data for this lab -- it should probably be c:\temp\ctpp. This is where new toolboxes will be created.
  • Choose Environments and under General Settings set your Current Workspace and Scratch Workspace to the same directory. This is the placxe where your models will automatically look for their input data and place their oputput data by default.
  • Now close your Tools Options dialog.
  • Right-Click on an empty place in your Toolbox Panel and say New Toolbox
  • Take a look at your workspace directory using windows explorer. You can see your new toolbox there. It is simply a file with a .tbx extension.
  • Right-Click on your New Toolbox and say New Model. You now have a nice new empty model.
  • Finally, before you forget, Choose File->Model Properties in this model, and check the box for Store Relative Pathnames. Now if you keep this model and the data in the same folder structure, your model will run predicably wherever the outer folder happens to be!

Building a Model

So now you have a toolbox and your own model. Now comes the really good part -- putting in Data and Procedures and chainning them together. It will be a very good, and hopefully very easy, exercise, to rebuild our County_Pop_Sum model. The one thing we havn't yet discussed is how to find the wizards to enter into the model. The toolbox has hundreds of tools in it, and until you have an idea of what functions you should expect to find, and what they are named, it can be very difficult to find the procedure you are looking for. The Toolbox Index and Search tabs at the bottom of the Toolbox panel will help you with this.

Finding the wizard you Want

  1. Open your new model for editing
  2. Close all of the toolboxes in your toolbox panel
  3. Click the Search tab at the bottom of the tool panel.
  4. Enter Select in the search field, and click Search.
  5. Find the tool you want Hint: Select Layer by Attribute
  6. Click Locate to locate the tool.
  7. Take a look at where the Select by Attributes tool is located for future reference.
  8. Drag the Select Layer by Attributes tool into your model.

Exercise: Recreate the County_Sum_Pop Model

  1. Find the tools for Summary Statistics and Add Join
  2. Add them to your model
  3. Fill out the parameters for these procedures, consult the steps from the previous part of this tutorial if you need to.
  4. Run the individual procedures
  5. Run the entire model
  6. Give yourself a pat on the back or find someone else to do it for you you are now a master of complex geo-processing!

Models are extremely powerful way of sharing complicated procedures with others. Not only are they self-documenting, but they allow procedures to be re-run with various different parameters. The power of models will be illustrated in the next example.


Part 2: Relational Adventures in Transportation

The previous example and exercise points out some of the more important procedures and concepts of relational databases:

This next example introduces a slightly more complicated table, but uses the same three functions to generate some very interesting and useful results. It is hoped that your understanding of these simple relational procedures and the technology and terminology of Geoprocessing Models will make it much easier for me to describe and for you to understand and use the procedures that unlock this surprisiing information.

The Journey to Work Table

The central table in this example comes from the Census Transportation Planning Package. Our table is one of the hundreds of tables that are derived from the Journey To Work questions form the 200o Census. This is avaialble as the Census Transportation Planning Package, Part 3. Available from The Bureau of Transportation Statistics Click "C" for Census Transportation Planning Package. We downloaded the table, P3-006, Means of Transportation To Work table, from This Page.

The table comes in .csv format, which is a plain text file with values separated by commas. The first row in the table has the column names. The Readme.txt file has the metadata explaining the column names. This table is what is known as an origin/destination matrix. Each record reflects the flow of commuters from one tract to another tract. Therefore it can potentially have a large number of records equal to the square of the total number of tracts in the state. The table has a simple Metadata File which lists columns names and descriptions.

Open this table and take a look at its attributes. Note that I have added a couple of fields to build FIPS codes for the origin and destination tracts. Each row in this table represents a trip from one tract to another, and the attributes of these trips provide the total number of commuters, and also give subtotals by mode of transportation. Just to understand this, use a Sort function to find the tracts that have the highest number of bicycle trips originating in them. Find the tracts that are the most popular destinations for bicycle commmuters.

Now think about how you could make a map of census tracts shaded by the counts or percentage of bicycle trips that originate there. Hint: you would simply do a summarize function by Orig_Tract, and tabulate the sum of the count of bicycle commuters. The summarize is nexessary, because for any origin tract there are potentially hunderrsd of different destination tracts. We would then join this new summary table to a copy of the tracts layer. Of course the same procedure can be used to make the map of tracts as commuter destinations.

Next, what if we wanted to select a set of tracts and make maps showing where the commuters originating in these tracts come from or where the commuters originating in this tract are destined? The procedure for this is the same as the one described in the previous paragraph, except that we first narrow the rows from the tracts table to those that have a specific set of origin tracts or destination tracts. This is accomplished with a select by attributes query and takes advantage of the option that our summary statistics function tabulates summary statistics only for selected rows.

These procedures are encapsulated in the model Orig_Dest_Sum. YOu will see a couple of new procedures used here. The Copy Features procedure and the Feature to Points procedures create persistent feature classes from our joined tables, so they will be accessible outside of this map document. The new points layer will make it easy to make a proportional symbol map that will help us communicate effectively about the counts of commuters using proportional symbols.

About Data Preparation

One of the cool things about this demonstration is that you can do this for any state in the country. With this in mind, I tried to save all of my processing steps so that this project will be repeatable with a minimum of effort. You can take a look at the Models in the GSD CTPP->Prep toolbox where I have saved my work.

The first thing that I did in preparing the data for this project was to use ArcCatalog to create a new Personal Geodatabase. This is a means of arranging tables -- including those with shapes associated with them -- in a microsoft access database. A personal geodatabase is much quicker for working with very large tables than the shapefile and dbase (.dbf) formats that we have worked with previously. I have exported the .csv data table into this database to the table p3_006 I have also taken a layer of census tracts for Massachusetts and surrounding areas from the ESRi Maps and Data Cds and placed it in this geodatabase.

To finsh prewparing the data, we need to create fields in our origin0destination table to identify tracts the same way that they are identified in the tracts table. If you look at these tables, you will see a field, FIPS that is a unique ID for each tract. The Add Fields model uses the Add Field wizard from the Data Management->Fields toolbox to add a couple of fields to the origin-destination table to hold the FIPS codess for the Origin and Destination tracts. It then Calculates the values for these fields, by concatenating (which means 'putting together') the State, County and Tract codes for the origins and destinations. For a listing of all of the FIPS State-County Codes, Click Here