Data Sets

I enjoy working with data. This site makes some interesting data sets available to download.

Fundamental diagram: this web page came from playing around with JavaScript maps while analyzing a large data set.

Data formats, July 2017: this is example Parquet data. You'll need to download it and then unzip it. (17 MB)

Python DSI workshop April 2017 - Here's one file of the FARS data from 2011: (16 MB)

Most of this traffic data was downloaded from the excellent source Caltrans PEMS and is made available here for educational purposes.

ECI 256 Mini Project

This video shows how traffic flow across all lanes changes over time and location for the I80 freeway near Davis, CA. The left end of the x axis is near Dixon, and the right end is near Sacramento. Traffic flow is tiny at 2 AM and increases during commute and work hours.

The variability in each line is likely due to bad data and to traffic entering and exiting the freeway around Davis. Flatter lines can be interpreted as coming from through traffic, while jagged lines may be due to local traffic. I hoped to see waves of commuter traffic moving across the graph, which would show up as changing slopes of the yellow and blue lines. We do see much more westbound traffic early, around 5 AM, as people leave the population center of Sacramento for the Bay Area. By 8 AM eastbound traffic catches up.

If I did this again I would prefer to use the 5-minute aggregates, since the 30-second data are too noisy when the goal is to present the underlying trend. Data quality for the 30-second raw sensor data was quite poor. I would also use JavaScript and D3, a technology better suited to the web.
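As a rough illustration of rolling 30 second readings up into 5 minute bins, here is a minimal sketch in plain Python. It is not the code behind the video or the way PEMS builds its own aggregates. The function names and the `(timestamp, count)` input shape are assumptions made for this example, and the sample readings are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta

BIN_SECONDS = 300  # 5 minutes


def bin_start(ts, bin_seconds=BIN_SECONDS):
    """Round a timestamp down to the start of its bin (bins are aligned to midnight)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = int((ts - midnight).total_seconds())
    return midnight + timedelta(seconds=elapsed - elapsed % bin_seconds)


def aggregate_flow(readings, bin_seconds=BIN_SECONDS):
    """Sum vehicle counts per bin.

    `readings` is an iterable of (datetime, count) pairs, where count is None
    for a missing sensor reading. This input shape is a hypothetical one
    chosen for the example, not the raw PEMS layout.
    """
    totals = defaultdict(int)
    for ts, count in readings:
        if count is None:
            # Missing readings are skipped, so a bin with gaps is undercounted.
            continue
        totals[bin_start(ts, bin_seconds)] += count
    return dict(sorted(totals.items()))


# Made-up readings: 20 samples, 30 seconds apart, 2 vehicles each,
# with one sample marked as missing.
start = datetime(2016, 1, 1, 8, 0, 0)
readings = [(start + timedelta(seconds=30 * i), 2) for i in range(20)]
readings[3] = (readings[3][0], None)

five_min = aggregate_flow(readings)
for when, flow in five_min.items():
    print(when.isoformat(), flow)
```

Skipping missing readings keeps the sketch short, but it biases the totals downward. With data as patchy as the raw sensor feed, scaling each bin by the number of valid samples it contains would probably be a better choice.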

Supporting Data

The file below contains all of the 30-second raw sensor data from January to October 2016 for the locations shown below: the I80 corridor near Davis, CA. I've started to write some tools in Python for processing this data. These can be found at @clarkfitzg on GitHub.

Raw Data Description

Copied directly from the PEMS website: Caltrans PEMS.

Larger PeMS datasets

The files below each contain one day's worth of 30-second sensor readings for freeways around the SF Bay Area. Each CSV file is about 100 MB when compressed and has around 10 million rows with 24 columns. Handling these may require some care, as each file will take around 1.5 GB of memory if loaded directly into an R data.frame. PEMS contains more than 10 years of such data for all of California, and the complete data is larger than 10 TB.
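One way to avoid holding a whole file in memory is to stream it a row at a time straight from the compressed file. The sketch below uses only the Python standard library. The function names are made up for this example, and putting a station ID in the second column is an assumption for illustration, so check the raw data description before relying on any column position.

```python
import csv
import gzip
import io
from collections import Counter


def count_rows_by_key(lines, key_column=1):
    """Stream CSV rows and count how many rows each key value has.

    Only one row is held in memory at a time, plus one counter per
    distinct key. `key_column` is a zero-based index; the default of 1
    is an assumption for this example, not a documented PEMS position.
    """
    counts = Counter()
    for row in csv.reader(lines):
        if len(row) > key_column:
            counts[row[key_column]] += 1
    return counts


def count_rows_in_gzip(path, key_column=1):
    """Same as count_rows_by_key, reading from a gzip-compressed CSV file."""
    with gzip.open(path, mode="rt", newline="") as f:
        return count_rows_by_key(f, key_column)


# Made-up rows standing in for a real file: timestamp, station ID, count.
sample = io.StringIO(
    "01/01/2016 00:00:00,401234,3\n"
    "01/01/2016 00:00:30,401234,5\n"
    "01/01/2016 00:00:00,401567,1\n"
)
counts = count_rows_by_key(sample)
print(counts.most_common())
```

The same loop shape works for sums or per-station filtering. Memory use depends on the number of distinct keys, not on the number of rows in the file.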

Other interesting stuff


You can find me on Twitter @clarkfitzg.