I enjoy working with data. This site makes some interesting data sets available to download.
Data formats July 2017. This is example Parquet data. You'll need to download it, then unzip it.
pems_parquet.zip (17 MB)
Python DSI workshop April 2017 - Here's one file of the FARS data from 2011:
FARS2011.zip (16 MB)
Most of this traffic data was downloaded from the excellent source Caltrans PEMS and is made available here for educational purposes.
ECI 256 Mini Project
This video shows how traffic flow across all lanes changes over time and location for the I80 freeway near Davis, CA.
The left end of the x axis is near Dixon, and the right end is near Sacramento.
Traffic flow is tiny at 2 AM and increases during commute and work hours.
The variability in each line is likely due to bad data and traffic entering and exiting the freeway around Davis.
Flatter lines can be interpreted as coming from through traffic, while jagged lines may be due to local traffic.
I hoped to see waves of commuter traffic moving across the graph, which could show up as changing slopes of the yellow and blue lines.
We do see much more westbound traffic early, around 5 AM, as people leave the population center of Sacramento for the Bay Area.
By 8 AM eastbound traffic catches up.
If I did this again I would prefer to use the 5 minute aggregates, since the 30 second data show excessive noise when the goal is to present the underlying trend.
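Here's a rough sketch of how 30 second readings could be rolled up into 5 minute totals using only the Python standard library. The function name `aggregate_flow`, the `(timestamp, flow)` input shape, and the synthetic readings are all assumptions made for illustration. This is not how PEMS computes its own 5 minute aggregates.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_flow(readings, bin_minutes=5):
    """Sum 30 second flow counts into fixed-width time bins.

    readings: iterable of (datetime, flow) pairs; flow may be None
    when the sensor reported nothing (hypothetical input shape).
    bin_minutes should divide 60 evenly so bins line up with the hour.
    Returns a dict mapping each bin's start time to its total flow.
    """
    totals = defaultdict(int)
    for timestamp, flow in readings:
        if flow is None:
            continue  # skip missing readings rather than counting them as zero
        minute = timestamp.minute - timestamp.minute % bin_minutes
        bin_start = timestamp.replace(minute=minute, second=0, microsecond=0)
        totals[bin_start] += flow
    return dict(sorted(totals.items()))


# Synthetic example: 20 readings, 30 seconds apart, 10 vehicles each.
start = datetime(2016, 1, 4, 8, 0, 0)
readings = [(start + timedelta(seconds=30 * i), 10) for i in range(20)]
five_minute = aggregate_flow(readings)
```

Missing readings are skipped here, so a bin with many gaps will undercount. With data this noisy it may be better to also track how many readings landed in each bin.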
Data quality for the 30 second raw sensor data was quite poor.
The file below contains all of the 30 second raw sensor data from January to
October 2016 for the locations shown below: the I80 corridor near Davis, CA.
I've started to write some tools in Python for processing this data. These can be found
at @clarkfitzg on GitHub.
Summary statistics for each station station_summary.csv (4 KB)
Median of total flow for each weekday. Could be used to make animation. weekday.csv.gz (1 MB)
30 second time series for a single station (Richards Ave) near downtown Davis. richards.csv.gz (9.7 MB)
The medians for each observation grouped by (station, weekday, hour, half minute) I80_median.csv.gz (4.7 MB)
Station metadata is small, just 53 stations: I80_stations.csv (8 KB)
Main data is around 35 million observations. I80_davis.txt.gz (215 MB)
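The grouping behind the medians file, (station, weekday, hour, half minute), can be sketched in plain Python as follows. I'm assuming here that "half minute" means the index of the 30 second slot within the hour (0 to 119). The function name `grouped_medians` and the `(station, timestamp, flow)` row shape are hypothetical, and the sample rows are made up.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def grouped_medians(rows):
    """Median flow for each (station, weekday, hour, half_minute) group.

    rows: iterable of (station, datetime, flow) tuples (assumed shape).
    weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    half_minute is the 30 second slot within the hour, 0 through 119
    (an assumption about how the grouping is defined).
    """
    groups = defaultdict(list)
    for station, timestamp, flow in rows:
        half_minute = timestamp.minute * 2 + timestamp.second // 30
        key = (station, timestamp.weekday(), timestamp.hour, half_minute)
        groups[key].append(flow)
    return {key: median(values) for key, values in groups.items()}


# Made-up readings: three consecutive Mondays at 08:00:30 for station 1,
# plus one reading at 08:00:00 for station 2.
rows = [
    (1, datetime(2016, 1, 4, 8, 0, 30), 3),
    (1, datetime(2016, 1, 11, 8, 0, 30), 10),
    (1, datetime(2016, 1, 18, 8, 0, 30), 5),
    (2, datetime(2016, 1, 4, 8, 0, 0), 7),
]
medians = grouped_medians(rows)
```

The median is a reasonable choice for sensor data like this, since a few wildly wrong readings won't drag it around the way they would a mean.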
Raw Data Description
Copied directly from PEMS website: Caltrans PEMS.
Larger PeMS datasets
The files below each contain one day's worth of 30 second sensor readings for freeways around the SF Bay Area. Each CSV file is about 100 MB when compressed and has around 10 million rows with 24 columns. Handling these takes some care, as each file will be around 1.5 GB in memory if loaded directly into an R data.frame.
PEMS contains more than 10 years of such data for all of California, and the complete data is larger than 10 TB.
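One way to avoid loading a whole file into memory is to stream the compressed CSV one row at a time. The sketch below counts rows per station using only the standard library. The function name `count_rows_by_station`, the assumption that the station ID sits in the second column, and the tiny sample file are all illustrative, so check the raw data description before relying on the column position.

```python
import csv
import gzip
import os
import tempfile


def count_rows_by_station(path, station_column=1):
    """Stream a gzipped CSV and count rows per station.

    Only one row is held in memory at a time, so memory use depends on
    the number of distinct stations rather than the size of the file.
    station_column=1 is an assumption about the file layout.
    """
    counts = {}
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle):
            station = row[station_column]
            counts[station] = counts.get(station, 0) + 1
    return counts


# Build a tiny made-up sample file so the example runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt.gz")
    with gzip.open(path, "wt", newline="") as handle:
        csv.writer(handle).writerows([
            ["01/04/2016 08:00:00", "400001", "3"],
            ["01/04/2016 08:00:30", "400001", "4"],
            ["01/04/2016 08:00:00", "400002", "2"],
        ])
    counts = count_rows_by_station(path)
```

The same loop shape works for other one-pass summaries, such as running totals or counts of missing values.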
Other interesting stuff
You can find me on Twitter @clarkfitzg.