Exploring United States Policing Data Using Python

Data Science, Politics, and Police

The intersection of science, politics, personal opinion, and social policy can be rather complex. This junction of ideas and disciplines is often rife with controversies, strongly held viewpoints, and agendas that are often more based on belief than on empirical evidence. Data science is particularly important in this area since it provides a methodology for examining the world in a pragmatic fact-first manner, and is capable of providing insight into some of the most important issues that we face today.

The recent high-profile police shootings of unarmed black men, such as Michael Brown (2014), Tamir Rice (2014), Anton Sterling (2016), and Philando Castile (2016), have triggered a divisive national dialog on the issue of racial bias in policing.

These shootings have spurred the growth of large social movements seeking to raise awareness of what is viewed as the systemic targeting of people-of-color by police forces across the country. On the other side of the political spectrum, many hold a view that the unbalanced targeting of non-white citizens is a myth created by the media based on a handful of extreme cases, and that these highly-publicized stories are not representative of the national norm.

In June 2017, a team of researchers at Stanford University collected and released an open-source data set of 60 million state police patrol stops from 20 states across the US. In this tutorial, we will walk through how to analyze and visualize this data using Python.

county scatters vt

The source code and figures for this analysis can be found in the companion Github repository - https://github.com/triestpa/Police-Analysis-Python

To preview the completed IPython notebook, visit the page here.

This tutorial and analysis would not be possible without the work performed by The Standford Open Policing Project. Much of the analysis performed in this tutorial is based on the work that has already performed by this team. A short tutorial for working with the data using the R programming language is provided on the official project website.

The Data

In the United States there are more than 50,000 traffic stops on a typical day. The potential number of data points for each stop is huge, from the demographics (age, race, gender) of the driver, to the location, time of day, stop reason, stop outcome, car model, and much more. Unfortunately, not every state makes this data available, and those that do often have different standards for which information is reported. Different counties and districts within each state can also be inconstant in how each traffic stop is recorded. The research team at Stanford has managed to gather traffic-stop data from twenty states, and has worked to regularize the reporting standards for 11 fields.

  • Stop Date
  • Stop Time
  • Stop Location
  • Driver Race
  • Driver Gender
  • Driver Age
  • Stop Reason
  • Search Conducted
  • Search Type
  • Contraband Found
  • Stop Outcome

Most states do not have data available for every field, but there is enough overlap between the data sets to provide a solid foundation for some very interesting analysis.

0 - Getting Started

We'll start with analyzing the data set for Vermont. We're looking at Vermont first for a few reasons.

  1. The Vermont dataset is small enough to be very manageable and quick to operate on, with only 283,285 traffic stops (compared to the Texas data set, for instance, which contains almost 24 million records).
  2. There is not much missing data, as all eleven fields mentioned above are covered.
  3. Vermont is 94% white, but is also in a part of the country known for being very liberal (disclaimer - I grew up in the Boston area, and I've spent a quite a bit of time in Vermont). Many in this area consider this state to be very progressive and might like to believe that their state institutions are not as prone to systemic racism as the institutions in other parts of the country. It will be interesting to determine if the data validates this view.

0.0 - Download Datset

First, download the Vermont traffic stop data - https://stacks.stanford.edu/file/druid:py883nd2578/VT-clean.csv.gz

0.1 - Setup Project

Create a new directory for the project, say police-data-analysis, and move the downloaded file into a /data directory within the project.

0.2 - Optional: Create new virtualenv (or Anaconda) environment

If you want to keep your Python dependencies neat and separated between projects, now would be the time to create and activate a new environment for this analysis, using either virtualenv or Anaconda.

Here are some tutorials to help you get set up.
virtualenv - https://virtualenv.pypa.io/en/stable/
Anaconda - https://conda.io/docs/user-guide/install/index.html

0.3 - Install dependencies

We'll need to install a few Python packages to perform our analysis.

On the command line, run the following command to install the required libraries.

pip install numpy pandas matplotlib ipython jupyter

If you're using Anaconda, you can replace the pip command here with conda. Also, depending on your installation, you might need to use pip3 instead of pip in order to install the Python 3 versions of the packages.

0.4 - Start Jupyter Notebook

Start a new local Jupyter notebook server from the command line.

jupyter notebook

Open your browser to the specified URL (probably localhost:8888, unless you have a special configuration) and create a new notebook.

I used Python 3.6 for writing this tutorial. If you want to use another Python version, that's fine, most of the code that we'll cover should work on any Python 2.x or 3.x distribution.

0.5 - Load Dependencies

In the first cell of the notebook, import our dependencies.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

figsize = (16,8)

We're also setting a shared variable figsize that we'll reuse later on in our data visualization logic.

0.6 - Load Dataset

In the next cell, load Vermont police stop data set into a Pandas dataframe.

df_vt = pd.read_csv('./data/VT-clean.csv.gz', compression='gzip', low_memory=False)

This command assumes that you are storing the data set in the data directory of the project. If you are not, you can adjust the data file path accordingly.

1 - Vermont Data Exploration

Now begins the fun part.

1.0 - Preview the Available Data

We can get a quick preview of the first ten rows of the data set with the head() method.

df_vt.head()
id state stop_date stop_time location_raw county_name county_fips fine_grained_location police_department driver_gender driver_age_raw driver_age driver_race_raw driver_race violation_raw violation search_conducted search_type_raw search_type contraband_found stop_outcome is_arrested officer_id is_white
0 VT-2010-00001 VT 2010-07-01 00:10 East Montpelier Washington County 50023.0 COUNTY RD MIDDLESEX VSP M 22.0 22.0 White White Moving Violation Moving violation False No Search Conducted N/A False Citation False -1.562157e+09 True
3 VT-2010-00004 VT 2010-07-01 00:11 Whiting Addison County 50001.0 N MAIN ST NEW HAVEN VSP F 18.0 18.0 White White Moving Violation Moving violation False No Search Conducted N/A False Arrest for Violation True -3.126844e+08 True
4 VT-2010-00005 VT 2010-07-01 00:35 Hardwick Caledonia County 50005.0 i91 nb mm 62 ROYALTON VSP M 18.0 18.0 White White Moving Violation Moving violation False No Search Conducted N/A False Written Warning False 9.225661e+08 True
5 VT-2010-00006 VT 2010-07-01 00:44 Hardwick Caledonia County 50005.0 64000 I 91 N; MM64 I 91 N ROYALTON VSP F 20.0 20.0 White White Vehicle Equipment Equipment False No Search Conducted N/A False Written Warning False -6.032327e+08 True
8 VT-2010-00009 VT 2010-07-01 01:10 Rochester Windsor County 50027.0 36000 I 91 S; MM36 I 91 S ROCKINGHAM VSP M 24.0 24.0 Black Black Moving Violation Moving violation False No Search Conducted N/A False Written Warning False 2.939526e+08 False

We can also list the available fields by reading the columns property.

df_vt.columns
Index(['id', 'state', 'stop_date', 'stop_time', 'location_raw', 'county_name',
       'county_fips', 'fine_grained_location', 'police_department',
       'driver_gender', 'driver_age_raw', 'driver_age', 'driver_race_raw',
       'driver_race', 'violation_raw', 'violation', 'search_conducted',
       'search_type_raw', 'search_type', 'contraband_found', 'stop_outcome',
       'is_arrested', 'officer_id'],
      dtype='object')

1.1 - Drop Missing Values

Let's do a quick count of each column to determine how consistently populated the data is.

df_vt.count()
id                       283285
state                    283285
stop_date                283285
stop_time                283285
location_raw             282591
county_name              282580
county_fips              282580
fine_grained_location    282938
police_department        283285
driver_gender            281573
driver_age_raw           282114
driver_age               281999
driver_race_raw          279301
driver_race              278468
violation_raw            281107
violation                281107
search_conducted         283285
search_type_raw          281045
search_type                3419
contraband_found         283251
stop_outcome             280960
is_arrested              283285
officer_id               283273
dtype: int64

We can see that most columns have similar numbers of values besides search_type, which is not present for most of the rows, likely because most stops do not result in a search.

For our analysis, it will be best to have the exact same number of values for each field. We'll go ahead now and make sure that every single cell has a value.

# Fill missing search type values with placeholder
df_vt['search_type'].fillna('N/A', inplace=True)

# Drop rows with missing values
df_vt.dropna(inplace=True)

df_vt.count()

When we count the values again, we'll see that each column has the exact same number of entries.

id                       273181
state                    273181
stop_date                273181
stop_time                273181
location_raw             273181
county_name              273181
county_fips              273181
fine_grained_location    273181
police_department        273181
driver_gender            273181
driver_age_raw           273181
driver_age               273181
driver_race_raw          273181
driver_race              273181
violation_raw            273181
violation                273181
search_conducted         273181
search_type_raw          273181
search_type              273181
contraband_found         273181
stop_outcome             273181
is_arrested              273181
officer_id               273181
dtype: int64

1.2 - Stops By County

Let's get a list of all counties in the data set, along with how many traffic stops happened in each.

df_vt['county_name'].value_counts()
Windham County       37715
Windsor County       36464
Chittenden County    24815
Orange County        24679
Washington County    24633
Rutland County       22885
Addison County       22813
Bennington County    22250
Franklin County      19715
Caledonia County     16505
Orleans County       10344
Lamoille County       8604
Essex County          1239
Grand Isle County      520
Name: county_name, dtype: int64

If you're familiar with Vermont's geography, you'll notice that the police stops seem to be more concentrated in counties in the southern-half of the state. The southern-half of the state is also where much of the cross-state traffic flows in transit to and from New Hampshire, Massachusetts, and New York. Since the traffic stop data is from the state troopers, this interstate highway traffic could potentially explain why we see more traffic stops in these counties.

Here's a quick map generated with Tableau to visualize this regional distribution.

Vermont County Map

1.3 - Violations

We can also check out the distribution of traffic stop reasons.

df_vt['violation'].value_counts()
Moving violation      212100
Equipment              50600
Other                   9768
DUI                      711
Other (non-mapped)         2
Name: violation, dtype: int64

Unsurprisingly, the top reason for a traffic stop is Moving Violation (speeding, reckless driving, etc.), followed by Equipment (faulty lights, illegal modifications, etc.).

By using the violation_raw fields as reference, we can see that the Other category includes "Investigatory Stop" (the police have reason to suspect that the driver of the vehicle has committed a crime) and "Externally Generated Stop" (possibly as a result of a 911 call, or a referral from municipal police departments).

DUI ("driving under the influence", i.e. drunk driving) is surprisingly the least prevalent, with only 711 total recorded stops for this reason over the five year period (2010-2015) that the dataset covers. This seems low, since Vermont had 2,647 DUI arrests in 2015, so I suspect that a large proportion of these arrests were performed by municipal police departments, and/or began with a Moving Violation stop, instead of a more specific DUI stop.

1.4 - Outcomes

We can also examine the traffic stop outcomes.

df_vt['stop_outcome'].value_counts()
Written Warning         166488
Citation                103401
Arrest for Violation      3206
Warrant Arrest              76
Verbal Warning              10
Name: stop_outcome, dtype: int64

A majority of stops result in a written warning - which goes on the record but carries no direct penalty. A bit over 1/3 of the stops result in a citation (commonly known as a ticket), which comes with a direct fine and can carry other negative side-effects such as raising a driver's auto insurance premiums.

The decision to give a warning or a citation is often at the discretion of the police officer, so this could be a good source for studying bias.

1.5 - Stops By Gender

Let's break down the traffic stops by gender.

df_vt['driver_gender'].value_counts()
M    179678
F    101895
Name: driver_gender, dtype: int64

We can see that approximately 30% of the stops are of women drivers, and 70% are of men.

1.6 - Stops By Race

Let's also examine the distribution by race.

df_vt['driver_race'].value_counts()
White       266216
Black         5741
Asian         3607
Hispanic      2625
Other          279
Name: driver_race, dtype: int64

Most traffic stops are of white drivers, which is to be expected since Vermont is around 94% white (making it the 2nd-least diverse state in the nation, behind Maine). Since white drivers make up approximately 94% of the traffic stops, there's no obvious bias here for pulling over non-white drivers vs white drivers. Using the same methodology, however, we can also see that while black drivers make up roughly 2% of all traffic stops, only 1.3% of Vermont's population is black.

Let's keep on analyzing the data to see what else we can learn.

1.7 - Police Stop Frequency by Race and Age

It would be interesting to visualize how the frequency of police stops breaks down by both race and age.

fig, ax = plt.subplots()
ax.set_xlim(15, 70)
for race in df_vt['driver_race'].unique():
    s = df_vt[df_vt['driver_race'] == race]['driver_age']
    s.plot.kde(ax=ax, label=race)
ax.legend()

Race Age Traffic Stop Distribution

We can see that young drivers in their late teens and early twenties are the most likely to be pulled over. Between ages 25 and 35, the stop rate of each demographic drops off quickly. As far as the racial comparison goes, the most interesting disparity is that for white drivers between the ages of 35 and 50 the pull-over rate stays mostly flat, whereas for other races it continues to drop steadily.

2 - Violation and Outcome Analysis

Now that we've got a feel for the dataset, we can start getting into some more advanced analysis.

One interesting topic that we touched on earlier is the fact that the decision to penalize a driver with a ticket or a citation is often at the discretion of the police officer. With this in mind, let's see if there are any discernable patterns in driver demographics and stop outcome.

2.0 - Analysis Helper Function

In order to assist in this analysis, we'll define a helper function to aggregate a few important statistics from our dataset.

  • citations_per_warning - The ratio of citations to warnings. A higher number signifies a greater likelihood of being ticketed instead of getting off with a warning.
  • arrest_rate - The percentage of stops that end in an arrest.
def compute_outcome_stats(df):
    """Compute statistics regarding the relative quanties of arrests, warnings, and citations"""
    n_total = len(df)
    n_warnings = len(df[df['stop_outcome'] == 'Written Warning'])
    n_citations = len(df[df['stop_outcome'] == 'Citation'])
    n_arrests = len(df[df['stop_outcome'] == 'Arrest for Violation'])
    citations_per_warning = n_citations / n_warnings
    arrest_rate = n_arrests / n_total

    return(pd.Series(data = {
        'n_total': n_total,
        'n_warnings': n_warnings,
        'n_citations': n_citations,
        'n_arrests': n_arrests,
        'citations_per_warning': citations_per_warning,
        'arrest_rate': arrest_rate
    }))

Let's test out this helper function by applying it to the entire dataframe.

compute_outcome_stats(df_vt)
arrest_rate                   0.011721
citations_per_warning         0.620751
n_arrests                  3199.000000
n_citations              103270.000000
n_total                  272918.000000
n_warnings               166363.000000
dtype: float64

In the above result, we can see that about 1.17% of traffic stops result in an arrest, and there are on-average 0.62 citations (tickets) issued per warning. This data passes the sanity check, but it's too coarse to provide any terribly interesting insights. Let's dig deeper.

2.1 - Breakdown By Gender

Using our helper function, along with Panda's groupby method, we can easily compare these stats for male and female drivers.

df_vt.groupby('driver_gender').apply(compute_outcome_stats)
arrest_rate citations_per_warning n_arrests n_citations n_total n_warnings
driver_gender
F 0.007038 0.548033 697.0 34805.0 99036.0 63509.0
M 0.014389 0.665652 2502.0 68465.0 173882.0 102854.0

This is a simple example of the common split-apply-combine technique. We'll be building on this pattern for the remainder of the tutorial, so make sure that you understand how this comparison table is generated before continuing.

We can see here that men are, on average, twice as likely to be arrested during a traffic stop, and are also slightly more likely to be given a citation than women. It is, of course, not clear from the data whether this is indicative of any bias by the police officers, or if it reflects that men are being pulled over for more serious offenses than women on average.

2.2 - Breakdown By Race

Let's now compute the same comparison, grouping by race.

df_vt.groupby('driver_race').apply(compute_outcome_stats)
arrest_rate citations_per_warning n_arrests n_citations n_total n_warnings
driver_race
Asian 0.006384 1.002339 22.0 1714.0 3446.0 1710.0
Black 0.019925 0.802379 111.0 2428.0 5571.0 3026.0
Hispanic 0.016393 0.865827 42.0 1168.0 2562.0 1349.0
White 0.011571 0.611188 3024.0 97960.0 261339.0 160278.0

Ok, this is interesting. We can see that Asian drivers are arrested at the lowest rate, but receive tickets at the highest rate (roughly 1 ticket per warning). Black and Hispanic drivers are both arrested at a higher rate and ticketed at a higher rate than white drivers.

Let's visualize these results.

race_agg = df_vt.groupby(['driver_race']).apply(compute_outcome_stats)
fig, axes = plt.subplots(nrows=2, ncols=1, figsize=figsize)
race_agg['citations_per_warning'].plot.barh(ax=axes[0], figsize=figsize, title="Citation Rate By Race")
race_agg['arrest_rate'].plot.barh(ax=axes[1], figsize=figsize, title='Arrest Rate By Race')

Citations and arrests by race and violation

2.3 - Group By Outcome and Violation

We'll deepen our analysis by grouping each statistic by the violation that triggered the traffic stop.

df_vt.groupby(['driver_race','violation']).apply(compute_outcome_stats)
arrest_rate citations_per_warning n_arrests n_citations n_total n_warnings
driver_race violation
Asian DUI 0.200000 0.333333 2.0 2.0 10.0 6.0
Equipment 0.006270 0.132143 2.0 37.0 319.0 280.0
Moving violation 0.005563 1.183190 17.0 1647.0 3056.0 1392.0
Other 0.016393 0.875000 1.0 28.0 61.0 32.0
Black DUI 0.200000 0.142857 2.0 1.0 10.0 7.0
Equipment 0.029181 0.220651 26.0 156.0 891.0 707.0
Moving violation 0.016052 0.942385 71.0 2110.0 4423.0 2239.0
Other 0.048583 2.205479 12.0 161.0 247.0 73.0
Hispanic DUI 0.200000 3.000000 2.0 6.0 10.0 2.0
Equipment 0.023560 0.187898 9.0 59.0 382.0 314.0
Moving violation 0.012422 1.058824 26.0 1062.0 2093.0 1003.0
Other 0.064935 1.366667 5.0 41.0 77.0 30.0
White DUI 0.192364 0.455026 131.0 172.0 681.0 378.0
Equipment 0.012233 0.190486 599.0 7736.0 48965.0 40612.0
Moving violation 0.008635 0.732720 1747.0 84797.0 202321.0 115729.0
Other 0.058378 1.476672 547.0 5254.0 9370.0 3558.0
Other (non-mapped) 0.000000 1.000000 0.0 1.0 2.0 1.0

Ok, well this table looks interesting, but it's rather large and visually overwhelming. Let's trim down that dataset in order to just retrieve the most important information.

# Create new column to represent whether the driver is white
df_vt['is_white'] = df_vt['driver_race'] == 'White'

# Remove violation with too few data points
df_vt_filtered = df_vt[~df_vt['violation'].isin(['Other (non-mapped)', 'DUI'])]

We're generating a new column to represent whether or not the driver is white. We are also generating a filtered version of the dataframe that strips out the two violation types with the fewest datapoints.

We not assigning the filtered dataframe to df_vt since we'll want to keep using the complete unfiltered dataset in the next sections.

Let's redo our race + violation aggregation now, using our filtered dataset.

df_vt_filtered.groupby(['is_white','violation']).apply(compute_outcome_stats)
arrest_rate citations_per_warning n_arrests n_citations n_total n_warnings
is_white violation
False Equipment 0.023241 0.193697 37.0 252.0 1592.0 1301.0
Moving violation 0.011910 1.039922 114.0 4819.0 9572.0 4634.0
Other 0.046753 1.703704 18.0 230.0 385.0 135.0
True Equipment 0.012233 0.190486 599.0 7736.0 48965.0 40612.0
Moving violation 0.008635 0.732720 1747.0 84797.0 202321.0 115729.0
Other 0.058378 1.476672 547.0 5254.0 9370.0 3558.0

Ok great, this is much easier to read.

In the above table, we can see that non-white drivers are more likely to be arrested during a stop that was initiated due to an equipment or moving violation, but white drivers are more likely to be arrested for a traffic stop resulting from "Other" reasons. Non-white drivers are more likely than white drivers to be given tickets for each violation.

2.4 - Visualize Stop Outcome and Violation Results

Let's generate a bar chart now in order to visualize this data broken down by race.

race_stats = df_vt_filtered.groupby(['violation', 'driver_race']).apply(compute_outcome_stats).unstack()
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=figsize)
race_stats.plot.bar(y='arrest_rate', ax=axes[0], title='Arrest Rate By Race and Violation')
race_stats.plot.bar(y='citations_per_warning', ax=axes[1], title='Citations Per Warning By Race and Violation')

citations and arrests by race and violation

We can see in these charts that Hispanic and Black drivers are generally arrested at a higher rate than white drivers (with the exception of the rather ambiguous "Other" category). and that Black drivers are more likely, across the board, to be issued a citation than white drivers. Asian drivers are arrested at very low rates, and their citation rates are highly variable.

These results are compelling, and are suggestive of potential racial bias, but they are too inconsistent across violation types to provide any definitive answers. Let's dig deeper to see what else we can find.

3 - Search Outcome Analysis

Two of the more interesting fields available to us are search_conducted and contraband_found.

In the analysis by the "Standford Open Policing Project", they use these two fields to perform what is known as an "outcome test".

On the project website, the "outcome test" is summarized clearly.

In the 1950s, the Nobel prize-winning economist Gary Becker proposed an elegant method to test for bias in search decisions: the outcome test.

Becker proposed looking at search outcomes. If officers don’t discriminate, he argued, they should find contraband — like illegal drugs or weapons — on searched minorities at the same rate as on searched whites. If searches of minorities turn up contraband at lower rates than searches of whites, the outcome test suggests officers are applying a double standard, searching minorities on the basis of less evidence."

Findings, Stanford Open Policing Project

The authors of the project also make the point that only using the "hit rate", or the rate of searches where contraband is found, can be misleading. For this reason, we'll also need to use the "search rate" in our analysis - the rate at which a traffic stop results in a search.

We'll now use the available data to perform our own outcome test, in order to determine whether minorities in Vermont are routinely searched on the basis of less evidence than white drivers.

3.0 Compute Search Rate and Hit Rate

We'll define a new function to compute the search rate and hit rate for the traffic stops in our dataframe.

  • Search Rate - The rate at which a traffic stop results in a search. A search rate of 0.20 would signify that out of 100 traffic stops, 20 resulted in a search.
  • Hit Rate - The rate at which contraband is found in a search. A hit rate of 0.80 would signify that out of 100 searches, 80 searches resulted in contraband (drugs, unregistered weapons, etc.) being found.
def compute_search_stats(df):
    """Compute the search rate and hit rate"""
    search_conducted = df['search_conducted']
    contraband_found = df['contraband_found']
    n_stops     = len(search_conducted)
    n_searches  = sum(search_conducted)
    n_hits      = sum(contraband_found)

    # Filter out counties with too few stops
    if (n_stops) < 50:
        search_rate = None
    else:
        search_rate = n_searches / n_stops

    # Filter out counties with too few searches
    if (n_searches) < 5:
        hit_rate = None
    else:
        hit_rate = n_hits / n_searches

    return(pd.Series(data = {
        'n_stops': n_stops,
        'n_searches': n_searches,
        'n_hits': n_hits,
        'search_rate': search_rate,
        'hit_rate': hit_rate
    }))

3.1 - Compute Search Stats For Entire Dataset

We can test our new function to determine the search rate and hit rate for the entire state.

compute_search_stats(df_vt)
hit_rate            0.796865
n_hits           2593.000000
n_searches       3254.000000
n_stops        272918.000000
search_rate         0.011923
dtype: float64

Here we can see that each traffic stop had a 1.2% change of resulting in a search, and each search had an 80% chance of yielding contraband.

3.2 - Compare Search Stats By Driver Gender

Using the Pandas groupby method, we can compute how the search stats differ by gender.

df_vt.groupby('driver_gender').apply(compute_search_stats)
hit_rate n_hits n_searches n_stops search_rate
driver_gender
F 0.789392 506.0 641.0 99036.0 0.006472
M 0.798699 2087.0 2613.0 173882.0 0.015027

We can see here that men are three times as likely to be searched as women, and that 80% of searches for both genders resulted in contraband being found. The data shows that men are searched and caught with contraband more often than women, but it is unclear whether there is any gender discrimination in deciding who to search since the hit rate is equal.

3.3 - Compare Search Stats By Age

We can split the dataset into age buckets and perform the same analysis.

age_groups = pd.cut(df_vt["driver_age"], np.arange(15, 70, 5))
df_vt.groupby(age_groups).apply(compute_search_stats)
hit_rate n_hits n_searches n_stops search_rate
driver_age
(15, 20] 0.847988 569.0 671.0 27418.0 0.024473
(20, 25] 0.838000 838.0 1000.0 43275.0 0.023108
(25, 30] 0.788462 492.0 624.0 34759.0 0.017952
(30, 35] 0.766756 286.0 373.0 27746.0 0.013443
(35, 40] 0.742991 159.0 214.0 23203.0 0.009223
(40, 45] 0.692913 88.0 127.0 24055.0 0.005280
(45, 50] 0.575472 61.0 106.0 24103.0 0.004398
(50, 55] 0.706667 53.0 75.0 22517.0 0.003331
(55, 60] 0.833333 30.0 36.0 17502.0 0.002057
(60, 65] 0.500000 6.0 12.0 12514.0 0.000959

We can see here that the search rate steadily declines as drivers get older, and that the hit rate also declines rapidly for older drivers.

3.4 - Compare Search Stats By Race

Now for the most interesting part - comparing search data by race.

df_vt.groupby('driver_race').apply(compute_search_stats)
hit_rate n_hits n_searches n_stops search_rate
driver_race
Asian 0.785714 22.0 28.0 3446.0 0.008125
Black 0.686620 195.0 284.0 5571.0 0.050978
Hispanic 0.644231 67.0 104.0 2562.0 0.040593
White 0.813601 2309.0 2838.0 261339.0 0.010859

Black and Hispanic drivers are searched at much higher rates than White drivers (5% and 4% of traffic stops respectively, versus 1% for white drivers), but the searches of these drivers only yield contraband 60-70% of the time, compared to 80% of the time for White drivers.

Let's rephrase these results.

Black drivers are 500% more likely to be searched than white drivers during a traffic stop, but are 13% less likely to be caught with contraband in the event of a search.

Hispanic drivers are 400% more likely to be searched than white drivers during a traffic stop, but are 17% less likely to be caught with contraband in the event of a search.

3.5 - Compare Search Stats By Race and Location

Let's add in location as another factor. It's possible that some counties (such as those with larger towns or with interstate highways where opioid trafficking is prevalent) have a much higher search rate / lower hit rates for both white and non-white drivers, but also have greater racial diversity, leading to distortion in the overall stats. By controlling for location, we can determine if this is the case.

We'll define three new helper functions to generate the visualizations.

def generate_comparison_scatter(df, ax, state, race, field, color):
    """Generate scatter plot comparing field for white drivers with minority drivers"""
    race_location_agg = df.groupby(['county_fips','driver_race']).apply(compute_search_stats).reset_index().dropna()
    race_location_agg = race_location_agg.pivot(index='county_fips', columns='driver_race', values=field)
    ax = race_location_agg.plot.scatter(ax=ax, x='White', y=race, s=150, label=race, color=color)
    return ax

def format_scatter_chart(ax, state, field):
    """Format and label to scatter chart"""
    ax.set_xlabel('{} - White'.format(field))
    ax.set_ylabel('{} - Non-White'.format(field, race))
    ax.set_title("{} By County - {}".format(field, state))
    lim = max(ax.get_xlim()[1], ax.get_ylim()[1])
    ax.set_xlim(0, lim)
    ax.set_ylim(0, lim)
    diag_line, = ax.plot(ax.get_xlim(), ax.get_ylim(), ls="--", c=".3")
    ax.legend()
    return ax

def generate_comparison_scatters(df, state):
    """Generate scatter plots comparing search rates of white drivers with black and hispanic drivers"""
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=figsize)
    generate_comparison_scatter(df, axes[0], state, 'Black', 'search_rate', 'red')
    generate_comparison_scatter(df, axes[0], state, 'Hispanic', 'search_rate', 'orange')
    generate_comparison_scatter(df, axes[0], state, 'Asian', 'search_rate', 'green')
    format_scatter_chart(axes[0], state, 'Search Rate')

    generate_comparison_scatter(df, axes[1], state, 'Black', 'hit_rate', 'red')
    generate_comparison_scatter(df, axes[1], state, 'Hispanic', 'hit_rate', 'orange')
    generate_comparison_scatter(df, axes[1], state, 'Asian', 'hit_rate', 'green')
    format_scatter_chart(axes[1], state, 'Hit Rate')

    return fig

We can now generate the scatter plots using the generate_comparison_scatters function.

generate_comparison_scatters(df_vt, 'VT')

search scatters vt

The plots above are comparing search_rate (left) and hit_rate (right) for minority drivers compared with white drivers in each county. If all of the dots (each of which represents the stats for a single county and race) followed the diagonal center line, the implication would be that white drivers and non-white drivers are searched at the exact same rate with the exact same standard of evidence.

Unfortunately, this is not the case. In the above charts, we can see that, for every county, the search rate is higher for Black and Hispanic drivers even though the hit rate is lower.

Let's define one more visualization helper function, to show all of these results on a single scatter plot.

def generate_county_search_stats_scatter(df, state):
    """Generate a scatter plot of search rate vs. hit rate by race and county"""
    race_location_agg = df.groupby(['county_fips','driver_race']).apply(compute_search_stats)

    colors = ['blue','orange','red', 'green']
    fig, ax = plt.subplots(figsize=figsize)
    for c, frame in race_location_agg.groupby('driver_race'):
        ax.scatter(x=frame['hit_rate'], y=frame['search_rate'], s=150, label=c, color=colors.pop())
    ax.legend(loc='upper center', bbox_to_anchor=(0.5, 1.2), ncol=4, fancybox=True)
    ax.set_xlabel('Hit Rate')
    ax.set_ylabel('Search Rate')
    ax.set_title("Search Stats By County and Race - {}".format(state))
    return fig
generate_county_search_stats_scatter(df_vt, "VT")

county scatters vt

As the old idiom goes - a picture is worth a thousand words. The above chart is one of those pictures - and the name of the picture is "Systemic Racism".

The search rates and hit rates for white drivers in most counties are consistently clustered around 80% and 1% respectively. We can see, however, that nearly every county searches Black and Hispanic drivers at a higher rate, and that these searches uniformly have a lower hit rate than those on White drivers.

This state-wide pattern of a higher search rate combined with a lower hit rate suggests that a lower standard of evidence is used when deciding to search Black and Hispanic drivers compared to when searching White drivers.

You might notice that only one county is represented by Asian drivers - this is due to the lack of data for searches of Asian drivers in other counties.

4 - Analyzing Other States

Vermont is a great state to test out our analysis on, but the dataset size is relatively small. Let's now perform the same analysis on other states to determine if this pattern persists across state lines.

4.0 - Massachusetts

First we'll generate the analysis for my home state, Massachusetts. This time we'll have more data to work with - roughly 3.4 million traffic stops.

Download the dataset to your project's /data directory - https://stacks.stanford.edu/file/druid:py883nd2578/MA-clean.csv.gz

We've developed a solid reusable formula for reading and visualizing each state's dataset, so let's wrap the entire recipe in a new helper function.

fields = ['county_fips', 'driver_race', 'search_conducted', 'contraband_found']
types = {
    'contraband_found': bool,
    'county_fips': float,
    'driver_race': object,
    'search_conducted': bool
}

def analyze_state_data(state):
    df = pd.read_csv('./data/{}-clean.csv.gz'.format(state), compression='gzip', low_memory=True, dtype=types, usecols=fields)
    df.dropna(inplace=True)
    df = df[df['driver_race'] != 'Other']
    generate_comparison_scatters(df, state)
    generate_county_search_stats_scatter(df, state)
    return df.groupby('driver_race').apply(compute_search_stats)

We're making a few optimizations here in order to make the analysis a bit more streamlined and computationally efficient. By only reading the four columns that we're interested in, and by specifying the datatypes ahead of time, we'll be able to read larger datasets into memory more quickly.

analyze_state_data('MA')

The first output is a statewide table of search rate and hit rate by race.

hit_rate n_hits n_searches n_stops search_rate
driver_race
Asian 0.331169 357.0 1078.0 101942.0 0.010575
Black 0.487150 4170.0 8560.0 350498.0 0.024422
Hispanic 0.449502 5007.0 11139.0 337782.0 0.032977
White 0.523037 18220.0 34835.0 2527393.0 0.013783

We can see here again that Black and Hispanic drivers are searched at significantly higher rates than white drivers. The differences in hit rates are not as extreme as in Vermont, but they are still noticeably lower for Black and Hispanic drivers than for White drivers. Asian drivers, interestingly, are the least likely to be searched and also the least likely to have contraband if they are searched.

search scatters MA
county scatters MA

If we compare the stats for MA to VT, we'll also notice that police in MA seem to use a much lower standard of evidence when searching a vehicle, with their searches averaging around a 50% hit rate, compared to 80% in VT.

The trend here is much less obvious than in Vermont, but it is still clear that traffic stops of Black and Hispanic drivers are more likely to result in a search, despite the fact the searches of White drivers are more likely to result in contraband being found.

4.1 - Wisconsin & Connecticut

Wisconsin and Connecticut have been named as some of the worst states in America for racial disparities. Let's see how their police stats stack up.

Again, you'll need to download the Wisconsin and Connecticut dataset to your project's /data directory.

We can call our analyze_state_data function for Wisconsin once the dataset has been downloaded.

analyze_state_data('WI')
hit_rate n_hits n_searches n_stops search_rate
driver_race
Asian 0.470817 121.0 257.0 24577.0 0.010457
Black 0.477574 1299.0 2720.0 56050.0 0.048528
Hispanic 0.415741 449.0 1080.0 35210.0 0.030673
White 0.526300 5103.0 9696.0 778227.0 0.012459

The trends here are starting to look familiar. White drivers in Wisconsin are much less likely to be searched than non-white drivers (aside from Asians, who tend to be searched at around the same rates as whites). Searches of non-white drivers are, again, less likely to yield contraband than searches on white drivers.

search scatters WI
county scatters WI

We can see here, yet again, that the standard of evidence for searching Black and Hispanic drivers is lower in virtually every county than for White drivers. In one outlying county, almost 25% (!) of traffic stops for Black drivers resulted in a search, even though only half of those searches yielded contraband.

Let's do the same analysis for Connecticut

analyze_state_data('CT')
hit_rate n_hits n_searches n_stops search_rate
driver_race
Asian 0.384615 10.0 26.0 5949.0 0.004370
Black 0.284072 346.0 1218.0 37460.0 0.032515
Hispanic 0.291925 282.0 966.0 31154.0 0.031007
White 0.379344 1179.0 3108.0 242314.0 0.012826

search scatters CT
county scatters CT

Again, the pattern persists.

4.2 - Arizona

We can generate each result rather quickly for each state (with available data), once we've downloaded each dataset.

analyze_state_data('AZ')
hit_rate n_hits n_searches n_stops search_rate
driver_race
Asian 0.196664 224.0 1139.0 48177.0 0.023642
Black 0.255548 2188.0 8562.0 116795.0 0.073308
Hispanic 0.160930 5943.0 36929.0 501619.0 0.073620
White 0.242564 9288.0 38291.0 1212652.0 0.031576

search scatters AZ
county scatters AZ

4.3 - Colorado

analyze_state_data('CO')
hit_rate n_hits n_searches n_stops search_rate
driver_race
Asian 0.537634 50.0 93.0 32471.0 0.002864
Black 0.481283 270.0 561.0 71965.0 0.007795
Hispanic 0.450454 1041.0 2311.0 308499.0 0.007491
White 0.651388 3638.0 5585.0 1767804.0 0.003159

search scatters CO
county scatters CO

4.4 - Washington

analyze_state_data('WA')
hit_rate n_hits n_searches n_stops search_rate
driver_race
Asian 0.087143 608.0 6977.0 352063.0 0.019817
Black 0.130799 1717.0 13127.0 254577.0 0.051564
Hispanic 0.103366 2128.0 20587.0 502254.0 0.040989
White 0.156008 15768.0 101072.0 4279273.0 0.023619

search scatters WA
county scatters WA

4.5 - North Carolina

analyze_state_data('NC')
hit_rate n_hits n_searches n_stops search_rate
driver_race
Asian 0.104377 31.0 297.0 46287.0 0.006416
Black 0.182489 1955.0 10713.0 1222533.0 0.008763
Hispanic 0.119330 776.0 6503.0 368878.0 0.017629
White 0.153850 3387.0 22015.0 3146302.0 0.006997

search scatters NC
county scatters NC

4.6 - Texas

You might want to let this one run while you go fix yourself a cup of coffee or tea. At almost 24 million traffic stops, the Texas dataset takes a rather long time to process.

analyze_state_data('TX')
hit_rate n_hits n_searches n_stops search_rate
driver_race
Asian 0.289271 976.0 3374.0 349105.0 0.009665
Black 0.345983 27588.0 79738.0 2300427.0 0.034662
Hispanic 0.219449 37080.0 168969.0 6525365.0 0.025894
White 0.335098 83157.0 248157.0 13576726.0 0.018278

search scatters TX
county scatters TX

4.7 - Even more data visualizations

I highly recommend that you visit the Standford Open Policing Project results page for more visualizations of this data. Here you can browse the search outcome results for all available states, and explore additional analysis that the researchers have performed such as stop rate by race (using county population demographics data) as well as the effects of recreational marijuana legalization on search rates.

5 - What next?

Do these results imply that all police officers are overtly racist? No.

Do they show that Black and Hispanic drivers are searched much more frequently than white drivers, often with a lower standard of evidence? Yes.

What we are observing here appears to be a pattern of systemic racism. The racial disparities revealed in this analysis are a reflection of an entrenched mistrust of certain minorities in the United States. The data and accompanying analysis are indicative of social trends that are certainly not limited to police officers, and should not be used to disparage this profession as a whole. Racial discrimination is present at all levels of society from retail stores to the tech industry to academia.

We are able to empirically identify these trends only because state police deparments (and the Open Policing team at Stanford) have made this data available to the public; no similar datasets exist for most other professions and industries. Releasing datasets about these issues is commendable (but sadly still somewhat uncommon, especially in the private sector) and will help to further identify where these disparities exist, and to influence policies in order to provide a fair, effective way to counteract these biases.

To see the full official analysis for all 20 available states, check out the official findings paper here - https://5harad.com/papers/traffic-stops.pdf.

I hope that this tutorial has provided the tools you might need to take this analysis further. There's a lot more that you can do with the data than what we've covered here.

  • Analyze police stops for your home state and county (if the data is available). If the data is not available, submit a formal request to your local representatives and institutions that the data be made public.
  • Combine your analysis with US census data on the demographic, social, and economic stats about each county.
  • Create a web app to display the county trends on an interactive map.
  • Build a mobile app to warn drivers when they're entering an area that appears to be more distrusting of drivers of a certain race.
  • Open-source your own analysis, spread your findings, seek out peer review, maybe even write an explanatory blog post.

The source code and figures for this analysis can be found in the companion Github repository - https://github.com/triestpa/Police-Analysis-Python

To view the completed IPython notebook, visit the page here.

The code for this project is 100% open source (MIT license), so feel free to use it however you see fit in your own projects.

As always, please feel free to comment below with any questions, comments, or criticisms.

Patrick Triest

Full-stack engineer, data enthusiast, insatiable learner, obsessive builder. You can find me wandering on a mountain trail, pretending not to be lost.

Subscribe to Break | Better

Get the latest posts delivered right to your inbox.

or subscribe via RSS with Feedly!

Note - Disqus is a great commenting service, but it also embeds a lot of Javascript analytics trackers. If you want to keep your browsing data private, you can disable these trackers from your "Disqus" account settings, in the Acount -> Personalization page.

Get the latest posts delivered to your inbox.

I hate spam. I promise not to send many emails.

or subscribe via RSS with Feedly