Sheriff’s deputies in full riot gear, with helmets, shields, and body armor, stand in formation on a city street while one deputy faces the group. The street is lined with trees and boarded-up storefronts, suggesting a response to a public disturbance.

Weekend Reads | Can Police Misconduct Be Predicted?

by Kevin Schofield


This weekend’s read is a deep dive by researchers at the University of Chicago Crime Lab into the question of whether police departments can accurately predict which of their police officers are most likely to engage in misconduct in the near future, either while on duty or off duty.

The article begins by noting that most news coverage of police misconduct, coming after a tragedy where a person died wrongfully at the hands of a police officer, discusses what the proper response is after the fact, rather than how it might have been prevented. Yet, “whether these tragedies are preventable surely depends at least in part on the degree to which they are predictable,” the authors note.

When a police department comes under a federal consent decree covering police misconduct, such as the one in force here in Seattle, the decree often requires the department to establish an “early intervention system,” or EIS, to flag officers at high risk of committing misconduct and to intervene with measures such as increased supervision, mentoring, or special training. Unfortunately, Seattle’s experience with its EIS mirrors that of other departments: It sounds great on paper, but it turns out to be hard to accurately identify which officers are truly at high risk of future misconduct.

The trio of researchers used data from the Chicago Police Department to evaluate two approaches to predicting officer misconduct: a traditional statistical analysis and a machine-learning approach that trains a model on historical records of officers and the complaints filed against them.

Evaluation of this kind of system is based upon how well the officers flagged as high-risk for misconduct correlate with the officers who do actually engage in misconduct in the future. It’s measured in two ways, which complement each other:

  • Precision: What percentage of the officers who were flagged as high-risk subsequently have an instance of misconduct? In other words, when it flags someone, how often is it right?
  • Recall: What percentage of the officers who engaged in misconduct were flagged as high-risk? In other words, how close did it get to flagging all of the future offenders?
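To make those two measures concrete, here is a toy calculation in Python. The officer labels and outcomes are made up for the example; nothing here comes from the study’s data:

```python
# Toy illustration of precision and recall for a misconduct-risk flagging
# system. Officer IDs and outcomes are invented for the example.
flagged = {"A", "B", "C", "D"}          # officers the EIS marked high-risk
misconduct = {"B", "C", "E", "F", "G"}  # officers with future misconduct

true_positives = flagged & misconduct   # flagged AND later had misconduct

precision = len(true_positives) / len(flagged)    # right when it flags
recall = len(true_positives) / len(misconduct)    # share of offenders caught

print(f"precision = {precision:.0%}, recall = {recall:.0%}")
# → precision = 50%, recall = 40%
```

Note the tension between the two: flagging every officer drives recall to 100% but makes precision (and the system) useless, while flagging almost no one can look precise while missing nearly all future misconduct.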

It turns out that most of the obvious candidate statistics for predicting officer misconduct, such as a recent serious incident or a history of sustained complaints, are not very predictive of future misconduct. The best-performing statistic is a history of total complaints, including ones that are not sustained (only about 3% of all complaints are ultimately sustained). But even then, it’s not very good: When ranked by number of previous complaints (a system they call “rank by complaint” or RBC), flagging the officers in the top 10% as high-risk would only capture about 35% of future officer misconduct. As the researchers put it, “predictable risk is concentrated in a very small group of officers.”
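The rank-by-complaint baseline is simple enough to sketch in a few lines of Python. The officer names and complaint counts below are invented for illustration; only the ranking logic reflects the RBC idea described above:

```python
# "Rank by complaint" (RBC): order officers by total prior complaints,
# sustained or not, and flag the top 10% as high-risk.
# All names and counts below are invented for illustration.
complaints = {
    "officer_0": 0, "officer_1": 1, "officer_2": 0, "officer_3": 3,
    "officer_4": 12, "officer_5": 2, "officer_6": 0, "officer_7": 7,
    "officer_8": 1, "officer_9": 25,
}

ranked = sorted(complaints, key=complaints.get, reverse=True)
n_flagged = max(1, len(ranked) // 10)  # top decile, at least one officer
high_risk = ranked[:n_flagged]

print(high_risk)  # → ['officer_9']  (25 complaints, the most of any officer)
```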

A line graph titled "Recall for On-Duty and Off-Duty Risk Models" shows recall percentages versus risk percentiles (90 to 99). The blue line represents on-duty misconduct with higher recall, and the orange line represents off-duty misconduct with lower recall. Shaded areas indicate confidence intervals.
Figure 2: Recall of risk models. From “Predicting Police Misconduct,” Greg Stoddard, Dylan J. Fitzpatrick, and Jens Ludwig. NBER Working Paper No. 32432, May 2024.

And even then, relying on unsustained complaints is problematic; that’s exactly the kind of information that officers and police unions want deleted as soon as possible, as would anyone accused of misdeeds and eventually exonerated. On the flip side, the researchers point out that the bar for sustaining complaints against police officers is very high: As noted above, only about 3% of all complaints are sustained, which suggests that a pattern of unsustained complaints still carries real signal. We also need to be aware that, at least in theory, someone with a grudge against an officer could “weaponize” an EIS by filing enough complaints to get the officer flagged as high-risk, even if none of the complaints are sustained. But even if this statistic must be used with care, RBC provides a good baseline against which to judge whether a trained machine-learning model predicts officer misconduct any better.

The researchers trained a machine-learning model on Chicago Police Department data spanning 2010 to 2018 and then tested its ability to predict future officer misconduct. The trained model consistently did about 6 percentage points better than rank-by-complaint: Where RBC’s highest-risk 5% captured 16% of the officers who went on to have sustained on-duty complaints, the trained model’s highest-risk 5% captured 22%. It’s certainly better, but it’s still not good.

From there, the researchers looked at some of the harder questions, starting with whether there were biases in the dataset that skewed the results. For example — and this is a common concern raised about EIS systems — do some officers inevitably attract more complaints because of their specific beat or type of on-duty activity, such as “vice squad” officers? The researchers concluded that after accounting for differences in job categories, the results come out about the same.

They also looked at whether EIS systems that attempt to predict future officer misconduct are cost-effective. While acknowledging that saving lives by reducing misconduct is a benefit that goes beyond dollars, helping the bean-counters justify the cost of implementing an EIS is still useful. According to the researchers, it costs about $500,000 to build an EIS based on the simple rank-by-complaint metric, and about $5 million to build one around a trained model. Departments also incur costs for the specific interventions they apply to officers flagged as high-risk. But those costs are offset by savings from fewer complaint investigations and from reduced litigation, since sustained complaints that would have turned into lawsuits are avoided when misconduct is prevented. In total, when an EIS flags the 5% highest-risk officers for appropriate intervention, the Chicago Police Department would save about $880,000 per year using a simple RBC system and just over $1 million per year with a trained model, which costs more but performs slightly better. That’s not a huge savings for a department with nearly 12,000 officers, but the system more than pays for itself.
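As a rough check on those figures, here is a back-of-the-envelope payback calculation. It treats the reported annual savings as net of intervention costs, which is an assumption on my part, not something the paper states in these terms:

```python
# Back-of-the-envelope payback period for each EIS variant, using the
# build costs and annual savings reported for the Chicago example.
systems = {
    "rank-by-complaint": {"build_cost": 500_000, "annual_savings": 880_000},
    "trained model": {"build_cost": 5_000_000, "annual_savings": 1_000_000},
}

for name, s in systems.items():
    payback_years = s["build_cost"] / s["annual_savings"]
    print(f"{name}: recoups its build cost in about {payback_years:.1f} years")
# → rank-by-complaint in about 0.6 years; trained model in about 5.0 years
```

On these numbers, the simple RBC system pays for itself within a year, while the trained model takes roughly five years to break even on its higher build cost.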

In addition, the researchers looked at how much data is needed to successfully train a model; specifically, what size department generates enough data to get a useful level of accuracy? By their calculations, about 500 officers is the crossover point where a trained model does better than simply using rank-by-complaint, and at 1,000 officers the model beats RBC by about two percentage points.

A line graph titled "Recall at 5%" shows recall percentages versus the number of officers on a log scale. The blue line represents on-duty misconduct with higher recall, and the orange line represents off-duty misconduct with lower recall. Shaded areas indicate confidence intervals, and horizontal dashed lines mark specific recall percentages.
Graph from “Predicting Police Misconduct,” Stoddard, Fitzpatrick, and Ludwig. NBER Working Paper No. 32432, May 2024.

What both the cost-effectiveness calculations and the dataset-size findings tell us is that smaller departments — those with fewer than 500 officers — are almost certainly better served by implementing an EIS based upon rank-by-complaint than by trying to build their own machine-learning model; the cost of developing the model is too high, and its accuracy would be too low. Of course, at the end of the day, money isn’t the measure of the value of early intervention; the real value is in saving lives. Notably, police departments with fewer than 500 officers account for 63% of all police killings, so early-intervention solutions that only address large departments would have no impact on nearly two-thirds of those killings.

A bar chart titled “Police Department Size and Number of Killings” compares each department-size bracket’s share of total police killings to its share of total officers:

  • 1–49 officers: 21% of killings, 24% of officers
  • 50–99 officers: 12% of killings, 11% of officers
  • 100–199 officers: 14% of killings, 12% of officers
  • 200–499 officers: 16% of killings, 13% of officers
  • 500–999 officers: 12% of killings, 10% of officers
  • 1,000+ officers: 26% of killings, 31% of officers
Source: United States Department of Justice, Office of Justice Programs, Bureau of Justice Statistics. Law Enforcement Agency Roster (LEAR), 2016. Inter-university Consortium for Political and Social Research [distributor], 2017-04-05.

So where does that leave us? Early intervention systems seem to provide some value, in that they prevent a small amount of police misconduct while essentially paying for themselves in reduced costs. But they are still not very accurate, flagging only a fraction of the officers who will commit misconduct in the future. This is not because the technology is bad, but because the portion of all officer misconduct that is actually predictable is concentrated in a small number of officers.


Read the full “Predicting Police Misconduct” report on the National Bureau of Economic Research’s website.


Kevin Schofield is a freelance writer and publishes Seattle Paper Trail. Previously he worked for Microsoft, published Seattle City Council Insight, co-hosted the “Seattle News, Views and Brews” podcast, and raised two daughters as a single dad. He serves on the Board of Directors of Woodland Park Zoo, where he also volunteers.

Before you move on to the next story …

The South Seattle Emerald™ is brought to you by Rainmakers. Rainmakers give recurring gifts at any amount. With around 1,000 Rainmakers, the Emerald™ is truly community-driven local media. Help us keep BIPOC-led media free and accessible.

If just half of our readers signed up to give $6 a month, we wouldn’t have to fundraise for the rest of the year. Small amounts make a difference.

We cannot do this work without you. Become a Rainmaker today!