
Quattrone Center for the Fair Administration of Justice

Risk Assessment Tools: Capabilities, Benefits and Risks

More and more jurisdictions across the country are deploying predictive assessment tools (often called risk assessment tools) to guide decisions about pretrial detention, sentencing, prison classification, parole, and probation supervision.  Not unlike risk models used by insurance companies, criminal justice risk assessment tools are based on statistical analysis of large, aggregated data sets of criminal behavior over time.  Tool developers identify traits that correlate with arrest in the sample set; these are deemed “risk factors.”  They then develop an algorithm that calculates an individual’s statistical likelihood of future arrest on the basis of the number of risk factors that apply. There are two kinds of risk assessment tools in use in the criminal justice system: checklist instruments and machine-learned forecasting programs.  Both operate on this basic model.
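To make the checklist model concrete, here is a minimal Python sketch under stated assumptions.  The three risk factors, the field names, and the six-person sample are all hypothetical, invented purely for illustration; no actual instrument is being reproduced.  The sketch counts how many factors apply to a person and looks up the share of people in the sample with the same count who were re-arrested.

```python
from collections import defaultdict

# Hypothetical risk factors; real instruments define their own lists.
RISK_FACTORS = ("prior_arrest", "under_25", "unemployed")


def count_factors(person):
    """Count how many of the listed risk factors apply to one person."""
    return sum(1 for factor in RISK_FACTORS if person.get(factor))


def build_rate_table(sample):
    """Map each factor count to the observed re-arrest rate in the sample."""
    totals = defaultdict(int)
    arrests = defaultdict(int)
    for person in sample:
        n = count_factors(person)
        totals[n] += 1
        if person["rearrested"]:
            arrests[n] += 1
    return {n: arrests[n] / totals[n] for n in totals}


def estimate_risk(person, rate_table):
    """Return the sample re-arrest rate for this person's factor count,
    or None if the sample contains nobody with that count."""
    return rate_table.get(count_factors(person))


# Toy sample, invented for illustration only.
sample = [
    {"prior_arrest": True, "under_25": True, "rearrested": True},
    {"prior_arrest": True, "under_25": True, "rearrested": False},
    {"rearrested": False},
    {"rearrested": False},
    {"prior_arrest": True, "rearrested": False},
    {"under_25": True, "rearrested": True},
]
rate_table = build_rate_table(sample)
```

Note that the estimate for any one person is simply the observed rate among other people who share the same factor count, which is the feature of these tools that several of the panelists go on to debate.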

The large-scale shift toward actuarial risk assessment has provoked both excitement and concern. Supporters of actuarial tools argue that they improve the information and context available to a judge, allowing the judge to combine his or her own experience and observations of the individual with a more formulaic, data-driven assessment of how that individual is likely to act in the future.  But questions abound.  Critics worry that judges will rely too heavily on the tools, using them as a substitute for judicial discretion rather than as an aid to a more informed, individualized assessment.

A variety of predictive assessment tools have been implemented across the country in pretrial detention decisions, in sentencing hearings, and to assist in probation and parole decisions.  But how exactly do they work?  How comfortable are we with basing restrictions on one person’s liberty on the potential for future misconduct, as extrapolated from other people’s past behavior?  How “accurate” are the tools, and are criminal justice system actors capable of understanding them deeply enough to monitor them effectively?  Should we worry that judges will accord the tools undue weight, using them as a substitute for discretion?  And are the tools as free of bias and subjectivity as we believe?  This panel provides a framework to evaluate the role of such tools in our criminal justice system, their utility, and their constitutionality.

Describing the talk as the “Minority Report” panel, moderator Sandra Mayson of the Quattrone Center captures the visceral ambivalence felt by many about proactively limiting an individual’s physical freedom based on predictive assessments of future actions, rather than actual past acts.  Mayson tries to distinguish three separate concerns about actuarial risk assessment: concerns about the act of prediction itself, concerns about specific methodologies, and concerns with the state’s response to predictions.

A Designer’s View.

The use of computerized, algorithmic risk assessment may be new, but the question that such tools seek to answer is one judges have been asking for centuries, as Penn Criminology and Statistics Professor Richard Berk points out.  Berk, who has designed and assessed multiple machine-learned risk assessment programs in use today, reminds us that predictive risk assessment has always been part of the judicial role.  Thus, Berk frames the discussion not in terms of whether we should limit freedoms based on predictive risk assessments, since we already do.  Instead, he asks whether and how the tools can improve a judge’s ability to make tailored decisions that improve outcomes for the individuals assessed and for the community.

In Berk’s view, the deployment of risk assessment tools should not replace judicial discretion, or free judges from independently assessing an individual defendant based on the specific facts of the case at hand.  Nor should the risk assessment forecasts serve as a substitute for the judge’s decisions.  Rather, they provide additional information that should allow for a more thoughtful exercise of the judge’s discretion.  We should not seek or expect perfect prediction, says Berk.  Rather, we should ask whether the tools generate information that improves a judge’s ability to make fair and informed determinations.

Risk assessment tools can meet this challenge through algorithmic machine learning, says Berk, in which computers, using complex mathematical calculations, find associations in a given dataset between certain characteristics and the outcome of concern, such as arrest for a violent crime.  (For example, the younger the age at which a person is first charged as an adult, the more likely it generally is that the person will be arrested in the future.)  By combining many such associations, the computer generates an algorithm that can be deployed to assess other individuals’ likelihood of being arrested for any crime or for a violent crime.  Machine-learned algorithms can provide an assessment of this likelihood that is more robust, more transparent, and potentially less distorted by implicit bias than the assessment that a judge’s individual experience alone might generate.

Some obvious concerns arise.  First, any risk assessment tool will only be as accurate as its algorithms.  While the algorithms should improve over time, they are not perfect.  Some individuals will be misclassified. Berk explains that, whenever we base custody decisions on a risk assessment, there are two kinds of error:

  1. A false negative (which Berk calls the “Darth Vader” error – i.e., a person is released on the basis of a judgment of low risk but in fact is dangerous, and subsequently commits harm); and
  2. A false positive (which Berk calls the “Luke Skywalker” error – i.e., a person is detained on the basis of a judgment of dangerousness, but in fact would not commit any harm if released).
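The two kinds of error can be tallied in a few lines of Python.  This is a minimal sketch, assuming a toy list of records in which both the tool’s classification and the person’s true outcome are known; the function name and the data are hypothetical and do not come from the panel.

```python
def count_errors(records):
    """Tally both error types.

    records: iterable of (classified_high_risk, would_cause_harm) booleans.
    """
    # False negative: classified low risk, but would in fact cause harm.
    false_negatives = sum(1 for high, harm in records if not high and harm)
    # False positive: classified high risk, but would in fact cause no harm.
    false_positives = sum(1 for high, harm in records if high and not harm)
    return {"false_negatives": false_negatives,
            "false_positives": false_positives}


# Toy records, invented for illustration only.
records = [
    (False, True),   # released, would cause harm: false negative
    (True, False),   # detained, would cause no harm: false positive
    (True, True),    # correctly classified high risk
    (False, False),  # correctly classified low risk
    (True, False),   # a second false positive
]
errors = count_errors(records)
```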

As Berk points out, judges balance the risk of false negatives and false positives in their decision-making every day.  Thus, so long as predictive assessment tools help judges more accurately identify who is likely to commit future crimes and who is not, we have improved the criminal justice system, reducing incarceration without compromising public safety and delivering more services more effectively to individuals who need them in pre-trial, corrections, and probation contexts. 

To those who worry that judges will accord the assessments undue weight, Berk is direct: “[An assessment tool] just provides an additional piece of information that is more explicit about some value of risk… . You can tell them how reliable that is.  [Judges] are not stupid.”  At the same time, jurisdictions that seek to implement a tool must make important policy decisions about how the tool will operate.  Berk reports that most policymakers, criminal justice professionals, and even victims’ advocates find a false negative to be more worrisome than a false positive (that is, they would rather a harmless person be mistakenly detained than a dangerous person mistakenly released).  They therefore elect to structure the algorithm to classify people as “high-risk” at a lower weight of statistical evidence than they otherwise would.  For example, one jurisdiction structured its assessment tool so that an individual would be classified as “high” risk if his or her calculated risk of being arrested for a violent crime in the specified timespan was 20% or greater, or if the calculated risk of being arrested for a lower-grade felony was greater than 70%.  This application of a policy-driven set of weights and balances modifies the weight of the data and can have a considerable impact on individuals and the progression of their cases in the criminal justice system.  Berk explains that we also need to know the biases inherent in the data, and be comfortable with the underlying algorithms and how they use the underlying data.  Furthermore, it is essential that the tools be transparent, with open source code that allows judges, defense attorneys, and communities to understand, evaluate, and challenge the algorithm.  (Note:  For additional discussion on transparency, see our Panel on DNA Mixture Assessment Software.)
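The two-threshold rule that one jurisdiction adopted, as described above, can be written out directly.  The sketch below is illustrative only: the 20% and 70% cutoffs come from the example in the text, while the function name, the argument names, and the “not high” label are assumptions made here.

```python
VIOLENT_CUTOFF = 0.20        # "20% or greater" for a violent-crime arrest
LOWER_FELONY_CUTOFF = 0.70   # "greater than 70%" for a lower-grade felony arrest


def classify(p_violent, p_lower_felony):
    """Return "high" if either calculated risk crosses its cutoff.

    Both arguments are calculated probabilities between 0 and 1 for the
    specified timespan. The "not high" label is a placeholder assumed here.
    """
    if p_violent >= VIOLENT_CUTOFF or p_lower_felony > LOWER_FELONY_CUTOFF:
        return "high"
    return "not high"
```

Because the violent-crime cutoff sits far below the felony cutoff, a person with a 20% calculated risk of a violent-crime arrest is classified as “high” even though the calculation puts the chance of no such arrest at 80%.  That is the policy choice about relative error costs at work.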

A Constitutional Scholar’s View. 

While Berk evaluates risk assessment tools from a practical, user-based perspective, University of Michigan Law Professor Sonja Starr takes a different view, evaluating their constitutional implications.  While risk assessment tools incorporate a great deal of information about the past acts of other people, she points out, their assessments do not include any information about the actual crime alleged to have been committed by the individual.  They rely instead on risk factors that include the individual’s past criminal history, gender, age, education, marital status, and employment status, and may also include more attenuated information such as the individual’s credit history, his or her parents and their criminal histories, residential stability, and socio-demographic data on crime in and around the individual’s residence.

While none of the risk assessment tools today overtly uses race as a factor (which Starr says would almost certainly violate equal protection law), Starr argues that these “risk-heightening” criteria have a greater negative impact on minorities based not on their actual acts, but on their socio-demographic characteristics. And because the classifications include age and gender, they raise equal protection concerns: “individuals have a right to be treated as individuals, regardless of the accuracy of generalizations about others in their group.”

Starr questions whether risk assessment tools can survive the heightened constitutional scrutiny to which state policies that discriminate on the basis of gender or poverty are subject.  Heightened scrutiny requires that the state’s policy be tailored to its ultimate purpose. Since the State’s interest is not in predicting crime, which is what risk assessment tools do, but in preventing crime, Starr argues that the statistical classification of certain classes of people as “high-risk” may not be sufficiently tailored to the ultimate goal of prevention.  This argument has yet to be litigated in the courts, but may provide an avenue for limiting the deployment of risk assessment tools in more risk-averse jurisdictions.

A Civil Libertarian View.

Ezekiel “Zeke” Edwards, Director of the ACLU’s Criminal Law Reform Project (and a Penn Law alumnus), shares Starr’s concern that predictive assessment tools may not be objective providers of information. In fact, as Edwards puts it, the data “is inherently subjective.”  As an example, Edwards points to information about an individual’s prior criminal history, which is a key factor in all risk assessment analyses.  That criminal history is based upon a series of subjective assessments made by policymakers having nothing to do with any individual’s predispositions: decisions about where to deploy the police, whom to stop, what to charge for what kinds of behaviors, under what circumstances to propose bail versus release on recognizance, and so on.  It is impossible to separate the societal and systemic biases that inform all of these discretionary decisions from the data itself.  Our risk assessment tools suffer from an inherent bias, in that race and class disparities in past policing practices have shaped the criminal history factors that constitute the core of today’s risk assessment algorithms.  As a result, our tools are far more likely to classify a poor black person as a “Darth Vader.”

In Edwards’ view, the claim that race is not a factor in risk assessment may be semantically true, because an individual’s race is not included as an explicit factor in the algorithm.  Even so, it is Edwards’ contention that these tools are “dripping with race” due to the de facto bias of past criminal history and its overlay with race and geography.  “Our fear,” Edwards concludes, “is that we’re going to use statistics and data that [are] contaminated to further a sanitized racial injustice and justify a continued disparate treatment.”

At the same time, Edwards concedes that “the status quo, where judges make their own often racially biased decisions … hasn’t been working.” While he and Berk may draw lines in different places, they both seem to agree that the tools could be better than a judge’s unfettered discretion.  Edwards’ two-part solution is (a) to suggest that jurisdictions build a presumption against preventive detention into new risk assessment tools in order to counteract the inherent biases in the data, and (b) to advocate for the presence of defense counsel at any hearing where risk assessment tools are used.  Counsel can educate judges about inherent biases in the dataset, and can present personal information about the individual in question not captured by the statistical assessment.  He describes bail hearings conducted by judges afraid to depart from assessment recommendations, applying a biased data set to unrepresented defendants, as a triple threat to the defendant.  (For more on bail hearings without defense representation, see the Video in Criminal Justice Panel.)

Is a 360° View of Risk Possible?

Audience members point out that risk assessment tools focus on individuals who have re-engaged with the criminal justice system in a negative way, by violating probation or parole or by committing additional crimes after completion of a sentence.  They do not factor in the potential positive impact of criminal justice interventions such as diversion programs or other social welfare programs that might mitigate the historical risk of re-offense by an individual with a particular risk profile.  Nor, says Starr, do they typically differentiate among the various types of re-offense; perhaps the tools could be used to qualify the risks in more detail, so that a judge could distinguish the likelihood that an individual will commit a violent crime in the future from the likelihood that the individual will violate probation by, for example, staying out beyond a curfew or being found in possession of personal amounts of marijuana.  (At least one tool, created by the Laura & John Arnold Foundation, has a “flag” for individuals who are anticipated to be at high risk of committing violent crimes in the future.)

Looking Ahead.

As actuarial risk assessment becomes standard practice, communities, criminal justice institutions, policymakers and scholars will have to confront and resolve a set of core questions (perhaps in ways that differ across jurisdictions):

  1. What “risk” should an actuarial tool assess (what outcome(s))?
  2. What likelihood of a given outcome, over what timespan, is sufficient to classify a person as “high risk”?  (Answering this question requires a jurisdiction to decide what ratio of false negatives to false positives it will tolerate.)
  3. How should a risk assessment tool communicate its assessments to decision makers in order to maximize clarity and reduce the chance of overreliance?
  4. How should the state respond to risk classifications? What interventions produce the greatest public-safety benefit relative to their cost, including their infringement on individuals’ liberty?
  5. Can risk assessment tools be structured to eliminate racial bias (and what might that mean)?
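The link in question 2 between a risk threshold and the tolerated balance of errors can be illustrated with a standard expected-cost calculation.  The calculation is an assumption of this sketch, not a method the panel prescribes.  Suppose a jurisdiction treats one false negative as k times as costly as one false positive.  Classifying a person as high risk then carries the lower expected cost whenever the predicted probability p satisfies p × k > (1 − p), that is, whenever p > 1 / (1 + k).  The function name below is hypothetical.

```python
def threshold_from_cost_ratio(k):
    """Probability above which a "high risk" label has lower expected cost.

    k: how many times worse one false negative is judged to be than one
       false positive (a policy choice, assumed here to be a single number).
    """
    if k <= 0:
        raise ValueError("cost ratio must be positive")
    # Labeling "high risk" risks a false positive with probability (1 - p);
    # labeling "low risk" risks a false negative with probability p, weighted k.
    return 1.0 / (1.0 + k)
```

Under this rule, equal error costs (k = 1) give a threshold of 50%, and a 4-to-1 ratio gives 20%.  The figures illustrate the arithmetic only and are not a claim about how any jurisdiction chose its thresholds.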

Quattrone Fellow Sandra Mayson is exploring these questions further in a work in progress.  Please return to the Quattrone Center’s “Output” page for additional information when available.


