How fair is an algorithm? A comment on the Algorithm Assessment Report

The worldwide use of faster, smarter and more complex algorithms has the potential to make many things better but some things worse. The now famous controversies over facial recognition software and the COMPAS criminal recidivism prediction tool, which overstated the future risk of African American defendants, are cases in point. Of course, we have had our own history of controversy over the use of predictive analytics in the field of child welfare – first proposed to deliver preventive services, then trialled in child protection decision making at the intake office of what is now Oranga Tamariki. Neither is currently in use, as confirmed in the Algorithm Assessment Report released last week by Stats NZ, which outlines the ways algorithms are currently used in government.

The report is a great start towards transparency around the ways algorithmic tools are currently used in Aotearoa, and shows a commitment to keeping the public informed about their use. It gives some insight into the ways algorithms are used across a range of services – from identifying school leavers at risk of long-term unemployment, to identifying dodgy packages arriving at the border for the NZ Customs Service. But how should we evaluate the ways algorithms impact rights? Algorithmic tools used in the social policy and criminal justice spheres inevitably shape who qualifies for limited resources, and the interactions of the state with those in contact with criminal justice systems. In both areas, there are important ethical implications, and these implications depend on the data used, the type of algorithm, and the extent to which it is used in actual decision-making.

One way to evaluate the ethical impacts of algorithmic tools on people is via the fairness, transparency and accountability framework arising from the work of US academics. Figuring out if an algorithm is ‘fair’ leads to questions about statistical fairness like: where does the data used to risk-score people come from, and is it based on a representative sample? Does the tool over-identify any particular group of people who belong to a ‘protected class’ (think class, race, gender) and, if it does, is the disproportionate identification a reflection of disparate incidence, or of social biases? Does the error rate – the level of false positives or negatives for different groups – mean that some are subject to more incorrect classifications than others? Does the feedback the tool is ‘learning from’ exacerbate or reduce biases caused by incomplete, inaccurate or skewed data?
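To make the error-rate question concrete, here is a minimal sketch of how group-wise false positive and false negative rates can be compared. The data is invented for illustration and is not drawn from any tool discussed in the report:

```python
# Toy illustration of group-wise error rates. All data below is invented.

def error_rates(y_true, y_pred):
    """Return (false_positive_rate, false_negative_rate) for one group."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    return fp / negatives, fn / positives

# Hypothetical outcomes (1 = event occurred) and tool predictions,
# for two groups with identical true outcomes.
group_a_true = [0, 0, 0, 1, 1, 0, 1, 0]
group_a_pred = [0, 1, 0, 1, 1, 0, 1, 0]
group_b_true = [0, 0, 0, 1, 1, 0, 1, 0]
group_b_pred = [1, 1, 0, 1, 0, 1, 1, 0]

fpr_a, fnr_a = error_rates(group_a_true, group_a_pred)
fpr_b, fnr_b = error_rates(group_b_true, group_b_pred)
# If fpr_b is much larger than fpr_a, group B is wrongly flagged
# 'high risk' far more often than group A for the same behaviour.
print(f"Group A: FPR={fpr_a:.2f}, FNR={fnr_a:.2f}")
print(f"Group B: FPR={fpr_b:.2f}, FNR={fnr_b:.2f}")
```

In this toy example the two groups behave identically, yet group B accumulates three times the false positive rate – exactly the kind of disparity the report does not publish for the tools it describes.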

But these are not the only ways to think about fairness and bias. Social definitions of fairness include more fundamental ways that inequities can work their way into algorithms. For example, while a tool might be statistically ‘fair’ in terms of similar error rates between groups, it can still reproduce social inequalities if the predictor variables in the input data reflect social biases, differential patterns of surveillance, or biases held by the decision-makers in the system concerned. Think unconscious ethnic bias on the part of decision makers in criminal justice or child protection systems, or differing levels of neighbourhood surveillance, for example.

The use of predictive tools in criminal justice is a case in point for both types of fairness. The Algorithm Assessment Report states that two are used: the ROC*ROI tool, which predicts recidivism and is used in sentencing decisions, and the family violence tool used at domestic violence callouts to predict the risk of future perpetration. In terms of statistical fairness, the crime data used in these algorithms is based not on a sample of all crime, but on all crime that was detected, charged and convicted. If some groups are subject to higher rates of surveillance and conviction than others, this skews the sample frame of the data used to inform the predictions. It can then become a self-fulfilling prophecy: those surveilled, caught and charged are calculated as high risk, receive longer sentences and more surveillance, and so the cycle goes on. For Māori, Pacific and those in more deprived communities, this is an entirely feasible reality, as differences in conviction rates for the same crime for Māori compared to non-Māori, for example, are well known (Morrison, 2009). As the error rates for different groups are not published in the report, we can’t know whether the false positives and negatives generated by the criminal justice algorithms differ between groups, or how this ‘ratcheting’ effect might be mitigated.
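The ratchet dynamic can be sketched in a few lines. Assume – purely for illustration, these numbers describe no real population – two groups with identical underlying offending rates but different detection rates. A score trained on conviction counts alone will rate the more heavily surveilled group as ‘riskier’:

```python
import random

random.seed(1)

TRUE_RATE = 0.1                    # identical offending rate for both groups (assumption)
DETECTION = {"A": 0.3, "B": 0.6}   # group B is surveilled twice as heavily (assumption)

def simulate_convictions(group, n=10_000):
    """Count convictions when offending is equal but detection is not."""
    convictions = 0
    for _ in range(n):
        offended = random.random() < TRUE_RATE
        detected = random.random() < DETECTION[group]
        if offended and detected:
            convictions += 1
    return convictions

conv_a = simulate_convictions("A")
conv_b = simulate_convictions("B")
# A score built on conviction counts will rate group B roughly twice as
# 'risky' even though true offending is identical -- and if higher scores
# bring more surveillance, the gap widens on each iteration.
print(conv_a, conv_b)
```

The conviction counts, not the offending, are what the data records – which is precisely why a ‘representative sample of convictions’ is not a representative sample of crime.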

In terms of social definitions of fairness, the broader context of colonisation and direct bias against Māori is inescapably built into the data used, so even if all crime were detected and convicted equally, the pernicious effects of colonisation and poverty might still push up the ‘risk’ scores for Māori. So while we are offered the reassurance that ethnicity is not used as a variable in the ROC*ROI algorithms, literally every other variable – age at first offence, frequency of conviction, number of convictions – will over-identify Māori as high risk. The Ministry of Justice’s own report notes this, stating that factors such as “offence seriousness, offending history, and socioeconomic status are not neutral factors and may be interpreted as the product of earlier bias in the system and/or the result of broader structural biases that have become entrenched in criminal justice decision making criteria” (Morrison, 2012, p.9).

Using algorithmic tools in the criminal justice domain creates other legal issues too. There is a fundamental right to be treated as an individual before the law. An algorithm is essentially a very complex classification tool that sorts people into categories. The question must be asked: is it a breach of human rights to make legal judgements based on a person’s statistical similarity to others in their group, rather than them as an individual?

Fairness issues also relate to how algorithmic scores are actually used to inform decisions. For example, the Client Service Matching tool used by MSD to determine the best case management service for each client is ‘based on service design and effectiveness’. While the details of how this operates are not given, the rules used to guide an algorithm generally reflect a set of moral assumptions – not just effectiveness – even if this is unintentional. For example, if the tool predicts who qualifies for more support based on who is most likely, if they get that ‘case management’ support, to be off the benefit within a certain amount of time, those calculated as ‘harder to help’ could fail to qualify for support, while those deemed easier to help get the service.

Let’s think about that for a moment. Does this mean that those who have been on the benefit longer get less help to get off it? Or that those who live in a job-scarce region get less help, because the algorithm says it’s not as effective to help them? Or that older people get less help than younger ones, simply because they may have more difficulty accessing paid employment? Or is it more like the NEET tool used for young people, where the tool identifies those who ‘may be at greater risk of long-term unemployment’, focussing on those most in need? What happens to someone’s score if they refuse help?

How tools such as these are used to ration services – whether they focus on the easiest or hardest to reach, whether they are used as inclusion or exclusion criteria, and whether those criteria are hard or soft (absolutes or suggestions) – are important fairness issues too. What your algorithmic score gets you, or excludes you from, rests on a much larger conversation about the availability of resources and how ‘successful’ outcomes are defined in the data, and is therefore an inherently moral judgement about who is considered deserving of help.
Algorithms can obscure the moral content of these conversations in the name of efficiency.

But isn’t this better than a human? After all, where an algorithm is involved, the question shouldn’t be how good it is, but whether it is better (more accurate) and fairer (follows the criteria more consistently) than humans. This is a very ‘fair’ question, as humans can also be biased, incorrectly weight various factors, and be overly reliant on heuristics or fast rules of thumb, particularly when making decisions under time pressure. But to truly answer it, we need more comparative studies of humans in action against the predictive tool, and more transparency about how accurate an algorithm actually is, rather than responses citing generic research about how much better statistical prediction tools are than humans. As Dressel and Farid recently showed, such assumptions should be tested in context. They found that random humans recruited on the internet were just as good at predicting criminal recidivism as the COMPAS predictive tool – and so was a very simple two-factor checklist. Other meta-studies conclude that while actuarial or statistical tools are often more accurate than humans, they are not always, and that it really depends on the context, the type of decision, and the decision-maker (see Bartelink et al., 2015 for an overview in child protection). Some common errors of reasoning are amenable to educative and reflective correction, while others are less so.
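Dressel and Farid’s ‘simple two-factor checklist’ used only a defendant’s age and number of prior convictions. To show how little machinery that involves, here is a sketch in the same spirit – the specific thresholds below are invented for illustration, not taken from their paper:

```python
# A two-factor recidivism checklist in the spirit of Dressel & Farid (2018),
# who found that age plus prior convictions matched COMPAS's accuracy.
# The thresholds here are invented for illustration only.

def two_factor_risk(age: int, prior_convictions: int) -> bool:
    """Flag 'high risk' if many priors, or young with repeat priors."""
    return prior_convictions > 3 or (age < 25 and prior_convictions > 1)

print(two_factor_risk(age=22, prior_convictions=2))  # young repeat offender: flagged
print(two_factor_risk(age=45, prior_convictions=1))  # older, single prior: not flagged
```

That a rule this simple can rival a commercial 137-question tool is exactly why generic claims about algorithmic superiority need testing in context.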

Accuracy, while important, is not the only consideration when comparing algorithmic tools to humans. Some decisions just should be made by humans, because they are made in the context of responding to other humans who don’t need just a prediction – they need a relationship. In the delivery of social work, for example, an estimation of future risk is just one of the many things a person is doing when they have conversations with people about their situation. They are also assessing needs, resources, social and emotional functioning, and cultural preferences, and building a relationship that might enable them to assist the person further. All things automated tools just can’t do – yet.

The report finishes with a strong and well-considered list of recommendations. For example, in development and procurement, the report suggests an emphasis on ensuring that the views of ‘the people who will be the subjects of the services in which algorithms will be embedded’ are included (were any included in the development of those currently in use?) and consideration of ‘ways to embed a te Ao Māori perspective through a Treaty-based partnership’. Retaining human oversight is also recommended, along with the need to balance algorithms with human input. All good stuff. But it’s clear that in some ways, the horse has already bolted. In the most intrusive uses of algorithms in the criminal justice system, the populations who are most socially marginalised are already heavily affected. Ensuring their input into future use will be imperative, but it’s telling that these tools have been used so far with little public debate or resistance. After all, it’s easy to avoid consulting with people caught up in the criminal justice system about the use of the ROC*ROI and family violence prediction tools – their views are easy to dismiss because they spring from the most marginalised and impoverished groups in society. Whether they are even informed that the tool has been used is unclear. As AI Now notes, challenging decisions made by automated tools in criminal justice contexts is surprisingly rare given their significant effects, simply because “most defendants and their lawyers at the trial level simply do not have the time, energy, or expertise to raise such challenges” (AI Now, 2018, p.14).

Other groups worldwide are also trying to evaluate the ethical and practical effects of algorithmic use. The Algorithmic Impact Assessment framework published by the AI Now Institute, for example, suggests ‘informing communities about how such systems may affect their lives’, and ‘increasing the capacity of public agencies to assess fairness, justice, due process, and disparate impact’ (Reisman et al., 2018, p.2).

The Algorithm Assessment Report makes a great start in this direction, but its contents remind us of the need to get under the surface of algorithms, and to keep a strong line of sight on how each one functions, how accurate it is, what assumptions or moral judgements are built into it, who benefits or is disadvantaged by its use, and who is in control of it.

Image credit: Bill Smith


References

Bartelink, C., van Yperen, T. A., & ten Berge, I. J. (2015). Deciding on child maltreatment: A literature review on methods that improve decision-making. Child Abuse & Neglect, 49, 142-153. doi: 10.1016/j.chiabu.2015.07.002

Dressel, J., & Farid, H. (2018). The accuracy, fairness, and limits of predicting recidivism. Science Advances, 4(1), eaao5580. doi: 10.1126/sciadv.aao5580

Morrison, B. (2009). Identifying and responding to bias in the criminal justice system: A review of international and New Zealand research. Wellington, NZ: Ministry of Justice. Retrieved from https://www.justice.govt.nz/assets/Documents/Publications/Identifying-and-responding-to-bias-in-the-criminal-justice-system.pdf

Reisman, D., Schultz, J., Crawford, K., & Whittaker, M. (2018). Algorithmic impact assessments: A practical framework for public agency accountability. New York, NY: AI Now Institute.
