AI Systems Are Imperfect. Use Them With Care.

Roel Wieringa
Jun 6, 2021

Spoiler: AI systems must be regulated because they can be used to make moral decisions on a massive scale, which allows massive shirking of human responsibility. Unregulated use leads to a loss of autonomy for decision makers and subjects alike.

This risk can be avoided. But instead, it currently materializes because of another mass phenomenon: decision makers who, confused about what AI is, think AI systems can be perfect. Let's start by deconstructing this confusion.

What is AI?

Last year, the AI Watch of the Joint Research Centre of the European Commission published a review and analysis of 54 different definitions of artificial intelligence. This includes the definition given by the High-Level Expert Group on Artificial Intelligence, but excludes the one given in the proposed AI regulation of the EU Council and Parliament, which constitutes definition number 55.

Let this sink in: sixty-four years after John McCarthy coined the term "Artificial Intelligence", there is still no agreement about what AI is.

But there are common threads in all the definitions that have been proposed, and AI Watch does a commendable job of identifying them. Most AI researchers and professionals think of AI systems as systems that perceive their environment, interpret those perceptions in terms of a model, and act on the environment with a certain level of autonomy in the service of a goal.

Examples are speech recognition (Alexa), recommendation engines (Amazon, Facebook), autopilots (Tesla), fraud detection (credit cards), medical diagnosis (Watson) and recidivism prediction (COMPAS). These systems collect data, decide on an action based on the goal to be achieved, and then perform the action or advise their user to perform it.

But this means that AI systems are reactive systems! Reactive systems maintain a model of their environment, by which they enable, enforce, or prevent events in that environment [1]. Examples are elevator control systems, cruise control systems and fly-by-wire systems.

But surely AI systems are autonomous? Well, whatever system you build, it can only respond to three things: an event (a sensor is triggered), a change in conditions (a temperature crosses a critical value) or a tick of the clock (it is time to do something). It cannot respond to anything else. There is no magic.

The behavior of AI systems, too, consists of responses to their environment. There is nothing else they can do. Spontaneous action would be random action. Even for AI systems there is no magic.

But doesn't autonomy mean that the system chooses which action to perform? Yes, but so does every reactive system: it has a repertoire of actions from which to choose. Even when AlphaGo taught itself to play Go, it had a fixed repertoire of actions. Don't expect it to learn how to mow your lawn all by itself.

But surely it is autonomous in choosing which actions to perform when it tries to achieve a goal? Yes, it is. But so is every reactive system that is programmed to achieve a goal. Your thermostat does it. Your cruise control does it.
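
To make this concrete, here is a minimal sketch, in Python and with names of my own choosing, of a goal-seeking reactive system: a thermostat that responds only to events, condition changes and clock ticks, and chooses from a fixed repertoire of actions.

```python
# Minimal sketch of a goal-seeking reactive system: a thermostat.
# It responds only to events, condition changes and clock ticks,
# and it chooses from a fixed repertoire of actions.

class Thermostat:
    ACTIONS = ("heat_on", "heat_off", "do_nothing")  # fixed repertoire

    def __init__(self, setpoint: float, tolerance: float = 0.5):
        self.setpoint = setpoint    # the goal it serves
        self.tolerance = tolerance
        self.enabled = True

    def on_event(self, event: str) -> str:
        # An event: e.g. the user presses the power button.
        if event == "power_button":
            self.enabled = not self.enabled
            return "do_nothing" if self.enabled else "heat_off"
        return "do_nothing"

    def on_condition_change(self, temperature: float) -> str:
        # A condition change: the measured temperature crosses a threshold.
        if not self.enabled:
            return "do_nothing"
        if temperature < self.setpoint - self.tolerance:
            return "heat_on"
        if temperature > self.setpoint + self.tolerance:
            return "heat_off"
        return "do_nothing"

    def on_tick(self, temperature: float) -> str:
        # A clock tick: periodically re-evaluate the environment.
        return self.on_condition_change(temperature)


thermostat = Thermostat(setpoint=20.0)
print(thermostat.on_tick(18.2))   # -> "heat_on"
print(thermostat.on_tick(21.1))   # -> "heat_off"
```

Nothing in this sketch is spontaneous: every action is a response to an input, chosen from a repertoire fixed at design time and in the service of a goal (the setpoint). An AI system differs only in the sophistication of the model that maps inputs to actions.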

I think we tend to regard a reactive system as AI when the model is perceived as sophisticated. What was once considered AI (a program that played checkers) is now considered a routine software application. Cruise controls and thermostats have extremely simple models of their environment, of the goal to be achieved, and of the actions to perform. We don't think of them as AI systems. Speech recognizers, warehouse robots and medical diagnosis systems rely on hugely more complex models. We tend to view them as AI systems.

Sky-high hopes about imperfect systems

Into this confusion about what AI systems are step technologists who overpromise on their products, journalists who report on devices with fabulous capabilities, and politicians who believe in miracle solutions. But AI systems are far from perfect.

Consider the example AI systems listed above. They are data-driven. In an earlier blog I listed the ways in which such systems may make wrong decisions. Let me summarize them here.

  • Data-driven AI systems are trained on samples from a population that may have prejudices. Some of the decisions the system is trained on encode these prejudices.
  • Any sample is unrepresentative of its population in any number of ways. This may introduce additional bias. This is unavoidable. A sample is a model of a population, and all models are simplifications.
  • All training algorithms have error rates. Data-driven AI systems will make wrong decisions. Eliminating all errors is mathematically impossible (see the sketch after this list).
  • The quality of the data in the sample, and of the data about the case at hand, has limits. Data entry is sometimes erroneous, and data semantics is sometimes unclear. There will be an unknown number of errors in the sample data and in the case data.
  • The case at hand is always unique. It will not be exactly identical to any case in the training sample. This uniqueness may or may not be relevant, and the AI system has no way of knowing this.
  • In addition, since the data encodes a finite number of variables, there may be an unknown number of additional variables in which the case at hand is crucially different from the ones in the sample. But this is not visible in the data.
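
To illustrate the point about error rates, here is a minimal sketch using scikit-learn on synthetic data. The library, the data and the resulting counts are illustrative assumptions of mine, not taken from any real system.

```python
# Minimal sketch: even a reasonably trained classifier has nonzero error
# rates on unseen cases. Synthetic data; numbers are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# A synthetic "population": 2,000 cases, 10 variables, overlapping classes.
X, y = make_classification(n_samples=2000, n_features=10, flip_y=0.05,
                           class_sep=0.8, random_state=0)

# The training sample is only part of the population, selected one way
# among many possible ways.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

tn, fp, fn, tp = confusion_matrix(y_test, model.predict(X_test)).ravel()
print(f"false positives: {fp}, false negatives: {fn} "
      f"out of {len(y_test)} unseen cases")
```

Whatever the exact counts, for overlapping and noisy populations they are never zero, and the same holds for the far more complex models behind the systems discussed here.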

None of this is sufficient reason not to use AI systems. If a system is not perfect, this does not imply that it is rubbish. AI systems can give many useful recommendations and can make fair decisions. But they are not perfect [2].

These imperfections pose risks, and the risks materialize when decision makers are unaware of them because they think the systems are perfect.

Imperfect automation of moral decisions

If an AI system is used to recommend products that people like me bought, then a bad recommendation does not harm me or anyone else. But if it recommends keeping me in custody because people like me committed crimes, then this harms me. And if it recommends releasing me, then this may result in harm to others.

This decision requires moral reasoning. Annex III of the proposed EU AI regulation lists decisions that require moral reasoning. It includes decisions about access to education, employment, access to public services, law enforcement, immigration, and the administration of justice.

For those decisions, the subject of the decision (the case at hand) has a right to know how the decision was made and how it is justified. He or she has the right to respond to it by asking for an explanation, asking for a second opinion, or objecting to it.

This in turn requires that the decision maker understands how the decision is made, can justify it, and can explain that justification to the subject.

Decision makers are imperfect, just as AI systems are. But jointly, they should be able to do a better job than either of them separately. The decision maker should weigh the machine's decision about a case against what he or she knows about the imperfections of the machine and the uniqueness of the case, and make a decision that he or she can take responsibility for [3]. This is the best of both worlds, because it uses the strengths of machines to reduce the limitations of people, and vice versa.

But note that this creates more work than a completely automated or completely manual decision. Where mechanization typically saves us work, automating moral decisions creates more work for the decision maker if it is to be a true human-machine cooperation. It requires us to take time for each decision, but the prize to be won is better moral decision-making.
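
One way to picture such a cooperation is sketched below. This is an illustrative design of my own, not a description of any existing system: the machine only recommends, and the human must record a decision together with a justification before the case can be closed.

```python
# Illustrative sketch of human-machine cooperation on a moral decision:
# the machine recommends, the human decides and must record a justification.
# This is my own sketch, not a description of any real system.
from dataclasses import dataclass


@dataclass
class Recommendation:
    case_id: str
    advice: str          # e.g. "grant" or "deny"
    confidence: float    # the model's own, imperfect confidence


@dataclass
class Decision:
    case_id: str
    outcome: str
    justification: str   # the human's reasons, recorded for the subject
    decided_by: str


def decide(rec: Recommendation, outcome: str, justification: str,
           decision_maker: str) -> Decision:
    # The machine's advice is input to the human decision,
    # never the decision itself.
    if not justification.strip():
        raise ValueError("A decision without a justification is not accepted.")
    return Decision(rec.case_id, outcome, justification, decision_maker)


rec = Recommendation(case_id="case-042", advice="deny", confidence=0.71)
final = decide(rec, outcome="grant",
               justification="This case differs from the training population "
                             "in ways the model cannot see.",
               decision_maker="J. Doe")
print(final)
```

The extra work sits in the justification: it forces the decision maker to engage with the case instead of rubber-stamping the machine's advice.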

Repeat this at scale and you have a massive problem

But here is the thing: AI systems make it possible to take moral decisions in bulk without anyone taking personal responsibility. Decisions about entry exams, hiring, promotion, firing, eligibility for public services, and many more can now be made faster and at greater scale than ever before.

Decisions can be massively automated when human decision makers are not involved at all: they have delegated the decision to the machine entirely, thinking that the machine is perfect. But this assumes that the machine never makes mistakes, and that its designers could have foreseen all relevant properties of all cases the machine will ever decide. That is impossible.

In a variant nightmare scenario, decision makers do work with the machine but follow its advice in 100% of the cases, again because they think it is perfect.

In either scenario, decision makers shirk their responsibility.

  • They have no idea what population was sampled, or what prejudices it contains;
  • They do not know how the sample was selected, or in which ways it is and is not representative of the population;
  • They are unaware of the unavoidable false positive and false negative rates of the algorithm;
  • They are not aware of the quality, or lack of it, of the data in the training sample or of the data about the case at hand;
  • They do not assess why and how this case is (dis)similar to the sample on which the model is based;
  • They have no idea of the underlying mechanisms that explain the phenomena of a particular case and that could help predict what the effect of a decision would be.

In short, there is no way the decision maker can take responsibility for these decisions. Disagreeing with the machine's advice, and taking responsibility for the decision, requires addressing all of the points above, and doing so would bring bulk automated decision-making to a halt.

It is to avoid these irresponsible bulk scenarios that AI needs to be regulated.

Just as we want to restrict mass surveillance, we want to restrict mass decision-making. In mass surveillance, we lose our privacy. In mass decision-making, we lose our autonomy. Decision-makers lose their autonomy because they don’t make decisions for which they can take responsibility. The subjects of mass decisions lose their autonomy because they cannot respond to something they do not understand.

[1] R. J. Wieringa, Design Methods for Reactive Systems, Morgan Kaufmann, 2003.

[2] S. Goel, R. Shroff, J. L. Skeem and C. Slobogin, "The accuracy, equity, and jurisprudence of criminal risk assessment," SSRN preprint, 26 December 2018.

[3] B. C. Smith, "The limits of correctness," ACM SIGCAS Computers and Society, vol. 14–15, no. 1–4, pp. 18–26, January 1985.

Originally published at https://www.thevalueengineers.nl on June 6, 2021.


Roel Wieringa

Professor emeritus Information Systems, University of Twente, The Netherlands. Co-founder and Director, The Value Engineers (www.thevalueengineers.nl).