by Cathy O'Neil
from the Guardian website
Algorithms can dictate whether you get a mortgage or how much you pay for insurance. But sometimes they're wrong - and sometimes they are designed to deceive.
Some algorithms go wrong by accident; others, however, are made to be criminal. Algorithms are formal rules, usually written in computer code, that make predictions about future events based on historical patterns.
To train an algorithm you need to provide historical data as well as a definition of success.
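To make "historical data plus a definition of success" concrete, here is a minimal, entirely hypothetical sketch of a lending model: past applicants are the historical data, and "the borrower repaid" is the definition of success. The features and numbers are invented for illustration and describe no real lender's model.

```python
# Minimal sketch: training an algorithm = historical data + a definition of success.
# All data and field names here are made up for illustration.
from sklearn.linear_model import LogisticRegression

# Historical data: one row per past applicant (income in £k, debt-to-income ratio).
X_history = [[45, 0.30], [82, 0.10], [23, 0.55], [61, 0.20], [30, 0.45]]

# Definition of success: 1 if that applicant repaid their loan, 0 if they defaulted.
y_success = [1, 1, 0, 1, 0]

model = LogisticRegression()
model.fit(X_history, y_success)          # learn patterns from the past

# The trained model now scores new applicants by predicted probability of "success".
new_applicant = [[38, 0.40]]
print(model.predict_proba(new_applicant)[0][1])  # probability the applicant repays
```

Whatever the chosen definition of success rewards is what the algorithm will pursue, which is why that definition matters as much as the data.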
Financial risk models also use historical market changes to predict cataclysmic events in a more global sense - not for an individual stock, but for an entire market.
The risk model for mortgage-backed securities was famously bad - intentionally so - and the trust in those models can be blamed for much of the scale and subsequent damage wrought by the 2008 financial crisis.
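For a sense of what such a risk model does, here is a toy historical-simulation sketch, with invented returns and an assumed 95% confidence level: it estimates how bad a one-day portfolio loss could plausibly be, using nothing but the recent past. Real models, including the mortgage-backed-securities models blamed above, are far more elaborate, but they share the same weakness: the prediction is only as honest as the history fed into it.

```python
# Toy value-at-risk sketch: predict "how bad could tomorrow be" purely from history.
# The returns below are invented; a real risk model uses years of market data.
import numpy as np

daily_returns = np.array([0.002, -0.011, 0.005, -0.030, 0.001,
                          0.007, -0.004, 0.012, -0.008, 0.003])

confidence = 0.95
# Historical simulation: the worst 5% of past daily returns sets tomorrow's "risk".
var_95 = -np.percentile(daily_returns, (1 - confidence) * 100)
print(f"1-day 95% VaR: {var_95:.1%} of portfolio value")

# If the historical window never contains a crash, the model will happily report
# that a crash is nearly impossible - which is how convenient inputs become
# misplaced trust.
```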
Since 2008, we've heard less about algorithms in finance and much more about big data algorithms. The target of this new generation of algorithms has shifted from abstract markets to individuals.
But the underlying functionality is the same: historical data combined with a definition of success, used to make predictions - this time about people rather than markets.
The recent proliferation of big data models has gone largely unnoticed by the average person, but it's safe to say that most important moments where people interact with large bureaucratic systems now involve an algorithm in the form of a scoring system.
Getting into college, getting a job, being assessed as a worker, getting a credit card or insurance, voting, and even policing are in many cases done algorithmically.
Moreover, the technology introduced into these systematic decisions is largely opaque, even to their creators, and has so far largely escaped meaningful regulation, even when it fails.
That makes the question of which of these algorithms are working on our behalf even more important and urgent.
I have a four-layer hierarchy when it comes to bad algorithms, running from the unintentionally flawed to the deliberately deceptive; it is the worst layer - algorithms built to break the law - that concerns us here.
In 2015, e-commerce business Poster Revolution was found guilty of using algorithms to collude with other poster sellers to set prices. Photograph: Bob Handelman/Getty Images
The best-known case is Volkswagen, which in 2015 was found to have installed software that recognised emissions tests and cheated them. It's worth dwelling on the example of car manufacturers, because the world of algorithms - a very young, highly risky new industry with no safety precautions in place - is rather like the early car industry.
With its naive and exuberant faith in its own technology, the world of AI is selling the equivalent of cars without bumpers whose wheels might fall off at any moment.
And I'm sure there were such cars made once upon a time, but over time, as we saw more damage being done by faulty design, we came up with more rules to protect passengers and pedestrians.
So, what can we learn from the current, mature world of car makers in the context of illegal software?
First, similar software that turns off emissions controls in certain settings has been deployed by other car manufacturers as well. In other words, this was not a situation with just one bad actor; it was closer to standard operating procedure.
Moreover, we can assume this doesn't represent collusion, but rather a simple case of extreme incentives combined with a calculated low probability of getting caught on the part of the car manufacturers.
It's reasonable to expect, then, that there are plenty of other algorithms being used to skirt rules and regulations deemed too expensive, especially when the builders of the algorithms remain smug about their chances.
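To see how little sophistication such rule-skirting takes, here is a deliberately crude, hypothetical sketch of a "defeat device": it guesses when it is being tested and only then behaves. This is not Volkswagen's actual code; the test-detection heuristic and function names are invented purely to illustrate the pattern.

```python
# Hypothetical sketch of a "defeat device": behave well only when being watched.
# The detection heuristic below is invented; it only illustrates the pattern.

def looks_like_a_lab_test(speed_kmh, steering_angle_deg, duration_s):
    """Emissions tests follow fixed drive cycles: steady speeds, no steering,
    predictable duration. Real driving almost never looks like that."""
    return steering_angle_deg < 1 and duration_s < 1800 and speed_kmh < 120

def emissions_mode(speed_kmh, steering_angle_deg, duration_s):
    if looks_like_a_lab_test(speed_kmh, steering_angle_deg, duration_s):
        return "full exhaust treatment"   # clean - but only for the regulator
    return "reduced exhaust treatment"    # dirty - better performance the rest of the time

print(emissions_mode(50, 0.2, 1200))   # lab-like conditions  -> "full exhaust treatment"
print(emissions_mode(50, 15.0, 5400))  # road-like conditions -> "reduced exhaust treatment"
```

The cheap on-road test described further down works precisely because it escapes the conditions the first branch is looking for.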
Next, the VW cheating started in 2009, which means it went undetected for five years.
What else has been going on for five years?
This line of thinking makes us start looking around, wondering which companies are currently hoodwinking regulators, evading privacy laws, or committing algorithmic fraud with impunity.
Indeed it might seem like a slam dunk business model, in terms of cost-benefit analysis: cheat until regulators catch up with us, if they ever do, and then pay a limited fine that doesn't make much of a dent in our cumulative profit. That's how it worked in the aftermath of the financial crisis, after all.
In the name of shareholder value, we might be obliged to do this. Put it another way: we're all expecting cars to be self-driving in a few years, or a couple of decades at most. When that happens, will we be able to tell whether a crash was caused by a faulty - or deliberately deceptive - algorithm, and who should be held responsible?
If this sounds confusing for something as easy to observe as car crashes, imagine what's going on under the hood, in the relatively obscure world of complex "deep learning" models.
The tools are there already, to be sure.
China has recently demonstrated how well facial recognition technology already works - enough to catch jaywalkers and toilet paper thieves. That means there are plenty of opportunities for companies to perform devious tricks on customers or potential hires.
For that matter, the incentives are also in place. Just last month Google was fined €2.4bn for unfairly placing its own shopping search results in a more prominent place than those of its competitors.
A similar complaint was leveled at Amazon by ProPublica last year with respect to its pricing algorithm, namely that it was privileging its own, in-house products - even when they weren't a better deal - over those outside its marketplace. If you think of the internet as a place where big data companies vie for your attention, then we can imagine more algorithms like this in our future.
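As an illustration of what self-preferencing can look like inside a ranking function, here is a hypothetical sketch; the weights, seller names, and prices are invented and do not describe any real marketplace's code.

```python
# Hypothetical sketch of self-preferencing in a ranking algorithm.
# Weights and data are invented; this is not any real marketplace's code.

def rank_offers(offers, house_brand="OwnBrand"):
    def score(offer):
        base = -offer["price"]                                   # cheaper is better
        boost = 5.0 if offer["seller"] == house_brand else 0.0   # hidden thumb on the scale
        return base + boost
    return sorted(offers, key=score, reverse=True)

offers = [{"seller": "ThirdPartyCo", "price": 9.99},
          {"seller": "OwnBrand",     "price": 12.49}]

for offer in rank_offers(offers):
    print(offer["seller"], offer["price"])
# OwnBrand is listed first even though it is not the better deal -
# invisible to the shopper, and hard for an outsider to prove.
```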
There's a final parallel to draw with the VW scandal.
Namely, the discrepancy in emissions was finally discovered in 2014 by a team of professors and students at West Virginia University, who applied for and received a measly grant of $50,000 from the International Council on Clean Transportation, an independent nonprofit organisation paid for by US taxpayers.
They spent their money driving cars around the country and capturing the emissions, a cheap and straightforward test.
In 2015, Volkswagen was found to have used a malicious algorithm to deceive the emissions test. Seven VW executives have been charged in the US. Photograph: Patrick T Fallon/Bloomberg/Getty
Is anyone performing that kind of independent check on the algorithms that score and sort us? The answer is, so far, no.
Instead, at least in the US, a disparate group of federal agencies is in charge of enforcing laws in their industry or domain, none of which is particularly on top of the complex world of big data algorithms.
Elsewhere, the European commission seems to be looking into Google's anticompetitive activity, and Facebook's fake news problems, but that leaves multiple industries untouched by scrutiny.
Even more to the point, though, is the question of how involved the investigation of algorithms would have to be. Algorithms are currently secret, proprietary code, protected as the "secret sauce" of corporations.
They're so secret that most online scoring systems aren't even apparent to the people targeted by them.
That means those people also don't know the score they've been given, nor can they complain about or contest those scores. Most important, they typically won't know if something unfair has happened to them.
Given all of this, it's difficult to imagine oversight for algorithms, even when they've gone wrong and are actively harming people. For that matter, not all kinds of harm are distinctly measurable in the first place. One can make the argument that, what with all the fake news floating around, our democracy has been harmed.
But how do you measure democracy?
That's not to say there is no hope. After all, by definition, an illegal algorithm is breaking an actual law that we can point to. There is, ultimately, someone who should be held accountable for it.
The problem remains: how will such laws be enforced?
Ben Shneiderman, a computer science professor at the University of Maryland, proposed the concept of a National Algorithms Safety Board, in a talk at the Alan Turing Institute.
Modeled on the National Transportation Safety Board, which investigates ground and air traffic accidents, this body would similarly be charged with investigating harm, and specifically in deciding who should be held responsible for algorithmic harm.
Algorithms sift through historical data to value homes. In the US, one homeowner is suing Zillow for knocking $100,000 from the value of her property by drawing on the wrong data. Photograph: Yui Mok/PA
This is a good idea.
We should investigate problems when we find them, and it's good to have a formal process to do so. If it has sufficient legal power, the board can perhaps get to the bottom of lots of commonsense issues. But it's not clear how comprehensive it could be.
Because here's where the analogy with car makers breaks down:
A proliferation of silent and undetectable car crashes is harder to investigate than when it happens in plain sight.
I'd still maintain there's hope. One of the miracles of being a data skeptic in a land of data evangelists is that people are so impressed with their technology that, even when it is unintentionally creating harm, they openly describe how amazing it is.
And the fact that we've already come across quite a few examples of algorithmic harm means that, as secret and opaque as these algorithms are, they're eventually going to be discovered, albeit after they've caused a lot of trouble.
What does this mean for the future? First and foremost, we need to start keeping track.
Each criminal algorithm we discover should be seen as a test case.
As we learned after the 2008 financial crisis, a rule is ignored if the penalty for breaking it is less than the profit pocketed.
And that goes double for a broken rule that is only discovered half the time.
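The cost-benefit logic of the last two paragraphs can be made explicit with some back-of-the-envelope arithmetic. All of the numbers below are invented; the point is simply that whenever the expected profit exceeds the expected penalty, the "rational" corporate move is to cheat.

```python
# Back-of-the-envelope sketch of the cheating incentive. All numbers are invented.
annual_profit_from_cheating = 100_000_000   # extra profit per year of cheating
years_before_detection      = 5             # VW went undetected for five years
probability_of_being_caught = 0.5           # "discovered half the time"
fine_if_caught              = 150_000_000

expected_gain = (annual_profit_from_cheating * years_before_detection
                 - probability_of_being_caught * fine_if_caught)
print(f"Expected net gain from cheating: ${expected_gain:,.0f}")
# -> $425,000,000: the rule is worth breaking unless the fine, or the odds of
#    getting caught, rises enough to push this number below zero.
```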
Even once we start building a track record of enforcement, we will find ourselves in an arms race. We can soon expect a fully fledged army of sophisticated, silent algorithms designed to skirt laws and get around regulations.
They will learn from how others were caught and do it better the next time. In other words, it will get progressively more difficult to catch them cheating.
Our tactics have to get better over time too.
Predictive policing algorithms use historical data to forecast where crime will happen next. Civil rights groups argue that these systems exacerbate existing police prejudices. Photograph: Stuart Emmerson/Alamy
We can also expect to be told that the big companies are "dealing with it privately".
This is already happening with respect to fighting terrorism. We should not trust them when they say this. We need to create a standard testing framework - a standard definition of harm - and require that algorithms be submitted for testing.
And we cannot do this only in "test lab conditions", either, or we will simply be repeating the VW emissions scandal.
One of the biggest obstacles to this is that Google, Facebook and, for that matter, Amazon don't allow outside researchers to test multiple personas - or online profiles.
Since those companies offer tailored and personalized services, the only way to see what that service looks like would be to take on the profiles of multiple people, but that is not allowed.
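A hypothetical sketch of what such outside testing could look like: create several synthetic personas, request the same product from the same service, and compare what each persona is shown. The `get_quote` function is a stand-in for whatever opaque, personalized system is being audited; no real platform's interface is used here, and, as noted above, the major platforms' terms currently forbid exactly this kind of multi-profile probing.

```python
# Hypothetical audit harness: probe a personalized service with several personas
# and compare the results. `get_quote` is a stand-in for the system under test.
import statistics

def get_quote(persona):
    """Placeholder for the opaque, personalized system being audited."""
    # In a real audit this would drive a session logged in as `persona`.
    fake_prices = {"young_urban_renter": 480, "retired_homeowner": 520,
                   "recent_immigrant": 610, "affluent_suburbanite": 450}
    return fake_prices[persona["label"]]

personas = [{"label": "young_urban_renter"}, {"label": "retired_homeowner"},
            {"label": "recent_immigrant"}, {"label": "affluent_suburbanite"}]

quotes = {p["label"]: get_quote(p) for p in personas}
spread = max(quotes.values()) - min(quotes.values())
print(quotes)
print(f"Spread across personas: {spread} "
      f"(median {statistics.median(quotes.values())})")
# A large, systematic spread for an identical product is the kind of signal a
# regulator or researcher would then have to investigate and explain.
```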
Think about that in the context of the VW testing: the West Virginia team could put real cars on real roads, but outside researchers auditing a personalized algorithm cannot even simulate real users.
We need to demand more access and ongoing monitoring, especially once we catch them in illegal acts.
For that matter, entire industries, such as algorithms for insurance and hiring, should be subject to these monitors, not just individual culprits.
It's time to gird ourselves for a fight. It will eventually be a technological arms race, but it starts, now, as a political fight.
We need to demand that algorithms with the potential to harm us be shown to be acting fairly, legally, and consistently. When we find problems, we need to enforce our laws with fines hefty enough that companies don't find it profitable to cheat in the first place.
This is the time to start demanding that the machines work for us, and not the other way around.