AI Archives - 天美视频 /tag/ai/ | Supporting data literacy in Primary and Secondary Schools

Book Review: The Ethical Algorithm (Thu, 17 Sep 2020)

Book Review of ‘The Ethical Algorithm: the Science of Socially Aware Algorithm Design’ by Michael Kearns and Aaron Roth

I am reading ‘The Ethical Algorithm: the Science of Socially Aware Algorithm Design’ by Michael Kearns and Aaron Roth, and it has reminded me why I like computer science.

Book cover - The Ethical Algorithm

Computer science is all about carefully specifying problems and what acceptable solutions would look like. This means pinning down, beginning with precise language and then perhaps moving to maths, exactly what an algorithm should do and what properties its output should have. As any computer scientist can tell you, designing algorithms usually involves trade-offs; there might be many solutions to your problem, and the “best” solution will change depending on your point of view. It’s not that we computer scientists can’t accept shades of grey. We can. We’ll insist on quantifying the shades of grey, though, and then we’ll reason about them for you in a rigorous way. We might come back to you to ask you to make some hard choices: do you want your algorithm to be accurate or fair? How accurate? How fair? Or even, what sort of fairness do you want (because you can’t have it all)?

This is very useful because it enables us to move past hand-wavey calls for algorithm designers to become more ethical, or suggestions that algorithms should be regulated in some vague way. Armed with concrete methods for designing algorithms to meet various social needs, we can put in place systems which enact a society’s chosen values (assuming that society can clearly articulate them and they aren’t inherently contradictory). The advances in theoretical computer science explained in this book can move us forward to the point that every computer scientist or data scientist can be taught how to routinely build ethical protections (such as privacy and fairness) into their algorithms, given that the stakeholders have specified the particular ethical constraints which are important.

Machine learning algorithms are trained on datasets of examples with the purpose of working out how to produce the “best” possible solution on new examples in the future. Common tasks for machine learners would be categorising images (e.g. does this picture contain a giraffe?), recommending what items a shopper might like to buy next, or, with higher stakes, deciding whether a candidate should be given a job interview. A machine learner never gives you anything for free: if you want it to give you an accurate answer, don’t assume its output will also be ethical (e.g. unbiased) unless you spell that out very clearly. In short, defining “best” might take some work.

Algorithms for privacy

The first shocker in the chapter on privacy is that “anonymised data isn’t”: there is a real danger that individuals can be identified by comparing several datasets, even if each dataset has been anonymised. The classic example of this is the Netflix competition, which released an anonymised dataset of users’ film rental histories to enable teams to develop an improved recommendation system (e.g. if you like ‘Harry Potter’, why not try ‘The Golden Compass’?). Researchers showed that even though the names were taken out of the Netflix data, they could infer the identity of a Netflix user relatively easily if that user had also posted reviews on IMDb under their own name. Oops.

In general, it is hard to effectively anonymise small datasets, or those which might be disclosive when linked with other datasets. The trade-off here is between protecting individuals against the disclosure of sensitive information and enabling data sharing for scientific purposes. The human race will benefit if I contribute genomic information to a research study, but I don’t want my insurance company to infer anything about me as an individual from the resulting public dataset in case they use it to increase my insurance payments.
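The kind of linkage attack described above can be sketched in a few lines of Python. Everything here (the names, zip codes, and ratings) is invented for illustration; the point is only that quasi-identifiers shared between an “anonymised” table and a public one are enough to put names back on rows:

```python
# Toy linkage attack: an "anonymised" ratings table is re-identified by
# joining quasi-identifiers (zip code + birth year) against a public list.
anonymised_ratings = [
    {"zip": "EH8", "birth_year": 1975, "rating": ("Harry Potter", 5)},
    {"zip": "G12", "birth_year": 1982, "rating": ("The Golden Compass", 2)},
]

# e.g. scraped from a site where people post under their own names
public_profiles = [
    {"name": "Alice", "zip": "EH8", "birth_year": 1975},
    {"name": "Bob", "zip": "G12", "birth_year": 1982},
]

def reidentify(anon_rows, public_rows):
    """Match each 'anonymous' row to a named profile on shared quasi-identifiers."""
    matches = {}
    for row in anon_rows:
        candidates = [p["name"] for p in public_rows
                      if p["zip"] == row["zip"]
                      and p["birth_year"] == row["birth_year"]]
        if len(candidates) == 1:   # a unique match deanonymises the row
            matches[candidates[0]] = row["rating"]
    return matches

print(reidentify(anonymised_ratings, public_profiles))
```

With only two quasi-identifiers and a small dataset, every row matches uniquely, which is exactly why small or linkable datasets are so dangerous to release.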

Happily, there is an algorithmic solution for this called ‘differential privacy’. It’s the idea that “nothing about an individual should be learnable from a dataset that cannot be learned from the same dataset but with that individual’s data removed” (p36). Imagine you’re conducting a poll about an embarrassing topic (let’s say you want to establish the prevalence of toenail fungus in the population). The NSA are taking a sinister interest in fungus, and will subpoena your research data, compelling you to turn over individual records of people in need of what they euphemistically call “fungicide treatment”. You don’t want that (nobody wants that), so you decide the best approach is as follows. When you call participants to ask them about their toenails, you give them these instructions: “flip a coin and don’t tell me how it landed. If it is heads, tell me honestly whether you have a toenail fungus. But if the coin landed tails, then give me a random answer by doing the following. Flip the coin again and say ‘Yes, I am full of fungus’ if it comes up heads and ‘No, I am fungus free’ if it is tails”. This is fiendishly clever because now if the NSA comes knocking on someone’s door, they can always deny that there is anything untoward in their socks; it’s just the way the second coin landed. The NSA would never know the difference unless they are willing to remove citizens’ footwear on a whim. And yet, you as a researcher can still infer what you need to about fungus frequency because although there are errors in the dataset, you systematically introduced them, so you can work backwards to factor them out.
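The coin-flipping protocol above is known as randomised response, and the working-backwards step is simple arithmetic: if the true rate of fungus is p, the probability of hearing “yes” is 0.5p + 0.25 (half the time the honest answer, a quarter of the time a random “yes”), so p can be recovered as 2 * (observed yes rate - 0.25). A minimal simulation, with an invented survey size and true rate:

```python
import random

def randomised_response(has_fungus, rng):
    """Answer per the two-coin protocol: heads -> the truth; tails -> a random answer."""
    if rng.random() < 0.5:        # first coin came up heads: answer honestly
        return has_fungus
    return rng.random() < 0.5     # second coin decides a random yes/no

def estimate_true_rate(answers):
    # P(yes) = 0.5 * true_rate + 0.25, so invert that relationship
    observed = sum(answers) / len(answers)
    return 2 * (observed - 0.25)

rng = random.Random(42)
true_rate = 0.30
population = [rng.random() < true_rate for _ in range(100_000)]
answers = [randomised_response(person, rng) for person in population]
print(round(estimate_true_rate(answers), 2))  # close to 0.30
```

Each individual answer is deniable, yet the aggregate estimate lands close to the true rate because the noise was injected in a known, invertible way.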

It turns out that you can glue together a series of algorithms which each guarantee differential privacy and the result will be differentially private too. This technique has real-world applications: it is used by Google and the US Census Bureau.

How to make algorithms fair

Machine learning classifiers have gained notoriety for being unfair on the grounds that their output can be sexist and racist. This is not because they have been designed by chauvinist computer scientists who long for the 1940s, but because they are trained on real world data sets. Algorithms which are trained on datasets of photos which don’t have many examples of black faces will have poor accuracy when attempting to classify new images of black people. Algorithms which are trained on text written by humans are sadly more likely to associate men with the word “genius” and women with the word “earring”. The human world is riddled with bias, so our machine learners will reflect that. Due to the mathematical properties of the algorithms they use, they may also amplify bias against minority groups simply because the groups are smaller and influence decisions less; there are more samples of the majority group. Increasing fairness for one group may reduce accuracy for other groups.

Beyond the bias in the training set, I was fascinated to read about the different reasonable definitions of fairness, and the mathematical proofs that they cannot all be achieved concurrently. The book uses an example of a bank which uses machine learning to decide whether to grant loans to fictional people in a society of two racial groups: Circles and Squares. One definition of fairness would be that the proportions of false positives and false negatives for Circles and Squares should be the same: the same proportion of Circles should be approved for loans they actually can’t pay back as their Square counterparts, and likewise the same proportion of Circles and Squares should have their loans rejected even though they really could have repaid them. However, another definition of fairness is called equality of positive predictive value: among the people who the algorithm predicts will repay the loan, the repayment rates among Circles and Squares should be the same. It turns out that it is not possible to have equality of false negatives and positives at the same time as equality of positive predictive value. There are also some interesting results on intersectionality. It is possible to be fair to multiple groups (Reds, Blues, Circles, Squares) but it comes at the expense of being unfair to individuals in the intersection of those groups (such as Red Circles or Blue Squares).
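These competing definitions are straightforward to compute for any given classifier. A sketch, using invented Circle/Square outcome counts, shows how false positive rate, false negative rate, and positive predictive value can come apart between groups:

```python
def group_metrics(records):
    """records: list of (group, predicted_repay, actually_repaid) tuples."""
    groups = {}
    for group, pred, actual in records:
        counts = groups.setdefault(group, {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
        key = ("t" if pred == actual else "f") + ("p" if pred else "n")
        counts[key] += 1
    out = {}
    for name, c in groups.items():
        out[name] = {
            # false positive rate: approved, but would not have repaid
            "fpr": c["fp"] / (c["fp"] + c["tn"]),
            # false negative rate: rejected, but would have repaid
            "fnr": c["fn"] / (c["fn"] + c["tp"]),
            # positive predictive value: of those approved, fraction who repay
            "ppv": c["tp"] / (c["tp"] + c["fp"]),
        }
    return out

# Made-up outcomes for a hypothetical loan classifier
records = (
    [("Circle", True, True)] * 40 + [("Circle", True, False)] * 10 +
    [("Circle", False, True)] * 10 + [("Circle", False, False)] * 40 +
    [("Square", True, True)] * 30 + [("Square", True, False)] * 20 +
    [("Square", False, True)] * 5 + [("Square", False, False)] * 45
)
m = group_metrics(records)
print(m["Circle"]["ppv"], m["Square"]["ppv"])  # 0.8 vs 0.6: PPV is unequal
```

In this invented example the Square group's error rates look better in one sense (lower false negative rate) while its positive predictive value is worse; the book's point is that a classifier cannot, in general, equalise all of these numbers at once.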

Who should read this book?

The book is well written, using small examples to build up a series of logical steps which often lead to surprising conclusions. Most of the explanations are in plain English, without a lot of maths notation. It would probably be helpful for readers to have a basic understanding of machine learning and a background understanding of algorithm design. I can attest that a dimly remembered computer science degree from the 1990s will do the job, but I wouldn’t recommend it for school pupils.

By Judy Robertson

Book Review: You Look Like a Thing and I Love You (Tue, 25 Aug 2020)
I’ve just finished the book You Look Like a Thing and I Love You (available in multiple formats from Amazon) and it is hilarious! It’s written by Janelle Shane, who has a PhD in electronic engineering and is the author of the blog AI Weirdness.

Book cover - You Look Like a Thing and I Love You

I love it because it is so refreshing. At work, I sometimes have to go to meetings where people put on their worried faces and spout about “datification” and “algorithms” and how we’re all doomed. This book very quickly reminded me that we’re totally not doomed, or at least we have breathing space while the AIs faff around perfecting their cookie recipes. We’re more likely to be destroyed by AI stupidity than by smart AI: “worrying about an AI takeover is like worrying about overcrowding on Mars”, as Andrew Ng (the god of deep learning) puts it.

If you’re wondering about the title of the book, it’s a pick-up line generated by an AI with about as many neurons as a worm. The AI was trained on a dataset of pick-up lines which the author collected from the dark recesses of the internet. It was then instructed to generate sentences with a similar pattern. “You must be a tringle? Cause you’re the only thing here”, it said coyly, among a whole slew of other surreal come-ons.

Don’t need any pick-up lines? Train your AI to generate paint colours, guinea pig names or Harry Potter novels. I would have been happy with a book full of lists of weirdness, to be honest, but the author has accomplished a lot more than making me laugh at the stupidity of algorithms. She has also managed to write a very readable book which explains how some of the main techniques in machine learning work, using amusing examples and hand-drawn cartoons. She walks through recurrent neural networks, Markov chains, random forests, genetic algorithms, generative adversarial networks, and reinforcement learning without the reader breaking a sweat. There are gleeful but useful descriptions of the myriad ways in which AI fails. At times the author sounds like an exasperated animal trainer. The explanations are clear and maths-free, which is quite a feat given the technical details lying beneath. The book would be suitable for a keen upper high school student with a surreal sense of humour, or for adults without a technical background who want to learn more about AI. I particularly recommend it for armchair doomsayers.
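Of the techniques in that list, a Markov chain is simple enough to sketch in a few lines: it records which words follow which in a training text, then generates new text by repeatedly sampling a plausible next word. This is not the author's code, just a toy illustration with an invented corpus:

```python
import random
from collections import defaultdict

def train_markov(text, order=1):
    """Map each run of `order` words to the words that follow it in the text."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        model[tuple(words[i:i + order])].append(words[i + order])
    return model

def generate(model, length, rng):
    """Start from a random state, then repeatedly sample a learned successor."""
    key = rng.choice(list(model))
    out = list(key)
    for _ in range(length - len(key)):
        successors = model.get(tuple(out[-len(key):]))
        if not successors:     # dead end: no word ever followed this state
            break
        out.append(rng.choice(successors))
    return " ".join(out)

corpus = "make a toilet seat into pants and then ask your car to make a cup of tea"
model = train_markov(corpus)
print(generate(model, 8, random.Random(0)))
```

With an order of 1 the model only ever looks one word back, which is exactly why this kind of generator drifts into surreal nonsense so quickly.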

There is a lot of hype about AI just now, both scare stories about the dangers and evangelising about business successes. One of the most valuable sections of the book is the set of questions to ask yourself if you’re trying to work out whether to believe a claim you read about a new AI product.

  • How broad is the problem which the AI is meant to solve? It’s much more likely that an AI can do one simple task (such as categorising items in an image) than successfully perform a range of different activities, even those which humans do every day (like walking to the shops to buy milk).
  • Where did the training data come from? Ask yourself whether it is plausible that there are lots of examples of the output the AI is designed to produce. If not, it would be difficult to train the AI with the volume of examples it would need to become accurate.
  • Does the problem require a lot of memory? AI is computationally intensive, particularly with memory. Text generated by an AI gets incoherent after a while, and a bot often can’t remember what happened previously in an interaction. Is that a tired human you’re chatting to on the customer service website, or a bot?
  • Is it just copying human biases? If the company selling the AI claims that it can carry out complex social judgements in a bias-free way (such as deciding how trustworthy someone is), take it with a pinch of salt. For a start, this is a broad problem which AIs are typically rubbish at; and unless it has been carefully designed to avoid perpetuating human biases, it will likely be faithfully reproducing the biases of the humans who made the judgements in the dataset on which it was trained.

I leave you with an April Fool’s Day prank generated by a Markov chain: “Make a toilet seat into pants and then ask your car to pee”. Happy reading!

Professor Judy Robertson
