From "radical efficiencies" to "hallucinations", we explore the implications for government officials of large language models like ChatGPT
“I keep hearing about #ChatGPT, so thought I would test their diplomatic skills by asking for ‘an amusing speech by the UK ambassador to France on UK/France relations’. Let’s just say I won’t be giving up the day job.”
So tweeted Menna Rawlings, our ambassador to France, in March. Accompanying the tweet was a screengrab in which the advanced chatbot talked – convivially and convincingly – about how, “from the battles of Waterloo and Trafalgar to the Normandy landings, the British and French have stood shoulder to shoulder in defence of freedom and democracy”. Oh dear.
In ChatGPT-speak, this is known as a “hallucination”. Rawlings was having a bit of fun, but hallucinations can sometimes be more damaging. In April, Australian regional mayor Brian Hood threatened to sue OpenAI, the company behind ChatGPT, for defamation after the chatbot produced text claiming he was a guilty party in a foreign bribery scandal. The reality was the exact opposite: Hood had been one of the whistleblowers who alerted the media to the crime.
OpenAI has warned that its software “sometimes writes plausible-sounding but incorrect or nonsensical answers” and that it has “limited knowledge of world and events after 2021”. Google’s rival service, Bard, carries a similar disclaimer. And yet these large language models (LLMs) – which have been trained on the internet to produce unique, detailed and human-sounding responses across any number of disciplines – are improving all the time. As they learn from their mistakes, hallucinations are becoming less common.
Ever since its launch last November, ChatGPT has been hugely popular with the public – not least because of its ability to mimic different styles of writing. For the same reason, it’s becoming a real headache for teachers worried about student plagiarism. But what is also becoming rapidly apparent is that its impact on society will stretch far beyond helping people write personalised sonnets, letters of condolence and even PowerPoint presentations. Alongside some of the more apocalyptic warnings that AI will usher in the end of days, lists have begun to emerge of all the professions that LLMs could render redundant – from accountants and translators to administrative assistants and journalists.
Perhaps this is unsurprising. The mass processing power available to us today simply wasn’t on offer 10 years ago, and machine learning technology – with its ability to find patterns in masses of data at a speed and scale humans could only dream about – has been outperforming us in multiple areas of professional life for some time. (“AI now diagnoses disease better than your doctor, study finds” ran one headline from 2020.)
But the implications of applying that processing power to what could crudely be called “reading and writing” have taken even those in the tech world by surprise. Paul Maltby is a former chief digital officer at the Department for Levelling Up, Housing and Communities and now works at the AI consultancy Faculty.
“This is internet-level technology, rather than a tweak,” Maltby says. “This is a really big deal. Even if there was no further development of it, this is going to disrupt things in an unpredictable way. But the fact is, it is developing very, very quickly and there’s a lot of money, effort and time going into it. And these LLMs are a profound shift, and, if we’re not careful, potentially quite dangerous things. So we need to get this stuff right.”
“This is internet-level technology, rather than a tweak. This is a really big deal”
This note of caution has been echoed elsewhere: in early May, Geoffrey Hinton – widely regarded as the godfather of AI – quit his job at Google, saying he regretted his work and warning that AI chatbots would be used by “bad actors” to do “bad things”.
Jack Perschke, director of public sector development at the consultancy Content + Cloud, is also concerned about the potential dangers of the technology but takes a broadly glass-half-full view about its future.
In its current state, he says, ChatGPT is “an amazing party trick”. “It’s a bit like a 14-year-old who can write really fast,” he says, before switching similes. “At the moment, it’s like someone’s invented the internal combustion engine, and they’ve put it on a plinth somewhere, and everyone gets to touch the button and see the pistons go up and down and they’re like, ‘Wow, isn’t that an amazing thing?’
“But it doesn’t actually change the world until people start putting it into cars and aeroplanes and motorbikes and so on. And of course, at that point, the engine becomes invisible. What you buy is the car, the motorbike. And that’s what is about to happen. A load of vehicles, platforms, things that we all use, are about to be completely transformed.”
CSW asked another AI programme, Midjourney, to supply the main image for this feature. Midjourney generates images from written descriptions and in this instance, we gave it the following prompt: “Oil painting on canvas. Female civil servant wearing white shirt and navy blue trouser suit, brown hair. Sitting on a modern office chair in a modern, open-plan office. Looking at a computer screen which says ‘ChatGPT’.” The software took about 30 seconds to generate this picture. We will leave readers to draw their own conclusions about why the AI made the choices it did, including the woman’s Barbie-doll dimensions and Kate-Middleton features. The eagle-eyed will also notice she appears to have six fingers…
What can ChatGPT already do for civil servants – and what could it mean for them in the future?
For hard-pressed civil servants trying to squeeze more work into less and less time, a tool that could chop out, say, 25% of their administrative tasks is very appealing. So what – to deploy a classic bit of Whitehallese – are the use cases? Well, for officials needing to summarise an evidence base, or get a handle on a particular topic, ChatGPT in its current form might be a good place to start. Prompts could include: “Give me 10 ideas about the implications of using ChatGPT in the civil service” or “Summarise the key elements in the regulation of processed meat”. It could also help them escape the tyranny of the blank page by coming up with an inscription for a colleague’s leaving card or – whisper it softly – a first draft of a ministerial speech.
For Perschke, “it’s not very useful at the moment”. The fact that OpenAI has partnered with Microsoft, however, means things are about to get interesting. “Soon it’s going to be embedded into Word, Excel, PowerPoint and Outlook. When it’s in your laptop, that’s when it’s going to be like ‘wow’,” he says.
Sensitive data should never be sent to these models, at least not without careful specialist know-how. But more powerful models such as GPT-4 can increasingly be safely directed towards particular – even private – sources of information, such as complex regulatory policy papers, legislation and departmental data stores. This allows them to understand and adopt particular styles and to give highly useful, specific answers. Using models in this way means implementing them as part of a discrete digital service designed to meet a particular need. Doing so can also help validate the results – for example, by automatically checking them against the wider internet – greatly improving accuracy in the process. As Maltby says: “The result will mean some radical efficiencies in how many core government services work, not just in how civil servants spend their time in the office.”
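To make that idea concrete, the sketch below shows one way such a grounded service could be wired up. It is a minimal Python example against OpenAI’s chat API; the retrieve_passages helper, and the departmental document store it stands in for, are hypothetical placeholders rather than anything the officials quoted here describe.

```python
# A minimal sketch of a "grounded" LLM service: fetch relevant passages from
# a private document store, then ask the model to answer only from them.
# The retrieval step is a hypothetical stand-in for a departmental search
# index; the OpenAI chat API call itself is real.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable


def retrieve_passages(query: str) -> list[str]:
    # Placeholder: a real service would query a search index or vector
    # store built over policy papers, legislation or departmental records.
    return ["(relevant extract from a policy paper would appear here)"]


def grounded_answer(question: str) -> str:
    context = "\n\n".join(retrieve_passages(question))
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer using only the supplied extracts. If the answer "
                    "is not in them, say that you do not know."
                ),
            },
            {
                "role": "user",
                "content": f"Extracts:\n{context}\n\nQuestion: {question}",
            },
        ],
        temperature=0,  # favour consistent answers over creative ones
    )
    return response.choices[0].message.content
```

Confining the model to vetted extracts in this way – and validating its output before it reaches a user – is what turns the raw chatbot into the kind of discrete digital service Maltby describes.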
“I suspect we are in the equivalent of 1992 in internet years... as a result of LLMs, we’ll have new types of work that didn’t exist beforehand, not just easier versions of what we do now”
Asked to gaze into his crystal ball and predict which civil service roles could be under threat from the new technology, Maltby says we might see fewer jobs that involve writing regular reports “that translate things without adding a ton of value”. He also strikes an optimistic note, however: officials will be able to scan vast amounts of information and pull out what matters, and “those value-added jobs, with which the civil service is already stocked to the rafters, will feel superpowered”.
Prognostication has its limits, though. “I suspect we are in the equivalent of 1992 in internet years,” Maltby says. “People simply didn’t foresee all the new businesses – let alone things like music and TV streaming – that were going to spring up. I would say it’s a safe bet that as a result of LLMs, we’ll have new processes, new business models, new types of work that didn’t exist beforehand, not just easier versions of what we do now.”
For his part, Perschke believes it could lead to a huge democratisation of ideas and a new era of face-to-face communication.
“Really, for the last 200 years, the written word has been the communication vehicle of choice for humans – driven by the sense that once you write something down, it’s useful. But it’s actually a terrible form of communication,” he says, giving government procurement as an example.
“When government wants to buy something, for many different reasons – largely around compliance with laws – they write down, in a massive document, what it is they want. They send it out to some people who they think can meet their requirements, and those people send back another massive document explaining how they’ll do it. And then government chooses the winning tender based on those documents. Ultimately, government is selecting for your ability to write about this subject, not for your ability to do the thing.
“It won’t be long until an MoD official can press a button and go ‘Write me an invitation to tender for a nuclear submarine’. Then you, a company that makes submarines, will receive it, tell your computer to read this invitation to tender and ask it to write a winning response. And the MoD evaluators will then get 150 flawless responses. So we’re back to where we started: how do we know who’s best at building a nuclear submarine? We’ve just spun a load of information around between computers.
“Ultimately at that stage, government will say: ‘We’re going to have to get their humans in a room with our humans and talk about it. And then we’ll probably ask them to walk us around the last nuclear submarine they built and we’ll see if that’s any good. And then we’ll choose them based on how good they are at building nuclear submarines.’ Which – unlike basing decisions on how well people can write about building submarines – is a much better way of choosing.”
“Mass communication using the written word, adjudicated by the high priests of form – journalists, officials, lawyers, whatever – is dead”
The rise of LLMs could have important ramifications for diversity and inclusion in the civil service too. At the moment, so many parts of the system – from job-application sifting processes to officials’ ability to get their ideas in front of the right people – are weighted towards those from white, privileged backgrounds who have been conditioned to write in a certain way. But when everyone can feed their ideas into the same software, which can parrot the style of a perfectly written submission, the way you write and format something is surely no longer an indicator of the value of the idea at its heart.
According to Perschke: “Mass communication using the written word, adjudicated by the high priests of form – journalists, officials, lawyers, whatever – is dead. It’s a complete free-for-all now: everyone can write like that. So how are we going to establish quality? We’re going to talk to each other. We’ve exposed the limit of written communications. In a way, it’s a massive leap forwards to go backwards.”
This vision of the world might take years to materialise. Or it might be with us in a few months. But even now, and even in their current, flawed incarnations, ChatGPT and its ilk are remarkable, and possibly rather useful to government officials. To find out how useful, read how CSW’s civil servant reviewers got on when they asked it to craft a letter from a minister, design a policy and apply for a civil service job.