Many AI researchers think fakes will become undetectable

Rishi Sunak is Britain’s prime minister. If some advertisements on Facebook can be trusted (which they cannot) he also appears to be flogging get-rich-quick schemes. One such advert shows Mr Sunak endorsing an app supposedly developed by Elon Musk, a businessman, into which viewers can make regular “savings”.

The video is fake. Generated with the help of AI, it is just one of 143 such advertisements catalogued by Fenimore Harper Communications, a British firm, which ran in December and January. It is not just those in the public eye who can have their likenesses used for dubious ends. In June 2023 the Federal Bureau of Investigation in America warned the public of “malicious actors” using AI to create fake sexually themed videos and images of ordinary people, in order to extort money.

How to detect such trickery is a live topic among AI researchers, many of whom attended NeurIPS, one of the field’s biggest conferences, held in New Orleans in December. A slew of firms, from startups to established tech giants such as Intel and Microsoft, offer software that aims to spot machine-generated media. The makers of big AI models, meanwhile, are searching for ways of “watermarking” their output so that real pictures, video or text can be readily distinguished from the machine-generated sort.

But such technologies have not, so far, proved reliable. The AI cognoscenti seem gloomy about their prospects. The Economist conducted a (deeply unscientific) straw poll of delegates to NeurIPS. Of 23 people asked, 17 thought AI-generated media would eventually become undetectable. Only one believed that reliable detection would be possible. (The other five demurred, preferring to wait and see.)

Detection software relies on the idea that AI models will leave a trace. Either they will fail to reproduce some aspect of real images and video, or of human-generated text, or they will add something superfluous—and will do so often enough to let other software spot the error. For a while, humans could do the job. Up until about the middle of 2023, for instance, image-generation algorithms would often produce people with malformed hands, or get the numbers wrong on things like clock faces. These days, the best no longer do.

But such telltales often still exist, even if they are becoming harder for humans to spot. Just as machines can be trained to reliably identify cats, or cancerous tumours on medical scans, they can also be trained to differentiate between real images and AI-generated ones.

It seems, though, that they cannot do so all that well. Detection software is prone to both false positives (wrongly flagging human content as generated by AI) and false negatives (allowing machine-generated stuff to pass undetected). A pre-print published in September by Zeyu Lu, a computer scientist at Shanghai Jiao Tong University, found that the best-performing program failed to correctly spot computer-generated images 13% of the time (though that was better than the humans, who erred in 39% of cases). Things are little better when it comes to text. One analysis, published in December in the International Journal for Educational Integrity, compared 14 tools and found that none achieved an accuracy of more than 80%.
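To make the two kinds of error concrete, here is a minimal sketch in Python of how they might be tallied for a detector. The function name `error_rates` and the eight sample verdicts are invented for illustration; they are not drawn from either study cited above.

```python
def error_rates(is_ai, flagged):
    """Tally a detector's mistakes.

    is_ai[i]   -- True if item i really was machine-generated
    flagged[i] -- True if the detector labelled item i as machine-generated
    Both lists are assumed to have the same length.
    """
    pairs = list(zip(is_ai, flagged))
    # Human-made content wrongly flagged as AI.
    false_pos = sum(1 for truth, guess in pairs if not truth and guess)
    # Machine-made content that slipped through.
    false_neg = sum(1 for truth, guess in pairs if truth and not guess)
    n_ai = sum(1 for truth, _ in pairs if truth)
    n_human = len(pairs) - n_ai
    return {
        "false_positive_rate": false_pos / n_human if n_human else 0.0,
        "false_negative_rate": false_neg / n_ai if n_ai else 0.0,
        "accuracy": (len(pairs) - false_pos - false_neg) / len(pairs) if pairs else 0.0,
    }


# Hypothetical sample: four AI-made items followed by four human-made ones.
is_ai = [True, True, True, True, False, False, False, False]
flagged = [True, True, True, False, False, False, True, True]
print(error_rates(is_ai, flagged))
```

In this made-up sample the detector misses one of four fakes and wrongly flags two of four genuine items, so a single headline "accuracy" figure can hide quite different mixes of the two errors.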

If trying to spot computer-generated media after the fact is too tricky, another option is to label it in advance with a digital watermark. As with the paper sort, the idea is to add a distinguishing feature that is subtle enough not to compromise the quality of the text or image, but that is obvious to anyone who goes looking for it.

One technique for marking text was proposed by a team at the University of Maryland in July 2023, and added to by a team at the University of California, Santa Barbara, who presented their tweaks at NeurIPS. The idea is to fiddle with a language model’s word preferences. First, the model randomly assigns a clutch of words it knows to a “green” group, and puts all the others in a “red” group. Then, when generating a given block of text, the algorithm loads the dice, raising the probability that it will plump for a green word instead of one of its red synonyms. Checking for watermarking involves comparing the proportion of green to red words, though since the technique is statistical, it is most reliable for longer chunks of writing.
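The green-and-red idea can be sketched as a toy in Python. This is an illustration under loud assumptions, not the researchers' implementation: the "model" simply picks words at random from a made-up vocabulary, the keyed hash in `is_green`, the 50% green share (`GAMMA`) and the `bias` values are all invented for the example, and the published schemes differ in their details.

```python
import hashlib
import math
import random

GAMMA = 0.5  # assumed share of the vocabulary placed in the green group


def is_green(word, key="demo-key"):
    """Assign a word to the green group using a keyed hash (hypothetical scheme)."""
    digest = hashlib.sha256(f"{key}:{word}".encode()).digest()
    return digest[0] < 256 * GAMMA


def generate(vocab, length, bias, rng):
    """Toy stand-in for a language model: picks words at random,
    but multiplies the weight of every green word by `bias`."""
    weights = [bias if is_green(w) else 1.0 for w in vocab]
    return rng.choices(vocab, weights=weights, k=length)


def green_z_score(words):
    """How many standard deviations the green count sits above pure chance."""
    n = len(words)
    greens = sum(is_green(w) for w in words)
    return (greens - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


vocab = [f"word{i}" for i in range(1000)]  # made-up vocabulary
rng = random.Random(0)
plain = generate(vocab, 200, bias=1.0, rng=rng)   # no watermark
marked = generate(vocab, 200, bias=4.0, rng=rng)  # dice loaded towards green
print("unmarked z-score:", round(green_z_score(plain), 2))
print("marked z-score:  ", round(green_z_score(marked), 2))
```

A detector of this kind would flag text whose score exceeds some threshold. Because the score grows roughly with the square root of the number of words, a short passage gives a much weaker signal than a long one.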

Many methods for watermarking images, meanwhile, involve tweaking the pixels in subtle ways, such as shifting their colours. The alterations are too subtle for human observers to notice, but can be picked up by computers. But cropping an image, rotating it, or even blurring and then resharpening it can remove such marks.
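A deliberately simple Python sketch shows both halves of that trade-off. It assumes a watermark hidden in the least-significant bit of each pixel value, which is only one basic way of "tweaking the pixels" and not necessarily what any commercial scheme does; the "image" is a row of random grey values, and the helper names `embed`, `match_fraction` and `blur` are invented for the example.

```python
import random


def embed(pixels, bits):
    """Overwrite the least-significant bit of each pixel with a watermark bit.
    No pixel value changes by more than 1 out of 255."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)]


def match_fraction(pixels, bits):
    """Share of pixels whose lowest bit agrees with the watermark."""
    return sum((p & 1) == b for p, b in zip(pixels, bits)) / len(bits)


def blur(pixels):
    """Crude one-dimensional blur: average each pixel with its neighbours."""
    blurred = []
    for i in range(len(pixels)):
        window = pixels[max(0, i - 1): i + 2]
        blurred.append(round(sum(window) / len(window)))
    return blurred


rng = random.Random(1)
image = [rng.randrange(256) for _ in range(10_000)]  # toy greyscale "image"
mark = [rng.randrange(2) for _ in range(10_000)]     # secret bit pattern

marked = embed(image, mark)
print("largest pixel change:", max(abs(a - b) for a, b in zip(image, marked)))
print("match before blur:   ", match_fraction(marked, mark))
print("match after blur:    ", match_fraction(blur(marked), mark))
```

Before blurring, every pixel agrees with the secret pattern. Afterwards the agreement should fall to roughly the coin-toss level of one half, at which point a marked picture looks no different from an unmarked one to the detector.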

Another group of researchers at NeurIPS presented a scheme called “Tree-Ring” watermarking that is designed to be more robust. Diffusion models, the most advanced type of image-generation software, begin by filling their digital canvas with random noise, out of which the required picture slowly emerges. The tree-ring method embeds the watermark not in the finished picture, but in the noise at the start. If the software that created a picture is run in reverse, it will reproduce the watermark along with the noise. Crucially, the technique is less easy to thwart by fiddling with the final image.

But it is probably not impossible. Watermarkers are in an arms race with other researchers aiming to defeat their techniques. Another team led by Hanlin Zhang, Benjamin Edelman and Boaz Barak, all of Harvard University, presented a method (not yet peer-reviewed) that can, they say, erase watermarks. It works by adding a dash of new noise, then using a second, different AI model to remove that noise, which removes the original watermark in the process. They claim to be able to foil three new text-watermarking schemes proposed in 2023. In September scientists at the University of Maryland published a paper (also not yet peer-reviewed) claiming that none of the current methods of image watermarking—Tree-Rings included—is foolproof.

Nevertheless, in July 2023 America’s government announced “voluntary commitments” with several AI firms, including OpenAI and Google, to boost investment in watermarking research. Having imperfect safeguards is certainly better than having none (although open-source models, which users are free to tweak, will be harder to police). But in the battle between the fakers and the detectives, it seems that the fakers have the upper hand.
