Facebook to open source advanced photo and video-matching technology
Facebook is preparing to release new technology which will help web platforms identify harmful content, including child sexual exploitation material and terrorist propaganda, and prevent people sharing it.
By releasing the source code for these technologies to the public, the company will enable anyone to use the infrastructure for taking down abuse images without having to share the images themselves.
The company’s technology has advanced significantly since 2017 when it piloted a project in which users could upload their intimate photographs and videos to Facebook to request that they be blocked.
As analysed by Sky News when the web giant made its initial announcement, sending Facebook staff our private sexual material in order to prevent that material being seen by strangers seemed counter-intuitive.
But the two new pieces of software, known as PDQ and TMK+PDQF, will allow photographs and videos to be blocked even if they have been modified in a way which would trick classical forms of cryptographic fingerprinting.
Facebook director Antigone Davis explained: “These technologies create an efficient way to store files as short digital hashes that can determine whether two files are the same or similar, even without the original image or video.
“Hashes can also be more easily shared with other companies and non-profits.
“For example, when we identify terrorist propaganda on our platforms, we remove it and hash it using a variety of techniques, including the algorithms we’re sharing today.”
Identifying such modified videos has become an increasingly important responsibility for technology platforms which have seen active attempts by extremist groups to evade their content filters and spread terrorist propaganda, especially from Islamic State supporters and white supremacists.
Sky News found that videos celebrating the New Zealand mosque shootings in Christchurch were easily avoiding YouTube’s moderation efforts, despite a general clampdown across social media platforms, because of the simplicity of the analysis tools.
Classically, fingerprinting technology used cryptographic hash functions to reduce each image or video file to a short unique code which computers could easily use to identify it automatically.
But unfortunately the technology is very easy to fool. With cryptographic hash functions, even the smallest change to the input file will result in a completely different fingerprint as its output.
Image files which have been manually manipulated to change a single pixel – or have simply been rotated or resized – might seem similar to the human eye, but would be completely unrecognisable to a system comparing cryptographic hashes.
This makes it easy for someone to share blocked images by deceiving the automated system meant to catch them.
For example, obviously similar images of Facebook founder Mark Zuckerberg generate completely different hashes using the MD5 algorithm.
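The same effect can be reproduced without any images at all. The sketch below uses Python's standard hashlib module and is illustrative only: the byte string is made-up data standing in for an image file, and the helper names md5_hex and differing_hex_digits are hypothetical, not part of any Facebook tool.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 fingerprint of some bytes as 32 hexadecimal digits."""
    return hashlib.md5(data).hexdigest()


def differing_hex_digits(a: str, b: str) -> int:
    """Count the positions at which two equal-length fingerprints differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Made-up data standing in for the contents of an image file.
original = bytes(range(256)) * 16

# The same data with the lowest bit of a single byte flipped,
# roughly equivalent to nudging one pixel by one shade.
tweaked = bytearray(original)
tweaked[100] ^= 1
modified = bytes(tweaked)

h1 = md5_hex(original)
h2 = md5_hex(modified)

print("original:", h1)
print("modified:", h2)
print(differing_hex_digits(h1, h2), "of 32 hex digits differ")
```

The two inputs differ by a single bit, yet nothing in the two fingerprints indicates that the files are related, which is why an exact-match blocklist can be evaded with a trivial edit.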
However, Facebook’s PDQ and TMK+PDQF technologies don’t use cryptographic hashing; they are instead closer to perceptual hashing technologies such as pHash.
Unlike cryptographic hashes, perceptual hashes are able to detect the close similarities between images which are not identical, foiling attempts to deceive the automated system.
Where the MD5 hashes didn’t reflect any similarity between the images, the perceptual hashes for these images generated with the open source pHash algorithm allow the computer to say they are 89% similar.
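The general idea behind a similarity score of this kind can be sketched with a much simpler technique known as an average hash. This is not PDQ and not pHash, both of which are considerably more sophisticated; it only illustrates how a fingerprint can be built so that similar pictures yield similar bits. The 8x8 grids of grayscale values and the function names below are assumptions made up for the example.

```python
def average_hash(pixels):
    """Turn a grid of grayscale values (0-255) into a list of bits.

    Each bit records whether a pixel is brighter than the image's mean,
    so the hash captures coarse structure rather than exact values.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(a, b):
    """Count the bit positions at which two hashes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def similarity(a, b):
    """Fraction of matching bits, from 0.0 (none) to 1.0 (all)."""
    return 1 - hamming_distance(a, b) / len(a)


# A made-up 8x8 grayscale "image": a smooth gradient from dark to light.
original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]

# A lightly edited copy: slightly brightened, with one pixel changed.
modified = [[min(255, value + 3) for value in row] for row in original]
modified[0][0] = 40

# A genuinely different picture: the gradient inverted.
inverted = [[255 - value for value in row] for row in original]

hash_original = average_hash(original)
hash_modified = average_hash(modified)
hash_inverted = average_hash(inverted)

print("edited copy:    ", similarity(hash_original, hash_modified))
print("different image:", similarity(hash_original, hash_inverted))
```

A matching system built this way would not look for identical hashes. It would treat any pair whose similarity exceeds a chosen threshold as a likely match, which is what lets it survive small edits that defeat MD5.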
Ms Davis said: “Our photo-matching algorithm, PDQ, owes much inspiration to pHash although was built from the ground up as a distinct algorithm with independent software implementation.
“The video-matching technology, TMK+PDQF, was developed together by Facebook’s Artificial Intelligence Research team and academics from the University of Modena and Reggio Emilia in Italy.”