Third-Party Research Study Shows Pangram is the Most Robust AI Detector

Bradley Emi · October 30, 2024

Researchers from the University of Houston, UC Berkeley, UC Irvine, and the startup Esperanto AI have found that Pangram is the most robust AI text detector among a wide variety of both commercial and open-source methods. In the paper, titled “Esperanto: Evaluating Synthesized Phrases to Enhance Robustness in AI Detection for Text Origination”, the researchers studied the effect of language translation on AI detectors' ability to identify AI-generated text.

Exploiting AI Detectors by Using Translators

It is a known exploit of AI detection that passing AI text through Google Translate into a foreign language and then translating it back into English can help an adversary (or simply a clever, time-starved student) evade detection. At Pangram, we internally call this attack “double translation,” while the researchers refer to it as “backtranslation.” Here’s an example of double translation. We ask ChatGPT to write some text for us, translate it to Japanese, and then translate it back to English. Some of the phrases change along the way, because translation isn’t perfect and there are often multiple ways to say the same thing. The effect is similar to running the text through a paraphrasing tool like Quillbot.

[Figure: ChatGPT-generated text and its double translated version, an example of double translation]
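To make the attack concrete, here is a minimal sketch of the double translation pipeline. The `translate` helper below is hypothetical, standing in for whatever machine translation service (for example, Google Translate) an adversary would actually call; it is not a real API.

```python
# Minimal sketch of a "double translation" (backtranslation) attack.
# NOTE: `translate` is a hypothetical stand-in for a real machine
# translation call (e.g., Google Translate), included only to show the
# shape of the pipeline.

def translate(text: str, source_lang: str, target_lang: str) -> str:
    """Hypothetical helper: send `text` to a translation service."""
    raise NotImplementedError("plug in a real translation backend here")

def double_translate(ai_text: str, pivot_lang: str = "ja") -> str:
    """Translate AI-generated English text to a pivot language and back.

    The round trip rewrites phrases (much like a paraphrasing tool),
    which is what degrades weaker AI detectors.
    """
    foreign = translate(ai_text, source_lang="en", target_lang=pivot_lang)
    return translate(foreign, source_lang=pivot_lang, target_lang="en")
```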

Many of our competitors are not robust to this exploit. Below are results from one of the more commonly used competitor AI detectors on the market. The model detects AI straight from ChatGPT, but once the text is put through double translation, it predicts only 15% AI.

[Figure: GPTZero results. A popular competitor tool classifies the original AI text correctly, but incorrectly classifies the double translated text as human-written]

Pangram, however, predicts both the original ChatGPT text and the double translated text as 99.99% AI. Not only can we tell that this is AI-generated text, we can also confidently identify GPT-4 as the original source. The researchers set out to study this phenomenon in general terms, at scale.

[Figure: Pangram results. Pangram correctly identifies both the original and the double translated text as AI-generated]

Studying the effect of backtranslation on 720,000 documents

One example is not enough to prove that our detector is robust and others are not. In the study, the researchers sourced thousands of news articles, scientific paper abstracts, Reddit posts, and product reviews confirmed to be human-written. They then generated AI counterparts using GPT-3.5-Turbo, LLaMA 3, Mistral, Phi-3, and Yi.
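The paper does not spell out the exact prompts used, so the following is only a hypothetical sketch of how such AI counterparts might be generated for one of the models (GPT-3.5-Turbo via the OpenAI Python client). The prompt and parameters here are assumptions for illustration, not the researchers' actual procedure.

```python
# Hypothetical sketch: generating an AI counterpart of a human-written
# document with GPT-3.5-Turbo. The prompt is an assumption for
# illustration, not the Esperanto paper's actual setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_ai_counterpart(human_document: str) -> str:
    """Ask the model to write a new document on the same topic."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "user",
                "content": (
                    "Write a new article covering the same topic, and of "
                    "similar length, as the following text:\n\n" + human_document
                ),
            }
        ],
    )
    return response.choices[0].message.content
```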

Overall, even before employing a translation attack, many of the open-source methods and commercial detectors are in fact completely ineffective.

First, a threshold was chosen for each detector: the score cutoff above which a document is considered AI-generated. Most AI detectors output a percentage as their final score. To put all detectors on comparable terms, the thresholds were chosen such that each model operates at a 1% false positive rate. Detector accuracy can then be compared as the fraction of true positives: how many AI examples does each detector catch at that threshold?

The paper's proposed metric is TPR @ 1% FPR: with the false positive rate held at 1%, how often does each detector catch AI-generated text? Many of the other methods studied fail outright on this measure. ZeroGPT and GPTZero cannot even achieve a 1% false positive rate at any threshold on some domains, and well-cited academic methods like RADAR and LLMDet score well under 50%.
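As a concrete illustration, here is a minimal sketch of how TPR @ 1% FPR can be computed from detector scores. The scores in the usage example are synthetic, made up purely for demonstration; only the metric logic reflects the evaluation described above.

```python
import numpy as np

def tpr_at_fpr(human_scores, ai_scores, target_fpr=0.01):
    """Pick the lowest threshold whose false positive rate is at or below
    target_fpr, then report the true positive rate at that threshold.
    Scores are "probability the text is AI" (higher = more likely AI)."""
    human_scores = np.asarray(human_scores, dtype=float)
    ai_scores = np.asarray(ai_scores, dtype=float)

    # Candidate thresholds: every observed score, lowest to highest.
    thresholds = np.sort(np.unique(np.concatenate([human_scores, ai_scores])))
    for threshold in thresholds:
        # FPR: fraction of human-written documents flagged as AI.
        fpr = np.mean(human_scores >= threshold)
        if fpr <= target_fpr:
            # TPR (recall): fraction of AI documents caught at this threshold.
            return np.mean(ai_scores >= threshold), threshold
    # Some detectors never reach the target FPR at any threshold.
    return 0.0, None

# Toy usage with synthetic scores (not real detector outputs).
rng = np.random.default_rng(0)
human = rng.beta(2, 8, size=1000)  # human docs: mostly low AI scores
ai = rng.beta(8, 2, size=1000)     # AI docs: mostly high AI scores
tpr, threshold = tpr_at_fpr(human, ai, target_fpr=0.01)
print(f"TPR @ 1% FPR: {tpr:.1%} at threshold {threshold:.3f}")
```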

Meanwhile, Pangram achieves above 96% recall at 1% FPR on every other domain, and still reaches 85% on the challenging reviews dataset, where reviews are only 40-50 words long, well under our recommended word count for detecting AI in the wild commercially.

After a double translation attack, many of the detectors completely fall apart. GPTZero, for example, drops from 97% to only 42% on the news domain and from 65% to 9% on the reviews domain. The researchers conclude: “The outcomes for GPTZero and ZeroGPT indicate a lack of robustness against backtranslation techniques… Pangram exhibits a degree of robustness, especially on longer texts.”

The full results are reproduced below. Pangram exhibits superior performance in all categories.

[Figure: Results table from the Esperanto paper comparing AI detectors and showing Pangram's robustness]

Conclusion

This research further supports our claim that Pangram is the only AI detection software on the market today that works reliably enough to be used in academic and commercial settings, and that cannot be bypassed by tricks such as double translation.

This is not an accident or a coincidence. Pangram’s robustness is evidence of a powerful model that knows how to generalize and is backed by large datasets and our targeted active learning approach. While anyone can build an AI detection tool that works some or even most of the time, our scalable approach is the only way to achieve reliable, consistent accuracy that does not completely break down when the text is modified or altered.

We are always working to improve the performance and robustness of our AI detection model. We stay up-to-date with the latest research in adversarial machine learning and are constantly testing our own model against potential attacks and bypasses.

More to come soon on this topic!
