How Pangram detects AI-generated content

View technical report PDF

Overview

Pangram Text is designed to detect AI-generated content with a near-zero false positive rate. Our rigorous training approach minimizes errors and allows the model to detect AI text by analyzing and understanding subtle cues in the writing.

Initial training process

Our classifier uses a traditional language model architecture. It tokenizes the input text, then turns each token into an embedding: a vector of numbers that represents the token's meaning.

The input is passed through the neural network, producing an output embedding. A classifier head transforms the output embedding into a 0 or 1 prediction, where 0 is the human label and 1 is the AI label.
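The flow described above (tokenize, embed, run the network, apply a classifier head) can be sketched in a few lines. This is only a toy illustration under loud assumptions: the whitespace tokenizer, the random embeddings, the mean-pooling "network", and names such as `ToyClassifier` and `EMBED_DIM` are all hypothetical stand-ins, not Pangram's actual model, and the weights are untrained.

```python
import math
import random

EMBED_DIM = 8  # hypothetical embedding size for this toy example


def tokenize(text):
    """Split text into lowercase whitespace tokens (real models use subword tokenizers)."""
    return text.lower().split()


class ToyClassifier:
    """Untrained stand-in showing the data flow only: tokens -> embeddings -> head -> 0/1."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.embeddings = {}  # token -> vector, created lazily
        self.head_weights = [self.rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
        self.head_bias = 0.0

    def embed(self, token):
        """Look up (or create) the vector of numbers for one token."""
        if token not in self.embeddings:
            self.embeddings[token] = [self.rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
        return self.embeddings[token]

    def encode(self, text):
        """Stand-in for the neural network: mean-pool token embeddings into one output embedding."""
        vectors = [self.embed(t) for t in tokenize(text)]
        if not vectors:
            return [0.0] * EMBED_DIM
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    def predict_proba(self, text):
        """Classifier head: a linear layer followed by a sigmoid."""
        output_embedding = self.encode(text)
        logit = sum(w * x for w, x in zip(self.head_weights, output_embedding)) + self.head_bias
        return 1.0 / (1.0 + math.exp(-logit))

    def predict(self, text):
        """Return 0 for the human label and 1 for the AI label."""
        return 1 if self.predict_proba(text) >= 0.5 else 0
```

Calling `ToyClassifier().predict("some input text")` returns 0 or 1, but the output is meaningless until the weights are trained on labeled data.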

We train an initial model on a small but diverse dataset of approximately 1 million documents composed of public and licensed human-written text. The dataset also includes AI-generated text produced by GPT-4 and other frontier language models. The result of training is a neural network capable of reliably predicting whether text was authored by a human or an AI.

Continued improvement through iteration

Hard Negative Mining

The initial model was already quite effective, but we wanted to maximize accuracy and reduce any possibility of false positives (incorrectly predicting human-authored documents as AI-generated). To do this, we developed an algorithm specifically for AI detection models.

With the initial dataset, our model did not have enough signal to go from 99% accurate to 99.999% accurate. While the model learns the initial patterns in the data quickly, the model needs to see hard edge cases in order to precisely distinguish between human and AI text.
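To make the gap between those two figures concrete, here is a small arithmetic sketch. The helper name `expected_errors` and the batch size of 100,000 documents are illustrative assumptions, not figures from Pangram.

```python
def expected_errors(accuracy, num_documents):
    """Expected number of misclassified documents at a given accuracy."""
    return round((1 - accuracy) * num_documents)


print(expected_errors(0.99, 100_000))     # 1000
print(expected_errors(0.99999, 100_000))  # 1
```

At 99% accuracy, roughly 1,000 of every 100,000 documents are misclassified; at 99.999%, roughly 1.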

We solve this by using the model to search large datasets for false positives and augmenting the initial training set with these additional hard examples before retraining. After several cycles of this, the resulting model exhibits a near-zero false positive rate as well as overall improved performance on held-out evaluation sets.
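The mining loop described above can be sketched as follows. This is a minimal illustration, not Pangram's algorithm: `train_word_vote` is a deliberately crude, hypothetical word-counting model standing in for the real classifier, and the function names and the `cycles` parameter are assumptions.

```python
from collections import Counter


def train_word_vote(train_set):
    """Hypothetical toy trainer: words vote AI or human by how often each label used them."""
    ai_counts, human_counts = Counter(), Counter()
    for text, label in train_set:
        (ai_counts if label == 1 else human_counts).update(text.lower().split())

    def predict(text):
        score = sum(ai_counts[w] - human_counts[w] for w in text.lower().split())
        return 1 if score > 0 else 0  # 1 = AI, 0 = human

    return predict


def mine_hard_negatives(predict, human_pool):
    """Return human-written documents that the model wrongly labels as AI."""
    return [doc for doc in human_pool if predict(doc) == 1]


def hard_negative_mining(train, train_set, human_pool, cycles=3):
    """Repeatedly add false positives from human_pool to the training set and retrain."""
    train_set = list(train_set)
    predict = train(train_set)
    for _ in range(cycles):
        seen = {text for text, _ in train_set}
        false_positives = [d for d in mine_hard_negatives(predict, human_pool) if d not in seen]
        if not false_positives:
            break  # nothing new to learn from this pool
        train_set.extend((doc, 0) for doc in false_positives)  # label 0 = human
        predict = train(train_set)
    return predict, train_set


initial = [("delve into the rich tapestry", 1), ("lol that movie was bad", 0)]
pool = ["we delve into the cave", "the dog ran home"]
model, final_set = hard_negative_mining(train_word_vote, initial, pool)
```

In this toy run, the initial model flags both pool documents as AI; after they are added as human examples and the model is retrained, neither is flagged.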

Mirror Prompts
We design the AI side of the dataset to closely resemble the human side in style, tone, and semantic content. For each human example, we produce an AI-generated counterpart that matches the original document on as many axes as possible, so that our model learns to classify documents based solely on specific characteristics of LLM writing.
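One way to picture this pairing is a helper that derives a generation prompt from each human document. The sketch below is purely illustrative: the prompt wording, the `genre` parameter, and the `generate` callable (a hypothetical wrapper around some LLM) are assumptions, since the actual prompt templates are not given here.

```python
def build_mirror_prompt(human_doc, genre="essay"):
    """Build a prompt asking for text that matches the human document's length and opening."""
    word_count = len(human_doc.split())
    stripped = human_doc.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    return (
        f"Write a {genre} of about {word_count} words.\n"
        "Match the topic, tone, and style of a text that opens with:\n"
        f'"{first_line}"\n'
    )


def make_mirror_pair(human_doc, generate, genre="essay"):
    """Return a labeled (human, AI) pair; generate is a hypothetical prompt -> text callable."""
    ai_doc = generate(build_mirror_prompt(human_doc, genre))
    return [(human_doc, 0), (ai_doc, 1)]  # 0 = human label, 1 = AI label
```

Because both documents in a pair share topic, length, and tone, the remaining differences between them are more likely to come from how the text was written than from what it is about.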
Retrain
We train the model with the updated training set and evaluate the model's performance at each step. Using this method, we are able to reduce errors and increase the accuracy of our model beyond what is possible with normal training.
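The per-step evaluation can be sketched with two standard metrics. The function names are hypothetical, and the labels follow the convention used earlier (0 for human, 1 for AI); the metrics Pangram actually tracks may differ.

```python
def false_positive_rate(labels, predictions):
    """Fraction of human-written documents (label 0) predicted as AI (1)."""
    human_predictions = [p for y, p in zip(labels, predictions) if y == 0]
    if not human_predictions:
        return 0.0
    return sum(1 for p in human_predictions if p == 1) / len(human_predictions)


def accuracy(labels, predictions):
    """Fraction of documents whose predicted label matches the true label."""
    if not labels:
        return 0.0
    return sum(1 for y, p in zip(labels, predictions) if y == p) / len(labels)


labels = [0, 0, 0, 0, 1, 1]
predictions = [0, 1, 0, 0, 1, 0]
print(false_positive_rate(labels, predictions))  # 0.25
```

Tracking the false positive rate separately from accuracy matters because a model can score well overall while still mislabeling human writing.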

Learn more

arxiv.org
Technical Report on the Pangram AI-generated Text Classifier
Check out our full technical white paper on arXiv where we go in-depth on training details, performance, and other experiments!