Pangram is the leading software for detecting AI-generated text written by ChatGPT, Claude, Gemini, and more, and for distinguishing AI-written text from human text.
We are now going one step further and releasing an advanced model that can not only detect AI-generated content, but also tell which LLM a piece of AI-generated text comes from. We call our new technology "AI Identification".
Intuitively, people are beginning to notice that the different LLMs have different writing styles. For example, ChatGPT is known for being quite direct and straightforward, Claude is known for being more fluent and conversational, Grok is known for being uncensored and provocative, and DeepSeek-R1 is starting to become known for being rambly and verbose.
![Graham Neubig pokes fun at the different LLMs' stylistic tendencies][tweet2]

[tweet2]: https://pangram-public.s3.us-east-1.amazonaws.com/web/blog/28/neubig_tweet.png
![Ethan Mollick muses on Claude Sonnet's pleasant personality.][tweet1]

[tweet1]: https://pangram-public.s3.us-east-1.amazonaws.com/web/blog/28/mollick_tweet.png
A recent study from Lisa Dunlap and collaborators at UC Berkeley probed the qualitative differences (or informally, the "vibes") of different LLMs. They found many interesting things, such as "Llama is more humorous, utilizes more formatting, provides more examples, and comments much less on ethics than GPT and Claude". The implication is that model performance is not always aligned with human preferences: even though GPT-4 and Claude-3.5 are more advanced models than the Llama series, Llama always seems to punch above its weight on Chatbot Arena, a crowdsourced Elo-based ranking of LLMs based on preferences over answers to the same prompts. Are models that perform well on Chatbot Arena smarter and more capable, or are they just gaming human psychology in a way that makes them more "likable"? And if some models are more helpful and likable than others, does it even matter that they may be less capable at solving PhD-level reasoning problems? These are questions worth studying, and important ones for understanding the utility of systems like Chatbot Arena over traditional model evaluations.
At Pangram, we wondered whether our model could use these vibes to identify and distinguish these LLMs from each other.
Similar to how we train our base AI detection model to distinguish AI writing from human text, we also train the same detection model to perform AI identification using a technique called multi-task learning. In practice, we classify the various language models into 9 families, which we have determined through extensive experimentation.
The families are the following:
The way we accomplish this in practice is to add another "head" to our neural network. When we supervise the AI detection task, we also supervise the AI identification task by passing the model label to the network and backpropagating the error on the AI identification prediction as well as the detection prediction.
Image source: GeeksForGeeks
Almost all of the layers of the model are shared between the two tasks, and only the final prediction layer is split.
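To make this concrete, here is a minimal sketch of a two-headed network in PyTorch. This illustrates the multi-task pattern only, not our production architecture: the encoder, feature dimensions, labels, and equal loss weighting are all placeholder assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskDetector(nn.Module):
    """Shared layers feeding two heads: AI detection and LLM family identification."""
    def __init__(self, hidden_dim=768, num_families=9):
        super().__init__()
        # Placeholder for the shared encoder (in reality, a large transformer).
        self.shared = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        self.detection_head = nn.Linear(hidden_dim, 2)                   # human vs. AI
        self.identification_head = nn.Linear(hidden_dim, num_families)   # which LLM family

    def forward(self, features):
        h = self.shared(features)
        return self.detection_head(h), self.identification_head(h)

model = MultiTaskDetector()
criterion = nn.CrossEntropyLoss()

features = torch.randn(4, 768)          # stand-in document features
is_ai = torch.tensor([1, 0, 1, 1])      # detection labels
family = torch.tensor([0, 0, 3, 7])     # identification labels (every example labeled here for simplicity)

det_logits, id_logits = model(features)
# Both errors are backpropagated through the shared layers in a single step.
loss = criterion(det_logits, is_ai) + criterion(id_logits, family)
loss.backward()
```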
We find in multi-task learning that some tasks help each other when learned together, and some tasks hurt each other. In biology, a similar concept is the idea of symbiosis vs. parasitism. For example, a clownfish living in a sea anemone is an example of symbiosis: the clownfish feeds on predators that can harm the anemone, while it is protected from its own predators by camouflaging and hiding inside the anemone.
We find that adding the LLM identification task is symbiotic with the AI detection task. In other words, asking our model not only to detect AI-generated text but also to identify the model it came from improves its ability to detect AI overall. Other researchers have also confirmed that the various LLMs are not only distinguishable from human text, but are also distinguishable from each other.
An embedding is a representation of a piece of text as a numerical vector. The actual values of the embedding are not meaningful in isolation, but when two embeddings are close together, it means the texts have either similar meaning or similar style. Using a technique called UMAP, we can visualize the embeddings, which are very high-dimensional, in 2-D space. These authors find that when documents written by humans and LLMs are converted to style embeddings, as you can see in the image above, all documents corresponding to the same LLM are separable in embedding space! This means that overall, documents written by the same LLM are closer in style than ones written by different LLMs, or by LLMs and humans.
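For intuition about what such a plot involves, here is a small sketch using the umap-learn library. The clustered random vectors below are stand-ins for real style embeddings, and the source names are made up; the point is only to illustrate the projection from high-dimensional space to 2-D.

```python
import numpy as np
import umap                      # pip install umap-learn
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Synthetic "style embeddings": three clusters standing in for documents
# from three different sources (e.g., humans and two LLM families).
sources = ["human", "model_a", "model_b"]
embeddings = np.vstack([rng.normal(loc=3.0 * i, scale=1.0, size=(100, 256)) for i in range(3)])
labels = np.repeat(sources, 100)

# Project the 256-dimensional embeddings down to 2-D for visualization.
coords = umap.UMAP(n_components=2, random_state=0).fit_transform(embeddings)

for source in sources:
    mask = labels == source
    plt.scatter(coords[mask, 0], coords[mask, 1], s=5, label=source)
plt.legend()
plt.show()
```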
This result gave us confidence that a classifier that could identify the source LLM was possible.
Our model is 93% accurate at identifying the correct LLM family that a piece of AI-generated text has come from. Below is the confusion matrix, which shows how often our model correctly identifies each LLM family (diagonal cells) versus how often it confuses one LLM for another (off-diagonal cells). The darker the color, the more predictions fall into that cell. A perfect model would have dark squares only along the diagonal and white squares everywhere else.
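If you want to build this kind of plot on your own labeled data, a confusion matrix is straightforward to compute with scikit-learn. The family names and simulated predictions below are purely illustrative placeholders, not our evaluation data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

families = ["GPT-4", "OpenAI reasoning", "Claude", "Other"]   # illustrative subset, not the full 9
rng = np.random.default_rng(0)

y_true = rng.integers(0, len(families), size=200)             # placeholder gold labels
# Simulate a classifier that is right ~90% of the time and confused otherwise.
y_pred = np.where(rng.random(200) < 0.9, y_true,
                  rng.integers(0, len(families), size=200))

cm = confusion_matrix(y_true, y_pred)
ConfusionMatrixDisplay(cm, display_labels=families).plot(cmap="Blues")
plt.show()
```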
A few interesting observations about our confusion matrix:
- Confusions occur more frequently between related model families. For example, GPT-4 is frequently confused with the OpenAI reasoning series. This makes sense, because GPT-4 is likely a component of, or a starting point for, OpenAI's reasoning models!
- The model more frequently confuses LLMs with "Other" than with specific LLMs. This shows that in cases where the model is not certain, it is more likely to default to "Other" than to commit to a specific LLM.
While the LLM classifier is not perfect, it is accurate most of the time, and most importantly, when it is wrong, it confuses one AI system with another; it does not confuse AI systems' outputs with genuine human writing.
We believed it was important to go beyond AI detection and also solve AI identification for a few reasons.
Firstly, we believe that teaching the model to distinguish the writing styles of different LLMs, which is a more difficult task than just identifying whether something is AI or not, helps strengthen the performance of the AI detector itself. By asking the model to go above and beyond, it acquires advanced skills and latent knowledge that help it generalize to detecting AI-generated text with higher accuracy.
Interpretability is another reason we want to display the results of the LLM classifier. We would like to build confidence that the model actually knows what it's doing under the hood, and is not just taking a random guess (as many other detectors do). By showing not just the AI score but also which LLM the text came from, we hope to build confidence in the model's ability to understand the nuances of AI writing style.
Finally, we want to discover patterns over time: which LLMs are being used in the wild and with what frequency? What are the LLMs of choice for students, vs. fraudsters, vs. programmers? These are the kinds of questions that we can now hope to answer in future studies.
We hope you enjoy trying our AI identification feature, and that it is useful in helping people understand the innate personalities and styles of the different LLM families. For more information, please reach out to info@pangram.com!