Today, OpenAI released GPT-4.5: its latest and largest frontier language model, and a significant update to ChatGPT. While it does not achieve benchmark scores comparable to reasoning models such as DeepSeek R1 and OpenAI o3, GPT-4.5 is the biggest and most anticipated model release of the year so far, and we were excited to test it out. OpenAI claims large improvements in writing quality, and hot takes on its performance are already all over social media.
We wanted to answer the question many people are asking: as these models get better, can we still detect AI-generated text, even from GPT-4.5? We ran a quick test today to find out.
We started by sampling 11 prompts representative of everyday writing tasks one might ask ChatGPT to handle, then generated a response to each with GPT-4.5.
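For illustration, here is a minimal sketch of how one might generate these samples with the OpenAI Python SDK. The abbreviated prompt wording and the `gpt-4.5-preview` model identifier are assumptions for the example, not our exact test harness.

```python
# Illustrative only: the prompt wording and the "gpt-4.5-preview" model name
# are assumptions for this sketch, not our exact setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompts = [
    "Write a persuasive essay about koala conservation.",
    "Write an email to the editor of your local newspaper.",
    # ...nine more everyday writing prompts
]

samples = []
for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-4.5-preview",
        messages=[{"role": "user", "content": prompt}],
    )
    samples.append(response.choices[0].message.content)
```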
We tried to make the prompts as diverse and varied as possible. We also tried to write prompts that would draw out as significant a qualitative difference from previous GPT models as possible: in other words, wherever there was an opportunity for the model to be creative and show off the "wow" factor, we did our best to give GPT-4.5 that opportunity.
Here are the prompts we used, along with each detector's predicted AI likelihood for each response:
Prompt | Pangram | Leading Competitor 1 | Leading Competitor 2 |
---|---|---|---|
Koala Conservation | 100% | 100% | 100% |
Newspaper Email | 100% | 100% | 67% |
Room Temperature Semiconductor | 100% | 56% | 86% |
School Uniforms | 85% | 100% | 80% |
Poetry Diary | 100% | 100% | 15% |
Escape Room Review | 100% | 81% | 56% |
Russian Film Email | 100% | 100% | 91% |
Mars Landing Scene | 100% | 43% | 7% |
Komodo Dragon Script | 98% | 88% | 0% |
Halloween Breakup Poem | 100% | 100% | 0% |
Venice Chase Scene | 100% | 49% | 9% |
Pangram detects all 11 GPT-4.5-written samples, even without any GPT-4.5 data in its training set. By comparison, two leading AI detection competitors produce spotty results at best. While Pangram confidently assigns 10 of the 11 samples an AI likelihood of 98% or higher, the competitors often express a great deal of uncertainty or, in the worst case, predict with high confidence that the text is human-written.
Pangram is itself a large machine learning model that has seen millions of examples of both human-written and AI-generated text. Large models tend to generalize better and pick up on subtle patterns in AI-generated text that other detectors miss. Our active learning approach further decreases our false positive rate while increasing our sensitivity, allowing the model to work well at scale and to generalize to new LLMs much more effectively than our competitors. Additionally, our focus on data quality and diversity results in a model with far more experience of the finer-grained details that other models cannot pick up on.
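As a rough illustration of the uncertainty-sampling idea behind active learning, here is a generic sketch in scikit-learn. This shows the general technique only, not Pangram's actual model or pipeline; the toy documents and the TF-IDF plus logistic regression setup are stand-ins.

```python
# Generic uncertainty-sampling sketch for a human-vs-AI text classifier.
# This illustrates the technique, not Pangram's actual system; the toy
# documents and TF-IDF + logistic regression setup are placeholders.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Seed training data: 1 = AI-generated, 0 = human-written (placeholders).
labeled_texts = [
    "the koala is a beloved marsupial native to australia",
    "in conclusion, koala conservation benefits ecosystems and communities alike",
    "honestly i just think koalas are neat and we should keep them around",
    "saw a koala on my trip last year, sleepy little guy, ten out of ten",
]
labels = [1, 1, 0, 0]

unlabeled_pool = [
    "the mission to mars was a triumph of engineering and human spirit",
    "ugh my escape room group could not find the last key, still fun though",
    "school uniforms foster equality while limiting personal expression",
]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(labeled_texts, labels)

# Uncertainty sampling: documents scored near 0.5 are the ones the model is
# least sure about, so they are the most informative to label and add next.
probs = clf.predict_proba(unlabeled_pool)[:, 1]
uncertainty = -np.abs(probs - 0.5)
most_uncertain = np.argsort(uncertainty)[::-1][:2]
print("Label these next:", [unlabeled_pool[i] for i in most_uncertain])
```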
Yes, our AI detection tool is still highly effective at detecting GPT-4.5-generated text.
So if you're wondering how well Pangram will hold up when a newer, bigger, and better model comes out: Pangram passes the test on the most anticipated AI release we have seen in a while, without any retraining at all. If you don't want your AI detection software to suddenly stop working the next time OpenAI updates its models, give Pangram a try today.
For more information on our research or free credits to trial our model on GPT-4.5, please contact us at info@pangram.com.