Today, OpenAI released GPT-4.5: its latest and largest frontier language model, and a significant update to ChatGPT. While it does not achieve benchmark scores comparable to reasoning models such as DeepSeek R1 and OpenAI o3, GPT-4.5 is the biggest and most anticipated model release of the year so far, and we were excited to test it out. OpenAI claims large improvements in writing quality, and hot takes on its performance are already all over social media.
We wanted to answer the question many people are asking: as these models get better, can we still detect AI-generated text, even from GPT-4.5? We ran a quick test today to find out.
We started by sampling 11 prompts representative of everyday writing tasks one might ask ChatGPT to handle, then generated a response to each with GPT-4.5.
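For illustration, here is a minimal sketch of how one might generate these samples with the OpenAI Python SDK. The abbreviated prompt wording and the `gpt-4.5-preview` model identifier are assumptions for the example, not our exact test harness.

```python
# Illustrative only: the prompt wording and the "gpt-4.5-preview" model name
# are assumptions for this sketch, not our exact setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompts = [
    "Write a persuasive essay about koala conservation.",
    "Write an email to the editor of your local newspaper.",
    # ...nine more everyday writing prompts
]

samples = []
for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-4.5-preview",
        messages=[{"role": "user", "content": prompt}],
    )
    samples.append(response.choices[0].message.content)
```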
We tried to make the prompts as diverse and varied as possible. We also tried to write prompts that would draw out as significant a qualitative difference from previous GPT models as possible: in other words, wherever there was an opportunity for the model to be creative and show off the "wow" factor, we did our best to give GPT-4.5 that opportunity.
Here are the prompts we used, along with each detector's predicted AI likelihood for each response:
Prompt | Pangram | Leading Competitor 1 | Leading Competitor 2 |
---|---|---|---|
Koala Conservation | 100% | 100% | 100% |
Newspaper Email | 100% | 100% | 67% |
Room Temperature Semiconductor | 100% | 56% | 86% |
School Uniforms | 85% | 100% | 80% |
Poetry Diary | 100% | 100% | 15% |
Escape Room Review | 100% | 81% | 56% |
Russian Film Email | 100% | 100% | 91% |
Mars Landing Scene | 100% | 43% | 7% |
Komodo Dragon Script | 98% | 88% | 0% |
Halloween Breakup Poem | 100% | 100% | 0% |
Venice Chase Scene | 100% | 49% | 9% |
Pangram detects all 11 GPT-4.5-written samples, even without any GPT-4.5 data in its training set. By comparison, two leading AI detection competitors produce spotty results at best. While Pangram confidently assigns 10 of the 11 samples an AI likelihood of 98% or higher, the competitors often express a great deal of uncertainty or, in the worst case, predict with high confidence that the text is human-written.
Pangram is itself a large machine learning model that has seen millions of examples of both human-written and AI-generated text. Large models tend to generalize better and pick up on subtle patterns in AI-generated text that other detectors miss. Our active learning approach further decreases our false positive rate while increasing our sensitivity, allowing the model to work well at scale and to generalize to new LLMs much more effectively than our competitors. Additionally, our focus on data quality and diversity results in a model with far more experience of the finer-grained details that other models cannot pick up on.
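As a rough illustration of the uncertainty-sampling idea behind active learning, here is a generic sketch in scikit-learn. This shows the general technique only, not Pangram's actual model or pipeline; the toy documents and the TF-IDF plus logistic regression setup are stand-ins.

```python
# Generic uncertainty-sampling sketch for a human-vs-AI text classifier.
# This illustrates the technique, not Pangram's actual system; the toy
# documents and TF-IDF + logistic regression setup are placeholders.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Seed training data: 1 = AI-generated, 0 = human-written (placeholders).
labeled_texts = [
    "the koala is a beloved marsupial native to australia",
    "in conclusion, koala conservation benefits ecosystems and communities alike",
    "honestly i just think koalas are neat and we should keep them around",
    "saw a koala on my trip last year, sleepy little guy, ten out of ten",
]
labels = [1, 1, 0, 0]

unlabeled_pool = [
    "the mission to mars was a triumph of engineering and human spirit",
    "ugh my escape room group could not find the last key, still fun though",
    "school uniforms foster equality while limiting personal expression",
]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(labeled_texts, labels)

# Uncertainty sampling: documents scored near 0.5 are the ones the model is
# least sure about, so they are the most informative to label and add next.
probs = clf.predict_proba(unlabeled_pool)[:, 1]
uncertainty = -np.abs(probs - 0.5)
most_uncertain = np.argsort(uncertainty)[::-1][:2]
print("Label these next:", [unlabeled_pool[i] for i in most_uncertain])
```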
Yes, our AI detection tool is still highly effective at detecting GPT-4.5-generated text.
So if you're wondering how well Pangram will hold up when a newer, bigger, and better model comes out: Pangram passes the test on the most anticipated AI release we have seen in a while, without any retraining at all. If you don't want your AI detection software to suddenly stop working the next time OpenAI updates its models, give Pangram a try today.
For more information on our research or free credits to trial our model on GPT-4.5, please contact us at info@pangram.com.